
SOLID


SOLID is a validation and cleanup tool that helps you make your SFM dictionary file cleaner and more conformant to whatever standard you choose (most commonly, the MDF standard).
Problems with SFM
Many vernacular dictionaries are stored in SFM (standard format markers) files, but different users often use the same markers differently, or different markers altogether, and a single user typically does not use his/her markers consistently. Also, SFM itself is an inherently ambiguous file format in terms of structure, and this problem is compounded by the fact that users often omit "implicit" markers. Thus, this flat-looking (SFM cannot be indented) snippet from an SFM dictionary file...
  \lx jump
  \de leap, hop
  \sn 2
  \de mug or attack someone
  \nt need to check the spelling
probably represents this hierarchical (tree-like) structure:
  {lx jump
    {ps
      {sn 1
        {de leap, hop} }
      {sn 2
        {de mug or attack someone}
        {nt need to check the spelling} } } }
Notice that both senses are verbs, but there could actually be two parts of speech here (vi and vt); the SFM data is ambiguous. And if the \sn 2 line were omitted, we wouldn't even know whether there were two senses or just one sense with two definitions. Likewise, we don't really know that the note field refers to the spelling of the second definition; it could be referring to the spelling of the lexeme, jump. And this is just a simple example!
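To make the idea of structural rules concrete, here is a minimal Python sketch that turns the flat snippet above into a tree. It is not SOLID's actual algorithm. The PARENTS and INFERABLE tables are hypothetical rules chosen for this example. Each field attaches to the nearest open field that is an allowed parent. A missing \ps or \sn is inferred (shown as \+ps or \+sn) when nothing else fits.

```python
import re

# Hypothetical rules for this example (not SOLID's defaults):
# each marker maps to the set of markers it may appear directly under.
PARENTS = {
    "lx": set(),        # an empty set marks a record-starting (root) marker
    "ps": {"lx"},
    "sn": {"ps"},
    "de": {"sn"},
    "nt": {"sn", "lx"},
}
# Markers that may be inferred when a child needs them but they are missing.
# (Assumes the rules contain no cycles through inferable markers.)
INFERABLE = {"ps", "sn"}


def parse_sfm(text):
    """Split SFM text into (marker, value) pairs, one per marker line."""
    fields = []
    for line in text.splitlines():
        m = re.match(r"\s*\\(\S+)\s?(.*)", line)
        if m:
            fields.append((m.group(1), m.group(2).strip()))
    return fields


def new_node(marker, value, inferred=False):
    return {"marker": marker, "value": value, "inferred": inferred, "children": []}


def find_parent(stack, marker):
    """Return the node the marker should attach to, or None if none is valid."""
    allowed = PARENTS.get(marker, set())
    # 1. The nearest open field that is an allowed parent.
    for depth in range(len(stack) - 1, -1, -1):
        if stack[depth]["marker"] in allowed:
            del stack[depth + 1:]  # close deeper branches that are now finished
            return stack[depth]
    # 2. Otherwise infer a missing parent, recursing for its own parent.
    for p in sorted(allowed & INFERABLE):
        grandparent = find_parent(stack, p)
        if grandparent is not None:
            inferred = new_node(p, "", inferred=True)
            grandparent["children"].append(inferred)
            stack.append(inferred)
            return inferred
    return None


def build_records(fields):
    """Build one tree per record; collect fields that have no valid parent."""
    records, stack, errors = [], [], []
    for marker, value in fields:
        node = new_node(marker, value)
        if marker in PARENTS and not PARENTS[marker]:  # e.g. \lx starts a record
            records.append(node)
            stack = [node]
            continue
        parent = find_parent(stack, marker)
        if parent is None:
            errors.append((marker, value))
            continue
        parent["children"].append(node)
        stack.append(node)
    return records, errors


def render(node, depth=0):
    """Indent each field under its parent; inferred markers get a '+' prefix."""
    marker = ("+" if node["inferred"] else "") + node["marker"]
    lines = [("  " * depth + "\\" + marker + " " + node["value"]).rstrip()]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


SAMPLE = r"""\lx jump
\de leap, hop
\sn 2
\de mug or attack someone
\nt need to check the spelling"""

if __name__ == "__main__":
    records, errors = build_records(parse_sfm(SAMPLE))
    print("\n".join(render(records[0])))
```

Running it infers \+ps and \+sn above the first \de. It then attaches the \nt to \sn 2, because that is the nearest open field allowed to hold a note. The rules settle the ambiguity mechanically; they cannot know what the author meant.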
Because of these problems, newer dictionary-editing tools are using an XML-based standard named LIFT (Lexicon Interchange FormaT) as the replacement text-file format for SFM. But in the meantime, many lexicons are still in SFM and need to be interpreted accurately. (The preceding example was in MDF, a kind of SFM specifically for lexicon files.)
Two main scenarios for using SOLID
SOLID is generally used to "see what you've got" and fix major problems before importing a lexicon into FLEx (FieldWorks), or to periodically clean up a lexicon that's being edited in Toolbox on an ongoing basis. In either case, the closer your lexicon is to standard MDF, the less work you'll have to do, though neither SOLID nor FLEx will require you to convert to MDF.
Those using SOLID in order to import into FLEx can avoid wasted effort by doing the cleanup work iteratively, since satisfying SOLID is not the same as satisfying the FLEx importer. If this is your goal, consider this approach:

  1. Create and back up a blank FLEx database (including any necessary custom fields)
  2. Quickly clean up the most significant problems; convert to standard MDF wherever convenient
  3. Do a "dry-run" import into the FLEx database
  4. Learning from the problems encountered, adjust settings in FLEx and/or SOLID as needed
  5. Restore the blank FLEx database and repeat the above steps until all major problems are resolved
  6. Export from FLEx to SFM and spot-check the data. If you are satisfied, you can start editing in FLEx with confidence.

Note: The FLEx importer is more forgiving than a strict application of the MDF standard would be, so you may not need to achieve full MDF compliance in order to achieve a good import. Even so, SOLID can help you see where your non-compliant data might force the importer to make too many guesses as to your intended structure.
Most people who edit their vernacular lexicons day-to-day in SFM probably use Toolbox (or its predecessor, Shoebox) as their primary editor. Toolbox is very flexible and transparent, but this flexibility is both its biggest strength and its biggest weakness. Toolbox will let you enter your data however you want, so it's quite easy to use inconsistent data values and even to shoot yourself in the foot with inconsistent structures. So, if you intend to use Toolbox to edit a dictionary, you'll also want to use SOLID to help keep your data's structure clean and consistent (and probably Range Sets in Toolbox as well). If having to discipline yourself this much isn't appealing, consider a more dictionary-specific product such as WeSay (wesay.org) or FieldWorks (FLEx).
How to use SOLID
You should close your editor (e.g. Toolbox) before viewing or editing your lexicon file with SOLID, but there is no need to import/export the file. Just tell SOLID where the file is (e.g. D:\mydata\lexicon\dictionary.txt), and SOLID will create a matching .solid file in the same location (e.g. D:\mydata\lexicon\dictionary.solid). This is an XML file that stores all of your settings for SOLID, but any data edits you make using SOLID will be saved into the lexicon itself. Note: When you choose to "Save", both files are saved; if you choose not to, neither is saved.
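Because SOLID saves data edits into the lexicon itself, it is worth making a backup before you open the file. The following Python sketch does two things. It copies the lexicon to a backup in the same folder. It also works out where SOLID's settings file will appear, following the naming rule just described (same folder, same base name, .solid extension). The function name and the ".before-solid" backup name are made up for this example; SOLID does not create such a backup.

```python
import shutil
from pathlib import Path


def prepare_for_solid(lexicon):
    """Back up a lexicon before opening it in SOLID and report the settings path.

    Hypothetical helper: the '<name>.before-solid<ext>' backup naming is an
    illustrative convention, not something SOLID itself does.
    """
    lexicon = Path(lexicon)
    backup = lexicon.with_name(lexicon.stem + ".before-solid" + lexicon.suffix)
    shutil.copy2(lexicon, backup)              # copy contents and timestamps
    settings = lexicon.with_suffix(".solid")   # SOLID's XML settings, same folder
    return backup, settings
```

For D:\mydata\lexicon\dictionary.txt this yields dictionary.before-solid.txt as the backup and dictionary.solid as the settings file, both in D:\mydata\lexicon.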
The SOLID user interface consists of some buttons along the top, two panes on the left, and one pane on the right. Once you've opened an SFM lexicon file, the top-left pane shows all of the SFM field markers currently in use in that file, along with the structure/rules currently defined for those markers. The bottom-left pane summarizes all violations of those structures/rules. The right pane shows, one by one, each of the records that match the item you've clicked on in either left pane.
For example, suppose the top-left pane specifies that \xv (example sentence) is a child of \rf (reference, i.e. where the example came from). If 54 of your 612 example sentences do not have an \rf field above them, the top-left pane will show a count of 612 in the \xv row, and the bottom-left pane will include an error row for \xv with a count of 54. Clicking on that row lets you use the right pane to look at each of the "bad" examples, one by one, and fix them manually. The offending line in the right pane is shown in bright red until the problem has been fixed and you've clicked Refresh (or Recheck). Be aware that often the real problem is not the red line itself, but a preceding line that is neither a valid sibling nor a valid parent of the red line.
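The check behind that error row can be approximated in a few lines of Python. This sketch is a simplification, not SOLID's rule engine. It treats "has an \rf above it" as "an \rf occurs earlier in the same \lx, \se, or \sn block". SCOPE_MARKERS is an assumed set, and the function name is hypothetical. It returns the total number of \xv fields (the 612 in the example) and the line numbers of the offending ones (the 54).

```python
import re

# Assumption: these markers start a new scope in which an \rf must reappear.
SCOPE_MARKERS = {"lx", "se", "sn"}


def check_parent(lines, child="xv", parent="rf"):
    """Count child fields and list the 1-based line numbers of those with
    no parent field earlier in the current scope."""
    total, bad = 0, []
    seen_parent = False
    for lineno, line in enumerate(lines, start=1):
        m = re.match(r"\\(\S+)", line)
        if not m:
            continue  # blank or continuation line
        marker = m.group(1)
        if marker in SCOPE_MARKERS:
            seen_parent = False  # a new entry/sense: the old \rf no longer applies
        elif marker == parent:
            seen_parent = True
        elif marker == child:
            total += 1
            if not seen_parent:
                bad.append(lineno)
    return total, bad
```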
If you decide that a missing \rf field is no big deal, you can use the top-left pane to specify that if \rf is missing above \xv, SOLID may infer it. This will cause a blue field marker, \+rf, to appear above each of those 54 examples, and once you've done a Recheck to reload the entire file, that row of 54 errors will disappear. However, those inferred blue markers will not be automatically saved into your actual data file. WARNING: use this feature sparingly. If you tell every field that it can infer its parent, you're essentially telling SOLID to ignore all real structural errors and to misinterpret your data wherever they occur instead.
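The key detail is that inferred markers are display-only. The following is a minimal Python sketch of that behavior, with a hypothetical function name. It builds a separate display copy with \+rf inserted above each offending line. The list standing in for the file on disk is never modified.

```python
def with_inferred_parents(lines, bad_line_numbers, parent="rf"):
    """Return a display copy of lines with an inferred '+parent' marker line
    inserted above each offending (1-based) line number.

    The input list, standing in for the file on disk, is not modified,
    mirroring how inferred markers are shown but not saved.
    """
    bad = set(bad_line_numbers)
    shown = []
    for lineno, line in enumerate(lines, start=1):
        if lineno in bad:
            shown.append("\\+" + parent)  # e.g. \+rf, display only
        shown.append(line)
    return shown
```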
Issues:

  • as of this writing (v0.9.322.0), if an inferred marker is not in a valid location, SOLID will correctly list it among the errors, but it will still be displayed in blue, rather than red. In such cases, manually removing the + sign and refreshing will cause it to turn red. It will then be "real" (and saved) rather than merely inferred.
  • as of v0.9.322.0, the Find feature is quite useful, but the Replace feature often seems to work without actually doing anything.

Unlike Toolbox, SOLID allows you to specify multiple parents for a given field. For example, \ps can be a child of either \lx or \se. Also, the right pane does not left-align all of the fields but instead indents them according to the defined hierarchy. (This is well worth the minor display glitches it causes.) Unfortunately, both Toolbox and SOLID's top-left pane show all of the fields in a flat list, rather than in a tree view that visually indicates the specified hierarchy. Granted, doing so would create some visual redundancy (e.g. \ps would have identical rows under \lx and \se), but surely this is what the user would want, and not too difficult for a computer to handle. Otherwise, the user interface is quite intuitive.
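The tree view wished for above is not hard to derive from the parent rules. This Python sketch uses a hypothetical rule table in which \ps may sit under either \lx or \se. It repeats each marker under every parent it is allowed to have, which is exactly the redundancy mentioned. It assumes the rules contain no cycles.

```python
# Hypothetical rules: child marker -> markers it may appear under.
# Assumes the rules contain no cycles.
RULES = {
    "lx": [],
    "se": ["lx"],
    "ps": ["lx", "se"],   # \ps has two possible parents
    "sn": ["ps"],
    "de": ["sn"],
}


def hierarchy_tree(rules):
    """Render parent rules as an indented tree, repeating a marker under
    every parent it is allowed to have."""
    children = {marker: [] for marker in rules}
    for child, parents in rules.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    lines = []

    def walk(marker, depth):
        lines.append("  " * depth + "\\" + marker)
        for child in children.get(marker, []):
            walk(child, depth + 1)

    for root in (m for m, parents in rules.items() if not parents):
        walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(hierarchy_tree(RULES)))
```

With these rules, \ps (and everything below it) appears once under \se and once directly under \lx.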
As of FieldWorks 5.4.1 (and earlier), FLEx does not usually handle fields with multiple parents very well (though it does handle \ps well), so you may need to

  1. split certain field markers into two or three markers, probably via a CC table. For example, if you've used the \nt field under \lx, \se, and \sn, you might replace all of the non-sense-level ones with \nt_lx or \nt_se. SOLID can verify that the CC table's output is reasonable (not necessarily that it's 100% correct).
  2. OR, import into Residue (Automatic) and then use Bulk Edit to move the pieces of data into the right locations.

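If you would rather script the split in option 1 than write a CC table, the idea looks like this Python sketch. It is a substitute technique, not a CC table and not part of SOLID. Each \nt is renamed according to the most recent \lx, \se, or \sn marker, so only the non-sense-level notes change. Whichever tool you use, run the output back through SOLID.

```python
import re


def split_note_markers(lines):
    """Rename nt fields to nt_lx or nt_se according to the most recent
    lx, se or sn marker; sense-level notes keep the plain nt marker."""
    level = None
    out = []
    for line in lines:
        m = re.match(r"\\(\S+)", line)
        marker = m.group(1) if m else None
        if marker in ("lx", "se", "sn"):
            level = marker  # remember which level later notes belong to
        if marker == "nt" and level in ("lx", "se"):
            line = "\\nt_" + level + line[len("\\nt"):]
        out.append(line)
    return out
```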
SOLID seems quite reliable at preserving the integrity of the input file, though it does remove blank lines (I think). Still, the first few times you use SOLID, it is educational to use a "difference tool" such as WinMerge (available from portableapps.com) to compare the file SOLID has just saved to a backup you made just before opening SOLID. And if you create or edit CC tables, Python scripts, or other bulk-editing code, I consider checking the before-and-after to be an essential habit. Otherwise, you may inadvertently garble your data.
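If WinMerge is not at hand, Python's standard difflib module can produce the same before-and-after comparison. Below is a minimal sketch with a hypothetical function name, assuming both files are UTF-8 (pass another encoding otherwise). An empty result means the files are identical. A line consisting of just "-" marks a blank line that was removed.

```python
import difflib
from pathlib import Path


def show_changes(backup_path, saved_path, encoding="utf-8"):
    """Return a unified diff between a pre-SOLID backup and the saved file."""
    before = Path(backup_path).read_text(encoding=encoding).splitlines(keepends=True)
    after = Path(saved_path).read_text(encoding=encoding).splitlines(keepends=True)
    return "".join(difflib.unified_diff(
        before, after, fromfile=str(backup_path), tofile=str(saved_path)))
```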
SOLID also detects violations of the encoding scheme you've selected, e.g. records that include old ANSI characters (such as old smart quotes) in a file that you've specified to be Unicode.
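To see what such an encoding check involves, here is a Python sketch that flags lines which are not valid UTF-8. This is the usual symptom of legacy Windows-1252 smart quotes (bytes 0x93 and 0x94) in a file meant to be Unicode. It is only an approximation with a hypothetical function name: it catches byte sequences that are invalid UTF-8, so legacy text that happens to decode cleanly would slip through.

```python
def find_non_utf8_lines(data: bytes):
    """Return (line number, raw bytes) for each line that is not valid UTF-8,
    e.g. a line containing Windows-1252 smart quotes (0x93/0x94)."""
    bad = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((lineno, raw))
    return bad
```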
Finally, SOLID includes a few experimental bulk-fixing features under the Quick Fixes button. These include very useful but risky fixes such as making all inferred \+sn fields real \sn fields, or moving certain misplaced fields higher up in the hierarchy. Use carefully.

Version

0.9.322.0

Developer

palaso.org

Supported

Supported Operating Systems

Windows, Windows 9x, Windows 2000, Windows XP, Windows 2003/Home Server, Windows Vista

Unicode Support

Yes

Suitable tasks

  • LINGUISTICS
  • Validate the structure of lexical data

Interface Language

English

License

Freeware
