Configuration
The unboxer's behavior can be modified by editing a .yaml file and passing it to the unbox command.
A key-value pair added in your configuration file will override the default value.
Default values are stored in one of three built-in files.
There is a general configuration file for interlinear text, plus one specific file each for toolbox and shoebox.
The toolbox and shoebox files are identically structured and contain best-guess default values for the respective application.
By default, the toolbox configuration is loaded; shoebox can be specified with --format shoebox.
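The override behavior can be pictured as a simple dictionary merge. The sketch below is only an illustration: the key names (`cell_separator`, `encoding`) and their values are hypothetical, and the unboxer's actual loading code may work differently.

```python
def merge_config(defaults, user):
    """Return a new dict in which user-supplied keys override the defaults."""
    merged = dict(defaults)  # copy, so the built-in defaults stay untouched
    merged.update(user)      # any key present in the user file wins
    return merged


# Hypothetical keys and values, for illustration only.
defaults = {"cell_separator": "; ", "encoding": "utf-8"}
user = {"cell_separator": " | "}

config = merge_config(defaults, user)
print(config)
```

Keys that are absent from the user file keep their default value, so a configuration file only needs to list the settings that differ.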
General configuration
File: interlinear_config.yaml
This configuration file contains the default values for variables related to interlinear text.
Tab-aligned fields
A list of fields containing text which will be aligned when rendered in an interlinear representation. Note that you need to use the field labels as they appear after renaming (see "Mapping interlinear fields to columns").
- Analyzed_Word: cldf#Analyzed_Word
- Gloss: cldf#Gloss
- Part of speech (Part_Of_Speech): currently specified as a dictionary entry property in CLDF, usable as a foreign key in cldf-ldd, with a corresponding component.
Slugification
Turn record IDs (\ref) into database-usable IDs, e.g. ConvInGarden.003 into convingarden-003.
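A minimal sketch of what such a conversion could look like is given here. It is not necessarily how the unboxer implements slugification; the function name `slugify_id` is hypothetical, and the rule (lowercase, then replace every run of non-alphanumeric characters with a hyphen) is an assumption that happens to reproduce the example above.

```python
import re


def slugify_id(record_id):
    """Lowercase the ID and replace runs of non-alphanumeric characters with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", record_id.lower())
    # Drop hyphens that would otherwise dangle at either end.
    return slug.strip("-")


print(slugify_id("ConvInGarden.003"))  # convingarden-003
```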
Clitic space correction
Remove spaces after proclitics and before enclitics.
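The following sketch illustrates the idea under one loudly stated assumption: clitic boundaries are marked with `=`, so a proclitic ends in `=` and an enclitic begins with `=`. The function name `fix_clitic_spacing` is hypothetical and the unboxer's own logic may differ.

```python
import re


def fix_clitic_spacing(line):
    """Attach clitics to their hosts, assuming '=' marks the clitic boundary."""
    # Proclitic: remove whitespace after a '=' that is glued to the preceding form.
    line = re.sub(r"(?<=\S)=\s+", "=", line)
    # Enclitic: remove whitespace before a '=' that is glued to the following form.
    line = re.sub(r"\s+=(?=\S)", "=", line)
    return line


print(fix_clitic_spacing("n=  apa  =ke"))  # n=apa=ke
```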
Cell separator
How multiple values in a cell (like variants, or meanings) are delimited.
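As a rough illustration, several values can be joined into one cell and split again with the same separator. The separator `"; "` used here is an assumed value, not necessarily the built-in default, and both function names are hypothetical.

```python
def join_values(values, separator="; "):
    """Combine several values (e.g. variants or meanings) into one cell."""
    return separator.join(values)


def split_cell(cell, separator="; "):
    """Recover the individual values from a cell; an empty cell yields no values."""
    return cell.split(separator) if cell else []


cell = join_values(["house", "home"])
print(cell)              # house; home
print(split_cell(cell))  # ['house', 'home']
```

The separator should be a string that never occurs inside a single value, otherwise splitting will not round-trip.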
Skip empty records
Toolbox
File: toolbox.yaml
This is the default configuration for toolbox projects.
File encoding
Toolbox files should be in UTF-8.
Text record marker
The field holding text record identifiers.
Mapping interlinear fields to columns
The fields listed here will be renamed accordingly.
Note that all field markers are represented here without the leading \ that is used in toolbox.
Unsegmented object line
The unsegmented first line of the record.
Segmented object line
Segmented gloss line
Part of speech
Translation
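The renaming step can be thought of as a lookup from field marker to column name. In the sketch below, the markers (`mb`, `ge`, `ps`), the record contents, and the function name `rename_fields` are assumptions chosen for illustration; the markers actually used are whatever the configuration file lists. Markers are written without the leading backslash, as in the configuration.

```python
def rename_fields(record, mapping):
    """Rename the fields listed in `mapping`; leave all other fields unchanged."""
    return {mapping.get(marker, marker): value for marker, value in record.items()}


# Hypothetical marker-to-column mapping and record, for illustration only.
mapping = {"mb": "Analyzed_Word", "ge": "Gloss", "ps": "Part_Of_Speech"}
record = {
    "ref": "ConvInGarden.003",
    "mb": "dog -s",
    "ge": "dog -PL",
    "ps": "n -sfx",
}

renamed = rename_fields(record, mapping)
print(renamed)
```

Because settings such as the tab-aligned field list refer to the labels after renaming, they would name `Analyzed_Word` rather than `mb` in this example.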
Lexicon entry marker
The field holding lexicon entry identifiers.
Mapping lexicon fields to columns
The fields listed here will be renamed accordingly.
Headword
Part of speech
Meaning
Date
Variants
Example IDs
Text information
Stored in record_marker, as a separate entry, or none?
Shoebox
File: shoebox.yaml
This is the default configuration for shoebox projects.
File encoding
The default is the most frequent single-byte character encoding.
Other values I've had to use: cp1256, iso8859_2, latin_1.
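The values listed above are all valid Python codec names. As a hedged sketch of why the setting matters, the snippet below reads a file in a legacy single-byte encoding and rewrites it as UTF-8. The function names and file paths are hypothetical, and this is not a description of how the unboxer itself reads files.

```python
from pathlib import Path


def read_legacy_file(path, encoding):
    """Read a file stored in a single-byte encoding and return its text."""
    return Path(path).read_text(encoding=encoding)


def convert_to_utf8(src, dst, encoding):
    """Re-encode a legacy file as UTF-8, leaving the original file untouched."""
    text = read_legacy_file(src, encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text
```

Decoding with the wrong encoding usually does not raise an error for single-byte encodings; it silently yields wrong characters, so it is worth checking a few records by eye after choosing a value.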
Text record marker
The field holding text record identifiers.
Mapping interlinear fields to columns
The fields listed here will be renamed accordingly.
Unsegmented object line
The unsegmented first line of the record.
Segmented object line
Segmented gloss line
Part of speech
Translation
Comment
Lexicon entry marker
The field holding lexicon entry identifiers.
Mapping lexicon fields to columns
The fields listed here will be renamed accordingly.
Headword
Part of speech
Meaning
Example IDs
Text information
Stored in record_marker, as a separate entry, or none?