Creating a CLDF dataset
CLDF is a lightweight, human-readable format for linguistic data, using linked data in CSV files.
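To show what "linked data in CSV files" means in practice, here is a minimal Python sketch that uses only the standard library. It assumes two hypothetical, heavily simplified tables modelled on CLDF's LanguageTable and ExampleTable, in which each example points to its language through a Language_ID column. Real CLDF datasets also include a JSON metadata file that describes the tables and their links, and a dedicated library such as pycldf is the better choice for actual work.

```python
import csv
import io

# Hypothetical, heavily simplified tables. Column names follow CLDF's
# LanguageTable and ExampleTable, but a real dataset also ships a JSON
# metadata file that declares the columns and the foreign keys.
LANGUAGES_CSV = """ID,Name
lang1,Language A
lang2,Language B
"""

EXAMPLES_CSV = """ID,Language_ID,Primary_Text,Translated_Text
ex1,lang1,primary text 1,translation 1
ex2,lang2,primary text 2,translation 2
"""


def read_table(text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_examples_to_languages(examples, languages):
    """Follow each example's Language_ID to the LanguageTable row it references."""
    languages_by_id = {row["ID"]: row for row in languages}
    return [
        (example["ID"], languages_by_id[example["Language_ID"]]["Name"])
        for example in examples
    ]


if __name__ == "__main__":
    links = link_examples_to_languages(read_table(EXAMPLES_CSV), read_table(LANGUAGES_CSV))
    for example_id, language_name in links:
        print(f"{example_id} -> {language_name}")
```

The point of the design is that no information is duplicated: the language name lives in one table, and every example refers to it by ID.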
The cldfviz library lets you intersperse Markdown text with data from a CLDF dataset.
cldf-ldd is a set of CLDF components designed to encode fieldwork data; both conversion libraries described below use it.
Depending on your starting point:
- if you have a FLEx database, see the FLEx section below
- if you have a Shoebox or Toolbox database, see the *box section below
- if you don't have a database, the cldflex repository contains example .flextext and .lift files that you can use
- you can also use another existing CLDF dataset to go through the workflow
FLEx
The cldflex library can be used to convert FLEx databases into CSV files (and from there into CLDF datasets, if desired). To get your corpus & lexicon ready for grammar writing, follow these steps:
- export your FLEx text database as .flextext: navigate to Text & Words > Interlinear Texts, open the Analyzed tab, use the menu File > Export Interlinear... and choose the first option ("ELAN, SayMore, FLEx"). You can include one or more texts.
- export your FLEx lexicon as .lift: navigate to Lexicon, use the menu File > Export... and choose the option "Full Lexicon (LIFT 0.13 XML)".
- use cldflex to transform the contents of your .flextext and .lift files into a CLDF dataset:
  Depending on the setup of your FLEx database, this may produce many warnings about inconsistencies. You can fix them in your database, but the conversion should mostly work even if you ignore them. If you think a particular warning is inaccurate, open an issue in the cldflex repository.
- get writing
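Before running the conversion, it can help to check that the LIFT export contains the entries you expect. The following Python sketch is not part of cldflex; it uses the standard library's xml.etree.ElementTree to list each entry's ID and headword form. The inline sample is hypothetical and only mimics LIFT's entry > lexical-unit > form > text nesting, and the file name lexicon.lift in the comment is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry sample that only mimics LIFT's nesting
# (entry > lexical-unit > form > text); real exports contain much more.
SAMPLE_LIFT = """<lift version="0.13">
  <entry id="entry-1">
    <lexical-unit><form lang="qaa"><text>form1</text></form></lexical-unit>
    <sense id="sense-1"><gloss lang="en"><text>gloss1</text></gloss></sense>
  </entry>
  <entry id="entry-2">
    <lexical-unit><form lang="qaa"><text>form2</text></form></lexical-unit>
  </entry>
</lift>"""


def headwords(root):
    """Return (entry id, first lexical-unit form) pairs; form is None if missing."""
    pairs = []
    for entry in root.iter("entry"):
        text = entry.find("lexical-unit/form/text")
        pairs.append((entry.get("id"), text.text if text is not None else None))
    return pairs


if __name__ == "__main__":
    # For a real export, use: root = ET.parse("lexicon.lift").getroot()
    root = ET.fromstring(SAMPLE_LIFT)
    for entry_id, form in headwords(root):
        print(entry_id, form)
```

Entries that come back with None have no lexical-unit form, which is the kind of gap that may also surface as a warning during conversion.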
*box
unboxer is a tool that converts Shoebox and Toolbox databases into CSV files (and from there into CLDF datasets, if desired). To get your corpus & lexicon ready for grammar writing, follow these steps:
- identify the path to your corpus database
- identify the path to your lexicon
- (optional) identify the path to your parsing database
- run this command, inserting the paths:
unboxer corpus </path/to/corpus.db> --lexicon </path/to/lexicon.db> --parsing </path/to/parse.db> --cldf
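If you convert several databases, or your paths contain spaces, it may be convenient to assemble the command above in Python. This sketch only uses the arguments shown in that command (corpus, --lexicon, --parsing, --cldf); the helper name unboxer_command and the example paths are hypothetical. It prints the command instead of running it.

```python
import shlex


def unboxer_command(corpus, lexicon, parsing=None):
    """Build the unboxer call shown above as an argument list.

    The parsing database is optional, so --parsing is only added when a
    path is given. Paths may be str or pathlib.Path objects.
    """
    cmd = ["unboxer", "corpus", str(corpus), "--lexicon", str(lexicon)]
    if parsing is not None:
        cmd += ["--parsing", str(parsing)]
    cmd.append("--cldf")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own databases.
    cmd = unboxer_command("data/corpus.db", "data/lexicon.db")
    # shlex.join quotes paths containing spaces, so the line is safe to paste.
    print(shlex.join(cmd))
    # To run it directly (unboxer must be installed):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids quoting problems when it is passed to subprocess.run.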
Last update: October 2, 2023