Build Process
This page provides additional detail on what happens as part of the iKnow build process. For step-by-step build instructions, see the README.md file in the main repo.
The iKnow engine relies on Language Models (also known as Knowledge Bases or simply "KB") for its language-specific parsing of sentences. A KB's source is expressed as a set of CSV files, which contain no outright code but capture linguistic tokens, rules and other metadata specific to a language, plus some comments and sample sentences. These files are maintained under /kb in a human-readable (and editable) source format, usually through simple text editors like Notepad++.
When accumulated language model edits to the files in /kb represent a comprehensive update, it's time to compile them ahead of a full iKnow engine build. Compiling the language models means transforming them from CSV format into a collection of artefacts the iKnow engine can use at runtime:
- Data in lexrep.csv is compiled into a C++ state machine that ends up in .inl files in /modules/aho/inl/<language>/lexrep/, enabling high-performance matching of input text to the linguistic tokens on which iKnow bases its parsing
- Data in the other CSV files gets loaded as shared memory dumps to enable efficient runtime loading
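To give a feel for what "compiling tokens into a state machine" means, here is a minimal sketch in Python of a multi-pattern matcher in the Aho-Corasick style. It is an illustration only: the two-column token;label layout, the sample entries, and the function names load_entries, build_automaton and find_tokens are all assumptions made for this example. The real lexrep.csv format may differ, and the actual build emits C++ .inl files rather than in-memory tables.

```python
import csv
import io
from collections import deque

# Hypothetical two-column layout (token;label); the real lexrep.csv format may differ.
SAMPLE_CSV = "he;Pronoun\nshe;Pronoun\nhers;Possessive\n"


def load_entries(text):
    """Read (token, label) pairs from semicolon-separated CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return [(row[0], row[1]) for row in reader if row]


def build_automaton(entries):
    """Build goto, failure and output tables for Aho-Corasick-style matching."""
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every token into a trie; state 0 is the root.
    for token, label in entries:
        state = 0
        for ch in token:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((token, label))
    # Phase 2: breadth-first pass computing failure links and merged outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_tokens(text, automaton):
    """Scan text once and return (start_index, token, label) for every match."""
    goto, fail, out = automaton
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for token, label in out[state]:
            hits.append((pos - len(token) + 1, token, label))
    return hits


if __name__ == "__main__":
    automaton = build_automaton(load_entries(SAMPLE_CSV))
    print(find_tokens("ushers", automaton))
```

The point of precomputing the tables is that matching then takes a single pass over the input regardless of how many tokens the language model defines, which is the kind of benefit the generated state machine is meant to provide.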
⚠️ This compilation process is currently implemented in ObjectScript and involves a manual step. Work is in progress to translate it into a standalone process implemented in C++ or Python.
The Delve toolkit provides mechanisms to load these CSV files into a dynamically-generated language model in order to hand-test them during development without having to wait for a full kit build. Note also that KBs (and specifically the vocabulary they use to express rules and labels) depend on features being made available in the engine.
Once the Language Models are compiled into .inl files in /modules/aho/inl/, a regular C++ build (through Visual Studio .sln or a Makefile) turns the full codebase into the required .dll or .so files. See here for an introduction on the important sections of our source code.
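Because the C++ build depends on the compiled .inl files being present, a quick pre-build check can save a failed compile. The sketch below is a hypothetical helper, not part of the iKnow repository: it assumes the /modules/aho/inl/<language>/lexrep/ layout described above and simply lists the languages for which at least one .inl file exists.

```python
from pathlib import Path


def languages_with_compiled_lexrep(repo_root):
    """Return sorted language directory names that hold at least one .inl file.

    Assumes the layout <repo_root>/modules/aho/inl/<language>/lexrep/*.inl.
    """
    inl_root = Path(repo_root) / "modules" / "aho" / "inl"
    if not inl_root.is_dir():
        return []
    found = []
    for lang_dir in sorted(inl_root.iterdir()):
        lexrep_dir = lang_dir / "lexrep"
        if lexrep_dir.is_dir() and any(lexrep_dir.glob("*.inl")):
            found.append(lang_dir.name)
    return found


if __name__ == "__main__":
    # Run from the repository root; prints the languages ready for the C++ build.
    print(languages_with_compiled_lexrep("."))
```

An empty result suggests the language models have not been compiled yet, or that the script was run from outside the repository root.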