Build Process

Benjamin De Boe edited this page Jan 27, 2020 · 24 revisions

The Build Process

This page provides additional detail on what happens as part of the iKnow build process. For step-by-step build instructions, see the README.md file in the main repo.

The Language Models

The iKnow engine relies on Language Models (also known as Knowledge Bases, or simply "KBs") for its language-specific parsing of sentences. A KB's source is expressed as a set of CSV files, which do not contain outright code but capture the linguistic tokens, rules and other metadata specific to a language, plus some comments and sample sentences. These files are maintained under /kb in a human-readable (and editable) source format, usually with simple text editors such as Notepad++.

When accumulated language model edits to the files in /kb represent a comprehensive update, it's time to compile them ahead of a full iKnow engine build. Compiling the language models means transforming them from CSV format into a collection of artefacts the iKnow engine can use at runtime:

  • Data in lexrep.csv is compiled into a C++ state machine that ends up in .inl files in /modules/aho/inl/<language>/lexrep/, enabling high-performance matching of input text to the linguistic tokens on which iKnow bases its parsing
  • Data in the other CSV files is converted into shared memory dumps that the engine can load efficiently at runtime
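
To make the first bullet more concrete, here is a minimal Python sketch of the kind of state machine that multi-token matching can be built on. It assumes an Aho-Corasick automaton, which is only an inference from the /modules/aho directory name, not something this page confirms. The function names `build_automaton` and `find_matches` and the sample tokens are hypothetical; the real compiler emits C++ tables into .inl files and reads its tokens from lexrep.csv rather than from a Python list.

```python
from collections import deque


def build_automaton(tokens):
    """Build goto/fail/output tables for a list of token strings.

    Illustrative only: assumes an Aho-Corasick design, which is not
    confirmed to be how the iKnow .inl state machine is laid out.
    """
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # tokens recognised on reaching each state

    # Phase 1: insert every token into a trie.
    for token in tokens:
        state = 0
        for ch in token:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(token)

    # Phase 2: breadth-first pass to compute failure links, so a failed
    # match can fall back to the longest proper suffix that is still a
    # prefix of some token.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit tokens that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start_index, token) pairs for every token found in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for token in out[state]:
            matches.append((i - len(token) + 1, token))
    return matches


if __name__ == "__main__":
    automaton = build_automaton(["he", "she", "his", "hers"])
    print(find_matches("ushers", automaton))
```

The point of precomputing the tables is that matching then takes a single pass over the input text, regardless of how many tokens the language model contains. That is the kind of benefit a compiled state machine offers over scanning for each token separately.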

⚠️ This compilation process is currently implemented in ObjectScript and involves a manual step. Work is in progress to replace it with a standalone process implemented in C++ or Python.

The Delve toolkit provides mechanisms to load these CSV files into a dynamically generated language model, so that they can be hand-tested during development without waiting for a full kit build. Note also that KBs (and specifically the vocabulary they use to express rules and labels) depend on features being made available in the engine.

The Code

Once the Language Models are compiled into .inl files in /modules/aho/inl/, a regular C++ build (through a Visual Studio .sln or a Makefile) turns the full codebase into the required .dll or .so files. See here for an introduction to the important sections of our source code.