Skip to content

Scripts for ingesting Cherokee data into OLD instances for the DAILP project

Notifications You must be signed in to change notification settings

dativebase/dailp-ingest

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DAILP Ingest

This repository contains scripts for ingesting DAILP Cherokee data into the DAILP Cherokee OLD.

Note: this repository does not contain any Cherokee data. This script assumes that certain specific files are in a sister directory to ingest/ named inputs/. If these files are not present or if their contents are not what the main script expects them to be, then the ingest will fail.

Installation

Create a python (3.6, 3.7) virtual environment and install the dependencies and dailp-ingest in development mode:

$ python --version
Python 3.6.0
$ python -m venv venv
$ source venv/bin/activate
(venv) pip install -e .

Ingest

Once the dailp-ingest package is installed, you should be able to call the dailp-ingest executable:

(venv) dailp-ingest

Configuration

The dailp-ingest executable needs to know the URL and authentication credentials of the OLD instance that you want to modify. These can be set with the following environment variables:

OLD_URL
OLD_USERNAME
OLD_PASSWORD

It can be convenient to define these environment variables in config files under a config/ directory. For example, the following config file at config/chroldlocaldev could be used to configure dailp-ingest to target a locally served Docker Compose DativeBase Deploy:

export OLD_URL="http://127.0.0.1:61001/chrold"
export OLD_USERNAME="admin"
export OLD_PASSWORD="adminA_1"

Then to load the environment:

(venv) source config/chroldlocaldev

Next Steps

  • TODO: remove allomorph tags from Modals

  • Clean up current ingest script as per Jeff's instructions.

  • Fix "Modal Suffixes" allomorph tags

    • QUESTION: How should I tag the allomorphs? Waiting for Jeff to respond.
  • Fix "Clitics" allomorph tags

    • QUESTION: How should I tag the allomorphs? Waiting for Jeff to respond.
  • Make sure inflected verbs use only the ingested affixes

  • Ingest surface forms from the other tense/aspect classes in Liwen's spreadsheet. Get input from Jeff during or first.

  • Possible alteration to "Prepronominal Prefixes (PPP)" ingest:

    • The following note may or may not be relevant when ingesting inflected verbs.

    • NOTE: when analyzing 'output-VERB-Uchihara-and-AllDictionaryEntries.csv', some tags will need modification: For the PPPs, remember that a few need their tags modified:

      • Tags in the PPP Column need to be modified:

        DIST -> DIST1 PART -> PART1

        Note that these PPPs take a different form for certain categories, possibly requiring manual annotation:

        • Verbs specified for DIST1 select for DIST2 in PCT forms of the Imperative
        • Verbs specified for PART1 select for PART2 in INF forms
        • Also, I noticed that the tag for one of PPPs in this column is misspelled: TRSNL should be TRNSL (= Translocative) (I think there are only two instances)
  • Ingest "Aspectual Suffixes--DAILP" spreadsheet. BLOCKED BY JEFF.

About

Scripts for ingesting Cherokee data into OLD instances for the DAILP project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages