Merge pull request #83 from danachaillard/master
Pull Dana's forked repo into main manuscript-object
gschare authored Mar 16, 2021
2 parents 5127332 + 5fa06a4 commit 5abf324
Showing 107 changed files with 14,075 additions and 1,490 deletions.
786 changes: 786 additions & 0 deletions .ipynb_checkpoints/Elmo_contextual_embeddings-checkpoint.ipynb

Large diffs are not rendered by default.

3,967 changes: 3,967 additions & 0 deletions .ipynb_checkpoints/vocabulary_abstraction-checkpoint.ipynb

1 change: 0 additions & 1 deletion .vscode/settings.json
@@ -1,3 +1,2 @@
{
"python.pythonPath": "/Users/matthewkumar/.local/share/virtualenvs/manuscript-object-YZi5cZmr/bin/python"
}
Binary file added Dana'sdoc.pdf
Binary file not shown.
531 changes: 531 additions & 0 deletions Elmo_contextual_embeddings.ipynb

34 changes: 33 additions & 1 deletion README.md
@@ -1,11 +1,22 @@
# manuscript-object

The BnF() Class represents a python version of BnF Ms 640. A static object within this class `complete_manuscript` contains every manuscript. Any other instantiation of the class holds a subset of these entries.
The BnF() Class represents a python version of BnF Ms 640. A static object within this class `complete_manuscript` contains every entry. Any other instantiation of the class holds a subset of these entries.

Entries have their own class, Recipe(), which formats each entry and parses out specific features.

When the BnF() class is defined, `complete_manuscript` is instantiated, which calls a function in `manuscript_helpers` to generate the complete manuscript. The manuscript object, which is located in the same directory as the manuscript data, opens each file in `/ms-xml/` for each version and constructs an entry object from it. These entry objects are held in a list called `entries`, and sorted.
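A minimal Python sketch of the structure described above may help. It is a hypothetical reconstruction, not the repository's code: the `ms-xml/<version>/<file>` layout, the `Recipe` attributes, the sort key, and the `load()` and `subset()` helpers are all assumptions. Unlike the description, `complete_manuscript` is built here by an explicit `load()` call rather than at class-definition time, so it can be pointed at any directory.

```python
import os
from typing import Callable, List, Optional


class Recipe:
    """Hypothetical stand-in for the repo's Recipe class: one manuscript entry."""

    def __init__(self, identity: str, version: str, xml: str):
        self.identity = identity  # file name without extension (assumption)
        self.version = version    # name of the version folder (assumption)
        self.xml = xml            # raw file contents; real parsing is omitted

    def __lt__(self, other: "Recipe") -> bool:
        # Assumed sort key: entry identity first, then version.
        return (self.identity, self.version) < (other.identity, other.version)


def generate_complete_manuscript(ms_xml_dir: str) -> List[Recipe]:
    """Open every file in each version folder under ms_xml_dir and wrap it in a Recipe."""
    entries = []
    for version in sorted(os.listdir(ms_xml_dir)):
        version_dir = os.path.join(ms_xml_dir, version)
        if not os.path.isdir(version_dir):
            continue  # skip stray files at the top level
        for name in os.listdir(version_dir):
            with open(os.path.join(version_dir, name), encoding="utf-8") as f:
                entries.append(Recipe(os.path.splitext(name)[0], version, f.read()))
    return sorted(entries)


class BnF:
    """Holds a list of entries; the class-level complete_manuscript holds all of them."""

    complete_manuscript: Optional["BnF"] = None

    def __init__(self, entries: List[Recipe]):
        self.entries = list(entries)

    @classmethod
    def load(cls, ms_xml_dir: str) -> "BnF":
        # Build the shared, complete manuscript once and keep it on the class.
        cls.complete_manuscript = cls(generate_complete_manuscript(ms_xml_dir))
        return cls.complete_manuscript

    def subset(self, keep: Callable[[Recipe], bool]) -> "BnF":
        # Any other BnF instance holds a filtered subset of the entries.
        return BnF([entry for entry in self.entries if keep(entry)])
```

For example, `BnF.load("ms-xml").subset(lambda e: e.version == "tl")` would give an object holding only one version's entries, while `BnF.complete_manuscript` keeps all of them.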

# Dana's work

The files I worked on are `Elmo_contextual_embeddings.ipynb`, `vocabulary_abstraction.ipynb` and `word_clouds.py`. They require the following Python packages:

- `Elmo_contextual_embeddings.ipynb`: numpy, pandas, tensorflow, tensorflow_hub, scikit-learn, spacy, ipython, chart_studio
- `vocabulary_abstraction.ipynb`: pandas, nltk, print-tree2
- `word_clouds.py`: wordcloud

`Elmo_contextual_embeddings.ipynb` uses ELMo, a machine learning model that produces context-dependent word representations, to look for semantic similarities between sentences. ELMo is very memory-intensive. In the notebook, I feed it either the whole manuscript or selected categories of entries. The results are in `Sentence_encode/`; they are HTML files, so they open in a browser.
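The notebook's code is not shown here, so the following is only a sketch of the general approach, written in plain Python so it runs without TensorFlow. It assumes a hypothetical `embed` callable that maps a batch of sentences to one fixed-length vector each (for example, averaged ELMo token vectors). Feeding the text in small batches is one way to keep memory use in check, and cosine similarity is one common way to compare the resulting vectors; the batch size of 100 is arbitrary.

```python
import math
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    sentences: Sequence[str],
    embed: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 100,
) -> List[Vector]:
    """Call the (hypothetical) embedding function on small batches to bound peak memory."""
    vectors: List[Vector] = []
    for batch in batched(sentences, batch_size):
        vectors.extend(embed(batch))
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(sentences: Sequence[str], vectors: List[Vector], query: int) -> str:
    """Return the sentence whose vector is closest to sentences[query]."""
    candidates = (i for i in range(len(sentences)) if i != query)
    best = max(candidates, key=lambda i: cosine(vectors[query], vectors[i]))
    return sentences[best]
```

In the notebook, `embed` would wrap the actual ELMo model; keeping it as a parameter means the similarity logic can be checked with any stand-in function of the same shape.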

`vocabulary_abstraction.ipynb` builds a semantic tree from all the single-word terms in a thesaurus category. It uses the WordNet lexical database, which gives access to hypernyms (more general terms; for example, "tool" is a hypernym of "hammer"). All outputs are displayed directly in the notebook.
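A minimal sketch of the idea, assuming NLTK's WordNet interface (`wordnet.synsets` and `Synset.hypernym_paths`, which returns paths from the root down to the synset). The first-sense simplification, the nested-dict tree and the `render` helper are illustrative choices, not taken from the notebook; the README lists print-tree2, presumably for display, but a plain-text renderer is used here instead.

```python
from typing import Dict, List

Tree = Dict[str, "Tree"]  # nested dicts: synset name -> children


def single_words(terms: List[str]) -> List[str]:
    """Keep only one-word terms, stripped of surrounding whitespace."""
    return [t.strip() for t in terms if t.strip() and " " not in t.strip()]


def insert_path(tree: Tree, path: List[str]) -> None:
    """Add a root-to-leaf path of names to the tree, sharing common prefixes."""
    node = tree
    for name in path:
        node = node.setdefault(name, {})


def build_hypernym_tree(terms: List[str]) -> Tree:
    """Build a tree from each word's first noun sense (requires nltk.download('wordnet'))."""
    from nltk.corpus import wordnet as wn  # imported lazily so the helpers run without NLTK

    tree: Tree = {}
    for word in single_words(terms):
        synsets = wn.synsets(word, pos=wn.NOUN)
        if synsets:
            # Simplification: use only the first sense and its first hypernym path (root first).
            insert_path(tree, [s.name() for s in synsets[0].hypernym_paths()[0]])
    return tree


def render(tree: Tree, depth: int = 0) -> List[str]:
    """Indent each node two spaces per level, siblings sorted alphabetically."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines
```

Because words sharing a hypernym share a path prefix, `insert_path` merges them under a common node, which is what turns a flat category list into a tree.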

`word_clouds.py` generates a word cloud for every category of tagged terms in the thesaurus. The outputs are in `word_clouds`.
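One way to produce a cloud per category with the `wordcloud` package is to count the terms and pass the counts to `WordCloud.generate_from_frequencies`. This is a sketch, not the script's code: the case folding, image size, and `<category>.png` file naming are assumptions.

```python
import os
from collections import Counter
from typing import Iterable


def term_frequencies(terms: Iterable[str]) -> Counter:
    """Count tagged terms case-insensitively, ignoring blank entries."""
    return Counter(t.strip().lower() for t in terms if t.strip())


def save_word_cloud(category: str, terms: Iterable[str], out_dir: str = "word_clouds") -> str:
    """Render one category's terms to <out_dir>/<category>.png and return the path."""
    from wordcloud import WordCloud  # imported lazily; needs `pip install wordcloud`

    os.makedirs(out_dir, exist_ok=True)
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(term_frequencies(terms))
    path = os.path.join(out_dir, f"{category}.png")
    cloud.to_file(path)
    return path
```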

# Setup

1. If Python 3 is not installed on your machine, install it with Homebrew (`brew`).
@@ -37,3 +48,24 @@ Then, `cd manuscript object`

4. To enter the pipenv shell, run `pipenv shell`. To exit, press ^D or type `exit`. Inside the pipenv shell, all of the repository's external dependencies are installed and available.

# Setup for Windows

1. If Python 3 is not installed on your machine, I would advise you to install Anaconda, which makes it easier to install the necessary packages. Go to https://www.anaconda.com/products/individual, scroll to the bottom of the page, download the installer for your operating system and run it (we want Python 3.7).

2. If Git is not already installed, install it.

3. Open Windows PowerShell, use `cd` to move to the directory where you want the project to live, then run:
```bash
git clone git@github.com:cu-mkp/m-k-manuscript-data.git
git clone git@github.com:cu-mkp/manuscript-object.git
```

4. Next, install the required Python packages. In the Anaconda PowerShell Prompt, run:
```bash
conda install pandas nltk pip
pip install print-tree2
pip install spacy
pip install wordcloud
```

5. Finally, run the notebooks. In the Anaconda PowerShell Prompt, use `cd` to navigate to the folder where you cloned `manuscript-object`, then run `jupyter notebook`. From Jupyter you can open the notebooks and run them.

85 changes: 85 additions & 0 deletions Sentence_encode/Sentence encode elmo3.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence encode tcn.html
32 changes: 32 additions & 0 deletions Sentence_encode/Sentence encode.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo2_250.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3100.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3200.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_100.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1000.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1100.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1200.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1300.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1400.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1500.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1600.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1700.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1800.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_1900.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_200.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2000.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2100.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2200.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2300.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2400.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_250.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2500.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2600.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2700.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2800.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_2900.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_300.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3000.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3100.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3200.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3300.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3400.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3500.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3600.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3700.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3800.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_3900.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_400.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_4000.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_500.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_600.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_700.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_800.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_elmo3_900.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_painting-and-arms.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_painting-and-varnish.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_painting.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_tricks-and-arms.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_tricks-and-casting.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_tricks-and-painting.html
85 changes: 85 additions & 0 deletions Sentence_encode/Sentence_encode_varnish.html
85 changes: 85 additions & 0 deletions Sentence_encode/entries_encode.html
61 changes: 61 additions & 0 deletions Tree/body_part_terms.csv
476 changes: 476 additions & 0 deletions Tree/material_terms.csv
303 changes: 303 additions & 0 deletions Tree/tool_terms.csv

Binary file added __pycache__/digital_manuscript.cpython-37.pyc
Binary file added __pycache__/manuscript_helpers.cpython-37.pyc
Binary file added __pycache__/margin.cpython-37.pyc
Binary file added __pycache__/recipe.cpython-37.pyc
14 changes: 6 additions & 8 deletions digital_manuscript.ipynb
@@ -26,18 +26,16 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'Black varnish for sword guard, bands for trunks, &c'"
]
"text/plain": "'Black varnish for sword guard, bands for trunks, &c'"
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
"execution_count": 2
}
],
"source": [
@@ -5644,9 +5642,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
"version": "3.7.3-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
}
6 changes: 3 additions & 3 deletions manuscript_helpers.py
@@ -11,8 +11,8 @@
'profession', 'sensory', 'tool', 'time', 'weapon']

cwd = os.getcwd()
m_path = cwd if 'manuscript-object' not in cwd else f'{cwd}/../m-k-manuscript-data'
m_k_data_to_thesaurus = f'{m_path}/manuscript-object/thesaurus'
m_path = cwd if 'manuscript-object' not in cwd else f'{cwd}/..'
m_k_data_to_thesaurus = f'{cwd}/thesaurus'

def use_thesaurus(entries: Dict[str, Recipe]) -> List[Recipe]:
"""
@@ -29,7 +29,7 @@ def use_thesaurus(entries: Dict[str, Recipe]) -> List[Recipe]:
"""
if not os.path.exists(m_k_data_to_thesaurus):
print('Thesaurus not found. Generating now.')
os.system(f'python {cwd}/manuscript-object/thesaurus.py')
os.system(f'python {cwd}/thesaurus.py')
print('Finished Generating Thesaurus')

# manual_corrections = pd.read_csv('manual_vocab.csv')
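The path lines changed in `manuscript_helpers.py` can be read as a small pure function, which makes the new behaviour easy to check. The sketch below is hypothetical (`resolve_paths` and `ensure_thesaurus` are not names from the repository). It also shows one design alternative to the `os.system(f'python ...')` call: launching `thesaurus.py` with `sys.executable` through `subprocess.run`, so the script runs under the same interpreter (for example, the pipenv or Anaconda environment) rather than whichever `python` comes first on the PATH.

```python
import os
import subprocess
import sys
from typing import Tuple


def resolve_paths(cwd: str) -> Tuple[str, str]:
    """Mirror the post-commit path logic from manuscript_helpers.py."""
    # Inside a manuscript-object checkout, m_path points at its parent directory.
    m_path = cwd if "manuscript-object" not in cwd else os.path.join(cwd, "..")
    # After this commit the thesaurus is looked up directly under the working directory.
    thesaurus_dir = os.path.join(cwd, "thesaurus")
    return m_path, thesaurus_dir


def ensure_thesaurus(cwd: str) -> bool:
    """Generate the thesaurus if missing; return True if generation was run."""
    _, thesaurus_dir = resolve_paths(cwd)
    if os.path.exists(thesaurus_dir):
        return False
    print("Thesaurus not found. Generating now.")
    # Reuse the running interpreter instead of a bare `python`; raise if the script fails.
    subprocess.run([sys.executable, os.path.join(cwd, "thesaurus.py")], check=True)
    print("Finished Generating Thesaurus")
    return True
```

Using `os.path.join` instead of f-strings with `/` also keeps the separators consistent on Windows.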