
Grid Files

Keith Alcock edited this page May 18, 2023 · 7 revisions

As described in the official documentation, the Grid's process_files directory holds two sets of files. One set belongs to each loaded corpus and contains all the texts, vectors, distances, etc. of that corpus. The other set belongs to each specific grid and includes column names, cell locations, and the like. This page explains in detail how to create these files and what is in them. The process actually begins outside the process_files directory, with what we'll call "input files" containing the texts that should appear in the Grid, so we'll start there.

Input Files

The Grid's raw material is a collection of documents (text files, or files convertible to text) within a single directory. The Grid's backend processes them when it is given the path to that directory. This documentation uses an example directory, Expert Interviews, located somewhere outside the location of existing Grid files.

image

In the directory there are three documents, Expert A.txt, Expert B.txt, and Expert C.txt, with contents as follows:

Expert A.txt

Sentence 1 from Expert A is about grain. Sentence 2 from Expert A is about rainfall. Sentence 3 from Expert A is about economy.

Expert B.txt

Sentence 1 from Expert B is about livestock. Sentence 2 from Expert B is about irrigation. Sentence 3 from Expert B is about finance.

Expert C.txt

Sentence 1 from Expert C is about galamsey. Sentence 2 from Expert C is about machinery. Sentence 3 from Expert C is about transportation.
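
To follow along, the example directory can be recreated with a short script. This is only a sketch: the function name make_example_corpus is made up for illustration, and writing one sentence per line is an assumption (the trailing newline on each parsed sentence later on this page suggests it, but the exact layout of the original files is not shown).

```python
from pathlib import Path
import tempfile

# Topics per expert, taken from the example documents above.
TOPICS = {
    "Expert A": ["grain", "rainfall", "economy"],
    "Expert B": ["livestock", "irrigation", "finance"],
    "Expert C": ["galamsey", "machinery", "transportation"],
}


def make_example_corpus(parent):
    """Create parent/Expert Interviews with the three example text files."""
    corpus_dir = Path(parent) / "Expert Interviews"
    corpus_dir.mkdir(parents=True, exist_ok=True)
    for expert, topics in TOPICS.items():
        sentences = [
            f"Sentence {number} from {expert} is about {topic}."
            for number, topic in enumerate(topics, start=1)
        ]
        # Assumption: one sentence per line, with a final newline.
        text = "\n".join(sentences) + "\n"
        (corpus_dir / f"{expert}.txt").write_text(text, encoding="utf-8")
    return corpus_dir


if __name__ == "__main__":
    corpus_dir = make_example_corpus(tempfile.mkdtemp())
    # Print the path with the trailing slash that the upload dialog expects.
    print(f"{corpus_dir}/")
```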

Per Corpus Files

Input files are converted into per corpus files by a process that begins in the Grid's Gallery view. The Gallery displays the collection of known grids along with links to "Create new Grid!" and "Upload or update corpus", and it is the first thing you should see when starting up the Grid.

image

Click on "Upload or update corpus" and enter the path to the directory containing your collection of input files. Include a trailing slash on the path. It may be easiest to use an absolute path, but it is not required. Click the "Ready!" button. If the operation is successful, confirmation messages should appear below the button. They show the names of the corpus and row label files, which you'll need later, so make note of them.

image

The process should add five files to the process_files directory. Their names are built from [your_corpus_name], which in this case is "Expert Interviews".

image

Expert Interviews.csv contains all the parsed sentences from the input files.

,sentence
0,"Sentence 1 from Expert B is about livestock.
"
1,"Sentence 2 from Expert B is about irrigation.
"
2,"Sentence 3 from Expert B is about finance.
"
3,"Sentence 1 from Expert C is about galamsey.
"
4,"Sentence 2 from Expert C is about machinery.
"
5,"Sentence 3 from Expert C is about transportation.
"
6,"Sentence 1 from Expert A is about grain.
"
7,"Sentence 2 from Expert A is about rainfall.
"
8,"Sentence 3 from Expert A is about economy.
"

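Note that each sentence field spans two lines because it ends with a newline inside the quotes. As a rough sketch of how such a file can be read outside the Grid, Python's standard csv module handles the embedded newlines. The helper name read_sentences is hypothetical, and the sample holds only the first two rows of the listing above.

```python
import csv
import io

# First two data rows of Expert Interviews.csv, copied from the listing above.
SAMPLE = (
    ',sentence\n'
    '0,"Sentence 1 from Expert B is about livestock.\n"\n'
    '1,"Sentence 2 from Expert B is about irrigation.\n"\n'
)


def read_sentences(handle):
    """Return {index: sentence} with the trailing newline stripped."""
    reader = csv.DictReader(handle)
    # The index column has an empty header, so its key is "".
    return {int(row[""]): row["sentence"].strip() for row in reader}


sentences = read_sentences(io.StringIO(SAMPLE, newline=""))
print(sentences)
```

For a real file, pass open(path, newline="", encoding="utf-8") instead of the in-memory sample; the csv module expects newline="" so that it can handle line breaks inside quoted fields itself.
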
Expert Interviews_row_labels.csv includes the row labels for each sentence. They are derived from the filenames used for input. Also included is a stripped version of each sentence which contains only those words which will be used in calculating vectors and distances.

,Unnamed: 0,readable,Expert A,Expert B,Expert C,all,stripped
0,0,"Sentence 1 from Expert B is about livestock.
",0,1,0,1,sentence expert b livestock
1,1,"Sentence 2 from Expert B is about irrigation.
",0,1,0,1,sentence expert b irrigation
2,2,"Sentence 3 from Expert B is about finance.
",0,1,0,1,sentence expert b finance
3,3,"Sentence 1 from Expert C is about galamsey.
",0,0,1,1,sentence expert c galamsey
4,4,"Sentence 2 from Expert C is about machinery.
",0,0,1,1,sentence expert c machinery
5,5,"Sentence 3 from Expert C is about transportation.
",0,0,1,1,sentence expert c transportation
6,6,"Sentence 1 from Expert A is about grain.
",1,0,0,1,sentence expert grain
7,7,"Sentence 2 from Expert A is about rainfall.
",1,0,0,1,sentence expert rainfall
8,8,"Sentence 3 from Expert A is about economy.
",1,0,0,1,sentence expert economy

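The label columns are 0/1 flags, so summing a column gives the number of sentences under that row label. The following sketch illustrates this with three rows excerpted from the listing above. The helper name and the set of non-label columns are assumptions based on the header shown here, not on the Grid's code.

```python
import csv
import io

# Three rows excerpted from Expert Interviews_row_labels.csv (one per expert).
SAMPLE = (
    ',Unnamed: 0,readable,Expert A,Expert B,Expert C,all,stripped\n'
    '0,0,"Sentence 1 from Expert B is about livestock.\n",0,1,0,1,sentence expert b livestock\n'
    '3,3,"Sentence 1 from Expert C is about galamsey.\n",0,0,1,1,sentence expert c galamsey\n'
    '6,6,"Sentence 1 from Expert A is about grain.\n",1,0,0,1,sentence expert grain\n'
)

# Assumption: every column other than these is a row label.
NON_LABEL_COLUMNS = {"", "Unnamed: 0", "readable", "stripped"}


def count_sentences_per_label(handle):
    """Sum the 0/1 flags in each label column."""
    reader = csv.DictReader(handle)
    labels = [name for name in reader.fieldnames if name not in NON_LABEL_COLUMNS]
    counts = dict.fromkeys(labels, 0)
    for row in reader:
        for label in labels:
            counts[label] += int(row[label])
    return counts


counts = count_sentences_per_label(io.StringIO(SAMPLE, newline=""))
print(counts)
```
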
cleaned_Expert Interviews.csv contains similar information: the stripped and readable versions of each sentence.

,stripped,readable
0,sentence expert b livestock,Sentence 1 from Expert B is about livestock.
1,sentence expert b irrigation,Sentence 2 from Expert B is about irrigation.
2,sentence expert b finance,Sentence 3 from Expert B is about finance.
3,sentence expert c galamsey,Sentence 1 from Expert C is about galamsey.
4,sentence expert c machinery,Sentence 2 from Expert C is about machinery.
5,sentence expert c transportation,Sentence 3 from Expert C is about transportation.
6,sentence expert grain,Sentence 1 from Expert A is about grain.
7,sentence expert rainfall,Sentence 2 from Expert A is about rainfall.
8,sentence expert economy,Sentence 3 from Expert A is about economy.

cleaned_Expert Interviews_doc_distances_lem.npy is a binary NumPy file containing the distances used for clustering.

Finally, cleaned_Expert Interviews_doc_vecs_lem.json contains all the vectors in JSON format. Although the file contains text, even for this small corpus it is too long to reproduce here.

Per Grid Files

The next step is to use the corpus to generate a grid. Return to the Gallery view and click on "Create new Grid!", which should bring up a new dialog with four questions. The answers to the first two questions, about the corpus and row labels, can be taken from the values displayed by the dialog in the previous step. They are again derived from [your_corpus_name]. The other two questions are answered here with example responses so that the resulting files match what is shown on this page. Click the "Ready!" button.

image

If it all works, you should soon see a grid. However, the per grid files will not yet have been saved. To do that, you need to click on the diskette icon in the upper right corner of the browser window.

image

Now there should be five additional files in the process_files directory. Their names are based on the grid name that was used, here "Expert Grid".

image

Expert Grid_specs.csv contains the roadmap that connects the grid to its corpus. The existence of this file will trigger a grid's appearance in the Gallery's list of grids as shown at the bottom of this page.

,anchor,row_filename,corpus,filename
0,expert,Expert Interviews_row_labels,cleaned_Expert Interviews,Expert Grid
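
The specs row can be read as a small lookup table from a grid to its files. The sketch below derives file names from it. The mapping is inferred purely from the file names listed on this page, not taken from the backend code, and the function name files_for_grid is made up for illustration.

```python
import csv
import io

# Contents of Expert Grid_specs.csv, copied from the listing above.
SPECS = (
    ",anchor,row_filename,corpus,filename\n"
    "0,expert,Expert Interviews_row_labels,cleaned_Expert Interviews,Expert Grid\n"
)


def files_for_grid(handle):
    """Map the single specs row to the file names it appears to point at."""
    spec = next(csv.DictReader(handle))
    grid, corpus = spec["filename"], spec["corpus"]
    return {
        "anchor": spec["anchor"],
        "row_labels": f"{spec['row_filename']}.csv",
        # Assumption: corpus files share the value of the corpus column as a prefix.
        "corpus": [
            f"{corpus}.csv",
            f"{corpus}_doc_distances_lem.npy",
            f"{corpus}_doc_vecs_lem.json",
        ],
        # Assumption: per grid files share the grid name as a prefix.
        "grid": [
            f"{grid}_{part}.csv"
            for part in ("specs", "documents", "cells", "tokens", "vectors")
        ],
    }


files = files_for_grid(io.StringIO(SPECS, newline=""))
print(files["row_labels"])
print(files["corpus"])
```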

Expert Grid_documents.csv is very similar to Expert Interviews_row_labels.csv.

,readable,stripped,Expert A,Expert B,Expert C,all
0,0. Sentence 1 from Expert B is about livestock.,sentence expert b livestock,False,True,False,True
1,1. Sentence 2 from Expert B is about irrigation.,sentence expert b irrigation,False,True,False,True
2,2. Sentence 3 from Expert B is about finance.,sentence expert b finance,False,True,False,True
3,3. Sentence 1 from Expert C is about galamsey.,sentence expert c galamsey,False,False,True,True
4,4. Sentence 2 from Expert C is about machinery.,sentence expert c machinery,False,False,True,True
5,5. Sentence 3 from Expert C is about transportation.,sentence expert c transportation,False,False,True,True
6,6. Sentence 1 from Expert A is about grain.,sentence expert grain,True,False,False,True
7,7. Sentence 2 from Expert A is about rainfall.,sentence expert rainfall,True,False,False,True
8,8. Sentence 3 from Expert A is about economy.,sentence expert economy,True,False,False,True

Expert Grid_cells.csv contains information on a per cell basis, recording things like the row, column, text, and whether it is frozen or not.

,row,col,frozen_col,readable,seeded_doc
0,Expert A,grain | sentence,False,6. Sentence 1 from Expert A is about grain.,False
1,Expert A,economy | sentence,False,8. Sentence 3 from Expert A is about economy.,False
2,Expert A,rainfall | sentence,False,7. Sentence 2 from Expert A is about rainfall.,False
3,Expert B,sentence | c,False,0. Sentence 1 from Expert B is about livestock.,False
4,Expert B,sentence | c,False,1. Sentence 2 from Expert B is about irrigation.,False
5,Expert B,sentence | c,False,2. Sentence 3 from Expert B is about finance.,False
6,Expert C,sentence | c,False,3. Sentence 1 from Expert C is about galamsey.,False
7,Expert C,sentence | c,False,4. Sentence 2 from Expert C is about machinery.,False
8,Expert C,sentence | c,False,5. Sentence 3 from Expert C is about transportation.,False
9,all,grain | sentence,False,6. Sentence 1 from Expert A is about grain.,False
10,all,economy | sentence,False,8. Sentence 3 from Expert A is about economy.,False
11,all,rainfall | sentence,False,7. Sentence 2 from Expert A is about rainfall.,False
12,all,sentence | c,False,0. Sentence 1 from Expert B is about livestock.,False
13,all,sentence | c,False,1. Sentence 2 from Expert B is about irrigation.,False
14,all,sentence | c,False,2. Sentence 3 from Expert B is about finance.,False
15,all,sentence | c,False,3. Sentence 1 from Expert C is about galamsey.,False
16,all,sentence | c,False,4. Sentence 2 from Expert C is about machinery.,False
17,all,sentence | c,False,5. Sentence 3 from Expert C is about transportation.,False
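
Because every record names its row and column, the cells file can be regrouped into a grid-like structure. Here is a minimal sketch using four rows excerpted from the listing above; build_grid is a hypothetical helper, not part of the Grid.

```python
import csv
import io
from collections import defaultdict

# Four rows excerpted from Expert Grid_cells.csv.
CELLS = (
    ",row,col,frozen_col,readable,seeded_doc\n"
    "0,Expert A,grain | sentence,False,6. Sentence 1 from Expert A is about grain.,False\n"
    "3,Expert B,sentence | c,False,0. Sentence 1 from Expert B is about livestock.,False\n"
    "4,Expert B,sentence | c,False,1. Sentence 2 from Expert B is about irrigation.,False\n"
    "9,all,grain | sentence,False,6. Sentence 1 from Expert A is about grain.,False\n"
)


def build_grid(handle):
    """Group readable sentences by their (row, col) cell."""
    grid = defaultdict(list)
    for record in csv.DictReader(handle):
        grid[(record["row"], record["col"])].append(record["readable"])
    return dict(grid)


grid = build_grid(io.StringIO(CELLS, newline=""))
for (row, col), texts in grid.items():
    print(f"{row} / {col}: {len(texts)} sentence(s)")
```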

Expert Grid_tokens.csv keeps track of each sentence's tokens.

,tokens
0,"['sentence', 'expert', 'b', 'livestock']"
1,"['sentence', 'expert', 'b', 'irrigation']"
2,"['sentence', 'expert', 'b', 'finance']"
3,"['sentence', 'expert', 'c', 'galamsey']"
4,"['sentence', 'expert', 'c', 'machinery']"
5,"['sentence', 'expert', 'c', 'transportation']"
6,"['sentence', 'expert', 'grain']"
7,"['sentence', 'expert', 'rainfall']"
8,"['sentence', 'expert', 'economy']"
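
Each tokens field is the text of a Python list rather than a real list. If you need the tokens themselves, one safe way to recover them is ast.literal_eval, sketched below with two rows from the listing above. The helper name read_tokens is made up for illustration.

```python
import ast
import csv
import io

# Two rows excerpted from Expert Grid_tokens.csv.
TOKENS = (
    ",tokens\n"
    "0,\"['sentence', 'expert', 'b', 'livestock']\"\n"
    "6,\"['sentence', 'expert', 'grain']\"\n"
)


def read_tokens(handle):
    """Return {index: list of tokens}, parsing each list literal safely."""
    return {
        int(row[""]): ast.literal_eval(row["tokens"])
        for row in csv.DictReader(handle)
    }


tokens = read_tokens(io.StringIO(TOKENS, newline=""))
print(tokens[6])
```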

Expert Grid_vectors.csv contains vectors and is too large to reproduce here. Neither of these last two files is used much by the program; they are side effects of grid processing.

If you now return to the Gallery, you should see the new grid added to the collection. Clicking on its link should return you to the last saved version.
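
Since a grid appears in the Gallery when its specs file exists, the same rule can be used to list saved grids from outside the program. This sketch is not the Grid's own code. It assumes only that specs files are named [grid name]_specs.csv, as shown above; list_grids is a hypothetical helper.

```python
from pathlib import Path

SPECS_SUFFIX = "_specs.csv"


def list_grids(process_files):
    """Return the grid names implied by the *_specs.csv files in a directory."""
    return sorted(
        path.name[: -len(SPECS_SUFFIX)]
        for path in Path(process_files).glob(f"*{SPECS_SUFFIX}")
    )


if __name__ == "__main__":
    # Assumption: the script is run from the directory that contains process_files.
    print(list_grids("process_files"))
```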

image

Happy gridding!