Grid Files
As described in the official documentation, two sets of files are found in the Grid's process_files directory. One set belongs to each loaded corpus and contains the texts, vectors, distances, etc. of that corpus. The other set belongs to each specific grid and includes column names, cell locations, and the like. This page explains in detail how these files are created and what is in them. The process actually begins outside of the process_files directory, with what we'll call "input files" containing the texts that should appear in the Grid, so we'll start there.
The Grid takes as raw material a collection of documents (text files, or files convertible to text) within a single directory. The Grid's backend processes them when it is given the path to that directory. This documentation will use an example directory, Expert Interviews, located somewhere outside the location of existing Grid files.
In the directory there are three documents, Expert A.txt, Expert B.txt, and Expert C.txt, with contents as follows:
Sentence 1 from Expert A is about grain. Sentence 2 from Expert A is about rainfall. Sentence 3 from Expert A is about economy.
Sentence 1 from Expert B is about livestock. Sentence 2 from Expert B is about irrigation. Sentence 3 from Expert B is about finance.
Sentence 1 from Expert C is about galamsey. Sentence 2 from Expert C is about machinery. Sentence 3 from Expert C is about transportation.
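To follow along, the example directory can be created by hand or with a short script. The sketch below is not part of the Grid: the function name build_example_corpus is hypothetical, and writing one sentence per line is an assumption about how the example documents are laid out.

```python
from pathlib import Path
import tempfile

# Topics per expert, taken from the example documents above.
TOPICS = {
    "Expert A": ["grain", "rainfall", "economy"],
    "Expert B": ["livestock", "irrigation", "finance"],
    "Expert C": ["galamsey", "machinery", "transportation"],
}


def build_example_corpus(parent: Path) -> Path:
    """Create parent/'Expert Interviews' with one .txt file per expert."""
    corpus_dir = parent / "Expert Interviews"
    corpus_dir.mkdir(parents=True, exist_ok=True)
    for expert, topics in TOPICS.items():
        sentences = [
            f"Sentence {number} from {expert} is about {topic}."
            for number, topic in enumerate(topics, start=1)
        ]
        # Assumption: one sentence per line in each input file.
        text = "\n".join(sentences) + "\n"
        (corpus_dir / f"{expert}.txt").write_text(text, encoding="utf-8")
    return corpus_dir


# Demo: build the corpus in a temporary location, away from any Grid files.
corpus_dir = build_example_corpus(Path(tempfile.mkdtemp()))
print(sorted(path.name for path in corpus_dir.iterdir()))
```

In practice you would pass a parent directory of your own choosing instead of a temporary one.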
Input files are converted into per-corpus files by a process that begins in the Grid's Gallery view, which displays the collection of known grids along with links to "Create new Grid!" and "Upload or update corpus". The Gallery is the first thing you should see when starting up the Grid.
Click on "Upload or update corpus" and enter the path to the directory containing your collection of input files. Include a trailing slash on the path. It may be easiest to use an absolute path, but it is not required. Click the "Ready!" button. If the operation is successful, positive messages should appear below the button. You'll need the names of the corpus and row label files later, so make note of them.
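If you want to prepare the path before pasting it into the form, a small helper can make it absolute and guarantee the trailing slash. This is only a convenience sketch; corpus_path_for_form is a hypothetical name, not something the Grid provides.

```python
import os


def corpus_path_for_form(directory: str) -> str:
    """Return an absolute path to directory that ends with a path separator."""
    absolute = os.path.abspath(directory)
    # Joining with an empty component appends the separator only if it is missing.
    return os.path.join(absolute, "")


print(corpus_path_for_form("Expert Interviews"))
```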
The process should add five files to the process_files directory. Their names are built from [your_corpus_name], which in this case is "Expert Interviews".
Expert Interviews.csv contains all the parsed sentences from the input files.
,sentence
0,"Sentence 1 from Expert B is about livestock.
"
1,"Sentence 2 from Expert B is about irrigation.
"
2,"Sentence 3 from Expert B is about finance.
"
3,"Sentence 1 from Expert C is about galamsey.
"
4,"Sentence 2 from Expert C is about machinery.
"
5,"Sentence 3 from Expert C is about transportation.
"
6,"Sentence 1 from Expert A is about grain.
"
7,"Sentence 2 from Expert A is about rainfall.
"
8,"Sentence 3 from Expert A is about economy.
"
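Note that each quoted sentence carries a trailing newline, so one record spans two physical lines. A CSV-aware reader handles this correctly, whereas splitting the file on line breaks would not. The following sketch uses Python's standard csv module on a two-record excerpt; read_sentences is a hypothetical helper name.

```python
import csv
import io

# Two records copied from the listing above, including the embedded newlines.
SAMPLE = ''',sentence
0,"Sentence 1 from Expert B is about livestock.
"
1,"Sentence 2 from Expert B is about irrigation.
"
'''


def read_sentences(handle):
    """Map each sentence index to its text, with the trailing newline removed."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row: ['', 'sentence']
    return {int(index): sentence.strip() for index, sentence in reader}


sentences = read_sentences(io.StringIO(SAMPLE))
print(sentences)
```

When reading the real file from disk, open it with newline="" as the csv module documentation recommends, so that embedded line breaks reach the reader unchanged.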
Expert Interviews_row_labels.csv includes the row labels for each sentence. They are derived from the filenames used for input. Also included is a stripped version of each sentence which contains only those words which will be used in calculating vectors and distances.
,Unnamed: 0,readable,Expert A,Expert B,Expert C,all,stripped
0,0,"Sentence 1 from Expert B is about livestock.
",0,1,0,1,sentence expert b livestock
1,1,"Sentence 2 from Expert B is about irrigation.
",0,1,0,1,sentence expert b irrigation
2,2,"Sentence 3 from Expert B is about finance.
",0,1,0,1,sentence expert b finance
3,3,"Sentence 1 from Expert C is about galamsey.
",0,0,1,1,sentence expert c galamsey
4,4,"Sentence 2 from Expert C is about machinery.
",0,0,1,1,sentence expert c machinery
5,5,"Sentence 3 from Expert C is about transportation.
",0,0,1,1,sentence expert c transportation
6,6,"Sentence 1 from Expert A is about grain.
",1,0,0,1,sentence expert grain
7,7,"Sentence 2 from Expert A is about rainfall.
",1,0,0,1,sentence expert rainfall
8,8,"Sentence 3 from Expert A is about economy.
",1,0,0,1,sentence expert economy
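The label columns act as 0/1 flags, one column per input file plus "all". As an illustration of how they can be read back, this sketch lists the labels set for each sentence in a two-record excerpt. The helper name labels_by_sentence is hypothetical, and treating every column other than the index, "Unnamed: 0", "readable", and "stripped" as a label is an assumption based on the header shown above.

```python
import csv
import io

# Two records copied from the listing above.
SAMPLE = ''',Unnamed: 0,readable,Expert A,Expert B,Expert C,all,stripped
0,0,"Sentence 1 from Expert B is about livestock.
",0,1,0,1,sentence expert b livestock
6,6,"Sentence 1 from Expert A is about grain.
",1,0,0,1,sentence expert grain
'''

# Assumption: everything except these columns is a row label.
NON_LABEL_COLUMNS = {"", "Unnamed: 0", "readable", "stripped"}


def labels_by_sentence(handle):
    """Map each sentence index to the row labels flagged with 1."""
    reader = csv.DictReader(handle)
    result = {}
    for row in reader:
        index = int(row[""])  # the unnamed first column holds the index
        result[index] = [
            name
            for name in reader.fieldnames
            if name not in NON_LABEL_COLUMNS and row[name] == "1"
        ]
    return result


labels = labels_by_sentence(io.StringIO(SAMPLE))
print(labels)
```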
cleaned_Expert Interviews.csv contains similar information: the stripped and readable versions of each sentence, without the row labels.
,stripped,readable
0,sentence expert b livestock,Sentence 1 from Expert B is about livestock.
1,sentence expert b irrigation,Sentence 2 from Expert B is about irrigation.
2,sentence expert b finance,Sentence 3 from Expert B is about finance.
3,sentence expert c galamsey,Sentence 1 from Expert C is about galamsey.
4,sentence expert c machinery,Sentence 2 from Expert C is about machinery.
5,sentence expert c transportation,Sentence 3 from Expert C is about transportation.
6,sentence expert grain,Sentence 1 from Expert A is about grain.
7,sentence expert rainfall,Sentence 2 from Expert A is about rainfall.
8,sentence expert economy,Sentence 3 from Expert A is about economy.
cleaned_Expert Interviews_doc_distances_lem.npy is a binary NumPy file containing the distances used for clustering.
Finally, cleaned_Expert Interviews_doc_vecs_lem.json contains all the vectors in JSON format. Although the file contains text, even for this small corpus it is too long to reproduce here.
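The five per-corpus file names described above all follow from the corpus name. As a quick check after an upload, a sketch like the following lists any that are missing. Both function names are hypothetical, and the naming pattern is inferred from this example rather than taken from the Grid's source.

```python
from pathlib import Path
import tempfile


def corpus_file_names(corpus_name: str) -> list:
    """Return the five per-corpus file names, following the pattern shown above."""
    return [
        f"{corpus_name}.csv",
        f"{corpus_name}_row_labels.csv",
        f"cleaned_{corpus_name}.csv",
        f"cleaned_{corpus_name}_doc_distances_lem.npy",
        f"cleaned_{corpus_name}_doc_vecs_lem.json",
    ]


def missing_corpus_files(process_files: Path, corpus_name: str) -> list:
    """Return the expected per-corpus files that are absent from process_files."""
    return [
        name
        for name in corpus_file_names(corpus_name)
        if not (process_files / name).is_file()
    ]


# Demo against an empty temporary directory: every file is reported missing.
demo_dir = Path(tempfile.mkdtemp())
print(missing_corpus_files(demo_dir, "Expert Interviews"))
```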
The next step is to use the corpus to generate a grid. Return to the Gallery view and click on "Create new Grid!", which should bring up a new dialog with four questions. The answers to the first two questions, about the corpus and row labels, can be taken from the values displayed by the dialog in the previous step. They are again derived from [your_corpus_name]. The other two questions are answered here with example responses so that the resulting files match what is shown here. Click the "Ready!" button.
If it all works, you should soon see a grid. However, the per-grid files will not have been saved yet. To save them, click the diskette icon in the upper right corner of the browser window.
Now there should be five additional files in the process_files directory. Their names are based on the grid name that was used, here "Expert Grid".
Expert Grid_specs.csv contains the roadmap that connects the grid to its corpus. The existence of this file will trigger a grid's appearance in the Gallery's list of grids as shown at the bottom of this page.
,anchor,row_filename,corpus,filename
0,expert,Expert Interviews_row_labels,cleaned_Expert Interviews,Expert Grid
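To make the "roadmap" concrete, this sketch reads the single specs record and turns its values into file names. The helper read_specs is hypothetical, and appending ".csv" to the row_filename and corpus values is an assumption based on the per-corpus file names listed earlier.

```python
import csv
import io

# The specs record shown above.
SPECS = ''',anchor,row_filename,corpus,filename
0,expert,Expert Interviews_row_labels,cleaned_Expert Interviews,Expert Grid
'''


def read_specs(handle):
    """Return the grid name, anchor, and the per-corpus files the grid points to."""
    row = next(csv.DictReader(handle))
    return {
        "grid": row["filename"],
        "anchor": row["anchor"],
        # Assumption: both values are file names without the .csv extension.
        "row_labels_csv": row["row_filename"] + ".csv",
        "corpus_csv": row["corpus"] + ".csv",
    }


specs = read_specs(io.StringIO(SPECS))
print(specs)
```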
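Expert Grid_documents.csv is very similar to Expert Interviews_row_labels.csv. The differences are that each readable sentence is prefixed with its index and the row label columns hold True or False rather than 1 or 0.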
,readable,stripped,Expert A,Expert B,Expert C,all
0,0. Sentence 1 from Expert B is about livestock.,sentence expert b livestock,False,True,False,True
1,1. Sentence 2 from Expert B is about irrigation.,sentence expert b irrigation,False,True,False,True
2,2. Sentence 3 from Expert B is about finance.,sentence expert b finance,False,True,False,True
3,3. Sentence 1 from Expert C is about galamsey.,sentence expert c galamsey,False,False,True,True
4,4. Sentence 2 from Expert C is about machinery.,sentence expert c machinery,False,False,True,True
5,5. Sentence 3 from Expert C is about transportation.,sentence expert c transportation,False,False,True,True
6,6. Sentence 1 from Expert A is about grain.,sentence expert grain,True,False,False,True
7,7. Sentence 2 from Expert A is about rainfall.,sentence expert rainfall,True,False,False,True
8,8. Sentence 3 from Expert A is about economy.,sentence expert economy,True,False,False,True
Expert Grid_cells.csv contains per-cell information, recording things like the row, column, text, and whether the column is frozen.
,row,col,frozen_col,readable,seeded_doc
0,Expert A,grain | sentence,False,6. Sentence 1 from Expert A is about grain.,False
1,Expert A,economy | sentence,False,8. Sentence 3 from Expert A is about economy.,False
2,Expert A,rainfall | sentence,False,7. Sentence 2 from Expert A is about rainfall.,False
3,Expert B,sentence | c,False,0. Sentence 1 from Expert B is about livestock.,False
4,Expert B,sentence | c,False,1. Sentence 2 from Expert B is about irrigation.,False
5,Expert B,sentence | c,False,2. Sentence 3 from Expert B is about finance.,False
6,Expert C,sentence | c,False,3. Sentence 1 from Expert C is about galamsey.,False
7,Expert C,sentence | c,False,4. Sentence 2 from Expert C is about machinery.,False
8,Expert C,sentence | c,False,5. Sentence 3 from Expert C is about transportation.,False
9,all,grain | sentence,False,6. Sentence 1 from Expert A is about grain.,False
10,all,economy | sentence,False,8. Sentence 3 from Expert A is about economy.,False
11,all,rainfall | sentence,False,7. Sentence 2 from Expert A is about rainfall.,False
12,all,sentence | c,False,0. Sentence 1 from Expert B is about livestock.,False
13,all,sentence | c,False,1. Sentence 2 from Expert B is about irrigation.,False
14,all,sentence | c,False,2. Sentence 3 from Expert B is about finance.,False
15,all,sentence | c,False,3. Sentence 1 from Expert C is about galamsey.,False
16,all,sentence | c,False,4. Sentence 2 from Expert C is about machinery.,False
17,all,sentence | c,False,5. Sentence 3 from Expert C is about transportation.,False
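Each record places one sentence in one cell, so a cell holding several sentences appears as several records with the same row and col values. The sketch below regroups a three-record excerpt into cells; group_cells is a hypothetical helper, not a Grid function.

```python
import csv
import io
from collections import defaultdict

# Three records copied from the listing above.
SAMPLE = ''',row,col,frozen_col,readable,seeded_doc
0,Expert A,grain | sentence,False,6. Sentence 1 from Expert A is about grain.,False
3,Expert B,sentence | c,False,0. Sentence 1 from Expert B is about livestock.,False
4,Expert B,sentence | c,False,1. Sentence 2 from Expert B is about irrigation.,False
'''


def group_cells(handle):
    """Map each (row, col) pair to the list of sentences stored in that cell."""
    cells = defaultdict(list)
    for record in csv.DictReader(handle):
        cells[(record["row"], record["col"])].append(record["readable"])
    return dict(cells)


cells = group_cells(io.StringIO(SAMPLE))
for (row, col), texts in cells.items():
    print(row, "/", col, "->", len(texts), "sentence(s)")
```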
Expert Grid_tokens.csv keeps track of each sentence's tokens.
,tokens
0,"['sentence', 'expert', 'b', 'livestock']"
1,"['sentence', 'expert', 'b', 'irrigation']"
2,"['sentence', 'expert', 'b', 'finance']"
3,"['sentence', 'expert', 'c', 'galamsey']"
4,"['sentence', 'expert', 'c', 'machinery']"
5,"['sentence', 'expert', 'c', 'transportation']"
6,"['sentence', 'expert', 'grain']"
7,"['sentence', 'expert', 'rainfall']"
8,"['sentence', 'expert', 'economy']"
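The tokens column stores each list as the text of a Python list literal, so after CSV parsing the value is still a string. One way to recover real lists, sketched here with the standard library, is ast.literal_eval; read_tokens is a hypothetical helper name.

```python
import ast
import csv
import io

# Two records copied from the listing above.
SAMPLE = ''',tokens
0,"['sentence', 'expert', 'b', 'livestock']"
6,"['sentence', 'expert', 'grain']"
'''


def read_tokens(handle):
    """Map each sentence index to its list of tokens."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row: ['', 'tokens']
    # literal_eval parses the list literal without executing arbitrary code.
    return {int(index): ast.literal_eval(tokens) for index, tokens in reader}


tokens = read_tokens(io.StringIO(SAMPLE))
print(tokens)
```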
Expert Grid_vectors.csv contains vectors and is too large to reproduce here. Neither of these last two files is used much by the program; they are side effects of grid processing.
If you now return to the Gallery, you should see the new grid added to the collection. Clicking on its link should return you to the last saved version.
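Because a grid is listed once its _specs.csv file exists, the same information can be recovered from the directory itself. The following sketch mirrors that described behavior; it is not the Gallery's actual implementation, and saved_grid_names is a hypothetical name.

```python
from pathlib import Path
import tempfile

SPECS_SUFFIX = "_specs.csv"


def saved_grid_names(process_files: Path) -> list:
    """Return the names of grids that have a specs file in process_files."""
    return sorted(
        path.name[: -len(SPECS_SUFFIX)]
        for path in process_files.glob("*" + SPECS_SUFFIX)
    )


# Demo with empty placeholder files in a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "Expert Grid_specs.csv").touch()
(demo_dir / "Expert Grid_cells.csv").touch()
print(saved_grid_names(demo_dir))
```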
Happy gridding!