Merge pull request #1 from BU-Spark/team

Finished all backend code, ready to merge
BU-Spark · Mar 25, 2024 · 4b05de4
2 parents b1d6c9a + e3e5a40
commit 4b05de4
Showing 27 changed files with 6,041 additions and 2 deletions.
diff --git a/.gitignore b/.gitignore
@@ -168,3 +168,4 @@ $RECYCLE.BIN/
 
 # .nfs files are created when an open file is removed but is still being accessed
 .nfs*
+modules/.env
diff --git a/README.md b/README.md
@@ -1,3 +1,69 @@
-# TEMPLATE-base-repo
+![logo](logo.png)
 
-All Pull Requests must follow the Pull Request Template, with a title formatted like such `[Project Name]: <Descriptive Title>`
+# DRAINS: Deed Restriction Artificial Intelligence Notification System
+SPARK! x MassMutual Data Days for Good
+
+Created by Alessandra Lanz, Sahir Doshi, Cindy Zhang, Vijay Fisch, Sindhuja Kumar, Naman Nagaria, Valentina Haddad
+
+## Project Overview
+This project, developed for the [Longmeadow Historical Society](https://www.longmeadowhistoricalsociety.org), is an automated tool that identifies racist restrictions in historical property deeds. The program runs OCR on TIFF images of property deeds, evaluates the extracted text for racist content, and writes key information (specifically the deed date and page number) to a CSV file for efficient access and analysis.
+
+### Key Features
+
+- Image Processing: Accepts property deed images in TIFF format.
+- Content Analysis: Employs text recognition and analysis algorithms to detect racist language.
+- Data Extraction: Automates the extraction of deed date and page number for each document analyzed.
+
+Our aim is to assist the Longmeadow Historical Society in its efforts to document and understand historical injustices, contributing to broader recognition and rectification of past discrimination.
+
+### Dataset Used
+Historical property deeds from Massachusetts, mainly from the 1900s.
+
+## Quick Start
+### Requirements
+Install essential libraries:
+```
+pip install -r requirements.txt
+```
+
+### Set up OPENAI_API_KEY
+In folder `modules`: 
+
+1. Duplicate the file `env.template`.
+
+2. In the copy, set `OPENAI_API_KEY` to your API key and `OPENAI_ORG_ID` to your organization ID. You can find them at https://platform.openai.com/api-keys and
+https://platform.openai.com/account/organization
+
+3. Rename the copy to `.env`.
+
+> To use a different OpenAI model, change the `model` parameter on line 13 of `racist_chatgpt_analysis.py`.  
+The default is:
+`model="gpt-3.5-turbo"`  
+To use GPT-4, change this line to:
+`model="gpt-4-0125-preview"`
+
+### Run the code
+In `main.py`, change the folder path on line 36 to your own path:
+```python
+racism_threshold('/Your/Path/To/Files')
+```
+On **Windows**, edit the path manually so that all slashes are **backslashes**.
+
+Then, from the command line, run:
+```
+python main.py
+```
+
+## Modules Overview
+
+`OCR.py`: Uses the Tesseract OCR (Optical Character Recognition) engine, via the PyTesseract library, to convert deed images in TIFF format to searchable, analyzable text.
+
+`bigotry_dict.py`: Contains a hardcoded dictionary of terms associated with racist language that is used to scrutinize the deed text for potential matches.
+
+`locate.py`: Utilizes PyTesseract OCR to identify and extract specific information from the deed text, such as the deed date, book of origin, and page number.
+
+`racist_chatgpt_analysis.py`: Sends the deed text to OpenAI's ChatGPT API for racism detection that goes beyond keyword matching.
+
+`racist_text_query.py`: A failsafe text query module that backs up the ChatGPT analysis by checking the deed text directly against the bigotry dictionary, so that listed terms are still caught if the model misses them.
+
+`pagenum.py`: A failsafe page number extraction module that backs up the data extraction done by `locate.py`. It crops the corners of the image and enlarges them so that OCR can read the page number more easily.
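Reviewer note: the dictionary-based failsafe that the README describes for `racist_text_query.py` could look roughly like the sketch below. This is not the code from this PR. The function name `find_flagged_terms` and the two example phrases are assumptions for illustration only, and the real term list in `bigotry_dict.py` is not shown in this part of the diff.

```python
import re

# Hypothetical stand-in for the term list in bigotry_dict.py;
# the real dictionary is not reproduced here.
FLAGGED_TERMS = ["caucasian race", "shall not be sold to"]


def find_flagged_terms(deed_text, terms=FLAGGED_TERMS):
    """Return the flagged terms found in deed_text, ignoring case and line breaks."""
    # OCR output often splits phrases across lines, so collapse all whitespace first.
    normalized = re.sub(r"\s+", " ", deed_text).lower()
    hits = []
    for term in terms:
        # Word boundaries keep a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, normalized):
            hits.append(term)
    return hits
```

Collapsing whitespace before matching matters for OCR text, where a phrase may be broken across lines. A deed with at least one hit could then be flagged regardless of what the ChatGPT analysis returns.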
diff --git a/drain/.eslintrc.cjs b/drain/.eslintrc.cjs
@@ -0,0 +1,21 @@
+module.exports = {
+  root: true,
+  env: { browser: true, es2020: true },
+  extends: [
+    'eslint:recommended',
+    'plugin:react/recommended',
+    'plugin:react/jsx-runtime',
+    'plugin:react-hooks/recommended',
+  ],
+  ignorePatterns: ['dist', '.eslintrc.cjs'],
+  parserOptions: { ecmaVersion: 'latest', sourceType: 'module' },
+  settings: { react: { version: '18.2' } },
+  plugins: ['react-refresh'],
+  rules: {
+    'react/jsx-no-target-blank': 'off',
+    'react-refresh/only-export-components': [
+      'warn',
+      { allowConstantExport: true },
+    ],
+  },
+}
diff --git a/drain/.gitignore b/drain/.gitignore
diff --git a/drain/README.md b/drain/README.md
@@ -0,0 +1,8 @@
+# React + Vite
+
+This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.
+
+Currently, two official plugins are available:
+
+- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react/README.md) uses [Babel](https://babeljs.io/) for Fast Refresh
+- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh
diff --git a/drain/index.html b/drain/index.html
@@ -0,0 +1,15 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Material+Symbols+Outlined:opsz,wght,FILL,GRAD@20..48,100..700,0..1,-50..200" />
+    <link href="https://fonts.googleapis.com/css2?family=Merriweather:ital,wght@0,300;0,400;0,700;0,900;1,300;1,400;1,700;1,900&display=swap" rel="stylesheet">
+    <title>Vite + React</title>
+  </head>
+  <body>
+    <div id="root"></div>
+    <script type="module" src="/src/main.jsx"></script>
+  </body>
+</html>