Merge pull request #1 from BU-Spark/team
Finished all backend code, ready to merge
Showing 27 changed files with 6,041 additions and 2 deletions.
@@ -1,3 +1,69 @@
# TEMPLATE-base-repo
![logo](logo.png)

All Pull Requests must follow the Pull Request Template, with a title formatted as `[Project Name]: <Descriptive Title>`
# DRAINS: Deed Restriction Artificial Intelligence Notification System
SPARK! x MassMutual Data Days for Good

Created by Alessandra Lanz, Sahir Doshi, Cindy Zhang, Vijay Fisch, Sindhuja Kumar, Naman Nagaria, and Valentina Haddad

## Project Overview
This project, developed for the [Longmeadow Historical Society](https://www.longmeadowhistoricalsociety.org), is an automated tool for identifying racist restrictions in historical property deeds. The program runs OCR on TIFF images of property deeds, checks the extracted text for racist language, and writes key information about each deed (the deed date and page number) to a CSV file for easy access and analysis.

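As a rough illustration of the final step, here is a minimal sketch that writes per-deed results to CSV with Python's standard `csv` module. The `DeedResult` class, its fields, and the column names are hypothetical and may not match the project's actual output.

```python
import csv
import io
from dataclasses import asdict, dataclass
from typing import Iterable, TextIO


@dataclass
class DeedResult:
    """Hypothetical record for one analyzed deed image."""
    file_name: str
    deed_date: str    # "" if no date was found
    page_number: str  # kept as text because OCR output can be partial
    is_racist: bool


FIELDNAMES = ["file_name", "deed_date", "page_number", "is_racist"]


def write_results(results: Iterable[DeedResult], out: TextIO) -> None:
    """Write a header row, then one CSV row per deed."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for result in results:
        writer.writerow(asdict(result))


# Example: write to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_results([DeedResult("deed_001.tif", "March 3, 1925", "412", True)], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass `open("results.csv", "w", newline="")` as `out`; the `csv` documentation recommends `newline=""` so the writer controls line endings. Fields containing commas, such as the date above, are quoted automatically.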
### Key Features

- Image Processing: Accepts property deed images in TIFF format.
- Content Analysis: Converts each image to text with OCR, then checks the text for racist language using both the ChatGPT API and a dictionary of known terms.
- Data Extraction: Automatically extracts the deed date and page number from each analyzed document.

Our aim is to assist the Longmeadow Historical Society in its efforts to document and understand historical injustices, contributing to broader recognition and rectification of past discrimination.

### Dataset Used
Historical Massachusetts property deeds, mainly from the 1900s.

## Quick Start
### Requirements
Install the required libraries:
```
pip install -r requirements.txt
```

### Set up `OPENAI_API_KEY`
In the `modules` folder:

1. Make a copy of the file `env.template`.

2. In the copy, set `OPENAI_API_KEY` to your API key and `OPENAI_ORG_ID` to your organization ID. You can get your API key at https://platform.openai.com/api-keys and your organization ID at https://platform.openai.com/account/organization.

3. Rename the copy to `.env`.

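A `.env` file is just `KEY=value` lines. Python projects commonly load it with the `python-dotenv` package; whether this project does so is an assumption. The stdlib-only sketch below shows roughly what that loading involves. `parse_env` and `load_env_file` are hypothetical helpers: they skip comments and strip one layer of quotes, but do not handle multi-line values, escape sequences, or `export` prefixes the way full loaders do.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str | Path) -> None:
    """Copy values from a .env file into os.environ; variables already set win."""
    for key, value in parse_env(Path(path).read_text()).items():
        os.environ.setdefault(key, value)
```

For example, `load_env_file("modules/.env")` followed by `os.environ["OPENAI_API_KEY"]` would return the key. Using `setdefault` means a variable already set in the shell takes precedence over the file.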
> To use a different ChatGPT model, change the `model` parameter on line 13 of `racist_chatgpt_analysis.py`, which is currently set to:
>
> `model="gpt-3.5-turbo"`
>
> To use GPT-4, change that line to:
>
> `model="gpt-4-0125-preview"`

### Run the code
In `main.py`, change the folder path on line 36 to the folder containing your TIFF deed images:
```python
racism_threshold('/Your/Path/To/Files')
```
On **Windows**, edit the path so that all slashes are **backslashes**, and write it as a raw string (for example `r'C:\Your\Path\To\Files'`) so that Python does not treat the backslashes as escape characters.

Then, from the command line, run:
```
python main.py
```

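The Windows path note matters because `\` starts an escape sequence in an ordinary Python string literal. The short sketch below shows the failure and the raw-string fix using the standard `pathlib` module; the folder name `new_deeds` is hypothetical. `PureWindowsPath` is used so the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal, "\n" is a newline, so this path is silently corrupted.
broken = 'C:\new_deeds'
# A raw string (r'...') keeps every backslash as a literal character.
raw = r'C:\new_deeds'

print(len(broken), len(raw))  # 11 12: "\n" collapsed two characters into one
# PureWindowsPath parses Windows paths on any OS and accepts either separator.
print(PureWindowsPath(raw) == PureWindowsPath('C:/new_deeds'))  # True
```

Python's own file APIs on Windows also accept forward slashes; whether the rest of this project's code does is not confirmed here, which is why the backslash instruction above is kept.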
## Modules Overview

`OCR.py`: Uses Google's Tesseract OCR (Optical Character Recognition) engine, via the PyTesseract library, to convert TIFF deed images into searchable, analyzable text.

`bigotry_dict.py`: Contains a hardcoded dictionary of terms associated with racist language, used to check the deed text for potential matches.

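Matching deed text against a term dictionary can be sketched with regular expressions. Word boundaries (`\b`) keep a term from matching inside a longer word, and `re.IGNORECASE` covers OCR output in mixed case. The helper names and the terms below are hypothetical placeholders; the real list lives in `bigotry_dict.py` and is not reproduced here.

```python
from __future__ import annotations

import re
from typing import Iterable


def build_pattern(terms: Iterable[str]) -> re.Pattern[str]:
    """Compile one case-insensitive regex matching any term as a whole word or phrase."""
    # Longest terms first, so a phrase is preferred over a shorter overlapping term.
    ordered = sorted(set(terms), key=len, reverse=True)
    # Allow any whitespace (including OCR line breaks) between the words of a phrase.
    alternatives = [r"\s+".join(re.escape(word) for word in term.split()) for term in ordered]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def find_terms(text: str, pattern: re.Pattern[str]) -> list[str]:
    """Return each match, lower-cased with whitespace normalized, in order of appearance."""
    return [" ".join(m.group(0).lower().split()) for m in pattern.finditer(text)]


# Hypothetical placeholder terms, not the contents of bigotry_dict.py.
pattern = build_pattern(["caucasian", "white race"])
print(find_terms("shall not be sold to any person not of the White\nRace.", pattern))
```

An empty list means no dictionary hit. Allowing `\s+` between words matters because OCR output often splits a phrase across lines.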
`locate.py`: Uses PyTesseract OCR to locate and extract specific information from each deed, such as the deed date, book of origin, and page number.

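Pulling a date out of OCR text can be sketched as a regular-expression search followed by `datetime.strptime`, which rejects impossible dates produced by misread digits. This assumes dates are written like `March 3, 1925`; the pattern and the helper name `extract_deed_date` are hypothetical, and `locate.py` may work differently (for example, by position on the page).

```python
from __future__ import annotations

import re
from datetime import date, datetime

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Month name, day, optional comma, four-digit year; tolerate extra spaces from OCR.
DATE_RE = re.compile(rf"\b({MONTHS})\s+(\d{{1,2}}),?\s+(\d{{4}})\b", re.IGNORECASE)


def extract_deed_date(text: str) -> date | None:
    """Return the first valid 'Month D, YYYY' date in the text, or None."""
    for month, day, year in DATE_RE.findall(text):
        try:
            return datetime.strptime(f"{month.title()} {day} {year}", "%B %d %Y").date()
        except ValueError:
            continue  # e.g. "February 30, 1925" from a misread digit
    return None
```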
`racist_chatgpt_analysis.py`: Sends the deed text to OpenAI's ChatGPT API for racism detection, providing a more nuanced analysis than keyword matching alone.

`racist_text_query.py`: A failsafe text query module that backs up the ChatGPT analysis by checking each deed against the bigotry dictionary, reducing the chance that racist language is overlooked.

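One way to read this failsafe design is sketched below: the dictionary check always runs, the ChatGPT check runs when the API call succeeds, and a deed is flagged if either check flags it. `classify_deed` and its parameters are hypothetical; the actual modules may combine their results differently.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger(__name__)


def classify_deed(
    text: str,
    llm_check: Callable[[str], bool],
    keyword_check: Callable[[str], bool],
) -> bool:
    """Flag a deed if either check flags it; keep going if the ChatGPT call fails."""
    keyword_hit = keyword_check(text)  # dictionary check always runs
    try:
        llm_hit = llm_check(text)
    except Exception as exc:  # e.g. network errors, rate limits, unexpected replies
        logger.warning("ChatGPT analysis failed; using keyword result only: %s", exc)
        llm_hit = False
    # Union of both signals: a miss by one check can be caught by the other.
    return llm_hit or keyword_hit
```

Taking the union trades some false positives for fewer missed restrictions, which fits the goal of not overlooking racist language.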
`pagenum.py`: A failsafe page-number extraction module that backs up `locate.py` by cropping the corners of the image and enlarging them for easier OCR.
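Cropping the corners reduces to computing four boxes from the image size. The pure-Python sketch below uses a hypothetical 20% `fraction` and 3x `scale`. The `(left, upper, right, lower)` tuples follow the box convention of Pillow's `Image.crop`, and `enlarged_size` returns a `(width, height)` pair as `Image.resize` expects; whether `pagenum.py` uses Pillow is an assumption.

```python
from __future__ import annotations

from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), in pixels


def corner_boxes(width: int, height: int, fraction: float = 0.2) -> Dict[str, Box]:
    """Return crop boxes for the four corners, each `fraction` of the width and height."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    w = max(1, round(width * fraction))
    h = max(1, round(height * fraction))
    return {
        "top_left": (0, 0, w, h),
        "top_right": (width - w, 0, width, h),
        "bottom_left": (0, height - h, w, height),
        "bottom_right": (width - w, height - h, width, height),
    }


def enlarged_size(box: Box, scale: int = 3) -> Tuple[int, int]:
    """Target (width, height) for a cropped corner, so small page numbers OCR better."""
    left, upper, right, lower = box
    return ((right - left) * scale, (lower - upper) * scale)
```

With Pillow, each corner could then be prepared as `image.crop(box).resize(enlarged_size(box))` before being passed to OCR.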
@@ -0,0 +1,21 @@
module.exports = {
  root: true,
  env: { browser: true, es2020: true },
  extends: [
    'eslint:recommended',
    'plugin:react/recommended',
    'plugin:react/jsx-runtime',
    'plugin:react-hooks/recommended',
  ],
  ignorePatterns: ['dist', '.eslintrc.cjs'],
  parserOptions: { ecmaVersion: 'latest', sourceType: 'module' },
  settings: { react: { version: '18.2' } },
  plugins: ['react-refresh'],
  rules: {
    'react/jsx-no-target-blank': 'off',
    'react-refresh/only-export-components': [
      'warn',
      { allowConstantExport: true },
    ],
  },
}
Empty file.
@@ -0,0 +1,8 @@
# React + Vite

This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.

Currently, two official plugins are available:

- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react/README.md) uses [Babel](https://babeljs.io/) for Fast Refresh
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh
@@ -0,0 +1,15 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Material+Symbols+Outlined:opsz,wght,FILL,GRAD@20..48,100..700,0..1,-50..200" />
    <link href="https://fonts.googleapis.com/css2?family=Merriweather:ital,wght@0,300;0,400;0,700;0,900;1,300;1,400;1,700;1,900&display=swap" rel="stylesheet">
    <title>Vite + React</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.jsx"></script>
  </body>
</html>