The project consists of a Jupyter notebook that:
- Extract text from PDF files containing candidate CVs and a job position description.
- Normalize and structure the extracted text using OpenAI's GPT model.
- Evaluate candidates by matching their CVs against the job position description.
- Output the results in JSON format for analysis in Power BI.
Before you begin, ensure you have the following:
- Installing Python 3.7 or Higher: Download Python
- Installing Visual Studio Code (VS Code) a. VS Code Official Download Page: Download Visual Studio Code b. Microsoft: Set up VS Code - Official setup overview from Microsoft.
- Installing Git a. Git for Windows: Download Git
If you're unfamiliar with Git, these steps will guide you through cloning the repository using VS Code.
- Open VS Code.
- Open the Command Palette by pressing Ctrl+Shift+P.
- Type "Git: Clone" and select it.
- Enter the Repository URL: https://github.com/OscarValerock/BIBB-PBI-CV-AI-Analysis.git
- Choose a Local Directory: Select a folder on your computer to store the project.
- Open the Repository: VS Code will prompt you to open the repository once cloned. Click Open.
Once you've cloned the repository and have your project open in Visual Studio Code, it's best practice to create a virtual environment for your project. This isolates the required Python packages, making your setup more stable and organized. Here's how to do it using the VS Code Command Palette:
- In the Command Palette (Ctrl+Shift+P), type Python: Create Environment and select Create Environment.
- VS Code will prompt you to select a folder. Select your project folder (the cloned repository) and choose venv as the virtual environment type.
- Accept to install the packages from requirements.txt
Create a file named Constants.py in your project's root directory. This file will store your OpenAI API key.
OpenAIKey = "your-openai-api-key"
Important: Replace "your-openai-api-key" with your actual OpenAI API key. Keep this file secure and avoid sharing it publicly.
graph TD
subgraph "VS Code - Jupyter"
A[Start]
B[Extract text from CV PDFs]
A --> B
B --> C[OCR_Results.json]
F[Extract text from Position PDF]
A --> F
F --> G[OCR_Position.json]
E[LLM_Normalized_CV.json]
I[LLM_Position.json]
K[LLM_Analysis.json]
subgraph "OpenAI"
D[Summarize CVs]
H[Normalize Position Description]
J[Evaluate Candidates]
end
C --> D
D --> E
G --> H
H --> I
E --> J
I --> J
J --> K
end
E --> L[Analyze with Power BI]
I --> L
K --> L
%% Define styles
classDef openai color:#FFFFFF, fill:#FF0000,stroke:#FF0000,stroke-width:2px;
classDef vscode color:#FFFFFF, fill:#008000,stroke:#008000,stroke-width:2px;
classDef powerbi color:#000000, fill:#FFD700,stroke:#FFD700,stroke-width:2px;
%% Apply styles
class D,H,J openai
class A,B,C,E,F,G,I,K vscode
class L powerbi