From 39976ed5f12dd8070420629ae9006a8fbf5c3ba9 Mon Sep 17 00:00:00 2001
From: sergeypanin1994 <sergeypanin1994@gmail.com>
Date: Wed, 22 Jan 2025 01:35:48 +0300
Subject: [PATCH] fix(docs): corrected grammar and consistency in CHAI Python
 Loaders documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fixed capitalization of "Python"
- Changed "entrypoint" to "entry point" for correct spelling
- Reworded the "we'd run" clause in the `pm_id` description for clarity
- Reworded numerical reference: "3" → "three"
- Improved phrasing for scheduling utility description
- Clarified example usage sentence structure
---
 core/README.md | 77 +++++++++++++++++++++++---------------------------
 1 file changed, 35 insertions(+), 42 deletions(-)

diff --git a/core/README.md b/core/README.md
index d208162..3b00c19 100644
--- a/core/README.md
+++ b/core/README.md
@@ -1,81 +1,74 @@
 # Core Tools for CHAI Python Loaders
 
-This directory contains a set of core tools and utilities to facilitate loading the CHAI
-database with package manager data, using python helpers. These tools provide a common
-foundation for fetching, transforming, and loading data from various package managers
-into the database.
+This directory contains a set of core tools and utilities to facilitate loading the CHAI database with package manager data using Python helpers. These tools provide a common foundation for fetching, transforming, and loading data from various package managers into the database.
 
 ## Key Components
 
 ### 1. [Config](config.py)
 
-Config always runs first, and is the entrypoint for all loaders. It includes;
+Config always runs first and serves as the entry point for all loaders. It includes:
 
-- Execution flags:
-  - `FETCH` determines whether we request the data from source
-  - `TEST` enables a test mode, to test specific portions of the pipeline
-  - `NO_CACHE` to determine whether we save the intermediate pipeline files
-- Package Manager flags
-  - `pm_id` gets the package manager id from the db, that we'd run the pipeline for
-  - `source` is the data source for that package manager. `SOURCES` defines the map.
+- **Execution flags:**
+  - `FETCH`: Determines whether to request data from the source.
+  - `TEST`: Enables a test mode for testing specific portions of the pipeline.
+  - `NO_CACHE`: Specifies whether intermediate pipeline files should be saved.
 
-The next 3 configuration classes retrieve the IDs for url types (homepage, documentation,
-etc.), dependency types (build, runtime, etc.) and user types (crates user, github user)
+- **Package Manager flags:**
+  - `pm_id`: Retrieves from the database the ID of the package manager for which the pipeline will be executed.
+  - `source`: Defines the data source for the package manager. `SOURCES` contains the mapping.
+
+The next three configuration classes retrieve the IDs for:
+- URL types (homepage, documentation, etc.).
+- Dependency types (build, runtime, etc.).
+- User types (Crates user, GitHub user).
 
 ### 2. [Database](db.py)
 
-The DB class offers a set of methods for interacting with the database, including:
+The `DB` class provides a set of methods for interacting with the database, including:
 
-- Inserting and selecting data for packages, versions, users, dependencies, and more
-- Caching mechanisms to improve performance
-- Batch processing capabilities for efficient data insertion
+- Inserting and selecting data for packages, versions, users, dependencies, and more.
+- Caching mechanisms to improve performance.
+- Batch processing for efficient data insertion.
 
 ### 3. [Fetcher](fetcher.py)
 
-The Fetcher class provides functionality for downloading and extracting data from
-package manager sources. It supports:
+The `Fetcher` class provides functionality for downloading and extracting data from package manager sources. It supports:
 
-- Downloading tarball files
-- Extracting contents to a specified directory
-- Maintaining a "latest" symlink so we always know where to look
+- Downloading tarball files.
+- Extracting contents to a specified directory.
+- Maintaining a "latest" symlink for easy access to the most recent data.
 
 ### 4. [Logger](logger.py)
 
-A custom logging utility that provides consistent logging across all loaders.
+A custom logging utility that ensures consistent logging across all loaders.
 
 ### 5. [Models](models/__init__.py)
 
 SQLAlchemy models representing the database schema, including:
 
-- Package, Version, User, License, DependsOn, and other relevant tables
+- `Package`, `Version`, `User`, `License`, `DependsOn`, and other relevant tables.
 
-> [!NOTE]
->
-> This is currently used to actually generate the migrations as well
+> **Note:**  
+> These models are also used to generate database migrations.
 
 ### 6. [Scheduler](scheduler.py)
 
-A scheduling utility that allows loaders to run at specified intervals.
+A scheduling utility that enables loaders to run at specified intervals.
 
 ### 7. [Transformer](transformer.py)
 
-The Transformer class provides a base for creating package manager-specific transformers.
-It includes:
+The `Transformer` class provides a base for creating package manager-specific transformers. It includes:
 
-- Methods for locating and reading input files
-- Placeholder methods for transforming data into the required format
+- Methods for locating and reading input files.
+- Placeholder methods for transforming data into the required format.
 
 ## Usage
 
 To create a new loader for a package manager:
 
 1. Create a new directory under `package_managers/` for your package manager.
-1. Implement a fetcher that inherits from the base Fetcher, that is able to fetch
-   the raw data from the package manager's source.
-1. Implement a custom Transformer class that inherits from the base Transformer, that
-   figures out how to map the raw data provided by the package managers into the data
-   model described in the [models](models/__init__.py) module.
-1. Create a main script that utilizes the core components (Config, DB, Fetcher,
-   Transformer, Scheduler) to fetch, transform, and load data.
-
-Example usage can be found in the [crates](../package_managers/crates) loader.
+2. Implement a fetcher that inherits from the base `Fetcher` and fetches raw data from the package manager's source.
+3. Implement a custom `Transformer` class that inherits from the base `Transformer` and maps raw data to the data model described in the [models](models/__init__.py) module.
+4. Create a main script that uses the core components (`Config`, `DB`, `Fetcher`, `Transformer`, `Scheduler`) to fetch, transform, and load data.
+
+An example implementation can be found in the [Crates](../package_managers/crates) loader.