From 39976ed5f12dd8070420629ae9006a8fbf5c3ba9 Mon Sep 17 00:00:00 2001
From: sergeypanin1994
Date: Wed, 22 Jan 2025 01:35:48 +0300
Subject: [PATCH] fix(docs): corrected grammar and consistency in CHAI Python Loaders documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fixed capitalization of "Python"
- Changed "entrypoint" to "entry point" for correct spelling
- Replaced "we'd run" with "we run" for clarity
- Reworded numerical reference: "3" → "three"
- Improved phrasing for scheduling utility description
- Clarified example usage sentence structure
---
 core/README.md | 77 +++++++++++++++++++++++---------------------------
 1 file changed, 35 insertions(+), 42 deletions(-)

diff --git a/core/README.md b/core/README.md
index d208162..3b00c19 100644
--- a/core/README.md
+++ b/core/README.md
@@ -1,81 +1,74 @@
 # Core Tools for CHAI Python Loaders

-This directory contains a set of core tools and utilities to facilitate loading the CHAI
-database with package manager data, using python helpers. These tools provide a common
-foundation for fetching, transforming, and loading data from various package managers
-into the database.
+This directory contains a set of core tools and utilities to facilitate loading the CHAI database with package manager data using Python helpers. These tools provide a common foundation for fetching, transforming, and loading data from various package managers into the database.

 ## Key Components

 ### 1. [Config](config.py)

-Config always runs first, and is the entrypoint for all loaders. It includes;
+Config always runs first and serves as the entry point for all loaders. It includes:

-- Execution flags:
-  - `FETCH` determines whether we request the data from source
-  - `TEST` enables a test mode, to test specific portions of the pipeline
-  - `NO_CACHE` to determine whether we save the intermediate pipeline files
-- Package Manager flags
-  - `pm_id` gets the package manager id from the db, that we'd run the pipeline for
-  - `source` is the data source for that package manager. `SOURCES` defines the map.
+- **Execution flags:**
+  - `FETCH`: Determines whether to request data from the source.
+  - `TEST`: Enables a test mode for specific portions of the pipeline.
+  - `NO_CACHE`: Specifies whether intermediate pipeline files should be saved.

-The next 3 configuration classes retrieve the IDs for url types (homepage, documentation,
-etc.), dependency types (build, runtime, etc.) and user types (crates user, github user)
+- **Package Manager flags:**
+  - `pm_id`: Retrieves the package manager ID from the database for which the pipeline will be executed.
+  - `source`: Defines the data source for the package manager. `SOURCES` contains the mapping.
+
+The next three configuration classes retrieve the IDs for:
+- URL types (homepage, documentation, etc.).
+- Dependency types (build, runtime, etc.).
+- User types (Crates user, GitHub user).

 ### 2. [Database](db.py)

-The DB class offers a set of methods for interacting with the database, including:
+The `DB` class provides a set of methods for interacting with the database, including:

-- Inserting and selecting data for packages, versions, users, dependencies, and more
-- Caching mechanisms to improve performance
-- Batch processing capabilities for efficient data insertion
+- Inserting and selecting data for packages, versions, users, dependencies, and more.
+- Caching mechanisms to improve performance.
+- Batch processing for efficient data insertion.

 ### 3. [Fetcher](fetcher.py)

-The Fetcher class provides functionality for downloading and extracting data from
-package manager sources. It supports:
+The `Fetcher` class provides functionality for downloading and extracting data from package manager sources. It supports:

-- Downloading tarball files
-- Extracting contents to a specified directory
-- Maintaining a "latest" symlink so we always know where to look
+- Downloading tarball files.
+- Extracting contents to a specified directory.
+- Maintaining a "latest" symlink for easy access to the most recent data.

 ### 4. [Logger](logger.py)

-A custom logging utility that provides consistent logging across all loaders.
+A custom logging utility that ensures consistent logging across all loaders.

 ### 5. [Models](models/__init__.py)

 SQLAlchemy models representing the database schema, including:

-- Package, Version, User, License, DependsOn, and other relevant tables
+- `Package`, `Version`, `User`, `License`, `DependsOn`, and other relevant tables.

-> [!NOTE]
->
-> This is currently used to actually generate the migrations as well
+> **Note:**
+> These models are also used to generate database migrations.

 ### 6. [Scheduler](scheduler.py)

-A scheduling utility that allows loaders to run at specified intervals.
+A scheduling utility that enables loaders to run at specified intervals.

 ### 7. [Transformer](transformer.py)

-The Transformer class provides a base for creating package manager-specific transformers.
-It includes:
+The `Transformer` class provides a base for creating package manager-specific transformers. It includes:

-- Methods for locating and reading input files
-- Placeholder methods for transforming data into the required format
+- Methods for locating and reading input files.
+- Placeholder methods for transforming data into the required format.

 ## Usage

 To create a new loader for a package manager:

 1. Create a new directory under `package_managers/` for your package manager.
-1. Implement a fetcher that inherits from the base Fetcher, that is able to fetch
-   the raw data from the package manager's source.
-1. Implement a custom Transformer class that inherits from the base Transformer, that
-   figures out how to map the raw data provided by the package managers into the data
-   model described in the [models](models/__init__.py) module.
-1. Create a main script that utilizes the core components (Config, DB, Fetcher,
-   Transformer, Scheduler) to fetch, transform, and load data.
-
-Example usage can be found in the [crates](../package_managers/crates) loader.
+2. Implement a fetcher that inherits from the base `Fetcher` and fetches raw data from the package manager's source.
+3. Implement a custom `Transformer` class that inherits from the base `Transformer` and maps raw data to the data model described in the [models](models/__init__.py) module.
+4. Create a main script that utilizes the core components (`Config`, `DB`, `Fetcher`, `Transformer`, `Scheduler`) to fetch, transform, and load data.
+
+An example implementation can be found in the [Crates](../package_managers/crates) loader.