marto97/npm-monitoring

npm-monitoring

Monitor all public NPM packages.

How it works

  1. config/default.json is the main configuration file.

  2. Download all public metadata

  • creates a folder named metadata_yyyyMMdd (for example, metadata_20240809) in targetMetadataDirectory

  • saves a snapshot of the metadata as npm_packages_snapshot_yyyyMMdd.json (for example, npm_packages_snapshot_20240809.json)

  • retries when the download or the network fails, controlled by two optional parameters:

    • maxRetries - Maximum number of retries (optional).
    • delay - Delay between retries in milliseconds (optional).
  • on a retry, the readLastDocumentId function reads the JSON file of already downloaded metadata and returns lastDocumentId

  • lastDocumentId is then used as the startkey, so the download continues from the last package whose metadata was saved

  • the retry mechanism terminates once the data has been fetched and saved successfully
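
The naming scheme and the retry-and-resume flow described above could be sketched roughly as follows. This is only an illustration, not the repository's code: snapshotNames, downloadWithRetry and the fetchFrom callback are hypothetical names, the default values for maxRetries and delay are invented, and the date stamp is assumed to be taken in UTC.

```javascript
// Hypothetical helper: builds the folder and file names for one snapshot.
// Assumption: the yyyyMMdd stamp is derived from the UTC date.
function snapshotNames(date = new Date()) {
  const stamp = date.toISOString().slice(0, 10).replace(/-/g, "");
  return {
    directory: `metadata_${stamp}`,
    file: `npm_packages_snapshot_${stamp}.json`,
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical retry wrapper.
// fetchFrom(startKey) downloads and saves metadata, starting at startKey
// (or from the beginning when startKey is undefined).
// readLastDocumentId() returns the id of the last document already on disk.
// The defaults for maxRetries and delay are illustrative only.
async function downloadWithRetry(
  fetchFrom,
  readLastDocumentId,
  { maxRetries = 3, delay = 1000 } = {}
) {
  let startKey; // undefined on the first attempt
  for (let attempt = 0; ; attempt++) {
    try {
      // Success ends the loop: the data was fetched and saved.
      return await fetchFrom(startKey);
    } catch (err) {
      if (attempt >= maxRetries) throw err; // retries exhausted
      await sleep(delay);
      // Resume from the last document that was written before the failure.
      startKey = await readLastDocumentId();
    }
  }
}
```

With maxRetries set to 3, the wrapper makes at most four attempts in total: the initial one plus three retries.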

  3. Download source code for all public NPM packages
  • uses the all-the-package-names package to get the list of all public package names on npm; the list includes scoped packages and is updated daily
  • replaces every / in a package name with + so that scoped packages do not create extra sub-directories
  • uses pacote to fetch package manifests and tarballs from the npm registry
  • uses worker threads to download multiple packages at the same time
  • CONCURRENCY_LIMIT is set to the number of available CPU cores
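
As a rough sketch of the directory naming and the concurrency cap, the snippet below keeps at most CONCURRENCY_LIMIT downloads in flight. It deliberately swaps the worker threads for a single-threaded promise pool to stay short, and it takes the actual download as a callback instead of calling pacote directly. toDirectoryName, downloadAll and downloadOne are hypothetical names, not functions from this repository.

```javascript
const os = require("node:os");

// Assumption: one download slot per logical CPU core.
const CONCURRENCY_LIMIT = os.cpus().length;

// "@scope/name" becomes "@scope+name", so a scoped package maps to one folder.
function toDirectoryName(packageName) {
  return packageName.replace(/\//g, "+");
}

// Calls downloadOne(name, directoryName) for every package name while
// keeping at most `limit` calls in flight. In the real project downloadOne
// would wrap a pacote call; here it is any async function.
async function downloadAll(packageNames, downloadOne, limit = CONCURRENCY_LIMIT) {
  let next = 0; // index of the next package to hand out

  async function runner() {
    while (next < packageNames.length) {
      const name = packageNames[next++];
      try {
        await downloadOne(name, toDirectoryName(name));
      } catch (err) {
        // One failed package should not stop the rest of the run.
        console.error(`Failed to download ${name}: ${err.message}`);
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, packageNames.length));
  await Promise.all(Array.from({ length: runnerCount }, runner));
}
```

A promise pool like this bounds concurrent network requests, while worker threads additionally spread CPU-bound work such as tarball extraction across cores.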
  4. Find Malware Packages using downloaded metadata

This module provides a function to process a large JSON file and identify any packages that are flagged as "security holding packages." It reads the JSON file line by line, extracts the relevant package names, and writes them to an output file. The module is designed to handle very large JSON files (e.g., 180 GB) with efficient memory usage and logging for monitoring progress.

  • Efficient Processing: Processes large JSON files line by line to avoid high memory usage.
  • Error Handling: Catches and logs errors during JSON parsing, including the line number where the error occurred.
  • Progress Logging: Logs the progress of the operation at regular intervals, making it easier to monitor the process.
  • Output File: Extracted package names are written to a specified output file.

Parameters

  • inputFilePath: The path to the input JSON file that contains the package metadata.
  • outputFilePath: The path to the output file where the names of security holding packages will be written.
  • targetSecurityHoldingPackagesDirectory: The directory where the output file will be stored.

Monitoring

  • The function logs the start time, the end time, and its progress every 100,000 lines. Logging at this interval keeps the output useful without slowing processing down through excessive logging; the interval can be adjusted if needed.

  • When a line cannot be processed, the error is logged together with its line number, which helps to identify and correct the problem in the JSON file.
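
A minimal sketch of such a line-by-line scan, using Node's readline module, is shown below. Several details are assumptions rather than facts about this repository: the input is assumed to hold one row object per line (optionally followed by a comma, as in a CouchDB-style _all_docs dump), a holding package is assumed to be recognizable by the description "security holding package" or a latest dist-tag ending in "-security", and the names findSecurityHoldingPackages, isSecurityHoldingPackage and logEvery are hypothetical. The output directory is taken from outputFilePath here instead of being passed separately.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const readline = require("node:readline");

// Assumed detection rule; the README does not state the exact test.
function isSecurityHoldingPackage(doc) {
  const latest = doc?.["dist-tags"]?.latest;
  return (
    doc?.description === "security holding package" ||
    (typeof latest === "string" && latest.endsWith("-security"))
  );
}

async function findSecurityHoldingPackages(
  inputFilePath,
  outputFilePath,
  { logEvery = 100_000 } = {}
) {
  fs.mkdirSync(path.dirname(outputFilePath), { recursive: true });
  const out = fs.createWriteStream(outputFilePath);
  // Streaming the file keeps memory use independent of the file size.
  const lines = readline.createInterface({
    input: fs.createReadStream(inputFilePath),
    crlfDelay: Infinity,
  });

  console.log(`Started at ${new Date().toISOString()}`);
  let lineNumber = 0;
  let found = 0;

  for await (const rawLine of lines) {
    lineNumber++;
    // Drop the trailing comma that separates rows in the assumed layout.
    const line = rawLine.trim().replace(/,$/, "");
    // Only lines that look like a complete object are parsed; this skips
    // the opening and closing lines of the surrounding JSON document.
    if (line.startsWith("{") && line.endsWith("}")) {
      try {
        const row = JSON.parse(line);
        const doc = row.doc ?? row;
        if (isSecurityHoldingPackage(doc)) {
          out.write(`${doc.name ?? row.id}\n`);
          found++;
        }
      } catch (err) {
        console.error(`Line ${lineNumber}: ${err.message}`);
      }
    }
    if (lineNumber % logEvery === 0) {
      console.log(`Processed ${lineNumber} lines`);
    }
  }

  // Wait until everything is flushed to disk before reporting the result.
  await new Promise((resolve, reject) => {
    out.on("error", reject);
    out.end(resolve);
  });
  console.log(`Finished at ${new Date().toISOString()}, found ${found}`);
  return found;
}
```

The sketch writes only short package names, so it ignores write-stream backpressure; a version that writes larger records would need to wait for the drain event.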

TODO

  1. refactor, clean up and fix the code
  2. document everything
  3. scan the downloaded source code for security holding packages and create a dataset of them
  4. analyze the source code of these packages
  5. contact NPM for API key

Archive

  1. getLatestPackageVersion from local JSON file when possible to reduce NPM requests
  2. implement scans to scan for vulnerabilities
  3. implement how many times a package is used as a dependency
  4. implement how many times a package is downloaded
