A Python library with common code (cache, queue, workers logic, processing steps, configuration, utils, logging, exceptions) used by the services and the workers
Set the assets (images and audio files stored locally) environment variables to configure the following aspects:
ASSETS_BASE_URL
: base URL for the assets files. Set accordingly to the datasets-server domain, e.g., https://datasets-server.huggingface.co/assets. Defaults toassets
(TODO: default to an URL).ASSETS_STORAGE_DIRECTORY
: directory where the asset files are stored. Defaults to empty, which means the assets are located in thedatasets_server_assets
subdirectory inside the OS default cache directory.
Set the common environment variables to configure the following aspects:
COMMON_HF_ENDPOINT
: URL of the HuggingFace Hub. Defaults tohttps://huggingface.co
.COMMON_HF_TOKEN
: App Access Token (ask moonlanding administrators to get one, only theread
role is required) to access the gated datasets. Defaults to empty.COMMON_LOG_LEVEL
: log level, amongDEBUG
,INFO
,WARNING
,ERROR
, andCRITICAL
. Defaults toINFO
.
Set environment variables to configure the storage of precomputed API responses in a MongoDB database (the "cache"):
CACHE_MONGO_DATABASE
: name of the database used for storing the cache. Defaults todatasets_server_cache
.CACHE_MONGO_URL
: URL used to connect to the MongoDB server. Defaults tomongodb://localhost:27017
.
Set environment variables to configure the job queues to precompute API responses. The job queues are stored in a MongoDB database.
QUEUE_MAX_JOBS_PER_NAMESPACE
: maximum number of started jobs for the same namespace (the user or organization, before the/
separator in the dataset name, or the "canonical" dataset name if not present). Defaults to 1.QUEUE_MONGO_DATABASE
: name of the database used for storing the queue. Defaults todatasets_server_queue
.QUEUE_MONGO_URL
: URL used to connect to the MongoDB server. Defaults tomongodb://localhost:27017
.
Set environment variables to configure the worker loop that processes the queue.
WORKER_LOOP_MAX_DISK_USAGE_PCT
: maximum disk usage of every storage disk in the list (in percentage) to allow a job to start. Set to 0 to disable the test. Defaults to 90.WORKER_LOOP_MAX_LOAD_PCT
: maximum load of the machine (in percentage: the max between the 1m load and the 5m load divided by the number of CPUs *100) allowed to start a job. Set to 0 to disable the test. Defaults to 70.WORKER_LOOP_MAX_MEMORY_PCT
: maximum memory (RAM + SWAP) usage of the machine (in percentage) allowed to start a job. Set to 0 to disable the test. Defaults to 80.WORKER_LOOP_SLEEP_SECONDS
: wait duration in seconds at each loop iteration before checking if resources are available and processing a job if any is available. Note that the loop doesn't wait just after finishing a job: the next job is immediately processed. Defaults to15
.WORKER_LOOP_STORAGE_PATHS
: comma-separated list of paths to check for disk usage. Defaults to empty.