GPU Testing on PrimeIntellect
Welcome to the PrimeIntellect GPU testing documentation! PrimeIntellect is an aggregator platform that simplifies renting virtual machines (VMs) pre-configured with PyTorch images, letting users set up and run deep learning models efficiently across a range of GPU configurations.
Pruning aims to make models smaller and more efficient by reducing the number of parameters and components they must compute. The key methods include:
- Dropping Layers: Each layer in a neural network processes data differently, contributing to its ability to learn distinct aspects of the data. By selectively removing layers, the model can become leaner without significantly impacting performance.
- Attention Heads: These are the components of transformer models that attend to different parts of the input in parallel, capturing context rather than processing tokens one at a time in isolation. Pruning attention heads can streamline the model while preserving most of its contextual awareness.
- Embedding Channels: These are the dimensions of the vector representations the model works with. Reducing the number of embedding channels shrinks the model and the amount of data each layer must process.
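To make the head-pruning idea concrete, here is a hypothetical sketch in which plain Python lists stand in for tensors; the head outputs and the choice of which head to drop are made-up examples, not code from the models tested here. Real transformer code would instead slice the attention projection weight matrices.

```python
# Hypothetical sketch of attention-head pruning. Plain lists stand in for
# tensors; the head outputs and pruned index below are invented examples.

def prune_heads(head_outputs, keep):
    """Drop the heads not listed in `keep` and sum the survivors."""
    kept = [h for i, h in enumerate(head_outputs) if i in keep]
    return [sum(vals) for vals in zip(*kept)]

# Three heads, each producing a 2-dimensional output for one token.
heads = [[1, 2], [5, 5], [3, -1]]
out = prune_heads(heads, keep={0, 2})  # head 1 is pruned away
```

Because the pruned head simply stops contributing to the summed output, the rest of the network needs no structural changes, which is what makes head pruning attractive in practice.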
Distillation is a technique used to transfer knowledge from a large language model (LLM) to a smaller language model (SLM). This method involves training the smaller model to mimic the behavior and predictions of the larger one, effectively condensing the knowledge without needing the same computational resources.
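A minimal sketch of the idea, assuming the common soft-target formulation where the student is trained to match the teacher's softened output distribution; the logits and temperature below are invented for illustration and are not taken from the models discussed here.

```python
import math

# Hypothetical distillation-loss sketch: cross-entropy between the teacher's
# and student's temperature-softened distributions. Logits are made up.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's softened outputs against the teacher's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]  # sample logits from the large model
student = [2.5, 1.2, 0.4]  # sample logits from the small model
loss = distillation_loss(teacher, student)
```

Minimizing this loss pushes the student's distribution toward the teacher's; the temperature smooths both distributions so the student also learns from the teacher's relative ranking of wrong answers.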
The attached image provides a detailed overview of our testing metrics across different GPU models. For each GPU configuration, you can see the hardware specifications (CPU, memory, disk size, VRAM) alongside performance metrics (runtime, RAM usage, accuracy, and cost) for running specific deep learning models: Nvidia Minitron 8B, Mistral v0.3 (7B), Llama 3.1 (8B), and Gemma 2 (9B). This data is crucial for evaluating the efficiency and cost-effectiveness of deploying models on the different GPU setups PrimeIntellect provides.
I conducted an experiment where I performed inference on the MMLU (Massive Multitask Language Understanding) dataset using various models. The goal was to evaluate the accuracy of each model across a wide range of subjects. After obtaining the inference results, I applied a natural language processing (NLP) algorithm to compare the AI-generated answers with the expected correct answers from the dataset.
The comparison was done using a fuzzy string matching algorithm, which assesses the similarity between the AI's answer and the correct answer.
This function works by first preprocessing the answers to ensure consistency in comparison. It then calculates a similarity ratio using the fuzzy string matching technique. The function checks whether the AI's answer contains the correct answer, or vice versa, and whether the similarity ratio exceeds a threshold of 80%. Additionally, it verifies if the correct answer index is present in the AI's response.
By applying this algorithm, I was able to accurately determine the correctness of the AI-generated answers, allowing for a more nuanced evaluation of the model's performance on the MMLU dataset.
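The matching logic described above can be sketched as follows. The preprocessing, the 80% similarity threshold, the two containment checks, and the answer-index check all come from the description; the exact function name, signature, and normalization details are assumptions, and `difflib.SequenceMatcher` stands in for whatever fuzzy-matching library was actually used.

```python
import re
from difflib import SequenceMatcher

# Reconstruction of the matching logic described in the text. Names and
# normalization details are assumptions; SequenceMatcher stands in for the
# original fuzzy string matching library.

def normalize(text):
    """Lowercase and strip punctuation so comparisons are consistent."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def is_correct(ai_answer, correct_answer, correct_index=None, threshold=0.8):
    a, c = normalize(ai_answer), normalize(correct_answer)
    # Either answer containing the other counts as a match.
    if c in a or a in c:
        return True
    # Otherwise require a fuzzy similarity ratio above the threshold.
    if SequenceMatcher(None, a, c).ratio() >= threshold:
        return True
    # Finally, accept the answer if it cites the correct answer's index.
    if correct_index is not None and str(correct_index) in ai_answer:
        return True
    return False
```

For example, `is_correct("The answer is (B) mitochondria", "Mitochondria")` matches via containment, while a near-miss spelling passes via the similarity ratio.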
Subject/model | Minitron | Mistral | Llama | Gemma |
---|---|---|---|---|
abstract_algebra | 0.364 | 0.545 | 0.455 | 0.273 |
anatomy | 0.071 | 0.357 | 0.143 | 0.571 |
astronomy | 0.063 | 0.313 | 0.063 | 0.375 |
business_ethics | 0.091 | 0.727 | 0.091 | 0.273 |
clinical_knowledge | 0.069 | 0.414 | 0.241 | 0.345 |
college_biology | 0.063 | 0.375 | 0.250 | 0.500 |
college_chemistry | 0.250 | 0.250 | 0.250 | 0.625 |
college_computer_science | 0.091 | 0.273 | 0.182 | 0.273 |
college_mathematics | 0.091 | 0.273 | 0.273 | 0.182 |
college_medicine | 0.136 | 0.409 | 0.273 | 0.409 |
college_physics | 0.182 | 0.273 | 0.364 | 0.273 |
computer_security | 0.091 | 0.455 | 0.273 | 0.364 |
conceptual_physics | 0.231 | 0.346 | 0.385 | 0.423 |
econometrics | 0.083 | 0.667 | 0.333 | 0.333 |
electrical_engineering | 0.188 | 0.375 | 0.313 | 0.375 |
elementary_mathematics | 0.195 | 0.293 | 0.317 | 0.341 |
formal_logic | 0.143 | 0.214 | 0.286 | 0.214 |
global_facts | 0.100 | 0.200 | 0.500 | 0.200 |
high_school_biology | 0.125 | 0.438 | 0.156 | 0.313 |
high_school_chemistry | 0.091 | 0.182 | 0.091 | 0.182 |
high_school_computer_science | 0.111 | 0.778 | 0.111 | 0.111 |
high_school_european_history | 0.056 | 0.556 | 0.111 | 0.556 |
high_school_geography | 0.136 | 0.455 | 0.409 | 0.545 |
high_school_government_and_politics | 0.238 | 0.476 | 0.381 | 0.476 |
high_school_macroeconomics | 0.070 | 0.512 | 0.186 | 0.349 |
high_school_mathematics | 0.069 | 0.448 | 0.414 | 0.241 |
high_school_microeconomics | 0.115 | 0.385 | 0.115 | 0.269 |
high_school_physics | 0.000 | 0.235 | 0.000 | 0.118 |
high_school_psychology | 0.200 | 0.617 | 0.367 | 0.483 |
high_school_statistics | 0.043 | 0.435 | 0.261 | 0.391 |
high_school_us_history | 0.045 | 0.545 | 0.136 | 0.409 |
high_school_world_history | 0.038 | 0.462 | 0.192 | 0.231 |
human_aging | 0.087 | 0.348 | 0.391 | 0.348 |
human_sexuality | 0.083 | 0.250 | 0.250 | 0.250 |
international_law | 0.077 | 0.462 | 0.077 | 0.231 |
jurisprudence | 0.091 | 0.364 | 0.091 | 0.182 |
logical_fallacies | 0.056 | 0.500 | 0.278 | 0.278 |
machine_learning | 0.364 | 0.545 | 0.273 | 0.455 |
management | 0.182 | 0.455 | 0.455 | 0.636 |
marketing | 0.280 | 0.560 | 0.440 | 0.640 |
medical_genetics | 0.091 | 0.636 | 0.182 | 0.455 |
miscellaneous | 0.430 | 0.628 | 0.593 | 0.500 |
moral_disputes | 0.158 | 0.368 | 0.132 | 0.237 |
moral_scenarios | 0.130 | 0.380 | 0.090 | 0.360 |
nutrition | 0.152 | 0.273 | 0.242 | 0.394 |
philosophy | 0.088 | 0.382 | 0.206 | 0.471 |
prehistory | 0.029 | 0.343 | 0.114 | 0.343 |
professional_accounting | 0.065 | 0.258 | 0.065 | 0.194 |
professional_law | 0.012 | 0.165 | 0.076 | 0.271 |
professional_medicine | 0.065 | 0.419 | 0.161 | 0.290 |
professional_psychology | 0.087 | 0.493 | 0.116 | 0.333 |
public_relations | 0.083 | 0.500 | 0.167 | 0.250 |
security_studies | 0.037 | 0.222 | 0.037 | 0.296 |
sociology | 0.045 | 0.409 | 0.045 | 0.409 |
us_foreign_policy | 0.091 | 0.455 | 0.000 | 0.636 |
virology | 0.222 | 0.389 | 0.278 | 0.167 |
world_religions | 0.316 | 0.632 | 0.579 | 0.579 |
Average (%) | 12.558 | 41.604 | 23.258 | 35.484 |
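The last row of the table is simply the mean of the 57 subject-level accuracies for each model, expressed as a percentage. A quick sketch, using three sample values rather than a full column:

```python
# How the per-model averages in the last table row are derived: the mean of
# the subject accuracies, scaled to a percentage. The three scores below are
# sample values, not a full 57-subject column.

scores = [0.364, 0.545, 0.455]
average_pct = 100 * sum(scores) / len(scores)
```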
After launching an instance on PrimeIntellect, follow these steps to connect to your VM:
- Download the Private Key: Once your VM is ready, download the private key provided by PrimeIntellect. This key is necessary to securely connect to your VM.
- Change Permissions on the Private Key: Before using the key, restrict its permissions; SSH refuses to use a private key that is readable by other users. Open a terminal on your computer, navigate to the directory where you downloaded the key, and run:
chmod 400 [your-key-name].pem
- Connect to the VM: With the key's permissions set, you're ready to connect to the VM. In the same terminal window, use the connection command provided by PrimeIntellect. It will look something like this:
ssh -i [your-key-name].pem ubuntu@[vm-ip-address]
Replace [your-key-name] with the name of your key file and [vm-ip-address] with the IP address provided for your VM.
Once you've connected to a VM, setting up and running the test scripts is straightforward. Follow these steps:
- Clone the Repository
- Open your VM's terminal.
- Execute the following command to clone the repository containing the test scripts:
git clone https://github.com/Hugo-SEQUIER/prime-intellect-test.git
- Prepare the Scripts
- Ensure that all scripts in the cloned repository are executable by running:
find prime-intellect-test -type f -name "*.sh" -exec chmod +x {} \;
Now all you have to do is navigate to the model's directory and run the training script:
cd prime-intellect-test
cd llama_8b
./training.sh
By following this guide, you can leverage PrimeIntellect's VMs to perform comprehensive benchmarks on different GPUs using the pre-configured PyTorch images. This will aid in making informed decisions about which GPU configuration best suits your deep learning tasks in terms of performance and cost efficiency.