
Large Language Models as Realistic Microservice Trace Generators

Dependencies

The project is written in Python; its dependencies are listed in trace_gen/requirements.txt. We also install our code as a Python package using poetry. Use the following commands:

conda create -n tracellm python=3.8 -y
conda activate tracellm
pip install poetry
# In the root directory
poetry install
cd trace_gen
pip install -r requirements.txt

Data Preprocessing

We use the CallGraph data from the Alibaba microservices v2022 traces as our training data. Fetch the first 20 CallGraph files using the scripts in Alibaba's repo, then preprocess them with our scripts.
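Before running the preprocessing scripts, it can help to sanity-check the raw data. Below is a minimal sketch that samples one downloaded file with pandas; the file name and column names (e.g. traceid) are assumptions based on the published v2022 CallGraph schema, not part of our scripts, so verify them against the files you fetched.

import pandas as pd

# Hypothetical file name and column names, following the published v2022
# CallGraph schema; verify both against the files you actually fetched.
df = pd.read_csv("CallGraph_0.csv", nrows=100_000)  # sample the first rows
print(df.columns.tolist())
print(df["traceid"].nunique(), "distinct traces in the sample")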

To convert the separate API calls into call graphs and remove redundant call graphs, use the following commands. Make sure to update the file directories before executing them.

> python trace_gen/preprocess/trace_to_training_data.py
> python trace_gen/preprocess/remove_redundant_training_data.py
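For intuition, here is a rough sketch of the grouping idea behind trace_to_training_data.py, not the actual script: rows sharing a traceid form one call graph, and the dotted rpc_id encodes each call's position in the call tree. Field names are assumptions based on the v2022 schema.

from collections import defaultdict

import pandas as pd

df = pd.read_csv("CallGraph_0.csv")  # hypothetical input file

# Rows sharing a traceid belong to one call graph; the dotted rpc_id
# ("0.1.2" is a child of "0.1") encodes each call's position in the tree.
call_graphs = defaultdict(list)
for row in df.itertuples(index=False):
    call_graphs[row.traceid].append((row.rpc_id, row.um, row.dm, row.rt))

# Lexicographic sort on rpc_id puts every parent before its children,
# since a parent's id is a proper prefix of its children's ids.
for calls in call_graphs.values():
    calls.sort(key=lambda call: call[0])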

To collect the call graph statistics required to generate instructions, run the following commands:

> python trace_gen/preprocess/trace_to_cg_stats.py
> python trace_gen/preprocess/merge_cg_stats.py
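As an illustration of the kind of per-graph statistics these scripts collect, the toy example below computes a graph's size and depth from its rpc_id values; the real scripts may track additional metrics.

# One toy call graph: (rpc_id, upstream, downstream, response time in ms).
calls = [("0", "USER", "svcA", 12.0),
         ("0.1", "svcA", "svcB", 5.0),
         ("0.1.1", "svcB", "svcC", 2.0)]

size = len(calls)  # number of calls in the graph
depth = max(rpc_id.count(".") + 1 for rpc_id, *_ in calls)  # levels in the call tree
print(size, depth)  # -> 3 3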

To convert the call graphs to text representations, run the following command. Make sure to change the file directories before you execute it.

Also, make sure to set the task_type correctly for your use case:

  • TraceGenTaskType.graph_gen_non_recursive: tabular format
  • TraceGenTaskType.graph_gen_recursive: recursive format
  • TraceGenTaskType.graph_gen: instruction-tuning
> python trace_gen/preprocess/training_data_to_text_representations.py
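To make the distinction concrete, the sketch below renders a two-call toy graph as a tabular-style and a recursive-style string. The exact serializations here are illustrative only; the authoritative formats are in the files under dataset_examples (see the next section).

calls = [("0.1", "svcA", "svcB", 5.0), ("0.1.1", "svcB", "svcC", 2.0)]

# Tabular: one call per line, fields listed flatly.
tabular = "\n".join(f"{rid},{um},{dm},{rt}" for rid, um, dm, rt in calls)

# Recursive: each call's children are nested inside its brackets.
recursive = "svcA -> svcB (5.0) [ svcB -> svcC (2.0) ]"

print(tabular)
print(recursive)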

Data Examples

We include samples of the training datasets in the dataset_examples folder:

  • tabular_dataset.txt: dataset in tabular format
  • recursive_dataset.txt: dataset in recursive format
  • recursive_instruction_dataset.txt: dataset in recursive format with instructions

Training

Model training scripts can be found in trace_gen/train.

  • pretraining.py: For pretraining LLaMA-7B with trace data.
  • sft.py: For supervised fine-tuning the model with instruction datasets (see the sketch after this list).
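For readers unfamiliar with the setup, here is a minimal sketch of LoRA-style supervised fine-tuning using standard transformers/peft APIs. The base checkpoint name, LoRA hyperparameters, and dataset path are placeholders, not the values used by sft.py.

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "huggyllama/llama-7b"  # placeholder checkpoint name
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token

# Wrap the base model with LoRA adapters; the rank and target modules are
# illustrative defaults, not the values behind the released checkpoints.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

data = load_dataset("text",
                    data_files="dataset_examples/recursive_instruction_dataset.txt")
data = data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=2048),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="checkpoints",
                           per_device_train_batch_size=1),
    train_dataset=data["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()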

Generation

To produce the accuracy report, follow the script in trace_gen/generate/run_accuracy_eval.sh. Running it requires prompt files and the LoRA adapters produced by training. We provide example LoRA adapters in the checkpoints directory.
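As a rough illustration of how the example adapters can be used outside the script, the sketch below loads a LoRA adapter with peft and generates from a prompt; the base checkpoint name, adapter path, and prompt text are placeholders.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "huggyllama/llama-7b"  # placeholder checkpoint name
model = AutoModelForCausalLM.from_pretrained(base_model,
                                             torch_dtype=torch.float16,
                                             device_map="auto")
model = PeftModel.from_pretrained(model, "checkpoints/example-adapter")  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(base_model)

prompt = "Generate a call graph with 5 calls and depth 3:"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))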
