# continue-dlt-demo

A demo of fine-tuning an LLM on an organization's coding data to improve autocomplete predictions for continue.dev, using dlt, Hugging Face, and Ollama.

## Creating the dataset from continue.dev autocomplete suggestions

continue.dev's autocomplete feature logs each suggestion to an `autocomplete.jsonl` file, together with whether the user accepted or rejected it. The `continue-hf-pipeline.py` file contains a custom dlt destination that takes this data, converts it to a Parquet file, and pushes it to a Hugging Face dataset repo in a format ready for finetuning an LLM.
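The core of that transformation is filtering the logged events down to accepted suggestions and shaping them into training pairs. A minimal stdlib sketch of that step (the field names `accepted`, `prefix`, and `completion` are assumptions about the `autocomplete.jsonl` schema, not its documented format; the real pipeline does this inside a dlt destination):

```python
import json

def jsonl_to_training_records(jsonl_lines):
    """Keep only accepted autocomplete suggestions and shape them into
    prompt/completion pairs suitable for finetuning.
    Field names ('accepted', 'prefix', 'completion') are assumed, not
    taken from continue.dev's documented schema."""
    records = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("accepted"):
            records.append({
                "prompt": event.get("prefix", ""),
                "completion": event.get("completion", ""),
            })
    return records

# Two logged events; only the accepted one becomes a training record.
sample = [
    '{"accepted": true, "prefix": "def add(a, b):", "completion": " return a + b"}',
    '{"accepted": false, "prefix": "x = ", "completion": "42"}',
]
records = jsonl_to_training_records(sample)
```

From here, a table of such records converts naturally to Parquet (e.g. via pyarrow) before being pushed to the dataset repo.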

## Finetuning process

I finetuned the `starcoder2:3b` model using Hugging Face's SFTTrainer, adapting the finetuning code that the model's creators open-sourced.

I tried finetuning both on the dlt GitHub repository and on the autocomplete dataset described above. The code for the finetuning process can be found here: https://colab.research.google.com/drive/1jjb14BDlEeGjRmeXnfm41gDBlTNvsscn?usp=sharing
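Autocomplete models in the StarCoder family are commonly trained with fill-in-the-middle (FIM) formatting, so the model learns to complete code given both the surrounding prefix and suffix. A minimal sketch of building such a training string (the special-token spellings are an assumption based on the StarCoder tokenizer; the actual notebook may format its data differently):

```python
def to_fim_string(prefix: str, suffix: str, middle: str) -> str:
    """Arrange a code sample into fill-in-the-middle order so the model
    learns to predict the missing middle from the surrounding context.
    The token spellings below are assumed from the StarCoder family's
    tokenizer; check the model's tokenizer config before relying on them."""
    return (
        f"<fim_prefix>{prefix}"
        f"<fim_suffix>{suffix}"
        f"<fim_middle>{middle}"
    )

example = to_fim_string(
    "def add(a, b):\n",        # code before the cursor
    "\nprint(add(1, 2))",      # code after the cursor
    "    return a + b",        # the accepted completion to learn
)
```

Strings in this shape can then be fed to SFTTrainer as the text field of the training dataset.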