diff --git a/example/LLM/llama/README.md b/example/LLM/llama/README.md
index f75d96df94..b6ef382f28 100644
--- a/example/LLM/llama/README.md
+++ b/example/LLM/llama/README.md
@@ -24,7 +24,6 @@ Current supports datasets:
 - For finetune:
   - openassistant
 
-
 ## Build Starwhale Runtime
 
 ```bash
diff --git a/example/llm-finetune/README.md b/example/llm-finetune/README.md
new file mode 100644
index 0000000000..53d861797b
--- /dev/null
+++ b/example/llm-finetune/README.md
@@ -0,0 +1,40 @@
+LLM Finetune
+======
+
+LLM finetuning is a state-of-the-art task for large language models.
+
+In these examples, we will use Starwhale to finetune a set of LLM base models, then evaluate and release the models. The demos are in the [starwhale/llm-finetuning](https://cloud.starwhale.cn/projects/401/overview) project on Starwhale Cloud.
+
+What we learn
+------
+
+- Use the `@starwhale.finetune` decorator to define a finetune handler, so a Starwhale Model can run the LLM finetuning (see the sketch after this list).
+- Use the `@starwhale.evaluation.predict` decorator to define model evaluation for an LLM.
+- Use the `@starwhale.handler` decorator to define a web handler for LLM online evaluation.
+- Use one Starwhale Runtime to run all the models.
+- Build Starwhale Datasets from Hugging Face with a one-line command, no code required.
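+
+The three decorators fit together roughly as follows. This is a simplified sketch of the real handlers in `models/baichuan2/evaluation.py` and `models/baichuan2/finetune.py`; the function bodies here are placeholders:
+
+```python
+import typing as t
+
+from starwhale import evaluation, finetune, handler
+
+
+@evaluation.predict(resources={"nvidia.com/gpu": 1}, replicas=1)
+def copilot_predict(data: dict) -> str:
+    # called once per dataset row; `data` holds the dataset features, e.g. data["prompt"]
+    return f"echo: {data['prompt']}"
+
+
+@finetune(resources={"nvidia.com/gpu": 1}, require_train_datasets=True, model_modules=[copilot_predict])
+def lora_finetune(train_datasets: t.List) -> None:
+    # receive the selected Starwhale datasets, train the model and save the new weights
+    ...
+
+
+@handler(expose=17860)
+def chatbot() -> None:
+    # start a web UI (e.g. gradio) for interactive online evaluation
+    ...
+```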
+
+Models
+------
+
+- [Baichuan2](https://github.com/baichuan-inc/Baichuan2): Baichuan 2 is the new generation of open-source large language models launched by Baichuan Intelligent Technology. It was trained on a high-quality corpus with 2.6 trillion tokens.
+- [ChatGLM3](https://github.com/THUDM/ChatGLM3): ChatGLM3 is a new generation of pre-trained dialogue models jointly released by Zhipu AI and Tsinghua KEG. ChatGLM3-6B is the open-source model in the ChatGLM3 series.
+
+Datasets
+------
+
+- [Belle multiturn chat](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M): The dataset includes approx. 0.8M Chinese multi-turn dialogues between humans and an assistant, released by the BELLE Group.
+
+  ```bash
+  # build the original dataset from Hugging Face
+  swcli dataset build -hf BelleGroup/multiturn_chat_0.8M --name belle-multiturn-chat
+
+  # build the 10k randomly sampled items provided by Baichuan2
+  swcli dataset build --json https://raw.githubusercontent.com/baichuan-inc/Baichuan2/main/fine-tune/data/belle_chat_ramdon_10k.json --name belle_chat_random_10k
+  ```
+
+- [COIG](https://huggingface.co/datasets/BAAI/COIG): The Chinese Open Instruction Generalist (COIG) project is a harmless, helpful, and diverse set of Chinese instruction corpora.
+
+  ```bash
+  swcli dataset build -hf BAAI/COIG --name coig
+  ```
diff --git a/example/llm-finetune/models/baichuan2/.gitignore b/example/llm-finetune/models/baichuan2/.gitignore
new file mode 100644
index 0000000000..6e6a98b21c
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/.gitignore
@@ -0,0 +1,2 @@
+pretrain/
+.cache/
\ No newline at end of file
diff --git a/example/llm-finetune/models/baichuan2/.swignore b/example/llm-finetune/models/baichuan2/.swignore
new file mode 100644
index 0000000000..988107fe19
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/.swignore
@@ -0,0 +1 @@
+.cache/
\ No newline at end of file
diff --git a/example/llm-finetune/models/baichuan2/README.md b/example/llm-finetune/models/baichuan2/README.md
new file mode 100644
index 0000000000..6429d6566a
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/README.md
@@ -0,0 +1,57 @@
+Baichuan2 Finetune with Starwhale
+======
+
+- 🍬 Parameters: 7b
+- 🔆 Github: https://github.com/baichuan-inc/Baichuan2
+- 🥦 Author: Baichuan Inc.
+- 📝 License: baichuan
+- 🐱 Starwhale Example: https://github.com/star-whale/starwhale/tree/main/example/llm-finetune/models/baichuan2
+- 🌽 Introduction: Baichuan 2 is the new generation of large-scale open-source language models launched by Baichuan Intelligent Technology. It is trained on a high-quality corpus with 2.6 trillion tokens and achieves the best performance among models of the same size on authoritative Chinese and English benchmarks. Baichuan2-7B-Chat is the chat model of Baichuan 2, with 7 billion parameters.
+
+In this example, we will use Baichuan2-7B-Chat as the base model to finetune and evaluate:
+
+- Evaluate the baichuan2-7b-chat model.
+- Provide multi-turn chat online evaluation for baichuan2-7b-chat.
+- Fine-tune the baichuan2-7b-chat model with the belle-multiturn-chat dataset.
+- Evaluate the fine-tuned model.
+- Provide multi-turn chat online evaluation for the fine-tuned model.
+- Fine-tune an already fine-tuned baichuan2-7b-chat model.
+
+Thanks to 4-bit quantization, a single T4/A10/A100 GPU card is enough for both evaluation and finetuning.
+
+Build Starwhale Model
+------
+
+```bash
+python3 build.py
+```
+
+Run Online Evaluation in the Standalone instance
+------
+
+```bash
+# for source code
+swcli model run -w . -m evaluation --handler evaluation:chatbot
+
+# for model package with runtime
+swcli model run --uri baichuan2-7b-chat --handler evaluation:chatbot --runtime llm-finetune
+```
+
+Run Starwhale Model for evaluation in the Standalone instance
+------
+
+```bash
+swcli dataset cp https://cloud.starwhale.cn/projects/401/datasets/161/versions/223/ .
+swcli -vvv model run -w . -m evaluation --handler evaluation:copilot_predict --dataset z-bench-common --dataset-head 3
+```
+
+Finetune base model
+------
+
+```bash
+# build finetune dataset from baichuan2
+swcli dataset build --json https://raw.githubusercontent.com/baichuan-inc/Baichuan2/main/fine-tune/data/belle_chat_ramdon_10k.json --name belle_chat_random_10k
+
+swcli -vvv model run -w . -m finetune --dataset belle_chat_random_10k --handler finetune:lora_finetune
+```
+
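+Tune generation and training parameters
+------
+
+`evaluation.py` and `finetune.py` read several hyperparameters from environment variables, so they can be overridden before running the `swcli model run` commands above. A sketch with the defaults used in the code:
+
+```bash
+# generation parameters read by evaluation.py
+export MAX_MODEL_LENGTH=512 TEMPERATURE=0.7 TOP_P=0.9 TOP_K=30 REPETITION_PENALTY=1.3
+
+# training parameters read by finetune.py
+export MODEL_MAX_LENGTH=512 NUM_TRAIN_EPOCHS=2 MAX_STEPS=18
+```
+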
diff --git a/example/llm-finetune/models/baichuan2/build.py b/example/llm-finetune/models/baichuan2/build.py
new file mode 100644
index 0000000000..ff3ea5d379
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/build.py
@@ -0,0 +1,32 @@
+from huggingface_hub import snapshot_download
+
+import starwhale
+
+try:
+    from .utils import BASE_MODEL_DIR
+    from .finetune import lora_finetune
+    from .evaluation import chatbot, copilot_predict
+except ImportError:
+    from utils import BASE_MODEL_DIR
+    from finetune import lora_finetune
+    from evaluation import chatbot, copilot_predict
+
+starwhale.init_logger(3)
+
+
+def build_starwhale_model() -> None:
+    BASE_MODEL_DIR.mkdir(parents=True, exist_ok=True)
+
+    snapshot_download(
+        repo_id="baichuan-inc/Baichuan2-7B-Chat",
+        local_dir=BASE_MODEL_DIR,
+    )
+
+    starwhale.model.build(
+        name="baichuan2-7b-chat",
+        modules=[copilot_predict, chatbot, lora_finetune],
+    )
+
+
+if __name__ == "__main__":
+    build_starwhale_model()
diff --git a/example/llm-finetune/models/baichuan2/evaluation.py b/example/llm-finetune/models/baichuan2/evaluation.py
new file mode 100644
index 0000000000..e0bd174218
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/evaluation.py
@@ -0,0 +1,139 @@
+from __future__ import annotations
+
+import os
+import typing as t
+
+import torch
+import gradio
+from peft import PeftModel
+from transformers import AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
+from transformers.generation.utils import GenerationConfig
+
+from starwhale import handler, evaluation
+
+try:
+    from .utils import BASE_MODEL_DIR, ADAPTER_MODEL_DIR
+except ImportError:
+    from utils import BASE_MODEL_DIR, ADAPTER_MODEL_DIR
+
+_g_model = None
+_g_tokenizer = None
+
+
+def _load_model_and_tokenizer() -> t.Tuple:
+    global _g_model, _g_tokenizer
+
+    if _g_model is None:
+        print(f"load model from {BASE_MODEL_DIR} ...")
+        _g_model = AutoModelForCausalLM.from_pretrained(
+            BASE_MODEL_DIR,
+            device_map="auto",
+            torch_dtype=torch.float16,
+            trust_remote_code=True,
+            load_in_4bit=True,  # for lower gpu memory usage
+            quantization_config=BitsAndBytesConfig(
+                load_in_4bit=True,
+                llm_int8_threshold=6.0,
+                llm_int8_has_fp16_weight=False,
+                bnb_4bit_compute_dtype=torch.float16,
+                bnb_4bit_use_double_quant=True,
+                bnb_4bit_quant_type="nf4",
+            ),
+        )
+        _g_model.generation_config = GenerationConfig.from_pretrained(BASE_MODEL_DIR)
+
+        if (ADAPTER_MODEL_DIR / "adapter_config.json").exists():
+            print(f"load adapter from {ADAPTER_MODEL_DIR} ...")
+            _g_model = PeftModel.from_pretrained(
+                _g_model, str(ADAPTER_MODEL_DIR), is_trainable=False
+            )
+
+    if _g_tokenizer is None:
+        print(f"load tokenizer from {BASE_MODEL_DIR} ...")
+        _g_tokenizer = AutoTokenizer.from_pretrained(
+            BASE_MODEL_DIR, use_fast=False, trust_remote_code=True
+        )
+
+    return _g_model, _g_tokenizer
+
+
+@evaluation.predict(
+    resources={"nvidia.com/gpu": 1},
+    replicas=1,
+    log_mode="plain",
+    log_dataset_features=[""],
+)
+def copilot_predict(data: dict) -> str:
+    model, tokenizer = _load_model_and_tokenizer()
+    # support z-bench-common dataset: https://cloud.starwhale.cn/projects/401/datasets/161/versions/223/files
+    messages = [{"role": "user", "content": data["prompt"]}]
+
+    config_dict = model.generation_config.to_dict()
+    # TODO: use arguments
+    config_dict.update(
+        max_new_tokens=int(os.environ.get("MAX_MODEL_LENGTH", 512)),
+        do_sample=True,
+        temperature=float(os.environ.get("TEMPERATURE", 0.7)),
+        top_p=float(os.environ.get("TOP_P", 0.9)),
+        top_k=int(os.environ.get("TOP_K", 30)),
+        repetition_penalty=float(os.environ.get("REPETITION_PENALTY", 1.3)),
+    )
+    return model.chat(
+        tokenizer,
+        messages=messages,
+        generation_config=GenerationConfig.from_dict(config_dict),
+    )
+
+
+@handler(expose=17860)
+def chatbot() -> None:
+    with gradio.Blocks() as server:
+        chatbot = gradio.Chatbot(height=800)
+        msg = gradio.Textbox(label="chat", show_label=True)
+        _max_gen_len = gradio.Slider(
+            0, 1024, value=256, step=1.0, label="Max Gen Len", interactive=True
+        )
+        _top_p = gradio.Slider(
+            0, 1, value=0.7, step=0.01, label="Top P", interactive=True
+        )
+        _temperature = gradio.Slider(
+            0, 1, value=0.95, step=0.01, label="Temperature", interactive=True
+        )
+        gradio.ClearButton([msg, chatbot])
+
+        def response(
+            from_user: str,
+            chat_history: t.List,
+            max_gen_len: int,
+            top_p: float,
+            temperature: float,
+        ) -> t.Tuple[str, t.List]:
+            dialog = []
+            for _user, _assistant in chat_history:
+                dialog.append({"role": "user", "content": _user})
+                if _assistant:
+                    dialog.append({"role": "assistant", "content": _assistant})
+            dialog.append({"role": "user", "content": from_user})
+
+            model, tokenizer = _load_model_and_tokenizer()
+            from_assistant = model.chat(
+                tokenizer,
+                messages=dialog,
+                generation_config=GenerationConfig(
+                    max_new_tokens=max_gen_len,
+                    do_sample=True,
+                    temperature=temperature,
+                    top_p=top_p,
+                ),
+            )
+
+            chat_history.append((from_user, from_assistant))
+            return "", chat_history
+
+        msg.submit(
+            response,
+            [msg, chatbot, _max_gen_len, _top_p, _temperature],
+            [msg, chatbot],
+        )
+
+    server.launch(server_name="0.0.0.0", server_port=17860, share=True)
diff --git a/example/llm-finetune/models/baichuan2/finetune.py b/example/llm-finetune/models/baichuan2/finetune.py
new file mode 100644
index 0000000000..b6f0bb3d70
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/finetune.py
@@ -0,0 +1,188 @@
+from __future__ import annotations
+
+import os
+import typing as t
+from dataclasses import dataclass
+
+import torch
+from peft import (
+    TaskType,
+    PeftModel,
+    LoraConfig,
+    get_peft_model,
+    prepare_model_for_kbit_training,
+)
+from transformers import (
+    Trainer,
+    AutoTokenizer,
+    BitsAndBytesConfig,
+    PreTrainedTokenizer,
+    AutoModelForCausalLM,
+)
+from transformers.training_args import TrainingArguments
+
+from starwhale import dataset, finetune
+
+try:
+    from .utils import BASE_MODEL_DIR, ADAPTER_MODEL_DIR
+    from .evaluation import copilot_predict
+except ImportError:
+    from utils import BASE_MODEL_DIR, ADAPTER_MODEL_DIR
+    from evaluation import copilot_predict
+
+# fork from https://github.com/baichuan-inc/Baichuan2/blob/main/fine-tune/fine-tune.py
+
+torch.backends.cuda.matmul.allow_tf32 = True
+
+
+@dataclass
+class DataCollatorForCausalLM:
+    tokenizer: PreTrainedTokenizer
+    source_max_len: int
+    target_max_len: int
+
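+    # 195 and 196 are Baichuan2's reserved special token ids marking user and assistant turns;
+    # labels of user turns are filled with ignore_index (-100) so the loss only covers assistant replies.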
+    user_tokens = [195]
+    assistant_tokens = [196]
+    ignore_index = -100
+
+    def __call__(self, example: t.Dict) -> t.Dict:
+        input_ids = []
+        labels = []
+
+        for message in example["conversations"]:
+            from_ = message["from"]
+            value = message["value"]
+            value_ids = self.tokenizer.encode(value)
+
+            if from_ == "human":
+                input_ids += self.user_tokens + value_ids
+                labels += [self.tokenizer.eos_token_id] + [self.ignore_index] * len(
+                    value_ids
+                )
+            else:
+                input_ids += self.assistant_tokens + value_ids
+                labels += [self.ignore_index] + value_ids
+        input_ids.append(self.tokenizer.eos_token_id)
+        labels.append(self.tokenizer.eos_token_id)
+        input_ids = input_ids[: self.tokenizer.model_max_length]
+        labels = labels[: self.tokenizer.model_max_length]
+        input_ids += [self.tokenizer.pad_token_id] * (
+            self.tokenizer.model_max_length - len(input_ids)
+        )
+        labels += [self.ignore_index] * (self.tokenizer.model_max_length - len(labels))
+        input_ids = torch.LongTensor(input_ids)
+        labels = torch.LongTensor(labels)
+        attention_mask = input_ids.ne(self.tokenizer.pad_token_id)
+        return {
+            "input_ids": input_ids,
+            "labels": labels,
+            "attention_mask": attention_mask,
+        }
+
+
+@finetune(
+    resources={"nvidia.com/gpu": 1},
+    require_train_datasets=True,
+    model_modules=[copilot_predict],
+)
+def lora_finetune(train_datasets: t.List[str]) -> None:
+    # TODO: support multi train datasets
+    train_dataset = train_datasets[0]
+    if isinstance(train_dataset, str):
+        train_dataset = dataset(train_dataset, readonly=True)
+
+    model = AutoModelForCausalLM.from_pretrained(
+        BASE_MODEL_DIR,
+        trust_remote_code=True,
+        torch_dtype=torch.float16,
+        device_map="auto",  # for multi-gpus
+        load_in_4bit=True,  # for lower gpu memory usage
+        quantization_config=BitsAndBytesConfig(
+            load_in_4bit=True,
+            llm_int8_threshold=6.0,
+            llm_int8_has_fp16_weight=False,
+            bnb_4bit_compute_dtype=torch.float16,
+            bnb_4bit_use_double_quant=True,
+            bnb_4bit_quant_type="nf4",
+        ),
+    )
+    model.model_parallel = True
+    model.is_parallelizable = True
+    model.config.torch_dtype = torch.float16
+    model = prepare_model_for_kbit_training(model)
+
+    # support finetune a lora-finetuned model
+    if (ADAPTER_MODEL_DIR / "adapter_config.json").exists():
+        print(f"loading adapters {ADAPTER_MODEL_DIR}...")
+        model = PeftModel.from_pretrained(
+            model,
+            str(ADAPTER_MODEL_DIR),
+            is_trainable=True,
+        )
+    else:
+        print("init model with peft lora config...")
+        peft_config = LoraConfig(
+            task_type=TaskType.CAUSAL_LM,
+            target_modules=["W_pack"],
+            inference_mode=False,
+            r=64,
+            lora_alpha=16,
+            bias="none",
+            lora_dropout=0.05,
+        )
+        model = get_peft_model(model, peft_config)
+        model.print_trainable_parameters()
+
+    for name, module in model.named_modules():
+        if "norm" in name:
+            module = module.to(torch.float32)
+
+    tokenizer = AutoTokenizer.from_pretrained(
+        BASE_MODEL_DIR,
+        use_fast=False,
+        trust_remote_code=True,
+        model_max_length=int(os.environ.get("MODEL_MAX_LENGTH", 512)),
+    )
+
+    # TODO: support finetune arguments
+    # copy from https://github.com/baichuan-inc/Baichuan2/blob/main/README.md#%E5%8D%95%E6%9C%BA%E8%AE%AD%E7%BB%83
+    train_args = TrainingArguments(
+        output_dir=str(ADAPTER_MODEL_DIR),
+        optim="adamw_torch",
+        report_to="none",
+        num_train_epochs=int(os.environ.get("NUM_TRAIN_EPOCHS", 2)),
+        max_steps=int(os.environ.get("MAX_STEPS", 18)),
+        per_device_train_batch_size=2,  # more batch size will cause OOM
+        gradient_accumulation_steps=16,
+        save_strategy="no",  # no need to save checkpoint for finetune
+        learning_rate=2e-5,
+        lr_scheduler_type="constant",
+        adam_beta1=0.9,
+        adam_beta2=0.98,
+        adam_epsilon=1e-8,
+        max_grad_norm=1.0,
+        weight_decay=1e-4,
+        warmup_ratio=0.0,
+        logging_steps=10,
+        gradient_checkpointing=False,
+        remove_unused_columns=False,
+    )
+
+    # TODO: support deepspeed
+
+    trainer = Trainer(
+        model=model,
+        tokenizer=tokenizer,
+        args=train_args,
+        train_dataset=train_dataset.to_pytorch(
+            transform=DataCollatorForCausalLM(
+                tokenizer=tokenizer, source_max_len=16, target_max_len=512
+            )
+        ),
+    )
+
+    print("Starting model training...")
+    train_result = trainer.train(resume_from_checkpoint=None)
+    print(train_result.metrics)
+    trainer.save_state()
+    trainer.save_model()
diff --git a/example/llm-finetune/models/baichuan2/utils.py b/example/llm-finetune/models/baichuan2/utils.py
new file mode 100644
index 0000000000..ea0c70176c
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/utils.py
@@ -0,0 +1,8 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+ROOT_DIR = Path(__file__).parent
+BASE_MODEL_DIR = ROOT_DIR / "pretrain" / "base"
+ADAPTER_MODEL_DIR = ROOT_DIR / "pretrain" / "adapter"
+CACHE_DIR = ROOT_DIR / ".cache"
diff --git a/example/llm-finetune/runtime/requirements.txt b/example/llm-finetune/runtime/requirements.txt
new file mode 100644
index 0000000000..dcb5ce923d
--- /dev/null
+++ b/example/llm-finetune/runtime/requirements.txt
@@ -0,0 +1,14 @@
+transformers==4.31.0  # baichuan2 works with this version; the latest version (4.35.1) raises AttributeError in AutoTokenizer.from_pretrained
+transformers-stream-generator==0.0.4
+torch==2.0.1
+accelerate==0.24.0
+gradio==3.50.2
+deepspeed==0.12.2
+peft==0.6.1
+xformers==0.0.22
+cpm_kernels
+bitsandbytes
+colorama
+tokenizers
+sentencepiece
+git+https://github.com/star-whale/starwhale.git@bebd503#subdirectory=client&egg=starwhale
diff --git a/example/llm-finetune/runtime/runtime.yaml b/example/llm-finetune/runtime/runtime.yaml
new file mode 100644
index 0000000000..8af8d9ebaa
--- /dev/null
+++ b/example/llm-finetune/runtime/runtime.yaml
@@ -0,0 +1,10 @@
+name: llm-finetune
+mode: venv
+environment:
+  arch: noarch
+  os: ubuntu:20.04
+  cuda: "11.7"
+  python: "3.10"
+  starwhale_version: git+https://github.com/star-whale/starwhale.git@bebd503#subdirectory=client&egg=starwhale
+dependencies:
+  - requirements.txt