diff --git a/example/LLM/llama/README.md b/example/LLM/llama/README.md
index f75d96df94..b6ef382f28 100644
--- a/example/LLM/llama/README.md
+++ b/example/LLM/llama/README.md
@@ -24,7 +24,6 @@ Current supports datasets:
 - For finetune:
   - openassistant
 
-
 ## Build Starwhale Runtime
 
 ```bash
diff --git a/example/llm-finetune/README.md b/example/llm-finetune/README.md
new file mode 100644
index 0000000000..53d861797b
--- /dev/null
+++ b/example/llm-finetune/README.md
@@ -0,0 +1,40 @@
+LLM Finetune
+======
+
+LLM finetuning is a state-of-the-art task for large language models.
+
+In these examples, we will use Starwhale to finetune a set of LLM base models, then evaluate and release the models. The demos are in the [starwhale/llm-finetuning](https://cloud.starwhale.cn/projects/401/overview) project on Starwhale Cloud.
+
+What we learn
+------
+
+- Use the `@starwhale.finetune` decorator to define a finetune handler, so a Starwhale Model can run the LLM finetuning (see the sketch after this list).
+- Use the `@starwhale.evaluation.predict` decorator to define model evaluation for an LLM.
+- Use the `@starwhale.handler` decorator to define a web handler for LLM online evaluation.
+- Use one Starwhale Runtime to run all the models.
+- Build Starwhale Datasets from Hugging Face with a one-line command, no code required.
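+
+The three decorators fit together roughly as follows. This is a simplified sketch of the real handlers in `models/baichuan2/evaluation.py` and `models/baichuan2/finetune.py`; the function bodies here are placeholders:
+
+```python
+import typing as t
+
+from starwhale import evaluation, finetune, handler
+
+
+@evaluation.predict(resources={"nvidia.com/gpu": 1}, replicas=1)
+def copilot_predict(data: dict) -> str:
+    # called once per dataset row; `data` holds the dataset features, e.g. data["prompt"]
+    return f"echo: {data['prompt']}"
+
+
+@finetune(resources={"nvidia.com/gpu": 1}, require_train_datasets=True, model_modules=[copilot_predict])
+def lora_finetune(train_datasets: t.List) -> None:
+    # receive the selected Starwhale datasets, train the model and save the new weights
+    ...
+
+
+@handler(expose=17860)
+def chatbot() -> None:
+    # start a web UI (e.g. gradio) for interactive online evaluation
+    ...
+```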
+
+Models
+------
+
+- [Baichuan2](https://github.com/baichuan-inc/Baichuan2): Baichuan 2 is the new generation of open-source large language models launched by Baichuan Intelligent Technology. It was trained on a high-quality corpus with 2.6 trillion tokens.
+- [ChatGLM3](https://github.com/THUDM/ChatGLM3): ChatGLM3 is a new generation of pre-trained dialogue models jointly released by Zhipu AI and Tsinghua KEG. ChatGLM3-6B is the open-source model in the ChatGLM3 series.
+
+Datasets
+------
+
+- [Belle multiturn chat](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M): The dataset includes approx. 0.8M Chinese multi-turn dialogues between humans and an assistant, released by the BELLE Group.
+
+  ```bash
+  # build the original dataset from Hugging Face
+  swcli dataset build -hf BelleGroup/multiturn_chat_0.8M --name belle-multiturn-chat
+
+  # build the 10k randomly sampled items provided by Baichuan2
+  swcli dataset build --json https://raw.githubusercontent.com/baichuan-inc/Baichuan2/main/fine-tune/data/belle_chat_ramdon_10k.json --name belle_chat_random_10k
+  ```
+
+- [COIG](https://huggingface.co/datasets/BAAI/COIG): The Chinese Open Instruction Generalist (COIG) project is a harmless, helpful, and diverse set of Chinese instruction corpora.
+
+  ```bash
+  swcli dataset build -hf BAAI/COIG --name coig
+  ```
diff --git a/example/llm-finetune/models/baichuan2/.gitignore b/example/llm-finetune/models/baichuan2/.gitignore
new file mode 100644
index 0000000000..6e6a98b21c
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/.gitignore
@@ -0,0 +1,2 @@
+pretrain/
+.cache/
\ No newline at end of file
diff --git a/example/llm-finetune/models/baichuan2/.swignore b/example/llm-finetune/models/baichuan2/.swignore
new file mode 100644
index 0000000000..988107fe19
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/.swignore
@@ -0,0 +1 @@
+.cache/
\ No newline at end of file
diff --git a/example/llm-finetune/models/baichuan2/README.md b/example/llm-finetune/models/baichuan2/README.md
new file mode 100644
index 0000000000..6429d6566a
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/README.md
@@ -0,0 +1,57 @@
+Baichuan2 Finetune with Starwhale
+======
+
+- 🍬 Parameters: 7b
+- 🔆 Github: https://github.com/baichuan-inc/Baichuan2
+- 🥦 Author: Baichuan Inc.
+- 📝 License: baichuan
+- 🐱 Starwhale Example: https://github.com/star-whale/starwhale/tree/main/example/llm-finetune/models/baichuan2
+- 🌽 Introduction: Baichuan 2 is the new generation of large-scale open-source language models launched by Baichuan Intelligent Technology. It is trained on a high-quality corpus with 2.6 trillion tokens and achieves the best performance among models of the same size on authoritative Chinese and English benchmarks. Baichuan2-7B-Chat is the chat model of Baichuan 2, with 7 billion parameters.
+
+In this example, we will use Baichuan2-7B-Chat as the base model to finetune and evaluate:
+
+- Evaluate the baichuan2-7b-chat model.
+- Provide multi-turn chat online evaluation for baichuan2-7b-chat.
+- Fine-tune the baichuan2-7b-chat model with the belle-multiturn-chat dataset.
+- Evaluate the fine-tuned model.
+- Provide multi-turn chat online evaluation for the fine-tuned model.
+- Fine-tune an already fine-tuned baichuan2-7b-chat model.
+
+Thanks to 4-bit quantization, a single T4/A10/A100 GPU card is enough for both evaluation and finetuning.
+
+Build Starwhale Model
+------
+
+```bash
+python3 build.py
+```
+
+Run Online Evaluation in the Standalone instance
+------
+
+```bash
+# for source code
+swcli model run -w . -m evaluation --handler evaluation:chatbot
+
+# for model package with runtime
+swcli model run --uri baichuan2-7b-chat --handler evaluation:chatbot --runtime llm-finetune
+```
+
+Run Starwhale Model for evaluation in the Standalone instance
+------
+
+```bash
+swcli dataset cp https://cloud.starwhale.cn/projects/401/datasets/161/versions/223/ .
+swcli -vvv model run -w . -m evaluation --handler evaluation:copilot_predict --dataset z-bench-common --dataset-head 3
+```
+
+Finetune base model
+------
+
+```bash
+# build finetune dataset from baichuan2
+swcli dataset build --json https://raw.githubusercontent.com/baichuan-inc/Baichuan2/main/fine-tune/data/belle_chat_ramdon_10k.json --name belle_chat_random_10k
+
+swcli -vvv model run -w . -m finetune --dataset belle_chat_random_10k --handler finetune:lora_finetune
+```
+
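+Tune generation and training parameters
+------
+
+`evaluation.py` and `finetune.py` read several hyperparameters from environment variables, so they can be overridden before running the `swcli model run` commands above. A sketch with the defaults used in the code:
+
+```bash
+# generation parameters read by evaluation.py
+export MAX_MODEL_LENGTH=512 TEMPERATURE=0.7 TOP_P=0.9 TOP_K=30 REPETITION_PENALTY=1.3
+
+# training parameters read by finetune.py
+export MODEL_MAX_LENGTH=512 NUM_TRAIN_EPOCHS=2 MAX_STEPS=18
+```
+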
diff --git a/example/llm-finetune/models/baichuan2/build.py b/example/llm-finetune/models/baichuan2/build.py
new file mode 100644
index 0000000000..ff3ea5d379
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/build.py
@@ -0,0 +1,32 @@
+from huggingface_hub import snapshot_download
+
+import starwhale
+
+try:
+    from .utils import BASE_MODEL_DIR
+    from .finetune import lora_finetune
+    from .evaluation import chatbot, copilot_predict
+except ImportError:
+    from utils import BASE_MODEL_DIR
+    from finetune import lora_finetune
+    from evaluation import chatbot, copilot_predict
+
+starwhale.init_logger(3)
+
+
+def build_starwhale_model() -> None:
+    BASE_MODEL_DIR.mkdir(parents=True, exist_ok=True)
+
+    snapshot_download(
+        repo_id="baichuan-inc/Baichuan2-7B-Chat",
+        local_dir=BASE_MODEL_DIR,
+    )
+
+    starwhale.model.build(
+        name="baichuan2-7b-chat",
+        modules=[copilot_predict, chatbot, lora_finetune],
+    )
+
+
+if __name__ == "__main__":
+    build_starwhale_model()
diff --git a/example/llm-finetune/models/baichuan2/evaluation.py b/example/llm-finetune/models/baichuan2/evaluation.py
new file mode 100644
index 0000000000..e0bd174218
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/evaluation.py
@@ -0,0 +1,139 @@
+from __future__ import annotations
+
+import os
+import typing as t
+
+import torch
+import gradio
+from peft import PeftModel
+from transformers import AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
+from transformers.generation.utils import GenerationConfig
+
+from starwhale import handler, evaluation
+
+try:
+    from .utils import BASE_MODEL_DIR, ADAPTER_MODEL_DIR
+except ImportError:
+    from utils import BASE_MODEL_DIR, ADAPTER_MODEL_DIR
+
+_g_model = None
+_g_tokenizer = None
+
+
+def _load_model_and_tokenizer() -> t.Tuple:
+    global _g_model, _g_tokenizer
+
+    if _g_model is None:
+        print(f"load model from {BASE_MODEL_DIR} ...")
+        _g_model = AutoModelForCausalLM.from_pretrained(
+            BASE_MODEL_DIR,
+            device_map="auto",
+            torch_dtype=torch.float16,
+            trust_remote_code=True,
+            load_in_4bit=True,  # for lower gpu memory usage
+            quantization_config=BitsAndBytesConfig(
+                load_in_4bit=True,
+                llm_int8_threshold=6.0,
+                llm_int8_has_fp16_weight=False,
+                bnb_4bit_compute_dtype=torch.float16,
+                bnb_4bit_use_double_quant=True,
+                bnb_4bit_quant_type="nf4",
+            ),
+        )
+        _g_model.generation_config = GenerationConfig.from_pretrained(BASE_MODEL_DIR)
+
+        if (ADAPTER_MODEL_DIR / "adapter_config.json").exists():
+            print(f"load adapter from {ADAPTER_MODEL_DIR} ...")
+            _g_model = PeftModel.from_pretrained(
+                _g_model, str(ADAPTER_MODEL_DIR), is_trainable=False
+            )
+
+    if _g_tokenizer is None:
+        print(f"load tokenizer from {BASE_MODEL_DIR} ...")
+        _g_tokenizer = AutoTokenizer.from_pretrained(
+            BASE_MODEL_DIR, use_fast=False, trust_remote_code=True
+        )
+
+    return _g_model, _g_tokenizer
+
+
+@evaluation.predict(
+    resources={"nvidia.com/gpu": 1},
+    replicas=1,
+    log_mode="plain",
+    log_dataset_features=[""],
+)
+def copilot_predict(data: dict) -> str:
+    model, tokenizer = _load_model_and_tokenizer()
+    # support z-bench-common dataset: https://cloud.starwhale.cn/projects/401/datasets/161/versions/223/files
+    messages = [{"role": "user", "content": data["prompt"]}]
+
+    config_dict = model.generation_config.to_dict()
+    # TODO: use arguments
+    config_dict.update(
+        max_new_tokens=int(os.environ.get("MAX_MODEL_LENGTH", 512)),
+        do_sample=True,
+        temperature=float(os.environ.get("TEMPERATURE", 0.7)),
+        top_p=float(os.environ.get("TOP_P", 0.9)),
+        top_k=int(os.environ.get("TOP_K", 30)),
+        repetition_penalty=float(os.environ.get("REPETITION_PENALTY", 1.3)),
+    )
+    return model.chat(
+        tokenizer,
+        messages=messages,
+        generation_config=GenerationConfig.from_dict(config_dict),
+    )
+
+
+@handler(expose=17860)
+def chatbot() -> None:
+    with gradio.Blocks() as server:
+        chatbot = gradio.Chatbot(height=800)
+        msg = gradio.Textbox(label="chat", show_label=True)
+        _max_gen_len = gradio.Slider(
+            0, 1024, value=256, step=1.0, label="Max Gen Len", interactive=True
+        )
+        _top_p = gradio.Slider(
+            0, 1, value=0.7, step=0.01, label="Top P", interactive=True
+        )
+        _temperature = gradio.Slider(
+            0, 1, value=0.95, step=0.01, label="Temperature", interactive=True
+        )
+        gradio.ClearButton([msg, chatbot])
+
+        def response(
+            from_user: str,
+            chat_history: t.List,
+            max_gen_len: int,
+            top_p: float,
+            temperature: float,
+        ) -> t.Tuple[str, t.List]:
+            dialog = []
+            for _user, _assistant in chat_history:
+                dialog.append({"role": "user", "content": _user})
+                if _assistant:
+                    dialog.append({"role": "assistant", "content": _assistant})
+            dialog.append({"role": "user", "content": from_user})
+
+            model, tokenizer = _load_model_and_tokenizer()
+            from_assistant = model.chat(
+                tokenizer,
+                messages=dialog,
+                generation_config=GenerationConfig(
+                    max_new_tokens=max_gen_len,
+                    do_sample=True,
+                    temperature=temperature,
+                    top_p=top_p,
+                ),
+            )
+
+            chat_history.append((from_user, from_assistant))
+            return "", chat_history
+
+        msg.submit(
+            response,
+            [msg, chatbot, _max_gen_len, _top_p, _temperature],
+            [msg, chatbot],
+        )
+
+    server.launch(server_name="0.0.0.0", server_port=17860, share=True)
diff --git a/example/llm-finetune/models/baichuan2/finetune.py b/example/llm-finetune/models/baichuan2/finetune.py
new file mode 100644
index 0000000000..b6f0bb3d70
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/finetune.py
@@ -0,0 +1,188 @@
+from __future__ import annotations
+
+import os
+import typing as t
+from dataclasses import dataclass
+
+import torch
+from peft import (
+    TaskType,
+    PeftModel,
+    LoraConfig,
+    get_peft_model,
+    prepare_model_for_kbit_training,
+)
+from transformers import (
+    Trainer,
+    AutoTokenizer,
+    BitsAndBytesConfig,
+    PreTrainedTokenizer,
+    AutoModelForCausalLM,
+)
+from transformers.training_args import TrainingArguments
+
+from starwhale import dataset, finetune
+
+try:
+    from .utils import BASE_MODEL_DIR, ADAPTER_MODEL_DIR
+    from .evaluation import copilot_predict
+except ImportError:
+    from utils import BASE_MODEL_DIR, ADAPTER_MODEL_DIR
+    from evaluation import copilot_predict
+
+# fork from https://github.com/baichuan-inc/Baichuan2/blob/main/fine-tune/fine-tune.py
+
+torch.backends.cuda.matmul.allow_tf32 = True
+
+
+@dataclass
+class DataCollatorForCausalLM:
+    tokenizer: PreTrainedTokenizer
+    source_max_len: int
+    target_max_len: int
+
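+    # 195 and 196 are Baichuan2's reserved special token ids marking user and assistant turns;
+    # labels of user turns are filled with ignore_index (-100) so the loss only covers assistant replies.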
+    user_tokens = [195]
+    assistant_tokens = [196]
+    ignore_index = -100
+
+    def __call__(self, example: t.Dict) -> t.Dict:
+        input_ids = []
+        labels = []
+
+        for message in example["conversations"]:
+            from_ = message["from"]
+            value = message["value"]
+            value_ids = self.tokenizer.encode(value)
+
+            if from_ == "human":
+                input_ids += self.user_tokens + value_ids
+                labels += [self.tokenizer.eos_token_id] + [self.ignore_index] * len(
+                    value_ids
+                )
+            else:
+                input_ids += self.assistant_tokens + value_ids
+                labels += [self.ignore_index] + value_ids
+        input_ids.append(self.tokenizer.eos_token_id)
+        labels.append(self.tokenizer.eos_token_id)
+        input_ids = input_ids[: self.tokenizer.model_max_length]
+        labels = labels[: self.tokenizer.model_max_length]
+        input_ids += [self.tokenizer.pad_token_id] * (
+            self.tokenizer.model_max_length - len(input_ids)
+        )
+        labels += [self.ignore_index] * (self.tokenizer.model_max_length - len(labels))
+        input_ids = torch.LongTensor(input_ids)
+        labels = torch.LongTensor(labels)
+        attention_mask = input_ids.ne(self.tokenizer.pad_token_id)
+        return {
+            "input_ids": input_ids,
+            "labels": labels,
+            "attention_mask": attention_mask,
+        }
+
+
+@finetune(
+    resources={"nvidia.com/gpu": 1},
+    require_train_datasets=True,
+    model_modules=[copilot_predict],
+)
+def lora_finetune(train_datasets: t.List[str]) -> None:
+    # TODO: support multi train datasets
+    train_dataset = train_datasets[0]
+    if isinstance(train_dataset, str):
+        train_dataset = dataset(train_dataset, readonly=True)
+
+    model = AutoModelForCausalLM.from_pretrained(
+        BASE_MODEL_DIR,
+        trust_remote_code=True,
+        torch_dtype=torch.float16,
+        device_map="auto",  # for multi-gpus
+        load_in_4bit=True,  # for lower gpu memory usage
+        quantization_config=BitsAndBytesConfig(
+            load_in_4bit=True,
+            llm_int8_threshold=6.0,
+            llm_int8_has_fp16_weight=False,
+            bnb_4bit_compute_dtype=torch.float16,
+            bnb_4bit_use_double_quant=True,
+            bnb_4bit_quant_type="nf4",
+        ),
+    )
+    model.model_parallel = True
+    model.is_parallelizable = True
+    model.config.torch_dtype = torch.float16
+    model = prepare_model_for_kbit_training(model)
+
+    # support finetune a lora-finetuned model
+    if (ADAPTER_MODEL_DIR / "adapter_config.json").exists():
+        print(f"loading adapters {ADAPTER_MODEL_DIR}...")
+        model = PeftModel.from_pretrained(
+            model,
+            str(ADAPTER_MODEL_DIR),
+            is_trainable=True,
+        )
+    else:
+        print("init model with peft lora config...")
+        peft_config = LoraConfig(
+            task_type=TaskType.CAUSAL_LM,
+            target_modules=["W_pack"],
+            inference_mode=False,
+            r=64,
+            lora_alpha=16,
+            bias="none",
+            lora_dropout=0.05,
+        )
+        model = get_peft_model(model, peft_config)
+        model.print_trainable_parameters()
+
+    for name, module in model.named_modules():
+        if "norm" in name:
+            module = module.to(torch.float32)
+
+    tokenizer = AutoTokenizer.from_pretrained(
+        BASE_MODEL_DIR,
+        use_fast=False,
+        trust_remote_code=True,
+        model_max_length=int(os.environ.get("MODEL_MAX_LENGTH", 512)),
+    )
+
+    # TODO: support finetune arguments
+    # copy from https://github.com/baichuan-inc/Baichuan2/blob/main/README.md#%E5%8D%95%E6%9C%BA%E8%AE%AD%E7%BB%83
+    train_args = TrainingArguments(
+        output_dir=str(ADAPTER_MODEL_DIR),
+        optim="adamw_torch",
+        report_to="none",
+        num_train_epochs=int(os.environ.get("NUM_TRAIN_EPOCHS", 2)),
+        max_steps=int(os.environ.get("MAX_STEPS", 18)),
+        per_device_train_batch_size=2,  # more batch size will cause OOM
+        gradient_accumulation_steps=16,
+        save_strategy="no",  # no need to save checkpoint for finetune
+        learning_rate=2e-5,
+        lr_scheduler_type="constant",
+        adam_beta1=0.9,
+        adam_beta2=0.98,
+        adam_epsilon=1e-8,
+        max_grad_norm=1.0,
+        weight_decay=1e-4,
+        warmup_ratio=0.0,
+        logging_steps=10,
+        gradient_checkpointing=False,
+        remove_unused_columns=False,
+    )
+
+    # TODO: support deepspeed
+
+    trainer = Trainer(
+        model=model,
+        tokenizer=tokenizer,
+        args=train_args,
+        train_dataset=train_dataset.to_pytorch(
+            transform=DataCollatorForCausalLM(
+                tokenizer=tokenizer, source_max_len=16, target_max_len=512
+            )
+        ),
+    )
+
+    print("Starting model training...")
+    train_result = trainer.train(resume_from_checkpoint=None)
+    print(train_result.metrics)
+    trainer.save_state()
+    trainer.save_model()
diff --git a/example/llm-finetune/models/baichuan2/utils.py b/example/llm-finetune/models/baichuan2/utils.py
new file mode 100644
index 0000000000..ea0c70176c
--- /dev/null
+++ b/example/llm-finetune/models/baichuan2/utils.py
@@ -0,0 +1,8 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+ROOT_DIR = Path(__file__).parent
+BASE_MODEL_DIR = ROOT_DIR / "pretrain" / "base"
+ADAPTER_MODEL_DIR = ROOT_DIR / "pretrain" / "adapter"
+CACHE_DIR = ROOT_DIR / ".cache"
diff --git a/example/llm-finetune/runtime/requirements.txt b/example/llm-finetune/runtime/requirements.txt
new file mode 100644
index 0000000000..dcb5ce923d
--- /dev/null
+++ b/example/llm-finetune/runtime/requirements.txt
@@ -0,0 +1,14 @@
+transformers==4.31.0  # baichuan2 works with this version; the latest version (4.35.1) raises AttributeError in AutoTokenizer.from_pretrained
+transformers-stream-generator==0.0.4
+torch==2.0.1
+accelerate==0.24.0
+gradio==3.50.2
+deepspeed==0.12.2
+peft==0.6.1
+xformers==0.0.22
+cpm_kernels
+bitsandbytes
+colorama
+tokenizers
+sentencepiece
+git+https://github.com/star-whale/starwhale.git@bebd503#subdirectory=client&egg=starwhale
diff --git a/example/llm-finetune/runtime/runtime.yaml b/example/llm-finetune/runtime/runtime.yaml
new file mode 100644
index 0000000000..8af8d9ebaa
--- /dev/null
+++ b/example/llm-finetune/runtime/runtime.yaml
@@ -0,0 +1,10 @@
+name: llm-finetune
+mode: venv
+environment:
+  arch: noarch
+  os: ubuntu:20.04
+  cuda: "11.7"
+  python: "3.10"
+  starwhale_version: git+https://github.com/star-whale/starwhale.git@bebd503#subdirectory=client&egg=starwhale
+dependencies:
+  - requirements.txt