A Practical Guide to Fine-Tuning Large Language Models

Note: the diagram below is written in Mermaid; it renders directly in Markdown viewers that support it.

graph LR
    Data[原始数据] --> Preprocess[数据预处理]
    Preprocess --> Tokenizer[Tokenizer]
    Tokenizer --> Model[预训练模型]
    Model --> LoRA[LoRA 微调层]
    LoRA --> Trainer[Trainer]
    Trainer --> FineTuned[微调后模型]

Key Concepts

  • Full fine-tuning: updates all of the model's parameters; computationally expensive.
  • Parameter-efficient fine-tuning (PEFT): methods such as LoRA and QLoRA train only a small number of newly added matrices, sharply reducing GPU memory requirements (a minimal sketch of the idea follows this list).
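To make the idea behind LoRA concrete, here is a minimal, self-contained sketch. It is not the peft implementation; the class name LoRALinear and the initialization scale are illustrative. The pretrained weight stays frozen and only the low-rank update B @ A is trained.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear and add a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze the pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # original output plus the scaled low-rank update x @ A^T @ B^T
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B are trainable

With r much smaller than the layer dimensions, the number of trainable parameters drops by orders of magnitude, which is where the memory savings come from.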

Environment Setup


pip install transformers datasets peft torch

Example: Fine-Tuning LLaMA-2-7B with LoRA (Simplified)


from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# LoRA configuration: low-rank adapters on the attention query/value projections
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
)
model = get_peft_model(model, lora_config)

# Load the example dataset (Alpaca-format JSON)
train_data = load_dataset("json", data_files="./alpaca_data.json")

def tokenize_function(examples):
    # Concatenate instruction, input, and expected output into one training text per sample
    texts = [
        ins + "\n" + inp + "\n" + out
        for ins, inp, out in zip(examples["instruction"], examples["input"], examples["output"])
    ]
    return tokenizer(texts, truncation=True, max_length=512)

train_dataset = train_data["train"].map(
    tokenize_function, batched=True, remove_columns=train_data["train"].column_names
)

# For causal-LM fine-tuning, the collator pads batches and copies input_ids into labels
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./lora-finetuned",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)

trainer.train()
model.save_pretrained("./lora-finetuned")

Note: with the settings above, fine-tuning runs on a single GPU with 24 GB of memory.
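Once the adapter is saved, it can be loaded back onto the base model for inference. A minimal sketch using peft's PeftModel (the prompt string is just an illustration):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", device_map="auto")
model = PeftModel.from_pretrained(base, "./lora-finetuned")   # attach the saved LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "Explain LoRA in one sentence.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))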

Summary

  • LoRA makes fine-tuning large models lightweight and efficient.
  • Fine-tuning experiments can be run locally with only modest GPU memory.
  • The peft library makes it quick to apply LoRA to any Hugging Face model (see the sketch after this list).
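As an illustration of that last point, the same LoraConfig / get_peft_model pattern works on other Hugging Face models; the sketch below uses gpt2 purely as a small example (its fused attention projection is the module named c_attn):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=32, target_modules=["c_attn"], lora_dropout=0.1)
model = get_peft_model(model, config)
model.print_trainable_parameters()   # prints trainable vs. total parameter counts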

Next steps: explore QLoRA (quantization + LoRA) to fine-tune even larger models with even less GPU memory; a sketch of the setup follows below.
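A hedged sketch of what that QLoRA setup can look like, assuming the bitsandbytes package is installed in addition to the dependencies above (hyperparameters are illustrative):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for training, then attach LoRA adapters as before
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=32, target_modules=["q_proj", "v_proj"]))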
