In this guide, we fine-tune the Meta-Llama-3.1-8B-bnb-4bit model, optimized by Unsloth, using Low-Rank Adaptation (LoRA) on the Alpaca-cleaned dataset. We leverage Beam’s infrastructure for compute and storage, then deploy an inference endpoint. Throughout the process, we’ll track and evaluate our fine-tuning performance using Weights & Biases (wandb).
We define a shared Image configuration for both fine-tuning and inference, ensuring consistency. The image includes necessary dependencies and installs Unsloth from its GitHub repository.
To use Weights & Biases (wandb) for tracking, you’ll need your API key. You can find it in your wandb dashboard under the “API keys” section. Copy the key and replace YOUR_WANDB_KEY in the wandb login command.
finetune.py
```python
from beam import Image

# Weights & Biases API key (replace with your key)
WANDB_API_KEY = "YOUR_WANDB_KEY"

image = (
    Image(python_version="python3.11")
    .add_python_packages(
        [
            "ninja",
            "packaging",
            "wheel",
            "torch",
            "xformers",
            "trl",
            "peft",
            "accelerate",
            "bitsandbytes",
            "wandb",
        ]
    )
    .add_commands(
        [
            "pip uninstall unsloth -y",
            'pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"',
            f"wandb login {WANDB_API_KEY}",
        ]
    )
)

# Constants
MODEL_NAME = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
MAX_SEQ_LENGTH = 2048
VOLUME_PATH = "./model_storage"
```
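The excerpt above covers only the image and constants; the training code itself is omitted here. As a rough sketch of what the LoRA fine-tuning step could look like with Unsloth and TRL: the LoRA rank, learning rate, and batch sizes below are illustrative assumptions (only `max_steps=60` is grounded, matching the `checkpoint-60` directory and the 60-step loss curve discussed later), and the Beam function/Volume wiring is elided.

```python
# Hypothetical training sketch -- hyperparameters marked "assumed" are not from the guide.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the 4-bit base model and attach LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)

# Format Alpaca-cleaned records into a single prompt string per example
prompt = "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"

def format_prompts(batch):
    texts = [
        prompt.format(ins, inp, out) + tokenizer.eos_token
        for ins, inp, out in zip(batch["instruction"], batch["input"], batch["output"])
    ]
    return {"text": texts}

dataset = load_dataset("yahma/alpaca-cleaned", split="train").map(
    format_prompts, batched=True
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # assumed
        gradient_accumulation_steps=4,  # assumed
        max_steps=60,                   # matches checkpoint-60 below
        learning_rate=2e-4,             # assumed
        logging_steps=1,
        output_dir=f"{VOLUME_PATH}/fine_tuned_model",
        report_to="wandb",              # enables the W&B dashboard used below
    ),
)
trainer.train()

# Persist the adapter and tokenizer to the mounted Beam Volume
model.save_pretrained(f"{VOLUME_PATH}/fine_tuned_model")
tokenizer.save_pretrained(f"{VOLUME_PATH}/fine_tuned_model")
```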
After completion, verify that the files are saved in your Beam Volume:
```
beam ls model-storage/fine_tuned_model
```
Here’s the expected output with the fine-tuned files:
```
Name                                      Size        Modified Time  IsDir
─────────────────────────────────────────────────────────────────────────────
fine_tuned_model/README.md                4.99 KiB    1 hour ago     No
fine_tuned_model/adapter_config.json      805.00 B    1 hour ago     No
fine_tuned_model/adapter_model.safeten…   160.06 MiB  1 hour ago     No
fine_tuned_model/checkpoint-60/                       1 hour ago     Yes
fine_tuned_model/special_tokens_map.js…   459.00 B    1 hour ago     No
fine_tuned_model/tokenizer.json           16.41 MiB   1 hour ago     No
fine_tuned_model/tokenizer_config.json    49.46 KiB   1 hour ago     No
...
```
We tracked the fine-tuning run with Weights & Biases, which provided detailed metrics on training progress. The dashboard showed the training loss starting at approximately 1.85 and, despite noticeable step-to-step fluctuations, trending downward to around 0.95 by step 60, indicating that the model was steadily learning patterns from the Alpaca-cleaned dataset over the 60 training steps.
To understand the impact of fine-tuning the Meta Llama 3.1 8B model with Unsloth on the Alpaca-cleaned dataset, we evaluated both the base model and the fine-tuned model on two widely used benchmarks: HellaSwag (a commonsense reasoning task) and MMLU (Massive Multitask Language Understanding, covering a broad range of subjects). The results highlight the fine-tuned model’s improvements over the base model, demonstrating the effectiveness of our fine-tuning process.
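The guide doesn't show the exact evaluation command, but benchmark scores like these are typically produced with EleutherAI's lm-evaluation-harness (`pip install lm-eval`). The snippet below is an illustrative invocation, not the guide's verified setup: the `peft` path assumes the adapter has been pulled down from the Beam Volume, and the batch size is an assumption.

```python
# Illustrative evaluation with lm-evaluation-harness; paths and batch size are assumptions.
from lm_eval import simple_evaluate

# Base model
base = simple_evaluate(
    model="hf",
    model_args="pretrained=unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    tasks=["hellaswag", "mmlu"],
    batch_size=8,
)

# Fine-tuned model: base weights plus the saved LoRA adapter
tuned = simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=unsloth/Meta-Llama-3.1-8B-bnb-4bit,"
        "peft=./model_storage/fine_tuned_model"  # assumed local adapter path
    ),
    tasks=["hellaswag", "mmlu"],
    batch_size=8,
)

for name, res in [("base", base), ("fine-tuned", tuned)]:
    print(name, res["results"])
```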
The table below summarizes the overall performance on HellaSwag and MMLU. The fine-tuned model shows modest but consistent gains across both benchmarks.
| Benchmark | Base Model | Fine-tuned Model | Improvement |
| --- | --- | --- | --- |
| HellaSwag (acc) | 59.09% | 60.37% | +1.28% |
| HellaSwag (acc_norm) | 77.93% | 78.75% | +0.82% |
| MMLU (overall) | 61.42% | 62.33% | +0.91% |
- **HellaSwag**: The fine-tuned model improves accuracy (acc) by 1.28 percentage points and normalized accuracy (acc_norm) by 0.82 points, indicating stronger commonsense reasoning.
- **MMLU**: An overall gain of 0.91 percentage points suggests improved general knowledge and reasoning across diverse subjects.
The fine-tuned model demonstrates consistent improvements over the base model, particularly in tasks requiring logical reasoning, ethical judgment, and commonsense understanding. These gains align with the Alpaca-cleaned dataset’s focus on instruction-following and coherent responses.