Fine-Tune an LLM
Alpaca LoRA Training and Inference
This example demonstrates how to fine-tune and deploy Alpaca-LoRA on Beam.
To run this example, you’ll need a free account on Beam. If you sign up here, you’ll get 10 hours of free credit to get started.
Training
We’re going to implement the code from the Alpaca-LoRA repo in a script we can run on Beam.
I’m using the Instruction Tuning with GPT-4 dataset, which is hosted on Hugging Face.
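As a rough illustration, loading it with the datasets library might look like this. The dataset ID below is an assumption, not necessarily the exact one used in this example:

```python
# Hypothetical example of pulling the GPT-4 instruction-tuning data from
# Hugging Face; the dataset ID is an assumption, not taken from this example.
from datasets import load_dataset

data = load_dataset("vicgalle/alpaca-gpt4")
# Each record contains an instruction, an optional input, and an output
print(data["train"][0])
```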
The first thing we’ll do is set up the compute environment to run Llama. The training script runs on an A10G GPU with 24Gi of GPU memory:
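Here’s a minimal sketch of that setup, assuming Beam’s v1 Python SDK. The app name, package list, memory setting, and volume name below are illustrative assumptions rather than the original configuration:

```python
# A sketch of the compute environment, assuming Beam's v1 Python SDK.
from beam import App, Runtime, Image, Volume

app = App(
    name="fine-tune-llama",  # placeholder name
    runtime=Runtime(
        cpu=4,
        memory="32Gi",
        gpu="A10G",  # A10G has 24Gi of GPU memory
        image=Image(
            python_version="python3.10",
            python_packages=[
                "accelerate",
                "bitsandbytes",
                "datasets",
                "peft",
                "transformers",
                "torch",
            ],
        ),
    ),
    # Persistent volume to store model checkpoints between runs
    volumes=[Volume(name="checkpoints", path="./checkpoints")],
)

@app.run()
def train_model():
    # The actual fine-tuning logic (tokenization, LoRA config, Trainer, etc.)
    # lives in a separate train() function; see the full source on GitHub.
    train()
```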
This example only demonstrates the high-level workflow, so specific functions like train are hidden. You can find the entire source code on GitHub.
To run this on Beam, we use the beam run command:
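Assuming the script above lives in a file called app.py and the entrypoint is named train_model (both placeholder names), the command would look like this:

```bash
beam run app.py:train_model
```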
When we run this command, the training function will run on Beam’s cloud, and we’ll see the training progress streamed to our terminal:
Deploying Inference API
When the model is trained, we can deploy an API to run inference on our fine-tuned model.
Let’s create a new function for inference. If you look closely, you’ll notice that we’re using a different decorator this time: rest_api instead of run. This will allow us to deploy the function as a REST API.
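Here’s a rough sketch of what that endpoint might look like, again assuming Beam’s v1 SDK. The function name, input key, and generate() helper are placeholders, not the real implementation:

```python
# A sketch of the inference endpoint, assuming Beam's v1 SDK.
@app.rest_api()
def run_inference(**inputs):
    # The JSON payload of the request is passed in as keyword arguments
    prompt = inputs["prompt"]

    # Hypothetical helper: loads the base model plus the fine-tuned LoRA
    # weights from the checkpoints volume and wraps model.generate()
    output = generate(prompt)

    return {"prediction": output}
```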
We can deploy this as a REST API by running this command:
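Assuming the same app.py file as above, the deploy command looks like this:

```bash
beam deploy app.py
```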
If we navigate to the URL printed in the shell, we’ll be able to copy the full cURL request to call the REST API.
We’ll modify the request slightly with a payload for the model:
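The request would take roughly this shape. The URL and auth token are placeholders you’d copy from the dashboard, and the prompt field matches the sketch of the inference function above:

```bash
curl -X POST '<YOUR_APP_URL>' \
  -H 'Authorization: Basic <YOUR_AUTH_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Write a short poem about open-source AI."}'
```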
And here’s the response from the fine-tuned model: