We wrap our training code in Beam's `function` decorator so that we can run our fine-tuning application in the cloud as if it were on our local machine.
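In outline, that looks something like the sketch below; the GPU type, image packages, and the body of `train` are illustrative assumptions rather than the exact setup:

```python
from beam import Image, function

@function(
    gpu="A10G",  # illustrative; pick a GPU that fits your model
    image=Image(python_packages=["torch", "transformers", "peft"]),
)
def train():
    # Load the base model, apply the LoRA adapter, run the training
    # loop, and save the fine-tuned weights (e.g. to a Beam Volume).
    ...

if __name__ == "__main__":
    # Calling .remote() runs the function on Beam's infrastructure
    # while streaming logs back to the local terminal.
    train.remote()
```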
To kick off fine-tuning, run `python finetune.py`. After training begins, you should see something like the following in your terminal:
In `inference.py`, we load the model with the additional fine-tuned weights and set up an endpoint to send requests to.
Here, we make use of Beam's `on_start` functionality so that we only load the model when the container starts, rather than on every request. Let's explore the `endpoint` decorator below.
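In outline, the pattern looks like this; the volume name, checkpoint path, handler name, and generation settings are assumptions for illustration:

```python
from beam import Image, Volume, endpoint

def load_model():
    # Runs once per container start, not once per request.
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    checkpoint = "./weights/checkpoint"  # assumed path on the mounted volume
    model = AutoPeftModelForCausalLM.from_pretrained(checkpoint, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    return model, tokenizer

@endpoint(
    gpu="A10G",
    image=Image(python_packages=["torch", "transformers", "peft"]),
    volumes=[Volume(name="weights", mount_path="./weights")],
    on_start=load_model,
)
def predict(context, prompt: str):
    # Whatever on_start returned is available on the request context.
    model, tokenizer = context.on_start_value
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, keeping special tokens
    # such as <|im_end|> in the output.
    completion = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False
    )
    return {"response": completion}
```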
*Monitoring compute usage in real time in the dashboard*
We use Beam's `Signal` abstraction to fire an event to the inference app when the model has finished training. This lets us communicate between apps on Beam. In this example, we have it set up to re-run our `on_start` method when a signal is received, so if we re-train our model, we can load the newest weights without restarting the container.
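A rough sketch of the wiring is below; the `experimental.Signal` name, the `handler` argument, and the `set()` call reflect my reading of Beam's signals API and should be treated as assumptions to verify against the current SDK:

```python
from beam import experimental

# Inference app: register a signal that re-runs our on_start handler
# whenever "reload-model" fires, picking up the freshly saved weights.
reload_signal = experimental.Signal(
    name="reload-model",
    handler=load_model,  # the same function passed to on_start
)

# Training app: fire the signal after the new weights are saved.
experimental.Signal(name="reload-model").set(ttl=60)  # ttl value assumed
```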
Once we are happy with the endpoint, we can deploy it with the `beam` CLI.
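Assuming the handler in `inference.py` is named `predict` (an assumption carried over from the sketch above), the command looks like:

```bash
beam deploy inference.py:predict
```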
-d '{"prompt": "hi"}
The response we get back is a JSON object.
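Assuming the handler sketched above, with an invented completion for illustration, it looks something like:

```json
{"response": "Hi! How can I help you today?<|im_end|>"}
```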
Notice that the response ends with the `<|im_end|>` token, the end-of-turn marker in the ChatML prompt format. You could strip this token in the endpoint logic if you would like, but it is worth keeping around if you will be appending this response to a longer-running conversation.