Chat with DeepSeek R1
In this example, we'll use vLLM to host an API for deepseek-ai/DeepSeek-R1-Distill-Qwen-7B on Beam.
View the Code
See the code for this example on GitHub.
Initial Setup
First, clone the vLLM example to your computer.
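Assuming the example lives in Beam's public examples repository (your setup may differ), you can fetch it with `git clone https://github.com/beam-cloud/examples.git` and then `cd` into the vLLM example's directory.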
We’ll use our vLLM abstraction to host an OpenAI-compatible DeepSeek API on Beam.
From inside the vLLM directory, run the following command to deploy the API:
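With the Beam CLI installed, this takes the form `beam deploy <file>:<entrypoint>`, for example `beam deploy app.py:main` (the `app.py` filename and `main` entrypoint here are assumptions; check the example's code for the actual names).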
Running this command deploys a DeepSeek R1 API on Beam and prints out the API URL.
Running the API
We provide an interactive command-line interface for calling the API. You'll be prompted to enter the API URL printed by the deployment step above. If you select stream mode, the response is streamed to the console.
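The CLI lives in the example repo, but because the endpoint is OpenAI-compatible, you can also call it directly with the `openai` Python client. A minimal sketch, where the base URL and auth token are placeholders for values from your own deployment, and the model name is the Hugging Face id vLLM typically serves the model under (this can vary by configuration):

```python
from openai import OpenAI

# Placeholder values -- both come from your own deployment.
BEAM_API_URL = "https://your-deployment.app.beam.cloud/v1"  # printed by the deploy step
BEAM_AUTH_TOKEN = "your-beam-auth-token"

client = OpenAI(base_url=BEAM_API_URL, api_key=BEAM_AUTH_TOKEN)

# Stream a chat completion from the deployed model.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

# Print each streamed chunk as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```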
The first time you run the API, the model weights will be downloaded from Hugging Face. This may take a few minutes, but the weights are cached for subsequent runs.
Interacting with DeepSeek R1
You can now interact with the DeepSeek R1 API. In stream mode, the CLI streams the response to the console and prints the number of tokens generated and the time taken.
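If you call the endpoint yourself rather than through the example CLI, you can reproduce similar statistics. A rough sketch that times the stream and counts chunks as a proxy for tokens (an approximation; the exact count comes from the response's `usage` field when the server returns it):

```python
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://your-deployment.app.beam.cloud/v1",  # placeholder deployment URL
    api_key="your-beam-auth-token",  # placeholder auth token
)

start = time.perf_counter()
num_chunks = 0

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[{"role": "user", "content": "Explain why the sky is blue."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        num_chunks += 1  # each chunk typically carries roughly one token of text
        print(delta, end="", flush=True)

elapsed = time.perf_counter() - start
print(f"\n\n~{num_chunks} tokens in {elapsed:.1f}s (~{num_chunks / elapsed:.1f} tokens/sec)")
```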