Amazon EKS上部署DeepSeek-R1模型教程

随着DeepSeek的开源，AI开发迎来了更广泛的普及。DeepSeek-R1采用了思维链（CoT）推理技术，能够将复杂问题拆解为一系列更小、更易处理的步骤，因而提升了模型在解决数学问题、逻辑谜题等任务时的能力。借助Amazon EKS Auto Mode托管DeepSeek，就无需再为管理底层的Kubernetes基础架构而烦恼，可以将更多精力投入到模型的部署与应用上。本文将详细介绍如何使用亚马逊云科技Amazon EKS托管DeepSeek-R1模型。

亚马逊云科技官网：点击访问（注册立享40+免费云产品）

以下选用DeepSeek-R1-Distill-Llama-8B蒸馏模型。与拥有671B参数的完整DeepSeek-R1模型相比，该蒸馏模型对资源的需求显著降低，虽然功能上稍逊一筹，但提供了更为轻量级的部署选择。如果您希望部署完整的DeepSeek – R1模型，只需在vLLM配置中替换掉蒸馏模型即可。

一、安装PreReqs

为了简化设置流程，本文将借助Amazon CloudShell进行操作。

# Installing kubectl
curl -LO “https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl”
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install Terraform
sudo yum install -y yum-utils
sudo yum-config-manager –add-repo https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo
sudo yum -y install terraform

二、使用Terraform创建启用Auto Mode的Amazon EKS集群

利用Terraform可以轻松完成基础架构的配置，包括Amazon VPC、Amazon ECR存储库以及启用Auto Mode的Amazon EKS集群。

# Clone the GitHub repo with the manifests
git clone -b v0.1 https://github.com/aws-samples/deepseek-using-vllm-on-eks
cd deepseek-using-vllm-on-eks

# Apply the Terraform configuration
terraform init
terraform apply -auto-approve

# After Terraform finishes, configure kubectl with the new EKS cluster
$(terraform output configure_kubectl | jq -r)

三、创建一个支持GPU的自定义NodePool

# Create a custom NodePool with GPU supportkubectl apply -f manifests/gpu-nodepool.yaml

# Check if the NodePool is in ‘Ready’ statekubectl get nodepool/gpu-nodepool

四、部署DeepSeek模型

为简化使用vLLM部署DeepSeek – R1 – Distill – Llama – 8B模型的过程，本演示提供了一个 sed命令，可轻松设置模型名称和参数。

# Use the sed command to replace the placeholder with the model name and configuration parameters
sed -i “s|__MODEL_NAME_AND_PARAMETERS__|deepseek-ai/DeepSeek-R1-Distill-Llama-8B –max_model 2048|g” manifests/deepseek-deployment-gpu.yaml

# Deploy the DeepSeek model on Kubernetes
kubectl apply -f manifests/deepseek-deployment-gpu.yaml

# Check the pods in the ‘deepseek’ namespace
kubectl get po -n deepseek

起初当Amazon EKS Auto Mode为底层Amazon EC2实例配置所需的GPU驱动程序时Pod可能会处于Pending（待处理）状态。若Pod在Pending状态停留了几分钟，请确认亚马逊云科技账户有足够的服务配额来启动所需的实例，您需要检查G或P实例的配额限制情况。

注意：这些配额是基于vCPU而非实例数量，因此请务必按要求申请配额。

# Wait for the pod to reach the ‘Running’ state
watch -n 1 kubectl get po -n deepseek

# Verify that a new Node has been created
kubectl get nodes -l owner=data-engineer

# Check the logs to confirm that vLLM has started
kubectl logs deployment.apps/deepseek-deployment -n deepseek

部署准备就绪后，日志条目将显示“应用程序启动完成”。

五、与DeepSeek LLM交互

接下来，创建一个本地代理，通过curl请求与模型进行交互。

# Set up a proxy to forward the service port to your local terminal
kubectl port-forward svc/deepseek-svc -n deepseek 8080:80 > port-forward.log 2>&1 &

# Send a curl request to the model
curl -X POST “http://localhost:8080/v1/chat/completions” -H “Content-Type: application/json” –data ‘{
“model”: “deepseek-ai/DeepSeek-R1-Distill-Llama-8B”,
“messages”: [
{
“role”: “user”,
“content”: “What is Kubernetes?”
}
]
}’

根据模型输出的复杂程度，响应可能需要几秒钟时间，可以通过deepseek – deployment日志监控进度。

六、为模型构建聊天机器人UI界面

直接使用API请求固然可行，但构建一个更友好的聊天机器人UI界面来与模型交互会带来更好的体验。该界面的源代码已在GitHub中提供。

# Retrieve the ECR repository URI created by Terraform
export ECR_REPO=$(terraform output ecr_repository_uri | jq -r)

# Build the container image for the Chatbot UI
docker build -t $ECR_REPO:0.1 chatbot-ui/application/.

# Login to ECR and push the image
aws ecr get-login-password | docker login –username AWS –password-stdin $ECR_REPO
docker push $ECR_REPO:0.1

# Update the deployment manifest to use the image
sed -i “s#__IMAGE_DEEPSEEK_CHATBOT__#$ECR_REPO:0.1#g” chatbot-ui/manifests/deployment.yaml

# Generate a random password for the Chatbot UI login
sed -i “s|__PASSWORD__|$(openssl rand -base64 12 | tr -dc A-Za-z0-9 | head -c 16)|” chatbot-ui/manifests/deployment.yaml

# Deploy the UI and create the ingress class required for load balancers
kubectl apply -f chatbot-ui/manifests/ingress-class.yaml
kubectl apply -f chatbot-ui/manifests/deployment.yaml

# Get the URL for the load balancer to access the application
echo http://$(kubectl get ingress/deepseek-chatbot-ingress -n deepseek -o json | jq -r ‘.status.loadBalancer.ingress[0].hostname’)

等待几秒钟，以便负载均衡器完成配置。要访问聊天机器人UI界面，需要使用存储在Kubernetes密钥中的用户名和密码。

echo -e “Username=$(kubectl get secret deepseek-chatbot-secrets -n deepseek -o jsonpath='{.data.admin-username}’ | base64 –decode)\nPassword=$(kubectl get secret deepseek-chatbot-secrets -n deepseek -o jsonpath='{.data.admin-password}’ | base64 –decode)”

登录后，界面将显示一个新的“聊天机器人”选项卡，可以在其中与模型进行交互。