SurferCloud Blog

Complete Guide: Deploying DeepSeek R1 on an RTX 4090 GPU

February 8, 2025
4 minutes
INDUSTRY INFORMATION, TUTORIAL
6826 Views

Introduction

DeepSeek R1 is a powerful open-source language model designed for various AI applications. If you're looking to deploy it on an RTX 4090 GPU, this guide will walk you through the entire process, from hardware requirements to running the model efficiently.


By the end of this guide, you'll have a fully functional DeepSeek R1 deployment running locally on your RTX 4090.

Related Articles:
How to Apply for Free Trial of DeepSeek R1 on SurferCloud UModelVerse
UModelVerse Launches with Free Access to deepseek-ai/DeepSeek-R1
DeepSeek R1 Now Available on SurferCloud UModelVerse


1. Hardware Requirements

Before we begin, ensure your system meets the following requirements:

  • GPU: NVIDIA RTX 4090 (24GB VRAM)
  • CPU: At least 8-core processor
  • RAM: Minimum 32GB (recommended)
  • Storage: At least 100GB free space (SSD preferred)
  • Operating System: Ubuntu 20.04+, Windows 11 (native installer or WSL2), or macOS

For RTX 4090, you can run up to DeepSeek R1 32B. Larger models like DeepSeek R1 70B require multiple GPUs.
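As a rough sanity check on those limits, you can estimate VRAM needs from the parameter count: at 4-bit quantization (which Ollama's default builds typically use), a model needs about half a gigabyte of weights per billion parameters, plus headroom for the KV cache and activations. A minimal sketch — the 20% overhead factor is a rough assumption, not a measured figure:

```python
def approx_vram_gb(params_billion: float, bits_per_weight: float = 4,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight storage at the given quantization width,
    plus ~20% headroom for the KV cache and activations."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits = 1 GB
    return round(weight_gb * overhead, 1)

# A 4-bit 32B model lands around 19 GB, inside the 4090's 24 GB;
# the 70B variant needs roughly twice that, hence multiple GPUs.
for size_b in (7, 32, 70):
    print(f"{size_b}B @ 4-bit ≈ {approx_vram_gb(size_b)} GB")
```

Actual usage also depends on context length, so treat this as a back-of-the-envelope guide rather than a guarantee.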


2. Install Required Dependencies

To run DeepSeek R1, you'll need the Ollama framework, which simplifies model management.

2.1 Install Ollama

Ollama is an easy-to-use tool for running large language models locally.

On Ubuntu / Linux

Open a terminal and run:

curl -fsSL https://ollama.com/install.sh | sh

On macOS

Download and install Ollama from: https://ollama.com/download

On Windows

Ollama ships a native Windows installer: download it from https://ollama.com/download.
Alternatively, install Ollama inside WSL2 (Ubuntu) if you prefer a Linux environment; follow the WSL2 installation guide before proceeding.

After installation, verify with:

ollama -v

If successful, it will display the installed version.


3. Download and Set Up DeepSeek R1

3.1 Choose the Right Model Version

DeepSeek R1 comes in different sizes. For a single RTX 4090, DeepSeek R1 32B is the best choice.

3.2 Pull the Model

To download the 32B variant recommended above, include the size tag:

ollama pull deepseek-r1:32b

Omitting the tag (ollama pull deepseek-r1) fetches the default build, which is a smaller variant.

You can check available models on the DeepSeek R1 GitHub page.
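Once the download finishes, you can confirm the model is installed by querying the Ollama service's /api/tags endpoint (the installer sets this background service up on most systems; if it is not running, start it with ollama serve). A small Python sketch using only the standard library — the function names are illustrative, and localhost:11434 is Ollama's default port:

```python
import json
import urllib.request

def model_names(tags_response: dict) -> list[str]:
    """Extract the model names from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask the local Ollama service which models are downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))
```

Calling installed_models() should include an entry like deepseek-r1:32b once the pull has completed.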


4. Running DeepSeek R1 on RTX 4090

After downloading the model, you can run it with:

ollama run deepseek-r1

This starts an interactive session where you can input prompts and receive AI-generated responses.

4.1 Running a Test Query

You can also pass a test query directly from the command line:

ollama run deepseek-r1 "What is the capital of France?"

Expected output:

The capital of France is Paris.

5. Optimizing Performance for RTX 4090

To fully utilize your RTX 4090, follow these optimizations:

5.1 Enable CUDA for Faster Performance

DeepSeek R1 runs best with CUDA acceleration. The key requirement is a recent NVIDIA driver (Ollama bundles the CUDA runtime it needs); the toolkit and cuDNN are only required if you build GPU software yourself:

  • NVIDIA Driver (latest): download from NVIDIA Drivers
  • CUDA Toolkit (v12.x recommended, optional): download from CUDA Toolkit
  • cuDNN Library (optional): install from NVIDIA cuDNN

After installation, verify that the driver sees your GPU:

nvidia-smi

If you also installed the toolkit, nvcc --version confirms it is on your PATH.

5.2 Run DeepSeek R1 with GPU Optimization

Ollama detects CUDA automatically when a working NVIDIA driver is present, so no extra flag is needed. On a multi-GPU machine you can pin the model to the RTX 4090 with the standard CUDA_VISIBLE_DEVICES variable:

CUDA_VISIBLE_DEVICES=0 ollama run deepseek-r1

While the model is answering, nvidia-smi should show the ollama process consuming GPU memory, confirming the RTX 4090 is being used.


6. Running DeepSeek R1 as an API

If you want to integrate DeepSeek R1 into applications, you can use the Ollama API.

6.1 Start the API Server

Run the following command to start an API server:

ollama serve

6.2 Making API Requests

Use curl or any HTTP client to send requests:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "What is artificial intelligence?"
}'

By default this streams back a series of JSON objects, one chunk of the answer per line; add "stream": false to the request body to receive the whole answer as a single JSON response.

For more advanced API usage, refer to the Ollama API Documentation.
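From application code, the same endpoint can be called with nothing beyond Python's standard library. This sketch assumes the server is on Ollama's default port 11434 and uses "stream": false so the answer arrives as one JSON object; the helper names are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "deepseek-r1") -> bytes:
    """Encode a non-streaming generate request ("stream": false makes the
    server return a single JSON object instead of a stream of chunks)."""
    return json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()

def generate(prompt: str, model: str = "deepseek-r1") -> str:
    """Send a prompt to the local Ollama server and return the answer text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With ollama serve running, generate("What is the capital of France?") returns the model's answer as a plain string.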


7. Troubleshooting Common Issues

  • Model not downloading — Check that your internet connection is stable, then run ollama pull deepseek-r1 again.
  • CUDA not working — Verify that your NVIDIA driver and CUDA toolkit are properly installed; run nvidia-smi to check GPU status.
  • High memory usage — Reduce the batch size or switch to a smaller model such as DeepSeek R1 7B.
  • API not responding — Make sure the Ollama server is running (check with ps aux | grep ollama) and restart it with ollama serve if needed.

For more issues, check the DeepSeek AI GitHub Discussions.


Conclusion

By following this guide, you should now have DeepSeek R1 running efficiently on your RTX 4090. Whether you're using it for research, chatbot development, or AI-powered applications, this setup will give you powerful AI capabilities on your local machine.


Looking for a High-Performance GPU Server?

If you need enterprise-grade RTX 4090 GPU servers, check out SurferCloud's RTX 4090 GPU Servers.

Why Choose SurferCloud?

✅ High-performance RTX 4090 GPUs
✅ Affordable cloud-based pricing
✅ Global availability with low-latency networking
✅ Flexible hourly and monthly billing

🔗 Explore SurferCloud's RTX 4090 GPU Servers here: https://surfercloud.com/gpu.

Tags: DeepSeek R1, Install Ollama, RTX 4090 GPU

Copyright © 2024 SurferCloud. All Rights Reserved.