API-Based Remote Access to a Self-Hosted LLM
Architecture Overview
┌──────────────────────────────────────────────────────────┐
│ REMOTE CLIENT │
│ (Laptop / Phone / Tablet) │
│ Twingate Client App │
└──────────────────┬───────────────────────────────────────┘
│ Encrypted Zero Trust Tunnel
▼
┌──────────────────────────────────────────────────────────┐
│ HOSTINGER VPS (Cloud Relay) │
│ Docker: twingate/connector container │
│ Authenticates via Twingate Identity Provider │
└──────────────────┬───────────────────────────────────────┘
│ Secure Tunnel (No open ports needed)
▼
┌──────────────────────────────────────────────────────────┐
│ HOME UBUNTU WORKSTATION │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ │
│ │ Open WebUI │◄──│ Ollama │◄──│ AMD ROCm │ │
│ │ :9090 │ │ :11434 │ │ gfx1100 │ │
│ └─────────────┘ └──────────────┘ └─────────────┘ │
│ │
│ GPU: Sapphire NITRO+ RX 7900 XTX Vapor-X (24GB VRAM) │
└──────────────────────────────────────────────────────────┘
Hardware Specifications
| Component | Specification |
|---|---|
| OS | Ubuntu 24.04.3 LTS x86_64 |
| Kernel | 6.14.0-37-generic |
| Motherboard | Gigabyte X870E AORUS ELITE WIFI7 |
| CPU | AMD Ryzen 7 7800X3D (8 cores, 16 threads @ 5.05 GHz) |
| L3 Cache | 96 MB (3D V-Cache) |
| RAM | 64 GB DDR5 |
| GPU | Sapphire NITRO+ AMD Radeon RX 7900 XTX Vapor-X |
| VRAM | 24 GB GDDR6 |
| GPU Arch | RDNA 3 (gfx1100, 48 compute units, 2526 MHz) |
| Shell | Bash 5.2.21 |
| Resolution | 1920x1080 |
Geekbench 6 OpenCL Performance
| Benchmark | Score | Throughput |
|---|---|---|
| Overall | 216,220 | — |
| Background Blur | 100,194 | 414.7 images/sec |
| Horizon Detection | 335,188 | 10.4 Gpixels/sec |
| Edge Detection | 376,170 | 14.0 Gpixels/sec |
| Stereo Matching | 831,163 | 790.1 Gpixels/sec |
| Particle Physics | 622,445 | 27,394.3 FPS |
Installing ROCm for GPU Acceleration
The RX 7900 XTX is natively supported as gfx1100 under ROCm. This setup enables GPU-accelerated LLM inference.
Add ROCm Repository
# Update system packages
sudo apt update && sudo apt upgrade -y
# Install prerequisites
sudo apt install -y wget gnupg2
# Add ROCm repository
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
  gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.1 noble main" | \
  sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update
Install and Verify ROCm
# Install ROCm
sudo apt install -y rocm
# Add to PATH
echo 'export PATH=/opt/rocm/bin:/opt/rocm/opencl/bin:$PATH' >> ~/.profile
source ~/.profile
# Verify GPU detection
sudo /opt/rocm/bin/rocminfo | grep gfx
# Expected: gfx1100
# Add user to required groups
sudo usermod -aG render,video $USER
The RX 7900 XTX is recognized as gfx1100 and requires no version overrides.
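Note that usermod -aG only affects new login sessions, so the group change will not be visible in the current shell. A small sketch for confirming membership after logging back in (group names taken from the step above):

```shell
# Group membership only updates for new login sessions, so log out and back
# in (or reboot) first, then confirm both groups are present:
for g in render video; do
  if id -nG | grep -qw "$g"; then
    echo "$g: member"
  else
    echo "$g: not a member yet (log out and back in)"
  fi
done
```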
Setting Up Ollama with ROCm
Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Ollama automatically detects AMD GPUs when ROCm drivers are installed. The 24 GB of VRAM allows running models up to approximately 30B parameters with 4-bit quantization.
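The 30B figure follows from simple arithmetic: at 4-bit quantization each parameter costs roughly half a byte, so the weights alone need about params/2 gigabytes, and the rest of the card holds the KV cache and runtime overhead. A back-of-the-envelope sketch:

```shell
# 4-bit quantization is about 0.5 bytes per parameter, so a 30B-parameter
# model needs roughly 15 GB for weights, leaving about 9 GB of the 24 GB
# card for KV cache and runtime overhead.
params_billion=30
weight_gb=$(( params_billion / 2 ))
echo "weights: ~${weight_gb} GB; headroom: ~$(( 24 - weight_gb )) GB"
```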
Configure Network Binding
Ollama binds to 127.0.0.1 by default. To allow Docker containers to connect, expose it on all interfaces:
sudo systemctl edit ollama.service
Add this configuration:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Apply changes:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Test GPU Acceleration
# Download a model
ollama pull llama3.1:8b
# Run inference
ollama run llama3.1:8b "Explain how GPU acceleration works"
# Monitor GPU utilization
watch -n 1 rocm-smi
You should observe VRAM allocation on the GPU, confirming hardware acceleration.
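Ollama also serves an HTTP API on the same port, which is what Open WebUI talks to. A quick reachability check, using the /api/tags endpoint from Ollama's public API:

```shell
# /api/tags lists the locally pulled models as JSON; a failure here usually
# means the service is down or OLLAMA_HOST is misconfigured.
if curl -fsS http://localhost:11434/api/tags; then
  echo "Ollama API reachable"
else
  echo "Ollama API not reachable; check 'systemctl status ollama'"
fi
```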
Deploying Open WebUI
Open WebUI provides a web interface for interacting with Ollama models.
Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Enable non-root Docker usage
sudo usermod -aG docker $USER
newgrp docker
Run Open WebUI Container
docker run -d \
  -p 9090:8080 \
  --name open-webui \
  --restart unless-stopped \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v $HOME/.open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
Access the interface at http://localhost:9090. Create an admin account on first launch.
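Before moving on, it is worth confirming the container is running and the UI answers; a small check to run on the workstation:

```shell
# Container status plus an HTTP probe; a failure points at the usual
# culprits (container not started, or a port conflict on 9090).
docker ps --filter name=open-webui --format '{{.Names}}: {{.Status}}' 2>/dev/null \
  || echo "docker not available in this shell"
if curl -fsS -o /dev/null http://localhost:9090; then
  echo "Open WebUI is answering on :9090"
else
  echo "Open WebUI not reachable yet"
fi
```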
Configuring Twingate Zero Trust Access
This setup uses Twingate to provide secure remote access without opening inbound ports on the home network.
Architecture Details
- Hostinger VPS runs a Twingate connector that maintains an outbound connection to Twingate's control plane
- Home workstation can optionally run a second connector or be accessed via the VPS relay
- Remote clients authenticate through an identity provider and connect via the Twingate client
- Result: Secure access to the workstation's Open WebUI without port forwarding
Deploy Connector on Hostinger VPS
SSH into the VPS and run:
# Install Docker if needed
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Deploy Twingate connector
# Obtain credentials from Twingate Admin Console → Add Connector
docker run -d \
  --sysctl net.ipv4.ping_group_range="0 2147483647" \
  --env TWINGATE_NETWORK="<YOUR_NETWORK_NAME>" \
  --env TWINGATE_ACCESS_TOKEN="<ACCESS_TOKEN>" \
  --env TWINGATE_REFRESH_TOKEN="<REFRESH_TOKEN>" \
  --env TWINGATE_LABEL_HOSTNAME="$(hostname)" \
  --env TWINGATE_LOG_ANALYTICS="v2" \
  --name twingate-connector \
  --restart unless-stopped \
  --pull always \
  twingate/connector:1
Configure Access in Twingate Console
- Navigate to Remote Networks and select your network
- Add a Resource pointing to workstation-local-ip:9090
- Configure access policies and assign to appropriate user groups
- Install the Twingate client on devices that need access
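Back on the VPS, the connector's state can be confirmed locally before checking the Admin Console (container name taken from the docker run step above):

```shell
# "running" here plus an "online" status in the Admin Console means the
# connector's outbound tunnel to Twingate's control plane is up.
docker inspect -f '{{.State.Status}}' twingate-connector 2>/dev/null \
  || echo "connector container not found"
docker logs --tail 20 twingate-connector 2>/dev/null || true
```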
Docker Compose Configuration
For unified management, use this docker-compose.yml:
version: "3.8"

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "9090:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui-data:/app/backend/data

  twingate-connector:
    image: twingate/connector:1
    container_name: twingate-home-connector
    restart: unless-stopped
    pull_policy: always
    sysctls:
      - net.ipv4.ping_group_range=0 2147483647
    environment:
      - TWINGATE_NETWORK=<YOUR_NETWORK>
      - TWINGATE_ACCESS_TOKEN=<ACCESS_TOKEN>
      - TWINGATE_REFRESH_TOKEN=<REFRESH_TOKEN>
      - TWINGATE_LABEL_HOSTNAME=home-workstation
      - TWINGATE_LOG_ANALYTICS=v2

volumes:
  open-webui-data:

Deploy with:
docker compose up -d
Verification
| Component | Command | Expected Output |
|---|---|---|
| ROCm GPU detection | rocminfo \| grep gfx | gfx1100 |
| Ollama service status | systemctl status ollama | active (running) |
| GPU utilization | rocm-smi (while a model is loaded) | VRAM usage on device 0 |
| Open WebUI accessibility | curl http://localhost:9090 | HTML response |
| Twingate connectivity | Check Twingate Admin Console | Connector status: online |
| Remote access | Access via Twingate client | Open WebUI login page |
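The local checks in the table can be folded into one small script; this is a sketch that assumes the default ports and service names used in the earlier sections:

```shell
#!/usr/bin/env bash
# Run each check and report OK/FAIL without aborting on the first failure.
check() {
  name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $name"
  else
    echo "FAIL $name"
  fi
}

check "ROCm GPU detection"  bash -c '/opt/rocm/bin/rocminfo | grep -q gfx1100'
check "Ollama service"      systemctl is-active --quiet ollama
check "Ollama API"          curl -fsS http://localhost:11434/api/tags
check "Open WebUI"          curl -fsS http://localhost:9090
```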
Model Performance
Tested configurations with 24 GB VRAM and 64 GB system RAM:
| Model | Parameters | Quantization | VRAM Usage | Tokens/sec |
|---|---|---|---|---|
| Llama 3.1 8B | 8B | Q4_K_M | ~5 GB | 80-100 |
| DeepSeek-R1 | 14B | Q4_K_M | ~9 GB | 45-60 |
| Qwen 2.5 | 32B | Q4_K_M | ~20 GB | 20-30 |
| Llama 3.1 70B | 70B | Q4_K_M | ~22 GB* | 8-12 |

\* The full Q4_K_M 70B weights exceed 24 GB, so the remaining layers are offloaded to system RAM, which accounts for the lower throughput.
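The tokens/sec figures above can be reproduced with Ollama's --verbose flag, which prints prompt-eval and eval rates after each response; rocm-smi confirms the VRAM column:

```shell
# --verbose appends timing stats (load duration, prompt eval rate, eval rate)
# after the model's reply; guarded so the snippet degrades cleanly on a
# machine without Ollama installed.
if command -v ollama >/dev/null 2>&1; then
  ollama run llama3.1:8b --verbose "Summarize GPU offloading in one sentence."
  rocm-smi --showmeminfo vram   # VRAM in use while the model is resident
else
  echo "ollama not installed on this machine"
fi
```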
Key Takeaways
- No port forwarding required: Twingate connectors establish outbound-only connections
- Identity-based authentication: Integrates with Google Workspace, Okta, Azure AD, or any OIDC provider
- Native GPU support: The RX 7900 XTX (gfx1100) is officially supported by ROCm without workarounds
- Automatic GPU detection: Ollama automatically uses ROCm-compatible GPUs when available
- VPS role: The Hostinger VPS acts as a relay and does not perform inference computations