Inference capacity for open-source LLMs.
OpenAI-compatible endpoints for DeepSeek, MiniMax, and other open models. Dedicated GPU infrastructure. Billed per token.
Operating in beta · Invoicing in USD
Available models
| Model | Status | Context | Input / M | Output / M |
|---|---|---|---|---|
| DeepSeek-V3 | Available | 164K | $0.26 | $0.71 |
| DeepSeek-V3.1 | Available | 128K | $0.12 | $0.60 |
| DeepSeek-V3.2 | Available | 164K | $0.21 | $0.30 |
| DeepSeek-R1 | Available | 64K | $0.56 | $2.00 |
| DeepSeek-V4 | Coming soon | 1M | — | — |
| MiniMax-M2.5 | Available | 196K | $0.09 | $0.79 |
| GLM-5 | Coming soon | — | — | — |
Volume discounts available from 100M tokens / day.
Contact us for wholesale rates and dedicated capacity agreements.
Prices in USD per million tokens. Subject to change during beta.
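Per-token billing works out as a straightforward calculation — a quick sketch, using DeepSeek-V3.2's published rates as the example (token counts are hypothetical):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Estimate a bill. Prices are USD per million tokens, as quoted above."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# DeepSeek-V3.2: $0.21 input / $0.30 output per million tokens.
# 2M input tokens + 500K output tokens:
print(round(cost_usd(2_000_000, 500_000, 0.21, 0.30), 2))  # 0.57
```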
What you get
OpenAI-compatible API
Drop-in replacement for api.openai.com/v1. No SDK changes, no custom tooling.
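Because the endpoint speaks the same wire protocol as api.openai.com/v1, a request is just a standard chat-completions POST. A minimal stdlib-only sketch — the base URL shown here is hypothetical; substitute the endpoint and API key you are issued:

```python
import json
import os
import urllib.request

# Hypothetical base URL for illustration; use the endpoint from your onboarding.
BASE_URL = "https://api.unit23.xyz/v1"

def chat_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style POST /chat/completions request (not yet sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = chat_request(
    "DeepSeek-V3",
    [{"role": "user", "content": "Hello"}],
    os.environ.get("UNIT23_API_KEY", "sk-placeholder"),
)
# Sending with urllib.request.urlopen(req) returns the usual OpenAI-shaped
# JSON body: {"choices": [{"message": ...}], "usage": ...}.
```

Since the request and response shapes match the OpenAI API, the official SDKs also work unchanged when pointed at the endpoint via their base-URL option.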
Dedicated capacity
Reserved throughput for production workloads. Not a shared free-tier queue.
Real contracts
Enterprise MSA, DPA, and invoicing available. Volume commits get dedicated rate limits.
Built for
- AI-native startups migrating off official model APIs for cost or rate-limit reasons
- Vertical SaaS companies embedding LLM features under their own brand
- AI gateways and routing platforms looking for wholesale inference supply
- Research teams needing predictable throughput for synthetic data and evaluation
GPU rentals
We also rent dedicated GPU time on the same infrastructure. H100, H200, and A100 configurations are available for training, fine-tuning, and custom inference deployments.
Minimum engagement: one week. Contact for pricing.
Get in touch
For API access, wholesale inquiries, or partnership discussions.
Or email us directly at hello@unit23.xyz.