Inference capacity for open-source LLMs.
OpenAI-compatible endpoints for DeepSeek, MiniMax, and other open models. Dedicated GPU infrastructure. Billed per token.
Operating in beta · Invoicing in USD
Available models
| Model | Status | Context | Input / M | Output / M |
|---|---|---|---|---|
| DeepSeek-V3 | Available | 164K | $0.26 | $0.71 |
| DeepSeek-V3.1 | Available | 128K | $0.12 | $0.60 |
| DeepSeek-V3.2 | Available | 164K | $0.21 | $0.30 |
| DeepSeek-R1 | Available | 64K | $0.56 | $2.00 |
| DeepSeek-V4 | Coming soon | 1M | — | — |
| MiniMax-M2.5 | Available | 196K | $0.09 | $0.79 |
| GLM-5 | Coming soon | — | — | — |
Volume discounts available from 100M tokens / day.
Contact us for wholesale rates and dedicated capacity agreements.
Prices in USD per million tokens. Subject to change during beta.
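Per-token billing works out as a straightforward calculation — a quick sketch, using DeepSeek-V3.2's published rates as the example (token counts are hypothetical):

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Estimate a bill. Prices are USD per million tokens, as quoted above."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# DeepSeek-V3.2: $0.21 input / $0.30 output per million tokens.
# 2M input tokens + 500K output tokens:
print(round(cost_usd(2_000_000, 500_000, 0.21, 0.30), 2))  # 0.57
```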
What you get
OpenAI-compatible API
Drop-in replacement for api.openai.com/v1. No SDK changes, no custom tooling.
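Because the endpoint speaks the same wire protocol as api.openai.com/v1, a request is just a standard chat-completions POST. A minimal stdlib-only sketch — the base URL shown here is hypothetical; substitute the endpoint and API key you are issued:

```python
import json
import os
import urllib.request

# Hypothetical base URL for illustration; use the endpoint from your onboarding.
BASE_URL = "https://api.unit23.xyz/v1"

def chat_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style POST /chat/completions request (not yet sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = chat_request(
    "DeepSeek-V3",
    [{"role": "user", "content": "Hello"}],
    os.environ.get("UNIT23_API_KEY", "sk-placeholder"),
)
# Sending with urllib.request.urlopen(req) returns the usual OpenAI-shaped
# JSON body: {"choices": [{"message": ...}], "usage": ...}.
```

Since the request and response shapes match the OpenAI API, the official SDKs also work unchanged when pointed at the endpoint via their base-URL option.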
Dedicated capacity
Reserved throughput for production workloads. Not a shared free-tier queue.
Real contracts
Enterprise MSA, DPA, and invoicing available. Volume commits get dedicated rate limits.
Built for
- AI-native startups migrating off official model APIs for cost or rate-limit reasons
- Vertical SaaS companies embedding LLM features under their own brand
- AI gateways and routing platforms looking for wholesale inference supply
- Research teams needing predictable throughput for synthetic data and evaluation
GPU rentals
We also rent dedicated GPU time on the same infrastructure. H100, H200, and A100 configurations are available for training, fine-tuning, and custom inference deployments.
Minimum engagement: one week. Contact for pricing.
Get in touch
For API access, wholesale inquiries, or partnership discussions.
Or email us directly at hello@unit23.xyz.