Citadea Neural
Dedicated LPUs for flat-rate inference
Sovereign inference on dedicated LPUs, billed at a flat rate — so an agent reasoning loop never produces a frightening invoice.
Dedicated LPU (Language Processing Unit) accelerators optimised for low-latency inference and high token throughput on open-source models (Llama, Mistral and more). Up to 70% less energy than traditional GPUs — and a predictable flat-rate bill instead of per-token shock.
Dedicated LPUs, open-source models
Neural serves open-source models such as Llama and Mistral on dedicated, non-shared LPU accelerators, with real-time token streaming and edge latency under 20 ms. LPUs draw up to 70% less energy than comparable GPU inference.
Flat rate beats token shock
Instead of per-token pricing that punishes high-utilisation agent workloads, Neural sells a predictable monthly slice. Above roughly 30% sustained utilisation, flat-rate inference is simply cheaper — and always forecastable.
Compared to Together AI, Fireworks, Baseten and AWS Bedrock.
What's included
- Dedicated LPUs (non-shared)
- Open-source models (Llama, Mistral)
- Real-time token streaming
- Predictable flat-rate pricing
- EU AI Act-aligned
- Edge latency < 20 ms
Quickstart
$ citadea neural infer --model <id>