Nexa SDK is a local on-device inference framework for ONNX and GGML models, supporting text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS) capabilities. Installable via Python Package or Executable Installer.
Multi-Device Support: CPU, GPU (CUDA, Metal, ROCm, Vulkan), NPU, PC, Mobile, Wearables, Automobiles, Robotics
OpenAI-Compatible Server: Supports function calling and streaming with JSON schema.
Interactive UI: Built with Streamlit for easy model interaction and testing.
Sensitive data stays on your device with on-device AI, ensuring privacy without compromise.
Get models fine-tuned to your data and optimized for your devices, ensuring maximum efficiency and performance.
Finetuning for Your Data and Use Case
Quantization for Efficient Deployment
Dedicated Expert Support
Deploy AI solutions on your own infrastructure for enhanced control and speed, on-premise or on any device.
On-Premise or Private Deployment
Deploy on any device types
Device Speed Optimization
We guide you from design to deployment, offering comprehensive support to build AI systems that meet your business goals.
Design Your On-Device AI System
Build and Deploy Complete AI Solutions
Dedicated Support and Training
“Octopus v2 represents a major leap towards making powerful AI accessible to everyone.”
ELITIZON Ltd, CTO
“Octopus v2 marks a significant leap towards sustainable, accessible, and user-friendly AI applications, addressing concerns around privacy, cost, and latency.”
Axtria, Head of AI
“A monumental leap in function calling efficiency on devices, making real-world applications faster and smarter than ever imagined.”
Birdiefy AI, ex CPO& Cofounder
🤯
Hugging Face, CTO
“a groundbreaking new framework for on-device AI agents.”
SWIFT, CIO
“Extremely fast, better than Llama+RAG, great results”
Hugging face, CLO
“Octopus v2 represents a major leap towards making powerful AI accessible to everyone.”
ELITIZON Ltd, CTO
“Octopus v2 marks a significant leap towards sustainable, accessible, and user-friendly AI applications, addressing concerns around privacy, cost, and latency.”
Axtria, Head of AI
“A monumental leap in function calling efficiency on devices, making real-world applications faster and smarter than ever imagined.”
Birdiefy AI, ex CPO& Cofounder
🤯
Hugging Face, CTO
“a groundbreaking new framework for on-device AI agents.”
SWIFT, CIO
“Extremely fast, better than Llama+RAG, great results”
Hugging face, CLO
“Octopus v2 represents a major leap towards making powerful AI accessible to everyone.”
ELITIZON Ltd, CTO
“Octopus v2 marks a significant leap towards sustainable, accessible, and user-friendly AI applications, addressing concerns around privacy, cost, and latency.”
Axtria, Head of AI
“A monumental leap in function calling efficiency on devices, making real-world applications faster and smarter than ever imagined.”
Birdiefy AI, ex CPO& Cofounder
🤯
Hugging Face, CTO
“a groundbreaking new framework for on-device AI agents.”
SWIFT, CIO
“Extremely fast, better than Llama+RAG, great results”
Hugging face, CLO
“Octopus v2 represents a major leap towards making powerful AI accessible to everyone.”
ELITIZON Ltd, CTO
“Octopus v2 marks a significant leap towards sustainable, accessible, and user-friendly AI applications, addressing concerns around privacy, cost, and latency.”
Axtria, Head of AI
“A monumental leap in function calling efficiency on devices, making real-world applications faster and smarter than ever imagined.”
Birdiefy AI, ex CPO& Cofounder
🤯
Hugging Face, CTO
“a groundbreaking new framework for on-device AI agents.”
SWIFT, CIO
“Extremely fast, better than Llama+RAG, great results”
Hugging face, CLO
“Octopus v2 represents a major leap towards making powerful AI accessible to everyone.”
ELITIZON Ltd, CTO
“Octopus v2 marks a significant leap towards sustainable, accessible, and user-friendly AI applications, addressing concerns around privacy, cost, and latency.”
Axtria, Head of AI
“A monumental leap in function calling efficiency on devices, making real-world applications faster and smarter than ever imagined.”
Birdiefy AI, ex CPO& Cofounder
🤯
Hugging Face, CTO
“a groundbreaking new framework for on-device AI agents.”
SWIFT, CIO
“Extremely fast, better than Llama+RAG, great results”
Hugging face, CLO
“Octopus v2 represents a major leap towards making powerful AI accessible to everyone.”
ELITIZON Ltd, CTO
“Octopus v2 marks a significant leap towards sustainable, accessible, and user-friendly AI applications, addressing concerns around privacy, cost, and latency.”
Axtria, Head of AI
“A monumental leap in function calling efficiency on devices, making real-world applications faster and smarter than ever imagined.”
Birdiefy AI, ex CPO& Cofounder
🤯
Hugging Face, CTO
“a groundbreaking new framework for on-device AI agents.”
SWIFT, CIO
“Extremely fast, better than Llama+RAG, great results”
Hugging face, CLO
“Octopus v2 represents a major leap towards making powerful AI accessible to everyone.”
ELITIZON Ltd, CTO
“Octopus v2 marks a significant leap towards sustainable, accessible, and user-friendly AI applications, addressing concerns around privacy, cost, and latency.”
Axtria, Head of AI
“A monumental leap in function calling efficiency on devices, making real-world applications faster and smarter than ever imagined.”
Birdiefy AI, ex CPO& Cofounder
🤯
Hugging Face, CTO
“Interesting idea to incorporate the functions into the model with fine-tuning to get reliable generation from small LLMs.”
Hugging face, Tech lead & LLMs
“With remarkable progress in on-device language modeling and function request abilities, Octopus v2 could revolutionize software development and spur innovation.”
BrandGuard AI, AI/ML Leader
“It is a prime example of efficiency and cost-effectiveness.”
Chainstack, Product Lead
“an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions.”
Google I/O PR post
“a groundbreaking new framework for on-device AI agents. The new era of on-device AI agents is coming.”
Rundown AI, Founder
“Striking a balance between high accuracy and low latency, it's a game-changer in on-device AI performance.”
Radio Workflow, Founder
“Interesting idea to incorporate the functions into the model with fine-tuning to get reliable generation from small LLMs.”
Hugging face, Tech lead & LLMs
“With remarkable progress in on-device language modeling and function request abilities, Octopus v2 could revolutionize software development and spur innovation.”
BrandGuard AI, AI/ML Leader
“It is a prime example of efficiency and cost-effectiveness.”
Chainstack, Product Lead
“an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions.”
Google I/O PR post
“a groundbreaking new framework for on-device AI agents. The new era of on-device AI agents is coming.”
Rundown AI, Founder
“Striking a balance between high accuracy and low latency, it's a game-changer in on-device AI performance.”
Radio Workflow, Founder
“Interesting idea to incorporate the functions into the model with fine-tuning to get reliable generation from small LLMs.”
Hugging face, Tech lead & LLMs
“With remarkable progress in on-device language modeling and function request abilities, Octopus v2 could revolutionize software development and spur innovation.”
BrandGuard AI, AI/ML Leader
“It is a prime example of efficiency and cost-effectiveness.”
Chainstack, Product Lead
“an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions.”
Google I/O PR post
“a groundbreaking new framework for on-device AI agents. The new era of on-device AI agents is coming.”
Rundown AI, Founder
“Striking a balance between high accuracy and low latency, it's a game-changer in on-device AI performance.”
Radio Workflow, Founder
“Interesting idea to incorporate the functions into the model with fine-tuning to get reliable generation from small LLMs.”
Hugging face, Tech lead & LLMs
“With remarkable progress in on-device language modeling and function request abilities, Octopus v2 could revolutionize software development and spur innovation.”
BrandGuard AI, AI/ML Leader
“It is a prime example of efficiency and cost-effectiveness.”
Chainstack, Product Lead
“an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions.”
Google I/O PR post
“a groundbreaking new framework for on-device AI agents. The new era of on-device AI agents is coming.”
Rundown AI, Founder
“Striking a balance between high accuracy and low latency, it's a game-changer in on-device AI performance.”
Radio Workflow, Founder
“Interesting idea to incorporate the functions into the model with fine-tuning to get reliable generation from small LLMs.”
Hugging face, Tech lead & LLMs
“With remarkable progress in on-device language modeling and function request abilities, Octopus v2 could revolutionize software development and spur innovation.”
BrandGuard AI, AI/ML Leader
“It is a prime example of efficiency and cost-effectiveness.”
Chainstack, Product Lead
“an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions.”
Google I/O PR post
“a groundbreaking new framework for on-device AI agents. The new era of on-device AI agents is coming.”
Rundown AI, Founder
“Striking a balance between high accuracy and low latency, it's a game-changer in on-device AI performance.”
Radio Workflow, Founder
“Interesting idea to incorporate the functions into the model with fine-tuning to get reliable generation from small LLMs.”
Hugging face, Tech lead & LLMs
“With remarkable progress in on-device language modeling and function request abilities, Octopus v2 could revolutionize software development and spur innovation.”
BrandGuard AI, AI/ML Leader
“It is a prime example of efficiency and cost-effectiveness.”
Chainstack, Product Lead
“an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions.”
Google I/O PR post
“a groundbreaking new framework for on-device AI agents. The new era of on-device AI agents is coming.”
Rundown AI, Founder
“Striking a balance between high accuracy and low latency, it's a game-changer in on-device AI performance.”
Radio Workflow, Founder
“Interesting idea to incorporate the functions into the model with fine-tuning to get reliable generation from small LLMs.”
Hugging face, Tech lead & LLMs
“With remarkable progress in on-device language modeling and function request abilities, Octopus v2 could revolutionize software development and spur innovation.”
BrandGuard AI, AI/ML Leader
“It is a prime example of efficiency and cost-effectiveness.”
Chainstack, Product Lead
“an on-device action model, developers are showcasing the potential of Gemma to create impactful and accessible AI solutions.”
Google I/O PR post