Run Multimodal AI Models on Your Local Devices
After releasing the Octopus series of models, we tried running AI models on laptops, Android phones, and iPhones. To support multi-modality, we experimented with large vision-language models (VLMs), text-to-speech (TTS) models, and automatic speech recognition (ASR) models. We tried multiple solutions, including llama.cpp, Ollama, ONNX Runtime, MLC-LLM, MLX, and more, but came away frustrated. First, most on-device serving frameworks couldn't support tasks like image generation, ASR, and TTS. Second, there were many different file formats, such as GGUF and ONNX, with no unified solution. Moreover, most inference engines were CPU-only, with limited CUDA and Metal support, resulting in response delays and rapid battery drain. This wasn't just an inconvenience; it was a dealbreaker for on-device AI.
It became clear that existing tools couldn't handle real-time processing efficiently, suffered from poor battery performance, and relied too heavily on constant internet access. That's what sparked the development of Nexa SDK. We set out to create a toolkit that could handle real-world applications without compromising speed, efficiency, or privacy, making on-device AI truly practical, whether on your phone, laptop, or other edge devices.
Step 1: Download and Install Nexa SDK as a Python Package with CLI
Setting up Nexa SDK is straightforward! Choose your device and copy the matching install command from GitHub or from our local AI model hub interface, then run it in your terminal.
Here is an example command that installs Nexa SDK with GPU (CUDA) support in Windows PowerShell:
$env:CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON"; pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
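If you don't need GPU acceleration, the CMake flags above can be dropped. A minimal CPU-only install of the same package (flags other than the package name are conventional pip options, not Nexa-specific) looks like:

```shell
# CPU-only install; skips the CUDA build flags shown above
pip install nexaai --prefer-binary --no-cache-dir
```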
See the documentation for guidance on choosing between the CPU and GPU versions.
Step 2: Build with a Simple CLI
Running models with Nexa SDK is simple! Here is an example of running Gemma 1.1 2B:
nexa run gemma-1.1-2b-instruct:q4_0
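The same CLI can also be driven programmatically. Below is a minimal sketch that builds a `nexa run` invocation and launches it with Python's standard library; the `build_nexa_cmd` helper is our own illustration, not part of the SDK:

```python
import shutil
import subprocess


def build_nexa_cmd(model: str, streamlit: bool = False) -> list:
    """Build a `nexa run` invocation for a model tag as shown in the docs."""
    cmd = ["nexa", "run", model]
    if streamlit:
        cmd.append("-st")  # launch the local web UI alongside the model
    return cmd


cmd = build_nexa_cmd("gemma-1.1-2b-instruct:q4_0")
# Only invoke the CLI if it is actually installed on this machine.
if shutil.which("nexa"):
    subprocess.run(cmd, check=True)
```

This is handy for scripting model launches from a larger application, while keeping the CLI itself as the single entry point.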
Here are example commands for running different kinds of models with Nexa SDK:
nexa run llama3.1                  # text generation
nexa run llama3.1 -st              # text generation with the local Streamlit UI
nexa run faster-whisper-tiny -st   # speech-to-text (ASR)
nexa run llava-llama3 -st          # vision-language chat
nexa run lcm-dreamshaper           # image generation
nexa server llama3.1               # start a local server
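Once `nexa server` is running, you can query it over HTTP. The sketch below assumes an OpenAI-style completion endpoint on localhost port 8000; the exact path, port, and payload fields should be checked against the project documentation:

```python
import json
import urllib.request

# Assumed endpoint and port; verify against the Nexa SDK server docs.
SERVER_URL = "http://localhost:8000/v1/completions"


def make_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build a minimal completion request in the OpenAI-compatible style."""
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.7}


def complete(prompt: str) -> str:
    """POST a prompt to the local server and return the generated text."""
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(make_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# With the server running, call e.g.:
#   complete("Explain on-device AI in one sentence.")
```

Because everything stays on localhost, no data leaves the device.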
For more information and guidance, visit https://github.com/NexaAI/nexa-sdk.
| Feature | Nexa SDK | Ollama | Optimum | LM Studio |
|---|---|---|---|---|
| GGML Support | ✅ | ✅ | ❌ | ✅ |
| ONNX Support | ✅ | ❌ | ✅ | ❌ |
| Text Generation | ✅ | ✅ | ✅ | ✅ |
| Image Generation | ✅ | ❌ | ❌ | ❌ |
| Vision-Language Models | ✅ | ✅ | ✅ | ✅ |
| Text-to-Speech | ✅ | ❌ | ✅ | ❌ |
| Server Capability | ✅ | ✅ | ✅ | ✅ |
| User Interface | ✅ | ❌ | ❌ | ✅ |
Here are some examples using Nexa SDK:
Using the Nexa SDK, this project creates a local interactive AI character that supports voice input, voice output, and local profile-image generation, all powered by the Llama3 Uncensored model and all without an internet connection.
In this example, the Nexa SDK powers a sophisticated financial query system with on-device processing to ensure data privacy. Key features include adjustable parameters such as model selection, temperature, max tokens, top-k, and top-p, allowing responses to be fine-tuned to user needs.
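The tunable parameters mentioned above can be grouped into a single configuration object. The sketch below is a hypothetical container whose field names mirror common sampling options; it is not an actual Nexa SDK API:

```python
from dataclasses import asdict, dataclass


@dataclass
class GenerationConfig:
    # Hypothetical config mirroring the parameters listed above;
    # field names follow common sampling conventions, not a specific Nexa API.
    model: str = "llama3.1"
    temperature: float = 0.7
    max_tokens: int = 256
    top_k: int = 40
    top_p: float = 0.95


# Lower temperature for more deterministic answers to financial queries.
cfg = GenerationConfig(temperature=0.2)
payload = asdict(cfg)  # ready to serialize into a request body
```

Keeping the knobs in one dataclass makes it easy to expose them in a UI and serialize them into a request.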
Go to Nexa SDK Examples to explore and contribute more use case examples!
Benchmark toolkit
A benchmark toolkit will be introduced to help users evaluate and optimize model performance.
Mobile and browser support
Support for mobile and browser platforms will expand the SDK's accessibility and usability across different devices.
More multimodal support (audio/image/video) and more integration with other tools (OpenWebUI/Mem0)
Increased multimodal support will enable integration of audio, image, and video capabilities, broadening the range of applications. Additionally, the SDK will offer enhanced integration with other tools, such as OpenWebUI and Mem0, to streamline workflows and improve interoperability. These developments aim to make the Nexa SDK more versatile and user-friendly.
More exciting use case examples that showcase the capabilities of Nexa SDK
The roadmap also includes more use case examples of running Qwen 2.5, LLaMA, Phi-3.5, Whisper, Flux, and Stable Diffusion models efficiently on laptops, mobile devices, and edge devices using Nexa SDK, showcasing its versatility.
Follow us on Twitter and join our Discord to stay updated with release notes and be part of the discussion.
For collaboration opportunities, contact us at: octopus@nexa.ai.