Sep 9, 2024
After releasing the Octopus series models, we tried running AI models on laptops, Android phones, and iPhones. To support multi-modality, we experimented with large vision-language models (VLMs), TTS models, and ASR models. We tried multiple solutions, including llama.cpp, Ollama, ONNX Runtime, MLC-LLM, MLX, and more, but came away frustrated. First, most on-device serving frameworks couldn't support tasks like image generation, ASR, and TTS. Second, there were many different file formats, such as GGUF and ONNX, with no unified solution. Moreover, most inference engines were CPU-only, with limited CUDA and Metal support, resulting in response delays and rapid battery drain. This wasn't just an inconvenience; it was a dealbreaker for on-device AI.
It became clear that existing tools couldn't handle real-time processing efficiently, suffered from poor battery performance, and relied too much on constant internet access. That's what sparked the development of Nexa SDK. We set out to create a toolkit that could handle real-world applications without compromising speed, efficiency, or privacy, making on-device AI truly practical, whether it is on your mobile, laptop, or other edge devices.
Setting up Nexa SDK is straightforward! Choose your device and copy the appropriate CLI command from GitHub or from our local AI model hub interface, then run it in your terminal to install Nexa SDK.
Here is an example command that installs Nexa SDK with GPU (CUDA) support in Windows PowerShell:

```shell
$env:CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON"; pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
```
See the documentation for guidance on choosing between the CPU and GPU versions.
Running models with Nexa SDK is simple! Here is an example of running Gemma 1.1 2B:

```shell
nexa run gemma-1.1-2b-instruct:q4_0
```
Here are some videos showing what you can build with different kinds of models using Nexa SDK. The corresponding commands are:

```shell
# Text generation with Llama 3.1
nexa run llama3.1

# Llama 3.1 with the local UI (-st flag)
nexa run llama3.1 -st

# Speech-to-text (ASR) with Faster Whisper Tiny
nexa run faster-whisper-tiny -st

# Vision-language chat with LLaVA-Llama3
nexa run llava-llama3 -st

# Image generation with LCM Dreamshaper
nexa run lcm-dreamshaper

# Start a local server for Llama 3.1
nexa server llama3.1
```
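Once a local server is running, you can talk to it over HTTP. The sketch below is a minimal, hypothetical client assuming the server exposes an OpenAI-style chat completions endpoint on `localhost:8000` — the actual host, port, and path may differ, so check the server's startup output and the Nexa SDK documentation for the real values.

```python
# Hypothetical client sketch for a locally served model.
# Endpoint URL and payload shape are assumptions (OpenAI-style API);
# verify them against your `nexa server` startup log.
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "llama3.1") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def send_chat_request(
    payload: dict,
    url: str = "http://localhost:8000/v1/chat/completions",  # assumed default
) -> dict:
    """POST the payload to the local server and return the parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Build a request locally (no network needed for this step)
payload = build_chat_request("Why run models on-device?")
```

Because the request format mirrors the widely used OpenAI API shape, existing client tooling can often be pointed at the local server with only a base-URL change.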
For more information and guidance, visit https://github.com/NexaAI/nexa-sdk.
| Feature | Nexa SDK | Ollama | Optimum | LM Studio |
|---|---|---|---|---|
| GGML Support | ✅ | ✅ | ❌ | ✅ |
| ONNX Support | ✅ | ❌ | ✅ | ❌ |
| Text Generation | ✅ | ✅ | ✅ | ✅ |
| Image Generation | ✅ | ❌ | ❌ | ❌ |
| Vision-Language Models | ✅ | ✅ | ✅ | ✅ |
| Text-to-Speech | ✅ | ❌ | ✅ | ❌ |
| Server Capability | ✅ | ✅ | ✅ | ✅ |
| User Interface | ✅ | ❌ | ❌ | ✅ |
Here are some examples using Nexa SDK:
Go to Nexa SDK Examples to explore and contribute more use case examples!
Follow us on Twitter and join our Discord to stay updated with release notes and be part of the discussion.
For collaboration opportunities, contact us at: octopus@nexa.ai.
Kudos to the Nexa AI team. Blog written by Zack, Yin, and Ayla.