Jan 1, 2025
2024 marked a transformative year for both on-device AI and Nexa AI. From launching exciting products to growing our team, we've reached new heights together. We’re thrilled to celebrate these milestones and deeply grateful to our community for being an integral part of our journey.
Here's a brief look at Nexa AI's year in numbers and impact:
Our llama.cpp-compatible compression solution delivers 3x lighter models with 4x size reduction while achieving 100%+ accuracy recovery. This powers Nexa AI's end-to-end offering: architecture optimization, model compression, customized applications, and selection of the best model and deployment framework.
The world's fastest audio-language model, processing both text and audio inputs with just 2.6B parameters. Optimized for edge deployment, it supports offline voice QA, voice-in conversation, voice-prompted content generation, recording summarization, and tone modification.
A sub-billion-parameter (968M) multimodal model supporting both visual and text inputs, optimized for image captioning on edge devices. It features 9x token reduction and enhanced accuracy through DPO training. Great for art analysis, scene comprehension, style recognition, color perception, and world knowledge.
Pioneering long-context processing for on-device language models, achieving 10x power reduction and 5x speed increase without compromising quality or increasing the model's memory footprint.
Multimodal function-calling, reasoning, and planning with 70x better energy efficiency and 35x faster performance than Llama3-8B solutions; 4 patents filed. Featuring Octopus v2, a 0.5B on-device action model that outperforms GPT-4 in accuracy and latency for function-calling tasks.
Nexa AI's on-device AI inference toolkit: it supports multimodality, hardware acceleration (CPU, GPU, iGPU, NPU), and a local UI across devices, delivering 2.5x acceleration with 100+ tokens/s decoding speed.
We also launched the Tiny Model Hub, a curated list of quantized text, image, audio, and multimodal models supported by Nexa SDK, and a Small Language Model Leaderboard, benchmarked using Nexa Eval.
Nexa SDK is the first inference toolkit that enables edge deployment of Qwen2-Audio, a small-scale SOTA model with support for audio and text inputs.
Privacy-conscious local document interaction: competitive speed, outstanding information retrieval, and strong context understanding.
A comprehensive review of the development of on-device language models, covering efficient architectures, compression techniques, hardware acceleration strategies, edge-cloud deployment approaches, and case studies.
RAG (Retrieval-Augmented Generation) + Actions: Quickly retrieve, summarize, and visualize insights from documents and presentations with a seamless, sub-second user experience.
AI Buddy: A real-time, voice-based AI assistant delivering conversational AI agent (tool use) capabilities with sub-second latency.
A hybrid AI operating system with edge-cloud deployment and natural language interactions, ensuring maximum privacy and responsive performance across applications: sub-second response times (35x faster than Llama3, 4x faster than GPT-4) with 70x better energy efficiency, while matching GPT-4's accuracy in function-calling.
1K+ on-device AI community members across Discord and WeChat
The future of AI is on-device: real-time, privacy-first, and decentralized. Nexa AI is proud to continue pioneering in this space. We're committed to pushing the boundaries of on-device AI and fostering a thriving community.
Join us at CES 2025 to experience our latest innovations, including our collaborative showcase with AMD. Visit us at LVCC North Hall, Booth #9177, or schedule a quick call with us to explore how Nexa AI can help transform your on-device AI use cases.
Join 8,000+ developers