
Nexa AI 2024 Year Review

Jan 1, 2025

2024 marked a transformative year for both on-device AI and Nexa AI. From launching exciting products to growing our team, we've reached new heights together. We’re thrilled to celebrate these milestones and deeply grateful to our community for being an integral part of our journey.

Nexa AI 2024 at a Glance

A brief look at Nexa AI's year in numbers and impact:

  • End-to-end on-device AI solutions: from customized small AI models and model compression to edge inference, for our enterprise partners in AI PC, Mobile, Wearables, Automotive, Fintech, Martech, and more.
  • 10+ small models released with 100k+ downloads, reposted by industry leaders from Google, AMD, Lenovo, Hugging Face, and others.
  • 1 comprehensive on-device AI inference toolkit with 5K GitHub stars, highlighted by AMD and featured in a joint showcase at CES 2025.
  • 10+ on-device AI research papers published and 4 patents filed.
  • 20+ on-device AI events hosted and attended.

Product Releases

Nexa Quant

Our llama.cpp-compatible compression solution delivers 3x lighter models with a 4x size reduction while achieving 100%+ accuracy recovery. It powers Nexa AI's end-to-end solution: architecture optimization, model compression, customized applications, and selection of the best model and deployment framework.
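Nexa Quant's internals aren't covered here, but a minimal numpy sketch of llama.cpp-style blockwise 4-bit quantization illustrates where a roughly 4x size reduction over fp16 comes from. The function below is an illustration of the general technique, not Nexa Quant's API:

```python
import numpy as np

def quantize_q4_blockwise(weights: np.ndarray, block_size: int = 32):
    """Symmetric 4-bit blockwise quantization, llama.cpp-style sketch.

    Each block of 32 fp16 weights (64 bytes) becomes 32 4-bit ints
    (16 bytes packed) plus one fp16 scale (2 bytes): ~3.6x smaller.
    """
    blocks = weights.astype(np.float32).ravel().reshape(-1, block_size)
    # One scale per block so the largest magnitude maps to the int4 range.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero in all-zero blocks
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales.astype(np.float32)).ravel()

w = np.random.randn(4096 * 4096).astype(np.float16)  # one weight matrix
q, s = quantize_q4_blockwise(w)
q4_bytes = q.size // 2 + s.nbytes  # 4 bits per weight + per-block scales
print(f"size reduction vs fp16: {w.nbytes / q4_bytes:.1f}x")  # ~3.6x
err = dequantize(q, s) - w.astype(np.float32).ravel()
print(f"RMS quantization error: {np.sqrt(np.mean(err ** 2)):.4f}")
```

Production schemes like Nexa Quant layer calibration and accuracy-recovery techniques on top of this basic idea; the sketch only shows the storage arithmetic.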

Models

OmniAudio - 2.6B (8800+ downloads)

The world's fastest audio-language model, processing both text and audio inputs with just 2.6B parameters. Optimized for edge deployment, it supports offline voice QA, voice-in conversation, voice-prompted content generation, recording summarization, and tone modification.

OmniVLM - 968M (14k+ downloads)

A sub-billion (968M) multimodal model supporting both visual and text inputs, optimized for image captioning on edge devices. Features a 9x reduction in image tokens and enhanced accuracy through DPO training. Great for art analysis, scene comprehension, style recognition, color perception, and world knowledge.

Squid

Pioneering long-context processing for on-device language models, achieving a 10x reduction in power consumption and a 5x speed increase without compromising quality or increasing the model's memory footprint.

Octopus Series

Multimodal function-calling, reasoning, and planning with 70x better energy efficiency and 35x faster performance than Llama 3-8B-based solutions; 4 patents filed. Featuring Octopus v2, a 0.5B on-device action model that outperforms GPT-4 in accuracy and latency on function-calling tasks.
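For a rough sense of the function-calling pattern such an action model targets, here is an illustrative dispatch sketch. The output format, function names, and tool registry below are hypothetical, not Octopus's actual vocabulary:

```python
import json

# Illustrative dispatch for an on-device action model: the model emits a
# compact function name plus JSON arguments instead of free-form text.
def set_alarm(hour: int, minute: int) -> str:
    return f"Alarm set for {hour:02d}:{minute:02d}"

TOOLS = {"set_alarm": set_alarm}  # hypothetical tool registry

# Hypothetical model output for "wake me at 6:30" (format is illustrative).
model_output = '{"function": "set_alarm", "arguments": {"hour": 6, "minute": 30}}'

call = json.loads(model_output)
print(TOOLS[call["function"]](**call["arguments"]))  # Alarm set for 06:30
```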

Edge Inference

Nexa SDK

Nexa AI's on-device AI inference toolkit: supports multimodality, hardware acceleration (CPU, GPU, iGPU, NPU), and a local UI across devices, delivering 2.5x acceleration and 100+ tokens/s decoding speed.
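As a rough illustration of what local inference through the SDK can look like: assuming `nexa server` is running a model locally and exposing an OpenAI-compatible endpoint (the host, port, and model name below are assumptions; the project README has the exact invocation), a few lines of Python are enough to query it:

```python
import requests

# Assumes a Nexa SDK local server is running and exposes an
# OpenAI-compatible chat endpoint; host, port, and model name are
# assumptions; check the Nexa SDK README for the exact invocation.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "llama3.2",
        "messages": [
            {"role": "user", "content": "Summarize on-device AI in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```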

We also launched the Tiny Model Hub, a curated list of quantized text, image, audio, and multimodal models supported by Nexa SDK, and a Small Language Model Leaderboard benchmarked using Nexa Eval.

AudioLM on Edge

Nexa SDK is the first inference toolkit to enable edge deployment of Qwen2-Audio, a small-scale SOTA model with support for audio and text inputs.

  • Partnership with Qwen, integration with ModelScope
  • Reposted by Qwen and Google

Local RAG

Privacy-conscious local document interaction with competitive speed, outstanding information retrieval, and strong context understanding.
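To make the retrieve-then-generate pattern concrete, here is a minimal, self-contained sketch. TF-IDF stands in for a local embedding model, and the corpus and prompt format are purely illustrative; this is not Nexa's pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy local corpus standing in for a user's private documents.
docs = [
    "Q3 revenue grew 18% year over year, driven by the mobile segment.",
    "The on-device inference roadmap targets NPU acceleration in 2025.",
    "Headcount increased from 24 to 41 employees during 2024.",
]

query = "How fast did revenue grow?"

# 1) Retrieve: embed documents and query (TF-IDF here as a stand-in for
#    a local embedding model) and rank by cosine similarity.
vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)
query_vec = vectorizer.transform([query])
best = cosine_similarity(query_vec, doc_vecs).argmax()

# 2) Generate: assemble a grounded prompt for the local LLM. Everything
#    stays on-device; no document text leaves the machine.
prompt = f"Answer using only this context:\n{docs[best]}\n\nQuestion: {query}"
print(prompt)  # hand this to the on-device model, e.g. via Nexa SDK
```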

Comprehensive LLM Literature Review

A comprehensive review of the development of on-device language models, covering efficient architectures, compression techniques, hardware acceleration strategies, edge-cloud deployment approaches, and case studies.

  • Open-source with 1K+ stars on GitHub
  • Collaboration with a scientist from Meta AI

Success Stories

AMD

RAG (Retrieval-Augmented Generation) + Actions: Quickly retrieve, summarize, and visualize insights from documents and presentations with a seamless, sub-second user experience.

  • AMD Hardware Acceleration utilizing AMD Radeon 880M iGPU and AMD Ryzen AI 7 Pro 360
  • Showcased at AMD Advancing AI 2024
  • Event Planner accelerated by cutting-edge NPU, GPU, and CPU technologies from AMD; joint showcase at the AMD booth at CES 2025

Lenovo

AI Buddy: A real-time, voice-based AI assistant delivering conversational AI agent (tool use) capabilities with sub-second latency.

PinAI

A hybrid AI operating system with edge-cloud deployment and natural language interactions, ensuring maximum privacy and responsive performance across applications: sub-second response times (35x faster than Llama 3, 4x faster than GPT-4) with 70x better energy efficiency, while matching GPT-4's accuracy in function-calling.

Developer Community

Nexa SDK

  • Reached 5.2k stars on GitHub in 4 months
  • Endorsed by developers at industry-leading companies, startups, and indie teams

Hugging Face Models

  • Listed among the most-liked models of 2024 on Hugging Face
  • 30k+ total Nexa model downloads on Hugging Face, plus integrated support for other SOTA open-source models on Hugging Face through Nexa SDK
  • Ranked #2 among trending models on Hugging Face
  • No. 1 paper of the day on Hugging Face

On-Device AI Events

  • AI Agent Hackathon
    • 200+ developers attended virtually; hosted a Super AI Agent Speaker Panel with industry leaders and selected the top 3 winning teams of the Super AI Agent Hackathon
  • 20+ other events hosted and attended:
    • GenAI Goes Local, TechCrunch Disrupt 2024, AI Hot 100 Conference ...

On-Device AI Community

1K+ on-device AI community members across Discord and WeChat

Looking Ahead to 2025

The future of AI is on-device: real-time, privacy-first, and decentralized. Nexa AI is proud to continue pioneering in this space. We're committed to pushing the boundaries of on-device AI and fostering a thriving community.

Join us at CES 2025 to experience our latest innovations, including our collaborative showcase with AMD. Visit us at LVCC North Hall, #9177 or schedule a quick call with us to explore how Nexa AI can help transform your on-device AI use cases.
