For The First Time, You Can Run Qwen2-Audio On Your Device

While audio language models are becoming more popular, deploying them on edge devices remains challenging. Popular frameworks like llama.cpp and Ollama support text and vision models but have limited compatibility with audio models.

Qwen2-Audio is a SOTA small-scale multimodal model that handles audio and text inputs. It enables voice interaction without ASR modules, provides audio analysis, and supports Chinese, English, and major European languages.

Quantized versions of Qwen2-Audio, optimized for on-device deployment and applications, are available on 🤗 HuggingFace.

We're bringing Qwen2-Audio to edge devices with Nexa SDK

To get started, install Nexa SDK, then run this in your terminal:

nexa run qwen2audio

Or run it with the local Streamlit UI (Python package required):

nexa run qwen2audio -st

Drag and drop your audio file into the terminal (or enter the file path on Linux). Add a text prompt to guide the analysis, or leave the prompt empty to send your voice input directly to the model.
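If you would rather start the model from a script than type the command by hand, the two commands above can be wrapped in a few lines of Python. This is only a sketch: the helper names `build_nexa_command` and `launch` are made up for illustration, and the only CLI arguments used are the ones shown above (`nexa run qwen2audio` and the `-st` flag). It does not use any Nexa Python API.

```python
import shutil
import subprocess


def build_nexa_command(streamlit_ui: bool = False) -> list[str]:
    """Build the argument list for the commands shown in this post.

    Hypothetical helper: it only assembles `nexa run qwen2audio`,
    optionally followed by `-st` for the local Streamlit UI.
    """
    command = ["nexa", "run", "qwen2audio"]
    if streamlit_ui:
        command.append("-st")
    return command


def launch(streamlit_ui: bool = False) -> int:
    """Run the CLI in the current terminal and return its exit code."""
    # Fail early with a readable message if Nexa SDK is not installed.
    if shutil.which("nexa") is None:
        raise RuntimeError("nexa CLI not found on PATH; install Nexa SDK first")
    # No output capture: the session stays interactive, so you can still
    # drag and drop audio files and type prompts as described above.
    return subprocess.run(build_nexa_command(streamlit_ui)).returncode
```

Calling `launch()` starts the terminal session, and `launch(streamlit_ui=True)` starts the Streamlit variant.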

Quick Note

💻 To see how much RAM Qwen2-Audio needs on your device, check the RAM requirements listed for each quantization version. The default q4_K_M version requires 4.2GB of RAM.
🎵 For optimal performance, use 16kHz .wav audio format. Other audio formats and sample rates are supported and will be automatically converted to the required format.
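To check whether a file is already in the recommended format before you hand it to the model, you can read the sample rate from the WAV header. The sketch below uses only Python's standard `wave` module. The function names are illustrative, and it checks only the 16kHz sample rate mentioned above, nothing else about the file.

```python
import wave

TARGET_RATE_HZ = 16_000  # the sample rate recommended in the note above


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_recommended_format(path: str) -> bool:
    """True if the file is a readable PCM WAV at 16 kHz, False otherwise."""
    try:
        return wav_sample_rate(path) == TARGET_RATE_HZ
    except (wave.Error, EOFError, OSError):
        # Missing file, or not a PCM WAV that the standard library can parse.
        return False
```

A `False` result does not mean the file is unusable, since other formats and sample rates are converted automatically. It only tells you the file is not in the recommended format.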

Use Cases

Speech Processing & Understanding: meeting recording
Multimodal Chat: a person asking "why do you think cats sleep so much?"
Audio Analysis & Recognition: sound of typing on a keyboard
Music Analysis & Recognition: punk music (loud sound warning)
Transcription: meeting recording
Translation: Chinese audio

For more use cases and model capabilities, check out Qwen's blog.

For developers, server deployment and a Python interface are the next steps. Follow Nexa SDK for updates, and submit an issue for any feature request.

Kudos to the Nexa AI team.

Blog written by Kai and Ayla.
