
Octo-planner

Nov 10, 2024

TL;DR

  • Efficient: Octo-planner is a 3.8B-parameter language model that can run locally on edge devices, addressing concerns about data privacy, latency, and availability.
  • Accurate: Octo-planner achieves 98%+ accuracy in breaking down user queries into actionable steps for on-device AI agents within a single domain.
  • Multi-Domain: Using multi-LoRA training, Octo-planner combines knowledge from different task areas, enabling it to handle complex and diverse queries across various domains (e.g., system actions and e-commerce actions simultaneously).
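The exact multi-LoRA training recipe isn't detailed here, but the core idea of combining several low-rank adapters into one weight update can be sketched numerically. The dimensions, seeds, and mixing weights below are illustrative assumptions, not values from our training:

```python
import numpy as np

def merge_lora_weights(base_weight, adapters, weights):
    """Combine several LoRA adapters into one update: W' = W + sum_i w_i * (B_i @ A_i).

    base_weight: (d_out, d_in) frozen base matrix
    adapters:    list of (B, A) low-rank pairs, B: (d_out, r), A: (r, d_in)
    weights:     per-adapter mixing coefficients
    """
    delta = sum(w * (B @ A) for w, (B, A) in zip(weights, adapters))
    return base_weight + delta

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))
# Two domain adapters, e.g. "system actions" and "e-commerce actions"
sys_adapter = (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
shop_adapter = (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
merged = merge_lora_weights(W, [sys_adapter, shop_adapter], [0.5, 0.5])
print(merged.shape)  # (8, 6)
```

Because each adapter is low-rank, storing one adapter per domain and merging them on demand costs far less memory than keeping a full fine-tuned model per domain.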

Evaluation

Octo-planner with Gemma 2B, Gemma 7B, and Phi-3 Mini as base models

We created a test dataset of 1,000 data points using GPT-4, consisting of diverse user queries and their corresponding action plans. We then used GPT-4 as an oracle to judge the correctness of generated plans, aiming to build a local planner that rivals closed-source cloud-based models in performance.

We tested full fine-tuning on several base models to assess performance. Microsoft Phi-3 Mini achieved 98.1% accuracy, Google Gemma 2B reached 85.6%, and Google Gemma 7B attained 99.7%. We chose Phi-3 Mini for Octo-planner as it strikes the best balance between model size and performance for on-device deployment.
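The evaluation loop reduces to counting the plans the oracle accepts. In this sketch, the planner and the exact-match oracle are hypothetical stand-ins for the fine-tuned model and the GPT-4 judge:

```python
def plan_accuracy(test_set, generate_plan, oracle_ok):
    """Fraction of queries whose generated plan the oracle accepts."""
    correct = sum(1 for query, reference in test_set
                  if oracle_ok(query, generate_plan(query), reference))
    return correct / len(test_set)

# Toy stand-ins: the real pipeline queries the fine-tuned planner and GPT-4.
test_set = [("set an alarm", ["open_clock", "create_alarm"]),
            ("email Bob",    ["open_mail", "compose"])]
generate_plan = lambda q: {"set an alarm": ["open_clock", "create_alarm"],
                           "email Bob":    ["open_mail"]}[q]
oracle_ok = lambda query, plan, reference: plan == reference
print(plan_accuracy(test_set, generate_plan, oracle_ok))  # 0.5
```

In practice the oracle is an LLM judge rather than exact match, since semantically equivalent plans can differ in wording.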

Introduction

AI agents require effective planning processes to determine the best course of action and execute planned tasks. At Nexa AI, we've made significant strides in this direction with our Octopus model series. We launched Octopus V2 for fast and accurate action taking, and Octopus V3 took a step forward to support multimodal and multilingual capabilities. Now, we're addressing the crucial planning aspect of AI agents with Octo-planner.

Training Methods

Prior to Octo-planner, AI agent planning typically relied on large language models like GPT-4 or Gemini-Pro. These models, while powerful, faced several limitations for on-device use:

  • High computational demands: LLMs require significant processing power, making them impractical for edge devices.
  • Privacy concerns: Cloud-based models necessitate sending user data off-device, raising data privacy issues.
  • Latency: Cloud-dependent planners introduce delays, hampering real-time applications.
  • Offline availability: Many existing planners cannot operate without an internet connection.
  • Cost: Using cloud-based LLMs for planning is often expensive, limiting widespread adoption.

These limitations created a need for an efficient, on-device planning solution that maintains high accuracy while operating within the constraints of edge devices.

Planner and Action Agents Framework

The planner-action agent structure

Octo-planner separates planning and action execution into two distinct components:

  • Planner Agent (Octo-planner): Decomposes user queries into a sequence of sub-steps.
  • Action Agent (Octopus model): Executes the planned sub-steps sequentially.

This separation allows for specialized optimization, improving modularity, adaptability, and scalability for different domains and task complexities. It enhances interpretability by making the decision-making process more transparent. Furthermore, the planner internalizes function descriptions during training, eliminating the need for lengthy context in each prompt. As a result, the planner-and-action agents framework significantly reduces computational demands and improves efficiency on resource-constrained devices.
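The two-agent loop described above can be sketched in a few lines. The planner and executor below are hypothetical stand-ins for the fine-tuned Octo-planner and the Octopus action model:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PlannerActionPipeline:
    planner: Callable[[str], List[str]]   # query -> ordered sub-steps
    executor: Callable[[str], str]        # one sub-step -> execution result

    def run(self, query: str) -> List[str]:
        steps = self.planner(query)                # planner agent decomposes the query
        return [self.executor(s) for s in steps]   # action agent runs steps in order

# Toy stand-ins for the two models:
toy_planner = lambda q: ["find_contact('Bob')", "send_message('Bob', 'hi')"]
toy_executor = lambda step: f"executed {step}"
pipeline = PlannerActionPipeline(toy_planner, toy_executor)
print(pipeline.run("text Bob hi"))
```

Because the two roles sit behind plain callables, either model can be swapped or retrained independently, which is the modularity benefit described above.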

The Planning Dataset

Training data for Octo-planner

To train Octo-planner, we developed a specialized dataset that pairs user queries with corresponding action plans. These plans are broken down into sequences of 1-5 steps, representing a range of task complexities. We leveraged GPT-4 to generate a diverse array of queries that align with our available functions, ensuring broad coverage of potential user requests.

Quality control was a key focus in our dataset creation process. We implemented a rigorous validation system, also using GPT-4, to assess and filter the generated data. This ensured that only high-quality, accurate query-response pairs were included in the final dataset. This approach teaches Octo-planner about function capabilities during training, allowing it to operate efficiently on devices without needing lengthy descriptions for each query.
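The generate-then-validate loop can be sketched as follows. Here `plan_for` and `validate` are hypothetical stand-ins for the two GPT-4 calls, and the step-count bound mirrors the 1-5 step plans described above:

```python
def build_dataset(queries, plan_for, validate, max_steps=5):
    """Pair each query with a generated plan, keeping only validated pairs."""
    dataset = []
    for query in queries:
        plan = plan_for(query)                     # generation step (GPT-4 in practice)
        if 1 <= len(plan) <= max_steps and validate(query, plan):  # filter step
            dataset.append({"query": query, "plan": plan})
    return dataset

# Toy stand-ins; the real pipeline uses GPT-4 for both generation and validation.
plan_for = lambda q: {"mute my phone": ["set_silent_mode(True)"],
                      "plan a trip":   []}[q]
validate = lambda q, plan: all("(" in step for step in plan)
data = build_dataset(["mute my phone", "plan a trip"], plan_for, validate)
print(len(data))  # 1 -- the empty plan is filtered out
```

Filtering at dataset-creation time is what lets the planner internalize function behavior, so no function descriptions need to be sent with each query at inference.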

What's Next

[Paper - Octo-planner: On-device Language Model for Planner-Action Agents]

Kudos to <Alex>, <Zack>, <Zhen>, <Yikang> and Nexa AI team.

Blog written by <Kai>.
