Nov 10, 2024
AI agents require effective planning processes to determine the best course of action and execute planned tasks. At Nexa AI, we've made significant strides in this direction with our Octopus model series. We launched Octopus V2 for fast and accurate action taking, and Octopus V3 took a step forward to support multimodal and multilingual capabilities. Now, we're addressing the crucial planning aspect of AI agents with Octo-planner.
Prior to Octo-planner, AI agent planning typically relied on large language models like GPT-4 or Gemini-Pro. These models, while powerful, faced several limitations for on-device use:

- Computational and memory demands far beyond what edge devices can run locally
- Latency from round-trips to cloud servers
- Privacy concerns from sending user data off-device
- Dependence on network connectivity and per-query API costs
These limitations created the need for a planning solution that runs efficiently on-device while maintaining high accuracy.
Octo-planner separates planning and action execution into two distinct components:

- A planner agent (Octo-planner) that decomposes a user query into a sequence of executable sub-steps
- An action agent (built on Octopus V2) that carries out each sub-step with the appropriate function calls
This separation allows for specialized optimization, improving modularity, adaptability, and scalability for different domains and task complexities. It enhances interpretability by making the decision-making process more transparent. Furthermore, the planner internalizes function descriptions during training, eliminating the need for lengthy context in each prompt. As a result, the planner-and-action agents framework significantly reduces computational demands and improves efficiency on resource-constrained devices.
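To make the separation concrete, here is a minimal sketch of the planner-action loop. The `planner` and `action_agent` objects and their `generate` method are hypothetical stand-ins for the two fine-tuned models, not our actual API:

```python
# Minimal sketch of the planner-action loop. `planner` and `action_agent`
# are hypothetical wrappers around the two fine-tuned models; the prompt
# format and `generate` method are assumptions for illustration.

def run_agent(query: str, planner, action_agent) -> list[str]:
    """Decompose a user query into sub-steps, then execute each one."""
    # The planner internalized the function descriptions during training,
    # so the prompt needs only the query itself.
    plan_text = planner.generate(query)
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]

    # The action agent (e.g., Octopus V2) maps each sub-step to a concrete
    # on-device function call.
    return [action_agent.generate(step) for step in steps]
```

Because each component is prompted and trained independently, either one can be swapped or re-tuned for a new domain without touching the other.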
To train Octo-planner, we developed a specialized dataset that pairs user queries with corresponding action plans. These plans are broken down into sequences of 1-5 steps, representing a range of task complexities. We leveraged GPT-4 to generate a diverse array of queries that align with our available functions, ensuring broad coverage of potential user requests.
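The generation step looks roughly like the sketch below, using the OpenAI API. The prompt wording and the function list are assumptions for this example, not the exact prompts we used:

```python
# Illustrative sketch of the query-generation step with the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

AVAILABLE_FUNCTIONS = """
take_a_photo(camera): capture a photo with the given camera
send_email(to, subject, body): send an email
set_timer(duration): set a countdown timer
"""  # hypothetical on-device functions, for illustration only

def generate_queries(n: int) -> list[str]:
    """Ask GPT-4 for n diverse user queries answerable with the functions above."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Given these on-device functions:\n{AVAILABLE_FUNCTIONS}\n"
                f"Write {n} diverse user queries, one per line, that can be "
                "fulfilled by composing 1-5 of these function calls."
            ),
        }],
    )
    return response.choices[0].message.content.splitlines()
```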
Quality control was a key focus in our dataset creation process. We implemented a rigorous validation system, also using GPT-4, to assess and filter the generated data. This ensured that only high-quality, accurate query-response pairs were included in the final dataset. This approach teaches Octo-planner about function capabilities during training, allowing it to operate efficiently on devices without needing lengthy descriptions for each query.
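Conceptually, the validation pass works like the sketch below: GPT-4 judges each candidate (query, plan) pair, and only pairs judged correct are kept. The judging prompt here is an assumption; the actual criteria are described in the paper.

```python
# Sketch of the GPT-4 validation pass over candidate (query, plan) pairs.
from openai import OpenAI

client = OpenAI()

def is_valid_pair(query: str, plan: str) -> bool:
    """Use GPT-4 as a judge: does the plan correctly fulfill the query?"""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Query: {query}\nProposed plan:\n{plan}\n\n"
                "Does this plan correctly and completely fulfill the query? "
                "Answer YES or NO."
            ),
        }],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")

def filter_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the pairs that pass the GPT-4 judge."""
    return [(q, p) for q, p in pairs if is_valid_pair(q, p)]
```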
To evaluate Octo-planner, we created a test dataset of 1,000 data points using GPT-4, consisting of diverse user queries and their corresponding action plans. We again used GPT-4 as an oracle to judge the correctness of the generated plans, aiming to create a local planner that can rival closed-source cloud-based models in performance.

We tested full fine-tuning on several base models:

| Base model | Planning accuracy |
| --- | --- |
| Google Gemma 2b | 85.6% |
| Google Gemma 7b | 99.7% |
| Microsoft Phi-3 Mini | 98.1% |

We chose Phi-3 Mini for Octo-planner: although Gemma 7b scores slightly higher, Phi-3 Mini strikes the best balance between model size and performance for on-device deployment.
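For reference, full fine-tuning with Hugging Face Transformers looks roughly like the condensed sketch below. The hyperparameters, dataset file, and tokenization format are placeholders, not our actual training recipe; see the paper for the real setup.

```python
# Condensed sketch of full fine-tuning with Hugging Face Transformers.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Each example pairs a user query with its 1-5 step action plan.
# "planner_dataset.json" is a placeholder path.
data = load_dataset("json", data_files="planner_dataset.json")["train"]

def tokenize(example):
    text = f"{example['query']}\n{example['plan']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=1024)

data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="octo-planner",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=3,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=data,
    # mlm=False gives standard causal-LM labels (labels = input_ids).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```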
[Paper - Octo-planner: On-device Language Model for Planner-Action Agents]

Kudos to Alex, Zack, Zhen, Yikang, and the Nexa AI team.

Blog written by Kai.