The paper introduces ThorV2, a novel architecture designed to enhance the function calling capabilities of Large Language Models (LLMs). The study evaluates ThorV2 against leading models from OpenAI and Anthropic using a comprehensive benchmark focused on HubSpot CRM operations. The results demonstrate ThorV2's superior performance in accuracy, reliability, latency, and cost efficiency for both single and multi-API calling tasks.
ThorV2 Architecture
ThorV2 employs an innovative approach called "edge-of-domain modeling," which focuses on correcting errors rather than providing comprehensive upfront instructions. This method significantly reduces token count, improves scalability, and enhances reliability.
Agent-Validator Architecture
ThorV2 uses Domain Expert Validators (DEVs) to inspect and correct API calls generated by the LLM. This iterative process continues until a correct API call is generated, ensuring high accuracy and reliability.
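The generate, inspect, and correct loop described above can be sketched as follows. This is a minimal illustration, not ThorV2's actual implementation: `run_with_validators`, `require_email`, `fake_llm`, the `contacts.create` endpoint name, and the `max_rounds` cap are all hypothetical names and choices introduced here. The source says only that iteration continues until a correct call is produced; the cap is an added assumption so the sketch always terminates.

```python
from typing import Callable, Optional

ApiCall = dict
# A validator returns None when the call is acceptable, else a correction message.
Validator = Callable[[ApiCall], Optional[str]]
# A generator takes the task and the feedback gathered so far, and returns a call.
Generator = Callable[[str, list], ApiCall]


def require_email(call: ApiCall) -> Optional[str]:
    """Hypothetical validator: creating a contact must include an email property."""
    if call.get("endpoint") == "contacts.create" and "email" not in call.get("properties", {}):
        return "contacts.create requires an 'email' property"
    return None


def run_with_validators(
    generate: Generator,
    validators: list,
    task: str,
    max_rounds: int = 3,  # assumption: bound the loop instead of iterating forever
) -> ApiCall:
    feedback: list = []
    for _ in range(max_rounds):
        call = generate(task, feedback)
        # Collect every correction message raised by the validators.
        errors = [msg for v in validators if (msg := v(call)) is not None]
        if not errors:
            return call
        # Feed only the corrections back, rather than full upfront instructions.
        feedback.extend(errors)
    raise RuntimeError(f"no valid call after {max_rounds} rounds: {feedback}")


def fake_llm(task: str, feedback: list) -> ApiCall:
    """Stand-in for a model: omits the email until a validator complains."""
    call = {"endpoint": "contacts.create", "properties": {"firstname": "Ada"}}
    if feedback:
        call["properties"]["email"] = "ada@example.com"
    return call


result = run_with_validators(fake_llm, [require_email], "Create a contact for Ada")
```

Here the first attempt fails validation, the correction message is passed back, and the second attempt succeeds. The prompt carries only the errors actually encountered.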
Composite Planning
For multi-step tasks, ThorV2 uses a composite planning approach that generates multiple API calls in a single step, reducing latency and improving efficiency.
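One way to picture composite planning is a plan that lists every API call up front and is then executed without further model round trips. The sketch below assumes a `$step.field` placeholder syntax for passing one step's result to a later step. That syntax, the `Step` class, `execute_plan`, `fake_api`, and the endpoint names are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    name: str       # label that later steps can reference
    endpoint: str   # hypothetical endpoint identifier
    args: dict = field(default_factory=dict)


def resolve(value: Any, results: dict) -> Any:
    """Replace a '$step.field' placeholder with the value from an earlier result."""
    if isinstance(value, str) and value.startswith("$"):
        step_name, _, key = value[1:].partition(".")
        return results[step_name][key]
    return value


def execute_plan(plan: list, call_api: Callable[[str, dict], dict]) -> dict:
    """Run all steps in order; no model call happens between steps."""
    results: dict = {}
    for step in plan:
        args = {k: resolve(v, results) for k, v in step.args.items()}
        results[step.name] = call_api(step.endpoint, args)
    return results


def fake_api(endpoint: str, args: dict) -> dict:
    """Stand-in for a CRM backend, used only to make the sketch runnable."""
    if endpoint == "contacts.create":
        return {"id": "c-1", **args}
    return {"ok": True, **args}


# The whole plan is produced in one generation step, then executed.
plan = [
    Step("contact", "contacts.create", {"email": "ada@example.com"}),
    Step("deal", "deals.create", {"contact_id": "$contact.id", "amount": 500}),
]
results = execute_plan(plan, fake_api)
```

Because the second step refers to the first step's output by placeholder, the model does not have to be consulted again after the contact is created. This is the latency saving that the summary attributes to generating multiple calls in a single step.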
Benchmark and Evaluation
The study uses a benchmark dataset based on HubSpot CRM operations and evaluates each model on four metrics: accuracy, reliability, latency, and cost. ThorV2 is reported to outperform the OpenAI and Anthropic comparison models on all four, for both single and multi-API calling tasks.
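To make the metrics concrete, the sketch below aggregates per-run records into accuracy, mean latency, and token cost. The `Run` fields, the `summarize` function, the per-token price, and the sample values are placeholders invented for illustration and are not results from the study. Reliability is left out because this summary does not say how the paper measures it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    correct: bool      # did the generated API call match the expected one?
    latency_s: float   # wall-clock seconds for the run
    tokens: int        # total tokens consumed by the run


def summarize(runs: list, usd_per_1k_tokens: float) -> dict:
    """Aggregate per-run records into benchmark-style summary metrics."""
    return {
        "accuracy": sum(r.correct for r in runs) / len(runs),
        "mean_latency_s": mean(r.latency_s for r in runs),
        "cost_usd": sum(r.tokens for r in runs) / 1000 * usd_per_1k_tokens,
    }


# Placeholder data, not measurements from the paper.
runs = [Run(True, 1.0, 1000), Run(False, 3.0, 3000)]
summary = summarize(runs, usd_per_1k_tokens=0.5)
```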
Conclusion
ThorV2 represents a significant advance in LLM function calling. Its reported gains in accuracy, reliability, latency, and cost efficiency point to a promising direction for making LLMs more practical in real-world use, and the study suggests that the approach could enable more capable and reliable AI assistants across various domains.