ThorV2: An Architecture for Enhancing LLM Function Calling Capabilities

The paper introduces ThorV2, an architecture from Floworks designed to improve the function calling capabilities of Large Language Models (LLMs), that is, their ability to turn a user request into correct calls to external APIs. The study compares ThorV2 with leading models from OpenAI and Anthropic on a benchmark built around HubSpot CRM operations. According to the reported results, ThorV2 outperforms these models in accuracy, reliability, latency, and cost efficiency on both single-API and multi-API calling tasks.

ThorV2 Architecture

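ThorV2 employs an approach called "edge-of-domain modeling." Instead of giving the LLM comprehensive upfront instructions that cover every rule of the target domain, it focuses on correcting the errors the model actually makes. Because the prompt no longer has to describe the whole domain, this method reduces token count, scales better as more APIs are added, and, according to the authors, improves reliability.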
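The contrast between upfront instructions and error-driven correction can be sketched in Python. This is a minimal illustration under stated assumptions, not ThorV2's implementation: the rule codes and texts in `CORRECTION_RULES` and the functions `upfront_prompt` and `edge_of_domain_prompt` are hypothetical, and word count stands in for token count.

```python
# Minimal sketch with HYPOTHETICAL rules: contrasts sending every rule in the prompt
# with adding only the rules for errors that were actually observed.
CORRECTION_RULES: dict[str, str] = {
    "unknown_property": "Use internal property names such as 'firstname', not display labels such as 'First Name'.",
    "missing_object_type": "State the CRM object type (for example 'contacts') explicitly.",
    "bad_association": "Create both records before associating them.",
}


def upfront_prompt(task: str) -> str:
    """Baseline: every rule is sent with every request, needed or not."""
    return "\n".join([*CORRECTION_RULES.values(), task])


def edge_of_domain_prompt(task: str, observed_errors: list[str]) -> str:
    """Error-driven: only rules matching observed error codes are added; unknown codes are ignored."""
    rules = [CORRECTION_RULES[code] for code in observed_errors if code in CORRECTION_RULES]
    return "\n".join([*rules, task])


if __name__ == "__main__":
    task = "Create a contact named Ada Lovelace."
    # Word count is used here as a rough proxy for token count.
    print(len(upfront_prompt(task).split()), len(edge_of_domain_prompt(task, []).split()))
```

With no observed errors the edge-of-domain prompt is just the task itself; a rule costs prompt tokens only after the matching error has occurred.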

Agent-Validator Architecture

ThorV2 uses Domain Expert Validators (DEVs) to inspect each API call the LLM generates and to correct it when it is wrong. This inspect-and-correct cycle repeats until a correct API call is produced, which is how the architecture aims for high accuracy and reliability.
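The inspect-and-correct cycle can be sketched as a loop in Python. Everything here is a hypothetical stand-in: `fake_llm` imitates the model, `contact_dev` is a toy DEV that flags HubSpot display labels used in place of internal property names, and the `max_rounds` cap is an assumption added so the loop always terminates; the source does not say how ThorV2 bounds its iterations. The endpoint string only mirrors the shape of HubSpot's contacts API and is never called.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# HYPOTHETICAL mapping a contact DEV could use to spot display labels.
LABEL_TO_INTERNAL = {"First Name": "firstname", "Last Name": "lastname"}


@dataclass
class ApiCall:
    endpoint: str
    payload: dict = field(default_factory=dict)


def fake_llm(task: str, feedback: list[str]) -> ApiCall:
    """Stand-in for the LLM: errs on the first try, then applies the DEV's correction."""
    if any("'firstname'" in msg for msg in feedback):
        props = {"firstname": "Ada"}
    else:
        props = {"First Name": "Ada"}
    return ApiCall("/crm/v3/objects/contacts", {"properties": props})


def contact_dev(call: ApiCall) -> Optional[str]:
    """Toy DEV: returns a correction message, or None if the call passes."""
    for name in call.payload.get("properties", {}):
        if name in LABEL_TO_INTERNAL:
            return f"Use internal property name '{LABEL_TO_INTERNAL[name]}' instead of '{name}'."
    return None


def agent_validator_loop(
    task: str,
    llm: Callable[[str, list[str]], ApiCall],
    validators: list[Callable[[ApiCall], Optional[str]]],
    max_rounds: int = 3,  # assumption: a cap so the loop always terminates
) -> tuple[ApiCall, list[str]]:
    """Generate a call, let every DEV inspect it, and retry with their feedback."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        call = llm(task, feedback)
        errors = [msg for dev in validators if (msg := dev(call)) is not None]
        if not errors:
            return call, feedback
        feedback.extend(errors)
    raise RuntimeError(f"no valid API call after {max_rounds} rounds")


if __name__ == "__main__":
    call, feedback = agent_validator_loop("Create a contact named Ada.", fake_llm, [contact_dev])
    print(call.payload, feedback)
```

Here the first call uses the label "First Name", the DEV rejects it, and the second call with `firstname` passes.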

Composite Planning

For multi-step tasks, ThorV2 uses composite planning: instead of generating one API call, waiting for its result, and then planning the next, it generates multiple API calls in a single step. Fewer generation rounds mean lower latency and better efficiency.
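A composite plan can be pictured as one generated list of API calls in which later calls refer to the results of earlier ones. The sketch below assumes a placeholder syntax (`$<step>.<field>`), a `MockCRM` class, and endpoint names that are all hypothetical; it only shows how such a plan could be executed after a single planning step, without calling the LLM again between API calls.

```python
import re
from typing import Any

# HYPOTHETICAL placeholder syntax: "$1.id" means "the 'id' field of step 1's result".
PLACEHOLDER = re.compile(r"^\$(\d+)\.(\w+)$")


class MockCRM:
    """In-memory stand-in for a CRM API: each call is logged and gets a new record id."""

    def __init__(self) -> None:
        self.next_id = 100
        self.log: list[tuple[str, dict]] = []

    def execute(self, endpoint: str, body: dict) -> dict:
        self.next_id += 1
        self.log.append((endpoint, body))
        return {"id": str(self.next_id)}


def resolve(value: Any, results: list[dict]) -> Any:
    """Recursively replace placeholders with fields from earlier results."""
    if isinstance(value, str):
        match = PLACEHOLDER.match(value)
        return results[int(match.group(1))][match.group(2)] if match else value
    if isinstance(value, dict):
        return {k: resolve(v, results) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, results) for v in value]
    return value


def run_composite_plan(plan: list[dict], crm: MockCRM) -> list[dict]:
    """Execute, in order, every step of a plan produced in one LLM generation."""
    results: list[dict] = []
    for step in plan:
        results.append(crm.execute(step["endpoint"], resolve(step["body"], results)))
    return results


# A plan the LLM might emit in a single step (HYPOTHETICAL endpoint names).
PLAN = [
    {"endpoint": "companies.create", "body": {"name": "Acme"}},
    {"endpoint": "contacts.create", "body": {"firstname": "Ada"}},
    {"endpoint": "associations.create", "body": {"from": "$1.id", "to": "$0.id"}},
]

if __name__ == "__main__":
    crm = MockCRM()
    run_composite_plan(PLAN, crm)
    print(crm.log[-1])  # ('associations.create', {'from': '102', 'to': '101'})
```

A step-by-step agent would need one LLM round trip per API call here (three in total); the composite plan needs one, and the dependency between the association and the two created records is handled by placeholder resolution.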

Benchmark and Evaluation

The study uses a benchmark dataset based on HubSpot CRM operations and evaluates models on four metrics: accuracy, reliability, latency, and cost. According to the reported results, ThorV2 outperforms the comparison models on all four metrics in function calling tasks.
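The four metrics can be computed from repeated benchmark runs roughly as follows. This is a sketch under assumptions: the `Trial` record, the use of means for latency and cost, and in particular the definition of reliability as the share of queries answered correctly in every repeated run are illustrative choices; the paper's exact metric definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One run of one benchmark query (HYPOTHETICAL record layout)."""
    query_id: str
    correct: bool
    latency_s: float
    cost_usd: float


def _mean(values: list[float]) -> float:
    return sum(values) / len(values)


def summarize(trials: list[Trial]) -> dict[str, float]:
    """Aggregate repeated runs into accuracy, reliability, mean latency, and mean cost."""
    by_query: dict[str, list[Trial]] = {}
    for t in trials:
        by_query.setdefault(t.query_id, []).append(t)
    return {
        # Fraction of all runs that produced a correct result.
        "accuracy": _mean([float(t.correct) for t in trials]),
        # Assumed definition: fraction of queries that were correct in every run.
        "reliability": _mean([float(all(t.correct for t in runs)) for runs in by_query.values()]),
        "mean_latency_s": _mean([t.latency_s for t in trials]),
        "mean_cost_usd": _mean([t.cost_usd for t in trials]),
    }


if __name__ == "__main__":
    trials = [
        Trial("q1", True, 1.0, 0.01),
        Trial("q1", True, 2.0, 0.01),
        Trial("q2", True, 3.0, 0.01),
        Trial("q2", False, 2.0, 0.01),
    ]
    print(summarize(trials))
```

Separating accuracy from reliability in this way means a model that is right on average but inconsistent across repeated runs of the same query scores lower on reliability than on accuracy.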

Conclusion

ThorV2 represents a significant advance in LLM function calling. Its reported performance in accuracy, reliability, latency, and cost efficiency suggests a promising direction for making LLMs more practical in real-world applications, and the study highlights its potential to enable more capable and reliable AI assistants across various domains.

Source(s):

Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling