All posts

  • Published on
    An analysis of the DeepSeek-R1-0528 model release, detailing its key improvements including enhanced benchmark performance, reduced hallucinations, improved front-end capabilities, and the addition of JSON output and function calling support. The post explores the significance of these updates for users and developers within the DeepSeek ecosystem.
  • Published on
    This post explores the X-MAS framework, which investigates the benefits of using diverse Large Language Models (LLMs) within multi-agent systems (MAS). It details X-MAS-Bench, a comprehensive testbed evaluating 27 LLMs across 5 domains and 5 MAS functions, revealing that no single LLM excels universally. Building on these findings, the X-MAS paper reports significant performance improvements (up to 47-63% on challenging math problems) when moving from homogeneous to heterogeneous MAS configurations, highlighting the potential of drawing on the collective intelligence of diverse LLMs.
  • Published on
    Google I/O '25 unleashed a torrent of AI innovation. Dive into the enhanced Gemini 2.5 Pro, the immersive Google Beam video platform, the creative Lyria RealTime music AI, the powerful TPU Ironwood, and groundbreaking updates to Meet and Search.
  • Published on
    A recent study reveals that while collaborating with generative AI enhances immediate task performance, this benefit doesn't carry over to subsequent solo work. More importantly, transitioning from AI-assisted tasks to independent work leads to a significant decrease in human workers' intrinsic motivation and an increase in boredom, even though it may increase their sense of control.
  • Published on
    Explore the history and technical approaches in the quest for Artificial General Intelligence, from early symbolic AI and expert systems to deep learning and probabilistic programming, illustrated by real-world applications like nuclear monitoring.