VLM

Published on Jun 1, 2025 · 10 min · 0 Comments
ZeroGUI: Automating GUI Agent Training with Zero Human Cost
This post explores ZeroGUI, an online learning framework that eliminates the need for manual data annotation to train GUI agents, achieving significant performance improvements through automated task generation and reward estimation using Vision-Language Models.
Published on Dec 28, 2024 · 2 min · 0 Comments
Document Inlining: Crossing the Modality Gap with Compound AI
This blog post from Fireworks.ai introduces Document Inlining, a new compound AI system designed to enhance Large Language Model (LLM) interaction with non-textual data like PDFs and images.