    Meet The AI Agent With Multiple Personalities

    By Earth & Beyond | April 16, 2025 | 3 Mins Read

    In the coming years, agents are widely expected to take over more and more chores on behalf of humans, including using computers and smartphones. For now, though, they’re too error-prone to be of much use.

    A new agent called S2, created by the startup Simular AI, combines frontier models with models specialized for using computers. The agent achieves state-of-the-art performance on tasks like using apps and manipulating files, and its results suggest that turning to different models in different situations may help agents advance.

    “Computer-using agents are different from large language models and different from coding,” says Ang Li, cofounder and CEO of Simular. “It’s a different type of problem.”

    In Simular’s approach, a powerful general-purpose AI model, like OpenAI’s GPT-4o or Anthropic’s Claude 3.7, is used to reason about how best to complete the task at hand, while smaller open-source models step in for tasks like interpreting web pages.

    Li, who was a researcher at Google DeepMind before founding Simular in 2023, explains that large language models excel at planning but aren’t as good at recognizing the elements of a graphical user interface.
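
To make the division of labor concrete, here is a minimal sketch of a planner handing steps to a separate grounding component. It is not Simular's code: `plan_steps`, `locate_element`, `run_agent`, and `UIElement` are hypothetical names, and both "models" are simple stand-ins so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """Hypothetical description of one clickable item on the screen."""
    label: str
    x: int
    y: int


def plan_steps(task: str) -> List[str]:
    """Stand-in for the general-purpose model that plans the task.

    A real planner would call a large language model; here the task is
    simply split on the word "then" (an assumption for illustration).
    """
    return [f"click {part.strip()}" for part in task.split(" then ")]


def locate_element(step: str, screen: List[UIElement]) -> Tuple[int, int]:
    """Stand-in for the smaller specialist model that reads the GUI.

    It maps a planned step to screen coordinates by matching labels.
    """
    target = step.removeprefix("click ").strip().lower()
    for element in screen:
        if element.label.lower() == target:
            return (element.x, element.y)
    raise LookupError(f"no element matching {target!r}")


def run_agent(task: str, screen: List[UIElement]) -> List[Tuple[str, Tuple[int, int]]]:
    """Plan with one component, then ground each step with the other."""
    return [(step, locate_element(step, screen)) for step in plan_steps(task)]


screen = [UIElement("File", 10, 5), UIElement("Save", 10, 40)]
actions = run_agent("File then Save", screen)
```

The point of the split is that each component can be swapped independently: a stronger planner or a better GUI reader can be dropped in without touching the other.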

    S2 is designed to learn from experience with an external memory module that records actions and user feedback and uses those recordings to improve future actions.
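
One way to picture such a memory module is a log of past attempts that the agent consults before acting. The sketch below is an assumption-laden illustration, not a description of S2's internals: `Episode`, `ExperienceMemory`, `record`, and `recall` are hypothetical names, and exact task matching stands in for whatever retrieval the real system uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Episode:
    """One recorded attempt: the task, the actions taken, and user feedback."""
    task: str
    actions: List[str]
    succeeded: bool


@dataclass
class ExperienceMemory:
    """Hypothetical external memory that stores episodes outside the model."""
    episodes: List[Episode] = field(default_factory=list)

    def record(self, task: str, actions: List[str], succeeded: bool) -> None:
        # Copy the action list so later changes by the caller do not alter history.
        self.episodes.append(Episode(task, list(actions), succeeded))

    def recall(self, task: str) -> Optional[List[str]]:
        """Return the actions of the most recent successful attempt at this task."""
        for episode in reversed(self.episodes):
            if episode.task == task and episode.succeeded:
                return list(episode.actions)
        return None


memory = ExperienceMemory()
memory.record("book flight", ["open browser", "search flights"], succeeded=False)
memory.record("book flight", ["open airline app", "pick date", "pay"], succeeded=True)
reused_plan = memory.recall("book flight")
unknown_plan = memory.recall("order pizza")
```

Here a failed attempt is kept in the log but never reused, which is one simple way user feedback could shape future actions.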

    On particularly complex tasks, S2 performs better than any other model on OSWorld, a benchmark that measures an agent’s ability to use a computer operating system.

    For example, S2 can complete 34.5 percent of tasks that involve 50 steps, beating OpenAI’s Operator, which can complete 32 percent. Similarly, S2 scores 50 percent on AndroidWorld, a benchmark for smartphone-using agents, while the next best agent scores 46 percent.

    Victor Zhong, a computer scientist at the University of Waterloo in Canada and one of the creators of OSWorld, believes that future big AI models may incorporate training data that helps them understand the visual world and make sense of graphical user interfaces.

    “This will help agents navigate GUIs with much higher precision,” Zhong says. “I think in the meantime, before such fundamental breakthroughs, state-of-the-art systems will resemble Simular in that they combine multiple models to patch the limitations of single models.”

    To prepare for this column, I used Simular to book flights and scour Amazon for deals, and it seemed better than some of the open-source agents I tried last year, including AutoGen and vimGPT.

    But even the smartest AI agents are, it seems, still troubled by edge cases and occasionally exhibit odd behavior. In one instance, when I asked S2 to help find contact information for the researchers behind OSWorld, the agent got stuck in a loop hopping between the project page and the login for OSWorld’s Discord.

    OSWorld’s benchmarks show why agents remain more hype than reality for now. While humans can complete 72 percent of OSWorld tasks, agents are foiled 38 percent of the time on complex tasks. That said, when the benchmark was introduced in April 2024, the best agent could complete only 12 percent of the tasks.
