Earth & Beyond

    A new AI coding challenge just published its first results – and they aren’t pretty

    By Earth & Beyond | July 24, 2025 | 3 Mins Read

    A new AI coding challenge has revealed its first winner, and set a new bar for AI-powered software engineers.

    On Wednesday at 5pm PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive a $50,000 prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

    “We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

    Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

    Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub to measure how well they can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
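
The timed entry idea can be illustrated with a short Python sketch. This is not the K Prize organizers' code: the `Issue` record, the `build_test_set` helper, the example repositories, and the year attached to the March 12th deadline are all assumptions made for illustration. The sketch only shows the general principle of keeping issues flagged after the submission cutoff, so that no submitted model could have trained on them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue (not a real API type)."""
    repo: str
    number: int
    flagged_on: date


def build_test_set(issues, submission_deadline):
    """Keep only issues flagged strictly after the submission deadline."""
    return [issue for issue in issues if issue.flagged_on > submission_deadline]


# Assumption: the article gives only "March 12th"; the year here is a guess.
DEADLINE = date(2025, 3, 12)

# Made-up candidate issues for illustration only.
candidates = [
    Issue("example/repo-a", 101, date(2025, 2, 27)),  # before the deadline
    Issue("example/repo-b", 202, date(2025, 3, 12)),  # on the deadline
    Issue("example/repo-c", 303, date(2025, 4, 2)),   # after the deadline
]

test_set = build_test_set(candidates, DEADLINE)
print([(issue.repo, issue.number) for issue in test_set])
```

Only the third issue survives the filter here. Excluding issues flagged on the deadline day itself is a conservative choice in this sketch, not something the article specifies.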

    The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

    “As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

    It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available – but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

    “I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

    For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination free SWE-Bench, that’s the reality check for me.”
