AlphaGo Moment for Model Architecture Discovery (arxiv.org)

29 points | by Jimmc414 18 hours ago

3 comments

  • Jimmc414 18 hours ago
    This could be a very big paper if its claims are reproducible. Like approaching "Attention Is All You Need" big.

    They discovered 106 new state-of-the-art linear attention architectures through a fully autonomous AI research loop. The authors are making comparisons to AlphaGo’s move 37.
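
    For reference, "linear attention" here means replacing softmax attention's O(n^2) score matrix with a kernel feature map, so key-value statistics can be aggregated in O(n). A minimal non-causal sketch in NumPy; the feature map phi is my own stand-in, not anything from the paper:

      import numpy as np

      def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
          # phi is a positive feature map; the discovered architectures
          # differ largely in design choices like this one (stand-in here).
          Qp, Kp = phi(Q), phi(K)        # both (n, d)
          KV = Kp.T @ V                  # (d, d_v) summary, never an n x n matrix
          Z = Qp @ Kp.sum(axis=0)        # (n,) normalizer
          return (Qp @ KV) / Z[:, None]  # (n, d_v)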

    • yorwba 17 hours ago
      The part that is in principle amenable to replication is where they throw a lot of stuff at the wall and see what sticks. The part where they hype their own work, on the other hand... as a rule of thumb, if this really were a breakthrough on the level of AlphaGo, they wouldn't have to make that comparison themselves, someone else would be impressed enough to do it for them.
    • rafaelero 12 hours ago
      Let's definitely wait for replication, but I am honestly not that surprised that it works. I am surprised it took so long for people to give it a real try. It's such an ideal scenario: every experiment is conducted inside the computer, so there is no need to gather data in the real world, which is the pain point for most experiments in science. The LLM is therefore free to try a lot of different combinations and learn in real time what works and what doesn't.
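
      As a sketch of what "experiments inside the computer" can look like (every name here is hypothetical; this is not the paper's actual system):

        import heapq

        def search(llm, train_and_score, steps=1000, pool_size=50):
            # Hypothetical LLM-driven loop: propose, evaluate cheaply,
            # keep the fittest. propose_architecture is a stand-in name.
            pool = []                              # (score, description) pairs
            for _ in range(steps):
                parents = heapq.nlargest(5, pool)  # condition on the best so far
                candidate = llm.propose_architecture(parents)
                score = train_and_score(candidate) # small proxy training run
                heapq.heappush(pool, (score, candidate))
                if len(pool) > pool_size:
                    heapq.heappop(pool)            # drop the weakest survivor
            return max(pool)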
    • constantcrying 1 hour ago
      >This could be a very big paper if its claims are reproducible. Like approaching attention is all you need big.

      If it were, it would have one of the worst titles imaginable and one of the worst abstracts of any such paper.

      It is a serious red flag when a paper spends this much effort telling you how important it is. It also essentially pre-writes the headlines for journalists to use, which makes the whole thing look, at the very least, like a PR stunt.

  • BoiledCabbage 12 hours ago
    Interesting paper - it will be fascinating to see if it pans out.

    The one thing I didn't see, and that would be good to have, is some validation that the architecture(s) that perform best on large models are the same ones that perform best on small models.

    I.e., validating the assumption that you can use small models with small amounts of training/compute to determine the best architecture for large models and high training budgets. A rank-correlation check like the sketch at the end of this comment would be one way to test that.

    Even if it doesn't translate, it would still be very cool to be able to quickly evolve better small models (1M to 400M params), but I believe the implied goal (and what everyone wants) is for this exploration and discovery of novel architectures to be applicable to the really big models as well.

    If you could only AI-discover larger models by spending OpenAI/Anthropic/... budgets per exploration, then we're not really gaining much in terms of novel ideas, as the cost (time and budget) would be prohibitive.
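
    The check I have in mind is cheap to state: rank the candidate architectures by their small-scale scores and by their large-scale scores, and see whether the rankings agree. A sketch with made-up numbers (not from the paper):

      from scipy.stats import spearmanr

      # Hypothetical per-architecture eval scores at two scales.
      scores_small = [0.62, 0.58, 0.71, 0.65, 0.55]  # e.g. ~20M-param proxies
      scores_large = [0.74, 0.70, 0.83, 0.73, 0.69]  # same archs, scaled up
      rho, p = spearmanr(scores_small, scores_large)
      print(f"rank correlation: {rho:.2f} (p = {p:.3f})")
      # A high rho is what would justify searching at small scale.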

  • supermdguy 10 hours ago
    Interesting work. Not super familiar with neural architecture search, but how do they ensure they’re not overfitting to the test set? Seems like they’re evaluating each model on the test set, and using that to direct future evolution. I get that human teams will often do the same, but wouldn’t the overfitting issues be magnified a lot by doing thousands of iterations of this?
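
    For what it's worth, the standard guard looks like the sketch below: selection pressure acts only on a validation split, and the held-out test split is touched exactly once at the end (function names hypothetical; I don't know which protocol the paper used):

      def run_search(candidates, val_set, test_set, train, evaluate):
          # With thousands of iterations even val_set gets overfit by the
          # search, but test_set stays clean for the final reported number.
          best, best_val = None, float("-inf")
          for arch in candidates:
              model = train(arch)                   # hypothetical trainer
              val_score = evaluate(model, val_set)  # drives the evolution
              if val_score > best_val:
                  best, best_val = model, val_score
          return evaluate(best, test_set)           # reported once, at the end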