Using the Matrix Cores of AMD RDNA 4 architecture GPUs

(gpuopen.com)

80 points | by ibobev 3 days ago

2 comments

  • semessier 11 hours ago
    ROCm, PyTorch?
    • roenxi 4 hours ago
      People go a bit crazy about CUDA, ROCm and PyTorch, but I've been watching for a few years and have seen no evidence whatsoever that they are serious blockers. PyTorch does work on AMD cards and whatever ROCm can't do doesn't seem to be important because no-one has articulated why they need it in my line of sight. By far AMD's biggest problem is that their Linux kernel drivers historically don't seem to be able to handle GEMM workloads without kernel panics.

      Having some senior engineers taking a public interest in putting up this sort of article is rather exciting. I'm not going to give AMD the benefit of the doubt after their horrific performance in the 2010s and early 2020s but observing from a safe distance - they do look like they're on the right track and possibly even a fair way down the path to getting into the game.

      • fancyfredbot 1 hour ago
        You seem to be saying AMD GPUs can run PyTorch but can't run GEMM? Can you explain? I thought PyTorch used GEMM extensively.

        I also don't understand the comment on "whatever ROCm can't do doesn't seem to be important because no-one has articulated why they need it in my line of sight". Isn't the problem with ROCm the lack of support? It's only officially supported on a tiny proportion of AMD's product line?
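
To make the GEMM point concrete, here is a minimal pure-Python sketch, not PyTorch's actual implementation. It assumes the documented fully connected layer formula `y = x @ W^T + b`; the helper names `gemm` and `linear_forward` are hypothetical and only serve the illustration. It shows that the forward pass of such a layer is one matrix multiply plus a bias add.

```python
def gemm(a, b):
    """Naive GEMM: multiply a (m x k) by b (k x n) and return an m x n matrix."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def linear_forward(x, weight, bias):
    """Fully connected layer forward pass: y = x @ weight^T + bias.

    x is (batch x in_features), weight is (out_features x in_features),
    bias has out_features entries. Hypothetical helper, for illustration only.
    """
    weight_t = [list(col) for col in zip(*weight)]  # transpose the weights
    y = gemm(x, weight_t)                           # the GEMM does the heavy lifting
    return [[value + b for value, b in zip(row, bias)] for row in y]


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6]]
    weight = [[1, 0, 0], [0, 1, 1]]
    bias = [10, 20]
    print(linear_forward(x, weight, bias))
```

On a GPU the same product would be dispatched to a tuned GEMM kernel rather than Python loops, which is why a driver that falls over on GEMM would also affect most PyTorch workloads.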

        • benreesman 1 hour ago
          Parent seems to be saying that stability issues on consumer RDNA cards are the issue as opposed to ROCm support in PyTorch.
          • imtringued 12 minutes ago
            My pet theory is that the scheduler can't handle desktop graphics + ML workloads simultaneously, which leads to a deadlock in the firmware.
  • incomingpain 3 hours ago
    ROCm support for RDNA4 cards in Ubuntu is very poor. Worse yet, it's looking like Ubuntu 25.10 isn't going to make anything better.
    • benreesman 53 minutes ago
      There's a fundamental tension between "conservative desktop-origin mass appeal Linux distribution" and "extreme performance hardware accelerated numerics coprocessor". In the places where a mass market need is obvious (local LLM) solutions are emerging with vibrant and diverse back end options (gguf).

      I think it's OK to install stuff to get extreme scientific compute performance.

      I use NixOS BTW.

      • incomingpain 48 minutes ago
        >There's a fundamental tension between "conservative desktop-origin mass appeal Linux distribution" and "extreme performance hardware accelerated numerics coprocessor". In the places where a mass market need is obvious (local LLM) solutions are emerging with vibrant and diverse back end options (gguf).

        /me urgently awaits devstral 2507 on ollama

        >I use NixOS BTW.

        As a desktop environment? Tell me more! Please!

    • veber-alex 41 minutes ago
      I thought you can just get all the stuff you need in a docker container for both AMD and NVIDIA.

      What does the OS even matter?

    • hardolaf 1 hour ago
      I'm going to say this in a not nice way: that's a you problem.

      You willingly use a distribution which purposely ships out of date software based on some misguided philosophical belief that such a behavior makes the system better or more stable. In reality, it just means that you're running out of date software with security vulnerabilities, bad driver support, and even worse distribution maintainer half-assed patches to fix the aforementioned vulnerabilities.

      I'm not saying that you should switch to Arch Linux, but there is a wide gap between RHEL and Debian based distributions and a continuously rolling distribution. There are distributions that update weekly, biweekly, monthly, quarterly, etc.

      • cpgxiii 48 minutes ago
        AMD officially supports precisely three Linux platforms for current ROCm:

        1. Ubuntu 24.04.4 with kernel 6.11

        2. Ubuntu 22.04.5 with kernel 6.8

        3. RHEL 9.6 with kernel 5.14

        Anything else, like your preferred rolling release distribution, is entirely on your own.
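
As a rough sketch, the list above can be turned into a quick self-check. The support matrix below is copied from this comment and has not been verified against AMD's documentation; the function names are hypothetical. It assumes the standard `/etc/os-release` fields `ID` and `VERSION_ID`, which carry only the major.minor version (for example `24.04`), so the point release is not checked.

```python
import platform

# Support matrix exactly as listed in the comment above (assumption: unverified).
# Tuples are (os-release ID, VERSION_ID, kernel major.minor).
SUPPORTED = [
    ("ubuntu", "24.04", "6.11"),
    ("ubuntu", "22.04", "6.8"),
    ("rhel", "9.6", "5.14"),
]


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_listed(os_id, version_id, kernel_release):
    """Return True if distro, version and kernel major.minor match a listed row."""
    kernel_mm = ".".join(kernel_release.split(".")[:2])
    return (os_id, version_id, kernel_mm) in SUPPORTED


if __name__ == "__main__":
    with open("/etc/os-release", encoding="utf-8") as fh:
        info = parse_os_release(fh.read())
    listed = is_listed(info.get("ID", ""), info.get("VERSION_ID", ""), platform.release())
    print("on the list" if listed else "not on the list; you are on your own")
```

A rolling release such as Arch has no `VERSION_ID` at all, so it falls through to the "not on the list" branch regardless of kernel version.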

      • incomingpain 51 minutes ago
        >I'm going to say this in a not nice way: that's a you problem.

        I always prefer this.

        >I'm not saying that you should switch to Arch Linux,

        Especially when Arch isn't supported at all by any version and is quite likely to not even work as a video card. Manjaro is also not supported.

        >but there is a wide gap between RHEL and Debian based distributions and a continuously rolling distribution. There are distributions that update weekly, biweekly, monthly, quarterly, etc.

        RHEL seems to be up to date; the RHEL from May is well supported. I have tested out Alma in VMs, but I haven't used even Fedora or CentOS in ages.

        • hardolaf 16 minutes ago
          I will agree that RHEL has gotten better about upgrading software when they do minor releases but I'm still painfully aware of the pre-9.X days when they would release a new version and the software was already a year out of date.

          I personally used Fedora for a long time at the same time as I ran Arch Linux on servers. I honestly couldn't really tell the difference as long as I was updating Fedora every time a version bump came out. The release cadence was fast enough that it never caused problems. I ended up switching to it for my home devices entirely. Though now I run SteamOS and CachyOS because they're Arch without the headaches of Arch.