This is a good write up and I agree with pretty much all of it.
Two comments:
- LLVM IR is actually remarkably stable these days. I was able to rebase Fil-C from LLVM 17 to 20 in a single day of work. In other projects I’ve maintained an LLVM pass that worked across multiple LLVM versions, and it was straightforward to do.
- LICM register pressure is a big issue, especially when the source language isn’t C or C++. I don’t think the problem here is necessarily LICM. It might be that regalloc needs to be taught to rematerialize.
> It might be that regalloc needs to be taught to rematerialize
It knows how to rematerialize, and has for a long time, but the backend is generally more local/has less visibility than the optimizer. This causes it to struggle to consistently undo bad decisions LICM may have made.
> but the backend is generally more local/has less visibility than the optimizer
I don't really buy that. It's operating on SSA, so it has exactly the same view as LICM in practice (to my knowledge LICM doesn't cross function boundaries).
LICM can't possibly know the cost of hoisting; regalloc does have decent visibility into cost. That's why this feels like a regalloc remat problem to me.
There is a rematerialization pass; there is no real reason to couple it with register allocation. LLVM regalloc is already somewhat subpar.
What would be neat is to expose all the right knobs and levers so that frontend writers can benchmark a number of possibilities and choose the right values.
I can understand this is easier said than done of course.
> Rematerializing 'safe' computation from across a barrier or thread sync/wait works wonders.
While this is literally "rematerialization", it's such a different case of remat from what I'm talking about that it should be a different phase. It's optimizing for a different goal.
Also feels very GPU specific. So I'd imagine this being a pass you only add to the pipeline if you know you're targeting a GPU.
> Also loads and stores and function calls, but that's a bit finicky to tune. We usually tell people to update their programs when this is needed.
This also feels like it's gotta be GPU specific.
No chance that doing this on a CPU would be a speed-up unless it saved you reg pressure.
Given some of the discussions I've been stuck in over the past couple of weeks, one of the things I especially want to see built out for LLVM is a comprehensive executable test suite that starts not from C but from LLVM IR. If you've ever tried working on your own backend, one of the things you notice is that there's not much documentation of all the SelectionDAG machinery (or GlobalISel), and there is also a lot of semi-generic "support operation X on top of operation Y if X isn't supported" code. The precise semantics of X and Y aren't clearly documented, so it's quite easy to build the wrong thing.
> This is somewhat unsurprising, as code review … may not provide immediate value to the person reviewing (or their employer).
If you get “credit” for contributing when you review, maybe people (and even employers, though that is perhaps less likely) would find doing reviews to be more valuable.
Not sure what that looks like; maybe whatever shows up in GitHub is already enough.
Compile times are an issue, not only for LLVM itself but also for its users; Rust is a prime example. Rust has horrible compile times for anything larger, which makes it a real PITA to use.
It's amazing to me that this is trusted to build so much of our software. It's basically impossible to audit, yet Rust is supposed to be safe. It's a pipe dream that it will ever be complete, or that Rust will deprecate it. I think infinite churn is the point.
The reason to couple it to regalloc is that you only want to remat if it saves you a spill
Admittedly, this comes up more often in non-CPU backends.
Can you give an example?
Also loads and stores and function calls, but that's a bit finicky to tune. We usually tell people to update their programs when this is needed.
I remember that part of the selling point of LLVM in its early days was compilation time being so much faster than GCC's.
LLVM started about 15 years after GCC, and LLVM is already 23 years old. I wonder if something new will pop up again.
Optimizing compilers are basically impossible to audit, but there are tools like alive2 for checking them.
For starters, the tooling would be much slower if it required LLVM.