5 comments

  • acc_10000 2 days ago
    I built this after watching 7 of 8 CPU cores sit idle during a Monte Carlo sim. Python's `multiprocessing` added 189ms of serialization overhead to a 9ms computation.

    ironkernel lets you write element-wise expressions with a Python decorator, compiles them to a Rust expression tree at definition time, and executes via rayon on all cores. ~2k lines of Rust, ~500 lines of Python.
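
    The decorator-to-expression-tree idea can be sketched in pure Python. This is a hypothetical illustration, not ironkernel's actual API: `kernel`, `Expr`, `Var`, `Const`, and `BinOp` are made-up names. The decorator runs the function body once with a symbolic input, and operator overloading records a tree. Here the tree is walked by a plain Python loop rather than handed to Rust and rayon.

```python
class Expr:
    """Base node: arithmetic on nodes builds new nodes instead of computing."""
    def __add__(self, other):
        return BinOp("+", self, wrap(other))

    def __radd__(self, other):
        return BinOp("+", wrap(other), self)

    def __mul__(self, other):
        return BinOp("*", self, wrap(other))

    def __rmul__(self, other):
        return BinOp("*", wrap(other), self)


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self.name]


class Const(Expr):
    def __init__(self, value):
        self.value = float(value)  # f64-only, mirroring the post

    def eval(self, env):
        return self.value


class BinOp(Expr):
    OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self, env):
        return self.OPS[self.op](self.left.eval(env), self.right.eval(env))


def wrap(value):
    """Turn plain numbers into Const nodes; leave nodes alone."""
    return value if isinstance(value, Expr) else Const(value)


def kernel(fn):
    """Hypothetical decorator: trace fn once, at definition time."""
    tree = fn(Var("x"))  # the Python body runs exactly once, building the tree

    def run(xs):
        # A real engine would pass `tree` to a compiled, parallel evaluator;
        # this sketch just walks it once per element.
        return [tree.eval({"x": x}) for x in xs]

    run.tree = tree
    return run


@kernel
def affine(x):
    return x * 2 + 1


print(affine([0.0, 1.5, -3.0]))  # [1.0, 4.0, -5.0]
```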

    The win is expression fusion: NumPy evaluates `where(x > 0, sqrt(abs(x)) + sin(x), 0)` as 5 passes with 4 temporaries. ironkernel fuses it into 1 pass with zero temporaries and skips dead branches (no NaN from sqrt of negatives). That comes out 2.25x faster than NumPy on compound expressions at 10M elements. For BLAS ops like SAXPY, NumPy is faster, since ironkernel doesn't call BLAS.
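
    To make the fusion point concrete, here is a pure-Python sketch of the two evaluation shapes (not ironkernel's code). The NumPy version makes a separate pass and temporary array per operation and computes both branches everywhere before `where` selects. The fused version reads each element once, tests the condition first, and skips the true-branch math when the result would be discarded. The Python loop itself is far slower than NumPy; only the shape of the computation matters here, and a compiled engine would run that loop natively across threads.

```python
import math

import numpy as np


def numpy_version(x):
    # Each ufunc call is a full pass that allocates a temporary array,
    # and both branches are computed for every element.
    return np.where(x > 0, np.sqrt(np.abs(x)) + np.sin(x), 0.0)


def fused_version(x):
    # One pass, no intermediate arrays: the condition is checked per element
    # and the expensive branch only runs where its result is kept.
    out = np.empty_like(x)
    for i, v in enumerate(x):
        out[i] = math.sqrt(abs(v)) + math.sin(v) if v > 0 else 0.0
    return out


x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(numpy_version(x), fused_version(x)))  # True
```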

    Early stage: f64 only, 1-D only, and only a subset of expressions (intentional, for the parallel-safety guarantee). Warm Numba is 3.2x faster (LLVM JIT vs. an interpreter).

    • nickpsecurity 45 minutes ago
      Thanks for this! Parallel on Python is always a pain point. I'm always grateful for each tool one of you builds to help us speed up our code. :)
  • ata-sesli 2 hours ago
    The expression fusion win is huge for cache locality. Since you're using Rayon for the multicore side, I'm curious whether the generated Rust expression tree is 'flat' enough for LLVM to auto-vectorize (SIMD) within each core, or whether the tree traversal adds enough branching to break that.
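
    One common way for an expression-tree interpreter to avoid per-element branching is blocked evaluation, similar in spirit to what numexpr does: walk the tree once per block, and let each node run a tight loop over a contiguous slice. Dispatch cost is then paid per block instead of per element, temporaries stay block-sized (cache-resident), and each per-node loop is the straight-line shape compilers can often auto-vectorize. In this sketch NumPy ufuncs stand in for those per-node loops; the tuple tree format, `eval_blocked`, and the 4096 block size are assumptions, and nothing in the thread says ironkernel works this way.

```python
import numpy as np

# Hypothetical tree format: ("x",), ("const", c), (unary_op, child),
# or (binary_op, left, right).
UNARY = {"sqrt": np.sqrt, "abs": np.abs, "sin": np.sin}
BINARY = {"+": np.add, "*": np.multiply}


def eval_block(node, xb):
    """Evaluate one node over a whole block: one dispatch per node per block."""
    kind = node[0]
    if kind == "x":
        return xb
    if kind == "const":
        return np.full_like(xb, node[1])
    if kind in UNARY:
        return UNARY[kind](eval_block(node[1], xb))
    return BINARY[kind](eval_block(node[1], xb), eval_block(node[2], xb))


def eval_blocked(tree, x, block=4096):
    """Walk the tree once per block instead of once per element."""
    out = np.empty_like(x)
    for start in range(0, len(x), block):
        out[start:start + block] = eval_block(tree, x[start:start + block])
    return out


# sqrt(abs(x)) + sin(x)
TREE = ("+", ("sqrt", ("abs", ("x",))), ("sin", ("x",)))
```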
  • stephantul 1 hour ago
    Do you have benchmarks? Naively, I would compare this to Numba, but maybe I'm way off the mark here.
  • KeplerBoy 2 hours ago
    For the love of god, don't use these AI-generated infographics/diagrams.

    If that's your bar for quality, I'll think less of your code. I can't help it.

    Also, your saxpy example seems to be daxpy. The s and d stand for single and double precision.
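
    The BLAS naming convention can be shown with a small lookup: the first letter of a routine encodes its element type (s, d, c, z). Given the post's f64-only restriction, the matching routine is daxpy. `axpy_name` and `axpy` are hypothetical helpers for illustration, not functions from NumPy or BLAS.

```python
import numpy as np

# BLAS routine prefixes by element type.
BLAS_PREFIX = {
    np.dtype(np.float32): "s",     # single-precision real
    np.dtype(np.float64): "d",     # double-precision real
    np.dtype(np.complex64): "c",   # single-precision complex
    np.dtype(np.complex128): "z",  # double-precision complex
}


def axpy_name(x):
    """Hypothetical helper: the ?axpy routine name matching x's dtype."""
    return BLAS_PREFIX[np.asarray(x).dtype] + "axpy"


def axpy(a, x, y):
    # y <- a*x + y, computed in the arrays' own precision
    return a * x + y


print(axpy_name(np.ones(3)))                    # daxpy (float64 default)
print(axpy_name(np.ones(3, dtype=np.float32)))  # saxpy
```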

    • dgacmu 1 hour ago
      As a specific example: the generated diagram showing the expression tree under "build in python" is simply wrong. It doesn't correspond to the expression x * 2 + 1, whose right child should be a single leaf (the 1). The "GIL Released - Released" label is just confusing. The dataflow omits the fact that the results end up back in Python; there should be a return arrow. Etc., etc.
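
      As a neutral check on the expected shape, Python's own parser (the stdlib `ast` module, used here only for illustration, not as anything ironkernel is known to use) gives `+` at the root, the `x * 2` subtree on the left, and a single leaf `1` on the right. `show` is a hypothetical helper that prints the tree.

```python
import ast


def show(node, depth=0):
    """Print an expression AST as an indented tree of operators and leaves."""
    pad = "  " * depth
    if isinstance(node, ast.BinOp):
        print(pad + type(node.op).__name__)
        show(node.left, depth + 1)
        show(node.right, depth + 1)
    elif isinstance(node, ast.Name):
        print(pad + node.id)
    elif isinstance(node, ast.Constant):
        print(pad + repr(node.value))


root = ast.parse("x * 2 + 1", mode="eval").body
show(root)
# Add
#   Mult
#     x
#     2
#   1
```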

      If you use diagrams like this, at least ensure they are accurately conveying the right understanding.

      And in general, listen to the person I'm responding to -- be really deliberate with your graphics or omit. Most AI-generated diagrams are crap.

  • alephnerd 59 minutes ago
    I think other HNers need to keep an eye on these kinds of projects. A decade ago, a prototype like this would have taken a team of 3-4 engineers about a quarter to build, but now one SWE can do the same while leveraging Claude Code.

    Plenty of people on HN would rather bury their heads in the sand, but this highlights how critical it is becoming to be both a good engineer and adept at using agentic tooling within your development lifecycle.