From Noise to Image – interactive guide to diffusion

(lighthousesoftware.co.uk)

43 points | by simedw 2 days ago

5 comments

adammarples 9 minutes ago
If the prompt is the compass, and represents a point in space, why walk there? Why not just go to that point in image space directly, what would be there? When does the random seed matter if you're aiming at the same point anyway, don't you end up there? Does the prompt vector not exist in the image manifold, or is there some local sampling done to pick images which are more represented in the training data?
whilefalse 2 days ago
Hey, I made this, thanks for posting!
It’s purposefully high level and non-technical for a general audience - my theory was that most people who aren’t into tech/AI don’t care too much about training, or how the system got to be the way that it is.
But they do have some interest in how it actually operates once you’ve typed in a prompt.
Happy to answer any questions or take on board feedback
[-]
- in-silico 25 minutes ago
  I think some of the visualizations would be much better if you used a pixel-space model instead of a latent diffusion model.
  Right now we are only seeing the denoising process after it's been morphed by the latent decoder, which looks a lot less intuitive than actual pixel diffusion.
  If you can't find a suitable pixel-space model, then you can just trivially generate a forward process and play it backwards.
- BobbyTables2 2 hours ago
  Loved the writeup!
  Found the manual latent space exploration part really interesting.
  Too many LLM/diffusion explanations fall in the proverbial “how to draw an owl” meme without giving a taste as to what’s going on.
- adampunk 48 minutes ago
  It's quite clever and thoughtful. thanks for making it!
- plagiarist 1 hour ago
  I enjoyed this a lot.
  The interpolations between butterfly and snail were pretty horrifying. But something like Z-Image you could basically concatenate the text and end up with a normal image of both. Is the latent space for "butterfly and snail" just well off the path between the two individually?
  It's hard to imagine what is nearby in latent space and how text contributes, so I did really like the section adding words to the prompt 1-by-1.
ibizaman 2 hours ago
Oh I particularly loved that you made the prompts themselves interchangeable. Very well done!
K2h 4 hours ago
Scrolling through pics on mobile is difficult. Wanted to see all 29 steps but couldnt scroll it reliably.
[-]
- BobbyTables2 2 hours ago
  Turning off the scroll mode worked very well for me on a mobile.
khazhoux 2 hours ago
Amazing explanations!! I absolutely love this. In 10 minutes it’s given me a huge boost in my intuition on diffusion, which I’ve been missing for years.