Filesystems Are Having a Moment

(madalitso.me)

59 points | by malgamves 6 hours ago

13 comments

  • hmokiguess 1 minute ago
    Notable mention: Plan 9 from Bell Labs.

    https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs

  • korbatz 1 hour ago
    I was having the exact same observation, albeit from a slightly different perspective: SaaS. There, the code tends to be temporary and very domain-specific, while the data (files) must strive to be boring standards.

    The problem today is that we build specific, short-lived apps that lock data into formats only they can read. If you don't use universal formats, your system is fragile. We can still open JPEGs from 1995 because the files don't depend on the software used to make them. Using obscure or proprietary formats is just technical debt that will eventually kill your project. File or forget.

    • jmathai 44 minutes ago
      My 10+ year old photo management system [1] relies on the file system and EXIF as the source of truth for my entire photo library.

      It’s proven itself the correct approach several times over. Abstractions (formerly Google Photos, currently Immich) should just be built on top; those proprietary databases are only for convenience.

      For work, I’m having the same experience as the author and everything is just markdown and csv files for Claude Code (for research and document writing).

      [1] https://github.com/jmathai/elodie

  • tacitusarc 1 hour ago
    Does everyone just use AI to write these days? Or is the style so infectious that I just see it everywhere? I swear there needs to be some convention around labeling a post with how much AI was used in its creation.
    • sethev 50 minutes ago
      LLMs were trained on stuff that people wrote. I get there are "tells", but don't really think people are as good at identifying AI generated text as they think they are...
    • idiotsecant 17 minutes ago
      This doesn't seem particularly AI slopped to me.
    • q3k 1 hour ago
      Everyone's trying to be the new thought leader enlightened technical essayist. So much fluff everywhere.
      • orsorna 59 minutes ago
        What's wild is that with a few minutes of manual editing it would give exponential return. For instance, a lead sentence in your section saying "here's why X" that was already described by your subheading is unnecessary and could have been wholly removed.
        • gzread 31 minutes ago
          You'd have to have a good idea of how you want the document to read, which is half (or more) of the process of writing it.
  • ramoz 17 minutes ago
    I think the real impact behind the scenes here is Bash(). Filesystem relevance is a bit coincidental to placing an agent on an operating system and giving it full capability over it.
  • dzello 1 hour ago
    Resonates deeply with me. I’ve moved personal data out of ~10 SaaS systems into a single directory structure in the last year. Agents pay a higher price for fragmentation than humans. A well-organized system of files eliminates that fragmentation. It’s enough for single player. I suspect we’ll see new databases emerge that enable lightweight multi-player scenarios (safe writes, etc.) without making the filesystem data more opaque. Not unlike what QMD is for search.
  • BoredPositron 12 minutes ago
    I revived my Johnny Decimal system as my single source of truth for almost anything and couldn't be happier. The filing is done mostly by agents now but I still have the overview myself.
  • naaqq 1 hour ago
    This article said some things I couldn’t put into words about different AI tools. Thanks for sharing.
  • rafaepta 47 minutes ago
    Great read. Thanks for sharing
  • jonstewart 25 minutes ago
    It reminds me a lot of Hans Reiser’s original white paper, which can be found at https://web.archive.org/web/20070927003401/http://www.namesy.... Add some embeddings and boom.
  • jmclnx 2 hours ago
    Funny, decades ago (mid-80s), I had to write a one-time fix on what would now be considered a very low-memory system; the data in question had a unique key of 8 7-bit ASCII characters.

    Instead of reading multiple megabytes of data into memory to determine what to do, I used the file system: the program stored data related to each key in subdirectories instead. The older people saw what I did and thought it was interesting. With development time factored in, doing it this way ended up being much faster and avoided the memory issues that would otherwise have occurred.

    So with AI, back to the old ways I guess :)
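    The subdirectory trick reads like a filesystem-backed key-value store. A minimal Python sketch of the idea (the two-characters-per-level fan-out and the function names are my own assumptions, not what the original program did):

```python
import os
import tempfile

def key_path(root, key):
    # Fan the 8-character key out into nested subdirectories
    # (2 chars per level) so no single directory grows huge.
    assert len(key) == 8
    return os.path.join(root, key[0:2], key[2:4], key[4:6], key[6:8])

def put(root, key, value):
    path = key_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(value)

def get(root, key):
    with open(key_path(root, key)) as f:
        return f.read()

root = tempfile.mkdtemp()
put(root, "AB12CD34", "record payload")
print(get(root, "AB12CD34"))  # → record payload
```

    The directory tree itself acts as the index: no multi-megabyte in-memory table, just one `open()` per lookup.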

    • bsenftner 17 minutes ago
      Reminds me of early data-driven approaches. Early CD-based game consoles had memory constraints, which I sidestepped by writing a ridiculously simple game engine: the game loop was entirely data driven, and "going somewhere new" in the game simply triggered a disc read given a raw sector offset and a number of sectors. The data read was a repeated series of copy records: the first 4 bytes gave the memory address to write to, and the next 4 bytes gave how many bytes to copy there. That simple mechanism, paired with a data organizer for creating the disc images, enabled some well-known successful games to have "huge worlds" with an executable under 100K, leaving the rest of the console's memory for content assets, animations, whatever.
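      That copy-record format is simple enough to simulate in a few lines. A hedged Python sketch (the little-endian framing and exact field order are my guesses at the description, not the actual console format):

```python
import struct

def apply_records(blob: bytes, memory: bytearray) -> bytearray:
    """Replay a stream of copy records against 'memory'.

    Each record: 4-byte destination address, 4-byte byte count,
    then 'count' payload bytes to copy to that address.
    """
    off = 0
    while off < len(blob):
        addr, count = struct.unpack_from("<II", blob, off)
        off += 8
        memory[addr:addr + count] = blob[off:off + count]
        off += count
    return memory

# Build a tiny "disc read" containing two copy records.
blob = (struct.pack("<II", 4, 5) + b"hello" +
        struct.pack("<II", 16, 3) + b"xyz")
mem = apply_records(blob, bytearray(32))
print(bytes(mem[4:9]), bytes(mem[16:19]))  # b'hello' b'xyz'
```

      The engine never interprets the content; the disc image effectively *is* the program state, which is what keeps the executable tiny.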
  • galsapir 1 hour ago
    nice, esp. liked - "our memories, our thoughts, our designs should outlive the software we used to create them"
  • TacticalCoder 1 hour ago
    As TFA basically says: files on a filesystem are a DB. Just a very crude one. There aren't nice indexes for a variety of things, and "views" aren't really there (arguably you can create different views with links, but that's, once again, very crude). But it's definitely a DB, represented as a tree, as TFA mentions.

    My life's data, including all the official stuff (bank statements, notary acts, statements made to the police [witness, etc.], insurance, property titels), all my coding projects, all the family pictures (not just the ones I took) and all the stuff I forgot, is in files, not in a dedicated DB. But these files are a definitely a database.

    And because I don't want to deal with data corruption, and even less want to deal with syncing now-corrupted data, many of my files contain a partial cryptographic checksum in their filename. E.g. "dsc239879879.jpg" becomes "dsc239789879-b3-6f338201b7.jpg" (meaning the Blake3 hash of that file has to begin with 6f338201b7 or the file is corrupted).
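    That checksum-in-filename scheme is easy to verify with a few lines of Python. A sketch (I use stdlib BLAKE2b as a stand-in, since BLAKE3 needs a third-party package; the "-b2-" tag and helper name are my own invention):

```python
import hashlib
import re

def verify_name_hash(data: bytes, filename: str):
    """Return True/False if the filename embeds a hash prefix like
    'name-b2-6f338201b7.ext', or None if there's nothing to check."""
    m = re.search(r"-b2-([0-9a-f]+)\.", filename)
    if m is None:
        return None  # no embedded checksum in this filename
    return hashlib.blake2b(data).hexdigest().startswith(m.group(1))

data = b"example image bytes"
prefix = hashlib.blake2b(data).hexdigest()[:10]
print(verify_name_hash(data, f"dsc239789879-b2-{prefix}.jpg"))           # True
print(verify_name_hash(b"corrupted!", f"dsc239789879-b2-{prefix}.jpg"))  # False
```

    A periodic scrub can then walk the tree, re-hash each file, and flag any mismatch before a bad copy propagates through sync.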

    At any time, if I want to, I can import these into "real" dedicated DBs. For example, I can pass my pictures read-only to "I'm Mich" (Immich) and then query them: "Find me all the pictures of Eliza" or "Find me all the pictures taken in 2016 on the French Riviera".

    But the real database of my whole life is, and shall always be, files on a filesystem.

    With a "real" database, a backup can be as simple as a dump. With files, backing up involves... making sure you keep a proper version of all your files.

    I'd say files are even more important than the filesystem: a backup on a Blu-ray disc, an ext4-formatted SSD, an exFAT-formatted SSD, or a tape... doesn't matter: the files are the data.

    A filesystem is the first "database" to hold this data: a crude one, with only simple queries. But a filesystem is definitely a database.

    The main advantage of this very simple database is that as long as the data is accessible, you know it's safe and can always use it to populate more advanced databases if needed.