Academic Research Skills for Claude Code

(github.com)

37 points | by arnon 3 hours ago

4 comments

  • SubiculumCode 0 minutes ago
    The site opens with how it keeps humans in the loop, but as you keep reading it seems to be almost fully automated.
  • apwheele 1 hour ago
    There needs to be a new name for people creating these with no obvious validation.

    Skill spam?

    • AndyNemmity 55 minutes ago
      Define obvious validation? What is the signal that tells you one is reasonable vs another?

      I find the only way to do that is to look at it, if it passes some visual tests, try it, and then a/b test if it's any better than without it.

      • theptip 26 minutes ago
        Some sort of eval. Eg TermBench, implemented in Harbor.

        It’s an insane amount of effort to build shareable, reusable, comprehensive evals, which is why almost all skills are stuck in the “vibes” phase.

        That said, I think it’s quite easy to skim/intuit these sorts of skills and do horizontal gene transfer into your own vibes-based system. If you use the skills regularly, you can construct a cheap personal eval that is a lot easier to maintain, and use it to compare a new skill/plugin. Just something like “please write a paper on <my personal unpublished thesis>” is a good starting point here. You get a good feel for whether a skill is better than vanilla by running it a couple of times and watching the failure modes.
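
A minimal sketch of what such a cheap personal eval could look like, assuming you have already run the same prompt several times with and without the skill and graded each run pass/fail by hand. The function names (`pass_rate`, `compare`) and the sample outcomes are hypothetical and only illustrate the idea. Because the outputs are non-deterministic, it compares pass rates across repeated runs with a simple permutation test rather than judging a single pair of outputs.

```python
import random


def pass_rate(outcomes):
    """Fraction of runs that were graded as a pass."""
    return sum(outcomes) / len(outcomes)


def compare(baseline, with_skill, n_perm=10_000, seed=0):
    """Return (observed difference in pass rate, one-sided permutation p-value).

    baseline / with_skill are lists of booleans, one per graded run.
    The p-value estimates how often a difference at least this large
    would show up if the skill made no difference at all.
    """
    observed = pass_rate(with_skill) - pass_rate(baseline)
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    pooled = list(baseline) + list(with_skill)
    n_base = len(baseline)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pass_rate(pooled[n_base:]) - pass_rate(pooled[:n_base])
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical hand-graded outcomes from five runs each.
    vanilla = [True, False, False, True, False]
    skill = [True, True, True, False, True]
    diff, p = compare(vanilla, skill)
    print(f"pass-rate difference: {diff:+.2f}, one-sided p ~ {p:.3f}")
```

With only a handful of runs the p-value will rarely be small, which is arguably the point: a couple of runs gives you a feel for failure modes, not proof that the skill is better.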

        • AndyNemmity 18 minutes ago
          Yeah, I think we're honestly in a phase where you shouldn't use anyone else's skills. Instead, you should point your stuff at a repo with skills, have it really read it, and then ask what of value there is to potentially rewrite in your style based on your preferences.

          I have a complex setup with a lot of things based around what I do. I don't know how anyone could reasonably get their head around any of it. It's a research project in itself.

          So I tell people, please don't use it. Just point your Claude Code at it, and see if there's anything useful for you.

      • apwheele 33 minutes ago
        So yes, a/b broadly speaking is what I was saying (test cases that can show it is actually better).

        Even in this repo, just the "b" showcase, showing the outputs as is (with no clear documentation of how those were generated; is it headless in a CI pipeline somewhere?), is not good: https://github.com/Imbad0202/academic-research-skills/tree/m....

        • AndyNemmity 26 minutes ago
          I run a lot of a/b testing. But I'm not sure showing it actually communicates all that much. Since these are non-deterministic systems, even showing you an a/b test from when I made the decision a month ago doesn't really mean a whole lot.

          I agree we need clearer indications of value, but I don't quite understand how to legitimately do that in a fair and honest way.

    • mmooss 7 minutes ago
      The OP evaluates what it has developed with great rigor and describes the evaluation in detail. What do you feel is missing?
    • elashri 1 hour ago
      Skill-slop.
    • adityamwagh 1 hour ago
      SkillBros?
  • evanwolf 20 minutes ago
    Academic skills are a vector for cite injection.
  • appreciatorBus 1 hour ago
    [flagged]