The BSDs in the AI Age

George Rosamond george at ceetonetechnology.com
Mon Apr 6 17:41:39 EDT 2026


On 4/6/26 14:19, Dan Cross wrote:
> On Thu, Apr 2, 2026 at 9:16 AM George Rosamond
> <george at ceetonetechnology.com> wrote:
>> I want to initiate a thread on the "BSDs and AI today."
>>
>> A few things first.
>>
>> There are many levels to this discussion, and for the sake of clarity
>> and sanity, please no top posting. All replies should be inline.
>>
>> This is useful:
>> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>>
>> I'm looking to do a presentation on this in the summer for NYC*BUG.
>> There hasn't been anything in our community which provides the
>> high-level overview of the impact of AI, covering things from the impact
>> on the BSD operating systems to the impact on $job, etc. Hopefully this
>> thread can provide some raw materials, and become an outlet for
>> individual experiences and more general views.
> 
> Hopefully this will be an interesting discussion; at any rate, thanks
> for initiating it.

And thank you for replying inline.

> 
>> I initiated a similar fruitful (but private) discussion for another
>> open-source project, and think it's high time for us to do the same on
>> a public list.
>>
>> ***
>>
>> There are a few layers to this discussion. Note these are discussion
>> points, not "Yes" or "No" surveys.
>>
>> * How are LLMs (big tech or otherwise) impacting $job now? Are you using
>> Claude Code or similar tools for day-to-day work? Was it required, or
>> was it your choice? Were there expectations of these tools in terms of
>> productivity, etc.? This question raises the impact of AWS Bedrock/Kiro...
> 
> Personally, LLMs are both influencing my job and not influencing my
> job.  The dichotomy is that the surrounding ecosystems are being
> fundamentally shaped by them, but I have not incorporated their output
> directly in my own work.  However, given that every Google web search
> these days more or less includes an AI Mode summary, I'm finding it
> inescapable; furthermore, many of the tools that I routinely use are
> similarly incorporating LLMs, either directly in their construction
> (for example, the Zed editor) or indirectly by hooking into their use
> (again, text editors and so on).
> 

Yes, the usual "I'm not using LLMs" rarely means your providers aren't
using LLMs. I tend to use search engines with Tor Browser without
JavaScript (which limits the search engines I can use), so I don't have
much insight into the explicit impact on internet searches...

> Further, some of my colleagues are making heavy use of LLMs, albeit
> with significant human supervision.  I suspect this is a trend that will
> only increase: the quality of output has increased substantially in
> the last few months, and the genie is out of the bottle.  There _is_ a
> "there" there, though whether it's worth it is a question that needs
> to be grappled with.

Very true. While I think there's been a change in output for myself, I
wonder how much humans getting better at prompt engineering matters
(thereby feeding the models I use). I also wonder how much the
"humanization" of LLM interactions impacts people acting as end users
of the LLMs. I mean, a human at the prompt versus backend API calls
from systems.

Subconscious speaking: "Wow, the LLM thinks I'm really smart. I'm going
to spend more time here!"


> 
>> * Should BSD projects have explicit LLM-focused policies? Look at the
>> 2nd point in the NetBSD "Commit Guidelines" at
>> https://www.netbsd.org/developers/commit-guidelines.html. OSS-Security
>> already discussed the issue with alleged CVEs discovered by people with
>> LLMs trying to stack their resume with credentials.
> 
> Probably!
> 
> That unsatisfying one-word answer is about the best I suspect can be
> done at the moment.  These tools are in their infancy, and
> collectively we're all grappling with how best to use them, or not use
> them at all, if that's still possible.  I understand that discussions
> like this one are meant to iterate on that as part of the overall
> process.
> 
>> * How should the BSD projects themselves be using LLMs? Integration in
>> the shell (oh, please no...)? Porting of APIs for big tech LLMs?
>> Utilizing LLMs to discover bad code, CVEs, undiscovered vulnerabilities?
> 
> Speaking from my own experience with them....  I decided about six
> weeks ago that I needed to understand these things better, so I went
> through a few exercises messing around with Anthropic's Claude Code.
> 
> What I discovered is that the output is not (yet?) good enough for
> direct incorporation into e.g., an operating system. Where I have
> found that they work best is in either interactively exploring a code
> base ("explain to me how this code uses interface X...."), or in
> building bespoke tooling that I might use to better approach whatever
> I'm actually working on.
> 
> For example, I recently  used Claude to write a tool that extracts
> machine-readable register definitions for a particular vendor's CPUs
> from PDF documents; given around ten volumes, each containing
> many-thousand-pages of text, the tool pulls those definitions and
> writes them into JSON files, which can then be queried with a tool
> like `jq`. Instead of ^F'ing through a multi-volume set of PDF files, I
> have a shell script that can show me the relevant details directly. I
> also had it generate tools to show me what the fields of a populated
> value mean, and did some editor integration so I can "hover" over a
> field and see what it means, what an accessor is changing, and so on.
> 

And I wonder if you even need jq at that point, versus just grepping the
data directly.
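On that point: for flat, line-oriented JSON, grep often does the job. A
minimal sketch under invented assumptions (the thread doesn't show the
real tool's schema, so regs.json below is made up):

```shell
# Hypothetical register definitions, one JSON object per line (NDJSON).
cat > regs.json <<'EOF'
{"name": "CTRL0", "offset": "0x0040", "fields": [{"name": "EN", "bit": 0}]}
{"name": "STAT1", "offset": "0x0044", "fields": [{"name": "RDY", "bit": 3}]}
EOF

# With jq, a structured query by register name:
#   jq -r 'select(.name == "CTRL0") | .offset' regs.json
# Plain grep is enough for a quick lookup:
grep '"name": "CTRL0"' regs.json | grep -o '"offset": "[^"]*"'
# -> "offset": "0x0040"
```

Once the records nest, or you want computed views (say, decoding the
fields of a populated value), jq starts to earn its keep.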

I think this is one of the great advantages of generative AI for a lot
of people in technology, despite many issues, hallucinations, blah blah.

"How soon do we have to contact customers after we have a security
incident?"

I can eyeball it quickly, looking for "24", "48", or "72" (ie, hours),
but an LLM is better at it, and I can confirm the answer if a citation
is provided.
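That crude pre-filter is a one-liner; policy.txt and its wording here
are invented for illustration:

```shell
# Fabricated policy text, purely for illustration.
cat > policy.txt <<'EOF'
Section 4.2: Affected customers must be notified within 72 hours
of confirming a security incident.
EOF

# Eyeball-style search for deadline phrases; an LLM answer can then
# be checked against the matching section as the citation.
grep -nE '(24|48|72)[[:space:]]*hours?' policy.txt
```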

> This is very handy, but more importantly, the process of building it
> was instructive. The first draft had all sorts of problems: page
> footers inside of field definitions, for example. The LLM kept wanting
> to add ad hoc heuristics to fix individual instances of such problems;
> I finally realized that among the best ways to constrain it to reality
> included a) asking it to explain to me what it was doing, in the form
> of a written "design" document, up front; b) forcing it to use
> test-driven development (to the extent I could force it to do
> anything), so that there was a known metric by which to judge the
> output of a change, c) making it frame the problem as building a
> grammar describing the register definitions I cared about, and then
> implementing a parser for that grammar: page footers could then be
> recognized as lexical tokens and treated like whitespace, solving that
> problem generally.
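The footer-as-whitespace idea, in miniature: preprocess the extracted
text so the parser never sees footer lines at all, rather than
special-casing each occurrence downstream. The footer pattern below is
hypothetical:

```shell
# Fabricated PDF-extraction output with a page footer in the middle.
cat > extracted.txt <<'EOF'
CTRL0 register, offset 0x0040
Page 1234 of 5678
Bit 0: EN - enable the block
EOF

# Delete footer lines before parsing; a grammar-based version would
# recognize the same pattern as a token and skip it like whitespace.
sed '/^Page [0-9][0-9]* of [0-9][0-9]*$/d' extracted.txt
```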

Yes, and for a long while now, prompt engineering at all its levels has
been a critical tool in using an LLM productively.

My foundational prompt includes things like:

"2 paragraphs maximum replies."

"POSIX shell not bash"

"stop the flattery in replies I'm not in 5th grade"
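With Claude Code, standing instructions like these can live in a
CLAUDE.md file in the project root, which the tool reads automatically;
the exact lines below are just an illustration, not my actual file:

```markdown
# CLAUDE.md -- illustrative foundational prompt

- Replies: 2 paragraphs maximum.
- Shell code must be POSIX sh, not bash.
- No flattery; answer plainly.
```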

> 
> This last point was key: by forcing it to frame its output in terms of
> a much smaller thing that was a) formally defined, like an EBNF grammar,
> and b) small enough that I could examine and verify myself, I could
> have reasonable confidence in the fidelity of its output.  Still, it
> always biases towards taking the simplest action to effect an outcome,
> often with poor results.  Regardless, I kept at it and eventually got
> it to the point where I was reasonably happy with the output.  After,
> it occurred to me that if I didn't have the level of experience I do,
> I wouldn't have been able to successfully direct the LLM to build the
> tool I wanted.  This led me to coin my own little "Dan's Law": an LLM
> can only write a program that is as good as the human driving it could
> have written.
> 
> The corollary is that these things really are tools for senior
> engineers, who have the requisite experience to analyze their output.
> In the hands of less experienced folks, they're dangerous.  I
> presented that tool at a little internal demo the other day and a
> colleague asked, "how much time do you estimate that Claude saved
> you?"  I think this is the wrong question, and my response was that I
> wasn't sure that Claude really saved me any time: oh sure, it could
> emit text faster than I could type it all in, but I had to continually
> correct it and tell it to go back and start over, and in that sense,
> it wasted a lot of time by doing things that I would have, I hope,
> thought better than doing myself.

So so true. That point goes well with the idea that if you don't know
the question to ask, you won't get the right answer, and you won't
understand it either.

> 
> Finally on this point, applicability of LLMs to a problem domain
> likely follows a power law: 90-99% of the training data for software
> is probably doing more or less the same thing, and the LLM is pretty
> good here.  On the other hand, if you're working in the problem space
> covering the last 1-10%, the LLM is much worse.  You can get it to
> generate a simple web UI, no problem; but a verifiably correct
> implementation of lock-free concurrent data structures?  Eh, not so
> much.
> 
>> * How should individual developers and users consider LLMs as tools for
>> contributing to the BSDs and other open-source projects? I happily used
>> a big tech LLM to deal with an rc file for some very Linuxey software
>> wrapped up in systemd clutter.
> 
> This needs to be prefaced by asking, what does it mean to use an LLM?
> If essentially every web search is now using one indirectly, it seems
> inescapable; but I suspect that's not what you mean: rather, I think
> you're referring to direct use by an individual, and incorporating the
> output of that use into one's work.
> 

Useful clarification, as I noted earlier.

But yes, I mean the latter, not the former.

> But still, this definitional issue is important.  Suppose I point an
> LLM at a program and say, "explain what this does to me" and it points
> out a bug, which I then fix and produce a patch for; how does one
> characterize that?  Suppose I verified and developed the patch
> _without_ use of an LLM, would sending the resulting patch upstream
> violate a project's "no AI" clause, given that the LLM pointed it out
> to me in the first place? What if I do a web search for some random
> technical term and the unasked-for AI summary is actually useful?
> 
> Where does one draw the line?  That seems like an urgent and immediate question.
> 
> Anyway, to address what I suspect is the actual question, I think as a
> way to augment a human developer's abilities, basically being a gofer
> and search engine++, it's not outright awful.  As a way to explore
> and ideate, they're ok.  As a replacement for human output (and
> importantly human judgement) the things are nowhere near capable
> enough for that.  As with the tool I mentioned above, I've found that
> they work _best_ when constrained by something else that can be
> formally verified. I have had good luck asking the LLM to generate a
> formal model of a thing using something like TLA+, Promela, or Alloy,
> and proving that the model matches code (usually by showing me the
> correspondence between the generated model and the base code).  I can
> then verify the model using its tools (SPIN, TLC, etc), and use
> it to generate property-based tests for a system, which gives me a
> baseline of behavior that the LLM has to meet in whatever it's doing.
> 
> I strongly suspect that formal methods, aggressively applying
> the type systems of strongly- and statically-typed languages to a
> problem domain, and solid understandings of complexity theory and
> formal language design, are going to take on a much greater role for
> practitioners over the next few years.  I never thought I'd say this
> as an OS person, but I suspect that theorem provers are going to take
> on a pretty big role for me over the remainder of my career.
> 
> In fact, I found a bug using TLA+ on Friday; notably, that bug snuck
> past testing and human review.  I think this is less an LLM win and
> more a formal methods win, but I used an LLM to generate the model
> that revealed the bug, so they're related in that sense.

I read the above paragraphs a few times and will read them again later.

These are precisely the nuances of usage that I think need to be
explored and appreciated, although I suspect this might just be a
fleeting moment.
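For anyone who hasn't seen one, a formal model of the kind Dan
describes can be tiny. This toy counter is purely illustrative (nothing
from his actual system); TLC would check the TypeOK invariant over
every reachable state:

```tla
---- MODULE Counter ----
EXTENDS Naturals
VARIABLE n

Init == n = 0

\* Count up to 5, then stutter (avoids a TLC deadlock report).
Next == IF n < 5 THEN n' = n + 1 ELSE UNCHANGED n

\* Invariant: n never leaves this range.
TypeOK == n \in 0..5

Spec == Init /\ [][Next]_n
====
```

Property-based tests can then target the same invariant in the
implementation, which is the baseline behavior Dan mentions.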

Think about, say, the Goog with golang or, more likely, Python. I'm
willing to bet, for the obvious reasons, that code generation, bug
finding, etc. will be incredibly accurate and useful... and their view
of the ecosystem will become "we maintain the core libraries, and
humans, dumb and smart, will build stuff with a peripheral role in the
process."

Maybe that bazaar known as PyPI will look different in a year.

> 
>> Other relevant questions added to this thread are welcomed, including
>> references to other relevant public mailing list discussions.
> 
> I mentioned that these tools are still in their infancy, and that
> feels very true even in how we interact with them: take Claude Code,
> for example.  One can run their CLI, and it feels like playing
> Adventure or Zork or something.  And yes, there _are_ other
> interfaces, including say a VS Code plugin, but features get released
> into the CLI first. Anyway, we're still in the "GET LAMP" era of
> working with these things, and still a long ways from Rogue, let alone
> something one of my kids would consider playing.

Yes, nothing controversial there, but useful metaphors.

> 
> It is also important to acknowledge the ethics here. There are three
> main things that keep me up at night:
> 
> 1. We're re-centralizing the means of producing software.  If these
> things are going to take on a larger role (and by every indication
> they are), then it's deeply concerning to me that a very small handful
> of big players are effectively controlling the show.  Honestly, that
> should concern us all.  Furthermore, I think that the true cost of LLM
> usage is much higher than what we're currently paying.  Using Claude
> Code with the latest model effectively requires paying
> Anthropic for the Max subscription, which isn't exactly cheap.  What
> do we do when the firehose of VC money shuts off and the cost
> increases 2x, 5x, or 10x?

So right. I cover that issue from another angle in my presentation.

The captive market of big tech, with their LLMs and capex for
hardware, data centers, etc., creates vicious competition among
themselves, but essentially makes us beholden to their fees.

OTOH, the firms employing LLMs for SaaS, etc. over an API all fall
into crisis.

$10k for a pentest twice a year, or monthly? Er, how about I give an
LLM confidential data about my applications and pay a teeny fraction of
that, plus the LLM gives me the exact remediation. Oh, and no staff
needs to implement it since it's all integrated with agents. We just
need some deskilled devs to review it.

That collapse in value, in the amount of labor in technology
operations, is devastating and will make the impact of the cloud on
sysadmins look trivial.

> 
> 2. There's the issue of the provenance and ownership of the data used
> for training models. We're starting to see supply chain attacks in
> this area, and people have been pointing out that there are legitimate
> questions about the legality of sourcing that data in the first place,
> and its fair use, for some time. Some folks will dismiss this by
> saying that most of us learn from others or by looking at existing
> references, so why is this different?  I reply that there is a massive
> difference in scale: it was one thing for me to learn about linked
> lists as a kid reading a book on data structures; it's entirely
> different when a machine sucks in the content of every book on data
> structures and reproduces it on demand.  As Warner and others have
> pointed out, the courts haven't caught up and it's all _really_
> uncertain right now.  And did the authors of those books agree to
> having their content used thus?  If the incentive to read those
> references goes away, since the LLM gives me the information anyway,
> and there's correspondingly no financial incentive to write new books,
> how do we move new ideas out of the research domain and into
> mainstream practice?  Do LLMs just pull everything towards the median?
>  (Maybe the "Singularity" will end up being "aggressively mid.")

Distressing but yes... particularly when an expected increase in
productivity goes hand-in-hand with deskilling, instead of freeing up
people to attack hard questions, imagine new things, etc.

And it's worth looking at OWASP's top ten on LLMs/genAI more generally
in terms of the supply chain issues... if anyone can't visualize them:

https://genai.owasp.org/llm-top-10/

> 
> 3. There's the environmental impact.  The amount of energy required to
> build a new model is growing super-linearly (it appears to have gone
> from exponential to "merely" quadratic relative to the previous
> generation model), and we're running out of physics for Moore's Law to
> keep it reasonable (it's axiomatic that you can only halve the size of
> a thing so many times until you start running into fundamental
> physical limitations, and we're starting to edge up against that).
> Dedicated accelerator hardware and so forth may be able to help, but
> at some point, we will run out of the ability to train a bigger model.
> What then?  Moreover, in their present form, these things are
> grotesquely inefficient: everything is free-form text.  The whole
> thing really smacks of the sort of thing where the big players created
> a machine for generating simulacrums of plausible text, and then
> realized they could apply that to all kinds of stuff---like software.
> But the amounts of energy (and water!!) required to do so are
> unsustainable.  Honestly, this seems like the worst of the three; one
> could imagine running a local model at home, or even a small cluster
> at a job, but if we're sucking the water table dry to train the model
> required to do that, that's not great.  Most of the AI boosters I've
> seen seem to be banking on these problems being solved before it
> becomes a really serious problem, or on gains in efficiency due to AI
> use offsetting the increase in energy costs, but I'm skeptical: I've
> seen no concrete plans for how to address this challenge, in particular.
> 

It's hard to say whether the deskilling and job losses will be worse
than the environmental costs. The better answer is that they go
hand-in-hand.

I realize that's posed as "ethics" but it seems more existential.

Industry doesn't care about leaking oil tankers if oil is at $200 a
barrel and they're making money hand over fist.

"give me your fines and hand slaps... it doesn't matter"

Data centers could be so environmentally safe, sane, local, effective,
etc., but they won't be when they're a core aspect of capex in a
feverish era of competition.

> Ultimately, there don't seem to be a lot of easy answers, and I suspect
> we're in for a pretty wild ride over the next few years.
> 
>         - Dan C.

Very much agree, and appreciate your thoughtful answers.

g


More information about the talk mailing list