the BSDs in the AI Age

Thu Apr 2 13:42:21 EDT 2026

On Thu, Apr 2, 2026 at 10:18 AM Justin Sherrill <justin at shiningsilence.com>
wrote:

> On Thu, Apr 2, 2026 at 9:15 AM George Rosamond <
> george at ceetonetechnology.com> wrote:
>
>> I want to initiate a thread on the "BSDs and AI today."
>>
>> I'm looking to do a presentation on this in the summer for NYC*BUG.
>>
>
> Two quantifiable measures, though they will change by the time you are
> doing a summer presentation:
>
> - What models and software run on BSDs?  There's all sorts of tooling for
> accessing LLMs, but how much have made it to BSD?
>
> - How well do LLMs answer questions about BSD specific technology?  Or how
> exact are they when answering questions that could also be for Linux
> systems?  This one might be enraging, as in "check your systemd settings to
> tune your ZFS pools..." or some such.
>
>
>> * Should BSD projects have explicit LLM-focused policies?
>>
>
> LLM policies right now appear to be a stand-in for other problems.  For
> example, LLM bug reports are high volume and low quality so far, but I
> imagine if they get better, the objection would go away:
>
> https://lwn.net/Articles/1065620/
>
> There's probably also something that needs to be settled with copyright
> and assignment with generated code, but I am out of my depth beyond feeling
> like it's undefined.
>

Copyright is an interesting issue. It brings to light several issues that
the Open Source community is generally unaware of. Copyright law doesn't
stop all copying. There are elements of programs that are not copyrightable
because they embody facts, or because there's only one way to express them.
In addition, boilerplate items that are part of an interface likely don't
enjoy copyright protection either. These details usually don't matter for
open source: if there's no copyright you can copy it freely; if there is,
you can copy it freely (though maybe with a restriction or two). They only
come up when, say, a table that initializes a device's registers is copied,
or something similar that has no creative content.

However, AI-generated code brings these issues back. So if I have Claude
generate some code for me, and don't edit it, that likely has no copyright
protection. It also almost certainly doesn't have any copyright violations
in it, at least for the domains that I deal with. Since LLMs train on
thousands of examples, look for patterns, and use those patterns to
generate the code, there's no direct copying. Other domains with fewer
examples may not be so lucky. There are tools online to look for copying,
but you'll still have to be cautious about interpreting the results (e.g.,
some copying is OK, like inline copies of the BSD license).

But almost nobody uses unmodified code in production. For the BSDs,
Claude's generated code today is unsuitable w/o modification, or a lot of
prompt refinement. As the code is tweaked to work and handle the rigors of
the BSD quality floor, it becomes a combination of the author's work and
Claude's. The author's creative content is copyrightable, even if embedded
in what started out life as AI-generated, much like my copyright exists if
I modify works in the public domain. In other contexts, there'd be
questions about the extent to which you could protect the code, but since
open source "freely" gives the code away, you either have code in the
public domain that can be freely copied, or you have code that has a
copyright that you can license to "freely" give it away.

So the copyright risk analysis here suggests the risks would be low for
BSD-license open source projects.

There are other risks, but that's the copyright risk.

I personally favor policies that allow AI-generated code, but require the
developer to be able to explain every line, as well as making them
responsible for the whole thing. It's just a tool, and like any other tool
you have to use it correctly.

Warner