SDL (Simple DirectMedia Layer) bans AI / LLM code contributions (www.gamingonlinux.com)
from ryujin470@fedia.io to programming@programming.dev on 16 Apr 21:12
https://fedia.io/m/programming@programming.dev/t/3767754

#programming

HailHydra@infosec.pub on 16 Apr 22:22

Based

[deleted] on 16 Apr 22:38
.
Dumhuvud@programming.dev on 16 Apr 23:20

It’s all fun and games until an LLM outputs someone else’s AGPL code and you merge that into your codebase.

Microslop had an oopsie this February, actually. Their (de)genAI plagiarized a diagram: nvie.com/posts/15-years-later/

mrmaplebar@fedia.io on 17 Apr 00:26

I'm not sure the law is as settled as you're making it out to be here... We're still watching a lot of these questions slowly trickle up the courts.

In my opinion, the idea that someone can feed a bunch of unlicensed copyrighted material as input into a generative AI meat grinder and produce public domain outputs doesn't really pass the smell test.

For example, could I use an LLM trained on GPL code to rewrite Linux in a legally distinct way, and then treat it as permissive or proprietary code after minor modifications? Likewise, can you train an LLM on someone else's proprietary code and rewrite it as a GPL program?

This sort of copyright/license laundering seems like an existential threat to the way that copyleft FOSS has existed for decades. I think it makes a lot of sense to be extremely cautious and skeptical of AI-generated code submissions.

Edit: I also want to point out that the issue of whether or not it can be considered "fair use" to train generative AI models on unlicensed copyrighted works is very much an open question. If it was determined that it isn't always fair use, then I don't know what that might mean for many of the existing models that have been trained that way, or the outputs that they produce.

garbage_world@lemmy.world on 17 Apr 06:12

People are doing that without AI; see the Rust-based “improved” coreutils or the moss kernel.

vapeloki@lemmy.world on 17 Apr 02:04

Ok, so, someone used an LLM to create changes. This new code is no longer under the project license; it is, as you say, public domain.

Move forward 400 commits. At what point is most of the code public domain?

While correct, you still missed the point completely.

So, to make this as clear as possible: you cannot license LLM-generated code. Not under GPL, zlib, or other copyleft licenses. It may work with public domain licenses.

For MIT licensed projects there is not a big issue.

For the kernel, have a look at the rules. AI models may assist only.

terabyterex@lemmy.world on 17 Apr 02:38

You hit the framing issue right on the head. People keep using code and project interchangeably. Code has never been able to be copyrighted. You can’t copyright a for loop. I can’t create a car class that has properties like make, model, year and copyright it. That’s never been a thing. That’s why projects are copyrighted. An entire piece of work.

Now your Ship of Theseus has a debatable point. If every single line of code is AI-written, does it become an AI project? I would argue not, because of the human involvement to get it there. I also don’t see it happening. Not every line of code is replaced. But if it was, that would be a court challenge.

vapeloki@lemmy.world on 17 Apr 03:20

So, by your logic, I can take the kernel source, or parts of it, and use them verbatim in my proprietary project?

You can see yourself that you are talking bullshit?

Of course code is and was licensable, because code is more than just if, else, for, while.

Ever seen that some code in some projects is dual licensed?

Why would anyone need this if only whole projects are licensable?

…europa.eu/…/licence-compatibility-permissivity-r…

terabyterex@lemmy.world on 17 Apr 03:34

What you are referencing is a project using other projects. You very much can take a function and use it. It’s when what you take starts to resemble the whole that it becomes a problem. But if you see a project and say “hey, I like how they store files in this collection”, you can use that. Now if you are making a kernel and rip out code that resembles a kernel, then no.

litchralee@sh.itjust.works on 17 Apr 03:36

Citation needed

vapeloki@lemmy.world on 17 Apr 14:44

I thought for a while about whether I should answer this. It is so wrong, and dangerous, that I decided to.

First: Such licensing questions are part of my day-to-day job. I will explain it to you like I explain it to first-semester students:

A LICENSE defines the terms under which a copyrighted work can be used, distributed and so on. It does not matter what the work is. A few relevant examples:

  • Beats: in the music industry there is a market for beats. If you take a 10-second beat and incorporate it into your own music without having a license for it, you are in violation of copyright law.
  • Paintings: You cannot have a copyright on the concept of using paint to draw pictures on a piece of linen, and you cannot have a copyright on the concept of a specific object drawn on linen. You can have copyright on YOUR version of it. If somebody else takes your work and only modifies it slightly (or, for example, includes it in a book, which is a copyrightable piece of work) without having a license for it… copyright violation.

So, this brings us back to our topic:

There is no difference between source code, music, paintings or writings for the purpose of licensing.

So, as courts established, AI CANNOT BE CREATIVE and is ALWAYS based on other works. Therefore any AI-generated work is in itself not protected by copyright.

So, EVERY piece of code that is AI generated is free to go for any purpose, it does not fucking matter if there is a GPL 3.0 header in the file. AI Generated == Public domain.

But: Projects use viral copyleft licenses for a reason. A company could just take the AI-generated code, implement it, and, if somebody has a problem with it, just shrug: “It’s AI generated, it’s public domain”.

litchralee@sh.itjust.works on 17 Apr 03:29

Code has never been able to be copyrighted. You can’t copyright a for loop. I can’t create a car class that has properties like make, model, year and copyright it. That’s never been a thing. That’s why projects are copyrighted. An entire piece of work.

Every single complete sentence in this quote is factually wrong, under both USA copyright law and international copyright law.

Copyright accrues the moment that some work is rendered into a fixed format, such as a sheet of paper but also includes a computer text file. Writing a “for” loop as a homework assignment does create copyright. Ten students writing their homework all create their own copyright, even if the result is coincidentally identical. This isn’t even a point of serious doubt in the law: copyright is very much an exercise of provenance, not of bitwise comparisons.

From when a work is created, every transformation, edit, or addition must all occur within the parameters of some sort of license from the copyright owner, or else an infringement has occurred.

Two people may stand at the same position at the foot of Mt Whitney in California and set up their own camera, one after another, on the same tripod to take the same frame of the scenery. And under copyright law, each owns the copyright to their own photo. One may decide to sell their photo and copyright to an East Coast newspaper, while the other has theirs committed to canvas. The newspaper may not assert a copyright claim against the canvas owner, and the canvas owner cannot assert a claim against the newspaper.

pelya@lemmy.world on 17 Apr 19:49

Ok, so, someone used an LLM to create changes. This new code is no longer under the project license; it is, as you say, public domain.

Except it is, depending on code similarity. The court uses the same rules as with book plagiarism. If the LLM uses exactly the same code structure and only renames some variables or adds pieces of code that do nothing useful, chances are high that the court will declare it a derived work and enforce the license.

vapeloki@lemmy.world on 17 Apr 21:34

Nope. Courts decided: AI output is never copyrightable.

misk@piefed.social on 17 Apr 07:52

If something is public domain then it’s incompatible with copyleft licenses, like GPL under which Linux is licensed.

The Linux team is trolling AI boosters who can’t certify their code is clean unless they trained the models themselves.

PoY@lemmygrad.ml on 17 Apr 03:38

this thread is just 🦌🍿

Jankatarch@lemmy.world on 17 Apr 06:33

Hell yeah!!

Hisse@programming.dev on 17 Apr 08:03

How would they know, though, if the human operating the LLM removes the stupid comments?

airbreather@lemmy.world on 17 Apr 08:33

You’re absolutely right! It’s not just flawed — it’s impossible to enforce.

/s

More seriously, the core issue isn’t completely novel to large established open-source projects. How do they deal with the possibility that someone might be contributing code from, say, a closed-source competing product (or one whose licence is otherwise incompatible)?

The same answer ought to work here, probably.

FizzyOrange@programming.dev on 17 Apr 16:07

I suspect just asking would work. The number of people that will use AI to make sloppy PRs is going to be a lot higher than the number that will bare-faced lie about having used AI.