The coming coordination calamity (surfingcomplexity.blog)
from codeinabox@programming.dev to programming@programming.dev on 25 May 08:24
https://programming.dev/post/50942524

#programming


ell1e@leminal.space on 25 May 09:05 next collapse

AI code is pretty unusably bad for long-term use anyway: medium.com/…/i-saw-the-horror-of-ai-and-coderabbi… So the best solution is just to handwrite proper code as before. It’s not like we ever had much of an output problem in most coding industries; it was always a quality and bugs problem.

squaresinger@lemmy.world on 25 May 12:22 next collapse

Can you maybe post the text?

ell1e@leminal.space on 28 May 08:20 collapse

I think this is the report it talks about: coderabbit.ai/…/state-of-ai-vs-human-code-generat… Does this link work better?

squaresinger@lemmy.world on 28 May 12:41 collapse

Yeah, it does, thanks!

locuester@lemmy.zip on 25 May 13:52 next collapse

That article is from January. This space moves too fast. It’s not worth reading. I thought things still sucked in Jan too. But they’re impressive af now.

zbyte64@awful.systems on 25 May 15:09 next collapse

It’s impressive until it isn’t because it decided to “fix” an issue by simply ignoring an exception.
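As a hypothetical illustration of that failure mode (the function names below are made up and not taken from any real incident), swallowing an exception makes the error message disappear without fixing its cause:

```python
def parse_port(raw: str) -> int:
    """Parse a TCP port number; raises ValueError on bad input."""
    port = int(raw)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def parse_port_silenced(raw: str) -> int:
    """The 'fix': the exception is swallowed and a default is returned."""
    try:
        return parse_port(raw)
    except Exception:
        # The caller can no longer tell that the input was bad.
        return 0
```

The second function never fails, which is exactly why the underlying problem goes unnoticed until something downstream breaks.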

ell1e@leminal.space on 26 May 13:01 collapse

machinelearning.apple.com/…/illusion-of-thinking It’s not surprising LLMs keep messing up in what seem to be the most braindead ways.

[deleted] on 25 May 15:16 next collapse
.
locuester@lemmy.zip on 25 May 15:39 collapse

I realize you aren’t happy about it. But it’s true.

I was basically born behind a computer in 1978. Been a fulltime software dev since 1998.

What the latest models are doing is nothing short of incredible. And in 6 months the current models will suck compared to the latest.

Somewhere around Feb is when things really shifted for me personally. I can do all home sys and net admin tasks now by just asking a bot, running a LOCAL model. Frontier models can whip up apps in minutes.

It does require dev/architect knowledge to get quality. You have to understand the broad solution, then just get ai to do the grunt work.

I wrote all 4 of these this week, 100% ai code. I wouldn’t have had the time to write the first three, but it (opus 4.6 I think) oneshot them all in a couple mins:

Homey apps:

Other:

Do these repos have bugs? Yep probably. But they’re working today for me solving my problems.

The same applies on large repos where I do work. When properly guided by a high skill dev/architect, the results are profound. Even non code stuff like terraform and ansible.

Given proper direction, an LLM allows you to perform at a much higher level.

ell1e@leminal.space on 26 May 13:00 collapse

LLMs seem to be inherently dumb: machinelearning.apple.com/…/illusion-of-thinking

And from what I can find in recent studies, no, they didn’t suddenly get smart. They just plagiarize slightly better: www.sciencedirect.com/…/S2949719123000213#b7

“We found that the models that consistently output the highest-quality text are also the ones that have the highest memorization rate.”

locuester@lemmy.zip on 26 May 14:31 collapse

Are you asking me to reject my professional daily reality?!

You can provide sources all day, but it won’t change my reality of this being the most productivity enhancing tool since MS introduced intellisense in 1996.

If I wanted to shit on AI I could absolutely provide data to make it look like it sucks and laugh at it. It can do some really stupid shit.

In the hands of an expert, this technology is a productivity multiplier. In the hands of a beginner, this technology is a security and code quality problem. If you’re having problems controlling it, look inward.

ell1e@leminal.space on 26 May 15:38 next collapse

“Are you asking me to reject my professional daily reality?!”

Can you point me to a single field study that shows programmers become faster and not just feel faster, and that doesn’t come with some caveat like they haven’t tested AI coders vs non-AI coders, or coders without significant AI exposure before (since otherwise it won’t rule out simply becoming dependent)?

Even if you could find one, and I was unable to so far, it doesn’t change that:

  1. you are probably faster by verbatim plagiarizing somebody’s other project at a large scale, and

  2. by making yourself addicted and reliant on the AI where your own skill is eroding: www.404media.co/software-developers-say-ai-is-rot… (if you get a paywall: archive.is/tHq80 ) and

  3. by having a higher rate of bugs in your code no matter how carefully you review it coderabbit.ai/…/state-of-ai-vs-human-code-generat… which especially for security sensitive projects may have dire long term consequences, and

  4. by encouraging the environmental destruction brought on in particular by the training of new models.

Two caveats:

  1. Keep in mind more lines of code is not a useful metric for faster project completion and faster maintenance task completion, especially for code bases that are already large.

  2. I’m merely speaking about using LLM code in your project, so for example LLM auto completion or copy&pasting code from a chatbot. I’m not talking about LLM code reviews that point out issues in natural language.

locuester@lemmy.zip on 26 May 18:12 collapse

No, I don’t study or review research on this subject at the moment. My personal experience is far more reliable.

Look, I’m 50 years old. Been doing this shit forever. It’s an amazing productivity enhancer for ME. I can’t say any more really. I linked unique repos that were built by me in minutes as examples.

I understand your position and your doubt since it’s a pretty common opinion in the echo chambers around here. Are you a software engineer?

TehPers@beehaw.org on 26 May 19:58 collapse

“Are you asking me to reject my professional daily reality?!”

Nobody’s asking you to do anything. If it works for you, then that’s fine.

People are talking about the tech in general and their own experiences with it, alongside relevant research they have found. You are more than welcome to disagree with each other. Nobody is forced to change their opinions or how they work over a short internet conversation.

As an aside, LLMs, like everything else in life, require nuance to evaluate. They excel at specific tasks that are built for them, and are terrible at the wide array of tasks that are not built for them. It’s entirely possible that your work primarily lies in the former while others work in the latter space.

locuester@lemmy.zip on 26 May 20:32 next collapse

Yeah absolutely agree. In another thread I pointed out the difference between a pro using it and a novice using it.

Currently the loudest people seem to be the novices using it, even journalists? Maybe it’s just hatred and determination of people to make it sound bad to fulfill their fantasy of it sucking. There’s definitely an echo chamber effect going around also, a hivemind of “ai sucks”.

Anyhow, I like to add my experience with AI to discussions to counter all the negativity.

ell1e@leminal.space on 27 May 00:20 collapse

Quoting studies to actually back up one’s point is in my opinion far less of an echo chamber and a fantasy than anecdotes of “but for me it feels faster”. Especially when AI is known to slow people down while making them feel faster.

locuester@lemmy.zip on 27 May 01:01 collapse

Yes, it must be wonderful to live in the academic world and throw around papers and shit, but down here on the ground a lot of us actually get shit done. We aren’t the delusional ones pretending that something’s making us faster, because we don’t have the money to waste if it’s not actually making us faster.

Are you an engineer? I’m trying to get a feel for what type of people don’t think that LLMs are useful for software engineering.

TehPers@beehaw.org on 27 May 02:06 collapse

FYI in many countries the term “engineer” is protected. Software devs would not be allowed to call themselves engineers without some kind of certification.

All that aside, I think you’ll find that a majority of people on this instance write code regularly, whether as a hobby or day job. Also, at my software dev day job, we actually regularly discuss the academic research around LLMs primarily because it has a major impact on our work.

Personally speaking, what I’ve seen over the past few years is that it creates pretty demos really quickly that fall apart the minute you need to actually develop for real. The code becomes an unmaintainable amalgamation of random libraries used to do the same thing multiple ways, and my coworkers who rely on it heavily have learned basically nothing about the libraries or tools they use because they ask the LLM to do it all for them. This is also ignoring the complete lack of motivation I have now for PR reviews, knowing that the same mistakes will be made again and again in the future because teaching a coworker a better way to do something does nothing to improve the output of an LLM, which cannot learn.

That’s not to say you can’t use it effectively. There just needs to be a balance between what you do as a developer vs what you have the LLM churn out quickly for you. It requires a lot of direction, enough so that I find it to be a waste of time as opposed to implementing things myself usually. Plus, I actually learn more doing it all myself, like upcoming library versions, changes in the tools and libraries I use since last using them, new language features, and so on.

While I’m not going to do a code review of your linked projects (nor do I believe that would be very useful), it sounds to me like you’ve found a way to make it work for you. That’s awesome. I, unfortunately, am regularly subjected to the slop emitted by it when in the hands of people who are actively destroying what experience they might have once had in favor of doing less work.

locuester@lemmy.zip on 27 May 06:10 collapse

Yeah I’m familiar with some places protecting that word.

I find all the workforce productivity related academic papers in the space right now to be sensationalist and subjective. We just haven’t had enough time to let the dust fall.

Totally understand what you’re struggling with. Ppl still need to care about and understand what they’re writing and make sure things are done properly. You don’t oneshot everything.

Also, it depends on what types of systems you’re working on. Integration and glue code in backend systems is where I live most of the time. Using ai removes a lot of tedious boilerplate.

ell1e@leminal.space on 27 May 00:22 collapse

“They excel at specific tasks that are built for them”

They are however widely known to be terrible at code, at least compared to an advanced coder. They introduce not only more bugs even after human review, but new kinds of more insidious bugs.

I like to say the main problems with most projects were already the code quality and the bugs, and not that we somehow needed even more low quality lines of code.

(Disclaimer: not talking about passive AI bug analysis here, just using AI to write actual code.)

TehPers@beehaw.org on 27 May 01:50 collapse

“They are however widely known to be terrible at code”

They are for large tasks. However, for simple pattern repetition tasks, they’re generally fine, code or not. I’ve had success, for example, having them remove pointless, confusing try…except blocks surrounding imports at work. I usually find that I just rewrite anything myself if it’s anything more complex than that because the code it produces makes no sense and taught me nothing.
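For readers who haven’t run into this pattern, here is a minimal sketch of the kind of import guard being described and what removing it changes. It is an illustration only, not the actual work code; the helper names and the missing module name are hypothetical.

```python
import importlib


def load_swallowing_errors(name):
    """Pattern being removed: a failed import is silently turned into None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def load_strict(name):
    """After the cleanup: a missing required dependency fails immediately."""
    return importlib.import_module(name)


# The guarded version hides the missing dependency until something later
# tries to use the None; the strict version raises at import time instead.
print(load_swallowing_errors("hypothetical_missing_dependency_xyz"))
```

When every dependency is actually required, the guard only delays the failure and makes it harder to trace.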

“I like to say the main problems with most projects were already the code quality and the bugs, and not that we somehow needed even more low quality lines of code.”

Tell me about it lol.

ell1e@leminal.space on 27 May 11:39 collapse

“I’ve had success, for example, having them remove pointless, confusing try…except blocks surrounding imports at work.”

And you may have introduced some dangerous hidden bug that way, which you might not have introduced doing it manually.

(I’m not saying that makes it not worth it, this is just what the studies are saying. I personally think it’s not worth it, but I realize there is some subjectivity here.)

TehPers@beehaw.org on 27 May 17:04 collapse

“And you may have introduced some dangerous hidden bug that way, which you may not have doing it manually.”

You act like I can’t read some import statements and see if they match the import statements on the other side of the diff lol.

There was no bug introduced. All the dependencies were required. If any of the imports did error, then that’s a bug with that package that got surfaced instead.

Olgratin_Magmatoe@slrpnk.net on 26 May 15:16 collapse

They haven’t changed the core functionality of how LLMs work. And that core functionality means they cannot reason through problems.

And until that major issue is solved, they will never be anything more than a tool to pull up syntax for very specific use cases.

locuester@lemmy.zip on 26 May 18:08 collapse

We can call it that if you want.

My tool to pull up syntax for very specific use cases does an amazing job smashing out code for me.

SchwertImStein@lemmy.dbzer0.com on 27 May 03:29 collapse

fuck medium

ell1e@leminal.space on 27 May 22:19 collapse

My apologies, I think this is the report it is talking about: coderabbit.ai/…/state-of-ai-vs-human-code-generat…

SchwertImStein@lemmy.dbzer0.com on 28 May 02:25 collapse

np, thanks for extra work

squaresinger@lemmy.world on 25 May 12:19 next collapse

Is anyone actually productively running multiple agents at once? All the context switching in such a short time span feels like a great way to completely forget what you are doing and lose tasks in the mess.

locuester@lemmy.zip on 25 May 13:55 next collapse

I am getting in the habit of keeping one async agent going in the background working on things while I also use ai in windsurf.

I think windsurf supports this natively with their background agents, but I run my background task in Claude Code because then I can use my local qwen 3.6 27b.

squaresinger@lemmy.world on 25 May 17:27 next collapse

But what for? Just to burn your employer’s tokens to teach them that AI is a waste of money? (I mean, I’d respect that.)

locuester@lemmy.zip on 25 May 18:22 collapse

I am self employed. I do it because it allows me to do my work in less time, or do more work in the same amount of time. Sometimes I’m having it do little personal projects in the background.

$20/mo for a windsurf sub. Plus like I said, I run qwen 3.6 locally (free) and get very productive output, and that’s also private, which is the main reason I invested in hardware.

Kissaki@programming.dev on 26 May 11:16 collapse

What does this parallel work mean? Does the background agent work on the same codebase as you? Doesn’t that cause conflicts and confusion?

locuester@lemmy.zip on 26 May 12:10 collapse

Nah I typically have it doing something else. And every 15m or so I toggle back and do next step.

Quite often Sysadmin stuff too. I have it do ansible for my pi cluster, and general cluster maintenance like check backups, troubleshoot services, create a firewall rule, etc.

I’ll also ask it research style stuff, like “check out ram usage of ai-1 box and lmk if cache is big enough for 5 concurrent full contexts. If not, change the recipe and restart it.”

cbazero@programming.dev on 25 May 14:25 next collapse

No but you can lie to yourself that you are.

tias@discuss.tchncs.de on 26 May 09:02 next collapse

Multiple top-level agents can’t modify the same codebase simultaneously, they’ll confuse each other. But you can instruct the main agent to spawn sub-agents that it coordinates for you, to increase throughput and reduce token consumption.

SchwertImStein@lemmy.dbzer0.com on 27 May 03:29 collapse

that’s why claude uses git worktrees for exactly this use case
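A rough sketch of the general idea, using only the standard `git worktree add -b <branch> <path>` command: each agent gets its own checkout directory and branch, so they never edit the same working copy. How any particular tool wires this up internally is not something this sketch claims to show; the directory layout, branch naming and function name are assumptions made for illustration. The function only builds the commands and does not run them.

```python
from pathlib import PurePosixPath


def worktree_commands(parent_dir, tasks):
    """Build one `git worktree add` command per task, without running it."""
    commands = []
    for task in tasks:
        # Hypothetical layout: one sibling directory and one branch per agent.
        path = str(PurePosixPath(parent_dir) / f"agent-{task}")
        branch = f"agent/{task}"
        commands.append(["git", "worktree", "add", "-b", branch, path])
    return commands


for cmd in worktree_commands("/tmp/worktrees", ["docs", "tests"]):
    print(" ".join(cmd))
```

Each resulting branch still has to be reviewed and merged by someone, so this isolates the agents from each other without removing the coordination work.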

ZoteTheMighty@lemmy.zip on 27 May 03:53 collapse

Yeah, AI gets notably worse when it has too much context, even worse than too many managers.

nark3d@thelemmy.club on 26 May 09:35 collapse

squaresinger’s point matches what I’ve found. Once three agents are going, you become the coordination point - you’re holding the plan and reviewing all of it, and that part doesn’t scale the way the generating does. What’s kept it manageable for me is treating each one like an intern on a single, well-specified task I can check before it moves on, rather than running a swarm and hoping it converges. Wrote this up here: prickles.org/tenet/the-intern-pattern/AI1
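One possible way to picture that workflow in code, as a sketch under stated assumptions rather than anything taken from the linked write-up: tasks run one at a time, and nothing proceeds until a review step accepts the previous result. `run_gated`, `do_task` and `review` are hypothetical names; `do_task` stands in for whatever agent does the work and `review` for the human check.

```python
def run_gated(tasks, do_task, review):
    """Run tasks in order, stopping at the first result the reviewer rejects.

    Returns (accepted, rejected_task), where accepted is a list of
    (task, result) pairs and rejected_task is None if everything passed.
    """
    accepted = []
    for task in tasks:
        result = do_task(task)
        if not review(task, result):
            # Stop here so later tasks never build on unreviewed work.
            return accepted, task
        accepted.append((task, result))
    return accepted, None


# Toy stand-ins: the "agent" upper-cases the task, the "reviewer" rejects "b".
done, blocked = run_gated(
    ["a", "b", "c"],
    do_task=lambda t: t.upper(),
    review=lambda t, r: t != "b",
)
print(done, blocked)
```

The review call is the part that does not parallelize, which is the bottleneck described above.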