I just tried vibe coding with Claude
from entwine@programming.dev to programming@programming.dev on 11 Apr 16:29
https://programming.dev/post/48654729

…and I still don’t get it. I paid for a month of Pro to try it out, and it is consistently and confidently producing subtly broken junk. I had tried this before and gave up because it didn’t work well. I thought that maybe this time it would be far enough along to be useful.

The task was relatively simple, and it involved doing some 3d math. The solutions it generated were almost write every time, but critically broken in subtle ways, and any attempt to fix the problems would either introduce new bugs, or regress with old bugs.

I spent nearly the whole day yesterday going back and forth with it, and felt like I was in a mental fog. It wasn’t until I had a full night’s sleep and reviewed the chat log this morning that I realized how much I was going in circles. I tried prompting a bit more today, but stopped when it kept doing the same crap.

The worst part of this is that, throughout all of it, Claude was confidently responding. When I said there was a bug, it would “fix” the bug and provide a confident explanation of what was wrong… Except it was clearly bullshit because it didn’t work.

I still want to keep an open mind. Is anyone having success with these tools? Is there a special way to prompt it? Would I get better results during certain hours of the day?

For reference, I used Opus 4.6 Extended.

#programming


ghodawalaaman@programming.dev on 11 Apr 16:45 next collapse

I only use AI for generating OK-looking UI.

Anthropic says Methos will find bugs in FreeBSD, banking systems, etc. What bullshit.

OwOarchist@pawb.social on 11 Apr 18:02 next collapse

Oh, it will ‘find bugs’ alright. And then flood FreeBSD’s bug report system with bullshit bug reports that turn out to be nothing, but require expert human review to discern that.

homoludens@feddit.org on 11 Apr 18:35 next collapse

It’s easier to find bugs than to produce correct, readable, maintainable code though.

Shin@piefed.social on 12 Apr 06:04 collapse

Most of the current AI models can already search for and find bugs; this mythos is pretty much a PR stunt.

cecilkorik@lemmy.ca on 11 Apr 16:47 next collapse

No, I think you do get it. That’s exactly right. Everything you described is absolutely valid.

Maybe the only piece you’re missing is that “almost right, but critically broken in subtle ways” turns out to actually be more than good enough for many people and many purposes. You’re describing the “success” state.

/s but also not /s because this is the unfortunate reality we live in now. We’re all going to eat slop and sooner or later we’re going to be forced to like it.

GiorgioPerlasca@lemmy.ml on 11 Apr 17:28 next collapse

Or maybe we will be forced to switch off LLMs and start solving the bugs introduced by their usage using our minds.

cecilkorik@lemmy.ca on 11 Apr 19:00 collapse

As a professional software developer, I truly hope that is the case (and I plan to charge at least 10x my current rate after the AI bubble pops when I’m looking for my next job as I expect there to be a massive shortage of people skilled enough to actually deal with the nightmare spaghetti AI code bases)

Fun times ahead.

marlowe221@lemmy.world on 11 Apr 21:26 next collapse

You and me both. We will be the next version of the COBOL Cowboys.

tohuwabohu@programming.dev on 11 Apr 21:26 collapse

It will be interesting (read: bad) times getting to that point, and I agree. The junior market has been basically nonexistent ever since coding agents appeared, stripping the industry of its future seniors. We will be chained to our desks.

pinball_wizard@lemmy.zip on 12 Apr 02:36 next collapse

“almost right, but critically broken in subtle ways” turns out to actually be more than good enough for many people and many purposes. You’re describing the “success” state.

Exactly. The consequences are at worst a problem for “future me”, and at best “somebody else’s problem”.

AI didn’t create this reality, but it’s certainly moved it into the spotlight and to “center stage.”

vga@sopuli.xyz on 12 Apr 12:08 collapse

Maybe the only piece you’re missing is that “almost right, but critically broken in subtle ways”

Sure, but you have to note that it reaches that point in minutes. Sometimes on a task that would take humans a week. The power is not that it creates correct stuff, it’s that it creates almost-correct stuff 100 times faster than a human. Plus the typical machine benefits: it never gets tired, demotivated, etc.

So then the challenge becomes being able to be that human, who can review stuff extremely well and rapidly, being natural in probing the stuff LLMs tend to be wrong about. Sort of like the same challenge that every tech lead had before LLMs too, but just subtly different, because LLMs don’t exactly think like we do.

Alexstarfire@lemmy.world on 11 Apr 16:52 next collapse

I haven’t used tools to make stuff from scratch but we do use them, or similar, where I work. What kind of stuff are you prompting it for? I find it works best when you give it a very small/simple task to do. And it’s pretty good when it comes to making tests for existing code.

But if the main problem is getting math equations and such wrong I’m not sure there is much we can do to help. You’d have to provide it the equations at a minimum and probably explain to it how they should be used.

But there are definitely times where it can be very frustrating. I had a similar issue yesterday as you did. It made a code change and it wasn’t working how it was supposed to. I kept telling it the problem and it kept trying to fix it but failing. I gave up after far too long and looked at all the code changes it made since it was working correctly before. It just put a change slightly too far down in a process and all I had to do was move it up, wholesale, by like 10 lines and it fixed my problem. Like, how could it not figure out something that simple?

So, it’s not the best at actually fixing things but does work more often than not. But if you can tell it exactly what code is causing the problem and where you want it to be instead, it’ll fix it.

OwOarchist@pawb.social on 11 Apr 18:01 collapse

I find it works best when you give it a very small/simple task to do.

If it’s a small/simple task, why do I need help at all?

Alexstarfire@lemmy.world on 11 Apr 18:23 next collapse

Because it might be something that needs to be done in lots of places. Or it may just be something you don’t want to do so you fire it off then go look at or work on something else.

Now, that might be useless for your work flow, but not every tool is useful in every circumstance.

And you can still use it for larger tasks, but often I need to come behind it and clean up its work. Just like you would an intern or junior dev.

lepinkainen@lemmy.world on 12 Apr 09:26 collapse

Because the simple tasks are boring as fuck?

If an LLM can generate 90% of a HTTP API correctly, why would you want to do it manually?

OwOarchist@pawb.social on 12 Apr 17:19 collapse

If an LLM can generate 90% of a HTTP API correctly, why would you want to do it manually?

Because figuring out which 10% it did wrong and then fixing that will take longer and be more effort than just doing it from scratch myself.

lepinkainen@lemmy.world on 12 Apr 20:19 collapse

You must type really fast then 😅

I personally read code a fuckton faster than I write it. And tests are for determining correctness; reading is just a part of it.

jubilationtcornpone@sh.itjust.works on 11 Apr 16:52 next collapse

I rarely use LLMs for generating code. Usually, by the time I’ve provided all the necessary context, I might as well have just written the code myself. I do use LLMs for doing research. As long as it’s understood that the response is only as accurate as the source material, they often do a decent job of distilling down to what I’m actually looking for.

Gsus4@mander.xyz on 11 Apr 16:59 next collapse

Their usual (crap) defense is:

a) you’re not paying enough, so of course it is crap

b) you’re not prompting right, you need to use detailed, precise language…

c) that is just anecdotal evidence, you need to do an actual study, yadda yadda.

d) it will improve…

(any others anyone has noticed?)

solomonschuler@lemmy.zip on 17 Apr 14:49 collapse

English is cheap to replicate; there is no science to prompting, it’s just asking the goddamn question.

If AI companies are so keen on treating humans like dumbasses, that’s an issue on their part, not my fucking English.

bruce_babbler@lemmy.zip on 11 Apr 17:04 next collapse

You’re probably done with this. But if you give Claude a test case or two (or have it try to make them), you can have Claude run the test case, and then it will iterate.

Also, aggressively use plan mode, and if Claude screws up more than three times, do /clear, explain to it that it’s screwing up, and then give it new instructions.

Feyd@programming.dev on 11 Apr 17:08 next collapse

producing subtly broken junk

The difference between you and people that say it’s amazing is that you are capable of discerning this reality.

OwOarchist@pawb.social on 11 Apr 18:00 next collapse

What I don’t get, though, is how the vibe code bros can’t discern this reality.

How can they sit there and not see that their vibe-coded app just doesn’t do what they wanted it to do? Eventually, you’ve got to try actually running the app, right? And how do you keep drinking the AI kool-aid when you find out that the app doesn’t work?

Feyd@programming.dev on 11 Apr 18:10 next collapse

They’re the same people who copied code from Stack Overflow, whose every PR you had to tell them how to actually fix. The difference is the C-suite types are backing them this time.

Oisteink@lemmy.world on 11 Apr 20:05 next collapse

I do apps that work, I do patches that are production quality. Half the cs world does… I do full-stack AI debugging of ESP32 projects.

It’s a powerful tool; you just need to learn its strong and weak points, just like any other tool you use.

Kissaki@programming.dev on 12 Apr 08:20 collapse

Half the cs world does…

What’s the basis for this claim? I’m doubtful, but don’t have wide data for this.

Oisteink@lemmy.world on 12 Apr 10:32 next collapse

Rough estimate from my personal connections only. Some work in places where AI is not possible, but all who have made an effort report good code. You need to work with what it is - a word generator that sometimes gives correct results. Make it research rather than trust its training. Never let it do things on its own; require a plan and reasoning. Make it evaluate its own work/plan.

Most issues I have stem from models being too eager. Restrain them and remove the “I can do this next…” behaviour.

Context is king - so proper MCP and documentation that is agent-facing. I use Serena as I can get LSP for YAML and markup, and keep these docs like that.

zbyte64@awful.systems on 12 Apr 21:13 collapse

Any luck with integrating platform.io? I have an ESP32 project, but VSCode can’t provide type hinting with its main C++ extension that is used by platform.io.

Oisteink@lemmy.world on 12 Apr 21:22 collapse

No, sorry. I only use IDF, and use its generated VSCode files for the LSP to work.

And tmux + skills for idf.py work, including debugging. Also a REPL on console/UART - agents love CLIs, including this one.

IMO MCP > pure skills for tmux.

KeenFlame@feddit.nu on 12 Apr 11:56 collapse

Of course they do, it is hyperbole to think they are completely useless

Lumelore@lemmy.blahaj.zone on 11 Apr 21:00 next collapse

Vibe code bros aren’t real programmers. They’re business people, not computer people. Even if they have a CS degree, they only got that because they think it’ll get them more money. They lack passion and they don’t care about understanding anything. They probably don’t even care about what they’re generating beyond its potential to be used in a grift.

I graduated college not that long ago, and my CS classes had quite a few former business majors. They switched because they thought it would be more lucrative, but since they only care about money they didn’t bother to actually learn the material, especially since they could just vibe code through everything.

b_n@sh.itjust.works on 12 Apr 05:42 collapse

So much this.

After working in tech companies for the last 10 years I’ve noticed the difference between people that “generate code” and those that engineer code.

My worry about the industry is that vibe coding gives the code generators the ability to generate even more code. The engineers (even those that use vibe tools) are not engineering as much code by volume compared to “the generators”.

My hope is that this is one of those “short term gain, long term pain” things that might self correct in a couple of years 🤞.

sobchak@programming.dev on 13 Apr 07:48 collapse

It’s insane that companies are going back to metrics like LOC (or tokens generated), when the industry figured out decades ago that these are horrible, counterproductive metrics.

“The hard thing about building software is deciding what one wants to say, not saying it. No facilitation of expression can give more than marginal gains.” - No Silver Bullet (1986)

tleb@lemmy.ca on 11 Apr 21:47 next collapse

Eventually, you’ve got to try actually running the app, right?

At least at my company, no, they just start selling it.

pinball_wizard@lemmy.zip on 12 Apr 02:34 collapse

Yes. Exactly. In my experience, there are more code shops that ship shit than ones that catch their mistakes.

favoredponcho@lemmy.zip on 13 Apr 04:05 collapse

You do try running the app, and then you see what is broken, and then you have Claude fix it. The process is still iterative, just like regular coding. I haven’t met a software engineer who wrote a perfect app on the first try; it’s always broken, even in subtle ways. Why does everyone think vibecoding needs to be perfect on the first shot?

JustEnoughDucks@feddit.nl on 12 Apr 08:37 collapse

I wonder if it was even able to compile. I am a shitty hobby coder who just does it to make my embedded hardware projects function.

I have yet to get compilable code out of any of the AI bots I have tried: Gemini, Mistral, and ChatGPT. I am not making an account lol.

I have gotten some compilable python and VBA code for data analysis stuff at work, so I wonder if it is because embedded stuff uses specific SDKs that it can’t handle.

Either way, I have given up on it for anything besides bouncing ideas off of, or debugging where electromagnetics issues could lie (though it has been completely wrong about that too; even when it is using the wrong concepts, it at least reminds me of concepts that I might have overlooked).

lakemalcom@sh.itjust.works on 11 Apr 17:08 next collapse

I have yet to be able to vibe code anything relatively involved. The closest I’ve come is an ffmpeg wrapper script to edit out scenes from a video with a fade-in/fade-out title card. But even then, I ended up having to debug and add my own arg support at some point because it kept screwing things up. The first draft did do something, though.

I find at this point that it’s still only useful if I have a very clear goal in mind with a lot of context on the area I need to make changes to. That lets me get a more specific prompt, and then I’ll still need to review the output. I have only ever gotten a successful one shot like this with tests.

daesorin@programming.dev on 11 Apr 17:16 next collapse

I did the same today. With both Gemini and Claude, and all I can say is that coding is hell.

zerofk@lemmy.zip on 11 Apr 17:17 next collapse

“Almost but not quite” is exactly my experience with Claude.

The only time I’ve had real success is telling it to do a simple API change that touches a dozen files. It took a while and I’m not sure it was faster than doing it manually, but at least it was less boring.

Possibly important context: I only started really using it a few weeks ago.

dgdft@lemmy.world on 11 Apr 17:51 next collapse

Vibe coding, in the sense of telling the model to make codebase changes, then directly using the output produced, is 100% marketing bullshit that does not scale beyond toy examples.

Here’s the rub: Claude is extremely useful as an advanced autocomplete, if and only if you’re guiding it architecturally through every task it runs, and you vet + revise the output yourself between iterations. You cannot effectively pilot entirely from chat in a mature codebase, and you must compile robust documentation and instructions for Claude to know how to work with your codebase.

You also must aggressively manage information in the context window yourself and keep it clean. You mentioned going in circles trying to get the robot to correct itself: huge mistake. Rewind to before the error, and give it better instructions to steer it away from the pitfall it fell into. Same vein, you also need to reset ASAP after pushing into the >100k token mark, because the models start melting into putty soon after (yes, even the “extended” 1M-window ones).

I’m someone who has massively benefited from using modern LLMs in my work, but I’m also a massive hater at the same time: They’re just a tool, not magic, and have to be used with great care and attention to get reasonable results. You absolutely cannot delegate your thinking to them, because it will bite you, hard and fast.

For your use case (3D math), what I recommend is decomposing your end goal into a series of pure functions that you’ll string together. Once you have that list, that’s where Claude comes in. Have it stub those functions for you, then have it implement them one at a time, reviewing the output of every one before proceeding.
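
To sketch what that decomposition might look like (purely hypothetical functions, not OP’s actual task — a rotation built from small pure helpers, each simple enough to review or hand to Claude in isolation):

```python
import math

def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def normalize(v):
    """Unit vector in the direction of v."""
    n = math.sqrt(dot(v, v))
    return (v[0]/n, v[1]/n, v[2]/n)

def rotate_about_axis(p, axis, angle):
    """Rotate point p about an axis by angle, via Rodrigues' formula."""
    k = normalize(axis)
    c, s = math.cos(angle), math.sin(angle)
    kxp = cross(k, p)
    kdp = dot(k, p)
    return tuple(p[i]*c + kxp[i]*s + k[i]*kdp*(1 - c) for i in range(3))
```

Each of these is trivially testable on its own, which is exactly where subtle 3D-math bugs get caught before they compound.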

eodur@piefed.social on 12 Apr 00:54 next collapse

This is the most pragmatic take I’ve read, and it resonates strongly with my own experience. Claude can be a very useful tool, but like any other there is a learning curve and often many sharp edges. I’ve had Claude build some reasonably complex code bases, but it takes work. It’s pretty decent at “coding” but pretty terrible at the rest of software engineering.

something183786@lemmy.world on 12 Apr 02:05 collapse

My preferred way of using LLM coders is:

  • plan only
  • read the spec file I just wrote
  • optionally ask me questions in ‘qa.md’; I’ll reply inline

Repeat until it stops asking me questions, then switch to a different model and ask again. I usually use both gpt5.3-codex AND Claude Sonnet.

Then I have it update the spec. I start a new session to have it implement. Finally review the code. If I don’t like it, undo and revisit the spec. Usually it’s because I’m trying to do too much at once. And I need to break it down into multiple specs.

BehindTheBarrier@programming.dev on 12 Apr 10:29 collapse

Adversarial reviews are also great ways to prune bad ideas and assumptions from plans. They have helped me out greatly and often made the better LLMs go “the plan said do X, but doing that is an unknown huge risk that may take longer than the rest of the plan”.

The superpowers plugin does the brainstorm, qa, design plan, implementation plan, implement, review quite well. It should aid the process of actually doing feature type work. I also add adversarial reviews into the process, saves a lot of time debugging what went wrong after implementation.

yardy_sardley@lemmy.ca on 11 Apr 18:02 next collapse

I used Opus 4.6 Extended

Stop being cheap, OP. You clearly just need to shell out multiple billions of dollars for access to mythos /s

pixxelkick@lemmy.world on 11 Apr 18:13 next collapse
  1. Did you have MCP tooling set up so it can get LSP feedback? This helps a lot with code quality, as it’ll see warnings/hints/suggestions from the LSP.

  2. Unit tests. Unit tests. Unit tests. Unit tests.

I cannot stress enough how much less stupid LLMs get when they have proper, solid unit tests to run themselves and compare expected vs actual outcomes.

Instead of reasoning out “it should do this” they can just run the damn test and find out.

They’ll iterate on it til it actually works, and then you can look at it and confirm if it’s good or not.

I use Sonnet 4.5 / 4.6 extensively and, yes, it’s prone to getting the answer almost right but wrong in the end.

But the unit tests catch this, and it corrects.

Example: I am working on my own game engine with MonoGame, and it’s about 95% vibe coded.

This transform math is almost 100% vibe coded: github.com/SteffenBlake/…/TransformRegistry.cs

The reason it’s solid is because of this: github.com/…/TransformRegistryIntegrationTests.cs

Also vibe coded and then sanity checked by me by hand to confirm the math checks out for the tests.

And yes, it caught multiple bugs, but the agent could automatically respond to that, fix the bug, rerun the tests, and iterate til everything was solid.

Test Driven Development is huge for making agents self police their own code.
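
To make that concrete, here’s a minimal sketch of the kind of test an agent can run and iterate against (the transform representation and function names here are made up for illustration, not from the linked repo):

```python
def apply(t, p):
    """Apply a 2D scale-then-translate transform t = (sx, sy, tx, ty) to point p."""
    return (p[0] * t[0] + t[2], p[1] * t[1] + t[3])

def invert(t):
    """Inverse of a scale-then-translate transform."""
    sx, sy, tx, ty = t
    return (1 / sx, 1 / sy, -tx / sx, -ty / sy)

def test_invert_round_trips():
    # Applying a transform and then its inverse must return the original point.
    t = (2.0, 3.0, 5.0, -1.0)
    p = (1.5, -2.0)
    q = apply(invert(t), apply(t, p))
    assert abs(q[0] - p[0]) < 1e-9 and abs(q[1] - p[1]) < 1e-9
```

A round-trip property like this is exactly the kind of check an agent can rerun after every change instead of reasoning about whether the math “should” work.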

colournoun@beehaw.org on 11 Apr 18:32 next collapse

regress with old bugs

Have it write a test suite that enforces the correct behavior, and tell it that the test suite must pass after any change. Make sure it’s not cheating (return true) inside the test suite.
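
For example, the difference between a cheating test and one that actually enforces behavior (the function here is hypothetical, just to illustrate):

```python
def clamp(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def test_cheating():
    # Worthless: passes no matter what clamp does.
    assert True

def test_enforces_behavior():
    # Pins the contract, including both edges.
    assert clamp(5, 0, 10) == 5
    assert clamp(-3, 0, 10) == 0
    assert clamp(42, 0, 10) == 10
```

Skimming the assertions for hardcoded truths like the first test is usually enough to catch the cheating.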

AlphaOmega@lemmy.world on 11 Apr 18:54 next collapse

This sounds on par for all the AI I have been dealing with. I find it works best if you give it a lot of rules, then treat it like a 12-year-old and expect wild mistakes for anything more complicated than a simple calculator. I work primarily with Gemini and have it build simple HTML/CSS, and it’s infuriating how many times I have told it to use &amp; instead of &.
Now every time it does anything, it’s always telling me how it included the correct ampersand. It can’t tell me why it screwed up like 5 times prior; it just makes up some BS and apologizes profusely.
The more rules you give it, even if it ignores them sometimes, the better.

Oisteink@lemmy.world on 11 Apr 20:14 collapse

In my view it’s about the quality, not the quantity, of the AGENTS/CLAUDE.md.

My experience is that starting with what I don’t want and then what I do want works best. «Never rely on training for API documentation, use context7.» «Don’t use ls/find/grep for symbols, use serena.»

Not the best examples, but still.

homes@piefed.world on 11 Apr 19:03 next collapse

I tried using Claude to convert some bash scripts to docker compose files, and it made several mistakes with case-sensitivity and failed to properly encapsulate certain path declarations that had spaces in them. If it can make such incredibly simple mistakes converting a script to a markup language, I wouldn’t dare trust it to actually compose anything in an actual programming language like Python or Rust or C# or Swift or whatever you’re using.

zbyte64@awful.systems on 12 Apr 21:25 collapse

I have similar problems whenever I send it to investigate a bug and the local runtime is inside a container. It cannot reliably translate paths without the help of an IDE. Hell, it even occasionally mangles API paths if I have them prefixed elsewhere in the codebase (despite having Claude.md etc.; your context needs to be pure for it to be reliable). Having it fix a Dockerfile is comically bad.

homes@piefed.world on 12 Apr 21:29 collapse

like… it fixed it when I called it out, but it made the mistake again later on. I was only using it to save time converting, like, 11 or so files, but it made the mistake 3 or 4 times, not only with the encapsulation but with the case-sensitivity too. Both with paths, although I couldn’t see any particular pattern to it.

just annoying, and I had to read through each compose file just to check it for errors. In the end it did save time, but much less than I thought it would.

zbyte64@awful.systems on 12 Apr 21:32 collapse

If it gets it wrong the first time, I rarely reprompt. I know I can get it to fix it, but it’s usually faster for me to do it myself because I’ve already figured out where and what the fix is. Low key think it’s just a ploy to get us to burn more tokens. Sure, correcting it means it writes a few lines to the memory file, but it’s only a matter of time before it trips over that context as well.

homes@piefed.world on 12 Apr 21:56 collapse

yeah, I also wonder if it’s a ploy. that’s really the only thing I’ve used it for in terms of code assistance, and I really haven’t used Claude much overall, but it generally seems more capable than ChatGPT, for example. it felt a bit strange that it would make such a simple mistake.

OpenStars@piefed.social on 11 Apr 19:21 next collapse

The solutions it generated were almost write every time, but critically broken in subtle ways, and any attempt to fix the problems would either introduce new bugs, or regress with old bugs.

This is part of your problem right there. The correct word there, instead of “write”, is “right”. You emotionally typed out a message, got your dopamine hit, then felt satisfied, and now the rest of us have to figure out what you meant to say.

Which is fine, but now imagine that not only you can do this, but AI can do it as well…

If you want something done correctly, then you must do it yourself.

infiniteface@programming.dev on 11 Apr 19:33 next collapse

Opus 4.6 is a dream for me. Though I’m in the web dev area, which is quite mature and has a lot of training data. The lifesaver for avoiding regressions is to comprehensively test your code. This works as a kind of quality checkpoint during development.

Secondly, give it the right tooling and context; that means at the very least a good ACP server (editor) and appropriate MCP servers. Search for what’s appropriate in your domain. For 3D math, at the very least I’d think it would need a visual snapshotting tool. There are probably tons of relevant ones.

Thirdly, consistently expand on your CLAUDE.md, add and develop new skills as you go (let it write its own on your instructions). Force it to read them.

It probably depends on a lot of factors, but disciplined usage of these approaches will go a long way. Opus’ context window is huge, which makes the approach more consistent.

Prove_your_argument@piefed.social on 11 Apr 19:44 next collapse

Have you been coding professionally long?

I find that to use these chatbots for a task, I really need to already know what I’m doing so that I can read the output and fix the issues. This is like having junior devs on your team and being a code reviewer more than being a full-time coder. They get a lot of things wrong but there’s so much usable that you can save a ton of time over doing everything yourself from scratch.

Just like with junior devs, you can send them back to fix what you know is wrong and give them feedback to improve various things you would prefer done another way. There’s no emotions though, so you can just be blunt and concise with feedback.

GiorgioPerlasca@lemmy.ml on 11 Apr 19:56 next collapse

Nice comparison, but the bugs created by junior software developers are usually much easier to find than the bugs created by LLMs.

pinball_wizard@lemmy.zip on 12 Apr 02:26 collapse

They get a lot of things wrong but there’s so much usable that you can save a ton of time over doing everything yourself from scratch.

Your experience with Junior devs has been quite different from mine.

I work with Junior devs because someday they will be senior devs who owe me a favor, even though they’ve only ever cost me time.

Edit: I also work with junior devs because sometimes a tiny corner of my job is both mind-numbingly boring, and also weirdly difficult to automate away.

I assign that work to junior devs because I don’t want to do it.

In doing so, I am wasting the boss’s money, since I could do it faster.

But I consider it just another part of the price of hiring me, because it keeps me happy.

bluGill@fedia.io on 11 Apr 19:52 next collapse

Claude is very good when driven by someone who knows how to do the job and demands perfection. However, if you give it a prompt and take the first result, it is normally junk; make it iterate and things get better.

sirdorius@programming.dev on 11 Apr 20:36 next collapse

I’ve just started recently using Claude after being very unimpressed with Copilot, but my current theory is that you should treat everything it writes like a PoC that you found in some obscure github repo. Use it as a reference that you can generate quickly, take out only the good parts, adapt them to your context. It’s harder to delete code than to write it, so it’s easier to just take what you like from its output, rather than try to clean up all the nonsense it generates.

How accurate that is, and how useful it is compared to just writing it from scratch, varies a lot based on your particular project. You still need a good understanding of the output it produces; otherwise those subtle bugs and that low quality add up. The times it’s the most useful are when it writes a lot of stuff that I would’ve written myself, but I can point to some detail and say “that’s wrong, I’ll write it myself”.

ThirdConsul@lemmy.zip on 11 Apr 20:58 next collapse

.net runtime after 10 months of using and measuring where LLMs (including latest Claude models) shine reported a mindboggling success rate peaking at 75% (sic!) for changes of 1-50 LOC size - and it’s for an agentic model (so you give it a prompt, context, etc, and it can run the codebase, compile it, add tests, reason, repeat from any step, etc etc).

Except it was clearly bullshit because it didn’t work.

Welcome to the LLMs where everything is hallucinated and correctness doesn’t matter.

Is anyone having success with these tools

Define success.

Is there a special way to prompt it?

It gets better the more you use it, you will learn what works for you, and what does not. Right now the hot shit is “autonomous agent swarms” peddled by the token sellers as a way to output correct massive features. Do not touch that for now.

What helps with Claude / llms 101:

  • when it tells you something about an API, using a tool, or whatever, tell it the tool version and order it to give you the documentation page proving the solution is possible.

  • when it oneshots a working solution you will get a dopamine hit. Be aware of that, as it can be addictive or make you trust it. Do not trust it, it sucks long term.

  • it will always default to a below-average solution. Know where your hotspots are, and be extra judgy there.

  • it will get lazy and lie to you, especially with tests

  • it will not propose code refactors on its own.

  • despite the token peddlers’ claims, no matter if you’re using the 1M-token context window model, the shit degrades when the context window is over 20k-30k tokens - so switch context windows often for better outcomes, but that means you will be burning more money - which obviously benefits the token peddlers.

  • do not trust the hype - so far any and all tall claim of a breakthrough from the token peddlers were a lie (e.g. vibing working os that can run Doom, vibing a next.js 96% replacement in a week, vibing a browser, compiler, vibing a browser jailbreak via Mythos)

Would I get better results during certain hours of the day?

Afaik USA timezone has worse performance.

Kissaki@programming.dev on 12 Apr 08:24 collapse

.net runtime after 10 months of using and measuring where LLMs (including latest Claude models) shine reported a mindboggling success rate peaking at 75% (sic!) for changes of 1-50 LOC size - and it’s for an agentic model (so you give it a prompt, context, etc, and it can run the codebase, compile it, add tests, reason, repeat from any step, etc etc).

I assume this is from …microsoft.com/…/ten-months-with-cca-in-dotnet-ru…?

ThirdConsul@lemmy.zip on 12 Apr 09:26 collapse

You assume correctly.

tohuwabohu@programming.dev on 11 Apr 21:20 next collapse

I use my own brain to sketch out what I want to work and how. Before writing any code, I use the LLM to point out gaps and how to close them. Pros and cons of certain decisions. Things you would discuss with colleagues. Then, I come up with a plan for the order I want the code to be written in and how to fragment that into smaller, easy to handle modules. I supervise and review each chunk produced, adapt code mostly manually if required, write the edge case tests - most importantly, run it - and move to the next. This is how I use it successfully and get results much faster than the traditional way.

At my job though I can witness how other people use it. I was asked to review a fully vibecoded fullstack app that contains every mistake possible. Unsanitized input. Hardcoded tokens. Hardcoded credentials. 2500+ LoC classes and functions. Business logic orchestrators masquerading as services. Full table scans on each request. Cross-tenant data leaks. Loading whole tables into memory. No test coverage for the most critical paths. Tests requiring external services to run. The list goes on. Now they want me to make it production ready in 8 weeks "because you have AI".

My point: This was an endorphin-fueled vibecoding session by someone who has no experience as a developer, asked the LLM to "just make it work", and lacks the ability to supervise the work that comes with experience. It was enough to make it run locally and pitch a "system engineered w/o any developer" to management.

Those systems need guidance just as a Junior would and I am strongly and loudly advocating to restrict access to this incredibly useful tool to people who know what they do. Nobody would allow a manager to use a laser cutter in a carpentry workshop without proper training, worst case is they will burn down the whole shack.

I appreciate you having an open mind about it at least. I needed some time to adjust as well. I don't even use Opus; most of the time my workflow consistently produces usable code with Sonnet. Maybe you can try what I explained initially? Just don't try any language you're not familiar with, that will not end well.

Bonje@lemmy.world on 11 Apr 22:30 next collapse

Our work started giving Claude access. Plugging sonnet 4.6 in with opencode I had it do some terragrunt code. It was mostly correct. Highly documented languages seem to be its best. The modules I had it write cost 4 bucks of tokens total.

Using it just gave me insane ick though. I might resign myself to using it anyway because of our backlog and burnout.

athatet@lemmy.zip on 11 Apr 23:17 next collapse

The reason you kept going around in circles and reintroducing bugs you already got rid of is that LLMs don't remember things. Every time you send a message, the entire conversation is fed back in so the model has all the parts. Eventually it runs out of room and starts cutting off the beginning of the convo, and now the LLM can't 'remember' what you were even talking about.

Railcar8095@lemmy.world on 12 Apr 05:41 next collapse

For that you can ask it to update a documentation/status file on every change. You can manually add the goal and/or tasks for the future.

With that, I improved my success a lot even when starting new sessions (add a line in the instructions file to use this file for reference, so you don't have to remind it every time).
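
As a sketch, mine looks something like this (the filename and section names are my own convention, nothing Claude requires):

```markdown
# STATUS.md - updated by the agent after every change

## Goal
Fix the quaternion-to-matrix conversion so normals render correctly.

## Done
- Replaced the hand-rolled cross product with a library call
- Added a regression test for the zero-length vector case

## Next
- Profile the per-frame transform loop

## Do not reintroduce
- Swapping the w and z components (caused the mirrored-mesh bug)
```

The "do not reintroduce" section is the part that stops the circular bug-fixing OP describes.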

KeenFlame@feddit.nu on 12 Apr 11:54 collapse

Kind of, but it really depends on the workflow. Simple 3D math doesn't involve a codebase big enough to be impacted by the context window.

onlinepersona@programming.dev on 12 Apr 04:21 next collapse

It’s not called “correct” coding for a reason.

That's why people are wrong so often: they feel like something is right, but don't check. That's how you get anti-vaxxers, manosphere people, MAGA, QAnon, Brexit, etc.

kunaltyagi@programming.dev on 12 Apr 04:53 next collapse

Don't jump right into coding.

Take a feature you want, and use the plan feature to break it down. Give the plan a read. Make sure you have tests covering the files it says it’ll need to touch. If not, add tests (can use LLM for that as well).

Then let the LLM work. Success rates for me are around 80% or higher for medium tasks (30 mins–1 hour for me without LLM, 15–30 mins with one, including code review)

If a task is 5 mins or so, it's usually hit or miss (since planning would take longer). For tasks longer than 1 hour or so, it depends. Sometimes the code is full of simple idioms and the LLM can easily crush it. Other times I need to actively break it down into digestible chunks.

stickyprimer@lemmy.world on 12 Apr 05:11 next collapse

almost write

Indeed.

sobchak@programming.dev on 12 Apr 06:11 next collapse

The key is having it write tests and iterate by itself, and also managing context in various ways. It only works on small projects in my experience. And it generates shit code that's not worth manually working on, so it kind of locks your project into being always dependent on AI. Being always dependent on AI, combined with AI eventually hitting a brick wall, means you'll reach a point where you can't really improve the project anymore. I.e. AI tools are nearly useless.

ReallyCoolDude@lemmy.ml on 12 Apr 09:05 next collapse

I read a lot of these posts that sadly leave out the basic parts: what were your prompts? What does 'vibe coding' mean in this context? Did you create an initial setup and slowly build up? Did you leave everything to the agent's understanding and just push approve or reject? Did you run into context rot? Does '3D math' mean vector math, matrices, or what? There are multiple levels of quality that depend on the input, and given Claude has had a serious problem since at least March, the way you use it is paramount.

In our team we all use Claude with Copilot (sadly, that is a business directive), and while it is exceptional at finding small relationships between components and microservices, we had to build a long list of skills just to make it barely usable in a 'Star Trek' way. The bottom line is that you must be extremely precise when asking. Prompt modeling counts a lot. Context building as well. For now, unit tests and data/mock refactors are working extremely well for me, when I define the test cases. My agents got to a point where I can safely have small property additions with refactors on multiple repositories at once (i.e. I change the contract on microservice A, and microservices B, C, and D are automatically updated). That last part had to be built though, with memory, engrams, and some fine tuning.

It is not always shit: if it were, nobody would use it. But it is not this revolutionary technology that will make humans obsolete either (as they are selling it).

Michal@programming.dev on 12 Apr 09:37 next collapse

You can't really just use Claude Code raw. You have to give it detailed instructions, use Claude skills, observe results, update prompts. It can be just as time-consuming, but rather than doing the productive work yourself, you're reviewing and correcting the AI. People who have success using AI have invested time in their setup and are continuously adjusting it.

KeenFlame@feddit.nu on 12 Apr 11:51 collapse

But all in all it's much faster. That's the reason it's not useless. Everyone whines that it takes so much time, but no, it's nowhere close to the manual effort. It's not a magic pill and you still need the know-how, but no, it is not "just as time-consuming". You are more productive. But yes, it is also more boring.

RamenJunkie@midwest.social on 12 Apr 17:44 collapse

The biggest benefit from LLMs even just helping with coding is I never have to open the hellsite of assholes that is Stack Overflow.

Fuck SO forever.

okamiueru@lemmy.world on 12 Apr 20:28 collapse

What did SO do to warrant such emotion?

ArmchairAce1944@discuss.online on 12 Apr 20:59 next collapse

I didn’t use SO much, but the people can be… Difficult.

RamenJunkie@midwest.social on 12 Apr 23:58 collapse

This question has been asked 1000 times before. If you were not so stupid, you would have used the search and weeded through 10,000 results, most for outdated versions of your question, to find the answer. But then, you are using PHP instead of GoSwift++, the hot new flash in the pants .0001ms-faster code language, so of course you are stupid.

– Average SO reply bot

ArmchairAce1944@discuss.online on 14 Apr 06:20 collapse

Sums up my experience.

webkitten@piefed.social on 12 Apr 10:56 next collapse

Don't just use it as a drop-in replacement for a programmer; use it to automate menial tasks while employing "trust but verify" with every output it produces.

A well written CLAUDE.md and prompt to restrict it from auto committing, auto pushing, and auto editing without explicit verification before doing anything will keep everything in your control while also aiding menial maintenance tasks like repetitive sections or user tests.
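
For example, the rules section of my CLAUDE.md looks roughly like this (the wording is my own, and keep in mind the model treats these as strong suggestions, not hard guarantees):

```markdown
## Workflow rules
- NEVER run `git commit` or `git push` yourself; show me the diff and wait.
- Before editing any file, list the files you intend to touch and why,
  then wait for my explicit approval.
- Prefer small, reviewable changes over sweeping rewrites.
```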

Feyd@programming.dev on 12 Apr 11:20 collapse

verify with every output it produces.

I agree that you can get quality output using these tools, but if you actually take the time to validate and fix everything they’ve output then you spend more time than if you’d just written it, rob yourself of experience, and melt glaciers for no reason in the process.

prompt to restrict it from auto committing, auto pushing, and auto editing without explicit verification

Anything in the prompt is a suggestion, not a restriction. You are correct you should restrict those actions, but it must be done outside of the chatbot layer. This is part of the problem with this stuff. People using it don’t understand what it is or how it works at all and are being ridiculously irresponsible.

repetitive sections

Repetitive sections that are logic can be factored down, and should be, for maintainability. Those that can't be can be generated with any number of deterministic methods: a list of words can be expanded into whatever repetitive boilerplate with sed, awk, a Python script, etc., and you'll know nothing was hallucinated because it was deterministic in the first place.
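
For example, a throwaway script like this (the field names are made up) expands a word list into boilerplate with zero chance of hallucination:

```python
# Expand a list of field names into repetitive accessor boilerplate.
# Deterministic: the output contains exactly the names we put in, nothing more.
fields = ["width", "height", "depth"]

template = """\
def get_{name}(self):
    return self._{name}

def set_{name}(self, value):
    self._{name} = value
"""

boilerplate = "\n".join(template.format(name=f) for f in fields)
print(boilerplate)
```

Need it in another shape tomorrow? Edit the template and rerun; nothing to review.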

user tests.

Tests are just as important as the rest of the code and should be given the same amount of attention instead of being treated as fine as long as you check the box.

webkitten@piefed.social on 12 Apr 11:39 collapse

I agree it's not perfect; I still only use it very sparingly. I was just saying it as an alternative to trusting everything it does out of the box.

No1@aussie.zone on 12 Apr 11:51 next collapse

were almost write every time

Claude: You too are human, human.

BenevolentOne@infosec.pub on 12 Apr 12:33 collapse

If you make a spelling error, Claude thinks, “we’re doing low quality work”, and it does.

arthur@lemmy.zip on 12 Apr 12:04 next collapse

I’m using (Gemini 3.1 pro in) Gemini cli to build a complex (personal) project to explore how to use these tools. My impression is that the code produced by LLMs is disposable/throwaway. We need to babysit the model and be very hands on to get good results.

thedeadwalking4242@lemmy.world on 12 Apr 12:13 next collapse

I use it for tedious transformations or needle-in-a-haystack problems.

They are better at searching for themes or concepts than they are at actually doing any "thinking tasks". My rule is that if it requires a lot of critical thinking then the LLM can't do it.

It’s definitely not all they say it is. I think LLMs will fundamentally always have these problems.

I've actually had a much better time using it for in-line completion as of recent. It's much better when the scope of the problem it needs to "solve" (the code it needs to find and compose to complete your line) is in the Goldilocks zone. And if the answer it gives is bad I just keep typing.

I really hate the way LLM vibe-coded slop is written and architected. To me it's clear these things have an extremely limited conception of what they're building. I've compared it to essentially ripping out the human language center, giving it a keyboard, and asking it to program for you. It's just not really what it's good at.

x00z@lemmy.world on 12 Apr 12:17 next collapse

The trick about vibe coding is that you confidently release the messed up code as something amazing by generating a professional looking readme to accompany it.

wilmo@lemmy.ml on 12 Apr 12:36 collapse

The more Emojis in that Readme the better!

Evotech@lemmy.world on 12 Apr 12:49 next collapse

You need to use plan mode

JubilantJaguar@lemmy.world on 12 Apr 12:52 next collapse

Recently I used it (some free-tier DuckAI model, not Claude) to write a Python script for pasting PNGs into PDFs (complete with Tk interface) while applying a whole bunch of custom transformations. Simple enough, but a total chore with all the back-and-forth of searching for relevant unfamiliar libraries and syntax checking and troubleshooting. Inevitably it would have taken me the whole afternoon by hand. With AI I knocked it out in 25 minutes. That was my epiphany moment.

Since then I’ve noticed a general problem with AI coding. It almost always introduces too much complexity, which I then have to waste time untangling (and often just understanding) before I can proceed. Whereas if I had done it “my way” from the start I might have got there earlier. But I figure this problem is kinda on me.

thedogz22@thelemmy.club on 12 Apr 15:23 collapse

And for me, therein lies why my use of it has become reduced to a really complex rubber duck, or to write something out that I could do by hand, but making my robot butler do it is just faster. Anyone actually leaning into today’s generative AI models for generating code that requires complexity or thought… they shall reap what they sow in the years to come.

Blackmist@feddit.uk on 12 Apr 12:54 next collapse

I think it’s mostly going to be useful for boilerplate generation, and effectiveness is going to vary wildly based on what language you’re using. JS or Python? It’ll probably do OK. Plenty of open source for it to “learn” from. Delphi? Forget it.

Brief experimentation showed it liked to bullshit if it was wrong, rather than fix things.

shaggy@beehaw.org on 12 Apr 13:57 next collapse

I’ve had an opposite experience. Here are some guidelines I follow:

  1. Set up a foundation of rules and knowledge for Claude to fall back on. I define expectations, common definitions, behaviors and anything else that's not project specific upfront.
  • in Claude.md I reference different domains of behavior, definitions, and rules (Claude has conventions for storing this type of stuff, so ask it to handle organizing information too)
  • create a top-level project definition: this defines what "knowledge" is. It allows you to build up what Claude knows later on as you work on your project. "Update knowledge", "add this to your knowledge", etc
  • create a top-level rule: all information in knowledge must have one source of truth. Whenever needed, reference the original knowledge source instead of duplicating it. Now you can ask it to "review your knowledge", "audit and flag knowledge"
  2. Explicitly explain everything and leave nothing ambiguous; explain like you're explaining the problem to a new developer who's not familiar with the plan or codebase at all. Don't ask it to write code right away. Ask it to write a plan/spec. Review the plan, make changes and discuss it until the plan is 100%. This plan can include implementation details if you're ok with that, but it's not necessary (sometimes I write a separate referenced file called implementation.md beside the plan and have the plan reference it).
  • Your role as a developer is shifting from writing code to writing specs and reviewing code
  3. Once there is nothing left to describe, and no ambiguity in your plan, have it use your plan to write the code. This works amazingly well for me.

A benefit to this method is that there is less wasted effort on my part. If Claude writes the code wrong, I can trace the reason for the mistake to a gap in the plan. I can then update the plan, throw away the code (if I have to), and have Claude reimplement the code again.

Rinse and Repeat.


Keep knowledge, plans, and implementation details clearly separated (you can copy your latest successful knowledge files to new projects to get started on future projects even faster).

Keep the goals of each plan as small and granular as possible (it's easier to define plans that way). Knowledge, plans, and implementation details all get tracked in your repository just like your code does.


I’m a career developer, and have been writing code for over 20 years. I’m adding this bit because I understand how AI driven development can look like a threat to developers. Over this last year, I’ve had a shift in this thinking though. I can take what I’ve learned through my career and use it to inform writing successful specifications Claude can use to write effective code. Claude may not solve all of our coding problems, but if used effectively, it solves nearly everything you throw at it.

f3nyx@lemmy.ml on 12 Apr 16:25 next collapse

hey shaggy. I want to touch on your last point as a newer developer:

My department is finally seeing 10x development due to the shift of writing code to writing spec. The main issue is now our pipeline is stuck at review, so all that extra output is effectively wasted. Do you have any tips on what worked for you if you had a similar situation?

Senal@programming.dev on 12 Apr 16:39 next collapse

If you’re stuck at review you aren’t seeing 10x development, you’re seeing 10x code generation.

This is especially important because without the review/test/deploy part of the pipeline you aren’t actually seeing any progress towards business goals.

Once you do get these parts sorted, you can then look at what multiplier you’re seeing.

That’s not to say there isn’t an improvement in your workflow, just that you can’t say with any certainty what kind of improvement without measuring the end to end.

It might turn out that the rest of the pipeline is way easier, in which case your multiplier will be higher; it might also be much harder, in which case the multiplier will be lower.

I'm not taking shots, I mean it seriously, especially if you need to report any of this to the rest of the business.


edit : In addition, if it turns out that review is going to be a bottleneck you can get extra resource pointed in that direction which will benefit the workflow overall.

another edit: i would consider correctly managing the expectations of those you report to as a vital skill.

Dangerhart@lemmy.zip on 12 Apr 16:58 next collapse

Exactly this. My experience with our company's wrapper on Claude lines up with OP, not this comment thread.

Everyone seems to forget that everything you write is a liability. You can't have bugs in code that is never written or generated; comments that don't exist never become inaccurate; "knowledge" that isn't duplicated into a repo can't drift out of alignment with business goals as they change long term.

From what I've seen, people claiming a "10x increase" did not have a strong foundation to begin with and/or did not utilize tools like IDEs effectively. No offense to the thread OP, whose comment itself reads like a generated response, but in the time he has done all of that a strong engineer would be long done. Everything listed should be done before ever getting into code, along with business and product partners.

zbyte64@awful.systems on 12 Apr 20:54 collapse

Everything listed should be done before ever getting into code along with business and product partners.

Ehh, it really depends on where the risk is and the problem is LLMs can’t evaluate for that unless you feed it everything. Some projects need code experiments before you settle on an architecture, but that’s only if you’re a pioneer (which frankly is where the money is at).

f3nyx@lemmy.ml on 12 Apr 20:05 collapse

that's a very good distinction, absolutely. it's just code generation at this stage.

the review was the bottleneck before (as I believe was already the case for many companies) but now with 10x the code generated for review, the bottleneck has turned into a dripping faucet.

shaggy@beehaw.org on 13 Apr 02:13 collapse

This is our new bottleneck too. Developers' roles are shifting to spec writers and code reviewers more and more. I don't think I'd call this wasted effort though (unless the code produced is worse than what developers would have produced otherwise). I'd think of it as a good problem to have.

We’re doing several things to alleviate this, and I’m genuinely curious how other teams are handling this too.

  • We have Claude running code reviews on our PRs too 😄. In our department, a PR isn’t expected to be reviewed by a dev until the author has addressed or reviewed and dismissed all of the issues Claude has brought up.
  • There is pressure for developers on our team to become better reviewers. I think this is good, because reviewing code is a more valuable skill to prospective employers than writing it is anyway.
f3nyx@lemmy.ml on 13 Apr 03:45 collapse

thanks for the response. for what it's worth, most people I ask this question to are attempting some form of your first bullet point. I think we're on the right track there, it only makes sense.

speaking for myself, your second point is the silver lining of all this, to me. I've never had this kind of pressure before, but I hope that it's the kind of pressure that makes me a better dev instead of burning me out.

cheers!

RamenJunkie@midwest.social on 12 Apr 17:42 collapse

Yeah, I wonder sometimes if people who fail to get usable code are

1- Asking it to do too much at once.

2- Not understanding the problem, or coding, well enough to ask it to do the right thing.

If you say, “Write a program that does X”, it will most likely fail to give you what you want.

It works great if you break it down into parts.

Write a program that takes this database input and converts it this way.

Now update it so the output gets displayed this way.

Adjust the colors.

Add the ability to save the output in this format.

This looks good but I need to swap these parts of the output and add this data to the output.

That sort of iterative, step by step process. Or even just, when there are bugs, give it the error output, explain that X needs to be Y. Also, at some point you may need to also look at the code. I had an issue where it was running twice on some data and after looking at it I realized it was processing things as images and links, because the links had images (but images did not always have links). I explained this problem, and pointed to where it was and it fixed it.

RamenJunkie@midwest.social on 12 Apr 17:35 next collapse

I would love to read the chat logs.

silver@das-eck.haus on 12 Apr 18:06 next collapse

I think it’s pretty heavily dependent on what you’re trying to do. I’ve gotten a lot of push from higher ups at my company to use copilot wherever possible. So, I’ve spent a lot of time lately having copilot + opus write code for me. Most of what I’m doing is super straightforward middleware APIs or basic internal front ends. Since it has access to very similar codebases for reference, and we have custom agents that point it in the right direction, it’s a pretty good experience.

However, if I ask it to do something totally new, it does okay, more like what you’ve experienced. It takes a lot of hand holding, but it usually gets the job done as long as you’re very descriptive in your prompt. Probably not faster than an experienced developer at the moment though

saplyng@lemmy.world on 12 Apr 18:09 next collapse

I’ve also started using it recently and I’m not sure if the way I’m doing it is particularly “right”.

I don’t have a lot of knowledge of practical coding practices because in school we literally had a new project every two weeks so I never learned things like you need unit tests or proper architectural design. It was mostly making sure whatever project there was that week ran and didn’t crash.

So now I’m working as a sysadmin doing the random junk a sysadmin gets pushed on them. What I’ve been doing is telling it my project plan, Claude will write up something that looks better, and I continue to have a back and forth about architecture and libraries, asking it if it thinks any particular idea is good or bad, until I get to a place I’m happy.

Then because I want to learn rust and implement it myself, I'm having Claude basically guide me through creating it like a teacher would, with it taking on a very Socratic tone ("now that we've done this, what do you think is the next step?" "We have a list of CSVs so what do you need to do to read their values?"). And I've been moving forward bit by bit like this.

I don’t know if it’s a particularly good way, honestly, I’d love feedback from anyone who’s done something similar or whatever!

zbyte64@awful.systems on 12 Apr 20:48 next collapse

In my experience there are three ways to be successful with this tool:

  • write something that already exists so it doesn’t need to think
  • do all the thinking for it upfront (hello waterfall development)
  • work in very small iterations that don't require any leaps of logic. Don't reprompt when it gets something wrong; instead reshape the code so it can only get it right

The issue with debugging is that it doesn't actually think. LLMs pattern-match to a chain of thought based on signals, not reasoning. For it to debug, you need good signals in your code that explicitly tell what it is doing, and LLMs do not write code with that level of observability by default.
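
As a toy example of what I mean by signals (names and numbers are made up): code that logs its intermediates and fails loudly gives the model something concrete to match on when you paste the output back in:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("vec")

def normalize(v):
    """Normalize a 3D vector, logging the intermediate length."""
    length = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    # Explicit signal: the actual value is visible in the log,
    # instead of everyone guessing why downstream math is subtly off.
    log.debug("normalize: input=%s length=%.6f", v, length)
    assert length > 1e-9, f"degenerate vector {v}"  # fail loudly, not subtly
    return (v[0] / length, v[1] / length, v[2] / length)
```

Paste the DEBUG line and the assertion message into the chat and the model has a real trail to follow, rather than a vague "it renders wrong".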

Edit: one of my workflows that I had success with is as follows:

  • write a gherkin feature file describing desired functionality, maybe have the LLM create multiple scenarios after I defined one to copy from
  • tell the LLM to write tests using those feature files, does an okay job but needs help making tests run in parallel.
  • if the feature is simple, ask the LLM to make a plan and review it
  • if the feature is complex then stub out the implementation in code and add TODOs, then direct the LLM to plan. Giving explicit goals in the code itself reduces token consumption and yields better plans
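
For reference, the feature files in the first step are just plain Gherkin, something like (scenario invented for illustration):

```gherkin
Feature: Vector normalization
  Scenario: Normalizing a non-zero vector
    Given the vector (3, 0, 4)
    When I normalize it
    Then the result is (0.6, 0.0, 0.8)

  Scenario: Normalizing the zero vector
    Given the vector (0, 0, 0)
    When I normalize it
    Then an error is raised
```
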
spartanatreyu@programming.dev on 13 Apr 01:49 collapse

write something that already exists so it doesn’t need to think

If something already exists, it shouldn’t need to be rewritten.

Doing otherwise is a sign that something has gone wrong.

That was the case before LLMs and it is still the case today.

zbyte64@awful.systems on 13 Apr 02:56 next collapse

Absolutely. It's amazing how many articles showcasing vibe coding are just people reinventing things like a password generator.

CCMan1701A@startrek.website on 13 Apr 12:03 collapse

What they mean is rewrite something that has a LICENSE my company can’t use.

spartanatreyu@programming.dev on 14 Apr 02:19 collapse

If the rewrite is based on something which has a license that your company can’t use, then the rewrite likely can’t be used either

CCMan1701A@startrek.website on 15 Apr 02:34 collapse

I’m pretty sure if code is AI generated it’s likely considered original, but I’m not a lawyer by any stretch.

spartanatreyu@programming.dev on 16 Apr 05:19 collapse

Only something created by a human can be copyrightable. (See the copyright status of the monkey who took a selfie for precedent.)

Any code written by an LLM is not copyrightable because a human did not write it.

Also, the company that trained the LLM is likely in breach of the licenses the code falls under.

ozymandias@sh.itjust.works on 12 Apr 20:50 next collapse

you need to fully be able to program to work with these things, in my experience.
you have to explain what you want very specifically, in precise programming terms.

i tried a preview of chatgpt codex and it’s working better than my free version of claude, but codex creates a whole virtual programming environment, you have to connect it to a github repository, then it spins up an instance with tools you include and actually tests the code and fixes bugs before sending it back to you.
but you still need to be able to find the bugs and fix them yourself.

oh and i think they work best with python, but i’ve also used ruby and dart and it’s decent.
it’s kinda like a power tool, it’ll definitely help you a lot to fix a car but if you can’t do it with wrenches it won’t help very much.

quixote84@midwest.social on 12 Apr 21:53 collapse

I’ve never been able to program in anything more complex than BASIC and command line batch files, but I’m able to get useful output from Claude.

I’m an IT Infrastructure Manager by trade, and I got there through 20 years of supporting everything from desktop to datacenter including weird use cases like controlling systems in a research lab. On top of that, I’ve gotten under the hood of software in the form of running game servers in my spare time.

What you need to get good programs out of AI boils down to 3 things:

  1. The ability to teach an entity whose mistakes resemble those of a gifted child where it went wrong a step or ten back from where it’s currently looking.
  2. The ability to provide useful beta test / debug output regarding programs which aren’t behaving as expected. This does include looking at an error log and having some idea what that error means.
  3. Comfort using (either executing or compiling depending on the language) source code associated with the language you’re doing things in. This might be as simple as “How do I run a Powershell script or verify that I meet the version and module requirements for the script in question?”, or it might be as complicated as building an executable in Visual Studio. Either way whatever the pipeline is from source to execution, it must be a pipeline you’re comfortable working with. If you’re doing things anywhere outside the IT administration space, it’s reasonable to be looking at Python as the best first path rather than Powershell. Personally, I must go where supported first party modules exist for the types of work I’m developing around. In IT Administration, that’s Powershell.

I’ve made tools which automate and improve my entire department’s approach to user data, device data, application inventory, patch management, vulnerability management, and these are changes I started making with a free product three months ago, and two months back I switched to the paid version.

Programming is sort of like conversation in an alien language. For that reason, if you can give precise instructions sometimes you really can pull something new into existence using LLM coding. It’s the same reason that you could say words which have never been said in that specific order before, and have an LLM translate them to Portuguese.

I always used to talk about how everything in a computer was math, and that what interested me more than quantum computing would be a machine which starts performing the same sorts of operations on words or concepts that computers of that day ('90s and '00s when “quantum” was being slapped on everything to mean “fast” or “powerful”) were doing on math. I said that the best indicator when linguistic computing arrives would be that without ever learning to program, I’d start being able to program. I was looking at “Dragon Naturally Speaking” when I had this idea. It was one of the earliest effective speech to text programs. I stopped learning to program immediately and focused exclusively on learning operations from that point forward.

I’ve been testing the code generation abilities of LLMs for about three years. Within the last six months I feel like I’m starting to see evidence that the associations being made internally by LLMs are complex enough to begin considering them the fulfillment of my childhood dream of a “word computer”.

All the shitty stuff about environment and theft of art is all there too, which sucks, but more because our economic model sucks than because LLMs either do or do not suck. If we had a framework for meeting everybody’s basic needs, this software in its current state has the potential to turn everyone with a passion for grammatical and technical precision into a concept based developer practically overnight.

Feyd@programming.dev on 12 Apr 22:08 next collapse

I’ve never been able to program in anything more complex than BASIC and command line batch files, but I’m able to get useful output from Claude.

Chatbots being deemed useful in tasks by people unqualified to make those judgments is a running problem.

eneff@discuss.tchncs.de on 13 Apr 09:14 collapse

I have no qualifications to judge the quality of the generated results, yet the generated results are always of great quality.

Do you seriously not realize how out of touch this sounds?

quixote84@midwest.social on 13 Apr 20:39 collapse

Of course it sounds out of touch. I didn’t say it, or anything like it. Just like the other commenter, you seem to have stopped after the first sentence.

20 years of IT experience from a support perspective does qualify me to put anybody in the programming space on notice. The tools might not be as good as a talented and well-trained dev, but they’re already better than a lazy dev. The output I get from Claude Code takes effort to get running. It just takes less of it than the output from my outsourced offshore MSP.

Flames5123@sh.itjust.works on 12 Apr 22:05 next collapse

I have a full pro model for Kiro at work. It does actually work, but we have custom MCP servers for all the internal tools, context on how to use these tools, style guidelines, etc. and then on top of that we have a lot of AI context files in the code base to help the AI understand the code base and make the correct changes.

I’ve been using it on a side project and it works if you know how to constrain it. It does get things wrong a lot. But the big thing is spec-driven development: you give it a write-up, and it produces a requirements doc and a design doc with a lot of correctness properties in them to follow when generating code and breaking down the tasks.
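For illustration, a requirements doc in that style might open something like this (the feature and the wording here are made up, not from any real spec):

```markdown
# Requirements: playlist export

- R1: The exported file MUST list every track in the playlist, in order.
- R2: Track paths MUST be written relative to the export directory.
- R3: If a track file is missing, the export MUST fail and report every missing track.
```

The point of correctness properties like these is that the agent can check generated code against them at each task, instead of guessing at intent.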

I don’t believe people can vibe code unless they can actually code. It’s a whole different way of coding. I still manually edit what it does a lot.

A lot of people explain it like it’s a brand new junior developer. You need to give it as much context as possible, tell it exactly what you want, tell it what you don’t want, tell it why, etc., and it still may not listen exactly.

ZoteTheMighty@lemmy.zip on 12 Apr 22:20 next collapse

That’s been my experience. It’s always subtly wrong, its solutions are hard to maintain, and if you spend too much time with it, it starts forgetting what you said earlier. Managers don’t understand the distinction: they already can’t code well, and they only test it on small problems where it isn’t context-limited, so they’re amazed.

thirstyhyena@lemmy.world on 13 Apr 00:30 next collapse

I recently started using Pro to debug a problem I couldn’t solve. The one thing I need from it is extra insight, a second opinion (because I’m the only developer), and letting it read the whole folder helps: it identified a problem I hadn’t considered, because it was in a file outside of where I was looking.

tristynalxander@mander.xyz on 13 Apr 01:24 next collapse

Also working on some 3d maths.

I’ve used the free versions a bit, but not really to the extent that I’d call it vibe coding. The chat bots often know where to find libraries or pre-existing functions that I don’t. It’s also okay at algorithms for well-defined problems, but it often tells me to be careful not to do something I absolutely need to do, or vice versa. It’s very hit and miss on debugging. It’ll point out obvious stuff (typos) reliably, and it can usually do some iteration stuff, but it usually doesn’t pick up on other things. Once in a rare while it will impress me by suggesting I look at a particular thing, and I think it manages this better in new chats, but most complex issues defeat it. I use it as a faster Stack Overflow, but you need to be able to work through the code yourself, understand what you’re doing, and test that individual steps are doing what they need to do. The bots can’t really do any sort of planning or breaking a problem into sub-problems, and they really suck at thinking about 3d stuff.

drmoose@lemmy.world on 13 Apr 01:49 next collapse

It’s a tool that you need to learn. Try some of the CLAUDE.md files people share online for your programming area as a starting point. You still need to review what it does, but just asking it to create tests as it creates code does a lot to improve the output.
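As a starting point, a minimal CLAUDE.md might look something like this (the conventions below are just an example; adjust them for your own stack):

```markdown
# CLAUDE.md

## Conventions
- Python 3.12; format with ruff, type-check with mypy --strict
- Write a pytest test alongside every function you add or change
- Never modify files under vendor/ or migrations/

## Behavior
- Explain any non-obvious math in a comment
- Prefer small, reviewable diffs over large rewrites
```

Claude Code reads this file at the start of a session, so rules you’d otherwise repeat in every prompt only have to be written once.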

favoredponcho@lemmy.zip on 13 Apr 04:25 next collapse

I use it and it works. It doesn’t give you the right result in one shot, but neither does manual coding. You iterate and prompt again and again. In the end, it saves a ton of time. Engineers are definitely going to lose their jobs because fewer people are needed. I know it’s tough to accept this, and people will go through denial. Part of that is saying the AI code is junk. But you’ll find it can produce junk and quickly fix it into the right solution faster than an engineer can. It sucks, but this is the new reality. The one thing that is cool once you embrace it is realizing you can customize your favorite apps or even build anything you want from scratch.

speculate7383@lemmy.today on 13 Apr 05:01 next collapse

customize your favorite apps

can you elaborate?

favoredponcho@lemmy.zip on 13 Apr 05:41 collapse

Github is full of open source apps. Sometimes the maintainer won’t add a feature you want. You can just clone the repo, ask Claude to do it, and then run your own version of it.

baatliwala@lemmy.world on 13 Apr 06:12 next collapse

I think the last part you said is the best way to use LLMs. I am not confident in it building complex architectures but if you want to make a dedicated single use script or a very customised basic application for personal use, it will do it well

echodot@feddit.uk on 13 Apr 07:10 next collapse

You still need programmers, because you need people proficient in programming to turn the junk it generates into working code.

favoredponcho@lemmy.zip on 13 Apr 11:37 next collapse

Sure, but like I said, it will be fewer.

Chais@sh.itjust.works on 13 Apr 16:00 collapse

And since the chat bot produces entry level code at best those are the positions that will be dropped, starving the field of newcomers in the long run.

lichtmetzger@discuss.tchncs.de on 13 Apr 07:34 collapse

It sucks, but this is the new reality.

Sorry mate, but you drank the AI Kool-Aid from Sam Altman and the other tech oligarchs. The reality is that all of the major AI companies are deep in the red; OpenAI isn’t even making a profit on the $200 subscription.

The only reason people are able to burn thousands of tokens to vibecode their apps is that they don’t have to pay the price for it; the companies are. That money will run out soon, and then we will see the real cost of the bigger models.

If a subscription for Claude Code costs $500 or even $1000, will companies still pay for it, or let actual humans do the work? We will see. I seriously doubt it, and I don’t want to depend on a subscription-based service to do my work while my skills atrophy. Thank god my employer doesn’t force me to use AI.

Engineers are definitely going to lose their jobs

This kind of fear-mongering is what I despise most about the whole bubble.

favoredponcho@lemmy.zip on 13 Apr 11:57 collapse

I haven’t drunk any Kool-Aid. I’m talking from my experience using it in my professional software engineering job, where I lead software projects. I’ve built things with Claude in 1 week that used to take 20 weeks. My employer does not really care about the cost of the tokens. And when they can have one engineer do 20 weeks of work in 1 week, that to them is actually a cost savings. I already ask myself the question… Should I give this task to another engineer or just vibecode it myself?

OpenAI may not survive because they do have financial issues from overspending, but that barely matters. The company with the strongest coding LLM is Anthropic, and it doesn’t sound like they’re having financial difficulty. Either way, now that it is clear what is possible, some company will succeed. They have incentives to do it.

Like I said, it will suck for some people, but it’s hard to deny the reality at this point.

lichtmetzger@discuss.tchncs.de on 13 Apr 14:48 collapse

I’ve built things that used to take 20 weeks in 1 week with Claude.

That’s ridiculous. You’ve either been a bad coder even before the AI hype or you’re simply lying. I have used these tools, and they’re not that good, nor do they make you that fast - except when you’re just merging all of the proposed code blindly and hoping for the best. I fear for the future colleagues who will have to work with the raging dumpster fire you have created for them.

The company with the strongest coding LLM is Anthropic and it doesn’t sound like they’re having financial difficulty

Oh yes, they have the same problems OpenAI has. Just look at the vibecoding subreddits; you can see many people complaining about excessive rate limits and their models getting dumber. A healthy company wouldn’t cap token usage and introduce peak-hour throttling - that’s a big warning sign that they’re overspending as well.

its hard to deny the reality at this point

I only see one person here denying reality. You will be effed in a major way when your employer one day decides that the subscriptions are too expensive, or tells you to limit your token usage.

favoredponcho@lemmy.zip on 13 Apr 16:47 collapse

I know it is a big change and will take some time to come to terms with it. But, it is here. I’m not going to argue anymore. It’s pointless.

<img alt="" src="https://lemmy.zip/pictrs/image/4efa1076-6d11-45ae-b602-c134edbed182.avif">

lichtmetzger@discuss.tchncs.de on 13 Apr 17:18 collapse

Did you just pull a random infographic out of your ass without even mentioning the source? I reverse-searched it and it comes from Anthropic, of all places - the guys that run Claude Code.

Forbes took a look at that study, I love this money quote from it:

These flaws turn Anthropic’s dataset into an overstated labor-market conclusion. The study’s findings do not have the level of reliability required to sustain the breadth of the headline framing, because each conclusion rests on an exposure measure whose scope (1), construction (2, 3, 4, 5, 7), and interpretation (6, 8, 9, 10) remain contested.

So yeah, an AI company telling us that AI will theoretically replace our jobs, based on their own study with flawed data - damn, that’s trustworthy! /s

I’m not going to argue anymore. It’s pointless.

At least on this point we agree.

rosco385@lemmy.wtf on 13 Apr 07:16 next collapse

The solutions it generated were almost write every time

Did you vibe code this post? 😂

CCMan1701A@startrek.website on 13 Apr 12:05 next collapse

I use AI for researching what existing software or projects exist to help me build up my system, which I then suffer through making.

TBi@lemmy.world on 13 Apr 16:28 next collapse

You just didn’t use the right prompts!!!

/s

Jayjader@jlai.lu on 14 Apr 18:38 next collapse

I haven’t tried any Anthropic models personally.

So far, between the free online chats from OpenAI and DeepSeek and the smaller models I’ve run on my own machine, the most useful approach has been to treat it as an overeager student that lacks the first-hand experience needed to see the big picture: I ask it questions I’m pretty sure I already know the answer to, and see if 1) it “understands” what I’m getting at and 2) it can surprise me with a viewpoint I hadn’t thought of before.

Using them to double-check my own ideas seems to be marginally useful, especially when there’s no qualified human being whose attention I can borrow. Using them as a sort of semantic web search can sometimes get me what I’m looking for faster than Google. If anything, they’re an opportunity to exercise critical thinking; if I can tell where it’s getting things wrong I can be fairly confident that my own understanding of the problem/subject is pretty solid.

Vibe coding, though? I have yet to see it work out. Maybe as some starting slop so that I can get to work refactoring code (and get the ideas flowing) instead of staring at a blank file.

solomonschuler@lemmy.zip on 17 Apr 14:38 next collapse

My experience with LLMs and numerical computing environments like MATLAB or GNU Octave has been poor. I assume it’s mostly that the data isn’t there: MATLAB has its own proprietary AI (which I don’t believe is trained on users’ code), and Octave has no AI associated with it, so the major LLMs are only trained on whatever users post online. Which is why, if you prompt one to do a 3D plot, it will almost always pull something out of its ass.

Your feeling of a “mental fog” is my experience with AI in general: the language model explains the ideas well, but then the code editor does some obscure move that makes no fucking sense. Also, because you’re not writing the code and learning from your mistakes, it makes you uncertain of your own code. It’s unfortunate to see search engines going to shit because of AI, because AI is not ready.

alex_riv@lemmy.org on 19 Apr 13:45 collapse

my experience has been similar for complex tasks but the sweet spot for me is small, well-defined scripts where i can verify the output easily.

like: i needed to parse some music metadata and normalize it across a few formats. gave it a spec, it produced something that mostly worked. i spent maybe 20 minutes fixing edge cases instead of 3 hours writing it from scratch. that exchange of time is what sold me.

for 3d math where correctness is hard to verify at a glance, i wouldn’t trust it either. the tool is only as useful as your ability to test its output quickly.
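for what it’s worth, a stripped-down sketch of that kind of normalization (the tag names and canonical fields here are just illustrative, not any complete tagging spec):

```python
# sketch: map assorted tag spellings (ID3-style frames, MP4-style atoms,
# plain keys) onto one canonical schema. field names are illustrative.

def normalize(raw: dict) -> dict:
    """Normalize music metadata from a few tag schemes into one shape."""
    aliases = {
        "title": ["title", "TIT2", "©nam"],
        "artist": ["artist", "TPE1", "©ART"],
        "track": ["track", "TRCK", "trkn"],
    }
    out = {}
    for field, names in aliases.items():
        # take the first spelling present in the raw tags
        for name in names:
            if name in raw:
                out[field] = raw[name]
                break
    # track numbers often arrive as "3/12"; keep only the track part
    if "track" in out:
        out["track"] = int(str(out["track"]).split("/")[0])
    return out
```

the edge cases i spent the 20 minutes on were exactly the kind hiding in that last step - tracks as strings, tuples, or "n/total" pairs.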