Local LLM agents
from Kkk2237pl@lemmy.world to programming@programming.dev on 28 May 10:09
https://lemmy.world/post/47446983

Has anyone tried using self-hosted LLMs for agentic programming in their organization?

I’m curious whether it makes any sense. My organization spends a fortune on tokens from US companies, and I want to recommend something…

#programming

FishFace@piefed.social on 28 May 10:47

As far as I understand, the only way to get anything resembling usable output for coding is with massive, expensive, laboriously hand-tuned models, not local ones.

Kkk2237pl@lemmy.world on 28 May 10:59

I see that Qwen 3.5 has pretty good performance and can be run on a MacBook with 64GB RAM.

Penta@lemmy.world on 28 May 11:14

Qwen 3.6 is even better

SmoothLiquidation@lemmy.world on 28 May 14:36

I have played with qwen3-coder:30b for my hobby stuff, running on my M5 Max MacBook, and it does alright. It is fast enough, and I used Ollama tools to let it request files. I haven’t used anything like Claude Code to compare it to though, only a bit of the ChatGPT free tier stuff.
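
For anyone wondering what "letting it request files" can look like, here is a minimal sketch. It is not the setup described above: the tool name `read_file`, the `handle_tool_call` helper, and the size limit are all made up for illustration, and the schema just follows the common JSON-schema "function" shape, so check what your runtime actually expects. The model call itself is left out; this only shows the side that answers a file request and refuses paths outside the project.

```python
import json
from pathlib import Path

# Hypothetical tool description in the common JSON-schema "function" shape.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the text of a file inside the project directory.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def read_file(project_root, path, max_chars=20000):
    """Read a file, but only if it resolves to somewhere under project_root."""
    root = Path(project_root).resolve()
    target = (root / path).resolve()
    try:
        # Raises ValueError when target is not inside root (e.g. "../secret").
        target.relative_to(root)
    except ValueError:
        return "error: path is outside the project directory"
    if not target.is_file():
        return "error: no such file"
    # Truncate so one huge file cannot flood the model's context window.
    return target.read_text(encoding="utf-8", errors="replace")[:max_chars]


def handle_tool_call(project_root, name, arguments):
    """Dispatch one tool call; arguments may be a dict or a JSON string."""
    if name != "read_file":
        return f"error: unknown tool {name!r}"
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    return read_file(project_root, arguments.get("path", ""))
```

The returned string is what you would send back to the model as the tool result. Errors are returned as text rather than raised so the model can see what went wrong and try another path.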

Jestzer@lemmy.world on 28 May 11:48

^^^ This. Tragically, locally run LLMs don’t even hold a candle to “good” cloud-based LLMs like Claude Code.

irelephant@lemmy.dbzer0.com on 28 May 14:41

DeepSeek was pretty good the few times I tried it.

locuester@lemmy.zip on 28 May 15:38

Qwen 3.6 27B dense is really good. Very usable coding output.
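
As a back-of-the-envelope check on whether a 27B dense model fits on a 64GB machine, the weights alone take roughly parameters times bits per weight divided by 8. The sketch below only does that arithmetic; the function name is made up, and it ignores the KV cache, runtime overhead, and the extra bits real quantization formats spend on scales, so treat the results as lower bounds.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Lower bound only: excludes KV cache, activations, and quantization metadata.
    """
    return params_billions * bits_per_weight / 8


# Nominal weight sizes for a 27B-parameter model at common precisions.
for bits in (4, 8, 16):
    print(f"27B at {bits}-bit: about {weight_memory_gb(27, bits)} GB of weights")
```

So at 4-bit the weights are around 13.5 GB and at 16-bit around 54 GB, which is why quantized builds are the realistic option on a laptop once you leave room for context and everything else running on the machine.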

spectrums_coherence@piefed.social on 28 May 10:56

If you just want to avoid U.S. companies, you can try Mistral AI.

87Six@lemmy.zip on 28 May 11:42

Models running within the constraints of a dev machine have no chance.

If you want this, you need a company AI server with enough performance to support the entire team at once, and it will probably still be worse than using a cloud one. Though it MIGHT pay for itself in… A while

Kkk2237pl@lemmy.world on 28 May 11:49

How about Qwen 3.6 on a MacBook with 64GB RAM?

I thought about that AI server, but idk how to calculate how long it would take to pay for itself…

87Six@lemmy.zip on 28 May 15:38

I mean… RAM? Don’t you need a massive amount of VRAM for this kind of thing? Or is memory shared on a Mac?

idk how to calculate how long it would take to pay for itself…

You don’t… Not in this industry. You guess and hope it goes in your favor.

No calculations matter if the market can jump or drop by 300% in a few months… And that applies to programming, hardware prices, AI subscription prices, regulations between countries when Trump is in office…

Venat0r@lemmy.world on 28 May 12:06

it MIGHT pay for itself in… A while

Considering all the cloud ones are currently running at a loss and hardware prices are way inflated, I doubt that.

87Six@lemmy.zip on 28 May 14:11

If you think long term as a company that uses AI, that’s the way to go anyway: your own AI server.

But alas, nobody cares about the long term, because the cunts at the top of the AI stack always make sure to make things so volatile that the little guy can never survive past the short term.

The solution? Oh just pay the big corporations to be dependent on them and not build your own thing. Surely that will help.

You have to realize that, by your own words, AI subscription prices will skyrocket eventually. So the cost analysis of your own AI server has to take that into account too, not just the current prices and the current upfront cost.
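
A rough way to put numbers on that, as a sketch only: compare cumulative cloud spend against the server’s upfront cost plus its running costs, and let the cloud bill grow by an assumed yearly rate. The function name, the parameters, and every figure in the comments are placeholders, not real prices, and the model ignores hardware resale value, staff time, and any quality gap between local and cloud models.

```python
def months_to_break_even(server_cost, monthly_running_cost, monthly_cloud_spend,
                         annual_price_growth=0.0, max_months=240):
    """Return the first month in which cumulative cloud spend reaches the
    cumulative cost of owning the server, or None if it never does
    within max_months."""
    # Convert the assumed yearly price increase into a monthly multiplier.
    monthly_growth = (1.0 + annual_price_growth) ** (1.0 / 12.0)
    cloud_total = 0.0
    server_total = float(server_cost)
    cloud_bill = float(monthly_cloud_spend)
    for month in range(1, max_months + 1):
        cloud_total += cloud_bill
        server_total += monthly_running_cost  # power, hosting, maintenance
        if cloud_total >= server_total:
            return month
        cloud_bill *= monthly_growth  # next month's cloud bill is higher
    return None


# Placeholder figures: 60,000 upfront, 1,000/month to run, 4,000/month on tokens.
flat = months_to_break_even(60_000, 1_000, 4_000)
rising = months_to_break_even(60_000, 1_000, 4_000, annual_price_growth=0.5)
print(flat, rising)
```

With flat prices those placeholder figures break even at month 20; any assumed price growth pulls that earlier, and if the cloud bill stays below the running cost the function returns None. The growth rate is a guess you have to supply, so it is worth running a few scenarios rather than trusting one number.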

eager_eagle@lemmy.world on 28 May 12:07

Qwen 3.6 and gemma4 models are the only ones usable for agentic programming sessions that I and my employer run locally. It’s less stable and slower than third-party services, even on much better hardware (as is the case at my employer). The best option, though, is to go with a provider hosting DeepSeek flash/pro, if your privacy policy allows it. It’s going to be hard to beat their price.

onlinepersona@programming.dev on 28 May 16:12

I thought those didn’t support tool calling. Has that changed?

mesamunefire@piefed.social on 28 May 13:09

I’ve played around with a couple, mostly from Hugging Face. Some of the minimal models are halfway decent at SQL, and some specific ones are good with templates and HTML. You can string them up for agentic work without issue. I found the performance worse than generation tools for the same software tasks. It was neat to try though.

HelloRoot@lemy.lol on 28 May 13:20

GLM is pretty good in my experience. The company I currently freelance at runs it locally (in-house server room) for compliance reasons. But it needs very beefy hardware.

eleijeep@piefed.social on 28 May 13:52

My organization spends fortune on tokens

Perhaps recommend that they spend the money on hiring competent staff instead.