Local LLM agents
from Kkk2237pl@lemmy.world to programming@programming.dev on 28 May 10:09
https://lemmy.world/post/47446983
Has anyone in an organization tried using self-hosted LLM models for agentic programming?
I'm curious whether it makes any sense. My organization spends a fortune on tokens from US companies. I want to recommend something…
#programming
As far as I understand, the only way to get anything resembling usable output for coding is with massive, expensive, laboriously hand-tuned models, not local ones.
I see that Qwen 3.5 has pretty good performance and can be run on a MacBook with 64GB RAM
Qwen 3.6 is even better
I have played with qwen3-coder:30b for my hobby stuff running on my M5 Max MacBook and it does alright. It is fast enough and I used Ollama tools to let it request files. I haven’t used anything like Claude Code to compare it to though, only a bit of the ChatGPT free tier stuff.
^^^ This. Tragically, locally run LLMs don’t even hold a candle to “good” cloud-based LLMs like Claude Code.
Deepseek is pretty good the few times I tried it.
Qwen 3.6 27B dense is really good. Very usable coding output
If you just want to avoid U.S. companies, you can try Mistral AI.
Models running within the constraints of a dev machine have no chance
If you want this, you need a company AI server with enough performance to support the entire team at once, and it will probably still be worse than using a cloud one. Though it MIGHT pay for itself in… a while
How about Qwen 3.6 and a MacBook with 64GB RAM?
I thought about that AI server, but I don't know how to calculate how long it would take to pay for itself…
I mean… RAM? Don’t you need massive amounts of VRAM for this kind of thing? Or is memory shared on a Mac?
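On the RAM vs. VRAM question: Apple Silicon Macs use unified memory, so the GPU draws from the same pool as the CPU. A rough way to sanity-check whether a model fits is parameters times bits per weight. The sketch below is only a back-of-the-envelope estimate: the function name is made up for illustration, and it covers the weights alone, not the KV cache, context length, or runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights only, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same precision.
    KV cache and runtime overhead are NOT included.
    """
    # billions of parameters * bits per parameter / 8 bits per byte = GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Compare a 27B-parameter model at common precisions.
    for bits in (16, 8, 4):
        print(f"27B at {bits}-bit: ~{weight_memory_gb(27, bits):.1f} GB for weights")
```

By this estimate a 27B model quantized to 4 bits needs roughly 13.5 GB for weights, which leaves headroom on a 64GB machine, while the same model at 16 bits would not leave much room for anything else.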
You don’t… Not in this industry. You guess and hope it goes in your favor.
No calculations matter if the market can jump or drop by 300% in a few months… And that applies to programming, hardware prices, AI subscription prices, regulations between countries when Trump is in office…
Considering all the cloud ones are currently running at a loss, and hardware prices are way inflated, I doubt that.
If you think long term as a company that uses AI, that’s the way to go anyway: your own AI server.
But alas, nobody cares about the long term, because the cunts at the top of the AI stack always make sure to make things so volatile that the little guy can never survive past the short term.
The solution? Oh just pay the big corporations to be dependent on them and not build your own thing. Surely that will help.
You have to realize that, by your own words, AI subscription prices will skyrocket eventually. So the cost analysis of your own AI server has to take that into account too, not just the current prices and the current upfront cost.
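For the "how long until it pays for itself" question, a minimal sketch of a break-even calculation that also allows for rising subscription prices might look like this. Everything here is an assumption for illustration: the function name, the parameters, and all the figures in the example are hypothetical, and it ignores hardware resale value, staff time, and any quality difference between local and cloud models.

```python
from typing import Optional


def months_to_break_even(
    hardware_cost: float,
    monthly_cloud_cost: float,
    monthly_running_cost: float,
    monthly_price_growth: float = 0.0,
    max_months: int = 120,
) -> Optional[int]:
    """Return the first month in which cumulative cloud spend catches up
    with the cost of owning the server, or None if it never does within
    max_months.

    monthly_price_growth is the assumed fractional increase of the cloud
    bill per month (0.05 means 5% per month). Purely hypothetical input.
    """
    cloud_total = 0.0
    local_total = hardware_cost  # upfront purchase
    cloud_bill = monthly_cloud_cost
    for month in range(1, max_months + 1):
        cloud_total += cloud_bill
        local_total += monthly_running_cost  # power, hosting, maintenance
        if cloud_total >= local_total:
            return month
        cloud_bill *= 1.0 + monthly_price_growth  # next month's cloud bill
    return None


if __name__ == "__main__":
    # Made-up numbers: 60k server, 5k/month on tokens, 1k/month to run it.
    print(months_to_break_even(60_000, 5_000, 1_000))
    # Same numbers, but the token bill grows 5% every month.
    print(months_to_break_even(60_000, 5_000, 1_000, monthly_price_growth=0.05))
```

The answer swings a lot with the growth assumption, which is exactly the volatility complained about above: the formula is easy, the inputs are the guess.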
Qwen 3.6 and gemma4 models are the only ones usable for agentic programming sessions that I and my employer run locally. It’s less stable and slower than third-party services, even on much better hardware (as is the case with my employer). The best way is to go with a provider hosting DeepSeek flash/pro if your privacy policy allows, though. It’s going to be hard to beat their price.
I thought those didn’t support tool calling. Has that changed?
I've played around with a couple, mostly from Hugging Face. Some of the minimal models are halfway decent at SQL, and some specific ones are good with templates and HTML. You can string them up for agentic work without issue. I found the performance worse than generation tools for the same software tasks. It was neat to try though.
GLM is pretty good in my experience; the company I currently freelance at runs it locally (in-house server room) for compliance reasons. But it needs very beefy hardware.
Perhaps recommend that they spend the money on hiring competent staff instead.