Checking....what's the status for FOSS agentic AI models with skills?
from iturnedintoanewt@lemmy.world to selfhosted@lemmy.world on 14 Apr 13:28
https://lemmy.world/post/45592254
So…with all this openclaw stuff, I was wondering, what’s the FOSS status for something to run locally? Can I get my own locally run agent to which I can ask to perform simple tasks (go and find this, download that, summarize an article) or things like this? I’m just kinda curious about all of this.
Thanks!
#selfhosted
threaded - newest
wiki.archlinux.org/title/Ollama
Ollama is an application which lets you run offline large language models locally.
There's also a community for it here on the fediverse, to those interested: !Ollama
Also, from my tests, it works decently enough even in Android's Termux, though you seem to need a powerful phone.
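For anyone who wants to script against it: Ollama also exposes a small HTTP API on localhost. A minimal sketch, assuming the default port (11434) and a model you've already pulled (llama3.2 here is just an example name):

```python
import json
import urllib.request

# Assumes Ollama's default local endpoint; the model name is whatever
# you've pulled locally (llama3.2 is only an example).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("llama3.2", "Summarize RFC 1149 in one sentence.")
# With the Ollama server actually running, you'd send it like:
#   reply = json.load(urllib.request.urlopen(req))["response"]
print(req.full_url)
```

The request object is built offline here; the commented-out `urlopen` line is the part that needs a running server.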
I’ve had better luck with llama.cpp for opencode. I’m guessing it does formatting better for tool use.
Thanks! I understand being able to run these models as an LLM you can chat with, using tools like Ollama or GPT4All. My question is: how do I go from that to it actually doing things for me, handling files, etc.? As it stands, if I run any of these locally, it's just able to answer questions offline, and that's about it. What about these "skills", where it can go fetch files, find a specific URL, or summarize what a YouTube video is about based on what's said in it?
Sorry, wish I was able to share more. I honestly JUST started diving into this stuff after your post. Learning a lot from the various other comments though. Hopefully some of the other commenters can help you get the answers you’re looking for.
Ollama is a VC-backed copy/paste of llama.cpp.
They have a history of using llama.cpp’s code (and bugs) without supporting the project or giving credit. llama.cpp is easy to use, more performant, and truly open source.
Ollama is in the Arch Linux package repository, whereas llama.cpp is in the AUR. Both options are available.
wiki.archlinux.org/title/Llama.cpp
Also, it looks like Ollama is mostly written in Go and C, whereas llama.cpp is C and C++.
VC backed or not, both packages are under the MIT license.
I’m curious about this too. I know that on the latest version of Ollama it’s possible to install OpenClaw. But I assumed you needed to point it to a paid API (Claude, ChatGPT, Grok, etc.) for it to really work. But yeah, maybe it works with Qwen 3 or similar models?
I guess a major factor in this is what your system resources look like, especially how much RAM you have, and therefore which model you can host locally.
If you are on Linux and want AI-assisted stuff like you mentioned, there has been this for a while: github.com/qwersyk/Newelle
( or the weeb version if you prefer: wiki.nyarchlinux.moe/nyarchassistant/ )
and it can use locally run models. But have realistic expectations: if you want it to work well, you need a beefy GPU and a lot of RAM and swap. The "intelligence" is quite limited if you run low-spec models, to the point of being utterly useless.
They all suck and should be avoided.
Absolutely. There are tons of open-licensed, open-weight models (the equivalent of open source for AI models) capable of what is called "tool usage". The key thing to understand is that they're never quite perfect, and they don't all "use tools" equally effectively or in the same way.

This is common to LLMs, and it is critical to understand that at the end of the day they are just text generators; they do not "use tools" themselves. They generate specific structured text that triggers some other piece of software, typically called a harness (but sometimes a client or frontend), to call those tools on your system. OpenClaw is an example of such a harness (and not a great or particularly safe one, in my opinion, but if you want to be a lunatic and give an AI model free rein, it seems to be the best choice).

You can use commercial harnesses too, by configuring or tricking them into connecting to a local model instead of their commercial one, although I don't recommend this for a variety of reasons. If you really want to use Claude Code itself, people have done it, but I don't find it works very well, since all its prompts and tool calling are optimized for Claude models.

Besides OpenClaw, other popular harnesses for local models include OpenCode (as close as you're going to get to Claude for local models) and Cursor; even Ollama has its own CLI harness now. Personally I use OpenCode a lot, but I'm starting to lean towards pi-mono (it's just called pi, but that's ungoogleable). It's very minimal and modular, intentionally easy to customize with plugins and skills you can automatically install, so you can make it exactly as safe, capable, or visual as you wish.
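To make the "structured text that triggers other software" point concrete, here's a toy sketch of what a harness does. The JSON call format and the tool names are made up for illustration; real harnesses each have their own conventions (and, hopefully, sandboxing):

```python
import json

# Toy tool registry. A real harness exposes things like file reads or
# shell commands; this one is deliberately harmless.
TOOLS = {
    "upper": lambda text: text.upper(),
}

def handle_model_output(output: str) -> str:
    """If the model emitted a structured tool call, run the tool;
    otherwise the text is just the model's plain answer."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return output  # plain prose, nothing to execute
    if not isinstance(call, dict):
        return output
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        return f"error: unknown tool {call.get('tool')!r}"
    return tool(**call.get("arguments", {}))

# Pretend the model generated this structured text:
result = handle_model_output('{"tool": "upper", "arguments": {"text": "hello"}}')
print(result)  # HELLO
```

The model never executes anything itself; the harness decides whether the text is a tool call and, if so, actually runs it, which is exactly why the choice of harness matters for safety.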
As a minor diversion, we should also discuss what a "tool" is. In this context, there are some common basic tools that most tool-use models will understand some variation of out of the box. Things like editing files, running command-line tools, opening documents, and searching the web are built-in skills that pretty much any model advertising "tool use" or "tool calling" will support, although some agents use these skills more capably and effectively than others. Just like some people know the Linux command line fluently and can operate their entire system with it, while others only know basic commands like ls or cat and need a GUI or guidance for anything more complex, AI models are similar: some (the latest models in particular) are incredibly capable with just their basic built-in tools.

However, they're not limited to what's built in; as I said, they can accept guidance on what to use and how to use it. You can guide them explicitly if you happen to be fluent in their tools, but there are roughly two competing models for giving them that guidance automatically. One is MCP (Model Context Protocol): a separate server the model can access that provides structured listings of different kinds of tools and how they work, basically letting the model connect to a huge variety of APIs in almost any software or service. Some harnesses have MCP support built in. The other approach is called "skills" and seems to me a more sensible and flexible way of giving the model enough understanding to expand the tools it can use. Again, providing skills is usually handled by the harness you're using.

To make this a little less abstract, put it in the perspective of Claude: Anthropic provides several Claude models, like Haiku, Sonnet, and Opus. These are the text-generation models, and they have been trained to produce a particular tool-usage format, but Opus tends to have more built-in capability than something like Haiku. Regardless of which model you choose (and you can switch at any time), you'll be using a harness, typically Claude Code, the CLI tool most people use to interact with Claude in an agentic, tool-calling capacity.
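For a feel of what those "structured listings of tools" look like, here's a sketch of a tool description in the OpenAI-style function-calling schema that many open-weight models are trained on. The fetch_url tool itself is hypothetical:

```python
# Hypothetical "fetch_url" tool described in the OpenAI-style
# function-calling schema. The harness sends this schema to the model;
# the model can then emit a structured call naming the tool and
# filling in the arguments.
fetch_url_tool = {
    "type": "function",
    "function": {
        "name": "fetch_url",
        "description": "Download a web page and return its text content",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Page to fetch"},
            },
            "required": ["url"],
        },
    },
}

print(fetch_url_tool["function"]["name"])  # fetch_url
```

MCP servers and "skills" are, loosely speaking, two different ways of getting descriptions like this (plus usage guidance) in front of the model without you writing them by hand.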
On the open and local side of the landscape, we don't have anything quite as fast or capable as Claude Code, unfortunately. But we can do surprisingly okay, considering we're running small local models on consumer hardware, not massive data-center farms being enticingly given away or rented for pennies on the dollar of what they actually cost these companies, in the hope that market-share capture and vendor lock-in lead to future profits.
Here are some pretty capable tool-use models I would recommend (most should be available for download through ollama and other sources like huggingface)
Gemma4 doesn’t Turboquant. But it is leaner on the KV cache.
edit: looks like there are forks that do turboquant already
Thank you very much for this reply. I don't need to stick to Claude or OpenClaw at all; I definitely don't want to give any model free rein over my data. They're just the ones I've seen mentioned the most, I guess. But I'd like to be able to run it all locally, and only on command. Your answer is exactly what I needed. I'm gonna study the options you provided carefully and go from there. Again, thanks!
In that case I'd definitely recommend taking a look at pi. It's a fairly minimal and controllable starting point where you're in the driver's seat at all times, and most "features" are opt-in and handled responsibly. Since it's extensible, you can use plugins like the ones here to add more protections against undesired actions if you want. If that's too minimal and you eventually decide you want something a bit more like OpenClaw, you might look into Hermes-Agent, which has similar comprehensiveness to OpenClaw but seems a lot more responsibly designed. I don't have any personal experience with it, but it seems to be what most of the "security-thoughtful AI keeners" (a bit of a contradiction, I feel, but people seem to be having some success with it) are using these days.
In my experience, running Ollama locally works great. I do have a beefy GPU, but even on affordable consumer-grade GPUs you can get good results with smaller models.
So it technically works to run an AI agent locally, but my experience has been that coding agents don’t work well. I haven’t tried using general AI agents.
I think the amount of VRAM affordable/available to consumers is nowhere near enough to support the context length necessary for a coding agent to remain coherent. There are tools like Get Shit Done which are supposed to help with this, but I didn't have much luck.
So I’m using OpenCode via OpenRouter to use LLMs in the cloud. Sad that I can’t get local-only to work well enough to use for coding agents, but this arrangement works for me (for now).
We've got open-source agents like OpenCode. OpenClaw is weird, and not really recommended by any sane person, but to my knowledge it's open source as well. We've also got a silly(?) "clean-room rewrite" of the Claude Agent, after that leaked…
Regarding the models, I don't think there are any strictly speaking "FLOSS" models out there with modern tool calling etc. You'd be looking at "open-weights" models instead, where they release the weights under some permissive license. The training dataset and all the tuning remain a trade secret with pretty much all models, so there's no real FLOSS as in the four freedoms.
Google dropped a set of Gemma models a few days ago and they seem pretty good. You could have a look at Qwen 3.5, or GLM, DeepSeek… There’s a plethora of open-weights models out there. The newer ones pretty much all do tool-calling and can be used for agentic tasks.
AllenAI has released open-source models with open training data, code, and science, if you value the 'source' actually being open. They've also published the multimodal Molmo models.
Thanks! I didn't know about these. I was only aware of Apertus from the Swiss National AI Initiative, but in my experience those weren't great. Might look into Olmo 3, then.
I think you need some agent software, or an MCP server for your existing software. It depends a bit on what you're doing: just chatting and asking questions that need to be googled, vibe coding, or querying the documents on your computer. As I said, there's OpenClaw, which can do pretty much everything, including wrecking your computer. I'm also aware of OpenCode, AutoGPT, Aider, Tabby, CrewAI, …
The Ollama project has some software linked on its page: https://github.com/ollama/ollama?tab=readme-ov-file#chat-interfaces They're sorted by use case, and by whether they're desktop software or a web interface. Maybe that's a good starting point.
What you'd usually do is install it and connect it to your model / inference software via that software's OpenAI-compatible API endpoint. But it frequently ends up being a chore. If you use a paid service (ChatGPT), they'll contract with Google to do the search for you, YouTube, etc. Once you do it yourself, you're going to need all sorts of developer accounts and API tokens to automatically access Google's search API, and you might get blocked from YouTube if you host your software on a VPS in a datacenter. That's kinda how the internet is these days: all the big companies like Google and their competitors require access tokens or there won't be any search results. At least that was my experience.
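To make the "OpenAI-compatible API endpoint" part concrete: most local servers (llama.cpp's llama-server, Ollama, etc.) speak the same /v1/chat/completions protocol, so pointing a harness at them is largely a matter of swapping the base URL. A rough stdlib-only sketch; the port is an assumption (llama-server defaults to 8080), and "local" is a placeholder model name:

```python
import json
import urllib.request

# Assumed local OpenAI-compatible server, e.g. llama.cpp's llama-server
# on its default port. Most local servers ignore the API key entirely.
BASE_URL = "http://localhost:8080/v1"

def chat_request(messages: list, model: str = "local") -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    body = json.dumps({"model": model, "messages": messages})
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body.encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer none",  # placeholder; usually ignored locally
        },
    )

req = chat_request([{"role": "user", "content": "What is MCP?"}])
# Against a running server: json.load(urllib.request.urlopen(req))
print(req.full_url)
```

Many harnesses let you set exactly these two things, base URL and API key, in their config, which is the "tricking them into connecting to a local model" mentioned upthread.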
No one has mentioned Open Web UI, which is part of this landscape.
Open Web UI is the chat interface you use to interact with a model. I haven't really dug into much of the functionality beyond simple chat, but there are thousands of community plugins for web search and similar. You can also create knowledge bases and attach them to queries. For example, if I have a bunch of policy and procedure documents from work, I can create a knowledge base and ask the LLM to create new policies in that context.
You can configure it to work with ollama, which allows you to run LLMs from huggingface.co and similar on your own hardware.
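For reference, a common way to wire the two together is running Open Web UI in a container and pointing it at the host's Ollama instance via its OLLAMA_BASE_URL setting. A docker-compose sketch; the host address is an assumption that may need adjusting for your setup:

```yaml
# Sketch only: assumes Ollama is running on the host at its default
# port 11434. host.docker.internal works on Docker Desktop; on plain
# Linux you may need the host's bridge IP (or host networking) instead.
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui:/app/backend/data
volumes:
  open-webui:
```

Once it's up, the models you've pulled with Ollama show up in the Open Web UI model picker.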
However, in my own case I just don't have anything resembling a modern powerful GPU, so I don't run Ollama locally. You can use a paid account at huggingface.co and use their API to do the inference (running the models). Not all LLMs are available this way, but many certainly are.
More recently I've discovered that OVH (a French bare-metal host I've used for years) provides an inference API for a half-dozen models, and I've found it blisteringly fast compared to Hugging Face.