What Git library to choose?
from wasabi@discuss.tchncs.de to python@programming.dev on 07 Jun 2024 13:12
https://discuss.tchncs.de/post/16977074

I happen to write a lot of Python code that deals with git repositories. Currently I call the git command-line tool from Python and interpret its output.

This solution really doesn’t scale well. Can you recommend a Python library that wraps git functionality?

I have found three:

#python


toasteecup@lemmy.world on 07 Jun 2024 13:51

Sounds like pygit2 is the move until dulwich has a bit more support.

fixmycode@feddit.cl on 07 Jun 2024 14:27

I read through the pygit2 docs. How is it too low level compared to parsing the console output directly?

If you need complex workflows, couldn’t they be built on top of the library’s conveniences?

wasabi@discuss.tchncs.de on 07 Jun 2024 15:33

Maybe pygit2 is indeed the way to go. When I looked into it a while back, it seemed very low level, as if it only implemented the git plumbing. But maybe I looked at the wrong part of the docs, because it doesn’t look too bad.

GammaGames@beehaw.org on 07 Jun 2024 14:58

Honestly, if you already know all the git commands, I’d use sh.

wasabi@discuss.tchncs.de on 07 Jun 2024 15:29

You’d still only get strings as return values, with no objects modeling git concepts.
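A common middle ground between raw strings and a full library is to ask git for a machine-friendly format and map it onto small Python objects yourself. A minimal sketch, assuming the standard `git log --format` placeholders (`%H`, `%an`, `%ae`, `%at`, `%s`); `Commit`, `parse_log` and `git_log` are hypothetical helpers, not part of any library. ASCII unit/record separators are used as delimiters on the assumption that they won’t appear in commit metadata.

```python
import subprocess
from dataclasses import dataclass

# ASCII unit (0x1f) and record (0x1e) separators: assumed not to appear in
# commit metadata, so they serve as field/record delimiters.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
LOG_FORMAT = FIELD_SEP.join(["%H", "%an", "%ae", "%at", "%s"]) + RECORD_SEP


@dataclass(frozen=True)
class Commit:
    sha: str
    author_name: str
    author_email: str
    author_time: int  # Unix timestamp (%at)
    subject: str


def parse_log(output: str) -> list[Commit]:
    """Turn `git log --format=LOG_FORMAT` output into Commit objects."""
    commits = []
    for record in output.split(RECORD_SEP):
        record = record.strip("\n")  # git adds a newline after each entry
        if not record:
            continue
        sha, name, email, timestamp, subject = record.split(FIELD_SEP)
        commits.append(Commit(sha, name, email, int(timestamp), subject))
    return commits


def git_log(repo: str, max_count: int = 10) -> list[Commit]:
    """Run git log in `repo` and return the newest `max_count` commits."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--max-count={max_count}", f"--format={LOG_FORMAT}"],
        check=True,
        capture_output=True,
        encoding="utf-8",
    )
    return parse_log(result.stdout)
```

Keeping `parse_log` separate from the subprocess call means the parsing can be tested without a repository; inside a repo, `git_log(".")[0].subject` would give the latest commit’s subject line.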

GammaGames@beehaw.org on 08 Jun 2024 05:23

Fair point. I usually just check exit codes.
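Several git commands report their answer purely through the exit status. A minimal sketch, assuming the documented behavior of `git merge-base --is-ancestor` (0 = yes, 1 = no, anything else = error) and `git diff --quiet` (implies `--exit-code`: 1 = differences, 0 = none); `is_ancestor`, `has_unstaged_changes`, `_exit_status` and the injectable `run` parameter are hypothetical, the latter only there so the helpers can be stubbed in tests.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical injectable runner so the helpers can be stubbed in tests.
Runner = Callable[..., subprocess.CompletedProcess]


def _exit_status(args: Sequence[str], ok: tuple[int, ...], run: Runner) -> int:
    """Run a command and return its exit status, raising unless it is in `ok`."""
    proc = run(list(args), capture_output=True)
    if proc.returncode not in ok:
        raise subprocess.CalledProcessError(proc.returncode, proc.args, proc.stdout, proc.stderr)
    return proc.returncode


def is_ancestor(repo: str, ancestor: str, descendant: str,
                run: Runner = subprocess.run) -> bool:
    # git merge-base --is-ancestor: 0 = yes, 1 = no, anything else = error.
    status = _exit_status(
        ["git", "-C", repo, "merge-base", "--is-ancestor", ancestor, descendant],
        (0, 1), run)
    return status == 0


def has_unstaged_changes(repo: str, run: Runner = subprocess.run) -> bool:
    # git diff --quiet: 1 = working tree differs from the index, 0 = no differences.
    status = _exit_status(["git", "-C", repo, "diff", "--quiet"], (0, 1), run)
    return status == 1
```

Treating any status outside the expected set as an error keeps a typo in a revision name (git usually exits 128) from silently reading as "no".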

mundane@feddit.nu on 08 Jun 2024 15:36

We have historically used GitPython a lot, but in a recent project I tried calling git via sh instead, and it works great. If you already know the git CLI, it feels very ergonomic.

best_username_ever@sh.itjust.works on 07 Jun 2024 15:02

I used pygit2 a few years ago and it was easy. Can’t complain.

drwho@beehaw.org on 07 Jun 2024 17:49

github.com/gitpython-developers/gitdb

Corbin@programming.dev on 07 Jun 2024 17:55

Without more details, I’d go with Dulwich. libgit2 is overly picky about its inputs and can’t be hacked apart at all, and that affects its bindings too. I recently found myself monkey-patching Dulwich to allow otherwise-forbidden characters in refs, which would have been fundamentally impossible with anything built on top of libgit2.
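For context on what "otherwise-forbidden" means: git’s own ref-name rules are documented under `git check-ref-format`. The sketch below is a transcription of those documented rules into Python; `is_valid_ref_name` is hypothetical, it is not Dulwich’s or libgit2’s check (their implementations may differ in detail), and it ignores options such as `--refspec-pattern` and `--normalize`.

```python
import re

# Characters forbidden anywhere in a ref name per git-check-ref-format:
# ASCII control chars and space (0x00-0x20), DEL, ~ ^ : ? * [ and backslash.
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def is_valid_ref_name(name: str, allow_onelevel: bool = False) -> bool:
    """Approximate git's ref-name rules (sketch, not an official implementation)."""
    if not name or name == "@":
        return False
    if ".." in name or "@{" in name or _FORBIDDEN.search(name):
        return False
    # No leading/trailing slash, no empty components, no trailing dot.
    if name.startswith("/") or name.endswith("/") or "//" in name or name.endswith("."):
        return False
    # By default git requires at least one slash (e.g. refs/heads/main).
    if not allow_onelevel and "/" not in name:
        return False
    # No component may start with "." or end with ".lock".
    for component in name.split("/"):
        if component.startswith(".") or component.endswith(".lock"):
            return False
    return True
```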

lordmauve@programming.dev on 08 Jun 2024 06:35

I recommend wrapping the git CLI with subprocess, using the porcelain output modes etc., and parsing the output.
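A minimal sketch of that approach for `git status`, assuming the documented `--porcelain=v1 -z` format: each entry is `XY PATH` terminated by NUL, and renames/copies carry an extra NUL-terminated original path (in `-z` mode the new path comes first, and paths are not quoted). `StatusEntry`, `parse_status_z` and `git_status` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusEntry:
    index: str     # X column: state in the index ("M", "A", "D", "R", "?", ...)
    worktree: str  # Y column: state in the working tree
    path: str
    orig_path: Optional[str] = None  # only set for renames/copies


def parse_status_z(data: bytes) -> list[StatusEntry]:
    """Parse `git status --porcelain=v1 -z` output into StatusEntry objects."""
    fields = data.split(b"\0")
    entries = []
    i = 0
    while i < len(fields):
        field = fields[i]
        i += 1
        if not field:
            continue  # empty string after the final NUL
        xy = field[:2].decode()
        # field[2] is the space between the status and the first path.
        path = field[3:].decode("utf-8", "surrogateescape")
        orig = None
        if "R" in xy or "C" in xy:
            # Renames/copies: the original path is the next NUL-separated field.
            orig = fields[i].decode("utf-8", "surrogateescape")
            i += 1
        entries.append(StatusEntry(xy[0], xy[1], path, orig))
    return entries


def git_status(repo: str) -> list[StatusEntry]:
    out = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain=v1", "-z"],
        check=True, capture_output=True,
    ).stdout
    return parse_status_z(out)
```

Decoding with `surrogateescape` is a choice to avoid crashing on non-UTF-8 file names; porcelain output is meant to stay stable across git versions, which is why it is preferred for parsing.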

We have had stability problems with GitPython (which wraps gitdb). On Linux, gitdb does clever things with a sliding mmap window, which caused some crashes in a multi-threaded environment, and I found simple race conditions in its code for writing loose objects, which is about as simple an operation as it gets, so I lost faith in it. I do still use gitdb in one read-only, single-threaded system; it’s undoubtedly fast.
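To illustrate how small the loose-object write path is: a loose object is `zlib("<type> <size>\0" + content)`, stored under `objects/<first 2 hex chars>/<remaining 38>`, and its id is the SHA-1 of the uncompressed bytes. A minimal sketch, assuming a SHA-1 repository; `loose_object` and `write_loose_object` are hypothetical helpers, not gitdb’s code. Race safety here comes from writing to a temporary file in the target directory and renaming it into place, so no reader ever sees a half-written file.

```python
import hashlib
import os
import tempfile
import zlib


def loose_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Return (hex object id, compressed bytes) for a SHA-1 loose object."""
    raw = f"{obj_type} {len(content)}\0".encode() + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def write_loose_object(objects_dir: str, obj_type: str, content: bytes) -> str:
    oid, compressed = loose_object(obj_type, content)
    subdir = os.path.join(objects_dir, oid[:2])
    final_path = os.path.join(subdir, oid[2:])
    if os.path.exists(final_path):
        return oid  # content-addressed: an existing file holds the same object
    os.makedirs(subdir, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place.
    # os.replace is atomic on POSIX, so concurrent writers of the same object
    # can't expose a partial file; whichever rename lands last has identical bytes.
    fd, tmp_path = tempfile.mkstemp(dir=subdir, prefix="tmp_obj_")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(compressed)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return oid
```

Real git also makes object files read-only, which this sketch skips.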

The biggest issues with git libraries come from the complexity of git configurations. Any independent reimplementation will probably support the most common 99% of features, but that 1% always comes back to bite you! We use a lot of git features in service of a gigantic monorepo, such as alternates, partial clones and config tricks.

If we use command-line git we get 100% compatibility with all git configuration and ODB features, and it’s hard to ensure that with an independent git implementation (even libgit2).
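One concrete benefit of letting git resolve configuration: include directives and system/global/local precedence are handled for you. A minimal sketch, assuming the documented `git config -z --list` output (each entry ends with NUL, a newline separates key from value, and a key with no newline has no value, which git treats as boolean true); `parse_config_z` and `git_config` are hypothetical.

```python
import subprocess
from typing import Optional


def parse_config_z(data: bytes) -> dict[str, list[Optional[str]]]:
    """Parse `git config -z --list` output into {key: [values...]}.

    Keys can repeat (e.g. remote.origin.fetch), so values are kept in the
    order git printed them; for single-valued settings the last one wins.
    A value of None means the key had no value at all (implicit true).
    """
    config: dict[str, list[Optional[str]]] = {}
    for entry in data.decode("utf-8", "surrogateescape").split("\0"):
        if not entry:
            continue  # empty string after the final NUL
        # Keys never contain newlines, so the first one separates key and value;
        # values themselves may contain further newlines.
        key, sep, value = entry.partition("\n")
        config.setdefault(key, []).append(value if sep else None)
    return config


def git_config(repo: str) -> dict[str, list[Optional[str]]]:
    """Ask git for the effective configuration as seen from `repo`."""
    out = subprocess.run(
        ["git", "-C", repo, "config", "-z", "--list"],
        check=True, capture_output=True,
    ).stdout
    return parse_config_z(out)
```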

As for “that solution doesn’t scale well”, we have made it scale. git itself scales well for the operations it can perform natively; you just have to use its features effectively. Often that means the high-level operations, but sometimes lower-level commands like git cat-file --batch, git mktree --batch, etc. It’s not as fast as gitdb, but it’s fast enough, and I can be highly confident that something I write once won’t break or cause problems later.
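The `--batch` modes are what make this scale: one long-lived process answers many requests instead of spawning git once per object. A minimal sketch of a `git cat-file --batch` client, assuming the documented protocol (send `<object>\n`; receive `<oid> <type> <size>\n<content>\n`, or `<object> missing\n` / `<object> ambiguous\n`); `read_object` and `open_batch` are hypothetical helpers.

```python
import subprocess
from typing import BinaryIO, Optional


def read_object(stdin: BinaryIO, stdout: BinaryIO,
                name: str) -> Optional[tuple[str, str, bytes]]:
    """Request one object from a running `git cat-file --batch`.

    Returns (oid, type, content), or None if git can't resolve `name`.
    """
    stdin.write(name.encode() + b"\n")
    stdin.flush()
    header = stdout.readline()
    if not header:
        raise EOFError("git cat-file --batch closed its output")
    if header.endswith((b" missing\n", b" ambiguous\n")):
        return None
    oid, obj_type, size = header.split()
    content = stdout.read(int(size))
    stdout.read(1)  # consume the LF that follows the content
    return oid.decode(), obj_type.decode(), content


def open_batch(repo: str) -> subprocess.Popen:
    """Start one long-lived `git cat-file --batch` process for `repo`."""
    return subprocess.Popen(
        ["git", "-C", repo, "cat-file", "--batch"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    )
```

Usage: `proc = open_batch(".")`, then call `read_object(proc.stdin, proc.stdout, "HEAD")` as often as needed, and close `proc.stdin` and call `proc.wait()` when done. Taking plain binary streams keeps `read_object` testable against `io.BytesIO`.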

treadful@lemmy.zip on 08 Jun 2024 15:15

Dulwich is decent. It has some good porcelain functions, but it’s organized kind of weirdly. I sort of recall it’s the only one that isn’t a wrapper on the git CLI?

Anyway, they all kind of suck in my experience.

bitwolf@lemmy.one on 08 Jun 2024 15:40

I usually use subprocess. Python has a very nice API for calling subprocesses.