Transparent Data Tiering on Linux: Why I chose fanotify over FUSE for HuskHoard (www.huskhoard.com)
from deepthinker@lemmy.world to selfhosted@lemmy.world on 19 Jun 07:25
https://lemmy.world/post/48355578

I've been working on an open-source HSM (Hierarchical Storage Management) engine called HuskHoard. Most transparent storage tools on Linux use FUSE, but I found the context-switching overhead was killing my NVMe performance for hot data. I decided to bypass FUSE and use the fanotify API to intercept file access at the kernel level instead.

How it works: a background janitor moves cold files to slow storage (HDD, tape, or S3) and leaves a sparse "husk" file on the SSD. When an app tries to read the husk, HuskHoard pauses the process, recalls the data, and then resumes the process.

It's written in Rust and licensed under AGPL-3.0.

GitHub: github.com/huskhoard/huskhoard
Technical architecture: www.huskhoard.com/blog.html

I'm curious whether anyone else here has experimented with fanotify for storage management. I'd love some technical feedback on the architecture.
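For readers unfamiliar with fanotify, the "pause, recall, resume" step maps onto its permission events. The daemon marks a mount or filesystem with fanotify_mark(2) and reads struct fanotify_event_metadata records from the fanotify descriptor. The process that triggered a permission event stays blocked until the daemon writes back a struct fanotify_response carrying FAN_ALLOW or FAN_DENY. The Rust sketch below covers only the parse-and-answer half, with field layouts and constants taken from <linux/fanotify.h>. It is not HuskHoard's code. Listening on FAN_OPEN_PERM/FAN_ACCESS_PERM, denying on a failed recall, and the `is_husk` and `recall` hooks are all assumptions. The privileged fanotify_init/fanotify_mark setup is left out.

```rust
// Sketch only: layouts and constants follow <linux/fanotify.h>;
// `is_husk` and `recall` are hypothetical hooks, not HuskHoard's API.

const FANOTIFY_METADATA_VERSION: u8 = 3;
pub const FAN_OPEN_PERM: u64 = 0x0001_0000;
pub const FAN_ACCESS_PERM: u64 = 0x0002_0000;
pub const FAN_ALLOW: u32 = 0x01;
pub const FAN_DENY: u32 = 0x02;

/// Size of `struct fanotify_event_metadata` on Linux.
pub const METADATA_LEN: usize = 24;

/// Mirror of `struct fanotify_event_metadata` (native byte order).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct EventMetadata {
    pub event_len: u32, // total length of this event; used to step to the next one
    pub vers: u8,
    pub metadata_len: u16,
    pub mask: u64, // which event fired, e.g. FAN_OPEN_PERM
    pub fd: i32,   // fd for the accessed file, opened by the kernel for the daemon
    pub pid: i32,  // process that is blocked waiting for the verdict
}

fn u32_at(buf: &[u8], i: usize) -> u32 {
    u32::from_ne_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]])
}

/// Parse one event record from the start of a buffer filled by read(2).
pub fn parse_metadata(buf: &[u8]) -> Option<EventMetadata> {
    if buf.len() < METADATA_LEN || buf[4] != FANOTIFY_METADATA_VERSION {
        return None;
    }
    let mut mask = [0u8; 8];
    mask.copy_from_slice(&buf[8..16]);
    Some(EventMetadata {
        event_len: u32_at(buf, 0),
        vers: buf[4],
        metadata_len: u16::from_ne_bytes([buf[6], buf[7]]),
        mask: u64::from_ne_bytes(mask),
        fd: u32_at(buf, 16) as i32,
        pid: u32_at(buf, 20) as i32,
    })
}

/// Encode `struct fanotify_response { __s32 fd; __u32 response; }`.
pub fn encode_response(fd: i32, response: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&fd.to_ne_bytes());
    out[4..].copy_from_slice(&response.to_ne_bytes());
    out
}

/// Decide the verdict for one event. Returns None for notification-only
/// events, which need no reply. For permission events the caller writes the
/// returned bytes to the fanotify fd (unblocking the process), then closes `meta.fd`.
pub fn handle_event<F, R>(meta: &EventMetadata, is_husk: F, recall: R) -> Option<[u8; 8]>
where
    F: Fn(i32) -> bool,
    R: Fn(i32) -> Result<(), String>,
{
    if (meta.mask & (FAN_OPEN_PERM | FAN_ACCESS_PERM)) == 0 {
        return None;
    }
    let verdict = if !is_husk(meta.fd) {
        FAN_ALLOW // hot file: let the access through untouched
    } else {
        match recall(meta.fd) {
            Ok(()) => FAN_ALLOW, // data is back on the SSD; resume the reader
            Err(_) => FAN_DENY,  // recall failed; the blocked call gets EPERM
        }
    };
    Some(encode_response(meta.fd, verdict))
}
```

Answering FAN_DENY when a recall fails surfaces the failure to the application as EPERM instead of letting it read the husk's zero-filled holes. That trade-off is a design assumption of this sketch.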

#selfhosted


prenatal_confusion@feddit.org on 19 Jun 08:32

This project seems like a lot of fun! I will try it.

Please consider changing the headline font on the website. It is nothing short of a war crime.

deepthinker@lemmy.world on 19 Jun 13:45

It is an attention-getter! I have a hard time reading it sometimes. Perhaps it is ready for a freshen-up. Please let me know if you have a chance to try it; I could use the feedback.

prenatal_confusion@feddit.org on 19 Jun 15:50

Oh, it's getting my attention, all right. In the form of discomfort and anger.

randy@lemmy.ca on 19 Jun 11:10

Sounds like it has similar goals to bcache (not bcachefs) and LVM caching, except that it operates in userspace instead of at the kernel level. Can you explain what the benefits are of keeping this out of the kernel?

deepthinker@lemmy.world on 19 Jun 13:44

Hey, thanks for the question. You're right that there is some overlap with bcache, but the big difference is that bcache is block-level and HuskHoard is file-level.

Keeping it out of the kernel is mostly about safety and portability. If a kernel module panics, it takes down your whole server, but if a userspace daemon crashes, you just restart it. Kernel development is also a nightmare for things like cloud integration: in userspace I can use rclone, or complex Rust libraries for zstd jump tables, far more easily than I could in the kernel.

It also lets us be file-aware. bcache just sees blocks, but we can set policies based on file type, age, or owner tags, which the kernel doesn't really know about. Plus, it makes it much easier to move between different Linux distros, or even NAS operating systems like TrueNAS, without fighting with DKMS or kernel versions.

Basically, I wanted the storage intelligence to be portable and safe, so you don't have to be a kernel dev to manage your archive.
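To make the file-aware point concrete, here is a minimal Rust sketch of a janitor step that a block cache cannot express. It applies a rule keyed on file extension, idle time (atime), and owning UID, then demotes matching files: it copies the data to a slow-tier directory and leaves a sparse husk of the same apparent size. `TierPolicy`, `make_husk`, and the directory layout are hypothetical names for illustration, not HuskHoard's configuration or API. Reading "owner tags" as a Unix UID is also an assumption. A real janitor would additionally need to record which files are husks, preserve timestamps, and handle files modified during the copy.

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical file-level tiering rule (not HuskHoard's config format).
pub struct TierPolicy {
    pub extensions: Vec<String>, // e.g. ["mkv", "iso"]; empty means any type
    pub min_idle_secs: i64,      // demote only if not accessed for this long
    pub owner_uid: Option<u32>,  // "owner" read as Unix UID (an assumption)
}

impl TierPolicy {
    /// True if `path` is a regular file this rule would send to slow storage.
    pub fn matches(&self, path: &Path, meta: &fs::Metadata, now_secs: i64) -> bool {
        let ext_ok = self.extensions.is_empty()
            || path
                .extension()
                .and_then(|e| e.to_str())
                .map_or(false, |e| self.extensions.iter().any(|x| x.eq_ignore_ascii_case(e)));
        let idle_ok = now_secs - meta.atime() >= self.min_idle_secs;
        let owner_ok = self.owner_uid.map_or(true, |uid| meta.uid() == uid);
        meta.is_file() && ext_ok && idle_ok && owner_ok
    }
}

/// Current time in whole seconds since the Unix epoch.
pub fn now_secs() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0)
}

/// Copy `path` into `archive_dir`, then turn the original into a sparse husk
/// with the same apparent size. Returns that size in bytes.
pub fn make_husk(path: &Path, archive_dir: &Path) -> io::Result<u64> {
    let name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let len = fs::metadata(path)?.len();
    let dest = archive_dir.join(name);
    fs::copy(path, &dest)?;
    File::open(&dest)?.sync_all()?; // make the cold copy durable before dropping hot data

    let f = fs::OpenOptions::new().write(true).open(path)?;
    f.set_len(0)?; // free the data blocks on the SSD
    f.set_len(len)?; // extend again: a hole that reads back as zeros
    f.sync_all()?;
    Ok(len)
}
```

On relatime or noatime mounts, atime is only an approximate idle signal, so a real policy engine might track access itself (for example from the same fanotify events).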