Is there a tool where I can compare multiple copies of the same file in different directories / hard drives to check if files got corrupted?
from DeathByBigSad@sh.itjust.works to nostupidquestions@lemmy.world on 28 Feb 12:34
https://sh.itjust.works/post/56050223

Like just some tool where I select all the directories, then it runs a checksum against everything and tells me which files match.

#nostupidquestions


FiniteBanjo@feddit.online on 28 Feb 12:37

Git Diff

notabot@piefed.social on 28 Feb 12:41

It’ll depend on what OS you’re using. On Linux you’d probably want to use sha1sum to generate a list of checksums of the files in one directory, then use that list to check the other directories; it’ll tell you if any files don’t match.
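A minimal sketch of that workflow (the directory names and files here are made up for illustration — substitute your own paths):

```shell
# Set up a reference copy and a second copy with one corrupted file
mkdir -p ref copy
printf 'hello\n' > ref/a.txt
printf 'world\n' > ref/b.txt
cp ref/a.txt ref/b.txt copy/
printf 'XXXXX\n' > copy/b.txt               # simulate corruption

# Build a checksum list from the reference...
(cd ref && find . -type f -exec sha1sum {} + > ../sums.txt)

# ...then verify the other copy against it; prints OK / FAILED per file
(cd copy && sha1sum -c ../sums.txt) || true  # non-zero exit = at least one mismatch
```

The same pattern works with sha256sum or md5sum; `sha1sum -c` reads the list and re-checks each file relative to the current directory.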

blueworld@piefed.world on 28 Feb 12:44

Going to guess this is a Windows question and you are thinking GUI. In that case I’d suggest the older https://checksumcompare.sanktuaire.com/downloads-en as it’ll do what you ask for visually.

There are a plethora of ways to do this in Linux on the CLI, TUI, and GUI, which is what most answers here will likely lean towards given this community. If you’re inclined that way, start with https://unix.stackexchange.com/questions/330933/diffing-two-directories-recursively-based-on-checksums

vext01@feddit.uk on 28 Feb 13:32

You could do a dry run with rsync and see what it reports needs update.

I think you’d have to force it to use checksums instead of time stamps.
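Something like this should do it (throwaway directories here stand in for the real trees):

```shell
# -n = dry run (change nothing), -c = compare contents by checksum
#  instead of size+mtime, -i = itemize differences, -r = recurse
mkdir -p src dst
printf 'same\n' > src/a.txt; cp src/a.txt dst/
printf 'good\n' > src/b.txt; printf 'bad!\n' > dst/b.txt   # corrupted copy, same size

rsync -ncir src/ dst/    # lists only paths whose checksums differ
```

Without `-c`, rsync would consider two files equal if size and mtime match, which is exactly the case silent corruption can slip through.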

palordrolap@fedia.io on 28 Feb 15:32

There's a Linux tool called cmp that compares two files byte for byte. On my distro it's part of the diffutils package which is required and installed by default.

Its better known sibling is diff which is used for finding differences between source code files, or any other text files for that matter.

You could build something fairly quickly that wrapped cmp and a list of files.
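A toy version of such a wrapper (directory names are made up): walk one tree with find and cmp each file against its counterpart in the other.

```shell
# Two trees, one matching file and one mismatch
mkdir -p dirA dirB
printf 'ok\n'  > dirA/f1; cp dirA/f1 dirB/
printf 'one\n' > dirA/f2; printf 'two\n' > dirB/f2   # differs

# cmp -s is silent and exits 0 only if the files are byte-identical
(cd dirA && find . -type f) | while read -r f; do
  if cmp -s "dirA/$f" "dirB/$f"; then
    echo "MATCH  $f"
  else
    echo "DIFFER $f"
  fi
done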

Alternatively you could look for a duplicate file detector, but then, those generally only pick up on the duplicates and won't show non-matching files. You'd be blind to the changed ones unless you already knew where they were supposed to be.

Also be aware that on modern filesystems, there's such a thing as a hard link, where two or more filenames can point at the same data on the disk. Those two files will always compare as being the same because they literally are the same. And some filesystems can automatically de-duplicate by creating hard links between anything they detect as being identical.

You might be able to leverage that as well, depending on what you need.
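You can see the hard-link behaviour directly (filenames invented for the demo):

```shell
# Two hard-linked names share one inode, hence identical bytes by definition
printf 'data\n' > original
ln original alias
ls -i original alias                 # both names show the same inode number
cmp original alias && echo 'byte-identical (necessarily)'
```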

Finally, many files have various dates and times associated with them, again depending on file system. The Linux stat command is aware of four of these: File birth (original creation), last access, last modification and last status change. Some or all of these may be combined depending on the underlying filesystem.
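With GNU stat (the coreutils version on Linux; BSD/macOS stat takes different flags) you can print all four timestamps explicitly:

```shell
# %x/%y/%z/%w = last access / last modification / last status change / birth.
# Birth prints '-' on filesystems that don't record it.
touch demo.txt
stat --printf 'Access: %x\nModify: %y\nChange: %z\nBirth:  %w\n' demo.txt
```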

flambonkscious@sh.itjust.works on 28 Feb 23:34

The Windows tool Beyond Compare is exceptional for parsing huge directory trees and highlighting the conflicts, letting you drill into the file content and inspect line-level changes (assuming the content is understood; it does OK with images and documents but is certainly better with text, code/scripts).

Brosplosion@lemmy.zip on 28 Feb 23:59

Beyond Compare is also available on Linux

flambonkscious@sh.itjust.works on 01 Mar 04:20

Is it?! That’s fantastic!!

sem@piefed.blahaj.zone on 28 Feb 23:53

Not exactly what you asked, but snapraid is one way to do this at scale.

https://www.snapraid.it/