from dragontamer@lemmy.world to programming@programming.dev on 04 Apr 01:32
https://lemmy.world/post/27759427
I’m doing some Galois Field / Cyclic Redundancy Check research for fun and I’ve come across an intriguing pattern that I need a data structure for.
Across the 64-bit (or even 128-bit or larger) spaces, I’ve discovered an interesting pattern relating to hamming distances that I’d like a data structure to represent.
I’m going to need something on the order of ~billions of intervals each having somewhere between 1 item to ~1 billion per interval. And I’d like to quickly (O(1) or O(lg(n))) determine if other intervals intersect.
For 32-bit space I can simply make a 512MB Bitmask lol and then AND/OR the two Bitmask. Easy
But for 64-bit space I’m stuck and a bit ignorant to various data structures. I’m wondering if someone out there has a good data structure for me to use?
I’ve read over Interval Trees on Wikipedia. I’m also considering binary decision diagram over the 64-bits actually. Finally I’m thinking of some kind of 1-dimension octtree like datastructure (is that just a binary tree?? Lol. But BVH trees in 3d space seems similar to my problem it’s just I need it optimized down to 1 dimension rather than 3.) Anyone else have any other ideas or cool data structures that might work?
#programming
threaded - newest
Ph-trees can do range and closest queries across N dimensions very quickly. I have not used it for 1 dimension, but I’d imagine it would work fine.
github.com/tzaeschke/phtree
Querying for all intervals
(x’, y’)
that overlap with an interval(x, y)
is a two dimensional range query that selects the upper left quadrant of the x-y plane that satisfiesx < y’ && x’ < y
Uh, no, that’s O(n), yeah?
And typical RAM speeds are 100GB/second for CPUs and 500GB/second on GPUs, meaning 512MB operations are literally on the order of 5 miliseconds for CPU and 1ms on GPU.
Below certain sizes, the ‘billions of intervals’ is larger than the damn Bitmask. Seriously, 8 bytes per interval (aka one pointer and 0 data) and that’s 8GB for the data structure.
Instead of a billion 32-bit intervals to store (4GB of RAM at the minimum) it’s obviously a better move to store 500-million byte Bitmasks. And modern GPUs can crush that in parallel with like 3 lines of CUDA anyway.