Patricia Trie for fast retrieval of IPv4 addresses and satellite data - data-structures

I am writing a program in C++ that requires IP addresses (all IPv4) to be looked up and stored quickly. Every IP address has data associated with it. If an address already exists in the trie, I intend to merge its existing data with the new address's data. If it is not present, I intend to add it as a new entry to the trie. Deletion of IP addresses is not necessary.
In order to implement this, I need to design a Patricia trie. However, I am unable to visualize the design beyond this. It seems quite naive of me, but the only idea that came to mind was to convert the IP addresses to their binary form and then use the trie. I am, however, clueless about HOW exactly to implement this.
I would be really thankful to you if you could help me with this one.
Please note that I did find a similar question here. The question, or more specifically the answer, was beyond my understanding, as the code on the CPAN website was not clear enough for me.
Also note, my data is in the following format:
10.10.100.1: "Tom","Jack","Smith"
192.168.12.12: "Jones","Liz"
12.124.2.1: "Jimmy","George"
10.10.100.1: "Mike","Harry","Jennifer"

I think you are referring to a RadixTree. I have an implementation of a RadixTrie in Java, if you want to use that as a starting point, which does the actual key-to-value mapping. It uses a PatriciaTrie as its backing structure.
Example using the following strings:
10.10.101.2
10.10.100.1
10.10.110.3
Trie example (uncompressed)
└── 1
    └── 0
        └── .
            └── 1
                └── 0
                    └── .
                        └── 1
                            ├── 0
                            │   ├── 1
                            │   │   └── .
                            │   │       └── (2) 10.10.101.2
                            │   └── 0
                            │       └── .
                            │           └── (1) 10.10.100.1
                            └── 1
                                └── 0
                                    └── .
                                        └── (3) 10.10.110.3
Patricia Trie (compressed)
└── [black] 10.10.1
    ├── [black] 0
    │   ├── [white] (0.1) 00.1
    │   └── [white] (1.2) 01.2
    └── [white] (10.3) 10.10.110.3
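Your "convert to binary" instinct is exactly right. Below is a minimal sketch (in Python for brevity; the structure translates directly to C++) of an uncompressed binary trie keyed on the 32 bits of the address, with the insert-or-merge behaviour you describe. Class and method names are my own, not from any library. A true Patricia trie additionally collapses single-child chains into one node with a skip count, but the traversal logic is the same.

```python
import socket
import struct

class Node:
    __slots__ = ("children", "names")
    def __init__(self):
        self.children = [None, None]  # index 0 = bit 0, index 1 = bit 1
        self.names = None             # set of names, populated only at leaves

class IPTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, ip, names):
        """Insert ip; if it already exists, merge `names` into its stored set."""
        key = struct.unpack("!I", socket.inet_aton(ip))[0]  # dotted quad -> 32-bit int
        node = self.root
        for shift in range(31, -1, -1):        # walk most-significant bit first
            bit = (key >> shift) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        if node.names is None:
            node.names = set()
        node.names |= set(names)               # merge on duplicate insert

    def lookup(self, ip):
        """Return the stored name set, or None if the address is absent."""
        key = struct.unpack("!I", socket.inet_aton(ip))[0]
        node = self.root
        for shift in range(31, -1, -1):
            node = node.children[(key >> shift) & 1]
            if node is None:
                return None
        return node.names

t = IPTrie()
t.insert("10.10.100.1", ["Tom", "Jack", "Smith"])
t.insert("10.10.100.1", ["Mike", "Harry", "Jennifer"])  # merged, not replaced
print(sorted(t.lookup("10.10.100.1")))
```

Since every key is exactly 32 bits deep, the worst case is 32 node hops per operation, independent of how many addresses are stored.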

Patricia tries solve the problem of finding the best covering prefix for a given IP address (they are used by routers to quickly determine that 192.168.0.0/16 is the best choice for 192.168.14.63, for example). If you are just trying to match IP addresses exactly, a hash table is a better choice.
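To illustrate the exact-match point: with a hash table, the insert-or-merge behaviour from the question is a few lines. A Python sketch (the `add` helper is hypothetical, not any standard API):

```python
from collections import defaultdict

ip_names = defaultdict(set)   # IP string -> set of associated names

def add(ip, names):
    # Creates the entry on first sight of `ip`, merges into it afterwards.
    ip_names[ip] |= set(names)

add("10.10.100.1", ["Tom", "Jack", "Smith"])
add("192.168.12.12", ["Jones", "Liz"])
add("10.10.100.1", ["Mike", "Harry", "Jennifer"])  # duplicate key: merged
```

In C++ the equivalent would be a `std::unordered_map<uint32_t, std::set<std::string>>` keyed on the packed address.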

Related

How are the checksums in go.sum computed?

I looked at https://go.dev/doc/modules/gomod-ref and https://go.dev/ref/mod#go-mod-tidy, and on neither page could I find any documentation that explains how the checksums in go.sum are computed.
How are the checksums in go.sum computed?
The checksums are hashes of the dependencies. The document you are looking for is https://go.dev/ref/mod#go-sum-files.
Each line in go.sum has three fields separated by spaces: a module path, a version (possibly ending with /go.mod), and a hash.
The module path is the name of the module the hash belongs to.
The version is the version of the module the hash belongs to. If the version ends with /go.mod, the hash is for the module’s go.mod file only; otherwise, the hash is for the files within the module’s .zip file.
The hash column consists of an algorithm name (like h1) and a base64-encoded cryptographic hash, separated by a colon (:). Currently, SHA-256 (h1) is the only supported hash algorithm. If a vulnerability in SHA-256 is discovered in the future, support will be added for another algorithm (named h2 and so on).
Example go.sum lines, each with module path, version, and hash, look like this:
github.com/go-chi/chi v1.5.4 h1:QHdzF2szwjqVV4wmByUnTcsbIg7UGaQ0tPF2t5GcAIs=
github.com/go-chi/chi v1.5.4/go.mod h1:uaf8YgoFazUOkPBG7fxPftUylNumIev9awIWOENIuEg=
If you are asking how you actually compute the hash, i.e. what inputs you feed to the SHA-256 function, it is described here: https://cs.opensource.google/go/x/mod/+/refs/tags/v0.5.0:sumdb/dirhash/hash.go
Here is a gist that allows you to compute the module hash for an arbitrary directory, without using Go:
https://gist.github.com/MarkLodato/c03659d242ea214ef3588f29b582be70
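As a hedged illustration of the Hash1 scheme defined in that file: hash each file with SHA-256, emit one line per file of the form `<hex sha256>  <name>\n` (two spaces) over the sorted name list, then base64-encode the SHA-256 of that listing and prefix `h1:`. The entry names come from the module zip layout (`module@version/path`); treat this Python sketch as an approximation and defer to the linked source for the authoritative details:

```python
import base64
import hashlib

def hash1(files):
    """files: iterable of (name, content_bytes) pairs, where `name` is the
    path as stored in the module zip, e.g. "example.com/m@v1.0.0/go.mod"."""
    listing = hashlib.sha256()
    for name, content in sorted(files):           # sorted by name, as in Hash1
        file_hash = hashlib.sha256(content).hexdigest()
        # One line per file: "<hex sha256>  <name>\n" (note the two spaces).
        listing.update(f"{file_hash}  {name}\n".encode())
    return "h1:" + base64.b64encode(listing.digest()).decode()

print(hash1([("example.com/m@v1.0.0/go.mod", b"module example.com/m\n")]))
```

Because the file list is sorted before hashing, the result is independent of the order in which the files are supplied.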

Speed of directory look-up vs. formatted filename look-up

I've had this discussion with my co-worker multiple times now, and I'm 99.9% sure that I'm correct, but they have been insisting that they are correct and I'm starting to wonder if I'm the crazy one.
We are uploading images taken by users on their mobile devices; cumulatively they could upload thousands given enough time. Each of these photos belongs to a "work order", which is identified by a sequential integer. We want to optimize for retrieval (based on the work order) rather than writing. We are also on a Windows machine.
My proposed storage method looks like this:
Images
|-- 23875
|   |-- f0347b8.png
|   |-- b04675b.png
|-- 28765
|   |-- aab658c.png
Their proposed storage method looks like this:
Images
|-- 23875_f0347b8.png
|-- 23875_b04675b.png
|-- 28765_aab658c.png
For me, in order to gather the 2 images for work order 23875, I would look in the directory, Images/23875 and grab all the .png files.
For them to do the same thing, they would iterate through all the files and run a wildcard filter on all the filenames, something to the effect of 23875_*.png.
I believe my method to be superior because, in the case where there are, say, thousands of images, it doesn't need to run a wildcard filter on potentially thousands of irrelevant files. I've asked why they believe their method to be superior, but I haven't gotten a compelling answer.
Any advice is appreciated.
This method
Images
|-- 23875_f0347b8.png
|-- 23875_b04675b.png
|-- 28765_aab658c.png
requires iterating through every single file in Images to find all files that match 23875_*. Every single time you want to find them. Over and over. Until the world ends and the stars go dark.
Putting all the files in one directory discards information you have when you create the file, thus making the file harder to locate in the future. Trying to encode that information in the file name means the data is mixed in with all other similar data and therefore needs to be filtered out in the future.
Why? You're right - it makes no sense. It's tossing information in the garbage for no good reason.
Your method
Images
|-- 23875
|   |-- f0347b8.png
|   |-- b04675b.png
|-- 28765
|   |-- aab658c.png
has already partitioned the files into the required associations. No filtering or searching is needed to find the files.
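The difference is easy to see in code. A Python sketch (paths and function names hypothetical): the first version lets the file system resolve the directory directly; the second must examine every entry in Images on every query.

```python
import os

def images_by_directory(root, order):
    # Direct lookup: the file system resolves Images/<order> for us.
    return os.listdir(os.path.join(root, str(order)))

def images_by_filename(root, order):
    # Flat layout: every file in Images is touched, then filtered.
    prefix = f"{order}_"
    return [f for f in os.listdir(root)
            if f.startswith(prefix) and f.endswith(".png")]
```

With the partitioned layout, the cost of a query scales with the number of images in that one work order; with the flat layout, it scales with the total number of images ever uploaded.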
they have been insisting that they are correct
Oooh-kay. Maybe they like this sort of wrestling...

A sortable and extensible naming scheme

This is a question about a programming structure but I will use filenames to illustrate it:
I have a list of files:
1.jpg
2.jpg
3.jpg
...
15.jpg
16.jpg
What naming scheme would be the most flexible to allow
1) file sorting
2) adding files between existing ones
For example, using 02.jpg instead of 2.jpg will place it before 16.jpg. This is what I need. But if I want to add 15a.jpg, then it will be sorted before 15.jpg.
Are there some ready models for that problem?
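One ready-made model is zero-padding combined with lexicographic midpoints (the idea behind what is sometimes called fractional indexing): to insert between two names, generate a digit string that sorts strictly between them, so no existing file ever needs renaming. A Python sketch (function name mine; it assumes plain digit keys that don't end in '0'):

```python
from fractions import Fraction

def key_between(a, b):
    """Return a digit string sorting strictly between digit strings a < b.
    Interprets each key as a decimal fraction (e.g. "15" -> 0.15), takes the
    exact average, and emits digits until the result lands between a and b."""
    fa = Fraction(int(a), 10 ** len(a))
    fb = Fraction(int(b), 10 ** len(b))
    mid = (fa + fb) / 2
    digits = ""
    while True:
        mid *= 10
        d = int(mid)          # next decimal digit of the midpoint
        digits += str(d)
        mid -= d
        if a < digits < b:    # lexicographic comparison on digit strings
            return digits

print(key_between("15", "16"))  # "155" -> name the new file 155.jpg
```

So with zero-padded names, a file inserted between 15.jpg and 16.jpg becomes 155.jpg, which sorts correctly, and the scheme can be applied again between 15.jpg and 155.jpg indefinitely.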

Ruby and console output

Is it possible, generally, by means of a Ruby library, to output a symbol at a specific location on a common Windows console screen, which seems to be 80x25?
The problem came up with the need to draw a specific 'tree' structure like this, for example:
│
├──x──y──z
│    │
│    ├──a──b──c
│    │
│    └──e──f──g
│
└──u──v──o
If you're just interested in generating trees in the console, this post shows you how to do it with hirb.
You could use a wrapper for Curses, like this one.
The Win32Console project should do what you want, and much more. Also, your question reminds me slightly of Ruby Quiz #14 (LCD Numbers).

Data structure used for directory structure?

I'm making a program in which the user builds directories (not in Windows, in my app); these folders contain subfolders and so on, and every folder may contain either folders or documents. What is the best data structure to use? Note that the user may select a subfolder and search for documents in it and in its subfolders. And I don't want to limit the folder or subfolder levels.
This is what I do:
Every record in the database has two fields: ID and ParentID. IDs are 4-5 characters (Base36, a-z:0-9 or something similar). Parent IDs are a concatenation of the parent's complete structure...
So...
This structure:
Root
    Folder1
    Folder2
        Folder3
    Folder4
        Folder5
            Folder6
Would be represented like this:
ID     ParentID        Name
0000   NULL            ROOT
0001   0000            Folder1
0002   0000            Folder2
0003   00000002        Folder3
0004   0000            Folder4
0005   00000004        Folder5
0006   000000040005    Folder6
I like this structure because if I need to find all the folders under a given folder, I can do a prefix query like:
SELECT * FROM Folders WHERE ParentID LIKE '00000002%' -- to find all folders under Folder2
To delete a folder and all its children:
DELETE FROM Folders WHERE ID='0004' OR ParentID LIKE '00000004%'
To move a folder and its children, you have to update all the records that use the same parent, to the new parent.
And I don't want to limit the folders or the subfolders levels
An obvious limitation of this is that the nesting depth is limited by the size of your ParentID field.
I can think of a few ways you could structure this, but nothing would beat the obvious:
Use the actual file system.
I would look into using some sort of tree data structure
I would recommend a B+ tree. You can easily use indexing (page, folder, etc.).
B+ Tree http://commons.wikimedia.org/wiki/File:Btree.png
For more info:
http://ozark.hendrix.edu/~burch/cs/340/reading/btree/index.html
I know that the question is specifically asking for a data structure but...
If you are using an object-oriented language, maybe you can use the composite design pattern, which is ideally suited to this type of hierarchical, tree-like structure. You get what you are asking for.
Most OO languages come with some sort of abstraction for the file system, so that is where I would start. Then subclass it if you need to.
I would represent directories as objects holding an array of child objects, each of which is itself a directory or a file.
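A minimal sketch of that composite idea in Python (class names hypothetical): folders and documents share one interface, and a recursive search handles unlimited nesting depth for free.

```python
from abc import ABC, abstractmethod

class Entry(ABC):
    """Component: common interface for files and folders."""
    def __init__(self, name):
        self.name = name

    @abstractmethod
    def find_documents(self):
        """Yield every document under this entry."""

class Document(Entry):
    def find_documents(self):
        yield self                      # a document is its own only result

class Folder(Entry):
    def __init__(self, name):
        super().__init__(name)
        self.children = []              # mix of Folder and Document

    def add(self, entry):
        self.children.append(entry)
        return entry

    def find_documents(self):
        # Recursion walks the subtree; depth is unbounded by construction.
        for child in self.children:
            yield from child.find_documents()

root = Folder("Root")
docs = root.add(Folder("Docs"))
docs.add(Document("a.txt"))
sub = docs.add(Folder("Sub"))
sub.add(Document("b.txt"))
print([d.name for d in root.find_documents()])
```

Searching "a subfolder and its subfolders" is then just calling find_documents on any Folder node rather than on the root.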
You can use an m-way tree data structure.
