I was reading about Judy trees. What are some examples of real-world usage, and how do they compare to other data structures?
Judy arrays in Python: http://www.dalkescientific.com/Python/PyJudy.html
Some example uses:
PyJudy arrays are similar to Python dictionaries and sets. The primary difference is that PyJudy keys are sorted: by unsigned value for integers, by byte ordering for strings, and by object id for Python objects. In addition to wrapping the underlying Judy functions, PyJudy implements a subset of the Python dictionary interface for the JudyL and JudySL APIs and a subset of the set interface for the Judy1 API, along with some extensions for iterating over a subrange of the sorted keys, values, and items.
I'm looking for a way to hash a string to a configurable number of characters. I don't want to trim an existing hash (such as SHA); I want to generate a hash where you can specify the number of output characters. It should also work if the input is shorter than the requested output. It doesn't need to be cryptographic; it only needs to guarantee the same hash for the same input. I've been going through the hash functions on Wikipedia, but they all seem to have either a fixed length or a length that depends on the input.
What you are looking for might be extendable-output functions (XOFs)!
These hash functions don't have a predefined output length, and they are typically built from sponge constructions.
The SHA-3 family consists of four cryptographic hash functions, [...], and two extendable-output functions (XOFs), called SHAKE128 and SHAKE256.
You can try out both at https://emn178.github.io/online-tools/. In the output bits field, choose your desired output length.
For a Java implementation, see the Bouncy Castle Crypto Library, which supports both algorithms: https://www.bouncycastle.org/specifications.html
But be aware of collisions if the hash length is too small.
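If you happen to be working in Go rather than Java, the golang.org/x/crypto/sha3 package also implements both SHAKE functions. A minimal sketch (the input string and the two output lengths are arbitrary choices for illustration):

package main

import (
    "fmt"

    "golang.org/x/crypto/sha3"
)

func main() {
    data := []byte("hello")

    // Ask the XOF for exactly as many bytes as you need.
    short := make([]byte, 5)  // 5-byte digest
    long := make([]byte, 32)  // 32-byte digest, same input
    sha3.ShakeSum128(short, data)
    sha3.ShakeSum128(long, data)

    // For an XOF, the shorter output is a prefix of the longer one,
    // so the same input always yields a consistent hash at any length.
    fmt.Printf("%x\n%x\n", short, long)
}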
How does Go calculate a hash for keys in a map? Is it truly unique and is it available for use in other structures?
I imagine it's easy for primitive keys like int or immutable string but it seems nontrivial for composite structures.
The language spec doesn't say, which means that it's free to change at any time, or differ between implementations.
The hash algorithm varies somewhat between types and platforms. As of now:
On x86 (32- or 64-bit), if the CPU supports AES instructions, the runtime uses aeshash, a hash built on AES primitives; otherwise it uses a function "inspired by" xxHash and cityhash, but different from either. There are different variants for 32-bit and 64-bit systems.
Most types use a simple hash of their memory contents, but floating-point types have code to ensure that 0 and -0 hash equally (since they compare equally) and that NaNs hash randomly (since two NaNs are never equal).
Since complex types are built from floats, their hashes are composed from the hashes of their two floating-point parts.
An interface's hash is the hash of the value stored in the interface, not of the interface header itself.
All of this stuff is in private functions, so no, you can't access Go's internal hash for a value in your own code.
The Go map implementation uses a hash called aeshash. It's not AES, but it uses the aesenc assembly instruction to compute hashes. This hash isn't exported for use in the standard library.
The hash itself is written in assembly, and can be found in the runtime package source.
Since Go 1.14, the Go standard library provides the hash/maphash package. The hash functions in this package aren't guaranteed to be the same ones used by Go maps (though it appears that they are, which makes sense); they are guaranteed to be good functions for implementing hash maps and the like.
hash/maphash only operates on strings or byte slices, so it's still up to you to figure out how to serialize a composite data structure into bytes for hashing purposes.
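For instance, here is one hedged way to do that serialization with hash/maphash; the point type, the field encoding, and the hashPoint helper are my own illustrative choices, not anything the package prescribes:

package main

import (
    "encoding/binary"
    "fmt"
    "hash/maphash"
)

type point struct {
    X, Y int32
    Name string
}

// hashPoint defines its own serialization: fixed-width fields first,
// then the variable-length string. maphash.Hash implements io.Writer,
// so encoding/binary can write directly into it.
func hashPoint(seed maphash.Seed, p point) uint64 {
    var h maphash.Hash
    h.SetSeed(seed) // same seed => comparable hashes across calls
    binary.Write(&h, binary.LittleEndian, p.X)
    binary.Write(&h, binary.LittleEndian, p.Y)
    h.WriteString(p.Name)
    return h.Sum64()
}

func main() {
    seed := maphash.MakeSeed()
    a := hashPoint(seed, point{1, 2, "origin-ish"})
    b := hashPoint(seed, point{1, 2, "origin-ish"})
    fmt.Println(a == b) // true: equal values hash equally under one seed
}

With several variable-length fields you would also want to write a length prefix before each one, so that different field boundaries can't produce the same byte stream.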
Hash functions always produce a fixed-length output regardless of the input (e.g. MD5 produces 128 bits, SHA-256 produces 256 bits), but why?
I know that it is how the designers designed them to be, but why did they design the output to have the same length?
So that it can be stored in a consistent fashion? So it's easier to compare? Because it's less complicated?
Because that is what the definition of a hash is. Refer to Wikipedia:
A hash function is any function that can be used to map digital data of arbitrary size to digital data of fixed size.
If your question relates to why it is useful for a hash to be a fixed size there are multiple reasons (non-exhaustive list):
Hashes typically encode a larger (often arbitrary-size) input into a smaller output, generally in a lossy way; that is, unlike compression functions, you cannot reconstruct the input from the hash value by "reversing" the process.
Having a fixed size output is convenient, especially for hashes designed to be used as a lookup key.
You can predictably (pre)allocate storage for hash values and index them in a contiguous memory segment such as an array.
For hashes of "native word sizes", e.g. 16, 32 and 64 bit integer values, you can do very fast equality and ordering comparisons.
Any algorithm working with hash values can use a single set of fixed size operations for generating and handling them.
You can predictably combine hashes produced with different hash functions in e.g. a bloom filter.
You don't need to waste any space to encode how big the hash value is.
There do exist special hash functions that can produce an output of any requested length, such as the sponge-based extendable-output functions mentioned above.
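To make the storage and comparison points above concrete, here is a small Go sketch (the details are my own illustration): because every SHA-256 digest is exactly 32 bytes, digests fit a fixed-size array type that can be preallocated contiguously and compared with ==.

package main

import (
    "crypto/sha256"
    "fmt"
)

func main() {
    // Fixed-size output means fixed-size storage: a [32]byte per digest,
    // preallocated in one contiguous block, no per-entry sizing needed.
    var digests [4][32]byte
    words := []string{"a", "b", "a", "c"}
    for i, w := range words {
        digests[i] = sha256.Sum256([]byte(w))
    }

    // Fixed-size values are directly comparable.
    fmt.Println(digests[0] == digests[2]) // true: same input, same digest
    fmt.Println(digests[0] == digests[1]) // false
}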
As you can see, a fixed output length is part of the very definition of a hash. And if you need a digest length different from the standard ones, that case is covered by the standard too:
Some application may require a hash function with a message digest length different than those provided by the hash functions in this Standard. In such cases, a truncated message digest may be used, whereby a hash function with a larger message digest length is applied to the data to be hashed, and the resulting message digest is truncated by selecting an appropriate number of the leftmost bits.
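In Go, that leftmost-bits truncation can look like the following sketch (truncatedDigest is a hypothetical helper name, and the 10-byte output length is arbitrary):

package main

import (
    "crypto/sha256"
    "fmt"
)

// truncatedDigest applies a hash with a larger digest length (SHA-256)
// and keeps the leftmost n bytes, as the standard describes.
func truncatedDigest(data []byte, n int) []byte {
    sum := sha256.Sum256(data)
    return sum[:n]
}

func main() {
    fmt.Printf("%x\n", truncatedDigest([]byte("hello"), 10)) // leftmost 80 bits
}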
Often it's because you want to use the hash value, or some part of it, to quickly store and look up values in a fixed-size array. (This is how a non-resizable hashtable works, for example.)
And why use a fixed-size array instead of some other, growable data structure (like a linked list or binary tree)? Because accessing them tends to be both theoretically and practically fast: provided that the hash function is good and the fraction of occupied table entries isn't too high, you get O(1) lookups (vs. O(log n) lookups for tree-based data structures or O(n) for lists) on average. And these accesses are fast in practice: after calculating the hash, which usually takes linear time in the size of the key with a low hidden constant, there's often just a bit shift, a bit mask and one or two indirect memory accesses into a contiguous block of memory that (a) makes good use of cache and (b) pipelines well on modern CPUs because few pointer indirections are needed.
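A rough Go sketch of that lookup path, assuming Go 1.19+ for maphash.String and a power-of-two table size so a bit mask can replace the modulo (the table layout is my own simplification; collision handling is omitted):

package main

import (
    "fmt"
    "hash/maphash"
)

func main() {
    seed := maphash.MakeSeed()
    const size = 1 << 10 // power of two, so index = hash & (size-1)
    table := make([]string, size)

    // Store: hash the key, mask down to a bucket index, write the value.
    idx := maphash.String(seed, "example") & (size - 1)
    table[idx] = "value"

    // Look up: recompute the same hash and mask; one indirect access
    // into a contiguous block of memory.
    fmt.Println(table[maphash.String(seed, "example")&(size-1)])
}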
What is the equivalent in elisp of a Python dictionary like {'a': 1, 'b': 2}?
Also, does elisp have any map/reduce API?
Besides association lists (whose algorithmic complexity is OK for small tables but not for large ones), there are hash tables, which you can construct with make-hash-table and puthash; or, if you prefer literal values, you can write them as #s(hash-table data (a 1 b 2)).
Association lists are the most commonly used associative containers in elisp. An alist is just a list of key-value cons cells, like ((key . value)). You can use the assoc function to get the value corresponding to a key and rassoc to get a key with the required value.
Elisp comes with the built-in function mapcar, which does map, but AFAIK there is no good built-in fold facility. You could emulate one using any of the looping facilities provided. However, the better solution is to use cl-lib and slip into Common Lisp land. In particular, it supplies cl-mapcar and cl-reduce.
In my university notes, the pseudocode for BuildHeap is written almost like this (the only difference is that the notes use parentheses where I have brackets), and I found several versions like it on the internet:

BuildHeap(A) {
    heap_size[A] = length[A]
    for i = floor(length[A]/2) downto 1
        Heapify(A, i)
}

But shouldn't it be something like this instead?
BuildHeap(A) {
    heapsize <- length[A]
    for i <- floor(length[A]/2) downto 1
        Heapify(A, i)
}
Why do they write heap_size[A] = length[A]?
If you have many heaps A, B, and C, and only one variable heap-size, how will you remember the sizes of all the heaps? That is why there is a heap-size attribute for each heap.
In much pseudocode, the attributes of an object O are written as attribute[O] or attribute(O); sometimes they are also written as O.attribute.
The first example assumes that you are storing the heap size of a particular heap as an attribute of that heap.
The second example might be storing the heap size in a local variable, which gets its value from the length attribute (length[A]) of the heap.
Here is what Introduction to Algorithms says about pseudocode:
Compound data are typically organized into objects, which are comprised of attributes or fields. A particular field is accessed using the field name followed by the name of its object in square brackets. For example, we treat an array as an object with the attribute length indicating how many elements it contains. To specify the number of elements in an array A, we write length[A]. Although we use square brackets for both array indexing and object attributes, it will usually be clear from the context which interpretation is intended.
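To make the attribute concrete, here is a hedged Go sketch (the heap type, the buildHeap name, and the 0-based indexing are my own choices, not CLRS's): each heap object carries its own size field, which is exactly what heap-size[A] denotes.

package main

import "fmt"

// heap stores its size as an attribute of the object, so many heaps
// A, B, C can each remember their own size.
type heap struct {
    data []int
    size int // heap-size[A] in the pseudocode's notation
}

// heapify sifts the element at index i down until the max-heap
// property holds, looking only at the first h.size elements.
func (h *heap) heapify(i int) {
    for {
        l, r, largest := 2*i+1, 2*i+2, i // 0-based children
        if l < h.size && h.data[l] > h.data[largest] {
            largest = l
        }
        if r < h.size && h.data[r] > h.data[largest] {
            largest = r
        }
        if largest == i {
            return
        }
        h.data[i], h.data[largest] = h.data[largest], h.data[i]
        i = largest
    }
}

func buildHeap(a []int) *heap {
    h := &heap{data: a, size: len(a)} // heap-size[A] <- length[A]
    for i := h.size/2 - 1; i >= 0; i-- {
        h.heapify(i)
    }
    return h
}

func main() {
    h := buildHeap([]int{4, 1, 3, 2, 16, 9, 10, 14, 8, 7})
    fmt.Println(h.data) // h.data[0] is now the maximum
}

Note that heapify consults h.size rather than len(h.data): heapsort-style code later shrinks the size attribute while the underlying array length stays fixed, which is why the two are kept separate.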