Build Heap function - algorithm

In my university notes the pseudocode of Build Heap is written almost like this (Only difference were parenthesis I have brackets):
And I searched on the internet and there are several like this:
But shouldn't be something like that?
BuildHeap(A) {
heapsize <- length[A]
for i <- floor(length[A]/2) downto 1
Heapify(A,i)
}
Why they writing heap_size[A] = length[A]?

If you have many heaps, A, B, C. And only one variable heap-size, How will you remember the sizes of all the heaps? You will have an attribute heap-size for all the heaps.
In many pseudocode the attributes of an object O are written as Attriubute[O] or Attribute(O) , (Sometimes they are also written as O.attribute ).
The first example assumes that you are storing the heap size of a particular heap as an attribute of the heap.
The second example might be storing the heap size in a local variable which gets its value from the length attribute (Length[A]) of the heap.
Here is a text about pseudocode from Introduction To Algorithms:
Compound data are typically organized into objects, which are comprised of attributes or fields . A particular field is accessed using the field name followed by the name of its object in square brackets. For example, we treat an array as an object with the attribute length indicating how many elements it contains. To specify the number of elements in an array A, we write length[A]. Although we use square brackets for both array indexing and object attributes, it will usually be clear from the context which interpretation is intended

Related

How do I generate a predictable random stream in hierarchy in c#?

I am making a procedural game with hierarchy.
So object A will have 10 children.
Each child will have 10 children and so on.
Now suppose I want to give each child a random colour, and a random position (assume these are given by integers).
Therefor let X be the "ID" of an object.
Let COLOUR and POSITION be enums of type PROPERTY.
Then I want to generate random integers:
int GenerateRandomInteger(PROPERTY P, int childNumber);
So I can use:
int N = parentObject.GenerateRandomInteger(COLOUR, 7);
For example.
Any ideas how to go about this?
In this case, GetRandomInteger should be implemented as a hash function. A hash function takes arbitrary data (here, the values of P and childNumber) and outputs a hash code. For the purposes of a game:
The hash function should have the avalanche property, meaning that every bit of the input affects every bit of the hash code.
Good hash functions here include MurmurHash3 and xxHash.
This answer also assumes that childNumber is unique throughout the application, rather than unique for a given parent.
The resulting hash code can then be used to generate a pseudorandom color and a position (for example, the first 24 bits of the hash code can be extracted and treated as a 8-bit-per-component RGB color). But further details on how this will work will depend on what programming language you're using and what ranges are acceptable for colors and positions, which you didn't specify in your question (there are several languages that use ints and enums, for example).

What abstract data type is this?

Is the following a common data type (i.e. does it have a name)?
Its unique characteristic is, unlike a regular Set, that it contains the "universe" on initialisation with O(C) memory overhead, and a max memory overhead of O(N/2) (which only occurs when you remove every-other element):
> s = new Structure(701)
s = Structure(0-700)
> s.remove(100)
s = Structure(0-99, 101-700)
> s.add(100)
s = Structure(0-700)
> s.remove(200)
s = Structure(0-199, 201-700)
> s.remove(202)
s = Structure(0-199, 201, 203-700)
> s.removeAll()
s = Structure()
Does something like this have a standard name?
I've used this many times in the past and seen it used in things like plane-sweep algorithms for polygon clipping.
Sometimes the abstract data type it represents is just a set, and the data structure is an optimization. I use this for representing the set of matching characters given by a regex expression like [^a-zA-z0-9.-], for example, and to perform intersection, union, and other operations on those sets.
This sort of data structure is implemented on top of some other ordered set or map structure, by simply storing the keys where membership in the set changes instead of the keys in the set itself. In all the other cases where I've seen this sort of thing done, the authors refer to that underlying structure instead of giving a name to the concept itself.
I like the idea of having a name for it, though, since as I said I've used it myself many times. Maybe I would call it an "in & out set" in honor of the hamburger chain I liked the best back when I ate hamburgers.
It's a Compressed Bit Set or Compressed Bitmap.
A Bit Set or Bitmap is a set specifically designed for storing Integers. Most languages offer standard implementations of these. They typically work by assigning a 1 to the Nth bit in an internal array of Integers where N is the number you're adding to the set. 0 indicates the value is not present. The memory usage for these types of Bit Sets is dictated by the largest number you store.
A Compressed Bit Set is one that compacts ranges of 0s and 1s.
In this case, the question demonstrates a type of compression called "run-length-encoding" (thank you #Ralf Kleberhoff), so it is specifically a Run-length Encoded Bitmap.
Common implementations of Compressed Bitmaps (from newest-to-oldest) are:
Roaring Bitmaps (only one to provide "good random access")
EWAH
WAH
Oracle BBC

How to efficiently store and manipulate sparse binary matrices in Octave?

I'm trying to manipulate sparse binary matrices in GNU Octave, and it's using way more memory than I expect, and relevant sparse-matrix functions don't behave the way I want them to. I see this question about higher-than-expected sparse-matrix storage in MATLAB, which suggests that this matrix should consume even more memory, but helped explain (only) part of this situation.
For a sparse, binary matrix, I can't figure out any way to get Octave to NOT STORE the array of values (they're always implicitly 1, so need not be stored). Can this be done? Octave always seems to consume memory for a values array.
A trimmed-down example demonstrating the situation: create random sparse matrix, turn it into "binary":
mys=spones(sprandn(1024,1024,.03)); nnz(mys), whos mys
Shows the situation. The consumed size is consistent with the storage mechanism outlined in aforementioned SO answer and expanded below, if spones() creates an array of storage-class double and if all indices are 32-bit (i.e., TotalStorageSize - rowIndices - columnIndices == NumNonZero*sizeof(double) -- unnecessarily storing these values (all 1s as doubles) is over half of the total memory consumed by this 3%-sparse object.
After messing with this (for too long) while composing this question, I discovered some partial workarounds, so I'm going to "self-answer" (only) part of the question for continuity (hopefully), but I didn't figure out an adequate answer to main question:
How do I create an efficiently-stored ("no-/implicit-values") binary matrix in Octave?
Additional background on storage format follows...
The Octave docs say the storage format for sparse matrices uses format Compressed Sparse Column (CSC). This seems to imply storing the following arrays (expanding on aforementioned SO answer, with canonical Yale format labels and tweaks for column-major order):
values (A), number-of-nonzeros (NNZ) entries of storage-class size;
row numbers (IA), NNZ entries of index size (hopefully int64 but maybe int32);
start of each column (JA), number-of-columns-plus-1 entries of index size)
In this case, for binary-only storage, I hope there's a way to completely avoid storing array (A), but I can't figure it out.
Full disclosure: As noted above, as I was composing this question, I discovered a workaround to reduce memory usage, so I'm "self-answering" part of this here, but it still isn't fully satisfying, so I'm still listening for a better actual answer to storage of a sparse binary matrix without a trivial, bloated, unnecessary values array...
To get a binary-like value out of a number-like value and reduce the memory usage in this case, use "logical" storage, created by logical(X). For example, building from above,
logicalmys = logical(mys);
creates a sparse bool matrix, that takes up less memory (1-byte logical rather than 8-byte double for the values array).
Adding more information to the whos information using whos_line_format helps illuminate the situation: The default string includes 5 of the 7 properties (see docs for more). I'm using the format string
whos_line_format(" %a:4; %ln:6; %cs:16:6:1; %rb:12; %lc:8; %e:10; %t:20;\n")
to add display of "elements", and "type" (which is distinct from "class").
With that, whos mys logicalmys shows something like
Attr Name Size Bytes Class Elements Type
==== ==== ==== ===== ===== ======== ====
mys 1024x1024 391100 double 32250 sparse matrix
logicalmys 1024x1024 165350 logical 32250 sparse bool matrix
So this shows a distinction between sparse matrix and sparse bool matrix. However, the total memory consumed by logicalmys is consistent with actually storing an array of NNZ booleans (1-byte) -- That is:
totalMemory minus rowIndices minus columnOffsets leaves NNZ bytes left;
in numbers,
165350 - 32250*4 - 1025*4 == 32250.
So we're still storing 32250 elements, all of which are 1. Further, if you set one of the 1-elements to zero, it reduces the reported storage! For a good time, try: pick a nonzero element, e.g., (42,1), then zero it: logicalmys(42,1) = 0; then whos it!
My hope is that this is correct, and that this clarifies some things for those who might be interested. Comments, corrections, or actual answers welcome!

Fastest data structure with default values for undefined indexes?

I'm trying to create a 2d array where, when I access an index, will return the value. However, if an undefined index is accessed, it calls a callback and fills the index with that value, and then returns the value.
The array will have negative indexes, too, but I can overcome that by using 4 arrays (one for each quadrant around 0,0).
You can create a Matrix class that relies on tuples and dictionary, with the following behavior :
from collections import namedtuple
2DMatrixEntry = namedtuple("2DMatrixEntry", "x", "y", "value")
matrix = new dict()
defaultValue = 0
# add entry at 0;1
matrix[2DMatrixEntry(0,1)] = 10.0
# get value at 0;1
key = 2DMatrixEntry(0,1)
value = {defaultValue,matrix[key]}[key in matrix]
Cheers
This question is probably too broad for stackoverflow. - There is not a generic "one size fits all" solution for this, and the results depend a lot on the language used (and standard library).
There are several problems in this question. First of all let us consider a 2d array, we say this is simply already part of the language and that such an array grows dynamically on access. If this isn't the case, the question becomes really language dependent.
Now often when allocating memory the language automatically initializes the spots (again language dependent on how this happens and what the best method is, look into RAII). Though I can foresee that actual calculation of the specific cell might be costly (compared to allocation). In that case an interesting thing might be so called "two-phase construction". The array has to be filled with tuples/objects. The default construction of an object sets a bit/boolean to false - indicating that the value is not ready. Then on acces (ie a get() method or a operator() - language dependent) if this bit is false it constructs, else it just reads.
Another method is to use a dictionary/key-value map. Where the key would be the coordinates and the value the value. This has the advantage that the problem of construct-on-access is inherit to the datastructure (though again language dependent). The drawback of using maps however is that lookup speed of a value changes from O(1) to O(logn). (The actual time is widely different depending on the language though).
At last I hope you understand that how to do this depends on more specific requirements, the language you used and other libraries. In the end there is only a single data structure that is in each language: a long sequence of unallocated values. Anything more advanced than that depends on the language.

implementing a basic search engine with prefix tree

The problem is the implementing a prefix tree (Trie) in functional language without using any storage and iterative method.
I am trying to solve this problem. How should I approach this problem ? Can you give me exact algorithm or link which shows already implemented one in any functional language?
Why I am trying to do => creating a simple search engine with an feature of
adding word to tree
searching a word in tree
deleting a word in tree
Why I want to use functional language => I want improve my problem-solving ability a bit further.
NOTE : Since it is my hobby project, I will first implement basic features.
EDIT:
i.) What I mean about "without using storage" => I don't want use variable storage ( ex int a ), reference to a variable, array . I want calculate the result by recursively then showing result to the screen.
ii.) I have wrote some line but then I have erased because what I wrote is made me angry. Sorry for not showing my effort.
Take a look at haskell's Data.IntMap. It is purely functional implementation of
Patricia trie and it's source is quite readable.
bytestring-trie package extends this approach to ByteStrings
There is accompanying paper Fast Mergeable Integer Maps which is also readable and through. It describes implementation step-by-step: from binary tries to big-endian patricia trees.
Here is little extract from the paper.
At its simplest, a binary trie is a complete binary tree of depth
equal to the number of bits in the keys, where each leaf is either
empty, indicating that the corresponding key is unbound, or full, in
which case it contains the data to which the corresponding key is
bound. This style of trie might be represented in Standard ML as
datatype 'a Dict =
Empty
| Lf of 'a
| Br of 'a Dict * 'a Dict
To lookup a value in a binary trie, we simply read the bits of the
key, going left or right as directed, until we reach a leaf.
fun lookup (k, Empty) = NONE
| lookup (k, Lf x) = SOME x
| lookup (k, Br (t0,t1)) =
if even k then lookup (k div 2, t0)
else lookup (k div 2, t1)
The key point in immutable data structure implementations is sharing of both data and structure. To update an object you should create new version of it with the most possible number of shared nodes. Concretely for tries following approach may be used.
Consider such a trie (from Wikipedia):
Imagine that you haven't added word "inn" yet, but you already have word "in". To add "inn" you have to create new instance of the whole trie with "inn" added. However, you are not forced to copy the whole thing - you can create only new instance of the root node (this without label) and the right banch. New root node will point to new right banch, but to old other branches, so with each update most of the structure is shared with the previous state.
However, your keys may be quite long, so recreating the whole branch each time is still both time and space consuming. To lessen this effect, you may share structure inside one node too. Normally each node is a vector or map of all possible outcomes (e.g. in a picture node with label "te" has 3 outcomes - "a", "d" and "n"). There are plenty of implementations for immutable maps (Scala, Clojure, see their repositories for more examples) and Clojure also has excellent implementation of an immutable vector (which is actually a tree).
All operations on creating, updating and searching resulting tries may be implemented recursively without any mutable state.

Resources