Data Structure to maintain numbers - algorithm

Please suggest a data structure to maintain numbers in such a way that i can answer the following queries -
Find(int n) - O(log(n))
Count number of numbers less than k = O(log(n))
Insert - O(Log(n))
It's not homework, but a smaller problem that i am encountering to solve a bigger one - Number of students with better grades and lower jee rank
I have though of an avl tree with maintaining number of nodes in subtree at each nod.But i dont know how to maintain this count at each node when an insert is being done and re-balancing is being done.

I would also try using an AVL Tree. Without looking much deeper into it, I don't think this would be too hard to add. In case of an AVL tree you alway need to know the depth of each subtree for each node anyway (or at least the balancing factor). So it should not be too hard to propagate the size of the subtrees. In case of an rotation, you know exactely where each node and each subtree will land, so it should be just a simple recalculation for those nodes, which are rotated.

Finding in a binary tree is O(log(n)), inserting too.
If you store the subtree size in the node:
you can coming back from a successful insert in a subtree increment the node's counter;
on deleting the same.
So subtree size is like a find, O(log(n)).

Have a look at the different variants of heap data structures, e.g. here.

Related

Why is shallow binary tree better?

Why do we always want shallow binary tree? In what cases is shallow binary tree better than non-shallow/minimum depth tree?
I am just confused as my prof keeps saying we want to aim for shallowest possible binary tree but I do not understand why. I guess smallar is better but is there any specific concrete reason? Sorry for my bad english thanks for your help
I'm assuming this is in regards to binary search trees - if not, please let me know and I can update this answer.
In a binary search tree, the cost of almost every operation (insertion, deletion, lookup, successor, predecessor, min, max, range search, split, join, etc.) depends on the height of the binary search tree. The reason for this is that these operations work by walking down the tree from the root until they either fall off the tree or find what they're looking for. The deeper the tree, the longer this can take if you get bad inputs.
By shuffling nodes around to keep the tree height low, we can make it so that these operations are, in general, very fast. A tree with height h can have at most 2h - 1 nodes in it, which is a huge number compared with h (figure that if h = 20, 2h - 1 is over a million!), so if you make an effort to pack the nodes into the tree higher up and closer to the root, you'll get better operation speeds all around.
There are some cases where it's actually beneficial to have trees that are as imbalanced as possible. For example, if you have a binary search tree and know in advance that some elements will be looked up more than others, you may want to shuffle the nodes around in the tree to put the high-frequency items higher up and the low-frequency items deeper in the tree. In non-binary-search-tree contexts, the randomized meldable priority queue works by randomly walking down a tree doing merges, and the less balanced the tree is the more likely it is for these operations to end early by falling off the tree.

How can i split an AVL tree at a given node at time O(log(n))?

I've been busting my head trying all kinds of ways but the best I got is O(log^2(n)).
the exact question is:
make a function Split(AVLtree T, int k) which returns 2 AVL trees (like a tuple) such that all values in T1 are lower than or equal to k and the rest are in T2. k is not necessarily in the tree. time must be O(log(n)).
Assume efficient implementation of AVL tree and I managed to make a merge function with time O(log(|h1-h2|)).
Any help would be greatly appriciated.
You're almost there, given that you have the merge function!
Do a regular successor search in the tree for k. This will trace out a path through the tree from the root to that successor node. Imagine cutting every edge traced out on the path this way, which will give you a collection of "pennants," single nodes with legal AVL trees hanging off to the sides. Then, show that if you merge them back together in the right order, the costs of the merges form a telescoping sum that adds up to O(log n).

How does a binary tree waste memory when stored as nodes and references?

I'm researching binary trees and came across this section describing storage methods.
It states that:
In a language with records and references, binary trees are typically constructed by having a tree node structure which contains some data and references to its left child and its right child...
...This method of storing binary trees wastes a fair bit of memory, as the pointers will be null (or point to the sentinel) more than half the time...
Can someone demonstrate or further explain why this happens?
https://en.wikipedia.org/wiki/Binary_tree#Methods_for_storing_binary_trees
The typical binary tree representation consists of the data associated with the node and two pointers to the left and right subtree respectively.
I think that with representation it is easy to realize that every node spends two pointers. Consequently, every tree of n nodes, spends a total of 2 x n pointers in storage (for the pointers).
Now well, at the exception of the root, n-1 nodes have a parent (that is an arc or edge). So, you really use n-1 pointers of 2n that you have (as explained in the previous paragraph).
That said, of a total of 2n pointers, you always use n-1. The remaining 2n - (n-1) = n+1 are always set to null. So, no matter the tree topology, you always spend more space for storing null pointers than for storing tree arcs.
Imagine a full binary tree of height 3. It has 7 nodes. 4 of them have no children. 4/7 > 1/2.

Optimal filling order for binary tree

I have a problem where I need to store changing data values v_i (integers) for constant keys i (also integers, in some range say [1;M]). I need to be able to draw quickly a random element weighted by the values v_i, i.e. the probability to draw key k should be v_k/(sum(i=1...M) v_i)
The best idea I could come up with is using a binary tree and storing the partial sum over the values in the subtree rooted at k as the value for key k (still in the range [1;M]). Then, whenever a value changes, I need to update its node and all parent nodes in the tree (takes O(log M) time since the keys are fixed and thus the binary tree is perfectly balanced). Drawing a random element as above also takes O(log M) time (for each level of the tree, one compares the random number say in the range (0,1) against the relative weights of the left subtree, right subtree, and the node itself) and is much faster than the naive algorithm (take random number r, iterate through the elements to find the least k so that sum(i=1...k) < r, sum(i=1...k+1) > r; takes O(M) time).
The question I now have is how to optimize the placement of the tree nodes in memory in order to minimize cache misses. Since all keys are known and remain constant, this is essentially the order in which I should allocate memory for the tree nodes.
Thanks!!
I don't think there is an optimal filling order of a binary tree except something like a pre-order, post-order, in-order filling? Doesn't your question isn't asking how a cache in general can work? Unfortunately I don't know it myself, maybe a more simplier hash-array would be more efficient in your case?

Tree Datastructures

I've tried to understand what sorted trees are and binary trees and avl and and and ...
I'm still not sure, what makes a sorted tree sorted? And what is the complexity (Big-Oh) between searching in a sorted and searching in an unsorted tree? Hope you can help me.
Binary Trees
There exists two main types of binary trees, balanced and unbalanced. A balanced tree aims to keep the height of the tree (height = the amount of nodes between the root and the furthest child) as even as possible. There are several types of algorithms for balanced trees, the two most famous being AVL- and RedBlack-trees. The complexity for insert/delete/search operations on both AVL and RedBlack trees is O(log n) or better - which is the important part. Other self balancing algorithms are AA-, Splay- and Scapegoat-tree.
Balanced trees gain their property (and name) of being balanced from the fact that after every delete or insert operation on the tree the algorithm introspects the tree to make sure it's still balanced, if it's not it will try to fix this (which is done differently with each algorithm) by rotating nodes around in the tree.
Normal (or unbalanced) binary trees do not modify their structure to keep themselves balanced and have the risk of, most often overtime, to become very inefficient (especially if the values are inserted in order). However if performance is of no issue and you mainly want a sorted data structure then they might do. The complexity for insert/delete/search operations on an unbalanced tree range from O(1) (best case - if you want the root) to O(n) (worst-case if you inserted all nodes in order and want the largest node)
There exists another variation which is called a randomized binary tree which uses some kind of randomization to make sure the tree doesn't become fully unbalanced (which is the same as a linked list)
A binary search tree is an "tree"-structure where every node has two children-nodes.
The left nodes all have the property of being less than its parent, and the right-nodes are all greater than its parent.
The intressting thing with an binary-tree is that we can search for an value in O(log n) when the tree is properly sorted. Doing the same search in an LinkedList for an example would give us the searchspeed of O(n).
The best way to go about learning datastructures would be to do a day of googling and reading wikipedia articles.
This might get you started
http://en.wikipedia.org/wiki/Binary_search_tree
Do a google search for the following:
site:stackoverflow.com binary trees
to get a list of SO questions which will answer your several questions.
There isn't really a lot of point in using a tree structure if it isn't sorted in some fashion - if you are planning on searching for a node in the tree and it is unsorted, you will have to traverse the entire tree (O(n)). If you have a tree which is sorted in some fashion, then it is only necessary to traverse down a single branch of the tree (typically O(log n)).
In binary tree the right leaf is always smaller then the head, and the left leaf is always bigger, so you can search in sorted tree in O(log(n)), you just need to go right if if the key is smaller than head and to the left if bgger

Resources