I don't understand how binary search trees are always defined as "sorted". I get in an array representation of a binary heap you have a fully sorted array. I haven't seen array representations of binary search trees so hard for me to see them as sorted like an array eg [0,1,2,3,4,5] but rather sorted with respect to each node. What is the right way to think about a BST being "sorted" conceptually?
There are many types of binary search trees. All of them have one thing in common: they satisfy an invariant which enables binary search, namely an order relation by which every element in the tree can be compared to any other element in the tree, in a total preorder.
What does that mean?
Let's consider the typical statement of a BST invariant in a textbook, which states that every node's key is greater than all keys in its left sub-tree, and less than all keys in its right sub-tree. We omit conflict resolution details for keys which compare equal.
What does that BST look like? Here's an example:
The way I would explain it to a class of three-year-olds, is try to collapse all the nodes to the bottom level of the leaves, just let them fall down. Or, for high-schoolers, draw a line from each node/key projecting them on the x-axis. Once you did that, it's obvious the keys are already in (ascending) order.
Is this imaginary and casual observation analogous to our definition of a sorted sequence? Yes, it is. Since the elements of the BST satisfy a total preorder, an in-order traversal of the BST must produce those elements in order (Ex: prove it).
It is equivalent to state that if we had stored a BST's keys, by an in-order traversal, in an array, the array would be sorted.
Therefore, by way of our initial definition of a BST, the in-order traversal is the intuitive way of thinking of one as "sorted".
Does this help? It's a binary heap shown as an array
as far as data structures are concerned (arrays, trees, linked lists, etc), "sorted" means that sequentially going through all it's elements you'll find that their values are ordered according to some rule ( >, <, <=, etc).
For arrays, this is easy to picture because it's a linear data structure.
But trees are not, however, iterating through a BST you will notice that all the element are ordered accoring to the rule left value <= node value < right value ( or something similar); the very definition of a sorted data structure.
It is not "sorted" in the same sense an array might be sorted (and trees, except for heaps, are rarely represented as arrays anyway), but they have a structure that allows you to easily traverse the elements in sorted order: simply traverse the nodes of the BST with a depth-first search, and output each node's value after you've looked at its left child (if any) but before you look at its right child (if any).
By the way, the array in which a heap is stored is almost always not sorted. The heap itself can also not be said to be "sorted", because it does not have enough structure to be able to readily produce the elements in sorted order without destroying the heap by successively removing the top element. For example, although you do know that the top element is larger than both of its children (or smaller, depending on the heap type), you cannot tell in advance which child is smaller than the other.
Related
Problem- Given a sorted doubly link list and two numbers C and K. You need to decrease the info of node with data K by C and insert the new node formed at its correct position such that the list remains sorted.
I would think of insertion sort for such problem, because, insertion sort at any instance looks like, shown bunch of cards,
that are partially sorted. For insertion sort, number of swaps is equivalent to number of inversions. Number of compares is equivalent to number of exchanges + (N-1).
So, in the given problem(above), if node with data K is decreased by C, then the sorted linked list became partially sorted. Insertion sort is the best fit.
Another point is, amidst selection of sorting algorithm, if sorting logic applied for array representation of data holds best fit, then same sorting logic should holds best fit for linked list representation of same data.
For this problem, Is my thought process correct in choosing insertion sort?
Maybe you mean something else, but insertion sort is not the best algorithm, because you actually don't need to sort anything. If there is only one element with value K then it doesn't make a big difference, but otherwise it does.
So I would suggest the following algorithm O(n), ignoring edge cases for simplicity:
Go forward in the list until the value of the current node is > K - C.
Save this node, all the reduced nodes will be inserted before this one.
Continue to go forward while the value of the current node is < K
While the value of the current node is K, remove node, set value to K - C and insert it before the saved node. This could be optimized further, so that you only do one remove and insert operation of the whole sublist of nodes which had value K.
If these decrease operations can be batched up before the sorted list must be available, then you can simply remove all the decremented nodes from the list. Then, sort them, and perform a two-way merge into the list.
If the list must be maintained in order after each node decrement, then there is little choice but to remove the decremented node and re-insert in order.
Doing this with a linear search for a deck of cards is probably acceptable, unless you're running some monstrous Monte Carlo simulation involving cards, that runs for hours or day, so that optimization counts.
Otherwise the way we would deal with the need to maintain order would be to use an ordered sequence data structure: balanced binary tree (red-black, splay) or a skip list. Take the node out of the structure, adjust value, re-insert: O(log N).
Build a Data structure that has functions:
set(arr,n) - initialize the structure with array arr of length n. Time O(n)
fetch(i) - fetch arr[i]. Time O(log(n))
invert(k,j) - (when 0 <= k <= j <= n) inverts the sub-array [k,j]. meaning [4,7,2,8,5,4] with invert(2,5) becomes [4,7,4,5,8,2]. Time O(log(n))
How about saving the indices in binary search tree and using a flag saying the index is inverted? But if I do more than 1 invert, it mess it up.
Here is how we can approach designing such a data structure.
Indeed, using a balanced binary search tree is a good idea to start.
First, let us store array elements as pairs (index, value).
Naturally, the elements are sorted by index, so that the in-order traversal of a tree will yield the array in its original order.
Now, if we maintain a balanced binary search tree, and store the size of the subtree in each node, we can already do fetch in O(log n).
Next, let us only pretend we store the index.
Instead, we still arrange elements as we did with (index, value) pairs, but store only the value.
The index is now stored implicitly and can be calculated as follows.
Start from the root and go down to the target node.
Whenever we move to a left subtree, the index does not change.
When moving to a right subtree, add the size of the left subtree plus one (the size of the current vertex) to the index.
What we got at this point is a fixed-length array stored in a balanced binary search tree. It takes O(log n) to access (read or write) any element, as opposed to O(1) for a plain fixed-length array, so it is about time to get some benefit for all the trouble.
The next step is to devise a way to split our array into left and right parts in O(log n) given the required size of the left part, and merge two arrays by concatenation.
This step introduces dependency on our choice of the balanced binary search tree.
Treap is the obvious candidate since it is built on top of the split and merge primitives, so this improvement comes for free.
Perhaps it is also possible to split a Red-black tree or a Splay tree in O(log n) (though I admit I didn't try to figure out the details myself).
Right now, the structure is already more powerful than an array: it allows splitting and concatenation of "arrays" in O(log n), although element access is as slow as O(log n) too.
Note that this would not be possible if we still stored index explicitly at this point, since indices would be broken in the right part of a split or merge operation.
Finally, it is time to introduce the invert operation.
Let us store a flag in each node to signal whether the whole subtree of this node has to be inverted.
This flag will be lazily propagating: whenever we access a node, before doing anything, check if the flag is true.
If this is the case, swap the left and right subtrees, toggle (true <-> false) the flag in the root nodes of both subtrees, and set the flag in the current node to false.
Now, when we want to invert a subarray:
split the array into three parts (before the subarray, the subarray itself, and after the subarray) by two split operations,
toggle (true <-> false) the flag in the root of the middle (subarray) part,
then merge the three parts back in their original order by two merge operations.
I have a set of items that are supposed to for a balanced binary tree. Each item is of the form (data,parent), data being the useful information and parent being the index of the parent node in the binary tree.
Nodes in the tree are numbered left-to-right, row-by-row, like this:
1
___/ \___
/ \
2 3
_/\_ _/\_
4 5 6 7
These elements come stored in a linked list. How should I order this list such that it's easier for me to build the tree? Each parent node will be referenced (by index) by exactly two child nodes; if I sort these by parent index, the sorting must be stable.
You can sort the list in any stable sort, according to the parent field, in increasing order.
The result will be a list like that:
[(d_1,nil), (d_2,1), (d_3,1) , (d_4,2), (d_5,2), ...(d_i,x), (d_i+1,x) ]
^
the root has no parent...
Note that in this list, since we used a stable sort - for each two pairs (d_i,x), (d_i+1,x) in the sorted list, d_i is the left leaf!
Now, you can populate the tree in breadth-first traversal,
Since it is homework - I still want you to make sure you understand everything by your own. So I do not want to "feed answer". If you have any specific question, please comment - and I will try to edit and explain the relevant parts with more details.
Bonus: The result of this organization is very common way to implement a binary heap structure, which is a complete binary tree, but for performance, we usually store it as an array, which is very similar to the output generated by this approach.
I don't think I understand what exactly are you trying to achieve. You have to write the function that inserts items in the tree. The red-black tree, for example, has the same complexity for insertions, O(log n), no matter how the input data is sorted. Is there a specific implementation that you have to use or a specific speed target that you must reach for inserts?
PS: Sounds like a homework to me :)
It sounds like you want a binary tree that allows you to go from a leaf node to its ancestors, using an array.
Usually sorting a list before putting it into a binary tree causes an unbalanced binary tree, unless you use a treap or other O(logn) datastructure.
The usual way of stashing a (complete) binary tree in an array, is to make node i have two children 2i and 2i+1.
Given this organization (not sorting but organization), you can go to a parent node from a leaf node by dividing the array index by 2 using integer arithmetic which will truncate fractions.
if your binary trees are not always complete, you'll probably be better served by forgetting about using an array, and instead using a more traditional tree structure with pointers/references.
I need to calculate a 1d histogram that must be dynamically maintained and looked up frequently. One idea I had involves keeping an ordered array with the data (cause thus I can determine percentiles in O(1), and this suffices for quickly finding a histogram with non-uniform bins with the exactly same amount of points inside each bin).
So, is there a way that is less than O(N) to insert a number into an ordered array while keeping it ordered?
I guess the answer is very well known but I don't know a lot about algorithms (physicists doing numerical calculations rarely do).
In the general case, you could use a more flexible tree-like data structure. This would allow access, insertion and deletion in O(log) time and is also relatively easy to get ready-made from a library (ex.: C++'s STL map).
(Or a hash map...)
An ordered array with binary search does the same things as a tree, but is more rigid. It might probably be faster for acess and memory use but you will pay when having to insert or delete things in the middle (O(n) cost).
Note, however, that an ordered array might be enough for you: if your data points are often the same, you can mantain a list of pairs {key, count}, ordered by key, being able to quickly add another instance of an existing item (but still having to do more work to add a new item)
You could use binary search. This is O(log(n)).
If you like to insert number x, then take the number in the middle of your array and compare it to x. if x is smaller then then take the number in the middle of the first half else the number in the middle of the second half and so on.
You can perform insertions in O(1) time if you rearrange your array as a bunch of linked-lists hanging off of each element:
keys = Array([0][1][2][3][4]......)
a c b e f . .
d g i . . .
h j .
|__|__|__|__|__|__|__/linked lists
There's also the strategy of keeping two datastructures at the same time, if your update workload supports it without increasing time-complexity of common operations.
So, is there a way that is less than O(N) to insert a number into an
ordered array while keeping it ordered?
Yes, you can use an array to implement a binary search tree using arrays and do the insertion in O(log n) time. How?
Keep index 0 empty; index 1 = root; if node is the left child of parent node, index of node = 2 * index of parent node; if node is the right child of parent node, index of node = 2 * index of parent node + 1.
Insertion will thus be O(log n). Unfortunately, you might notice that the binary search tree for an ordered list might degenerate to a linear search if you don't balance the tree i.e. O(n), which is pointless. Here, you may have to implement a red black tree to keep the height balanced. However, this is quite complicated, BUT insertion can be done with arrays in O(log n). Note that the array elements will no longer be ints; instead, they'll have to be objects with a colour attribute.
I wouldn't recommend it.
Any particular reason this demands an array? You need an data structure which keeps data ordered and allows you to insert quickly. Why not a binary search tree? Or better still, a red black tree. In C++, you could use the Set structure in the Standard template library which is implemented as a red black tree. Gives you O(log(n)) insertion time and the ability to iterate over it like an array.
Let A[1..n] be an array of real numbers. Design an algorithm to perform any sequence of the following operations:
Add(i,y) -- Add the value y to the ith number.
Partial-sum(i) -- Return the sum of the first i numbers, i.e.
There are no insertions or deletions; the only change is to the values of the numbers. Each operation should take O(logn) steps. You may use one additional array of size n as a work space.
How to design a data structure for above algorithm?
Construct a balanced binary tree with n leaves; stick the elements along the bottom of the tree in their original order.
Augment each node in the tree with "sum of leaves of subtree"; a tree has #leaves-1 nodes so this takes O(n) setup time (which we have).
Querying a partial-sum goes like this: Descend the tree towards the query (leaf) node, but whenever you descend right, add the subtree-sum on the left plus the element you just visited, since those elements are in the sum.
Modifying a value goes like this: Find the query (left) node. Calculate the difference you added. Travel to the root of the tree; as you travel to the root, update each node you visit by adding in the difference (you may need to visit adjacent nodes, depending if you're storing "sum of leaves of subtree" or "sum of left-subtree plus myself" or some variant); the main idea is that you appropriately update all the augmented branch data that needs updating, and that data will be on the root path or adjacent to it.
The two operations take O(log(n)) time (that's the height of a tree), and you do O(1) work at each node.
You can probably use any search tree (e.g. a self-balancing binary search tree might allow for insertions, others for quicker access) but I haven't thought that one through.
You may use Fenwick Tree
See this question