What is the difference between sequential access and sequential traversal of elements in data structures - data-structures

Sequential traversal is the main difference between linear and non-linear data structures. Can anyone explain it briefly?

A linear data structure is something like this:
A
B
C
D
E
For instance, lists and arrays. Each element is followed by a single element, so traversal is trivial: you simply go from one element to the next. If you start at A, the only next element is B; from B the only next element is C, and so on.
A non-linear data structure is something like this:
     A
   /   \
  B     C
 / \   / \
D   E F   G
For instance, a tree. Notice how A is followed by two elements, B and C, and each of them is in turn followed by two elements. Now traversal is more complex, because once you start from A, you have a choice of going to either B or C. What's more, once at B, you have a choice of going further down or going "sideways" to C. In this case (a tree), your traversal options are breadth-first or depth-first, as sketched below.
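To make the two orders concrete, here is a minimal Python sketch over the tree above (the dict-of-children representation is just an assumption for illustration):

from collections import deque

# Hypothetical representation of the tree above: node -> list of children.
tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}

def bfs(root):
    # Breadth-first: visit every node at one depth before going deeper.
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']

def dfs(root):
    # Depth-first: follow one branch all the way down before backtracking.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree.get(node, [])))
    return order  # ['A', 'B', 'D', 'E', 'C', 'F', 'G']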

Related

Establishing chronology of a list based on a directed graph tree

So, in a personal project I've been working on I came across the following problem, and I've been struggling to come up with a solution since my maths skills are not terribly great.
Let's say you have the following tree of numbers a, b, c, d, e, f, g, h:
    a
   / \
  b   c
 /|   |
g d   f
| |
h e
Each step down the tree means that the next number is bigger than the previous one. So a < b, d < e, a < c. However, it is impossible to determine whether b > c or b < c - we can only tell that both numbers are bigger than a.
Let's say we have an ordered list of numbers, for instance [a, b, c, d, e]. How do we write an algorithm that checks whether the order of the numbers in the list (assuming that L[i] < L[i+1]) is, in fact, consistent with the information we have according to this tree?
I.e., both [a, c, b, d, e] and [a, b, d, c, e] are correct, but [c, a, b, d, e] is not (since we know that c > a, but nothing about how c relates to the other numbers).
For the sake of the algorithm, let's assume that our access to the tree is a function provably_greater(X, Y) which returns true if the tree knows that the first number is higher than the second. I.e., provably_greater(d, a) = True, but provably_greater(d, f) = False. Naturally, if a number is provably not greater, it also returns false.
This is not a homework question; I have abstracted the problem quite a lot to make it clearer, but solving it is quite crucial for what I'm trying to do. I've made several attempts at cracking it myself, but everything that I come up with ends up being insufficient for some edge case I find out about later.
Thanks in advance.
Your statement "everything that I come up with ends up being insufficient for some edge case I find out about later" suggests that you have no working solution at all. Here is a brute-force algorithm that should work in all cases. I can think of several ways to improve its speed, but this is a start.
First, set up a data structure that allows quick evaluation of provably_greater(X, Y) based on the tree. This structure can be a set or hash table, which takes a lot of memory but allows fast access. For each leaf of the tree, take the path up to the root. At each node, look at all the nodes you have already passed on the way up and add an ordered pair to the set recording the less-than relation for those two nodes. In your example tree, if you start at node h you move up to node g and add (g,h) to the set, then move up to node b and add the pairs (b,h) and (b,g), then move up to node a and add the pairs (a,h), (a,g), and (a,b). Do the same for leaf nodes e and f. The pair (a,b) will be added twice, due to the leaf nodes h and e, but a set structure handles this easily.
The function provably_greater(X, Y) is now quick and easy: the result is True if the pair (Y,X) is in the set and False otherwise.
You now look at all pairs of numbers in your list--for the list [a,b,c,d,e] you would look at the pairs (a,b), (a,c), (b,c), etc. If provably_greater(X, Y) is true for any of those pairs, the list is out of order. Otherwise, the list is in order.
This should be very easy to implement in a language like Python. Let me know if you would like some Python 3 code.
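For what it's worth, a minimal Python 3 sketch of the above might look like this; it assumes the tree is given as a hypothetical parent map (node -> parent, root -> None) and builds the same less-than set by walking parent pointers instead of leaf-to-root paths:

from itertools import combinations

def build_less_than_pairs(parent):
    # parent: hypothetical dict mapping each node to its parent (root -> None).
    pairs = set()
    for node in parent:
        ancestor = parent[node]
        while ancestor is not None:
            pairs.add((ancestor, node))  # ancestor < node in the tree
            ancestor = parent[ancestor]
    return pairs

def provably_greater(x, y, pairs):
    return (y, x) in pairs  # x > y iff (y, x) is a recorded less-than pair

def list_in_order(lst, pairs):
    # The list claims earlier < later; reject it if any earlier element
    # is provably greater than a later one.
    return not any(provably_greater(x, y, pairs)
                   for x, y in combinations(lst, 2))

parent = {'a': None, 'b': 'a', 'c': 'a', 'g': 'b', 'd': 'b',
          'f': 'c', 'h': 'g', 'e': 'd'}
pairs = build_less_than_pairs(parent)
assert list_in_order(['a', 'c', 'b', 'd', 'e'], pairs)
assert not list_in_order(['c', 'a', 'b', 'd', 'e'], pairs)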
I'm going to ignore your provably_greater function and assume access to the tree so that I can provide an efficient algorithm.
First, perform an Euler tour of the tree, remembering the start and end indexes for every node. If you use the same tree to check many lists, you only have to do this once. See https://www.geeksforgeeks.org/euler-tour-tree/
Create an initially empty binary search tree of indexes.
Iterate through the list. For each node, check whether the tree contains any index between the node's start and end Euler tour indexes. If it does, the list is out of order. If it does not, insert the node's start index into the tree. This prevents any provably lesser node from appearing later in the list.
That's it: O(N log N) altogether, for each list.
A TreeSet in Java or std::set in C++ can be used for the binary search tree.
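A minimal Python sketch of this approach, under two stated assumptions: the tree is a dict of children, and a plain sorted list with bisect stands in for the balanced BST (so inserts here are O(N) rather than O(log N), unlike a real TreeSet/std::set):

import bisect

def euler_index(tree, root):
    # tree: hypothetical dict mapping node -> list of children.
    start, end, clock = {}, {}, 0
    def dfs(u):
        nonlocal clock
        start[u] = clock
        clock += 1
        for v in tree.get(u, []):
            dfs(v)
        end[u] = clock
        clock += 1
    dfs(root)
    return start, end

def consistent(order, start, end):
    seen = []  # sorted start indexes of nodes already seen in the list
    for node in order:
        lo, hi = start[node], end[node]
        i = bisect.bisect_right(seen, lo)
        if i < len(seen) and seen[i] < hi:
            return False  # an earlier node lies inside this node's subtree
        bisect.insort(seen, lo)
    return True

tree = {'a': ['b', 'c'], 'b': ['g', 'd'], 'c': ['f'], 'g': ['h'], 'd': ['e']}
start, end = euler_index(tree, 'a')
assert consistent(['a', 'c', 'b', 'd', 'e'], start, end)
assert not consistent(['c', 'a', 'b', 'd', 'e'], start, end)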

Why does an in-order traversal of a binary search tree guarantee non-decreasing order?

TL;DR: My ultimate problem is to find, in a proper binary tree (i.e., one with at least two nodes), the two nodes such that one is just greater than an input value and the other just less than it. (cont. under the line)
To implement that, I assumed that, literally, if you draw the tree (properly), any node you see horizontally to the right is greater than every node to its left.
In other words, quoting from Wikipedia (Binary search tree):
The key in each node must be greater than all keys stored in the left sub-tree, and smaller than all keys in the right sub-tree.
And it seems to be guaranteed only locally. With a figure like this:
       A
     /   \
    B     C
   / \   / \
  D   E F   G
 / \
H   I
(Letters have no order, i.e. just for structure)
By locally I mean that when we talk about node E, it's guaranteed that D (along with H and I) is smaller than E, but what about C, F, G compared to E - is that also guaranteed?
This seems quite intuitive (that F, C, G are all greater than E), but I can't find a way to prove it, so is there any counterexample or theoretical proof? Any existing theorems or suggestions?
EDIT: I finally realized this is equivalent to asking why an in-order traversal of a binary search tree yields a non-decreasing order.
This seems quite intuitive (that F, C, G are all greater than E), but I can't find a way to prove it, so is there any counterexample or theoretical proof? Any existing theorems or suggestions?
F > A — definition of BST ("key in each node must be … smaller than all keys in the right sub-tree")
E < A — definition of BST ("key in each node must be greater than all keys stored in the left sub-tree")
E < A < F — transitivity
And so on for C and G
Imagine you have the same tree without E:
       A
     /   \
    B     C
   /     / \
  D     F   G
 / \
H   I
Now you insert this E, which is greater than B. What if E were also greater than A? In that case it would have been inserted into the right subtree of A. But E sits in the subtree under B, i.e. in the left subtree of A, so it is less than A:
       A
     /   \
    B     C
   / \   / \
  D   E F   G
 / \
H   I
You need to differentiate between a "binary tree" and a "binary search tree" to be precise about it.
A binary search tree has the property you're looking for: all nodes in the left branch are smaller than all nodes in the right branch. If that weren't the case, you couldn't use the usual search method -- that is, look at a node, and if you're looking for a smaller key, go left; if you're looking for a larger key, go right. I think both a basic BST and balanced trees like AVL and red-black all observe this same property.
There are other binary tree data structures which aren't "Binary Search Trees" -- for example, a min-heap and a max-heap are both binary trees, but the 'left is smaller than right' constraint is not met all the way through the tree. Heaps are used to find the smallest or largest element in a set, so you normally reason only about the node at the top of the tree.
As to a proof, I guess there is this: if you accept that the search algorithm works, then we can show that this property must hold. For instance, in this tree:
    a
   / \
  b   n
 / \
c   d
then let's say you wanted to prove that d is smaller than n, or any of n's children. Imagine you were searching for d and you were at a. You compare a to d and find that d is smaller -- that's how you know to traverse left, to b. So right there we have to trust that the entire left subtree (b and under) is smaller than a. The same argument applies to the right-hand side and keys greater than a.
So left-children(a) < a < right-children(a)
In terrible pseudo-proof...
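To see the global property concretely, here is a minimal Python sketch (the Node class is an assumed toy representation): an in-order traversal of any BST emits keys in non-decreasing order, which is exactly the claim that everything in a node's left subtree, however deep, precedes it, and everything in its right subtree follows it.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: 'Optional[Node]' = None
    right: 'Optional[Node]' = None

def inorder(node):
    # Left subtree, then the node itself, then the right subtree.
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

# A valid BST: every left descendant < node < every right descendant.
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
assert inorder(root) == sorted(inorder(root))  # [1, 2, 3, 4, 5, 6, 7]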

Union-Find operations on a tree?

Can someone please explain the answer in bold? How is it done?
Below is a sequence of four Union-Find operations (with weighted-union and full compression) that led to the following up-tree. What were the last two operations?
Answer: Union(D,A), Union(B,C), Union(D/A,B/C), Find(B/C).
The slash notation refers to sets: D/A denotes the set containing D and A, so either element may name the set.
Let us apply the four operations:
Union(D,A) leads to the following tree:
  D
 /
A
Union(B,C) leads to the following tree:
  B
 /
C
Now, Union(D/A,B/C) means that because D and A belong to the same set, it does not matter what the first argument is: it can be D or A. Similarly, because B and C belong to the same set, it does not matter what the second argument is: it can be B or C; the result will be the same.
After the third operation, the result will be:
  D
 / \
A   B
     \
      C
Now because compression is also allowed, the Find(C) operation would result in the tree:
  D
 /|\
A B C
If the fourth operation were Find(B), the tree would remain as it was after the third operation, because when we apply compression after a find, we make all nodes encountered on the path up to the root immediate children of the root; since we would not encounter C, we could not make C an immediate child of D, as it is in the final tree.
Correct Answer
The correct sequence of four operations is:
Union(D,A), Union(B,C), Union(D/A,B/C), Find(C).
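For reference, here is a minimal Python sketch of Union-Find with weighted union and full path compression (the class layout is just an assumption for illustration); replaying the four operations reproduces the trees above:

class UnionFind:
    def __init__(self, elements):
        self.parent = {e: e for e in elements}
        self.size = {e: 1 for e in elements}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Full compression: every node on the path becomes a child of the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # weighted union: smaller tree goes under larger
        self.size[ra] += self.size[rb]

uf = UnionFind('ABCD')
uf.union('D', 'A')  # A becomes a child of D
uf.union('B', 'C')  # C becomes a child of B
uf.union('D', 'B')  # Union(D/A, B/C): B (with C) goes under D
uf.find('C')        # compression makes C an immediate child of D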

Find the mode of a multiset in a given time bound (highest multiplicity)

The given problem:
A multiset is a set in which some of the elements occur more than once (e.g. {a, f, b, b, e, c, b, g, a, i, b} is a multiset). The elements are drawn from a totally ordered set. Present an algorithm that, when presented with a multiset as input, finds an element that has the most occurrences in the multiset (e.g. in {a, f, b, b, e, c, b, g, a, c, b}, b has the most occurrences). The algorithm should run in O(n lg(n/M) + n) time, where n is the number of elements in the multiset and M is the highest number of occurrences of an element in the multiset. Note that you do not know the value of M.
[Hint: Use a divide-and-conquer strategy based on the median of the list. The subproblems generated by the divide-and-conquer strategy cannot be smaller than a 'certain' size in order to achieve the given time bound.]
Our initial solution:
Our idea was to use Moore's majority algorithm to determine whether the multiset contains a majority candidate (e.g. {a, b, b} has a majority, b). After determining whether this is true, we either output the result or find the median of the list using a given algorithm (known as Select) and split the list into three sublists (elements less than the median, elements equal to it, and elements greater than it). Again, we would check each of the lists to determine whether a majority element is present; if so, that is the result.
For example, given the multiset {a, b, c, d, d, e, f}
Step 1: check for majority. None found, split the list based on the median.
Step 2: L1 = {a, b, c, d, d}, L2 = {e, f}. Find the majority of each. None found, so split the lists again.
Step 3: L11 = {a, b, c}, L12 = {d, d}, L21 = {e}, L22 = {f}. Check each for majority elements. L12 returns d. In this case, d is the most frequently occurring element in the original multiset, and thus the answer.
The issues we're having are whether this type of algorithm is fast enough, as well as whether this can be done recursively or if a terminating loop is required. As the hint says, the sub-problems cannot be smaller than a 'certain' size, which we believe to be M (the highest multiplicity).
If you use recursion in the most straightforward way as described in your post, it will not have the desired time complexity. Why? Assume that the answer element is the largest one. Then it is always located in the right branch of the recursion. But we call the left branch first, which can go much deeper if all elements there are distinct (producing pieces of size 1, while we never want pieces smaller than M).
Here is a correct solution:
Let's always split the array into three parts at each step, as described in your question. Now let's step aside and look at what we have: the recursive calls form a tree. To get the desired time complexity, we should never go deeper than the level where the answer is located. To achieve this, traverse the tree using breadth-first search with a queue instead of depth-first search. That's it.
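Here is a minimal Python sketch of that breadth-first order, with one extra detail not spelled out above: the equal-to-median part of each split yields an exact count, so any piece no larger than the best count found so far can be pruned. Note that statistics.median_low merely stands in for a linear-time Select, so this illustrates the traversal order rather than the exact time bound:

from collections import deque
import statistics

def mode_of_multiset(items):
    best_elem, best_count = None, 0
    queue = deque([list(items)])
    while queue:
        piece = queue.popleft()
        if len(piece) <= best_count:
            continue  # cannot contain a more frequent element
        med = statistics.median_low(piece)  # stand-in for linear-time Select
        equal = [x for x in piece if x == med]
        if len(equal) > best_count:
            best_elem, best_count = med, len(equal)
        queue.append([x for x in piece if x < med])
        queue.append([x for x in piece if x > med])
    return best_elem, best_count

print(mode_of_multiset(['a','f','b','b','e','c','b','g','a','i','b']))  # ('b', 4)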
If you want to do this in real life, it is worth considering using a hash table to track the counts. This can have amortized O(1) complexity per hash table access, so the overall complexity of the following Python code is O(n).
import collections

# Count every element in one pass; most_common(1) returns [(element, count)].
C = collections.Counter(['a','f','b','b','e','c','b','g','a','i','b'])
most_common_element, highest_count = C.most_common(1)[0]  # ('b', 4)

O(1) algorithm to determine if node is descendant of another node in a multiway tree?

Imagine the following tree:
     A
    / \
   B   C
  / \   \
 D   E   F
I'm looking for a way to query whether, for example, F is a descendant of A (note: F doesn't need to be a direct descendant of A), which in this particular case would be true. Only a limited number of potential parent nodes need to be tested against a larger pool of potential descendant nodes.
When testing whether a node is a descendant of a node in the potential parent pool, it needs to be tested against ALL potential parent nodes.
This is what I came up with:
Convert multiway tree to a trie, i.e. assign the following prefixes to every node in the above tree:
A = 1
B = 11
C = 12
D = 111
E = 112
F = 121
Then, reserve a bit array for every possible prefix size and add the parent nodes to be tested against, i.e. if C is added to the potential parent node pool, do:
  1    2    3   <- prefix length
*[1]  [1]  ...
 [2] *[2]  ...
 [3]  [3]  ...
 [4]  [4]  ...
 ...  ...
When testing if a node is a descendant of a potential parent node, take its trie prefix, lookup the first character in the first "prefix array" (see above) and if it is present, lookup the second prefix character in the second "prefix array" and so on, i.e. testing F leads to:
F = 1 2 1
  1    2    3
*[1]  [1]  ...
 [2] *[2]  ...
 [3]  [3]  ...
 [4]  [4]  ...
 ...  ...
so yes, F is a descendant of C.
This test seems to be worst-case O(n), where n = maximum prefix length = maximum tree depth, so its worst case is exactly equal to the obvious approach of just walking up the tree and comparing nodes. However, it performs much better if the tested node is near the bottom of the tree and the potential parent node is somewhere near the top. Combining both algorithms would mitigate both worst-case scenarios. However, the memory overhead is a concern.
Is there another way for doing that? Any pointers greatly appreciated!
Are your input trees always static? If so, then you can use a Lowest Common Ancestor algorithm to answer the is-descendant question in O(1) time with O(n) time/space construction. An LCA query is given two nodes and asks which is the lowest node in the tree whose subtree contains both. You can then answer the IsDescendent query with a single LCA query: if LCA(A, B) == A or LCA(A, B) == B, then one is a descendant of the other.
This Topcoder algorithm tutorial gives a thorough discussion of the problem and a few solutions at various levels of code complexity/efficiency.
I don't know if this would fit your problem, but one way to store hierarchies in databases, with quick "give me everything from this node and downwards" features is to store a "path".
For instance, for a tree that looks like this:
    +-- b
    |
a --+       +-- d
    |       |
    +-- c --+
            |
            +-- e
you would store the rows as follows, assuming the letter in the above tree is the "id" of each row:
id    path
a     a
b     a*b
c     a*c
d     a*c*d
e     a*c*e
To find all descendants of a particular node, you would do a "STARTSWITH" query on the path column, i.e. all nodes whose path starts with a*c*.
To find out whether a particular node is a descendant of another node, you would check whether the longer path starts with the shorter path.
So for instance:
e is a descendant of a since a*c*e starts with a
d is a descendant of c since a*c*d starts with a*c
Would that be useful in your instance?
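A tiny Python sketch of that test, using the paths from the table above as hypothetical data:

paths = {'a': 'a', 'b': 'a*b', 'c': 'a*c', 'd': 'a*c*d', 'e': 'a*c*e'}

def is_descendant(child, parent):
    # Append the separator so that a path like 'a*cx' would not
    # falsely match the prefix 'a*c'.
    return paths[child].startswith(paths[parent] + '*')

assert is_descendant('e', 'a') and is_descendant('d', 'c')
assert not is_descendant('b', 'c')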
Traversing any tree requires "depth-of-tree" steps. Therefore, if you maintain a balanced tree structure, it is provable that your lookup needs O(log n) operations. From what I understand, your tree is special and you cannot maintain it in a balanced way, right? So O(n) is possible. But this is bad during creation of the tree anyway, so you will probably die before you use the lookup anyway...
Depending on how often you need that lookup operation compared to inserts, you could decide to pay during insert to maintain an extra data structure. I would suggest hashing if you really need amortized O(1). On every insert operation, you put all ancestors of a node into a hashtable. By your description this could be O(n) items for a given insert. If you do n inserts this sounds bad (towards O(n^2)), but actually your tree cannot degrade that badly, so you will probably get an amortized overall hashtable size of O(n log n). (Actually, the log n part depends on the degradation degree of your tree. If you expect it to be maximally degraded, don't do it.)
So, you would pay about O(log n) on every insert, and get hashtable efficiency O(1) for a lookup.
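A minimal Python sketch of that trade-off (the dict names are assumptions): each insert records the new node's full ancestor set, and the later test is a single hash lookup.

ancestors = {}  # node -> set of all its ancestors

def insert(node, parent=None):
    # Pay at insert time: copy the parent's ancestor set and add the parent.
    if parent is None:
        ancestors[node] = set()
    else:
        ancestors[node] = {parent} | ancestors[parent]

def is_descendant(child, parent):
    return parent in ancestors[child]  # expected O(1)

# Building the example tree from the question:
insert('A'); insert('B', 'A'); insert('C', 'A')
insert('D', 'B'); insert('E', 'B'); insert('F', 'C')
assert is_descendant('F', 'A') and not is_descendant('F', 'B')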
For an M-way tree, instead of your bit array, why not just store the binary "trie id" (using M bits per level) with each node? For your example (assuming M == 2): A=0b01, B=0b0101, C=0b1001, ...
Then you can do the test in O(1):
bool IsParent(node* child, node* parent)
{
    // A node's id contains its ancestor's id as a bitwise prefix mask.
    return (child->id & parent->id) == parent->id;
}
You could compress the storage to ceil(lg2(M)) bits per level if you have a fast FindMSB() function which returns the position of the most significant bit set:
mask = (1 << (FindMSB(parent->id) + 1)) - 1;
return (child->id & mask) == parent->id;
In a pre-order traversal, every set of descendants is contiguous. For your example,
A B D E C F
+---------+  A
  +---+      B
    +        D
      +      E
        +-+  C
          +  F
If you can preprocess, then all you need to do is number each node and compute the descendant interval.
If you can't preprocess, then a link/cut tree offers O(log n) performance for both updates and queries.
You can answer queries of the form "Is node A a descendant of node B?" in constant time, using just two auxiliary arrays.
Preprocess the tree by visiting it in depth-first order, and for each node store the starting and ending times of its visit in two arrays, Start[] and End[].
So, let us say that End[u] and Start[u] are respectively the ending and starting time of the visit of node u.
Then node u is a descendant of node v if and only if:
Start[v] <= Start[u] and End[u] <= End[v].
and you are done; checking this condition requires just two lookups in the arrays Start and End.
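A minimal Python sketch of this preprocessing, assuming a dict-of-children tree representation:

def preprocess(tree, root):
    # DFS numbering: Start[u] and End[u] bracket the visit of u's whole subtree.
    start, end, clock = {}, {}, 0
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            end[node] = clock
        else:
            start[node] = clock
            stack.append((node, True))
            for child in reversed(tree.get(node, [])):
                stack.append((child, False))
        clock += 1
    return start, end

def is_descendant(u, v, start, end):
    # u is in v's subtree iff v's visit interval contains u's.
    return start[v] <= start[u] and end[u] <= end[v]

tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F']}
start, end = preprocess(tree, 'A')
assert is_descendant('F', 'A', start, end)
assert not is_descendant('F', 'B', start, end)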
Take a look at the nested set model. It's very efficient for selects but too slow for updates.
For what it's worth, what you're asking for here is equivalent to testing whether a class is a subtype of another class in a class hierarchy, and in implementations like CPython this is just done the good old-fashioned way: iterate over the ancestors looking for the parent.
