About deleting a node from a max-heap - data-structures

Why is it that in most tutorials and documents (98% of what I watch or read), when they talk about deleting a node from a heap, they only delete the root node (first image, where we delete the root node 12)? Why not one of its children (the second image, where we delete node 11, is an example scenario)?
I know it is not impossible to do, and I know how to do it, but why does everyone talk only about the root node?
I watched about 7 courses in Arabic, another 7 in English, and 4 in Turkish, and I also read a lot of articles about this in different languages!
But all of them delete the root node! Why?
Note: I am a beginner in computer science.
As I wrote in my question details, I have read and watched a lot of courses and articles about this topic.
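For what it's worth, deleting an arbitrary node is only a small generalization of deleting the root: you overwrite the target with the last element and then restore the heap property in whichever direction is violated. A minimal sketch, assuming an array-backed max-heap with zero-based indexing (the function names are mine, not from any particular course):

def delete_at(heap, i):
    """Delete the element at index i from an array-backed max-heap."""
    last = heap.pop()               # remove the last element
    if i == len(heap):              # the deleted slot was the last one
        return
    heap[i] = last                  # move the last element into the hole
    _sift_down(heap, i)             # restore the heap property below i...
    _sift_up(heap, i)               # ...or above i (only one of these moves anything)

def _sift_down(heap, i):
    n = len(heap)
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def _sift_up(heap, i):
    while i > 0 and heap[i] > heap[(i - 1) // 2]:
        heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
        i = (i - 1) // 2

Deleting the root is then just delete_at(heap, 0).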

Related

Design a data structure that supports build(S), insert(S,k), delete_max(S), delete_old(S,t) and add_to_new(S,d) in constant time

I'm doing an algorithm course and I'm stuck on a question.
I need to suggest a data structure that supports the following:
build(S): build the structure S with n values in O(n lg n)
insert(S,k): insert the value k into S in O(lg n)
delete_max(S): delete the max value from S in O(lg n)
delete_old(S,t): delete the t-th oldest value from S in O(lg n)
add_to_new(S,d): add the value d to the last value that entered S, in O(lg n)
We just learned Red-Black Trees so I think I probably need to use this structure and add something to it, or to add another structure that will help me with delete_old(S,t).
I'm struggling to understand how to update the "age" of each value after I use the "delete_old(S,t)".
Let's say I inserted into S: 11, 22, 44, 68, 79... Then 11 is the oldest, 22 is the second oldest, etc.
After I delete the 3rd oldest (value 44), 68 now needs to become the 3rd oldest, 79 the 4th oldest, etc. How would I update the ages of all the remaining values after the one I deleted, in O(lg n)?
I hope my question is clear. Any help will be much appreciated! Thanks :)
To support two orderings, you need to mix two ordered data structures.
For example, you can create a node for each value, and link it into two separate red-black trees, one ordered by value and one ordered by age. Of course the node object would need two sets of links and two isRed fields.
If you want to leverage the libraries that your language already provides, then you can make a node object that just has a value and age in it, and insert it into two separate trees.
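For concreteness, here is a minimal sketch of that idea in Python, using sortedcontainers.SortedList as a stand-in for the two balanced trees; the class and method names are invented for illustration. The "age" is an insertion stamp that never changes, so nothing needs to be renumbered after delete_old: the t-th oldest element is simply whatever sits at index t-1 in the age-ordered structure.

from sortedcontainers import SortedList  # stand-in for the two red-black trees

class DualOrderSet:
    def __init__(self):
        self._stamp = 0                     # monotonically increasing insertion "age"
        self.by_value = SortedList()        # holds (value, stamp) tuples
        self.by_age = SortedList()          # holds (stamp, value) tuples

    def insert(self, k):                    # O(log n)
        self.by_value.add((k, self._stamp))
        self.by_age.add((self._stamp, k))
        self._stamp += 1

    def delete_max(self):                   # O(log n)
        value, stamp = self.by_value.pop()  # pop() removes the largest value
        self.by_age.remove((stamp, value))
        return value

    def delete_old(self, t):                # O(log n): delete the t-th oldest (1-based)
        stamp, value = self.by_age[t - 1]   # no renumbering needed: stamps never change
        self.by_age.remove((stamp, value))
        self.by_value.remove((value, stamp))
        return value

    def add_to_new(self, d):                # O(log n): add d to the newest value
        stamp, value = self.by_age[-1]
        self.by_age.remove((stamp, value))
        self.by_value.remove((value, stamp))
        self.by_age.add((stamp, value + d))
        self.by_value.add((value + d, stamp))

build(S) is then just n inserts, i.e. O(n lg n); in a course setting you would of course replace SortedList with the red-black trees you learned.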

What are the first and the last elements in a stack data structure?

I've studied stacks as data structures and I know that they work according to the LIFO (Last-In-First-Out) rule. I've encountered a problem that asks me to retrieve the first elements of the stack, as well as the last ones. Since I would rather have my knowledge double-checked by more experienced users, I am curious to know which elements of the stack are acknowledged to be the first and the last ones.
As an example, let us push onto the stack, in this order: 1, 2, 3, 4, 5, 6. Which are the first two elements of the stack, and which are the last two?
It helps to think of a stack from top to bottom like a real-life stack of books. If you push book 1 on the stack, followed by books 2-6, your stack looks like:
6
5
4
3
2
1
So 1 and 2 are your first two elements (books), and 6, 5 are your last and second-to-last. Since book 6 was last, it's the first to come off the stack (pop in programming parlance), otherwise your stack of books will fall!
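If it helps to see the same thing in code, here is a tiny sketch using a Python list as the stack (append pushes onto the top, pop takes off the top):

stack = []
for book in [1, 2, 3, 4, 5, 6]:
    stack.append(book)              # push

print(stack[:2])    # [1, 2]  -> the first two elements pushed (bottom of the pile)
print(stack[-2:])   # [5, 6]  -> the last two pushed (top of the pile)
print(stack.pop())  # 6       -> last in, first out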
On this question, people might have different opinions. The best thing to do is understand the requirements of the problem at hand. If you are solving stack problems on online judges, you can get clarification from the problem statements and examples; if you are solving the problem in your professional work, you can get it from requirements analysis. In the end, if you can solve the problem you are supposed to, it doesn't matter whether you call the top element or the bottom element the first one.

Why is it important to delete files in order, to remove them faster?

Some time ago I learned that rsync deletes files much faster than many other tools.
A few days ago I came across this wonderful answer on Serverfault which explains why rsync is so good at deleting files.
Quotation from that answer:
I revisited this today, because most filesystems store their directory
structures in a btree format, the order of which you delete files is
also important. One needs to avoid rebalancing the btree when you
perform the unlink. As such I added a sort before deletes occur.
Could you explain how removing files in order prevents or reduces the number of btree rebalancings?
I expect the answer to show how deleting in order increases deletion speed, with details of what happens at the btree level. The people who wrote rsync and other programs (see links in the question) used this knowledge to create better programs. I think it's important for other programmers to have this understanding so they can write better software.
It is not important, nor is it a b-tree issue. It is just a coincidence.
First of all, this is very much implementation dependent and very much ext3 specific. That's why I said it's not important (for general use). Otherwise, put the ext3 tag or edit the summary line.
Second of all, ext3 does not use b-tree for the directory entry index. It uses Htree. The Htree is similar to b-tree but different and does not require balancing. Search "htree" in fs/ext3/dir.c.
Because of the htree-based index, a) ext3 has a faster lookup compared to ext2, but b) readdir() returns entries in hash-value order. The hash-value order is random relative to file creation time or the physical layout of the data. As we all know, random access is much slower than sequential access on rotating media.
A paper on ext3 published for OLS 2005 by Mingming Cao, et al. suggests (emphasis mine):
to sort the directory entries returned by readdir() by inode number.
Now, onto rsync. Rsync sorts files by file name. See flist.c::fsort(), flist.c::file_compare(), and flist.c::f_name_cmp().
I did not test the following hypothesis because I do not have the data sets from which #MIfe got 43 seconds, but I assume that sorted-by-name was much closer to the optimal order than the random order returned by readdir(). That is why you saw a much faster result with rsync on ext3. What if you generate 1,000,000 files with random file names and then delete them with rsync? Do you see the same result?
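To make that experiment easy to try, here is a rough sketch of the two deletion orders in Python. This is not rsync's code (the real sorting happens in flist.c); it only illustrates unlinking in readdir() order versus a sorted order, including the sort-by-inode-number variant the OLS paper suggests:

import os

def delete_readdir_order(directory):
    # Roughly what plain "remove everything" tools do: unlink in whatever
    # order the directory listing (readdir) yields.
    for name in os.listdir(directory):
        os.unlink(os.path.join(directory, name))

def delete_sorted(directory, by_inode=False):
    names = os.listdir(directory)
    if by_inode:
        # The order the OLS 2005 paper suggests: sort by inode number.
        names.sort(key=lambda n: os.stat(os.path.join(directory, n)).st_ino)
    else:
        # rsync sorts by file name (see flist.c); the answer above hypothesizes
        # this is much closer to optimal than readdir() order on ext3.
        names.sort()
    for name in names:
        os.unlink(os.path.join(directory, name))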
Let's assume that the answer you posted is correct, and that the given file system does indeed store things in a balanced tree. Balancing a tree is a very expensive operation. Keeping a tree "partially" balanced is pretty simple, in that when you allow a tree to be slightly imbalanced, you only worry about moving things around the point of insertion/deletion. However, with completely balanced trees, when you remove a given node you may find that suddenly the children of this node belong on the complete opposite side of the tree, or that a child node on the opposite side has become the root node and all of its children need to be rotated up the tree. This requires you to do either a long series of rotations, or to place all the items into an array and re-create the tree.
    5
  3   7
 2 4 6 8
now remove the 7, easy right?
    5
  3   8
 2 4 6
Now remove the 6, still easy, yes...?
    5
  3   8
 2 4
Now remove the 8, uh oh
    5
  3
 2 4
Getting this tree to be the proper balanced form like:
    4
  3   5
 2
is quite expensive, at least compared to the other removals we have done, and gets exponentially worse as the depth of our tree increases. We could make this go much (exponentially) faster by removing the 2 and the 4 before removing the 8, particularly if our tree were more than 3 levels deep.
Without sorting, removal is on average O(K * log_I(N)^2), where N is the total number of elements, K the number to be removed, I the number of children a given node is permitted, and log_I(N) the depth; for each level of depth we increase the number of operations quadratically.
Removal with some ordering help is on average O(K * log_I(N)), though sometimes ordering cannot help you and you are stuck removing something that will require a re-balance. Still, minimizing this is optimal.
EDIT:
Another possible tree ordering scheme:
    8
  6   7
 1 2 3 4
Accomplishing optimal removal under this ordering would be easier, because we can take advantage of our knowledge of how things are sorted. It is possible under either layout, and in fact both are identical; under this one the logic is just a little simpler to understand, because the ordering is more human-friendly for the given scenario. In either case, "in order" is defined as "remove the farthest leaf first". Here it just so happens that the farthest leaves are also the smallest numbers, a fact we could take advantage of to make it even a little more optimal, but this is not necessarily true for the file-system example presented (though it may be).
I am not convinced that the amount of B-tree rebalancing changes significantly if you delete the files in order. However, I do believe that the number of seeks to external storage will be significantly smaller if you do this. At any time, the only nodes in the B-tree that need to be visited will be the far right boundary of the tree, whereas with a random order, each leaf block in the B-tree is visited with equal probability for each file.
Rebalancing for B-Trees is cheaper than for B+Tree implementations; that's why most filesystem and database index implementations use them.
There are many approaches to deletion; depending on the approach, it can be more efficient in terms of time and the need to rebalance the tree. You'll also have to consider the node size, since the number of keys a node can store affects the need to rebalance. A large node size will often just reorder keys inside the node, but a small one will probably make the tree rebalance many times.
A great resource for understanding this is the famous CLR (Thomas Cormen) book "Introduction to Algorithms".
On storage systems hosting huge directories, the buffer cache will be under stress and buffers may get recycled. So, if deletes are spaced apart in time, the number of disk reads needed to bring the btree back into the buffer cache between deletes may be high.
If you sort the files to be deleted, you are effectively delaying the deletes and bunching them. This may have the side effect of more deletes per block of btree paged in. If there are stats showing the buffer cache hits for the two experiments, they may tell whether this hypothesis is wrong or not.
But if there is no stress on the buffer cache during the deletes, then the btree blocks could stay in core, and my hypothesis is not a valid one.

How to implement minimax in Tictactoe

I read this answer, and it just confused me: TicTacToe AI Making Incorrect Decisions
Could somebody help me understand how I could apply this to Tictactoe?
How would I "work my way up through the tree?
How do I even create a tree of moves?
Note: I currently have a Board class which stores state about the game (e.g. Is the game complete with the current moves?, Is there a winner?, etc.) Each move on the current board is stored as 1 - 9 (top left to bottom right in rows). I can make copies of the current board state with ease. I can return a list of current moves for "X" and "O", as well as available moves from a Board.
Solving Tic-Tac-Toe: Game Tree Basics, by Mark C. Chu-Carroll (July 30, 2008)
The picture pretty much says it all, but here is a link to the blog post:
http://scienceblogs.com/goodmath/2008/07/30/solving-tictactoe-game-tree-ba/
I can answer your question "2", and hopefully this should help you figure out question "1":
Every node in the tree represents the current state of the game after some number of moves. So the root of the tree represents the game at the start (i.e. no pieces played so far). It has nine children (one for each possible first move). Each child in turn has 8 children (one for each possible second move). And so on, until you reach points where the game has been won or drawn. These are the leaf nodes.
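Building on that description, here is a minimal minimax sketch. It assumes a hypothetical Board interface with is_complete(), winner(), available_moves() and copy_with(move, player); those names are invented for illustration and should be adapted to the Board class described in the question.

def minimax(board, player, maximizing_player):
    # Leaf node: the game is over, so score it from the maximizing player's view.
    if board.is_complete():
        winner = board.winner()
        if winner == maximizing_player:
            return +1
        if winner is None:                   # draw
            return 0
        return -1

    opponent = "O" if player == "X" else "X"
    scores = []
    for move in board.available_moves():
        child = board.copy_with(move, player)          # one child per legal move
        scores.append(minimax(child, opponent, maximizing_player))

    # "Working your way up the tree": the player to move picks the best child score.
    return max(scores) if player == maximizing_player else min(scores)

def best_move(board, player):
    opponent = "O" if player == "X" else "X"
    return max(board.available_moves(),
               key=lambda m: minimax(board.copy_with(m, player), opponent, player))

best_move(board, "X") then scores the leaves (+1 win, 0 draw, -1 loss), and each interior node takes the max or min of its children's scores depending on whose turn it is.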

Strategies for quickly traversing an ACL

We are currently working on a project where the main domain objects are content nodes, and we are using an ACL-like system where each node in the hierarchy can contain rules that override or complement those on its parents. Everything is also based on roles and actions. For example:
Node 1 - {Deny All, Allow Role1 View}
 \- Node 2 - {Allow Role2 View}
     \- Node 3 - {Deny Role1 View}
In that case, rules are read in order from top to bottom, so Node 3 can be viewed only by Role2. It's not really complicated in concept.
Retrieving the rules for a single node can take several queries: getting all the parents, then recreating the list of rules and evaluating them. This process can be cumbersome because the hierarchy can become quite deep and there may be a lot of rules on each node.
I have been thinking of preparing a table with precalculated rules for each node, which would be recreated whenever a permission is changed and propagated to all the leaf nodes below the updated one.
Can you think of any other strategy to speed up the retrieval and calculation of the rules? Ideally it would be done in a single query, but trees are not the best structures for that.
I would think that an Observer Pattern might be adapted.
The idea would be that each Node maintains a precomputed list and is simply notified by its parent of any update so that it can recompute this list.
This can be done in 2 different ways:
1. notify that a change occurred, but don't recompute anything
2. recompute at each update
I would advise going with 1 if possible, since it does not involve recomputing the whole world when the root is updated, and only recomputes when needed (lazy evaluation, in fact); but you might prefer the second option if you update rarely but need blazing-fast retrieval (there are more concurrency issues, though).
Let's illustrate Solution 1:
Root ___ Node1 ___ Node1A
   \          \___ Node1B
    \___ Node2 ___ Node2A
              \___ Node2B
Now, to begin with, none of them has precomputed anything (they are all in a dirty state). If I ask for Node2A's rules:
Node2A realizes it is dirty: it queries Node2 rules
Node2 realizes it is dirty: it queries Root
Root does not have any parent, so it cannot be dirty, it sends its rules to Node2
Node2 caches the answer from Root, merges its rules with those received from Root and cleans the dirty bit, it sends the result of the merge (cached now) to Node2A
Node2A caches, merges, cleans the dirty bit and returns the result
If I subsequently ask for Node2B's rules:
Node2B is dirty, it queries Node2
Node2 is clean, it replies
Node2B caches, merges, cleans the dirty bit and returns the result
Note that Node2 did not recompute anything.
In the update case:
I update Node1: I use the cached Root rules to recompute Node1's new rules and send a notification to Node1A and Node1B to tell them their cache is outdated
Node1A and Node1B set their dirty bit; they would also have propagated this notification if they had children
Note that because I cached the Root rules I don't have to query the Root object. If it's a simple enough operation, you might prefer not to cache them at all: if you're not doing anything distributed here and querying Root only involves a memory round-trip, you might prefer not to duplicate it, to save some memory and bookkeeping.
Hope it gets you going.
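To make solution 1 concrete, here is a minimal sketch of the dirty-bit / lazy-recomputation idea. Node and merge_rules are invented names; merge_rules stands in for whatever "parent rules first, child rules override or complement" logic the project actually applies.

class Node:
    def __init__(self, own_rules, parent=None):
        self.own_rules = own_rules          # rules declared on this node only
        self.parent = parent
        self.children = []
        self._effective = None              # cached merge of ancestors' rules + own
        self._dirty = True
        if parent is not None:
            parent.children.append(self)

    def effective_rules(self):
        # Lazy evaluation: only recompute when queried and the cache is stale.
        if self._dirty:
            inherited = self.parent.effective_rules() if self.parent else []
            self._effective = merge_rules(inherited, self.own_rules)
            self._dirty = False
        return self._effective

    def update_rules(self, new_rules):
        self.own_rules = new_rules
        self._mark_dirty()

    def _mark_dirty(self):
        # Observer-style notification: invalidate, but don't recompute anything yet.
        self._dirty = True
        for child in self.children:
            child._mark_dirty()

def merge_rules(inherited, own):
    # Placeholder: parent rules first, then this node's rules (top-to-bottom order).
    return inherited + own

This version is fully lazy: effective_rules() only recomputes along the path that is actually queried, and update_rules() only flips dirty bits. The walk-through above recomputes the updated node eagerly from the cached Root rules, which is an easy variation on the same idea.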
Your version of pre-computation appears to store all the permissions relevant to each role at each node. You can save a little time and space by traversing the tree, numbering the nodes as you reach them, and producing, for each role, an array of the node numbers and permission changes just for the nodes at which the permissions relevant to that role change. This produces output only linear in the size of the input tree (including its annotations). Then when you come to check a permission for a role at a node, use the number of that node to search the array to find the point in the array that represents the most recent change of permission when you visited that node during the tour.
This may be associated in some way with http://en.wikipedia.org/wiki/Range_Minimum_Query and http://en.wikipedia.org/wiki/Lowest_common_ancestor, but I don't really know if those references will help or not.
