Drawing consisting of two linked lists with pointers and values - algorithm

I just started taking an algorithms course; however, due to some family issues, I did not have a chance to attend the first two lectures. Now I'm in a bit of a pickle, because I don't understand much of what is going on.
Above is a picture of a task that I need to solve. From what I understand, L0 is a list containing all values of S, and L1 is a list containing all values of S and a pointer to the corresponding value in L0. However, what I do not understand is where delta and the drawings come in. If anyone could clarify the meaning of delta and the parameter delta = 3, I might have a chance of solving it.
Any help is appreciated.

Here, 'delta' is just a parameter. You can call it 'd' if you prefer.
L0 contains all elements of S (as a linked list).
L1 contains every 'delta'th element of S as a linked list, with a pointer to the corresponding value in L0.
So the answer to 2.1 is something like:
L0: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8
    ^              ^              ^
    |              |              |
L1: 1 -----------> 4 -----------> 7
That is, L1 contains the 0th, 3rd, and 6th elements of S (i = 0, delta, 2*delta, where delta = 3).
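To make this concrete, here is a small Python sketch of the structure (the class and function names are mine, not from the assignment), assuming S = [1..8] and delta = 3:

```python
class Node:
    def __init__(self, value, down=None):
        self.value = value   # element of S
        self.next = None     # next node on the same level
        self.down = down     # pointer from an L1 node to its L0 node

def build_levels(S, delta):
    # L0: every element of S as a linked list
    l0_nodes = [Node(v) for v in S]
    for a, b in zip(l0_nodes, l0_nodes[1:]):
        a.next = b
    # L1: every delta-th element, each pointing down into L0
    l1_nodes = [Node(n.value, down=n) for n in l0_nodes[::delta]]
    for a, b in zip(l1_nodes, l1_nodes[1:]):
        a.next = b
    return l0_nodes[0], l1_nodes[0]

l0, l1 = build_levels([1, 2, 3, 4, 5, 6, 7, 8], delta=3)
vals = []
n = l1
while n:
    vals.append(n.value)
    n = n.next
print(vals)  # [1, 4, 7]
```

Walking L1 visits exactly the 0th, 3rd and 6th elements, and each L1 node's down pointer leads to the matching L0 node.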


Relational Algebra Division

I'm currently dealing with a relational algebra division issue. I have the following two relations:
             A | B | C                 B
             --|---|--                ---
             1 | 2 | 3                 2
Relation R = 1 | 2 | 6   Relation T =
             4 | 2 | 2
             4 | 5 | 6
Now I'm doing the following operation: R ÷ T
When I calculate this, my result is as follows:
        A | C
        --|--
        1 | 3
R ÷ T = 1 | 6
        4 | 2
My reasoning: for the division I look at those tuples in R that appear in combination with all tuples in T. But when I use a relational algebra calculator such as RelaX, it returns
        A | C
        --|--
R ÷ T = 4 | 2
Where did I make a mistake? Thanks in advance for any help.
Is there anybody who can help?
Performing division on these schemas is not a good way to fully understand how the operator works. Definitions of this operator are not very clear, and in practice the operation is usually replaced by a combination of other operators.
A clear way to see how this works in your case would be to create a new instance of R with its columns reordered under a new schema (A, C, B), that is, making the attribute of T appear as the last attribute of R. In this really simple division it's straightforward to see the result, but imagine you had schema R(A,D,B,C) and T(B,D). Here the attributes of T appear in a different order in R, and even instances with few tuples would already be difficult to check just by looking at them.
This might be the most difficult operator defined in relational algebra, as a query usually involves concepts from selection, projection and join. It is also hard to explain in words alone.
A good way of thinking about this operator is to think about GROUP BY in SQL. In your example this means using GROUP BY on attributes A, C, which creates a group for every combination of values of these attributes that appears in the instance. Each of these groups has the set of all values of B associated with that combination of values of A, C. If you think of the values of attribute B in the instance of T as a set, you can quickly verify: for each group obtained by grouping by A, C, if the set of values of B in T is included in the group's set of values of B from R, then that combination of A, C is a tuple of the resulting relation.
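If it helps, here is a small Python sketch of that GROUP BY reading of division (the relation layout and the function name divide are mine; rows are dicts keyed by attribute name). Note that dividing by the attribute name B reproduces the hand-computed result from the question:

```python
R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 2, "C": 6},
    {"A": 4, "B": 2, "C": 2},
    {"A": 4, "B": 5, "C": 6},
]
T = [{"B": 2}]

def divide(r, t):
    t_attrs = sorted(t[0])                                # attributes of T (here: B)
    keep = [a for a in sorted(r[0]) if a not in t_attrs]  # result attributes: A, C
    t_vals = {tuple(row[a] for a in t_attrs) for row in t}
    # group rows of R by the kept attributes, collecting their B-values
    groups = {}
    for row in r:
        key = tuple(row[a] for a in keep)
        groups.setdefault(key, set()).add(tuple(row[a] for a in t_attrs))
    # keep a group iff its set of B-values contains every tuple of T
    return {key for key, vals in groups.items() if t_vals <= vals}

print(sorted(divide(R, T)))  # [(1, 3), (1, 6), (4, 2)]
```

Matching by attribute name rather than by column position is exactly the point of reordering R's schema to (A, C, B) before dividing.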
I know I'm a bit late to this question, but I've seen many people confused about this. If it's not clear enough, I've left a reference to a document I wrote explaining it in much more detail and with a really good example, HERE.

Why is postfix (RPN) notation more used than prefix?

By use I mean its use in many calculators, like the HP-35.
My guesses (and confusions) are:
postfix is actually more memory efficient (see the SO post comments here). (Confusion: the evaluation algorithms for both are similar, using a stack.)
the keyboard input style of calculators back then. (Confusion: this shouldn't have mattered much, as it only depends on whether the operators are given first or last.)
Another way to ask this question is: what advantages does postfix notation have over prefix?
Can anyone enlighten me?
For one it is easier to implement evaluation.
With prefix, if you push an operator, then its operands, you need to have forward knowledge of when the operator has all its operands. Basically you need to keep track of when operators you've pushed have all their operands so that you can unwind the stack and evaluate.
Since a complex expression will likely end up with many operators on the stack you need to have a data structure that can handle this.
For instance, this expression: - + 10 20 + 30 40 will have one - and one + on the stack at the same time, and for each you need to know if you have the operands available.
With postfix, when you reach an operator, its operands are (or should be) already on the stack: simply pop the operands and evaluate. You only need a stack that can handle operands; no other data structure is necessary.
Prefix notation is probably used more commonly ... in mathematics, in expressions like F(x,y). It's a very old convention, but like many old systems (feet and inches, letter paper) it has drawbacks compared to what we could do if we used a more thoughtfully designed system.
Just about every first-year university math textbook has to waste at least a page explaining that f(g(x)) means we apply g first, then f. Doing it in reading order makes so much more sense: x.f.g means we apply f first. Then if we want to apply h "after", we just say x.f.g.h.
As an example, consider an issue in 3d rotations that I recently had to deal with. We want to rotate a vector according to XYZ convention. In postfix, the operation is vec.rotx(phi).roty(theta).rotz(psi). With prefix, we have to overload * or () and then reverse the order of the operations, e.g., rotz*roty*rotx*vec. That is error prone and irritating to have to think about that all the time when you want to be thinking about bigger issues.
For example, I saw something like rotx*roty*rotz*vec in someone else's code and I didn't know whether it was a mistake or an unusual ZYX rotation convention. I still don't know. The program worked, so it was internally self-consistent, but in this case prefix notation made it hard to maintain.
Another issue with prefix notation is that when we (or a computer) parses the expression f(g(h(x))) we have to hold f in our memory (or on the stack), then g, then h, then ok we can apply h to x, then we can apply g to the result, then f to the result. Too much stuff in memory compared to x.f.g.h. At some point (for humans much sooner than computers) we will run out of memory. Failure in that way is not common, but why even open the door to that when x.f.g.h requires no short term memory. It's like the difference between recursion and looping.
And another thing: f(g(h(x))) has so many parentheses that it's starting to look like Lisp. Postfix notation is unambiguous when it comes to operator precedence.
Some mathematicians (in particular Nathan Jacobson) have tried changing the convention, because postfix is so much easier to work with in noncommutative algebra where order really matters, to little avail. But since we have a chance to do things over, better, in computing, we should take the opportunity.
Basically, because if you write the expression in postfix, you can evaluate that expression using just a Stack:
1. Read the next element of the expression.
2. If it is an operand, push it onto the stack.
3. Otherwise, pop the operands required by the operation from the stack, and push the result onto the stack.
4. If not at the end of the expression, go to 1.
Example
expression = 1 2 + 3 4 + *
stack = [ ]
Read 1, 1 is Operand, Push 1
[ 1 ]
Read 2, 2 is Operand, Push 2
[ 1 2 ]
Read +, + is Operation, Pop two Operands 1 2
Evaluate 1 + 2 = 3, Push 3
[ 3 ]
Read 3, 3 is Operand, Push 3
[ 3 3 ]
Read 4, 4 is Operand, Push 4
[ 3 3 4 ]
Read +, + is Operation, Pop two Operands 3 4
Evaluate 3 + 4 = 7, Push 7
[ 3 7 ]
Read *, * is Operation, Pop two Operands 3 7
Evaluate 3 * 7 = 21, Push 21
[ 21 ]
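The trace above fits in a few lines of Python (the function name eval_postfix and the operator table are mine):

```python
def eval_postfix(tokens):
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b}
    stack = []
    for tok in tokens.split():
        if tok in ops:
            b = stack.pop()   # right operand was pushed last
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
    return stack.pop()

print(eval_postfix("1 2 + 3 4 + *"))  # 21
```

Note the pop order: for non-commutative operators like - the second pop is the left operand.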
If you like your human reading order to match the machine's stack-based evaluation order then postfix is a good choice.
That is, assuming you read left-to-right, which not everyone does (e.g. Hebrew, Arabic, ...). And assuming your machine evaluates with a stack, which not all do (e.g. term rewriting - see Joy).
On the other hand, there's nothing wrong with the human preferring prefix while the machine evaluates "back to front/bottom-to-top". Serialization could be reversed too if the concern is evaluation as tokens arrive. Tool assistance may work better in prefix notation (knowing functions/words first may help scope valid arguments), but you could always type right-to-left.
It's merely a convention I believe...
Offline evaluation of both notations is the same on a theoretical machine.
(Eager evaluation strategy) Evaluating with only one stack (without putting operators on the stack)
It can be done by evaluating Prefix-notation right-to-left.
- 7 + 2 3
# evaluate + 2 3
- 7 5
# evaluate - 7 5
2
It is the same as evaluating Postfix-notation left-to-right.
7 2 3 + -
# put 7 on stack
7 2 3 + -
# evaluate 2 3 +
7 5 -
# evaluate 7 5 -
2
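This equivalence can be sketched in Python (the function name eval_prefix_rtl is mine): scanning the prefix string right-to-left with a single operand stack mirrors the left-to-right postfix scan.

```python
def eval_prefix_rtl(tokens):
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b}
    stack = []
    for tok in reversed(tokens.split()):
        if tok in ops:
            a = stack.pop()   # scanning backwards, the LEFT operand was pushed last
            b = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
    return stack.pop()

print(eval_prefix_rtl("- 7 + 2 3"))  # 2
```

The only difference from the postfix evaluator is the direction of the scan and, consequently, the pop order of the operands.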
(Optimized short-circuit strategy) Evaluating with two stacks (one for operators and one for operands)
It can be done by evaluating Prefix-notation left-to-right.
|| 1 < 2 3
# put || in instruction stack, 1 in operand stack or keep the pair in stack
instruction-stack: or
operand-stack: 1
< 2 3
# push < 2 3 in stack
instruction-stack: or, less_than
operand-stack: 1, 2, 3
# evaluate < 2 3 as 1
instruction-stack: or
operand-stack: 1, 1
# evaluate || 1 1 as 1
operand-stack: 1
Notice that we can do short-circuit optimization for the boolean expression here easily(compared to previous evaluation sequence).
|| 1 < 2 3
# put || in instruction stack, 1 in operand stack or keep the pair in stack
instruction-stack: or
operand-stack: 1
< 2 3
# Is it possible to evaluate `|| 1` without evaluating the rest ? Yes !!
# skip < 2 3 and put place-holder 0
instruction-stack: or
operand-stack: 1 0
# evaluate || 1 0 as 1
operand-stack: 1
It is the same as evaluating Postfix-notation right-to-left.
(Optimized short-circuit strategy) Evaluating with one stack that takes a tuple (same as above)
It can be done by evaluating Prefix-notation left-to-right.
|| 1 < 2 3
# put || 1 in tuple-stack
stack tuple[or,1,unknown]
< 2 3
# We do not need to compute < 2 3
stack tuple[or,1,unknown]
# evaluate || 1 unknown as 1
1
It is the same as evaluating Postfix-notation right-to-left.
Online evaluation in a calculator while a human enters data left-to-right
When entering numbers into a calculator, the Postfix-notation 2 3 + can be evaluated instantly, without any knowledge of which symbol the human is going to enter next. This is the opposite of Prefix notation: when we have - 7 +, there is nothing to do until we get something like - 7 + 2 3.
Online evaluation in a calculator while a human enters data right-to-left
Now Prefix-notation can evaluate + 2 3 instantly, while Postfix-notation must wait for further input when it has 3 + -.
Please refer to @AshleyF's note that Arabic is written right-to-left, in contrast to English, which is written left-to-right!
I guess little-endian and big-endian are somewhat related to this prefix/postfix notation.
One final comment: Reverse Polish notation is strongly supported by Dijkstra (he was a strong opponent of short-circuit evaluation and is sometimes credited with independently inventing Reverse Polish notation for expression evaluation). It is your choice whether to support his opinion or not (I do not).

Print Any Older Version of the Queue [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I came across this question and thought of asking what is the best possible way to achieve this.
Given a FIFO queue (you can assume that the queue contains integers). Every time an insertion or deletion happens, a new version of the queue is created. At any time, you have to print (the whole content of) any older version of the queue with minimal time and space complexity.
This is assuming you meant a FIFO queue and not some other kind of queue, like a priority queue.
Store it in an array and keep two pointer variables, one to its head and another to its tail.
For example:
insert 3 2 5 9 - version 1
q = [3 2 5 9]
     ^     ^
delete, delete - version 2
q = [3 2 5 9]
         ^ ^
insert 6 3 4 - version 3
q = [3 2 5 9 6 3 4]
         ^       ^
To print a version, you just need to store two values for each version: where the pointers were. Printing is then linear in the size of the queue at that version.
The vector can grow big, but you have to store every element there ever was in it if you want to be able to print any version.
You can also use a linked list to avoid array resizing if you consider that a problem. Just make sure not to remove a node from memory when deleting.
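A minimal Python sketch of this array-plus-pointers idea (the class and method names are mine): elements are appended once and never physically removed, and each version stores only its (head, tail) pair.

```python
class VersionedQueue:
    def __init__(self):
        self.items = []            # every element ever inserted
        self.head = 0              # index of the current front
        self.versions = [(0, 0)]   # (head, tail) per version; version 0 is empty

    def insert(self, *values):
        self.items.extend(values)
        self.versions.append((self.head, len(self.items)))

    def delete(self, n=1):
        self.head += n             # logical removal: just advance the head
        self.versions.append((self.head, len(self.items)))

    def print_version(self, v):
        h, t = self.versions[v]
        return self.items[h:t]     # linear in the size of that version

q = VersionedQueue()
q.insert(3, 2, 5, 9)        # version 1
q.delete(2)                 # version 2
q.insert(6, 3, 4)           # version 3
print(q.print_version(1))   # [3, 2, 5, 9]
print(q.print_version(3))   # [5, 9, 6, 3, 4]
```

Each update costs O(1) extra space for the version record; printing version v is O(size of that version).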
Your problem is to make the queue a partially persistent data structure.
Partially persistent means that you can query any version, but you only can make updates in the most recent version.
A couple of years ago I gave a talk about making any pointer-based data structure persistent. It was based on "Making Data Structures Persistent" by Driscoll, Sarnak, Sleator and Tarjan.
Clearly, any queue can be implemented as a linked data structure. If you want the simplest practical version, you may be interested in the method called "The Fat Node Method", which is described on page 91 of the above PDF.
The idea is to store in every node several pointers to the next elements, corresponding to different versions of the queue. Each pointer is assigned a version number called a timestamp.
For every insert or delete operation, you update pointers only in nodes touched by the update operation.
For a lookup in the i-th version of the queue, you simply follow the pointers with the largest timestamp not exceeding i. You can find the pointer to follow using binary search.
In the given PDF there is also a more complex, but even more efficient, method called "The Node-Copying Method".
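A rough sketch of the fat-node idea for the next pointers only (the names FatNode and next_at are mine, not from the paper): each node keeps timestamped pointers, and a lookup in version i binary-searches for the last pointer written at or before i.

```python
import bisect

class FatNode:
    def __init__(self, value):
        self.value = value
        self.stamps = []   # increasing timestamps of next-pointer updates
        self.nexts = []    # nexts[i] is the successor written at stamps[i]

    def set_next(self, timestamp, node):
        # an update only touches the nodes involved in the operation
        self.stamps.append(timestamp)
        self.nexts.append(node)

    def next_at(self, version):
        # follow the pointer with the largest timestamp <= version
        i = bisect.bisect_right(self.stamps, version) - 1
        return self.nexts[i] if i >= 0 else None

a, b, c = FatNode(1), FatNode(2), FatNode(3)
a.set_next(1, b)           # version 1: 1 -> 2
a.set_next(2, c)           # version 2: 1 -> 3
print(a.next_at(1).value)  # 2
print(a.next_at(2).value)  # 3
```

With at most a constant number of pointer updates per queue operation, the binary search makes each hop of a version-i lookup logarithmic in the number of updates to that node.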
There are many possible solutions. Here is one with all operations guaranteed O(log(n)) and normal operations amortized O(log(log(n))).
Keep an operation counter. Store the items in a skip list (see http://en.wikipedia.org/wiki/Skip_list for a definition of that) based on the order of the insertion operation. When an element is removed, fill in the id of the removal operation. For efficiency of access, keep a pair of pointers to the current head and current tail.
To insert an element, add it at the current tail. To remove an element, take it from the current head. To return a past state, search the skip list for the head at that version, then walk the list until you reach the tail at that version.
The log(n) operations here are finding the past head, and (very occasionally) inserting a new head that happens to be a high node in the skip list.
Now let us assume that in a FIFO queue the head pointer is at the beginning of the array, and the tail pointer is at the end of the current insertion. Store the current tail position in a variable; during a future insertion that stored value serves as the boundary of the previous version, while the tail again moves to the end of the new insertion. This way, with a single variable and only insertions taking place, previous versions can be printed from the beginning of the array up to the stored tail pointer.
insert 3 2 1 4
a = [3 2 1 4] --version 1
     ^     ^
     b     e
p = e;
insert 5 7 8
a = [3 2 1 4 5 7 8] --version 2
     ^           ^
     b           e
here version 1 = position 0 to position p = [3 2 1 4]
p = e;
delete 3
a = [2 1 4 5 7 8] --version 3
     ^         ^
     b         e
here version 2 = position 0 to position p = [2 1 4 5 7 8]
where
b = beginning
e = end
Hence, by using a single variable to hold the previous version's tail position and assuming the beginning position is always 0, previous versions can easily be printed.

What is an "external node" of a "magic" 3-gon ring?

I want to solve Project Euler's problem #68 in C#, but I've so far not understood the question clearly. What does external node mean in this problem statement?
Consider the following "magic" 3-gon ring, filled with the numbers 1
to 6, and each line adding to nine.
  4
   \
    3
   / \
  1 - 2 - 6
 /
5
Working clockwise, and starting from the group of three with the
numerically lowest external node (4,3,2 in this example), each
solution can be described uniquely. For example, the above solution
can be described by the set: 4,3,2; 6,2,1; 5,1,3.
'External node' is a node not included in the inner triangle (pentagon). On the first picture, 4, 5 and 6 are external nodes.
Regarding 'helping to understand the question', what other parts confuse you?
edit
In the first sentence it says 'each line adding to nine'; 9 here is the total. You can calculate the 'total' of each solution by summing up the numbers in any of its 3 lines.
@Kristo Aun: Think of '4,3,2; 6,2,1; 5,1,3' > '4,2,3; 5,3,1; 6,1,2' as numbers, which means 432621513 > 423531612. Each group of digits comes from a single line, and you work through the lines clockwise, starting from the group with the lowest external node.
In the task, they say "By concatenating each group it is possible to form 9-digit strings; the maximum string for a 3-gon ring is 432621513."
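The 'maximum' is just ordinary comparison of the concatenated 9-digit strings (equivalent to integer comparison, since both strings have 9 digits), as this quick check shows:

```python
# strip the separators from the two solution descriptions in the question
s1 = "4,3,2; 6,2,1; 5,1,3".replace(",", "").replace(";", "").replace(" ", "")
s2 = "4,2,3; 5,3,1; 6,1,2".replace(",", "").replace(";", "").replace(" ", "")
print(s1, s2)                      # 432621513 423531612
print(s1 > s2, int(s1) > int(s2))  # True True
```

So it is not a set comparison at all: the groups are ordered (clockwise, starting from the lowest external node), concatenated, and compared as numbers.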
What is meant by 'maximum'? How come '4,3,2; 6,2,1; 5,1,3' > '4,2,3; 5,3,1; 6,1,2'? It certainly doesn't make sense in terms of set theory...

First-Occurrence Parallel String Matching Algorithm

To be up front, this is homework. That being said, it's extremely open ended and we've had almost zero guidance as to how to even begin thinking about this problem (or parallel algorithms in general). I'd like pointers in the right direction and not a full solution. Any reading that could help would be excellent as well.
I'm working on an efficient way to match the first occurrence of a pattern in a large amount of text using a parallel algorithm. The pattern is simple character matching, no regex involved. I've managed to come up with a possible way of finding all of the matches, but that then requires that I look through all of the matches and find the first one.
So the question is, will I have more success breaking the text up between processes and scanning that way? Or would it be best to have process-synchronized searching of some sort where the j'th process searches for the j'th character of the pattern? If then all processes return true for their match, the processes would change their position in matching said pattern and move up again, continuing until all characters have been matched and then returning the index of the first match.
What I have so far is extremely basic, and more than likely does not work. I won't be implementing this, but any pointers would be appreciated.
With p processors, a text of length t, a pattern of length L, and a ceiling of L processors used:
for i = 0 to t-L:
    for j = 0 to p:
        processor j compares text[i+j] to pattern[j]
    on a false match:
        all processors terminate the current comparison, i++
    on a true match by all processors:
        iterate p characters at a time until L characters have been compared
        if all L comparisons return true:
            return i (position of the pattern)
        else:
            i++
I am afraid that having the j'th process match the j'th character of the pattern will not do.
Generally speaking, early escaping is difficult, so you'd be better off breaking the text into chunks.
But let's ask Herb Sutter to explain searching with parallel algorithms first, on Dr. Dobb's. The idea is to use the non-uniformity of the distribution to allow an early return. Of course Sutter is interested in any match, which is not the problem at hand, so let's adapt.
Here is my idea, let's say we have:
Text of length N
p Processors
heuristic: max is the maximum number of characters a chunk should contain, probably an order of magnitude greater than M, the length of the pattern.
Now, what you want is to split your text into k equal chunks, where k is minimal and size(chunk) is maximal yet less than max.
Then we have a classical Producer-Consumer pattern: the p processes are fed the chunks of text, each process looking for the pattern in the chunk it receives.
The early escape is done by having a flag. You can either set the index of the chunk in which you found the pattern (and its position), or just set a boolean and store the result in the processes themselves (in which case you'll have to go through all the processes once they have stopped). The point is that each time a chunk is requested, the producer checks the flag and stops feeding the processes if a match has been found (since the processes have been given the chunks in order).
Let's have an example, with 3 processors:
[ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ]
                      x       x
The chunks 6 and 8 both contain the string.
The producer will first feed 1, 2 and 3 to the processes, then each process will advance at its own rhythm (it depends on how similar the searched text is to the pattern).
Let's say we find the pattern in 8 before we find it in 6. Then the process that was working on 7 finishes and tries to get another chunk, and the producer stops it: another chunk would be irrelevant. Then the process working on 6 finishes with a result, and thus we know that the first occurrence was in 6, and we have its position.
The key idea is that you don't want to look at the whole text! It's wasteful!
Given a pattern of length L, and searching in a string of length N over P processors, I would just split the string over the processors. Each processor would take a chunk of length N/P + L-1, with the last L-1 characters overlapping the string belonging to the next processor. Then each processor would perform Boyer-Moore (the two pre-processing tables would be shared). When each finishes, it returns its result to the first processor, which maintains a table:
Process  Index
1        -1
2        2
3        23
After all processes have responded (or, with a bit of thought, you can arrange an early escape), you return the first match. This should be on average O(N/(L*P) + P).
The approach of having the i'th processor match the i'th character would require too much inter-process communication overhead.
EDIT: I realize you already have a solution and are figuring out a way to avoid finding all matches. Well, I don't really think that approach is necessary. You can come up with some early-escape conditions, and they aren't that difficult, but I don't think they'll improve your performance much in general (unless you have some additional knowledge about the distribution of matches in your text).
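A rough sketch of this chunked scheme (all names are mine, not from any standard solution), using a thread pool in place of a full producer/consumer setup and str.find in place of Boyer-Moore; in CPython real parallel speedup would need processes because of the GIL, but the chunking, overlap, and take-the-minimum logic are the same:

```python
from concurrent.futures import ThreadPoolExecutor

def find_in_chunk(text, pattern, start, end):
    # overlap by len(pattern)-1 so a match straddling a boundary is not missed
    return text.find(pattern, start, min(end + len(pattern) - 1, len(text)))

def first_match(text, pattern, workers=4):
    n = len(text)
    chunk = max(1, -(-n // workers))   # ceiling division
    starts = range(0, n, chunk)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = ex.map(lambda s: find_in_chunk(text, pattern, s, s + chunk),
                         starts)
    hits = [i for i in results if i != -1]
    # the earliest hit over all chunks is the first occurrence overall
    return min(hits) if hits else -1

print(first_match("xxabxxxxabyx" * 3, "aby"))  # 8
```

The early escape described above would replace the unconditional map with workers that poll a shared flag and a producer that stops handing out chunks past the earliest confirmed match.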
