Visualizing and Solving recursive questions without a computer - algorithm

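Say I have something like this (predict the output):
#include <stdio.h>

void abc(char *s) {
    if (s[0] == '\0')      /* base case: empty string, nothing printed */
        return;
    abc(s + 1);            /* first call on the rest of the string */
    abc(s + 1);            /* second, identical call */
    printf("%c ", s[0]);   /* first character printed after both calls */
}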
It's not tough to solve, but it takes me too much time, and I have to redo such questions 2-3 times because I lose track of the recursion and the values of the variables (especially when there are 2-3 such recursive statements).
Is there any good method to use when one has to solve such questions?

The basic technique is to first start with a small input. Then try with one larger. Then try with one larger than that. For recursive functions, a pattern should emerge that lets you predict what the next one will look like given you know what the previous one looked like.
So, let's start with an empty string. Easy, nothing is printed.
input: ""
output:
Next is a string of length one. Almost as easy: the two recursive calls each do nothing (the empty string case), and then the string's character is printed. (The outputs below omit the space that printf prints after each character.)
input: "z"
output: z
Next is a string of length two. Each of the recursive calls ends up printing the second character (the string of length one case), and then the first character is printed.
input: "yz"
output: zzy
So, let's try to predict what will happen for the string of length three case. What will happen is that the substring that excludes the first character gets worked on twice, then the first character is printed. That substring is the string of length two case. So:
input: "xyz"
output: zzyzzyx
So, it should be clear now how to derive the next output sequence given the current output sequence.
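To check a prediction like the one above, here is a small C sketch; the names abc_collect and abc_predict are made up for this illustration. abc_collect repeats the recursion of abc() but appends to a buffer instead of printing, while abc_predict applies the pattern directly: working from the last character to the first, the new output is the old output twice, followed by that character.

```c
#include <string.h>

/* Same recursion as abc(), but the characters are appended to out (without
   the spaces) instead of printed. out must start as "" and hold at least
   2^strlen(s) bytes. */
void abc_collect(const char *s, char *out) {
    if (s[0] == '\0')
        return;                           /* base case: nothing to add */
    abc_collect(s + 1, out);
    abc_collect(s + 1, out);
    size_t len = strlen(out);
    out[len] = s[0];                      /* "print" the first character */
    out[len + 1] = '\0';
}

/* The pattern from the answer, applied without recursion: starting from the
   empty output, each character (last to first) turns the current output
   into current + current + that character. */
void abc_predict(const char *s, char *out) {
    out[0] = '\0';
    for (size_t i = strlen(s); i > 0; i--) {
        size_t len = strlen(out);
        memcpy(out + len, out, len);      /* the suffix is worked on twice */
        out[2 * len] = s[i - 1];          /* then its first character is printed */
        out[2 * len + 1] = '\0';
    }
}
```

Both give zzyzzyx for "xyz". Each step doubles the previous output and adds one character, so a string of length n produces 2^n - 1 characters, which is why these traces get long so quickly.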

The easiest examples for analyzing recursion are the Fibonacci and factorial functions.
Working through them will help you analyze recursive functions more effectively. Whenever you lose track of a recursive function, just recall these examples.
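For reference, a minimal C version of both functions, as a plain textbook sketch (unsigned long overflows for large n). Factorial makes one recursive call per level, so its trace is a straight chain; naive Fibonacci makes two calls per level, like abc() in the question, so its calls branch out into a tree.

```c
/* Factorial: one recursive call per level, so the trace is a chain:
   factorial(3) = 3 * factorial(2) = 3 * 2 * factorial(1) = 6. */
unsigned long factorial(unsigned n) {
    if (n <= 1)
        return 1;                     /* base case */
    return n * factorial(n - 1);      /* general case */
}

/* Fibonacci: two recursive calls per level (like abc()), so the calls
   branch into a tree: fib(4) calls fib(3) and fib(2), and so on. */
unsigned long fib(unsigned n) {
    if (n < 2)
        return n;                     /* base cases: fib(0) = 0, fib(1) = 1 */
    return fib(n - 1) + fib(n - 2);
}
```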

Take a stack of index cards of an appropriate size. Start tracing the initial call to the recursive function. When you get another call start a new index card and either put it in front of the first card or behind it (as appropriate). Sooner or later you will (unless you are tracing an infinite recursion) trace the execution of a call which does not make a recursive call, in which case copy the return value back to the card you came from.
It's probably a good idea to include 'go to card X' and 'came from card Y' on your cards.
In complicated situations you might find it useful to create more than one stack of cards to trace your function calls, oh why the heck, why not call them call stacks.
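The card method can also be written down as code, which shows exactly what each card has to record. This is a C sketch that traces abc() from the question with an explicit array of cards; the struct card type and the abc_cards function are hypothetical. Each card holds the call's argument and how far its body has got, which plays the role of "came from card Y".

```c
#include <stddef.h>

/* One "index card": which call it is and how far its body has got. */
struct card {
    const char *s;   /* the argument of this call of abc() */
    int step;        /* 0: first abc(s+1) not made yet, 1: second not made yet, 2: both done */
};

/* Traces abc(s) with an explicit stack of cards instead of real recursion.
   Printed characters are appended to out (without spaces).
   Assumes strlen(s) < 64 and that out can hold 2^strlen(s) bytes. */
void abc_cards(const char *s, char *out) {
    struct card stack[64];
    int top = 0;
    size_t n = 0;
    stack[top++] = (struct card){ s, 0 };                 /* card for the initial call */
    while (top > 0) {
        struct card *c = &stack[top - 1];                 /* the card we are working on */
        if (c->s[0] == '\0') {
            top--;                                        /* base case: back to the previous card */
        } else if (c->step < 2) {
            c->step++;                                    /* note where to resume later */
            stack[top++] = (struct card){ c->s + 1, 0 };  /* new card for abc(s+1) */
        } else {
            out[n++] = c->s[0];                           /* printf("%c ", s[0]) */
            top--;                                        /* done: back to the card we came from */
        }
    }
    out[n] = '\0';
}
```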

Related

Turing Machine Algorithm

Could you please help me? I need to write code for a one-tape Turing machine over the two-letter alphabet {a, b}.
The program should output the common prefix of the two words.
For example:
g(aab,aaaba) -> aa; g(_,abab) -> _; g(aaba,baa) -> _; g(_,_) -> _; g(babaab,babb) -> bab
Here g is the function computed by the machine, an underscore means the empty word, and the two words are separated by a space.
I tried to implement the following option:
If at the start we see the letter a, we erase it and move to the beginning of the second word. If we also see the letter a there, we erase it too and write a after both words, separated by a space. After that we return to the beginning of the first word and repeat this operation. When the first letter of the first word and the first letter of the second no longer match, we erase everything that is left.
But I have trouble with the code: after each operation the gap between the two words gets longer, and I don't know how to handle this. There is also a problem when the first or the second word is entirely the common prefix, like this:
g(baa,baabab) -> baa
Your approach seems reasonable. From your description it sounds like you just have trouble generalizing some of the individual steps.
For instance, to deal with the growing spaces between the two words, remember that at any time in the program, the two words are separated by one or more spaces. So implement your seek operation for that general case.
For the common prefix case you have to deal with the situation that you eventually run out of characters to compare. So after deleting the character from the first word, while seeking for the beginning of the second word, check whether the first character you pass over is a letter or a space. If it's a space, you're in the prefix case and need to take care that you don't try to seek back to the first word later, because you already erased all of it and there's only spaces left. Similarly, if the second word is the prefix, you can detect this when seeking to the output.
Try breaking the algorithm down into its essential steps and test each of those steps in isolation. It is much easier to make sure you handle the corner cases correctly when you can focus on a simple step in isolation, instead of having to test it as part of the larger algorithm. Doing this is an essential skill in debugging code, so consider this a good exercise for that. Even if it seems painful at first, make sure you have a structured approach to analyzing problems and breaking your code down into smaller parts, and you will be able to fix any problems eventually. Happy coding!
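As an illustration of writing each step for its general case, here is a C sketch that simulates the question's erase-and-append scheme on a char array with a single head, using only a, b and blank. The finite control is represented by a few local variables (the remembered letter c and the flags last1/last2). The names tm_prefix and run, the tape size, and the start-up checks for empty words are assumptions of this sketch; it is not a transition table for any particular Turing machine simulator. The seek between the words skips one or more blanks, and last1/last2 are the prefix checks described in the answer.

```c
#include <string.h>

#define TAPE_LEN 256
#define BLANK ' '

static int letter(char ch) { return ch == 'a' || ch == 'b'; }

/* Main loop. The head starts on the first letter of w1; both words are non-empty. */
static void run(char *t, int h) {
    for (;;) {
        char c = t[h];                     /* remember the first letter of w1 */
        t[h++] = BLANK;                    /* and erase it */
        int last1 = (t[h] == BLANK);       /* w1 now used up? */
        while (letter(t[h])) h++;          /* skip the rest of w1 */
        while (t[h] == BLANK) h++;         /* skip the one or more blanks before w2 */
        if (t[h] != c) {                   /* mismatch: erase what is left, halt */
            while (letter(t[h])) t[h++] = BLANK;
            if (!last1) {
                while (t[h] == BLANK) h--;
                while (letter(t[h])) t[h--] = BLANK;
            }
            return;
        }
        t[h++] = BLANK;                    /* erase the matching letter of w2 */
        int last2 = (t[h] == BLANK);       /* w2 now used up? */
        while (letter(t[h])) h++;          /* skip the rest of w2 */
        h++;                               /* the single blank before the output */
        while (letter(t[h])) h++;          /* skip the output written so far */
        t[h] = c;                          /* append c to the output */
        if (last1 && last2)
            return;
        if (last2) {                       /* w2 is the prefix: erase the rest of w1 */
            while (letter(t[h])) h--;
            while (t[h] == BLANK) h--;
            while (letter(t[h])) t[h--] = BLANK;
            return;
        }
        while (letter(t[h])) h--;          /* back over the output */
        h--;                               /* back over the single blank */
        if (last1) {                       /* w1 is the prefix: erase the rest of w2 */
            while (letter(t[h])) t[h--] = BLANK;
            return;
        }
        while (letter(t[h])) h--;          /* back over the rest of w2 */
        while (t[h] == BLANK) h--;         /* back over the growing gap */
        while (letter(t[h])) h--;          /* back over the rest of w1 */
        h++;                               /* head on the new first letter of w1 */
    }
}

/* Writes the common prefix of w1 and w2 (letters a/b only) to out.
   Returns 0, or -1 if the words are too long for this fixed-size tape. */
int tm_prefix(const char *w1, const char *w2, char *out) {
    char t[TAPE_LEN];
    size_t n1 = strlen(w1), n2 = strlen(w2);
    if (n1 + n2 > (TAPE_LEN - 4) / 2)
        return -1;
    memset(t, BLANK, sizeof t);
    memcpy(t + 1, w1, n1);                 /* tape: _ w1 _ w2 _ (output grows here) */
    memcpy(t + 2 + n1, w2, n2);
    int h = 1;                             /* head on the first cell of w1 */
    if (t[h] == BLANK) {                   /* w1 empty: erase w2 */
        for (h++; letter(t[h]); h++)
            t[h] = BLANK;
    } else {
        while (letter(t[h])) h++;
        h++;                               /* first cell of w2 */
        if (t[h] == BLANK) {               /* w2 empty: erase w1 */
            for (h -= 2; letter(t[h]); h--)
                t[h] = BLANK;
        } else {
            for (h -= 2; letter(t[h]); h--) { }   /* back to the blank before w1 */
            run(t, h + 1);
        }
    }
    size_t k = 0;                          /* the letters left on the tape are the result */
    for (int i = 0; i < TAPE_LEN; i++)
        if (letter(t[i]))
            out[k++] = t[i];
    out[k] = '\0';
    return 0;
}
```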

Parse expression with functions

This is my situation: the input is a string that contains a normal mathematical expression like 5+3*4. Functions are also possible, e.g. min(5,A*2). This string is already tokenized, and now I want to parse it using stacks (so no AST). I first used the Shunting Yard Algorithm, but this is where my main problem arises:
Suppose you have this (tokenized) string: min(1,2,3,+) which is obviously invalid syntax. However, SYA turns this into the output stack 1 2 3 + min(, and hopefully you see the problem coming. When parsing from left to right, it sees the + first, calculating 2+3=5, and then calculating min(1,5), which results in 1. Thus, my algorithm says this expression is completely fine, while it should throw a syntax error (or something similar).
What is the best way to prevent things like this? Add a special delimiter (such as the comma), use a different algorithm, or what?
In order to prevent this issue, you might have to keep track of the stack depth. The way I would do this (and I'm not sure it is the "best" way) is with another stack.
The new stack follows these rules:
- When an opening parenthesis, (, or a function is parsed, push a 0.
  This handles nested functions.
- When a closing parenthesis, ), is parsed, pop the last item off and add it to the new last value on the stack.
  The number that just got popped off is how many values were returned by the function. You probably want this to always be 1.
- When a comma or similar delimiter is parsed, pop from the stack, add that number to the new last element, then push a 0.
  This resets the count so that we can begin verifying the next argument of the function. The value that just got popped off is how many values were returned by the previous argument. You probably want this to always be 1.
- When a number is pushed to the output, increment the top element of this stack.
  This is how many values are available in the output. Numbers increase the number of values; binary operators need at least 2.
- When a binary operator is pushed to the output, decrement the top element.
  A binary operator takes 2 values and outputs 1, thus reducing the overall number of values left on the output by 1.
- In general, an n-ary operator that takes n values and returns m values should add (m-n) to the top element.
- If this value ever becomes negative, throw an error!
This will find that the last argument in your example, which just contains a +, will decrement the top of the stack to -1, automatically throwing an error.
But then you might notice that a final argument in your example of, say, 3+ would return a zero, which is not negative. In this case, you would throw an error in one of the steps where "you probably want this to always be 1."
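Here is a minimal C sketch of the counter stack wired into a shunting-yard pass that only checks validity. Assumptions that are not from the question: tokens are C strings; a function name and its opening parenthesis arrive as one token such as "min(" (matching the min( token in the question's output); the only operators are the binary + - * /; zero-argument calls are rejected. The function names (validate, bump, and so on) are hypothetical. Counters change when an operator actually reaches the output, and a function with k arguments adds 1 - k, following the (m-n) rule.

```c
#include <string.h>

#define MAXD 64

static int is_op(const char *t) {
    return t[0] != '\0' && t[1] == '\0' && strchr("+-*/", t[0]) != NULL;
}
static int is_func(const char *t) {        /* e.g. "min(" */
    size_t n = strlen(t);
    return n > 1 && t[n - 1] == '(';
}
static int is_open(const char *t) { return strcmp(t, "(") == 0 || is_func(t); }
static int prec(const char *t) { return (t[0] == '*' || t[0] == '/') ? 2 : 1; }

/* Adds delta to the top counter; returns 0 if it went negative. */
static int bump(int *cnt, int ncnt, int delta) {
    cnt[ncnt - 1] += delta;
    return cnt[ncnt - 1] >= 0;
}

/* Returns 1 if the tokenized infix expression is well formed, else 0. */
int validate(const char **tok, int ntok) {
    const char *ops[MAXD];                 /* shunting-yard operator stack */
    int cnt[MAXD], args[MAXD];             /* counter stack and argument counts */
    int nops = 0, ncnt = 0, nargs = 0;
    cnt[ncnt++] = 0;                       /* counter for the top level */

    for (int i = 0; i < ntok; i++) {
        const char *t = tok[i];
        if (is_open(t)) {                  /* '(' or function: push a 0 */
            if (nops == MAXD || ncnt == MAXD)
                return 0;
            ops[nops++] = t;
            cnt[ncnt++] = 0;
            args[nargs++] = 1;
        } else if (strcmp(t, ",") == 0 || strcmp(t, ")") == 0) {
            while (nops > 0 && !is_open(ops[nops - 1])) {
                nops--;                    /* operator moves to the output */
                if (!bump(cnt, ncnt, -1))
                    return 0;
            }
            if (nops == 0)
                return 0;                  /* no matching '(' */
            int popped = cnt[--ncnt];      /* values produced by this argument */
            if (popped != 1)
                return 0;
            cnt[ncnt - 1] += popped;
            if (t[0] == ',') {
                if (!is_func(ops[nops - 1]))
                    return 0;              /* comma outside a function call */
                cnt[ncnt++] = 0;           /* start counting the next argument */
                args[nargs - 1]++;
            } else {
                int k = args[--nargs];
                if (is_func(ops[--nops]) && !bump(cnt, ncnt, 1 - k))
                    return 0;              /* k-ary function: takes k, returns 1 */
            }
        } else if (is_op(t)) {
            while (nops > 0 && is_op(ops[nops - 1]) && prec(ops[nops - 1]) >= prec(t)) {
                nops--;
                if (!bump(cnt, ncnt, -1))
                    return 0;
            }
            if (nops == MAXD)
                return 0;
            ops[nops++] = t;
        } else {
            bump(cnt, ncnt, 1);            /* a number or variable adds one value */
        }
    }
    while (nops > 0) {                     /* flush the remaining operators */
        if (!is_op(ops[--nops]))
            return 0;                      /* unclosed '(' */
        if (!bump(cnt, ncnt, -1))
            return 0;
    }
    return ncnt == 1 && cnt[0] == 1;       /* exactly one value overall */
}
```

With these rules the question's min(1,2,3,+) fails when + reaches the output (the counter drops to -1), and an argument such as 3+ fails the "should be 1" check instead.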

Fortran implied do write speedup

tl;dr: I found that an "implied do" write was slower than an explicit one under certain circumstances, and want to understand why/if I can improve this.
Details:
I've got a code that does something to the effect of:
DO i=1,n
   ! calculations...
   ! m, x, and y all change each pass through the loop
   IF (m.GT.1) THEN
      DO j=1,m
         WRITE(10,*) x(j), y(j)   ! where 10 is an output file
      ENDDO
   ENDIF
ENDDO
The output file ends up being fairly large, and so it seems like the writing is a big performance factor, so I wanted to optimize it. Before anyone asks, no, moving away from ASCII isn't an option due to various downstream requirements. Accordingly, I rewrote the IF statement (and contents) as:
IF (m.GT.1) THEN
   ! build format statement for write
   WRITE(mm1,*) m-1
   mm1 = ADJUSTL(mm1)
   ! implied do write statement
   WRITE(10,'('//TRIM(mm1)//'(i9,1x,f7.5/),i9,1x,f7.5)') (x(j),y(j),j=1,m)
ELSEIF (m.EQ.1) THEN
   WRITE(10,'(i9,1x,f7.5)') x(1), y(1)
ENDIF
This builds the format statement according to the # of values to be written out, then does a single write statement to output things. I've found that the code actually runs slower with this formulation. For reference, I've seen significant speedup on the same system (hardware and software) when going to an implied do write statement when the amount of data to be written was fixed. Under the assumption that the WRITE statement, itself, is faster, then that would mean the overhead from the couple of lines building that statement are what take the added time, but that seems hard to believe. For reference, m can vary a fair amount, but probably averages at least 1000. Is the concatenation of strings // a very slow operator, or is there something else I'm missing? Thanks in advance.
I haven't specific timing information to add, but your data transfer with an implied do loop is needlessly complicated.
In the first fragment, with the explicit looping, you are writing each pair of numbers to distinct records and you wish to repeat this output with the implied do loop. To do this, you use the slash edit descriptor to terminate each record once a pair has been written.
The needless complexity comes from two areas:
you have distinct cases for one/more than one pair;
for the more-than-one case you construct a format including a "dynamic" repeat count.
As Vladimir F comments you could just use a very large repeat count: it isn't erroneous for an edit descriptor to be processed when there are no more items to be written. The output terminates (successfully) when reaching such a non-matching descriptor. You could, then, just write
WRITE(10,'(*(i9,1x,f7.5/))') (x(j),y(j),j=1,m) ! * replacing a large count
rather than the if construct and the format creation.
Now, this doesn't quite match your first output. As I mentioned above, output termination comes about when a data edit descriptor is reached when there is no corresponding item to output. This means that / will be processed before that happens: you have a final empty record.
The colon edit descriptor is useful here:
WRITE(10,'(*(i9,1x,f7.5,:,/))') (x(j),y(j),j=1,m)
On reaching a : processing stops immediately if there is no remaining output item to process.
But my preferred approach is the far simpler
WRITE(10,'(i9,1x,f7.5)') (x(j),y(j),j=1,m) ! No repeat count
You had the more detailed format to include record termination. However, we have what is known as format reversion: if a format end is reached and more remains to be output then the record is terminated and processing goes back to the start of the format.
Whether these things make your output faster remains to be seen, but they certainly make the code itself much cleaner and clearer.
As a final note, it used to be trendy to avoid additional X editing. If your numbers fit inside a field of width 7, then 1x,f7.5 could be replaced by f8.5 with the same appearance: the representation is right-justified in the field. It was claimed that this reduction had performance benefits, with less switching between descriptors.

Is this a correct way to think about recursivity in programming? (example)

I've been trying to learn what recursion in programming is, and I need someone to confirm whether I have truly understood it.
The way I'm trying to think about it is through collision-detection between objects.
Let's say we have a function. The function is called when it's certain that a collision has occurred, and it's called with a list of objects to determine which object collided, and which object it collided with. It does this by first checking whether the first object in the list collided with any of the other objects. If true, the function returns the objects in the list that collided. If false, the function calls itself with a shortened list that excludes the first object, and then repeats the process to determine whether it was the next object in the list that collided.
This is a finite recursive function because if the desired conditions aren't met, it calls itself with a shorter and shorter list until it deductively meets the desired conditions. This is in contrast to a potentially infinite recursive function, where, for example, the list it calls itself with is not shortened, but the order of the list is randomized.
So... is this correct? Or is this just another example of iteration?
Thanks!
Edit: I was fortunate enough to get three VERY good answers by #rici, #Evan and #Jack. They all gave me valuable insight on this, in both technical and practical terms from different perspectives. Thank you!
Any iteration can be expressed recursively. (And, with auxiliary data structures, vice versa, but not so easily.)
I would say that you are thinking iteratively. That's not a bad thing; I don't say it to criticise. Simply, your explanation is of the form "Do this and then do that and continue until you reach the end".
Recursion is a slightly different way of thinking. I have some problem, and it's not obvious how to solve it. But I observe that if I knew the answer to a simpler problem, I could easily solve the problem at hand. And, moreover, there are some very simple problems which I can solve directly.
The recursive solution is based on using a simpler (smaller, fewer, whatever) problem to solve the problem at hand. How do I find out which pairs of objects in a set of objects collide?
If the set has fewer than 2 elements, there are no pairs. That's the simplest problem, and it has an obvious solution: the empty set.
Otherwise, I select some object. All colliding pairs either include this object, or they don't. So that gives me two subproblems.
The set of collisions which don't involve the selected object is obviously the same problem which I started with, but with a smaller set. So I've replaced this part of the problem with a smaller problem. That's one recursion.
But I also need the set of objects which the selected object collides with (which might be an empty set). That's a simpler problem, because now one element of each pair is known. I can solve that problem recursively as well:
I need the set of pairs which include the object X and a set S of objects.
If the set is empty, there are no pairs. Simple.
Otherwise, I choose some element from the set. Then I find all the collisions between X and the rest of the set (a simpler but otherwise identical problem).
If there is a collision between X and the selected element, I add that to the set I just found.
Then I return the set.
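The two recursions above can be sketched in C. The circle objects, the collide test, and the names pairs_with and all_pairs are hypothetical stand-ins for a real collision check; each function has the base case and the "smaller problem" call described above.

```c
/* Hypothetical objects: circles with an id, a centre and a radius. */
struct obj { int id; double x, y, r; };
struct pair { int a, b; };

/* Two circles collide if the distance between centres is at most r1 + r2. */
static int collide(const struct obj *p, const struct obj *q) {
    double dx = p->x - q->x, dy = p->y - q->y, rr = p->r + q->r;
    return dx * dx + dy * dy <= rr * rr;
}

/* Pairs made of x and a member of s[0..n): one element of each pair is known. */
static void pairs_with(const struct obj *x, const struct obj *s, int n,
                       struct pair *out, int *count) {
    if (n == 0)
        return;                                   /* empty set: no pairs */
    pairs_with(x, s + 1, n - 1, out, count);      /* x against the rest of the set */
    if (collide(x, &s[0]))                        /* then x against the chosen element */
        out[(*count)++] = (struct pair){ x->id, s[0].id };
}

/* All colliding pairs in s[0..n); out needs room for n*(n-1)/2 pairs and
   *count must start at 0. */
void all_pairs(const struct obj *s, int n, struct pair *out, int *count) {
    if (n < 2)
        return;                                   /* fewer than 2 objects: no pairs */
    all_pairs(s + 1, n - 1, out, count);          /* pairs not involving s[0] */
    pairs_with(&s[0], s + 1, n - 1, out, count);  /* pairs involving s[0] */
}
```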
Technically speaking, you have the right mindset of how recursion works.
Practically speaking, you would not want to use recursion for a case such as the one you described above. The reason is that every recursive call adds a frame to the call stack, which is finite in size, and each call has some overhead; with enough objects you are going to run into serious bottlenecks in a large application. With enough recursive calls, you would end up with a stack overflow, which is exactly what you would get with "infinite recursion". You never want something to recurse infinitely; it goes against the fundamental principle of recursion.
Recursion works on two defining characteristics:
A base case can be defined: It is possible to eventually reach 0 or 1 depending on your necessity
A general case can be defined: The general case is continually called, reducing the problem set until your base case is reached.
Once you have defined both cases, you can define a recursive solution.
The point of recursion is to take a very large and difficult-to-solve problem and continually break it down until it's easy to work with.
Once our base case is reached, the calls "recurse out". This means each one returns to the call that invoked it, bringing back the data from the calls below it!
It is at this point that our operations actually occur.
Once the original function is reached, we have our final result.
For example, let's say you want the summation of the first 3 integers. The initial call is passed the number 3.
public static int sum(int num) {
    // Base case
    if (num == 1) {
        return 1;
    }
    // General case: num plus the sum of the smaller problem
    return num + sum(num - 1);
}
Walking through the function calls:
sum(3); // Initial function call
// Becomes...
3 + sum(2)
3 + (2 + sum(1))
3 + (2 + 1) = returned value
This gives us a result of 6.
Your scenario seems to me like iterative programming, but your function is simply calling itself as a way of continuing its comparisons. That is simply re-tasking your function to be able to call itself with a smaller list.
In my experience, a recursive function has more potential to branch out into multiple 'threads' (so to speak), and is used to process information the same way the hierarchy in a company works for delegation: the boss hands a contract down to the managers, who divide up the work and hand it to their respective staff; the staff get it done and hand it back to the managers, who report back to the boss.
A classic example of a recursive function is one that iterates through all files on a file system. (I will do this in pseudocode because it works in all languages.)
function find_all_files (directory_name)
{
- Check the given directory name for sub-directories within it
- for each sub-directory
find_all_files(directory_name + subdirectory_name)
- Check the given directory for files
- Do your processing of the filename; it is located at directory_name + filename
}
You use the function by calling it with a directory path as the parameter. The first thing it does is, for each subdirectory, build the actual path to that subdirectory and call find_all_files() with it. As long as there are sub-directories in the given directory, it will keep calling itself.
Now, when the function reaches a directory that contains only files, it is allowed to proceed to the part where it processes the files. Once it has done that, it exits and returns to the previous instance of itself that is iterating through directories.
It continues to process directories and files until it has completed all iterations and returns to the main program flow where you called the original instance of find_all_files in the first place.
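A POSIX C sketch of find_all_files, assuming opendir/readdir and lstat are available. Unlike the pseudocode, it handles sub-directories and files in whatever order readdir returns them; the optional visit callback and the returned file count are additions for this illustration. lstat is used so that symbolic links are not followed, which avoids endless loops through linked directories.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Recursively visits the regular files under dir, calling visit(path) for each
   one if visit is not NULL. Returns the number of files found, or -1 if dir
   cannot be opened. */
long find_all_files(const char *dir, void (*visit)(const char *path)) {
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    long count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;                                 /* skip the directory itself and its parent */
        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        struct stat st;
        if (lstat(path, &st) != 0)
            continue;
        if (S_ISDIR(st.st_mode)) {
            long sub = find_all_files(path, visit);   /* recurse into the sub-directory */
            if (sub > 0)
                count += sub;
        } else if (S_ISREG(st.st_mode)) {
            if (visit != NULL)
                visit(path);                          /* process the file at dir/name */
            count++;
        }
    }
    closedir(d);
    return count;
}
```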
One additional note: Sometimes global variables can be handy with recursive functions. If your function is merely searching for the first occurrence of something, you can set an "exit" variable as a flag to "stop what you are doing now!". You simply add checks for the flag's status during any iterations you have going on inside the function (as in the example, the iteration through all the sub-directories). Then, when the flag is set, your function just exits. Since the flag is global, all generations of the function will exit and return to the main flow.
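A small C sketch of the global-flag idea, using a hypothetical in-memory tree instead of a real file system so that it stays self-contained. Once a match is found, stop_flag makes every pending call return without examining anything else; nodes_visited is only there to make that visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a directory tree. */
struct node {
    const char *name;
    const struct node *kids[4];
    int nkids;
};

static int stop_flag;                    /* global "stop what you are doing now!" flag */
static const struct node *found_node;    /* first match, if any */
static int nodes_visited;                /* how many nodes were actually examined */

static void search(const struct node *n, const char *name) {
    if (stop_flag)
        return;                          /* another generation already found it */
    nodes_visited++;
    if (strcmp(n->name, name) == 0) {
        found_node = n;
        stop_flag = 1;                   /* every pending call will now unwind */
        return;
    }
    for (int i = 0; i < n->nkids && !stop_flag; i++)
        search(n->kids[i], name);
}

/* Returns the first node named name (depth-first order), or NULL. */
const struct node *find_first(const struct node *root, const char *name) {
    stop_flag = 0;
    found_node = NULL;
    nodes_visited = 0;
    search(root, name);
    return found_node;
}
```

Searching a tree root -> {a.txt, sub -> {b.txt}} for "a.txt" examines only two nodes; the sub-directory is never entered.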

How to elegantly compute the anagram signature of a word in ruby?

Arising out of this question, I'm looking for an elegant (ruby) way to compute the word signature suggested in this answer.
The idea suggested is to sort the letters in the word, and also run length encode repeated letters. So, for example "mississippi" first becomes "iiiimppssss", and then could be further shortened by encoding as "4impp4s".
I'm relatively new to Ruby, and though I could hack something together, I'm sure this is a one-liner for somebody with more Ruby experience. I'd be interested to see people's approaches and improve my Ruby knowledge.
edit: to clarify, the performance of computing the signature doesn't much matter for my application. I'm looking to compute the signature so I can store it with each word in a large database of words (450K words), then query for words which have the same signature (i.e. all anagrams of a given word that are actual English words). Hence the focus on space. The 'elegant' part is just to satisfy my curiosity.
The fastest way to create a sorted list of the letters is this:
"mississippi".unpack("c*").sort.pack("c*")
It is quite a bit faster than split('') and join(). For comparison it is also best to pack the array back together into a String, so you don't have to compare arrays.
I'm not much of a Ruby person either, but as I noted on the other comment this seems to work for the algorithm described.
s = "mississippi"
s.split('').sort.join.gsub(/(.)\1{2,}/) { |s| s.length.to_s + s[0,1] }
Of course, you'll want to make sure the word is lowercase, doesn't contain numbers, etc.
As requested, I'll try to explain the code. Please forgive me if I don't get all of the Ruby or reg ex terminology correct, but here goes.
I think the split/sort/join part is pretty straightforward. The interesting part for me starts at the call to gsub. This will replace a substring that matches the regular expression with the return value from the block that follows it. The reg ex finds any character and creates a backreference. That's the "(.)" part. Then, we continue the matching process using the backreference "\1" that evaluates to whatever character was found by the first part of the match. We want that character to be found a minimum of two more times for a total minimum number of occurrences of three. This is done using the quantifier "{2,}".
If a match is found, the matching substring is then passed to the next block of code as an argument thanks to the "|s|" part. Finally, we use the string equivalent of the matching substring's length and append to it whatever character makes up that substring (they should all be the same) and return the concatenated value. The returned value replaces the original matching substring. The whole process continues until nothing is left to match since it's a global substitution on the original string.
I apologize if that's confusing. As is often the case, it's easier for me to visualize the solution than to explain it clearly.
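For comparison, the same signature rule spelled out step by step in C, as a sketch of the algorithm rather than a Ruby idiom (signature is a hypothetical name). It sorts the letters, then replaces each run of three or more identical letters with the run length followed by the letter, matching the /(.)\1{2,}/ substitution above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for single characters. */
static int cmp_char(const void *p, const void *q) {
    return *(const unsigned char *)p - *(const unsigned char *)q;
}

/* Sorts the letters of word, then replaces every run of three or more equal
   letters with the run length followed by the letter.
   out needs strlen(word) + 1 bytes. Returns 0, or -1 if the word is too long. */
int signature(const char *word, char *out) {
    char sorted[256];
    size_t n = strlen(word), k = 0;
    if (n >= sizeof sorted)
        return -1;
    memcpy(sorted, word, n);
    qsort(sorted, n, 1, cmp_char);            /* "mississippi" -> "iiiimppssss" */
    for (size_t i = 0; i < n;) {
        size_t j = i;
        while (j < n && sorted[j] == sorted[i])
            j++;                              /* sorted[i..j) is one run */
        if (j - i >= 3)
            k += (size_t)sprintf(out + k, "%zu%c", j - i, sorted[i]);
        else
            while (i < j)
                out[k++] = sorted[i++];       /* short runs are copied as-is */
        i = j;
    }
    out[k] = '\0';
    return 0;
}
```

For "mississippi" it produces 4impp4s, and "listen" and "silent" get the same signature.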
I don't see an elegant solution. You could use the split message to get the characters into an array, but then once you've sorted the list I don't see a nice linear-time concatenate primitive to get back to a string. I'm surprised.
Incidentally, run-length encoding is almost certainly a waste of time. I'd have to see some very impressive measurements before I'd think it worth considering. If you avoid run-length encoding, you can anagrammatize any string, not just a string of letters. And if you know you have only letters and are trying to save space, you can pack them 5 bits to a letter.
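To make the 5-bits-per-letter remark concrete, here is a C sketch of the packing (pack5 is a hypothetical name). Codes start at 1 for a, so a group of five zero bits in the final byte's padding can never be mistaken for a letter; the input is assumed to be lowercase already.

```c
#include <stddef.h>
#include <string.h>

/* Packs a string of lowercase letters 5 bits per letter, most significant bit
   first, using the codes a=1 ... z=26. Returns the number of bytes written,
   (5 * strlen(s) + 7) / 8, or 0 if a character is not in a-z.
   out must hold (5 * strlen(s) + 7) / 8 bytes. */
size_t pack5(const char *s, unsigned char *out) {
    size_t n = strlen(s), nbytes = (5 * n + 7) / 8;
    memset(out, 0, nbytes);
    size_t bit = 0;                                   /* next bit position to write */
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 'a' || s[i] > 'z')
            return 0;
        unsigned code = (unsigned)(s[i] - 'a' + 1);   /* 1..26 fits in 5 bits */
        for (int b = 4; b >= 0; b--, bit++)
            if (code & (1u << b))
                out[bit / 8] |= (unsigned char)(0x80u >> (bit % 8));
    }
    return nbytes;
}
```

The 11 sorted letters of "mississippi" fit in 7 bytes instead of 11.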
---Irma Vep
EDIT: the other poster found join which I missed. Nice.
