When to use backtracking? - algorithm

Sometimes I see a problem and I'm not sure whether my first instinct should be to brute force all possible solutions + memoize or try to solve the problem intelligently. For example, this problem:
Given a positive integer n and you can do operations as follow:
If n is even, replace n with n/2. If n is odd, you can replace n with
either n + 1 or n - 1. What is the minimum number of replacements
needed for n to become 1?
How does one know not to code something that tries every single option for any given n (for example: 1 + [func(n+1), func(n-1), func(n/2)].min instead of looking for a more intelligent solution? How do you know to have that instinct?

Related

what does 2n and m stands for?

Suppose we need at most m guesses for an array of length n. Then, for an array of length 2n, the first guess cuts the reasonable portion of the array down to size n, and at most m guesses finish up, giving us a total of at most m+1, plus, 1 guesses.
What is the meaning of array of length 2n and m in this statement, what does it denote? i've also seen other symbols like o(n).
(Plus,does anyone know good reading materials to learn about these symbols)
The symbols m and n don't have general definitions; rather, they're available to be used in whatever way is needed when discussing a given problem, and their problem-specific definitions generally need to be given at that time.
In your quotation, the first sentence:
Suppose we need at most m guesses for an array of length n.
is actually the definition of m and n. If you prefer, you could read it like this:
For any given array, we need a certain number of guesses. Let n be a certain length, and let m be the greatest number of guesses we need for any array of length n.
They're not special symbols. Just generic letters... N (or X) are very common for denoting an unknown amount. M (or Y) usually refer to the next number in the problem.
M is the number of guesses, as the problem says
2N is 2 times N, or double the length of the array
Your question is quite broad. But using the context you are given, let me give you an example where this is used.
Suppose we are searching an array using Merge sort ( if you are new to merge sort read here https://www.geeksforgeeks.org/merge-sort/ ).
There will be m comparisons(m guesses) to find the specified element in the array.
When we double the number of elements in the array( in your case 2n) it will take m+1 comparisons (m+1 guesses) at most.

Solve using either master theorem or by expansion

I have two questions, which I have trying but unable to figure them out.
1) 𝑇(𝑛) = 𝑇(𝑛 − 1) + 𝑛^4
2) 𝑇(𝑛) = 2𝑇 (𝑛/2) + 𝑛 lg 𝑛
For first one, I am assuming substitution (am I correct?), and got kb + T(n-k). Pretty sure that's wrong so need help with it.
For the second one, I have no idea at all...
Any help would be great! Thanks!
1) So you got
...? I don't know how you obtained this but it's certainly incorrect.
This is basically the summation of the 4th power of all integers up to n. The standard formula for this is:
2) We can find a pattern if we keep expanding this:
The limit log n - 1 is because we keep dividing the parameter to T by 2, so the substitution as above can continue for log n lines until, say T(1) or wherever the stopping condition is. Continuing using the logarithm rules (google them if you don't know):
Both summations have log n terms. Since the 1st summation does not depend on i at all, we simply multiply by log n. The 2nd summation is given by a standard formula for summation of integers from 1 (or 0, doesn't matter in this case):

Determine the value of M and does M depends on k?

Here is an exercise I'm struggling with:
One way to improve the performance of QuickSort is to switch to
InsertionSort when a subfile has <= M elements instead of recursively calling itself.
Implement a recursive QuickSort with a cutoff to InsertionSort for subfiles with M or less elements. Empirically determine the value of M for which it performs fewest key comparisons on inputs of 60000 random natural numbers less than K for K = 10,100,1000, 10000, 100000, 1000000. Does the optimal value M depend on K?
My issues:
I would like to know whether the value of M differs from statement 1 and statement 3. If so, what would be the array size, and how to vary the random numbers ? How to compare M and K? Do i have any mathematical equation or i should it just do it using my code ?
Implement the sort algoritm as requested.
Add support for recording the number of comparisons (e.g. increment a global)
Generate 5 sets of input data for each k. So 30 files with 1,800,000 lines in total.
Run the sort on every set for every K and guess M a couple of times. Start with the low-valued inputs and make the favorable M guide your guesses as you progress towards high-valued inputs.
Describe your observations about the influence of M over K.
Pass the exercise like a pro

Count permutations of N A's and B's

Given N "A"s and N "B"s. We need to arrange them such that if A is on left of B then our solution increase by +1 and also we need to maintain that at every point the number of A's must be greater than or equal to number of B's.
Now we need to count such permutations of these 2*N alphabets such that solution is equal to K every time.
Example : Say N=4 and K=2 then here answer is 6.
6 possible ways are :
ABAAABBB
AABBAABB
AABAABBB
AAABABBB
AAABBABB
AAABBBAB
My approach : I am having O(N*K) approach to solve this problem by doing brute type solution.
Make a function say F(last,Acount,K) and then check if we can place A or B their satisfying the conditions.But problem is N and K can be large .
So how to solve this problem
Short answer: Narayana numbers is what you need.
I've spent whole saturday researching your problem. Thanks for an interesting question :) Here I would like to tell you my steps to solve it if you are interested.
You said that you can brute force the problem for a small inputs. For an algorithmic problems this ofthen might give you such an insight for better solution (e.g. if you find out some relations and guess the generating function). So I did brute force too (with dynamic programming aproach) and looked at given numbers. First time I wrote it to my output file wrong way (for every K for all N one after another in one column) so I didn't find some beatiful relations. That's why I didn't find answer that time and spent some more to get better reccurent function.
Then I tried to deduce it to nonreccurent function but failed (I think because it wasn't the best one possible). However doing this I found relations like Pascal triangle has (kind of symmetry) and then wrote numbers sequence to the output file in a triangle way.
It looked like very nice sequence so I just googled some of its numbers and found links to Narayana numbers.
Then I tried to deduce nonreccurent formula with no success again (now I could check its correctness because the sequence formula is already knonwn) :) So this time I don't know exactly how Narayana numbers' formula is given although the formula seems simple and small but you can search more if you are interested. I wish I go deep more but I'm afraid of spending the whole holidays for it :)
Here my brute force solution #1 and better reccurent solution #2. And here the outputs I've got in a triangle way (word wrap makes it ugly so be careful).
Also, thanks DavidEisenstat for pointing the right way in comment to your question. I checked that link but didn't go further googling to find a solution. Although that would be the shortest path to the answer your question.
UPDATE I didn't provide implementation of fast calculation of combinations. There're different ways to do this which you can simply find in the internet including its implementations. The complexity of calculations would be O(N) so you will pass the execution time bounds you have. Or if you need to answer the query (with different N and K) multiple times, you can precalculate factorial for every number from 1 to N modulo (1e6 + 9) and then use them to calculate combinations with O(1).
Now that there's a spoiler, let me expand on my comment.
Given a string with n ('s and n )'s, append a ). Exactly one of its n + 1 rotations ending in ) begins with a length-2n parenthesized expression. This is one way to prove the Catalan formula
C(n) = (2n choose n) / (n + 1).
For this problem, we want to count only the parenthesized expressions that have k occurrences of (). The number of ordered partitions of n elements into k nonempty parts is (n - 1) choose (k - 1), by choosing k - 1 positions out of n - 1 to insert part boundaries. By partitioning n + 1 into k nonempty parts also, we can count the number of length-(2n + 1) strings that begin with (, end with ), and have n ('s and n + 1 )'s and k ()'s as
((n - 1) choose (k - 1)) (n choose (k - 1)).
Of these strings, exactly k rotations result in strings of the same form, hence the division by k to get the number that are parenthesized properly. (There is no rotational symmetry because if k > 1 then k fails to divide n or n + 1.) The final answer is
((n - 1) choose (k - 1)) (n choose (k - 1)) / k.
To compute the binomial coefficients quickly, cache the factorials and their inverses modulo the modulus up to the maximum value of n.

Create a random permutation of 1..N in constant space

I am looking to enumerate a random permutation of the numbers 1..N in fixed space. This means that I cannot store all numbers in a list. The reason for that is that N can be very large, more than available memory. I still want to be able to walk through such a permutation of numbers one at a time, visiting each number exactly once.
I know this can be done for certain N: Many random number generators cycle through their whole state space randomly, but entirely. A good random number generator with state size of 32 bit will emit a permutation of the numbers 0..(2^32)-1. Every number exactly once.
I want to get to pick N to be any number at all and not be constrained to powers of 2 for example. Is there an algorithm for this?
The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.
Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.
Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the numbers in its range without repetition simply requires choosing a set of "taps" that correspond to an irreducible polynomial. If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the wikipedia article lists them for size up to 19 bits).
If memory serves, there's at least one irreducible polynomial of ever possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed an LFSR, I'd guess you can do that and still maintain quite acceptable speed.
One way to do it would be
Find a prime p larger than N, preferably not much larger.
Find a primitive root of unity g modulo p, that is, a number 1 < g < p such that g^k ≡ 1 (mod p) if and only if k is a multiple of p-1.
Go through g^k (mod p) for k = 1, 2, ..., ignoring the values that are larger than N.
For every prime p, there are φ(p-1) primitive roots of unity, so it works. However, it may take a while to find one. Finding a suitable prime is much easier in general.
For finding a primitive root, I know nothing substantially better than trial and error, but one can increase the probability of a fast find by choosing the prime p appropriately.
Since the number of primitive roots is φ(p-1), if one randomly chooses r in the range from 1 to p-1, the expected number of tries until one finds a primitive root is (p-1)/φ(p-1), hence one should choose p so that φ(p-1) is relatively large, that means that p-1 must have few distinct prime divisors (and preferably only large ones, except for the factor 2).
Instead of randomly choosing, one can also try in sequence whether 2, 3, 5, 6, 7, 10, ... is a primitive root, of course skipping perfect powers (or not, they are in general quickly eliminated), that should not affect the number of tries needed greatly.
So it boils down to checking whether a number x is a primitive root modulo p. If p-1 = q^a * r^b * s^c * ... with distinct primes q, r, s, ..., x is a primitive root if and only if
x^((p-1)/q) % p != 1
x^((p-1)/r) % p != 1
x^((p-1)/s) % p != 1
...
thus one needs a decent modular exponentiation (exponentiation by repeated squaring lends itself well for that, reducing by the modulus on each step). And a good method to find the prime factor decomposition of p-1. Note, however, that even naive trial division would be only O(√p), while the generation of the permutation is Θ(p), so it's not paramount that the factorisation is optimal.
Another way to do this is with a block cipher; see this blog post for details.
The blog posts links to the paper Ciphers with Arbitrary Finite Domains which contains a bunch of solutions.
Consider the prime 3. To fully express all possible outputs, think of it this way...
bias + step mod prime
The bias is just an offset bias. step is an accumulator (if it's 1 for example, it would just be 0, 1, 2 in sequence, while 2 would result in 0, 2, 4) and prime is the prime number we want to generate the permutations against.
For example. A simple sequence of 0, 1, 2 would be...
0 + 0 mod 3 = 0
0 + 1 mod 3 = 1
0 + 2 mod 3 = 2
Modifying a couple of those variables for a second, we'll take bias of 1 and step of 2 (just for illustration)...
1 + 2 mod 3 = 0
1 + 4 mod 3 = 2
1 + 6 mod 3 = 1
You'll note that we produced an entirely different sequence. No number within the set repeats itself and all numbers are represented (it's bijective). Each unique combination of offset and bias will result in one of prime! possible permutations of the set. In the case of a prime of 3 you'll see that there are 6 different possible permuations:
0,1,2
0,2,1
1,0,2
1,2,0
2,0,1
2,1,0
If you do the math on the variables above you'll not that it results in the same information requirements...
1/3! = 1/6 = 1.66..
... vs...
1/3 (bias) * 1/2 (step) => 1/6 = 1.66..
Restrictions are simple, bias must be within 0..P-1 and step must be within 1..P-1 (I have been functionally just been using 0..P-2 and adding 1 on arithmetic in my own work). Other than that, it works with all prime numbers no matter how large and will permutate all possible unique sets of them without the need for memory beyond a couple of integers (each technically requiring slightly less bits than the prime itself).
Note carefully that this generator is not meant to be used to generate sets that are not prime in number. It's entirely possible to do so, but not recommended for security sensitive purposes as it would introduce a timing attack.
That said, if you would like to use this method to generate a set sequence that is not a prime, you have two choices.
First (and the simplest/cheapest), pick the prime number just larger than the set size you're looking for and have your generator simply discard anything that doesn't belong. Once more, danger, this is a very bad idea if this is a security sensitive application.
Second (by far the most complicated and costly), you can recognize that all numbers are composed of prime numbers and create multiple generators that then produce a product for each element in the set. In other words, an n of 6 would involve all possible prime generators that could match 6 (in this case, 2 and 3), multiplied in sequence. This is both expensive (although mathematically more elegant) as well as also introducing a timing attack so it's even less recommended.
Lastly, if you need a generator for bias and or step... why don't you use another of the same family :). Suddenly you're extremely close to creating true simple-random-samples (which is not easy usually).
The fundamental weakness of LCGs (x=(x*m+c)%b style generators) is useful here.
If the generator is properly formed then x%f is also a repeating sequence of all values lower than f (provided f if a factor of b).
Since bis usually a power of 2 this means that you can take a 32-bit generator and reduce it to an n-bit generator by masking off the top bits and it will have the same full-range property.
This means that you can reduce the number of discard values to be fewer than N by choosing an appropriate mask.
Unfortunately LCG Is a poor generator for exactly the same reason as given above.
Also, this has exactly the same weakness as I noted in a comment on #JerryCoffin's answer. It will always produce the same sequence and the only thing the seed controls is where to start in that sequence.
Here's some SageMath code that should generate a random permutation the way Daniel Fischer suggested:
def random_safe_prime(lbound):
while True:
q = random_prime(lbound, lbound=lbound // 2)
p = 2 * q + 1
if is_prime(p):
return p, q
def random_permutation(n):
p, q = random_safe_prime(n + 2)
while True:
r = randint(2, p - 1)
if pow(r, 2, p) != 1 and pow(r, q, p) != 1:
i = 1
while True:
x = pow(r, i, p)
if x == 1:
return
if 0 <= x - 2 < n:
yield x - 2
i += 1

Resources