This question already has answers here:
Get all 1-k tuples in a n-tuple
(3 answers)
Closed 8 years ago.
Given a binary string or a binary number(one is free to take it in any way), I need to find out the next smaller binary number but retaining the number of 0s and 1s in the original binary string or number.
For e.g.
If the given binary number or string was 11100000, the required output would be 11010000.
If the given binary number or string was 11010000, the required output would be 11001000.
Of course, I can do this with Brute Force approach. But I needed a better solution. What could be an optimal way of doing it? I was wondering if someone can help me reach a solution to this in O(1) using bit wise operations.
This is an elaboration on Setzer22's answer, which was close but which lacked one vital piece.
FindNextSmallestWithSameNumberOfBits(string[1...n])
1. for i = n - 1 to 1 do
2. if string[i+1] = 0 and string[i] = 1 then
3. string[i] := 0
4. string[i+1] := 1
5. sort(string[i+2...n], descending)
6. return string[1...n]
7. return "no solution"
This is an O(n) algorithm, which is a provably optimal asymptotic bound for this problem when the input size is unrestricted; while this is "bitwise" in the sense that it operates on bits, it clearly doesn't use what one would typically think of as "bitwise operations." Luckily, for inputs which can be of arbitrary length, there can be no asymptotic advantage to using traditional "bitwise operations" over this method. For inputs of fixed length, to which asymptotic analysis does not readily apply, one might do better using a technique such as those linked to by Asuka in the other answer to this question.
Note, based on comments, that sorting on line 5 can be replaced with simply reversing the string. The reason for this is that this substring is guaranteed to be of the form 0...01...1 (that is, any 0s to the left of any 1s) since, if it weren't, we'd have already found an occurrence of the string 10 and satisfied the condition on line 2.
The key that was missing in Setzer22's answer is that, once you move the rightmost 1 with a 0 to the right of it to the right, you then need to left-shift all the 1s that are even further right as far left as they will go. The reason for this is that the 1 bit shifted to the right is more significant than the bits to the right of it, so left-shifting any 1s which are less significant will give a larger number, but not large enough to undo the effect of reducing the more significant bit.
Clarification based on comments: notice that in line 7 of the pseudocode presented above, it's possible that the algorithm won't return a valid string. The reason for this is that, sometimes, there is no string with the same number of 1s which represents a smaller number. This occurs if and only if the string 01 does not appear as a substring in the input string (in which case the condition on line 2 is never satisfied).
This isn't the clearest explanation of all time, so please let me know if it needs more work. Here's an example:
10011 // input
01011 // right-shift the right-most 1 bit with a 0 to the right of it
01110 // left-shift all 1 bits to the right of the right-shifted as far as possible
1010100011 // input
1010010011 // right-shift the right-most 1 bit with a 0 to the right of it
1010011100 // left-shift all 1 bits to the right of the right-shifted bit as far as possible
One way to clarify this which just occurred to me: right-shifting the 1 bit guarantees that the result will be smaller than the original number; left-shifting the 1s to the right guarantees that the result will no smaller than is necessary.
May be this is what you are finding:
https://github.com/hcs0/Hackers-Delight/blob/master/snoob.c.txt
The functions snoob(), snoob1(), snoob2(), snoob3(), snoob4() and next_set_of_n_elements() are various implementations.
These functions are helper functions which are called by the above functions:
ntz() stands for "number of trailing zeros"
nlz() stands for "number of leading zeros"
pop() stands for "population count" (number of bit set (number of "1"s) in the string)
This is very efficient but only works on fix size integers (eg 32-bit, 64-bit).
Related
Which way should I follow to create an algorithm to find out whether fibonacci sequence exists in a given string ?
The string includes only digits with no whitespaces and there may be more than one sequence, I need to find all of them.
If as your comment says the first number must have less than 6 digits, you can simply search for all positions there one of the 25 fibonacci numbers (there are only 25 with less than 6 digits) and than try to expand this 1 number sequence in both directions.
After your update:
You can even speed things up when you are only looking for sequences of at least 3 numbers.
Prebuild all 25 3-number-Strings that start with one of the 25 first fibonnaci-numbers this should give much less matches than the search for the single fibonacci-numbers I suggested above.
Than search for them (like described above and try to expand the found 3-number-sequences).
here's how I would approach this.
The main algorithm could search for triplets then try to extend them to as long a sequence as possible.
This leaves us with the subproblem of finding triplets. So if you are scanning through a string to look for fibonacci numbers, one thing you can take advantage of is that the next number must have the same number of digits or one more digit.
e.g. if you have the string "987159725844" and are considering "[987]159725844" then the next thing you need to look at is "987[159]725844" and "987[1597]25844". Then the next part you would find is "[2584]4" or "[25844]".
Once you have the 3 numbers you can check if they form an arithmetic progression with C - B == B - A. If they do you can now check if they are from the fibonacci sequence by seeing if the ratio is roughly 1.6 and then running the fibonacci iteration backwards down to the initial conditions 1,1.
The overall algorithm would then work by scanning through looking for all triples starting with width 1, then width 2, width 3 up to 6.
I'd say you should first find all interesting Fibonacci items (which, having 6 or less digits, are no more than 30) and store them into an array.
Then, loop every position in your input string, and try to find upon there the longest possible Fibonacci number (that is, you must browse the array backwards).
If some Fib number is found, then you must bifurcate to a secondary algorithm, consisting of merely going through the array from current position to the end, trying to match every item in the following substring. When the matching ends, you must get back to the main algorithm to keep searching in the input string from the current position.
None of these two algorithms is recursive, nor too expensive.
update
Ok. If no tables are allowed, you could still use this approach replacing in the first loop the way to get the bext Fibo number: Instead of indexing, apply your formula.
Given is an array of integers. Each number in the array repeats an ODD number of times, but only 1 number is repeated for an EVEN number of times. Find that number.
I was thinking a hash map, with each element's count. It requires O(n) space. Is there a better way?
Hash-map is fine, but all you need to store is each element's count modulo 2.
All of those will end up being 1 (odd) except for the 0 (even) -count element.
(As Aleks G says you don't need to use arithmetic (count++ %2), only xor (count ^= 0x1); although any compiler will optimize that anyway.)
I don't know what the intended meaning of "repeat" is, but if there is an even number of occurrences of (all-1) numbers, and an odd number of occurances for only one number, then XOR should do the trick.
You don't need to keep the number of times each element is found - just whether it's even or odd number of time - so you should be ok with 1 bit for each element. Start with 0 for each element, then flip the corresponding bit when you encounter the element. Next time you encounter it, flip the bit again. At the end, just check which bit is 1.
If all numbers are repeated even times and one number repeats odd times, if you XOR all of the numbers, the odd count repeated number can be found.
By your current statement I think hashmap is good idea, but I'll think about it to find a better way. (I say this for positive integers.)
Apparently there is a solution in O(n) time and O(1) space, since it was asked at a software engineer company with this constraint explictly. See here : Bing interview question -- it seems to be doable using XOR over numbers in the array. Good luck ! :)
I asked whether this problem was NP-complete on the Computer Science forum, but asking for programming heuristics seems better suited for this site. So here it goes.
You are given an NxN grid of unit squares and 2N binary strings of length N. The goal is to fill the grid with 0's and 1's so that each string appears once and only once in the grid, either horizontally (left to right) or vertically (top down). Or determine that no such solution exists. If N is not fixed I suspect this is an NP-complete problem. However are there any heuristics that can hopefully speed up the search to faster than brute force trying all ways to fill in the grid with N vertical strings?
I remember programming this for my friend that had the 5x5 physical version of this game, but I used brute force back then. I can only think of this heuristic:
Consider a 4x4 map with these 8 strings (read each from left to right):
1 1 0 1
1 0 0 1
1 0 1 1
1 0 1 0
1 1 1 1
1 0 0 0
0 0 1 1
1 1 1 0
(Note that this is already solved, since the second 4 is the first 4 transposed)
First attempt:
We will choose columns from left to right. Since 7 of 8 strings start with 1, we will try to put the one with most 1s to the first column (so that we can lay rows more easily when columns are done).
In the second column, most string have 0, so you can also try putting a string with most zeros to the second row, and so on.
This i would call a wide-1 prediction, since it only looks at one column at a time
(Possible) Improvement:
You can look at 2 columns at a time (a wide-2 prediction, if i may call it like that). In this case, from the 8 strings, the most common combination of first two bits is 10 (5/8), so you would like to choose first two columns so the the combination 10 occurring as much as possible (in this case, 1111 followed by 1000 has 3 of 4 10 at start).
(Of course you don't have to stop at 2)
Weaknesses:
I don't know if this would work. I just made it up and thought it might work.
If you choose to he wide-X prediction, the number of possibilities is exponential with X
This can absolutely fail if the distribution of combinations if even.
What you can do:
As i said, this game has physical 5x5 adaptation, only there you can also lay the string from right-to-left and bottom-to-top, if you found that name, you could google further. I unfortunately don't remember it.
Sounds like you want the crossword grid filling algorithm:
First, build 2N subsets of your 2N strings -- each subset has all the strings with a particular bit at a particular postion. So subset(0,3) is all the strings that have a 0 in the 3rd position and subset(1,5) is all the strings that have a 1 in the 5th position.
The algorithm is a basic brute-force depth fist search trying all possible mappings of strings to slots in the grid, with severe pruning of impossible branches
Your search state is a set of assignments of strings to slots and a set of sets of possible assignments to the remaining slots. The initial state has 0 assignments and 2N sets, all of which contain all 2N strings.
At each step of the search, pick the most constrained set (the set with the fewest elements) from the set of possible sets. Try each element of the set in turn in that slot (adding it to the assigments and removing it from the set of sets), and constrain all the remaining sets of sets by removing the chosen string and intersecting the crossing sets with subset(X,N) (computed in step 1) where X is the bit from the chosen string and N is the row/column number of the chosen string
If you find an empty set when picking above, there is no solution with the choices so far, so backtrack up the tree to a different choice
This is still EXPTIME, but it is about as fast as you can get it. Since the main time consuming step is the set intersections, using 2N bit binary strings for your set representation is very fast -- for N=32, the sets fit in a 64-bit word and can be intersected with a single AND instruction. It also helps to have a POPCOUNT instruction, since you also need set sizes.
This can be solved as a 0/1 integer linear program with O(N^2) variables and constraints. First there are variables Xij which are 1 if string i is assigned to line j (where j=1 to N are rows and j = (N+1) to 2N are columns). Then there is a variable for each square in the grid, which indicates if the entry is 0 or 1. If the position of the square is (i,j) with variable Yij then the sum of all X variables for line j that correspond to strings that have a 1 in position i is equal to Yij, and the sum of all X variables for line j that correspond to strings that have a 0 in position i is equal to (1 - Yij). And similarly for line i and position j. Finally, the sum of all X variables Xij for each string i (summed over all lines j) is equal to 1.
There has been a lot of research in speeding up solvers for 0/1 integer programming so this may be able to often handle fairly large N (like N=100) for many examples. Also, in some cases, solving the relaxed non-integer linear program and rounding the solution off to 0/1 may produce a valid solution, in polynomial time.
We could choose the first lg 2N rows out of the 2N strings, and then since 2^(lg 2N) = 2N, in a lot of cases there shouldn't be very many ways to assign the N columns so that the prefixes of length lg 2N are respected. Then all the rows are filled in so they can be checked to see if a solution has been found. We can also try assigning more rows in the beginning, and fill in different combinations of rows besides the initial rows. (e.g. we can try filling in contiguous rows starting anywhere in the grid).
Running time for assigning lg 2N rows out of 2N strings is O((2N)^(lg 2N)) = O(2^((lg 2N)^2)), which grows slower than 2^N. Assigning columns to match the prefixes is the part that's the hardest to predict run time. If a prefix occurs K times among the assigned rows, and there are M remaining strings that have the prefix, then the number of assignments for this prefix is M*(M-1)...(M-K+1). The total number of possible column assignments is the product of these terms over all prefixes that occur among the rows. If this gets to be too large, the number of rows initially assigned can be increased. But it's hard to predict the worst-case run time unless an assumption is made like the NxN grid is filled in randomly.
Is is possible to count the distinct digits in a number in constant time O(1)?
Suppose n=1519 output should be 3 as there are 3 distinct digits(1,5,9).
I have done it in O(N) time but anyone knows how to find it in O(1) time?
I assume N is the number of digits of n. If the size of n is unlimited, it can't be done in general in O(1) time.
Consider the number n=11111...111, with 2 trillion digits. If I switch one of the digits from a 1 to a 2, there is no way to discover this without in some way looking at every single digit. Thus processing a number with 2 trillion digits must take (of the order of) 2 trillion operations at least, and in general, a number with N digits must take (of the order of) N operations at least.
However, for almost all numbers, the simple O(N) algorithm finishes very quickly because you can just stop as soon as you get to 10 distinct digits. Almost all numbers of sufficient length will have all 10 digits: e.g. the probability of not terminating with the answer '10' after looking at the first 100 digits is about 0.00027, and after the first 1000 digits it's about 1.7e-45. But unfortunately, there are some oddities which make the worst case O(N).
After seeing that someone really posted a serious answer to this question, I'd rather repeat my own cheat here, which is a special case of the answer described by #SimonNickerson:
O(1) is not possible, unless you are on radix 2, because that way, every number other than 0 has both 1 and 0, and thus my "solution" works not only for integers...
EDIT
How about 2^k - 1? Isn't that all 1s?
Drat! True... I should have known that when something seems so easy, it is flawed somehow... If I got the all 0 case covered, I should have covered the all 1 case too.
Luckily this case can be tested quite quickly (if addition and bitwise AND are considered an O(1) operation): if x is the number to be tested, compute y this way: y=(x+1) AND x. If y=0, then x=2^k - 1. because this is the only case when all the bits needed to be flipped by the addition. Of course, this is quite a bit flawed, as with bit lengths exceeding the bus width, the bitwise operators are not O(1) anymore, but rather O(N).
At the same time, I think it can be brought down to O(logN), by breaking the number into bus width size chunks, and AND-ing together the neighboring ones, repeating until only one is left: if there were no 0s in the number tested, the last one will be full 1s too...
EDIT2: I was wrong... This is still O(N).
I'm trying to solve this problem on SPOJ : http://www.spoj.pl/problems/EDIT/
I'm trying to get a decent recursive description of the algorithm, but I'm failing as my thoughts keep spinning in circles! Can you guys help me out with this one? I'll try to describe what approach I'm trying to solve this.
Basically I want to solve a problem of size j-i where i is the starting index and j is the ending index. Now, there should be two cases. If j-i is even then both the starting and the ending letters have to be the same case, and they have to be the opposite case when j-i is odd. I also want to reduce the problem of a lower size (j-i-1 or j-i-2), but I feel that if I know a solution to a smaller problem, then constructing a solution of a just bigger problem should also take into account the starting and ending letter cases of the smaller problem. This is exactly where I'm getting confused. Can you guys put my thoughts on the right track?
I think recursion is not the best way to go with this problem. It can be solved quite fast if we take a different approach!
Let us consider binary strings. Say an uppercase char is 1 and a lowercase one is 0. For example
AaAaB -> 10101
ABaa -> 1100
a -> 0
a "correct" alternating chain is either 10101010.. or 010101010..
We call the minimum number of substitutions required to change one string into the other the Hamming distance between the strings. What we have to find is the minimum Hamming distance between the input binary string and one of the two alternating chains of the same length.
It's not difficult: we XOR each string and then count the number of 1s. (link). For example, let's consider the following string: ABaa.
We convert it in binary:
ABaa -> 1100
We generate the only two alternating chains of length 4:
1010
0101
We XOR them with the input:
1100 XOR 1010 = 0101
1100 XOR 0101 = 1010
We count the 1s in the results and take the minimum. In this case, it's 2.
I coded this procedure in Java with some minor optimization (buffered I/O, no real need to generate the alternating chains) and it got accepted: (0.60 seconds one).
Given any string s of length n, there are only two possible "alternating chain".
This 2 variants can be defined sequentially by settings the first letter state (if first is upper then second is lower, third is upper...).
A simple linear algorithm would be to make 2 simple assumptions about the first letter:
First letter is UpperCase
First letter is LowerCase
For each assumption, run a simple edit distance algorithm and you are done.
You can do it recursively, but you'll need to pass and return a lot of state information between functions, which I think is not worthwhile when this problem can be solved by a simple loop.
As the others say, there are two possible "desired result" strings: one starts with an uppercase letter (let's call it result_U) and one starts with a lowercase letter (result_L). We want the smaller of EditDistance(input, result_U) and EditDistance(input, result_L).
Also observe that, to calculate EditDistance(input, result_U), we do not need to generate result_U, we just need to scan input 1 character at a time, and each character that is not the expected case will need 1 edit to make it the correct case, i.e. adds 1 to the edit distance. Ditto for EditDistance(input, result_L).
Also, we can combine the two loops so that we scan input only once. In fact, this can be done while reading each input string.
A naive approach would look like this:
Pseudocode:
EditDistance_U = 0
EditDistance_L = 0
Read a character
To arrive at result_U, does this character need editing?
Yes => EditDistance_U += 1
No => Do nothing
To arrive at result_L, does this character need editing?
Yes => EditDistance_L += 1
No => Do nothing
Loop until end of string
EditDistance = min(EditDistance_U, EditDistance_L)
There are obvious optimizations that can be done to the above also, but I'll leave it to you.
Hint 1: Do we really need 2 conditionals in the loop? How are they related to each other?
Hint 2: What is EditDistance_U + EditDistance_L?