Is it possible to count the distinct digits of a number in constant time, O(1)?
Suppose n=1519; the output should be 3, as there are 3 distinct digits (1, 5, 9).
I have done it in O(N) time, but does anyone know how to find it in O(1) time?
I assume N is the number of digits of n. If the size of n is unlimited, it can't be done in general in O(1) time.
Consider the number n=11111...111, with 2 trillion digits. If I switch one of the digits from a 1 to a 2, there is no way to discover this without in some way looking at every single digit. Thus processing a number with 2 trillion digits must take (of the order of) 2 trillion operations at least, and in general, a number with N digits must take (of the order of) N operations at least.
However, for almost all numbers, the simple O(N) algorithm finishes very quickly because you can just stop as soon as you get to 10 distinct digits. Almost all numbers of sufficient length will have all 10 digits: e.g. the probability of not terminating with the answer '10' after looking at the first 100 digits is about 0.00027, and after the first 1000 digits it's about 1.7e-45. But unfortunately, there are some oddities which make the worst case O(N).
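For reference, a minimal Python sketch of that early-terminating O(N) scan (the function name is mine):

    def distinct_digits(n: int) -> int:
        """Count distinct decimal digits of n, stopping as soon as all 10 are seen."""
        seen = set()
        for ch in str(abs(n)):
            seen.add(ch)
            if len(seen) == 10:   # cannot grow further; stop scanning
                break
        return len(seen)

    # distinct_digits(1519) -> 3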
After seeing that someone really posted a serious answer to this question, I'd rather repeat my own cheat here, which is a special case of the answer described by @SimonNickerson:
O(1) is not possible, unless you are in radix 2, because there every number other than 0 has both a 1 and a 0, and thus my "solution" works not only for integers...
EDIT
How about 2^k - 1? Isn't that all 1s?
Drat! True... I should have known that when something seems so easy, it is flawed somehow... If I got the all 0 case covered, I should have covered the all 1 case too.
Luckily this case can be tested quite quickly (if addition and bitwise AND are considered O(1) operations): if x is the number to be tested, compute y = (x + 1) AND x. If y = 0, then x = 2^k - 1, because that is the only case in which the addition flips every bit. Of course, this is itself a bit flawed: once the bit length exceeds the bus width, the bitwise operators are no longer O(1) but O(N).
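Putting the patched cheat together as a Python sketch (assuming a non-negative integer, and assuming word-sized operands so the AND really is O(1); the function name is mine):

    def distinct_binary_digits(x: int) -> int:
        """The radix-2 'cheat': 1 distinct digit for 0 and for all-ones
        numbers (2**k - 1); otherwise both 0 and 1 appear."""
        return 1 if x == 0 or (x + 1) & x == 0 else 2

    # distinct_binary_digits(7) -> 1 ("111"); distinct_binary_digits(6) -> 2 ("110")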
At the same time, I think it can be brought down to O(log N) by breaking the number into bus-width-sized chunks and AND-ing neighboring chunks together, repeating until only one is left: if there were no 0s in the number tested, the last chunk will be all 1s too...
EDIT2: I was wrong... This is still O(N).
I am trying to find a dynamic programming approach to multiply each element in a linear sequence by the element that follows it, doing the same for each successive pair of elements, and then find the sum of all of the products. Note that not just any two elements may be multiplied: it must be the first with the second, the third with the fourth, and so on. All I know about the linear sequence is that it has an even number of elements.
I assume I have to store the numbers being multiplied and their product each time, then check whether some other "multipliable" pair of elements would give a product that has already been calculated (perhaps because they have opposite signs compared to the current pair).
However, by my understanding of a linear sequence, the values must increase or decrease by the same amount each time. And since there is an even number of elements, I don't believe two "multipliable" pairs can produce the same product (even with opposite signs), due to the issue shown in the following example:
Sequence: { -2, -1, 0, 1, 2, 3 }
Pairs: -2*-1, 0*1, 2*3
Clearly, since there is an even number of pairs, the only case in which the same product may occur more than once is if the elements increase/decrease by 0 each time.
I fail to see how this is a dynamic programming question, and if anyone could clarify, it would be greatly appreciated!
A quick google for "define linear sequence" gave:
A number pattern which increases (or decreases) by the same amount each time is called a linear sequence. The amount it increases or decreases by is known as the common difference.
In your case the common difference is 1. And you are not considering any other case.
The same multiplication may occur in the following sequence
Sequence = {-3, -1, 1, 3}
Pairs = -3 * -1 , 1 * 3
with a common difference of 2.
However, this does not necessarily call for dynamic programming. You can just iterate over the numbers, store the product of each pair in a set (a set keeps only unique values), and then sum the set's contents.
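A minimal sketch of that set-based approach (the function name is mine; following the answer literally, duplicate products are counted only once):

    def sum_of_pair_products(seq):
        """Multiply adjacent pairs (1st*2nd, 3rd*4th, ...) and sum the
        distinct products; the set drops duplicates automatically."""
        products = {seq[i] * seq[i + 1] for i in range(0, len(seq), 2)}
        return sum(products)

    # sum_of_pair_products([-3, -1, 1, 3]) -> 3 (both pairs give the product 3)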
Probably not what you are looking for, but I've found a closed solution for the problem.
Suppose we observe the first two numbers. Denote the first number by a and the difference between consecutive numbers by d. Say there are 2n numbers in the whole sequence. Then the sum you defined is:
sum = n·a^2 + n(2n − 1)·a·d + n(4n^2 − 3n − 1)·d^2/3
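A quick sketch of this closed form with a brute-force sanity check (names are mine; the division by 3 is always exact since n(n−1)(4n+1) is divisible by 3):

    def pair_product_sum(a, d, n):
        """Closed form above: sum of (a+2kd)*(a+(2k+1)d) for k = 0..n-1."""
        return n*a*a + n*(2*n - 1)*a*d + n*(4*n*n - 3*n - 1)*d*d // 3

    # sanity check against direct computation on -2,-1,0,1,2,3 (a=-2, d=1, n=3)
    a, d, n = -2, 1, 3
    seq = [a + i*d for i in range(2*n)]
    direct = sum(seq[i] * seq[i + 1] for i in range(0, 2*n, 2))
    assert pair_product_sum(a, d, n) == direct == 8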
That aside, I also fail to see how this is a dynamic programming problem, or at least one where a dynamic programming approach does much. The sequence is unlikely to cross from negative to positive at all, and even then, the chance of seeing repeated products shrinks as the difference between consecutive numbers grows. Furthermore, multiplication is so fast that the overhead of fetching cached products from a data structure might cost more than recomputing them (a mul instruction is probably faster than a lw).
I've read this question: Which is the fastest algorithm to find prime numbers?, but I'd like to do this only for 2 and 5 primes.
For example, the number 42000 is factorized as:
2^4 · 3^1 · 5^3 · 7^1
I'm only interested in finding the powers of 2 and 5: 4 and 3 in this example.
My naive approach is to successively divide by 2 while the remainder is 0, then successively divide by 5 while the remainder is 0.
The number of successful divisions (with zero remainder) are the powers of 2 and 5.
This involves performing (x + y + 2) divisions, where x is the power of 2 and y is the power of 5.
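For reference, a minimal Python sketch of that naive approach (the function name is mine):

    def naive_powers_of_2_and_5(n: int):
        """Repeated trial division: x + 1 tests for 2, y + 1 tests for 5."""
        x = y = 0
        while n % 2 == 0:
            n //= 2
            x += 1
        while n % 5 == 0:
            n //= 5
            y += 1
        return x, y

    # naive_powers_of_2_and_5(42000) -> (4, 3)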
Is there a faster algorithm to find the powers of 2 and 5?
Following the conversation, I do think your idea is the fastest way to go, with one exception:
Division is (in most cases) expensive. On the other hand, checking the last digit of the number is (usually?) faster, so I would check the last digit (0/5 for divisibility by 5, 0/2/4/6/8 for divisibility by 2) before dividing.
I am basing this on this comment by the OP:
my library is written in PHP and the number is actually stored as a string in base 10. That's not the most efficient indeed, but this is what worked best within the technical limits of the language.
If you are committed to strings-in-php, then the following pseudo-code will speed things up compared to actual general-purpose repeated modulus and division:
    while the string ends in 0, but is not "0":
        chop a zero off the end
        increment ctr2 and ctr5
    switch repeatedly on the last digit:
        if it is 5:
            divide the string by 5
            increment ctr5
        if it is 2, 4, 6, or 8:
            divide the string by 2
            increment ctr2
        otherwise:
            you have finished
This does not require any modulus operations, and you can implement divide-by-5 and divide-by-2 cheaper than a general-purpose long-number division.
On the other hand, if you want performance, using string representations for unlimited-size integers is suicide. Use gmp (which has a php library) for your math, and convert to strings only when necessary.
edit:
you can gain extra efficiency (and simplify your operations) using the following pseudocode:
    if the string is "0", terminate early
    while the last non-zero character of the string is '5':
        add the string to itself (i.e. double it)
        decrement ctr2
    count the '0's at the end of the string into ctr0
    chop off ctr0 zeros from the string
    ctr2 += ctr0
    ctr5 += ctr0
    while the last digit is 2, 4, 6, or 8:
        divide the string by 2
        increment ctr2
Chopping many 0s at once is better than looping one digit at a time. And mul2 beats div5 in speed (doubling can be implemented by adding the number to itself once).
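A runnable Python sketch of this improved pseudocode (the helper names are mine); it needs only string doubling and string halving, never general-purpose division:

    def double(s: str) -> str:
        """Schoolbook doubling of a decimal string."""
        carry, out = 0, []
        for ch in reversed(s):
            d = 2 * int(ch) + carry
            out.append(str(d % 10))
            carry = d // 10
        if carry:
            out.append(str(carry))
        return ''.join(reversed(out))

    def halve(s: str) -> str:
        """Schoolbook halving of a decimal string; assumes the value is even."""
        rem, out = 0, []
        for ch in s:
            d = rem * 10 + int(ch)
            out.append(str(d // 2))
            rem = d % 2
        return ''.join(out).lstrip('0') or '0'

    def count_2s_and_5s(s: str):
        """Powers of 2 and 5 dividing the integer written in decimal string s."""
        ctr2 = ctr5 = 0
        if s.strip('0') == '':
            return 0, 0                    # the number is zero: terminate early
        while s.rstrip('0')[-1] == '5':    # last non-zero digit is 5
            s = double(s)                  # mul2 beats div5
            ctr2 -= 1                      # compensate for the extra factor of 2
        zeros = len(s) - len(s.rstrip('0'))
        s = s.rstrip('0')                  # chop all trailing zeros at once
        ctr2 += zeros
        ctr5 += zeros
        while s[-1] in '2468':
            s = halve(s)
            ctr2 += 1
        return ctr2, ctr5

    # count_2s_and_5s("42000") -> (4, 3)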
If you have a billion digit number, you do not want to do divisions on it unless it's really necessary. If you don't have reason to believe that it is in the 1/2^1000 numbers divisible by 2^1000, then it makes sense to use much faster tests that only look at the last few digits. You can tell whether a number is divisible by 2 by looking at the last digit, whether it is divisible by 4 by looking at the last 2 digits, and by 2^n by looking at the last n digits. Similarly, you can tell whether a number is divisible by 5 by looking at the last digit, whether it is divisible by 25 by looking at the last 2 digits, and by 5^n by looking at the last n digits.
I suggest that you first count and remove the trailing 0s, then decide from the last digit whether you are testing for powers of 2 (last digit 2,4,6, or 8) or powers of 5 (last digit 5).
If you are testing for powers of 2, then take the last 2, 4, 8, 16, ... 2^i digits, and multiply this by 25, 625, ... 5^2^i, counting the trailing 0s up to 2^i (but not beyond). If you get fewer than 2^i trailing 0s, then stop.
If you are testing for powers of 5, then take the last 2, 4, 8, 16, ... 2^i digits, and multiply this by 4, 16, ... 2^2^i, counting the trailing 0s up to 2^i (but not beyond). If you get fewer than 2^i trailing 0s, then stop.
For example, suppose the number you are analyzing is 283,795,456. Multiply 56 by 25, you get 1400 which has 2 trailing 0s, continue. Multiply 5,456 by 625, you get 3,410,000, which has 4 trailing 0s, continue. Multiply 83,795,456 by 5^8=390,625, you get 32,732,600,000,000, which has 8 trailing 0s, continue. Multiply 283,795,456 by 5^16 to get 43,303,750,000,000,000,000 which has only 13 trailing 0s. That's less than 16, so stop, the power of 2 in the prime factorization is 2^13.
I hope that for larger multiplications you are implementing an n log n algorithm for multiplying n digit numbers, but even if you aren't, this technique should outperform anything involving division on typical large numbers.
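A minimal Python sketch of this doubling-window test for the power of 2 (the power-of-5 side is symmetric, multiplying by 2^d instead); the names are mine, and it assumes trailing zeros were already stripped so the last digit is one of 2, 4, 6, 8:

    def trailing_zeros(n: int) -> int:
        t = 0
        while n and n % 10 == 0:
            n //= 10
            t += 1
        return t

    def power_of_2(s: str) -> int:
        """Power of 2 dividing the integer in decimal string s, looking
        only at growing tails of s (no trailing zeros, last digit even)."""
        d = 2
        while d < len(s):
            tail = int(s[-d:])                  # last d digits
            z = trailing_zeros(tail * 5 ** d)   # turns each factor of 2 into a 0
            if z < d:
                return z                        # fewer than d zeros: exact answer
            d *= 2
        # the window now covers the whole number: finish directly
        n, k = int(s), 0
        while n % 2 == 0:
            n //= 2
            k += 1
        return k

    # power_of_2("283795456") -> 13, matching the worked example above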
Let's look at the average-case time complexity of various algorithms, assuming that each n-digit number is equally likely.
Addition or subtraction of two n-digit numbers takes theta(n) steps.
Dividing an n-digit number by a small number like 5 takes theta(n) steps. Dividing by the base is O(1).
Dividing an n-digit number by another large number takes theta(n log n) steps using the FFT, or theta(n^2) by a naive algorithm. The same is true for multiplication.
The algorithm of repeatedly dividing a base 10 number by 2 has an average case time complexity of theta(n): It takes theta(n) time for the first division, and on average, you need to do only O(1) divisions.
Computing a large power of 2 with at least n digits takes theta(n log n) by repeated squaring, or theta(n^2) with simple multiplication. Performing Euclid's algorithm to compute the GCD takes an average of theta(n) steps. Although divisions take theta(n log n) time, most of the steps can be done as repeated subtractions and it takes only theta(n) time to do those. It takes O(n^2 log log n) to perform Euclid's algorithm this way. Other improvements might bring this down to theta(n^2).
Checking the last digit for divisibility by 2 or 5 before performing a more expensive calculation is good, but it only results in a constant factor improvement. Applying the original algorithm after this still takes theta(n) steps on average.
Checking the last d digits for divisibility by 2^d or 5^d takes O(d^2) time, O(d log d) with the FFT. It is very likely that we only need to do this when d is small. The fraction of n-digit numbers divisible by 2^d is 1/2^d. So, the average time spent on these checks is O(sum(d^2 / 2^d)) and that sum is bounded independent of n, so it takes theta(1) time on average. When you use the last digits to check for divisibility, you usually don't have to do any operations on close to n digits.
It depends on whether you're starting with a native binary number or some bigint string - chopping off a very long chain of trailing zeros in a bigint string is a lot easier than trying to extract the powers of 2 and 5 separately - e.g. 23456789 x 10^66:
23456789000000000000000000000000000000000000000000000000000000000000000000
This particular integer, on the surface, is 244 bits in total, requiring a 177-bit-wide mantissa (178-bit precision minus 1 implicit bit) to handle it losslessly, so even newer data types such as uint128 won't suffice:
11010100011010101100101010010000110000101000100001000110100101
01011011111101001110100110111100001001010000110111110101101101
01001000011001110110010011010100001001101000010000110100000000
0000000000000000000000000000000000000000000000000000000000
The sequential approach is to spend 132 loop cycles in a bigint package to get them out (the tail end of the trace):

    129  63  2932098625
    130  64   586419725
    131  65   117283945
    132  66    23456789

    = 2^66 x 5^66 x 23456789
But once you can quickly realize there's a chain of 66 trailing zeros, the bigint package becomes fully optional, since the residual value is less than 24.5 bits in total width:
    2^66 x 5^66 x 23456789
I think your algorithm will be the fastest. But I have a couple of suggestions.
One alternative is based on the greatest common divisor. Take the gcd of your input number with the smallest power of 2 greater than your input number; that will give you all the factors of 2. Divide by the gcd, then repeat the same operation with 5; that will give you all the factors of 5. Divide again by the gcd, and the remainder tells you if there are any other factors.
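A minimal Python sketch of that gcd idea (helper names are mine; gcd(n, p^k) isolates p's factors whenever p^k > n):

    from math import gcd

    def ilog(x: int, p: int) -> int:
        """Exponent e such that p**e == x, for x an exact power of p."""
        e = 0
        while x > 1:
            x //= p
            e += 1
        return e

    def powers_via_gcd(n: int):
        g2 = gcd(n, 1 << n.bit_length())    # 2**bit_length(n) > n
        m = n // g2
        g5 = gcd(m, 5 ** m.bit_length())    # 5**bit_length(m) > m
        return ilog(g2, 2), ilog(g5, 5)

    # powers_via_gcd(42000) -> (4, 3)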
Another alternative is based on binary search. Split the binary representation of your input number in half; if the right half is all zeros, move left, otherwise move right. Once you have the factors of 2, divide them out, then apply the same algorithm to the quotient using powers of 5.
I'll leave it to you to implement and time these algorithms. But my gut feeling is that repeated division will be hard to beat.
I just read your comment that your input number is stored in base 10. In that case, divide repeatedly by 10 as long as the remainder is 0; that gives factors of both 2 and 5. Then apply your algorithm on the reduced number.
Is there an algorithm that can quickly determine if a number is a factor of a given set of numbers?
For example, 12 is a factor of [24,33,52] while 5 is not.
Is there a better approach than linear search O(n)? The set will contain a few million elements. I don't need to find the number, just a true or false result.
If a large number of candidates is checked against a constant list, one possible approach to speed up the process is to factorize the numbers in the list into their prime factors first. Then put the list members in a dictionary keyed by their prime factors. When a number (a potential factor) arrives, factorize it into its prime factors and use the dictionary to restrict the divisibility check to the list members that could possibly be multiples of it.
I think in general an O(n) search is what you will end up with. However, depending on how large the numbers are, you can speed up the search considerably, assuming the set is sorted (you mention that it can be): if you are searching for a number divisible by D, you have currently scanned x, and x is not divisible by D, then the next possible candidate is obviously floor((x + D) / D) * D. That is, if D = 12 and the list is
5 11 13 19 22 25 27
and you are scanning at 13, the next possible candidate number would be 24. Now depending on the distribution of your input, you can scan forwards using binary search instead of linear search, as you are searching now for the least number not less than 24 in the list, and the list is sorted. If D is large then you might save lots of comparisons in this way.
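A sketch of that skip-ahead scan over a sorted list (names are mine), using binary search to jump to the next multiple:

    from bisect import bisect_left

    def divisible_member_exists(sorted_list, d):
        """True iff some element of sorted_list is divisible by d."""
        i, n = 0, len(sorted_list)
        while i < n:
            x = sorted_list[i]
            if x % d == 0:
                return True
            target = (x // d + 1) * d                    # next multiple of d after x
            i = bisect_left(sorted_list, target, i + 1)  # least element >= target
        return False

    # divisible_member_exists([5, 11, 13, 19, 22, 25, 27], 12) -> False
    # divisible_member_exists([5, 11, 13, 19, 24], 12)         -> True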
However from pure computational complexity point of view, sorting and then searching is going to be O(n log n), whereas just a linear scan is O(n).
For testing many potential factors against a constant set, you should realize that if one element of the set divides another element of the set, the smaller one is redundant and can be removed: any candidate that divides it also divides its multiple. This approach is a variation in spirit of an ancient algorithm known as the Sieve of Eratosthenes, trading start-up time for run-time when testing a huge number of candidates:
Pick the smallest number >1 in the set
If any multiple of that number (other than itself) is also in the set, remove the number itself from the set
Repeat step 2 for the next smallest number, for a certain number of iterations. The number of iterations will depend on the trade-off with start-up time
You are now left with a smaller set to exhaustively test against. For this to be efficient you either want a data structure for your set that allows O(1) removal, like a linked list, or just replace "removed" elements with zero and then copy the non-zero elements into a new container.
I'm not sure of the question, so let me ask another: Is 12 a factor of [6,33,52]? It is clear that 12 does not divide 6, 33, or 52. But the factors of 12 are 2*2*3 and the factors of 6, 33 and 52 are 2*2*2*3*3*11*13. All of the factors of 12 are present in the set [6,33,52] in sufficient multiplicity, so you could say that 12 is a factor of [6,33,52].
If you say that 12 is not a factor of [6,33,52], then there is no better solution than testing each number for divisibility by 12; simply perform the division and check the remainder. Thus 6%12=6, 33%12=9, and 52%12=4, so 12 is not a factor of [6,33,52]. But if you say that 12 is a factor of [6,33,52], then to determine if a number f is a factor of a set ns, multiply the numbers in ns together sequentially, taking the remainder modulo f after each multiplication; report true immediately if the remainder ever reaches 0, and report false if you reach the end of ns without a zero remainder.
Let's take two examples. First, is 12 a factor of [6,33,52]? The first (trivial) multiplication gives 6, which leaves a remainder of 6 modulo 12. Now 6*33=198, which leaves a remainder of 6 modulo 12, and we continue. Now 6*52=312, and 312/12=26r0, so we have a remainder of 0 and the result is true. Second, is 5 a factor of [24,33,52]? The multiplication chain is 24%5=4, (4*33)%5=2, and (2*52)%5=4, so 5 is not a factor of [24,33,52].
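Under this second interpretation, a minimal sketch of the running-product test (the function name is mine):

    def is_factor_of_set(f, ns):
        """True iff f divides the product of ns, i.e. the prime factors of f
        appear across ns with sufficient multiplicity."""
        prod = 1
        for n in ns:
            prod = (prod * n) % f   # keep only the remainder modulo f
            if prod == 0:
                return True
        return False

    # is_factor_of_set(12, [6, 33, 52]) -> True
    # is_factor_of_set(5, [24, 33, 52]) -> False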
A variant of this algorithm was recently used to attack the RSA cryptosystem; you can read about how the attack worked here.
Since the set to be searched is fixed any time spent organising the set for search will be time well spent. If you can get the set in memory, then I expect that a binary tree structure will suit just fine. On average searching for an element in a binary tree is an O(log n) operation.
If you have reason to believe that the numbers in the set are evenly distributed throughout the range [0..10^12] then a binary search of a sorted set in memory ought to perform as well as searching a binary tree. On the other hand, if the middle element in the set (or any subset of the set) is not expected to be close to the middle value in the range encompassed by the set (or subset) then I think the binary tree will have better (practical) performance.
If you can't get the entire set in memory then decomposing it into chunks which will fit into memory and storing those chunks on disk is probably the way to go. You would store the root and upper branches of the set in memory and use them to index onto the disk. The depth of the part of the tree which is kept in memory is something you should decide for yourself, but I'd be surprised if you needed more than the root and 2 levels of branch, giving 8 chunks on disk.
Of course, this only solves part of your problem, finding whether a given number is in the set; you really want to find whether the given number is the factor of any number in the set. As I've suggested in comments I think any approach based on factorising the numbers in the set is hopeless, giving an expected running time beyond polynomial time.
I'd approach this part of the problem the other way round: generate the multiples of the given number and search for each of them. If your set has 10^7 elements then any given number N will have about (10^7)/N multiples in the set. If the given number is drawn at random from the range [0..10^12] the mean value of N is 0.5*10^12, which suggests (counter-intuitively) that in most cases you will only have to search for N itself.
And yes, I am aware that in many cases you would have to search for many more values.
This approach would parallelise relatively easily.
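A minimal sketch of the multiples approach (names are mine; a hash set stands in for the in-memory/on-disk tree discussed above, and limit caps the range):

    def divides_some_member(n: int, members: set, limit: int = 10**12) -> bool:
        """Generate the multiples of n up to limit and test each for membership."""
        for multiple in range(n, limit + 1, n):
            if multiple in members:
                return True
        return False

    # divides_some_member(12, {24, 33, 52}, limit=100) -> True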
A fast solution which requires some precomputation:
Organize your set in a binary tree with the following rules:
Numbers of the set are on the leaves.
The root of the tree contains r, the minimum of all prime numbers that divide some number in the set.
The left subtree corresponds to the subset of multiples of r (each divided by r, so that r is not repeated infinitely).
The right subtree corresponds to the subset of numbers that are not multiples of r.
If you want to test whether a number N divides some element of the set, compute its prime decomposition and walk the tree until you reach a leaf. If the leaf contains a number, then N divides it; if the leaf is empty, then N divides no element of the set.
Simply calculate the product of the set and mod the result with the test factor.
In your example
    {24,33,52}: P = 24 * 33 * 52 = 41184
    test factor 12: 41184 mod 12 = 0  -> True
    test factor 5:  41184 mod 5  = 4  -> False
The set can be broken into chunks if calculating the product would overflow the calculator's arithmetic, but huge numbers are also possible by storing them as strings.
Given is an array of integers. Each number in the array repeats an ODD number of times, but only 1 number is repeated for an EVEN number of times. Find that number.
I was thinking a hash map, with each element's count. It requires O(n) space. Is there a better way?
Hash-map is fine, but all you need to store is each element's count modulo 2.
All of those will end up as 1 (odd), except for the element with an even count, which will be 0.
(As Aleks G says, you don't need arithmetic (count++ % 2), only XOR (count ^= 0x1); although any compiler will optimize that anyway.)
I don't know what the intended meaning of "repeat" is, but if there is an even number of occurrences of all but one number, and an odd number of occurrences of exactly one number, then XOR-ing the whole array should do the trick.
You don't need to keep the number of times each element is found - just whether it's even or odd number of time - so you should be ok with 1 bit for each element. Start with 0 for each element, then flip the corresponding bit when you encounter the element. Next time you encounter it, flip the bit again. At the end, just check which bit is 1.
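A minimal sketch of that parity idea (names are mine), assuming exactly one value occurs an even number of times:

    from collections import defaultdict

    def find_even_count(arr):
        """Flip one bit per occurrence; the value left at 0 occurred
        an even number of times."""
        parity = defaultdict(int)
        for x in arr:
            parity[x] ^= 1
        return next((x for x, bit in parity.items() if bit == 0), None)

    # find_even_count([7, 7, 7, 4, 4]) -> 4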
If all numbers repeat an even number of times and one number repeats an odd number of times, then XOR-ing all of the numbers yields that odd-count number. (Note this is the reverse of the counts stated in the question.)
Given your current statement of the problem, I think a hashmap is a good idea, but I'll keep thinking about whether there is a better way. (I say this for positive integers.)
Apparently there is a solution in O(n) time and O(1) space, since it was asked at a software company with this constraint explicitly. See here: Bing interview question -- it seems to be doable using XOR over the numbers in the array. Good luck! :)
I have a list of size n which contains n consecutive members of an arithmetic progression, but not in order. Fewer than half of the elements of this list were replaced with random integers. From this new list, how can I find the common difference of the initial arithmetic progression?
I thought a lot about it, but apart from brute force, I was not able to come up with anything else :(
Thanks for thinking on this one :)
It's not possible to solve this in general and be 100% sure that your answer is correct. Let's say that the initial list is the following arithmetic progression (not in order):
1 3 2 4
Change less than half the elements at random... let's say for example that we changed 2 to 5:
1 3 5 4
If we can first find out which numbers we need to change to obtain a valid shuffled arithmetic sequence, then we can easily solve the problem stated in the question. However, we can see that there are multiple possible answers depending on which number we choose to change:
6, 3, 5, 4 (difference is 1)
1, 3, 2, 4 (difference is 1)
1, 3, 5, 7 (difference is 2)
There is no way to know which of these possible sequences is the original one, so you cannot be sure what the original difference was.
Since there is no deterministic solution for the problem (as stated by @Mark Byers), you can try a probabilistic approach.
It's difficult to recover the original progression itself, but its rate (the common difference) can be estimated by comparing the differences between elements: the differences between original elements are multiples of the rate.
Consider taking 2 elements from the list (the probability that both belong to the original sequence is at least 1/4) and computing their difference. This difference, with probability at least 1/4, will be a multiple of the rate. Decompose it into prime factors and count them (for example, 12 = 2^2 * 3 adds 2 to 2's counter and increments 3's counter).
After many such iterations (it looks like a good problem for probabilistic methods, like Monte Carlo), you can analyze the counters.
If a prime factor belongs to the rate, its counter will be at least num_iterations/4 (or num_iterations/2 if it appears twice).
The main problem is that small factors occur with high probability even in random input (for example, the difference between two random numbers is divisible by 2 half the time). So you'll have to compensate for this: since up to 3/4 of your differences involved random numbers, roughly (3/8)*num_iterations of 2's counter must be ignored. Since this also applies to all powers of two, the simplest way is to pregenerate a "white noise mask" by taking differences only between random numbers.
EDIT: let's take this approach further. Suppose you create this "white noise mask" (let's call it a spectrum) from random numbers, and call it the base-1 spectrum, since the smallest "largest common factor" of random numbers is 1. Computing the same spectrum for the differences of an arithmetic sequence yields a base-R spectrum, where R is the rate, and it will be equivalent to a shifted version of the base-1 spectrum. So you have to find the value of R such that
your_spectrum ~= spectrum(1)*3/4 + spectrum(R)*1/4
You could also check for the largest number R such that at least half of the elements are equal modulo R.
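A minimal sketch of that final check for a single candidate R (names are mine; generating the candidates, e.g. from divisors of pairwise differences, is left out). All members of the original progression are congruent modulo the true difference, and more than half of the list is original:

    from collections import Counter

    def looks_like_difference(lst, r):
        """True iff at least half of the elements share a residue modulo r."""
        counts = Counter(x % r for x in lst)
        return max(counts.values()) * 2 >= len(lst)

    # scan candidate values of r from the largest downward and keep the first hit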