How is the complexity of this algorithm logarithmic?

Here is the code:
def intToStr(i):
    digits = '0123456789'
    if i == 0:
        return '0'
    result = ''
    while i > 0:
        result = digits[i % 10] + result
        i = i / 10   # integer division in Python 2; use i // 10 in Python 3
    return result
I understand that with logarithmic complexity you are essentially dividing the remaining work by some value on each iteration (as in the binary search algorithm, for example). However, in this example we are not really cutting the problem down by division; instead we remove one digit at a time. So by dividing i by 10 in i/10, we eliminate one digit per iteration. I can't really wrap my head around this algorithm... Is there a name for this algorithm so I can better understand why it is logarithmic?

The run time of this algorithm is linear with respect to the size (number of bits) of the input, so it's not logarithmic according to the usual definition. However, the run time is logarithmic with respect to the numerical value of the input, so it could be called "pseudo-logarithmic".
See also: Pseudo-polynomial time.

Well, let's look at the steps for 123:
i     result
123   ""
12    "3"     -- after the first iteration
1     "23"    -- after the second iteration
0     "123"   -- after the third iteration
For the number 123 we need 3 steps to convert it to a string. By doing further tests we see that the number of iterations is always equal to the number of digits of the number we want to convert. So for any n we can say that the algorithm needs floor(log10(n)) + 1 steps, which is O(log n) in Big O notation.
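As a small sanity check (my own addition, not part of the answer), you can count the loop iterations directly and compare them with the number of digits:

def int_to_str_steps(i):
    # counts the iterations of the while-loop in intToStr above
    steps = 0
    while i > 0:
        i //= 10
        steps += 1
    return steps

for n in (1, 9, 10, 123, 999, 1000, 123456):
    # the iteration count equals the number of digits, i.e. floor(log10(n)) + 1
    assert int_to_str_steps(n) == len(str(n))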
EDIT:
hammar's answer is much more informative on the details of the complexity (one could say he hit the nail right on the head, pun intended). If you want to know the complexity exactly and be able to refer to it correctly, you should look into his answer; otherwise I think "pseudo-logarithmic" fulfils your needs.

Related

a puzzle about definition of the time complexity

Wikipedia defines time complexity as
In computer science, the time complexity of an algorithm quantifies
the amount of time taken by an algorithm to run as a function of the
length of the string representing the input.
What is the meaning of the bold part?
I know an algorithm may be treated as a function, but why must its input be "the length of the string representing" it?
The function in the bold part means the time complexity of the algorithm, not the algorithm itself. An algorithm may be implemented in a programming language that has a function keyword, but that's something else.
Algorithm MergeSort has as input a list of 32m bits (assuming m 32-bit values). Its time complexity T(n) is a function of n = 32m, the input size, and in the worst case it is bounded from above by O(n log n). MergeSort could be implemented as a function in C or JavaScript.
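To make that concrete, here is a rough sketch (my own, in Python for brevity, not from the answer) of a merge sort instrumented to count comparisons, so T(n) becomes an actual function of the input length that you can watch grow roughly like n·log n:

def merge_sort(xs):
    # returns (sorted list, number of comparisons performed)
    if len(xs) <= 1:
        return xs, 0
    mid = len(xs) // 2
    left, cl = merge_sort(xs[:mid])
    right, cr = merge_sort(xs[mid:])
    merged, comparisons = [], cl + cr
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged += left[i:] + right[j:]
    return merged, comparisons

for n in (8, 64, 512, 4096):
    _, t = merge_sort(list(range(n, 0, -1)))
    print(n, t)   # the comparison count grows roughly like n * log2(n)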
The definition is derived from the context of Turing machines, where you define different states. Every function you can compute with a computer is also computable with a Turing machine (I would say that a computer computes a function on the basis of a Turing machine).
Every function is just a mapping from one domain to another domain (or to the same domain).
Before going to Turing machines, look at the concept of finite automata. A finite automaton has finitely many states. If your input is of length n, it may need only two states, but it has to visit those states n times, where n is the length of the string.
[Rough sketch of the automaton omitted.] Our final state is C, meaning that if we end up in C the string is accepted.
We want to check whether this string gets accepted by our automaton: the string is 010101010.
When we read a 0 in state A we move to B, and if we read another 0 we move to C; if we end there with a 0, our string gets accepted, otherwise we move back to A.
In a computer you represent numbers as strings of length n, and in order to compute with them you have to visit each character of the string.
Turing machines work in the same way, but a finite automaton is limited to regular languages; this is a big theory in itself.
Did you ever try to think about how a computer computes a function like 2*x, where x is your input?
It's fun :D. Suppose I want to double the number 4, and I represent numbers in the unary numeral system because it's easy: a number x is written as a block of x ones, so 4 is written 1111. You can think of a Turing machine, or a very simple computer, as a system with linear memory.
Suppose empty spots in your memory are represented with #.
With your input the memory looks something like ###1111####, where # means an empty slot, and the head of the Turing machine starts at the first 1. Repeatedly take the rightmost 1 of the original block, replace it with * (just a helper symbol that marks it as copied), and write a new 1 into the first empty slot on the right. Once the whole original block consists of *s, change all the *s back to 1s. Here is the trace; at the end you have 2*x, where x was your input.
The point is that the only thing these machines remember is the state.
#####1111#######
#####111*1######
#####11**11#####
#####1***111####
#####****1111###
#####11111111###
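Here is a small Python re-creation of that tape rewriting (my own sketch of the procedure described above, not a full Turing machine; it just replays the marking steps and assumes the tape has enough blank cells on the right):

def double_unary(tape):
    # doubles a block of 1s on a tape of '#' (blank) cells, as in the trace above
    tape = list(tape)
    end = max(i for i, c in enumerate(tape) if c == '1') + 1   # first blank after the input block
    write = end                                                # where the next copy is written
    while any(c == '1' for c in tape[:end]):
        mark = max(i for i in range(end) if tape[i] == '1')    # rightmost uncopied input 1
        tape[mark] = '*'                                       # mark it as copied
        tape[write] = '1'                                      # write its copy to the right
        write += 1
        print(''.join(tape))
    tape = ['1' if c == '*' else c for c in tape]              # turn the markers back into 1s
    print(''.join(tape))

double_unary('#####1111#######')   # prints the same intermediate tapes as the trace above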
Say we have an algorithm A. What is its time complexity?
The very easy case
If A has constant complexity, the input doesn't matter. The input could be a single value, or a list, or a map from strings to lists of lists. The algorithm will run for the same amount of time: 3 seconds, or 1000 ticks, or a million years, or whatever; a constant amount of time that does not depend on the input.
Not much complexity at all to be honest.
Adding complexity
Now let's say, for example, that A is an algorithm for sorting lists of integers. It's clear that the time needed by A now depends on the length of the list. A list of length 0 is sorted in practically no time (beyond checking the length of the list), but this changes as the length of the input list grows.
You could say there exists a function F that maps the list length to the seconds needed by A to sort a list of that length. But wait! What if the list is already sorted? So for simplicity let's always assume a worst-case scenario: F maps list length to the maximum number of seconds needed by A to sort a list of that length.
You could measure in seconds, CPU cycles, ticks, or whatever. It doesn't depend on the units.
Generalizing a bit
What about all the other algorithms? How do we measure the time complexity of an algorithm that cooks me a nice meal?
If you cannot define any input parameter then we're back in the easy case: constant time. If there is some input it is expressed as a string. So you have a length of that string. And - similar to what has been said above - then you have a function F that maps the length of the input (as a string) to the time needed by A to compute this input (in the worst case).
We call this F time complexity.
That's too simple
Yeah, I know. There is the average case and the best case, there is the big O notation and asymptotic complexity. But for explaining the bold part in the original question this is sufficient, I think.

Linear algorithm on binary strings

I'm going through some old midterms to study. (None of the solutions are given)
I've come across this problem which I'm stuck on
Let n = 2^ℓ − 1 for some positive integer ℓ. Suppose someone claims to hold an array A[1 .. n] of distinct ℓ-bit strings; thus, exactly one ℓ-bit string does not appear in A. Suppose further that the only way we can access A is by calling the function FetchBit(i, j), which returns the jth bit of the string A[i] in O(1) time.
Describe an algorithm to find the missing string in A using only O(n) calls to FetchBit.
The only thing I can think of is to go through each string, convert it to base 10, sort them all, and then see which value is missing. But that's certainly not O(n).
Proof it's not homework... http://web.engr.illinois.edu/~jeffe/teaching/algorithms/hwex/f12/midterm1.pdf
You can do it in 2n operations.
First, look at the first bit of every number. Obviously, you will get 2^(ℓ−1) zeros and 2^(ℓ−1) − 1 ones, or vice versa (because only one number is missing). If there are 2^(ℓ−1) − 1 ones, then you know that the first bit of the missing number is one; otherwise it is zero.
Now you know the first bit of the missing number. Let's look at all the numbers which have that same first bit (there are 2^(ℓ−1) − 1 of them) and repeat the same procedure with their second bit. This way you determine the second bit of the missing number, and so on.
The total number of FetchBit calls will be (2^ℓ − 1) + (2^(ℓ−1) − 1) + ... + (2^1 − 1) <= 2^(ℓ+1) <= 2n + 2 = O(n).
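A sketch of that bit-by-bit elimination in Python (my own illustration; fetch_bit is a stand-in for the problem's FetchBit(i, j), assumed to return the j-th bit of the i-th string):

def find_missing(fetch_bit, n, num_bits):
    candidates = list(range(n))   # indices still consistent with the bits found so far
    missing_bits = []
    for j in range(num_bits):
        zeros, ones = [], []
        for i in candidates:                       # one fetch_bit call per remaining string
            (zeros if fetch_bit(i, j) == 0 else ones).append(i)
        if len(zeros) < len(ones):                 # the smaller group is missing a member
            missing_bits.append('0')
            candidates = zeros
        else:
            missing_bits.append('1')
            candidates = ones
    return ''.join(missing_bits)

# Hypothetical usage: every 3-bit string except '101'.
strings = [format(v, '03b') for v in range(8) if v != 0b101]
print(find_missing(lambda i, j: int(strings[i][j]), len(strings), 3))   # 101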

searching through a vast collection of potential solutions

I have quite a difficult problem (perhaps even an NP-hard problem ^^): searching for a solution in a massive collection of results. Perhaps there is an algorithm for it.
The exercise below is artificial, but it is a perfect example to illustrate my issue.
There is a big array of integers. Let's say it has 100,000 elements.
int numbers[] = {-123,32,4,-234564,23,5,....}
I want to check, in a relatively quick way, whether the sum of any 2 numbers from this array is equal to 0. In other words, if the array contains "-123", I want to find out whether there is also a "123".
The easiest solution would be brute force: check everything against everything. That gives 100,000 x 100,000 comparisons, a big number ;-) Obviously the brute-force method can be optimised: order the numbers and check negatives against positives only. My question is: is there something better than optimised brute force for finding a solution?
First, sort the array by magnitude of the value.
Then, if the data contains a pair which satisfies the conditions you're after, it contains such a pair adjacent in the array. So just sweep through looking for adjacent pairs whose sum is 0.
Overall time complexity is O(n log n) for the sort; it could be O(n) if you use "cheating" sorts that are not based solely on comparisons. Clearly it can't be done in less than linear time, because in the worst case you can't do it without looking at all the elements. I think n log n is probably optimal in the decision-tree model of computing, but only because it "feels a bit like" the element uniqueness problem.
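A minimal sketch of that idea in Python (my own code, assuming the question's setup of an integer array):

def has_zero_pair_by_sorting(numbers):
    by_magnitude = sorted(numbers, key=abs)
    # If x and -x both occur, some adjacent pair in this order sums to 0.
    return any(a + b == 0 for a, b in zip(by_magnitude, by_magnitude[1:]))

print(has_zero_pair_by_sorting([-123, 32, 4, -234564, 23, 5, 123]))   # True
print(has_zero_pair_by_sorting([-123, 32, 4, -234564, 23, 5]))        # False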
Alternative approach:
Add the elements one at a time to a hash-based or tree-based container. Before adding each element, check whether its negative is present. If so, stop.
This is likely to be faster in the case where there are lots of suitable pairs, because you save the cost of sorting the whole data. That said, you could write a modified sort that exits early by checking for adjacent pairs as soon as any subset of the data is in its final order, but that's effort.
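The alternative approach could look like this (again just a sketch of the idea, using a Python set as the hash-based container):

def has_zero_pair_incremental(numbers):
    seen = set()
    for x in numbers:
        if -x in seen:        # the matching partner was added earlier
            return True
        seen.add(x)
    return False

print(has_zero_pair_incremental([-123, 32, 4, -234564, 23, 5, 123]))   # True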
Brute force would be an O(n^2) solution. You can certainly do better.
Off the top of my head, first sort it. Heap sort will have a complexity of O(n log n).
Now, for the first element, say a, you know you need to find an element b such that a + b = 0. This can be found using binary search (since your array is now sorted). Binary search has a complexity of O(log n).
Repeating this for every element gives you an overall solution of O(n log n) complexity.
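In Python, the sort-plus-binary-search approach might look roughly like this (a sketch under the same assumptions, using the bisect module for the binary search):

import bisect

def has_zero_pair_bisect(numbers):
    data = sorted(numbers)                       # O(n log n)
    for i, a in enumerate(data):
        j = bisect.bisect_left(data, -a)         # O(log n) search for -a
        if j < len(data) and data[j] == -a and j != i:
            return True                          # a and -a sit at different positions
    return False

print(has_zero_pair_bisect([-123, 32, 4, -234564, 23, 5, 123]))   # True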
The example you provided can be brute-force solved in O(n^2) time.
You can start by ordering the numbers (O(n·log n)) from smallest to biggest. If you place one pointer at the beginning (the "most negative" number) and another at the end (the "most positive"), you can check whether there is such a pair of numbers in an additional O(n) steps by following this procedure:
If the numbers at both pointers have the same absolute value, you have the solution
If not, move the pointer of the number with the bigger absolute value towards "zero" (that is, increase it if it is the pointer on the negative side, decrease it if it is the one on the positive side)
Repeat until you find a solution, or the pointers cross.
Total complexity is O(n·log n) + O(n) = O(n·log n).
Sort your array in descending order using Quicksort (the index names below assume the biggest values come first). After that, use two indices, let's call them positive and negative.
positive <- 0
negative <- size - 1
while ((array[positive] > 0) and (array[negative] < 0) and (positive < negative)) do
    delta <- array[positive] + array[negative]
    if (delta = 0) then
        return true
    else if (delta < 0) then
        negative <- negative - 1
    else
        positive <- positive + 1
    end if
end while
return (array[positive] * array[negative] = 0)
You didn't say what the algorithm should do if 0 is part of the array; I've supposed that in this case true should be returned.

Counting permutation of Strings

I need help with a problem. Given an input string with repetitions, say "aab", how do I count the number of distinct permutations of that string?
One formula that could be used is n! / (n1!·n2!·...·nr!).
However, calculating these ni's takes O(rn) time, or O(n) if we use a lookup table.
However, I need a solution without the use of such tables. Is any recursive or dynamic-programming solution possible for this problem?
Thanks in advance.
The number of distinct permutations will be n!/(c1!·c2!·...·cn!),
where n is the length of the string
and ck denotes the number of occurrences of each distinct character.
For example, for the string "aabb": n = 4, ca = 2, cb = 2,
so the solution is 4!/(2!·2!) = 6.
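For what it's worth, the formula translates almost directly into Python (a hedged illustration of the answer above, not an attempt at the table-free solution the question asks for):

from math import factorial
from collections import Counter

def distinct_permutations(s):
    counts = Counter(s)               # c_k for each distinct character
    result = factorial(len(s))        # n!
    for c in counts.values():
        result //= factorial(c)       # divide by each c_k!
    return result

print(distinct_permutations('aabb'))  # 6, matching 4!/(2!*2!)
print(distinct_permutations('aab'))   # 3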
If you want to do this for very large strings, consider using the gamma function (with gamma(n+1)=n!), which is faster for large n and still gives you floating-point accuracy even in cases where you would get an int overflow.
If you have arbitrary-precision arithmetic, you could probably push the effort down to O(r+n) by exploiting the fact that you can, e.g., write 1*2*3 * 1*2*3*4 * 1*2*3*4*5*6*7 as (1*2*3)^3 * 4^2 * 5*6*7. The end result will still have O(rn) digits and you'll still have an O(rn) time consumption, because multiplication cost increases with the size of the numbers.
I don't see the difference between lookup tables and dynamic programming - basically, dynamic programming uses a lookup table that you build on-the-fly. (i.e., use a lookup table, but only populate it on-demand).
Do you need approximate answers, or exact ones? Which part of this calculation do you think is slow?
If you need approximate answers, use the gamma function as @Yannick Versley suggested.
If you need exact answers, here is how I'd do it. I'd first figure out the prime factorization of the answer, then multiply those factors out. This avoids division. The hard part of figuring out the prime factorization is figuring out the prime factorization of n!. For that you can use a trick. Suppose that p is a prime, and k is the integer part of n/p. Then the number of times that p divides n! is k plus the number of times that p divides k!. Proceed recursively and it is quick to see that, for instance, the number of times that 3 is a factor of 80! is 26 + 8 + 2 = 36. So after you find the primes up to n, it isn't hard to find the prime factorization of n!.
Once you know the prime factorization, you can multiply it out. You expect to be dealing with large numbers, so try to arrange to do lots of small multiplications first, and only a few big ones. Here is a simple way to do that.
Make an array of the prime factors. Scramble it (to mix up big and small factors). Then, as long as you have at least 2 factors in your array, grab the first two, multiply them, and push the product onto the end. When you have one number left, that is your answer.
This should be much, much faster for large strings than the naive approach of multiplying the numbers one at a time. However in the end you will have very large numbers, and nothing can make multiplying those fast.
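Here is a rough sketch of that approach (my own code; the helper names are mine). For each prime it counts how often the prime divides n! using the trick above, subtracts the exponents contributed by the c_k!, and multiplies the surviving prime powers, so no division of big numbers is ever needed:

from collections import Counter

def primes_up_to(n):
    # simple sieve of Eratosthenes; good enough for this sketch
    sieve = [False, False] + [True] * max(n - 1, 0)
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def prime_exponent_in_factorial(n, p):
    # how many times the prime p divides n! (the recursive trick described above)
    e = 0
    while n:
        n //= p
        e += n
    return e

def distinct_permutations_by_factorization(s):
    n, counts = len(s), Counter(s)
    result = 1
    for p in primes_up_to(n):
        e = prime_exponent_in_factorial(n, p) - sum(
            prime_exponent_in_factorial(c, p) for c in counts.values())
        result *= p ** e
    return result

print(prime_exponent_in_factorial(80, 3))               # 36, as in the example above
print(distinct_permutations_by_factorization('aabb'))   # 6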
You can keep a running counts for each character, and build the result up as you go along. It's impossible to do better than O(n), since without looking at every character in the string you can't know how many of each character there are.
I've written some code in Python, with some simple unit tests. The code carefully avoids large intermediate values when the result is going to be small (in fact, the variable result is never larger than len(s) times the final result). If you were going to code this up in another language, say C, then you might use an array of size 256 rather than the defaultdict.
If you want an exact result, then I don't think you can do better than this.
from collections import defaultdict

def permutations(s):
    seen = defaultdict(int)
    for c in s:
        seen[c] += 1
    result = 1
    n = 0
    for k, count in seen.items():    # iteritems() / xrange() in the original Python 2
        for j in range(count):
            n += 1
            result *= n
            result //= j + 1         # always exact: result is a product of binomial coefficients
    return result

test_cases = [
    ('abc', 6),
    ('aab', 3),
    ('abcd', 24),
    ('aabb', 6),
    ('aaaaa', 1),
    ('a', 1)]

for s, want in test_cases:
    got = permutations(s)
    if got != want:
        print('permutations(%s) = %s want %s' % (s, got, want))
As @MRalwasser says, the number of permutations should be n!. You can generate those permutations fairly simply, but the run time is going to be exponential because you have to produce exponentially many output strings. (A quick way to see that n! grows at least as fast as 2^n is to use Stirling's formula.)

Finding a single number in a list [duplicate]

What would be the best algorithm for finding a number that occurs only once in a list in which all other numbers occur exactly twice?
So, in the list of integers (let's take it as an array) each integer appears exactly twice, except one. What is the best algorithm to find that one?
The fastest (O(n)) and most memory efficient (O(1)) way is with the XOR operation.
In C:
int arr[] = {3, 2, 5, 2, 1, 5, 3};
int num = 0, i;
for (i = 0; i < 7; i++)
    num ^= arr[i];    /* every value that appears twice cancels itself out */
printf("%i\n", num);
This prints "1", which is the only one that occurs once.
This works because the first time you hit a number it marks the num variable with itself, and the second time it unmarks num with itself (more or less). The only one that remains unmarked is your non-duplicate.
By the way, you can expand on this idea to very quickly find two unique numbers among a list of duplicates.
Let's call the unique numbers a and b. First take the XOR of everything, as Kyle suggested. What we get is a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask -- in more detail: choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b are in different buckets. We also know that each pair of duplicates is still in the same bucket. So we can now apply ye olde "XOR-em-all" trick to each bucket independently, and discover what a and b are completely.
Bam.
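A sketch of that two-unique-numbers trick in Python (my own illustration; it picks the lowest set bit of a^b as the mask, which is one valid choice of x):

def find_two_uniques(numbers):
    xor_all = 0
    for v in numbers:
        xor_all ^= v              # every duplicated pair cancels, leaving a ^ b
    mask = xor_all & -xor_all     # lowest set bit of a ^ b: a and b differ in this bit
    a = b = 0
    for v in numbers:
        if v & mask:
            a ^= v                # bucket where the chosen bit is 1
        else:
            b ^= v                # bucket where the chosen bit is 0
    return a, b

print(find_two_uniques([3, 2, 5, 2, 5, 9]))   # (3, 9)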
O(N) time, O(N) memory
HT = Hash Table
HT.Clear()
go over the list in order
for each item you see:
    if HT.Contains(item) -> HT.Remove(item)
    else                 -> HT.Add(item)
at the end, the item in the HT is the item you are looking for.
Note (credit @Jared Updike): this system will find all odd instances of items.
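In Python the same add/remove idea is a few lines, using a set as the hash table (a sketch assuming exactly one unpaired value):

def find_single(numbers):
    seen = set()
    for v in numbers:
        if v in seen:
            seen.remove(v)    # the second occurrence cancels the first
        else:
            seen.add(v)
    return seen.pop()         # only the unpaired value is left

print(find_single([3, 2, 5, 2, 1, 5, 3]))   # 1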
Comment: I don't see how people can vote up solutions that give you N log N performance. In which universe is that "better"?
I am even more shocked that you marked the accepted answer as the N log N solution...
I do agree, however, that if memory is required to be constant, then N log N would be (so far) the best solution.
Kyle's solution would obviously not catch situations where the data set does not follow the rules. If all numbers were in pairs, the algorithm would give a result of zero, the exact same value as if zero were the only value with a single occurrence.
If there were multiple values with a single occurrence, or triples, the result would be erroneous as well.
Testing the data set might well end up with a more costly algorithm, either in memory or in time.
Csmba's solution does detect some erroneous data (none or more than one value with a single occurrence), but not other kinds (quadruples). Regarding his solution, depending on the implementation of HT, either memory and/or time is more than O(n).
If we cannot be sure about the correctness of the input set, sorting and counting, or using a hash table that counts occurrences with the integer itself as the hash key, would both be feasible.
I would say that using a sorting algorithm and then going through the sorted list to find the number is a good way to do it.
And now the problem is finding "the best" sorting algorithm. There are a lot of sorting algorithms, each of them with its strong and weak points, so this is quite a complicated question. The Wikipedia entry seems like a nice source of info on that.
Implementation in Ruby:
a = [1,2,3,4,123,1,2,.........]
t = a.length - 1
for i in 0..t
  s = a.index(a[i]) + 1
  b = a[s..t]
  w = b.include? a[i]
  if w == false
    puts a[i]
  end
end
You need to specify what you mean by "best" - to some, speed is all that matters and would qualify an answer as "best" - for others, they might forgive a few hundred milliseconds if the solution was more readable.
"Best" is subjective unless you are more specific.
That said:
Iterate through the numbers, for each number search the list for that number and when you reach the number that returns only a 1 for the number of search results, you are done.
Seems like the best you could do is to iterate through the list, and for every item either add it to a list of "seen" items or remove it from the "seen" list if it's already there; at the end, your list of "seen" items will include the singular element. This is O(n) with regard to time and O(n) with regard to space (in the worst case; it will be much better if the list is sorted).
The fact that they're integers doesn't really factor in, since there's nothing special you can do with adding them up... is there?
Question
I don't understand why the selected answer is "best" by any standard. O(N*lgN) > O(N), and it changes the list (or else creates a copy of it, which is still more expensive in space and time). Am I missing something?
Depends on how large/small/diverse the numbers are though. A radix sort might be applicable which would reduce the sorting time of the O(N log N) solution by a large degree.
The sorting method and the XOR method have the same time complexity. The XOR method is only O(n) if you assume that bitwise XOR of two strings is a constant time operation. This is equivalent to saying that the size of the integers in the array is bounded by a constant. In that case you can use Radix sort to sort the array in O(n).
If the numbers are not bounded, then bitwise XOR takes time O(k) where k is the length of the bit string, and the XOR method takes O(nk). Now again Radix sort will sort the array in time O(nk).
You could simply put the elements in the set into a hash until you find a collision. In ruby, this is a one-liner.
def find_dupe(array)
  h = {}
  array.detect { |e| h[e] || (h[e] = true; false) }
end
So, find_dupe([1,2,3,4,5,1]) would return 1.
This is actually a common "trick" interview question though. It is normally about a list of consecutive integers with one duplicate. In this case the interviewer is often looking for you to use the Gaussian sum of n-integers trick e.g. n*(n+1)/2 subtracted from the actual sum. The textbook answer is something like this.
def find_dupe_for_consecutive_integers(array)
  n = array.size - 1   # subtract one from array.size because of the dupe
  array.sum - n * (n + 1) / 2
end
