def two_sum1?(array, value)
  array.sort! # O(n log n)
  array.each do |element|
    return true if bsearch(array - [element], value - element)
  end
  false
end

def bsearch(array, value)
  return false if array.empty?
  mid_idx = array.length / 2
  mid_value = array[mid_idx]
  return true if mid_value == value
  # search the right half when the middle value is too small, otherwise the left half
  mid_value < value ? bsearch(array[mid_idx+1..-1], value) : bsearch(array[0...mid_idx], value)
end
I'm trying to create a function that finds two distinct numbers in an array whose sum equals the value in the second argument. I believe my implementation has a time complexity of O(n log n). However, when I benchmark it against another function whose time complexity is also O(n log n), the measured times are wildly different (using the Benchmark gem) on the same input: my function takes about 0.9 seconds, while the other takes about 0.003 seconds. Is there an error in my algorithm analysis? Is my implementation not O(n log n)?
This is the other function:
def two_sum2?(arr, target_sum)
  arr = arr.sort
  arr.each_with_index do |el, i|
    match_idx = arr.bsearch_index { |el2| (target_sum - el) <=> el2 }
    return true if match_idx && match_idx != i
  end
  false
end
This is what I'm using to test the two functions:
arr = [0, 1, 5, 7] + [100] * 10000
puts Benchmark.measure { two_sum1?(arr, 6) }
puts Benchmark.measure { two_sum1?(arr, 8) }
puts Benchmark.measure { two_sum1?(arr, 10) }
puts Benchmark.measure { two_sum2?(arr, 6) }
puts Benchmark.measure { two_sum2?(arr, 8) }
puts Benchmark.measure { two_sum2?(arr, 10) }
No, it's O(n^2).
Slicing (array[0...mid_idx] and array[mid_idx+1..-1]) creates a new array on every call, so each bsearch is O(n), not O(log n).
Also, try rewriting bsearch from a recursive approach to an iterative one over index bounds; it also runs faster in practice.
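For illustration, here is a minimal sketch of that idea (in Python, since the principle is language-neutral): track a pair of index bounds instead of slicing, so each step is O(1) and the whole search stays O(log n).

def bsearch_iterative(array, value):
    lo, hi = 0, len(array) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # O(1) index arithmetic, nothing is copied
        if array[mid] == value:
            return True
        elif array[mid] < value:
            lo = mid + 1          # continue in the right half
        else:
            hi = mid - 1          # continue in the left half
    return False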
The algorithm you have created seems correct in general. Let's try to analyze its complexity. Sorting should be O(n log n) if implemented correctly. Now, assume for a moment that your bsearch is implemented correctly, meaning it has complexity O(log n); then the whole loop would be O(n log n), and the algorithm overall would still be O(n log n).
Now let's look at the implementation of bsearch. As I said, it is generally correct (although you should double-check how the middle index is selected when the array has an odd number of elements); the place where it fails is the array slicing. Whenever you slice an array, Ruby internally iterates over the elements from the start to the end of the slice and copies them into a new array. This breaks the complexity and makes each search O(n) instead of O(log n), and thus, because of the outer loop, the whole algorithm becomes O(n^2).
Is there a difference in the number of comparisons made by merge sort in different cases?
For example:
Case 1: dividing the array into two parts of equal size gives $T(n) = T(n/2) + T(n/2) + n/2 + n/2 - 1 = 2T(n/2) + n - 1$
Case 2: $T(n) = T(n/4) + T(3n/4) + n/4 + 3n/4 - 1 = T(n/4) + T(3n/4) + n - 1$
Since merging two sub-arrays (say of lengths m and n) requires at least m + n - 1 comparisons, I think the answer is yes, but I am not sure.
And what about dividing the array into $k$ sub-arrays in each iteration?
Is there an optimal division that gives the lowest number of comparisons?
I hope this is not a silly question, thanks!
You get the best possible worst-case performance from dividing the array into equal-size parts. Consider the opposite in the extreme case: letting one part be size 1 and the other n-1. That gives you linear recursion depth, and quadratic time.
You get n log n (plus/minus some constant) k-way comparisons if you split into k subarrays of size as close to n/k as possible, where log is the base-k logarithm. Note, however, that logarithms of different bases differ only by a constant factor, so you always get O(n log n) as long as k is a constant.
Update: If you do a k-way split, you need a k-way merge, and there are different ways to code that. The most natural way is perhaps to repeat n times: find which of the k subarrays has the smallest not-yet-picked element and pick that. This way, what you get is a total of n log_k n find-the-minimum-of-k-elements operations. For each find-the-minimum, you need to make k−1 compare operations between pairs of elements, so the total number of compares is (k−1)n log_k n. However, it's possible to organize k−1 compares so that you get more information than just which one is the minimum (otherwise selection sort would be optimal)! Is it possible to find a way to do the merge in a way that gets the number of compares down to the optimal n log_2 n that you get with k=2? Maybe, but it would probably be a very complicated merge procedure.
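To make the constant explicit (my own arithmetic, not part of the answer above): converting the base-k logarithm to base 2 gives $(k-1)\,n\log_k n = \frac{k-1}{\log_2 k}\,n\log_2 n$, and the factor $\frac{k-1}{\log_2 k}$ is 1 at $k=2$, about 1.26 at $k=3$, and 1.5 at $k=4$, so the naive find-the-minimum merge makes strictly more pairwise compares for every $k > 2$.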
I was curious about k-way merge sort, so I did an experiment, sorting 2^16 = 65536 shuffled numbers with:
Ordinary two-way merge sort.
k-way merge sorts (splitting into k equal-size parts instead of 2) and using a heap for the k-way merge. From k=2 up to k=65536.
Timsort (Python's builtin sorting algorithm), which is an optimized merge sort.
The tracked numbers of comparisons for each one:
two_way: 965,599 True
k_way(2): 965,599 True
k_way(4): 1,180,226 True
k_way(16): 1,194,042 True
k_way(256): 1,132,726 True
k_way(65536): 1,071,758 True
timsort: 963,281 True
Code:
from random import shuffle

def two_way(a):
    def merge(a, b):
        c = []
        i = j = 0
        m, n = len(a), len(b)
        while i < m and j < n:
            if b[j] < a[i]:
                c.append(b[j])
                j += 1
            else:
                c.append(a[i])
                i += 1
        c += a[i:] or b[j:]
        return c
    def sort(a):
        if i := len(a) // 2:
            return merge(sort(a[:i]), sort(a[i:]))
        return a[:]
    return sort(a)

def k_way(a, k):
    from heapq import merge
    def sort(a):
        if m := len(a) // k:
            return list(merge(*(sort(a[j*m:(j+1)*m]) for j in range(k))))
        return a[:]
    return sort(a)

class Int:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.value < other.value
    def __repr__(self):
        return str(self.value)

def is_sorted(a):
    return all(x < y for x, y in zip(a, a[1:]))

def report(label, result):
    print(f'{label:13} {comparisons:9,}', is_sorted(result))

a = list(map(Int, range(2**16)))
shuffle(a)

comparisons = 0
report('two_way:', two_way(a[:]))

for e in 1, 2, 4, 8, 16:
    k = 2**e
    comparisons = 0
    report(f'k_way({k}):', k_way(a[:], k))

comparisons = 0
report('timsort:', sorted(a))
def can_sum(from_: list[int], into: int) -> bool:
    divmod_table = [divmod(into, val) for val in from_]
    for quotient, modulus in divmod_table:  # O(n)
        if not modulus:
            return True
        elif any(not (modulus % val) for val in from_):
            return True
    for val in from_:  # O(n) or O(nlog(n))?
        if into - val <= 0:
            continue
        elif can_sum(from_, into - val):
            return True
    return False
I tried to find an analogous algorithm implementation using this logic, but there doesn't seem to be one. From my simple analysis I can see that the first for-loop is O(n) at worst, since the desired number may neither be (i) divisible by any of the selected numbers nor (ii) have a modulus that is decomposable by any of the selected numbers, but what confuses me is the second for-loop.
In the second for-loop, the worst case extends beyond O(n) because of the recursion, and it only runs when the first loop is unsuccessful, which then leads to higher orders of recursion. Maybe I'm misunderstanding time complexity, but the reduction from a desired sum n into a sequence s_k depends on both k and n.
What is the relation between these two variables in terms of time complexity?
I need to find whether or not two numbers in an array sum to a specific value.
def sum?(array, num)
end
sum?([1, 2, 3], 5) #=> true
sum?([1, 2, 3], 6) #=> false
Is there a solution using a set whose order is less than N^2? Also, is there a way to get under N^2 without using a set?
With a set, lookup of a specific number is approximately constant. For each number in the array, there is only one other number that can make the sum work. Scanning the array is linear.
require 'set'
def sum?(array, num)
  set = array.to_set
  array.any? { |x| set.include?(num - x) }
end
The set conversion is approximately linear (N inserts), the any? is linear since the worst case iterates over each element once, and each element has an approximately constant lookup in the set. So overall linear.
This method fails if num is even and the array contains a single copy of num/2. That is another linear check, which I'll leave as an exercise.
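If you want to spell that exercise out, one possible version of the extra check (a sketch in Python for illustration, not this answer's own code): num/2 can only form a valid pair with itself if it occurs at least twice.

def half_check(array, num):
    # list.count scans the array once, so this extra check is linear
    return num % 2 == 0 and array.count(num // 2) >= 2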
The naive strategy here is pretty simple:
def sum?(array, n)
  array.reject do |v|
    v >= n
  end.combination(2).any? do |a, b|
    a + b == n
  end
end
Using a Set seems pretty heavy-handed, as you'd have to compute the entire set of all possible sums. For repeated tests it would be more efficient, since you only need to compute the set once, but for single tests it's less efficient. This really depends on your use case.
For example:
def sum?(array, n)
  Set.new(
    array.reject do |v|
      v >= n
    end.combination(2).map do |a, b|
      a + b
    end
  ).include?(n)
end
You could try benchmarking that to see what the performance is like with longer lists of numbers.
You asked for a set-based solution. If you want to avoid imports, you could use Ruby's Enumerable#uniq (also implemented in Array), which creates a new array without duplicate entries. The next step is to determine the minimum value of your array and subtract it from the target sum; the result is the maximum value any candidate may have, so you can reject the others. If you also sort the array, you can test membership with a binary search (Array#bsearch) instead of a linear scan. All of these operations are O(N) or O(N log N) according to the C code in the Ruby docs. In the end you can use Enumerable#any?, as in your preferred solution.
Here goes some untested code:
def sum?(array, num)
  rest = array.uniq.sort
  max_allowed = num - rest.min
  rest.reject! { |value| value > max_allowed }
  # Array#bsearch keeps each membership test at O(log N)
  rest.any? { |value| rest.bsearch { |x| (num - value) <=> x } }
end
For the calculation:
array.uniq is pretty much the same as a transformation into a set, so let's call it O(N).
sorting is O(N log N)
calculating max_allowed is O(N) + O(1)
rejection is O(N)
any? with a binary search per element is O(N log N)
which makes the result O(N log N) + 3 * O(N) + O(N log N); according to big-O notation this is O(N log N), which is less than O(N^2).
I have an unordered list of n items, and I'm trying to find the most frequent item in that list. I wrote the following code:
def findFrequant(L):
    half = len(L) // 2
    for i in L:
        count = 0  # reset the counter for each candidate
        for j in L:
            if i == j:
                count += 1
        if count > half:
            return "The majority vote is {0}".format(i)
    return "mixed list!"
Obviously this procedure with the two loops is O(n^2), and I'm trying to accomplish the same task in O(n log n) time. I'm not looking for a fix or someone to write the code for me, I'm simply looking for directions.
I don't recognize the language here, so I'm treating it as pseudocode.
This depends on a hashtable whose keys have the element type of L and whose values are ints. Count each element into the hashtable, then iterate over the hashtable as a normal collection of (key, value) pairs, applying the usual find-the-maximum algorithm.
This is O(n): slightly worse than strictly linear in practice, since a good hash lookup is not truly constant but may be approximated as constant. Linear space is used.
def findFrequant(L):
    counts = {}
    vc = 0
    vv = None
    # first pass: count every element, O(n)
    for i in L:
        if i in counts:
            counts[i] = counts[i] + 1
        else:
            counts[i] = 1
    # second pass: find the unique maximum count, O(n)
    for k, v in counts.items():
        if v > vc:
            vv = k
            vc = v
        elif v == vc:
            vv = None  # tie for the maximum
    if vv is None:
        return "mixed list!"
    else:
        return "The majority vote is {0}".format(vv)
You could use a Merge Sort, which has a worst case time complexity of O(n log(n)) and a binary search which has a worst case time complexity of O(log(n)).
There are sorting algorithms with faster best cases, for example bubble sort, which has a best case of O(n), but merge sort performs in O(n log(n)) at its worst, while bubble sort has a worst case of O(n^2).
Pessimistic as we Computer Scientists are, we generally analyze based on the worst-case scenario. Therefore a combination of merge sort and binary search is probably best in your case.
Note that in certain circumstances a Radix sort might perform faster than merge sort, but this is really dependent on your data.
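As a sketch of how those two pieces could fit together (my reading of the suggestion above, not this answer's own code): after sorting, a strict majority element must occupy the middle index of the sorted list, and two binary searches count its occurrences.

from bisect import bisect_left, bisect_right

def majority_vote(L):
    s = sorted(L)                  # merge sort / Timsort: O(n log n)
    candidate = s[len(s) // 2]     # a strict majority element must sit here
    # count the candidate's occurrences with two binary searches: O(log n)
    count = bisect_right(s, candidate) - bisect_left(s, candidate)
    if count > len(L) // 2:
        return "The majority vote is {0}".format(candidate)
    return "mixed list!"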
I have been asked to write an algorithm for this problem: we are given an array A and we want to know whether there are two elements U and L in the array such that U + L = K.
I wrote my algorithm like this:
while (first < last)
{
    if (arr[first] + arr[last] == k)
        return true;
    else if (arr[first] + arr[last] < k)
        last = last - 1;
    else
        first++;
}
return false;
But what is the running time of this algorithm? Is it O(n log n)? If yes, why? And if not, how can I implement it in O(n log n)?
Running time of your algorithm is O(N) since in the worst case, you just end up iterating over the whole array.
Your algorithm would not solve the problem as written, though. For example, consider {9, 1, 3, 4, 2}: if k were 12, it would return false. The input array must be sorted before being passed to the algorithm, which takes O(N log N) in the worst case. Note also that your two branches are inverted: when the sum is less than k you should advance first (to grow the sum), and when it is greater you should decrease last.
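A sketch of the repaired version (in Python for illustration; sorting first, and with the branches the right way around):

def two_sum_sorted(arr, k):
    arr = sorted(arr)              # O(N log N)
    first, last = 0, len(arr) - 1
    while first < last:            # single O(N) two-pointer scan
        s = arr[first] + arr[last]
        if s == k:
            return True
        elif s < k:
            first += 1             # sum too small: take a larger element
        else:
            last -= 1              # sum too large: take a smaller element
    return False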
A much faster overall solution, however, is to use something like a HashMap, which solves the problem in O(N) time.
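A minimal sketch of that hash-based idea (in Python; a set plays the role of the HashMap here):

def two_sum_hash(arr, k):
    seen = set()
    for x in arr:
        if k - x in seen:          # complement seen earlier in the array
            return True
        seen.add(x)
    return False                   # no pair sums to k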
Here is a small example of the algorithm in Python where the result is False even though two elements in the list satisfy U + L = k:
def testArray(a, k):
    first = 0
    last = len(a) - 1
    while first < last:
        print(first, last)
        if a[first] + a[last] == k:
            return True
        elif a[first] + a[last] < k:
            last = last - 1   # (inverted branch, as in the question)
        else:
            first = first + 1
    return False

a = [3, 1, 5, 3, 6]
print(testArray(a, 6))