There are shuffle algorithms like Fisher-Yates. They take an array and return it with its elements in random order. This runs in O(n).
What I'm trying to do is to implement a prioritized left-shuffle algorithm. What does that mean?
Prioritized: It does not take an array of values. It takes an array of value-probability pairs. E.g. [ (1, 60), (2, 10), (3, 10), (4, 20) ]. Value 1 has 60%, value 2 has 10%, ...
left-shuffle: The higher the probability of a value, the higher its chance of ending up far to the left in the array.
Let's take this example [ (1, 10), (2, 10), (3, 60), (4, 20) ]. The most probable result should be [ 3, 4, 1, 2 ] or [ 3, 4, 2, 1 ].
I tried implementing this, but I haven't found any solution in O(n).
Here is an O(n^2) attempt in pseudocode, based on Fisher-Yates:
sum = 100  # 100%
for i = 0 to n-2:
    r = random value between 0 and sum
    localsum = 0
    for j = i to n-1:
        localsum = localsum + pair[j].Probability
        if localsum >= r + 1:
            swap(i, j)
            break
    sum = sum - pair[i].Probability
What would probably improve this a bit: sorting the elements by decreasing probability right at the beginning, to minimize the number of swaps and the iterations in the inner loop.
Is there a better solution (maybe even in O(n))?
Update of my first answer:
I've found a paper that introduces 'Roulette-wheel selection via stochastic acceptance', which runs in O(1) per selection. This brings the whole algorithm to O(n), and it is simple to implement:
from random import randint
from random import random
import time

data = [ (1, 10), (2, 10), (3, 60), (4, 20) ]

def swap(i, j, array):
    array[j], array[i] = array[i], array[j]

def roulette_wheel_selection(data, start, max_weight_limit):
    while True:
        r = random()
        r_index = randint(start, len(data) - 1)
        if r <= data[r_index][1] / max_weight_limit:
            return r_index

def shuffle(data, max_weight):
    data = data.copy()
    n = len(data)
    for i in range(n-1):
        r_index = roulette_wheel_selection(data, i, max_weight)
        swap(i, r_index, data)
    return data

def performance_test(iterations, data):
    start = time.time()
    max_weight = max([item[1] for item in data])
    for i in range(iterations):
        shuffle(data, max_weight)
    end = time.time()
    print(len(data), ': ', end - start)
    return end - start

performance_test(1000, data)

data2 = []
for i in range(10):
    data2 += data
performance_test(1000, data2)

data3 = []
for i in range(100):
    data3 += data
performance_test(1000, data3)

data4 = []
for i in range(1000):
    data4 += data
performance_test(1000, data4)
Performance Output
4 : 0.09153580665588379
40 : 0.6010794639587402
400 : 5.142168045043945
4000 : 50.09365963935852
So it's linear time in n (the data size). Compared to my first answer, I changed the constant from the "updated sum" to the "maximum weight of all data items". The runtime still depends on that max_weight constant, though: if someone has a strategy to update max_weight in a proper way as items are picked, the performance would increase.
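One possible strategy, sketched below, is to keep the remaining weights in a max-heap with lazy deletion, so the acceptance bound shrinks as heavy items are picked. MaxTracker and shuffle_tracked are names made up for this illustration (not from the paper), and the sketch reuses swap() and roulette_wheel_selection() from the code above. The heap adds O(log n) work per pick, so this variant is O(n log n) overall rather than O(n), but it avoids the low acceptance rates a stale maximum causes:

import heapq
from collections import Counter

class MaxTracker:
    # Hypothetical helper: tracks the maximum of the remaining weights
    # using a max-heap (negated values) with lazy deletion.
    def __init__(self, weights):
        self.heap = [-w for w in weights]
        heapq.heapify(self.heap)
        self.removed = Counter()

    def remove(self, w):
        self.removed[w] += 1

    def current_max(self):
        # discard heap tops that were already removed
        while self.removed[-self.heap[0]] > 0:
            self.removed[-self.heap[0]] -= 1
            heapq.heappop(self.heap)
        return -self.heap[0]

def shuffle_tracked(data):
    # reuses swap() and roulette_wheel_selection() from the code above
    data = data.copy()
    tracker = MaxTracker([w for _, w in data])
    for i in range(len(data) - 1):
        r_index = roulette_wheel_selection(data, i, tracker.current_max())
        swap(i, r_index, data)
        tracker.remove(data[i][1])
    return data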
There’s a way to do this in time O(n log n) using augmented binary search trees. The idea is the following. Take the items you want to shuffle and add them into a binary search tree, each annotated with their associated weights. Then, for each node in the BST, calculate the total weight of all the nodes in the subtree rooted at that node. For example, the weight of the root node will be 1 (sum of all the weights, which is 1 because it’s a probability distribution), the sum of the weight of the left child of the root will be the total weight in the left subtree, and the sum of the weights in the right child of the root will be the total weight of the right subtree.
With this structure in place, you can in time O(log n) select a random element from the tree, distributed according to your weights. The algorithm works like this. Pick a random number x, uniformly, in the range from 0 to the total weight left in the tree (initially 1, but as items are picked this will decrease). Then, start at the tree root. Let L be the weight of the tree’s left subtree and w be the weight of the root. Recursively use this procedure to select a node:
If x < L, move left and recursively select a node from there.
If L ≤ x < L + w, return the root.
If L + w ≤ x, set x := x - L - w and recursively select a node from the right subtree.
This technique is sometimes called roulette wheel selection, in case you want to learn more about it.
Once you’ve selected an item from the BST, you can then delete that item from the BST to ensure you don’t pick it again. There are techniques that ensure that, after removing the node from the tree, you can fix up the weight sums of the remaining nodes in the tree in time O(log n) so that they correctly reflect the weights of the remaining items. Do a search for augmented binary search tree for details about how to do this. Overall, this means that you’ll spend O(log n) work sampling and removing a single item, which summed across all n items gives an O(n log n)-time algorithm for generating your shuffle.
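To make this concrete, here is a minimal Python sketch of the select-and-remove idea. It uses a Fenwick (binary indexed) tree over the weight array rather than an explicit augmented BST; the O(log n) weighted selection and the O(log n) weight fix-up after a removal are exactly the operations described above. Integer weights are assumed so the running total stays exact, and weighted_shuffle is a name chosen for this sketch:

import random

def weighted_shuffle(pairs):
    # pairs: list of (value, integer weight)
    n = len(pairs)
    tree = [0] * (n + 1)  # 1-based Fenwick tree over the weights

    def add(i, delta):  # add delta to the weight of item i (0-based)
        i += 1
        while i <= n:
            tree[i] += delta
            i += i & -i

    def select(x):  # 0-based index of first prefix with cumulative weight > x
        pos, step = 0, 1
        while step * 2 <= n:
            step *= 2
        while step:
            if pos + step <= n and tree[pos + step] <= x:
                x -= tree[pos + step]
                pos += step
            step //= 2
        return pos

    total = 0
    for i, (_, w) in enumerate(pairs):
        add(i, w)
        total += w

    result = []
    for _ in range(n):
        idx = select(random.random() * total)   # roulette wheel pick
        value, w = pairs[idx]
        result.append(value)
        add(idx, -w)                            # "delete" the picked item
        total -= w
    return result

print(weighted_shuffle([(1, 10), (2, 10), (3, 60), (4, 20)]))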
I’m not sure whether it’s possible to improve upon this. There is another algorithm for sampling from a discrete distribution called Vose’s alias method which gives O(1)-time queries, but it doesn’t nicely handle changes to the underlying distribution, which is something you need for your use case.
I've found a paper that introduces 'Roulette-wheel selection via stochastic acceptance', which runs in O(1) per selection. This brings the whole algorithm to O(n), and it is simple to implement:
from random import randint
from random import random

data = [ (1, 10), (2, 10), (3, 60), (4, 20) ]

def swap(i, j, array):
    array[j], array[i] = array[i], array[j]

def roulette_wheel_selection(data, start, sum):
    while True:
        r = random()
        r_index = randint(start, len(data) - 1)
        if r <= data[r_index][1] / sum:
            return r_index

def shuffle(data):
    data = data.copy()
    n = len(data)
    sum = 100.0
    for i in range(n-1):
        r_index = roulette_wheel_selection(data, i, sum)
        swap(i, r_index, data)
        sum = sum - data[i][1]
    return data

for i in range(10):
    print(shuffle(data))
Output
[(3, 60), (4, 20), (2, 10), (1, 10)]
[(3, 60), (1, 10), (4, 20), (2, 10)]
[(3, 60), (1, 10), (4, 20), (2, 10)]
[(3, 60), (4, 20), (1, 10), (2, 10)]
[(3, 60), (4, 20), (2, 10), (1, 10)]
[(3, 60), (4, 20), (2, 10), (1, 10)]
[(3, 60), (4, 20), (2, 10), (1, 10)]
[(4, 20), (3, 60), (1, 10), (2, 10)]
[(3, 60), (2, 10), (4, 20), (1, 10)]
[(4, 20), (3, 60), (2, 10), (1, 10)]
Notice: for best performance, roulette_wheel_selection should use the current maximum weight p_max of the remaining items in every iteration instead of sum. I use sum because it is easy to compute and update.
The 'Roulette-wheel selection via stochastic acceptance' answer of @StefanFenn technically answers my question.
But it has a disadvantage:
The maximum in the algorithm is only calculated once; calculating it more often would make the performance worse than O(n). If there are priorities like [100,000,000, 1, 2, 3], the algorithm probably needs only one iteration through the while loop of roulette_wheel_selection while it is picking the value with priority 100,000,000, but millions of iterations per pick as soon as 100,000,000 has been picked, because the remaining acceptance probabilities are tiny.
So I want to show you a very short O(n*log(n)) solution I've found that does not depend on how large the priorities themselves are (C# code):
var n = elements.Count;
var shuffled = Enumerable.Range(0, n)
    .OrderByDescending(k => Math.Pow(_rng.NextDouble(), 1.0 / elements[k].Priority))
    .Select(i => elements[i].Value);
Description: Based on the collection of n prioritized elements, we create a new collection with the values 0, 1, ..., n-1. For each of them, we call Math.Pow to compute a key and order the values descending by that key (because we want the values with higher priorities on the left, not the right). Now we've got the indices 0, 1, ..., n-1, but in a prioritized/weighted random order. In the last step, we select the values based on the order of these indices.
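For comparison, here is the same key trick in a minimal Python sketch (this is Efraimidis-Spirakis weighted random sampling; the data pairs below just reuse the example from the question):

import random

data = [(1, 10), (2, 10), (3, 60), (4, 20)]

# Each item gets the key u ** (1 / priority) with u uniform in [0, 1);
# sorting by the key in descending order yields a weighted left-shuffle.
shuffled = [value for value, priority in
            sorted(data,
                   key=lambda vp: random.random() ** (1.0 / vp[1]),
                   reverse=True)]
print(shuffled)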
I am trying to design an algorithm that finds the indices of two equal elements in an array. The input is an array and the output is two indices i and j such that array[i] = array[j].
The time complexity must be O(n log n).
Here is what I tried:
let i = 0 to size_of_array {
    let j = i+1 to size_of_array {
        if array[j] = array[i] {
            print(i, j)
        }
    }
}
A nested loop is O(n^2), but if I try to design it like this, what would the time complexity be?
n is the size of the array.
My implementation would run (n-1) + (n-2) + (n-3) + ... + 1 = n(n-1)/2 comparisons in total. Is that still O(n^2)? Someone told me it is O(n log n); why?
You can keep two arrays: one with the values (A) and one with the indices (I). A possible O(n log n) algorithm is:
Sort the values array A, moving the entries of the index array I in parallel. (Time complexity: O(n log n))
Scan A and compare every element with its right neighbor; if a duplicate is found, return the corresponding indices in I. (Time complexity: O(n))
I implemented this idea in a python function:
import operator

def repeatedNumber(A):
    if len(A) <= 1:
        return -1
    # build the indices array
    indices = range(len(A))
    # join the two arrays into (value, index) pairs
    zipped = zip(A, indices)
    # sort the pairs by value
    zipped = sorted(zipped, key=operator.itemgetter(0))
    # scan the array and compare every pair of neighbors
    for i in range(len(zipped) - 1):
        if zipped[i][0] == zipped[i + 1][0]:
            return zipped[i][1], zipped[i + 1][1]
    return -1
You can try with some examples:
For A = [2, 3, 5, 2, 6] it gives (0, 3)
For A = [2, 3, 100, 6, 15, 40, 7, 3] it gives (1, 7)
As you know, the time complexity of your algorithm is O(n^2). To get a better result you can sort the array first and then find the indices.
If you sort the array, two entries with the same value end up beside each other. Hence, you can iterate over the sorted array, compare each pair of neighbors, and report the original indices of the two equal neighbors you find.
The sorting takes O(n log n) and iterating over the array takes O(n). Hence, this algorithm is O(n log n).
You could use a map for inverse lookup
function findSame(theValues) {
    var inverseLookup = {};
    var theLength = theValues.length;
    for (var i = 0; i < theLength; ++i) {
        if (inverseLookup[theValues[i]] != undefined) {
            return [inverseLookup[theValues[i]], i];
        }
        inverseLookup[theValues[i]] = i;
    }
    return [];
}

console.log(findSame([1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]));
The for loop takes O(n) time and each inverseLookup access takes O(1) time, at the cost of O(n) space for the hash table.
Given a list of integers in sorted order, say, [-9, -2, 0, 2, 3], we have to square each element and return the result in a sorted order. So, the output would be: [0, 4, 4, 9, 81].
I could figure out two approaches:
O(N log N) approach - We insert the square of each element into a list, sort it, and then return it.
O(N) approach - If there is a bound on the input elements (say -100 to 100), then we can create a count list of size 10001 (for the possible squares 0 to 10000). For each input element, we increment the count of its square; e.g., for 9 in the input, we increment index 81 in the count list. Then we traverse the count list and append each index to the result list as many times as it was counted (see the sketch after this list). Note that this makes an assumption: that there is a bound on the input elements.
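A minimal sketch of the bounded O(N) approach, using counts so duplicate squares are preserved (the bound of 100 and the function name sorted_squares_bounded are assumptions for illustration):

def sorted_squares_bounded(nums, bound=100):
    # counts[s] = how many input elements square to s; squares lie in 0..bound*bound
    counts = [0] * (bound * bound + 1)
    for x in nums:
        counts[x * x] += 1
    result = []
    for s, c in enumerate(counts):
        result.extend([s] * c)
    return result

print(sorted_squares_bounded([-9, -2, 0, 2, 3]))  # [0, 4, 4, 9, 81]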
Is there some way in which we could do it in O(n) time even without assuming any bounds for the input?
Well, I can think of an O(n) approach:
Split the input into 2 lists: one with the negative numbers, let's call it list A, and one with the positive numbers and 0, list B. This is done while preserving the input order, which is trivial: O(n)
Reverse list A. We do this because, once squared, the greater-than relation between the elements is flipped.
Square every item of both lists in place: O(n)
Run a merge operation not unlike that of a merge sort: O(n)
Total: O(n)
Done :)
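A minimal Python sketch of these four steps, assuming the input list is sorted (the function name is mine):

def sorted_squares_merge(nums):
    # Step 1: split while preserving order, negatives vs. non-negatives
    negatives = [x for x in nums if x < 0]
    rest = [x for x in nums if x >= 0]
    # Steps 2-3: reverse the negatives so their squares ascend, then square
    a = [x * x for x in reversed(negatives)]
    b = [x * x for x in rest]
    # Step 4: standard merge of two sorted lists
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    result.extend(a[i:])
    result.extend(b[j:])
    return result

print(sorted_squares_merge([-9, -2, 0, 2, 3]))  # [0, 4, 4, 9, 81]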
Is there some way in which we could do it in O(n) time even without assuming any bounds for the input?
Absolutely.
Since the original list is already sorted you are in luck!
given two numbers x and y
if |x| > |y| then x^2 > y^2
So all you have to do is split the list into two parts, one with all the negative numbers and the other with all the non-negative ones.
Reverse the negative one and make its elements positive.
Then you merge those two lists into one, as in the merge step of a merge sort. This runs in O(n) since both lists are sorted.
From there you can just square each element as you place it into the new list.
We can achieve this with the two-pointer technique: one pointer at the start and the other at the end. Compare the two squares, move the pointers inward accordingly, and place the larger square at the end of the new list, filling it from the back.
Time = O(n)
Space = O(n)
Can you do it in place, to reduce the space complexity?
This can be done with O(n) time and space. We need two pointers. The following is the Java code:
public int[] sortedSquares(int[] A) {
    int i = 0;
    int j = A.length - 1;
    int[] result = new int[A.length];
    int count = A.length - 1;
    while (count >= 0) {
        if (Math.abs(A[i]) > Math.abs(A[j])) {
            result[count] = A[i] * A[i];
            i++;
        } else {
            result[count] = A[j] * A[j];
            j--;
        }
        count--;
    }
    return result;
}
Start from the ends, compare the absolute values, and then build the answer from the back:
class Solution {
    public int[] sortedSquares(int[] nums) {
        int left = 0;
        int right = nums.length - 1;
        int index = nums.length - 1;
        int[] result = new int[nums.length];
        while (left <= right) {
            if (Math.abs(nums[left]) > Math.abs(nums[right])) {
                result[index] = nums[left] * nums[left];
                left++;
            } else {
                result[index] = nums[right] * nums[right];
                right--;
            }
            index--;
        }
        return result;
    }
}
With the naive approach (square everything, then sort) this question is very easy, but it requires O(n log n) complexity.
To solve this question in O(n), the two-pointer method is the best approach:
Create a new result array with the same length as the given array, and keep an index pointer at its last position.
Put one pointer at the start of the array and another at the end; the element with the largest absolute value must be at one of the two ends.
[-9, -2, 0, 2, 3]
Compare the absolute values of -9 and 3. If the left one is larger, store its square in the result array, decrease the result index, and advance the left pointer; otherwise store the right one's square and move the right pointer inward.
Python3 solution. It works in place, so the extra space is O(1); note, though, that list.insert(0, ...) and pop shift elements internally, so the running time is not strictly O(N).
def sorted_squArrres(Arr: list) -> list:
    i = 0
    j = len(Arr) - 1
    while i < len(Arr):
        if Arr[i]*Arr[i] < Arr[j]*Arr[j]:
            Arr.insert(0, Arr[j]*Arr[j])
            Arr.pop(j+1)
            i += 1
            continue
        if Arr[i]*Arr[i] > Arr[j]*Arr[j]:
            Arr.insert(0, Arr[i]*Arr[i])
            Arr.pop(i+1)
            i += 1
            continue
        else:
            if i != j:
                Arr.insert(0, Arr[j]*Arr[j])
                Arr.insert(0, Arr[j+1]*Arr[j+1])
                Arr.pop(j+2)
                Arr.pop(i+2)
                i += 2
            else:
                Arr.insert(0, Arr[j]*Arr[j])
                Arr.pop(j+1)
                i += 1
    return Arr

X = [[-4,-3,-2,0,3,5,6], [1,2,3,4,5], [-5,-4,-3,-2,-1], [-9,-2,0,2,3]]
for i in X:
    # loop over different kinds of inputs
    print(sorted_squArrres(i))
# outputs
'''
[0, 4, 9, 9, 16, 25, 36]
[1, 4, 9, 16, 25]
[1, 4, 9, 16, 25]
[0, 4, 4, 9, 81]
'''
I have a situation where I want to detect "outliers" in a supposedly sorted sequence. Elements that break the order are considered suspicious.
For example the sequence 1, 2, 3, 4, 7, 5, 6, 8, 9 is not sorted, but if you remove the 7 you get the sorted sequence 1, 2, 3, 4, 5, 6, 8, 9. This is also true if you remove the 5 and the 6, but that removes more elements than just the 7. (Also, once a sequence is sorted, you can remove arbitrary elements and it stays sorted.)
Is there an efficient algorithm for doing this? Is there an algorithm that finds all equally good solutions?
The latter is relevant, for example, for the sequence 1, 3, 2, 4: you could remove the 3 to get a sorted sequence, but you could also remove just the 2 (both solutions are equally good, since each removes only one element).
This can be done in O(n²) with dynamic programming or memoized recursion. If foo(n, m) represents the maximum length of a sorted subsequence from index n onward, when the index of the last element added was m, then the recursive function is:
int foo(int n, int m) {
    int res = 0;
    // you can add this number to the current sequence
    // if it is greater than the previous element in the sequence
    // (seq is the array containing the numbers)
    if (seq[n] >= seq[m]) {
        // 1 because we added this element;
        // the second argument is n because n is now the last element added
        res = 1 + foo(n + 1, n);
    }
    // you can always skip the current element; in that case m remains the same
    res = max(res, foo(n + 1, m));
    return res;
}
You will need to handle the corner cases (index equal to the array length) and add memoization to make it work, but I will leave that to you. Also, the Wikipedia page on longest increasing subsequence has an even faster implementation.
This problem is equivalent to finding the longest increasing (more precisely, non-decreasing) subsequence: the minimum number of elements to remove is n minus the length of that subsequence.
Ref for the O(n^2) implementation: http://www.geeksforgeeks.org/dynamic-programming-set-3-longest-increasing-subsequence/
and http://www.geeksforgeeks.org/longest-monotonically-increasing-subsequence-size-n-log-n/ for the O(n log n) implementation.
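For illustration, a minimal Python sketch of the O(n log n) variant from the second link, using the standard tails array with binary search (bisect_right so that equal neighbors are allowed, i.e. non-decreasing); the number of elements to remove is n minus the subsequence length:

import bisect

def min_removals_to_sorted(seq):
    # tails[k] = smallest possible tail of a non-decreasing subsequence of length k+1
    tails = []
    for x in seq:
        pos = bisect.bisect_right(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(seq) - len(tails)

print(min_removals_to_sorted([1, 2, 3, 4, 7, 5, 6, 8, 9]))  # 1 (remove the 7)
print(min_removals_to_sorted([1, 3, 2, 4]))                 # 1 (remove 3 or 2)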
Given a matrix with m rows and n columns, each of which is sorted, how can we efficiently sort the entire matrix?
I know a solution which runs in O(mn log(min(m, n))). I am looking for a better solution.
The approach that I know basically takes 2 rows/columns at a time and applies a merge operation.
Here is an example:
[[1,4,7,10],
[2,5,8,11],
[3,6,9,12]]
is the input matrix, which has every row and column sorted.
Expected output is:
[1,2,3,4,5,6,7,8,9,10,11,12]
Another example:
[[1, 2, 3, 3, 4, 5, 6, 6, 7, 7],
[1, 2, 4, 6, 7, 7, 8, 8, 9,10],
[3, 3, 4, 8, 8, 9,10,11,11,12],
[3, 3, 5, 8, 8, 9,12,12,13,14]]
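For reference, a minimal Python sketch of the bound I already know, using a heap-based multi-way merge of the rows (heapq.merge) instead of merging two rows at a time; merging the m sorted rows costs O(mn log m), and if n < m you can transpose first to get O(mn log(min(m, n))):

import heapq

def sort_matrix(matrix):
    # every row is already sorted, so an m-way merge of the rows
    # yields all m*n elements in sorted order in O(mn log m) time
    return list(heapq.merge(*matrix))

print(sort_matrix([[1, 4, 7, 10],
                   [2, 5, 8, 11],
                   [3, 6, 9, 12]]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]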
I don't think you can do it any faster than Ω(mn log(min(m, n))), at least not in the general case.
Suppose (without loss of generality) that m < n. Picture the matrix as a grid in which each entry is known to be smaller than the entries directly to its right and directly below it; those are all the order relations we get for free.
To sort the matrix, we must resolve all the unknown order relations. These lie along the anti-diagonals: entries on the same anti-diagonal have no known order relation among one another. There are n - m + 1 full anti-diagonals of m entries each, plus two families of partial anti-diagonals of sizes 1, 2, ..., m - 1 near the corners.
Sorting all of these groups takes:
2 · Σ_{k < m} Ω(k log k) + (n − m + 1) · Ω(m log m)
= 2 · Ω(m² log m) + (n − m + 1) · Ω(m log m)
= Ω(mn log m) = Ω(mn log(min(m, n))), since m = min(m, n).
If the elements are integers within a certain range K where K = o(mn), we can use counting sort with extra space to achieve O(mn); otherwise O(mn log(min(m, n))) is the best we can do.
By creating a Binary Search Tree, we can achieve this in O(mn) time.
Take the last element of the first column (the element 3 in the example mentioned above) and make it the root. Its right descendants will be the greater elements of that last row, and its left child will be the element above it, i.e. the (m-1)th element, the first element of the second-to-last row. Similarly, for that element the right nodes will be the n elements of its row, the element above becomes the left child, and so on for every row moving upward. We end up with a binary search tree built in O(mn) time; it is O(mn) because we never search while inserting, we simply insert while traversing, shifting the root-node pointer.
Then an inorder traversal of this BST is done, which also takes O(mn) time.