Can this algorithm be optimised?

I have a list of objects and I want to iterate through them in a specific sequence for a particular number, as returned by the following function. What it does is remove one element at a time, choosing its index as the hash number modulo the current list size, and build the resulting sequence.
def genSeq(hash, n):
    l = list(range(n))  # a list, so elements can be deleted
    seq = []
    while l:
        ind = hash % len(l)  # index chosen by the hash and the remaining size
        seq.append(l[ind])
        del l[ind]
    return seq
E.g. genSeq(53, 5) will return [3, 1, 4, 2, 0].
I am presenting the algorithm in Python for easy understanding; I am supposed to code it in C++. The complexity in this form is O(n^2) for both vector and list (we pay either for removal or for access). Can this be made any better?

A skip list would give you O(log n) access and removal. So your total traversal time would be O(n log n).
I'd like to think there is a linear solution, but nothing jumps out at me.

The sequence of indices
    [hash % (n - i) for i in range(n)]
can be seen as a number written in factoradic (the factorial number system).
There is a bijection between permutations and factoradic numbers.
The mapping from factoradic numbers to permutations is described in the top answer under the section "Permuting a list using an index sequence". Unfortunately the algorithm provided there is quadratic, but a commenter points to a data structure that would make the algorithm O(n log n).
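For illustration, here is a sketch of one such structure (my own code, not from that answer): a Fenwick (binary indexed) tree over element counts, which finds and removes the k-th remaining element in O(log n), so the whole sequence is decoded in O(n log n).

def genSeqFenwick(hash, n):
    size = 1
    while size < n:
        size *= 2                         # power of two >= n
    tree = [0] * (size + 1)               # 1-based Fenwick tree of counts
    def add(i, delta):
        while i <= size:
            tree[i] += delta
            i += i & (-i)
    def kth(k):                           # value of the k-th (0-based) remaining element
        pos = 0
        mask = size
        while mask:
            if pos + mask <= size and tree[pos + mask] <= k:
                pos += mask
                k -= tree[pos]
            mask >>= 1
        return pos
    for v in range(n):
        add(v + 1, 1)                     # all n elements start out present
    seq = []
    for remaining in range(n, 0, -1):
        v = kth(hash % remaining)
        seq.append(v)
        add(v + 1, -1)                    # remove the picked element
    return seq

print(genSeqFenwick(53, 5))               # -> [3, 1, 4, 2, 0], same as genSeq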

You should not delete from your vector, but swap with the end and decrease a fill pointer. Think about the Fisher-Yates shuffle.
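A minimal sketch of that idea in Python (my own illustration; note that because elements move when swapped, it yields a different, though still hash-determined, permutation than genSeq):

def genSeqSwap(hash, n):
    l = list(range(n))
    seq = []
    fill = n                      # number of still-active elements
    while fill:
        ind = hash % fill
        seq.append(l[ind])
        l[ind] = l[fill - 1]      # swap the pick with the last active element
        fill -= 1                 # shrink the active region: O(1) "removal"
    return seq                    # O(n) overall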

Related

What numbers add up to k?

I am not sure how to do this: given a list of numbers and a number k, return all pairs of numbers from the list that add up to k, passing through the list only once.
For example, given [10, 15, 3, 7] and k = 17, the program should return 10 + 7.
How do you find and return every such pair while only going through the list once?
Use a set to keep track of what you've seen. Runtime O(N), Space: O(N)
def twoAddToK(nums, k):
    seen = set()
    for x in nums:
        if k - x in seen:  # we have already seen the complement of x
            return True
        seen.add(x)
    return False
As an alternative to Shawn's code, which uses a set, there is also the option of sorting the list in O(N log N) time (and possibly no extra space, if you are allowed to overwrite the original input) and then applying an O(N) two-pointer pass over the sorted list.
While asymptotic complexity slightly favors hash sets in terms of time, since O(N) is better than O(N log N), I am ready to bet that sorting plus a single-pass lookup is considerably faster in practice.
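A sketch of that sort-then-scan approach (my own code): after sorting, walk two pointers inward from both ends.

def twoAddToKSorted(nums, k):
    nums = sorted(nums)           # O(N log N); sort in place if overwriting is allowed
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        s = nums[lo] + nums[hi]
        if s == k:
            return True           # e.g. 10 + 7 == 17
        elif s < k:
            lo += 1               # need a bigger sum
        else:
            hi -= 1               # need a smaller sum
    return False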

Given a permutation's lexicographic number, is it possible to get any item in it in O(1)

I want to know whether the task explained below is even theoretically possible, and if so how I could do it.
You are given a space of N elements (i.e. all numbers between 0 and N-1.) Let's look at the space of all permutations on that space, and call it S. The ith member of S, which can be marked S[i], is the permutation with the lexicographic number i.
For example, if N is 3, then S is this list of permutations:
S[0]: 0, 1, 2
S[1]: 0, 2, 1
S[2]: 1, 0, 2
S[3]: 1, 2, 0
S[4]: 2, 0, 1
S[5]: 2, 1, 0
(Of course, when looking at a big N, this space becomes very large, N! to be exact.)
Now, I already know how to get the permutation by its index number i, and I already know how to do the reverse (get the lexicographic number of a given permutation.) But I want something better.
Some permutations can be huge by themselves, for example when N = 10^20. (The size of S would be (10^20)!, which I believe is the biggest number I ever mentioned in a Stack Overflow question :)
If you're looking at just a random permutation on that space, it would be so big that you couldn't store the whole thing on your hard drive, let alone calculate each of its items by lexicographic number. What I want is to be able to do item access on that permutation, and also get the index of each item. That is, given N and i to specify a permutation, have one function that takes an index number and finds the number that resides at that index, and another function that takes a number and finds at which index it resides. I want to do that in O(1), so I don't need to store or iterate over each member of the permutation.
Crazy, you say? Impossible? That may be. But consider this: A block cipher, like AES, is essentially a permutation, and it almost accomplishes the tasks I outlined above. AES has a block size of 16 bytes, meaning that N is 256^16 which is around 10^38. (The size of S, not that it matters, is a staggering (256^16)!, or around 10^85070591730234615865843651857942052838, which beats my recent record for "biggest number mentioned on Stack Overflow" :)
Each AES encryption key specifies a single permutation on N=256^16. That permutation couldn't be stored whole on your computer, because it has more members than there are atoms in the solar system. But it allows you item access. By encrypting data using AES, you're looking at the data block by block, and for each block (member of range(N)) you output the encrypted block, which is the member of range(N) that sits at the original block's index in the permutation. And when you're decrypting, you do the reverse (finding the index number of a block). I believe this is done in O(1); I'm not sure, but in any case it's very fast.
The problem with using AES or any other block cipher is that it limits you to very specific N, and it probably only captures a tiny fraction of the possible permutations, while I want to be able to use any N I like, and do item access on any permutation S[i] that I like.
Is it possible to get O(1) item access on a permutation, given size N and permutation number i? If so, how?
(If I'm lucky enough to get code answers here, I'd appreciate if they'll be in Python.)
UPDATE:
Some people pointed out the sad fact that the permutation number itself would be so huge, that just reading the number would make the task non-feasible. Then, I'd like to revise my question: Given access to the factoradic representation of a permutation's lexicographic number, is it possible to get any item in the permutation in O(as small as possible)?
The secret to doing this is to "count in base factorial".
In the same way that 134 = 1*10^2 + 3*10 + 4, we have 134 = 1*5! + 0*4! + 2*3! + 1*2! + 0*1! => 10210 in factorial notation (include 1!, exclude 0!). To represent every number up to N!, you need N factoradic digits, where the digit in the k-th place can be at most k. Up to a bit of confusion about what you call 0, this factorial representation is exactly the lexicographic number of a permutation.
You can use this insight to solve Euler Problem 24 by hand, so I will do that here, and you will see how to solve your problem. We want the millionth permutation of 0-9. In factorial representation, 1000000 => 266251220. Now to convert that to the permutation, I take my digits 0,1,2,3,4,5,6,7,8,9. The first factoradic digit is 2, so I select the item at index 2 (it could be 0), namely 2, as the first element. Then I have the new list 0,1,3,4,5,6,7,8,9, and the next digit is 6, so I take the item at index 6, which is 8, etc., and I get 2783915604.
However, this assumes that you start your lexicographic ordering at 0; if you actually start it at 1, you have to subtract 1 from the number first, which gives 2783915460. That is indeed the millionth permutation of the numbers 0-9.
You can obviously reverse this procedure, and hence convert easily back and forth between the lexicographic number and the permutation it represents.
I am not entirely clear on what it is that you want to do here, but understanding the above procedure should help. For example, it's clear that the lexicographic number represents an ordering, which could be used as the key in a hash table. And you can order the numbers by comparing their digits left to right, so once you have inserted a number you never have to work out its factorial representation again.
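A quick Python sketch of the procedure above (my own code): decode a factoradic digit list into a permutation by repeatedly picking and removing the indexed element.

import math

def index_to_permutation(i, elements):
    elements = list(elements)
    perm = []
    for place in range(len(elements) - 1, -1, -1):
        digit, i = divmod(i, math.factorial(place))  # next factoradic digit
        perm.append(elements.pop(digit))             # pick and remove that item
    return perm

# Euler Problem 24: the millionth permutation of 0-9, counting from 1.
print(index_to_permutation(1000000 - 1, range(10)))
# -> [2, 7, 8, 3, 9, 1, 5, 4, 6, 0]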
Your question is a bit moot, because your input size for an arbitrary permutation index has size log(N!) (assuming you want to represent all possible permutations) which is Theta(N log N), so if N is really large then just reading the input of the permutation index would take too long, certainly much longer than O(1). It may be possible to store the permutation index in such a way that if you already had it stored, then you could access elements in O(1) time. But probably any such method would be equivalent to just storing the permutation in contiguous memory (which also has Theta(N log N) size), and if you store the permutation directly in memory then the question becomes trivial assuming you can do O(1) memory access. (However you still need to account for the size of the bit encoding of the element, which is O(log N)).
In the spirit of your encryption analogy, perhaps you should specify a small SUBSET of permutations according to some property, and ask if O(1) or O(log N) element access is possible for that small subset.
Edit:
I misunderstood the question, but it was not in vain: working on it made me understand that the factoradic representation of a permutation's lexicographic number is almost the same as the permutation itself. In fact, the first digit of the factoradic representation is the same as the first element of the corresponding permutation (assuming your space consists of the numbers from 0 to N-1). Knowing this, there is not really a point in storing the index rather than the permutation itself. To see how to convert the lexicographic number into a permutation, read below.
See also this wikipedia link about Lehmer code.
Original post:
In the S space there are N elements that can fill the first slot, and for each choice there are (N-1)! permutations that start with it. So a = i / (N-1)! is the first element. The subset of S that starts with a consists of those (N-1)! elements: the possible permutations of the set N \ {a}. Now you can get the second element: it is number (i % (N-1)!) / (N-2)! among the remaining elements. Repeat the process and you have the permutation.
The reverse is just as simple. Start with i = 0. Take the second-to-last element of the permutation, make a set of the last two elements, and find that element's position in it (it's either the 0th element or the 1st); call this position j. Then i += j * 1!. Repeat the process with the third-to-last element, weighting its position by 2!, and so on. (You could start with the last element, but it is always the 0th element of its one-element set, so it contributes nothing.)
Java-ish pseudocode:
find_by_index(List N, int i){
    String str = "";
    for(int l = N.length - 1; l >= 0; l--){
        int pos = i / fact(l);               // index among the remaining elements
        str += N.get(pos);
        N.remove(pos);
        i %= fact(l);
    }
    return str;
}
find_index(String str){
    OrderedList N;
    int i = 0;
    for(int l = str.length - 1; l >= 0; l--){
        String item = str.charAt(l);
        int pos = N.add(item);               // position of item among those seen so far
        i += pos * fact(str.length - 1 - l); // weight by the factorial of the suffix handled
    }
    return i;
}
find_by_index does O(n) iterations (O(n) overall if get and remove are O(1)), while find_index is O(n*log(n)) if N supports ordered insertion in O(log n) (where n is the size of the N space).
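For completeness, a Python sketch of the same encoding direction (my own code; each element's Lehmer digit is the count of smaller elements to its right):

import math

def permutation_to_index(perm):
    n = len(perm)
    i = 0
    for pos, x in enumerate(perm):
        smaller_later = sum(1 for y in perm[pos + 1:] if y < x)  # Lehmer digit
        i += smaller_later * math.factorial(n - 1 - pos)
    return i

print(permutation_to_index([2, 0, 1]))  # -> 4, matching S[4] in the question

As written this is O(n^2); counting the smaller elements with an order-statistic tree instead gives the O(n log n) bound.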
After some research on Wikipedia, I designed this algorithm:
def getPick(fact_num_list):
    """fact_num_list should be a list with the factorial number representation;
    getPick will return a tuple"""
    result = []  # desired pick
    # This will hold all the numbers pickable; not actually a set, but a list
    # instead (a list, so that entries can be deleted)
    inputset = list(range(len(fact_num_list)))
    for fnl in fact_num_list:
        result.append(inputset[fnl])
        del inputset[fnl]  # make sure we can't pick the number again
    return tuple(result)
Obviously, this won't reach O(1), because we need to "pick" every number. Since we do a single for loop, and assuming all operations inside it are O(1), getPick will run in O(n).
If we need to convert from base 10 to factorial base, this is an aux function:
import math

def base10_baseFactorial(number):
    """Converts a base-10 number into a factorial-base number. Output is a list
    for better handling of digits over 36! (after using all 0-9 and A-Z)"""
    loop = 1
    # Find the largest factorial that does not exceed number
    while math.factorial(loop) <= number:
        loop += 1
    loop -= 1  # the while loop overshoots by one
    result = []
    while loop > 0:
        denominator = math.factorial(loop)
        digit, number = divmod(number, denominator)  # digit for this factorial place
        result.append(digit)
        loop -= 1
    result.append(0)  # don't forget the 0! place as well!
    return result
Again, this runs in O(n) due to the while loops.
Summing up, the best time we can get is O(n).
PS: I'm not a native English speaker, so spelling and phrasing errors may appear. Apologies in advance, and let me know if anything is unclear.
All correct algorithms for accessing the kth item of a permutation stored in factoradic form must read the first k digits. This is because, regardless of the values of the other digits among the first k, it makes a difference whether an unread digit is a 0 or takes on its maximum value. That this is the case can be seen by tracing the canonical correct decoding program in two parallel executions.
For example, suppose we want to decode the third element of the permutation with factoradic digits 1?0. For 100 the permutation is 1,0,2, so that element is 2; for 110 it is 1,2,0, so that element is 0.

What's this sorting algorithm called?

I am thinking about a non-comparison sorting algorithm and I think I've found one myself.
Input: A[0...n] with values ranging over 0...n // ideally, I think it can be expanded to a more general case later
Non-comparison-sort(A, n):
    let B = [0...n] = [0]
    for i in 0...n:
        B[A[i]] = i
After this algorithm, each element of array B holds an index into array A: if we want to access the element of A whose value is m, we can use A[B[m]].
I am sure I am not the first one to come across this idea, so my question is: what is this algorithm called?
Thanks in advance.
Actually, your algorithm is not a sorting algorithm. It's an algorithm to calculate the inverse of a permutation on 0..n. In other words, it will tell you how to rearrange A in order to have all the numbers in place.
Why isn't it a sorting algorithm?
If A contains all numbers in range 0..n, then the sorted array will always be B = [0, 1, 2, ..., n]. On the other hand, if A has duplicates, then this algorithm won't work.
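A small sketch of what the asker's loop actually computes (my own illustration): B is the inverse permutation of A, so A[B[m]] recovers the value m.

A = [2, 0, 3, 1]
B = [0] * len(A)
for i in range(len(A)):
    B[A[i]] = i                            # B maps each value to its index in A
print(B)                                   # -> [1, 3, 0, 2]
print([A[B[m]] for m in range(len(A))])    # -> [0, 1, 2, 3]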
I think what you're looking for is counting sort. This algorithm is suitable for the case where A is an array of size k containing numbers in the range 0..n. It keeps an array B of size n+1 and counts how many times each number appears while iterating once over A.
An example for counting sort (in your pseudo-code syntax):
Counting-sort(A, n):
    let B = [0...n] = [0]
    for x in A:
        B[x] = B[x] + 1
    let C = []  // an empty list
    for i in 0...n:
        repeat B[i] times:  // add each number 0..n as many times as it appeared in A
            C.append(i)
    return C
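For reference, a runnable Python rendering of the same pseudocode (a sketch; it assumes the values of A lie in 0..n):

def counting_sort(A, n):
    B = [0] * (n + 1)              # one counter per possible value
    for x in A:
        B[x] += 1
    C = []
    for value, count in enumerate(B):
        C.extend([value] * count)  # emit each value as often as it appeared
    return C

print(counting_sort([3, 0, 2, 2, 1], 3))  # -> [0, 1, 2, 2, 3]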
After reading about bucket sort here, it looks like bucket sort where the bucket size is 1.
In bucket sort, after putting the elements into buckets, each bucket is sorted.
However, in your case, since the bucket size is 1, this step is not required. Merging the buckets is also not required, since buckets of size 1 are effectively already merged in the array.
I think you've got a pigeonhole sort there.

How to sort an array according to another array?

Suppose A = {1,2,3,4} and p = {36,3,97,19}; sort A using p as sort keys. You get {2,4,1,3}.
It is an example in the book Introduction to Algorithms. It says it can be done in n log n.
Can anyone give me some idea of how it can be done? My thought is that you need to keep track of where each element of p ends up, e.g. if p[1] ends up at p[3], then A[1] ends up at A[3]. Can anyone use merge sort or another n log n sort to get this done?
I'm new to algorithms and find this a little intimidating :( Thanks for any help.
Construct an index array:
i = { 0, 1, 2, 3 }
Now, while you are sorting p, make the same changes to the index array i.
When you're done, you'll have:
i = { 1, 3, 0, 2 }
Sorting two arrays takes at most twice as long as sorting one (and actually, if you're only counting comparisons you don't have to do any additional comparisons, just data swaps in two arrays instead of one), so that doesn't change the Big-O complexity of the overall sort because O( 2n log n ) = O(n log n).
Now, you can use those indices to construct the sorted A array in linear time by simply iterating through the sorted index array and looking up the element of A at that index. This takes O( n ) time.
The runtime complexity of your overall algorithm is at worst: O( n + 2n log n ) = O( n log n )
Of course you can also skip index array entirely and simply treat the array A in the same way, sorting it along side p.
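A sketch of the index-array approach in Python (my own illustration, using the question's example): sort the indices by their keys in p, then read A out in that order.

A = [1, 2, 3, 4]
p = [36, 3, 97, 19]
idx = sorted(range(len(p)), key=lambda j: p[j])  # -> [1, 3, 0, 2]
print([A[j] for j in idx])                       # -> [2, 4, 1, 3]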
I don't see this as difficult: since the complexity of a sorting algorithm is usually measured by the number of comparisons required, you just need to update the positions of the elements in array A according to the elements in p. You won't need any comparisons beyond the ones already needed to sort p, so the complexity is the same.
Every time you move an element, just move it in both arrays and you are done.

Optimum search for k minimum values in unsorted list of integers

I was just interviewed with a question, and I'm curious what the answer ought to be. The problem was, essentially:
Say you have an unsorted list of n integers. How do you find the k minimum values in this list? That is, if you have a list of [10, 11, 24, 12, 13] and are looking for the 2 minimum values, you'd get [10, 11].
I've got an O(n*log(k)) solution, and that's my best, but I'm curious what other people come up with. I'll refrain from polluting folks' brains by posting my solution, and will edit it in after a little while.
EDIT #1: For example, a function like:
list getMinVals(list &l, int k)
EDIT #2: It looks like it's a selection algorithm, so I'll toss in my solution as well: iterate over the list, using a priority queue to save the minimum values. The priority queue is arranged so that the largest of the saved values sits at the top; on comparing the top to a new element, the top is popped and the smaller element pushed. This assumes the priority queue has an O(log k) push and pop (the queue never holds more than k items) and an O(1) top.
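A sketch of that approach with Python's heapq (my own illustration; heapq is a min-heap, so values are negated to simulate a max-heap of the k smallest seen so far):

import heapq

def getMinVals(l, k):
    heap = []                            # negated values: a max-heap of the k minima
    for x in l:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif -heap[0] > x:               # x beats the current k-th smallest
            heapq.heapreplace(heap, -x)  # pop the top, push the smaller element
    return [-v for v in heap]            # the k minima, in no particular order

print(getMinVals([10, 11, 24, 12, 13], 2))  # -> [11, 10]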
This is the quickselect algorithm. It's basically a quicksort where you only recurse into one part of the array. Here's a simple implementation in Python, written for brevity and readability rather than efficiency.
def quickSelect(data, nLeast):
    # Three-way partition around the pivot, so that duplicates of the
    # pivot cannot make the recursion loop forever.
    pivot = data[-1]
    less = [x for x in data if x < pivot]
    equal = [x for x in data if x == pivot]
    greater = [x for x in data if x > pivot]
    if nLeast <= len(less):
        return quickSelect(less, nLeast)
    elif nLeast <= len(less) + len(equal):
        return less + equal[:nLeast - len(less)]
    else:
        return less + equal + quickSelect(greater, nLeast - len(less) - len(equal))
This will run in O(N) on average, since at each iteration, you are expected to reduce the size of data by a multiplicative constant. The result will not be sorted. The worst case is O(N^2), but this is dealt with in essentially the same way as a quick sort, using things like median-of-3.
This is usually found in algorithm books under selection algorithms or "linear selection". Here's the specific section on the min/max k values in a list. It's O(n log k).
