Online algorithm for random permutation of N integers


Imagine a standard permute function that takes an integer N and returns a vector of the first N natural numbers in a random permutation. If you only need k (<= N) of them, but don't know k beforehand, do you still have to perform an O(N) generation of the permutation? Is there a better algorithm than:
for x in permute(N):
    if f(x):
        break
I'm imagining an API such as:
p = permuter(N)
for x = p.next():
    if f(x):
        break
where the initialization is O(1) (including memory allocation).

This question is often viewed as a choice between two competing algorithms:
Strategy FY: A variation on the Fisher-Yates shuffle where one shuffle step is performed for each desired number, and
Strategy HT: Keep all generated numbers in a hash table. At each step, random numbers are produced until a number which is not in the hash table is found.
The choice depends on the relationship between k and N: if k is sufficiently large, strategy FY is used; otherwise, strategy HT. The argument is that if k is small relative to N, maintaining an array of size N wastes space and incurs a large initialization cost. On the other hand, as k approaches N, more and more random numbers need to be discarded, and towards the end producing new values will be extremely slow.
Of course, you might not know in advance the number of samples which will be requested. In that case, you might pessimistically opt for FY, or optimistically opt for HT, and hope for the best.
In fact, there is no real need for trade-off, because the FY algorithm can be implemented efficiently with a hash table. There is no need to initialize an array of N integers. Instead, the hash-table is used to store only the elements of the array whose values do not correspond with their indices.
(The following description uses 1-based indexing; that seemed to be what the question was looking for. Hopefully it is not full of off-by-one errors. So it generates numbers in the range [1, N]. From here on, I use k for the number of samples which have been requested to date, rather than the number which will eventually be requested.)
At each point in the incremental FY algorithm a single index r is chosen at random from the range [k, N]. Then the values at indices k and r are swapped, after which k is incremented for the next iteration.
As an efficiency point, note that we don't really need to do the swap: we simply yield the value at r and then set the value at r to be the value at k. We'll never again look at the value at index k so there is no point updating it.
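As a minimal sketch of that no-swap step on an ordinary array (the names `incremental_fy` and `values` are illustrative and not from the question), assuming 1-based values held in a 0-based Python list:

```python
from random import randint

def incremental_fy(N):
    """Yield 1..N in random order, one Fisher-Yates step per value."""
    values = list(range(1, N + 1))  # values[i - 1] holds the value at index i
    for k in range(1, N + 1):
        r = randint(k, N)  # random index in [k, N], both ends inclusive
        picked = values[r - 1]
        # Move the value at index k into slot r; slot k is never read again,
        # so the other half of the swap can be skipped.
        values[r - 1] = values[k - 1]
        yield picked
```

This version still pays the O(N) setup cost for `values`, which is what the hash-table variant described next avoids.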
Initially, we simulate the array with a hash table. To look up the value at index i in the (virtual) array, we see if i is present in the hash table: if so, that's the value at index i. Otherwise the value at index i is i itself. We start with an empty hash table (which saves initialization costs), which represents an array whose value at every index is the index itself.
To do the FY iteration, for each sample index k we generate a random index r as above, yield the value at that index, and then set the value at index r to the value at index k. That's exactly the procedure described above for FY, except for the way we look up values.
This requires exactly two hash-table lookups, one insertion (at an already looked-up index, which in theory can be done more quickly), and one random number generation for each iteration. That's one more lookup than strategy HT's best case, but we have a bit of a saving because we never need to loop to produce a value. (There is another small potential saving when we rehash because we can drop any keys smaller than the current value of k.)
As the algorithm proceeds, the hash table will grow; a standard exponential rehashing strategy is used. At some point, the hash table will reach the size of a vector of N-k integers. (Because of hash table overhead, this point will be reached at a value of k much less than N, but even if there were no overhead this threshold would be reached at N/2.) At that point, instead of rehashing, the hash is used to create the tail of the now non-virtual array, a procedure which takes less time than a rehash and never needs to be repeated; remaining samples will be selected using the standard incremental FY algorithm.
This solution is slightly slower than FY if k eventually reaches the threshold point, and it is slightly slower than HT if k never gets big enough for random numbers to be rejected. But it is not much slower in either case, and it never suffers from pathological slowdown when k has an awkward value.
In case that was not clear, here is a rough Python implementation:
from random import randint

def sampler(N):
    k = 1
    # First phase: Use the hash
    diffs = {}
    # Only do this until the hash table is smallish (See note)
    while k < N // 4:
        r = randint(k, N)
        yield diffs[r] if r in diffs else r
        diffs[r] = diffs[k] if k in diffs else k
        k += 1
    # Second phase: Create the vector, ignoring keys less than k
    vbase = k
    v = list(range(vbase, N+1))
    for i, s in diffs.items():
        if i >= vbase:
            v[i - vbase] = s
    del diffs
    # Now we can generate samples until we hit N
    while k <= N:
        r = randint(k, N)
        rv = v[r - vbase]
        v[r - vbase] = v[k - vbase]
        yield rv
        k += 1
Note: N // 4 is probably pessimistic; computing the correct value would require knowing too much about hash-table implementation. If I really cared about speed, I'd write my own hash table implementation in a compiled language, and then I'd know :)

Related

Fixing this faulty Bingo Sort implementation

While studying Selection Sort, I came across a variation known as Bingo Sort. According to this dictionary entry here, Bingo Sort is:
A variant of selection sort that orders items by first finding the least value, then repeatedly moving all items with that value to their final location and find the least value for the next pass.
Based on the definition above, I came up with the following implementation in Python:
def bingo_sort(array, ascending=True):
    from operator import lt, gt

    def comp(x, y, func):
        return func(x, y)

    i = 0
    while i < len(array):
        min_value = array[i]
        j = i + 1
        for k in range(i + 1, len(array), 1):
            if comp(array[k], min_value, (lt if ascending else gt)):
                min_value = array[k]
                array[i], array[k] = array[k], array[i]
            elif array[k] == min_value:
                array[j], array[k] = array[k], array[j]
                j += 1
        i = j
    return array
I know that this implementation is problematic. When I run the algorithm on an extremely small array, I get a correctly sorted array. However, running the algorithm with a larger array results in an array that is mostly sorted with incorrect placements here and there. To replicate the issue in Python, the algorithm can be run on the following input:
from random import randint, uniform

test_data = [[randint(0, 101) for i in range(0, 101)],
             [uniform(0, 101) for i in range(0, 101)],
             ["a", "aa", "aaaaaa", "aa", "aaa"],
             [5, 5.6],
             [3, 2, 4, 1, 5, 6, 7, 8, 9]]
for dataset in test_data:
    print(dataset)
    print(bingo_sort(dataset, ascending=True))
    print("\n")
I cannot for the life of me realize where the fault is at since I've been looking at this algorithm too long and I am not really proficient at these things. I could not find an implementation of Bingo Sort online except an undergraduate graduation project written in 2020. Any help that can point me in the right direction would be greatly appreciated.

I think your main problem is that you're trying to set min_value in your first conditional statement and then to swap based on that same min_value you've just set in your second conditional statement. These processes are supposed to be staggered: the way bingo sort should work is you find the min_value in one iteration, and in the next iteration you swap all instances of that min_value to the front while also finding the next min_value for the following iteration. In this way, min_value should only get changed at the end of every iteration, not during it. When you change the value you're swapping to the front over the course of a given iteration, you can end up unintentionally shuffling things a bit.
I have an implementation of this below if you want to refer to something, with a few notes: since you're allowing a custom comparator, I renamed min_value to swap_val as we're not always grabbing the min, and I modified how the comparator is defined/passed into the function to make the algorithm more flexible. Also, you don't really need three indexes (I think there were even a couple of bugs here), so I collapsed i and j into swap_idx, and renamed k to cur_idx. Finally, because swapping a given swap_val and finding the next_swap_val are staggered, you need to find the initial swap_val up front. I'm using a reduce statement for that, but you could just use another loop over the whole array there; they're equivalent. Here's the code:
from operator import lt, gt
from functools import reduce

def bingo_sort(array, comp=lt):
    if len(array) <= 1:
        return array
    # get the initial swap value as determined by comp
    swap_val = reduce(lambda val, cur: cur if comp(cur, val) else val, array)
    swap_idx = 0  # set the initial swap_idx to 0
    while swap_idx < len(array):
        cur_idx = swap_idx
        next_swap_val = array[cur_idx]
        while cur_idx < len(array):
            if comp(array[cur_idx], next_swap_val):  # find next swap value
                next_swap_val = array[cur_idx]
            if array[cur_idx] == swap_val:  # swap swap_vals to front of the array
                array[swap_idx], array[cur_idx] = array[cur_idx], array[swap_idx]
                swap_idx += 1
            cur_idx += 1
        swap_val = next_swap_val
    return array
In general, the complexity of this algorithm depends on how many duplicate values get processed, and when they get processed. This is because every time k duplicate values get processed during a given iteration, the length of the inner loop is decreased by k for all subsequent iterations. Performance is therefore optimized when large clusters of duplicate values are processed early on (as when the smallest values of the array contain many duplicates). Throughout, n is the number of items in the array and m is the number of distinct values. From this, there are basically two ways you could analyze the complexity of the algorithm: You could analyze it in terms of where the duplicate values tend to appear in the final sorted array (Type 1), or you could assume the clusters of duplicate values are randomly distributed through the sorted array and analyze complexity in terms of the average size of duplicate clusters (that is, in terms of the magnitude of m relative to n: Type 2).
The definition you linked uses the first type of analysis (based on where duplicates tend to appear) to derive best = Theta(n+m^2), average = Theta(nm), worst = Theta(nm). The second type of analysis produces best = Theta(n), average = Theta(nm), worst = Theta(n^2) as m ranges from Theta(1), through unconstrained m, to Theta(n).
In the best Type 1 case, all duplicates will be among the smallest elements of the array, such that the run-time of the inner loop quickly decreases to O(m), and the final iterations of the algorithm proceed as an O(m^2) selection sort. However, there is still the up-front O(n) pass to select the initial swap value, so the overall complexity is O(n + m^2).
In the worst Type 1 case, all duplicates will be among the largest elements of the array. The length of the inner loop isn't substantially shortened until the last iterations of the algorithm, such that we achieve a run-time looking something like n + (n-1) + (n-2) + ... + (n-m). This is a sum of m O(n) values, giving us O(nm) total run-time.
In the average Type 1 case (and for all Type 2 cases), we don't assume that the clusters of duplicate values are biased towards the front or back of the sorted array. We take it that the m clusters of duplicate values are randomly distributed through the array in terms of their position and their size. Under this analysis, we expect that after the initial O(n) pass to find the first swap value, each of the m iterations of the outer loop reduce the length of the inner loop by approximately n/m. This leads to an expression of the overall run-time for unknown m and randomly distributed data as:
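The expression itself does not appear in this text. A reconstruction from the assumptions just stated (one initial pass of length n, then m passes of the inner loop, each shorter than the previous one by n/m) would be:

```latex
T(n, m) \;=\; n + \sum_{i=0}^{m-1} \left( n - i \cdot \frac{n}{m} \right)
        \;=\; n + nm - \frac{n}{m} \cdot \frac{m(m-1)}{2}
        \;=\; n + \frac{n(m+1)}{2}
        \;=\; \Theta(nm)
```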
We can use this expression for the average-case run-time with randomly distributed data and unknown m, Theta(nm), as the average Type 2 run-time, and it also directly gives us the best- and worst-case run-times based on how we might vary the magnitude of m.
In the best Type 2 case, m might just be some constant value independent of n. If we have m = Theta(1) randomly distributed duplicate clusters, the best-case run-time is then Theta(n * Theta(1)) = Theta(n). For example, you would see O(2n) = O(n) performance from bingo sort with just one unique value (one pass to find the swap value, one pass to swap every single value to the front), and this O(n) asymptotic complexity still holds if m is bounded by any constant.
However, in the worst Type 2 case we could have m = Theta(n), and bingo sort essentially devolves into an O(n^2) selection sort. This is clearly the case for m = n, but it also holds whenever the amount by which the inner loop's run-time is expected to decrease with each iteration, n/m, is a constant, which is the case for any m in Theta(n).

Uniform sampling of k integers from [0:n)

My goal is to sample k integers from 0, ... n-1 without duplication. The order of sampled integers doesn't matter. At each call (which occurs very often), n and k will vary slightly but not much (n is about 250,000 and k is about 2,000). I've come up with the following amortized O(k) algorithm:
1. Prepare an array A with items 0, 1, 2, ... , n-1. This takes O(n) but since n is relatively stable, the cost can be made amortized constant.
2. Sample a random number r from [0:i] where i = n - 1. Here the cost is in fact related to n, but as n is not VERY BIG, this dependency is not critical.
3. Swap the rth item and the ith item in the array A.
4. Decrease i by 1.
5. Repeat steps 2~4 k times; now we have a random permutation of length k at the tail of A. Copy this.
6. We should roll back A to its initial state (0, ... , n-1) to keep the cost of step 1 constant. This can be done by pushing r onto a stack of length k at each pass of step 2. Preparation of the stack requires amortized constant cost.
I think uniform sampling of permutation/combination should be an exhaustively studied problem, so either (1) there is a much better solution, or at least (2) my solution is a (minor modification of) a well-known solution. Thus,
In case (1), I want to know that better solution.
In case (2), I want to find a reference.
Please help me. Thanks.

If k is much less than n -- say, less than half of n -- then the most efficient solution is to keep the numbers generated in a hash table (actually, a hash set, since there is no value associated with a key). If the random number happens to already be in the hash table, reject it and generate another one in its place. With the actual values of k and n suggested (k ∼ 2000; n ∼ 250,000) the expected number of rejections to generate k unique samples is less than 10, so it will hardly be noticeable. The size of the hash table is O(k), and it can simply be deleted at the end of the sample generation.
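A minimal sketch of this rejection approach, assuming Python's standard `random` module as the source of randomness (the function name `sample_by_rejection` is illustrative):

```python
import random

def sample_by_rejection(n, k, rng=random):
    """Return k distinct integers from range(n), in the order they were drawn."""
    if k > n:
        raise ValueError("k must not exceed n")
    seen = set()   # hash set of the values generated so far
    sample = []
    while len(sample) < k:
        r = rng.randrange(n)
        if r in seen:
            continue  # duplicate: reject it and draw again
        seen.add(r)
        sample.append(r)
    return sample
```

For k around 2,000 and n around 250,000, the `continue` branch is taken only a handful of times per call; as k approaches n it would be taken far more often.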
It is also possible to simulate the FYK shuffle algorithm using a hash table instead of a vector of n values, thereby avoiding having to reject generated random numbers. If you were using a vector A, you would start by initializing A[i] to i, for every 0 ≤ i < n. With the hash table H, you start with an empty hash table, and use the convention that H[i] is considered to be i if the key i is not in the hash table. Step 3 in your algorithm -- "swap A[r] with A[i]" -- becomes "add H[r] as the next element of the sample and set H[r] to H[i]". Note that it is unnecessary to set H[i] because that element will never be referred to again: all subsequent random numbers r are generated from a range which does not include i.
Because the hash table in this case contains both keys and values, it is larger than the hash set used in alternative 1, above, and the increased size (and consequent increase in memory cache misses) is likely to cause more overhead than is saved by eliminating rejections. However, it has the advantage of working even if k is occasionally close to n.
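The hash-table simulation could be sketched as follows, assuming a plain Python dict plays the role of H and that i runs from n-1 downwards as in the question (the function name `sample_by_virtual_shuffle` is illustrative):

```python
import random

def sample_by_virtual_shuffle(n, k, rng=random):
    """Return k distinct integers from range(n) without allocating an n-element array."""
    if k > n:
        raise ValueError("k must not exceed n")
    H = {}        # H.get(i, i) is the value at index i of the virtual array
    sample = []
    for i in range(n - 1, n - k - 1, -1):
        r = rng.randint(0, i)      # random index in [0, i], both ends inclusive
        sample.append(H.get(r, r))
        H[r] = H.get(i, i)         # index i is never read again, so H[i] is left alone
    return sample
```

Each iteration does two lookups and one store and never retries, so the cost per sample stays constant even when k is close to n.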
Finally, in your proposed algorithm, it is actually quite easy to restore A in O(k) time. A value A[j] will have been modified by the algorithm only if:
a. n − k ≤ j < n, or
b. there is some i such that n − k ≤ i < n and A[i] = j.
Consequently, you can restore the vector A by looking at each A[i] for n − k ≤ i < n: first, if A[i] < n−k, set A[A[i]] to A[i]; then, unconditionally set A[i] to i.
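A sketch of the question's array-based procedure together with this O(k) restoration, assuming A initially holds 0, ..., n-1 (the function name `sample_and_restore` is illustrative):

```python
import random

def sample_and_restore(A, k, rng=random):
    """Sample k distinct values from A (which must hold 0..n-1), then restore A."""
    n = len(A)
    # Partial shuffle: after this loop the last k slots hold the sample.
    for i in range(n - 1, n - k - 1, -1):
        r = rng.randint(0, i)
        A[r], A[i] = A[i], A[r]
    sample = A[n - k:]
    # Restoration: only the tail and the slots named by small tail values changed.
    for i in range(n - k, n):
        if A[i] < n - k:
            A[A[i]] = A[i]   # put the displaced small value back in its own slot
        A[i] = i             # then reset the tail slot unconditionally
    return sample
```

No stack of the generated r values is needed, because the sample itself records which head slots were disturbed.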

Find "important" entries in a sorted log

I have a log file consisting of several thousand integers, each separated onto a new line. I've parsed this into an array of such integers, also sorted. Now my issue becomes finding the "important" integers from this log--these are ones that show up some user-configurable portion of the time.
For example, given the log, the user can filter to only see entries that appear a certain scaled number of times.
Currently I'm scanning the whole array and keeping count of the number of times each entry appears. Surely there is a better method?

First, I need to note that the following is just a theoretical solution, and you probably should use what is proposed by @MBo.
Take every m-th element of the sorted array, where m = n / l is the minimum number of occurrences that makes an entry important. Only those elements can be important, as no sequence of identical elements of length m can fit strictly between positions i*m and (i+1)*m.
For each such element x, find its lower bound and upper bound in the array with binary search. Subtracting the indices gives the count, which decides whether to keep x or discard it as unimportant.
Total complexity would be O((n/m) * log n) = O(l * log n). For large m it could be (asymptotically) better than O(n). To get an improvement in practice, however, you need very specific circumstances:
Array is given to you presorted (otherwise just use counting sort and you get an answer immediately)
You can access i-th element of the array in O(1) without reading the whole array. Otherwise, again, use counting sort with hash table.
Let's assume you have a file consisting of sorted fixed-width integers "data.bin" (it is possible for variable width too, but requires some extra effort). Then in pseudocode (0-based offsets), the algorithm could be something like this:
def find_all_important(l, n):
    m = n / l  # integer division: minimum run length that counts as important
    for i = m - 1 to n - 1 step m:
        x = read_integer_at_offset("data.bin", i)
        lower_bound = find_lower_bound(x, 0, i)
        upper_bound = find_upper_bound(x, i, n)
        if upper_bound - lower_bound >= m:
            report(x)

def find_lower_bound(x, begin, end):
    # first offset in [begin, end] whose value is not less than x
    if end - begin == 0:
        return begin
    mid = (end + begin) / 2
    value = read_integer_at_offset("data.bin", mid)
    if value < x:
        return find_lower_bound(x, mid + 1, end)
    else:
        return find_lower_bound(x, begin, mid)

# find_upper_bound is symmetric: it uses "value <= x" and returns the first offset whose value is greater than x
As a guess, you will not gain any noticeable improvement compared to naive O(n) on modern hardware, unless your file is very large (hundreds of MBs). And of course it is viable if your data can't fit in RAM. But as always with optimization, it might be worth testing.
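For data that does fit in memory, the same idea can be sketched with the standard `bisect` module. This is an illustration on a Python list rather than on "data.bin"; the name `find_important` and the parameter `l` (an entry is important if it occurs at least n // l times) are assumptions made for the example:

```python
from bisect import bisect_left, bisect_right

def find_important(sorted_values, l):
    """Return values occurring at least len(sorted_values) // l times, in sorted order."""
    n = len(sorted_values)
    m = max(1, n // l)          # minimum run length that counts as important
    important = []
    # Any run of m equal values must cover one of the indices m-1, 2m-1, 3m-1, ...
    for i in range(m - 1, n, m):
        x = sorted_values[i]
        if important and important[-1] == x:
            continue            # already reported from an earlier probe
        lo = bisect_left(sorted_values, x, 0, i + 1)   # first index holding x
        hi = bisect_right(sorted_values, x, i, n)      # one past the last index holding x
        if hi - lo >= m:
            important.append(x)
    return important
```

Only about l probes and 2*l binary searches are performed, which is where the O(l * log n) bound comes from.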

Your sorting takes O(NlogN) time perhaps. Do you need to make (n/I) queries many times for the same data set?
If yes, walk through sorted array, make (Value;Count) pairs and sort them by Count field. Now you can easily separate pairs with high counts with binary search
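A sketch of that precomputation, assuming the sorted log is a Python list and using `itertools.groupby` to form the (Value;Count) pairs (the function names are illustrative):

```python
from bisect import bisect_left
from itertools import groupby

def build_count_index(sorted_values):
    """Return (pairs, counts): (value, count) pairs sorted by count, plus the counts alone."""
    pairs = [(value, sum(1 for _ in group)) for value, group in groupby(sorted_values)]
    pairs.sort(key=lambda pair: pair[1])
    counts = [count for _, count in pairs]
    return pairs, counts

def values_with_count_at_least(pairs, counts, threshold):
    """Binary-search the sorted counts and return every value at or above the threshold."""
    start = bisect_left(counts, threshold)
    return [value for value, _ in pairs[start:]]
```

The index is built once in O(n + m log m) for m distinct values; each later query with a different threshold then costs O(log m) plus the size of its answer.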

Hash Function with Order Preserving

Is there any hash function with a unique hash code (like MD5) that is order-preserving?
NOTE:
I don't care about security; I need it for sorting. I have a lot of chunks (~1 MB each) and I want to sort them. Of course I can use an index sort, but I want to reduce the comparison time.
Theoretically:
If I have 1'000'000 chunks of 1 MB (1'048'576 bytes) each, and they differ only in the last 10 bytes, then comparing one chunk to another takes O(k-10) time, where k is the chunk size. If I use quicksort (which makes ~n*log2(n) comparisons), the total comparison time will be n*log2(n)*(k-10):
1'000'000 * 20 * (1'048'576 - 10)
That's why I want to generate order-preserving hash codes of fixed size (for example, 16 bytes) once, then sort the chunks and save the result (for example, in a file).

CHM (Z.J. Czech, G. Havas, and B.S. Majewski) is an algorithm which generates a minimal perfect hash that preserves ordering (i.e. if A < B, then h(A) < h(B)). It uses approximately 8 bytes of storage per key.
See: http://cmph.sourceforge.net/chm.html

In the general case, such a function is impossible unless the size of the hash is at least the size of the object.
The argument is trivial: if there are N objects but M < N hash values, by pigeonhole principle, two different objects are mapped to one hash value, and so their order is not preserved.
If however we have additional properties of the objects guaranteed or the requirements relaxed, a custom or probabilistic solution may become possible.

According to NIST (I'm no expert) a Pearson hash can be order-preserving. The hash uses an auxiliary table. Such a table can (in theory) be constructed such that the resulting hash is order preserving.
It doesn't meet your full requirements though, because it doesn't reduce the size as you would like. I'm posting this in case other people are looking for a solution.
Some pointers:
The NIST page: http://xlinux.nist.gov/dads/HTML/pearsonshash.html
Wikipedia: http://en.wikipedia.org/wiki/Pearson_hashing
The original Pearson Hash paper: http://cs.mwsu.edu/~griffin/courses/2133/downloads/Spring11/p677-pearson.pdf

Sorting an array of N strings each of length K can be done in just O (NK) or O (N^2 + NK) character comparisons.
For example, construct a trie.
Or do a kind of insertion sort. Construct the set of sorted strings S by adding strings to it one by one. For each new string P, traverse it, maintaining the (non-decreasing) index of the greatest string Q in S such that Q <= P. When the string P ends, insert it into S just after Q. Each of the O(N) insertions can be done in O(N+K) operations: O(N) times increasing the index distributed into K.
When you have indices of the strings in sorted order, just use them for your purposes instead of the "hashes" you want.

Let's construct such a function from the requirements:
You want a function that outputs a 16 byte hash. So you will have collisions. You can't preserve perfect order and you don't want to. Best you can do is:
H(x) < H(y) => x < y
H(x) > H(y) => x > y
Values close to each other will have the same hash.
For each x there is an i_x > 0 so that H(x) = H(x + i_x) < H(x + i_x + 1). (Except for the end where x + i_x + 1 would overflow your 1MB chunks.)
Extending that you get: H(x) < H(x + i_x + n) for any n > 0.
Same argument works for j_x > 0 in the other direction. Combine them and you get:
H(x - j_x) == H(x - j_x + 1) == ... == H(x + i_x - 1) == H(x + i_x)
Or in other words for each hash value there is a single segment [a, b] mapping to the same value. No value outside this segment can have the same hash value or the ordering would be violated.
Your hash function can then be described by the segments you choose:
Let a_i be 1MB chunks with 0 <= i < 256^16 and a_i <= a_i+1. Then
H(x) = i where a_i <= x < a_i+1
You want a more or less uniform distribution of hash values. Otherwise one hash value would get far more collisions than another and you would spend all the time doing a full compare when that value is hit. So all the segments [a, b] should be about the same size.
The only way to have exactly the same size for each segment is to have
a_i = i * 256 ^ (1MB - 16)
or in other words: H(x) = first 16 bytes of x.
Any other order preserving hash function with a 16 byte output would be less efficient for a random set of input blocks.
And yes, if all but the last few bits of each input block are the same then every test will be a collision. That's a worst case scenario that always exists. If you know your inputs aren't uniformly random then you can adjust the size of each segment to have the same probability to be hit. But that requires knowledge of likely inputs.
Note: If you really want to sort 1'000'000 1MB chunks where you fear such a worst case then you can use bucket sort, resulting in 1,000,000 * 1'048'576 (byte) compares every time. Half of that if you compare 16 bit values at a time, which still has a reasonable number of buckets (65536).
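A small sketch of using the first 16 bytes as the sort key, assuming the chunks are Python `bytes` objects (the function name `sort_chunks` and the `prefix_len` parameter are illustrative):

```python
def sort_chunks(chunks, prefix_len=16):
    """Sort byte chunks, comparing short prefixes first and full chunks only on ties."""
    # Tuples compare element by element, so the full chunk is consulted only
    # when two chunks share the same prefix ("hash collision").
    return sorted(chunks, key=lambda chunk: (chunk[:prefix_len], chunk))
```

When the inputs share long common prefixes, as in the question's worst case, every comparison falls through to the full chunk and nothing is gained.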

In theory there is no such thing. If you want, you can create a composed hash:
index:md5
I think this will resolve your needs.

Given a permutation's lexicographic number, is it possible to get any item in it in O(1)

I want to know whether the task explained below is even theoretically possible, and if so how I could do it.
You are given a space of N elements (i.e. all numbers between 0 and N-1.) Let's look at the space of all permutations on that space, and call it S. The ith member of S, which can be marked S[i], is the permutation with the lexicographic number i.
For example, if N is 3, then S is this list of permutations:
S[0]: 0, 1, 2
S[1]: 0, 2, 1
S[2]: 1, 0, 2
S[3]: 1, 2, 0
S[4]: 2, 0, 1
S[5]: 2, 1, 0
(Of course, when looking at a big N, this space becomes very large, N! to be exact.)
Now, I already know how to get the permutation by its index number i, and I already know how to do the reverse (get the lexicographic number of a given permutation.) But I want something better.
Some permutations can be huge by themselves. For example, if you're looking at N=10^20. (The size of S would be (10^20)! which I believe is the biggest number I ever mentioned in a Stack Overflow question :)
If you're looking at just a random permutation on that space, it would be so big that you wouldn't be able to store the whole thing on your harddrive, let alone calculate each one of the items by lexicographic number. What I want is to be able to do item access on that permutation, and also get the index of each item. That is, given N and i to specify a permutation, have one function that takes an index number and find the number that resides in that index, and another function that takes a number and finds in which index it resides. I want to do that in O(1), so I don't need to store or iterate over each member in the permutation.
Crazy, you say? Impossible? That may be. But consider this: A block cipher, like AES, is essentially a permutation, and it almost accomplishes the tasks I outlined above. AES has a block size of 16 bytes, meaning that N is 256^16 which is around 10^38. (The size of S, not that it matters, is a staggering (256^16)!, or around 10^85070591730234615865843651857942052838, which beats my recent record for "biggest number mentioned on Stack Overflow" :)
Each AES encryption key specifies a single permutation on N=256^16. That permutation couldn't be stored whole on your computer, because it has more members than there are atoms in the solar system. But it allows you item access. By encrypting data using AES, you're looking at the data block by block, and for each block (member of range(N)) you output the encrypted block, which is the member of range(N) that sits at the index number of the original block in the permutation. And when you're decrypting, you're doing the reverse (finding the index number of a block). I believe this is done in O(1); I'm not sure, but in any case it's very fast.
The problem with using AES or any other block cipher is that it limits you to very specific N, and it probably only captures a tiny fraction of the possible permutations, while I want to be able to use any N I like, and do item access on any permutation S[i] that I like.
Is it possible to get O(1) item access on a permutation, given size N and permutation number i? If so, how?
(If I'm lucky enough to get code answers here, I'd appreciate if they'll be in Python.)
UPDATE:
Some people pointed out the sad fact that the permutation number itself would be so huge, that just reading the number would make the task non-feasible. Then, I'd like to revise my question: Given access to the factoradic representation of a permutation's lexicographic number, is it possible to get any item in the permutation in O(as small as possible)?

The secret to doing this is to "count in base factorial".
In the same way that 134 = 1*10^2+3*10 + 4, 134 = 5! + 2 * 3! + 2! => 10210 in factorial notation (include 1!, exclude 0!). If you want to represent N!, you will then need N^2 base ten digits. (For each factorial digit N, the maximum number it can hold is N). Up to a bit of confusion about what you call 0, this factorial representation is exactly the lexicographic number of a permutation.
You can use this insight to solve Euler Problem 24 by hand. So I will do that here, and you will see how to solve your problem. We want the millionth permutation of 0-9. In factorial representation we take 1000000 => 266251220. Now to convert that to the permutation, I take my digits 0,1,2,3,4,5,6,7,8,9. The first number is 2, which is the third (it could be 0), so I select 2 as the first digit. Then I have a new list 0,1,3,4,5,6,7,8,9 and I take the seventh number, which is 7, etc., and I get 2783915604.
However, this assumes that you start your lexicographic ordering at 0, if you actually start it at one, you have to subtract 1 from it, which gives 2783915460. Which is indeed the millionth permutation of the numbers 0-9.
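The by-hand procedure can be sketched in Python as follows, assuming a 0-based permutation index (the function names `to_factoradic` and `nth_permutation` are illustrative):

```python
from math import factorial

def to_factoradic(index, length):
    """Factoradic digits of index, most significant first, padded to `length` digits."""
    digits = []
    for place in range(length - 1, -1, -1):
        digit, index = divmod(index, factorial(place))
        digits.append(digit)
    return digits  # the final digit (the 0! place) is always 0

def nth_permutation(index, items):
    """Return the permutation of `items` (given in sorted order) with 0-based rank `index`."""
    remaining = list(items)
    # Each digit selects, and removes, one of the items that are still unused.
    return [remaining.pop(digit) for digit in to_factoradic(index, len(remaining))]
```

Called as `nth_permutation(999999, range(10))`, this yields the digits of 2783915460, the millionth permutation when counting starts at one. An index of len(items)! or more is outside the assumed range and raises IndexError.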
You can obviously reverse this procedure, and hence convert backwards and forwards easily between the lexicographic number and the permutation that it represents.
I am not entirely clear what it is that you want to do here, but understanding the above procedure should help. For example, it's clear that the lexicographic number represents an ordering which could be used as the key in a hashtable. And you can order numbers by comparing digits left to right, so once you have inserted a number you never have to work out its factorial.

Your question is a bit moot, because your input size for an arbitrary permutation index has size log(N!) (assuming you want to represent all possible permutations) which is Theta(N log N), so if N is really large then just reading the input of the permutation index would take too long, certainly much longer than O(1). It may be possible to store the permutation index in such a way that if you already had it stored, then you could access elements in O(1) time. But probably any such method would be equivalent to just storing the permutation in contiguous memory (which also has Theta(N log N) size), and if you store the permutation directly in memory then the question becomes trivial assuming you can do O(1) memory access. (However you still need to account for the size of the bit encoding of the element, which is O(log N)).
In the spirit of your encryption analogy, perhaps you should specify a small SUBSET of permutations according to some property, and ask if O(1) or O(log N) element access is possible for that small subset.

Edit:
I misunderstood the question, but it was not a waste. My algorithms let me understand that the factoradic representation of a permutation's lexicographic number is almost the same as the permutation itself. In fact the first digit of the factoradic representation is the same as the first element of the corresponding permutation (assuming your space consists of numbers from 0 to N-1). Knowing this, there is not really a point in storing the index rather than the permutation itself. To see how to convert the lexicographic number into a permutation, read below.
See also this wikipedia link about Lehmer code.
Original post:
In the S space there are N elements that can fill the first slot, meaning that there are (N-1)! permutations that start with each of them. So i/(N-1)! (integer division) is the first element (let's call it 'a'). The subset of S that starts with a consists of (N-1)! elements. These are the possible permutations of the set N \ {a}. Now you can get the second element: it's at position (i % (N-1)!) / (N-2)! among the remaining elements. Repeat the process and you get the permutation.
Reverse is just as simple. Start with i=0. Get the 2nd last element of the permutation. Make a set of the last two elements, and find the element's position in it (it's either the 0th element or the 1st); let's call this position j. Then i += j*1!. Repeat the process for each earlier element, multiplying its position by the next factorial (2!, 3!, ...). (You can start with the last element too, but it will always be the 0th element of the possibilities.)
Java-ish pseudo code:
find_by_index(List N, int i){
    String str = "";
    for(int l = N.length-1; l >= 0; l--){
        int pos = i/fact(l);
        str += N.get(pos);
        N.remove(pos);
        i %= fact(l);
    }
    return str;
}
find_index(String str){
    OrderedList N;
    int i = 0;
    for(int l = str.length-1; l >= 0; l--){
        String item = str.charAt(l);
        int pos = N.add(item);  // position of item among the elements seen so far
        i += pos*fact(str.length-1-l);
    }
    return i;
}
find_by_index should run in O(n) assuming that N is pre ordered, while find_index is O(n*log(n)) (where n is the size of the N space)
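The reverse direction (permutation to lexicographic number) could be sketched in Python like this, with a sorted list and the standard `bisect` module standing in for the ordered list of the pseudo code (the name `permutation_rank` is illustrative):

```python
from bisect import bisect_left, insort
from math import factorial

def permutation_rank(perm):
    """Return the 0-based lexicographic rank of perm among permutations of its elements."""
    n = len(perm)
    seen = []   # sorted list of the elements to the right of the current position
    rank = 0
    for pos in range(n - 1, -1, -1):
        # How many elements to the right are smaller than perm[pos]?
        smaller_to_right = bisect_left(seen, perm[pos])
        rank += smaller_to_right * factorial(n - 1 - pos)
        insort(seen, perm[pos])
    return rank
```

`insort` on a Python list is O(n) per insertion, so this sketch is O(n^2) in the worst case; reaching the O(n log n) bound would require a balanced tree or a Fenwick tree instead.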

After some research on Wikipedia, I designed this algorithm:
def getPick(fact_num_list):
    """fact_num_list should be a list with the factorial number representation,
    getPick will return a tuple"""
    result = []  # Desired pick
    # This will hold all the numbers pickable; not actually a set, but a list
    # instead
    inputset = list(range(len(fact_num_list)))
    for fnl in fact_num_list:
        result.append(inputset[fnl])
        del inputset[fnl]  # Make sure we can't pick the number again
    return tuple(result)
Obviously, this won't reach O(1), because we need to "pick" every number. Since we do a for loop, and assuming all operations inside it are O(1), getPick will run in O(n).
If we need to convert from base 10 to factorial base, this is an aux function:
import math

def base10_baseFactorial(number):
    """Converts a base10 number into a factorial base number. Output is a list
    for better handle of units over 36! (after using all 0-9 and A-Z)"""
    loop = 1
    # Find the smallest loop such that loop! > number
    while math.factorial(loop) <= number:
        loop += 1
    loop -= 1  # Largest loop with loop! <= number (0 when number is 0)
    result = []
    while loop > 0:
        # The quotient is the digit for this place; the remainder carries on
        digit, number = divmod(number, math.factorial(loop))
        result.append(digit)
        loop -= 1
    result.append(0)  # Don't forget the 0! place as well; it is always 0
    return result
Again, this will run in O(n) due to the whiles.
Summing all, the best time we can find is O(n).
PS: I'm not a native English speaker, so spelling and phrasing errors may appear. Apologies in advance, and let me know if you can't get around something.

All correct algorithms for accessing the kth item of a permutation stored in factoradic form must read the first k digits. This is because, regardless of the values of the other digits among the first k, it makes a difference whether an unread digit is a 0 or takes on its maximum value. That this is the case can be seen by tracing the canonical correct decoding program in two parallel executions.
For example, if we want to decode the third element of the permutation whose factoradic digits are 1?0, then for 100 that element is 2, and for 110 it is 0.
