Exhaust list of elements randomly without sorting them randomly first - sorting

If I have a list of 10K elements, and I want to randomly iterate through all of them, is there an algorithm that lets me access each element randomly, without just sorting them randomly first?
In other words, this would not be ideal:
const sorted = list
  .map(v => [Math.random(), v])
  .sort((a, b) => a[0] - b[0]);
It would be nice to avoid the sort call and the mapping call.
My only idea would be to store everything in a hashmap and access the hash keys randomly somehow? Although that's just coming back to the same problem, afaict.

Just been having a play with this and realised that the Fisher-Yates shuffle works well "on-line". For example, if you've got a large list you don't need to spend the time to shuffle the whole thing before you start iterating over items, or, equivalently, you might only need a few items out of a large list.
I didn't see a language tag in the question, so I'll pick Python.
from random import randint

def iterrand(a):
    """Iterate over items of a list in a random order.
    Additional items can be .append()ed arbitrarily at runtime."""
    for i, ai in enumerate(a):
        j = randint(i, len(a)-1)
        a[i], a[j] = a[j], ai
        yield a[i]
This is O(n) in the length of the list and by allowing .append()s (O(1) in Python) the list can be built in the background.
An example use would be:
l = [0, 1, 2]
for i, v in enumerate(iterrand(l)):
    print(f"{i:3}: {v:<5} {l}")
    if v < 4:
        l.append(randint(1, 9))
which might produce output like:
0: 2 [2, 1, 0]
1: 3 [2, 3, 0, 1]
2: 1 [2, 3, 1, 1, 0]
3: 0 [2, 3, 1, 0, 1, 3]
4: 1 [2, 3, 1, 0, 1, 3, 7]
5: 7 [2, 3, 1, 0, 1, 7, 7, 3]
6: 7 [2, 3, 1, 0, 1, 7, 7, 3]
7: 3 [2, 3, 1, 0, 1, 7, 7, 3]
8: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2]
9: 3 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3]
10: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2]
11: 7 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2, 7]
Update: To test correctness, I'd do something like:
# trivial tests
assert list(iterrand([])) == []
assert list(iterrand([1])) == [1]
# bigger uniformity test
from collections import Counter
# tally 1M draws
c = Counter()
for _ in range(10**6):
    c[tuple(iterrand([1, 2, 3, 4, 5]))] += 1
# ensure it's uniform
assert all(7945 < v < 8728 for v in c.values())
# above constants calculated in R via:
# k<-120;p<-0.001/k;qbinom(c(p,1-p), 1e6, 1/k)
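The "you might only need a few items out of a large list" case mentioned above can be sketched as follows. This is only an illustration: take_random is a hypothetical helper name (not from the answer), and it works on a copy so the caller's list is left untouched, which costs an O(n) copy but still avoids shuffling the whole list.

```python
from itertools import islice
from random import randint

def iterrand(a):
    """Lazily yield the items of list `a` in random order (shuffles `a` in place)."""
    for i, ai in enumerate(a):
        j = randint(i, len(a) - 1)
        a[i], a[j] = a[j], ai
        yield a[i]

def take_random(items, k):
    """Hypothetical helper: return k randomly chosen items, doing only k swaps."""
    scratch = list(items)  # copy so the caller's list keeps its order
    return list(islice(iterrand(scratch), k))

big = list(range(10_000))
sample = take_random(big, 5)
print(sample)
```

Because the generator is lazy, only the first k iterations of the shuffle loop ever run.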

Fisher-Yates should do the trick as well as anything; this article is really good:
https://medium.com/#oldwestaction/randomness-is-hard-e085decbcbb2
the relevant JS code is very short and sweet:
const fisherYatesShuffle = (deck) => {
  for (let i = deck.length - 1; i >= 0; i--) {
    const swapIndex = Math.floor(Math.random() * (i + 1));
    [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]];
  }
  return deck;
};
To yield results as you go, so that you don't have to iterate through the list twice, use a generator function like so:
const fisherYatesShuffle = function* (deck) {
  for (let i = deck.length - 1; i >= 0; i--) {
    const swapIndex = Math.floor(Math.random() * (i + 1)); // this semicolon is required
    [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]];
    yield deck[i];
  }
};
(Note: don't omit the semicolon when the next line starts with a bracket, as the destructuring swap does; otherwise the two lines are parsed as a single expression.)

Related

How to derive max amount of items from DP table of Knapsack problem?

I have a slightly modified algorithm for the 0-1 Knapsack problem.
It also calculates the max count of items (that we can put into the knapsack).
I'm using it to find the max subset sum which is <= target sum. For example:
weights: 1, 3, 4, 5, target sum: 10
result: 1, 4, 5 (because 1 + 4 + 5 = 10)
weights: 2, 3, 4, 9 target sum: 10
result: 2, 3, 4 (2 + 3 + 4 = 9, max possible sum <= 10)
I use 2 DP tables: one for calculating max possible sum (dp) and one for max possible amount (count).
The question is: how can I derive the chosen values from both tables?
Example:
weights: [3, 2, 5, 2, 1, 1, 3], target_sum: 10
indexes: 0, 1, 2, 3, 4, 5, 6
dp:
0: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1: [0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3]
2: [0, 0, 2, 3, 3, 5, 5, 5, 5, 5, 5]
3: [0, 0, 2, 3, 3, 5, 5, 7, 8, 8, 10]
4: [0, 0, 2, 3, 4, 5, 5, 7, 8, 9, 10]
5: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
6: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
7: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
count:
0: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
2: [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2]
3: [0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3]
4: [0, 0, 1, 1, 2, 2, 2, 2, 2, 3, 3]
5: [0, 1, 1, 2, 2, 3, 3, 2, 3, 3, 4]
6: [0, 1, 2, 2, 3, 3, 4, 4, 3, 4, 4]
7: [0, 1, 2, 1, 2, 3, 3, 4, 4, 5, 5]
Here, items with weight [3, 2, 1, 3, 1] should be derived (because they have max possible count) instead of (for example) [5, 2, 3].
Some notation explanations:
dp means the same as in the original Knapsack problem: i is for the items, j is for the weight.
The value in dp[i][j] is the sum of the chosen items (weights), which is <= j.
Each cell in count corresponds to the same cell in dp and shows the max possible number of items (with total weight = dp[i][j]).
How can the chosen items be derived efficiently?
I know how to derive just any items from dp (i.e. not the max number of them) by reconstructing from the bottom-right cell.
Also, I've found a hack which allows deriving the items if the input is sorted.
But I'm looking for a way to do that without sorting.
Is it possible?
The code which constructs these two tables doesn't matter much, but here it is:
def max_subset_sum(ws, target_sum):
    n = len(ws)
    k = target_sum
    dp = [[0] * (k + 1) for _ in range(n + 1)]
    count = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            curr_w = ws[i - 1]
            if curr_w > j:
                dp[i][j] = dp[i - 1][j]
                count[i][j] = count[i - 1][j]
            else:
                tmp = round(dp[i - 1][j - curr_w] + curr_w, 2)
                if tmp >= dp[i - 1][j]:
                    dp[i][j] = tmp
                    count[i][j] = count[i - 1][j - curr_w] + 1
                else:
                    dp[i][j] = dp[i - 1][j]
                    count[i][j] = count[i - 1][j]
    return get_items(dp, k, n, ws)

def get_items(dp, k, n, ws):
    # The trick which allows to get max amount of items if input is sorted
    start = n
    while start and dp[start][k] == dp[start - 1][k]:
        start -= 1
    res = []
    w = dp[start][k]
    i, j = start, k
    while i and w:
        if w != dp[i - 1][j]:
            res.append(i - 1)
            w = round(w - ws[i - 1], 2)
            j -= ws[i - 1]
        i -= 1
    return res
Also, I have a weird attempt to get the max number of items.
But it produces an incorrect result which sums to 9: [3, 1, 1, 2, 2]
def get_items_incorrect(dp, count, k, n, ws):
    start = n
    res = []
    w = dp[start][k]
    i, j = start, k
    while i and w:
        # while dp[i][j] < ws[i - 1]:
        #     i -= 1
        while ws[i - 1] > j:
            i -= 1
        if i < 0:
            break
        max_count = count[i][j]
        max_count_i = i
        while i and w == dp[i - 1][j]:
            if count[i - 1][j] > max_count:
                max_count = count[i - 1][j]
                max_count_i = i - 1
            i -= 1
        res.append(max_count_i - 1)
        w = round(w - ws[max_count_i - 1], 2)
        j -= ws[max_count_i - 1]
        i = max_count_i - 1
    return res
Sorry for the long read and thank you for any help!
It seems you have overcomplicated the problem. This kind of problem (without item cost) can be solved using a 1D list. The weights for the best cases are stored in a parallel list.
After filling the table, we look for the largest occupied index (the largest possible sum <= target) and unwind the chain of used items.
def sol(wts, target):
    dp = [-1] * (target + 1)
    dp[0] = 0
    items = [0] * (target + 1)
    wts.sort()
    # iterate over the parameter, not the global list
    for w in wts:
        for i in range(target, w-1, -1):
            if (dp[i-w] >= 0) and (dp[i-w] > dp[i]):
                dp[i] = dp[i-w] + 1
                items[i] = w
    last = target
    while (last) and (dp[last] < 0):
        last -= 1
    best = dp[last]
    res = []
    while (last):
        res.append(items[last])
        last -= items[last]
    return (best, res)

weights = [1, 2, 3, 3, 5, 2, 1]
target_sum = 9
print(sol(weights, target_sum))
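For the "without sorting" part of the question, one possible sketch is to keep a 2D table of the maximum item count per exact sum and walk it backwards. This is an illustration under my own assumptions, not the code from either post: the names max_count_subset and cnt are hypothetical, and weights are assumed to be positive integers.

```python
def max_count_subset(ws, target):
    """Indices of a subset with the largest sum <= target and, among
    those, the largest number of items. Input order is left as given."""
    n = len(ws)
    UNREACHABLE = -1
    # cnt[i][s]: max number of items among the first i whose weights sum to exactly s
    cnt = [[UNREACHABLE] * (target + 1) for _ in range(n + 1)]
    cnt[0][0] = 0
    for i, w in enumerate(ws, start=1):
        for s in range(target + 1):
            best = cnt[i - 1][s]  # skip item i-1
            if s >= w and cnt[i - 1][s - w] != UNREACHABLE:
                best = max(best, cnt[i - 1][s - w] + 1)  # take item i-1
            cnt[i][s] = best
    # largest reachable sum (sum 0 is always reachable)
    s = max(t for t in range(target + 1) if cnt[n][t] != UNREACHABLE)
    chosen = []
    for i in range(n, 0, -1):
        w = ws[i - 1]
        took = (s >= w and cnt[i - 1][s - w] != UNREACHABLE
                and cnt[i][s] == cnt[i - 1][s - w] + 1)
        if took:
            chosen.append(i - 1)
            s -= w
    return chosen[::-1]

ws = [3, 2, 5, 2, 1, 1, 3]
picked = max_count_subset(ws, 10)
print(picked, [ws[i] for i in picked])
```

Storing counts per exact sum (rather than per capacity) means each cell describes a single well-defined sum, so the backward walk can always tell whether taking the item is consistent with the optimum.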

efficient way of generating semi-random sequences

Quite often, I have to generate sequences of numbers in some semi-random way, which means that the sequence is not totally random but has to have some other property. For example, we need a random sequence of 1s, 2s, 3s and 4s, but no number may be repeated three times in a row. These are usually not very complicated to do, but I ran into a tricky one: I need to generate a semi-random sequence that is a bit over 400 long and is composed of 1s, 2s, 3s and 4s; each number must appear the same number of times (or, if the length is not divisible by four, as close to that as you can get), and no number may repeat 3 times in a row (so 1,3,4,4,4,2 is not ok).
I tried two methods:
Create a list which has the desired length and number of numbers; shuffle; check whether it is ok for consecutive numbers; if not, shuffle again.
Create a list which has the desired length and number of numbers; generate all permutations and select those which are ok; save these for later and randomly select one of them when needed.
Method number one runs for minutes before yielding any sequence that is ok, and method number two generates so many permutations that my Jupyter notebook gave up.
Here's the Python code for the first one:
from random import shuffle

v = []
for x in range(108):
    v += [1,2,3,4]
shouldicontinue = 1
while shouldicontinue:
    shuffle(v)
    shouldicontinue = 0
    # stop at len(v)-2 so that v[h+2] stays in range
    for h in range(len(v)-2):
        if v[h] == v[h+1] and v[h] == v[h+2]:
            shouldicontinue = 1
            break
        else:
            pass
and the second one
from random import shuffle
import itertools

v = []
for x in range(108):
    v += [1,2,3,4]
good = []
for l in itertools.permutations(v):
    notok = 0
    # check the current permutation l, not the original list v
    for h in range(len(l)-2):
        if l[h] == l[h+1] and l[h] == l[h+2]:
            notok = 1
            break
        else:
            pass
    if not notok:
        good.append(l)
I'm looking for a way to solve this problem in an efficient way, i.e.: if it runs in real time, it doesn't need more than say a minute to generate on slower computers or if it is prepared in advance in someway (like the idea of method 2), it can be prepared on some moderate level computer in a few hours.
Before you can check all the permutations of a >400 length list, the universe will likely have died. Thus you need another approach.
Here, I recommend trying to insert the elements in the list at random, but shifting to the next index when the insertion would break one of the requirements.
Cycling through your elements, 1 to 4 in your case, should ensure an insertion is always possible.
from itertools import cycle, islice
from random import randint

def has_repeated(target, n, lst):
    """A helper to check if insertion would break the max repetition requirement"""
    count = 0
    for el in lst:
        count += el == target
        if count == n:
            return True
    return False

def sequence(length, max_repeat, elements=(1, 2, 3, 4)):
    # Iterator that will yield our elements in cycle
    values = islice(cycle(elements), length)
    seq = []
    for value in values:
        # Pick an insertion index at random
        init_index = randint(0, len(seq))
        # Loop over indices from that index until a legal position is found
        for shift in range(len(seq) + 1):
            index = init_index - shift
            slice_around_index = seq[max(0, index - max_repeat):index + max_repeat]
            # If the insertion would cause no forbidden subsequence, insert
            if not has_repeated(value, max_repeat, slice_around_index):
                seq.insert(index, value)
                break
        # This will likely never happen, except if a solution truly does not exist
        else:
            raise ValueError('failed to generate the sequence')
    return seq
Sample
Here is some sample output to check the result is correct.
for _ in range(10):
    print(sequence(25, 2))
Output
[4, 1, 4, 1, 3, 2, 1, 2, 4, 1, 4, 2, 1, 2, 2, 4, 3, 3, 1, 4, 3, 1, 2, 3, 3]
[3, 1, 3, 2, 2, 4, 1, 2, 2, 4, 3, 4, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 1, 1, 3]
[1, 3, 2, 4, 1, 3, 4, 4, 3, 2, 4, 1, 1, 3, 1, 2, 4, 2, 3, 1, 1, 2, 4, 3, 2]
[1, 3, 2, 4, 1, 2, 2, 1, 2, 3, 4, 3, 2, 4, 2, 4, 1, 1, 3, 1, 3, 4, 1, 4, 3]
[4, 1, 4, 4, 1, 1, 3, 1, 2, 2, 3, 2, 4, 2, 2, 3, 1, 3, 4, 3, 2, 1, 3, 1, 4]
[2, 3, 3, 1, 3, 3, 1, 2, 1, 2, 1, 2, 3, 4, 4, 1, 3, 4, 4, 2, 1, 1, 4, 4, 2]
[3, 2, 1, 4, 3, 2, 3, 1, 4, 1, 1, 2, 3, 3, 2, 2, 4, 1, 1, 2, 4, 1, 4, 3, 4]
[4, 4, 3, 1, 4, 1, 2, 2, 4, 4, 3, 2, 2, 3, 3, 1, 1, 2, 1, 1, 4, 1, 2, 3, 3]
[1, 4, 1, 4, 4, 2, 4, 1, 1, 2, 1, 2, 2, 3, 3, 2, 2, 3, 1, 4, 4, 3, 3, 1, 3]
[4, 3, 2, 1, 4, 1, 1, 2, 2, 3, 3, 1, 4, 4, 1, 3, 2, 3, 4, 2, 1, 1, 4, 2, 3]
Efficiency-wise, it takes around 10ms to generate a list of length 10,000 with the same requirements, which suggests that this is an efficient enough solution for most purposes.
I think it should be possible (with about 4 gigabytes of memory and 1 minute of precomputation) to generate uniformly distributed random sequences faster than 1 second per random sequence.
The idea is to prepare a cache of results for the question "How many sequences with exactly a 1s, b 2s, c 3s, d 4s are there which end with count copies of a particular digit?".
Once you have this cache, then you can compute how many sequences (N) there are that satisfy your constraint, and can generate one at random by picking a random number n between 1 and N and using the cache to generate the n^th sequence.
To save memory in the cache you can use a couple of tricks:
The answer is symmetric in a/b/c/d so you only need to store results with a>=b>=c>=d
The count of the last digit will always be 1 or 2 in legal sequences
These tricks should mean the cache only needs to hold about 40 million results.
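The counting idea above can be sketched on a small scale as follows. This is a simplified illustration, not the answer's actual implementation: it memoises with functools.lru_cache, skips the symmetry trick, and picks one digit at a time with probability proportional to the number of valid completions (which is equivalent to picking the n-th sequence uniformly). The names completions and uniform_sequence are hypothetical, and without the memory-saving tricks it is only practical for small counts.

```python
import random
from functools import lru_cache

MAX_RUN = 2  # at most two equal digits in a row

@lru_cache(maxsize=None)
def completions(remaining, last, run):
    """Number of valid ways to place the `remaining` digits after a
    run of `run` copies of digit index `last` (-1 means nothing placed yet)."""
    if sum(remaining) == 0:
        return 1
    total = 0
    for d, left in enumerate(remaining):
        if left == 0 or (d == last and run == MAX_RUN):
            continue
        nxt = remaining[:d] + (left - 1,) + remaining[d + 1:]
        total += completions(nxt, d, run + 1 if d == last else 1)
    return total

def uniform_sequence(counts, rng=random):
    """Draw one valid sequence uniformly at random; counts[i] copies of digit i+1."""
    remaining, last, run, seq = tuple(counts), -1, 0, []
    while sum(remaining):
        options = []
        for d, left in enumerate(remaining):
            if left == 0 or (d == last and run == MAX_RUN):
                continue
            nxt = remaining[:d] + (left - 1,) + remaining[d + 1:]
            new_run = run + 1 if d == last else 1
            options.append((completions(nxt, d, new_run), d, nxt, new_run))
        total = sum(ways for ways, _, _, _ in options)
        pick = rng.randrange(total)  # raises ValueError if no valid sequence exists
        for ways, d, nxt, new_run in options:
            if pick < ways:
                break
            pick -= ways
        remaining, last, run = nxt, d, new_run
        seq.append(d + 1)
    return seq

print(completions((3, 3, 3, 3), -1, 0))
print(uniform_sequence((3, 3, 3, 3)))
```

Each prefix is chosen with probability (completions of that prefix) / (total sequences), so every full valid sequence comes out with the same probability.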
import random

rc = random.choices([1,2,3,4])
for _ in range(22):
    if rc[-1] == 1:
        rc = rc + random.choices([2,3,4])
        rc = rc + random.choices([1,2,3,4])
    if rc[-1] == 2:
        rc = rc + random.choices([1,3,4])
        rc = rc + random.choices([1,2,3,4])
    if rc[-1] == 3:
        rc = rc + random.choices([2,1,4])
        rc = rc + random.choices([1,2,3,4])
    if rc[-1] == 4:
        rc = rc + random.choices([2,3,1])
        rc = rc + random.choices([1,2,3,4])
print(rc)

How to get the Nth arrangement in a Combinatoric sequence and vice-versa?

How do I get the Nth arrangement out of all possible ways of arranging 4 indistinguishable balls in 3 distinct buckets? Let Bl = number of balls and Bk = number of buckets; e.g. for Bl = 4, Bk = 3 the possible arrangements are:
004,013,022,031,040,103,112,121,130,202,211,220,301,310,400.
The first arrangement (N=0) is 004 (i.e. bucket 1 = 0 balls, bucket 2 = 0 balls, bucket 3 = 4 balls) and the last (N=14) is 400. So if I have 103, N would be equal to 5. I want to be able to do:
int Bl=4,Bk=3;
getN(004,Bl,Bk);// which should be = 0
getNthTerm(8,Bl,Bk);// which should be = 130
P.S: max number of terms for the sequence is (Bl+Bk-1)C(Bk-1) where C is the combinatorics/combination operator. Obtained from stars and bars
As far as I know, there is no faster way of doing this than combinatorial decomposition which takes roughly O(Bl) time.
We simply compute the number of balls which go into each bucket for the selected index, working one bucket at a time. For each possible assignment to the bucket we compute the number of possible arrangements of the remaining balls and buckets. If the index is less than that number, we select that arrangement; otherwise we put one more ball in the bucket and subtract the number of arrangements we just skipped from the index.
Here's a C implementation. I didn't include the binom function in the implementation below. It's usually best to precompute the binomial coefficients over the range of values you are interested in, since there won't normally be too many. It is easy to do the computation incrementally but it requires a multiplication and a division at each step; while that doesn't affect the asymptotic complexity, it makes the inner loop much slower (because of the divide) and increases the risk of overflow (because of the multiply).
#include <string.h>  /* memset */

/* Binomial coefficient; not included here (see the note above). */
long binom(int n, int k);

/* Computes arrangement corresponding to index.
 * Returns 0 if index is out of range.
 */
int get_nth(long index, int buckets, int balls, int result[buckets]) {
  int i = 0;
  memset(result, 0, buckets * sizeof *result);
  --buckets;
  while (balls && buckets) {
    long count = binom(buckets + balls - 1, buckets - 1);
    if (index < count) { --buckets; ++i; }
    else { ++result[i]; --balls; index -= count; }
  }
  if (balls) result[i] = balls;
  return index == 0;
}
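For the "vice-versa" half of the question, here is a small Python sketch of both directions using the same decomposition. It is an illustration rather than a translation of the C code above: rank, unrank and arrangements are hypothetical names, and math.comb (Python 3.8+) stands in for the precomputed binomial table.

```python
from math import comb

def arrangements(balls, buckets):
    """Stars and bars: ways to put `balls` balls into `buckets` buckets (buckets >= 1)."""
    return comb(balls + buckets - 1, buckets - 1)

def unrank(index, balls, buckets):
    """Arrangement at position `index` in lexicographic order (004, 013, ...)."""
    if not 0 <= index < arrangements(balls, buckets):
        raise IndexError("index out of range")
    result = []
    for remaining_buckets in range(buckets - 1, 0, -1):
        v = 0
        while True:
            # arrangements that have exactly v balls in the current bucket
            count = arrangements(balls - v, remaining_buckets)
            if index < count:
                break
            index -= count
            v += 1
        result.append(v)
        balls -= v
    result.append(balls)  # the last bucket takes whatever is left
    return result

def rank(arrangement):
    """Inverse of unrank: position of `arrangement` in lexicographic order."""
    balls, buckets, index = sum(arrangement), len(arrangement), 0
    for pos, v in enumerate(arrangement[:-1]):
        remaining_buckets = buckets - pos - 1
        # skip every arrangement with fewer balls in this bucket
        for smaller in range(v):
            index += arrangements(balls - smaller, remaining_buckets)
        balls -= v
    return index

print(rank([0, 0, 4]))   # 0
print(rank([1, 0, 3]))   # 5
print(unrank(8, 4, 3))   # [1, 3, 0]
```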
There are some interesting bijections that can be made. Finally, we can use ranking and unranking methods for the regular k-combinations, which are more common knowledge.
A bijection from the number of balls in each bucket to the ordered multiset of choices of buckets; for example: [3, 1, 0] --> [1, 1, 1, 2] (three choices of 1 and one choice of 2).
A bijection from the k-subsets of {1...n} (with repetition) to k-subsets of {1...n + k − 1} (without repetition) by mapping {c_0, c_1, ..., c_(k−1)} to {c_0, c_1 + 1, c_2 + 2, ..., c_(k−1) + (k − 1)} (see here).
Here's some python code:
def toTokens(C):
    # e.g. '013' -> [0, 1, 3]
    return [int(x) for x in C]

def compositionToChoice(tokens):
    # balls per bucket -> ordered multiset of bucket choices
    result = []
    for i, t in enumerate(tokens):
        result = result + [i + 1] * t
    return result

def bijection(C):
    # add 0, 1, 2, ... to the choices to remove repetition
    result = []
    k = 0
    for i, _c in enumerate(C):
        result.append(C[i] + k)
        k = k + 1
    return result

compositions = ['004','013','022','031','040','103','112',
                '121','130','202','211','220','301','310','400']
for c in compositions:
    tokens = toTokens(c)
    choices = compositionToChoice(tokens)
    combination = bijection(choices)
    print("%s --> %s --> %s" % (tokens, choices, combination))
Output:
"""
[0, 0, 4] --> [3, 3, 3, 3] --> [3, 4, 5, 6]
[0, 1, 3] --> [2, 3, 3, 3] --> [2, 4, 5, 6]
[0, 2, 2] --> [2, 2, 3, 3] --> [2, 3, 5, 6]
[0, 3, 1] --> [2, 2, 2, 3] --> [2, 3, 4, 6]
[0, 4, 0] --> [2, 2, 2, 2] --> [2, 3, 4, 5]
[1, 0, 3] --> [1, 3, 3, 3] --> [1, 4, 5, 6]
[1, 1, 2] --> [1, 2, 3, 3] --> [1, 3, 5, 6]
[1, 2, 1] --> [1, 2, 2, 3] --> [1, 3, 4, 6]
[1, 3, 0] --> [1, 2, 2, 2] --> [1, 3, 4, 5]
[2, 0, 2] --> [1, 1, 3, 3] --> [1, 2, 5, 6]
[2, 1, 1] --> [1, 1, 2, 3] --> [1, 2, 4, 6]
[2, 2, 0] --> [1, 1, 2, 2] --> [1, 2, 4, 5]
[3, 0, 1] --> [1, 1, 1, 3] --> [1, 2, 3, 6]
[3, 1, 0] --> [1, 1, 1, 2] --> [1, 2, 3, 5]
[4, 0, 0] --> [1, 1, 1, 1] --> [1, 2, 3, 4]
"""

Find all combinations in a 3x3 matrix following some rules

Given a 3x3 matrix:
|1 2 3|
|4 5 6|
|7 8 9|
I'd like to calculate all the combinations by connecting the numbers in this matrix following these rules:
the length of a combination is between 3 and 9
use one number only once
you can only connect adjacent numbers
Some examples: 123, 258, 2589, 123654, etc.
For example 1238 is not a good combination because 3 and 8 are not adjacent. The 123 and the 321 combination is not the same.
I hope my description is clear.
If anyone has any ideas please let me know. Actually I don't know how to start :D. Thanks
This is a search problem. You can just use straightforward depth-first-search with recursive programming to quickly solve the problem. Something like the following:
func search(matrix[N][M], x, y, digitsUsed[10], combination[L]) {
    if length(combination) between 3 and 9 {
        add this combination into your solution
    }
    // four adjacent directions to be attempted
    dx = {1,0,0,-1}
    dy = {0,1,-1,0}
    for i = 0; i < 4; i++ {
        next_x = x + dx[i]
        next_y = y + dy[i]
        if in_matrix(next_x, next_y) and not digitsUsed[matrix[next_x][next_y]] {
            digitsUsed[matrix[next_x][next_y]] = true
            combination += matrix[next_x][next_y]
            search(matrix, next_x, next_y, digitsUsed, combination)
            // At this time, sub-search starts with (next_x, next_y) has been completed.
            digitsUsed[matrix[next_x][next_y]] = false
        }
    }
}
So you can run the search function from every single cell in the matrix, and all the combinations in your solution are different from each other because they start from different cells.
In addition, we don't need to separately record whether a cell in the matrix has been traversed: every digit can be used only once, so cells which have been traversed will never be traversed again, since their digits are already contained in the combination.
Here is a possible implementation in Python 3 as a recursive depth-first exploration:
def find_combinations(data, min_length, max_length):
    # Matrix of booleans indicating what values have been used
    visited = [[False for _ in row] for row in data]
    # Current combination
    comb = []
    # Start recursive algorithm at every possible position
    for i in range(len(data)):
        for j in range(len(data[i])):
            # Add initial combination element and mark as visited
            comb.append(data[i][j])
            visited[i][j] = True
            # Start recursive algorithm
            yield from find_combinations_rec(data, min_length, max_length, visited, comb, i, j)
            # After all combinations with current element have been produced remove it
            visited[i][j] = False
            comb.pop()

def find_combinations_rec(data, min_length, max_length, visited, comb, i, j):
    # Yield the current combination if it has the right size
    if min_length <= len(comb) <= max_length:
        yield comb.copy()
    # Stop the recursion after reaching maximum length
    if len(comb) >= max_length:
        return
    # For each neighbor of the last added element
    for i2, j2 in ((i - 1, j), (i, j - 1), (i, j + 1), (i + 1, j)):
        # Check the neighbor is valid and not visited
        if i2 < 0 or i2 >= len(data) or j2 < 0 or j2 >= len(data[i2]) or visited[i2][j2]:
            continue
        # Add neighbor and mark as visited
        comb.append(data[i2][j2])
        visited[i2][j2] = True
        # Produce combinations for current starting sequence
        yield from find_combinations_rec(data, min_length, max_length, visited, comb, i2, j2)
        # Remove last added combination element
        visited[i2][j2] = False
        comb.pop()

# Try it
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
min_length = 3
max_length = 9
for comb in find_combinations(data, min_length, max_length):
    print(comb)
Output:
[1, 2, 3]
[1, 2, 3, 6]
[1, 2, 3, 6, 5]
[1, 2, 3, 6, 5, 4]
[1, 2, 3, 6, 5, 4, 7]
[1, 2, 3, 6, 5, 4, 7, 8]
[1, 2, 3, 6, 5, 4, 7, 8, 9]
[1, 2, 3, 6, 5, 8]
[1, 2, 3, 6, 5, 8, 7]
[1, 2, 3, 6, 5, 8, 7, 4]
[1, 2, 3, 6, 5, 8, 9]
[1, 2, 3, 6, 9]
[1, 2, 3, 6, 9, 8]
[1, 2, 3, 6, 9, 8, 5]
[1, 2, 3, 6, 9, 8, 5, 4]
[1, 2, 3, 6, 9, 8, 5, 4, 7]
...
Look at all the combinations and take the connected ones:
import itertools

def coords(n):
    """Coordinates of number n in the matrix."""
    return (n - 1) // 3, (n - 1) % 3

def adjacent(a, b):
    """Check if a and b are adjacent in the matrix."""
    ai, aj = coords(a)
    bi, bj = coords(b)
    return abs(ai - bi) + abs(aj - bj) == 1

def connected(comb):
    """Check if combination is connected."""
    return all(adjacent(a, b) for a, b in zip(comb, comb[1:]))

for width in range(3, 10):
    for comb in itertools.permutations(range(1, 10), width):
        if connected(comb):
            print(comb)

How to sort an array using minimum number of writes?

My friend was asked a question in his interview:
The interviewer gave him an array of unsorted numbers and asked him to sort. The restriction is that the number of writes should be minimized while there is no limitation on the number of reads.
Selection sort is not the right algorithm here. Selection sort will swap values, making up to two writes per selection, giving a maximum of 2n writes per sort.
An algorithm that's twice as good as selection sort is "cycle" sort, which does not swap. Cycle sort will give a maximum of n writes per sort. The number of writes is absolutely minimized. It will only write a number once to its final destination, and only then if it's not already there.
It is based on the idea that all permutations are products of cycles and you can simply cycle through each cycle and write each element to its proper place once.
import java.util.Random;
import java.util.Collections;
import java.util.Arrays;

public class CycleSort {
  public static final <T extends Comparable<T>> int cycleSort(final T[] array) {
    int writes = 0;
    // Loop through the array to find cycles to rotate.
    for (int cycleStart = 0; cycleStart < array.length - 1; cycleStart++) {
      T item = array[cycleStart];
      // Find where to put the item.
      int pos = cycleStart;
      for (int i = cycleStart + 1; i < array.length; i++)
        if (array[i].compareTo(item) < 0) pos++;
      // If the item is already there, this is not a cycle.
      if (pos == cycleStart) continue;
      // Otherwise, put the item there or right after any duplicates.
      while (item.equals(array[pos])) pos++;
      {
        final T temp = array[pos];
        array[pos] = item;
        item = temp;
      }
      writes++;
      // Rotate the rest of the cycle.
      while (pos != cycleStart) {
        // Find where to put the item.
        pos = cycleStart;
        for (int i = cycleStart + 1; i < array.length; i++)
          if (array[i].compareTo(item) < 0) pos++;
        // Put the item there or right after any duplicates.
        while (item.equals(array[pos])) pos++;
        {
          final T temp = array[pos];
          array[pos] = item;
          item = temp;
        }
        writes++;
      }
    }
    return writes;
  }

  public static final void main(String[] args) {
    final Random rand = new Random();
    final Integer[] array = new Integer[8];
    for (int i = 0; i < array.length; i++) { array[i] = rand.nextInt(8); }
    for (int iteration = 0; iteration < 10; iteration++) {
      System.out.printf("array: %s ", Arrays.toString(array));
      final int writes = cycleSort(array);
      System.out.printf("sorted: %s writes: %d\n", Arrays.toString(array), writes);
      Collections.shuffle(Arrays.asList(array));
    }
  }
}
A few example runs :
array: [3, 2, 6, 1, 3, 1, 4, 4] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 6
array: [1, 3, 4, 1, 3, 2, 4, 6] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 4
array: [3, 3, 1, 1, 4, 4, 2, 6] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 6
array: [1, 1, 3, 2, 4, 3, 6, 4] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 6
array: [3, 2, 3, 4, 6, 4, 1, 1] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 7
array: [6, 2, 4, 3, 1, 3, 4, 1] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 6
array: [6, 3, 2, 4, 3, 1, 4, 1] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 5
array: [4, 2, 6, 1, 1, 4, 3, 3] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 7
array: [4, 3, 3, 1, 2, 4, 6, 1] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 7
array: [1, 6, 4, 2, 4, 1, 3, 3] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 7
array: [5, 1, 2, 3, 4, 3, 7, 0] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 5
array: [5, 1, 7, 3, 2, 3, 4, 0] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 6
array: [4, 0, 3, 1, 5, 2, 7, 3] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 8
array: [4, 0, 7, 3, 5, 1, 3, 2] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
array: [3, 4, 2, 7, 5, 3, 1, 0] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
array: [0, 5, 3, 2, 3, 7, 1, 4] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 6
array: [1, 4, 3, 7, 2, 3, 5, 0] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
array: [1, 5, 0, 7, 3, 3, 4, 2] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
array: [0, 5, 7, 3, 3, 4, 2, 1] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 4
array: [7, 3, 1, 0, 3, 5, 4, 2] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
If the array is short (i.e. fewer than about 100 elements), a selection sort is often the best choice if you also want to reduce the number of writes.
From Wikipedia:
Another key difference is that selection sort always performs Θ(n) swaps, while insertion sort performs Θ(n^2) swaps in the average and worst cases. Because swaps require writing to the array, selection sort is preferable if writing to memory is significantly more expensive than reading. This is generally the case if the items are huge but the keys are small. Another example where writing times are crucial is an array stored in EEPROM or Flash. There is no other algorithm with less data movement.
For larger arrays/lists, Quicksort and friends will provide better performance, but will likely still need more writes than a selection sort.
If you're interested this is a fantastic sort visualization site that allows you to watch specific sort algorithms do their job and also "race" different sort algorithms against each other.
You can use a very naive algorithm that satisfies what you need.
The algorithm should look like this:
i = 0
do
    search for the minimum in range [i..n)
    swap a[i] with a[minPos]
    i = i + 1
repeat until i = n.
The search for the minimum costs you almost nothing in writes, the swap costs you 3 writes, and the i++ costs you 1.
This is called selection sort, as stated by ash. (Sorry, I didn't know it was selection sort :( )
One option for large arrays is as follows (assuming n elements):
1. Initialize an array with n elements numbered 0..n-1.
2. Sort the array using any sorting algorithm. As the comparison function, compare the elements in the input set with the corresponding numbers (e.g., to compare 2 and 4, compare the 2nd and 4th elements in the input set). This turns the array from step 1 into a permutation that represents the sorted order of the input set.
3. Iterate through the elements in the permutation, writing out the blocks in the order specified by the array. This requires exactly n writes, the minimum.
To sort in-place, in step 3 you should instead identify the cycles in the permutation, and 'rotate' them as necessary to result in sorted order.
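The in-place variant described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions: the function name sort_with_min_writes is hypothetical, only writes to the input array are counted (the index permutation lives in ordinary memory), and a write is skipped whenever the slot already holds the right value.

```python
def sort_with_min_writes(a):
    """Sort list `a` in place and return the number of writes made to `a`."""
    n = len(a)
    # Steps 1 and 2: order[k] is the index in `a` of the element that belongs at k.
    order = sorted(range(n), key=a.__getitem__)
    done = [False] * n
    writes = 0

    def put(k, value):
        nonlocal writes
        if a[k] != value:  # skip the write if the slot is already correct
            a[k] = value
            writes += 1
        done[k] = True

    # Step 3 (in place): rotate every cycle of the permutation.
    for start in range(n):
        if done[start]:
            continue
        saved = a[start]  # the only value that would otherwise be overwritten too early
        k = start
        while order[k] != start:
            put(k, a[order[k]])
            k = order[k]
        put(k, saved)
    return writes

data = [3, 2, 6, 1, 3, 1, 4, 4]
print(sort_with_min_writes(data), data)
```

Every slot is written at most once, and only when its value differs from the sorted result, so the count equals the number of positions that were out of place.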
The O(n) ordering I meant is, like the selection sort (the previous post), useful when you have a small range of keys (or you are ordering numbers that lie between two bounds).
If you have an array where the numbers will be between -10 and 100, then you can create an array of 111 slots and be sure that all numbers will fit in there. If you consider repeated numbers the idea is the same, but you will have lists instead of numbers in the sorted array.
The pseudo-idea is like this:
N: max value of your array
tosort // array to be sorted
sorted = int[N]
for i = 0 to length(tosort)
do
    sorted[tosort[i]]++;
end
finalarray = int[length(tosort)]
k = 0
for i = 0 to N
do
    if ( sorted[i] > 0 )
        finalarray[k] = i
        k++;
    endif
end
finalarray will hold the final sorted array, and you will have O(N) write operations, where N is the range of the array. Once again, this is useful when the keys lie inside a specific range, but perhaps that is your case.
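A runnable sketch of the same counting idea in Python, assuming integer keys in a known inclusive range [lo, hi]. The function name counting_sort and its parameters are hypothetical; storing a count per key (rather than a flag) is what handles repeated numbers here.

```python
def counting_sort(values, lo, hi):
    """Sort integers known to lie in the inclusive range [lo, hi]."""
    counts = [0] * (hi - lo + 1)  # one slot per possible key
    for v in values:
        counts[v - lo] += 1       # shift by lo so negative keys fit
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)  # emit each key as often as it occurred
    return out

print(counting_sort([5, -10, 100, 5, 0], -10, 100))
```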
Best regards and good luck!

Resources