efficient way of generating semi-random sequences - algorithm

Quite often, I have to generate sequences of numbers in some semi-random way, meaning the sequence is not totally random but must have some additional property. For example, we might need a random sequence of 1s, 2s, 3s and 4s in which no number is repeated three times in a row. These are usually not very complicated to do, but I ran into a tricky one: I need to generate a semi-random sequence that is a bit over 400 long and is composed of 1s, 2s, 3s and 4s. Each number must appear the same number of times (or, if the length is not divisible by four, as close to that as possible), and no number may repeat three times in a row (so 1,3,4,4,4,2 is not OK).
I tried two methods:
Create a list which has the desired length and number of numbers; shuffle; check if ok for consecutive numbers if not, shuffle again.
Create a list which has the desired length and number of numbers; generate all permutations and select which are ok; save these for later and randomly select one of them when needed.
Method number one runs for minutes before yielding any valid sequence, and method number two generates so many permutations that my Jupyter notebook gave up.
Here's the Python code for the first one:
from random import shuffle
v = []
for x in range(108):
    v += [1, 2, 3, 4]
shouldicontinue = 1
while shouldicontinue:
    shuffle(v)
    shouldicontinue = 0
    # Stop at len(v) - 2 so that v[h+2] stays in range
    for h in range(len(v) - 2):
        if v[h] == v[h+1] and v[h] == v[h+2]:
            shouldicontinue = 1
            break
and the second one
import itertools
v = []
for x in range(108):
    v += [1, 2, 3, 4]
good = []
for l in itertools.permutations(v):
    notok = 0
    # Check the current permutation l, not the original list v
    for h in range(len(l) - 2):
        if l[h] == l[h+1] and l[h] == l[h+2]:
            notok = 1
            break
    if not notok:
        good.append(l)
I'm looking for a way to solve this problem efficiently: if it runs in real time, it shouldn't need more than, say, a minute to generate a sequence on slower computers; if it is prepared in advance in some way (like the idea of method 2), the preparation should take no more than a few hours on a moderately powerful computer.

Before you can check all the permutations of a >400 length list, the universe will likely have died. Thus you need another approach.
Here, I recommend trying to insert the elements in the list at random, but shifting to the next index when the insertion would break one of the requirements.
Cycling through your elements, 1 to 4 in your case, should ensure an insertion is always possible.
from itertools import cycle, islice
from random import randint


def has_repeated(target, n, lst):
    """A helper to check if insertion would break the max repetition requirement"""
    count = 0
    for el in lst:
        count += el == target
        if count == n:
            return True
    return False


def sequence(length, max_repeat, elements=(1, 2, 3, 4)):
    # Iterator that will yield our elements in cycle
    values = islice(cycle(elements), length)
    seq = []
    for value in values:
        # Pick an insertion index at random
        init_index = randint(0, len(seq))
        # Loop over indices from that index until a legal position is found
        for shift in range(len(seq) + 1):
            # Wrap around so the index always stays within 0..len(seq)
            index = (init_index + shift) % (len(seq) + 1)
            slice_around_index = seq[max(0, index - max_repeat):index + max_repeat]
            # If the insertion would cause no forbidden subsequence, insert
            if not has_repeated(value, max_repeat, slice_around_index):
                seq.insert(index, value)
                break
        # This will likely never happen, except if a solution truly does not exist
        else:
            raise ValueError('failed to generate the sequence')
    return seq
Sample
Here is some sample output to check the result is correct.
for _ in range(10):
    print(sequence(25, 2))
Output
[4, 1, 4, 1, 3, 2, 1, 2, 4, 1, 4, 2, 1, 2, 2, 4, 3, 3, 1, 4, 3, 1, 2, 3, 3]
[3, 1, 3, 2, 2, 4, 1, 2, 2, 4, 3, 4, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 1, 1, 3]
[1, 3, 2, 4, 1, 3, 4, 4, 3, 2, 4, 1, 1, 3, 1, 2, 4, 2, 3, 1, 1, 2, 4, 3, 2]
[1, 3, 2, 4, 1, 2, 2, 1, 2, 3, 4, 3, 2, 4, 2, 4, 1, 1, 3, 1, 3, 4, 1, 4, 3]
[4, 1, 4, 4, 1, 1, 3, 1, 2, 2, 3, 2, 4, 2, 2, 3, 1, 3, 4, 3, 2, 1, 3, 1, 4]
[2, 3, 3, 1, 3, 3, 1, 2, 1, 2, 1, 2, 3, 4, 4, 1, 3, 4, 4, 2, 1, 1, 4, 4, 2]
[3, 2, 1, 4, 3, 2, 3, 1, 4, 1, 1, 2, 3, 3, 2, 2, 4, 1, 1, 2, 4, 1, 4, 3, 4]
[4, 4, 3, 1, 4, 1, 2, 2, 4, 4, 3, 2, 2, 3, 3, 1, 1, 2, 1, 1, 4, 1, 2, 3, 3]
[1, 4, 1, 4, 4, 2, 4, 1, 1, 2, 1, 2, 2, 3, 3, 2, 2, 3, 1, 4, 4, 3, 3, 1, 3]
[4, 3, 2, 1, 4, 1, 1, 2, 2, 3, 3, 1, 4, 4, 1, 3, 2, 3, 4, 2, 1, 1, 4, 2, 3]
Efficiency-wise, it takes around 10ms to generate a list of length 10,000 with the same requirements, which suggests this is an efficient enough solution for most purposes.
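The sample output above can also be checked mechanically rather than by eye. The following is only a sketch: the helper names `longest_run` and `is_valid` are made up for illustration, and "balanced" is taken to mean that the counts of the elements differ by at most one.

```python
from collections import Counter
from itertools import groupby


def longest_run(seq):
    """Length of the longest run of equal consecutive items (0 if empty)."""
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)


def is_valid(seq, max_repeat, elements=(1, 2, 3, 4)):
    """True if counts differ by at most one and no run exceeds max_repeat."""
    counts = Counter(seq)
    per_element = [counts[e] for e in elements]
    balanced = max(per_element) - min(per_element) <= 1
    only_known = set(seq) <= set(elements)
    return balanced and only_known and longest_run(seq) <= max_repeat


# First line of the sample output above.
sample = [4, 1, 4, 1, 3, 2, 1, 2, 4, 1, 4, 2, 1, 2, 2, 4, 3, 3, 1, 4, 3, 1, 2, 3, 3]
print(is_valid(sample, 2))
```

Running the same check over many generated sequences is a cheap way to gain confidence in the generator.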

I think it should be possible (with about 4 gigabytes of memory and 1 minute of precomputation) to generate uniformly distributed random sequences faster than 1 second per random sequence.
The idea is to prepare a cache of results for the question "How many sequences with exactly a 1s, b 2s, c 3s, d 4s are there which end with count copies of a particular digit?".
Once you have this cache, then you can compute how many sequences (N) there are that satisfy your constraint, and can generate one at random by picking a random number n between 1 and N and using the cache to generate the n^th sequence.
To save memory in the cache you can use a couple of tricks:
The answer is symmetric in a/b/c/d so you only need to store results with a>=b>=c>=d
The count of the last digit will always be 1 or 2 in legal sequences
These tricks should mean the cache only needs to hold about 40 million results.
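To make the counting idea concrete, here is a small sketch. It is a variant rather than the exact cache described above: it counts how many ways a partial sequence can be completed (instead of how many sequences end in a given run), and it skips the symmetry trick, so it is only practical for short sequences. The names `completions` and `uniform_sequence` are made up for illustration.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def completions(remaining, last, run, max_repeat):
    """Ways to use up `remaining` digits after `run` copies of digit `last`."""
    if sum(remaining) == 0:
        return 1
    total = 0
    for digit, left in enumerate(remaining):
        new_run = run + 1 if digit == last else 1
        if left == 0 or new_run > max_repeat:
            continue
        rest = remaining[:digit] + (left - 1,) + remaining[digit + 1:]
        total += completions(rest, digit, new_run, max_repeat)
    return total


def uniform_sequence(counts, max_repeat=2, rng=random):
    """Draw one valid sequence uniformly at random; digits are 1-based."""
    remaining, last, run = tuple(counts), -1, 0
    if completions(remaining, last, run, max_repeat) == 0:
        raise ValueError("no valid sequence exists")
    seq = []
    while sum(remaining) > 0:
        options, weights = [], []
        for digit, left in enumerate(remaining):
            new_run = run + 1 if digit == last else 1
            if left == 0 or new_run > max_repeat:
                continue
            rest = remaining[:digit] + (left - 1,) + remaining[digit + 1:]
            ways = completions(rest, digit, new_run, max_repeat)
            if ways:
                options.append((digit, rest, new_run))
                weights.append(ways)
        # Exact integer sampling: choose a digit with probability
        # proportional to the number of ways to finish after it.
        pick = rng.randrange(sum(weights))
        for (digit, rest, new_run), ways in zip(options, weights):
            if pick < ways:
                break
            pick -= ways
        seq.append(digit + 1)
        remaining, last, run = rest, digit, new_run
    return seq


print(uniform_sequence((3, 3, 3, 3)))
```

Because every step picks the next digit in proportion to the number of valid completions, each valid sequence is equally likely. For lengths above 400 the number of cached states grows very large, which is where the memory-saving tricks listed above would be needed.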

import random
rc = random.choices([1,2,3,4])
for _ in range(22):
    if rc[-1] == 1:
        rc = rc + random.choices([2,3,4])
        rc = rc + random.choices([1,2,3,4])
    if rc[-1] == 2:
        rc = rc + random.choices([1,3,4])
        rc = rc + random.choices([1,2,3,4])
    if rc[-1] == 3:
        rc = rc + random.choices([2,1,4])
        rc = rc + random.choices([1,2,3,4])
    if rc[-1] == 4:
        rc = rc + random.choices([2,3,1])
        rc = rc + random.choices([1,2,3,4])
print(rc)

Related

Exhaust list of elements randomly without sorting them randomly first

If I have a list of 10K elements, and I want to randomly iterate through all of them, is there an algorithm that lets me access each element randomly, without just sorting them randomly first?
In other words, this would not be ideal:
const sorted = list
  .map(v => [Math.random(), v])
  .sort((a, b) => a[0] - b[0]);
It would be nice to avoid the sort call and the mapping call.
My only idea would be to store everything in a hashmap and access the hash keys randomly somehow? Although that's just coming back to the same problem, afaict.
Just been having a play with this and realised that the Fisher-Yates shuffle works well "on-line". For example, if you've got a large list you don't need to spend the time to shuffle the whole thing before you start iterating over items, or, equivalently, you might only need a few items out of a large list.
I didn't see a language tag in the question, so I'll pick Python.
from random import randint
def iterrand(a):
    """Iterate over items of a list in a random order.
    Additional items can be .append()ed arbitrarily at runtime."""
    for i, ai in enumerate(a):
        j = randint(i, len(a)-1)
        a[i], a[j] = a[j], ai
        yield a[i]
This is O(n) in the length of the list and by allowing .append()s (O(1) in Python) the list can be built in the background.
An example use would be:
l = [0, 1, 2]
for i, v in enumerate(iterrand(l)):
    print(f"{i:3}: {v:<5} {l}")
    if v < 4:
        l.append(randint(1, 9))
which might produce output like:
0: 2 [2, 1, 0]
1: 3 [2, 3, 0, 1]
2: 1 [2, 3, 1, 1, 0]
3: 0 [2, 3, 1, 0, 1, 3]
4: 1 [2, 3, 1, 0, 1, 3, 7]
5: 7 [2, 3, 1, 0, 1, 7, 7, 3]
6: 7 [2, 3, 1, 0, 1, 7, 7, 3]
7: 3 [2, 3, 1, 0, 1, 7, 7, 3]
8: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2]
9: 3 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3]
10: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2]
11: 7 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2, 7]
Update: To test correctness, I'd do something like:
# trivial tests
assert list(iterrand([])) == []
assert list(iterrand([1])) == [1]
# bigger uniformity test
from collections import Counter
# tally 1M draws
c = Counter()
for _ in range(10**6):
    c[tuple(iterrand([1, 2, 3, 4, 5]))] += 1
# ensure it's uniform
assert all(7945 < v < 8728 for v in c.values())
# above constants calculated in R via:
# k<-120;p<-0.001/k;qbinom(c(p,1-p), 1e6, 1/k)
Fisher-Yates should do the trick as well as any; this article is really good:
https://medium.com/#oldwestaction/randomness-is-hard-e085decbcbb2
the relevant JS code is very short and sweet:
const fisherYatesShuffle = (deck) => {
  for (let i = deck.length - 1; i >= 0; i--) {
    const swapIndex = Math.floor(Math.random() * (i + 1));
    [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]];
  }
  return deck
}
To yield results as you go, so you don't have to iterate through the list twice, use a generator function like so:
const fisherYatesShuffle = function* (deck) {
  for (let i = deck.length - 1; i >= 0; i--) {
    const swapIndex = Math.floor(Math.random() * (i + 1)); // * use ;
    [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]];
    yield deck[i];
  }
};
(note don't forget some of those semi-colons, when the next line is bracket notation).

Find all combinations that include a specific value where all values are next to each other in array

I have an array of variable length containing all unique values and I need to find all combinations of values whose indices are next to each other and always include a specified value. The order of values in each resulting combination doesn't matter (However I kept them in order in my example to better illustrate).
As an example: [5, 4, 2, 0, 1, 3]
If the specific value chosen is 0, we would end up with the following 12 combinations:
0
0, 1
2, 0
0, 1, 3
2, 0, 1
4, 2, 0
2, 0, 1, 3
4, 2, 0, 1
5, 4, 2, 0
4, 2, 0, 1, 3
5, 4, 2, 0, 1
5, 4, 2, 0, 1, 3
If the specific value chosen is 3, we would end up with the following 6 combinations:
3
1, 3
0, 1, 3
2, 0, 1, 3
4, 2, 0, 1, 3
5, 4, 2, 0, 1, 3
Answers in any programming language will work.
EDIT: I believe this can be brute forced by finding all combinations of all numbers and then narrowing that list to make sure each combination meets the requirements... it's not ideal but should work.
This problem could be solved in O(n^3) time-complexity using the following algorithm:
Step-1: Find the index of the target element.
Step-2: Iterate from the rightmost index down to the index of the target. Let's call this iterator idx.
Step-3: For each idx, iterate from the target index down to the leftmost index. Let's call this iterator i.
Step-4: Print all the elements between the indices idx and i.
Following the above steps will print all the combinations.
The code for the above algorithm is implemented using python below.
def solution(array,target):
    index = -1
    for idx,element in enumerate(array):
        if(element == target):
            index = idx
    n = len(array)
    for idx in range(n-1,index-1,-1):
        for i in range(index,-1,-1):
            for j in range(i,idx+1):
                print(array[j],end = ",")
            print()
arr = [5, 4, 2, 0, 1, 3]
target = 0
solution(arr,target)
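If the combinations are needed as data rather than printed, the same idea can be sketched with slices: every valid combination is a contiguous window whose start is at or before the target and whose end is at or after it. The function name `windows_containing` is made up for illustration, and the sketch assumes the values are unique, as stated in the question.

```python
def windows_containing(array, target):
    """Return every contiguous slice of `array` that includes `target`."""
    index = array.index(target)  # raises ValueError if target is absent
    return [
        array[start:end + 1]
        for start in range(index, -1, -1)    # window starts at or before target
        for end in range(index, len(array))  # window ends at or after target
    ]


combos = windows_containing([5, 4, 2, 0, 1, 3], 0)
print(len(combos))  # 12
```

The number of windows is (index + 1) * (n - index), which gives the 12 and 6 combinations listed in the question for targets 0 and 3.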

Finding duplicate columns in a nested array

   0 1 2 3 4 5 6
0 {1,2,1,2,1,5,5}
1 {5,4,5,4,5,1,1}
2 {2,4,2,4,2,1,1}
3 {1,2,1,2,1,1,1}
4 {4,4,4,4,4,1,1}
5 {2,4,2,4,2,2,2}
output: {{0,2,4}, {1,3}, {5,6}} (can use any data structure)
Let's say there is a nested array like the one above. If we wanted to find the column indices that contain exactly the same numbers in the same order (for example, columns 0, 2, 4 with (1,5,2,1,4,2), columns 1, 3 with (2,4,4,2,4,4), and columns 5, 6 with (5,1,1,1,1,2)), how can we go about this efficiently? Will it require dynamic programming?
Thanks in advance.
You can just iterate through the columns, keeping a hashmap of the columns that you've seen so far. Here's an implementation in python:
x = [[1, 2, 1, 2, 1, 5, 5],
     [5, 4, 5, 4, 5, 1, 1],
     [2, 4, 2, 4, 2, 1, 1],
     [1, 2, 1, 2, 1, 1, 1],
     [4, 4, 4, 4, 4, 1, 1],
     [2, 4, 2, 4, 2, 2, 2]]
seen_before = {}
for v, col in enumerate(zip(*x)):
    if tuple(col) not in seen_before:
        seen_before[tuple(col)] = [v]
    else:
        seen_before[tuple(col)].append(v)
This solves the problem in time linear in the number of entries in the array. I hope that's good enough for you.
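For completeness, here is one way to turn the hashmap into the output format asked for in the question. This is a sketch of the same approach wrapped in a function; the name `duplicate_columns` is made up, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
from collections import defaultdict


def duplicate_columns(rows):
    """Group column indices whose contents are identical, top to bottom."""
    groups = defaultdict(list)
    # zip(*rows) yields each column as a tuple, which is hashable.
    for col_index, column in enumerate(zip(*rows)):
        groups[column].append(col_index)
    # Keep only columns that occur more than once.
    return [indices for indices in groups.values() if len(indices) > 1]


matrix = [[1, 2, 1, 2, 1, 5, 5],
          [5, 4, 5, 4, 5, 1, 1],
          [2, 4, 2, 4, 2, 1, 1],
          [1, 2, 1, 2, 1, 1, 1],
          [4, 4, 4, 4, 4, 1, 1],
          [2, 4, 2, 4, 2, 2, 2]]
print(duplicate_columns(matrix))  # [[0, 2, 4], [1, 3], [5, 6]]
```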

How to sort an array using minimum number of writes?

My friend was asked a question in his interview:
The interviewer gave him an array of unsorted numbers and asked him to sort. The restriction is that the number of writes should be minimized while there is no limitation on the number of reads.
Selection sort is not the right algorithm here. Selection sort will swap values, making up to two writes per selection, giving a maximum of 2n writes per sort.
An algorithm that's twice as good as selection sort is "cycle" sort, which does not swap. Cycle sort will give a maximum of n writes per sort. The number of writes is absolutely minimized. It will only write a number once to its final destination, and only then if it's not already there.
It is based on the idea that all permutations are products of cycles and you can simply cycle through each cycle and write each element to its proper place once.
import java.util.Random;
import java.util.Collections;
import java.util.Arrays;
public class CycleSort {
  public static final <T extends Comparable<T>> int cycleSort(final T[] array) {
    int writes = 0;
    // Loop through the array to find cycles to rotate.
    for (int cycleStart = 0; cycleStart < array.length - 1; cycleStart++) {
      T item = array[cycleStart];
      // Find where to put the item.
      int pos = cycleStart;
      for (int i = cycleStart + 1; i < array.length; i++)
        if (array[i].compareTo(item) < 0) pos++;
      // If the item is already there, this is not a cycle.
      if (pos == cycleStart) continue;
      // Otherwise, put the item there or right after any duplicates.
      while (item.equals(array[pos])) pos++;
      {
        final T temp = array[pos];
        array[pos] = item;
        item = temp;
      }
      writes++;
      // Rotate the rest of the cycle.
      while (pos != cycleStart) {
        // Find where to put the item.
        pos = cycleStart;
        for (int i = cycleStart + 1; i < array.length; i++)
          if (array[i].compareTo(item) < 0) pos++;
        // Put the item there or right after any duplicates.
        while (item.equals(array[pos])) pos++;
        {
          final T temp = array[pos];
          array[pos] = item;
          item = temp;
        }
        writes++;
      }
    }
    return writes;
  }
  public static final void main(String[] args) {
    final Random rand = new Random();
    final Integer[] array = new Integer[8];
    for (int i = 0; i < array.length; i++) { array[i] = rand.nextInt(8); }
    for (int iteration = 0; iteration < 10; iteration++) {
      System.out.printf("array: %s ", Arrays.toString(array));
      final int writes = cycleSort(array);
      System.out.printf("sorted: %s writes: %d\n", Arrays.toString(array), writes);
      Collections.shuffle(Arrays.asList(array));
    }
  }
}
A few example runs :
array: [3, 2, 6, 1, 3, 1, 4, 4] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 6
array: [1, 3, 4, 1, 3, 2, 4, 6] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 4
array: [3, 3, 1, 1, 4, 4, 2, 6] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 6
array: [1, 1, 3, 2, 4, 3, 6, 4] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 6
array: [3, 2, 3, 4, 6, 4, 1, 1] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 7
array: [6, 2, 4, 3, 1, 3, 4, 1] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 6
array: [6, 3, 2, 4, 3, 1, 4, 1] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 5
array: [4, 2, 6, 1, 1, 4, 3, 3] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 7
array: [4, 3, 3, 1, 2, 4, 6, 1] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 7
array: [1, 6, 4, 2, 4, 1, 3, 3] sorted: [1, 1, 2, 3, 3, 4, 4, 6] writes: 7
array: [5, 1, 2, 3, 4, 3, 7, 0] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 5
array: [5, 1, 7, 3, 2, 3, 4, 0] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 6
array: [4, 0, 3, 1, 5, 2, 7, 3] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 8
array: [4, 0, 7, 3, 5, 1, 3, 2] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
array: [3, 4, 2, 7, 5, 3, 1, 0] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
array: [0, 5, 3, 2, 3, 7, 1, 4] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 6
array: [1, 4, 3, 7, 2, 3, 5, 0] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
array: [1, 5, 0, 7, 3, 3, 4, 2] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
array: [0, 5, 7, 3, 3, 4, 2, 1] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 4
array: [7, 3, 1, 0, 3, 5, 4, 2] sorted: [0, 1, 2, 3, 3, 4, 5, 7] writes: 7
If the array is shorter (ie less than about 100 elements) a Selection sort is often the best choice if you also want to reduce the number of writes.
From wikipedia:
Another key difference is that selection sort always performs Θ(n) swaps, while insertion sort performs Θ(n^2) swaps in the average and worst cases. Because swaps require writing to the array, selection sort is preferable if writing to memory is significantly more expensive than reading. This is generally the case if the items are huge but the keys are small. Another example where writing times are crucial is an array stored in EEPROM or Flash. There is no other algorithm with less data movement.
For larger arrays/lists Quicksort and friends will provide better performance, but may still likely need more writes than a selection sort.
If you're interested this is a fantastic sort visualization site that allows you to watch specific sort algorithms do their job and also "race" different sort algorithms against each other.
You can use a very naive algorithm that satisfies what you need.
The algorithm should look like this:
i = 0
do
    search for the minimum in range [i..n)
    swap a[i] with a[minPos]
    i = i + 1
repeat until i = n.
The search for the minimum can cost you almost nothing, the swap costs you 3 writes, and the i++ costs you 1.
This is named selection sort, as stated by ash. (Sorry, I didn't know it was selection sort :( )
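As a rough illustration of the pseudocode above, here is a selection sort sketch in Python that counts writes. It counts only writes to the list itself (two per swap, ignoring temporaries and the loop counter) and skips the swap when the minimum is already in place; both choices are assumptions of this sketch rather than part of the original pseudocode.

```python
def selection_sort(a):
    """Sort `a` in place; return the number of writes made to the list."""
    writes = 0
    for i in range(len(a) - 1):
        # Reading is free here: scan a[i:] for the position of the minimum.
        min_pos = min(range(i, len(a)), key=a.__getitem__)
        if min_pos != i:
            a[i], a[min_pos] = a[min_pos], a[i]
            writes += 2
    return writes


data = [3, 1, 2]
print(selection_sort(data), data)  # 4 [1, 2, 3]
```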
One option for large arrays is as follows (assuming n elements):
Initialize an array with n elements numbered 0..n-1
Sort the array using any sorting algorithm. As the comparison function, compare the elements in the input set with the corresponding numbers (eg, to compare 2 and 4, compare the 2nd and 4th elements in the input set). This turns the array from step 1 into a permutation that represents the sorted order of the input set.
Iterate through the elements in the permutation, writing out the blocks in the order specified by the array. This requires exactly n writes, the minimum.
To sort in-place, in step 3 you should instead identify the cycles in the permutation, and 'rotate' them as necessary to result in sorted order.
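The in-place variant of these steps can be sketched as follows. The function name `sort_with_few_writes` is made up, and the check that skips a write when the slot already holds the right value is an addition of this sketch (it matters when the input contains duplicates).

```python
def sort_with_few_writes(a):
    """Sort list `a` in place; return how many writes to `a` were made."""
    # Steps 1-2: order[k] is the index in `a` of the element that belongs at k.
    order = sorted(range(len(a)), key=a.__getitem__)
    done = [False] * len(a)
    writes = 0
    # Step 3 (in-place): rotate each cycle of the permutation.
    for start in range(len(a)):
        if done[start]:
            continue
        held = a[start]  # value displaced from the start of this cycle
        pos = start
        while True:
            done[pos] = True
            src = order[pos]
            value = held if src == start else a[src]
            if a[pos] != value:  # skip writes that would change nothing
                a[pos] = value
                writes += 1
            if src == start:
                break
            pos = src
    return writes


data = [5, 1, 4, 1, 3]
print(sort_with_few_writes(data), data)  # 4 [1, 1, 3, 4, 5]
```

Each position is written at most once, and only if its value actually changes, at the cost of O(n) extra memory for the permutation.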
The O(n) ordering I meant is, like the selection sort (the previous post), useful when you have a small range of keys (or you are ordering numbers that lie between two bounds).
If you have a number array where the numbers will be between -10 and 100, then you can create an array of 111 entries and be sure that all numbers will fit in there. If you consider repeated numbers the idea is the same, but you will have lists (or counts) instead of numbers in the sorted array.
the pseudo-idea is like this
N: max value of your array
tosort //array to be sorted
sorted = int[N + 1] //one counter per possible value 0..N
for i = 0 to length(tosort) - 1
do
    sorted[tosort[i]]++;
end
finalarray = int[length(tosort)]
k = 0
for i = 0 to N
do
    while ( sorted[i] > 0 ) //emit the value once per occurrence
        finalarray[k] = i
        k++;
        sorted[i]--;
    endwhile
end
finalarray will hold the final sorted array, and the number of write operations is proportional to the length of the input plus N, where N is the range of the keys. Once again, this is useful when the keys lie inside a specific range, but perhaps it's your case.
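A runnable sketch of the pseudo-idea above, using the -10 to 100 example: the function name `counting_sort` and its `low`/`high` parameters are made up for illustration, and the offset `v - low` is what lets negative keys index into the counter array.

```python
def counting_sort(values, low, high):
    """Return a sorted copy of `values`, all of which lie in [low, high]."""
    counts = [0] * (high - low + 1)  # 111 counters for the range -10..100
    for v in values:
        counts[v - low] += 1
    result = []
    for offset, count in enumerate(counts):
        # Emit each key as many times as it occurred.
        result.extend([low + offset] * count)
    return result


print(counting_sort([100, -10, 5, 5, 0], -10, 100))  # [-10, 0, 5, 5, 100]
```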
Best regards and good luck!

"Pyramidizing" an array/list (in Ruby, but general solutions could probably be implemented)

I'm not sure what the best word to use here is. By "pyramidizing", I mean:
[1,2,3,4].pyramidize # => [1,1,1,1,2,2,2,3,3,4]
["a","b","c","d"].pyramidize # => ["a","a","a","a","b","b","b","c","c","d"]
To represent visually, it could be thought of as:
[ 1,1,1,1,
2,2,2,
3,3,
4 ]
Is there a way to do this that maximizes elegance? A most ruby-like way?
I came across the "need" to do this in a project of mine. After thinking about it, I gave up and decided to work around the problem in an ugly way. I was wondering if there was a pretty way to do this. So far, to do it directly, I've ended up making a separate array for each index and stretching out each array the appropriate length and combining them together. But I don't know how to do this so it looks pretty; my solution is pretty ugly.
Added code golf tag because any solution in one line would probably make my day, but it doesn't have to be.
It doesn't really matter if your solution makes the first index the "base" of the pyramid, or the last index, because I could just reverse the array before running it.
Requires the new iterator fanciness in Ruby 1.9.
class Array
  def pyramidize
    reverse.map.with_index do |object, index|
      [object] * (index + 1)
    end.flatten.reverse
  end
end
[1,2,3,4].pyramidize
=> [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
["a","b","c","d"].pyramidize
=> ["a", "a", "a", "a", "b", "b", "b", "c", "c", "d"]
irb(main):001:0> [2,1,3,5].flat_map.with_index{|i,j|[i]*(j+1)}
=> [2, 1, 1, 3, 3, 3, 5, 5, 5, 5]
irb(main):002:0> [1,2,3,4].flat_map{|i|[i]*i}
=> [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
I'm not sure if you want to use the value of the list or the index to determine how much the list should repeat, but a simple solution in python that can probably transfer to ruby easily:
>>> import operator
>>> from functools import reduce  # only needed on Python 3; reduce is a builtin on Python 2
>>> input = range(6)
>>> reduce(operator.add, [[i]*idx for idx, i in enumerate(input)])
[1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
Update
Oh and to invert the counts:
>>> import operator
>>> input = range(1, 6)
>>> reduce(operator.add, [[i]*(max(input) - idx) for idx, i in enumerate(input)])
[1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5]
And of course you reversed the list in one of your examples:
>>> import operator
>>> input = range(1, 6)
>>> reduce(operator.add, [[i]*(max(input) - idx) for idx, i in enumerate(input)])[::-1]
[ 5,
4, 4,
3, 3, 3,
2, 2, 2, 2,
1, 1, 1, 1, 1]
FWIW, this is a mathy way of doing it:
>>> from math import sqrt
>>> A = [1, 2, 3, 4]
>>> [ A[int((sqrt(8*k+1)-1) / 2)] for k in range(len(A)*(len(A)+1) // 2) ]
[1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
Admittedly, the use of sqrt is pretty ugly.
Here is another way to do it in Python
>>> A=[1,2,3,4]
>>> [y for i,x in enumerate(A) for y in [x]*(len(A)-i)]
[1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
But it's nicer not to create all those temporary lists
>>> from itertools import repeat
>>> [y for i,x in enumerate(A) for y in repeat(x, len(A)-i)]
[1, 1, 1, 1, 2, 2, 2, 3, 3, 4]

Resources