Generate a new array from an array of numbers - algorithm

I found this question on Glassdoor:
Generate a new array from an array of numbers. Start from the beginning. Put the number of some number first, and then that number. For example, from array 1, 1, 2, 3, 3, 1 You should get 2, 1, 1, 2, 2, 3, 1, 1 Write a program to solve this problem.
I am not sure I get the idea: how does 1, 1, 2, 3, 3, 1 transform into 2, 1, 1, 2, 2, 3, 1, 1? I first thought they are number of occurrences of a number followed by the number itself. But from the given example, it seems like something else is wanted.
What is this transformation?

I first thought they are number of occurrences of a number followed by the number itself.
Your first thought was correct.
Break the first array down to be:
1, 1,
2,
3, 3,
1
And the second to be:
2, 1,
1, 2,
2, 3,
1, 1
Then it should make more sense.
Sample implementation:
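#!/usr/bin/env python3
import sys
array = list(map(int, sys.argv[1:]))
print(array)
output = []
if array:
    current = array[0]  # value of the run being counted
    count = 0           # length of that run so far
    for number in array:
        if current != number:
            # the run ended: emit its length, then its value
            output.append(count)
            output.append(current)
            current = number
            count = 0
        count += 1
    # emit the final run, which the loop never flushes
    output.append(count)
    output.append(current)
print(output)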
Demo:
> ./arrays.py 1 1 2 3 3 1
[1, 1, 2, 3, 3, 1]
[2, 1, 1, 2, 2, 3, 1, 1]

What you think is correct: it's the number of times an element repeats consecutively, followed by the element itself.
Here is pseudocode:
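array1 = given input array
array2 = output array
int previous = array1[0];
int currentCount = 0;
for each entry x in array1 {
    if (x == previous) {
        currentCount++;
    }
    else {
        // the run of `previous` has ended: emit its count, then its value
        array2.add(currentCount);
        array2.add(previous);
        // start counting the new run, which already contains x
        previous = x;
        currentCount = 1;
    }
}
// emit the last run, which the loop never flushes
array2.add(currentCount);
array2.add(previous);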

And the Haskell version... yup, that's the whole thing:
import Data.List
countArray :: Integral a => [a] -> [Int]
countArray list = concat [[length l, fromIntegral (head l)] | l <- group list]
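For Python readers, Haskell's `group` has a close standard-library counterpart in `itertools.groupby`, which likewise groups only *adjacent* equal elements. A minimal sketch (the function name `run_length_pairs` is just illustrative):

```python
from itertools import groupby

def run_length_pairs(numbers):
    """Return [count1, value1, count2, value2, ...], one pair per run of
    adjacent equal values, mirroring the Haskell `group`-based version."""
    result = []
    for value, run in groupby(numbers):
        result.append(sum(1 for _ in run))  # length of this run
        result.append(value)
    return result

print(run_length_pairs([1, 1, 2, 3, 3, 1]))  # [2, 1, 1, 2, 2, 3, 1, 1]
```

Because `groupby` only merges neighbours, the leading `1, 1` and the trailing `1` stay separate runs, which is exactly what the expected output requires.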

Related

Exhaust list of elements randomly without sorting them randomly first

If I have a list of 10K elements, and I want to randomly iterate through all of them, is there an algorithm that lets me access each element randomly, without just sorting them randomly first?
In other words, this would not be ideal:
const sorted = list
  .map(v => [Math.random(), v])
  .sort((a, b) => a[0] - b[0]);
It would be nice to avoid the sort call and the mapping call.
My only idea would be to store everything in a hashmap and access the hash keys randomly somehow? Although that's just coming back to the same problem, afaict.
Just been having a play with this and realised that the Fisher-Yates shuffle works well "on-line". For example, if you've got a large list you don't need to spend the time to shuffle the whole thing before you start iterating over items, or, equivalently, you might only need a few items out of a large list.
I didn't see a language tag in the question, so I'll pick Python.
from random import randint
def iterrand(a):
    """Iterate over items of a list in a random order.
    Additional items can be .append()ed arbitrarily at runtime."""
    for i, ai in enumerate(a):
        j = randint(i, len(a)-1)
        a[i], a[j] = a[j], ai
        yield a[i]
This is O(n) in the length of the list, and because it allows .append()s (amortized O(1) in Python), the list can be built in the background.
An example use would be:
l = [0, 1, 2]
for i, v in enumerate(iterrand(l)):
    print(f"{i:3}: {v:<5} {l}")
    if v < 4:
        l.append(randint(1, 9))
which might produce output like:
0: 2 [2, 1, 0]
1: 3 [2, 3, 0, 1]
2: 1 [2, 3, 1, 1, 0]
3: 0 [2, 3, 1, 0, 1, 3]
4: 1 [2, 3, 1, 0, 1, 3, 7]
5: 7 [2, 3, 1, 0, 1, 7, 7, 3]
6: 7 [2, 3, 1, 0, 1, 7, 7, 3]
7: 3 [2, 3, 1, 0, 1, 7, 7, 3]
8: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2]
9: 3 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3]
10: 2 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2]
11: 7 [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2, 7]
Update: To test correctness, I'd do something like:
# trivial tests
assert list(iterrand([])) == []
assert list(iterrand([1])) == [1]
# bigger uniformity test
from collections import Counter
# tally 1M draws
c = Counter()
for _ in range(10**6):
    c[tuple(iterrand([1, 2, 3, 4, 5]))] += 1
# every one of the 5! = 120 orderings should appear
assert len(c) == 120
# ensure it's uniform
assert all(7945 < v < 8728 for v in c.values())
# above constants calculated in R via:
# k<-120;p<-0.001/k;qbinom(c(p,1-p), 1e6, 1/k)
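To illustrate the "on-line" point, here is a minimal sketch that stops after the first few items, so only that many swaps are ever performed. It restates `iterrand` with a `while` loop (so `len(a)` is re-read on every step and appended items are still reached) and adds an `rng` parameter, a hypothetical addition for reproducible runs that is not part of the answer above.

```python
import random
from itertools import islice

def iterrand(a, rng=random):
    """Lazily Fisher-Yates-shuffle list `a` in place, yielding one item per step.
    `rng` is any object with a randint(lo, hi) method (hypothetical parameter,
    added for seeding). len(a) is re-read each step, so appended items are included."""
    i = 0
    while i < len(a):
        j = rng.randint(i, len(a) - 1)  # pick uniformly among unvisited positions
        a[i], a[j] = a[j], a[i]         # move the pick into the visited prefix
        yield a[i]
        i += 1

items = list(range(10_000))
# islice stops after 5 items, so the generator performs only 5 swaps.
first_five = list(islice(iterrand(items, random.Random(0)), 5))
print(first_five)
```

The cost is proportional to the number of items actually consumed, not to the length of the list.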
Fisher-Yates should do the trick as well as anything; this article is really good:
https://medium.com/#oldwestaction/randomness-is-hard-e085decbcbb2
The relevant JS code is short and sweet:
const fisherYatesShuffle = (deck) => {
  for (let i = deck.length - 1; i >= 0; i--) {
    const swapIndex = Math.floor(Math.random() * (i + 1));
    [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]];
  }
  return deck;
};
To yield results as you go, so you don't have to iterate through the list twice, use a generator function like so:
const fisherYatesShuffle = function* (deck) {
  for (let i = deck.length - 1; i >= 0; i--) {
    // this semicolon is required because the next line starts with "["
    const swapIndex = Math.floor(Math.random() * (i + 1));
    [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]];
    yield deck[i];
  }
};
(Note: don't drop the semicolon at the end of a line that is followed by a line starting with "["; automatic semicolon insertion won't add one there, so the two lines would be parsed as a single expression.)

Find all combinations that include a specific value where all values are next to each other in array

I have an array of variable length containing all unique values and I need to find all combinations of values whose indices are next to each other and always include a specified value. The order of values in each resulting combination doesn't matter (However I kept them in order in my example to better illustrate).
As an example: [5, 4, 2, 0, 1, 3]
If the specific value chosen is 0, we would end up with the following 12 combinations:
0
0, 1
2, 0
0, 1, 3
2, 0, 1
4, 2, 0
2, 0, 1, 3
4, 2, 0, 1
5, 4, 2, 0
4, 2, 0, 1, 3
5, 4, 2, 0, 1
5, 4, 2, 0, 1, 3
If the specific value chosen is 3, we would end up with the following 6 combinations:
3
1, 3
0, 1, 3
2, 0, 1, 3
4, 2, 0, 1, 3
5, 4, 2, 0, 1, 3
Answers in any programming language will work.
EDIT: I believe this can be brute-forced by finding all combinations of all numbers and then narrowing that list down to the combinations that meet the requirements... it's not ideal, but it should work.
This problem can be solved in O(n^3) time (there are O(n^2) such subarrays, and printing each takes O(n)) using the following algorithm:
Step-1: Find the index of the target element.
Step-2: Iterate from the rightmost index down to the target index. Let's call this iterator idx.
Step-3: Then iterate from the target index down to the leftmost index. Let's call this iterator i.
Step-4: Print all the elements between the indices i and idx (inclusive).
Following the above steps will print all the combinations.
Here is the above algorithm implemented in Python:
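def solution(array, target):
    # Step-1: index of the target (values are unique; raises ValueError if absent)
    index = array.index(target)
    n = len(array)
    # Step-2: right end of the slice, from the last index down to the target
    for idx in range(n - 1, index - 1, -1):
        # Step-3: left end of the slice, from the target down to index 0
        for i in range(index, -1, -1):
            # Step-4: print the slice array[i..idx]
            print(", ".join(str(x) for x in array[i:idx + 1]))
arr = [5, 4, 2, 0, 1, 3]
target = 0
solution(arr, target)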
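A variant sketch (the name `runs_containing` is illustrative) that returns the slices as lists instead of printing them, which makes it easy to check the counts from the question. It uses `list.index`, so it assumes the target is present (the question says the values are unique). A slice contains position `index` exactly when it starts at or before `index` and ends at or after it, so there are `(index + 1) * (n - index)` of them: 4 * 3 = 12 for target 0 and 6 * 1 = 6 for target 3.

```python
def runs_containing(array, target):
    """Return every contiguous slice of `array` that includes `target`,
    shortest slices first."""
    index = array.index(target)  # raises ValueError if target is absent
    n = len(array)
    result = []
    for length in range(1, n + 1):
        # slice [lo, lo + length) contains index iff lo <= index < lo + length
        for lo in range(max(0, index - length + 1), min(index, n - length) + 1):
            result.append(array[lo:lo + length])
    return result

for r in runs_containing([5, 4, 2, 0, 1, 3], 0):
    print(r)
```

Ordering by length reproduces the layout of the question's lists.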

Detect outlier in repeating sequence

I have a repeating sequence of say 0~9 (but may start and stop at any of these numbers). e.g.:
3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2
And it has outliers at random location, including 1st and last one, e.g.:
9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6
I need to find & correct the outliers. In the above example, I need to correct the first "9" into "3", the "8" into "5", etc.
What I came up with is to construct a sequence with no outlier of desired length, but since I don't know which number the sequence starts with, I'd have to construct 10 sequences each starting from "0", "1", "2" ... "9". And then I can compare these 10 sequences with the given sequence and find the one sequence that matches the given sequence the most. However this is very inefficient when the repeating pattern gets large (say if the repeating pattern is 0~99, I'd need to create 100 sequences to compare).
Assuming there won't be consecutive outliers, is there a way to find & correct these outliers efficiently?
edit: added some explanation and added the algorithm tag. Hopefully it is more appropriate now.
I'm going to propose a variation of @trincot's fine answer. Like that one, it doesn't care how many outliers there may be in a row, but unlike that one, it also doesn't care how many non-outliers there are in a row.
The base idea is just to let each sequence element "vote" on what the first sequence element "should be". Whichever gets the most votes wins. By construction, this maximizes the number of elements left unchanged: after the 1-liner loop ends, votes[i] is the number of elements left unchanged if i is picked as the starting point.
def correct(numbers, mod=None):
    # this part copied from @trincot's program
    if mod is None:  # if argument is not provided:
        # Make a guess what the range is of the values
        mod = max(numbers) + 1
    votes = [0] * mod
    for i, x in enumerate(numbers):
        # which initial number would make x correct?
        votes[(x - i) % mod] += 1
    winning_count = max(votes)
    winning_numbers = [i for i, v in enumerate(votes)
                       if v == winning_count]
    if len(winning_numbers) > 1:
        raise ValueError("ambiguous!", winning_numbers)
    winning_number = winning_numbers[0]
    for i in range(len(numbers)):
        numbers[i] = (winning_number + i) % mod
    return numbers
Then, e.g.,
>>> correct([9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6])
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
but
>>> correct([1, 5, 3, 7, 5, 9])
...
ValueError: ('ambiguous!', [1, 4])
That is, it's impossible to guess whether you want [1, 2, 3, 4, 5, 6] or [4, 5, 6, 7, 8, 9]. Both have 3 numbers "right", even though neither case has two adjacent outliers.
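The voting step on its own can be written with `collections.Counter`. This minimal sketch (the helper name `start_votes` is illustrative) only tallies the votes, which makes the tie in the ambiguous example visible: starting values 1 and 4 each collect 3 votes, while for the long example the start value 3 collects 25 of the 30 votes.

```python
from collections import Counter

def start_votes(numbers, mod=None):
    """For each candidate start value s, count how many elements would already
    be correct if the sequence were s, s+1, s+2, ... (mod `mod`)."""
    if mod is None:
        mod = max(numbers) + 1  # same range guess as the answer above
    return Counter((x - i) % mod for i, x in enumerate(numbers))

votes = start_votes([1, 5, 3, 7, 5, 9])
print(votes[1], votes[4])  # 3 3
```

A unique maximum in the tally means the correction is unambiguous; a tie is exactly the `ValueError` case above.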
I would do a first scan of the list to find the longest sublist in the input that maintains the right order. We will then assume that those values are all correct, and calculate backwards what the first value would have to be to produce those values in that sublist.
Here is how that would look in Python:
def correct(numbers, mod=None):
    if mod is None:  # if argument is not provided:
        # Make a guess what the range is of the values
        mod = max(numbers) + 1
    # Find the longest slice in the list that maintains order
    start = 0
    longeststart = 0
    longest = 1
    expected = -1
    for last in range(len(numbers)):
        if numbers[last] != expected:
            start = last
        elif last - start >= longest:
            longest = last - start + 1
            longeststart = start
        expected = (numbers[last] + 1) % mod
    # Get from that longest slice what the starting value should be
    val = (numbers[longeststart] - longeststart) % mod
    # Repopulate the list starting from that value
    for i in range(len(numbers)):
        numbers[i] = val
        val = (val + 1) % mod
# demo use
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
correct(numbers, 10)  # for 0..9 provide 10 as argument, ...etc
print(numbers)
The advantage of this method is that it would even give a good result if there were errors with two consecutive values, provided that there are enough correct values in the list of course.
Still this runs in linear time.
Here is another way, using groupby and count from Python's itertools module:
from itertools import count, groupby
def correct(lst):
    # Split lst into runs of consecutive integers (a - index is constant within a run)
    grouped = [list(v) for _, v in groupby(lst, lambda a, b=count(): a - next(b))]
    # Check if all groups are singletons
    if all(len(k) == 1 for k in grouped):
        raise ValueError('All groups are singletons!')
    # A single group needs no correction (and would leave k, v unset below)
    if len(grouped) == 1:
        yield from grouped[0]
        return
    for k, v in zip(grouped, grouped[1:]):
        if len(k) < 2:
            out = v[0] - 1
            if out >= 0:
                yield out
            else:
                yield from k
        else:
            yield from k
    # check last element of the grouped list
    if len(v) < 2:
        yield k[-1] + 1
    else:
        yield from v
lst = "9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6"
lst = [int(k) for k in lst.split(',')]
out = list(correct(lst))
print(out)
Output:
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
Edit:
For the case of [1, 5, 3, 7, 5, 9], this solution will return something inaccurate, because I can't tell which value you want to modify. This is why the best solution is to check for that case and raise a ValueError if all groups are singletons.
Like this?
numbers = [9,4,5,6,7,8,9,0,1,2,3,4,8,6,7,0,9,0,1,2,3,4,1,6,7,8,9,0,1,6]
i = 0
for n in numbers[:-1]:
    i += 1
    if n > numbers[i] and n > 0:
        numbers[i-1] = numbers[i]-1
    elif n > numbers[i] and n == 0:
        numbers[i - 1] = 9
n = numbers[-1]
if n > numbers[0] and n > 0:
    numbers[-1] = numbers[0] - 1
elif n > numbers[0] and n == 0:
    numbers[-1] = 9
print(numbers)

Length of maximum continuous subarray with 2 unique numbers

I have an array of numbers and I want to find the maximum length of a contiguous subarray made up of only 2 unique numbers.
For example, [2, 3, 4, 3, 2, 2, 4] would return 3 since [3, 2, 2] is of length 3.
[2, 4, 2, 5, 1, 5, 4, 2] would return 3.
[7, 8, 7, 8, 7] would return 5.
Edit: I have considered an O(n^2) solution where I start at each value in the array and iterate until I see a third unique value.
for item in array:
    iterate until third unique element
    if length of this iteration is greater than existing max, update the max length
return maxlength
I do not, however, think this is an efficient solution.
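It can be done in O(n). The code is in Python 3. o and t ("one" and "two") are the two values allowed in the current window, m is the max, c is the current window length, and run is the length of the run of equal values ending at the previous element; when a third value appears, that run plus the new element becomes the next window.
a = [7, 8, 7, 8, 7]
m = 0         # max window length seen so far
o = t = None  # the two values allowed in the current window
c = 0         # length of the current window
run = 0       # length of the run of equal values ending at prev
prev = None   # previous element
for i in a:
    if i == o or i == t:
        # current element is one or two: the window grows
        c += 1
    else:
        # a third value: the new window is the trailing run of prev plus i
        o, t = prev, i
        c = run + 1
    run = run + 1 if i == prev else 1
    prev = i
    m = max(m, c)
print(m)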
We can use the two-pointer technique to solve this problem in O(n) time. The two pointers, startPtr and endPtr, represent a range in the array. We maintain this range [startPtr, endPtr] so that it contains no more than 2 unique numbers, which we can do by tracking the last position of each of the 2 unique numbers. My implementation in C++ is given below:
#include <algorithm>
#include <cstdio>
using std::max;
using std::min;
int main()
{
    int array[] = {1,2,3,3,2,3,2,3,2,2,2,1,3,4};
    int startPtr = 0;
    int endPtr = 0;
    // denotes the size of the array
    int size = sizeof(array)/sizeof(array[0]);
    // contains last position of unique number 1 in the range [startPtr, endPtr]
    int uniqueNumPos1 = -1; // -1 value represents it is not set yet
    // contains last position of unique number 2 in the range [startPtr, endPtr]
    int uniqueNumPos2 = -1; // -1 value represents it is not set yet
    // contains length of maximum continuous subarray with 2 unique numbers
    int ans = 0;
    while(endPtr < size) {
        if(uniqueNumPos1 == -1 || array[endPtr] == array[uniqueNumPos1]) {
            uniqueNumPos1 = endPtr;
        }
        else {
            if(uniqueNumPos2 == -1 || array[endPtr] == array[uniqueNumPos2]) {
                uniqueNumPos2 = endPtr;
            }
            else {
                // for this new third unique number update startPtr with min(uniqueNumPos1, uniqueNumPos2) + 1
                // to ensure [startPtr, endPtr] does not contain more than two unique numbers
                startPtr = min(uniqueNumPos1, uniqueNumPos2) + 1;
                // update uniqueNumPos1 and uniqueNumPos2
                uniqueNumPos1 = endPtr - 1;
                uniqueNumPos2 = endPtr;
            }
        }
        // this condition ensures the range contains exactly two unique numbers
        // if you are looking for the range containing at most two unique numbers, you can omit this condition
        if (uniqueNumPos1 != -1 && uniqueNumPos2 != -1) {
            ans = max(ans, endPtr - startPtr + 1);
        }
        endPtr++;
    }
    printf("%d\n", ans);
}
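The same two-pointer idea can be sketched in Python with a `Counter` of the values inside the window: advance the right end one step at a time and, whenever the window holds more than two distinct values, advance the left end until it holds two again. Each element enters and leaves the window at most once, so the whole scan is O(n). Unlike the C++ code above, this sketch also counts windows containing a single distinct value (so `[0, 0, 0]` gives 3); the name `longest_window` and the parameter `k` are illustrative generalizations.

```python
from collections import Counter

def longest_window(values, k=2):
    """Length of the longest contiguous slice with at most k distinct values."""
    counts = Counter()  # value -> occurrences inside values[left:right + 1]
    left = 0
    best = 0
    for right, v in enumerate(values):
        counts[v] += 1
        while len(counts) > k:  # too many distinct values: shrink from the left
            counts[values[left]] -= 1
            if counts[values[left]] == 0:
                del counts[values[left]]
            left += 1
        best = max(best, right - left + 1)
    return best

print(longest_window([2, 3, 4, 3, 2, 2, 4]))  # 3
```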
Thanks @MBo for pointing out the mistakes.
import java.util.Arrays;
import static java.lang.System.out;
class TestCase{
    int[] test;
    int answer;
    TestCase(int[] test,int answer){
        this.test = test;
        this.answer = answer;
    }
}
public class Solution {
    public static void main(String[] args) {
        TestCase[] tests = {
            new TestCase(new int[]{2, 3, 4, 3, 2, 2, 4},3),
            new TestCase(new int[]{2, 3, 3, 3, 3, 4, 3, 3, 2, 2, 4},7),
            new TestCase(new int[]{1,2,3,3,4,2,3,2,3,2,2,2,1,3,4},7),
            new TestCase(new int[]{2, 7, 8, 7, 8, 7},5),
            new TestCase(new int[]{-1,2,2,2,2,2,2,2,2,2,2,-1,-1,4},13),
            new TestCase(new int[]{1,2,3,4,5,6,7,7},3),
            new TestCase(new int[]{0,0,0,0,0},0),
            new TestCase(new int[]{0,0,0,2,2,2,1,1,1,1},7),
            new TestCase(new int[]{},0)
        };
        for(int i=0;i<tests.length;++i){
            int ans = maxContiguousArrayWith2UniqueElements(tests[i].test);
            out.println(Arrays.toString(tests[i].test));
            out.println("Expected: " + tests[i].answer);
            out.println("Returned: " + ans);
            out.println("Result: " + (tests[i].answer == ans ? "ok" : "not ok"));
            out.println();
        }
    }
    private static int maxContiguousArrayWith2UniqueElements(int[] A){
        if(A == null || A.length <= 1) return 0;
        int max_subarray = 0;
        int first_number = A[0],second_number = A[0];
        int start_index = 0,same_element_run_length = 1;
        for(int i=1;i<A.length;++i){
            if(A[i] != A[i-1]){
                if(first_number == second_number){
                    second_number = A[i];
                }else{
                    if(A[i] != first_number && A[i] != second_number){
                        max_subarray = Math.max(max_subarray,i - start_index);
                        start_index = i - same_element_run_length;
                        first_number = A[i-1];
                        second_number = A[i];
                    }
                }
                same_element_run_length = 1;
            }else{
                same_element_run_length++;
            }
        }
        return first_number == second_number ? max_subarray : Math.max(max_subarray,A.length - start_index);
    }
}
OUTPUT:
[2, 3, 4, 3, 2, 2, 4]
Expected: 3
Returned: 3
Result: ok
[2, 3, 3, 3, 3, 4, 3, 3, 2, 2, 4]
Expected: 7
Returned: 7
Result: ok
[1, 2, 3, 3, 4, 2, 3, 2, 3, 2, 2, 2, 1, 3, 4]
Expected: 7
Returned: 7
Result: ok
[2, 7, 8, 7, 8, 7]
Expected: 5
Returned: 5
Result: ok
[-1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -1, -1, 4]
Expected: 13
Returned: 13
Result: ok
[1, 2, 3, 4, 5, 6, 7, 7]
Expected: 3
Returned: 3
Result: ok
[0, 0, 0, 0, 0]
Expected: 0
Returned: 0
Result: ok
[0, 0, 0, 2, 2, 2, 1, 1, 1, 1]
Expected: 7
Returned: 7
Result: ok
[]
Expected: 0
Returned: 0
Result: ok
Algorithm:
So, we maintain 2 variables first_number and second_number which will hold those 2 unique numbers.
As you know, there could be many possible subarrays we have to consider to get the max subarray length which has 2 unique elements. Hence, we need a pointer variable which will point to start of a subarray. In this, that pointer is start_index.
Any subarray breaks when we find a third number that is equal to neither first_number nor second_number. At that point, we calculate the length of the previous subarray (which had those 2 unique elements) as i - start_index.
The tricky part of this question is how to get the start_index of the next subarray.
If you look closely, the second_number of the previous subarray becomes the first_number of the current subarray, and the third number we just encountered becomes the second_number of the current subarray.
So, one way to find where this first_number's run started is to run a while loop backwards to get that start_index. But that would make the algorithm O(n^2) when there are many subarrays to consider (which there will be).
Hence, we maintain a variable called same_element_run_length which just keeps track of the length or frequency of how many times the number got repeated and gets updated whenever it breaks. So, start_index for the next subarray after we encounter the third number becomes start_index = i - same_element_run_length.
Rest of the computation done is self-explanatory.
Time Complexity: O(n), Space Complexity : O(1).

Pseudocode to find the longest run within an array

I know that a run is a sequence of adjacent repeated values. How would you write pseudocode for computing the length of the longest run in an array?
For example, 5 would be the longest run in this array of integers:
1 2 4 4 3 1 2 4 3 5 5 5 5 3 6 5 5 6 3 1
Any idea would be helpful.
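def longest_run(array):
    # returns the value of the longest run (None if no element repeats)
    result = None
    prev = None
    size = 0      # number of repeats after the first element of the current run
    max_size = 0
    for element in array:
        if element == prev:
            size += 1
            if size > max_size:
                result = element
                max_size = size
        else:
            size = 0
        prev = element
    return result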
EDIT
Wow. Just wow! This pseudocode is actually working:
>>> longest_run([1,2,4,4,3,1,2,4,3,5,5,5,5,3,6,5,5,6,3,1])
5
if the array is empty: return 0
max_run_length = 1;
current_run_length = 1;
loop through the array from the second element, comparing each value with the previous index's value {
    if the value is the same as the previous one: current_run_length++;
    otherwise: current_run_length = 1;
    if current_run_length > max_run_length : max_run_length = current_run_length
}
return max_run_length
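A minimal Python sketch of this scan (the name `longest_run_length` is illustrative). It updates the maximum on every step, so a run that ends at the last element is still counted, and it returns both the length and the repeated value:

```python
def longest_run_length(values):
    """Return (length, value) of the longest run of adjacent equal values,
    or (0, None) for an empty sequence. Ties keep the earliest run."""
    if not values:
        return (0, None)
    best_len, best_val = 1, values[0]
    current = 1
    for prev, x in zip(values, values[1:]):
        current = current + 1 if x == prev else 1  # extend or restart the run
        if current > best_len:
            best_len, best_val = current, x
    return (best_len, best_val)

print(longest_run_length([1, 2, 4, 4, 3, 1, 2, 4, 3, 5, 5, 5, 5, 3, 6, 5, 5, 6, 3, 1]))  # (4, 5)
```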
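Here is a different, functional approach in Python (Python looks like pseudocode). The generator finishes with a bare "return", which is valid in both Python 2 and 3; don't replace it with "raise StopIteration", because since Python 3.7 (PEP 479) that becomes a RuntimeError inside a generator.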
I'm using a generator to yield tuples of (count, element). This is more general: you can also use it on infinite sequences, although to find the longest repeated element the sequence must be finite.
def group_same(iterable):
    iterator = iter(iterable)
    try:
        last = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    counter = 1
    for element in iterator:
        if element == last:  # compare values (==), not identity (is)
            counter += 1
        else:
            yield (counter, last)
            counter = 1
            last = element
    yield (counter, last)  # the final run
If you have a list like this:
li = [0, 0, 2, 1, 1, 1, 1, 1, 5, 5, 6, 7, 7, 7, 12, 'Text', 'Text', 'Text2']
Then you can make a new list of it:
list(group_same(li))
Then you'll get a new list:
[(2, 0),
(1, 2),
(5, 1),
(2, 5),
(1, 6),
(3, 7),
(1, 12),
(2, 'Text'),
(1, 'Text2')]
To get the longest repeated element, you can use the max function.
gen = group_same(li) # Generator, does nothing until iterating over it
grouped_elements = list(gen) # iterate over the generator until it's exhausted
longest = max(grouped_elements, key=lambda x: x[0])
Or as a one liner:
max(list(group_same(li)), key=lambda x: x[0])
The max function returns the largest item in the list. Here each item is a tuple, so the key argument tells max to compare only the first element of each tuple (the count); you still get back the whole tuple.
In : max(list(group_same(li)), key=lambda x: x[0])
Out: (5, 1)
The element 1 occurred 5 times in a row.
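For comparison, the standard library's `itertools.groupby` already groups adjacent equal elements, so a sketch equivalent to `group_same` plus `max` fits in a few lines (the generator name `grouped_counts` is illustrative). Like the version above, `max` returns the first of several equally long runs.

```python
from itertools import groupby

def grouped_counts(iterable):
    """Yield (count, element) for each run of adjacent equal elements."""
    for element, run in groupby(iterable):
        yield (sum(1 for _ in run), element)

li = [0, 0, 2, 1, 1, 1, 1, 1, 5, 5, 6, 7, 7, 7, 12, 'Text', 'Text', 'Text2']
print(max(grouped_counts(li), key=lambda pair: pair[0]))  # (5, 1)
```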
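#include <iostream>
int main()
{
    int a[20] = {1, 2, 4, 4, 3, 1, 2, 4, 3, 5, 5, 5, 5, 3, 6, 5, 5, 6, 3, 1};
    const int n = sizeof(a) / sizeof(a[0]);
    int current = 1;  // length of the run ending at a[i]
    int longest = 1;  // longest run seen so far (the array is non-empty)
    for (int i = 1; i < n; i++)
    {
        // extend the current run, or start a new one at a[i]
        current = (a[i] == a[i - 1]) ? current + 1 : 1;
        if (current > longest)
            longest = current;
    }
    std::cout << longest << std::endl;  // 4, the run 5 5 5 5
    return 0;
}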
