I'm wondering if it is possible to somehow "sort" the items in an array so that they end up at roughly "equal" spacings.
An example is worth more than a hundred words, so:
Apple - 1
Banana - 2
Pineapple - 3
Orange - 4
And this is an array:
[ 'Apple', 'Apple', 'Banana', 'Pineapple', 'Pineapple', 'Pineapple', 'Orange' ]
[ 1, 1, 2, 3, 3, 3, 4 ]
What I want to achieve is something similar to this:
[ 'Apple', 'Pineapple', 'Banana', 'Apple', 'Pineapple', 'Orange', 'Pineapple' ]
[ 1, 3, 2, 1, 3, 4, 3 ]
With this transformation, 'Pineapple' has a one-item offset between each pair of 'Pineapple's, and 'Apple' ends up at positions [0] and [3].
Before I start on an implementation I'm looking for an already-invented solution - is it perhaps something related to standard deviation?
The class of algorithm you're looking for is called multiplexing. A multiplexer takes several input streams, and creates a single output stream, selecting one item at a time from the input. There are many different multiplexing strategies. I'll describe one that's easy to implement, and performs well.
The general idea is that each item has a name, rate, and accumulator, and the item with the largest value in its accumulator is chosen next. In the example given in the question, the rates are 2 for Apple, 1 for Banana, 3 for Pineapple, and 1 for Orange. The sum of the rates is the period, which is 7.
The algorithm operates as follows:
initialize all accumulators to 0
for each slot in one period:
    choose the item with the largest accumulator, and add it to the output
    update each accumulator by adding the rate to the accumulator
    subtract the period from the accumulator of the chosen item
The table below shows how the algorithm progresses. The slots are labelled S1 through S7. For each slot there are two columns of numbers: the adjustment applied to the accumulator, followed by the resulting accumulator value.
In slot 1, Orange is chosen, so its adjustment is +1 -7 = -6 (add the rate, and subtract the period). For every other item the adjustment is equal to the rate. Notice that all the accumulators start at 0, and return to 0 after the seventh slot. Hence, the algorithm could be run for any number of slots, and it would simply repeat the same pattern.
Name Rate __S1__ __S2__ __S3__ __S4__ __S5__ __S6__ __S7__
Orange 1/7 0 -6 -6 +1 -5 +1 -4 +1 -3 +1 -2 +1 -1 +1 0
Banana 1/7 0 +1 1 +1 2 +1 3 -6 -3 +1 -2 +1 -1 +1 0
Apple 2/7 0 +2 2 +2 4 -5 -1 +2 1 +2 3 -5 -2 +2 0
Pineapple 3/7 0 +3 3 -4 -1 +3 2 +3 5 -4 1 +3 4 -4 0
Selected item: Orange Pine Apple Banana Pine Apple Pine
Here's an implementation in Python:
items = ['Apple', 'Apple', 'Banana', 'Pineapple', 'Pineapple', 'Pineapple', 'Orange']

# Convert the list of items into a list that contains [name, rate, accumulator]
# for each item. The initial value for the accumulator is 0.
counts = {}
for item in items:
    counts[item] = counts.get(item, 0) + 1
rates = [[name, rate, 0] for (name, rate) in counts.items()]
rates.sort(key=lambda x: x[1])

# Run the multiplexer, which
#   adds the item with the largest accumulator to the output
#   updates all the accumulators by adding the rate to the accumulator
#   subtracts the period from the chosen accumulator
output = []
period = len(items)
for i in range(period):
    best = 0
    for j in range(len(rates)):
        if rates[j][2] > rates[best][2]:  # compare accumulators
            best = j
        rates[j][2] += rates[j][1]        # update accumulator
    rates[best][2] -= period
    output.append(rates[best][0])         # add an item to the output

print(output)
# e.g. ['Orange', 'Pineapple', 'Apple', 'Banana', 'Pineapple', 'Apple', 'Pineapple']
# (items with equal rates, such as Orange and Banana, may trade places depending on tie-breaking)
Start off by ordering your words by number of occurrences. Then iterate over them, first filling up all even indices, then all odd indices.
The most frequent word can at most fill up all the even indices. In a zero-based array there are always at least as many slots with an even index as with an odd one. If your language doesn't qualify for that (i.e. one-based arrays), pick even or odd based on the number of available slots.
The second most common word can occur at most as many times as the most common word, so there's no possibility that the same word winds up in two adjacent slots.
A simple Python implementation would look like this:
import math

def spaced_ordering(words):
    words = sorted(words, key=words.count, reverse=True)
    output = [None] * len(words)
    for i in range(0, math.ceil(len(words) / 2)):
        output[i * 2] = words[i]
    for i in range(0, math.floor(len(words) / 2)):
        output[i * 2 + 1] = words[math.ceil(len(words) / 2) + i]
    return output
Note: The above implementation is neither exactly performant, nor exactly fancy, nor does it include checking for valid inputs (e.g. what happens if a word occurs more than math.ceil(len(words) / 2) times). It only serves to demonstrate the basic principle.
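For example, running it on the fruit list from the question yields an arrangement with no two equal neighbours (this particular output assumes Python's stable sort order):
words = ['Apple', 'Apple', 'Banana', 'Pineapple', 'Pineapple', 'Pineapple', 'Orange']
print(spaced_ordering(words))
# ['Pineapple', 'Apple', 'Pineapple', 'Banana', 'Pineapple', 'Orange', 'Apple']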
I'm currently studying for an advanced algorithms and data structures exam, and I simply can't seem to solve one of the practice problems, which is the following:
1.14) "Nice Triangle"
A "nice" triangle is defined in the following way:
The triangle consists of three different numbers, namely the first three prime numbers (2, 3 and 5).
Every number depends on the two numbers below it in the following way:
If the two numbers are the same, the resulting number is also the same. (2, 2 => 2)
If the two numbers are different, the resulting number is the remaining one. (2, 3 => 5)
Given an integer N with length L, corresponding to the base of the triangle, determine the element at the top.
For example:
Given N = 25555 (and thus L = 5), the triangle looks like this:
2
3 5
2 5 5
3 5 5 5
2 5 5 5 5
=> 2 is the result of this example
What does the fact that every number is prime have to do with the problem?
By using a naive approach (simply calculating every single row), one obtains a time complexity of O(L^2).
However, the professor said it's possible in O(L), but I simply can't find any pattern!
I'm not sure why this problem would be used in an advanced algorithms course, but yes, you can do this in O(L) = O(log N) time.
There are a couple of ways you can do it, but they both rely on recognizing the following:
For the problem statement, it doesn't matter what digits you use. Let's use 0, 1, and 2 instead of 2, 3, and 5. Then:
If a and b are the input numbers and c is the output, then c = -(a+b) mod 3 (a quick check of this follows below).
You can build the whole triangle using c = a+b mod 3 instead, and then just negate every second row.
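A quick way to convince yourself of that identity is to check it over all pairs of mapped digits:
# c = -(a + b) % 3 reproduces both rules on the mapped digits 0, 1, 2
for a in range(3):
    for b in range(3):
        c = -(a + b) % 3
        if a == b:
            assert c == a           # same inputs -> same output
        else:
            assert c not in (a, b)  # different inputs -> the remaining digit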
Now the two ways you can do this in O(log n) time are:
For each digit d in the input, calculate the number of times (call it k) that it gets added into the final sum (this is the binomial coefficient C(L-1, i) for the digit at position i), add up all the k*d values mod 3, and then negate the result if you started with an even number of digits. That takes constant time per digit; a sketch of this approach appears after the Python code below. Alternatively:
recognize that you can do arithmetic on n-sized values in constant time. Make a value that is a bit mask of all the digits in n. That takes 2 bits each. Then by using bitwise operations you can calculate each row from the previous one in constant time, for O(log n) time altogether.
Here's an implementation of the 2nd way in python:
def niceTriangle(n):
    # a vector of 3-bit integers mod 3
    rowvec = 0
    # a vector of 1 for each number in the row
    onevec = 0
    # number of rows remaining
    rows = 0
    # mapping for digits 0-9
    digitmap = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]

    # first convert n into the first row
    while n > 0:
        digit = digitmap[n % 10]
        n = n // 10
        rows += 1
        onevec = (onevec << 3) + 1
        rowvec = (rowvec << 3) + digit

    if rows % 2 == 0:
        # we have an even number of rows -- negate everything
        rowvec = ((rowvec & onevec) << 1) | ((rowvec >> 1) & onevec)

    while rows > 1:
        # add each number to its neighbor
        rowvec += (rowvec >> 3)
        # isolate the entries >= 3, by adding 1 to each number and
        # getting the 2^2 bit
        gt3 = ((rowvec + onevec) >> 2) & onevec
        # subtract 3 from all the greater entries
        rowvec -= gt3 * 3
        rows -= 1

    return [2, 3, 5][rowvec % 4]
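For completeness, here is a sketch of the first (digit-counting) way as well. The count k for the digit at position i is the binomial coefficient C(L-1, i); the answer doesn't spell out how to compute it mod 3 in constant time per digit, so the incremental power-of-3 bookkeeping below is just one way to do it:
def nice_triangle_digitwise(n):
    # Map the digits 2, 3, 5 onto 0, 1, 2 as described above.
    to012 = {'2': 0, '3': 1, '5': 2}
    digits = [to012[ch] for ch in str(n)]
    m = len(digits) - 1                  # L - 1 levels sit above the base row
    # Top of the triangle = (-1)^m * sum_i C(m, i) * d_i  (mod 3).
    # Keep C(m, i) mod 3 as unit * 3^exp, using C(m, i+1) = C(m, i) * (m-i) / (i+1)
    # and pulling the factors of 3 out of the numerator and denominator.
    total, unit, exp = 0, 1, 0           # C(m, 0) = 1
    for i, d in enumerate(digits):
        coeff = unit if exp == 0 else 0  # C(m, i) mod 3
        total = (total + coeff * d) % 3
        if i < m:
            num, den = m - i, i + 1
            while num % 3 == 0:
                num //= 3
                exp += 1
            while den % 3 == 0:
                den //= 3
                exp -= 1
            # den is now 1 or 2 mod 3, and both are their own inverses mod 3,
            # so multiplying by den is the same as dividing by it.
            unit = (unit * num * den) % 3
    if m % 2 == 1:                       # even number of digits -> negate the result
        total = (-total) % 3
    return [2, 3, 5][total]

print(nice_triangle_digitwise(25555))    # prints 2, matching the example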
You are given two sequences A and B, each consisting of N numbers and each a permutation of 1, 2, 3, ..., N. At each step, you choose a set S of elements from A, taken in order from left to right (the selected numbers are removed from A), then reverse S and add all of its elements to the beginning of A. Find a way to transform A into B in log2(N) steps.
Input: N <= 10^4 (the number of elements in sequences A and B), followed by the two permutations A and B.
Output: K (the number of steps used to convert A into B). The next K lines give the set of numbers S selected at each step.
Example:
Input:
5 // N
5 4 3 2 1 // A sequence
2 5 1 3 4 // B sequence
Output:
2
4 3 1
5 2
Step 0: S = {}, A = {5, 4, 3, 2, 1}
Step 1: S = {4, 3, 1}, A = {5, 2}. Then reverse S => S = {1, 3, 4}. Insert S to beginning of A => A = {1, 3, 4, 5, 2}
Step 2: S = {5, 2}, A = {1, 3, 4}. Then reverse S => S = {2, 5}. Insert S to beginning of A => A = {2, 5, 1, 3, 4}
My solution is to use backtracking to consider all possible choices of S in log2(n) steps. However, N is too large for that, so is there a better approach? Thank you.
For each operation of combined selecting/removing/prepending, you're effectively sorting the elements relative to a "pivot", and preserving order. With this in mind, you can repeatedly "sort" the items in backwards order (by that I mean, you sort on the most significant bit last), to achieve a true sort.
For an explicit example, let's take the sequence 7 3 1 8. Rewrite the terms with their respective positions in the final sorted list (which would be 1 3 7 8), to get 2 1 0 3.
7 -> 2 // 7 is at index 2 in the sorted array
3 -> 1 // 3 is at index 1 in the sorted array
1 -> 0 // so on
8 -> 3
This new array is equivalent to the original: we are just using indices to refer to the values indirectly (if you squint hard enough, we're kinda rewriting the unsorted list as pointers into the sorted list, rather than values).
Now, let's write these new values in binary:
2 10
1 01
0 00
3 11
If we were to sort this list, we'd first sort by the MSB (most significant bit) and then tiebreak only where necessary on the subsequent bit(s) until we're at the LSB (least significant bit). Equivalently, we can sort by the LSB first, and then sort all values on the next most significant bit, and continuing in this fashion until we're at the MSB. This will work, and correctly sort the list, as long as the sort is stable, that is- it doesn't change the order of elements that are considered equal.
Let's work this out by example: if we sorted these by the LSB, we'd get
2 10
0 00
1 01
3 11
-and then following that up with a sort on the MSB (but no tie-breaking logic this time), we'd get:
0 00
1 01
2 10
3 11
-which is the correct, sorted result.
Remember the "pivot" sorting note at the beginning? This is where we use that insight. We're going to take this transformed list 2 1 0 3, and sort it bit by bit, from the LSB to the MSB, with no tie-breaking. And to do so, we're going to pivot on the criteria <= 0.
This is effectively what we just did in our last example, so in the name of space I won't write it out again, but have a look again at what we did in each step. We took the elements whose bit under inspection was equal to 0, and moved them to the beginning. First, we moved 2 (10) and 0 (00) to the beginning, and then in the next iteration we moved 0 (00) and 1 (01) to the beginning. This is exactly the operation your challenge permits you to do.
Additionally, because our numbers are reduced to their indices, the max value is len(array)-1, and the number of bits is log2() of that, so overall we'll only need to do log2(n) steps, just as your problem statement asks.
Now, what does this look like in actual code?
from itertools import product
from math import log2, ceil

nums = [5, 9, 1, 3, 2, 7]

size = ceil(log2(len(nums)))  # bits needed to represent every index of the sorted list
bit_table = list(product([0, 1], repeat=size))
idx_table = {x: i for i, x in enumerate(sorted(nums))}

for bit_idx in range(size)[::-1]:
    subset_vals = [x for x in nums if bit_table[idx_table[x]][bit_idx] == 0]
    nums.sort(key=lambda x: bit_table[idx_table[x]][bit_idx])
    print(" ".join(map(str, subset_vals)))
You can of course use bitwise operators to accomplish the bit magic ((x >> bit_idx) & 1) if you want, and you could del slices of the list and prepend them instead of .sort()ing; this is just a proof of concept to show that it actually works. The actual output being:
1 3 7
1 7 9 2
1 2 3 5
What output does the problem expect? If the goal is to find the earliest time when the frog can jump to the other side of the river, how does the answer for the given sample come out to be 6?
A small frog wants to get to the other side of a river. The frog is
initially located on one bank of the river (position 0) and wants to
get to the opposite bank (position X+1). Leaves fall from a tree onto
the surface of the river.
You are given a zero-indexed array A consisting of N integers
representing the falling leaves. A[K] represents the position where
one leaf falls at time K, measured in seconds.
The goal is to find the earliest time when the frog can jump to the
other side of the river. The frog can cross only when leaves appear at
every position across the river from 1 to X (that is, we want to find
the earliest moment when all the positions from 1 to X are covered by
leaves). You may assume that the speed of the current in the river is
negligibly small, i.e. the leaves do not change their positions once
they fall in the river.
For example, you are given integer X = 5 and array A such that:
A[0] = 1  A[1] = 3  A[2] = 1  A[3] = 4
A[4] = 2  A[5] = 3  A[6] = 5  A[7] = 4
In second 6, a leaf falls into position 5. This is the earliest time when leaves appear in every position across the river.
Is the expected output:
the time needed to reach the other end of the river, or
the index in the array where X is located, or
the index which contains the highest number?
To cross the river, the frog needs to have leaves in all positions 1 through X (5). The river is initially empty; one leaf per second falls into the river, at the location indicated by A[K], where K is the time-tick at which the leaf falls.
The given sequence for leaf positions, starting at time 0, is [1, 3, 1, 4, 2, 3, 5, 4]. Coverage of the river (given as 5 units wide) as time progresses is like this, where 0 denotes a leaf, - denotes open water:
0 0 - - - -
1 0 - 0 - -
2 0 - 0 - - There are now 2 leaves in position 1
3 0 - 0 0 -
4 0 0 0 0 -
5 0 0 0 0 - ... and a second leaf at 3
6 0 0 0 0 0 ... and now, the frog can cross.
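For reference, a minimal Python sketch of this counting idea (earliest_crossing is just an illustrative name, not part of the task):
def earliest_crossing(x, a):
    # Track which of the positions 1..x are already covered by a leaf.
    covered = set()
    for time, pos in enumerate(a):
        if 1 <= pos <= x:
            covered.add(pos)
        if len(covered) == x:  # every position 1..x now has a leaf
            return time
    return -1                  # the frog can never cross

print(earliest_crossing(5, [1, 3, 1, 4, 2, 3, 5, 4]))  # prints 6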
RUBY
I am trying to implement this problem in Ruby with O(N) complexity. I run a loop over the falling leaves, and each time a falling leaf lands on a new position I delete that position from the path array, which I created on the basis of the steps required.
def frogJump(x, arr)
  path_array = (1..x).to_a
  count = 0
  for position in arr
    if path_array.size > 0
      count += 1
      path_array.delete(position) if path_array.include?(position)
    end
  end
  puts count
end

x = 5
arr = [1, 2, 3, 1, 4, 2, 3, 5, 4]
frogJump(x, arr)
I was browsing the internet when I found out that there is an algorithm called cycle sort, which makes the least number of memory writes. But I am not able to find the algorithm anywhere. How do you detect whether or not there is a cycle in an array?
Can anybody give a complete explanation of this algorithm?
The cycle sort algorithm is motivated by something called a cycle decomposition. Cycle decompositions are best explained by example. Let's suppose that you have this array:
4 3 0 1 2
Let's imagine that we have this sequence in sorted order, as shown here:
0 1 2 3 4
How would we have to shuffle this sorted array to get to the shuffled version? Well, let's place them side-by-side:
0 1 2 3 4
4 3 0 1 2
Let's start from the beginning. Notice that the number 0 got swapped to the position initially held by 2. The number 2, in turn, got swapped to the position initially held by 4. Finally, 4 got swapped to the position initially held by 0. In other words, the elements 0, 2, and 4 all were cycled forward one position. That leaves behind the numbers 1 and 3. Notice that 1 swaps to where 3 is and 3 swaps to where 1 is. In other words, the elements 1 and 3 were cycled forward one position.
As a result of the above observations, we'd say that the sequence 4 3 0 1 2 has cycle decomposition (0 2 4)(1 3). Here, each group of terms in parentheses means "circularly cycle these elements forward." This means to cycle 0 to the spot where 2 is, 2 to the spot where 4 is, and 4 to the spot where 0 was, then to cycle 1 to the spot where 3 was and 3 to the spot where 1 is.
If you have the cycle decomposition for a particular array, you can get it back in sorted order making the fewest number of writes by just cycling everything backward one spot. The idea behind cycle sort is to try to determine what the cycle decomposition of the input array is, then to reverse it to put everything back in its place.
Part of the challenge of this is figuring out where everything initially belongs since a cycle decomposition assumes you know this. Typically, cycle sort works by going to each element and counting up how many elements are smaller than it. This is expensive - it contributes to the Θ(n^2) runtime of the sorting algorithm - but doesn't require any writes.
Here's a Python implementation if anyone needs it:
def cycleSort(vector):
    writes = 0

    # Loop through the vector to find cycles to rotate.
    for cycleStart, item in enumerate(vector):

        # Find where to put the item.
        pos = cycleStart
        for item2 in vector[cycleStart + 1:]:
            if item2 < item:
                pos += 1

        # If the item is already there, this is not a cycle.
        if pos == cycleStart:
            continue

        # Otherwise, put the item there or right after any duplicates.
        while item == vector[pos]:
            pos += 1
        vector[pos], item = item, vector[pos]
        writes += 1

        # Rotate the rest of the cycle.
        while pos != cycleStart:

            # Find where to put the item.
            pos = cycleStart
            for item2 in vector[cycleStart + 1:]:
                if item2 < item:
                    pos += 1

            # Put the item there or right after any duplicates.
            while item == vector[pos]:
                pos += 1
            vector[pos], item = item, vector[pos]
            writes += 1

    return writes

x = [0, 1, 2, 2, 2, 2, 1, 9, 3.5, 5, 8, 4, 7, 0, 6]
w = cycleSort(x)
print(w, x)
Found the following interview question on the web:
You have an array of 0s and 1s and you want to output all the intervals (i, j) where the number of 0s and the number of 1s are equal. Example:
pos = 0 1 2 3 4 5 6 7 8
      0 1 0 0 1 1 1 1 0
One interval is (0, 1), because there the numbers of 0s and 1s are equal. There are many other intervals; find all of them in linear time.
I think there is no linear-time algorithm, as there may be n^2 such intervals.
Am I right? How can I prove that there can be n^2 such intervals?
This is the fastest way I can think of to do this, and it is linear in the number of intervals there are.
Let L be your original list of numbers, and let A be a hash of empty arrays where initially A[0] = [0]:
sum = 0
for i in 0..n
    if L[i] == 0:
        sum--
        A[sum].push(i)
    elif L[i] == 1:
        sum++
        A[sum].push(i)
Now A is essentially an x-y graph of the running sum of the sequence (x is the index in the list, y is the sum). Every time there are two x values x1 and x2 for the same y value, you have an interval (x1, x2] where the number of 0s and 1s is equal.
Every array M in A contributes m(m-1)/2 intervals whose sum is 0 (the arithmetic sum from 1 to m - 1), where m = M.length.
Calculating A by hand for an example list, we use this chart:
L # 0 1 0 1 0 0 1 1 1 1 0
A keys 0 -1 0 -1 0 -1 -2 -1 0 1 2 1
L index -1 0 1 2 3 4 5 6 7 8 9 10
(I've added a # to represent the start of the list, with an index of -1. I've also removed all the numbers that are not 0 or 1, since they're just distractions.) A will look like this:
[-2]->[5]
[-1]->[0, 2, 4, 6]
[0]->[-1, 1, 3, 7]
[1]->[8, 10]
[2]->[9]
For any M = [a1, a2, a3, ...], (ai + 1, aj) where j > i will be an interval with the same number of 0s as 1s. For example, in [-1]->[0, 2, 4, 6], the intervals are (1, 2), (1, 4), (1, 6), (3, 4), (3, 6), (5, 6).
Building the array A is O(n), but printing these intervals from A must take time linear in the number of intervals. In fact, that could be your proof that it is not quite possible to do this in time linear in n, because it's possible to have more intervals than n, and you need at least one iteration per interval to print them all.
Unless, of course, you consider building A to be enough to find all the intervals (since it's obvious from A what the intervals are), in which case it is linear in n :P
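Here's a minimal Python sketch of that grouping idea, using a -1 sentinel for the start of the list as in the chart above (counting the intervals stays linear; listing them does not):
def balanced_intervals(bits):
    # Group indices by running sum (a 0 counts as -1, a 1 as +1).
    groups = {0: [-1]}  # the running sum is 0 before the list starts
    running = 0
    for i, b in enumerate(bits):
        running += 1 if b == 1 else -1
        groups.setdefault(running, []).append(i)
    # Any two indices with the same running sum bound an interval
    # containing equally many 0s and 1s.
    count = sum(len(g) * (len(g) - 1) // 2 for g in groups.values())
    intervals = [(lo + 1, hi) for g in groups.values()
                 for j, lo in enumerate(g) for hi in g[j + 1:]]
    return count, intervals

count, intervals = balanced_intervals([0, 1, 0, 0, 1, 1, 1, 1, 0])
print(count)              # 7
print(sorted(intervals))  # [(0, 1), (0, 5), (1, 2), (1, 4), (2, 5), (3, 4), (7, 8)]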
A linear solution is possible (sorry, earlier I argued that this had to be n^2) if you're careful to not actually print the results!
First, let's define a "score" for any set of zeros and ones as the number of ones minus the number of zeroes. So (0,1) has a score of 0, while (0) is -1 and (1,1) is 2.
Now, start from the right. If the right-most digit is a 0 then it can be combined with any group to the left that has a score of 1. So we need to know what groups are available to the left, indexed by score. This suggests a recursive procedure that accumulates groups with scores. The sweep process is O(n) and at each step the process has to check whether it has created a new group and extend the table of known groups. Checking for a new group is constant time (lookup in a hash table). Extending the table of known groups is also constant time (at first I thought it wasn't, but you can maintain a separate offset that avoids updating each entry in the table).
So we have a peculiar situation: each step of the process identifies a set of results of size O(n), but the calculation necessary to do this is constant time (within that step). So the process itself is still O(n) (proportional to the number of steps). Of course, actually printing the results (either during the step, or at the end) makes things O(n^2).
I'll write some Python code to test/demonstrate.
Here we go:
SCORE = [-1, 1]

class Accumulator:

    def __init__(self):
        self.offset = 0
        self.groups_to_right = {}  # map from score to start index
        self.even_groups = []
        self.index = 0

    def append(self, digit):
        score = SCORE[digit]
        # want existing groups at -score, to sum to zero
        # but there's an offset to correct for, so we really want
        # groups at -(score+offset)
        corrected = -(score + self.offset)
        if corrected in self.groups_to_right:
            # if this were a linked list we could save a reference
            # to the current value. it's not, so we need to filter
            # on printing (see below)
            self.even_groups.append(
                (self.index, self.groups_to_right[corrected]))
        # this updates all the known groups
        self.offset += score
        # this adds the new one, which should be at the index so that
        # index + offset = score (so index = score - offset)
        groups = self.groups_to_right.get(score - self.offset, [])
        groups.append(self.index)
        self.groups_to_right[score - self.offset] = groups
        # and move on
        self.index += 1
        # print(self.offset)
        # print(self.groups_to_right)
        # print(self.even_groups)
        # print(self.index)

    def dump(self):
        # printing the results does take longer, of course...
        for (end, starts) in self.even_groups:
            for start in starts:
                # this discards the extra points that were added
                # to the data after we added it to the results
                # (avoidable with linked lists)
                if start < end:
                    print((start, end))

    @staticmethod
    def run(input):
        accumulator = Accumulator()
        print(input)
        for digit in input:
            accumulator.append(digit)
        accumulator.dump()
        print()

Accumulator.run([0, 1, 0, 0, 1, 1, 1, 1, 0])
And the output:
dynamic: python dynamic.py
[0, 1, 0, 0, 1, 1, 1, 1, 0]
(0, 1)
(1, 2)
(1, 4)
(3, 4)
(0, 5)
(2, 5)
(7, 8)
You might be worried that some additional processing (the filtering for start < end) is done in the dump routine that displays the results. But that's because I am working around Python's lack of linked lists (I want to both extend a list and save the previous value in constant time).
It may seem surprising that the result is of size O(n^2) while the process of finding the results is O(n), but it's easy to see how that is possible: at one "step" the process identifies a number of groups (of size O(n)) by associating the current point (self.index in append, or end in dump()) with a list of start points (self.groups_to_right[...] or starts).
Update: One further point. The table of "groups to the right" will have a "typical width" of sqrt(n) entries (this follows from the central limit theorem - it's basically a random walk in 1D). Since an entry is added at each step, the average length is also sqrt(n) (the n values shared out over sqrt(n) bins). That means that the expected time for this algorithm (i.e. with random inputs), if you include printing the results, is O(n^(3/2)), even though the worst case is O(n^2).
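If you want to sanity-check that sqrt(n) claim empirically, a rough simulation like this (my own sketch, not part of the algorithm) shows the number of distinct running-sum keys growing roughly like sqrt(n):
import random
from statistics import mean

def table_width(n):
    # Number of distinct running-sum values (i.e. keys in groups_to_right)
    # produced by one random string of n zeros and ones.
    total, seen = 0, {0}
    for _ in range(n):
        total += random.choice((-1, 1))
        seen.add(total)
    return len(seen)

for n in (100, 400, 1600, 6400):
    avg = mean(table_width(n) for _ in range(200))
    print(n, round(n ** 0.5, 1), round(avg, 1))  # avg grows roughly like sqrt(n)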
Answering the question directly:
You have to construct an example where there are more than O(N) matches.
Let N be of the form 2^k, with the following input:
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 (here, N=16)
number of matches (where 0 is the starting character):
length #
2 N/2
4 N/2 - 1
6 N/2 - 2
8 N/2 - 3
..
N 1
The total number of matches (starting with 0) is: (1+N/2) * (N/2) / 2 = N^2/8 + N/4
The matches starting with 1 are almost the same, except that there is one less for each length.
Total: (N^2/8 + N/4) * 2 - N/2 = N^2/4
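A quick brute-force check of that count (for N = 16 the formula gives 16^2/4 = 64):
def count_balanced(bits):
    # Count all intervals [i, j] with equally many 0s and 1s, the slow way.
    n = len(bits)
    return sum(2 * sum(bits[i:j + 1]) == j - i + 1
               for i in range(n) for j in range(i + 1, n))

n = 16
print(count_balanced([0, 1] * (n // 2)), n * n // 4)  # prints 64 64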
Every interval will contain at least one occurrence of either (0,1) or (1,0). Therefore, it's simply a matter of finding every occurrence of (0,1) or (1,0), then for each one seeing whether it is adjacent to an existing solution or whether the two bookending elements form another solution.
With a bit of storage trickery you will be able to find all solutions in linear time. Enumerating them will be O(N^2), but you should be able to encode them in O(N) space.