What is the most efficient way to sort a number list into alternating low-high-low sequences? - algorithm

Suppose you are given an unsorted list of positive integers, and you wish to order them in a manner such that the elements alternate as: (less than preceding element), (greater than preceding element), (less than preceding element), etc... The very first element in the output list may ignore the rule. So for example, suppose your list was: 1,4,9,2,7,5,3,8,6.
One correct output would be...
1,9,2,8,3,7,4,6,5
Another would be...
3,4,2,7,5,6,1,9,8
Assume that the list contains no duplicates, is arbitrarily large, and is not already sorted.
What is the most processing efficient algorithm to achieve this?
Now, the standard approach would be to simply sort the list in ascending order first, and then peel elements from the ends of the list in alternation. However, I'd like to know: Is there a more time-efficient way to do this without first sorting the list?
My reason for asking: (read this only if you care)
Apparently this is a question my sister's boyfriend poses to people at job interviews out in San Francisco. My sister asked me the question, and I immediately came up with the standard response. That's what everyone answers. However, apparently one girl came up with a completely different solution that does not require sorting the list, and it appears to work. My sister couldn't explain to me this solution, but the idea has been confounding me since last night. I'd appreciate any help! Thanks!

You can do this in O(n) by placing each element in turn at the end, or at the penultimate position based on a comparison with the current last element.
For example,
1,4,9,2,7,5,3,8,6
Place 1 at end, current list [1]
4>1 true so place 4 at end, current list [1,4]
9<4 false so place 9 at penultimate position [1,9,4]
2>4 false so place 2 at penultimate [1,9,2,4]
7<4 false so place 7 at penultimate [1,9,2,7,4]
5>4 true so place 5 at end [1,9,2,7,4,5]
3<5 true so place 3 at end [1,9,2,7,4,5,3]
8>3 true so place 8 at end [1,9,2,7,4,5,3,8]
6<8 true so place 6 at end [1,9,2,7,4,5,3,8,6]
Note that the equality tests alternate, and that we place at the end if the equality is true, or at the penultimate position if it is not true.
Example Python Code
A=[1,4,9,2,7,5,3,8,6]
B=[]
for i,a in enumerate(A):
if i==0 or (i&1 and a>B[-1]) or (i&1==0 and a<B[-1]):
B.insert(i,a)
else:
B.insert(i-1,a)
print B

One solution is this. Given in Pseudocode.
Assuming, nums has at least two elements and all elements in nums are distinct.
nums = [list of numbers]
if nums[0] < nums[1]: last_state = INCREASING else: last_state = DECREASING
for i = 2 to len(nums - 1):
if last_state = INCREASING:
if nums[i] > nums[i-1]:
swap (nums[i], nums[i-1])
last_state = DECREASING
else
if nums[i] < nums[i-1]:
swap (nums[i], nums[i-1])
last_state = INCREASING
Proof of correctness:
After each loop iteration, elements upto index i in nums remain alternating and last_state is represent the order of i th and i-1 th elements.
Note that a swapping happens only if last 3 items considered are in order. (Increasing or Decreasing) Therefore, if we swapped ith element with i-1 th element, the order of i-2 th element and i-1th element will not change.

Related

Code times out Ruby

According as the number of elements in a set of numbers is odd or even, median of that set is defined respectively as the middle value or the average of the two middle values in the list that results when the set is sorted.
Below is code for calculating the "running" median of a list of numbers. "Running" median is a dynamic median which is re-calculated with the appearance of a new number as the list is scanned for all numbers that have appeared thus far. Input is an integer n followed by a list of n integers, and output should be the "running" median of the list as the list is scanned. For example,
3
4
1
5
should yield
4
2.5
4
because 4 is the median of [4], 2.5 ((1+4)/2)is the median of [4,1] and 4 again is the median of [4,1,5].
My program works correctly, but it times out on a certain test on very large inputs. I suspect that this copying step is the problem.
a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
But I am not sure because this copy is meant to be a shallow copy as far as I know. Also, I am not doing a regular insert anywhere, which would involved shifting elements and thus result in slowing down the code, into the array that contains the numbers.
n=gets.chomp.to_i
a=[]
n.times do
t=gets.chomp.to_i
a==[]||(t<=a.first) ? a.unshift(t): t>=a.last ? a.push(t) : a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
p (l=a.count)%2==0 ? ((a[l/2] + a[l/2-1])/2.0).round(1):a[(l-1)/2].round(1)
end
Can anybody point out where the problem could be? Thank you.
Here is a less obfuscated version.
n=gets.chomp.to_i
a=[]
n.times do
t=gets.chomp.to_i
if a==[]||(t<=a.first)
a.unshift(t)
else
k=a.index(a.bsearch{|x|x>=t})
if k.nil? == true
k=a.length
end
a=a[0,k].push(t)+ a[k,a.length-k]
end
p (l=a.count)%2==0 ? ((a[l/2] + a[l/2-1])/2.0).round(1):a[(l-1)/2].round(1)
end
I think...
a=(a[0,(k=a.index(a.bsearch{|x|x>=t}))].push(t) + a[k,a.length-k])
...because it's creating a new array every time, is likely an expensive operation as the array gets bigger.
Better might actually be something that mutates the original array.
a.insert((a.index{|x|x>t} || -1), t)
It also handles the edge cases of less than first or greater than last, so you can remove those tests. Also works on first pass (empty array a)

How to "sort" elements of 2 possible values in place in linear time? [duplicate]

This question already has answers here:
Stable separation for two classes of elements in an array
(3 answers)
Closed 9 years ago.
Suppose I have a function f and array of elements.
The function returns A or B for any element; you could visualize the elements this way ABBAABABAA.
I need to sort the elements according to the function, so the result is: AAAAAABBBB
The number of A values doesn't have to equal the number of B values. The total number of elements can be arbitrary (not fixed). Note that you don't sort chars, you sort objects that have a single char representation.
Few more things:
the sort should take linear time - O(n),
it should be performed in place,
it should be a stable sort.
Any ideas?
Note: if the above is not possible, do you have ideas for algorithms sacrificing one of the above requirements?
If it has to be linear and in-place, you could do a semi-stable version. By semi-stable I mean that A or B could be stable, but not both. Similar to Dukeling's answer, but you move both iterators from the same side:
a = first A
b = first B
loop while next A exists
if b < a
swap a,b elements
b = next B
a = next A
else
a = next A
With the sample string ABBAABABAA, you get:
ABBAABABAA
AABBABABAA
AAABBBABAA
AAAABBBBAA
AAAAABBBBA
AAAAAABBBB
on each turn, if you make a swap you move both, if not you just move a. This will keep A stable, but B will lose its ordering. To keep B stable instead, start from the end and work your way left.
It may be possible to do it with full stability, but I don't see how.
A stable sort might not be possible with the other given constraints, so here's an unstable sort that's similar to the partition step of quick-sort.
Have 2 iterators, one starting on the left, one starting on the right.
While there's a B at the right iterator, decrement the iterator.
While there's an A at the left iterator, increment the iterator.
If the iterators haven't crossed each other, swap their elements and repeat from 2.
Lets say,
Object_Array[1...N]
Type_A objs are A1,A2,...Ai
Type_B objs are B1,B2,...Bj
i+j = N
FOR i=1 :N
if Object_Array[i] is of Type_A
obj_A_count=obj_A_count+1
else
obj_B_count=obj_B_count+1
LOOP
Fill the resultant array with obj_A and obj_B with their respective counts depending on obj_A > obj_B
The following should work in linear time for a doubly-linked list. Because up to N insertion/deletions are involved that may cause quadratic time for arrays though.
Find the location where the first B should be after "sorting". This can be done in linear time by counting As.
Start with 3 iterators: iterA starts from the beginning of the container, and iterB starts from the above location where As and Bs should meet, and iterMiddle starts one element prior to iterB.
With iterA skip over As, find the 1st B, and move the object from iterA to iterB->previous position. Now iterA points to the next element after where the moved element used to be, and the moved element is now just before iterB.
Continue with step 3 until you reach iterMiddle. After that all elements between first() and iterB-1 are As.
Now set iterA to iterB-1.
Skip over Bs with iterB. When A is found move it to just after iterA and increment iterA.
Continue step 6 until iterB reaches end().
This would work as a stable sort for any container. The algorithm includes O(N) insertion/deletion, which is linear time for containers with O(1) insertions/deletions, but, alas, O(N^2) for arrays. Applicability in you case depends on whether the container is an array rather than a list.
If your data structure is a linked list instead of an array, you should be able to meet all three of your constraints. You just skim through the list and accumulating and moving the "B"s will be trivial pointer changes. Pseudo code below:
sort(list) {
node = list.head, blast = null, bhead = null
while(node != null) {
nextnode = node.next
if(node.val == "a") {
if(blast != null){
//move the 'a' to the front of the 'B' list
bhead.prev.next = node, node.prev = bhead.prev
blast.next = node.next, node.next.prev = blast
node.next = bhead, bhead.prev = node
}
}
else if(node.val == "b") {
if(blast == null)
bhead = blast = node
else //accumulate the "b"s..
blast = node
}
3
node = nextnode
}
}
So, you can do this in an array, but the memcopies, that emulate the list swap, will make it quiet slow for large arrays.
Firstly, assuming the array of A's and B's is either generated or read-in, I wonder why not avoid this question entirely by simply applying f as the list is being accumulated into memory into two lists that would subsequently be merged.
Otherwise, we can posit an alternative solution in O(n) time and O(1) space that may be sufficient depending on Sir Bohumil's ultimate needs:
Traverse the list and sort each segment of 1,000,000 elements in-place using the permutation cycles of the segment (once this step is done, the list could technically be sorted in-place by recursively swapping the inner-blocks, e.g., ABB AAB -> AAABBB, but that may be too time-consuming without extra space). Traverse the list again and use the same constant space to store, in two interval trees, the pointers to each block of A's and B's. For example, segments of 4,
ABBAABABAA => AABB AABB AA + pointers to blocks of A's and B's
Sequential access to A's or B's would be immediately available, and random access would come from using the interval tree to locate a specific A or B. One option could be to have the intervals number the A's and B's; e.g., to find the 4th A, look for the interval containing 4.
For sorting, an array of 1,000,000 four-byte elements (3.8MB) would suffice to store the indexes, using one bit in each element for recording visited indexes during the swaps; and two temporary variables the size of the largest A or B. For a list of one billion elements, the maximum combined interval trees would number 4000 intervals. Using 128 bits per interval, we can easily store numbered intervals for the A's and B's, and we can use the unused bits as pointers to the block index (10 bits) and offset in the case of B (20 bits). 4000*16 bytes = 62.5KB. We can store an additional array with only the B blocks' offsets in 4KB. Total space under 5MB for a list of one billion elements. (Space is in fact dependent on n but because it is extremely small in relation to n, for all practical purposes, we may consider it O(1).)
Time for sorting the million-element segments would be - one pass to count and index (here we can also accumulate the intervals and B offsets) and one pass to sort. Constructing the interval tree is O(nlogn) but n here is only 4000 (0.00005 of the one-billion list count). Total time O(2n) = O(n)
This should be possible with a bit of dynamic programming.
It works a bit like counting sort, but with a key difference. Make arrays of size n for both a and b count_a[n] and count_b[n]. Fill these arrays with how many As or Bs there has been before index i.
After just one loop, we can use these arrays to look up the correct index for any element in O(1). Like this:
int final_index(char id, int pos){
if(id == 'A')
return count_a[pos];
else
return count_a[n-1] + count_b[pos];
}
Finally, to meet the total O(n) requirement, the swapping needs to be done in a smart order. One simple option is to have recursive swapping procedure that doesn't actually perform any swapping until both elements would be placed in correct final positions. EDIT: This is actually not true. Even naive swapping will have O(n) swaps. But doing this recursive strategy will give you absolute minimum required swaps.
Note that in general case this would be very bad sorting algorithm since it has memory requirement of O(n * element value range).

Disperse Duplicates in an Array

Source : Google Interview Question
Write a routine to ensure that identical elements in the input are maximally spread in the output?
Basically, we need to place the same elements,in such a way , that the TOTAL spreading is as maximal as possible.
Example:
Input: {1,1,2,3,2,3}
Possible Output: {1,2,3,1,2,3}
Total dispersion = Difference between position of 1's + 2's + 3's = 4-1 + 5-2 + 6-3 = 9 .
I am NOT AT ALL sure, if there's an optimal polynomial time algorithm available for this.Also,no other detail is provided for the question other than this .
What i thought is,calculate the frequency of each element in the input,then arrange them in the output,each distinct element at a time,until all the frequencies are exhausted.
I am not sure of my approach .
Any approaches/ideas people .
I believe this simple algorithm would work:
count the number of occurrences of each distinct element.
make a new list
add one instance of all elements that occur more than once to the list (order within each group does not matter)
add one instance of all unique elements to the list
add one instance of all elements that occur more than once to the list
add one instance of all elements that occur more than twice to the list
add one instance of all elements that occur more than trice to the list
...
Now, this will intuitively not give a good spread:
for {1, 1, 1, 1, 2, 3, 4} ==> {1, 2, 3, 4, 1, 1, 1}
for {1, 1, 1, 2, 2, 2, 3, 4} ==> {1, 2, 3, 4, 1, 2, 1, 2}
However, i think this is the best spread you can get given the scoring function provided.
Since the dispersion score counts the sum of the distances instead of the squared sum of the distances, you can have several duplicates close together, as long as you have a large gap somewhere else to compensate.
for a sum-of-squared-distances score, the problem becomes harder.
Perhaps the interview question hinged on the candidate recognizing this weakness in the scoring function?
In perl
#a=(9,9,9,2,2,2,1,1,1);
then make a hash table of the counts of different numbers in the list, like a frequency table
map { $x{$_}++ } #a;
then repeatedly walk through all the keys found, with the keys in a known order and add the appropriate number of individual numbers to an output list until all the keys are exhausted
#r=();
$g=1;
while( $g == 1 ) {
$g=0;
for my $n (sort keys %x)
{
if ($x{$n}>1) {
push #r, $n;
$x{$n}--;
$g=1
}
}
}
I'm sure that this could be adapted to any programming language that supports hash tables
python code for algorithm suggested by Vorsprung and HugoRune:
from collections import Counter, defaultdict
def max_spread(data):
cnt = Counter()
for i in data: cnt[i] += 1
res, num = [], list(cnt)
while len(cnt) > 0:
for i in num:
if num[i] > 0:
res.append(i)
cnt[i] -= 1
if cnt[i] == 0: del cnt[i]
return res
def calc_spread(data):
d = defaultdict()
for i, v in enumerate(data):
d.setdefault(v, []).append(i)
return sum([max(x) - min(x) for _, x in d.items()])
HugoRune's answer takes some advantage of the unusual scoring function but we can actually do even better: suppose there are d distinct non-unique values, then the only thing that is required for a solution to be optimal is that the first d values in the output must consist of these in any order, and likewise the last d values in the output must consist of these values in any (i.e. possibly a different) order. (This implies that all unique numbers appear between the first and last instance of every non-unique number.)
The relative order of the first copies of non-unique numbers doesn't matter, and likewise nor does the relative order of their last copies. Suppose the values 1 and 2 both appear multiple times in the input, and that we have built a candidate solution obeying the condition I gave in the first paragraph that has the first copy of 1 at position i and the first copy of 2 at position j > i. Now suppose we swap these two elements. Element 1 has been pushed j - i positions to the right, so its score contribution will drop by j - i. But element 2 has been pushed j - i positions to the left, so its score contribution will increase by j - i. These cancel out, leaving the total score unchanged.
Now, any permutation of elements can be achieved by swapping elements in the following way: swap the element in position 1 with the element that should be at position 1, then do the same for position 2, and so on. After the ith step, the first i elements of the permutation are correct. We know that every swap leaves the scoring function unchanged, and a permutation is just a sequence of swaps, so every permutation also leaves the scoring function unchanged! This is true at for the d elements at both ends of the output array.
When 3 or more copies of a number exist, only the position of the first and last copy contribute to the distance for that number. It doesn't matter where the middle ones go. I'll call the elements between the 2 blocks of d elements at either end the "central" elements. They consist of the unique elements, as well as some number of copies of all those non-unique elements that appear at least 3 times. As before, it's easy to see that any permutation of these "central" elements corresponds to a sequence of swaps, and that any such swap will leave the overall score unchanged (in fact it's even simpler than before, since swapping two central elements does not even change the score contribution of either of these elements).
This leads to a simple O(nlog n) algorithm (or O(n) if you use bucket sort for the first step) to generate a solution array Y from a length-n input array X:
Sort the input array X.
Use a single pass through X to count the number of distinct non-unique elements. Call this d.
Set i, j and k to 0.
While i < n:
If X[i+1] == X[i], we have a non-unique element:
Set Y[j] = Y[n-j-1] = X[i].
Increment i twice, and increment j once.
While X[i] == X[i-1]:
Set Y[d+k] = X[i].
Increment i and k.
Otherwise we have a unique element:
Set Y[d+k] = X[i].
Increment i and k.

On counting pairs of words that differ by one letter

Let us consider n words, each of length k. Those words consist of letters over an alphabet (whose cardinality is n) with defined order. The task is to derive an O(nk) algorithm to count the number of pairs of words that differ by one position (no matter which one exactly, as long as it's only a single position).
For instance, in the following set of words (n = 5, k = 4):
abcd, abdd, adcb, adcd, aecd
there are 5 such pairs: (abcd, abdd), (abcd, adcd), (abcd, aecd), (adcb, adcd), (adcd, aecd).
So far I've managed to find an algorithm that solves a slightly easier problem: counting the number of pairs of words that differ by one GIVEN position (i-th). In order to do this I swap the letter at the ith position with the last letter within each word, perform a Radix sort (ignoring the last position in each word - formerly the ith position), linearly detect words whose letters at the first 1 to k-1 positions are the same, eventually count the number of occurrences of each letter at the last (originally ith) position within each set of duplicates and calculate the desired pairs (the last part is simple).
However, the algorithm above doesn't seem to be applicable to the main problem (under the O(nk) constraint) - at least not without some modifications. Any idea how to solve this?
Assuming n and k isn't too large so that this will fit into memory:
Have a set with the first letter removed, one with the second letter removed, one with the third letter removed, etc. Technically this has to be a map from strings to counts.
Run through the list, simply add the current element to each of the maps (obviously by removing the applicable letter first) (if it already exists, add the count to totalPairs and increment it by one).
Then totalPairs is the desired value.
EDIT:
Complexity:
This should be O(n.k.logn).
You can use a map that uses hashing (e.g. HashMap in Java), instead of a sorted map for a theoretical complexity of O(nk) (though I've generally found a hash map to be slower than a sorted tree-based map).
Improvement:
A small alteration on this is to have a map of the first 2 letters removed to 2 maps, one with first letter removed and one with second letter removed, and have the same for the 3rd and 4th letters, and so on.
Then put these into maps with 4 letters removed and those into maps with 8 letters removed and so on, up to half the letters removed.
The complexity of this is:
You do 2 lookups into 2 sorted sets containing maximum k elements (for each half).
For each of these you do 2 lookups into 2 sorted sets again (for each quarter).
So the number of lookups is 2 + 4 + 8 + ... + k/2 + k, which I believe is O(k).
I may be wrong here, but, worst case, the number of elements in any given map is n, but this will cause all other maps to only have 1 element, so still O(logn), but for each n (not each n.k).
So I think that's O(n.(logn + k)).
.
EDIT 2:
Example of my maps (without the improvement):
(x-1) means x maps to 1.
Let's say we have abcd, abdd, adcb, adcd, aecd.
The first map would be (bcd-1), (bdd-1), (dcb-1), (dcd-1), (ecd-1).
The second map would be (acd-3), (add-1), (acb-1) (for 4th and 5th, value already existed, so increment).
The third map : (abd-2), (adb-1), (add-1), (aed-1) (2nd already existed).
The fourth map : (abc-1), (abd-1), (adc-2), (aec-1) (4th already existed).
totalPairs = 0
For second map - acd, for the 4th, we add 1, for the 5th we add 2.
totalPairs = 3
For third map - abd, for the 2th, we add 1.
totalPairs = 4
For fourth map - adc, for the 4th, we add 1.
totalPairs = 5.
Partial example of improved maps:
Same input as above.
Map of first 2 letters removed to maps of 1st and 2nd letter removed:
(cd-{ {(bcd-1)}, {(acd-1)} }),
(dd-{ {(bdd-1)}, {(add-1)} }),
(cb-{ {(dcb-1)}, {(acb-1)} }),
(cd-{ {(dcd-1)}, {(acd-1)} }),
(cd-{ {(ecd-1)}, {(acd-1)} })
The above is a map consisting of an element cd mapped to 2 maps, one containing one element (bcd-1) and the other containing (acd-1).
But for the 4th and 5th cd already existed, so, rather than generating the above, it will be added to that map instead, as follows:
(cd-{ {(bcd-1, dcd-1, ecd-1)}, {(acd-3)} }),
(dd-{ {(bdd-1)}, {(add-1)} }),
(cb-{ {(dcb-1)}, {(acb-1)} })
You can put each word into an array.Pop out elements from that array one by one.Then compare the resulting arrays.Finally you add back the popped element to get back the original arrays.
The popped elements from both the arrays must not be same.
Count number of cases where this occurs and finally divide it by 2 to get the exact solution
Think about how you would enumerate the language - you would likely use a recursive algorithm. Recursive algorithms map onto tree structures. If you construct such a tree, each divergence represents a difference of one letter, and each leaf will represent a word in the language.
It's been two months since I submitted the problem here. I have discussed it with my peers in the meantime and would like to share the outcome.
The main idea is similar to the one presented by Dukeling. For each word A and for each ith position within that word we are going to consider a tuple: (prefix, suffix, letter at the ith position), i.e. (A[1..i-1], A[i+1..n], A[i]). If i is either 1 or n, then the applicable substring is considered empty (these are simple boundary cases).
Having these tuples in hand, we should be able to apply the reasoning I provided in my first post to count the number of pairs of different words. All we have to do is sort the tuples by the prefix and suffix values (separately for each i) - then, words with letters equal at all but ith position will be adjacent to each other.
Here though is the technical part I am lacking. So as to make the sorting procedure (RadixSort appears to be the way to go) meet the O(nk) constraint, we might want to assign labels to our prefixes and suffixes (we only need n labels for each i). I am not quite sure how to go about the labelling stuff. (Sure, we might do some hashing instead, but I am pretty confident the former solution is viable).
While this is not an entirely complete solution, I believe it casts some light on the possible way to tackle this problem and that is why I posted it here. If anyone comes up with an idea of how to do the labelling part, I will implement it in this post.
How's the following Python solution?
import string
def one_apart(words, word):
res = set()
for i, _ in enumerate(word):
for c in string.ascii_lowercase:
w = word[:i] + c + word[i+1:]
if w != word and w in words:
res.add(w)
return res
pairs = set()
for w in words:
for other in one_apart(words, w):
pairs.add(frozenset((w, other)))
for pair in pairs:
print(pair)
Output:
frozenset({'abcd', 'adcd'})
frozenset({'aecd', 'adcd'})
frozenset({'adcb', 'adcd'})
frozenset({'abcd', 'aecd'})
frozenset({'abcd', 'abdd'})

Sorting Linked list : random or nearly sorted?

i have a linked list and i want to check whether its nearly sorted or random? Can anyone suggest how to do that??
Right now what i am trying to do is run upto half of list and compare adjacent elements to check whether given list is nearly sorted or otherwise. But difficulty is that this method is not full proof and i want something concrete.
For example, if you have 100 item then the scale will be out of 100. (Score of how much the list is sorted.) If you have all the list sorted you have a score of 100. If the list is sorted in backwards then you have a 0 score. You will be checking each of the adjacent and decide if the pair is sorted or not (0th and 1st, 1st and 2nd, 2nd and 3rd and so on). Therefore you will have a scale between 0 and 100 (or the linked list size for your case). There are a lot of heuristics about the "sorting scale" but this might be one.
If you want to involve the amplitude of your data you can do (Python3):
import random
l = [random.random() for x in range(100)]
s = 0
for i,x in enumerate(l[0:50]):
s += l[i+1] - x
print(s)
if you'd rather just look at how many values are sorted, replace the s+= line with
s += 1 if l[i+1] > x else 0

Resources