Exact number of comparisons in Insertion Sort - algorithm

I want to find the number of permutations of {1, ..., n} for which Insertion Sort does exactly n(n-1)/2 comparisons.
For example, for {1, 2, 3, 4} we get (4, 3, 2, 1), (3, 4, 2, 1), (4, 2, 3, 1) etc. - for each of them Insertion Sort does 4*3/2 = 6 comparisons.
Does anybody know an exact formula for that?
I am thinking of something like (n-1) + 1 = n, where
1 stands for the reversed sequence, and the other n-1 come from swapping each of the n-1 adjacent pairs in the reversed sequence.

Here is a hint. The complete list for (1, 2, 3, 4) is:
(4, 3, 2, 1)
(3, 4, 2, 1)
(4, 2, 3, 1)
(2, 4, 3, 1)
(4, 3, 1, 2)
(3, 4, 1, 2)
(4, 1, 3, 2)
(1, 4, 3, 2)
Look at it from last column to first.
Walk step by step through the insertion sorts. See where they merge. Do you see a pattern there?
Reversing it, can you figure out how I generated this list? Can you prove that the list is complete?
The why is what matters here. Just saying 2^(n-1) is useless.

n(n-1)/2 is the sum of all integers in the range (1, n - 1). Since your sequence has length n, you can expand that range to (0, n - 1) without changing the sum.
The number of swaps for each insertion would be:
run #   list           value   swaps
1       []             a       0 (no swaps possible)
2       [a]            b       1
3       [b, a]         c       2
...
10      [i, ..., a]    j       9
...
n       [...]          ?       n - 1
So we need to move every element through the entire list in order to achieve the required count of swaps. The number of comparisons can be at most one higher than the number of swaps, which means each value that is being inserted must be placed at either the first or the second index of the resulting list.
Put differently, assuming ascending ordering of the output:
The input list should in general be a nearly descending list, where each element may be preceded by at most one element that is not larger than the element in question.
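As a quick sanity check on both answers, here is a small brute-force sketch (mine, not part of either answer) that counts the key comparisons of a textbook insertion sort for every permutation of {1, ..., n} and tallies how many reach the maximum n(n-1)/2; for small n the tally comes out to 2^(n-1):

from itertools import permutations

def insertion_sort_comparisons(seq):
    # Textbook insertion sort on a copy of seq, returning the number of key comparisons.
    a = list(seq)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison of key against a[j]
            if a[j] <= key:
                break
            a[j + 1] = a[j]           # shift a[j] one slot to the right
            j -= 1
        a[j + 1] = key
    return comparisons

for n in range(1, 7):
    target = n * (n - 1) // 2
    worst = [p for p in permutations(range(1, n + 1))
             if insertion_sort_comparisons(p) == target]
    print(n, len(worst))              # 1, 2, 4, 8, 16, 32, i.e. 2^(n-1)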

Related

Data Structure to convert a stream of integers into a list of range

A function is given with a method to get the next integer from a stream of integers. The numbers are fetched sequentially from the stream. How do we go about producing a summary of the integers encountered so far?
Given a list of numbers, the summary consists of the ranges of numbers. Example: if the list so far is [1,5,4,2,7], then the summary is [[1-2],[4-5],7].
Numbers are put into the same range if they are contiguous.
My Thoughts:
Approach 1:
Maintain the numbers in sorted order. When we fetch a new number from the stream, we can use binary search to find its location in the list and insert it there, so that the resulting list stays sorted. But since this is a list, I think inserting the element will be an O(N) operation.
Approach 2:
Use a balanced binary search tree such as a red-black or AVL tree. Each insertion will be O(log N),
and an in-order traversal will yield the sorted array, from which one can compute the ranges in O(N).
Approach 2 looks like the better approach if I am not making any mistakes. I am unsure whether there is a better way to solve this problem.
I'd not keep the original numbers, but aggregate them to ranges on the fly. This has the potential to reduce the number of elements by quite some factor (depending on the ordering and distribution of the incoming values). The task itself seems to imply that you expect contiguous ranges of integers to appear quite frequently in the input.
Then a newly incoming number can fall into one of a few cases:
It is already contained in some range: then simply ignore the number (this is only relevant if duplicate inputs can happen).
It is adjacent to none of the ranges so far: create a new single-element range.
It is adjacent to exactly one range: extend that range by 1, downward or upward.
It is adjacent to two ranges (i.e. fills the gap): merge the two ranges.
For the data structure holding the ranges, you want a good performance for the following operations:
Find the place (position) for a given number.
Insert a new element (range) at a given place.
Merge two (neighbor) elements. This can be broken down into:
Remove an element at a given place.
Modify an element at a given place.
Depending on the expected number and sparsity of ranges, a sorted list of ranges might do. Otherwise, some kind of search tree might turn out helpful.
Anyway, start with the most readable approach, measure performance for typical cases, and decide whether some optimization is necessary.
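To make the case analysis above concrete, here is a minimal sketch of the sorted-list variant, assuming the ranges are kept as a Python list of (lo, hi) tuples that stay disjoint and non-adjacent; the helper name insert_number is mine. Finding the position is a binary search, but the list insertion and deletion steps are O(k) in the number of ranges, which is why a search tree may be preferable when there are many ranges.

import bisect

def insert_number(ranges, x):
    # ranges: sorted list of disjoint, non-adjacent (lo, hi) tuples; updated in place.
    i = bisect.bisect_left(ranges, (x, x))             # where a range starting at x would go
    if i > 0 and ranges[i - 1][1] >= x:                # already covered by the range on the left
        return ranges
    if i < len(ranges) and ranges[i][0] == x:          # already the low end of the range on the right
        return ranges
    left = i > 0 and ranges[i - 1][1] == x - 1         # adjacent to the range on the left
    right = i < len(ranges) and ranges[i][0] == x + 1  # adjacent to the range on the right
    if left and right:                                 # x fills the gap: merge both neighbours
        ranges[i - 1] = (ranges[i - 1][0], ranges[i][1])
        del ranges[i]
    elif left:                                         # extend the left range upward
        ranges[i - 1] = (ranges[i - 1][0], x)
    elif right:                                        # extend the right range downward
        ranges[i] = (x, ranges[i][1])
    else:                                              # start a new single-element range
        ranges.insert(i, (x, x))
    return ranges

ranges = []
for x in [1, 5, 4, 2, 7]:
    insert_number(ranges, x)
print(ranges)   # [(1, 2), (4, 5), (7, 7)]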
I suggest maintaining a hashmap that maps each integer seen so far to the interval it belongs to.
Make sure that two numbers that are part of the same interval will point to the same interval object, not to copies; so that if you update an interval to extend it, all numbers can see it.
All operations are O(1), except the operation "merge two intervals" that happens if the stream produces integer x when we have two intervals [a, x - 1] and [x + 1, b]. The merge operation is proportional to the length of the shortest of these two intervals.
As a result, for a stream of n integers, the algorithm's complexity is O(n) in the best-case (where at most a few big merges happen) and O(n log n) in the worst-case (when we keep merging lots of intervals).
In python:
def add_element(intervals, x):
    if x in intervals:  # do not do anything
        pass
    elif x + 1 in intervals and x - 1 in intervals:  # merge two intervals
        i = intervals[x - 1]
        j = intervals[x + 1]
        if i[1] - i[0] > j[1] - j[0]:  # j is shorter: update i, and make everything in j point to i
            i[1] = j[1]
            for y in range(j[0] - 1, j[1] + 1):
                intervals[y] = i
        else:  # i is shorter: update j, and make everything in i point to j
            j[0] = i[0]
            for y in range(i[0], i[1] + 2):
                intervals[y] = j
    elif x + 1 in intervals:  # extend one interval to the left
        i = intervals[x + 1]
        i[0] = x
        intervals[x] = i
    elif x - 1 in intervals:  # extend one interval to the right
        i = intervals[x - 1]
        i[1] = x
        intervals[x] = i
    else:  # add a singleton
        intervals[x] = [x, x]
    return intervals
from random import shuffle

def main():
    stream = list(range(10)) * 2
    shuffle(stream)
    print(stream)
    intervals = {}
    for x in stream:
        intervals = add_element(intervals, x)
        print(x)
        print(set(map(tuple, intervals.values())))  # this line terribly inefficient because I'm lazy

if __name__ == '__main__':
    main()
Output:
[1, 5, 8, 3, 9, 6, 7, 9, 3, 0, 6, 5, 8, 1, 4, 7, 2, 2, 0, 4]
1
{(1, 1)}
5
{(1, 1), (5, 5)}
8
{(8, 8), (1, 1), (5, 5)}
3
{(8, 8), (1, 1), (5, 5), (3, 3)}
9
{(8, 9), (1, 1), (5, 5), (3, 3)}
6
{(3, 3), (1, 1), (8, 9), (5, 6)}
7
{(5, 9), (1, 1), (3, 3)}
9
{(5, 9), (1, 1), (3, 3)}
3
{(5, 9), (1, 1), (3, 3)}
0
{(0, 1), (5, 9), (3, 3)}
6
{(0, 1), (5, 9), (3, 3)}
5
{(0, 1), (5, 9), (3, 3)}
8
{(0, 1), (5, 9), (3, 3)}
1
{(0, 1), (5, 9), (3, 3)}
4
{(0, 1), (3, 9)}
7
{(0, 1), (3, 9)}
2
{(0, 9)}
2
{(0, 9)}
0
{(0, 9)}
4
{(0, 9)}
You could use a Disjoint Set Forest implementation for this. If well-implemented, it gives a near linear time complexity for inserting n elements into it. The amortized running time of each insert operation is Θ(α(n)) where α(n) is the inverse Ackermann function. For all practical purposes we can not distinguish this from O(1).
The extraction of the ranges can have a time complexity of O(k), where k is the number of ranges, provided that the disjoint set maintains the set of root nodes. If the ranges need to be sorted, then this extraction will have a time complexity of O(k log k), as it will then just perform the sort operation on it.
Here is an implementation in Python:
class Node:
    def __init__(self, value):
        self.low = value
        self.parent = self
        self.size = 1

    def find(self):  # Union-Find: Path splitting
        node = self
        while node.parent is not node:
            node, node.parent = node.parent, node.parent.parent
        return node

class Ranges:
    def __init__(self):
        self.nums = dict()
        self.roots = set()

    def union(self, a, b):  # Union-Find: Size-based merge
        a = a.find()
        b = b.find()
        if a is not b:
            if a.size > b.size:
                a, b = b, a
            self.roots.remove(a)  # Keep track of roots
            a.parent = b
            b.low = min(a.low, b.low)
            b.size = a.size + b.size

    def add(self, n):
        if n not in self.nums:
            self.nums[n] = node = Node(n)
            self.roots.add(node)
            if (n+1) in self.nums:
                self.union(node, self.nums[n+1])
            if (n-1) in self.nums:
                self.union(node, self.nums[n-1])

    def get(self):
        return sorted((node.low, node.low + node.size - 1) for node in self.roots)

# example run
ranges = Ranges()
for n in 4, 7, 1, 6, 2, 9, 5:
    ranges.add(n)
print(ranges.get())  # [(1, 2), (4, 7), (9, 9)]

Cartesian product but remove duplicates up to cyclic permutations

Given two integers n and r, I want to generate all possible combinations with the following rules:
There are n distinct numbers to choose from, 1, 2, ..., n;
Each combination should have r elements;
A combination may contain more than one of an element, for instance (1,2,2) is valid;
Order matters, i.e. (1,2,3) and (1,3,2) are considered distinct;
However, two combinations are considered equivalent if one is a cyclic permutation of the other; for instance, (1,2,3) and (2,3,1) are considered duplicates.
Examples:
n=3, r=3
11 distinct combinations
(1,1,1), (1,1,2), (1,1,3), (1,2,2), (1,2,3), (1,3,2), (1,3,3), (2,2,2), (2,2,3), (2,3,3) and (3,3,3)
n=2, r=4
6 distinct combinations
(1,1,1,1), (1,1,1,2), (1,1,2,2), (1,2,1,2), (1,2,2,2), (2,2,2,2)
What is the algorithm for it? And how can it be implemented in C++?
Thank you in advance for any advice.
Here is a naive solution in python:
Generate all combinations from the Cartesian product of {1, 2, ...,n} with itself r times;
Only keep one representative combination for each equivalency class; drop all other combinations that are equivalent to this representative combination.
This means we must have some way to compare combinations, and for instance, only keep the smallest combination of every equivalency class.
from itertools import product

def is_representative(comb):
    return all(comb[i:] + comb[:i] >= comb
               for i in range(1, len(comb)))

def cartesian_product_up_to_cyclic_permutations(n, r):
    return filter(is_representative,
                  product(range(n), repeat=r))

print(list(cartesian_product_up_to_cyclic_permutations(3, 3)))
# [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2), (1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]
print(list(cartesian_product_up_to_cyclic_permutations(2, 4)))
# [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1)]
You mentioned that you wanted to implement the algorithm in C++. The product function in the Python code behaves just like a big nested for-loop that generates all the combinations in the Cartesian product. See this related question on implementing a Cartesian product in C++: Is it possible to execute n number of nested "loops(any)" where n is given?
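If you would rather not depend on a library routine when porting this, the product can also be written as a plain recursion, one level per tuple position, which maps directly onto nested loops or a recursive function in C++. A minimal Python sketch with helper names of my own:

def product_recursive(n, r, prefix=()):
    # Yield all r-tuples over {0, ..., n-1}; one recursion level per nested loop.
    if len(prefix) == r:
        yield prefix
        return
    for v in range(n):
        yield from product_recursive(n, r, prefix + (v,))

def cyclic_representatives(n, r):
    # Keep a tuple only if it is the lexicographically smallest of its rotations.
    for comb in product_recursive(n, r):
        if all(comb[i:] + comb[:i] >= comb for i in range(1, r)):
            yield comb

print(list(cyclic_representatives(3, 3)))   # same 11 tuples as above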

Dynamic programming approach to form pairs from a number list (use pseudocode)

STATE the running time of the methods. Polynomial time algorithms are expected.
A club wants to organize a party for the students. However, only part of the students can attend the party. The candidates are chosen in pairs with the following requirements: assume the students are lined up, each student is given a lucky number, and some students may have the same lucky number. The club requires that only students who have the same lucky number may be paired up and join the party, while all the students located between a chosen pair lose the opportunity.
We have helped the club to formulate the problem formally.
Given a sequence of positive integers (representing the lucky numbers) A = (a1, a2, ..., an), a pair (ai, aj) is defined as a dancing pair if ai = aj and i < j. A dancing pair list L = {(ai1, aj1), ..., (aik, ajk)} is good if and only if for any two pairs (aix, ajx), (aiy, ajy) of L, either jx < iy or jy < ix, for 1 <= x, y <= k.
For example, if A={1, 3, 4, 1, 2, 3, 2, 4, 3, 5, 3, 4, 5}, some good dancing lists might be:
{(1, 1), (2, 2), (3, 3)} (locations in A={1, 3, 4, 1, 2, 3, 2, 4, 3, 5, 3, 4, 5})
{(1, 1), (3, 3), (5, 5)} (locations in A={1, 3, 4, 1, 2, 3, 2, 4, 3, 5, 3, 4, 5})
a) Given a sequence of positive integers A, design a polynomial time algorithm to find a
good dancing pair list with maximum number of dancing pairs.
b) Assume the weight of the dancing pair (ai, aj) is ai+aj. The total weight of a good dancing
list is the sum of all the weights of the dancing pairs in the list. Design a polynomial time
algorithm to find a good dancing pair list with maximum total weight.
For part a, if I write something like this:
foreach list L
    foreach item I in list L
        foreach list L2 such that L2 != L
            foreach item I2 in L2
                if I == I2
                    return new 3-tuple(L, L2, I) //
how do I continue?
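The attempt above only finds a single matching pair, while the assignment asks for a maximum set of non-overlapping pairs. One way to continue (my own sketch, not from the original thread) is a DP over prefixes: let dp[i] be the maximum number of good dancing pairs using only positions 1..i; either position i stays unpaired, or it is paired with some earlier position j having a_j = a_i, which discards everything strictly between j and i.

def max_dancing_pairs(a):
    # O(n^2) DP: dp[i] = max number of disjoint dancing pairs within a[0..i-1].
    n = len(a)
    dp = [0] * (n + 1)
    for i in range(1, n + 1):
        best = dp[i - 1]                      # position i stays unpaired
        for j in range(1, i):
            if a[j - 1] == a[i - 1]:          # pair (j, i); positions between are skipped
                best = max(best, dp[j - 1] + 1)
        dp[i] = best
    return dp[n]

print(max_dancing_pairs([1, 3, 4, 1, 2, 3, 2, 4, 3, 5, 3, 4, 5]))   # 3

For part b, replacing the "+ 1" with "+ a[j - 1] + a[i - 1]" maximizes the total weight instead of the pair count; both versions run in O(n^2) time.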

N non-overlapping optimal partition

Here is a problem I ran into a few days ago.
Given a list of integer items, we want to partition the items into at most N non-overlapping, consecutive bins, in a way that minimizes the maximum number of items in any bin.
For example, suppose we are given the items (5, 2, 3, 6, 1, 6), and we want 3 bins. We can optimally partition these as follows:
n < 3: 1, 2 (2 items)
3 <= n < 6: 3, 5 (2 items)
6 <= n: 6, 6 (2 items)
Every bin has 2 items, so we can’t do any better than that.
Can anyone share your idea about this question?
Given n bins and an array with p items, here is one greedy algorithm you could use.
To minimize the max number of items in a bin:
p <= n Try to use p bins.
Simply try to put each item in its own bin. If you have duplicate numbers then your average will be unavoidably worse.
p > n Greedily use all bins but try to keep each one's member count near floor(p / n).
Group duplicate numbers
Pad the largest duplicate bins that fall short of floor(p / n) with unique numbers to the left and right (if they exist).
Count the number of bins you have and determine the number of mergers you need to make; let's call it r.
Repeat the following r times:
Check each possible neighbouring bin pairing; find and perform the minimum merger
Example
{1,5,6,9,8,8,6,2,5,4,7,5,2,4,5,3,2,8,7,5} 20 items to 4 bins
{1}{2, 2, 2}{3}{4, 4}{5, 5, 5, 5, 5}{6, 6}{7, 7}{8, 8, 8}{9} 1. sorted and grouped
{1, 2, 2, 2, 3}{4, 4}{5, 5, 5, 5, 5}{6, 6}{7, 7}{8, 8, 8, 9} 2. greedy capture by largest groups
{1, 2, 2, 2, 3}{4, 4}{5, 5, 5, 5, 5}{6, 6}{7, 7}{8, 8, 8, 9} 3. 6 bins but we want 4, so 2 mergers need to be made.
{1, 2, 2, 2, 3}{4, 4}{5, 5, 5, 5, 5}{6, 6, 7, 7}{8, 8, 8, 9} 3. first merger
{1, 2, 2, 2, 3, 4, 4}{5, 5, 5, 5, 5}{6, 6, 7, 7}{8, 8, 8, 9} 3. second merger
So the minimum achievable max was 7.
Here is some pseudocode that will give you one solution with the minimum bin size possible:
Sort the list of "Elements" with Element as a pair {Value, Quantity}.
So for example {5,2,3,6,1,6} becomes an ordered set:
Let S = {{1,1},{2,1},{3,1},{5,1},{6,2}}
Let A = the largest quantity of any particular value in the set
Let X = Items in List
Let N = Number of bins
Let MinNum = ceiling ( X / N )
if A > MinNum then Let MinNum = A
Create an array Bin(1 to N+1) of pointers to linked lists of elements.
For I from 1 to N
    Remove elements from the front of S as long as Bin(I) stays below MinNum
    and add them to Bin(I)
Next I
Let Bin(N+1) = any elements remaining in S
LOOP while Bin(N+1) not empty
    Let MinNum = MinNum + 1
    For I from 1 to N
        Remove elements from the front of Bin(N+1) as long as Bin(I) stays below MinNum
        and add them to Bin(I)
    Next I
END LOOP
The minimum possible bin size will be MinNum, and Bin(1) to Bin(N) will contain the distribution of values.
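A different way to attack the problem (my own sketch, not part of either answer above) is to binary search on the answer: a capacity C is feasible iff every value's duplicate group fits in one bin (all equal values must share a bin, since bins cover consecutive value ranges) and a greedy left-to-right fill over the value groups needs at most N bins. Assuming a non-empty item list:

from collections import Counter
from math import ceil

def min_max_bin(items, n_bins):
    # Smallest achievable maximum bin size for at most n_bins consecutive bins.
    counts = [c for _, c in sorted(Counter(items).items())]   # per-value counts, by value

    def feasible(cap):
        bins, current = 1, 0
        for c in counts:
            if c > cap:                    # one value's duplicates alone overflow a bin
                return False
            if current + c > cap:          # start a new bin
                bins += 1
                current = 0
            current += c
        return bins <= n_bins

    lo = max(ceil(len(items) / n_bins), max(counts))           # trivial lower bounds
    hi = len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_max_bin([5, 2, 3, 6, 1, 6], 3))                                  # 2
print(min_max_bin([1,5,6,9,8,8,6,2,5,4,7,5,2,4,5,3,2,8,7,5], 4))           # 7

On the question's example this reports 2, and on the 20-item example above it reports 7, matching the merger-based walk-through.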

Counting number of days, given a collection of day ranges?

Say I have the following ranges, in some list:
{ (1, 4), (6, 8), (2, 5), (1, 3) }
(1, 4) represents days 1, 2, 3, 4. (6, 8) represents days 6, 7, 8, and so on.
The goal is to find the total number of days that are listed in the collection of ranges -- for instance, in the above example, the answer would be 8, because days 1, 2, 3, 4, 5, 6, 7, and 8 are contained within the ranges.
This problem can be solved trivially by iterating through the days in each range and putting them in a HashSet, then returning the size of the HashSet. But is there any way to do it in O(n) time with respect to the number of range pairs? How about in O(n) time and with constant space? Thanks.
Sort the ranges in ascending order by their lower limits. You can probably do this in linear time since you're dealing with integers.
The rest is easy. Loop through the ranges once keeping track of numDays (initialized to zero) and largestDay (initialized to -INF). On reaching each interval (a, b):
if b > largestDay then
    numDays <- numDays + b - max(a - 1, largestDay)
    largestDay <- max(largestDay, b)
else
    do nothing
So, after sorting we have (1,4), (1,3), (2,5), (6,8)
(1,4): numDays <- 0 + (4 - max(1 - 1, -INF)) = 4, largestDay <- max(-INF, 4) = 4
(1,3): b < largestDay, so no change.
(2,5): numDays <- 4 + (5 - max(2 - 1, 4)) = 5, largestDay <- 5
(6,8): numDays <- 5 + (8 - max(6-1, 5)) = 8, largestDay <- 8
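The loop above translates almost line for line into Python; here is a minimal sketch (the function name is mine), with largestDay initialised to minus infinity as described:

def count_days(ranges):
    # One pass over the ranges sorted by lower limit, tracking the largest day seen so far.
    num_days, largest_day = 0, float('-inf')
    for a, b in sorted(ranges):
        if b > largest_day:
            num_days += b - max(a - 1, largest_day)
            largest_day = b
    return num_days

print(count_days([(1, 4), (6, 8), (2, 5), (1, 3)]))   # 8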
The complexity of the following algorithm is O(n log n) where n is the number of ranges.
Sort the ranges (a, b) lexicographically by increasing a then by decreasing b.
Before: { (1, 4), (6, 8), (2, 5), (1, 3) }
After: { (1, 4), (1, 3), (2, 5), (6, 8) }
Collapse the sorted sequence of ranges into a potentially-shorter sequence of ranges, repeatedly merging consecutive (a, b) and (c, d) into (a, max(b, d)) if b >= c.
Before: { (1, 4), (1, 3), (2, 5), (6, 8) }
{ (1, 4), (2, 5), (6, 8) }
After: { (1, 5), (6, 8) }
Map the sequence of ranges to their sizes.
Before: { (1, 5), (6, 8) }
After: { 5, 3 }
Sum the sizes to arrive at the total number of days.
8
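A minimal sketch of this sort-collapse-sum approach (the function name is mine):

def total_days(ranges):
    # Sort by increasing a then decreasing b, collapse overlapping ranges, sum the sizes.
    merged = []
    for a, b in sorted(ranges, key=lambda r: (r[0], -r[1])):
        if merged and a <= merged[-1][1]:          # overlaps the previous collapsed range
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    return sum(b - a + 1 for a, b in merged)

print(total_days([(1, 4), (6, 8), (2, 5), (1, 3)]))   # 8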

Resources