How to prove the optimality of this greedy algo?

Given N integers. Each of them can be increased or decreased once by no more than a given positive integer L. After each operation, if any numbers become equal we consider them as one number. The problem is to compute the minimum possible number of distinct integers that can remain.
Constraints: N <= 100, L <= 3200, integers are in the range [-32000, 32000]
Example: N = 3, L = 10
11 21 27
1) increase 11 by 10 => 21 21 27
2) decrease 27 by 6 => 21 21 21
The answer is 1.
Algo in C++ language:
sort(v.begin(), v.end());
// the algo tries to include elements in an interval of length 2 * L
int ans = 0;
int first = 0;
for (int i = 1; i < N; ++i) {
    if (v[i] - v[first] > 2 * L) { // if we can't include the i-th element
        ans++;                     // into the current interval,
        first = i;                 // the algo starts a new one at v[i]
    }
}
ans++;
printf("%d", ans);
I'm trying to understand why this greedy algo is optimal. Any help is appreciated.

Reframed, we're trying to cover the set of numbers that appear in the input with as few intervals of cardinality 2*L + 1 as possible. You can imagine that, for an interval [C - L, C + L], all numbers in it are adjusted to C.
Given any valid solution, list its intervals in sorted order. We can show by induction on k that, considering only the first k intervals of each solution, greedy's first k intervals cover at least as much of the input. The base case k = 0 is trivial. Inductively, greedy's next interval starts at its first still-uncovered input element and covers as much beyond it as possible; the interval in the arbitrary solution that covers its own next uncovered input element cannot start later than greedy's, so after k + 1 intervals the arbitrary solution still has no more coverage. Hence greedy never uses more intervals than any other solution.
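To make the exchange argument concrete, here is a small self-check in Python (my own sketch, not from the question or answer): greedy_count is the algorithm above, and dp_count is a brute-force reference that relies on the standard observation that an optimal cover can always use contiguous groups of the sorted values.

import random

def greedy_count(vals, L):
    # the greedy from the question: sweep left to right, open a new
    # interval whenever the current value no longer fits in the last one
    v = sorted(vals)
    ans, first = 1, 0
    for i in range(1, len(v)):
        if v[i] - v[first] > 2 * L:
            ans += 1
            first = i
    return ans

def dp_count(vals, L):
    # reference optimum: dp[i] = min intervals covering the first i sorted values
    v = sorted(vals)
    n = len(v)
    dp = [0] + [n] * n
    for i in range(1, n + 1):
        for j in range(i):
            if v[i - 1] - v[j] <= 2 * L:       # v[j..i-1] fits in one interval
                dp[i] = min(dp[i], dp[j] + 1)
    return dp[n]

for _ in range(1000):
    L = random.randint(1, 10)
    vals = [random.randint(-50, 50) for _ in range(random.randint(1, 8))]
    assert greedy_count(vals, L) == dp_count(vals, L)
print("greedy matched the reference optimum on all random tests")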


Detecting unsorted elements in an almost sorted array

I have a set of samples that is sorted, but due to errors in the data sometimes unsorted values creep in. I need to detect these values and remove them. I'll show some sample datasets below.
20 30 21 22 23 24 25
30 31 21 22 23 24 25
30 21 22 23 24 25 26
20 21 22 23 18 25 26
20 15 21 22 23 24 25
In each case the out-of-place values (30; then 30 and 31; then 30; then 18; then 15) are the ones that should be removed. What would be an algorithm to remove these numbers, or to detect their indices?
Detecting is relatively simple and takes fewer steps - you can do it in O(n) time. Simply iterate over the array and compare each element to the next. You would be able to find (and mark the indices of, or throw out) out-of-sequence numbers.
However, your second case makes doing this a problem. I will assume that you always want to keep the longest increasing sub-sequence of the list of numbers (like in the 2nd case).
You can solve this problem efficiently with arrays and binary search. The algorithm performs a single binary search per sequence element, its total time can be expressed as O(n log n).
Process the sequence elements in order, maintaining the longest increasing sub-sequence found so far. Denote the sequence values as X[0], X[1], etc., with L representing the length of the longest increasing sub-sequence found so far.
M[j] stores the index k of the smallest value X[k] such that there is an increasing sub-sequence of length j ending at X[k], considering only indices k ≤ i (and j ≤ k ≤ i always holds). P[k] stores the index of the predecessor of X[k] in the longest increasing sub-sequence ending at X[k].
The sequence X[M[1]], X[M[2]], ..., X[M[L]] is nondecreasing at all points of the algorithm.
P = array of length N
M = array of length N + 1   // Using a 1-indexed array for ease of understanding
L = 0
for i in range 0 to N-1:
    // Binary search
    lo = 1
    hi = L
    while lo ≤ hi:
        mid = ceil((lo+hi)/2)
        if X[M[mid]] < X[i]:
            lo = mid+1
        else:
            hi = mid-1
    newL = lo
    P[i] = M[newL-1]
    M[newL] = i
    if newL > L:
        L = newL

S = array of length L
k = M[L]
for i in range L-1 to 0:
    S[i] = X[k]
    k = P[k]
return S
The pseudo-code can be found on the Wikipedia article for this.
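For reference, here is a short runnable Python version of the same idea (my sketch, not part of the original answer); it returns one longest strictly increasing subsequence, and everything outside it would be the values to drop:

import bisect

def longest_increasing_subsequence(X):
    # tails_val[j] = smallest possible tail value of an increasing run of length j+1
    # tails_idx[j] = index in X of that tail; prev[] holds predecessor links
    tails_val, tails_idx = [], []
    prev = [-1] * len(X)
    for i, x in enumerate(X):
        j = bisect.bisect_left(tails_val, x)   # first tail >= x gets replaced
        if j == len(tails_val):
            tails_val.append(x)
            tails_idx.append(i)
        else:
            tails_val[j] = x
            tails_idx[j] = i
        prev[i] = tails_idx[j - 1] if j > 0 else -1
    out, k = [], tails_idx[-1] if tails_idx else -1
    while k != -1:                             # walk the predecessor links backwards
        out.append(X[k])
        k = prev[k]
    return out[::-1]

print(longest_increasing_subsequence([30, 31, 21, 22, 23, 24, 25]))  # [21, 22, 23, 24, 25]
print(longest_increasing_subsequence([20, 21, 22, 23, 18, 25, 26]))  # [20, 21, 22, 23, 25, 26]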
If you do want to keep the out-of-sequence elements in the list, simply sort the array using insertion sort.
Detecting only
It takes at least N-1 steps (comparing each element with the next) to do it.
But it is ambiguous: in list 2, what is wrong? 30/31, or 21/.../25?
If bad numbers are isolated, you just remove them. But if you have, say, 2 consecutive bad numbers, what do you do? You must define more rules.
Detecting and sorting:
Complexity:
If your list is perfectly sorted, it takes N-1 steps (checking each element against the next) to verify it.
If there is one unsorted element, it takes log N to put it back in the right place (assuming everything else is sorted, and an ad hoc structure such as a binary tree).
If there are k unsorted elements, it will take k log N.
So N (check) + k log N (insert).
And if everything is messed up, N log N, which is the classical complexity for sorting.
Algorithm:
So, the simplest algorithm is to iterate and insert each element at the right place in a balanced tree. It is a sort by insertion.
It is like smoothsort: https://en.wikipedia.org/wiki/Smoothsort
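To illustrate that "insert each element at the right place" idea, here is a rough sketch (mine, not the answerer's) that uses Python's bisect on a plain sorted list where the answer suggests a balanced tree; insort shifts elements, so each insert here is O(n) rather than the O(log n) a tree would give:

import bisect

def insertion_resort(data):
    sorted_part = []
    for x in data:
        bisect.insort(sorted_part, x)   # insert x at its correct position
    return sorted_part

print(insertion_resort([20, 30, 21, 22, 23, 24, 25]))  # [20, 21, 22, 23, 24, 25, 30]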
I think this should work for you. It finds the longest ascending run and then clears out the other elements. The implementation is in C#.
public static void Main() {
    int[][] dataList = {
        new []{20,30,21,22,23,24,25},
        new []{30,31,21,22,23,24,25},
        new []{30,21,22,23,24,25,26},
        new []{20,21,22,23,18,25,26},
        new []{20,15,21,22,23,24,25}
    };
    foreach (var data in dataList)
        DetectAndRemoveUnsorted(data);
}

/// <summary>
/// Assumes ascending data. You can adapt it for descending data too
/// </summary>
static void DetectAndRemoveUnsorted(IList<int> data) {
    // first pass: Find the outliers; rather find the correct sequence
    int startOfLongestSeq = 0, lenOfLongestSeq = 0;
    int startOfCurrSeq = 0, lenOfCurrSeq = 0;
    for (int i = 0; i < data.Count - 1; ++i) {
        if (data[i] > data[i + 1]) { // we are breaking the ascending order, so this is another sequence
            lenOfCurrSeq = i - startOfCurrSeq + 1;
            if (lenOfCurrSeq > lenOfLongestSeq) {
                lenOfLongestSeq = lenOfCurrSeq;
                startOfLongestSeq = startOfCurrSeq;
            }
            startOfCurrSeq = i + 1;
        }
    }
    lenOfCurrSeq = data.Count - startOfCurrSeq;
    if (lenOfCurrSeq > lenOfLongestSeq) {
        lenOfLongestSeq = lenOfCurrSeq;
        startOfLongestSeq = startOfCurrSeq;
    }

    // second pass: cleanup outliers
    // now we know which sequence is the largest
    // we should get rid of the other sequences
    for (int i = startOfLongestSeq - 1; i >= 0; --i)
        data[i] = -1; // Mark them as invalid. if you want, you can delete them as well
    for (int i = data.Count - 1; i >= startOfLongestSeq + lenOfLongestSeq; --i)
        data[i] = -1; // Mark them as invalid. if you want, you can delete them as well
}

nth smallest element in a union of an array of intervals with repetition

I want to know if there is a more efficient solution than what I came up with (I haven't coded it yet, but the gist of it is described at the bottom).
Write a function calcNthSmallest(n, intervals) which takes as input a non-negative int n and a list of intervals [[a_1, b_1], ..., [a_m, b_m]] and calculates the nth smallest number (0-indexed) when taking the union of all the intervals with repetition. For example, if the intervals were [1,5], [2,4], [7,9], their union with repetition would be [1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 9] (note 2, 3, 4 each appear twice since they're in both the intervals [1,5] and [2,4]). For this list of intervals, the 0th smallest number would be 1, and the 3rd and 4th smallest would both be 3. Your implementation should run quickly even when the a_i, b_i can be very large (like, one trillion), and there are several intervals.
The way I thought to go about it is the straightforward solution which is to make the union array and traverse it.
This problem can be solved in O(N log N) where N is the number of intervals in the list, regardless of the actual values of the interval endpoints.
The key to solving this problem efficiently is to transform the list of possibly-overlapping intervals into a list of intervals which are either disjoint or identical. In the given example, only the first interval needs to be split:
{ [1,5], [2,4], [7,9]} =>
+-----------------+ +---+ +---+
{[1,1], [2,4], [5,5], [2,4], [7,9]}
(This doesn't have to be done explicitly, though: see below.) Now, we can sort the new intervals, replacing duplicates with a count. From that, we can compute the number of values each (possibly-duplicated) interval represents. Now, we simply need to accumulate the values to figure out which interval the solution lies in:
interval   count   size   values in interval   cumulative values
[1,1]        1       1             1             [0, 1)
[2,4]        2       3             6             [1, 7)   (eg. from n=1 to n=6 will be here)
[5,5]        1       1             1             [7, 8)
[7,9]        1       3             3             [8, 11)
I wrote the cumulative values as a list of half-open intervals, but obviously we only need the end-points. We can then find which interval holds value n by, for example, binary-searching the cumulative values list, and we can figure out which value in the interval we want by subtracting the start of the interval from n and then integer-dividing by the count.
It should be clear that the maximum size of the above table is twice the number of original intervals, because every row must start and end at either the start or end of some interval in the original list. If we'd written the intervals as half-open instead of closed, this would be even clearer; in that case, we can assert that the precise size of the table will be the number of unique values in the collection of end-points. And from that insight, we can see that we don't really need the table at all; we just need the sorted list of end-points (although we need to know which endpoint each value represents). We can simply iterate through that list, maintaining the count of the number of active intervals, until we reach the value we're looking for.
Here's a quick python implementation. It could be improved.
def combineIntervals(intervals):
    # endpoints will map each endpoint to a count
    endpoints = {}
    # Each interval contributes a start and a limit (1 + end).
    # Each start adds 1 to the count, and each limit subtracts 1.
    for start in (i[0] for i in intervals):
        endpoints[start] = endpoints.setdefault(start, 0) + 1
    for limit in (i[1] + 1 for i in intervals):
        endpoints[limit] = endpoints.setdefault(limit, 0) - 1
    # Filtering is a possibly premature optimization but it was easy
    return sorted(kv for kv in endpoints.items() if kv[1] != 0)

def nthSmallestInIntervalList(n, intervals):
    limits = combineIntervals(intervals)
    cumulative = 0
    count = 0
    index = 0
    here = limits[0][0]
    while index < len(limits):
        size = limits[index][0] - here
        if n < cumulative + count * size:
            # [here, next) contains the value we're searching for
            return here + (n - cumulative) // count
        # advance
        cumulative += count * size
        count += limits[index][1]
        here += size
        index += 1
    # We didn't find it. We could throw an error
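As a quick sanity check (mine, not part of the original answer), the function above reproduces the example from the question:

intervals = [[1, 5], [2, 4], [7, 9]]   # union with repetition: 1 2 2 3 3 4 4 5 7 8 9
print(nthSmallestInIntervalList(0, intervals))   # 1
print(nthSmallestInIntervalList(3, intervals))   # 3
print(nthSmallestInIntervalList(4, intervals))   # 3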
So, as I said, the running time of this algorithm is independent of the actual values of the intervals; it only depends on the length of the interval list. This particular solution is O(N log N) because of the cost of the sort (in combineIntervals); if we used a priority queue instead of a full sort, we could construct the heap in O(N), at the cost of an O(log N) operation for each scanned endpoint. Unless N is really big and the expected value of the argument n is relatively small, this would be counter-productive. There might be other ways to reduce complexity, though.
Edit2:
Here's yet another take on your question.
Let's consider the intervals graphically:
1 1 1 2 2 2 3
0-2-4--7--0--3---7-0--4--7--0
[-------]
[-----------------]
[---------]
[--------------]
[-----]
When sorted in increasing order on the lower bound, we could get something that looks like the above for the interval list ([2;10];[4;24];[7;17];[13;30];[20;27]). Each lower bound indicates the start of a new interval, and also marks the beginning of one more "level" of duplication of the numbers. Conversely, each upper bound marks the end of that level, and decreases the duplication level by one.
We could therefore convert the above into the following list:
[2;+];[4;+];[7;+];[10;-];[13;+];[17;-];[20;+];[24;-];[27;-];[30;-]
Where the first value indicates the rank of the bound, and the second value whether the bound is a lower (+) or upper (-) one. The computation of the nth element is done by following the list, raising or lowering the duplication level when encountering a lower or upper bound, and using the duplication level as a counting factor.
Let's consider again the list graphically, but as an histogram:
3333 44444 5555
2222222333333344444555
111111111222222222222444444
1 1 1 2 2 2 3
0-2-4--7--0--3---7-0--4--7--0
The view above is the same as the first one, with all the intervals packed vertically.
1 being the elements of the 1st one, 2 the second one, etc. In fact, what matters here
is the height at each index, corresponding to the number of times each index is duplicated in the union of all intervals.
3333 55555 7777
2223333445555567777888
112223333445555567777888999
1 1 1 2 2 2 3
0-2-4--7--0--3---7-0--4--7--0
| | | | | | || | |
We can see that histogram blocks start at lower bounds of intervals, and end either on upper bounds, or one unit before lower bounds, so the new notation must be modified accordingly.
With a list containing n intervals, as a first step, we convert the list into the notation above (O(n)), and sort it in increasing bound order (O(nlog(n))). The second step of computing the number is then in O(n), for a total average time in O(nlog(n)).
Here's a simple implementation in OCaml, using 1 and -1 instead of '+' and '-'.
(* transform the list in the correct notation *)
let rec convert = function
    [] -> []
  | (l,u)::xs -> (l,1)::(u+1,-1)::convert xs;;

(* the counting function *)
let rec count r f = function
    [] -> raise Not_found
  | [a,x] -> (match f + x with
        0 -> if r = 0 then a else raise Not_found
      | _ -> a + (r / f))
  | (a,x)::(b,y)::l ->
      if a = b
      then count r f ((b,x+y)::l)
      else
        let f = f + x in
        if f > 0 then
          let range = (b - a) * f in
          if range > r
          then a + (r / f)
          else count (r - range) f ((b,y)::l)
        else count r f ((b,y)::l);;

(* the compute function *)
let compute l =
  let compare (x,_) (y,_) = compare x y in
  let l = List.sort compare (convert l) in
  fun m -> count m 0 l;;
Notes:
- the function above will raise an exception if the sought number is beyond the intervals. This corner case isn't taken into account by the other methods below.
- the list sorting function used in OCaml is merge sort, which effectively performs in O(nlog(n)).
Edit:
Seeing that you might have very large intervals, the solution I gave initially (see down below) is far from optimal.
Instead, we could make things much faster by transforming the list:
we try to compress the interval list by searching for overlapping intervals and replacing them with a prefix interval, the overlapping part repeated the appropriate number of times, and a suffix interval. We can then directly compute the number of entries covered by each element of the list.
Looking at the splitting above (prefix, infix, suffix), we see that the optimal structure to do the processing is a binary tree. A node of that tree may optionally have a prefix and a suffix. So the node must contain :
an interval i in the node
an integer giving the number of repetition of i in the list,
a left subtree of all the intervals below i
a right subtree of all the intervals above i
with this structure in place, the tree is automatically sorted.
Here's an example of an OCaml type embodying that tree.
type tree = Empty | Node of int * interval * tree * tree
Now the transformation algorithm boils down to building the tree.
This function creates a tree out of its components:
let cons k r lt rt =
    the tree made of count k, interval r, left tree lt and right tree rt
This function recursively inserts an interval in a tree.
let rec insert i it =
    let r = root of it
    let lt = the left subtree of it
    let rt = the right subtree of it
    let k = the count of r
    let prf, inf, suf = the prefix, infix and suffix of i according to r
    return cons (k+1) inf (insert prf lt) (insert suf rt)
Once the tree is built, we do a pre-order traversal of the tree, using the count of the node to accelerate the computation of the nth element.
Below is my previous answer.
Here are the steps of my solution:
you need to sort the interval list in increasing order on the lower bound of each interval
you need a deque dq (or a list which will be reversed at some point) to store the intervals
here's the code:
let lower i = lower bound of interval i
let upper i = upper bound of i
let il = sort of interval list
i <- 0
j <- lower (head of il)
loop on il:
    i <- i + 1
    let h = the head of il
    let il = the tail of il
    if upper h > j then push h to dq
    if lower h > j then
        il <- concat dq and il
        j <- j + 1
        dq <- empty
        loop
    if i = k then return j
    loop
This algorithm works by simply iterating through the intervals, only taking in account the relevant intervals, and counting both the rank i of the element in the union, and the value j of that element. When the targeted rank k has been reached, the value is returned.
The complexity is roughly in O(k) + O(sort(l)).
If I have understood your question correctly, you want to find the kth smallest element in the union of a list of intervals.
If we assume that the number of lists = 2, the question becomes:
Find the kth smallest element in the union of two sorted arrays (where an interval [2,5] is nothing but the elements from 2 to 5: {2,3,4,5}). This can be solved in (n+m)log(n+m) time, where n and m are the sizes of the lists and i and j are iterators into them.
Maintain the invariant
i + j = k - 1.
If B[j-1] < A[i] < B[j], then A[i] must be the k-th smallest,
or else if A[i-1] < B[j] < A[i], then B[j] must be the k-th smallest.
Now if the number of lists is 3, maintain the invariant
i + j + x = k - 1, i.e.
i + j = k - x - 1.
The value k - x - 1 can take y values (y being the size of the third list, because x iterates from the start of that list to its end).
So the problem for 3 lists can be reduced to y instances of the 2-list problem, and the complexity is y*((n+m)log(n+m)).
So for a problem with n lists the complexity of this approach grows exponentially.
But we can make a minor improvement if we know that k < size of some lists: we can chop off the elements from the (k+1)-th to the end (from our search space) in those lists whose size is bigger than k (I think it doesn't help for large k). If there is any mistake please let me know.
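For what it's worth, here is a sketch of the classic two-sorted-arrays selection this answer alludes to (my own illustration, not the answerer's code); note it assumes the two lists are materialized, which is only practical when the intervals are small:

def kth_of_two(A, B, k):
    # 0-indexed k-th smallest of the merge of the sorted lists A and B.
    # Choose i elements from A and j from B with i + j = k + 1, then check
    # that the partition is consistent; binary-search on i.
    if len(A) > len(B):
        A, B = B, A
    lo, hi = max(0, k + 1 - len(B)), min(k + 1, len(A))
    while lo <= hi:
        i = (lo + hi) // 2
        j = k + 1 - i
        a_left  = A[i - 1] if i > 0 else float('-inf')
        a_right = A[i]     if i < len(A) else float('inf')
        b_left  = B[j - 1] if j > 0 else float('-inf')
        b_right = B[j]     if j < len(B) else float('inf')
        if a_left <= b_right and b_left <= a_right:
            return max(a_left, b_left)      # the k-th smallest sits at the partition
        if a_left > b_right:
            hi = i - 1                      # took too many elements from A
        else:
            lo = i + 1                      # took too few elements from A

# the interval [2,5] written out as its elements {2,3,4,5}
print(kth_of_two([1, 3, 5], [2, 3, 4, 5], 3))   # 3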
Let me explain with an example:
Assume we are given these intervals [5,12],[3,9],[8,13].
The union of these intervals is:
number : 3 4 5 5 6 6 7 7 8 8 8 9 9 9 10 10 11 11 12 12 13.
indices: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
The lowest function will return 11 when 9 is passed as input.
The highest function will return 14 when 9 is passed as input.
The lowest and highest functions just check whether x is present in each interval; if it is present, they add x - a (where a is the lower bound of that interval) to the return value for that particular interval. If an interval is completely smaller than x, they add the total number of elements in that interval to the return value.
The find function will return 9 when 13 is passed.
The find function will use the concept of binary search to find the kth smallest element. In the given range [0, N] (if the range is not given we can find the high end in O(n)), find the mid and calculate lowest and highest for mid. If the given k falls between lowest and highest, return mid; else if k is less than or equal to lowest, search in the lower half (low, mid-1); else search in the upper half (mid+1, high).
If the number of intervals are n and the range is N, then the running time of this algorithm is n*log(N). we will find lowest and highest (which runs in O(n)) log(N) times.
// Function call will be `find(0,N,k,in)`

// Retrieves the no. of smaller elements than the first x (excluding) in the union
public static int lowest(List<List<Integer>> in, int x){
    int sum = 0;
    for(List<Integer> lst: in){
        if(x > lst.get(1))
            sum += lst.get(1) - lst.get(0) + 1;
        else if((x >= lst.get(0) && x < lst.get(1)) || (x > lst.get(0) && x <= lst.get(1))){
            sum += x - lst.get(0);
        }
    }
    return sum;
}

// Retrieves the no. of smaller elements than the last x (including) in the union.
public static int highest(List<List<Integer>> in, int x){
    int sum = 0;
    for(List<Integer> lst: in){
        if(x > lst.get(1))
            sum += lst.get(1) - lst.get(0) + 1;
        else if((x >= lst.get(0) && x < lst.get(1)) || (x > lst.get(0) && x <= lst.get(1))){
            sum += x - lst.get(0) + 1;
        }
    }
    return sum;
}

// Do binary search on the range.
public static int find(int low, int high, int k, List<List<Integer>> in){
    if(low > high)
        return -1;
    int mid = low + (high - low) / 2;
    int lowIdx = lowest(in, mid);
    int highIdx = highest(in, mid);
    // k lies between the current number's high and low indices
    if(k > lowIdx && k <= highIdx) return mid;
    // k less than lower index: go on to the left side
    if(k <= lowIdx) return find(low, mid - 1, k, in);
    // k greater than higher index: go to the right
    if(k > highIdx) return find(mid + 1, high, k, in);
    else
        return -1; // catch statement
}
It's possible to count how many numbers in the list are less than some chosen number X (by iterating through all of the intervals). Now, if this number is greater than n, the solution is certainly smaller than X. Similarly, if this number is less than or equal to n, the solution is greater than or equal to X. Based on these observations we can use binary search.
Below is a Java implementation:
public int nthElement(int[] lowerBound, int[] upperBound, int n)
{
    int lo = Integer.MIN_VALUE, hi = Integer.MAX_VALUE;
    while (lo < hi) {
        int X = (int)(((long)lo + hi + 1) / 2);
        long count = 0;
        for (int i = 0; i < lowerBound.length; ++i) {
            if (X >= lowerBound[i] && X <= upperBound[i]) {
                // part of interval i is less than X
                count += (long)X - lowerBound[i];
            }
            if (X >= lowerBound[i] && X > upperBound[i]) {
                // all numbers in interval i are less than X
                count += (long)upperBound[i] - lowerBound[i] + 1;
            }
        }
        if (count <= n) lo = X;
        else hi = X - 1;
    }
    return lo;
}

What is the probability that the array will remain the same?

This question was asked in a Microsoft interview. I am very curious to know why these people ask such strange questions on probability.
Given a rand(N), a random generator which generates random number from 0 to N-1.
int A[N]; // An array of size N
for(i = 0; i < N; i++)
{
    int m = rand(N);
    int n = rand(N);
    swap(A[m], A[n]);
}
EDIT: Note that the seed is not fixed.
what is the probability that array A remains the same?
Assume that the array contains unique elements.
Well I had a little fun with this one. The first thing I thought of when I first read the problem was group theory (the symmetric group Sn, in particular). The for loop simply builds a permutation σ in Sn by composing transpositions (i.e. swaps) on each iteration. My math is not all that spectacular and I'm a little rusty, so if my notation is off bear with me.
Overview
Let A be the event that our array is unchanged after permutation. We are ultimately asked to find the probability of event A, Pr(A).
My solution attempts to follow the following procedure:
Consider all possible permutations (i.e. reorderings of our array)
Partition these permutations into disjoint sets based on the number of so-called identity transpositions they contain. This helps reduce the problem to even permutations only.
Determine the probability of obtaining the identity permutation given that the permutation is even (and of a particular length).
Sum these probabilities to obtain the overall probability the array is unchanged.
1) Possible Outcomes
Notice that each iteration of the for loop creates a swap (or transposition) that results in one of two things (but never both):
Two elements are swapped.
An element is swapped with itself. For our intents and purposes, the array is unchanged.
We label the second case. Let's define an identity transposition as follows:
An identity transposition occurs when a number is swapped with itself.
That is, when n == m in the above for loop.
For any given run of the listed code, we compose N transpositions. There can be 0, 1, 2, ... , N of the identity transpositions appearing in this "chain".
For example, consider an N = 3 case:
Given our input [0, 1, 2].
Swap (0 1) and get [1, 0, 2].
Swap (1 1) and get [1, 0, 2]. ** Here is an identity **
Swap (2 2) and get [1, 0, 2]. ** And another **
Note that there is an odd number of non-identity transpositions (1) and the array is changed.
2) Partitioning Based On the Number of Identity Transpositions
Let K_i be the event that i identity transpositions appear in a given permutation. Note this forms an exhaustive partition of all possible outcomes:
No permutation can have two different quantities of identity transpositions simultaneously, and
All possible permutations must have between 0 and N identity transpositions.
Thus we can apply the Law of Total Probability:
Pr(A) = Σ_{i=0..N} Pr(A | K_i) · Pr(K_i)
Now we can finally take advantage of the partition. Note that when the number of non-identity transpositions is odd, there is no way the array can go unchanged*. Thus the sum only has to run over those i for which N - i is even:
Pr(A) = Σ_{i : N-i even} Pr(A | K_i) · Pr(K_i)
*From group theory, a permutation is even or odd but never both. Therefore an odd permutation cannot be the identity permutation (since the identity permutation is even).
3) Determining Probabilities
So we now must determine two probabilities for N-i even: Pr(K_i) and Pr(A | K_i).
The First Term
The first term, Pr(K_i), represents the probability of obtaining a permutation with i identity transpositions. This turns out to be binomial since for each iteration of the for loop:
The outcome is independent of the results before it, and
The probability of creating an identity transposition is the same, namely 1/N.
Thus for N trials, the probability of obtaining i identity transpositions is:
Pr(K_i) = C(N, i) · (1/N)^i · (1 - 1/N)^(N-i)
The Second Term
So if you've made it this far, we have reduced the problem to finding Pr(A | K_i) for N - i even. This represents the probability of obtaining an identity permutation given that i of the transpositions are identities. I use a naive counting approach to determine the number of ways of achieving the identity permutation over the number of possible permutations.
First consider the transpositions (n, m) and (m, n) equivalent. Then, let M be the number of distinct non-identity transpositions possible (M = C(N, 2)). We will use this quantity frequently.
The goal here is to determine the number of ways a collections of transpositions can be combined to form the identity permutation. I will try to construct the general solution along side an example of N = 4.
Let's consider the N = 4 case with all identity transpositions (i.e. i = N = 4). Let X represent an identity transposition. For each X, there are N possibilities (they are: n = m = 0, 1, 2, ... , N - 1). Thus there are N^i = 4^4 possibilities for achieving the identity permutation. For completeness, we add the binomial coefficient, C(N, i), to consider ordering of the identity transpositions (here it just equals 1). I've tried to depict this below with the physical layout of elements above and the number of possibilities below:
I = _X_ _X_ _X_ _X_
N * N * N * N * C(4, 4) => N^N * C(N, N) possibilities
Now without explicitly substituting N = 4 and i = 4, we can look at the general case. Combining the above with the denominator found previously, we find:
Pr(A | K_N) = (C(N, N) · N^N) / (C(N, N) · N^N) = 1
This is intuitive. In fact, any other value than 1 should probably alarm you. Think about it: we are given the situation in which all N transpositions are said to be identities. What's the probability that the array is unchanged in this situation? Clearly, 1.
Now, again for N = 4, let's consider 2 identity transpositions (i.e. i = N - 2 = 2). As a convention, we will place the two identities at the end (and account for ordering later). We know now that we need to pick two transpositions which, when composed, will become the identity permutation. Let's place any element in the first location, call it t1. As stated above, there are M possibilities supposing t1 is not an identity (it can't be as we have already placed two).
I = _t1_ ___ _X_ _X_
M * ? * N * N
The only element left that could possibly go in the second spot is the inverse of t1, which is in fact t1 (and this is the only one, by uniqueness of the inverse). We again include the binomial coefficient: in this case we have 4 open locations and we are looking to place 2 identity transpositions. How many ways can we do that? 4 choose 2.
I = _t1_ _t1_ _X_ _X_
M * 1 * N * N * C(4, 2) => C(N, N-2) * M * N^(N-2) possibilities
Again looking at the general case, this all corresponds to:
Pr(A | K_{N-2}) = (C(N, N-2) · M · N^(N-2)) / (C(N, N-2) · M² · N^(N-2)) = 1 / M
Finally we do the N = 4 case with no identity transpositions (i.e. i = N - 4 = 0). Since there are a lot of possibilities, it starts to get tricky and we must be careful not to double count. We start similarly by placing a single element in the first spot and working out possible combinations. Take the easiest first: the same transposition 4 times.
I = _t1_ _t1_ _t1_ _t1_
M * 1 * 1 * 1 => M possibilities
Let's now consider two unique elements t1 and t2. There are M possibilities for t1 and only M-1 possibilities for t2 (since t2 cannot be equal to t1). If we exhaust all arrangements, we are left with the following patterns:
I = _t1_ _t1_ _t2_ _t2_
M * 1 * M-1 * 1 => M * (M - 1) possibilities (1)st
= _t1_ _t2_ _t1_ _t2_
M * M-1 * 1 * 1 => M * (M - 1) possibilities (2)nd
= _t1_ _t2_ _t2_ _t1_
M * M-1 * 1 * 1 => M * (M - 1) possibilities (3)rd
Now let's consider three unique elements, t1, t2, t3. Let's place t1 first and then t2. As usual, we have:
I = _t1_ _t2_ ___ ___
M * ? * ? * ?
We can't say yet how many possible t2s there can be, and we will see why in a minute.
We now place t1 in the third spot. Notice, t1 must go there since if it were to go in the last spot, we would just be recreating the (3)rd arrangement above. Double counting is bad! This leaves the third unique element t3 for the final position.
I = _t1_ _t2_ _t1_ _t3_
M * ? * 1 * ?
So why did we have to take a minute to consider the number of t2s more closely? The transpositions t1 and t2 cannot be disjoint permutations (i.e. they must share one (and only one since they also cannot be equal) of their n or m). The reason for this is because if they were disjoint, we could swap the order of permutations. This means we would be double counting the (1)st arrangement.
Say t1 = (n, m). t2 must be of the form (n, x) or (y, m) for some x and y in order to be non-disjoint. Note that x may not be n or m, and y may not be n or m. Thus, the number of possible transpositions that t2 could be is actually 2 * (N - 2).
So, coming back to our layout:
I = _t1_ _t2_ _t1_ _t3_
M * 2(N-2) * 1 * ?
Now t3 must be the inverse of the composition of t1 t2 t1. Let's do it out manually:
(n, m)(n, x)(n, m) = (m, x)
Thus t3 must be (m, x). Note this is not disjoint to t1 and not equal to either t1 or t2 so there is no double counting for this case.
I = _t1_ _t2_ _t1_ _t3_
M * 2(N-2) * 1 * 1 => M * 2(N - 2) possibilities
Finally, putting all of these together:
Pr(A | K_0) = (M + 3 · M · (M - 1) + 2 · M · (N - 2)) / M^N, which for N = 4 gives (6 + 90 + 24) / 1296 = 120 / 1296
4) Putting it all together
So that's it. Work backwards, substituting what we found into the original summation given in step 2. I computed the answer to the N = 4 case below. It matches the empirical number found in another answer very closely!
N = 4, M = 6

         | Pr(K_i) | Pr(A | K_i) | Product |
---------|---------|-------------|---------|
  i = 0  |  0.316  |  120 / 1296 |  0.029  |
  i = 2  |  0.211  |    6 / 36   |  0.035  |
  i = 4  |  0.004  |    1 / 1    |  0.004  |
---------|---------|-------------|---------|
                           Sum:  |  0.068  |
Correctness
It would be cool if there was a result in group theory to apply here-- and maybe there is! It would certainly help make all this tedious counting go away completely (and shorten the problem to something much more elegant). I stopped working at N = 4. For N > 5, what is given only gives an approximation (how good, I'm not sure). It is pretty clear why that is if you think about it: for example, given N = 8 transpositions, there are clearly ways of creating the identity with four unique elements which are not accounted for above. The number of ways becomes seemingly more difficult to count as the permutation gets longer (as far as I can tell...).
Anyway, I definitely couldn't do something like this within the scope of an interview. I would get as far as the denominator step if I was lucky. Beyond that, it seems pretty nasty.
Very much curious to know why these people ask so strange questions on probability?
Questions like this are asked because they allow the interviewer to gain insight into the interviewee's
ability to read code (very simple code but at least something)
ability to analyse an algorithm to identify execution path
skills at applying logic to find possible outcomes and edge cases
reasoning and problem solving skills as they work through the problem
communication and work skills - do they ask questions, or work in isolation based on information at hand
... and so on. The key to having a question that exposes these attributes of the interviewee is to have a piece of code that is deceptively simple. This shakes out the imposters: the non-coder is stuck; the arrogant jump to the wrong conclusion; the lazy or sub-par computer scientist finds a simple solution and stops looking. Often, as they say, it's not whether you get the right answer but whether you impress with your thought process.
I'll attempt to answer the question, too. In an interview I'd explain myself rather than provide a one-line written answer - this is because even if my 'answer' is wrong, I am able to demonstrate logical thinking.
A will remain the same - i.e. elements in the same positions - when
m == n in every iteration (so that every element only swaps with itself); or
any element that is swapped is swapped back to its original position
The first case is the 'simple' case that duedl0r gives, the case that the array isn't altered. This might be the answer, because
what is the probability that array A remains the same?
if the array changes at i = 1 and then reverts back at i = 2, it's in the original state but it didn't 'remain the same' - it was changed, and then changed back. That might be a smartass technicality.
Then considering the chance of elements being swapped and swapped back - I think that calculation is above my head in an interview. The obvious consideration is that it does not need to be a swap followed by the same swap undoing it; there could just as easily be a 'circular' exchange among three elements: swapping 1 and 2, then 2 and 3, then 1 and 3, and finally 2 and 3. And continuing, there could be swaps among 4, 5 or more items that are 'circular' like this.
In fact, rather than considering the cases where the array is unchanged, it may be simpler to consider the cases where it is changed. Consider whether this problem can be mapped onto a known structure like Pascal's triangle.
This is a hard problem. I agree that it's too hard to solve in an interview, but that doesn't mean it is too hard to ask in an interview. The poor candidate won't have an answer, the average candidate will guess the obvious answer, and the good candidate will explain why the problem is too hard to answer.
I consider this an 'open-ended' question that gives the interviewer insight into the candidate. For this reason, even though it's too hard to solve during an interview, it is a good question to ask during an interview. There's more to asking a question than just checking whether the answer is right or wrong.
Below is C code to count the number of values of the 2N-tuple of indices that rand can produce and calculate the probability. Starting with N = 0, it shows counts of 1, 1, 8, 135, 4480, 189125, and 12450816, with probabilities of 1, 1, .5, .185185, .0683594, .0193664, and .00571983. The counts do not appear in the Encyclopedia of Integer Sequences, so either my program has a bug or this is a very obscure problem. If so, the problem is not intended to be solved by a job applicant but to expose some of their thought processes and how they deal with frustration. I would not regard it as a good interview problem.
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define swap(a, b) do { int t = (a); (a) = (b); (b) = t; } while (0)

static uint64_t count(int n)
{
    // Initialize count of how many times the original order is the result.
    uint64_t c = 0;

    // Allocate space for selectors and initialize them to zero.
    int *r = calloc(2*n, sizeof *r);

    // Allocate space for array to be swapped.
    int *A = malloc(n * sizeof *A);

    if (!A || !r)
    {
        fprintf(stderr, "Out of memory.\n");
        exit(EXIT_FAILURE);
    }

    // Iterate through all values of selectors.
    while (1)
    {
        // Initialize A to show original order.
        for (int i = 0; i < n; ++i)
            A[i] = i;

        // Test current selector values by executing the swap sequence.
        for (int i = 0; i < 2*n; i += 2)
        {
            int m = r[i+0];
            int n = r[i+1];
            swap(A[m], A[n]);
        }

        // If array is in original order, increment counter.
        ++c; // Assume all elements are in place.
        for (int i = 0; i < n; ++i)
            if (A[i] != i)
            {
                // If any element is out of place, cancel assumption and exit.
                --c;
                break;
            }

        // Increment the selectors, odometer style.
        int i;
        for (i = 0; i < 2*n; ++i)
            // Stop when a selector increases without wrapping.
            if (++r[i] < n)
                break;
            else
                // Wrap this selector to zero and continue.
                r[i] = 0;

        // Exit the routine when the last selector wraps.
        if (2*n <= i)
        {
            free(A);
            free(r);
            return c;
        }
    }
}

int main(void)
{
    for (int n = 0; n < 7; ++n)
    {
        uint64_t c = count(n);
        printf("N = %d: %" PRIu64 " times, %g probability.\n",
               n, c, c/pow(n, 2*n));
    }
    return 0;
}
The behaviour of the algorithm can be modelled as a Markov chain over the symmetric group S_N.
Basics
The N elements of the array A can be arranged in N! different permutations. Let us number these permutations from 1 to N!, e.g. by lexicographic ordering. So the state of the array A at any time in the algorithm can be fully characterized by the permutation number.
For example, for N = 3, one possible numbering of all 3! = 6 permutations might be:
a b c
a c b
b a c
b c a
c a b
c b a
State transition probabilities
In each step of the algorithm, the state of A either stays the same or transitions from one of these permutations to another. We are now interested in the probabilities of these state changes. Let us call Pr(i → j) the probability that the state changes from permutation i to permutation j in a single loop iteration.
As we pick m and n uniformly and independently from the range [0, N-1], there are N² possible outcomes, each of which is equally likely.
Identity
For N of these outcomes m = n holds, so there is no change in the permutation. Therefore,
Pr(i → i) = N / N² = 1 / N.
Transpositions
The remaining N² - N cases are transpositions, i.e. two elements exchange their positions and therefore the permutation changes. Suppose one of these transpositions exchanges the elements at positions x and y. There are two cases how this transposition can be generated by the algorithm: either m = x, n = y or m = y, n = x. Thus, the probability for each transposition is 2 / N².
How does this relate to our permutations? Let us call two permutations i and j neighbors if and only if there is a transposition which transforms i into j (and vice versa). We then can conclude:
Pr(i → j) = 2 / N² if i and j are neighbors, and Pr(i → j) = 0 if i ≠ j and they are not neighbors.
Transition matrix
We can arrange the probabilities Pr(i → j) in a transition matrix P ∈ [0,1]^(N! × N!). We define
p_ij = Pr(i → j),
where p_ij is the entry in the i-th row and j-th column of P. Note that
Pr(i → j) = Pr(j → i),
which means P is symmetric.
The key point now is the observation of what happens when we multiply P by itself. Take any element p^(2)_ij of P²:
p^(2)_ij = Σ_k p_ik · p_kj = Σ_k Pr(i → k) · Pr(k → j)
The product Pr(i → k) · Pr(k → j) is the probability that starting at permutation i we transition into permutation k in one step, and transition into permutation j after another subsequent step. Summing over all in-between permutations k therefore gives us the total probability of transitioning from i to j in 2 steps.
This argument can be extended to higher powers of P. A special consequence is the following:
p^(N)_ii is the probability of returning back to permutation i after N steps, assuming we started at permutation i.
Example
Let's play this through with N = 3. We already have a numbering for the permutations. The corresponding transition matrix, with rows and columns in the order (abc, acb, bac, bca, cab, cba), is:

            | 3 2 2 0 0 2 |
            | 2 3 0 2 2 0 |
P = 1/9 ·   | 2 0 3 2 2 0 |
            | 0 2 2 3 0 2 |
            | 0 2 2 0 3 2 |
            | 2 0 0 2 2 3 |

Multiplying P with itself gives P² = 1/81 · (21 on the main diagonal, 12 everywhere else).
Another multiplication yields P³ = 1/729 · (135 on the main diagonal, 108 between distinct permutations of the same parity, 126 between permutations of opposite parity).
Any element of the main diagonal gives us the wanted probability, which is 15/81 or 5/27.
Discussion
While this approach is mathematically sound and can be applied to any value of N, it is not very practical in this form. The transition matrix P has N!² entries, which becomes huge very fast. Even for N = 10 the size of the matrix already exceeds 13 trillion entries. A naive implementation of this algorithm therefore appears to be infeasible.
However, in comparison to other proposals, this approach is very structured and doesn't require complex derivations beyond figuring out which permutations are neighbors. My hope is that this structuredness can be exploited to find a much simpler computation.
For example, one could exploit the fact that all diagonal elements of any power of P are equal. Assuming we can easily calculate the trace of P^N, the solution is then simply tr(P^N) / N!. The trace of P^N is equal to the sum of the N-th powers of the eigenvalues of P. So if we had an efficient algorithm to compute the eigenvalues of P, we would be set. I haven't explored this further than calculating the eigenvalues up to N = 5, however.
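As a quick numerical check of this Markov-chain view, here is a small sketch (mine, not part of the answer); it is only feasible for tiny N, since P has N! × N! entries:

import itertools
import numpy as np

def return_probability(N):
    perms = list(itertools.permutations(range(N)))
    index = {p: i for i, p in enumerate(perms)}
    P = np.zeros((len(perms), len(perms)))
    for i, p in enumerate(perms):
        P[i, i] += N / N**2                    # the N outcomes with m == n
        for x in range(N):
            for y in range(x + 1, N):
                q = list(p)
                q[x], q[y] = q[y], q[x]        # each transposition is hit by 2 of the N^2 outcomes
                P[i, index[tuple(q)]] += 2 / N**2
    return np.linalg.matrix_power(P, N)[0, 0]  # any diagonal entry of P^N

print(return_probability(3))   # 0.185185... = 5/27
print(return_probability(4))   # 0.068359375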
It's easy to observe the bounds 1/n^n <= p <= 1/n.
Here is an incomplete idea of showing an inverse-exponential upper bound.
You're drawing numbers from {1,2,..,n} 2n times. If any of them is unique (occurs exactly once), the array will definitely be changed, as the element has gone away and cannot return at its original place.
The probability that a fixed number is unique is 2n * 1/n * (1-1/n)^(2n-1) = 2 * (1-1/n)^(2n-1), which is asymptotically 2/e², bounded away from 0. [2n because you choose on which try you get it, 1/n that you got it on that try, (1-1/n)^(2n-1) that you did not get it on the other tries]
If the events were independent, you'd get that the chance that all numbers are non-unique is (1 - 2/e²)^n, which would mean p <= O((1 - 2/e²)^n). Unfortunately, they are not independent. I feel that the bound can be shown with more sophisticated analysis. The keyword is "balls and bins problem".
One simplistic solution is
p >= 1 / N^N
since one possible way the array stays the same is if m = n for every iteration, and m equals n with probability 1 / N.
It's certainly higher than that. The question is by how much..
Second thought: One could also argue that if you shuffle an array randomly, every permutation has equal probability. Since there are N! permutations, the probability of getting just one (the one we have at the beginning) is
p = 1 / N!
which is a bit better than the previous result.
As discussed, the algorithm is biased. This means not every permutation has the same probability. So 1 / N! is not quite exact. You have to find out what the distribution over permutations actually is.
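To see the bias concretely, a small enumeration sketch (mine, not from the answer) for N = 3 tallies the resulting permutation over all 3^(2*3) = 729 equally likely index sequences; the identity shows up 135 times, while the other permutations get different counts, so the distribution is not uniform:

from collections import Counter
from itertools import product

N = 3
tally = Counter()
for choices in product(range(N), repeat=2 * N):     # every possible run of rand() values
    a = list(range(N))
    for t in range(N):
        m, n = choices[2 * t], choices[2 * t + 1]
        a[m], a[n] = a[n], a[m]
    tally[tuple(a)] += 1
for perm, cnt in sorted(tally.items()):
    print(perm, cnt)
# identity: 135; the two 3-cycles: 108 each; the three transpositions: 126 each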
FYI, not sure the bound above (1/n^2) holds:
N=5 -> 0.019648 < 1/25
N=6 -> 0.005716 < 1/36
Sampling code:
import random

def sample(times, n):
    count = 0
    for i in range(times):
        count += p(n)
    return count * 1.0 / times

def p(n):
    perm = list(range(n))
    for i in range(n):
        a = random.randrange(n)
        b = random.randrange(n)
        perm[a], perm[b] = perm[b], perm[a]
    return perm == list(range(n))

print(sample(500000, 5))
Everyone assumes that A[i] == i, which was not explicitly
stated. I'm going to make this assumption too, but note that the probability
depends on the contents. For example if A[i]=0, then the probability = 1 for
all N.
Here's how to do it. Let P(n,i) be the probability that the resulting array
differs by exactly i transpositions from the original array.
We want to know P(n,0). It's true that:
P(n,0) =
1/n * P(n-1,0) + 1/n^2 * P(n-1,1) =
1/n * P(n-1,0) + 1/n^2 * (1-1/(n-1)) * P(n-2,0)
Explanation: we can get the original array in two ways, either by making a "neutral" transposition in an array that's already good, or by reverting the only "bad" transposition. To get an array with only one "bad" transposition, we can take an array with 0 bad transpositions and make one transposition that is not neutral.
EDIT: -2 instead of -1 in P(n-1,0)
It's not a full solution, but it's something at least.
Take a particular set of swaps that have no effect. We know that it must have been the case that its swaps ended up forming a bunch of loops of different sizes, using a total of n swaps. (For the purposes of this, a swap with no effect can be considered a loop of size 1)
Perhaps we can
1) Break them down into groups based on what the sizes of the loops are
2) Calculate the number of ways to get each group.
(The main problem is that there are a ton of different groups, but I'm not sure how you'd actually calculate this if you don't take into account the different groupings.)
Interesting question.
I think the answer is 1/N, but I don't have any proof. When I find a proof, I will edit my answer.
What I got until now:
If m == n, you won't change the array.
The probability to get m == n is 1/N, because there are N^2 options, and only N of them are suitable ((i,i) for every 0 <= i <= N-1).
Thus, we get N/N^2 = 1/N.
Denote by P_k the probability that after k iterations of swaps, the array of size N remains the same.
P_1 = 1/N. (As we saw above.)
P_2 = (1/N)·P_1 + ((N-1)/N)·(2/N^2) = 1/N^2 + 2(N-1)/N^3.
Explanation for P_2:
We want to calculate the probability that after 2 iterations, the array with
N elements won't change. We have 2 options:
- in the 2nd iteration we got m == n (probability 1/N)
- in the 2nd iteration we got m != n (probability (N-1)/N)
If m == n, we need the array to still be unchanged after the 1st iteration, which is P_1.
If m != n, we need the 1st iteration to have chosen the same m and n
(order is not important). So we get 2/N^2.
Because those events are independent we get: P_2 = (1/N)·P_1 + ((N-1)/N)·(2/N^2).
P_k = (1/N)·P_{k-1} + ((N-1)/N)·X. (The first term for m == n, the second for m != n.)
I have to think more about what X equals. (X is just a placeholder for the real formula, not a constant or anything else.)
Example for N = 2.
All possible swaps:
(1 1 | 1 1),(1 1 | 1 2),(1 1 | 2 1),(1 1 | 2 2),(1 2 | 1 1),(1 2 | 1 2)
(1 2 | 2 1),(1 2 | 2 2),(2 1 | 1 1),(2 1 | 1 2),(2 1 | 2 1),(2 1 | 2 2)
(2 2 | 1 1),(2 2 | 1 2),(2 2 | 2 1),(2 2 | 2 2).
Total = 16. Exactly 8 of them leave the array the same.
Thus, for N = 2, the answer is 1/2.
EDIT :
I want to introduce another approach:
We can classify swaps to three groups: constructive swaps, destructive swaps and harmless swaps.
A constructive swap is defined to be a swap that causes at least one element to move to its right place.
A destructive swap is defined to be a swap that causes at least one element to move away from its correct position.
A harmless swap is defined to be a swap that does not belong to the other groups.
It is easy to see that this is a partition of all possible swaps. (intersection = empty set).
Now the claim I want to prove:
The array will remain the same if and only if
the number of destructive swaps equals the number of constructive swaps over the iterations.
If someone has a counter-example, please write it down as a comment.
If this claim is correct, we can take all combinations and sum them -
0 harmless swaps, 1 harmless swap, ..., N harmless swaps.
And for each possible number k of harmless swaps, we check whether N-k is even; if not, we skip it. If yes, we take (N-k)/2 destructive and (N-k)/2 constructive swaps, and just look at all the possibilities.
I would model the problem as a multigraph where nodes are elements of the array and each swap adds an un-directed(!) connection between them. Then look for loops somehow (all nodes are part of a loop => original).
Really need to get back to work! :(
Well, from a mathematical perspective:
To have the array elements swapped in place every time, the rand(N) function must generate the same number twice for int m and int n. The probability that rand(N) generates the same number twice is 1/N.
And we have rand(N) called N times inside the for loop, so we have a probability of 1/(N^2).
Naive implementation in C#.
The idea is to create all the possible permutations of the initial array and enumerate them.
Then we build a matrix of possible changes of state. Multiplying the matrix by itself N times, we get a matrix showing how many ways exist that lead from permutation #i to permutation #j in N steps. Element [0,0] shows how many ways lead back to the initial state. The sum of the elements of row #0 shows the total number of different ways. By dividing the former by the latter we get the probability.
In fact the total number of equally likely outcomes is N^(2N).
Output:
For N=1 probability is 1 (1 / 1)
For N=2 probability is 0.5 (8 / 16)
For N=3 probability is 0.1851851851851851851851851852 (135 / 729)
For N=4 probability is 0.068359375 (4480 / 65536)
For N=5 probability is 0.0193664 (189125 / 9765625)
For N=6 probability is 0.0057198259072973293366526105 (12450816 / 2176782336)
using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        for (int i = 1; i < 7; i++)
        {
            MainClass mc = new MainClass(i);
            mc.Run();
        }
    }
}

class MainClass
{
    int N;
    int M;
    List<int> comb;
    List<int> lastItemIdx;
    public List<List<int>> combinations;
    int[,] matrix;

    public MainClass(int n)
    {
        N = n;
        comb = new List<int>();
        lastItemIdx = new List<int>();
        for (int i = 0; i < n; i++)
        {
            comb.Add(-1);
            lastItemIdx.Add(-1);
        }
        combinations = new List<List<int>>();
    }

    public void Run()
    {
        GenerateAllCombinations();
        GenerateMatrix();
        int[,] m2 = matrix;
        for (int i = 0; i < N - 1; i++)
        {
            m2 = Multiply(m2, matrix);
        }
        decimal same = m2[0, 0];
        decimal total = 0;
        for (int i = 0; i < M; i++)
        {
            total += m2[0, i];
        }
        Console.WriteLine("For N={0} probability is {1} ({2} / {3})", N, same / total, same, total);
    }

    private int[,] Multiply(int[,] m2, int[,] m1)
    {
        int[,] ret = new int[M, M];
        for (int ii = 0; ii < M; ii++)
        {
            for (int jj = 0; jj < M; jj++)
            {
                int sum = 0;
                for (int k = 0; k < M; k++)
                {
                    sum += m2[ii, k] * m1[k, jj];
                }
                ret[ii, jj] = sum;
            }
        }
        return ret;
    }

    private void GenerateMatrix()
    {
        M = combinations.Count;
        matrix = new int[M, M];
        for (int i = 0; i < M; i++)
        {
            matrix[i, i] = N;
            for (int j = i + 1; j < M; j++)
            {
                if (2 == Difference(i, j))
                {
                    matrix[i, j] = 2;
                    matrix[j, i] = 2;
                }
                else
                {
                    matrix[i, j] = 0;
                }
            }
        }
    }

    private int Difference(int x, int y)
    {
        int ret = 0;
        for (int i = 0; i < N; i++)
        {
            if (combinations[x][i] != combinations[y][i])
            {
                ret++;
            }
            if (ret > 2)
            {
                return int.MaxValue;
            }
        }
        return ret;
    }

    private void GenerateAllCombinations()
    {
        int placeAt = 0;
        bool doRun = true;
        while (doRun)
        {
            doRun = false;
            bool created = false;
            for (int i = placeAt; i < N; i++)
            {
                for (int j = lastItemIdx[i] + 1; j < N; j++)
                {
                    lastItemIdx[i] = j; // remember the test
                    if (comb.Contains(j))
                    {
                        continue; // tail items should be nulled && their lastItemIdx set to -1
                    }
                    // success
                    placeAt = i;
                    comb[i] = j;
                    created = true;
                    break;
                }
                if (comb[i] == -1)
                {
                    created = false;
                    break;
                }
            }
            if (created)
            {
                combinations.Add(new List<int>(comb));
            }
            // rollback
            bool canGenerate = false;
            for (int k = placeAt + 1; k < N; k++)
            {
                lastItemIdx[k] = -1;
            }
            for (int k = placeAt; k >= 0; k--)
            {
                placeAt = k;
                comb[k] = -1;
                if (lastItemIdx[k] == N - 1)
                {
                    lastItemIdx[k] = -1;
                    continue;
                }
                canGenerate = true;
                break;
            }
            doRun = canGenerate;
        }
    }
}
The probability that m==n on each iteration, then do that N times. P(m==n) = 1/N. So I think P=1/(n^2) for that case. But then you have to consider the values getting swapped back. So I think the answer is (text editor got me) 1/N^N.
Question: what is the probability that array A remains the same?
Condition: Assume that the array contains unique elements.
Tried the solution in Java.
Random swapping happens on a primitive int array. In Java, method parameters are always passed by value, so what happens inside the swap method does not matter: the element values a[m] and a[n] are passed (see swap(a[m], a[n]) in the code below), not the array itself.
So the answer is that the array will remain the same, regardless of the condition mentioned above. See the Java code sample below:
import java.util.Random;

public class ArrayTrick {

    int a[] = new int[10];
    Random random = new Random();

    public void swap(int i, int j) {
        int temp = i;
        i = j;
        j = temp;
    }

    public void fillArray() {
        System.out.println("Filling array: ");
        for (int index = 0; index < a.length; index++) {
            a[index] = random.nextInt(a.length);
        }
    }

    public void swapArray() {
        System.out.println("Swapping array: ");
        for (int index = 0; index < a.length; index++) {
            int m = random.nextInt(a.length);
            int n = random.nextInt(a.length);
            swap(a[m], a[n]);
        }
    }

    public void printArray() {
        System.out.println("Printing array: ");
        for (int index = 0; index < a.length; index++) {
            System.out.print(" " + a[index]);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        ArrayTrick at = new ArrayTrick();
        at.fillArray();
        at.printArray();
        at.swapArray();
        at.printArray();
    }
}
Sample output:
Filling array:
Printing array:
3 1 1 4 9 7 9 5 9 5
Swapping array:
Printing array:
3 1 1 4 9 7 9 5 9 5

array median transformation minimum steps

Given an array A with n integers. In one turn one can apply the following operation to any consecutive subarray A[l..r]: assign to all A_i (l <= i <= r) the median of the subarray A[l..r].
Let max be the maximum integer of A. We want to know the minimum number of operations needed to change A to an array of n integers each with value max.
For example, let A = [1, 2, 3]. We want to change it to [3, 3, 3]. We can do this in two operations: first for subarray A[2..3] (after that A equals [1, 3, 3]), then the operation on A[1..3].
Also, the median is defined for some array A as follows. Let B be the same array A, but sorted in non-decreasing order. The median of A is B_m (1-based indexing), where m equals (n div 2)+1. Here 'div' is an integer division operation. So, for a sorted array with 5 elements, the median is the 3rd element and for a sorted array with 6 elements, it is the 4th element.
Since the maximum value of N is 30, I thought of brute-forcing the result. Could there be a better solution?
You can double the size of the subarray containing the maximum element in each operation. After the first operation, there is a subarray of size 2 containing the maximum. Then apply your operation to a subarray of size 4 containing those 2 elements, giving you a subarray of size 4 containing the maximum. Then apply it to a size 8 subarray and so on. You fill the array in ceil(log2(N)) operations, which is optimal. If N is 30, five operations are enough.
This is optimal in the worst case (i.e. when only one element is the maximum), since it sets the highest possible number of elements in each iteration.
Update 1: I noticed I messed up the 4s and 8s a bit. Corrected.
Update 2: here's an example. Array size 10, start state:
[6 1 5 9 3 2 0 7 4 8]
To get two nines, run op on subarray of size two containing the nine. For instance A[4…5] gets you:
[6 1 5 9 9 2 0 7 4 8]
Now run on size four subarray that contains 4…5, for instance on A[2…5] to get:
[6 9 9 9 9 2 0 7 4 8]
Now on subarray of size 8, for instance A[1…8], get:
[9 9 9 9 9 9 9 9 4 8]
Doubling now would get us 16 nines, but we have only 10 positions, so round off with A[1…10] to get:
[9 9 9 9 9 9 9 9 9 9]
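As a quick check of this schedule (my own sketch, not from the answer): when a single element holds the maximum, the worst-case number of operations is just ceil(log2(N)):

def worst_case_ops(n):
    # one op makes 2 maximal elements, the next makes 4, then 8, ...
    # so ceil(log2(n)) operations suffice; (n-1).bit_length() computes that
    return (n - 1).bit_length()

print(worst_case_ops(10))   # 4, matching the example above
print(worst_case_ops(30))   # 5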
Update 3: since this is only optimal in the worst case, it is actually not an answer to the original question, which asks for a way of finding the minimal number of operations for all inputs. I misinterpreted the sentence about brute forcing to be about brute forcing with the median operations, rather than in finding the minimum sequence of operations.
This is the problem from the CodeChef Long Contest. Since the contest is already over, awkwardiom, I am pasting the problem setter's approach (Source: CC Contest Editorial Page).
"Any state of the array can be represented as a binary mask with each bit 1 meaning that the corresponding number is equal to the max and 0 otherwise. You can run a DP with state R[mask] and O(n) transitions. You can prove (or just believe) that the number of states will not be big, of course if you run a good DP. The state of our DP will be the mask of numbers that are equal to max. Of course, it makes sense to use the operation only for such a subarray [l; r] that the number of 1-bits is at least as much as the number of 0-bits in the submask [l; r], because otherwise nothing will change. Also you should notice that if the left bound of your operation is l, it is good to make the operation only with the maximal possible r (this gives a number of transitions equal to O(n)). It was also useful for C++ coders to use a map structure to represent all states."
The C/C++ code is:
#include <cstdio>
#include <iostream>
using namespace std;

int bc[1<<15];
const int M = (1<<15) - 1;

void setMin(int& ret, int c)
{
    if(c < ret) ret = c;
}

void doit(int n, int mask, int currentSteps, int& currentBest)
{
    int numMax = bc[mask>>15] + bc[mask&M];
    if(numMax == n) {
        setMin(currentBest, currentSteps);
        return;
    }
    if(currentSteps + 1 >= currentBest)
        return;
    if(currentSteps + 2 >= currentBest)
    {
        if(numMax * 2 >= n) {
            setMin(currentBest, 1 + currentSteps);
        }
        return;
    }
    if(numMax < (1<<currentSteps)) return;

    for(int i=0;i<n;i++)
    {
        int a = 0, b = 0;
        int c = mask;
        for(int j=i;j<n;j++)
        {
            c |= (1<<j);
            if(mask&(1<<j)) b++;
            else a++;
            if(b >= a) {
                doit(n, c, currentSteps + 1, currentBest);
            }
        }
    }
}

int v[32];

void solveCase() {
    int n;
    scanf(" %d", &n);
    int maxElement = 0;
    for(int i=0;i<n;i++) {
        scanf(" %d", v+i);
        if(v[i] > maxElement) maxElement = v[i];
    }
    int mask = 0;
    for(int i=0;i<n;i++) if(v[i] == maxElement) mask |= (1<<i);
    int ret = 0, p = 1;
    while(p < n) {
        ret++;
        p *= 2;
    }
    doit(n, mask, 0, ret);
    printf("%d\n", ret);
}

int main() {
    for(int i=0;i<(1<<15);i++) {
        bc[i] = bc[i>>1] + (i&1);
    }
    int cases;
    scanf(" %d", &cases);
    while(cases--) solveCase();
}
The problem setter's approach has exponential complexity. It is pretty good for N = 30, but not for larger sizes. I think it's more interesting to find a polynomial-time solution, and I found one with O(N^4) complexity.
This approach uses the fact that the optimal solution starts with some group of consecutive maximal elements and extends only this single group until the whole array is filled with maximal values.
To prove this fact, take 2 starting groups of consecutive maximal elements and extend each of them in an optimal way until they merge into one group. Suppose that group 1 needs X turns to grow to size M, group 2 needs Y turns to grow to the same size M, and on turn X + Y + 1 these groups merge; the result is a group of size at least M * 4. Now, instead of the last turn for group 2, make an additional turn X + 1 for group 1. In this case group 1 has size at least M * 2, while group 2 still contains at least one maximal element (even if we only count the initially maximal elements that might have been included in step Y). After this change, on turn X + Y + 1 the merged group has size at least M * 4 from the first group's extension alone, plus at least one element from the second group. So extending a single group produces a larger group in the same number of steps (and if Y > 1, it even requires fewer steps). Since this works for equal group sizes (M), it works even better for unequal groups, and the proof can be extended to the case of more than two groups.
To work with a single group of consecutive maximal elements, we need to keep track of only two values: the starting and ending positions of the group. This means a triangular matrix can store all possible groups, which allows a dynamic programming algorithm.
Pseudo-code:
For each group of consecutive maximal elements in the original array:
    Mark the corresponding element in the matrix and clear the other elements
    For each matrix diagonal, starting with the one containing this element:
        For each marked element in this diagonal:
            Retrieve the current number of turns from this matrix element
            (use the indexes of this matrix element to initialize p1 and p2)
            p2 = end of the group
            p1 = start of the group
            Decrease p1 while it is possible to keep the median at the maximum value
            (now all values between p1 and p2 are assumed to be maximal)
            While p2 < N:
                Check if the number of maximal elements in the array is >= N/2
                If this is true, compare the current number of turns with the best
                    result and update it if necessary
                (an additional matrix with the number of maximal values between each
                    pair of points may be used to count elements to the left of p1
                    and to the right of p2)
                Look at position [p1, p2] in the matrix. Mark it, and if it contains
                    a larger number of turns, update it
                Repeat:
                    Increase p1 while it points to a maximal value
                    Increment p1 (to skip one non-maximal value)
                    Increase p2 while it is possible to keep the median at the maximum value
                while the median is not at the maximum value
To keep the algorithm simple, I didn't mention the special cases when a group starts at position 0 or ends at position N, skipped the initialization, and didn't make any optimizations.
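The recurring step "increase p2 while it is possible to keep the median at the maximum value" reduces to a counting check: the median of a[p1..r] stays at the maximum as long as the number of maximal elements in [p1, r] is at least half of its length, rounded up. A rough illustrative helper (my own naming and layout, not part of the answer's pseudocode):
#include <vector>

// cntMax[i] = number of elements equal to the global maximum among a[0..i-1].
// All positions in [p1, p2] are assumed to already hold the maximum.
// Returns the largest r >= p2 such that the median of a[p1..r] is still the maximum.
int extendRight(int p1, int p2, int n, const std::vector<int>& cntMax) {
    int r = p2;
    while (r + 1 < n) {
        int len = (r + 1) - p1 + 1;
        // maxima in [p1, r+1]: the assumed block plus original maxima right of p2
        int maxima = (p2 - p1 + 1) + (cntMax[r + 2] - cntMax[p2 + 1]);
        // the median (1-based position len/2 + 1) equals the maximum
        // exactly when maxima >= ceil(len / 2), i.e. 2 * maxima >= len
        if (2 * maxima >= len) ++r;
        else break;
    }
    return r;
}
The same majority condition (b >= a) appears in the problem setter's code above.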

Counting the bits set in the Fibonacci number system?

We know that each non-negative integer can be represented uniquely as a sum of Fibonacci numbers (here we are concerned with the minimal representation, i.e. no two consecutive Fibonacci numbers appear in the representation of a number, and each Fibonacci number is used at most once).
For example:
1-> 1
2-> 10
3->100
4->101, where f1=1, f2=2 and f(n)=f(n-1)+f(n-2).
So each number can be represented in the Fibonacci system as a binary sequence. If we write all natural numbers successively in the Fibonacci system, we obtain a sequence like this: 110100101… This is called the "Fibonacci bit sequence of natural numbers".
My task is to count the number of times bit 1 appears in the first N bits of this sequence. Since N can take values from 1 to 10^15, can I do this without storing the Fibonacci sequence?
For example: if N is 5, the answer is 3.
So this is just a preliminary sketch of an algorithm. It works when the upper bound is itself a Fibonacci number, but I'm not sure how to adapt it for general upper bounds. Hopefully someone can improve upon this.
The general idea is to look at the structure of the Fibonacci encodings. Here are the first few numbers:
0
1
10
100
101
1000
1001
1010
10000
10001
10010
10100
10101
100000
The invariant in each of these numbers is that there's never a pair of consecutive 1s. Given this invariant, we can increment from one number to the next using the following pattern:
1. If the last digit is 0, set it to 1.
2. If the last digit is 1, then since there aren't any consecutive 1s, set the last digit to 0 and the next digit to 1.
3. Eliminate any doubled 1s by setting them both to 0 and setting the next digit to a 1, repeating until all doubled 1s are eliminated.
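As a quick aside, here is a small sketch (my own, not from the answer) of these three increment rules, operating on a digit string stored lowest digit first (so reversed relative to the strings printed above):
#include <string>

// Increment a Zeckendorf (Fibonacci-base) representation by one.
// The string stores the lowest digit first; "0" represents zero.
std::string fibIncrement(std::string s) {
    if (s[0] == '0') {                       // rule 1: last digit is 0
        s[0] = '1';
    } else {                                 // rule 2: last digit is 1
        s[0] = '0';
        if (s.size() < 2) s.push_back('0');
        s[1] = '1';
    }
    // rule 3: replace every "11" pair by a 1 in the next higher digit;
    // one upward scan suffices because each carry lands above the pair
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (s[i] == '1' && s[i + 1] == '1') {
            s[i] = s[i + 1] = '0';
            if (i + 2 == s.size()) s.push_back('1');
            else s[i + 2] = '1';
        }
    }
    return s;
}
Repeatedly calling fibIncrement starting from "0" reproduces the listing above (read each result reversed).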
The reason that this is important is that property (3) tells us something about the structure of these numbers. Let's revisit the first few Fibonacci-encoded numbers once more. Look, for example, at the first three numbers:
00
01
10
Now, look at all four-bit numbers:
1000
1001
1010
The next number will have five digits, as shown here:
1011 → 1100 → 10000
The interesting detail to notice is that the number of numbers with four digits is equal to the number of values with up to two digits. In fact, we get the four-digit numbers by just prefixing the at-most-two-digit-numbers with 10.
Now, look at three-digit numbers:
000
001
010
100
101
And look at five-digit numbers:
10000
10001
10010
10100
10101
Notice that the five-digit numbers are just the three-digit numbers with 10 prefixed.
This gives us a very interesting way of counting how many 1s there are. Specifically, if you look at (k+2)-digit numbers, each of them is just a k-digit number with 10 prefixed to it. This means that if there are B 1s in total in all the numbers with at most k digits, then the numbers with exactly k+2 digits contribute B plus the number of numbers with at most k digits, since we're just replaying that sequence with an extra 1 prepended to each number.
We can exploit this to compute the number of 1s in the Fibonacci codings that have at most k digits in them. The trick is as follows - if for each number of digits we keep track of
How many numbers have at most that many digits (call this N(d)), and
How many 1s appear in the numbers with at most d digits (call this B(d)).
We can use this information to compute these two pieces of information for one more digit. It's a beautiful DP recurrence. Initially, we seed it as follows. For one digit, N(1) = 2 and B(1) = 1, since the one-digit numbers are 0 and 1. For two digits, N(2) = 3 (there's just one two-digit number, 10, plus the two one-digit numbers 0 and 1) and B(2) = 2 (one from 1, one from 10). From there, we have that
N(d + 2) = N(d) + N(d + 1). This is because the number of numbers with up to d + 2 digits is the number of numbers with up to d + 1 digits (N(d + 1)), plus the numbers formed by prefixing 10 to the numbers with at most d digits (N(d))
B(d + 2) = B(d + 1) + B(d) + N(d) (The number of total 1 bits in numbers of length at most d + 2 is the total number of 1 bits in numbers of length at most d + 1, plus the extra we get from numbers of just d + 2 digits)
For example, we get the following:
d N(d) B(d)
---------------------
1 2 1
2 3 2
3 5 5
4 8 10
5 13 20
We can actually check this. For 1-digit numbers, there are a total of 1 one bit used. For 2-digit numbers, there are two ones (1 and 10). For 3-digit numbers, there are five 1s (1, 10, 100, 101). For four-digit numbers, there are 10 ones (the five previous, plus 1000, 1001, 1010). Extending this outward gives us the sequence that we'd like.
This is extremely easy to compute - we can compute the value for k digits in time O(k) with just O(1) memory usage if we reuse space from before. Since the Fibonacci numbers grow exponentially quickly, this means that if we have some number N and want to find the sum of all 1s bits to the largest Fibonacci number smaller than N, we can do so in time O(log N) and space O(1).
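Here is a small sketch (my code) of this recurrence, with the seed values described above:
#include <cstdio>

// N[d] = how many numbers have at most d digits; B[d] = how many 1s they contain.
int main() {
    const int MAXD = 60;
    long long N[MAXD + 1], B[MAXD + 1];
    N[1] = 2; B[1] = 1;                 // numbers 0 and 1
    N[2] = 3; B[2] = 2;                 // numbers 0, 1 and 10
    for (int d = 3; d <= MAXD; ++d) {
        N[d] = N[d - 1] + N[d - 2];
        B[d] = B[d - 1] + B[d - 2] + N[d - 2];
    }
    for (int d = 1; d <= 5; ++d)
        printf("%d %lld %lld\n", d, N[d], B[d]);   // matches the table above
}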
That said, I'm not sure how to adapt this to work with general upper bounds. However, I'm optimistic that there is some way to do it. This is a beautiful recurrence and there just has to be a nice way to generalize it.
Hope this helps! Thanks for an awesome problem!
Let's solve 3 problems. Each one is harder than the previous, and each uses the result of the previous one.
1. How many ones are set if you write down every number from 0 to fib[i]-1.
Call this dp[i]. Let's look at the numbers:
0
1
10
100
101
1000
1001
1010 <-- we want to count ones up to here
10000
If you write all numbers up to fib[i]-1, first you write all numbers up to fib[i-1]-1 (dp[i-1]), then you write the last block of numbers. There are exactly fib[i-2] of those numbers, each has a one in the first position, so we add fib[i-2], and if you erase those ones
000
001
010
and then remove the leading zeros, you can see that each number from 0 to fib[i-2]-1 is written down. The number of ones there equals dp[i-2], which gives us:
dp[i] = fib[i-2] + dp[i-2] + dp[i-1];
2. How many ones are set if you write down every number from 0 to n.
0
1
10
100
101
1000
1001 <-- we want to count ones up to here
1010
Let's call this solNumber(n).
Suppose that your number is fib[i] + x, where fib[i] is the largest Fibonacci number not exceeding it. Then the answer is dp[i] + (x + 1) + solNumber(x): dp[i] for the numbers below fib[i], x + 1 leading ones for the numbers fib[i], ..., fib[i] + x, and solNumber(x) for their remaining digits. This can be proved in the same way as in point 1.
3. How many ones are set in first n digits.
3a. How many numbers have representation length exactly l
If l = 1 or l = 2 the answer is 1; otherwise it is fib[l-1] (with the indexing used in the code below, fib[0] = fib[1] = 1).
You can note that if you erase the leading one and then all leading zeros, you'll have each number from 0 to fib[l-1]-1 written down, which is exactly fib[l-1] numbers.
//End of 3a
Now you can find a number m such that, if you write all numbers from 1 to m, their total length will be <= n, but if you write all numbers from 1 to m+1, the total length will be > n. Solve the problem manually for m+1 and add solNumber(m).
All 3 problems are solved in O(log n)
#include <iostream>
using namespace std;

#define FOR(i, a, b) for(int i = a; i < b; ++i)
#define RFOR(i, b, a) for(int i = b - 1; i >= a; --i)
#define REP(i, N) FOR(i, 0, N)
#define RREP(i, N) RFOR(i, N, 0)

typedef long long Long;

const int MAXL = 30;
long long fib[MAXL];
// How many ones there are if you write down the representations of the first fib[i]-1 natural numbers
long long dp[MAXL];

void buildDP()
{
    fib[0] = 1;
    fib[1] = 1;
    FOR(i,2,MAXL)
        fib[i] = fib[i-1] + fib[i-2];

    dp[0] = 0;
    dp[1] = 0;
    dp[2] = 1;
    FOR(i,3,MAXL)
        dp[i] = fib[i-2] + dp[i-2] + dp[i-1];
}

// How many ones there are if you write down the representations of the first n natural numbers
Long solNumber(Long n)
{
    if(n == 0)
        return n;
    Long res = 0;
    RREP(i,MAXL)
        if(n >= fib[i])
        {
            n -= fib[i];
            res += dp[i];
            res += (n+1);
        }
    return res;
}

int solManual(Long num, Long n)
{
    int cr = 0;
    RREP(i,MAXL)
    {
        if(n == 0)
            break;
        if(num >= fib[i])
        {
            num -= fib[i];
            ++cr;
        }
        if(cr != 0)
            --n;
    }
    return cr;
}

Long num(int l)
{
    if(l <= 2)
        return 1;
    return fib[l-1];
}

Long sol(Long n)
{
    // length of the Fibonacci representation
    int l = 1;
    // total accumulated length
    int cl = 0;
    while(num(l)*l + cl <= n)
    {
        cl += num(l)*l;
        ++l;
    }
    // Number of digits that represent numbers with the maximal length
    Long nn = n - cl;
    // Number of full numbers
    Long t = nn/l;
    // The last full number
    n = fib[l] + t-1;

    return solNumber(n) + solManual(n+1, nn%l);
}

int main(int argc, char** argv)
{
    ios_base::sync_with_stdio(false);
    buildDP();
    Long n;
    while(cin>>n)
        cout<<"ANS: "<<sol(n)<<endl;
    return 0;
}
Compute m, the number responsible for the (N+1)th bit of the sequence. Compute the contribution of m to the count.
We have reduced the problem to counting the number of one bits in the range [1, m). In the style of interval trees, partition this range into O(log N) subranges, each having an associated glob like 10100???? that matches the representations of exactly the numbers belonging to that range. It is easy to compute the contribution of the prefixes.
We have reduced the problem to counting the total number T(k) of one bits in all Fibonacci words of length k (i.e., the ???? part of the globs). T(k) is given by the following recurrence.
T(0) = 0
T(1) = 1
T(k) = T(k - 1) + T(k - 2) + F(k - 2)
Mathematica says there's a closed form solution, but it looks awful and isn't needed for this polylog(N)-time algorithm.
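A quick sketch (mine) of this recurrence; I'm reading F(m) as the number of Fibonacci words of length m, i.e. F(0) = 1, F(1) = 2, F(m) = F(m-1) + F(m-2), which is the indexing that makes the recurrence work out:
#include <cstdio>

int main() {
    const int K = 40;
    long long F[K], T[K];
    F[0] = 1; F[1] = 2;                 // number of Fibonacci words of length k
    T[0] = 0; T[1] = 1;                 // total one bits over those words
    for (int k = 2; k < K; ++k) {
        F[k] = F[k - 1] + F[k - 2];
        // a length-k word is either 0 + (length k-1 word)
        // or 10 + (length k-2 word); the latter adds one extra 1 per word
        T[k] = T[k - 1] + T[k - 2] + F[k - 2];
    }
    printf("%lld\n", T[3]);             // 5: 000, 001, 010, 100, 101 contain five 1s
}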
This is not a full answer but it does outline how you can do this calculation without using brute force.
The Fibonacci representation of F(n) is a 1 followed by n-1 zeros.
For the numbers from F(n) up to but not including F(n+1), the number of 1's consists of two parts:
There are F(n-1) such numbers, so there are F(n-1) leading 1's.
The binary digits after the leading numbers are just the binary representations of all numbers up to but not including F(n-1).
So, if we call the total number of one bits in the sequence up to but not including the nth Fibonacci number a(n), then we have the following recursion:
a(n+1) = a(n) + F(n-1) + a(n-1)
You can also easily get the number of bits in the sequence up to F(n).
If it takes k Fibonacci numbers to get to (but not pass) N, then you can count those bits with the above formula, and after some further manipulation reduce the problem to counting the number of bits in the remaining sequence.
[Edit]: Basically I have followed the property that for any number n which is to be represented in Fibonacci base, we can reduce it to n - x, where x is the largest Fibonacci number not greater than n. Using this property, any number can be broken down into its bit form.
The first step is finding the number that contains the Nth bit of the sequence.
We can see that all the numbers between the Fibonacci numbers F(n) and F(n+1) have the same number of bits. Using this, we can pre-calculate a table and find the appropriate number.
Let's say that the Nth bit falls inside the number D.
Now, let X be the largest Fibonacci number less than or equal to D.
To find the set bits for all numbers from 1 to D, we write the numbers from X to D as
X+0, X+1, X+2, ..., X+(D-X). Each of them has a leading 1 coming from X, and the rest is a much smaller sub-problem: finding all set bits up to D-X. We keep doing this recursively. Using the same logic, we can build a table with the set-bit counts for all Fibonacci numbers (up to the limit), and use this table to find the number of set bits from 1 to X.
So,
Findsetbits(D) {               // finds the number of set bits from 1 to D
    find X;                    // largest Fibonacci number not greater than D
    ans = tablesetbits[X];
    ans += 1 * (D - X + 1);    // all the leading 1s due to X+0, X+1, ..., X+(D-X)
    ans += Findsetbits(D - X);
    return ans;
}
I tried some examples by hand and saw the pattern.
I have coded a rough solution which I have checked by hand for N <= 35. It works pretty fast for large numbers, though I can't be sure that it is correct. If it is an online judge problem, please give the link to it.
#include <iostream>
#include <vector>
#include <map>
#include <algorithm>
using namespace std;

#define pb push_back
typedef long long LL;

vector<LL> numbits;
vector<LL> fib;
vector<LL> numones;
vector<LL> cfones;

void init() {
    fib.pb(1);
    fib.pb(2);
    int i = 2;
    LL c = 1;
    while (c < 100000000000000LL) {
        c = fib[i-1] + fib[i-2];
        i++;
        fib.pb(c);
    }
}

LL answer(LL n) {
    if (n <= 3) return n;
    int a = (lower_bound(fib.begin(), fib.end(), n)) - fib.begin();
    int c = 1;
    if (fib[a] == n) {
        c = 0;
    }
    LL ans = cfones[a-1-c];
    return ans + answer(n - fib[a-c]) + 1 * (n - fib[a-c] + 1);
}

int fillarr(vector<int>& a, LL n) {
    if (n == 0) return -1;
    if (n == 1) {
        a[0] = 1;
        return 0;
    }
    int in = lower_bound(fib.begin(), fib.end(), n) - fib.begin(), v = 0;
    if (fib[in] != n) v = 1;
    LL c = n - fib[in-v];
    a[in-v] = 1;
    fillarr(a, c);
    return in-v;
}

int main() {
    init();
    numbits.pb(1);
    int b = 2;
    LL c;
    for (int i = 1; i < fib.size()-2; i++) {
        c = fib[i+1] - fib[i];
        c = c*(LL)b;
        b++;
        numbits.pb(c);
    }
    for (int i = 1; i < numbits.size(); i++) {
        numbits[i] += numbits[i-1];
    }
    numones.pb(1);
    cfones.pb(1);
    numones.pb(1);
    cfones.pb(2);
    numones.pb(1);
    cfones.pb(5);
    for (int i = 3; i < fib.size(); i++) {
        LL c = 0;
        c += cfones[i-2] + 1 * fib[i-1];
        numones.pb(c);
        cfones.pb(c + cfones[i-1]);
    }
    for (int i = 1; i < numones.size(); i++) {
        numones[i] += numones[i-1];
    }
    LL N;
    cin >> N;
    if (N == 1) {
        cout << 1 << "\n";
        return 0;
    }
    // find the integer just before the Nth bit
    int pos;
    for (int i = 0;; i++) {
        if (numbits[i] >= N) {
            pos = i;
            break;
        }
    }
    LL temp = (N - numbits[pos-1]) / (pos+1);
    LL temp1 = (N - numbits[pos-1]);
    LL num = fib[pos] - 1 + (temp1 > 0 ? temp + (temp1 % (pos+1) ? 1 : 0) : 0);
    temp1 -= temp*(pos+1);
    if (!temp1) temp1 = pos+1;
    vector<int> arr(70, 0);
    int in = fillarr(arr, num);
    int sub = 0;
    for (int i = in - temp1; i >= 0; i--) {
        if (arr[i] == 1)
            sub += 1;
    }
    cout << "\nNumber answer " << num << " " << answer(num) - sub << "\n";
    return 0;
}
Here is an O((log n)^3) approach.
Let's compute how many numbers fit in the first N bits.
Imagine that we have a function:
long long number_of_all_bits_in_sequence(long long M);
It computes the length of the "Fibonacci bit sequence of natural numbers" created by all numbers that are not greater than M.
With this function we could use binary search to find how many numbers fit in the first N bits.
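Assuming we have that function, the binary search itself is routine; here is a sketch of mine where the counter is passed in as a parameter (the function name how_many_numbers_fit is my own):
#include <functional>

// Given a monotone counter f(M) = length of the sequence formed by the numbers 1..M,
// find the largest M whose bits all fit within the first N bits.
long long how_many_numbers_fit(long long N,
                               const std::function<long long(long long)>& f) {
    long long lo = 0, hi = 4000000000000000LL;   // safe upper bound for N <= 10^15
    while (lo < hi) {
        long long mid = lo + (hi - lo + 1) / 2;
        if (f(mid) <= N) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}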
How many bits are 1's in the representations of the first M numbers
Let's create a function which calculates how many numbers <= M have a 1 at the k-th bit.
long long kth_bit_equal_1(long long M, int k);
First, let's preprocess the results of this function for all small values, let's say M <= 1000000.
Implementation for M > PREPROCESS_LIMIT:
long long kth_bit_equal_1(long long M, int k) {
    if (M <= PREPROCESS_LIMIT) return preprocess_result[M][k];

    long long fib_number = greatest_fib_which_isnt_greater_than(M);
    int fib_index = index_of_fib_in_fibonnaci_sequence(fib_number);

    if (fib_index < k) {
        // all numbers are smaller than the k-th Fibonacci number
        return 0;
    }
    if (fib_index == k) {
        // only the numbers in [fib_number, M] have the k-th bit set to 1
        return M - fib_number + 1;
    }
    if (fib_index > k) {
        long long result = 0;
        // all numbers in [fib_number, M] have the bit at fib_index set to 1,
        // so let's subtract fib_number from all numbers in this interval;
        // now this interval is [0, M - fib_number], and we calculate how many
        // numbers in it have the k-th bit set
        result += kth_bit_equal_1(M - fib_number, k);
        // don't forget about the remaining numbers (interval [1, fib_number - 1])
        result += kth_bit_equal_1(fib_number - 1, k);
        return result;
    }
}
Complexity of this function is O(M / PREPROCESS_LIMIT).
Notice that in the recurrence one of the addends always involves one of the Fibonacci numbers:
kth_bit_equal_1(fib_number - 1, k);
So if we memoize all computed results, the complexity improves to T(N) = T(N/2) + O(1), i.e. T(N) = O(log N).
Let's get back to number_of_all_bits_in_sequence.
We can slightly modify kth_bit_equal_1 so that it also counts bits equal to 0.
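Separately from that modification, one hedged sketch (my own, not the approach the answer describes) of number_of_all_bits_in_sequence just sums, per representation length, how many of the numbers 1..M have that length:
#include <vector>

// Total number of digits written when the numbers 1..M are concatenated in
// Fibonacci (Zeckendorf) representation. Numbers with exactly l digits are
// those in [f(l), f(l+1) - 1], where f(1) = 1, f(2) = 2, f(3) = 3, f(4) = 5, ...
long long number_of_all_bits_in_sequence(long long M) {
    std::vector<long long> f = {1, 2};              // f[0] = 1, f[1] = 2
    while (f.back() <= M) f.push_back(f[f.size() - 1] + f[f.size() - 2]);
    long long total = 0;
    for (std::size_t i = 0; i + 1 < f.size(); ++i) {
        long long lo = f[i], hi = f[i + 1] - 1;     // numbers with exactly i+1 digits
        if (lo > M) break;
        if (hi > M) hi = M;
        total += (hi - lo + 1) * static_cast<long long>(i + 1);
    }
    return total;
}
For example, number_of_all_bits_in_sequence(4) returns 9, matching "1" + "10" + "100" + "101".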
Here's a way to count all the one digits in the set of numbers up to a given digit-length bound. This seems to me to be a reasonable starting point for a solution.
Consider 10 digits. Start by writing:
0000000000
Now we can turn some number of these zeros into ones, keeping the last digit always as a 0. Consider the possibilities case by case.
0 There's just one way to choose 0 of these to be ones. Summing the 1-bits in this one case gives 0.
1 There are {9 choose 1} ways to turn one of the zeros into a one. Each of these contributes 1.
2 There are {8 choose 2} ways to turn two of the zeros into ones. Each of these contributes 2.
...
5 There are {5 choose 5} ways to turn five of the zeros into ones. Each of these contributes 5 to the bit count.
It's easy to think of this as a tiling problem. The string of 10 zeros is a 10x1 board, which we want to tile with 1x1 squares and 2x1 dominoes. Choosing some number of the zeros to be ones is then the same as choosing some of the tiles to be dominoes. My solution is closely related to Identity 4 in "Proofs that really count" by Benjamin and Quinn.
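A small sketch (mine) of the case-by-case sum above: placing k non-adjacent ones among the first n-1 of n digits can be done in C(n-k, k) ways, so the total count of ones is the sum of k * C(n-k, k):
#include <cstdio>

// Binomial coefficient, small values only.
long long C(int n, int k) {
    if (k < 0 || k > n) return 0;
    long long r = 1;
    for (int i = 1; i <= k; ++i) r = r * (n - k + i) / i;
    return r;
}

int main() {
    int n = 10;                       // total digits, last one fixed to 0
    long long totalOnes = 0;
    for (int k = 0; 2 * k <= n; ++k)  // k non-adjacent ones among the first n-1 digits
        totalOnes += (long long)k * C(n - k, k);
    printf("%lld\n", totalOnes);      // 9*1 + 28*2 + 35*3 + 15*4 + 1*5 = 235
}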
Second step: now try to use the above construction to solve the original problem.
Suppose we want to count the one bits in the first 100100010 bits (the number is in Fibonacci representation, of course). Start by overcounting the sum for all ways to replace the x's with zeros and ones in 10xxxxx0. To overcompensate for the overcounting, subtract the count for 10xxx0. Continue the procedure of overcounting and overcompensation.
This problem has a dynamic programming solution, as illustrated by the tested algorithm below.
Some points to keep in mind, which are evident in the code:
The best solution for each number i is obtained by using the Fibonacci number f where f == i,
or, when f is less than i, by using f together with the greatest number n <= f such that i = f + n.
Note that the fib sequence is memoized over the entire algorithm.
public static int[] fibonacciBitSequenceOfNaturalNumbers(int num) {
    int[] setBits = new int[num + 1];
    setBits[0] = 0; // anchor case of fib seq
    setBits[1] = 1; // anchor case of fib seq
    int a = 1, b = 1; // anchor case of fib seq
    for (int i = 2; i <= num; i++) {
        int c = b;
        while (c < i) {
            c = a + b;
            a = b;
            b = c;
        } // fib
        if (c == i) {
            setBits[i] = 1;
            continue;
        }
        c = a;
        int tmp = c; // to optimize further, make tmp the fib before a
        while (c + tmp != i) {
            tmp--;
        }
        setBits[i] = 1 + setBits[tmp];
    } // done
    return setBits;
}
Test with:
public static void main(String... args) {
    int[] arr = fibonacciBitSequenceOfNaturalNumbers(23);
    // print result
    for (int i = 1; i < arr.length; i++)
        System.out.format("%d has %d%n", i, arr[i]);
}
RESULT OF TEST: i has x set bits
1 has 1
2 has 1
3 has 1
4 has 2
5 has 1
6 has 2
7 has 2
8 has 1
9 has 2
10 has 2
11 has 2
12 has 3
13 has 1
14 has 2
15 has 2
16 has 2
17 has 3
18 has 2
19 has 3
20 has 3
21 has 1
22 has 2
23 has 2
EDIT BASED ON COMMENT:
// to return the total number of set bits between 1 and n inclusive,
// instead of returning the array as in the original post, replace with this code
int total = 0;
for (int i : setBits)
    total += i;
return total;
