minimize variance of k integers from n ordered integers

minimize variance of k integers from n ordered integers - algorithm

Given a series of n integers and a number k, n>k, what's the solution of minimizing the variance of k new integers? You may add up any successive integers to a new integer and thus reduce n integers to k integers.
Here is an example. Given n=4, k=2, the series of integers are 4,4,1,1. The solution is 4,6 instead of 8,2 or 9,1.
I have come up with a greedy algorithm which goes like this: for every possible new integers, minimize the absolute value of the difference of this integer and the average of all the integers. But this won't work in some cases. Is there any efficient algorithm works?

The variance of a random variable X is E[(X - E[X])^2]. Here X is a random element of the output list. We know that E[X] is equal to the sum of the input numbers divided by k, so this objective is equivalent to the sum of (x - sum/k)^2 over output values x. This can be accomplished by slightly modifying a word wrap algorithm: Word wrap to X lines instead of maximum width (Least raggedness)

Related

Given numbers a1,a2,...,an whose sum is positive. Find the minimal number s.t. the sum of numbers less than or equal to it is positive, in linear time

Problem: Given n different numbers a1,a2,...,an, whose sum is positive. Show how one can find the minimal number such that the sum of numbers less than or equal to it is positive, in time-complexity of O(n).
Note: the numbers aren't necessarily whole and they aren't necessarily sorted as given.
Some explanation of the problem: if the array was sorted, [x,x,x,y,x,...,x,x,x] and y is the first number such that summing all the numbers up-to it will give a positive/zero sum ( and summing less numbers up-to it will give negative sum ), then y will be returned. ( the x here is just a place holder for a number, all numbers in the array are different )
Attempt:
Define the parameters low , high = 0, n which will serve as boundaries for the summation of the elements within them and also as boundaries for choosing the pivot.
Chose a pivot randomly and partition the array ( for example, by Lomuto's partition ), denote this pivot's index as p'. The partitioning will cost O(n). Sum the numbers from low to p' and designate the sum of these numbers as s.
If s<0 define low=p', and repeat the process of choosing a random pivot ( whose index will be denoted as p' ) and parititoning between low and high and then summing the numbers between these two bounderies as s := s + the new summation value.
Else, define high=p' and repeat the process described in the 'If' condition above.
The process will end when low = high.
Besides a few logical gaps in my attempt, it's overall complexity is O(n) on average and not at worst-case.
Do you have any ideas as to how solve the problem in O(n) time?, I thought maybe using a manipulation of 'Median of Medians' algorithm but I have no idea.
Thanks in advance for any help!

Join (sum) two adjacent elements of an array into one element until its size is K and GCD of new elements is maximum possible

I'm having a problem to solve this one. The task is to make a program that will for a given array of N numbers ( N <= 10^5 ), print a new array that's made by joining any two adjacent elements into their sum, (the sum is replacing these two adjacent elements and the size of the array is smaller by 1), until array's size is K. I need to print a solution where GCD of new elements is maximized. (and also print GCD after printing the array).
Note: Sum of all elements in the given array is not higher than 10^6.
I've realized that I could use prefix sum somehow because the sum of all elements isn't higher than 10^6, but that didn't help me that much.
What is an optimal solution to this problem?

Your GCD will be a divisor of the sum of all elements in the array. Your sum is not greater then 10^6, so the number of divisors is not greater than 240, so you can just check all of this GCDs, and it will be fast enough. You can check if asked gcd is possible in linear time: just go through array while the current sum is not the divisor of wanted gcd. When it is just set the current sum to 0. If you have found at least k blocks, it is possible to get current gcd (you can join any 2 blocks, and gcd will be the same).

Find only two numbers in array that evenly divide each other

Find the only two numbers in an array where one evenly divides the other - that is, where the result of the division operation is a whole number
Input Arrays Output
5 9 2 8 8/2 = 4
9 4 7 3 9/3 = 3
3 8 6 5 6/3 = 2
The brute force approach of having nested loops has time complexity of O(n^2). Is there any better way with less time complexity?
This question is part of advent of code.

Given an array of numbers A, you can identify the denominator by multiplying all the numbers together to give E, then testing each ith element by dividing E by Ai2. If this is a whole number, you have found the denominator, as no other factors can be introduced by multiplication.
Once you have the denominator, it's a simple task to do a second, independent loop searching for the paired numerator.
This eliminates the n2 comparisons.
Why does this work? First, we have an n-2 collection of non-divisors: abcde..
To complete the array, we also have numerator x and denominator y.
However, we know that x and only x has a factor of y, so it can be expressed as yz (z being a whole remainder from the division of x by y)
When we multiply out all the numbers, we end up with xyabcde.., but as x = yz, we can also say y2zabcde..
When we loop through dividing by the squared i'th element from the array, for most of the elements we create a fraction, e.g. for a:
y2zabcde.. / a2 = y2zbcde.. / a
However, for y and y only:
y2zabcde.. / y^2 = zabcde..
Why doesn't this work? The same is true of the other numbers. There's no guarantee that a and b can't produce another common factor when multiplied. Take the example of [9, 8, 6, 4], 9 and 8 multiplied equals 72, but as they both include prime factors 2 and 3, 72 has a factor of 6, also in the array. When we multiply it all out to 1728, those combine with the original 6 so that it can divide soundly by 36.
How might this be fixed? More accurately, if y is a factor of x, then y's prime factors will uniquely be a subset of x's prime factors, so maybe things can be refined along those lines. Obtaining a prime factorization should not scale according to the size of the array, but comparing subsets would, so it's not clear to me if this is at all useful.

I think that O(n^2) is the best time complexity you can get without any assumptions on the data.
If you can't tell anything about the numbers, knowing that x and y do not divide each other tells you nothing about x and z or y and z for any x, y, z. Therefore, in the worst case you must check all pairs of numbers - equal to n Choose 2 = n*(n-1)/2 = O(n^2).

Clearly, we can get O(n * sqrt(m)), where m is the absolute value range, by listing the pairs of divisors of each element against a hash of unique values in the array. This can be more efficient than O(n^2) depending on the input.
5 9 2 8
list divisor pairs (at most sqrt m iterations per element m)
5 (1,5)
9 (1,9), (3,3)
2 (1,2)
8 (1,8), (2,4) BINGO!

If you prime factorise all the numbers in the array progressively into a tree, when we discover a completely factored number leaf while factoring another number, we know we've found the divisor.
However, given we don't know which number is the divisor, we do need to test all primes up to divisor's largest factor. The largest factor for any m-digit number is, at most, sqrt(m), while the average number of primes below any m-digit number is m / ln(m). This means we will make at most n (sqrt(m) / ln(sqrt(m)) operations with very basic factorization and no optimization.
To be a little more specific, the algorithm should keep track of four things: a common tree of explored prime factors, the original number from the array, its current partial factorization, and its position in the tree.
For each prime number, we should test all numbers in the array (repeatedly to account for repeated factors). If the number divides evenly, we a) update the partial factorization, b) add/navigate to the corresponding child to the tree, c) if the partial factorization is 1, we have found the last factor and can indicate a leaf by adding the terminating '1' child, and d) if not, we can check for other numbers having left a child '1' to indicate they are completely factored.
When we find a child '1', we can identify the other number by multiplying out the partial factorization (e.g. all the parents up the tree) and exit.
For further optimization, we can cache the factorization (both partial and full) of numbers. We can also stop checking further factors of numbers that have a unique factor, narrowing the field of candidates over time.

Explanation of radix sort n x (k/d)

I have looked at the best, average and worst case time for the radix sort algorithm.
The average is N X K / D
I understand that N is the number of elements in the algorithm
I understand that K is the number of keys/buckets
Does anyone know what D represents?
I am going by the table on wikipedia, thanks
Reference - http://en.wikipedia.org/wiki/Sorting_algorithm#Radix_sort

D is the number of digits in base K.
For example, if you have K = 16, and the largest number is 255, D = 2 (16 ^ 2 = 256). If you change K to 4, then Dbecomes 4 (4 ^ 4 = 256).

The running time of Radix Sort is commonly simplified to O(n), but there are several factors that can cause this running time to increase dramatically.
Assigning variables:
n = the number of elements to be sorted/n
L = the length of the elements aka the number of digits in each element
k = the range of the digits in each element (digits range from 1 to k)
The radix sort algorithm performs a bucket sort for each digit in every element.
Therefore, L sorts must be done and each sort takes O(n+k) time because it is a bucket sort of n elements into k buckets. Therefore, the more accurate running time of Radix Sort is O(L(n+k)).
When the range of digits, k, is a small constant, as in decimal numbers, then the running time of Radix Sort can be simplified to O(Ln).
Due to these factors impacting the running time of the Radix Sort algorithm, there are certain considerations that need to be made about using the algorithm to perform a sort in order or it to be efficient. The data needs to:
have a fixed length (can choose to pad elements in order to create a uniform length for all elements)
have length of the elements, L, be linear in n
have the digit range, k, be linear in n

"Programming Pearls": Sampling m elements from a sequence

From Programming Pearls: Column 12: A Sample Problem:
The input consists of two integers m and n, with m < n. The output is
a sorted list of m random integers in the range 0..n-1 in which no
integer occurs more than once. For probability buffs, we desire a
sorted selection without replacement in which each selection occurs
with equal probability.
The author provides one solution:
initialize set S to empty
size = 0
while size < m do
t = bigrand() % n
if t is not in S
insert t into S
size++
print the elements of S in sorted order
In the above pseudocode, bigrand() is a function returns a large random integer (much larger than m and n).
Can anyone help me prove the correctness of the above algorithm?
According to my understanding, every output should have the probability of 1/C(n, m).
How to prove the above algorithm can guarantee the output with the probability of 1/C(n, m)?

Each solution this algorithm yields is valid.
How many solutions are there?
Up to last line there (sorting) are n*(n-1)*(n-2)*..*(n-m) different permutations or
n!/(n-m)! and each result has same probability
When you sort you reduce number of possible solutions by m!.
So number of possible outputs is n!/((n-m)!*m!) and this is what you asked for.
n!/((n-m)!m!) = C(n,m)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio