How to get the minimal absolute value of the differences? - algorithm

Given an array a[], one operation picks two elements and applies a[i]-x, a[j]+x (with x <= a[i]). After at most K such operations, I want max(abs(a[i] - a[j])) to be as small as possible. How do I find this smallest value?
My solution:
Each time, choose two numbers from the array and change them so that their sum stays constant. After K operations,
we get the minimal absolute value of the difference of two elements in the array.
However, I do not know whether my idea is correct. If not, how can I solve it correctly?

If I understand your question correctly, there is no need to do any calculation while performing the a[i]-x, a[j]+x operations. So my suggestion is:
1) perform the required number of a[i]-x, a[j]+x operations
2) run the following procedure (in pseudo-code):
_aSorted[] = sort(_a[])
_dif = max integer value
for (i = 0; i < _aSorted.length - 1; i++) {
    if (abs(_aSorted[i] - _aSorted[i+1]) < _dif)
        _dif = abs(_aSorted[i] - _aSorted[i+1]);
}
After this procedure, _dif holds the smallest absolute difference between any two elements of the array.
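The procedure above can be sketched in Python as follows. This is only an illustration of the sort-and-scan step; the function name is hypothetical, and it assumes the a[i]-x, a[j]+x operations have already been applied to the list.

```python
def min_adjacent_difference(values):
    """Return the smallest absolute difference between any two elements."""
    if len(values) < 2:
        raise ValueError("need at least two elements")
    ordered = sorted(values)
    # After sorting, the two closest values are always neighbours,
    # so one pass over adjacent pairs is enough.
    return min(b - a for a, b in zip(ordered, ordered[1:]))
```

Sorting dominates the cost, so the whole step is O(n log n).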

Algorithm to find matching real values in a list

I have a complex algorithm which calculates the result of a function f(x). In the real world f(x) is a continuous function. However due to rounding errors in the algorithm this is not the case in the computer program. The following diagram gives an example:
Furthermore I have a list of several thousands values Fi.
I am looking for all the x values which meet an Fi value i.e. f(xi)=Fi
I can solve this problem by simply iterating through the x values, as in the following pseudo-code:
for i=0 to NumberOfChecks-1 do
begin
    //calculate the function result with the algorithm
    x=xmin+i*(xmax-xmin)/NumberOfChecks;
    FunctionResult=CalculateFunctionResultWithAlgorithm(x);
    //loop through the value list to see if the function result matches a value in the list
    for j=0 to NumberOfValuesInTheList-1 do
    begin
        if Abs(FunctionResult-ListValues[j])<Epsilon then
        begin
            //mark that element j of the list matches
            //and store the corresponding x value in the list
        end
    end
end
Of course it is necessary to use a high number of checks. Otherwise I will miss some x values. The higher the number of checks the more complete and accurate is the result. It is acceptable that the list is 90% or 95% complete.
The problem is that this brute force approach takes too much time. As I mentioned before the algorithm for f(x) is quite complex and with a high number of checks it takes too much time.
What would be a better solution for this problem?
Another way to do this is in two parts: generate all of the results, sort them, and then merge with the sorted list of existing results.
The first step is to compute all of the results and save them along with the x value that generated them. That is:
results = list of <x, result>
for i = 0 to NumberOfChecks-1
    //calculate the function result with the algorithm
    x = xmin + i*(xmax-xmin)/NumberOfChecks;
    FunctionResult = CalculateFunctionResultWithAlgorithm(x);
    results.Add(x, FunctionResult)
end for
Now sort the results list by FunctionResult, and also sort the ListValues array.
You now have two sorted lists that you can move through linearly:
i = 0, j = 0;
while (i < results.length && j < ListValues.length)
{
    diff = ListValues[j] - results[i].result;
    if (Abs(diff) < Epsilon)
    {
        // mark this one with the x value
        // and move to the next result
        i = i + 1
    }
    else if (diff > 0)
    {
        // list value is much larger than result. Move to next result.
        i = i + 1
    }
    else
    {
        // list value is much smaller than result. Move to next list value.
        j = j + 1
    }
}
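A rough Python sketch of this sort-then-merge idea is shown below. The function name, the (x, f(x)) tuple layout, and the returned dictionary are assumptions made for the illustration, not part of the answer above.

```python
def match_sorted(results, list_values, epsilon):
    """Match function results against target values with one linear merge.

    results:     list of (x, fx) tuples, in any order
    list_values: list of target values Fi, in any order
    Returns a dict mapping an index into list_values to the x values hit.
    """
    results = sorted(results, key=lambda r: r[1])
    # Sort indices rather than values so matches can be reported
    # against the original positions in list_values.
    order = sorted(range(len(list_values)), key=lambda k: list_values[k])
    matches = {}
    i = j = 0
    while i < len(results) and j < len(order):
        x, fx = results[i]
        diff = list_values[order[j]] - fx
        if abs(diff) < epsilon:
            matches.setdefault(order[j], []).append(x)
            i += 1          # move to the next result
        elif diff > 0:
            i += 1          # list value is larger: next result
        else:
            j += 1          # list value is smaller: next list value
    return matches
```

Like the pseudo-code, this advances only the result pointer on a match, so when two list values lie within Epsilon of the same result only the first of them is marked.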
Sort the list, producing an array SortedListValues that contains
the sorted ListValues and an array SortedListValueIndices that
contains the index in the original array of each entry in
SortedListValues. You only actually need the second of these and
you can create both of them with a single sort by sorting an array
of tuples of (value, index) using value as the sort key.
Iterate over your range in 0..NumberOfChecks-1 and compute the
value of the function at each step, and then use a binary chop
method to search for it in the sorted list.
Pseudo-code:
// sort as described above
SortedListValueIndices = sortIndices(ListValues);
for i=0 to NumberOfChecks-1 do
begin
    //calculate the function result with the algorithm
    x=xmin+i*(xmax-xmin)/NumberOfChecks;
    FunctionResult=CalculateFunctionResultWithAlgorithm(x);
    // do a binary chop to find the closest element in the list
    highIndex = NumberOfValuesInTheList-1;
    lowIndex = 0;
    while true do
    begin
        if Abs(FunctionResult-ListValues[SortedListValueIndices[lowIndex]])<Epsilon then
        begin
            // find all elements in the range that match, breaking out
            // of the loop as soon as one doesn't
            for j=lowIndex to NumberOfValuesInTheList-1 do
            begin
                if Abs(FunctionResult-ListValues[SortedListValueIndices[j]])>=Epsilon then
                    break
                //mark that element SortedListValueIndices[j] of the list matches
                //and store the corresponding x value in the list
            end
            // break out of the binary chop loop
            break
        end
        // break out of the loop once the indices match
        if highIndex <= lowIndex then
            break
        // do the binary chop searching, adjusting the indices:
        middleIndex = (lowIndex + 1 + highIndex) / 2;
        if ListValues[SortedListValueIndices[middleIndex]] < FunctionResult then
            lowIndex = middleIndex;
        else
        begin
            highIndex = middleIndex;
            lowIndex = lowIndex + 1;
        end
    end
end
Possible complications:
The binary chop isn't taking the epsilon into account. Depending on
your data this may or may not be an issue. If it is acceptable that
the list is only 90 or 95% complete this might be ok. If not then
you'll need to widen the range to take it into account.
I've assumed you want to be able to match multiple x values for each FunctionResult. If that's not necessary you can simplify the code.
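One way to take the epsilon into account is to search for the two ends of the interval [FunctionResult - Epsilon, FunctionResult + Epsilon] instead of for the value itself. The sketch below does that with Python's standard bisect module; build_index and matching_indices are hypothetical helper names used only for this illustration.

```python
from bisect import bisect_left, bisect_right


def build_index(list_values):
    """Sort (value, original index) pairs once; return the two parallel lists."""
    pairs = sorted((v, i) for i, v in enumerate(list_values))
    return [v for v, _ in pairs], [i for _, i in pairs]


def matching_indices(sorted_values, sorted_indices, fx, epsilon):
    """Original indices of all list values strictly within epsilon of fx."""
    lo = bisect_left(sorted_values, fx - epsilon)
    hi = bisect_right(sorted_values, fx + epsilon)
    # The slice [lo, hi) covers the closed interval; the final check keeps
    # the strict "< epsilon" comparison used in the pseudo-code.
    return [sorted_indices[k] for k in range(lo, hi)
            if abs(sorted_values[k] - fx) < epsilon]
```

The index is built once, and each function result then costs two binary searches plus the number of matches.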
Naturally this depends very much on the data, and especially on the numeric distribution of Fi. Another problem is that f(x) looks very jumpy, which rules out assuming that nearby x values give nearby results.
But one could optimise the search.
Picture below.
Walk through f(x) at sufficient granularity and define a rough min
(red line) and max (green line), using a suitable tolerance (the "air"
or "gap" in between). The area between min and max is "AREA".
See where each Fi value hits AREA and add a stacked marking ("MARKING") on the x-axis accordingly (this can be multiple segments of x).
Where many MARKINGs stack on top of each other (a higher sum, shown by the vertical black "sum" arrows), do dense hit tests, which increases the overall
chance of getting as many hits as possible. Elsewhere, do sparser tests.
Tighten this scheme (decrease the tolerance) as much as you dare.
EDIT: Fi is a bit confusing. Is it an ordered array, or is it in random order (as I assumed)?
Jim Mischel's solution would run in O(i+j) once both lists are sorted, instead of the O(i*j) of your current solution. But there is a (very) minor bug in his code. The correct code would be:
diff = ListValues[j] - results[i]; //no abs() here
if (abs(diff) < Epsilon) //add abs() here
{
// mark this one with the x value
// and move to the next result
i = i + 1
}
The best method will rely on the nature of your function f(x).
The best solution is if you can construct the inverse of f(x) and use it.
As you said, f(x) is continuous:
therefore you can start by evaluating a small number of widely spaced points, then find the ranges that make sense, and refine your "assumption" for the x where f(x)=Fi.
It is not bulletproof, but it is an option.
E.g. Fi=5.7; f(1)=1.4, f(4)=4, f(16)=12.6, f(10)=10.1, f(7)=6.5, f(5)=5.1, f(6)=5.8, so you can take 5 < x < 7.
Along the same lines as #1, and if f(x) is hard to calculate, you can use interpolation and then evaluate f(x) only at the values that are probable.
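The coarse-then-refine idea could look roughly like the Python sketch below. It evaluates f on a coarse grid, keeps the intervals where f(x) - Fi changes sign, and narrows each one by bisection. The names find_solutions and refine_root are hypothetical, and the sketch assumes f behaves continuously inside each bracket, which rounding noise in a real f may violate.

```python
def refine_root(f, target, lo, hi, tol=1e-9, max_iter=200):
    """Bisection on [lo, hi]; assumes f(lo)-target and f(hi)-target differ in sign."""
    g_lo = f(lo) - target
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = f(mid) - target
        if g_mid == 0 or hi - lo < tol:
            return mid
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid   # root is in the upper half
        else:
            hi = mid                # root is in the lower half
    return (lo + hi) / 2


def find_solutions(f, target, xmin, xmax, coarse_steps):
    """Scan a coarse grid, then refine every interval that brackets the target."""
    xs = [xmin + k * (xmax - xmin) / coarse_steps for k in range(coarse_steps + 1)]
    ys = [f(x) - target for x in xs]   # one expensive evaluation per grid point
    roots = []
    for k in range(coarse_steps):
        if ys[k] == 0:
            roots.append(xs[k])
        elif ys[k] * ys[k + 1] < 0:
            roots.append(refine_root(f, target, xs[k], xs[k + 1]))
    if ys[-1] == 0:
        roots.append(xs[-1])
    return roots
```

Solutions that lie between two grid points without a sign change are missed, so the coarse step still has to be chosen with the shape of f in mind.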

For Loop. Why is it less than < not less than or equal to <=?

Below is a question from a tutorial I'm doing.
Code the first line of a for loop with the usual counter, the usual starting value, and the usual incrementing. Limit the number of loops by the number of elements in the array pets.
My answer is:
for (var i = 0; i <= pets.length; i++) {
The tutorial answer is:
for (var i = 0; i < pets.length; i++) {
Why is it < if we are trying to find the length of the array?
In most programming languages, array indexes start at 0 and not 1. So the first element has index 0, not 1.
Therefore, you need to use less than to compensate for the numbering system.
Cheers
Imagine you have an array of size 1. On the first iteration, i would be zero and fulfill both conditions. On the second, i would only fulfill the <=, but remember you've already looped through every element in the array, so you will likely get an error in your loop for trying to access an element not in your array.
Arrays are indexed starting with 0, and up to arr.length - 1. The last index does not have the same index value as the length of the array. Notice, that by starting at zero and iterating up to the length of the array minus one, the entire length of the array has still been traversed.
You start counting from 0 and not 1. Consider what would happen if you put an equals sign there: the loop would try to access pets[pets.length], which is an out-of-bounds access (an exception in many languages; in JavaScript it yields undefined). pets.length gives you the NUMBER OF ITEMS in the array. What you need is an index. Starting from 0 and not 1, you can go up to pets.length - 1. Hope that clears it up.
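The same index arithmetic can be shown in a few lines of Python. The question itself is about JavaScript, so this is only an analogy: Python raises IndexError for the out-of-range read, whereas JavaScript would return undefined. The pets contents are made up for the example.

```python
pets = ["cat", "dog", "fish"]

# Valid indexes run from 0 to len(pets) - 1, so range(len(pets))
# visits every element exactly once.
visited = [pets[i] for i in range(len(pets))]

# Using len(pets) itself as an index goes one past the last element.
try:
    pets[len(pets)]
    overran = False
except IndexError:
    overran = True
```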

Find all number pairs in a given range

I have N numbers, let's say 20 30 15 30 30 40 15 20. Now I want to find how many number pairs are in a given range (L and R given).
A number pair means that both numbers are the same.
My approach:
Create a Map of Array, such that key of map= number, and value=ArrayList of indexes at which that number appears. Then I traverse from L to R and for each value in that range I traverse in the corresponding arraylist to find if there is a pair that fits in range, and then increment count.
But I think this approach is too slow. Is there some faster method to do the same?
Example: for above given sequence and L=0 and R=6
Answer=5. Possible pairs are 1 for 20, 1 for 15 and 3 for 30.
I am developing a solution assuming the numbers can be up to 10^8 (and are non-negative).
If you are looking for speed and don't care about memory there's maybe a better way.
You can use a set as an auxiliary data structure to see if a number was found, and then simply walk the array. Pseudo code:
int numPairs = 0;
set setVisited;
for (int i = L; i < R; i++) {
    if (setVisited.contains(a[i])) {
        // found the second of a pair. count it up and reset.
        numPairs++;
        setVisited.remove(a[i]);
    } else {
        // remember that we saw this number, so we can spot the next pair.
        setVisited.add(a[i]);
    }
}
New solution... hopefully better this time. Pseudo C-ish code:
// Sort the sub-array a[L..R]. This can be done O(nlogn) using qsort.
// ... code omitted ...
// Walk through the sorted array counting how many times number occurs.
// When the number changes, count how many possibles ways to make pairs
// from the given count.
int totalPairs = 0;
int count = 1;
int current = a[L];
for (i = L+1; i < R; i++) {
    if (a[i] == current) { // found another, keep counting
        count++;
    } else { // found a different one
        // a run of count equal values yields count*(count-1)/2 pairs
        totalPairs += count * (count - 1) / 2;
        // start counting the new one
        current = a[i];
        count = 1;
    }
}
// count the final one
totalPairs += count * (count - 1) / 2;
The sort runs in O(n lg n), and the loop runs in O(n). A run of count equal values contributes count * (count - 1) / 2 pairs, which is 0 for a single value, so no special case is needed.
Also, if this array gets big, the total can overflow a 32-bit int, so use a wider integer type for it.
Cool problem!
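For comparison, the same count can be obtained without sorting by tallying occurrences in a hash map. The Python sketch below is an illustration only: the function name is made up, and it assumes L and R are 0-based, inclusive indices. It relies on the fact that c equal values form c * (c - 1) / 2 pairs.

```python
from collections import Counter


def count_equal_pairs(a, left, right):
    """Count pairs of equal values among a[left..right], both ends inclusive."""
    counts = Counter(a[left:right + 1])
    # c occurrences of a value give "c choose 2" pairs.
    return sum(c * (c - 1) // 2 for c in counts.values())
```

This takes O(R - L) expected time per query; Python integers do not overflow, so no wider type is needed for the total.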

Number of distinct sequences of fixed length which can be generated using a given set of numbers

I am trying to find different sequences of fixed length which can be generated using the numbers from a given set (distinct elements) such that each element from set should appear in the sequence. Below is my logic:
eg. Let the set consists of S elements, and we have to generate sequences of length K (K >= S)
1) First we have to choose S places out of K and place each element from the set in random order. So, C(K,S)*S!
2) After that, remaining places can be filled from any values from the set. So, the factor
(K-S)^S should be multiplied.
So the overall result is
C(K,S) * S! * ((K-S)^S)
But I am getting the wrong answer. Please help.
PS: C(K,S) : No. of ways selecting S elements out of K elements (K>=S) irrespective of order. Also, ^ : power symbol i.e 2^3 = 8.
Here is my code in python:
# m is the no. of elements to select from a set of n elements
# fact is a list containing factorial values, i.e. fact[0] = 1, fact[3] = 6 & so on.
def ways(m,n):
    res = fact[n]/fact[n-m+1]*((n-m)**m)
    return res
What you are looking for is the number of surjective functions whose domain is a set of K elements (the K positions that we are filling out in the output sequence) and the image is a set of S elements (your input set). I think this should work:
static int Count(int K, int S)
{
    int sum = 0;
    for (int i = 1; i <= S; i++)
    {
        sum += Pow(-1, (S-i)) * Fact(S) / (Fact(i) * Fact(S - i)) * Pow(i, K);
    }
    return sum;
}
...where Pow and Fact are what you would expect.
Check out this math.se question.
Here's why your approach won't work. I didn't check the code, just your explanation of the logic behind it, but I'm pretty sure I understand what you're trying to do. Let's take for example K = 4, S = {7,8,9}. Let's examine the sequence 7,8,9,7. It is a unique sequence, but you can get to it by:
Randomly choosing positions 1,2,3, filling them randomly with 7,8,9 (your step 1), then randomly choosing 7 for the remaining position 4 (your step 2).
Randomly choosing positions 2,3,4, filling them randomly with 8,9,7 (your step 1), then randomly choosing 7 for the remaining position 1 (your step 2).
By your logic, you will count it both ways, even though it should be counted only once as the end result is the same. And so on...
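Since the question's own code is in Python, here is a sketch of the same inclusion-exclusion sum in that language. It uses math.comb (Python 3.8+) for the binomial coefficient; the function name is hypothetical.

```python
from math import comb


def count_surjective_sequences(k, s):
    """Number of length-k sequences over s distinct values that use every value.

    Inclusion-exclusion: sum over i of (-1)^(s-i) * C(s, i) * i^k.
    """
    return sum((-1) ** (s - i) * comb(s, i) * i ** k for i in range(1, s + 1))
```

For the K = 4, S = {7,8,9} example above this gives 3^4 - 3*2^4 + 3*1^4 = 36, and it gives 0 whenever K < S, as expected.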

Find max subset of a huge set of integers

I have a huge set (S) of long unsigned integers in a .txt file. How can I find the max subset (Pmax) of S with the following property:
P(X1,X2,X3,...,Xn) | X1>=(Xn/4)
More details:
When I say max subset I mean the subset with the greatest number of elements(n->max).
I can't load .txt into an array because of limited memory.
My system memory is 200MB
txt file has 10^6 integers. Each integer can be long unsigned 32bit.
I need to find the biggest subset of S with the condition:
X1 < X2 < X3 < ... < Xn-1 < Xn such that X1 >= (Xn/4)
For example if the txt file has the following:
15,14,13,4,2,2,3,10,1,2,2
then these are the possible subsets:
P1(4,10,13,14,15)
P2(3,4,10)
P3(1,2,2,2,2,3,4)
so Pmax(1,2,2,2,2,3,4) because it has more elements.
In fact I don't want to find exactly which is the Pmax. I just want to find the number of elements of the subset Pmax. So here it is 7.
The algorithm should be really fast.
I'm not looking for someone to do my work. I just need a corresponding known problem so I can look up an efficient solution. Thanks in advance!!!
Assuming your condition means "where all elements in the subset are larger than X1 divided by 4" you'd need 2 simple nested loops and some helper variables.
In pseudocode something like this should work:
var idx = 0, largest = 0, currentIdx = 0;
while(var current = getIntegerFromFileById(currentIdx))
{
    var size = 1;
    while(getIntegerFromFileById(currentIdx + size++) > current / 4);
    if(size > largest) {
        idx = currentIdx;
        largest = size;
    }
    currentIdx++;
}
print "Longest subset is at index {idx}.";
print "It contains {largest} consecutive elements.";
This is also the de facto optimal implementation. The most obvious optimization would be to load the integers progressively in an in-memory buffer during the scan to prevent double I/O operations.
In case I misunderstood the condition this should still be easily adaptable to most other conditions, the surrounding algorithm stays the same, you just modify the condition in the inner while.
The easiest solution is:
Sort the list first (complexity O(n log n)).
With a moving window, find the largest acceptable window (complexity O(n)).
Complexity: O(n log n).
More details about step 2:
Let low keep track of the lowest element and high the highest element.
Initialization: Set low to the first element. Do a binary search for 4*x[low]; the last element that does not exceed it is your high location. Set maxWindow=high-low+1.
At every step: Increment high by 1, and increment low until x[low]>=x[high]/4. Calculate the number of elements = high-low+1, and update maxWindow accordingly.
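A minimal Python sketch of the sort-plus-moving-window approach follows. The function name is made up, and the sketch assumes the values have already been read into a list: 10^6 32-bit integers are about 4 MB of raw data, although a plain Python list of ints needs considerably more than that.

```python
def largest_subset_size(values):
    """Size of the largest subset whose minimum is at least max / 4."""
    xs = sorted(values)
    best = 0
    low = 0
    for high in range(len(xs)):
        # Advance low until the smallest element in the window is at least
        # a quarter of the largest. Comparing 4 * xs[low] with xs[high]
        # avoids fractional division. low never passes high because
        # 4 * xs[high] >= xs[high] for non-negative values.
        while 4 * xs[low] < xs[high]:
            low += 1
        best = max(best, high - low + 1)
    return best
```

Both pointers only move forward, so the window pass is O(n) after the O(n log n) sort, and no separate binary search is needed for the initial window.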
