Adjust, sort and average numbers in different values


I have a set of numbers like:
273 275,91; 30005; 0; 2738; 250,9371; 25; etc...
Result of adjusting:
25000, 25093, 27327, 27380, 30005, 0
Result of sorting:
30005, 27380, 27327, 25093, 25000
Once adjusted, they all have roughly the same magnitude, in this case 25 000-30 005. I need to take the average and also sort them, but they first need to be brought to a similar magnitude. I also want to exclude zeros from the average.
They are always within 50-150% of the average.

C7:
=AVERAGEIF(B5:H5; ">0")
C8:
=SORT(FLATTEN(B2:H5))
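The formulas above average and sort values that are already in the sheet; they do not perform the "adjusting" step. One possible way to do that step, sketched in Python under stated assumptions: each non-zero value is multiplied or divided by 10 until it falls in a target window, here assumed to be [10 000, 100 000), and then truncated to an integer. It also assumes "273 275,91" means 273275.91 (space as thousands separator, comma as decimal separator). The function names are hypothetical.

```python
def adjust(value: float, low: float = 10_000) -> int:
    """Scale value by powers of 10 into [low, 10 * low), then truncate.

    Zero cannot be scaled, so it is returned unchanged. Assumes value >= 0.
    """
    if value < 0:
        raise ValueError("negative values are not supported")
    if value == 0:
        return 0
    high = low * 10
    while value < low:
        value *= 10
    while value >= high:
        value /= 10
    return int(value)


def average_nonzero(values: list[int]) -> float:
    """Mean of the non-zero values, like AVERAGEIF(range; ">0")."""
    nonzero = [v for v in values if v != 0]
    return sum(nonzero) / len(nonzero)


raw = [273275.91, 30005, 0, 2738, 250.9371, 25]
adjusted = [adjust(v) for v in raw]               # [27327, 30005, 0, 27380, 25093, 25000]
print(sorted(adjusted, reverse=True))             # descending order, zero last
print(average_nonzero(adjusted))
```

The window is the main tuning knob: since the values are said to stay within 50-150% of the average, any decade-wide window that contains that band should work.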

Related

Algorithms and probability related question

We are given a function rand() that returns a random number in the interval [0,1].
How can we use this function to create a uniformly shuffled array of size 100
containing exactly 50 0s and 50 1s?

I don't get the question: if you want random numbers, why does it have to be exactly 50/50, which is merely the most likely outcome? You can never guarantee a 50/50 split from independent random draws, because that would contradict the idea of randomness.
Simply create an array filled with 50 of each and shuffle it with a proper shuffling algorithm. Also keep in mind that the random number generators in most programming languages are pseudo-random, not truly random.

Create an array of size 100, set the first 50 elements to 0 and the next 50 to 1. Then swap random pairs of elements n times to shuffle it. For example:
i = floor(100 * rand()), j = floor(100 * rand()) (clamped to 99 in case rand() returns exactly 1), then swap array[i] <--> array[j]
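A common alternative to n random swaps is the Fisher-Yates shuffle, which, given an ideal rand(), makes every arrangement equally likely after exactly 99 swaps. Below is a minimal Python sketch; the local rand() is a stand-in for the one in the question, implemented with random.random(), which returns a float in [0, 1).

```python
import random


def rand() -> float:
    # Stand-in for the question's rand(); random.random() returns a float in [0, 1).
    return random.random()


def fifty_fifty(n_zeros: int = 50, n_ones: int = 50) -> list:
    """Return n_zeros 0s and n_ones 1s in uniformly random order (Fisher-Yates)."""
    arr = [0] * n_zeros + [1] * n_ones
    # Walk from the last slot down, swapping each slot with a uniformly chosen
    # slot at or before it.
    for i in range(len(arr) - 1, 0, -1):
        j = min(int(rand() * (i + 1)), i)  # clamp in case rand() can return exactly 1.0
        arr[i], arr[j] = arr[j], arr[i]
    return arr


print(fifty_fifty())
```

The counts are fixed by construction; only the order is random, which is exactly what the question asks for.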

Non-linear comparison sorting / scoring

I have an array I want to sort based on assigning a score to each element in the array.
Let's say the possible score range is 0-100. And to get that score we are going to use 2 comparison data points, one with a weighting of 75 and one with a weighting of 25. Let's call them valueA and valueB. And we will transpose each value into a score. So:
valueA (range = 0-10,000)
valueB (range = 0-70)
scoreA (range = 0 - 75)
scoreB (range = 0 - 25)
scoreTotal = scoreA + scoreB (0 - 100)
Now the question is how to transpose valueA to scoreA in a non-linear way, with heavier weighting for being close to the min value. What I mean is that for valueA, 0 would give a perfect score (75), a value of say 20 would give a mid-point score of 37.5, a value of say 100 would give a very low score of say 5, and everything greater would trend towards 0 (e.g. a value of 5,000 would score essentially 0). Ideally I could set up a curve with a few data points (say 4 quartile points) and the algorithm would fit to that curve. Or maybe the simplest solution is to create a bunch of points on the curve (say 10) and interpolate linearly between each of those 10 points? But I'm hoping there is a much simpler algorithm that doesn't require figuring out all the points on the curve myself and then tweaking 10+ variables. I'd rather have 1 or 2 inputs that define how steep the curve is. Is that possible?
I don't need something super complex or accurate, just a simple algorithm so there is greater weighting for being close to the min of the range, and way less weighting for being close to the max of the range. Hopefully this makes sense.
My stats math is so rusty I'm not even sure what this is called for searching for a solution. All those years of calculus and statistics for naught.
I'm implementing this in Objective C, but any c-ish/java-ish pseudo code would be fine.

A function you may want to try is
max / [(log(x+2)/log(2))^N]
where max is either 75 or 25 in your case. The log(x+2)/log(2) part ensures that f(0) == max (you can substitute log(x+C)/log(C) here for any C > 1; a higher C will slow the curve's descent). The exponent N determines how quickly the function drops towards 0; plotting it for a few values of N gives a good picture of what's going on.
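A small Python sketch of that function, with one way to choose N: pick the x at which the score should be half of max (the question suggests 20) and solve for N. The names score and n_for_midpoint are hypothetical; c plays the role of C above and must be greater than 1.

```python
import math


def score(x: float, max_score: float, n: float, c: float = 2.0) -> float:
    """max_score / (log(x + c) / log(c)) ** n; equals max_score at x == 0."""
    if c <= 1:
        raise ValueError("c must be greater than 1")
    return max_score / (math.log(x + c) / math.log(c)) ** n


def n_for_midpoint(x_mid: float, c: float = 2.0) -> float:
    """Choose n so that score(x_mid) == max_score / 2 (requires x_mid > 0)."""
    # Solve (log(x_mid + c) / log(c)) ** n == 2 for n.
    return math.log(2) / math.log(math.log(x_mid + c) / math.log(c))


n = n_for_midpoint(20)
for x in (0, 20, 100, 5000):
    print(x, round(score(x, 75, n), 2))
```

With c = 2, pinning the midpoint at 20 leaves score(100) well above the 5 the question hopes for, because this curve has a heavy tail; in practice you would tune c and N together (or raise N) until the curve matches the points you care about.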

Algorithm to smooth numbers with variable input time

I have an app that accepts integers at a variable rate, every 0.25 to 2 seconds.
I'd like to output the data in a smoothed format for 3, 5 or 7 seconds depending on user input.
If the data always came in at the same rate, let's say every 0.25 seconds, then this would be easy. The variable rate is what confuses me.
Data might come in like this:
Time - Data
0.25 - 100
0.50 - 102
1.00 - 110
1.25 - 108
2.25 - 107
2.50 - 102
etc...
I'd like to display a 3 second rolling average every 0.25 seconds on my display.
The simplest form of doing this is to put each item into an array with a time stamp.
array.push([0.25, 100])
array.push([0.50, 102])
array.push([1.00, 110])
array.push([1.25, 108])
etc...
Then every 0.25 seconds I would read through the array, back to front, until I got to a time that was less than now() - rollingAverageTime. I would sum those values, divide by their count and display the result. I would then shift() the expired entries off the beginning of the array.
That doesn't seem very efficient, though. I was wondering if someone had a better way to do this.

Why don't you save the timestamp of the starting value and then accumulate the values and the number of samples until you get a timestamp that is >= startingTime + rollingAverageTime and then divide the accumulator by the number of samples taken?
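A minimal Python sketch of this block (non-overlapping) average. block_averages is a hypothetical name; it assumes samples arrive as (timestamp, value) pairs in time order, that the sample which closes a block starts the next one, and that a trailing incomplete block is not emitted.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def block_averages(samples: Iterable[tuple[float, float]], window: float) -> Iterator[float]:
    """Yield the mean of each consecutive block of `window` seconds."""
    start = None  # timestamp of the first sample in the current block
    total = 0.0   # accumulator
    count = 0     # number of samples in the current block
    for t, value in samples:
        if start is None:
            start = t
        if t >= start + window:
            yield total / count  # block finished: emit its mean
            start, total, count = t, 0.0, 0
        total += value
        count += 1


data = [(0.25, 100), (0.50, 102), (1.00, 110), (1.25, 108), (2.25, 107), (2.50, 102)]
print(list(block_averages(data, 1.0)))
```

This is cheap (one addition per sample) but only produces a new value once per window, not every 0.25 seconds.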
EDIT:
If you want to keep a rolling window of samples instead, you can do it this way:
Keep an accumulator; for each input value, add it to the accumulator and store the value and its timestamp in a shift register. At every cycle, compare the latest sample's timestamp with the oldest timestamp in the shift register plus the smoothing time; while it's equal or greater, subtract the oldest saved value from the accumulator and delete that entry from the shift register. Then output the accumulator divided by the number of samples left in the shift register. Iterating this gives a rolling average with (I think) the least amount of computation per cycle:
a sum (to increment the accumulator)
a sum and a subtraction (to compare the timestamp)
a subtraction (from the accumulator)
a division (to calculate the average; if the divisor is a power of two, it can be a right shift)
For a total of about 4 algebraic sums and a division (or shift)
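A minimal Python sketch of the shift-register approach, using collections.deque as the register and dividing the accumulator by the number of samples currently in the window to get a mean. RollingAverage and its methods are hypothetical names; timestamps are assumed to be in seconds and non-decreasing.

```python
from __future__ import annotations

from collections import deque


class RollingAverage:
    """Mean of the samples whose timestamps fall within the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.samples: deque[tuple[float, float]] = deque()  # (timestamp, value), oldest first
        self.total = 0.0  # accumulator: sum of the values currently in `samples`

    def add(self, t: float, value: float) -> None:
        self.samples.append((t, value))
        self.total += value
        self._expire(t)

    def _expire(self, now: float) -> None:
        # Drop samples that are `window` seconds or more older than `now`.
        while self.samples and self.samples[0][0] <= now - self.window:
            _, old = self.samples.popleft()
            self.total -= old

    def mean(self, now: float) -> float | None:
        self._expire(now)
        if not self.samples:
            return None  # nothing inside the window
        return self.total / len(self.samples)
```

Usage for the question's case would be RollingAverage(3.0), calling add() for each incoming reading and mean(now) from the 0.25-second display timer. Each sample is added once and removed once, so the cost per reading is constant.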
EDIT:
To take into account the time since the previous sample as a weighting factor, you can multiply each value by the ratio between that interval and the averaging time; summing these gives an already weighted average, without having to divide the accumulator.
I added this part because it doesn't add computational load, so it's easy to implement if you want to.

The answer from clabacchio has the basics right, but perhaps you need a slightly more sophisticated answer.
Calculating the average:
0.25 - 100
0.50 - 102
1.00 - 110
In the above subset of the data what is the answer you want? You could use the mean of these numbers or you could do it in a weighted fashion. You could convert the data into:
0.50 - 0.25 = 0.25 ---- (100+102)/2 = 101
1.00 - 0.50 = 0.50 ---- (102+110)/2 = 106
Then you can take the weighted average of these values, weight being the time difference, and value being the average value.
The final answer = (0.25*101 + 0.5*106)/(0.25+0.5) = 78.25/0.75 ≈ 104.33.
Now coming to "moving" averages:
You can either use previous k values or previous k seconds worth of data. In both cases you can keep two sums: weighted sum and sum of weights.
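A minimal Python sketch of the "two sums" idea for the previous-k-seconds case, using the same weighting as the example: each interval between consecutive samples contributes the mean of its two endpoint values, weighted by its length. TimeWeightedAverage is a hypothetical name, and as a simplification an interval is dropped whole once it ends outside the window rather than being clipped.

```python
from __future__ import annotations

from collections import deque


class TimeWeightedAverage:
    """Time-weighted moving average over roughly the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.intervals: deque[tuple[float, float, float]] = deque()  # (end_time, weight, value)
        self.weighted_sum = 0.0
        self.weight_sum = 0.0
        self.last: tuple[float, float] | None = None  # previous (timestamp, value)

    def add(self, t: float, value: float) -> None:
        if self.last is not None:
            t0, v0 = self.last
            dt = t - t0                # weight: time difference
            mid = (v0 + value) / 2     # value: average of the two endpoints
            self.intervals.append((t, dt, mid))
            self.weighted_sum += dt * mid
            self.weight_sum += dt
        self.last = (t, value)
        # Drop whole intervals that ended `window` seconds or more ago.
        while self.intervals and self.intervals[0][0] <= t - self.window:
            _, dt, mid = self.intervals.popleft()
            self.weighted_sum -= dt * mid
            self.weight_sum -= dt

    def mean(self) -> float | None:
        return self.weighted_sum / self.weight_sum if self.weight_sum > 0 else None
```

Feeding it the three samples above reproduces (0.25*101 + 0.5*106)/(0.25+0.5).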

So... the worst case is 4 readings per second over 7 seconds = 28 values in your array to process. Processing that many values takes a negligible amount of time, so it's not worth optimizing IMHO.

Create multiple combinations summing to 100

I would like to be able to create multiple combinations that sum to 100%, given a defined number of "buckets" and a defined 'difference factor' (step size). In the example below, the difference is a factor of 20 to keep it simple, but I will probably reduce it to 1 in the final solution.
For example, with 3 "buckets" A, B, C you could have:
A 100 80 80 60 60 ... 0
B 0 20 0 20 40 ... 0
C 0 0 20 20 0 ... 100
Each column is one combination (summing to 100) that I would like to store and do further calculations on.
This is a business problem and not homework.
Please help me come up with a solution. A brute-force way would be to create a multi-dimensional array for every possible combination, e.g. 100x100x100, and then go through each of the 1 million combinations to see which ones sum to 100. However, this looks like it would be far too inefficient.
Much appreciated. I hope I have explained clearly enough.

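This is a problem of integer partitions (strictly, weak compositions, since the order of the buckets matters and empty buckets are allowed) rather than combinations, which are something different.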
First off: the 'difference factor' just turns the problem from finding partitions of 100 to (in your example) finding partitions of 5 (then multiplying by 20).
Next up: If the number of buckets is constant, you can just do (pseudo code):
for i = 0 to n
    for j = 0 to n-i
        output (i, j, n-(i+j))
If the number of buckets is going to be dynamic, you'd have to be a bit cleverer, but this approach will basically work.
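One way to handle a dynamic number of buckets is to generalise the nested loops into recursion: give some amount to the first bucket, then distribute the remainder over the remaining buckets. A Python sketch, where compositions is a hypothetical name and total is assumed to be a multiple of step:

```python
from __future__ import annotations

from collections.abc import Iterator


def compositions(total: int, buckets: int, step: int = 1) -> Iterator[tuple[int, ...]]:
    """Yield every tuple of `buckets` non-negative multiples of `step` summing to `total`."""
    if total % step:
        raise ValueError("total must be a multiple of step")
    n = total // step  # e.g. 100 // 20 == 5 units to distribute

    def rec(units_left: int, buckets_left: int) -> Iterator[tuple[int, ...]]:
        if buckets_left == 1:
            yield (units_left * step,)  # the last bucket takes whatever remains
            return
        for units in range(units_left, -1, -1):
            for rest in rec(units_left - units, buckets_left - 1):
                yield (units * step,) + rest

    yield from rec(n, buckets)


for combo in compositions(100, 3, step=20):
    print(combo)  # (100, 0, 0), (80, 20, 0), (80, 0, 20), (60, 40, 0), ...
```

Only valid combinations are generated, so nothing is filtered afterwards: with three buckets and a step of 1 there are 5,151 of them, compared with the million candidates in the brute-force approach.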

This looks like it would yield well to a bit of caching and dynamic programming.
fun partition(partitions_left, value):
    if value == 0:
        return list containing one list of partitions_left 0 elements
    if partitions_left == 0:
        return empty_list
    return_value = empty_list
    for possible_value from value downto 1:
        remainder = value - possible_value
        children = partition(partitions_left - 1, remainder)
        for child in children:
            append (cons of possible_value and child) to return_value
    return return_value
If you also make sure that you serve already-computed values from the cache (memoization), "all" you then need to do is generate all distinct permutations of the generated partitions.
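A direct Python translation of the pseudocode, with functools.lru_cache providing the cache and a set removing duplicate permutations. The value == 0 check comes before the partitions_left == 0 check so that a partition ending exactly at zero yields one result instead of none. all_arrangements is a hypothetical helper, and generating permutations grows factorially, so this suits only a handful of buckets.

```python
from __future__ import annotations

from functools import lru_cache
from itertools import permutations


@lru_cache(maxsize=None)
def partition(partitions_left: int, value: int) -> tuple[tuple[int, ...], ...]:
    """Sequences of positive parts (padded with trailing zeros) summing to value."""
    if value == 0:
        return ((0,) * partitions_left,)  # one way: fill the remaining slots with zeros
    if partitions_left == 0:
        return ()  # value left over but no slots: no solutions
    result = []
    for possible_value in range(value, 0, -1):
        for child in partition(partitions_left - 1, value - possible_value):
            result.append((possible_value,) + child)
    return tuple(result)  # tuples are immutable, so cached results cannot be modified


def all_arrangements(buckets: int, value: int) -> set[tuple[int, ...]]:
    """Every distinct ordering of every generated partition."""
    return {p for part in partition(buckets, value) for p in permutations(part)}


print(sorted(all_arrangements(3, 5), reverse=True))
```

Multiplying each entry by the step (20 in the example) turns these back into percentages.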

Algorithm-wise, you could make a list of all the numbers between 0 and 100 in steps of 20 (list A), then make a copy of list A as list B.
Next, compare each of list A's values with list B's values to see which pairs add up to 100 or less, and store a record of these in list C. Then do the same with list C (checking it against all the values between 0 and 100 in steps of 20) to see which values add up to exactly 100.

Algorithm For Ranking Items

I have a list of 6500 items that I would like to trade or invest in. (Not for real money, but for a certain game.) Each item has 5 numbers that will be used to rank it among the others.
Total quantity of item traded per day: The higher this number, the better.
The Donchian Channel of the item over the last 5 days: The higher this number, the better.
The median spread of the price: The lower this number, the better.
The spread of the 20 day moving average for the item: The lower this number, the better.
The spread of the 5 day moving average for the item: The higher this number, the better.
All 5 numbers have the same 'weight'; in other words, they should all contribute equally to the final number.
At the moment, I just multiply all 5 numbers for each item, but it doesn't rank the items the way I would like them to be ranked. I just want to combine all 5 numbers into a single weighted number that I can use to rank all 6500 items, but I'm unsure how to do this correctly or mathematically.
Note: The total quantity of the item traded per day and the Donchian channel are much larger numbers than the spreads, which are more like percentages. This is probably why multiplying them all together didn't work for me; the quantity traded per day and the Donchian channel played a much bigger role in the final number.

The reason people are having trouble answering this question is we have no way of comparing two different "attributes". If there were just two attributes, say quantity traded and median price spread, would (20million,50%) be worse or better than (100,1%)? Only you can decide this.
Converting everything into the same size numbers could help, this is what is known as "normalisation". A good way of doing this is the z-score which Prasad mentions. This is a statistical concept, looking at how the quantity varies. You need to make some assumptions about the statistical distributions of your numbers to use this.
Things like spreads are probably roughly normally distributed, i.e. bell-shaped. For these, as Prasad says, take z(spread) = (spread-mean(spreads))/standardDeviation(spreads).
Things like the quantity traded might follow a power-law distribution. For these you might want to take the log() before calculating the mean and sd. That is, the z-score is z(qty) = (log(qty)-mean(log(quantities)))/sd(log(quantities)).
Then just add up the z-score for each attribute.
To do this for each attribute you will need to have an idea of its distribution. You could guess, but the best way is to plot a graph and have a look. You might also want to plot graphs on log scales. See Wikipedia for a long list of probability distributions.

You can replace each attribute-vector x (of length N = 6500) by the z-score of the vector Z(x), where
Z(x) = (x - mean(x))/sd(x).
This would transform them into the same "scale", and then you can add up the Z-scores (with equal weights) to get a final score, and rank the N=6500 items by this total score. If you can find in your problem some other attribute-vector that would be an indicator of "goodness" (say the 10-day return of the security?), then you could fit a regression model of this predicted attribute against these z-scored variables, to figure out the best non-uniform weights.
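A minimal Python sketch of the equal-weight z-score ranking using the standard statistics module. rank_by_zscore and its parameters are hypothetical. The higher_is_better flags encode the direction given in the question (z-scores of lower-is-better attributes are negated so that a bigger total is always better), and log_columns applies the log() transform suggested for power-law-like attributes (those values must be positive). Using the population standard deviation (pstdev) is an assumed choice; with 6500 items it makes no practical difference.

```python
from __future__ import annotations

import math
from collections.abc import Collection, Sequence
from statistics import mean, pstdev


def zscores(xs: Sequence[float]) -> list[float]:
    m, sd = mean(xs), pstdev(xs)
    if sd == 0:
        return [0.0] * len(xs)  # constant attribute: carries no ranking information
    return [(x - m) / sd for x in xs]


def rank_by_zscore(items: Sequence[Sequence[float]],
                   higher_is_better: Sequence[bool],
                   log_columns: Collection[int] = ()) -> list[int]:
    """Return item indices, best first, by the sum of signed per-attribute z-scores."""
    totals = [0.0] * len(items)
    for k, higher in enumerate(higher_is_better):
        column = [item[k] for item in items]
        if k in log_columns:
            column = [math.log(x) for x in column]
        sign = 1.0 if higher else -1.0
        for i, z in enumerate(zscores(column)):
            totals[i] += sign * z
    return sorted(range(len(items)), key=lambda i: totals[i], reverse=True)


# Directions from the question: quantity, Donchian, median spread, 20d spread, 5d spread.
DIRECTIONS = (True, True, False, False, True)
```

For the regression idea above, the summed z-scores would be replaced by a weighted sum whose weights come from the fitted model.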

Start each item with a score of 0. For each of the 5 numbers, sort the list by that number from best to worst and add each item's position in that sorting to its score. Then just sort the items by their combined score, lowest first.
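A Python sketch of this rank-sum approach; rank_sum is a hypothetical name, and higher_is_better is an assumed parameter that tells it which direction counts as "best" for each attribute. Only the ordering of each attribute matters, so differences in scale between the attributes no longer have any effect.

```python
from __future__ import annotations

from collections.abc import Sequence


def rank_sum(items: Sequence[Sequence[float]], higher_is_better: Sequence[bool]) -> list[int]:
    """Return item indices, best first, by the sum of per-attribute positions."""
    scores = [0] * len(items)
    for k, higher in enumerate(higher_is_better):
        # Best-first ordering for attribute k.
        order = sorted(range(len(items)), key=lambda i: items[i][k], reverse=higher)
        for position, i in enumerate(order):
            scores[i] += position  # 0 for the best item in this attribute
    return sorted(range(len(items)), key=lambda i: scores[i])
```

Ties within an attribute receive different positions here (Python's sort is stable); giving tied items their average position would be a fairer refinement.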

You would usually normalize your data entries to their respective range. Since there is no fixed range for them, you'll have to use a sliding range - or, to keep it simpler, normalize them to the daily ranges.
For each day, get all entries for a given type, get the highest and the lowest of them, determine the difference between them. Let Bottom=value of the lowest, Range=difference between highest and lowest. Then you calculate for each entry (value - Bottom)/Range, which will result in something between 0.0 and 1.0. These are the numbers you can continue to work with, then.
Pseudocode (brackets replaced by indentation to make easier to read):
double maxvalues[5];
double minvalues[5];
// init arrays with the first item
for (i = 0; i < 5; i++)
    maxvalues[i] = items[0][i];
    minvalues[i] = items[0][i];
// find minimum and maximum values
foreach (items as item)
    for (i = 0; i < 5; i++)
        if (minvalues[i] > item[i])
            minvalues[i] = item[i];
        if (maxvalues[i] < item[i])
            maxvalues[i] = item[i];
// now scale them - in this case, to the range of 0 to 1.
double scaledItems[count(items)][5];
for (i = 0; i < 5; i++)
    double delta = maxvalues[i] - minvalues[i];
    for (j = count(items) - 1; j >= 0; --j)
        // linear normalization; if all values are equal (delta == 0), map them to 0
        scaledItems[j][i] = (delta == 0) ? 0 : (items[j][i] - minvalues[i]) / delta;
Something like that. It'll be more elegant with a good library (STL, Boost, whatever you have on the implementation platform), and the normalization should be in a separate function, so you can replace it with other variations like log() as the need arises.
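The same min-max normalization in Python, with the normalization kept in its own function as suggested. As an assumption beyond the pseudocode, rank_by_min_max flips lower-is-better attributes (1 - scaled) so that 1.0 always means "best", and then adds the five scaled values with equal weight; both function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence


def min_max_scale(column: Sequence[float]) -> list[float]:
    """Linearly map column onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    delta = hi - lo
    if delta == 0:
        return [0.0] * len(column)
    return [(x - lo) / delta for x in column]


def rank_by_min_max(items: Sequence[Sequence[float]], higher_is_better: Sequence[bool]) -> list[int]:
    """Return item indices, best first, by the sum of scaled attributes."""
    totals = [0.0] * len(items)
    for k, higher in enumerate(higher_is_better):
        scaled = min_max_scale([item[k] for item in items])
        for i, s in enumerate(scaled):
            totals[i] += s if higher else 1.0 - s  # flip so that 1.0 is always "best"
    return sorted(range(len(items)), key=lambda i: totals[i], reverse=True)
```

One caveat of min-max scaling is that a single extreme value (common with traded quantities) squeezes everything else towards 0; swapping min_max_scale for a log-based or z-score variant addresses that.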

Total quantity of item traded per day: The higher this number, the better. (a)
The Donchian Channel of the item over the last 5 days: The higher this number, the better. (b)
The median spread of the price: The lower this number, the better. (c)
The spread of the 20 day moving average for the item: The lower this number, the better. (d)
The spread of the 5 day moving average for the item: The higher this number, the better. (e)
a + b -c -d + e = "score" (higher score = better score)
