Finding the interval with the highest summed count?

Given a set of entries, each containing a time index and an int count value,
i.e.
class Entry
{
time:int
count:int
}
write a function that returns the time interval with the highest total count,
i.e.,
if we had entries
100, 2
100, 1
110, 10
200, 4
1000, 3
1200, 8
and we ran something like
int highestInterval(int interval_range)
highestInterval( 50 )
it would return 100, because in 100-150, you have counts 2, 1, and 10.
I managed to get an O(n^2) solution for it, but I think there's a better one. It might involve some preprocessing of the interval buckets, but I can't figure it out.

It seems that you made it already using two for loops so it is a mere question of improvement.
Here goes one possible solution:
CODE:
raw_data=[100,2;
100,1;
110,10;
200,4;
1000,3;
1200,8];
[max_val,indx]=max(cell2mat(arrayfun(@(A) sum(raw_data(abs(raw_data(A,1)-raw_data(:,1))<50,2)),1:size(raw_data,1),'UniformOutput',false)));
raw_data(indx,1)
OUTPUT:
ans =
100
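
For comparison, the O(n^2) scan can probably be avoided with a sort followed by a two-pointer sliding window, which costs O(n log n) overall. The following is a minimal sketch, not the answerer's method. It assumes that counts are non-negative (so the best window can start at an entry's time), that the interval is half-open, [t, t + interval_range), and that interval_range is positive. The function name `highest_interval` and the tuple layout are made up for illustration.

```python
def highest_interval(entries, interval_range):
    """Return the start time of the window [t, t + interval_range)
    with the largest summed count. entries is a list of (time, count)."""
    entries = sorted(entries)          # sort by time: O(n log n)
    best_start, best_sum = None, float("-inf")
    window_sum = 0
    right = 0                          # first entry not yet in the window
    for start, count in entries:
        # Grow the window until it covers every entry before start + range.
        while right < len(entries) and entries[right][0] < start + interval_range:
            window_sum += entries[right][1]
            right += 1
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
        # Drop the current entry before the window start moves on.
        window_sum -= count
    return best_start


data = [(100, 2), (100, 1), (110, 10), (200, 4), (1000, 3), (1200, 8)]
print(highest_interval(data, 50))  # 100
```

Each entry enters and leaves the window exactly once, so the scan after sorting is linear.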

Related

Algorithm using a combination of numbers to achieve a target or exceed it in the most efficient way

I am looking into a problem given a list and a target. I can use any number in the list multiple times to achieve the target or slightly exceed it.
It needs to be the most efficient combo. The ones I have been finding try to hit the target and if they can't then we return nothing.
For example, if I have a target of 242 and a list of 40, 100, and 240,
the most efficient would be to use 40 four times and 100 once. That gives us 260.
I tried going down the approach of using remainders. I would start with the largest number, see what remains
Just going down the algo first (not the most efficient way)
242 % 240 --> Quotient: 1, Remainder: 2--> So Use 240 + 240 = 480.
242 % 100 --> Quotient: 2, Remainder: 42 --> Use 100, 100, 100 = 300 --> Better
242 % 40 --> Quotient: 6, Remainder: 2 --> Use 6*40 + 40 = 280 --> Even better.
Try to use a combo
242 % 240 --> Remainder is 2. Try using the next smallest size. 240 + 100 --> 340. Bad
242 % 100 --> Remainder is 42. Try using the next smallest size. 40 + 40. 100 + 100 + 40 + 40. 280. Better.
Last case doesn't matter.
None of these work. I need to determine that 100 + 40 + 40 +40 + 40 = 260. This would be the best.
Do I need to go through every combination of potential values? Any direction would be helpful.

Here is a solution using A* search. It is guaranteed to find the path to the smallest amount over, using the fewest coins, in time polynomial in the target value and the number of denominations. If a greedy solution works, it will get there very quickly. If it has to backtrack, it will backtrack as little as it needs to.
Note that the k hack is there to break ties in all comparisons made by heapq.heappush. In particular, we never want to wind up comparing all the way down to the potential trailing None at the end (which would be a problem).
import heapq
from collections import namedtuple

def min_change (target, denominations):
    denominations = list(reversed(sorted(denominations)))
    k = 0
    CoinChain = namedtuple('CoinChain', ['over', 'est', 'k', 'coins', 'value', 'i', 'prev'])
    queue = [CoinChain(0, target/denominations[0], k, 0, 0, 0, None)]
    found = {}
    while True: # will break out when we have our answer.
        chain = heapq.heappop(queue)
        if target <= chain.value:
            # Found it!
            answer = []
            while chain.prev is not None:
                answer.append(denominations[chain.i])
                chain = chain.prev
            return list(reversed(answer))
        elif chain.value in found and found[chain.value] <= chain.i:
            continue # We can't be better than the solution that was here.
        else:
            found[chain.value] = chain.i # Mark that we've been here.
            i = chain.i
            while i < len(denominations):
                k = k+1
                heapq.heappush(
                    queue,
                    CoinChain(
                        max(chain.value + denominations[i] - target, 0),
                        chain.coins + 1,
                        k,
                        chain.coins + 1,
                        chain.value + denominations[i],
                        i,
                        chain
                    )
                )
                i += 1

print(min_change(242, [40, 100, 240]))

This is actually a kind of knapsack problem, known as the "change-making" problem: https://en.wikipedia.org/wiki/Change-making_problem
The change-making problem addresses the question of finding the minimum number of coins (of certain denominations) that add up to a given amount of money. It is a special case of the integer knapsack problem, and has applications wider than just currency.
An easy way to implement it is backtracking with these rules:
Add the highest possible value to your solution
Sum up the values of your solution
If the threshold was exceeded, replace the last highest value with a lower one. If it is already the lowest possible value, remove it and lower the next value
Example:
current: [] = 0, adding 240
current: [240] = 240, adding 240
current: [240, 240] = 480, replacing with lower
current: [240, 100] = 340, replacing with lower
current: [240, 40] = 280, replacing with lower not possible. Remove and lower next value
current: [100] = 100. Adding highest
current: [100, 240] = 340. Replacing with lower
current: [100, 100] = 200. Adding highest
....
Eventually you get to [100,40,40,40,40] = 260
I just read that there can be amounts that cannot be reached exactly. I suppose these are the rules:
If the value can be reached exactly with the coins, the correct solution is the exact value with the lowest possible number of coins.
If the value cannot be reached exactly, the best solution is the one that exceeds it by the smallest possible difference (and if several solutions give this same value, the one with the fewest coins wins).
Then you just use what I wrote, but you also remember solutions that exceeded the target. When you find a solution that exceeds it, you keep it. When you find a better one (smaller excess, or the same value with fewer coins), it replaces your "best solution".
You have to go through all the possibilities (basically until this algorithm has deleted all the values and cannot do anything more) to find the optimal solution.
You have to remember the best solution so far and return it at the end.
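
The two rules above (smallest excess first, then fewest coins) can also be sketched as a bottom-up dynamic program instead of backtracking. This is an illustrative alternative, not the answerer's code. It assumes positive integer denominations, and the name `min_overshoot_change` is made up. It relies on the fact that the best reachable sum is always below target + min(denominations), because repeating the smallest coin must land somewhere in that range.

```python
def min_overshoot_change(target, denominations):
    """Coins whose sum is the smallest reachable value >= target,
    using as few coins as possible for that sum."""
    limit = target + min(denominations) - 1   # the best sum cannot exceed this
    INF = float("inf")
    fewest = [0] + [INF] * limit   # fewest[s] = min coins summing exactly to s
    last = [None] * (limit + 1)    # last[s] = last coin used to reach s
    for s in range(1, limit + 1):
        for d in denominations:
            if d <= s and fewest[s - d] + 1 < fewest[s]:
                fewest[s] = fewest[s - d] + 1
                last[s] = d
    # First reachable sum at or above the target.
    best = next(s for s in range(target, limit + 1) if fewest[s] < INF)
    coins = []
    while best > 0:                # walk back through the recorded coins
        coins.append(last[best])
        best -= last[best]
    return sorted(coins, reverse=True)


print(min_overshoot_change(242, [40, 100, 240]))  # [100, 40, 40, 40, 40]
```

The running time is proportional to the target times the number of denominations, so it suits moderate targets; for very large targets the search-based approaches may be preferable.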

Ranking algorithm with multiple factors

I'd like to rank items depending on multiple factors, where some factors are more important than others.
Concretely, I've a list of products which all have the following properties:
A price
A weight (in kilogrammes)
A time to build (in minutes)
A size (in centimetres)
Each property has a different scale, and I know the min & max of each range.
For example, prices are between 10 and 200, while weights are between 1.2 and 3.4, etc.
I'd like to give priority to the size, then to the time to build, then the weight, and finally the price.
However, I'd like to ensure that no matter what the time to build, weight or price values are, the size is the first thing that matters.
For example:
[{
    price: 320,
    size: 10,
    weight: 0.4,
    time: 4
},
{
    price: 230,
    size: 5,
    weight: 1.2,
    time: 23
},
{
    price: 230,
    size: 10,
    weight: 1.2,
    time: 23
}]
should result in:
[{
    price: 230,
    size: 5, // the lowest the better
    weight: 1.2,
    time: 23
},
{
    price: 320, // the higher the better
    size: 10,
    weight: 0.4,
    time: 4
},
{
    price: 230,
    size: 10,
    weight: 1.2,
    time: 23
}]
I'm not very good at math and I don't really know where to start.
I'm thinking of something like scaling each value to the same range (for example from 0 to 100), then applying a factor to each scaled value and adding them all up before sorting, or something like that.
Any ideas?

It looks like you want to sort by size in increasing order, then if two objects have the same size then by time in increasing order, then by weight in increasing order, then by price in decreasing order (according to the comment in your example).
If you are using a language like Python, just put the four values into a tuple, negating any value that should sort in decreasing order. For each item your key is
(size, time, weight, -price)
Then Python itself will sort those tuples appropriately--it is built into the language. This is the easiest thing to do.
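
As a small illustration of the tuple key, here is a sketch using the three products from the question. The dictionary layout and the helper name `rank_key` are assumptions made for the example.

```python
products = [
    {"price": 320, "size": 10, "weight": 0.4, "time": 4},
    {"price": 230, "size": 5, "weight": 1.2, "time": 23},
    {"price": 230, "size": 10, "weight": 1.2, "time": 23},
]


def rank_key(product):
    # Tuples compare element by element, so size decides first, then time,
    # then weight; price is negated so that higher prices sort earlier.
    return (product["size"], product["time"], product["weight"], -product["price"])


ranked = sorted(products, key=rank_key)
for product in ranked:
    print(product)
```

With this key, the size-5 product comes first, followed by the size-10 product with the shorter build time, which matches the ordering asked for in the question.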
If you are using a language without sorted tuples, or for some reason you really want the key to be a floating point value, you can do this. For each factor, look at the known minimum and maximum. Use those to scale that factor to a number between 1 and 10, perhaps including 1 but not including 10. Make sure 1 goes with the value to be sorted first. This can be done with
scale1to10 = (value - min) / (max + 1 - min) * 9 + 1
for increasing factors, and with
scale1to10 = (max - value) / (max + 1 - min) * 9 + 1
for decreasing factors. Then combine all the factors into one "4-digit" number, as in
scale = scalesize * 1000 + scaletime * 100 + scaleweight * 10 + scaleprice
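Note that the most important factor is multiplied by the highest power of 10, while the lowest has no multiplier. For the "digits" not to bleed into each other, round each scaled factor down to a whole number first; the cost is that values falling into the same bucket are treated as ties.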

Your sorting function should have the ability to pass a comparison function. Pass a comparison function that tests each field in order, only moving to the next if everything so far is equal. In most languages this is not too hard to do. My guess (based on your posting history) is that you're using PHP so http://www.the-art-of-web.com/php/sortarray/ is likely to be the best guide to how to do that.

Algorithm for scaling one list of ranges to another

I have a constant base list, like this:
[50, 100, 150, 200, 500, 1000]
The list defines ranges: 0 to 50, 50 to 100, and so on until 1000 to infinity.
I want to write a function for transforming any list of numbers into a list compatible with the above. By "compatible" I mean it has only numbers from that list in it, but the numbers are as close to the original value as possible. So for an example input of [111, 255, 950], I would get [100, 200, 1000]. So far I have a naive code that works like this:
for each i in input
{
calculate proximity to each number in base list
get the closest number
remove that number from the base list
return the number
}
This works fine for most scenarios, but breaks down when the input scale goes way out of hand. When I have an input like [1000, 2000, 3000], the first number gets the last number from the base list, then 2000 and 3000 get respectively 500 and 200 (since 1000 and then 500 are already taken). This results in a backwards list [1000, 500, 200].
How would I guard against that?

Approach 1
This can be solved in O(n^3) time by using the Hungarian algorithm where n is max(len(list),len(input)).
First set up a matrix that gives the cost of assigning each input to each number in the list.
matrix[i,j] = abs(input[i]-list[j])
Then use the Hungarian algorithm to find the minimum cost matching of inputs to numbers in the list.
If you have more numbers in the list than inputs, then add some extra dummy inputs which have zero cost of matching with any number in the list.
Approach 2
If the first approach is too slow, then you could use dynamic programming to compute the best fit.
The idea is to compute a function A(a,b) which gives the best match of the first a inputs to the first b numbers in your list.
A(a,b) = min( A(a-1,b-1)+matrix[a,b], A(a,b-1) )
This should give an O(n^2) solution but will require a bit more effort in order to read back the solution.
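
The recurrence in Approach 2, including reading back the solution, might look like the sketch below. It assumes both the input and the base list are sorted in ascending order and that there are no more inputs than base numbers; under those assumptions the assignment keeps the original order, which avoids the "backwards list" from the question. The function name `match_in_order` is hypothetical.

```python
def match_in_order(inputs, base):
    """Assign each input a distinct base number, preserving order,
    with minimum total absolute difference."""
    n, m = len(inputs), len(base)
    if n > m:
        raise ValueError("need at least as many base numbers as inputs")
    INF = float("inf")
    # cost[a][b] = best cost of matching the first a inputs
    # to distinct numbers among the first b base numbers.
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    for b in range(m + 1):
        cost[0][b] = 0
    for a in range(1, n + 1):
        for b in range(a, m + 1):
            use = cost[a - 1][b - 1] + abs(inputs[a - 1] - base[b - 1])
            skip = cost[a][b - 1]
            cost[a][b] = min(use, skip)
    # Read the assignment back from the table.
    result = []
    a, b = n, m
    while a > 0:
        if cost[a][b] == cost[a][b - 1]:
            b -= 1                      # base[b - 1] was skipped
        else:
            result.append(base[b - 1])  # base[b - 1] matched inputs[a - 1]
            a -= 1
            b -= 1
    return result[::-1]


base_list = [50, 100, 150, 200, 500, 1000]
print(match_in_order([111, 255, 950], base_list))     # [100, 200, 1000]
print(match_in_order([1000, 2000, 3000], base_list))  # [200, 500, 1000]
```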

Maximum value of postage stamps on an envelope

The postage stamp problem is a mathematical riddle that asks what is the smallest postage value which cannot be placed on an envelope, if the letter can hold only a limited number of stamps, and these may only have certain specified face values.
For example, suppose the envelope can hold only three stamps, and the available stamp values are 1 cent, 2 cents, 5 cents, and 20 cents. Then the solution is 13 cents; since any smaller value can be obtained with at most three stamps (e.g. 4 = 2 + 2, 8 = 5 + 2 + 1, etc.), but to get 13 cents one must use at least four stamps.
Is there an algorithm that given the maximum amount of stamps allowed and the face value of the stamps, one can find the smallest postage that cannot be placed on the envelope?
Another example:
Maximum of 5 stamps can be used
Valued: 1, 4, 12, 21
The smallest value that cannot be reached is 72. Values 1-71 can be created with a certain combination.
In the end I will probably be using Java to code this.

Yes, there is such an algorithm. Naively: starting with 1, try every possible combination of stamps until we find a combination that yields a sum of 1, then try for 2, and so on. Your algorithm finishes when it finds a number such that no combination of stamps adds to that number.
Albeit possibly slow, for small enough problems (say 100 stamps, 10 positions) you can solve this in a "reasonable" amount of time...
But, for large problems, ones where we have many stamps available (say 1000s) and many possible positions (say 1000s), this algorithm might take an intractable amount of time. That is, for very large problems, the time to solve the problem using this approach might be say, the lifetime of the universe, and thus it's not really useful to you.
If you have really large problems you need to find ways to speed up your search, these speedups are called heuristics. You can't beat the problem, but you can possibly solve the problem faster than the naive approach by applying some sort of domain knowledge.
A simple way to improve this naive approach might be that any time you try a combination of stamps that doesn't equal the number you're looking for you remove that combination from the possible set to try for any future number, and mark that future number as unavailable. Said another way: keep a list of numbers you've found already and the combinations that got you there, then don't look for those numbers or their combinations again.

Here is another tip: A minimum-sized set of stamps that adds up to some given number can always be formed by adding 1 stamp to a minimum-sized set of stamps that adds up to less than that number.
For example, suppose we have the stamps 1, 2, 7, 12, and 50, and a limit of 5 stamps, and we want to find out whether 82 can be represented. To get that 82, you must add either:
A 1 to a set of stamps adding up to 82-1=81, or
A 2 to a set of stamps adding up to 82-2=80, or
A 7 to a set of stamps adding up to 82-7=75, or
A 12 to a set of stamps adding up to 82-12=70, or
A 50 to a set of stamps adding up to 82-50=32.
Those are the only possible ways that 82 can be formed. Among all those 5 possibilities, one (or possibly more than one) will have the minimum number of stamps. If that minimum number is > 5, then 82 can't be represented with stamps.
Notice also that if a number can be represented, you need to record the minimum number of stamps needed for it so that calculations for higher numbers can use it.
This, plus Steve Jessop's answer, will hopefully get your mind on the right track for a dynamic programming solution... If you're still stumped, let me know.

Rather than exhaustively computing the sums of all the possible combinations of stamps (perhaps by recursion), consider all the possible sums, and work out what the smallest number of stamps is to produce each sum. There are loads of combinations of stamps, but a lot fewer distinct sums.
In the example you gave in a comment, 10 stamps fit on an envelope, and no stamp has value greater than 100. There are n^10 combinations of stamps, where n is the number of denominations of stamp available. But the greatest possible sum of 10 stamps is only 1000. Create an array up to 1001, and try to think of an efficient way to work out, for all of those values together, the least number of stamps required to make each one. Your answer is then the least index requiring 11 (or more) stamps, and you can cap each stamp-count at 11, too.
"Efficient" in this case basically means, "avoid repeating any work you don't have to". So you're going to want to re-use intermediate results as much as possible.
If that's not enough of a hint then either (a) I'm wrong about the approach (in which case sorry, I haven't actually solved the problem myself before answering) or (b) update to say how far you've got along those lines.

Maybe it's a bit unhelpful to just give "hints" about a DP solution when there is speculation that one even exists. So here is runnable Perl code implementing the actual DP algorithm:
#!/usr/bin/perl
my ($n, @stamps) = @ARGV;
my @_solved; # Will grow as necessary

# How many stamps are needed to represent a value of $v cents?
sub solve($) {
    my ($v) = @_;
    my $min = $n + 1;
    return 0 if $v == 0;
    foreach (@stamps) {
        if ($v >= $_) {
            my $try = $_solved[$v - $_] + 1;
            $min = $try if $try < $min;
        }
    }
    $_solved[$v] = $min;
    return $min;
}

my $max = (sort { $a <=> $b } @stamps)[-1];

# Main loop
for (my $i = 0; $i <= $max * $n; ++$i) {
    my $ans = solve($i);
    if ($ans > $n) {
        print "$i cannot be represented with <= $n stamps of values " . join(", ", @stamps) . ".\n";
        last;
    }
}
Ordinarily solve() would require a recursive call, but because we always try values in the order 0, 1, 2, 3..., we can just use the @_solved array directly to get the answer for smaller problem sizes.
This takes 93ms on my PC to solve the case for stamp sizes 1, 4, 12, 21 and envelope size 1000. (The answer is 20967.) A compiled language will be even faster.
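
For readers who do not use Perl, the same dynamic program can be sketched in Python and checked against the two examples from the question. This is a rough equivalent, not a line-by-line port. It assumes positive integer stamp values, and the name `smallest_unreachable` is made up.

```python
def smallest_unreachable(max_stamps, stamps):
    """Smallest postage that cannot be made with at most max_stamps stamps."""
    # fewest[v] = minimum number of stamps whose values sum to v.
    fewest = [0]
    value = 0
    while True:
        value += 1
        best = max_stamps + 1          # cap: "needs too many stamps"
        for s in stamps:
            if s <= value:
                best = min(best, fewest[value - s] + 1)
        if best > max_stamps:
            return value               # first value that is out of reach
        fewest.append(best)


print(smallest_unreachable(3, [1, 2, 5, 20]))   # 13
print(smallest_unreachable(5, [1, 4, 12, 21]))  # 72
```

The loop always terminates, since nothing above max_stamps times the largest stamp value can be reached.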

import java.util.ArrayList;
import java.util.List;

/**
 *
 * @author Anandh
 *
 */
public class MinimumStamp {

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        int stamps[]={90,30,24,15,12,10,5,3,2,1};
        int stampAmount = 70;
        List<Integer> stampList = minimumStamp(stamps, stampAmount);
        System.out.println("Minimum no.of stamps required-->"+stampList.size());
        System.out.println("Stamp List-->"+minimumStamp(stamps, stampAmount));
    }

    public static List<Integer> minimumStamp(int[] stamps, int totalStampAmount){
        List<Integer> stampList = new ArrayList<Integer>();
        int sumOfStamps = 0;
        int remainingStampAmount = 0;
        for (int currentStampAmount : stamps) {
            remainingStampAmount = totalStampAmount-sumOfStamps;
            if(remainingStampAmount%currentStampAmount == 0){
                int howMany = remainingStampAmount / currentStampAmount;
                while(howMany>0){
                    stampList.add(currentStampAmount);
                    howMany--;
                }
                break;
            }else if(totalStampAmount == (sumOfStamps+currentStampAmount)){
                stampList.add(currentStampAmount);
                break;
            }else if(totalStampAmount > (sumOfStamps+currentStampAmount) ){
                int howMany = remainingStampAmount / currentStampAmount;
                if(howMany>0){
                    while(howMany>0){
                        stampList.add(currentStampAmount);
                        sumOfStamps += currentStampAmount;
                        howMany--;
                    }
                }else{
                    stampList.add(currentStampAmount);
                    sumOfStamps += currentStampAmount;
                }
            }
        }
        return stampList;
    }
}

Algorithm for nice graph labels for time/date axis?

I'm looking for a "nice numbers" algorithm for determining the labels on a date/time value axis. I'm familiar with Paul Heckbert's Nice Numbers algorithm.
I have a plot that displays time/date on the X axis and the user can zoom in and look at a smaller time frame. I'm looking for an algorithm that picks nice dates to display on the ticks.
For example:
Looking at a day or so: 1/1 12:00, 1/1 4:00, 1/1 8:00...
Looking at a week: 1/1, 1/2, 1/3...
Looking at a month: 1/09, 2/09, 3/09...
The nice label ticks don't need to correspond to the first visible point, but close to it.
Is anybody familiar with such an algorithm?

The 'nice numbers' article you linked to mentioned that
the nicest numbers in decimal are 1, 2, 5 and all power-of-10 multiples of these numbers
So I think for doing something similar with date/time you need to start by similarly breaking down the component pieces. So take the nice factors of each type of interval:
If you're showing seconds or minutes use 1, 2, 3, 5, 10, 15, 30
(I skipped 6, 12, 20 because they don't "feel" right).
If you're showing hours use 1, 2, 3, 4, 6, 8, 12
for days use 1, 2, 7
for weeks use 1, 2, 4 (13 and 26 fit the model but seem too odd to me)
for months use 1, 2, 3, 4, 6
for years use 1, 2, 5 and power-of-10 multiples
Now obviously this starts to break down as you get into larger amounts. Certainly you don't want to show 5 weeks' worth of minutes, even in "pretty" intervals of 30 minutes or something. On the other hand, when you only have 48 hours' worth, you don't want to show 1 day intervals. The trick, as you have already pointed out, is finding decent transition points.
Just on a hunch, I would say a reasonable crossover point would be about twice as much as the next interval. That would give you the following (min and max number of intervals shown afterwards)
use seconds if you have less than 2 minutes worth (1-120)
use minutes if you have less than 2 hours worth (2-120)
use hours if you have less than 2 days worth (2-48)
use days if you have less than 2 weeks worth (2-14)
use weeks if you have less than 2 months worth (2-8/9)
use months if you have less than 2 years worth (2-24)
otherwise use years (although you could continue with decades, centuries, etc if your ranges can be that long)
Unfortunately, our inconsistent time intervals mean that you end up with some cases that can have over a hundred intervals while others have at most 8 or 9. So you'll want to pick the size of your intervals such that you don't have more than 10-15 intervals at most (or fewer than 5 for that matter). Also, you could break from a strict definition of 2 times the next biggest interval if you think it's easier to keep track of. For instance, you could use hours up to 3 days (72 hours) and weeks up to 4 months. A little trial and error might be necessary.
So to go back over, choose the interval type based on the size of your range, then choose the interval size by picking one of the "nice" numbers that will leave you with between 5 and about 15 tick marks. Or if you know and/or can control the actual number of pixels between tick marks you could put upper and lower bounds on how many pixels are acceptable between ticks (if they are spaced too far apart the graph may be hard to read, but if there are too many ticks the graph will be cluttered and your labels may overlap).
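
To make the two-step selection concrete (pick the unit from the size of the range, then pick the smallest "nice" step that keeps the tick count reasonable), here is a rough sketch. It only enforces the upper bound on the number of ticks. The table names, the `choose_tick_interval` function, the `max_ticks` default, and the 30-day month and 365-day year are all simplifying assumptions for illustration.

```python
# Seconds per unit; month and year are rough approximations.
UNIT_SECONDS = {
    "second": 1, "minute": 60, "hour": 3600, "day": 86400,
    "week": 7 * 86400, "month": 30 * 86400, "year": 365 * 86400,
}
ORDER = ["second", "minute", "hour", "day", "week", "month", "year"]
NICE_STEPS = {
    "second": [1, 2, 3, 5, 10, 15, 30],
    "minute": [1, 2, 3, 5, 10, 15, 30],
    "hour": [1, 2, 3, 4, 6, 8, 12],
    "day": [1, 2, 7],
    "week": [1, 2, 4],
    "month": [1, 2, 3, 4, 6],
}


def year_steps():
    """Yield 1, 2, 5, 10, 20, 50, ... for year-sized intervals."""
    power = 1
    while True:
        for multiplier in (1, 2, 5):
            yield multiplier * power
        power *= 10


def choose_tick_interval(range_seconds, max_ticks=15):
    """Return (step, unit) for a visible range given in seconds."""
    # Use a unit while the range is less than twice the next bigger unit.
    unit = "year"
    for smaller, bigger in zip(ORDER, ORDER[1:]):
        if range_seconds < 2 * UNIT_SECONDS[bigger]:
            unit = smaller
            break
    steps = year_steps() if unit == "year" else NICE_STEPS[unit]
    span = range_seconds / UNIT_SECONDS[unit]
    for step in steps:
        if span / step <= max_ticks:
            return step, unit
    return NICE_STEPS[unit][-1], unit   # fall back to the coarsest step


print(choose_tick_interval(86400))      # one day  -> (2, 'hour')
print(choose_tick_interval(7 * 86400))  # one week -> (1, 'day')
```

Aligning the first tick to a multiple of the chosen step, and formatting the labels per unit, would still have to be added on top of this.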

Have a look at
http://tools.netsa.cert.org/netsa-python/doc/index.html
It has a nice.py (python/netsa/data/nice.py) which I think is stand-alone and should work fine.

Still no answer to this question... I'll throw my first idea in then! I assume you have the range of the visible axis.
This is probably how I would do it.
Rough pseudo:
// quantify range
rangeLength = endOfVisiblePart - startOfVisiblePart;
// qualify range resolution
if (rangeLength < "1.5 day") {
    resolution = "day"; // it can be a number, e.g.: ..., 3 for day, 4 for week, ...
} else if (rangeLength < "9 days") {
    resolution = "week";
} else if (rangeLength < "35 days") {
    resolution = "month";
} // you can expand this in both ways to get from nanoseconds to geological eras if you wish
After that, it should (depending on what you have easy access to) be quite easy to determine the value to each nice label tick. Depending on the 'resolution', you format it differently. E.g.: MM/DD for "week", MM:SS for "minute", etc., just like you said.

[Edit - I expanded this a little more at http://www.acooke.org/cute/AutoScalin0.html ]
A naive extension of the "nice numbers" algorithm seems to work for base 12 and 60, which gives good intervals for hours and minutes. This is code I just hacked together:
from math import ceil, floor, log, log10

LIM10 = (10, [(1.5, 1), (3, 2), (7, 5)], [1, 2, 5])
LIM12 = (12, [(1.5, 1), (3, 2), (8, 6)], [1, 2, 6])
LIM60 = (60, [(1.5, 1), (20, 15), (40, 30)], [1, 15, 40])

def heckbert_d(lo, hi, ntick=5, limits=None):
    '''
    Heckbert's "nice numbers" algorithm for graph ranges, from "Graphics Gems".
    '''
    if limits is None:
        limits = LIM10
    (base, rfs, fs) = limits
    def nicenum(x, round):
        step = base ** floor(log(x)/log(base))
        f = float(x) / step
        nf = base
        if round:
            for (a, b) in rfs:
                if f < a:
                    nf = b
                    break
        else:
            for a in fs:
                if f <= a:
                    nf = a
                    break
        return nf * step
    delta = nicenum(hi-lo, False)
    return nicenum(delta / (ntick-1), True)

def heckbert(lo, hi, ntick=5, limits=None):
    '''
    Heckbert's "nice numbers" algorithm for graph ranges, from "Graphics Gems".
    '''
    def _heckbert():
        d = heckbert_d(lo, hi, ntick=ntick, limits=limits)
        graphlo = floor(lo / d) * d
        graphhi = ceil(hi / d) * d
        fmt = '%' + '.%df' % max(-floor(log10(d)), 0)
        value = graphlo
        while value < graphhi + 0.5*d:
            yield fmt % value
            value += d
    return list(_heckbert())
So, for example, if you want to display seconds from 0 to 60,
>>> heckbert(0, 60, limits=LIM60)
['0', '15', '30', '45', '60']
or hours from 0 to 5:
>>> heckbert(0, 5, limits=LIM12)
['0', '2', '4', '6']

I'd suggest you grab the source code to gnuplot or RRDTool (or even Flot) and examine how they approach this problem. The general case is likely to be N labels applied based on the width of your plot, with some kind of 'snapping' to the nearest 'nice' number.
Every time I've written such an algorithm (too many times really), I've used a table of 'preferences'... ie: based on the time range on the plot, decide if I'm using Weeks, Days, Hours, Minutes etc as the main axis point. I usually included some preferred formatting, as I rarely want to see the date for each minute I plot on the graph.
I'd be happy but surprised to find someone using a formula (like Heckbert does) to find 'nice', as the variation in time units between minutes, hours, days, and weeks are not that linear.

In theory you can also turn the concept around: instead of putting your data at the center of the visualization, put the scale at the center.
When you know the start and end dates of your data, you can create a scale containing all dates and place your data on it, like a fixed scale.
You can offer scales of type year, month, day, hour, ... and restrict scaling to just these, which means giving up free scaling.
The advantage is that gaps between dates are easy to show. But if you have a lot of gaps, that can also become useless.
