RxJS operator bufferTime - rxjs

Why does the code
interval(1000)
.pipe(
take(15),
bufferTime(5000)
)
.subscribe(sequence => {
console.log(sequence);
});
generates
[ 0, 1, 2, 3 ]
[ 4, 5, 6, 7, 8 ]
[ 9, 10, 11, 12, 13 ]
[ 14 ]
Why the first batch only has 4 items?

This is because interval(1000) will wait initially 1s to emit hence for the first batch the 4 values only. If you want to "fix" this and have 5 in the first batch, use timer(0, 1000) which will do the same as your interval(1000) except that it'll start straight away instead of waiting 1s.

In this case it's probably because interval doesn't emit immediately, so when 5000ms have elapsed, it's possible that only 4 emissions have happened.
I say possible because due to how intervals and timeouts work in JS, there aren't any guarantees of the kind that would assure us of exactly 5 emissions per buffer in your example. Callbacks that are scheduled in a specific amount are only guaranteed to be called sometime after the timeout elapses, not exactly when the duration ends.
There's a decent chance you could see a group of 6 if you run it enough (since 1000 divides so evenly into 5000, even a millisecond of variance would push it over the edge). Using 100ms and 500ms might illustrate it more clearly more quickly, but the principle holds.

Related

Matching problem with multiples assigment

Introduction
I have a bipartite graph with workers(W) and Tasks(T).
Te goal is assign all task to the workers to minimize the maximum time spend. IE finish the last tasks as soon as possible.
Question
What modification to the Hungarian algorithm have to be done to accomplish this task.
If Hungarian algorithm is not useful what could be a good mathematical approach?
Mathematically i don't know how to work with multiple task assignments for workers.
I will implement it in python once i understand the math theory.
Problem
Conditions:
A task can only be assigned to one worker
There isn't any restriction in the amount of task
All task must be assigned
A worker could have multiple task assigned
There isn't any restriction in the amount of workers
A worker could have no assignation.
Workers are not free to start working at the same time
Example
If i have 7 task T={T₁, T₂, T₃, T₄, T₅, T₆, T₇} and 3 workers W={W₁, W₂, W₃}, workers will be free to start working in F={4, 7, 8} (where Fᵢ is the time Wᵢ needs to be free to start working) and the cost matrix is:
A matching example could be (not necessary correct in this case, is just an example):
W₁ = {T₁, T₂, T₃}
W₂ = {T₄, T₅}
W₃ = {T₆, T₇}
in this case the time expend for each worker is:
Time(W₁) = 4+5+4+3 = 16
Time(W₂) = 7+4+9 = 20
Time(W₃) = 8+1+7 = 16
Explained as:
For W₁, we have to wait for:
4 till he is free
after that he will finish T₁ in 5
T₂ in 4
T₃ in 3
giving a total time of 16.
For W₂, we have to wait for:
7 till he is free
After that he will finish T₄ in 4
T₅ in 9
Giving a total time of 20.
For W₃, we have to wait for:
8 till he is free
after that he will finish T₆ in 1
T₇ in 7
Giving a total time of 16.
Goal
Minimize the maximum total time. Not the sum of totals.
If Total times {9, 6, 6} (sum 21) is a solution then {9, 9, 9} (sum 27) is a solution too and {10, 1, 1} (sum 12) is not because in the first and second case the last task is finished at time 9 and in the third case in time 10.

I'm having some trouble while solving Google's foobar challenge.I'm at level 4 and I'm not getting how did we get the path matrix [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I don't want any code.Just explain me the question (Especially the path matrix).Here is the question :
You and your rescued bunny prisoners need to get out of this collapsing death trap of a space station - and fast! Unfortunately, some of the bunnies have been weakened by their long imprisonment and can't run very fast. Their friends are trying to help them, but this escape would go a lot faster if you also pitched in. The defensive bulkhead doors have begun to close, and if you don't make it through in time, you'll be trapped! You need to grab as many bunnies as you can and get through the bulkheads before they close.
The time it takes to move from your starting point to all of the bunnies and to the bulkhead will be given to you in a square matrix of integers. Each row will tell you the time it takes to get to the start, first bunny, second bunny, ..., last bunny, and the bulkhead in that order. The order of the rows follows the same pattern (start, each bunny, bulkhead). The bunnies can jump into your arms, so picking them up is instantaneous, and arriving at the bulkhead at the same time as it seals still allows for a successful, if dramatic, escape. (Don't worry, any bunnies you don't pick up will be able to escape with you since they no longer have to carry the ones you did pick up.) You can revisit different spots if you wish, and moving to the bulkhead doesn't mean you have to immediately leave - you can move to and from the bulkhead to pick up additional bunnies if time permits.
In addition to spending time traveling between bunnies, some paths interact with the space station's security checkpoints and add time back to the clock. Adding time to the clock will delay the closing of the bulkhead doors, and if the time goes back up to 0 or a positive number after the doors have already closed, it triggers the bulkhead to reopen. Therefore, it might be possible to walk in a circle and keep gaining time: that is, each time a path is traversed, the same amount of time is used or added.
Write a function of the form answer(times, time_limit) to calculate the most bunnies you can pick up and which bunnies they are, while still escaping through the bulkhead before the doors close for good. If there are multiple sets of bunnies of the same size, return the set of bunnies with the lowest prisoner IDs (as indexes) in sorted order. The bunnies are represented as a sorted list by prisoner ID, with the first bunny being 0. There are at most 5 bunnies, and time_limit is a non-negative integer that is at most 999.
For instance, in the case of
[
[0, 2, 2, 2, -1], # 0 = Start
[9, 0, 2, 2, -1], # 1 = Bunny 0
[9, 3, 0, 2, -1], # 2 = Bunny 1
[9, 3, 2, 0, -1], # 3 = Bunny 2
[9, 3, 2, 2, 0], # 4 = Bulkhead
]
and a time limit of 1, the five inner array rows designate the starting point, bunny 0, bunny 1, bunny 2, and the bulkhead door exit respectively. You could take the path:
Start End Delta Time Status
- 0 - 1 Bulkhead initially open
0 4 -1 2
4 2 2 0
2 4 -1 1
4 3 2 -1 Bulkhead closes
3 4 -1 0 Bulkhead reopens; you and the bunnies exit
With this solution, you would pick up bunnies 1 and 2. This is the best combination for this space station hallway, so the answer is [1, 2].
Let's model the problem using graph theory. The position of each interest point (start, each bunny, bulkhead) can be thought of as a vertex. The direct path from each of these points to another will be a weighted edge in the graph.
As you can see, we have a dense graph here as there is a direct path that connects any two interest points directly.
The matrix is just telling you the relative cost in time to the bulkhead closure (a path can have a negative weight if it adds more time before closure of the bulkhead than the actual time required to walk it). It means it is the adjacency matrix modeling the graph we defined above.
As such, each row of the matrix represents the paths from one point to another:
First row is always the starting point. It tells you the impact on the closure time to go from the starting point to any other point (bunnies, bulkhead)
Then come the rows for the bunnies, the second row in your example tells you the time impact to go from the position of bunny #0 to any other point and so on.
Finally you have the paths from the bulkhead to the other points
Some hints to solve the problem:
If there is a negative cycle in the graph, you can escape with all the bunnies and since you are only required to return the set of rescued bunnies... You can exit as soon as a negative cycle is detected!
If there isn't, then you'ld better think about Bellman-Ford and see where this leads you (good thing Bellman-Ford algorithm can also be used to detect negative cycles!)
EDIT: Expliciting the logic behind the matrix
To see how it works let's unroll the given example:
time_limit = 1
times = [
[0, 2, 2, 2, -1], # 0 = Start
[9, 0, 2, 2, -1], # 1 = Bunny 0
[9, 3, 0, 2, -1], # 2 = Bunny 1
[9, 3, 2, 0, -1], # 3 = Bunny 2
[9, 3, 2, 2, 0], # 4 = Bulkhead
]
The deltas are simply coming from the relevant matrix coefficients. Each time you walk a path with a given delta, you have to update the time_limit accordingly:
delta = times[starting_point][ending_point]
time_limit = time_limit - delta
If the time_limit becomes strictly negative, the bulkhead closes. If it gets back up to zero (going through negative paths), it reopens. The question asks you to find the path to take to save the most bunnies and escape with them. This means that such a path must end in bulkhead with time_limit >= 0.
Let's walk through the example and explicit the deltas and the time_limit updates.
The best scenario is to walk the following path (the deltas simply come from the matrix coefficients):
Start --> Bulkhead: cost is time[0][4] # == -1 so time_limit = 1 - (-1) = 2
Bulkhead --> Bunny #1: cost is time[4][2] # == 2 so time_limit = 2 - 2 = 0
Bunny #1 --> Bulkhead: cost is time[2][4] # == -1 so time_limit = 0 - (-1) = 1 (Bunny #1 escapes)
Bulkhead --> Bunny #2: cost is time[4][3] # == 2 so time_limit = 1 - 2 = -1 (Bulkhead closes because time_limit became negative)
Bunny #2 --> Bulkhead: cost is time[3][4] # == -1 so time_limit = -1 - (-1) = 0 (Bulkhead reopens, you escape with Bunny #2)
The set of rescued bunny is thus [1, 2] (Bunny #1 and Bunny #2 ID's, sorted in ascending order as requested per the problem description).

Find a period of eventually periodic sequence

Short explanation.
I have a sequence of numbers [0, 1, 4, 0, 0, 1, 1, 2, 3, 7, 0, 0, 1, 1, 2, 3, 7, 0, 0, 1, 1, 2, 3, 7, 0, 0, 1, 1, 2, 3, 7]. As you see, from the 3-rd value the sequence is periodic with a period [0, 0, 1, 1, 2, 3, 7].
I am trying to automatically extract this period from this sequence. The problem is that neither I know the length of the period, nor do I know from which position the sequence becomes periodic.
Full explanation (might require some math)
I am learning combinatorial game theory and a cornerstone of this theory requires one to calculate Grundy values of a game graph. This produces infinite sequence, which in many cases becomes eventually periodic.
I found a way to efficiently calculate grundy values (it returns me a sequence). I would like to automatically extract offset and period of this sequence. I am aware that seeing a part of the sequence [1, 2, 3, 1, 2, 3] you can't be sure that [1, 2, 3] is a period (who knows may be the next number is 4, which breaks the assumption), but I am not interested in such intricacies (I assume that the sequence is enough to find the real period). Also the problem is the sequence can stop in the middle of the period: [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, ...] (the period is still 1, 2, 3).
I also need to find the smallest offset and period. For example for original sequence, the offset can be [0, 1, 4, 0, 0] and the period [1, 1, 2, 3, 7, 0, 0], but the smallest is [0, 1, 4] and [0, 0, 1, 1, 2, 3, 7].
My inefficient approach is to try every possible offset and every possible period. Construct the sequence using this data and check whether it is the same as original. I have not done any normal analysis, but it looks like it is at least quadratic in terms of time complexity.
Here is my quick python code (have not tested it properly):
def getPeriod(arr):
min_offset, min_period, n = len(arr), len(arr), len(arr)
best_offset, best_period = [], []
for offset in xrange(n):
start = arr[:offset]
for period_len in xrange(1, (n - offset) / 2):
period = arr[offset: offset+period_len]
attempt = (start + period * (n / period_len + 1))[:n]
if attempt == arr:
if period_len < min_period:
best_offset, best_period = start[::], period[::]
min_offset, min_period = len(start), period_len
elif period_len == min_period and len(start) < min_offset:
best_offset, best_period = start[::], period[::]
min_offset, min_period = len(start), period_len
return best_offset, best_period
Which returns me what I want for my original sequence:
offset [0, 1, 4]
period [0, 0, 1, 1, 2, 3, 7]
Is there anything more efficient?
Remark: If there is a period P1 with length L, then there is also a period P2, with the same length, L, such that the input sequence ends exactly with P2 (i.e. we do not have a partial period involved at the end).
Indeed, a different period of the same length can always be obtained by changing the offset. The new period will be a rotation of the initial period.
For example the following sequence has a period of length 4 and offset 3:
0 0 0 (1 2 3 4) (1 2 3 4) (1 2 3 4) (1 2 3 4) (1 2 3 4) (1 2
but it also has a period with the same length 4 and offset 5, without a partial period at the end:
0 0 0 1 2 (3 4 1 2) (3 4 1 2) (3 4 1 2) (3 4 1 2) (3 4 1 2)
The implication is that we can find the minimum length of a period by processing the sequence in reverse order, and searching the minimum period using zero offset from the end. One possible approach is to simply use your current algorithm on the reversed list, without the need of the loop over offsets.
Now that we know the length of the desired period, we can also find its minimum offset. One possible approach is to try all various offsets (with the advantage of not needing the loop over lengths, since the length is known), however, further optimizations are possible if necessary, e.g. by advancing as much as possible when processing the list from the end, allowing the final repetition of the period (i.e. the one closest to the start of the un-reversed sequence) to be partial.
I would start with constructing histogram of the values in the sequence
So you just make a list of all numbers used in sequence (or significant part of it) and count their occurrence. This is O(n) where n is sequence size.
sort the histogram ascending
This is O(m.log(m)) where m is number of distinct values. You can also ignore low probable numbers (count<treshold) which are most likely in the offset or just irregularities further lowering m. For periodic sequences m <<< n so you can use it as a first marker if the sequence is periodic or not.
find out the period
In the histogram the counts should be around multiples of the n/period. So approximate/find GCD of the histogram counts. The problem is that you need to take into account there are irregularities present in the counts and also in the n (offset part) so you need to compute GCD approximately. for example:
sequence = { 1,1,2,3,3,1,2,3,3,1,2,3,3 }
has ordered histogram:
item,count
2 3
1 4
3 6
the GCD(6,4)=2 and GCD(6,3)=3 you should check at least +/-1 around the GCD results so the possible periods are around:
T = ~n/2 = 13/2 = 6
T = ~n/3 = 13/3 = 4
So check T={3,4,5,6,7} just to be sure. Use always GCD between the highest counts vs. lowest counts. If the sequence has many distinct numbers you can also do a histogram of counts checking only the most common values.
To check period validity just take any item near end or middle of the sequence (just use probable periodic area). Then look for it in close area near probable period before (or after) its occurrence. If found few times you got the right period (or its multiple)
Get the exact period
Just check the found period fractions (T/2, T/3, ...) or do a histogram on the found period and the smallest count tells you how many real periods you got encapsulated so divide by it.
find offset
When you know the period this is easy. Just scan from start take first item and see if after period is there again. If not remember position. Stop at the end or in the middle of sequence ... or on some treshold consequent successes. This is up to O(n) And the last remembered position is the last item in the offset.
[edit1] Was curious so I try to code it in C++
I simplified/skip few things (assuming at least half of the array is periodic) to test if I did not make some silly mistake in my algorithm and here the result (Works as expected):
const int p=10; // min periods for testing
const int n=500; // generated sequence size
int seq[n]; // generated sequence
int offset,period; // generated properties
int i,j,k,e,t0,T;
int hval[n],hcnt[n],hs; // histogram
// generate periodic sequence
Randomize();
offset=Random(n/5);
period=5+Random(n/5);
for (i=0;i<offset+period;i++) seq[i]=Random(n);
for (i=offset,j=i+period;j<n;i++,j++) seq[j]=seq[i];
if ((offset)&&(seq[offset-1]==seq[offset-1+period])) seq[offset-1]++;
// compute histogram O(n) on last half of it
for (hs=0,i=n>>1;i<n;i++)
{
for (e=seq[i],j=0;j<hs;j++)
if (hval[j]==e) { hcnt[j]++; j=-1; break; }
if (j>=0) { hval[hs]=e; hcnt[hs]=1; hs++; }
}
// bubble sort histogram asc O(m^2)
for (e=1,j=hs;e;j--)
for (e=0,i=1;i<j;i++)
if (hcnt[i-1]>hcnt[i])
{ e=hval[i-1]; hval[i-1]=hval[i]; hval[i]=e;
e=hcnt[i-1]; hcnt[i-1]=hcnt[i]; hcnt[i]=e; e=1; }
// test possible periods
for (j=0;j<hs;j++)
if ((!j)||(hcnt[j]!=hcnt[j-1])) // distinct counts only
if (hcnt[j]>1) // more then 1 occurence
for (T=(n>>1)/(hcnt[j]+1);T<=(n>>1)/(hcnt[j]-1);T++)
{
for (i=n-1,e=seq[i],i-=T,k=0;(i>=(n>>1))&&(k<p)&&(e==seq[i]);i-=T,k++);
if ((k>=p)||(i<n>>1)) { j=hs; break; }
}
// compute histogram O(T) on last multiple of period
for (hs=0,i=n-T;i<n;i++)
{
for (e=seq[i],j=0;j<hs;j++)
if (hval[j]==e) { hcnt[j]++; j=-1; break; }
if (j>=0) { hval[hs]=e; hcnt[hs]=1; hs++; }
}
// least count is the period multiple O(m)
for (e=hcnt[0],i=0;i<hs;i++) if (e>hcnt[i]) e=hcnt[i];
if (e) T/=e;
// check/handle error
if (T!=period)
{
return;
}
// search offset size O(n)
for (t0=-1,i=0;i<n-T;i++)
if (seq[i]!=seq[i+T]) t0=i;
t0++;
// check/handle error
if (t0!=offset)
{
return;
}
Code is still not optimized. For n=10000 it takes around 5ms on mine setup. The result is in t0 (offset) and T (period). You may need to play with the treshold constants a bit
I had to do something similar once. I used brute force and some common sense, the solution is not very elegant but it works. The solution always works, but you have to set the right parameters (k,j, con) in the function.
The sequence is saved as a list in the variable seq.
k is the size of the sequence array, if you think your sequence will take long to become periodic then set this k to a big number.
The variable found will tell us if the array passed the periodic test with period j
j is the period.
If you expect a huge period then you must set j to a big number.
We test the periodicity by checking the last j+30 numbers of the sequence.
The bigger the period (j) the more we must check.
As soon as one of the test is passed we exit the function and we return the smaller period.
As you may notice the accuracy depends on the variables j and k but if you set them to very big numbers it will always be correct.
def some_sequence(s0, a, b, m):
try:
seq=[s0]
snext=s0
findseq=True
k=0
while findseq:
snext= (a*snext+b)%m
seq.append(snext)
#UNTIL THIS PART IS JUST TO CREATE THE SEQUENCE (seq) SO IS NOT IMPORTANT
k=k+1
if k>20000:
# I IS OUR LIST INDEX
for i in range(1,len(seq)):
for j in range(1,1000):
found =True
for con in range(j+30):
#THE TRICK IS TO START FROM BEHIND
if not (seq[-i-con]==seq[-i-j-con]):
found = False
if found:
minT=j
findseq=False
return minT
except:
return None
simplified version
def get_min_period(sequence,max_period,test_numb):
seq=sequence
if max_period+test_numb > len(sequence):
print("max_period+test_numb cannot be bigger than the seq length")
return 1
for i in range(1,len(seq)):
for j in range(1,max_period):
found =True
for con in range(j+test_numb):
if not (seq[-i-con]==seq[-i-j-con]):
found = False
if found:
minT=j
return minT
Where max_period is the maximun period you want to look for, and test_numb is how many numbers of the sequence you want to test, the bigger the better but you have to make max_period+test_numb < len(sequence)

Finding the interval with the highest summed count?

Given a set of entries, each containing a time index and a int count value,
ie
class Entry
{
time:int
count:int
}
write a function that will give the time interval with the highest count together,
ie,
if we had entries
100, 2
100, 1
110, 10
200, 4
1000, 3
1200, 8
and we ran something like
int highestInterval(int interval_range)
highestInterval( 50 )
it would return 100, because in 100-150, you have counts 2, 1, and 10.
I managed to get a O(n^2) solution for it, but I think theres a better solution. I think it might have to do with some preprocessing of the interval buckets, but I can't figure out the solution.
It seems that you made it already using two for loops so it is a mere question of improvement.
Here goes one possible solution:
CODE:
raw_data=[100,2;
100,1;
110,10;
200,4;
1000,3;
1200,8];
[max_val,indx]=max(cell2mat(arrayfun(#(A) sum(raw_data(abs(raw_data(A,1)-raw_data(:,1))<50,2)),1:size(raw_data,1),'UniformOutput',false)));
raw_data(indx,1)
OUTPUT:
ans =
100

Algorithm for nice graph labels for time/date axis?

I'm looking for a "nice numbers" algorithm for determining the labels on a date/time value axis. I'm familiar with Paul Heckbert's Nice Numbers algorithm.
I have a plot that displays time/date on the X axis and the user can zoom in and look at a smaller time frame. I'm looking for an algorithm that picks nice dates to display on the ticks.
For example:
Looking at a day or so: 1/1 12:00, 1/1 4:00, 1/1 8:00...
Looking at a week: 1/1, 1/2, 1/3...
Looking at a month: 1/09, 2/09, 3/09...
The nice label ticks don't need to correspond to the first visible point, but close to it.
Is anybody familiar with such an algorithm?
The 'nice numbers' article you linked to mentioned that
the nicest numbers in decimal are 1, 2, 5 and all power-of-10 multiples of these numbers
So I think for doing something similar with date/time you need to start by similarly breaking down the component pieces. So take the nice factors of each type of interval:
If you're showing seconds or minutes use 1, 2, 3, 5, 10, 15, 30
(I skipped 6, 12, 15, 20 because they don't "feel" right).
If you're showing hours use 1, 2, 3, 4, 6, 8, 12
for days use 1, 2, 7
for weeks use 1, 2, 4 (13 and 26 fit the model but seem too odd to me)
for months use 1, 2, 3, 4, 6
for years use 1, 2, 5 and power-of-10 multiples
Now obviously this starts to break down as you get into larger amounts. Certainly you don't want to do show 5 weeks worth of minutes, even in "pretty" intervals of 30 minutes or something. On the other hand, when you only have 48 hours worth, you don't want to show 1 day intervals. The trick as you have already pointed out is finding decent transition points.
Just on a hunch, I would say a reasonable crossover point would be about twice as much as the next interval. That would give you the following (min and max number of intervals shown afterwards)
use seconds if you have less than 2 minutes worth (1-120)
use minutes if you have less than 2 hours worth (2-120)
use hours if you have less than 2 days worth (2-48)
use days if you have less than 2 weeks worth (2-14)
use weeks if you have less than 2 months worth (2-8/9)
use months if you have less than 2 years worth (2-24)
otherwise use years (although you could continue with decades, centuries, etc if your ranges can be that long)
Unfortunately, our inconsistent time intervals mean that you end up with some cases that can have over 1 hundred intervals while others have at most 8 or 9. So you'll want to pick the size of your intervals such than you don't have more than 10-15 intervals at most (or less than 5 for that matter). Also, you could break from a strict definition of 2 times the next biggest interval if you think its easy to keep track of. For instance, you could use hours up to 3 days (72 hours) and weeks up to 4 months. A little trial and error might be necessary.
So to go back over, choose the interval type based on the size of your range, then choose the interval size by picking one of the "nice" numbers that will leave you with between 5 and about 15 tick marks. Or if you know and/or can control the actual number of pixels between tick marks you could put upper and lower bounds on how many pixels are acceptable between ticks (if they are spaced too far apart the graph may be hard to read, but if there are too many ticks the graph will be cluttered and your labels may overlap).
Have a look at
http://tools.netsa.cert.org/netsa-python/doc/index.html
It has a nice.py ( python/netsa/data/nice.py ) which i think is stand-alone, and should work fine.
Still no answer to this question... I'll throw my first idea in then! I assume you have the range of the visible axis.
This is probably how I would do.
Rough pseudo:
// quantify range
rangeLength = endOfVisiblePart - startOfVisiblePart;
// qualify range resolution
if (range < "1.5 day") {
resolution = "day"; // it can be a number, e.g.: ..., 3 for day, 4 for week, ...
} else if (range < "9 days") {
resolution = "week";
} else if (range < "35 days") {
resolution = "month";
} // you can expand this in both ways to get from nanoseconds to geological eras if you wish
After that, it should (depending on what you have easy access to) be quite easy to determine the value to each nice label tick. Depending on the 'resolution', you format it differently. E.g.: MM/DD for "week", MM:SS for "minute", etc., just like you said.
[Edit - I expanded this a little more at http://www.acooke.org/cute/AutoScalin0.html ]
A naive extension of the "nice numbers" algorithm seems to work for base 12 and 60, which gives good intervals for hours and minutes. This is code I just hacked together:
LIM10 = (10, [(1.5, 1), (3, 2), (7, 5)], [1, 2, 5])
LIM12 = (12, [(1.5, 1), (3, 2), (8, 6)], [1, 2, 6])
LIM60 = (60, [(1.5, 1), (20, 15), (40, 30)], [1, 15, 40])
def heckbert_d(lo, hi, ntick=5, limits=None):
'''
Heckbert's "nice numbers" algorithm for graph ranges, from "Graphics Gems".
'''
if limits is None:
limits = LIM10
(base, rfs, fs) = limits
def nicenum(x, round):
step = base ** floor(log(x)/log(base))
f = float(x) / step
nf = base
if round:
for (a, b) in rfs:
if f < a:
nf = b
break
else:
for a in fs:
if f <= a:
nf = a
break
return nf * step
delta = nicenum(hi-lo, False)
return nicenum(delta / (ntick-1), True)
def heckbert(lo, hi, ntick=5, limits=None):
'''
Heckbert's "nice numbers" algorithm for graph ranges, from "Graphics Gems".
'''
def _heckbert():
d = heckbert_d(lo, hi, ntick=ntick, limits=limits)
graphlo = floor(lo / d) * d
graphhi = ceil(hi / d) * d
fmt = '%' + '.%df' % max(-floor(log10(d)), 0)
value = graphlo
while value < graphhi + 0.5*d:
yield fmt % value
value += d
return list(_heckbert())
So, for example, if you want to display seconds from 0 to 60,
>>> heckbert(0, 60, limits=LIM60)
['0', '15', '30', '45', '60']
or hours from 0 to 5:
>>> heckbert(0, 5, limits=LIM12)
['0', '2', '4', '6']
I'd suggest you grab the source code to gnuplot or RRDTool (or even Flot) and examine how they approach this problem. The general case is likely to be N labels applied based on width of your plot, which some kind of 'snapping' to the nearest 'nice' number.
Every time I've written such an algorithm (too many times really), I've used a table of 'preferences'... ie: based on the time range on the plot, decide if I'm using Weeks, Days, Hours, Minutes etc as the main axis point. I usually included some preferred formatting, as I rarely want to see the date for each minute I plot on the graph.
I'd be happy but surprised to find someone using a formula (like Heckbert does) to find 'nice', as the variation in time units between minutes, hours, days, and weeks are not that linear.
In theory you can also change your concept. Where it is not your data at the center of the visualization, but at the center you have your scale.
When you know the start and the end of the dates of your data, you can create a scale with all dates and dispatch you data in this scale. Like a fixed scales.
You can have a scale of type year, month, day, hours, ... and limit the scaling just to these scales, implying you remove the concept of free scaling.
The advantage is to can easily show dates gaps. But if you have a lot of gaps, that can become also useless.

Resources