PromQL usage of hour() in a range over midnight - metrics

I'm trying to specifically return metrics between a certain timeslot. I'm trying to put an alert on a specific metric with different watermarks depending on being at night or during the day.
When I query the daytime metric it works fine: sum(increase(some_counter_total[5m])) and hour() >4 <20
This of course works, as bigger than 4 but smaller than 20 is a valid range.
However, this becomes more interesting when one wants to do this during the night time, for the reverse timeslot: sum(increase(some_counter_total[5m])) and hour() >20 <4
The first condition, bigger than 20, already eliminates everything the second, smaller than 4, could match. The same is true when reversing the order.
I have tried some different combinations of using or, but I must be doing it wrong as I never got the query to work properly. Ideally, I think I would be searching for a way to do a reverse query: sum(increase(some_counter_total[5m])) and WHEN NOT hour() >4 <20.

Mind that Prometheus operates on data vectors. Try negation (and vs. unless) to exclude the hours you don't want.
Day time:
sum(increase(some_counter_total[5m])) and hour() >=9 <21
Night time:
sum(increase(some_counter_total[5m])) unless hour() >=9 <21
This is probably not the most efficient solution, but I couldn't find another way, such as your straightforward suggestion or something like WHERE value IN (4, 5, 6, <...night time hours...>).
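The and/unless pair works because the night window is simply the complement of the day window. A minimal Python sketch of that logic, for illustration only (the function names are hypothetical and this is not PromQL):

```python
def in_day_window(hour: int, start: int = 9, end: int = 21) -> bool:
    """True when start <= hour < end, mirroring `hour() >= 9 < 21`."""
    return start <= hour < end


def in_night_window(hour: int, start: int = 9, end: int = 21) -> bool:
    """Complement of the day window, mirroring `unless hour() >= 9 < 21`."""
    return not in_day_window(hour, start, end)


# Every hour of the day falls in exactly one of the two windows.
night_hours = [h for h in range(24) if in_night_window(h)]
print(night_hours)
```

Because the night window is defined by negation, it wraps over midnight without needing two separate comparisons. Keep in mind that hour() in Prometheus works in UTC, so the boundaries may need shifting for local time.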

Related

Summing times in Google sheets

I have a sheet where I record my working hours (this is more for me to remind me to stop working than anything else). For every day, I have three possible shifts - early, normal & late, and I have a formula which will sum up any times put into these columns and give me the daily total hours.
To summarise the duration of time spent working in a day, I use the following formula: =(C41-B41)+(E41-D41)+12+(G41-F41) which is:
early end time minus early start time
normal end time minus normal start time PLUS 12 hours
late end time minus late start time
Which gives me output like this:
What I cannot seem to achieve is, the ability to sum the daily totals into something which shows me the total hours worked over 1-week. If I attempt to sum the daily totals together for the example image shown, I get some wild figure such as 1487:25:00 when formatting as 'Duration' or 23:25:00 when formatted as 'Time'!
All my cells where I record the hours worked are formatted as 'Time'
When using arithmetic operations on date values in Google Sheets, it's important to remember that the internal representation of a date is numeric: the number of days since December 30, 1899, with the time of day stored as a fraction of a day.
What follows from that, is that if you want to add 12 hours to a time duration, you should not write "+12" because that will in fact add 12 days. Instead add "+12/24". In other words, try the following formula instead of the one you are using now:
=(C41-B41)+(E41-D41)+(12/24+G41-F41)
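To see how large the difference is, here is a small Python sketch of the same arithmetic, with durations expressed in days as the spreadsheet does. The span lengths are made-up values and hours_as_days is a hypothetical helper:

```python
from fractions import Fraction


def hours_as_days(hours: int) -> Fraction:
    """Convert whole hours to the day-based unit used for durations."""
    return Fraction(hours, 24)


# Hypothetical spans for one day: 1 h early, 4 h normal, 2 h late.
spans = hours_as_days(1) + hours_as_days(4) + hours_as_days(2)

wrong_total = spans + 12                # "+12" adds 12 whole days
right_total = spans + Fraction(12, 24)  # "+12/24" adds 12 hours

print(wrong_total * 24, "hours")
print(right_total * 24, "hours")
```

Each "+12" silently adds 288 hours to the daily total, which is why the weekly sum runs into four digits.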

How do I calculate how "true" something STILL is? Weighted votes with the time they happened?

Say I have a thing, and you can upvote/downvote it whether it's true. As time goes by, this thing changes, so I want votes that happened longer ago to matter less (at least, that's what I think should happen). Is there a formula for something like this? Am I missing something?
an idea:
vote# is +1 if upvote, -1 if downvote
score = (vote1/time since vote1) + (vote2/time since vote2) + (vote3/time since vote3)
That way score will increase with more votes and each individual vote starts decaying. High score would mean a lot of people upvoted that, and recently.
I believe the most common approach is to use an exponentially-weighted moving average: https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average.
The nice thing with exponential weighting is that you only need to keep track of your current average, and then you can easily update it for each new incoming value, rather than needing to keep a full history of all past values in order to compute an updated average.
(Or at least, that's the case for weighting based on "number of data-points since this one". For weighting based on "amount of time since this data-point", where the data-points aren't at fixed intervals, it's slightly more complicated, because you'll also need to keep the timestamp of the last data-point; but you still don't need to keep the full history, so it's still very simple and efficient.)
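A minimal Python sketch of the time-based variant, assuming votes arrive in non-decreasing time order. It keeps a decayed sum rather than a normalized average, so the score grows with more votes, as the question asks. The class name and the half_life parameter are assumptions made for illustration:

```python
import math


class DecayingScore:
    """Exponentially time-decayed sum of votes (+1 for up, -1 for down)."""

    def __init__(self, half_life: float):
        # After `half_life` time units a vote counts half as much.
        self.decay_rate = math.log(2) / half_life
        self.score = 0.0
        self.last_time = None  # timestamp of the most recent vote

    def _factor(self, now: float) -> float:
        return math.exp(-self.decay_rate * (now - self.last_time))

    def add_vote(self, vote: int, now: float) -> None:
        # Decay the stored score up to `now`, then add the new vote.
        if self.last_time is not None:
            self.score *= self._factor(now)
        self.last_time = now
        self.score += vote

    def value_at(self, now: float) -> float:
        # Current score without modifying the stored state.
        if self.last_time is None:
            return 0.0
        return self.score * self._factor(now)


s = DecayingScore(half_life=10)
s.add_vote(+1, now=0)
s.add_vote(+1, now=10)
print(s.value_at(20))
```

Only the current score and the last timestamp are stored. Tracking a second decayed sum that adds 1 per vote, and dividing by it, would turn this into a weighted average.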

Is there a complex date filter algorithm?

Essentially, I want a system that can filter simply such as "Between August 4th and August 7th", but be as complicated as "Every third saturday or monday of each january on leap years".
I figured that in order to represent the complicated boolean algebra, I would need a tree structure. Each node would either be a boolean operation (AND, OR, XOR, NOT) and then would have children that it apply to, which can either be specific filters or another boolean operation.
Each "specific filter" would be something like "Sundays" or "Leap Years". I think everything up to this point is very doable. However, the problem then arises in parsing the tree to actually find what dates are needed, in order to then make database queries to get the data points.
With the example above (Every third saturday or monday of each january on leap years), if we pre-restrict ourselves to the years that we have data (5 years worth). If the sat/mon filters happen to be the top nodes in the tree, we will end up with 500 segmented dates (2 per week, 50 weeks a year, 5 years). Then, the next node has to search through all 500 to find which ones conform to "every third" filter. This isn't even the most complicated example, because an arbitrary number of filters should be allowed, and XOR makes that even more crazy.
So, is there any easy route? Did someone already build this? This is just a small part of a project involving data visualization, but it seems that it could be an entire project by itself.
I found a couple in Ruby. IceCube seems promising, even though it might not support all your needs.
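If no library fits, the tree described in the question can be sketched as composable predicates that are evaluated once per calendar day, so no intermediate date lists are built. This is only a sketch: all names are hypothetical, and "third Saturday or Monday" is read here as the third occurrence of that weekday within the month:

```python
import calendar
from datetime import date, timedelta
from typing import Callable, Iterator

DateFilter = Callable[[date], bool]


# Boolean nodes of the tree.
def all_of(*filters: DateFilter) -> DateFilter:
    return lambda d: all(f(d) for f in filters)


def any_of(*filters: DateFilter) -> DateFilter:
    return lambda d: any(f(d) for f in filters)


def negate(f: DateFilter) -> DateFilter:
    return lambda d: not f(d)


# Leaf filters.
def weekday_is(weekday: int) -> DateFilter:
    return lambda d: d.weekday() == weekday


def month_is(month: int) -> DateFilter:
    return lambda d: d.month == month


def nth_in_month(n: int) -> DateFilter:
    # Days 1-7 hold the first occurrence of each weekday, 8-14 the second, ...
    return lambda d: (d.day - 1) // 7 + 1 == n


def leap_year(d: date) -> bool:
    return calendar.isleap(d.year)


def matching_dates(start: date, end: date, flt: DateFilter) -> Iterator[date]:
    d = start
    while d <= end:
        if flt(d):
            yield d
        d += timedelta(days=1)


rule = all_of(
    leap_year,
    month_is(1),
    nth_in_month(3),
    any_of(weekday_is(calendar.SATURDAY), weekday_is(calendar.MONDAY)),
)
hits = list(matching_dates(date(2020, 1, 1), date(2024, 12, 31), rule))
print(hits)
```

Five years of data is roughly 1,800 days, so testing every day against the whole tree stays cheap regardless of the order of the nodes. An XOR node could be added in the same style.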

How to handle recurring times?

First off, I marked this question as language agnostic, but I'm using PHP and MySQL. It shouldn't affect the question itself very much tho.
I'm creating an application which shows times of certain shows throughout the week. Every single show is recurring (on weekly basis) and there might be shows which are airing through 2 days - eg. starting on Sunday at 23:30, ending on Monday at 00:30. I'm storing start of the show (day of the week - Monday, Tuesday... - it's never exact date; time) and duration. There are never shows that would take more than 24 hours.
My problem is with validation if newly added shows aren't overlapping some old ones. Especially if it comes to Sunday-Monday shows.
How are such recurring events usually handled on both DB side and server side?
tl;dr version with stuff I considered
My first idea was to create some custom validation algorithm, but it seemed too cumbersome and complicated. Not that I'd whine about complicated hand-made solutions, but I'm interested if there isn't something more basic that I'm missing.
Another alternative that came to mind was to change the table structure to use datetime (instead of "day of week" and "time"), and use a fake fixed date range to store the data. For example all Mondays would be set to 5th Jan 1970, Sundays would use 11th Jan 1970. There would be one exception to this rule - if there would be some show which starts on Sunday and ends on Monday, it would be stored as 12th Jan 1970. This solution would allow more flexible querying of the DB than the original one, and it would also simplify queries for shows which overlap between individual weeks (since we can do the comparison directly in the query). There are some disadvantages to this solution as well (for one, using fake dates might make it confusing).
Both solutions smell of wrong algorithms to me and would love to hear some opinions from more experienced fellow developers.
Sounds like you could just store the starting minute of each show as an integer number of minutes since the start of the week (10,080 possible values).
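Then a show starting at minute $a with duration $dur_a will still be running when a show starting at minute $b begins if and only if
(10080 + $b - $a) % 10080 < $dur_a
For example, consider a show starting at 11pm Sunday and another starting at 12.30am Monday, with the week starting at midnight on Monday. Here $a == 10020 and $dur_a == 120 and $b == 30. (10080 + $b - $a) % 10080 == 90. This is less than $dur_a and hence the shows overlap. To catch a show that starts before $a and runs into it, repeat the check with the roles of $a and $b swapped.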
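The modular check can be sketched in Python as follows, assuming the week starts at midnight on Monday and every duration is shorter than a week. The function names are made up for illustration:

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080


def week_minute(day: int, hour: int, minute: int) -> int:
    """Minutes since Monday 00:00 (day 0 = Monday, day 6 = Sunday)."""
    return day * 24 * 60 + hour * 60 + minute


def starts_during(a_start: int, a_dur: int, b_start: int) -> bool:
    """True if b starts while a is still running, wrapping over the week end."""
    return (MINUTES_PER_WEEK + b_start - a_start) % MINUTES_PER_WEEK < a_dur


def overlaps(a_start: int, a_dur: int, b_start: int, b_dur: int) -> bool:
    """Two shows overlap if either one starts while the other is running."""
    return (starts_during(a_start, a_dur, b_start)
            or starts_during(b_start, b_dur, a_start))


sunday_show = week_minute(6, 23, 0)   # Sunday 23:00
monday_show = week_minute(0, 0, 30)   # Monday 00:30
print(overlaps(sunday_show, 120, monday_show, 60))
```

Checking both directions matters when the new show starts before an existing one and runs into it.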
This problem could be simplified by converting the data into a format that is amenable to the calculations that are required. I recommend creating a type that represents the start times as the number of minutes from Sunday at midnight. Then simple integer range comparisons could be used to find overlapping shows.
The internal representation must, of course, be hidden and abstracted. You may, at some point, want to change the representation from minutes to seconds, for example.
I would opt for a custom validation algorithm:
For each show, compute all showing intervals [start1, end1], [start2, end2], ... [startN, endN], where N is the number of recurrence of the show.
For a new show, also compute these intervals.
Now check if any of these new intervals intersect any old intervals. This is the case if the start or the end of one interval is contained in the other.

Algorithm for most recently/often contacts for auto-complete?

We have an auto-complete list that's populated when you send an email to someone, which is all well and good until the list gets really big. Then you need to type more and more of an address to get to the one you want, which goes against the purpose of auto-complete.
I was thinking that some logic should be added so that the auto-complete results should be sorted by some function of most recently contacted or most often contacted rather than just alphabetical order.
What I want to know is if there's any known good algorithms for this kind of search, or if anyone has any suggestions.
I was thinking just a point system thing, with something like same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points and last 6 months is 1 point. Then for most often, 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, 2+ is 1. No real logic other than those numbers "feel" about right.
Other than just arbitrarily picked numbers does anyone have any input? Other numbers also welcome if you can give a reason why you think they're better than mine
Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between say someone you talked to 80 times vs say 30 times.
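For reference, the point system described above can be written down as a small Python sketch. It assumes a month is 30 days and six months is 182 days, and that the two point values are simply added; those choices, like the function names, are not from the question:

```python
def recency_points(days_since_last: int) -> int:
    """Points for how recently the contact was last emailed."""
    if days_since_last < 1:
        return 5  # same day
    if days_since_last <= 3:
        return 4  # last three days
    if days_since_last <= 7:
        return 3  # last week
    if days_since_last <= 30:
        return 2  # last month
    if days_since_last <= 182:
        return 1  # last six months
    return 0


def frequency_points(times_contacted: int) -> int:
    """Points for how often the contact has been emailed; capped at 25+."""
    for threshold, points in ((25, 5), (15, 4), (10, 3), (5, 2), (2, 1)):
        if times_contacted >= threshold:
            return points
    return 0


def contact_score(days_since_last: int, times_contacted: int) -> int:
    return recency_points(days_since_last) + frequency_points(times_contacted)


print(contact_score(days_since_last=2, times_contacted=12))
```

The cap at 25+ reflects the point made in the edit: 80 contacts score the same as 30.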
Take a look at Self organizing lists.
A quick and dirty look:
Move to Front Heuristic:
A linked list, such that whenever a node is selected, it is moved to the front of the list.
Frequency Heuristic:
A linked list, such that whenever a node is selected, its frequency count is incremented, and then the node is bubbled towards the front of the list, so that the most frequently accessed is at the head of the list.
It looks like the move to front implementation would best suit your needs.
EDIT: When an address is selected, add one to its frequency, and move it to the front of the group of nodes with the same weight (or (weight div x) for coarser groupings). I see aging as a real problem with your proposed implementation, in that it requires calculating a weight on each and every item. A self organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.
Further Edit:
Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. Which means, that you have to have the entire email history available to you when you construct your list.
The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed -- This gives us our statistical good performance.
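A minimal Python sketch of the move-to-front heuristic, using a plain Python list in place of a linked list for brevity. The class and method names are hypothetical:

```python
class MoveToFrontList:
    """Self-organizing list: the most recently selected item comes first."""

    def __init__(self, items=()):
        self._items = list(items)

    def select(self, item: str) -> None:
        # Work happens only for the item that was actually used.
        if item in self._items:
            self._items.remove(item)
        self._items.insert(0, item)

    def suggestions(self, prefix: str, limit: int = 5) -> list:
        # Matches come out in list order, so recent picks surface first.
        return [i for i in self._items if i.startswith(prefix)][:limit]


contacts = MoveToFrontList(["alice@example.com", "alan@example.com", "bob@example.com"])
contacts.select("alan@example.com")
print(contacts.suggestions("al"))
```

Nothing is recalculated for untouched entries, which is the property the answer is after. The frequency grouping from the edit would need a counter per item on top of this.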
This kind of thing seems similar to what Firefox does when suggesting the site you are typing.
Unfortunately I don't know exactly how Firefox does it. A point system seems good as well; maybe you'll need to balance your points :)
I'd go for something similar to:
NoM = Number of Mail
(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30
Contacts you did not write to during the last month (this could be changed) will have 0 points. You could sort those by NoM sent in total (since it is on the contact list :). These will be shown after contacts with points > 0.
It's just an idea; the point is to give different weight to the most frequently and the most recently mailed contacts.
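The formula above, written out as a small Python function. The three counts are taken as inputs, since how they are collected is not specified, and the function name is made up:

```python
def nom_score(sent_today: int, sent_last_week: int, sent_last_month: int) -> float:
    """Weighted daily mail rate: today counts fully, older periods less."""
    return (sent_today
            + (1 / 2) * sent_last_week / 7
            + (1 / 3) * sent_last_month / 30)


# Hypothetical contact: 2 mails today, 7 in the last week, 30 in the last month.
print(nom_score(2, 7, 30))
```

Dividing by 7 and 30 turns the weekly and monthly counts into per-day averages before they are weighted.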
If you want to get crazy, mark the most 'active' emails in one of several ways:
Last access
Frequency of use
Contacts with pending sales
Direct bosses
Etc
Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.
It's a lot of work but kind of fun...
Maybe count the number of emails sent to each address. Then:
ORDER BY EmailCount DESC, LastName, FirstName
That way, your most-often-used addresses come first, even if they haven't been used in a few days.
I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).
I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" work very well. The "most recent" can be a real pain if you accidentally mis-type something once. Alternatively, "most used" doesn't evolve much over time, if you had a lot of contact with somebody last year, but now your job has changed, for example.
Once you have the set of measurements you want to use, you could create an interactive application to test out different weights, and see which ones give you the best results for some sample data.
This paper describes a single-parameter family of cache eviction policies that includes least recently used and least frequently used policies as special cases.
The parameter, lambda, ranges from 0 to 1. When lambda is 0 it performs exactly like an LFU cache, when lambda is 1 it performs exactly like an LRU cache. In between 0 and 1 it combines both recency and frequency information in a natural way.
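The paper is not named here, so the following Python sketch is not its algorithm. It only assumes one weighting with the same two limiting cases: each past use contributes (1/2)^(lambda * age) to an item's value. With lambda = 0 every use counts 1, giving a pure frequency count; with lambda = 1 the latest use always outweighs all older ones combined, giving a pure recency order. All names are made up:

```python
class RecencyFrequencyValue:
    """Running value where a use of age x contributes 0.5 ** (lam * x)."""

    def __init__(self, lam: float):
        self.lam = lam          # assumed to lie between 0 and 1
        self.value = 0.0
        self.last_time = None   # time of the most recent use

    def _weight(self, age: float) -> float:
        return 0.5 ** (self.lam * age)

    def touch(self, now: float) -> None:
        # Age the stored value to `now`, then count the new use as 1.
        if self.last_time is not None:
            self.value *= self._weight(now - self.last_time)
        self.value += 1.0
        self.last_time = now

    def value_at(self, now: float) -> float:
        if self.last_time is None:
            return 0.0
        return self.value * self._weight(now - self.last_time)


often, lately = RecencyFrequencyValue(lam=1.0), RecencyFrequencyValue(lam=1.0)
for t in (0, 1, 2):
    often.touch(t)   # used three times, earlier
lately.touch(3)      # used once, most recently
print(often.value_at(3), lately.value_at(3))
```

Only the current value and the last timestamp are stored per item, so no history of individual uses is needed.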
In spite of an answer having been chosen, I want to submit my approach for consideration, and feedback.
I would account for frequency by incrementing a counter on each use, but by some larger-than-one value, like 10 (to add precision to the second point).
I would account for recency by multiplying all counters at regular intervals (say, 24 hours) by some diminisher (say, 0.9).
Each use:
UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = 'foo@bar.com'
Each interval:
UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)
In this way I collapse both frequency and recency to one field, avoid the need for keeping a detailed history to derive {last day, last week, last month} and keep the math (mostly) integer.
The increment and diminisher would have to be adjusted to preference, of course.
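To get a feel for how the two updates interact, here is a small Python simulation of a single favor value. The constants mirror the example values above, and the function names are hypothetical:

```python
import math

INCREMENT = 10    # added on each use
DIMINISHER = 0.9  # applied once per interval


def record_use(favor: int) -> int:
    return favor + INCREMENT


def apply_interval(favor: int) -> int:
    # Mirrors FLOOR(favor * 0.9) in the interval update.
    return math.floor(favor * DIMINISHER)


favor = 0
for _ in range(3):  # three uses on the first day
    favor = record_use(favor)
history = [favor]
for _ in range(5):  # then five intervals with no further use
    favor = apply_interval(favor)
    history.append(favor)
print(history)
```

Because of the floor, small values eventually stop shrinking smoothly and drop a whole point per interval, which is the reason for using an increment larger than one.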

Resources