What data structure can we use to efficiently check for resource availability? - data-structures

This question is asked on behalf of reddit user /u/Dasharg95.
I want to build a hotel room reservation system where each hotel room can be booked for an arbitrary set of time frames. A common query against the reservation data set is trying to figure out what rooms are available for a given time frame. Is there a data structure for the reservation data set that allows this kind of query to be performed efficiently?
For example, say, we have five rooms with the following occupation times:
room 1: 9:00 -- 12:00, 15:00 -- 18:00, 19:30 -- 20:00
room 2: 8:00 -- 9:30, 15:30 -- 17:30, 18:00 -- 20:00
room 3: 6:30 -- 7:00, 7:30 -- 8:15
room 4: 12:00 -- 20:00
room 5: 7:00 -- 14:15, 18:00 -- 21:55
I want a data structure for the occupation times that is reasonably space efficient and allows for the following queries to be performed with reasonable performance:
what times a given room is occupied for
what rooms are free for the entirety of a given time frame
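Before reaching for anything fancier, a useful baseline is to keep each room's bookings as a sorted list of (start, end) pairs and answer both queries with a linear scan. A minimal sketch (the names and the minutes-since-midnight encoding are illustrative):

```python
def is_free(bookings, start, end):
    """True if [start, end) overlaps no booking; times are minutes since midnight."""
    return all(end <= b_start or start >= b_end for b_start, b_end in bookings)

def free_rooms(rooms, start, end):
    """Rooms whose booking list leaves [start, end) entirely free."""
    return [room for room, bookings in rooms.items() if is_free(bookings, start, end)]

rooms = {
    3: [(390, 420), (450, 495)],   # room 3: 6:30-7:00, 7:30-8:15
    4: [(720, 1200)],              # room 4: 12:00-20:00
}
print(free_rooms(rooms, 600, 660))  # 10:00-11:00 -> [3, 4]
```

For the data set sizes a single hotel sees, this is usually fast enough; interval trees only start to pay off at much larger scales.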

The 2D array system can still be useful without heavy resource usage. The room number can be derived from the index; for example, index i corresponds to room i + 1:
String[] rooms = {"taken", "not", "taken", "not", "taken"};
An index is the position of an element: the second element, "not", is at index 1, since the first element is at index 0. To get the room number, add 1 to the index, because a hotel with a single room calls it "Room 1", not "Room 0".
If you store every time frame with a fixed width (xxxx.yyyy, with xxxx being the opening time and yyyy the closing time), you can split an element with a substring to get the first four or last four characters, and print a time by inserting a colon in the middle of xxxx, like xx:xx.
The times could be stored in a simple 1D array, like so:
String[] times = {"0900.1200", "1500.1800", "1930.2000"};
...edit: just realised that those times would all be for one room x( ...
So, to assign multiple times to one room, you might want to use a formatting system, like:
// * = the next four digits are an opening time
// - = the next four digits are a closing time
That way you could hold multiple times in one element, like: {"*0800-0930*1530-1730*1800-2000", ...}
It's complicated, but it only uses one array, and the computer can use a while loop to check whether there are more times after a closing time; if there are none, move on to the next element (the next room's set of times and room number/index).
Once you have cycled through all the elements, the room check is finished.
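A sketch (in Python, for brevity) of how such a packed string might be decoded, assuming the `*open-close` convention described above; the regex approach is just one possible implementation:

```python
import re

def decode(packed):
    """Split a packed '*HHMM-HHMM*HHMM-HHMM...' string into (open, close) pairs."""
    pairs = re.findall(r"\*(\d{4})-(\d{4})", packed)
    fmt = lambda t: t[:2] + ":" + t[2:]  # insert the colon: "0800" -> "08:00"
    return [(fmt(a), fmt(b)) for a, b in pairs]

print(decode("*0800-0930*1530-1730*1800-2000"))
# [('08:00', '09:30'), ('15:30', '17:30'), ('18:00', '20:00')]
```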

Just imagine you want 15-minute intervals: then you have 24 × 4 = 96 intervals per day, the first running from 0:00 to 0:15. Encode this as a bit string, with a few extra bits of bookkeeping to identify the room, and you could fit each room in about 100 bits. Then write one function to build the bit string and one to decode it, store the strings in an array, and you're done.
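The interval-bitmask idea can be sketched like this, using Python integers as arbitrary-width bit strings (all names are illustrative):

```python
def slot(hh, mm):
    """Index of the 15-minute interval containing hh:mm (0 = 0:00-0:15)."""
    return hh * 4 + mm // 15

def book(mask, start_slot, end_slot):
    """Set bits [start_slot, end_slot) in a room's occupancy mask."""
    mask |= ((1 << (end_slot - start_slot)) - 1) << start_slot
    return mask

def is_free(mask, start_slot, end_slot):
    """True if no bit in [start_slot, end_slot) is set."""
    window = ((1 << (end_slot - start_slot)) - 1) << start_slot
    return mask & window == 0

room = book(0, slot(9, 0), slot(12, 0))          # occupy 9:00-12:00
print(is_free(room, slot(12, 0), slot(13, 0)))   # True
print(is_free(room, slot(11, 0), slot(13, 0)))   # False
```

Checking a whole hotel for a free time frame is then one AND per room, which is hard to beat for fixed-granularity bookings.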

Related

Find a list of meetup time slots when a group of friends can meet? (data structure/algo challenge)

A group of friends decided to meetup on a chosen date. Everyone in the group provided his/her schedule on that date.
Finally you are given a joined schedule of all members, which lists all the time slots when at least one member is unavailable.
The schedule is a list of lists. Each element in the list is a pair of strings
[[startTime1, endTime1],[startTime2, endTime2],[startTime3, endTime3]]
Start and end times follow the format HH:MM, where the first two digits denote the hour and the last two digits denote the minute, delimited by :.
Your job is to find all the potential time slots for the meetup: all time slots between 7:00 and 18:00 when all group members are available.
Example 1:
Input: schedule = [["16:00", "16:30"], ["6:00", "7:30"], ["8:00", "9:20"], ["8:00", "9:00"], ["17:30", "19:20"]]
Output: [["7:30", "8:00"], ["9:20", "16:00"], ["16:30", "17:30"]]
Example 2:
Input: schedule = [["12:00", "17:30"], ["8:00", "10:00"], ["10:00", "11:30"]]
Output: [["7:00", "8:00"], ["11:30", "12:00"], ["17:30", "18:00"]]
I was thinking about: sort by start time.
Keep track of the largest end time seen so far.
If we get a start time that is greater than the previous largest end time, record that gap as a free time range.
If extra memory is allowed:
Make a pair of (time, flag) marking whether each time is a start or an end.
Sort this array.
Start a running counter at 0.
On a start time, increment the counter; on an end time, decrement it.
The periods when this counter is 0 are the free time available.
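The counter-based sweep described above can be sketched as follows (a rough implementation; the 7:00 and 18:00 day boundaries are clamped as the problem requires):

```python
def free_slots(schedule, day_start="7:00", day_end="18:00"):
    """Free slots between day_start and day_end, given busy intervals."""
    to_min = lambda s: int(s[:-3]) * 60 + int(s[-2:])
    to_str = lambda m: f"{m // 60}:{m % 60:02d}"
    lo, hi = to_min(day_start), to_min(day_end)
    # One +1 event per interval start, one -1 event per interval end;
    # at equal times the -1 sorts first, closing intervals before opening.
    events = sorted([(to_min(s), 1) for s, _ in schedule] +
                    [(to_min(e), -1) for _, e in schedule])
    busy, prev, out = 0, lo, []
    for t, delta in events:
        # A gap while nobody is busy is a free slot (clamped to the day).
        if busy == 0 and prev < min(t, hi):
            out.append([to_str(prev), to_str(min(t, hi))])
        busy += delta
        prev = max(prev, t)
    if busy == 0 and prev < hi:
        out.append([to_str(prev), to_str(hi)])
    return out

schedule = [["12:00", "17:30"], ["8:00", "10:00"], ["10:00", "11:30"]]
print(free_slots(schedule))
# [['7:00', '8:00'], ['11:30', '12:00'], ['17:30', '18:00']]
```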

Operations on Two Streams of Data - Design Algorithm

I have seen this algorithm question or variants of it several times but have not found or been able to determine an optimal solution to the problem. The question is:
You are given two queues where each queue contains {timestamp, price}
pair. You have to print "price 1, price 2" pair for all those
timestamps where abs(ts1-ts2) <= 1 second where ts1 and price1 are
from the first queue and ts2 and price2 are from the second queue.
How would you design a system to handle these requirements?
Then a follow-up on this question: what if one of the queues is slower than the other (its data is delayed)? How would you handle this?
You can do this in a similar fashion to the merging algorithm from merge sort, only doubled.
I'm going to describe an algorithm in which I choose queue #1 to be my "main queue." This will only provide a partial solution; I'll explain how to complete it afterwards.
At any time you keep one entry from each queue in memory. Whenever the two entries you have uphold your condition of being less than one second apart, print out their prices. Whether or not you did, you discard the one with the lower time stamp and get the next one. If at any point the time stamp from queue #1 is lower than that from queue #2, discard entries from queue #1 until that is no longer the case. If they both have the same time stamp, print it out and advance the one from queue #1. Repeat until done.
This will print out all the pairs of "price1, price2" whose corresponding ts1 and ts2 uphold that 0 <= ts1 - ts2 <= 1.
Now, for the other half, do the same only this time choose queue #2 as your "main queue" (i.e. do everything I just said with the numbers 1 and 2 reversed) - except don't print out pairs with equal time stamps, since you've already printed those in the first part.
This will print out all the pairs of "price1, price2" whose corresponding ts1 and ts2 uphold that 0 < ts2 - ts1 <= 1, which is like saying 0 > ts1 - ts2 >= -1.
Together you get the printout for all the cases in which -1 <= ts1 - ts2 <= 1, i.e. in which abs(ts1 - ts2) <= 1.
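A related two-pointer sketch, shown here as a sliding-window variant of the doubled merge above (it assumes both queues have been drained into timestamp-sorted lists; names are illustrative):

```python
def close_pairs(q1, q2, window=1.0):
    """All (price1, price2) pairs whose timestamps differ by at most `window`.
    q1 and q2 are lists of (timestamp, price), sorted by timestamp."""
    out = []
    start = 0  # first index in q2 that can still pair with the current q1 item
    for ts1, price1 in q1:
        # Drop q2 entries too old to ever match again.
        while start < len(q2) and q2[start][0] < ts1 - window:
            start += 1
        # Scan forward while q2 entries are within the window.
        j = start
        while j < len(q2) and q2[j][0] <= ts1 + window:
            out.append((price1, q2[j][1]))
            j += 1
    return out

q1 = [(1.0, 10), (2.5, 11)]
q2 = [(1.5, 20), (4.0, 21)]
print(close_pairs(q1, q2))  # [(10, 20), (11, 20)]
```

Each element of each list is visited a bounded number of times, so the whole pass is linear in the total number of entries plus the number of emitted pairs.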
In addition to the queues, use two hashmaps (one per queue).
As soon as a new item arrives, strip out the seconds and use the remainder as the key into its own hashmap.
Using the very same key, retrieve all the items in the other hashmap.
One by one, check whether each retrieved item is within 1 second of the newly arrived item.
Note that this will fail to detect items with a difference in minutes: 10:00:59 and 10:01:00 will not be detected.
To solve this:
for items like XX:XX:59 you will need to hit the hashmap twice using keys XX:XX and XX:XX+1.
for items like XX:XX:00 you will need to hit the hashmap twice using keys XX:XX and XX:XX-1.
Note: do a date addition (not a mathematical one) since it will automatically deal with things like 01:59:59 + 1 = 02:00:00 or Monday 1 23:59:59 becoming Tuesday 2 00:00:00.
BTW, this algorithm also deals with the delay issue.
The speed of the queues does not matter at all if the algorithm is based on the comparison of timestamps alone. If one queue is empty and you cannot proceed just check periodically until you can continue.
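The bucketing idea might be sketched like this, using integer seconds as keys instead of date strings for brevity; always probing keys key-1, key, and key+1 covers the boundary cases mentioned above in both directions:

```python
from collections import defaultdict

buckets1, buckets2 = defaultdict(list), defaultdict(list)

def on_arrival(ts, price, own, other):
    """Store the new item in its own bucket, then pair it against the
    neighbouring buckets of the other stream (key-1, key, key+1)."""
    key = int(ts)
    own[key].append((ts, price))
    pairs = []
    for k in (key - 1, key, key + 1):
        for ts2, price2 in other.get(k, []):
            if abs(ts - ts2) <= 1:
                pairs.append((price, price2))
    return pairs

on_arrival(10.9, 100, buckets1, buckets2)         # nothing to pair yet
print(on_arrival(11.0, 200, buckets2, buckets1))  # [(200, 100)]
```

Because pairing happens when an item arrives, a delayed stream simply produces its pairs later; nothing is lost as long as old buckets are retained (in practice you would evict buckets older than the maximum expected delay).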
You can solve this by managing a list for one of the queues. In the algorithm below the first was chosen, therefore it is called l1. It works like a sliding window.
Dequeue the 2nd queue: d2.
While the timestamp of the head of l1 is smaller than the one of d2 and the difference is greater than 1: remove the head from l1.
Go through the list and print all the pairs l1[i].price, d2.price as long as the difference of the timestamps is smaller than 1. If you don't reach the end of the list, continue with step 1.
Get the next element from the first queue and add it to the list. If the difference between the timestamps is smaller than 1 print the prices and repeat, if not continue with step 1.
Here is my solution; you need the following services:
Design a service to read messages from Queue1 and push the data to a DB.
Design another service to read messages from Queue2 and push the data to the same DB.
Design another service to read the data from the DB and print the results at whatever frequency is needed.
Edit
The system above is designed with the following points in mind:
Scalability: if the load on the system increases, the number of services can be scaled up.
Slowness: as already mentioned, one queue is slower than the other, so the first queue may be receiving more messages than the second; buffering everything in a DB means the pairing service is not forced to keep up in real time to produce the desired output.
Output frequency: if the requirement changes in the future and we want to show a 1-hour difference instead of a 1-second difference, that is also very much possible.
Get the first element from both queues.
Compare the timestamps. If within one second, output the pair.
From the queue that gave the earlier timestamp, get the next element.
Repeat.
EDIT:
After @maraca's comment, I had to rethink my algorithm. And yes, if there are multiple events within a second on both queues, it will not produce all combinations.

extracting either two or one intervals in a tier

I'm new to Praat scripting, so bear with me: I have a for loop set up and I want to extract data from three tiers. My first two tiers work beautifully, but I'm having trouble with the third tier.
In the third tier, at a given point in the loop, there could be either 1 or 2 elements (my linguistics researcher is having me write this; I don't have a full understanding of what exactly I'm extracting), and I don't know how to check how many elements there are. Is there a function I can use to get the number of elements at a given interval? My line of thought at the moment: get the number of elements in the third tier at that point in the loop. If there is only one, get it, assign it to the correct variable name, and move on. If there are two, grab both.
I can think of two ways to do this, "manually" and by extracting parts of the TextGrid.
Let's imagine (for clarity) that you want to count the number of points that fall within a given interval. There are some differences between this and counting intervals that fall within intervals, but baby steps.
Manually
What I mean by manually is that you can get the index of the "first" point within your interval (the first point after the beginning of the interval), and the index of the "last" point, and then just subtracting (beware of fencepost errors!). If the first is 3 and the last is 8, you know there are 6 points in your interval.
Let's assume we have this:
textgrid = selected("TextGrid")
main_tier = 1 ; The tier with the main interval
sub_tier = 2 ; The tier with the elements you want to count
interval = 3 ; The interval in the main tier
start = Get start point: main_tier, interval
end = Get end point: main_tier, interval
Then we can do this:
first = Get high index from time: sub_tier, start
last = Get low index from time: sub_tier, end
total = last - first + 1
appendInfoLine: "There are ", total, " points within interval ", interval
(Or you could use the "Count points in range..." command in the tgutils CPrAN plugin).
If you were counting intervals, you'd have to change that slightly:
first = Get high interval at time: sub_tier, start
last = Get low interval at time: sub_tier, end
Or, if you wanted to count only those intervals that fall entirely within your main interval
first = Get high interval at time: sub_tier, start
last_edge = Get interval edge from time: sub_tier, end
last = last_edge + 1
Extracting parts
An entirely different approach would be to use the "Extract part..."
command for TextGrids. You can extract the part of the TextGrid that
falls within your time window, and then work with that part only.
Counting the number of intervals in that part would then simply be a
matter of counting the total number of intervals in that new TextGrid.
Of course, this does not check whether the intervals that are counted as being within your window fall entirely within it.
A simple example:
Extract part: start, end, "yes"
# And then you just count the intervals
intervals = Get number of intervals: sub_tier
# or points
points = Get number of points: sub_tier
If you want to do this repeatedly (e.g. for each of the intervals in your main tier), the tgutils plugin mentioned above has a script to "explode" TextGrids. Although the name might be a bit unnerving, this just separates a TextGrid into interval-sized chunks using the intervals in a given tier (by calling the same command mentioned above). As an example, if you "explode" a TextGrid using an interval tier with 5 intervals, you get 5 smaller TextGrids as a result, corresponding to each of the original intervals. The script can preserve the time stamps of the resulting TextGrids, to make it easier to refer back to the original. And if run with a TextGrid and a Sound selected, it will "explode" the Sound as well, so you can work on the combination of both objects.
(Full disclosure: I wrote that plugin).

Algorithm to find middle of largest free time slot in period?

Say I want to schedule a collection of events in the period 00:00–00:59. I schedule them on full minutes (00:01, never 00:01:30).
I want to space them out as far apart as possible within that period, but I don't know in advance how many events I will have total within that hour. I may schedule one event today, then two more tomorrow.
I have the obvious algorithm in my head, and I can think of brute-force ways to implement it, but I'm sure someone knows a nicer way. I'd prefer Ruby or something I can translate to Ruby, but I'll take what I can get.
So the algorithm I can think of in my head:
Event 1 just ends up at 00:00.
Event 2 ends up at 00:30 because that time is the furthest from existing events.
Event 3 could end up at either 00:15 or 00:45. So perhaps I just pick the first one, 00:15.
Event 4 then ends up in 00:45.
Event 5 ends up somewhere around 00:08 (rounded up from 00:07:30).
And so on.
So we could look at each pair of taken minutes (say, 00:00–00:15, 00:15–00:30, 00:30–00:00), pick the largest range (00:30–00:00), divide it by two and round.
But I'm sure it can be done much nicer. Do share!
You can use bit reversal to schedule your events. Just take the binary representation of the event's sequential number, reverse its bits, then scale the result to the given range (0..59 minutes).
An alternative is to generate the bit-reversed words in order (0000, 1000, 0100, 1100, ...).
This allows you to distribute up to 32 events easily. If more events are needed, after scaling the result you should check whether the resulting minute is already occupied, and if so, generate and scale the next word.
Here is the example in Ruby:
class Scheduler
  def initialize
    @word = 0
  end

  def next_slot
    bit = 32
    while (((@word ^= bit) & bit) == 0) do
      bit >>= 1
    end
  end

  def schedule
    (@word * 60) / 64
  end
end

scheduler = Scheduler.new
20.times do
  p scheduler.schedule
  scheduler.next_slot
end
The method of generating bit-reversed words in order is borrowed from "Matters Computational", chapter 1.14.3.
Update:
Due to the scaling from 0..63 down to 0..59, this algorithm tends to make the smallest slots fall just after 0, 15, 30, and 45 minutes. The problem is that it always starts filling intervals from these (smallest) slots, while it is more natural to start filling from the largest slots. The algorithm is not perfect because of this. An additional problem is the need to check for an already-occupied minute.
Fortunately, a small fix removes all these problems. Just change
while (((@word ^= bit) & bit) == 0) do
to
while (((@word ^= bit) & bit) != 0) do
and initialize @word with 63 (or keep initializing it with 0, but do one iteration to get the first event). This fix decrements the reversed word from 63 down to zero; it always distributes events into the largest available slots, and produces no conflicting events for the first 60 iterations.
Other algorithm
The previous approach is simple, but it only guarantees that (at any moment) the largest empty slots are no more than twice as large as the smallest slots. Since you want to space events as far apart as possible, an algorithm based on Fibonacci numbers or on the golden ratio may be preferable:
Place the initial interval (0..59) in a priority queue (a max-heap keyed on interval size).
To schedule an event, pop the priority queue, split the resulting interval in golden proportion (1.618), use the split point as the time for this event, and put the two resulting intervals back into the priority queue.
This guarantees that the largest empty slots are no more than (approximately) 1.618 times as large as the smallest slots. For smaller slots the approximation worsens, and the sizes approach a 2:1 ratio.
If it is not convenient to keep the priority queue between schedule changes, you can prepare an array of 60 possible events in advance, and extract next value from this array every time you need a new event.
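A sketch of the priority-queue variant (shown in Python; the interval bounds and the rounding are illustrative choices, not part of the scheme itself):

```python
import heapq

def make_scheduler(lo=0, hi=60):
    """Yield event minutes by always splitting the largest free interval
    in golden proportion, per the max-heap scheme described above."""
    phi = 1.618
    heap = [(-(hi - lo), lo, hi)]  # negate sizes to turn heapq into a max-heap
    while True:
        size, start, end = heapq.heappop(heap)
        cut = round(start + (end - start) / phi)
        yield cut % 60
        # Put both halves back; the larger one will be split next time around.
        heapq.heappush(heap, (-(cut - start), start, cut))
        heapq.heappush(heap, (-(end - cut), cut, end))

gen = make_scheduler()
print([next(gen) for _ in range(5)])  # [37, 23, 14, 51, 9]
```

Generating the first 60 values once and storing them in an array gives you the precomputed table mentioned above.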
Since you have at most 60 events to schedule, I suppose a static table is worth a shot (compared to devising an algorithm and testing it). I mean, for you it is a fairly trivial task to lay the events out within the hour by hand, but it is not so easy to tell a computer how to do it nicely.
So what I propose is to define a table with static values for the time at which to put each next event. It could be something like:
00:00, 01:00, 00:30, 00:15, 00:45...
Since you can't reschedule events and you don't know in advance how many events will arrive, I suspect your own proposal (with Roman's note of using 01:00) is the best.
However, if you have any sort of estimation on how many events will arrive at maximum, you can probably optimize it. For example, suppose you are estimating at most 7 events, you can prepare slots of 60 / (n - 1) = 10 minutes and schedule the events like this:
00:00
01:00
00:30
00:10
00:40
00:20
00:50 // 10 minutes apart
Note that the last few events might not arrive and so 00:50 has a low probability to be used.
which would be fairer than the non-estimation-based algorithm, especially in the worst-case scenario where all slots are used:
00:00
01:00
00:30
00:15
00:45
00:07
00:37 // Only 7 minutes apart
I wrote a Ruby implementation of my solution. It has the edge case that any events beyond 60 will all stack up at minute 0, because every free space of time is now the same size, and it prefers the first one.
I didn't specify how to handle events beyond 60, and I don't really care, but I suppose randomization or round-robin could solve that edge case if you do care.
each_cons(2) gets bigrams; the rest is probably straightforward:
class Scheduler
  def initialize
    @scheduled_minutes = []
  end

  def next_slot
    if @scheduled_minutes.empty?
      slot = 0
    else
      circle = @scheduled_minutes + [@scheduled_minutes.first + 60]
      slot = 0
      largest_known_distance = 0
      circle.each_cons(2) do |(from, unto)|
        distance = (from - unto).abs
        if distance > largest_known_distance
          largest_known_distance = distance
          slot = (from + distance/2) % 60
        end
      end
    end
    @scheduled_minutes << slot
    @scheduled_minutes.sort!
    slot
  end

  def schedule
    @scheduled_minutes
  end
end

scheduler = Scheduler.new
20.times do
  scheduler.next_slot
  p scheduler.schedule
end

Simple Popularity Algorithm

Summary
As Ted Jaspers wisely pointed out, the methodology I described in the original proposal back in 2012 is actually a special case of an exponential moving average. The beauty of this approach is that it can be calculated recursively, meaning you only need to store a single popularity value with each object and then you can recursively adjust this value when an event occurs. There's no need to record every event.
This single popularity value represents all past events (within the limits of the data type being used), but older events begin to matter exponentially less as new events are factored in. This algorithm will adapt to different time scales and will respond to varying traffic volumes. Each time an event occurs, the new popularity value can be calculated using the following formula:
(a * t) + ((1 - a) * p)
a — coefficient between 0 and 1 (higher values discount older events faster)
t — current timestamp
p — current popularity value (e.g. stored in a database)
Reasonable values for a will depend on your application. A good starting place is a=2/(N+1), where N is the number of events that should significantly affect the outcome. For example, on a low-traffic website where the event is a page view, you might expect hundreds of page views over a period of a few days. Choosing N=100 (a≈0.02) would be a reasonable choice. For a high-traffic website, you might expect millions of page views over a period of a few days, in which case N=1000000 (a≈0.000002) would be more reasonable. The value for a will likely need to be gradually adjusted over time.
To illustrate how simple this popularity algorithm is, here's an example of how it can be implemented in Craft CMS in 2 lines of Twig markup:
{% set popularity = (0.02 * date().timestamp) + (0.98 * entry.popularity) %}
{% do entry.setFieldValue("popularity", popularity) %}
Notice that there's no need to create new database tables or store endless event records in order to calculate popularity.
One caveat to keep in mind is that exponential moving averages have a spin-up interval, so it takes a few recursions before the value can be considered accurate. This means the initial condition is important. For example, if the popularity of a new item is initialized using the current timestamp, the item immediately becomes the most popular item in the entire set before eventually settling down into a more accurate position. This might be desirable if you want to promote new content. Alternatively, you may want content to work its way up from the bottom, in which case you could initialize it with the timestamp of when the application was first launched. You could also find a happy medium by initializing the value with an average of all popularity values in the database, so it starts out right in the middle.
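A rough Python equivalent of the recursive update, to show how little state is involved (the launch timestamp and event spacing below are made up for the example):

```python
def update_popularity(p, t, a=0.02):
    """Exponential moving average: the single stored value p absorbs event t."""
    return a * t + (1 - a) * p

# A new item initialized at a (made-up) launch time drifts toward "now"
# as events arrive; with a = 0.02 (N ~ 100), five events barely move it,
# illustrating the spin-up interval mentioned above.
launch = 1_600_000_000
p = float(launch)
for event_time in range(launch, launch + 5000, 1000):
    p = update_popularity(p, event_time)
print(round(p - launch))  # seconds of drift after five events
```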
Original Proposal
There are plenty of suggested algorithms for calculating popularity based on an item's age and the number of votes, clicks, or purchases an item receives. However, the more robust methods I've seen often require overly complex calculations and multiple stored values which clutter the database. I've been contemplating an extremely simple algorithm that doesn't require storing any variables (other than the popularity value itself) and requires only one simple calculation. It's ridiculously simple:
p = (p + t) / 2
Here, p is the popularity value stored in the database and t is the current timestamp. When an item is first created, p must be initialized. There are two possible initialization methods:
Initialize p with the current timestamp t
Initialize p with the average of all p values in the database
Note that initialization method (1) gives recently added items a clear advantage over historical items, thus adding an element of relevance. On the other hand, initialization method (2) treats new items as equals when compared to historical items.
Let's say you use initialization method (1) and initialize p with the current timestamp. When the item receives its first vote, p becomes the average of the creation time and the vote time. Thus, the popularity value p still represents a valid timestamp (assuming you round to the nearest integer), but the actual time it represents is abstracted.
With this method, only one simple calculation is required and only one value needs to be stored in the database (p). This method also prevents runaway values, since a given item's popularity can never exceed the current time.
An example of the algorithm at work over a period of 1 day: http://jsfiddle.net/q2UCn/
An example of the algorithm at work over a period of 1 year: http://jsfiddle.net/tWU9y/
If you expect votes to steadily stream in at sub-second intervals, then you will need to use a microsecond timestamp, such as the PHP microtime() function. Otherwise, a standard UNIX timestamp will work, such as the PHP time() function.
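For concreteness, the whole scheme is a one-line update per vote; the timestamps here are toy values:

```python
import time

def vote(p, t=None):
    """One vote: average the stored popularity p with the vote's timestamp."""
    if t is None:
        t = time.time()  # sub-second resolution, per the note above
    return (p + t) / 2

# Item created at t=100, then voted on at t=200 and t=300:
p = 100
p = vote(p, 200)   # 150.0
p = vote(p, 300)   # 225.0
print(p)           # 225.0 -- still a valid (abstracted) timestamp
```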
Now for my question: do you see any major flaws with this approach?
I think this is a very good approach, given its simplicity. A very interesting result.
I made a quick set of calculations and found that this algorithm does seem to capture what "popularity" means. Its problem is a clear tendency to favor recent votes, like this:
Imagine we break time into discrete timestamp values ranging from 100 to 1000. Assume that at t=100 both items A and B have the same P = 100.
A gets voted 7 times, at t = 200, 300, 400, 500, 600, 700 and 800, resulting in a final Pa(800) ≈ 700.
B gets voted 4 times, at t = 300, 500, 700 and 900, resulting in a final Pb(900) ≈ 712.
When t=1000 comes, both A and B receive votes, so:
Pa(1000) = 850 with 8 votes
Pb(1000) = 856 with 5 votes
Why? because the algorithm allows an item to quickly beat historical leaders if it receives more recent votes (even if the item has fewer votes in total).
EDIT INCLUDING SIMULATION
The OP created a nice fiddle that I changed to get the following results:
http://jsfiddle.net/wBV2c/6/
Item A receives one vote each day from 1970 till 2012 (15339 votes)
Item B receives one vote each month from Jan to Jul 2012 (7 votes)
The result: B is more popular than A.
The proposed algorithm is a good approach, and is a special case of an Exponential Moving Average where alpha=0.5:
p = alpha*p + (1-alpha)*t = 0.5*p + 0.5*t = (p+t)/2 //(for alpha = 0.5)
A way to tweak the fact that the proposed solution with alpha=0.5 tends to favor recent votes (as noted by daniloquio) is to choose a higher value for alpha (e.g. 0.9 or 0.99). Note, however, that applying this to the test case proposed by daniloquio does not work as-is, because with a higher alpha the algorithm needs more 'time' to settle (so the arrays should be longer, which is often true in real applications).
Thus:
for alpha=0.9 the algorithm averages approximately the last 10 values
for alpha=0.99 the algorithm averages approximately the last 100 values
for alpha=0.999 the algorithm averages approximately the last 1000 values
etc.
I see one problem: only the last ~24 votes count.
p_i+1 = (p_i + t) / 2
For two votes we have
p2 = (p1 + t2) / 2 = ((p0 + t1) / 2 + t2) / 2 = p0/4 + t1/4 + t2/2
Expanding that for 32 votes gives:
p32 = t*2^-32 + t0*2^-32 + t1*2^-31 + t2*2^-30 + ... + t31*2^-1
So for signed 32-bit values, t0 has no effect on the result: because t0 gets divided by 2^32, it contributes nothing to p32.
If two items A and B (no matter how different their histories) receive the same last 32 votes, they will have the same popularity. Your history effectively goes back only 32 votes: there is no difference between 2032 votes and 32 votes if the last 32 are the same.
If the timestamps span less than a day, the two items will already be equal after 17 votes.
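A quick numeric check of this point (the vote counts and timestamps are arbitrary): after enough shared votes, two items' earlier histories become indistinguishable.

```python
def popularity(initial, votes):
    """Apply p = (p + t) / 2 for each vote timestamp, in order."""
    p = initial
    for t in votes:
        p = (p + t) / 2
    return p

# Two items with very different histories but the same last 60 votes:
recent = list(range(1_000_000, 1_000_060))
a = popularity(0, list(range(100)) + recent)  # 160 votes in total
b = popularity(0, recent)                     # only 60 votes
print(abs(a - b) < 1e-6)  # True: the earlier 100 votes have vanished
```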
The flaw is that something with 100 votes is usually more meaningful than something with only one recent vote. However it isn't hard to come up with variants of your scheme that work reasonably well.
I don't think the logic discussed above is going to work. With
p_i+1 = (p_i + t) / 2
Article A gets viewed at timestamps 70, 80, 90: popularity(Article A) = 82.5
Article B gets viewed at timestamps 50, 60, 70, 80, 90: popularity(Article B) = 80.625
In this case, the popularity of Article B should have been higher: Article B was viewed as recently as Article A, and it was also viewed more times.
