Predicting opening and closing of an organisation - algorithm

Given Training data of an organisation meter reading recorded at an interval of 15 minutes each day .Like for some N days we will be provided with data.
And Now with help of this data we need to tell that on a particular day an organisation is closed or open . I need to know if any link can help in this matter if someone has worked into this field.
By closed I mean that on that day consumption of electricity will obviously be almost constant,Though this is just a single feature to take into account.
So how to predict this in best way ?

Maybe the use of threshold not suit well for different kind of organisation, but for sure there are a minimum consumption.
Maybe you would like to use the information of past 15 minutes to increase your reliability.
Adding one more input to your classifier.

Related

Algorithm for Scheduling One Appointment in Already Full Schedule

I'm building a calendar scheduling application for, let's say a plumbing company. The company has one or more plumbers, who each have a schedule of appointments at different times throughout the day. So Josh's schedule on May 30th might include a 30-minute appointment at 10 AM, a 45-minute appointment at 1 PM, and an hour-long appointment at 3 PM, while Maria has a completely different schedule that day. Now say a customer wants to book an appointment with this company, and my program has already calculated the time this new appointment will take. I want my program to return a list of possible appointment times for any plumber(s). Is there a standard algorithm for this type of problem?
I'd prefer language-agnostic, general steps just to be more helpful to anyone who might be in a similar situation with a different language, though I'm using PHP and PostgreSQL if there's a specific language feature suited to this.
Here's what I've tried so far:
Get all available shifts for every plumber on the requested day
Get all appointments already made on that day
Do a sort of boolean subtraction to cut the appointments out of the shifts, leaving gaps in each plumber's schedule
Get rid of all schedule gaps that are smaller than the requested appointment length (I also calculate drive times here so I know how far appointments need to be from one another)
Return those gaps to the customer, trimmed to the appointment length, as appointment possibilities
I've learned that the problem with this approach is that it doesn't understand what to do with a gap much larger than the requested appointment. If you have a gap from 10 AM to 6 PM but you want an hour-long appointment, it will only suggest 10 AM to 11 AM. This approach doesn't allow for time-of-day choices, either. In that same scenario, what if the customer wants a morning appointment? Then it should only suggest 10-11 and 11-12. Or if they want an evening appointment, it should only suggest 5-6 PM. This approach also doesn't consider two plumbers working together. If we assume that two workers = half the time, then maybe the algorithm should look for the same 30 minutes available in both Josh and Maria's schedules along with 60-minute gaps in either plumber's schedule. Lastly, this algorithm feels very inefficient.
By the way, I've looked at several other questions here and around the Internet about how to solve similar situations, but I'm finding that most (if not all) of those questions involve optimizing a schedule. That might be valuable for other parts of this program, but for now, let's assume that the existing appointments are fixed and unchangeable. We're just looking to fit a new appointment into an existing schedule. I know this is possible because applications like Calendly have similar inputs and outputs.
In short, is there a better way of meeting these goals:
Suggest available gaps in one plumber's schedule given a time interval
If possible, only return appointment possibilities in the given time of day (morning = 4-12, afternoon = 12-5, evening = 5-10, night = 10-4, or any), and if not possible, continue with the algorithm as if no time of day had been specified
Suggest smaller gaps where n plumbers might do the job in 1/n time (there aren't that many plumbers, so setting a limit on this isn't necessary). This isn't as important as the other criteria, so if this isn't possible or would make the algorithm far more complex, then don't worry about it.
Split big appointment gaps into smaller gaps so we can suggest 4 hour-long gaps in between 10 AM and 2 PM. Obviously we can't suggest all possible hour-long segments of that gap because they'd be infinite
Thank you in advance.
There is no need for any sophisticated algorithm. There is only a small number of possible appointment times throughout a day, let's say every 30 minutes or so. Iterate over all possible times: 06:00, 06:30, 07:00, ... 20:00. Check each time if it matches the requirements, that check can either return a yes/no result, or a number that say how good a match that time is. You end up with a list of possible appointment times, pick the best one or all of them.

Designing an algorithm for detecting anamoly and statistical significance for ordinal data using python

Firstly, I would like to apologise for the detailed problem statement. Being a novice, I couldn't express it in any lesser words.
Environment Setup Details:
To give some background, I work in a cloud company where we have multiple servers geographically located in all continents. So, we have hierarchy like this:
Several partitions
Each partition has 7 pop's
Each pop has multiple nodes all set up with redundancy.
Turn servers connecting traffic to each node depending on the client location
Actual clients-ios, android, mac, windows,etc.
Now, every time the user uses our product/service, he leaves a rating out of 5, 5 being outstanding. This data is stored in our databases and we mine it and analyse it to pin-point the exact issue on any particular day.
For example, if the users from Asia are giving more bad ratings on Tuesday this week than a usual Tuesday, what factors can cause this - is it something to do with clients app version, or server release , physical factors, loss, increased round trip delay etc.
What we have done:
Till now we have been using visualization tools to track each of these metrics separately per day to see the trends and detect the issues manually.
But, due to growing micr-services, it is becoming difficult day by day. Now, we want to automate it using python/pandas.
What I want to do:
If the ratings drop on a particular day/hour, I run the script and it should do all the manual work by taking all the permutations and combinations of all factors and list out the exact combinations which could have lead to the drop.
The second step would be to check whether the drop was significant due to varying number of ratings.
What I know:
I understand that I can do this using pandas by creating a dataframe for each predictor variable and trying to do it per variable.
And then I can apply tests like whitney test etc for ordinal data.
What I need help with:
But I just wanted to know if there is a better way to do it? It is perfectly fine if there is a learning curve involved. I can learn and do it. I just wanted some help in choosing the right approach for this.

Vehicle usage optimization using GTFS

I have a GTFS feed defined for my fleet. This tells the routes, trips and timings. Now using this GTFS feed, is it possible to optimize the utilization of my fleet's vehicles? Can I schedule the vehicles such that once it completes a trip, it can be assigned to serve a trip of another route?
I have constriants such as no vehicle should be running more than 12 hours, every vehicle will undergo a health check for 2 hrs, etc.
To me this sounds like a case of the Knapsack problem.
If such a project exists, kindly let me know. Is there an algorithm that can solve this problem?
Thanks,
Yash
You're asking a question that is typically assigned to a scheduling system, one which would produce GTFS files from the get-go. In smaller systems, this actually is not difficult to do, but as the number of routes (or "trip patterns") increases, the process gets more complex.
Before you undertake any project like this, I suggest reading over the TCRP manual on scheduling, paying close attention to the terms "cycle time," "headway," and "interlining."
While I'd love to help more, I don't have time right now to get into the specifics. I performed a similar analysis with automatically collected cycle times on a limited set of routes in my masters thesis, starting on page 118.
I hope this helps. If you have any follow-up questions, post a comment and I'll respond when I have time.

Google Analytics High Session Rate vs. Low Avg. Session Duration

My department has installed google analytics on our companies website and none of us are exactly experts on understanding why the data is the way it is.
Anyways, our company is fairly large, but I wouldn't say we are exactly a well known company. We provide internet and Video on Demand to Hotels worldwide. Anyways, as of right now, since I have installed our code last month, we have a total session number of over 78,000. Our average session duration is only 24 seconds, with an average page view per visitor at 1.18 and a bounce rate of 91%.
I don't doubt the session average time. Me and my co-workers are just a little confused as to how with that many visitors, we are consistently across the board getting such a fairly small session duration and a high bounce rate. Could visitors possibly just come to our website, look for our phone number and than leave the site? I'm just trying to find a way to reduce the bounce rate and hopefully increase the session duration average. Or is it possible to add a filter that will exclude visits to the site that are less than 30 seconds, or something like that? I apologize for asking such fairly basic questions I'm sure. I am trying to get up to speed and familiarize myself with how this all works. Just thought I'd maybe ask and see if I am missing something important. Any advice would be greatly appreciated. Thank you!
Its hard to tell why your metrics are so low across the board. Your referral traffic could be to blame here - possibly bad ad copy from adwords or bing, which is making users think they're going to a different page. You can always create a filter by the acquisition, and where the clicks and sessions are coming from. From there it'll be easier to see which source of traffic is to blame, and also how you can improve your site overall for traffic optimization and user friendliness. For more info, reach us at RLCppc.com on the matter. hope that answers your question

How to make a GPS transportation app?

I live in Split,Croatia, and a city bus company recently accuired a new piece of software, and what it does is this: if I am a passenger and am awaiting a bus on a bus station, there is a huge monitor on which I can see the bus code and the time it will take for him to get to my station. The problem is, that in two years of having the software, not once have I seen that the time of the arrival is even remotley accurate. I am aware that GPS data can be inaccurate but this.... And that makes me so frustrated, that I decided that I will try to write a similar application for my final exam in CS in college. The problem is that I searched the web extensively in the last few days, and I cannot find good starting points. So my question is: have you ever been involved in such a project, and if so could you give me some pointers be it tutorials, or books regarding the subject? I appreciate any kind of input.
If I made a mistake regarding the question itself feel free to close it.
Thanks!
You will probably have:
Vehicle object containing each vehicles position, assigned route, direction of travel on route, and next scheduled stop, previous scheduled stop, arrival time at previous scheduled stop
Array of routes which comprises a list of stops and a data structure holding historic transit times between stops on each route
Now, updates to a vehicle's location push to the vehicle's object.
When you want to update a display at a station, find all routes passing through that station and for each route display the estimated arrival time of the next vehicle on that route.
The estimated arrival time structure is at the heart of this. Seed it by assuming some distance between stops and an average travel speed.
Now, every time a vehicle arrives at a stop, calculate the real time it took to get there from its last stop and use this to update an average transit time binned by half-hour increments (or what have you), you could also bin by season and/or day of week. The purpose of the binning is to implicitly account for varying traffic congestion by time of day, day of week, and/or season. Assuming otherwise homogeneous conditions, you'll eventually converge on a decent estimate of the transit time between each station.
You may find it useful to employ a Kalman filter.
Estimates of travel times between more distant stations may be more accurate than travel times between adjacent stations, if you feel like looking into that. Higher-order Markov chains may also help describe the underlying statistics of transit times.
Just ideas.

Resources