Algorithm for a health/activity score for completing tasks on a website - algorithm

We're creating a basic site where users get to-dos delivered to them each week. I need to create an algorithm that, based on their completion (or lack thereof) of these to-dos, assigns them a health score of 0 - 100.
There are 0 - 4 to-dos delivered each week.
A to-do can be completed, deleted (marked as irrelevant), or left pending.
If users aren't completing their to-dos (to-dos in the pending state), then they have a low health meter.
I'd also like to weight the pending states. For example, pending to-dos in the first week aren't as detrimental to the score as pending to-dos in the 4th week. I'm only thinking of using the last 4-6 weeks of data to determine the score.
Any help with the approach I should take would be much appreciated.
I'm currently using the following notation
t0 # total to-dos given in week 0
t0_c # completed to-dos from week 0
t0_d # deleted to-dos from week 0
t0_p # pending to-dos from week 0

This sounds like the perfect place for a Moving average.
Example:
health for a week = 100*((not done for that week)^1.5)/8
Then use an exponential moving average on the historical and current health scores to get the current health.
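A minimal sketch of that idea in Python. Taken literally, the weekly formula grows with the number of unfinished to-dos (4 unfinished gives 100), so the sketch treats it as a penalty and reports 100 minus the smoothed value; that inversion, the smoothing factor `alpha = 0.5`, and the function names are assumptions, not part of the answer.

```python
def weekly_penalty(pending):
    """Penalty for one week: 0 for no pending to-dos, 100 for 4 pending."""
    return 100 * pending ** 1.5 / 8


def health_score(pending_by_week, alpha=0.5):
    """Health from weekly pending counts, listed oldest week first.

    alpha is the weight of the newest week in the exponential moving
    average (an assumed value; tune it to taste).
    """
    ema = None
    for pending in pending_by_week:
        penalty = weekly_penalty(pending)
        # The first week seeds the average; later weeks are blended in.
        ema = penalty if ema is None else alpha * penalty + (1 - alpha) * ema
    # Assumption: with no history at all, the user starts fully healthy.
    return 100 if ema is None else 100 - ema


print(health_score([4, 0, 0]))  # old backlog, recently clean
print(health_score([0, 0, 4]))  # clean history, recent backlog
```

With these settings, a recent week full of pending to-dos lowers the score more than the same backlog three weeks ago, because older penalties are halved with every new week.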

Maybe you could compute a completion score for each week and assign a weight to each week's completion score. Since you are considering, say, only the last 5 weeks of scores, you could have something like this (1 for completed, 0 for pending):
week 1 [1,0,0,1] completion score = 50%
week 2 [1,0,1] completion score = 66.6%
week 3 [0] completion score = 0%
week 4 [1] completion score = 100%
week 5 [1,0,0,0] completion score = 25%
Not every week has a total of 4 to-dos, as some may have been deleted as irrelevant.
Now assign weekly weights. Week 1 is 5 weeks back and so should carry the most weight, with the weight decreasing toward week 5, so something like:
week1 weight = 30%
week2 weight = 25%
week3 weight = 20%
week4 weight = 15%
week5 weight = 10%
Now just multiply each week's completion score by its weight and add the terms:
(50*30 + 66.6*25 + 0*20 + 100*15 + 25*10)/100 = 49.15%
One downside to this approach:
Say Guy1 has 1 to-do in week 1 and it is pending, i.e.
week 1 [0] => score = 0%
Say Guy2 has 2 to-dos pending in week 1 but one complete, i.e.
week 1 [0,0,1] => score = 33%
Guy2 gets a much higher score even though he has more pending work.
If the number of to-dos is roughly the same across customers on average, this won't be a big issue.
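The weighted calculation above can be sketched in a few lines of Python. The function name is made up for illustration, and skipping weeks without any to-dos (then renormalizing the weights) is an assumption the answer does not address.

```python
def weighted_completion_score(weeks, weights):
    """weeks: one list per week with 1 = completed, 0 = pending.
    weights: one weight per week, in the same order as weeks."""
    weighted_sum = 0.0
    used_weight = 0.0
    for todos, weight in zip(weeks, weights):
        if not todos:
            # Assumption: a week without to-dos is ignored entirely.
            continue
        completion = 100 * sum(todos) / len(todos)
        weighted_sum += completion * weight
        used_weight += weight
    return weighted_sum / used_weight if used_weight else 0.0


weeks = [[1, 0, 0, 1], [1, 0, 1], [0], [1], [1, 0, 0, 0]]
weights = [30, 25, 20, 15, 10]
print(round(weighted_completion_score(weeks, weights), 2))
```

This prints a value of about 49.17 rather than 49.15, only because 2/3 is not rounded down to 66.6% before weighting.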


Algorithm to match people with available appointments

I need some help making a program that finds the best solution for everyone (more on that later).
6 7
0 0 0 0 0 0 0
1 0 0 1 1 0 0
2 2 2 1 2 2 2
2 1 1 1 2 1 2
0 1 2 2 1 0 0
1 2 1 2 0 1 1
The example given above is a problem that the algorithm is supposed to solve.
The first number of the first row indicates the number of people (6).
The second number of the first row indicates the number of appointments (7).
0 = the person doesn't have a problem with the date
1 = the person could choose this date if no other is available
2 = the person can't choose this appointment
Row = Person
Column = Available Appointment
What the program needs to do now is find the best possible solution for everyone, choosing the column that best matches each person's wishes and arranging people's appointments based on their choices.
Example:
In the 3rd row, the person can only attend the appointment in the 4th column since he can't attend the other ones (2), which also makes column 4 taken and unavailable to the other people.
I need help because I have no idea how to approach this. This may be a simple example, but since it's an algorithm, it is supposed to work with dozens of people and appointments.
The exercise is somewhat ambiguous, probably on purpose. My wild guess would be to sort the meetings by:
1. the highest number of possible participants, i.e., the lowest number of 2s in a matrix column;
2. the lowest “badness”, i.e., the lowest number of 1s in a matrix column.
Why badness does not count the 2s: because we don’t care about those who cannot participate at this sorting stage.
Why badness does not count the 0s: because we want to minimize the number of people inconvenienced by the meeting time, not (necessarily) maximize the number of people pleased with the meeting time.
#!/usr/bin/env python
import sys

# First line: number of people and number of appointments.
n_people, n_appointments = (int(i)
                            for i in sys.stdin.readline().split())
# Remaining lines: one row of preferences (0, 1 or 2) per person.
people_appointments = tuple(tuple(int(i)
                                  for i in line.split())
                            for line in sys.stdin)
assert len(people_appointments) == n_people
for appointments in people_appointments:
    assert len(appointments) == n_appointments

# Group appointments by their (absence, badness) metric.
appointment_metric = {}
for appointment in range(n_appointments):
    n_missing = sum(people_appointments[i][appointment] == 2
                    for i in range(n_people))
    badness = sum(people_appointments[i][appointment] == 1
                  for i in range(n_people))
    appointment_metric.setdefault(
        (n_missing, badness), []).append(str(appointment + 1))

# Print from best (fewest absences, then lowest badness) to worst.
for metric in sorted(appointment_metric):
    print(f'Appointment Nr. {" / ".join(appointment_metric[metric])} '
          f'(absence {metric[0]}, badness {metric[1]})')
Possible output (best appointment (by the metric described above) to worst appointment):
Appointment Nr. 6 (absence 1, badness 2)
Appointment Nr. 7 (absence 2, badness 1)
Appointment Nr. 1 / 2 / 3 / 5 (absence 2, badness 2)
Appointment Nr. 4 (absence 2, badness 3)
There are (of course) many other ways to evaluate meetings. Picking and defining a metric is quite likely an implicit part of the exercise.
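If the exercise is read the other way, as giving every person their own appointment (the question's remark that column 4 becomes unavailable to others suggests this), a different metric applies. The following is only a sketch under that reading: a plain backtracking search that minimizes the number of 1s used. The function name and the choice of objective are assumptions.

```python
def best_assignment(prefs):
    """Give each person a distinct slot, never a 2, minimizing the 1s used.

    Returns (cost, {person_index: slot_index}), or (None, None) if no
    complete assignment exists. Indices are 0-based.
    """
    n_people = len(prefs)
    n_slots = len(prefs[0]) if prefs else 0
    # Handle the most constrained people (fewest usable slots) first.
    order = sorted(range(n_people),
                   key=lambda p: sum(v != 2 for v in prefs[p]))
    best = {"cost": None, "assignment": None}
    current, used = {}, set()

    def search(k, cost):
        # Prune branches that cannot beat the best solution found so far.
        if best["cost"] is not None and cost >= best["cost"]:
            return
        if k == n_people:
            best["cost"], best["assignment"] = cost, dict(current)
            return
        person = order[k]
        # Try preferred slots (0) before tolerated ones (1).
        for slot in sorted(range(n_slots), key=lambda s: prefs[person][s]):
            if slot in used or prefs[person][slot] == 2:
                continue
            used.add(slot)
            current[person] = slot
            search(k + 1, cost + prefs[person][slot])
            used.discard(slot)
            del current[person]

    search(0, 0)
    return best["cost"], best["assignment"]


PREFS = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [2, 2, 2, 1, 2, 2, 2],
    [2, 1, 1, 1, 2, 1, 2],
    [0, 1, 2, 2, 1, 0, 0],
    [1, 2, 1, 2, 0, 1, 1],
]
cost, assignment = best_assignment(PREFS)
print(cost, assignment)
```

Backtracking is exponential in the worst case. For dozens of people and appointments, a minimum-cost assignment algorithm such as the Hungarian method would scale better.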

Amazon Quicksight - how to calculate Percentage of Total UNIQUE values?

How to calculate percent of count of UNIQUE values?
E.g. I have a dataset with people who can pick multiple symptoms (i.e each person can have 0 to 10 values).
person 1 - symptom A, B
person 2 - symptom B, C, D
person 3 - no symptoms
person 4 - symptom A
etc.
E.g. if total UNIQUE count of people is 4 and 2 of them have picked symptom A, then I'd like to see:
A = 2/4 = 50%
Currently QuickSight calculates shares based on the total count of symptom entries (not the unique count of people), since one person can have multiple symptoms, so A is 2/6 = 33% (not what I need).
As far as I can tell, QuickSight doesn't support this?
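To make the two ratios concrete, here they are in plain Python. This is not QuickSight syntax, and the row layout (one row per person and symptom, with an empty entry for a person without symptoms) is an assumption about how the dataset is shaped.

```python
# One row per (person, symptom); None marks a person with no symptoms.
rows = [
    (1, "A"), (1, "B"),
    (2, "B"), (2, "C"), (2, "D"),
    (3, None),
    (4, "A"),
]


def share_of_unique_people(rows, symptom):
    """Distinct people with the symptom / distinct people overall."""
    everyone = {person for person, _ in rows}
    affected = {person for person, s in rows if s == symptom}
    return len(affected) / len(everyone)


def share_of_symptom_entries(rows, symptom):
    """Entries for the symptom / all symptom entries (the unwanted 2/6)."""
    entries = [s for _, s in rows if s is not None]
    return entries.count(symptom) / len(entries)


print(share_of_unique_people(rows, "A"))    # 2 of 4 people
print(share_of_symptom_entries(rows, "A"))  # 2 of 6 entries
```

The difference is only in the denominator: the wanted figure divides by the distinct count of people over the whole dataset, not by the number of symptom rows.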

Schedule meeting problem (count how many meetings an owner can schedule based on investor availabilities)

I tried to solve the task which sounds like "Given the schedules of the days investors are available, determine how many meetings the owner can schedule". The owner is looking to meet new investors to get some funds for his company. The owner must respect the investor's schedule. Note that the owner can only have one meeting per day.
The schedule consists of 2 integer arrays, firstDay and lastDay. Each element in the array firstDay represents the first day an investor is available, and each element in lastDay represents the last day an investor is available, both inclusive.
Example:
firstDay = [1,2,3,3,3]
lastDay = [2,2,3,4,4]
There are 5 investors [i0, i1, i2, i3, i4]
Investor i0 is available from day 1 to day 2 inclusive [1,2]
Investor i1 is available on day 2 only [2,2]
Investor i2 is available on day 3 only [3,3]
Investors i3 and i4 are available from day 3 to day 4 only [3,4]
The owner can only meet 4 investors out of 5: i0 on day 1, i1 on day 2, i2 on day 3 and i3 on day 4. The image below shows the scheduled meetings in green and the blocked days in gray.
[Image: a graphic showing the scheduled meetings]
The task is to implement the function which takes 2 lists of integers as input parameters and returns integer result that represents the maximum number of meetings possible.
Constraints
1 <= n <= 100000, where n is the array length
1 <= firstDay[i], lastDay[i] <= 100000 (0 <= i < n)
firstDay[i] <= lastDay[i]
My implementation of this task is the following:
public static int countMeetings(List<int> firstDay, List<int> lastDay)
{
    var count = 0;
    count = firstDay.Concat(lastDay).Distinct().Count();
    if (count > firstDay.Count)
    {
        count = firstDay.Count;
    }
    return count;
}
And this code successfully passes 8 of 12 provided tests. I'll be glad to see and discuss any working solutions to this issue. Thanks.
For the input
firstDay = [1,1,1]
lastDay = [5,5,5]
your code returns 2 (there are only two distinct values, 1 and 5); however, the correct answer is 3.
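One common way to get the maximum is a greedy sweep over the days: on each day, among the investors who are currently available, meet the one whose last day comes soonest. Below is a sketch of that technique in Python using a min-heap. It is not derived from the code in the question, and the function name is illustrative.

```python
import heapq


def count_meetings(first_day, last_day):
    """Maximum number of meetings with at most one meeting per day."""
    investors = sorted(zip(first_day, last_day))
    n = len(investors)
    available = []  # min-heap of last days of investors available today
    count = 0
    day = 0
    i = 0
    while i < n or available:
        if not available:
            # Nobody is waiting: jump to the next investor's first day.
            day = max(day, investors[i][0])
        # Everyone whose window has opened becomes a candidate.
        while i < n and investors[i][0] <= day:
            heapq.heappush(available, investors[i][1])
            i += 1
        # Drop candidates whose window has already closed.
        while available and available[0] < day:
            heapq.heappop(available)
        if available:
            # Meet the investor who runs out of days first.
            heapq.heappop(available)
            count += 1
            day += 1
    return count


print(count_meetings([1, 2, 3, 3, 3], [2, 2, 3, 4, 4]))  # 4
print(count_meetings([1, 1, 1], [5, 5, 5]))              # 3
```

Sorting and the heap operations give O(n log n) time, which fits the stated limit of 100000 investors.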

Minimize the number of trips or Group maximum possible orders

We have one distribution center (warehouse), and we receive orders in real time. The time/distance between the warehouse and each order location, and between the order locations, is known.
time matrix =
      W    O1   O2   O3
W     0    5    20   2
O1    5    0    21   7
O2    20   21   0    11
O3    2    7    11   0
order time of O1= 10:00 AM
order time of O2= 10:20 AM
order time of O3= 10:25 AM
I want to combine as many orders as possible such that no order is delivered more than 2 hours after its order time. The goal is thus to reduce the number of trips (a trip is when a delivery agent goes out for delivery).
I am trying to come up with an algorithm for this. There are two competing approaches:
1. We can combine all the orders in the sequence in which they arrive, as long as the constraint of delivering each order within 2 hours of its order time is satisfied.
2. We can modify the above approach to find the bottleneck order (the one due to which we cannot add more orders in approach 1), pull it out of trip 1 and make it part of trip 2 (a new trip), then wait for other orders to combine with trip 1 or trip 2 as appropriate.
All the orders arrive in real time. What would be the best approach to handle this situation? Let me know if you need more clarity on this.
A very safe and easy algorithm, which is guaranteed not to exceed the maximal waiting time for an order:
Let TSP() be a function that returns an estimate of the time needed to visit the given places. The estimate is pessimistic, i.e. the actual ride time can be shorter than or equal to the estimate, but not longer. As a good start, you can implement TSP() very easily in a greedy way: from each place, go to the nearest unvisited place. You can subtract the length of the longer of the two tour edges at W to get a better estimate (so a car will always leave W along the shorter edge). If TSP() happened to be optimal, then the whole algorithm presented here would also be optimal. The overall algorithm is only as good as the TSP() implementation; it depends heavily on a good estimate.
Let earliestOrderTime be the time of the earliest order not yet handled.
Repeat every minute:
If there is a new order: if s is empty, set earliestOrderTime to the current time. Add the order to the set s. Calculate t = TSP(s + W).
If (current time + t >= earliestOrderTime + 2 hours): send a car on the TSP(s + W) trip. Make s an empty set.
Example
For your exemplary data it will work like this:
10:00. earliestOrderTime = 10:00. s = {O1}. t = TSP({O1, W}) = 10 - 5 = 5.
10:00 + 0:05 < 10:00 + 2:00, so we don't send a car yet, we wait.
...
10:20. s = {O1, O2}. t = 46 - 20 = 26.
10:20 + 0:26 < 10:00 + 2:00, so we wait.
...
10:25. s = {O1, O2, O3}. t = 2 + 7 + 21 + 20 - 20 = 30.
10:25 + 0:30 < 10:00 + 2:00, so we wait.
...
11:30.
11:30 + 0:30 >= 10:00 + 2:00, so we send a car to go to O3, O1, O2 and back to W. It visits the orders at 11:32, 11:39 and 12:00 and comes back at 12:20. The customers were waiting 67, 99 and 100 minutes.
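The greedy estimate and the dispatch check from the example can be sketched as follows. Everything here is illustrative: the function names are made up, times are minutes since midnight, and the matrix entries are read as minutes (as the worked example does).

```python
# Travel times from the question, read as minutes.
TIME = {
    "W":  {"W": 0,  "O1": 5,  "O2": 20, "O3": 2},
    "O1": {"W": 5,  "O1": 0,  "O2": 21, "O3": 7},
    "O2": {"W": 20, "O1": 21, "O2": 0,  "O3": 11},
    "O3": {"W": 2,  "O1": 7,  "O2": 11, "O3": 0},
}


def tsp_estimate(orders, time=TIME, depot="W"):
    """Greedy nearest-neighbour tour from the depot and back, minus the
    longer of the two tour edges that touch the depot."""
    if not orders:
        return 0
    remaining = set(orders)
    current, total, first_leg = depot, 0, None
    while remaining:
        # Go to the nearest unvisited place (ties broken by name).
        nxt = min(remaining, key=lambda place: (time[current][place], place))
        leg = time[current][nxt]
        if first_leg is None:
            first_leg = leg
        total += leg
        remaining.remove(nxt)
        current = nxt
    last_leg = time[current][depot]
    return total + last_leg - max(first_leg, last_leg)


def should_dispatch(now, earliest_order_time, orders, limit=120):
    """True once waiting any longer would risk the 2-hour limit."""
    return now + tsp_estimate(orders) >= earliest_order_time + limit


orders = {"O1", "O2", "O3"}
print(tsp_estimate({"O1"}), tsp_estimate({"O1", "O2"}), tsp_estimate(orders))
print(should_dispatch(10 * 60 + 25, 10 * 60, orders))  # still waiting
print(should_dispatch(11 * 60 + 30, 10 * 60, orders))  # time to send the car
```

The three estimates come out as 5, 26 and 30 minutes, matching the values of t in the example.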

Probability of event

Here is a probability problem: you observe .5 cars on average passing in front of you every 5 minutes on a road. What is the probability of seeing at least 1 car in 10 minutes?
I'm trying to solve this in 2 ways. The first way is to say: P(no car in 5 minutes) = 1 - .5 = .5. P(no car in first 5 minutes and no car in second 5 minutes) = P(no car in first 5 minutes) * P(no car in second 5 minutes) by independence. Therefore P(at least 1 car in 10 minutes) = 1 - .5*.5 = .75.
However, if I try the same, with a Poisson distribution with rate lambda = .5 per unit of time, for 2 units of time, I get: P(at least 1 car in 2 units of time) = 1 - exp(-2*lambda) = .63.
Am I doing something wrong? If not, what explains the discrepancy?
Thanks!
Your first calculation is incorrect. An average of .5 cars / 5 minutes does not imply P(no car in 5 minutes) = 0.5. Consider for instance a process where, every five minutes, you see either no car with probability 90%, or 5 cars with probability 10%. On average you will see 0.5 cars every five minutes, but the probability that you see 0 cars in the next 5 minutes is clearly not 50%.
I haven't checked the computations for your second example; the calculation logic looks correct, but the conclusion is incorrect: you are making an assumption about the distribution (Poisson) which is plausible but not implied by the problem statement.
If you take my example again, which is consistent with your problem description, the probability of seeing 0 cars in 10 minutes is 0.9 x 0.9 = 0.81, which gives you a 19% chance of seeing one car or more. We could arbitrarily change my example to give you a wide variety of probabilities.
From your problem statement, the only thing you can say is that "in the long run, you'll see 0.5 cars every 5 minutes". Beyond that you can't make a statement about what should be expected within 10 minutes, unless you make some assumptions about the distribution of the car arrivals.
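A quick numeric check of the three models discussed here, all with the same mean of 0.5 cars per 5 minutes. The function names are illustrative, and the second function assumes the two 5-minute intervals are independent, as both the question and the example above do.

```python
import math


def p_at_least_one_poisson(rate_per_interval, n_intervals):
    """Poisson arrivals: P(at least one) = 1 - exp(-rate * intervals)."""
    return 1 - math.exp(-rate_per_interval * n_intervals)


def p_at_least_one_iid(p_zero_per_interval, n_intervals):
    """Independent intervals, each empty with probability p_zero."""
    return 1 - p_zero_per_interval ** n_intervals


# Poisson with rate 0.5 per 5 minutes, over two intervals.
poisson = p_at_least_one_poisson(0.5, 2)
# Exactly 0 or 1 car per interval, each with probability 0.5.
zero_or_one = p_at_least_one_iid(0.5, 2)
# No car with probability 0.9, five cars with probability 0.1.
bursty = p_at_least_one_iid(0.9, 2)

print(round(poisson, 3), round(zero_or_one, 3), round(bursty, 3))
```

The results are roughly 0.632, 0.75 and 0.19: the same average rate gives three different answers, depending on the assumed distribution.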
