rollapply + specnumber = species richness over sampling intervals that vary in length?

I have a community matrix (samples x species of animals). I sampled the animals weekly over many years (in this example, three years). I want to figure out how sampling timing (start week and duration a.k.a. number of weeks) affects species richness. Here is an example data set:
Data <- data.frame(
  Year = rep(c('1996', '1997', '1998'), each = 5),
  Week = rep(c('1', '2', '3', '4', '5'), 3),
  Species1 = sample(0:5, 15, replace = TRUE),
  Species2 = sample(0:5, 15, replace = TRUE),
  Species3 = sample(0:5, 15, replace = TRUE)
)
The outcome that I want is something along the lines of:
Year StartWeek Duration(weeks) SpeciesRichness
1996 1 1 2
1996 1 2 3
1996 1 3 1
...
1998 5 1 1
I had tried doing this via a combination of rollapply and vegan's specnumber, but got a sample x species matrix instead of a vector of Species Richness. Weird.
For example, I thought that this should give me species richness for sampling windows of two weeks:
test<-rollapply(Data[3:5],width=2,specnumber,align="right")
Thank you for your help!

I figured it out by breaking up the task into two parts:
1. Summing up species abundances using rollapply, applied to every species column through dplyr's mutate_each
2. Calculating species richness using vegan.
I did this for each sampling duration window separately.
Here is the bare bones version (I just did this successively for each sampling duration that I wanted by changing the width argument):
library(zoo)    # rollapply
library(dplyr)  # %>%, arrange, group_by, mutate_each, funs
library(vegan)  # specnumber

# Sum a species column over a 2-week window that starts at the current week
weeksum2 <- function(x) {rollapply(x, width = 2, align = 'left', sum, fill = NA)}
sum2weeks <- Data %>%
  arrange(Year, Week) %>%
  group_by(Year) %>%
  mutate_each(funs(weeksum2), -Year, -Week)
# Species richness for each Week x Year combination
weeklyspecnumber2 <- specnumber(sum2weeks[, 3:ncol(sum2weeks)],
                                groups = interaction(sum2weeks$Week, sum2weeks$Year))
weeklyspecnumber2 <- unlist(weeklyspecnumber2)
weeklyspecnumber2 <- as.data.frame(weeklyspecnumber2)
weeklyspecnumber2$WeekYear <- as.factor(rownames(weeklyspecnumber2))
weeklyspecnumber2 <- tidyr::separate(weeklyspecnumber2, WeekYear, into = c('Week', 'Year'), sep = '[.]')
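For comparison, the same two steps (sum abundances over a window, then count the species with a non-zero total) can be sketched outside R in plain Python, looping over every start week and duration in one pass. This is only an illustrative sketch: the function name richness_by_window and the list-of-dicts input layout are assumptions, not part of zoo, dplyr or vegan.

```python
def richness_by_window(rows, species):
    """Species richness for every (year, start week, duration) window.

    rows: list of dicts with 'Year', 'Week' and one abundance per species name
          (hypothetical layout mirroring the example data frame).
    species: list of the species column names.
    """
    # Group the weekly samples by year so windows never span two years.
    by_year = {}
    for row in rows:
        by_year.setdefault(row["Year"], []).append(row)

    out = []
    for year in sorted(by_year):
        weeks = sorted(by_year[year], key=lambda r: int(r["Week"]))
        for start in range(len(weeks)):
            for duration in range(1, len(weeks) - start + 1):
                window = weeks[start:start + duration]
                # A species counts once if it was seen in any week of the window.
                richness = sum(
                    1 for sp in species if sum(r[sp] for r in window) > 0
                )
                out.append((year, weeks[start]["Week"], duration, richness))
    return out
```

Each returned tuple corresponds to one row of the desired Year / StartWeek / Duration / SpeciesRichness table.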

Related

Optimizing the algorithm for checking available reservations for the store

I would like to ask about an algorithm for checking whether a customer can book a table at a store.
I will describe my problem with the following example:
Restaurant:
User M has a restaurant R. R is open from 08:00 to 17:00.
Restaurant R has 3 tables (T1, T2, T3), and each table has 6 seats.
R offers food F1, which takes 2 hours to eat.
Booking:
A customer C has booked table T1 at R for 5 people with food F1 | B[0]
B[0] has a start time of 9AM
M is the manager of the store, so M wants to know whether the date YYYY-MM-DD is already fully booked or can still be booked.
My current algorithm is:
I create an array with one element per minute of the day, each defaulting to 0:
24 * 60 = 1440
=> I have: arr[1440] = [0, 0, 0, ...]
Next I get all the bookings for the day YYYY-MM-DD. The result is an array B[].
Then I loop over the array B[]:
for b in B[]
For each b, I loop from its start_time to its end_time in steps of 1 minute:
for time = start_time; time <= end_time; time++
In each iteration I set the element of arr whose index is the corresponding minute of the day to 1.
(It is quite similar to the Sieve of Eratosthenes.)
Finally I iterate over arr one more time; if at least one value is still 0, the date YYYY-MM-DD is still bookable.
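The steps above can be sketched in Python as follows. This is a minimal sketch under assumptions: bookings are given as (start_minute, end_minute) pairs counted from midnight, the name is_day_bookable is hypothetical, and only the opening hours (08:00 to 17:00 in the example) are scanned for a free minute, since minutes outside them would always stay 0.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

def is_day_bookable(bookings, open_minute=8 * 60, close_minute=17 * 60):
    """Return True if at least one minute within opening hours is unbooked.

    bookings: list of (start_minute, end_minute) pairs, end inclusive,
              mirroring the `time <= end_time` loop in the description.
    """
    occupied = [0] * MINUTES_PER_DAY
    for start, end in bookings:
        # Mark every minute covered by this booking.
        for minute in range(start, end + 1):
            occupied[minute] = 1
    # The day is bookable if any minute of the opening hours is still 0.
    return any(occupied[m] == 0 for m in range(open_minute, close_minute))
```

For example, a single booking from 09:00 to 11:00, `is_day_bookable([(540, 660)])`, leaves the day bookable, while one booking spanning the whole opening time does not. Note that this models a single table; it does not yet account for T1, T2 and T3 separately.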
But my algorithm will not scale well if the store has more tables or if many days have to be checked (for example from 2022-01-01 to 2022-02-01), ...
Thank you very much.
P/S: Regarding the technology background, I am currently using laravel 9

PowerBI line chart growth from 100%

In PowerBI, I want to compare the value growth of categories (let's take A and B) over time from any starting year. To compare this easily, I am using a line graph with the time on the x-axis and category as a legend. I would like both categories to start at 100% and show the growth relative to that starting point. I then want to be able to use a continuous date slicer to vary the start and end points of my line graph.
I've created some dummy data to illustrate this:
Category Year Value
A 2000 5
A 2001 8
A 2002 8
A 2003 10
B 2000 10
B 2001 8
B 2002 12
B 2003 10
Without any date filter, I would like to display years 2000-2003 with the following values for the lines:
A: 100%, 160%, 160%, 200%
B: 100%, 80%, 120%, 100%
(The first value of category A is 5. Therefore the line graph should display A's values relative to 5. Its values 5, 8, 8, 10 are then displayed as the mentioned percentages. The first value of category B is 10, so B's values should be displayed relative to 10).
With a date slicer set to filter years 2001-2003, I want the line values to become:
A: 100%, 100%, 125%
B: 100%, 150%, 125%
(Due to the slicer the first value of category A is 8, so I want the % values relative to 8. The first value of B is also 8)
I was thinking of writing a measure for this. Can anyone help me with it? Thank you in advance.
You can create a measure to establish the earliest filtered year, the value for that year, then divide each evaluated value by the min year value:
MyMeasure =
VAR MinYear =
    CALCULATE (
        MIN ( MyTable[Year] ),
        ALLSELECTED ( MyTable[Year] )
    )
VAR BaseValue =
    CALCULATE (
        SUM ( MyTable[Value] ),
        REMOVEFILTERS ( MyTable[Year] ),
        MyTable[Year] = MinYear
    )
VAR CurrentValue =
    SUM ( MyTable[Value] )
RETURN
    DIVIDE (
        CurrentValue,
        BaseValue
    )
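The arithmetic behind the measure (divide each value by the value of the earliest selected year in the same category) can be checked outside Power BI with a short Python sketch. The function name rebase and the tuple layout are assumptions made for illustration; this is not how Power BI evaluates the measure internally.

```python
from collections import defaultdict

def rebase(rows, start_year, end_year):
    """Express each category's values relative to its first selected year.

    rows: list of (category, year, value) tuples.
    Returns {category: [(year, value / base_value), ...]} sorted by year.
    """
    per_category = defaultdict(dict)
    for category, year, value in rows:
        # The year range plays the role of the date slicer.
        if start_year <= year <= end_year:
            per_category[category][year] = per_category[category].get(year, 0) + value

    result = {}
    for category, per_year in per_category.items():
        base = per_year[min(per_year)]  # value in the earliest selected year
        result[category] = [(y, per_year[y] / base) for y in sorted(per_year)]
    return result

rows = [
    ("A", 2000, 5), ("A", 2001, 8), ("A", 2002, 8), ("A", 2003, 10),
    ("B", 2000, 10), ("B", 2001, 8), ("B", 2002, 12), ("B", 2003, 10),
]
print(rebase(rows, 2001, 2003))
```

Unlike DAX's DIVIDE, this sketch does not guard against a base value of 0.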

Schedule meeting problem (count how many meetings an owner can schedule based on investor availabilities)

I tried to solve a task that reads: "Given the schedules of the days investors are available, determine how many meetings the owner can schedule". The owner is looking to meet new investors to get some funds for his company. The owner must respect each investor's schedule. Note that the owner can only have one meeting per day.
The schedule consists of 2 integer arrays, firstDay and lastDay. Each element in the array firstDay represents the first day an investor is available, and each element in lastDay represents the last day an investor is available, both inclusive.
Example:
firstDay = [1,2,3,3,3]
lastDay = [2,2,3,4,4]
There are 5 investors [i0, i1, i2, i3, i4]
Investor i0 is available from day 1 to day 2 inclusive [1,2]
Investor i1 is available on day 2 only [2,2]
Investor i2 is available on day 3 only [3,3]
Investors i3 and i4 are available from day 3 to day 4 only [3,4]
The owner can only meet 4 investors out of 5: i0 on day 1, i1 on day 2, i2 on day 3 and i3 on day 4. The image below shows the scheduled meetings in green and blocked days in gray.
A graphic shows the scheduled meetings
The task is to implement the function which takes 2 lists of integers as input parameters and returns integer result that represents the maximum number of meetings possible.
Constraints
1 <= n <= 100000, where n is the array length
1 <= firstDay[i], lastDay[i] <= 100000 (0 <= i < n)
firstDay[i] <= lastDay[i]
My implementation of this task is the following:
public static int countMeetings(List<int> firstDay, List<int> lastDay)
{
    var count = 0;
    count = firstDay.Concat(lastDay).Distinct().Count();
    if (count > firstDay.Count)
    {
        count = firstDay.Count;
    }
    return count;
}
And this code successfully passes 8 of 12 provided tests. I'll be glad to see and discuss any working solutions to this issue. Thanks.
For the input
firstDay = [1,1,1]
lastDay = [5,5,5]
your code returns 2, but the correct answer is 3.
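One approach often used for this kind of problem is a greedy sweep over the days: on each day, among the investors who are already available, meet the one whose last day comes soonest. The Python sketch below illustrates that idea with a min-heap; it is not taken from the thread, the name count_meetings is hypothetical, and it has only been checked against the two inputs shown here.

```python
import heapq

def count_meetings(first_day, last_day):
    """Greedy sketch: each day, meet the available investor who leaves first."""
    investors = sorted(zip(first_day, last_day))  # ordered by first available day
    heap = []   # last days of investors who are available and not yet met
    count = 0
    i = 0
    day = 0
    n = len(investors)
    while i < n or heap:
        if not heap:
            # Nobody is waiting: jump ahead to the next investor's first day.
            day = max(day, investors[i][0])
        # Everyone whose window has opened by today becomes a candidate.
        while i < n and investors[i][0] <= day:
            heapq.heappush(heap, investors[i][1])
            i += 1
        # Drop candidates whose last day has already passed.
        while heap and heap[0] < day:
            heapq.heappop(heap)
        if heap:
            heapq.heappop(heap)  # meet the investor with the earliest last day
            count += 1
            day += 1
    return count
```

With a heap this runs in O(n log n), which fits the stated limit of 100000 investors.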

Find the time period with the maximum number of overlapping intervals

This is a well-known problem, and I am asking it here.
You are given the life spans of a number of elephants, where a life span runs from year of birth to year of death.
You have to find the period during which the maximum number of elephants are alive.
Example:
1990 - 2013
1995 - 2000
2010 - 2020
1992 - 1999
Answer is 1995 - 1999
I tried hard to solve this, but I am unable to do so.
How can I solve this problem?
I do have an approach for the case where a user asks for the number of elephants alive in a given year. I solved that with a segment tree: whenever an elephant's life span is given, increase every year in that span by 1. Can this be used to solve the problem above?
For the question above, I only need the high-level approach; I will code it myself.
Split each date range into start date and end date.
Sort the dates. If a start date and an end date are the same, put the end date first (otherwise you could get an empty date range as the best).
Start with a count of 0.
Iterate through the dates using a sweep-line algorithm:
  If you get a start date:
    Increment the count.
    If the current count is higher than the last best count, store it as the new best count, store this start date and set a flag.
  If you get an end date:
    If the flag is set, store the stored start date and this end date with the count as the best interval so far.
    Reset the flag.
    Decrement the count.
Example:
For input:
1990 - 2013
1995 - 2000
2010 - 2020
1992 - 1999
Split and sorted: (S = start, E = end)
1990 S, 1992 S, 1995 S, 1999 E, 2000 E, 2010 S, 2013 E, 2020 E
Iterating through them:
count = 0
lastStart = N/A
1990: count = 1
      count = 1 > 0, so set flag
      and lastStart = 1990
1992: count = 2
      count = 2 > 1, so set flag
      and lastStart = 1992
1995: count = 3
      count = 3 > 2, so set flag
      and lastStart = 1995
1999: flag is set, so
      record [lastStart (= 1995), 1999] with a count of 3
      reset flag
      count = 2
2000: flag is not set
      reset flag
      count = 1
2010: count = 2
      since count = 2 < 3, don't set flag
2013: flag is not set
      reset flag
      count = 1
2020: flag is not set
      reset flag
      count = 0
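The sweep described above can be sketched in Python as follows. The function name busiest_period and the tuple encoding of events are assumptions made for this sketch; ties between equally good periods are resolved in favour of the first one found, as in the description.

```python
def busiest_period(spans):
    """Sweep-line sketch: return ((start, end), count) for the busiest period.

    spans: list of (birth_year, death_year) pairs.
    """
    events = []
    for start, end in spans:
        events.append((start, 1))  # 1 marks a start
        events.append((end, 0))    # 0 marks an end, so ends sort first on ties
    events.sort()

    count = 0
    best_count = 0
    best = None
    last_start = None
    flag = False
    for year, kind in events:
        if kind == 1:
            count += 1
            if count > best_count:
                best_count = count
                last_start = year
                flag = True
        else:
            if flag:
                # The first end after a new maximum closes the best interval.
                best = (last_start, year)
            flag = False
            count -= 1
    return best, best_count

print(busiest_period([(1990, 2013), (1995, 2000), (2010, 2020), (1992, 1999)]))
# prints ((1995, 1999), 3)
```

Sorting dominates the cost, so the sketch runs in O(n log n) for n life spans.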
How about this?
Say I have all the above data stored in a file. Read it into two arrays separated by the " - ".
Hence, now I have birthYear[] which contains all the birth years and deathYear[] containing all the death years.
so birthYear[] = [1990, 1995, 2010, 1992]
deathYear[] = [2013, 2000, 2020, 1999]
Get the min birth year and the max death year. Create a Hashtable with the year as the key and the count as the value.
Hence,
Hashtable<Integer, Integer> numOfElephantsAlive = new Hashtable<Integer, Integer>();
Now, for the years from min(birthYear) to max(deathYear), do the following:
Iterate through the birthYear array and, for each entry, add every year from that birth year to the corresponding death year to the Hashtable with a count of 1. If the key already exists, add 1 to its count instead. Hence, for the last case:
1992 - 1999
HashTable.put(1992, 1)
HashTable.put(1993, 1)
and so on for every year.
Say, for example, you have a Hashtable that looks like this at the end of it:
Key Value
1995 3
1996 3
1992 2
1993 1
1994 3
1998 1
1997 2
1999 2
Now you need the range of years in which the number of elephants was at its maximum. First, find the max value: iterate over the keySet() and get the year with the highest count.
Next, you need a contiguous range of years. One way to do this:
Do Collections.sort() over the keySet() and, when you hit the max value, collect all the contiguous years that share it.
Hence, on hitting 3 at 1994 in our example, we would check all the following years for a 3. This gives you your range as a (min year, max year) pair.
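The per-year counting idea can be sketched in Python with a Counter standing in for the Hashtable. The name busiest_years is hypothetical, and the sketch returns only the first contiguous run of peak years. It also does work proportional to the total number of years covered, so it suits short spans better than long ones.

```python
from collections import Counter

def busiest_years(spans):
    """Count living elephants per year, then return the first peak run.

    spans: list of (birth_year, death_year) pairs, both ends inclusive.
    Returns ((first_year, last_year), peak_count).
    """
    alive = Counter()
    for birth, death in spans:
        for year in range(birth, death + 1):
            alive[year] += 1

    peak = max(alive.values())
    peak_years = sorted(year for year, n in alive.items() if n == peak)

    # Extend the run while the peak years stay consecutive.
    start = end = peak_years[0]
    for year in peak_years[1:]:
        if year != end + 1:
            break
        end = year
    return (start, end), peak
```

For the four life spans in the question this yields the period 1995 to 1999 with a count of 3.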
One approach may be:
Iterate through the periods, keeping a running list of the non-overlapping periods seen so far together with a count for each. Note: at each step, the number of periods increases by 2 (or by 1 if there is no overlap with the existing list of periods).
For example
1990 - 2013
Period List contains 1 period { (1990,2013) }
Count List contains 1 entry { 1 }
1995 - 2000
Period List contains 3 periods { (1990,1995), (1995,2000), (2000,2013) }
Count List contains 3 entries { 1, 2, 1 }
2010 - 2020
Period List contains 5 periods { (1990,1995), (1995,2000), (2000,2010), (2010, 2013), (2013, 2020) }
Count List contains 5 entries { 1, 2, 1, 2, 1 }
1992 - 1999
Period List contains 7 periods { (1990,1992), (1992,1995), (1995,1999), (1999,2000), (2000,2010), (2010, 2013), (2013, 2020) }
Count List contains 7 entries { 1, 2, 3, 2, 1, 2, 1 }
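The final period and count lists can be reproduced with a short Python sketch. Note the swap in technique: instead of splitting periods incrementally as each span arrives, it cuts the timeline at every distinct boundary year in one go and then counts the spans covering each piece. The name segment_counts is an assumption, and the counting step is quadratic, so this is for illustration rather than large inputs.

```python
def segment_counts(spans):
    """Split the timeline at every boundary year and count covering spans.

    spans: list of (start_year, end_year) pairs.
    Returns (segments, counts), where segments[i] is a (lo, hi) pair.
    """
    bounds = sorted({year for span in spans for year in span})
    segments = list(zip(bounds, bounds[1:]))
    counts = [
        sum(1 for start, end in spans if start <= lo and hi <= end)
        for lo, hi in segments
    ]
    return segments, counts

segments, counts = segment_counts(
    [(1990, 2013), (1995, 2000), (2010, 2020), (1992, 1999)]
)
best = segments[counts.index(max(counts))]
print(best, max(counts))  # prints (1995, 1999) 3
```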
1) Arrange the series in ascending order by year, starting from the largest series.
2) Count the years for the largest series over the whole data set.
3) Then identify the largest count.
4) The largest count is your answer for the years... this can be done with an algorithm.

User submitted rankings

I was looking to have members submit their top-10 list of something, or their top 10 rankings, then have some algorithm combine the results. Is there something out there like that?
Thanks!
Ahhhh, that's open-ended alright. Let's consider a simple case where only two people vote:
1 ALPHA
2 BRAVO
3 CHARLIE
1 ALPHA
2 DELTA
3 BRAVO
We can't go purely by count... ALPHA should obviously win, though it has the same votes as BRAVO. Yet, we must avoid a case where just a few first place votes dominate a massive amount of 10th place votes. To do this, I suggest the following:
$score = log($num_of_answers - $rank + 2)
With 10 answers and a base-10 log, first place would then be worth just a bit over one point, and tenth place would get .3 points. That logarithmic scaling prevents ridiculous dominance, yet still gives weight to rankings. From those example votes (and assuming they were the top 3 of a list of 10), you would get:
ALPHA: 2.08
BRAVO: 1.95
DELTA: 1.00
CHARLIE: .95
Why? Well, that's subjective. I feel out of a very long list that 4,000 10th place votes is worth more than 1,000 1st place votes. You may scale it differently by changing the base of your log (natural, 2, etc.), or choose a different system.
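The scoring rule can be tried out with a small Python sketch. It assumes a base-10 logarithm and 10 possible answers, which is what reproduces the figures quoted for first and tenth place; the function name log_scores is hypothetical.

```python
import math
from collections import defaultdict

def log_scores(ballots, num_of_answers=10):
    """Sum log10(num_of_answers - rank + 2) per item over all ballots.

    ballots: list of ranked lists, best item first.
    Returns (item, score) pairs sorted by score, highest first.
    """
    totals = defaultdict(float)
    for ballot in ballots:
        for rank, item in enumerate(ballot, start=1):
            totals[item] += math.log10(num_of_answers - rank + 2)
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)

ballots = [
    ["ALPHA", "BRAVO", "CHARLIE"],
    ["ALPHA", "DELTA", "BRAVO"],
]
for item, score in log_scores(ballots):
    print(item, round(score, 2))
```

Swapping math.log10 for math.log or math.log2 changes the base and therefore how strongly high ranks are favoured.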
You could just add up the rank positions that each user gave to each item and then sort the items by total, lowest first.
i.e.:
A = (a,b,c)
B = (a,c,b)
C = (b,a,c)
D = (c,b,a)
E = (a,c,b)
F = (c,a,b)
a = 1 + 1 + 2 + 3 + 1 + 2 = 10
b = 2 + 3 + 1 + 2 + 3 + 3 = 14
c = 3 + 2 + 3 + 1 + 2 + 1 = 12
Thus,
a
c
b
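The sum-of-ranks tally above can be sketched in Python. The name rank_sums is hypothetical, and the sketch assumes every ballot ranks every item; an item missing from a ballot would otherwise get an unfairly low total.

```python
def rank_sums(ballots):
    """Add up each item's rank position over all ballots.

    ballots: list of ranked lists, best item first.
    Returns (order, totals): items sorted by total (lowest first) and the totals.
    """
    totals = {}
    for ballot in ballots:
        for position, item in enumerate(ballot, start=1):
            totals[item] = totals.get(item, 0) + position
    order = sorted(totals, key=lambda item: totals[item])
    return order, totals

ballots = [
    ["a", "b", "c"],  # A
    ["a", "c", "b"],  # B
    ["b", "a", "c"],  # C
    ["c", "b", "a"],  # D
    ["a", "c", "b"],  # E
    ["c", "a", "b"],  # F
]
order, totals = rank_sums(ballots)
print(order, totals)  # prints ['a', 'c', 'b'] {'a': 10, 'b': 14, 'c': 12}
```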
I think you could solve this problem by using a max flow algorithm, to create an aggregate ranking, assuming the following:
Each unique item from the list of items is a node in a graph. E.g. if there are 10 things to vote on, there are 10 nodes.
An edge goes from node *a* to node *b* if *a* is immediately before *b* in a _single user submitted_ ranking.
The last node created from a _single user submitted_ ranking will have an edge pointed at the *sink*
The first node created from a _single user submitted_ ranking will have an incoming edge from the *source*
This should get you an aggregated top-10 list.
