Elasticsearch: how to calculate the cumulative probability of a normal distribution?

Suppose, each document has the following data:
{
  // some_other_fields,
  "seasonal_data": [
    {
      "day_of_year": 1,
      "sales": 3
    },
    {
      "day_of_year": 2,
      "sales": 5
    }
  ]
}
When ranking documents, I want to take a seasonal score into account along with other factors.
For a given day_of_year, I'll consider a 7-day window (the day itself plus the 3 days before and the 3 days after).
I'll take the average sales value over those seven days.
We assume the sales data follows a normal distribution:
seasonal_score = p-value(avg_sales_7days)
How can this be done in Elasticsearch?
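In Elasticsearch this would typically end up as a script evaluated at query time (for example inside a script_score query), but the core calculation is small enough to sketch on its own. Below is a minimal Python sketch of that calculation, reusing the field names from the document above; estimating the mean and standard deviation from the document's own seasonal_data is an assumption, since the question doesn't say where the distribution parameters come from.

import math

def normal_cdf(x, mean, std):
    # cumulative probability P(X <= x) for a Normal(mean, std) distribution
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def seasonal_score(seasonal_data, day_of_year, window=3):
    # average sales over the 7-day window centred on day_of_year
    # (year wrap-around at day 1 / day 365 is ignored here for brevity)
    sales_by_day = {d["day_of_year"]: d["sales"] for d in seasonal_data}
    days = range(day_of_year - window, day_of_year + window + 1)
    avg_7d = sum(sales_by_day.get(d, 0) for d in days) / (2 * window + 1)

    # assumption: mean/std are estimated from this document's own seasonal_data
    all_sales = list(sales_by_day.values())
    mean = sum(all_sales) / len(all_sales)
    var = sum((s - mean) ** 2 for s in all_sales) / len(all_sales)
    std = math.sqrt(var) if var > 0 else 1.0  # guard against a zero std

    return normal_cdf(avg_7d, mean, std)

math.erf gives the normal CDF in closed form without external dependencies, so the same formula can be ported to whichever scripting layer ends up computing the score.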

Related

Calculate P&L over an arbitrary period

Let's assume I have an array of trades (buys/sells), each associated with a timestamp.
[
  {
    "time": "01-05-2021", // DD-MM-YYYY
    "operation": "BUY",
    "amount": 2,
    "price": 10
  },
  {
    "time": "01-06-2021",
    "operation": "SELL",
    "amount": 1,
    "price": 15
  },
  {
    "time": "01-07-2021",
    "operation": "BUY",
    "amount": 2,
    "price": 20
  },
  {
    "time": "01-08-2021",
    "operation": "SELL",
    "amount": 3,
    "price": 25
  }
]
And I want to calculate P&L on these trades, using FIFO, but for an arbitrary time period.
The problem is that the calculated value depends on the time period I choose.
For 08.2021 it'll be 0 (3 items were sold, none was bought).
For 07-08.2021 it'll be 10 (2 items were bought for a total of 40, 2 were sold for a total of 50).
For 06-08.2021 it'll be 0 (SELL 1 at 15 -> BUY 1 at 20 == -5, and BUY 1 at 20 -> SELL 1 at 25 == 5).
And so on.
The only working solution I have right now is to calculate P&L values for each deal from the beginning of trading activity, and then just "cut off" everything outside the required period. But that doesn't scale, because even without automated trading there can be thousands of deals each year.
The most obvious thing to do is to add some initial state at the beginning of the given period, which would be the starting point for all further calculations.
Are there any algorithms or tools I can use for this task?
I'm a JavaScript developer, and my solution runs in the browser. Maybe I need a backend, maybe I need R with its statistics packages...
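For the per-period calculation itself (independent of where it runs), here is a minimal Python sketch of one reading of the examples above: only trades inside the chosen period are considered, buys and sells are paired FIFO against each other, and any unmatched quantity contributes nothing. The function name and data layout are just illustrative, and this does not address the scalability question; it only reproduces the three worked examples.

from collections import deque
from datetime import date

def fifo_pnl(trades, start, end):
    # Realized P&L over [start, end]: buys and sells inside the period are
    # matched FIFO against each other; unmatched quantity is ignored.
    in_period = [t for t in trades if start <= t["time"] <= end]
    buys = deque((t["amount"], t["price"]) for t in in_period if t["operation"] == "BUY")
    sells = deque((t["amount"], t["price"]) for t in in_period if t["operation"] == "SELL")

    pnl = 0.0
    while buys and sells:
        b_qty, b_price = buys.popleft()
        s_qty, s_price = sells.popleft()
        qty = min(b_qty, s_qty)
        pnl += qty * (s_price - b_price)
        if b_qty > qty:                      # push back the unmatched remainder
            buys.appendleft((b_qty - qty, b_price))
        if s_qty > qty:
            sells.appendleft((s_qty - qty, s_price))
    return pnl

trades = [
    {"time": date(2021, 5, 1), "operation": "BUY",  "amount": 2, "price": 10},
    {"time": date(2021, 6, 1), "operation": "SELL", "amount": 1, "price": 15},
    {"time": date(2021, 7, 1), "operation": "BUY",  "amount": 2, "price": 20},
    {"time": date(2021, 8, 1), "operation": "SELL", "amount": 3, "price": 25},
]
print(fifo_pnl(trades, date(2021, 8, 1), date(2021, 8, 31)))  # 0.0
print(fifo_pnl(trades, date(2021, 7, 1), date(2021, 8, 31)))  # 10.0
print(fifo_pnl(trades, date(2021, 6, 1), date(2021, 8, 31)))  # 0.0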

Algorithm for finding the combination with the lowest price [duplicate]

I have items with IDs (1001, 1002, 1003, 1004, 1005, 1006). Their respective quantities are (2, 5, 1, 1, 5, 2). Now I have data like the following; there is an offerId for each row.
offerId :{[Item_Id, Item_quantity_on_which_offer_Applied, Discount per quantity]}
1 :{[1001, 2, 21]}
4 :{[1002, 5, 5]}
6 :{[1003, 1, 25] [1004, 1, 25]}
5 :{[1004, 1, 20]}
3 :{[1005, 5, 17.5] [1002, 5, 17.5]}
2 :{[1005, 2, 18.33] [1001, 2, 26] [1006, 2, 21.67]}
Explanation: when offer Id 1 is applied, I get a 21 rs. discount per quantity on 2 quantities of item 1001, i.e. a 21 rs. discount on each quantity of item 1001.
Objective: I want to get the best offer combination. For example, in the above case the best combination would be:
OfferId 2 (discount = 132, i.e. (18.33 + 26 + 21.67) * 2)
OfferId 3, applied to 3 quantities of item 1005 and 3 quantities of item 1002, since 2 quantities of item 1005 are already used by offer 2 (discount = 105, i.e. (17.5 + 17.5) * 3)
Now item 1002 has 2 quantities remaining, so:
OfferId 4, applied to 2 quantities of item 1002 (discount = 10, i.e. 5 * 2)
OfferId 6 (discount = (25 + 25) * 1 = 50)
So in a nutshell, offer Ids 2, 3, 4, 6 give me the best combination of offers, where offer 4 is applied to 2 quantities of item 1002 and offer 3 to 3 quantities of item 1005 and 3 quantities of item 1002.
That is the result I want to compute: the best offer combination, taking quantities into account.
So far I have been able to find the best offer combination without considering quantity; now my requirement is to consider item quantities as well.
It would be really helpful if anyone could provide pseudocode. Any suggestions are greatly appreciated.
P.S. I am writing my code in Golang, but solutions in any language are welcome.
I hope I framed my question correctly. Comment below if any more information is required.
Thanks in advance.
Even if there is only a single item of each type and every offer gives the same total discount (say, $1), this is NP-hard, since the NP-hard problem Set Packing can be reduced to it: for each set, make an offer for the same elements with total discount $1. Since all offers provide the same benefit, the optimal solution to this constructed instance of your problem is the one that uses the largest number of offers, and this solution corresponds directly to the optimal solution to the original Set Packing problem.
Thus there's no hope for a polynomial-time solution to your problem (unless P = NP).
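Given that, the practical options are exhaustive search for small inputs, or heuristics/ILP for larger ones. Below is a hedged Python sketch of the exhaustive route, under one reading of the rules in the question: applying an offer once consumes one unit of every item it lists and earns the sum of its per-quantity discounts, and an offer can be applied as many times as remaining stock allows (in the data above the quantity column of each offer row coincides with the item's stock, so stock is the only cap used here). The data literals are transcribed from the question; everything else is illustrative.

# Stock and offers as given in the question: offer -> {item_id: per-quantity discount}
stock = {1001: 2, 1002: 5, 1003: 1, 1004: 1, 1005: 5, 1006: 2}
offers = {
    1: {1001: 21},
    2: {1005: 18.33, 1001: 26, 1006: 21.67},
    3: {1005: 17.5, 1002: 17.5},
    4: {1002: 5},
    5: {1004: 20},
    6: {1003: 25, 1004: 25},
}

def best_discount(stock, offer_ids):
    # Exhaustive recursion: for the first offer, try every feasible number of
    # applications, then recurse on the remaining offers.  Exponential in the
    # number of offers, which is consistent with the NP-hardness argument above.
    if not offer_ids:
        return 0.0, {}
    offer_id, rest = offer_ids[0], offer_ids[1:]
    items = offers[offer_id]
    max_apps = min(stock[i] for i in items)      # limited by remaining stock
    per_app = sum(items.values())                # discount earned per application

    best_total, best_plan = -1.0, None
    for k in range(max_apps + 1):
        reduced = dict(stock)
        for i in items:
            reduced[i] -= k
        sub_total, sub_plan = best_discount(reduced, rest)
        total = k * per_app + sub_total
        if total > best_total:
            best_total, best_plan = total, {offer_id: k, **sub_plan}
    return best_total, best_plan

total, plan = best_discount(stock, list(offers))
print(round(total, 2), {o: k for o, k in plan.items() if k})
# 297.0 {2: 2, 3: 3, 4: 2, 6: 1} -- the combination worked out in the question

For the six offers above this explores at most a few thousand states; for realistically large inputs you would need memoisation keyed on the remaining-stock state, an ILP solver, or an approximation.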

rollapply + specnumber = species richness over sampling intervals that vary in length?

I have a community matrix (samples x species of animals). I sampled the animals weekly over many years (in this example, three years). I want to figure out how sampling timing (start week and duration a.k.a. number of weeks) affects species richness. Here is an example data set:
Data <- data.frame(
  Year = rep(c('1996', '1997', '1998'), each = 5),
  Week = rep(c('1', '2', '3', '4', '5'), 3),
  Species1 = sample(0:5, 15, replace = TRUE),
  Species2 = sample(0:5, 15, replace = TRUE),
  Species3 = sample(0:5, 15, replace = TRUE)
)
The outcome that I want is something along the lines of:
Year StartWeek Duration(weeks) SpeciesRichness
1996 1 1 2
1996 1 2 3
1996 1 3 1
...
1998 5 1 1
I tried doing this via a combination of rollapply and vegan's specnumber, but got a sample x species matrix instead of a vector of species richness values.
For example, I thought this should give me species richness for sampling windows of two weeks:
test <- rollapply(Data[3:5], width = 2, specnumber, align = "right")
Thank you for your help!
I figured it out by breaking the task into two parts:
1. Summing up species abundances over the window using rollapply, inside a dplyr mutate_each call.
2. Calculating species richness with vegan's specnumber.
I did this separately for each sampling duration window.
Here is the bare-bones version (I just repeated it for each sampling duration I wanted by changing the width argument):
# 1. Rolling two-week sums of abundances within each year
#    (mutate_each()/funs() are superseded by across() in current dplyr)
weeksum2 <- function(x) {rollapply(x, width = 2, align = 'left', sum, fill = NA)}

sum2weeks <- Data %>%
  arrange(Year, Week) %>%
  group_by(Year) %>%
  mutate_each(funs(weeksum2), -Year, -Week)

# 2. Species richness of the summed abundances for each week-by-year group
weeklyspecnumber2 <- specnumber(sum2weeks[, 3:ncol(sum2weeks)],
                                groups = interaction(sum2weeks$Week, sum2weeks$Year))
weeklyspecnumber2 <- unlist(weeklyspecnumber2)
weeklyspecnumber2 <- as.data.frame(weeklyspecnumber2)

# 3. Split the "Week.Year" row names back into separate columns
weeklyspecnumber2$WeekYear <- as.factor(rownames(weeklyspecnumber2))
weeklyspecnumber2 <- tidyr::separate(weeklyspecnumber2, WeekYear,
                                     into = c('Week', 'Year'), sep = '[.]')

Complexity and binary trees

I have a big problem understanding complexity, especially with binary trees.
For example, I know that for some problems the size works out to x = log2(sizeofarray), but I don't understand where this log2 comes from.
Let's take a binary search as the easy example. Say you have a sorted list of 64 elements, and you're searching for a particular one. In each iteration, you halve the dataset. By the time your dataset has 1 element, you have halved it 6 times (count the arrows, not the numbers):
64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1
The reason for this is the fact that 64 = 2 ^ 6, where 2 is the base (you divide the dataset in 2 parts in each iteration), and the exponent is 6 (as you get to the bottom in 6 iterations). There is another way to write this, since exponentiation has its inverse in logarithm:
64 = 2 ^ 6
6 = log2 64
So we can see that the number of iterations scales with the base-two logarithm of the number of elements.
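The same counting can be made concrete with a few lines of code (Python here, purely as an illustration): a binary search over 64 sorted elements never needs more than about log2(64) = 6 probes.

def binary_search(sorted_list, target):
    # returns (index or None, number of iterations) so the halving is visible
    lo, hi = 0, len(sorted_list) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid, iterations
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, iterations

data = list(range(64))
print(binary_search(data, 0))   # (0, 6): six halvings, matching the chain above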
It's log2 because each level of the tree splits your problem into two.
For instance, consider this set of data:
{ 1, 2, 3, 4, 5, 6, 7, 8 }
The first level could be
{ 1, 2, 3, 4 }, { 5, 6, 7, 8 }
the second level:
{ 1, 2 }, { 3, 4 }, { 5, 6 }, { 7, 8 }
the third level:
{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }, { 6 }, { 7 }, { 8 }
Here with 8 values, log2(8) = 3, and there are 3 levels in the tree.
Also see these other StackOverflow questions for more:
"Why is the height of a balanced binary tree log(n)? (Proof)" - the answer follows a similar vein to the answer that Amadan posted on this question.
"Search times for binary search tree" - contains some pretty ASCII art, and examines best/worst case scenarios.
