How to detect trending items with popularity count? - elasticsearch

I built a search app with Elasticsearch.
Items have a name and a follower count. I use the follower count to boost Elasticsearch results.
Example: say I have two items, item_1 = [name = "abc def", follower = 1000] and item_2 = [name = "abc", follower = 10].
When a user searches for "abc", I return item_1 as the most likely result even though item_2 is the exact match.
This works just fine for me.
But I want to add a new feature.
I want to detect items that are getting popular and boost their score.
My idea is to store the follower count daily for a week or a month, like this:
ItemNo  Day1    Day2    Day3    Day4    ...
1       1000    1030    1040    1050    ...
2       50      100     200     400     ...
3       1M      1.001M  1.002M  1.003M  ...
4       1.1M    1.1M    1.1M    1.1M    ...
Suppose the daily follower counts for items 1, 2, 3 and 4 increase as shown.
Then I should be able to detect the increase in follower count for item 2 and boost it over item 1.
Even though item 1 has more followers, item 2 is gaining more followers every day.
But item 3 should not be boosted over item 4, because the percentage increase for item 3 is very small.
Bottom line: I want to detect increasing popularity, but it should be based on the percentage increase.
Do you have any suggestions for doing this, or can you refer me to a paper that would help me solve this problem?
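One simple way to express "increase percentage" is the relative growth over the stored window, (last - first) / first. Below is a minimal Python sketch of that idea, using the daily counts from the table in the question. The function name `relative_growth` and the handling of a zero baseline are assumptions made for illustration; this is not an Elasticsearch feature.

```python
def relative_growth(daily_counts):
    """Relative change between the first and last stored day (0.05 means +5%)."""
    first, last = daily_counts[0], daily_counts[-1]
    if first <= 0:
        return 0.0  # assumption: an item without a baseline counts as "no trend"
    return (last - first) / first


# Daily follower counts copied from the table in the question.
history = {
    1: [1000, 1030, 1040, 1050],
    2: [50, 100, 200, 400],
    3: [1_000_000, 1_001_000, 1_002_000, 1_003_000],
    4: [1_100_000, 1_100_000, 1_100_000, 1_100_000],
}

growth = {item: relative_growth(counts) for item, counts in history.items()}

# Items ordered from fastest to slowest relative growth.
trending = sorted(growth, key=growth.get, reverse=True)
print(trending)
```

The resulting value could be stored on each document and used as an extra boost factor next to the follower count. A minimum threshold (for example a few percent) would probably be needed so that the tiny growth of item 3 does not lift it over item 4.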

Related

Validation list with multiple criteria

I solved my problem, but I am wondering whether an easier way exists.
With the onEdit() function, I create a second validation list in the next cell, based on a first validation list.
The code below is just for information.
var validationRange = data.getRange(3, myIndex, data.getLastRow());
var validationRule = SpreadsheetApp.newDataValidation().requireValueInRange(validationRange).build();
activeCell.offset(0, 1).setDataValidation(validationRule);
In this validation list, I need to select a name, taking into account two additional criteria:
Name     Times  LastWeek
Patrick  2      W22
Rachel   5      W15
Claire   3      W14
Olivier  4      W16
Hélene   2      W20
This means that "Patrick" attended 2 times and the last time was in week "W22".
I need to take these two criteria into account to select one of the attendees.
I.e., I try to let each person participate the same number of times, but not every week (the one who attended longest ago goes first).
So I created a validation list with a "sorted key" that lets me see first the person who has attended the fewest times and the longest ago.
Key Sorted
#02W20#Hélene
#02W22#Patrick
#03W14#Claire
#04W16#Olivier
#05W15#Rachel
This validation list is used 3, 5 or 7 times in the same sheet because a person can do different activities. Then, when all persons are selected, another script removes the values between the # characters to keep only the name in the final sheet.
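For readers who want to see the sorted-key idea in isolation, here is a small Python sketch (not Apps Script) that builds the keys, sorts them, and strips the prefix again. The helper names `make_key` and `strip_key` are made up for this example; the data is the attendance table above.

```python
import re

# Attendance data copied from the table above: (name, times, last_week).
attendees = [
    ("Patrick", 2, "W22"),
    ("Rachel", 5, "W15"),
    ("Claire", 3, "W14"),
    ("Olivier", 4, "W16"),
    ("Hélene", 2, "W20"),
]


def make_key(name, times, last_week):
    """Zero-pad the count so that plain text sorting matches numeric order."""
    return f"#{times:02d}{last_week}#{name}"


def strip_key(key):
    """Remove the leading '#...#' part, keeping only the name."""
    return re.sub(r"^#[^#]*#", "", key)


sorted_keys = sorted(make_key(*row) for row in attendees)
names = [strip_key(key) for key in sorted_keys]
print(sorted_keys)
print(names)
```

The zero padding matters: without it, a count of 10 would sort before 2 as text. The week part is assumed to always have two digits for the same reason.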
So, the question is:
Could we create a validation list with multiple columns, where the selected value is only the value of the first column?
I guess many users would welcome it, judging by the questions about selection lists with multiple criteria.
Thanks

DAX Power Pivot measure – last previous value with where condition

I am looking for a DAX measure to calculate volume = quantity * price, where the price is the last previous price for a given product.
In other words, I am looking for a DAX measure with last previous value with a "where" condition.
Take this example from the attached workbook:
I have 3 products:
Apples
Bananas
Oranges
Each of these has a USD price and volume is simply quantity * price.
However, oranges can also be exchanged for apples!
To sum up the value of these orange-for-apple exchange transactions with the other USD transactions, I first need to calculate the USD value of the oranges, and for this I need to know the last price paid for an apple, i.e. last previous price, where product = apple.
Take this example from the attached workbook:
The last previous price paid for an apple was USD 5
The total USD price (volume) for 10 apples sold is: 10*5 = USD 50
Subsequently 3 oranges were exchanged for apples, at a rate of 4 apples per orange
The total USD price (volume) for 3 oranges is: 3*4*5 = USD 60, i.e. number of oranges * ratio of oranges to apples * last previous price for an apple
Total transaction volume = 50 + 60 = USD 110
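To make the intended calculation explicit, the arithmetic above can be written as a short Python sketch (this is not DAX, only a reference for the expected result). The field names mirror the columns used in the question's formula, and it assumes, as in that formula, that for an "orange_apple" row the price column holds the exchange ratio. The function name `total_volume` is hypothetical.

```python
def total_volume(transactions, base_product="apple", exchange_product="orange_apple"):
    """Sum quantity * price, valuing exchange rows at the last previous base price."""
    last_base_price = None
    total = 0
    for tx in sorted(transactions, key=lambda t: t["transaction_id"]):
        amount = tx["quantity"] * tx["price"]
        if tx["product"] == exchange_product:
            if last_base_price is None:
                raise ValueError("no earlier price for the base product")
            # For exchange rows, price is the ratio (apples per orange).
            amount *= last_base_price
        elif tx["product"] == base_product:
            last_base_price = tx["price"]
        total += amount
    return total


# The two transactions from the worked example.
transactions = [
    {"transaction_id": 1, "product": "apple", "quantity": 10, "price": 5},
    {"transaction_id": 2, "product": "orange_apple", "quantity": 3, "price": 4},
]
print(total_volume(transactions))
```

Note that this tracks the most recent apple price in transaction order, which is what "last previous price" describes.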
There are a few more examples in this sample file:
https://docs.google.com/spreadsheets/d/1PTaKg9a3Yv1um2RTnpeYC4gdLVjQXEzl/edit?usp=sharing&ouid=106440602605717108817&rtpof=true&sd=true
What I am looking for is a DAX formula that gives me the last previous value with a condition or filter or where clause.
The following works as a calculated column, but is expensive, because it uses EARLIER:
=
VAR Conditional_Volume =
    IF(
        Transactions[product] = "orange_apple",
        CALCULATE(
            MAX(Transactions[price]),
            ALL(Transactions),
            Transactions[product] = "apple",
            Transactions[transaction_id] < EARLIER(Transactions[transaction_id])
        ) * Transactions[price] * Transactions[quantity],
        Transactions[price] * Transactions[quantity]
    )
RETURN Conditional_Volume

Power Query (M language) 50 day moving Average

I have a list of products and would like to get a 50 day simple moving average of their volume using Power Query (M).
The table is sorted by product name and date. I added a custom column and applied the code below.
if [date] >= #date(2018,1,29)
then List.Average(List.Range(Source[Volume],[Volume]-1,-50))
else ""
Since the table is already sorted by date and name, I applied an if statement with a date as the filter criterion. However, an error occurs that says
'Volume' column not found in the table.
I expect to get an added column in Power Query with the 50 day moving average of volume per product. The calculation should be done only if the date is greater than or equal to Jan 29, 2018.
We don't know what your columns are, but assuming you have [product], [date] and [volume] in Source, this would average the last 50 days of [volume] for the identical [product] based on each [date], and place the result in a new column
AvgAmountAdded = Table.AddColumn(Source, "AverageAmount", (i) => List.Average(Table.SelectRows(Source, each ([product] = i[product] and [date]<=i[date] and [date]>=Date.AddDays(i[date],-50)))[volume]), type number)
Finally found a solution!
First, add an index per product (see this post for further details).
Then index again without criteria (index all rows).
Then apply the code below:
= Table.AddColumn(#"Previous Step", "Volume SMA(50)", each if [Index_byProduct] >= 50 then List.Average(List.Range(#"Previous Step"[Volume], ([Index_All]-50),50)) else 0),
For large datasets, the Table.Buffer function is recommended after the index-expand step to improve Power Query calculation speed.
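As a cross-check for the numbers, the same row-based moving average can be sketched in Python (not M). It averages the last `window` rows of each product, like the index-based solution above, rather than the last 50 calendar days. The function name `add_moving_average` and the sample rows are assumptions for illustration; a window of 3 is used only to keep the sample short.

```python
from collections import defaultdict, deque


def add_moving_average(rows, window=50):
    """rows: dicts with product/date/volume, already sorted by product and date."""
    recent = defaultdict(lambda: deque(maxlen=window))
    result = []
    for row in rows:
        values = recent[row["product"]]
        values.append(row["volume"])  # the deque drops the oldest value itself
        # Only report an average once a full window is available.
        sma = sum(values) / window if len(values) == window else None
        result.append({**row, "sma": sma})
    return result


rows = [
    {"product": "A", "date": "2018-01-01", "volume": 10},
    {"product": "A", "date": "2018-01-02", "volume": 20},
    {"product": "A", "date": "2018-01-03", "volume": 30},
    {"product": "A", "date": "2018-01-04", "volume": 40},
    {"product": "B", "date": "2018-01-01", "volume": 1},
    {"product": "B", "date": "2018-01-02", "volume": 2},
    {"product": "B", "date": "2018-01-03", "volume": 3},
]
result = add_moving_average(rows, window=3)
print([r["sma"] for r in result])
```

Because each product keeps its own window, the average never mixes volumes of two products, which is the reason for the per-product index in the M solution.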

MongoDB ranged pagination

It's said that using skip() for pagination in a MongoDB collection with many records is slow and not recommended.
Ranged pagination (based on an _id > comparison) can be used instead:
db.items.find({_id: {$gt: ObjectId('4f4a3ba2751e88780b000000')}});
It's good for displaying prev. & next buttons, but it's not very easy to implement when you want to display actual page numbers 1 ... 5 6 7 ... 124: you need to pre-calculate the "_id" at which each page starts.
So I have two questions:
1) When should I start to worry about that? At how many records does skip() slow down noticeably? 1 000? 1 000 000?
2) What is the best approach to show links with actual page numbers when using ranged pagination?
Good question!
"How many is too many?" - that, of course, depends on your data size and performance requirements. I, personally, feel uncomfortable when I skip more than 500-1000 records.
The actual answer depends on your requirements. Here's what modern sites do (or, at least, some of them).
First, the navbar looks like this:
1 2 3 ... 457
They get the final page number from the total record count and the page size. Let's jump to page 3. That will involve some skipping from the first record. When the results arrive, you know the id of the first record on page 3.
1 2 3 4 5 ... 457
Let's skip some more and go to page 5.
1 ... 3 4 5 6 7 ... 457
You get the idea. At each point you see first, last and current pages, and also two pages forward and backward from the current page.
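The navbar logic described above can be sketched in a few lines of Python. The function names `page_count` and `navbar` and the `radius` parameter are made up for this illustration; the total of 4570 records is only chosen so that the last page is 457 as in the example.

```python
import math


def page_count(total_records, page_size):
    """Final page number from total record count and page size."""
    return max(1, math.ceil(total_records / page_size))


def navbar(current, last, radius=2):
    """First page, last page, and `radius` pages around the current one."""
    pages = {1, last}
    pages.update(
        p for p in range(current - radius, current + radius + 1) if 1 <= p <= last
    )
    items = []
    previous = 0
    for page in sorted(pages):
        if page - previous > 1:
            items.append("...")  # gap between two shown page numbers
        items.append(page)
        previous = page
    return items


last_page = page_count(4570, 10)
for current in (1, 3, 5):
    print(navbar(current, last_page))
```

Only the pages listed in the navbar can be reached in one click, so the number of records skipped per request stays bounded by `radius` pages (except for the jump to the last page).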
Queries
var current_id; // id of first record on current page.
// go to page current+N
db.collection.find({_id: {$gte: current_id}}).
skip(N * page_size).
limit(page_size).
sort({_id: 1});
// go to page current-N
// note that due to the nature of skipping back,
// this query will get you records in reverse order
// (last records on the page being first in the resultset)
// You should reverse them in the app.
db.collection.find({_id: {$lt: current_id}}).
skip((N-1)*page_size).
limit(page_size).
sort({_id: -1});
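To check the offsets in the two queries, here is a pure Python simulation that applies the same filter, sort, skip and limit to a plain list of ids. It does not talk to MongoDB; integers stand in for ObjectIds, and the function names are hypothetical.

```python
def page_forward(ids, current_id, n, page_size):
    """Page current+N: ids >= current_id ascending, skip N pages, take one page."""
    candidates = sorted(i for i in ids if i >= current_id)
    start = n * page_size
    return candidates[start:start + page_size]


def page_backward(ids, current_id, n, page_size):
    """Page current-N: ids < current_id descending, skip N-1 pages, take one page."""
    candidates = sorted((i for i in ids if i < current_id), reverse=True)
    start = (n - 1) * page_size
    page = candidates[start:start + page_size]
    return page[::-1]  # results arrive in reverse order, so flip them back


ids = list(range(1, 101))  # 100 records, 10 per page
current_id = 21            # first record on page 3

print(page_forward(ids, current_id, 2, 10))   # page 5
print(page_backward(ids, current_id, 1, 10))  # page 2
```

The backward case skips only N-1 pages because the first page found below `current_id` is already the previous page.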
It's hard to give a general answer because it depends a lot on what query (or queries) you are using to construct the set of results that are being displayed. If the results can be found using only the index and are presented in index order then db.dataset.find().limit().skip() can perform well even with a large number of skips. This is likely the easiest approach to code up. But even in that case, if you can cache page numbers and tie them to index values you can make it faster for the second and third person that wants to view page 71, for example.
In a very dynamic dataset where documents will be added and removed while someone else is paging through data, such caching will become out-of-date quickly and the limit and skip method may be the only one reliable enough to give good results.
I recently encountered the same problem when trying to paginate a request on a field that wasn't unique, for example "FirstName". The idea of this query is to implement pagination on a non-unique field without using skip().
The main problem here is querying on a field that is not unique ("FirstName"), because the following will happen:
$gt: {"FirstName": "Carlos"} -> this will skip all the records where first name is "Carlos"
$gte: {"FirstName": "Carlos"} -> will always return the same set of data
Therefore the solution I came up with was making the $match portion of the query unique by combining the targeted search field with a secondary field in order to make it a unique search.
Ascending order:
db.customers.aggregate([
{$match: { $or: [ {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]}, {'FirstName': {$gt: 'Carlos'}}]}},
{$sort: {'FirstName': 1, '_id': 1}},
{$limit: 10}
])
Descending order:
db.customers.aggregate([
{$match: { $or: [ {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]}, {'FirstName': {$lt: 'Carlos'}}]}},
{$sort: {'FirstName': -1, '_id': 1}},
{$limit: 10}
])
The $match part of this query is basically behaving as an if statement:
if firstName is "Carlos" then it needs to also be greater than this id
if firstName is not equal to "Carlos" then it needs to be greater than "Carlos"
The only problem is that you cannot navigate to a specific page number (it can probably be done with some code manipulation), but other than that it solved my problem with pagination on non-unique fields without having to use skip, which eats a lot of memory and processing power when getting to the end of whatever dataset you are querying.
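The ascending $match condition is equivalent to comparing the pair (FirstName, _id) against the pair of the last record seen. A small Python simulation of that comparison, with integers standing in for ObjectIds and a hypothetical `next_page` helper, may make this easier to verify:

```python
def next_page(customers, last_first_name, last_id, page_size=10):
    """Ascending keyset page: everything after (last_first_name, last_id)."""
    ordered = sorted(customers, key=lambda c: (c["FirstName"], c["_id"]))
    after = [
        c for c in ordered
        # Tuple comparison means: FirstName > last_first_name,
        # or FirstName equal and _id > last_id.
        if (c["FirstName"], c["_id"]) > (last_first_name, last_id)
    ]
    return after[:page_size]


customers = [
    {"_id": 1, "FirstName": "Ana"},
    {"_id": 2, "FirstName": "Carlos"},
    {"_id": 3, "FirstName": "Carlos"},
    {"_id": 4, "FirstName": "Carlos"},
    {"_id": 5, "FirstName": "Diego"},
]

page = next_page(customers, "Carlos", 2, page_size=2)
print([c["_id"] for c in page])
```

No "Carlos" record is skipped or repeated, which is exactly what plain $gt or $gte on FirstName alone could not guarantee.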

How do I select the max child value in ActiveRecord?

I'm not even sure how to word this, so an example:
I have two models,
Chicken
id
name
EggCounterReadings
id
chicken_id
value_on_counter
timestamp
I don't always record a count for every chicken when I do counts.
Using ActiveRecord how do I get the latest egg count per chicken?
So if I have 1 chicken and 3 counts, the counts would be 1 today, 15 tomorrow, and 18 the next day. That chicken has laid 18 eggs, not 34
UPDATE: Found exactly what I was trying to do in MySQL. Find "The Rows Holding the Group-wise Maximum of a Certain Column". So I need to .find_by_sql("SELECT * FROM (SELECT * FROM EggCounterReadings WHERE <conditions> ORDER BY timestamp DESC) GROUP BY chicken_id")
Given your updated question, I've changed my answer.
chicken = Chicken.first
count = chicken.egg_counter_readings.last.value_on_counter
If you don't want the latest record, but the largest egg yield, then try this:
chicken = Chicken.first
count = chicken.egg_counter_readings.maximum(:value_on_counter)
I believe that should do what you want.
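For the "latest count per chicken" case (the group-wise maximum by timestamp mentioned in the update), the selection logic can be illustrated with a short Python sketch. It is not ActiveRecord code; the `latest_counts` name and the sample readings are assumptions, with field names taken from the model in the question.

```python
from datetime import date


def latest_counts(readings):
    """Map chicken_id to the value_on_counter of its most recent reading."""
    latest = {}
    for reading in readings:
        current = latest.get(reading["chicken_id"])
        if current is None or reading["timestamp"] > current["timestamp"]:
            latest[reading["chicken_id"]] = reading
    return {cid: r["value_on_counter"] for cid, r in latest.items()}


readings = [
    {"chicken_id": 1, "value_on_counter": 1, "timestamp": date(2024, 1, 1)},
    {"chicken_id": 1, "value_on_counter": 15, "timestamp": date(2024, 1, 2)},
    {"chicken_id": 1, "value_on_counter": 18, "timestamp": date(2024, 1, 3)},
    {"chicken_id": 2, "value_on_counter": 7, "timestamp": date(2024, 1, 2)},
]
print(latest_counts(readings))
```

Chickens that were not counted on a given day simply keep their last known reading, which matches the requirement that not every chicken is recorded in every count.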
