Custom scoring in ElasticSearch - elasticsearch

How do I use the following function? (This is for Elastica in PHP, with respect to the function score query.)
addScriptScoreFunction($script, $filter)
Does the filter remove results, or is the script score only applied to the documents that pass the filter? How efficient is the scoring?
Also, can I add more than one script score function to a function score query?

$keyword = 'foo';
$field = 'name';
$inner_query = new Elastica\Query\Match();
$inner_query->setFieldQuery($field, $keyword);
// Wrap the function_score around the initial query
$scorefunction = new Elastica\Query\FunctionScore();
$scorefunction->setQuery($inner_query);
$scorefunction->setBoostMode('replace'); // Otherwise it will be multiplied with _score
// Make the custom score function: boost max 20% of initial _score, depending on popularity
$script = new Elastica\Script("_score + (doc['popularity'].value * 0.2 * _score)/100");
$scorefunction->addScriptScoreFunction($script);
// Last step: put that all in Elastica\Query and execute with Elastica\Search
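On the filter question: in a function_score query, a per-function filter only decides which documents that function is applied to. It does not remove documents from the result set, which is still determined by the inner query. More than one function can be listed as well. Below is a rough sketch of the kind of request body this corresponds to, written as a plain Python dict. The `category` field, the second script and the `score_mode` value are illustrative assumptions, and the exact `script` syntax differs between Elasticsearch versions.

```python
import json


def build_function_score(field, keyword):
    """Build a function_score request body with two script_score functions."""
    return {
        "query": {
            "function_score": {
                # The inner query alone decides which documents match.
                "query": {"match": {field: keyword}},
                "functions": [
                    {
                        # No filter: applied to every matching document.
                        "script_score": {
                            "script": "_score + (doc['popularity'].value * 0.2 * _score) / 100"
                        }
                    },
                    {
                        # Hypothetical filter: only documents passing it get
                        # this function; the others are still returned.
                        "filter": {"term": {"category": "featured"}},
                        "script_score": {"script": "_score * 2"},
                    },
                ],
                "score_mode": "sum",      # how the function results are combined
                "boost_mode": "replace",  # use the function result instead of _score
            }
        }
    }


body = build_function_score("name", "foo")
print(json.dumps(body, indent=2))
```

In Elastica, each entry in `functions` would correspond to one call to addScriptScoreFunction, with the optional second argument supplying the filter.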
There are some possible pitfalls:
Without ->setBoostMode('replace'), the original _score is multiplied by the result of the script. In my case addition was desired, hence 'replace'.
Divisions appear to be rounded down (integer division). The popularity I used in my formula is always between 1 and 100, so popularity/100 on its own was always rounded down to 0 and the formula seemed to have no effect.
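The rounding pitfall can be reproduced outside Elasticsearch. Here is a small Python sketch, where `//` stands in for integer division in the scripting language (an assumption about why the script truncated). The function names are made up for illustration.

```python
def truncated_ratio(popularity):
    """Integer division, as happens when both operands are integers."""
    return popularity // 100


def float_ratio(popularity):
    """A floating-point operand keeps the fractional part."""
    return popularity / 100.0


def boosted_score(score, popularity):
    """Mirror of the script above: boost the score by at most 20%."""
    # Multiplying by 0.2 first makes the value a float before the division.
    return score + (popularity * 0.2 * score) / 100


print(truncated_ratio(50))      # 0
print(float_ratio(50))          # 0.5
print(boosted_score(2.0, 100))  # about 2.4, i.e. a 20% boost
```

This is presumably why the formula in the script works: the multiplication by 0.2 happens before the division by 100, so the division is no longer between two integers.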

Related

Alternative to PowerBI FILTER() function in Tableau calculated field

I am trying to recreate the following calculation from PowerBI in Tableau but I am not sure how to achieve what FILTER() function does.
(CALCULATE(SUM(Data[Amount]),FILTER(Data,Data[Paid]="True"))/SUM(Data[Amount]) +
CALCULATE(COUNT(Data[Document ID]),FILTER(Data,Data[Paid]="True"))/COUNT(Data[Document ID]))/2
My assumption was to use IF or CASE but then I couldn't figure out how to divide by number of all documents, regardless of their payment status.
I suppose you need the following:
(SUM(IF [Paid] = "True" THEN [Amount] END)/SUM([Amount])
+
COUNT(IF [Paid] = "True" THEN [Document ID] END)/COUNT([Document ID]))/2
The filter is specified using the if statement.
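To check the logic of the calculation independently of either tool, here is a minimal Python sketch of the same arithmetic: the paid share of the amount and the paid share of the document count, averaged. The sample rows are hypothetical.

```python
# Hypothetical sample rows: (document_id, amount, paid)
rows = [
    ("D1", 100.0, "True"),
    ("D2", 300.0, "False"),
    ("D3", 200.0, "True"),
    ("D4", 400.0, "False"),
]


def paid_ratio(rows):
    """Average of the paid share by amount and the paid share by count."""
    paid = [r for r in rows if r[2] == "True"]
    # The denominators use all rows, regardless of payment status.
    amount_share = sum(r[1] for r in paid) / sum(r[1] for r in rows)
    count_share = len(paid) / len(rows)
    # Both shares are added before dividing by 2.
    return (amount_share + count_share) / 2


print(paid_ratio(rows))
```

With these rows the amount share is 300/1000 and the count share is 2/4, so the result is close to 0.4. Dividing only the second term by 2 would give a different number, so the grouping of the sum matters.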

Result number for Boolean queries with Apache Lucene

When benchmarking Apache Lucene v7.5 I noticed a strange behavior:
I indexed the English Wikipedia dump (5,677,776 docs) using Lucene with the SimpleAnalyzer (No stopwords, no stemming)
Then I searched the index with the following queries:
the totalHits=5,382,873
who totalHits=1,687,254
the who totalHits=5,411,305
"the who" totalHits=8,827
The result number for the Boolean query the who is both larger than the result number for the single term the and the result number for the single term who, when it should be smaller than both.
Is there an explanation for that?
Code snippet:
Analyzer analyzer = new SimpleAnalyzer();
MultiFieldQueryParser parser = new MultiFieldQueryParser(new String[]{"title", "content","domain","url"},analyzer);
// Parse
Query q = parser.parse(querystr);
// top-10 results
int hitsPerPage = 10;
IndexReader indexReader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(indexReader);
// Ranker
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage);
// Search
searcher.search(q, collector);
// Retrieve the top-10 documents
TopDocs topDocs=collector.topDocs();
ScoreDoc[] hits = topDocs.scoreDocs;
long totalHits = topDocs.totalHits;
System.out.println("query: "+querystr + " " + hits.length+" "+String.format("%,d",totalHits));
The explanation is that the default operator is OR and not AND as you assume. Searching for the who returns documents that have either the or who or both.
the - 5,382,873
who - 1,687,254
the OR who - 5,411,305
I.e. most documents that contain who also contain the, except for 28,432 documents, which are added to the result set when you retrieve both.
You can change this behavior by changing the default operator:
parser.setDefaultOperator(QueryParserBase.AND_OPERATOR);
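The reported totals are consistent with plain set arithmetic. As a quick check, assuming the three counts from the question are exact, inclusion-exclusion gives the number of documents that should match once the operator is AND:

```python
# Totals reported in the question
the_hits = 5_382_873
who_hits = 1_687_254
or_hits = 5_411_305   # "the who" with the default OR operator

# |A and B| = |A| + |B| - |A or B|
and_hits = the_hits + who_hits - or_hits

# Documents that contain "who" but not "the"
who_only = or_hits - the_hits

print(and_hits)   # 1658822
print(who_only)   # 28432
```

So with AND as the default operator, the query the who should return fewer hits than either single term, as originally expected.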

Azure Search Scoring Profile Magnitude by Downloads

I am new to Azure Search so I just want to run this by before I try to implement it. We have a search setup on items and we want to score/rank the results based on its initial score and how many times the item has been used/downloaded. We want the items downloaded the most to appear at the top of the result list.
We have a separate field in the search index that contains the used/download count (itemCount).
I know I have to set up a Magnitude profile, but I am not sure what to use for the range, as itemCount can be anything from 0 to N. Do I just set the end of the range to some large number, e.g. 100,000,000, or what is the best practice?
var functionRankByDownload = new MagnitudeFunction()
{
Boost = 1000,
BoostingRangeStart = 0,
BoostingRangeEnd = 100000000,
ConstantBoostBeyondRange = true,
FieldName = "itemCount",
Interpolation = InterpolationTypes.Linear
};
scoringProfile1.Functions = new List<ScoringFunction>() { functionRankByDownload };
I found the score calculation is as follows:
((initialScore * boost * itemCount) - min) / (max-min)
So it seems like it should work ok having a large value for the max but again just wanting to know the best practice.
Thanks!
That seems reasonable. BoostingRangeEnd can be any reasonable upper bound for your range, depending on the scenario. Since you are using ConstantBoostBeyondRange, values beyond the end of the range will still be boosted appropriately.
You might also want to experiment with the boost value for a large range like this and see if a bigger boost value is more helpful for your scenario.
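To get a feel for why the size of the range matters, here is a simplified Python model of linear interpolation over a magnitude range. This is a sketch of the general idea only, not Azure Search's actual scoring code; the function name and the sample counts are assumptions.

```python
def normalized_magnitude(value, range_start, range_end, constant_beyond_range=True):
    """Position of value inside [range_start, range_end], scaled to 0..1."""
    if value < range_start:
        return 0.0
    if value > range_end:
        # Beyond the range: keep the full boost only if requested.
        return 1.0 if constant_beyond_range else 0.0
    return (value - range_start) / (range_end - range_start)


# Hypothetical download counts compared under two range ends
for end in (100_000_000, 100_000):
    low = normalized_magnitude(1_000, 0, end)
    high = normalized_magnitude(50_000, 0, end)
    print(end, low, high)
```

Under this model, a very large range end squeezes typical counts into a tiny slice near 0, so the items differ very little in boost. A range end closer to the largest count you realistically expect, or a larger boost value, spreads them out more.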

Laravel sum on ->first() sums more than 1 result

I have the following eloquent query
$raw = Model::select('out', 'in')->orderBy('created_at', 'DESC')->first();
That returns a collection of a single item, where Out = 0.0 and In = 90.0.
If I then do this:
$sumO = $raw->sum('out');
$sumI = $raw->sum('in');
I get $sumO = 13,651.41 and $sumI = 13371.69
I don't understand, because those sums don't even equal the sum of my entire table for those columns.
But it seems like ->sum() is being called on the entire table/query instead of just the first result like I thought it would.
Now, I know sum of a single row is weird, and I'm not actually doing this in production. I just want to know what it is doing.
Shouldn't it still just sum the 1 number to equal itself?
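->first() returns a single model (one row), not a collection, so there's no need to use ->sum(); just use $raw->in and $raw->out. Calling ->sum() on a model instance is forwarded to a new query on the model, so it aggregates over the table rather than over the one row you fetched.
Also, ->sum() works on a single column at a time.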

How can we manually manipulate score field in Elasticsearch

I am working on a scenario where a few documents need to be boosted when a particular text is searched for.
I have a set of documents on which I run a term query for a particular keyword, but here is the catch. Let's say we search for the keyword test and it fetches 100 records. The requirement says that a few specific docs should always come out as the top results, irrespective of their weighting and other criteria. How can we achieve this in Elasticsearch? Any suggestions and ideas are most welcome.
You can control relevance with scripts. Take a look at:
https://www.elastic.co/guide/en/elasticsearch/guide/current/script-score.html
This is an example using Groovy:
price = doc['price'].value
margin = doc['margin'].value
if (price < threshold) {
return price * margin / target
}
return price * (1 - discount) * margin / target
So, in pseudo-code it would be something like:
if (word == 'test') {
return score * n
}
return score
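An alternative that avoids scripting is a function_score query whose function carries an ids filter and a large weight. Here is a hedged sketch of such a request body, built as a Python dict. The field name, document IDs and weight are hypothetical, and a large multiplier makes the pinned documents very likely, not strictly guaranteed, to rank first.

```python
import json


def build_pinned_query(field, keyword, pinned_ids, weight=1000):
    """Term query whose score is multiplied by `weight` for pinned documents."""
    return {
        "query": {
            "function_score": {
                "query": {"term": {field: keyword}},
                "functions": [
                    {
                        # Only the pinned documents receive the weight;
                        # the filter does not exclude the other hits.
                        "filter": {"ids": {"values": list(pinned_ids)}},
                        "weight": weight,
                    }
                ],
                "boost_mode": "multiply",  # final score = _score * function result
            }
        }
    }


body = build_pinned_query("title", "test", ["doc-1", "doc-7"])
print(json.dumps(body, indent=2))
```

Documents that are not in the pinned list should keep their original score, so they stay in their usual relative order below the pinned ones.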
