ElasticSearch - how to SUM function_score by field

I have a query that is using function_score to rank the results. Here is a sample of what is returned:
{
"_index": "clone",
"_type": "authEvent",
"_id": "6431823",
"_score": 4.8,
"fields": {
"authInput.uID": "MPXWDKW2P",
"authResult.productValue": 1,
"authInput.userName": "F936F3AA-E26C-48DB-BDBC-44956B634260",
"authResult.authEventDate": "2014-02-27T09:29:30.703125-06:00",
"authResult.rulesFailed": [
"AuthCountByUser"
]
}
}
What I want to do is take the results and run the equivalent of this SQL statement:
SELECT TOP 20 "authInput.userName", SUM("_score")
FROM foo
GROUP BY "authInput.userName"
ORDER BY SUM("_score") DESC
How can I do this with ES?
NOTE: I'm using ES 0.9x, we will be moving to 1.0.0 soon but we have not yet.

Use a facet alongside the query: a facet is computed over the documents the query returns, so a facet on the field you want to group by (here authInput.userName) gives you the per-value totals.
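For comparison, the SQL from the question can also be reproduced on the client side once the hits are in hand. The following is only a minimal sketch of that grouping, not the facet itself: the function name and the sample hits are hypothetical, and it assumes each hit carries `_score` and a scalar `fields["authInput.userName"]` as in the sample result above.

```python
from collections import defaultdict


def top_users_by_score(hits, limit=20):
    """Sum _score per authInput.userName and return the top `limit` users.

    Client-side equivalent of:
    SELECT TOP 20 userName, SUM(_score) ... GROUP BY userName ORDER BY SUM(_score) DESC
    """
    totals = defaultdict(float)
    for hit in hits:
        user = hit["fields"]["authInput.userName"]
        totals[user] += hit["_score"]
    # Highest summed score first, truncated to `limit` entries.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical hits, shaped like the sample result in the question.
sample_hits = [
    {"_score": 4.8, "fields": {"authInput.userName": "user-a"}},
    {"_score": 1.5, "fields": {"authInput.userName": "user-b"}},
    {"_score": 2.0, "fields": {"authInput.userName": "user-a"}},
]
ranking = top_users_by_score(sample_hits)
```

This only sees the hits that were actually fetched, so it matches the SQL only when the whole result set has been retrieved.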

Elasticsearch: Multiple partial words not scored high enough

I'm trying to get good search results out of an Elasticsearch installation.
But I run into problems when I try to run a fuzzy search on some very simple data.
Somehow queries with multiple words (some of them partial) are scored too low, and only score higher when more letters of the word are present in the search query.
Let me explain:
I have a simple index built with two simple documents.
{
"name": "Product with good qualities and awesome sound system"
},
{
"name": "Another Product that has better acustics than the other one"
}
Now I query the index with these parameters:
{
"query": {
"multi_match": {
"fields": ["name"],
"query": "product acust",
"fuzziness": "auto"
}
}
}
And the results look like this:
"hits": [
{
"_index": "test_products",
"_type": "_doc",
"_id": "1",
"_score": 0.19100355,
"_source": {
"name": "Product with good qualities and awesome sound system"
}
},
{
"_index": "test_products",
"_type": "_doc",
"_id": "2",
"_score": 0.17439455,
"_source": {
"name": "Another Product that has better acustics than the other one"
}
}
]
As you can see, the product with ID 2 scores lower than the other product, even though it is arguably more similar to the query string: it has one full word match and one partial word match.
If the query were "product acusti", the results would start to behave correctly.
I've already fiddled around with a bool query, but the results are identical.
Any ideas how I can get the wanted results sooner, without having to type almost the whole second word?
As far as I know, Elasticsearch does not do partial word matching by default, so the term acust is not matched in either of your documents.
The reason you are getting a higher score in the first document is that your matched term, product, appears in a shorter sentence:
Product with good qualities and awesome sound system
But as for the second document, product appears in a longer sentence:
Another Product that has better acustics than the other one
So your second document gets a lower score because the ratio of your matched term (product) to the number of terms in the sentence is lower.
In other words, it has a lower field-length norm:
norm = 1/sqrt(numFieldTerms)
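To make the formula concrete, here is a small sketch that applies it to the two documents from the question. It assumes terms are simply the whitespace-separated words of the field (a real analyzer may tokenize differently), and the function name is made up for illustration.

```python
import math


def field_length_norm(text):
    """norm = 1 / sqrt(numFieldTerms), counting whitespace-separated words as terms."""
    num_field_terms = len(text.split())
    return 1.0 / math.sqrt(num_field_terms)


doc1 = "Product with good qualities and awesome sound system"
doc2 = "Another Product that has better acustics than the other one"

norm1 = field_length_norm(doc1)  # 8 terms
norm2 = field_length_norm(doc2)  # 10 terms
```

With 8 terms against 10, the first document ends up with the larger norm, which is consistent with it scoring slightly higher when only "product" matches.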
Now, if you want to be able to do partial prefix matching, you need to tokenize your terms into edge ngrams. For example, you can create the following ngrams for the term "acoustics":
"ac", "aco", "acou", "acous", "acoust", "acousti", "acoustic", "acoustics"
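The list above can be generated mechanically. The sketch below is a plain-Python illustration of edge ngrams, not Elasticsearch's own implementation; the function name and the minimum length of 2 are assumptions chosen to reproduce that list.

```python
def edge_ngrams(term, min_gram=2, max_gram=None):
    """Return the prefixes of `term` with lengths from min_gram to max_gram."""
    longest = len(term) if max_gram is None else min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


grams = edge_ngrams("acoustics")

# If the indexed word "acustics" is stored as edge ngrams, the partial
# query term "acust" becomes an exact match against one of them.
indexed_grams = set(edge_ngrams("acustics"))
partial_term_matches = "acust" in indexed_grams
```

This is why a partial word can match once the field is indexed this way: the prefix itself is one of the stored terms.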
You have 2 options to achieve this; see the answer by Russ Cam on this question:
1. Use the Analyze API with an analyzer that will tokenize the field into tokens/terms from which you would want to partial prefix match, and index this collection as the input to the completion field. The Standard analyzer may be a good one to start with...
2. Don't use the Completion Suggester here and instead set up your field (name) as a text datatype with multi-fields that include the different ways that name should be analyzed (or not analyzed, with a keyword sub field for example). Spend some time with the Analyze API to build an analyzer that will allow for partial prefix of terms anywhere in the name. As a start, something like the Standard tokenizer, Lowercase token filter, Edgengram token filter and possibly Stop token filter would get you running...
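As a rough starting point for the second option, an index body along the following lines could be sent when creating the index. This is a sketch under assumptions: the names `name_edge_ngram`, `name_prefix` and the `prefix` sub-field, as well as the gram sizes, are invented for illustration, and the shape of the `mappings` section depends on the Elasticsearch version (older versions nest `properties` under a type name).

```python
import json

# Hypothetical index body: standard tokenizer + lowercase + edge_ngram filter,
# exposed through a multi-field so the original "name" analysis stays untouched.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "name_edge_ngram": {"type": "edge_ngram", "min_gram": 2, "max_gram": 15}
            },
            "analyzer": {
                "name_prefix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_edge_ngram"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "prefix": {
                        "type": "text",
                        "analyzer": "name_prefix",
                        # Search terms are not ngrammed, only the indexed text is.
                        "search_analyzer": "standard",
                    }
                },
            }
        }
    },
}

request_json = json.dumps(index_body, indent=2)
```

The query would then target both `name` and `name.prefix`, so that full-word matches and prefix matches each contribute to the score.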
You may also find this guide helpful.

Create a keyword field concatenated of other fields

I've got an index with a mapping of 3 fields. Let's say f1, f2 and f3.
I want a new keyword field holding the concatenation of the values of f1, f2 and f3, so that I can aggregate by it and avoid lots of nested loops when checking the search results.
I've seen that this could be achieved by source transformation, but this feature was removed in Elasticsearch v5.
ElasticSearch version used: 6.5
Q: How can I achieve the concatenation in ElasticSearch v 6.5?
There was indeed source transformation prior to ES 5, but as of ES 5 there is now a more powerful feature called ingest nodes which will allow you to easily achieve what you need:
First, define an ingest pipeline using a set processor that will help you concatenate three fields into one:
PUT _ingest/pipeline/concat
{
"processors": [
{
"set": {
"field": "field4",
"value": "{{field1}} {{field2}} {{field3}}"
}
}
]
}
You can then index a document using that pipeline:
PUT index/doc/1?pipeline=concat
{
"field1": "1",
"field2": "2",
"field3": "3"
}
And the indexed document will look like:
{
"field1": "1",
"field2": "2",
"field3": "3",
"field4": "1 2 3"
}
Just make sure to create the index with the appropriate mapping for field4 prior to indexing the first document.
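The sketch below illustrates both points in plain Python: what the set processor's template produces for the sample document, and what a keyword mapping for field4 might look like. The rendering function is a simplified stand-in for the template substitution, not the ingest node's actual implementation, and the mapping assumes the type name `doc` used in the indexing request above.

```python
import re


def render_set_value(template, doc):
    """Replace each {{field}} placeholder with the document's value (simplified)."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: str(doc.get(match.group(1), "")),
        template,
    )


doc = {"field1": "1", "field2": "2", "field3": "3"}
doc["field4"] = render_set_value("{{field1}} {{field2}} {{field3}}", doc)

# Hypothetical index body so that field4 is indexed as a keyword,
# which makes it usable in a terms aggregation.
index_body = {
    "mappings": {
        "doc": {
            "properties": {
                "field4": {"type": "keyword"}
            }
        }
    }
}
```

Without an explicit mapping, dynamic mapping would typically index field4 as text with a keyword sub-field, so the aggregation would have to target that sub-field instead.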

Why does a search return unexpected results when using the _routing field?

For example, with the URL index-0/_search?routing=24320 I search for data with routing 24320, but the result contains:
"_index": "index-0",
"_type": "member",
"_id": "40865630",
"_score": 1,
"_routing": "22500",
Why does a document with routing 22500 match the search condition?
What happens is that when specifying ?routing=24320 in your search query, you're basically selecting the single shard on which documents with the routing value of 24320 have been stored.
Now, since your query doesn't specify any other constraints, you're basically getting all documents stored on that shard, which obviously means that you also get documents whose routing value is 22500 (and probably others, too).
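The effect can be sketched with a toy model of routing: a routing value is hashed and reduced modulo the number of primary shards, so many different routing values necessarily share a shard. The hash function (CRC32), the shard count and the range of routing values below are assumptions for illustration only; Elasticsearch uses its own hash function, so the actual shard numbers will differ.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Toy model: shard = hash(routing) % number_of_primary_shards."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # hypothetical shard count
routing_values = [str(value) for value in range(22500, 22520)]

# Group the routing values by the shard they land on.
routings_by_shard = {}
for routing in routing_values:
    shard = shard_for(routing, NUM_SHARDS)
    routings_by_shard.setdefault(shard, []).append(routing)
```

Since 20 routing values are spread over 5 shards, at least one shard holds several of them. To get only the documents for one routing value, the query itself also needs a constraint on the relevant field.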

How to plot aggregated data in kibana

I'm a newbie to kibana.
I have following data stored in ES:
{
"_index": "test",
"_type": "impressions",
"_id": "AVZ4QLgkLqvQLIzbvF4e",
"_version": 1,
"_score": 1,
"_source": {
"campaign_id": "1011",
"count": 691,
"played_dt": "2016-01-02"
}
}
So basically I have counts per campaign_id, which is already aggregated data.
I want a simple bar chart that plots counts per campaign_id, where the X axis is campaign_id and the Y axis is its count.
Instead, I'm getting the number of hits for each campaign_id rather than the actual value in the count field.
Thanks in advance!
Go to the "Visualize" tab and select "Vertical bar chart".
Choose a new search and select the appropriate index. You probably want to visualize your data over time, so on the X axis use "Date histogram" and select your time field (played_dt).
Now you can use, e.g., "Split bars", split by terms, and select the campaign_id field.
To plot the stored value rather than the number of hits, set the Y-axis metric to "Sum" on the count field instead of the default "Count".
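The difference between counting hits and summing the stored count field can be sketched in a few lines. Only the first document below comes from the question; the other two are hypothetical, added so that the two results differ visibly.

```python
from collections import Counter

docs = [
    {"campaign_id": "1011", "count": 691, "played_dt": "2016-01-02"},
    # Hypothetical extra documents for illustration.
    {"campaign_id": "1011", "count": 120, "played_dt": "2016-01-03"},
    {"campaign_id": "2022", "count": 50, "played_dt": "2016-01-02"},
]

# Number of documents per campaign (what a "Count" metric shows).
hits_per_campaign = Counter(doc["campaign_id"] for doc in docs)

# Sum of the stored count field per campaign (what a "Sum" metric on count shows).
count_per_campaign = Counter()
for doc in docs:
    count_per_campaign[doc["campaign_id"]] += doc["count"]
```

For campaign 1011 the first result is the number of documents, while the second is the total of the pre-aggregated values, which is the number the chart is meant to show.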

Retrieve all docs using top_hits aggregation ElasticSearch

I am using a top_hits aggregation to retrieve documents along with counts. Based on my earlier post, I need to retrieve all the documents, and I thought passing size 0 would do it, but it throws the following error.
org.elasticsearch.search.query.QueryPhaseExecutionException: [my-demo][3]: query[ConstantScore(*:*)],from[0],size[10]: Query Failed [Failed to execute main query]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalArgumentException: numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count
at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:254)
at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:238)
at org.elasticsearch.search.aggregations.metrics.tophits.TopHitsAggregator.collect(TopHitsAggregator.java:108)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucketNoCounts(BucketsAggregator.java:74)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:63)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:55)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash.collect(GlobalOrdinalsStringTermsAggregator.java:236)
at org.elasticsearch.search.aggregations.AggregatorFactories$1.collect(AggregatorFactories.java:114)
at org.elasticsearch.search.aggregations.BucketCollector$2.collect(BucketCollector.java:81)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucketNoCounts(BucketsAggregator.java:74)
According to the Elasticsearch documentation, size is "the maximum number of top matching hits to return per bucket. By default the top three matching hits are returned." So size = 0 means no documents (I think); try sending a maximum value instead. (comment by progrrammer)
The top_hits aggregation response has the following format:
"top_tags_hits": {
"hits": {
"total": 25365,
"max_score": 1,
"hits": [
{
"_index": "stack",
"_type": "question",
"_id": "602679",
"_score": 1,
"_source": {
"title": "Windows port opening"
},
"sort": [
1370143231177
]
}
]
}
}
Here hits -> total gives the total number of hits. To get the documents, you can paginate with from and size as in the search API, or pass the maximum integer value, (2^31)-1, as size to get all the documents.
Hope this helps.
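A request body following that advice might be built as below. This is a sketch only: the aggregation names and the `tags` field are hypothetical, and other Elasticsearch versions may limit how large the top_hits size is allowed to be, so treat the maximum value as an assumption tied to the version discussed here.

```python
MAX_INT = 2**31 - 1  # 2147483647, the maximum integer value mentioned above


def top_hits_request(group_field, size=MAX_INT, offset=0):
    """Build a terms aggregation whose buckets each return their top hits."""
    return {
        "aggs": {
            "by_group": {
                "terms": {"field": group_field},
                "aggs": {
                    "top_group_hits": {
                        # from/size page through the hits inside each bucket.
                        "top_hits": {"from": offset, "size": size}
                    }
                },
            }
        }
    }


all_docs_body = top_hits_request("tags")
second_page_body = top_hits_request("tags", size=10, offset=10)
```

Paging with a moderate size is usually gentler on memory than asking for every document in every bucket at once.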
