Adding sort priority to documents matching certain condition - elasticsearch

I am looking for a way to do this. I need to show all experts in the users mapping (experts are documents whose role field equals 3). While showing the experts, I need to show those having "Linkedin" in their social medias first (social_medias is an array field in the users mapping) and those without "Linkedin" afterwards. For example, given these documents:
[
{
role: 3,
name: "David",
social_medias: ["Twitter", "Facebook"]
},
{
role: 3,
name: "James",
social_medias: ["Facebook", "Linkedin"]
},
{
role: 3,
name: "Michael",
social_medias: ["Linkedin", "Facebook"]
},
{
role: 3,
name: "Peter",
social_medias: ["Facebook"]
},
{
role: 3,
name: "John",
social_medias: ["Facebook", "Twitter"]
},
{
role: 2,
name: "Babu",
social_medias: ["Linkedin", "Facebook"]
}
]
So, I want to get the documents with role 3 and, while fetching them, documents having "Linkedin" in social_medias should come first. The output of the query should be in this order:
[
{
role: 3,
name: "James",
social_medias: ["Facebook", "Linkedin"]
},
{
role: 3,
name: "Michael",
social_medias: ["Linkedin", "Facebook"]
},
{
role: 3,
name: "David",
social_medias: ["Twitter", "Facebook"]
},
{
role: 3,
name: "Peter",
social_medias: ["Facebook"]
},
{
role: 3,
name: "John",
social_medias: ["Facebook", "Twitter"]
}
]
I am trying function_score now. I can give a field more priority in function_score, but I can't figure out how to specify a condition-based priority.

Why not let the default sorting in ES (sort by score) do the job for you, without custom ordering or custom scoring:
GET /my_index/media/_search
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {"match": {"social_medias": "Linkedin"}},
            {"match_all": {}}
          ]
        }
      },
      "filter": {
        "term": {
          "role": "3"
        }
      }
    }
  }
}
The query above filters for "role": "3". The should clause then says: if a document matches Linkedin in the social_medias field, give it a score based on that match. To also include the documents that don't match Linkedin, there is a second should clause with match_all, so every document that passes the filter gets a score. Documents that also match Linkedin get an additional score on top, which makes them score higher and appear first in the list of results.
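Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same idea can probably be expressed with a bool query that puts the role restriction in a filter clause and keeps the Linkedin match as an optional should clause. The sketch below only builds and prints such a request body; it does not contact a cluster. The helper name build_expert_query and its parameters are made up for illustration, only the field names (role, social_medias) come from the question.

```python
import json


def build_expert_query(role=3, preferred_network="Linkedin"):
    """Build a search body: filter by role, rank matching documents first.

    Hypothetical helper for illustration; not part of any client library.
    """
    return {
        "query": {
            "bool": {
                # Filter context: restricts the result set and does not
                # contribute to the score.
                "filter": [{"term": {"role": role}}],
                # Optional clause: because a filter clause is present,
                # documents that do not match are still returned, they
                # just score lower than the ones that do match.
                "should": [{"match": {"social_medias": preferred_network}}],
            }
        }
    }


body = build_expert_query()
print(json.dumps(body, indent=2))
```

With this shape the extra match_all clause should not be needed, since the filter clause already decides which documents are returned.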

Related

Get only last version (custom field) of document when executing a search

I am using the Java API for elasticsearch and I am trying to get only the last version (which is a custom field) of each document when executing a search.
For example :
{ id: 1, name: "John Greenwood", version: 1}
{ id: 1, name: "John Greenwood", version: 2}
{ id: 2, name: "John Underwood", version: 1}
While searching for John, I want this result:
{ id: 1, name: "John Greenwood", version: 2}
{ id: 2, name: "John Underwood", version: 1}
Apparently I am supposed to use aggregations, but I'm not sure how to use them with the Java API.
Also, how can I group the documents by ID? I only want the latest version for each ID.
TL;DR
Yes, you are on the right track.
You will want to aggregate on the id of each user, then get the top hit with regard to the version.
Solution
The first aggregation, per_id, groups users by their id. Inside it, a second aggregation, lastest_version, selects the best hit with regard to the version. I set size: 1 to get the top 1 per group.
GET 74550367/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "per_id": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "lastest_version": {
          "top_hits": {
            "sort": [
              {
                "version": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
To Reproduce
POST _bulk
{ "index": {"_index":"74550367"}}
{ "id": 1, "name": "John Greenwood", "version": 1}
{ "index": {"_index":"74550367"}}
{ "id": 1, "name": "John Greenwood", "version": 2}
{ "index": {"_index":"74550367"}}
{ "id": 2, "name": "John Underwood", "version": 1}
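To make the intent of the two aggregations concrete, here is a plain-Python sketch of the same grouping applied to the three sample documents. It is not Elasticsearch or Java API code, just the logic that a terms aggregation on id with a top_hits of size 1 sorted by version is meant to produce. The function name latest_per_id is made up for illustration.

```python
def latest_per_id(docs):
    """Return the highest-version document for each id, ordered by id."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["id"])
        # Keep the document if it is the first one seen for this id
        # or if its version is higher than the one kept so far.
        if current is None or doc["version"] > current["version"]:
            latest[doc["id"]] = doc
    return [latest[key] for key in sorted(latest)]


docs = [
    {"id": 1, "name": "John Greenwood", "version": 1},
    {"id": 1, "name": "John Greenwood", "version": 2},
    {"id": 2, "name": "John Underwood", "version": 1},
]

result = latest_per_id(docs)
for doc in result:
    print(doc)
```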

Strapi GraphQL query: "start" argument wouldn't work

I am running into a very strange problem with my queries in Strapi (version 3.0.0-alpha.26.2). I have a users collection with 3 documents that I'm trying to fetch via GraphQL. To fetch all users the query is:
users {
firstName
}
This returns the following:
{
"data": {
"users": [
{
"firstName": "Arnold"
},
{
"firstName": "Bill"
},
{
"firstName": "Vin"
}
]
}
}
3 names. Now, say, I wished to retrieve only the first 2 users. For such pagination use cases, there are two arguments one can pass in a Strapi query: start (defines the index to start at) and limit (defines the number of elements to return). So now the query would be:
users(start: 0, limit: 2) {
firstName
}
This returns the first two names as expected:
{
"data": {
"users": [
{
"firstName": "Arnold"
},
{
"firstName": "Bill"
}
]
}
}
But what if I want the last 2 users here, i.e. Bill and Vin? Should be as straightforward as:
users(start: 1, limit: 2){
firstName
}
But this still returns Arnold and Bill, while you'd expect the following:
{
"data": {
"users": [
{
"firstName": "Bill"
},
{
"firstName": "Vin"
}
]
}
}
No matter what value I use for start, it always starts at the 0th item. You could do start: 200 (when there are only 3 items in the users collection) and it'd still return the exact same result! What sorcery is this??
The issue can be reproduced at https://dev.schandillia.com/graphql.
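For reference, the behaviour the question expects from start and limit amounts to a list slice. The following is a sketch of those expected semantics only, not of how Strapi implements them; the function name paginate is made up for illustration.

```python
def paginate(items, start=0, limit=None):
    """Return `limit` items beginning at index `start` (all remaining if limit is None)."""
    end = None if limit is None else start + limit
    return items[start:end]


users = ["Arnold", "Bill", "Vin"]

print(paginate(users, start=0, limit=2))    # ['Arnold', 'Bill']
print(paginate(users, start=1, limit=2))    # ['Bill', 'Vin']
print(paginate(users, start=200, limit=2))  # []
```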

Elasticsearch to return documents based on 2 criteria where one is based on the other

I have documents in the following format:
{
"id": number,
"chefId": number,
"name": String,
"ingredients": List<String>,
"isSpecial": boolean
}
Here is a list of 5 documents:
{
"id": 1,
"chefId": 1,
"name": "Roasted Potatoes",
"ingredients": ["Potato", "Onion", "Oil", "Salt"],
"isSpecial": false
},
{
"id": 2,
"chefId": 1,
"name": "Dauphinoise potatoes",
"ingredients": ["Potato", "Garlic", "Cream", "Salt"],
"isSpecial": true
},
{
"id": 3,
"chefId": 2,
"name": "Boiled Potatoes",
"ingredients": ["Potato", "Salt"],
"isSpecial": true
},
{
"id": 4,
"chefId": 3,
"name": "Mashed Potatoes",
"ingredients": ["Potato", "Butter", "Milk"],
"isSpecial": false
},
{
"id": 5,
"chefId": 4,
"name": "Hash Browns",
"ingredients": ["Potato", "Onion", "Egg"],
"isSpecial": false
}
I will be doing a search where "Potatoes" is contained in the name field. Like this:
{
"query": {
"wildcard": {
"name": {
"value": "*Potatoes*"
}
}
}
}
But I also want to add some extra criteria when returning documents:
If the ingredients contain onion or milk, then return the documents. So documents with the id 1 and 4 will be returned. Note that this means that we have documents returned where chef ids are 1 and 3.
Then, for the documents where we haven't already got another document with the same chef id, return where the isSpecial flag is set to true. So only document 3 will be returned. 2 wouldn't be returned as we already have a document where the chef id is equal to one.
Is it possible to do this kind of chaining in Elasticsearch? I would like to be able to do this in a single query so that I can avoid adding logic to my (Java) code.
You can't express that sort of logic in a single Elasticsearch query. You could write a tricky query with aggregations, post_filter and so on to fetch all the data you need in one request and then transform it in your Java application.
But the best (and more maintainable) approach is to run two queries.
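The two-step logic can be sketched in plain Python against the five sample documents. This only illustrates what the application code would do with the results of the two queries; it is not Elasticsearch or Java code. The function name select_recipes is made up, and the case-insensitive matching on names and ingredients is an assumption.

```python
def select_recipes(docs, name_term="potatoes", wanted=("onion", "milk")):
    """Apply the two criteria from the question in sequence."""
    # Name search shared by both steps (case-insensitive: an assumption).
    matches = [d for d in docs if name_term in d["name"].lower()]

    # Step 1 (first query): recipes containing any wanted ingredient.
    wanted_set = {w.lower() for w in wanted}
    first = [
        d for d in matches
        if wanted_set & {i.lower() for i in d["ingredients"]}
    ]
    seen_chefs = {d["chefId"] for d in first}

    # Step 2 (second query): specials from chefs not returned in step 1.
    second = [
        d for d in matches
        if d["isSpecial"] and d["chefId"] not in seen_chefs
    ]
    return first + second


docs = [
    {"id": 1, "chefId": 1, "name": "Roasted Potatoes",
     "ingredients": ["Potato", "Onion", "Oil", "Salt"], "isSpecial": False},
    {"id": 2, "chefId": 1, "name": "Dauphinoise potatoes",
     "ingredients": ["Potato", "Garlic", "Cream", "Salt"], "isSpecial": True},
    {"id": 3, "chefId": 2, "name": "Boiled Potatoes",
     "ingredients": ["Potato", "Salt"], "isSpecial": True},
    {"id": 4, "chefId": 3, "name": "Mashed Potatoes",
     "ingredients": ["Potato", "Butter", "Milk"], "isSpecial": False},
    {"id": 5, "chefId": 4, "name": "Hash Browns",
     "ingredients": ["Potato", "Onion", "Egg"], "isSpecial": False},
]

selected = select_recipes(docs)
print([d["id"] for d in selected])  # [1, 4, 3]
```

In a real setup, the chef ids collected after the first query would be passed to the second query as an exclusion, for example in a must_not clause.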

Adding additional fields to ElasticSearch terms aggregation

Indexed documents are like:
{
id: 1,
title: 'Blah',
...
platform: {id: 84, url: 'http://facebook.com', title: 'Facebook'}
...
}
What I want is to count and output stats by platform.
For counting, I can use terms aggregation with platform.id as a field to count:
aggs: {
platforms: {
terms: {field: 'platform.id'}
}
}
This way I receive the stats as multiple buckets looking like {key: 8, doc_count: 162511}, as expected.
Now, can I somehow also add platform.name and platform.url to those buckets (for pretty output of the stats)? The best I've come up with looks like:
aggs: {
platforms: {
terms: {field: 'platform.id'},
aggs: {
name: {terms: {field: 'platform.name'}},
url: {terms: {field: 'platform.url'}}
}
}
}
This does work, but it returns a pretty complicated structure in each bucket:
{key: 7,
doc_count: 528568,
url:
{doc_count_error_upper_bound: 0,
sum_other_doc_count: 0,
buckets: [{key: "http://facebook.com", doc_count: 528568}]},
name:
{doc_count_error_upper_bound: 0,
sum_other_doc_count: 0,
buckets: [{key: "Facebook", doc_count: 528568}]}},
Of course, the name and url of the platform could be extracted from this structure (like bucket.url.buckets.first.key), but is there a cleaner and simpler way to do the task?
It seems the best way to express the intention is a top hits aggregation: "from each aggregated group select only one document", and then extract the platform from it:
aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      platform: {top_hits: {size: 1, _source: {include: ['platform']}}}
    }
  }
}
This way, each bucket will look like:
{
  "key": 7,
  "doc_count": 529939,
  "platform": {
    "hits": {
      "hits": [{
        "_source": {
          "platform": {"id": 7, "name": "Facebook", "url": "http://facebook.com"}
        }
      }]
    }
  }
}
Which is kinda too deep (as usual with ES), but clean: bucket.platform.hits.hits.first._source.platform
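As a sketch of how that path could be unwrapped on the client side, the following plain-Python snippet walks a list of buckets shaped like the one above and produces one flat record per platform. The function name platform_stats and the output keys are made up for illustration.

```python
def platform_stats(buckets):
    """Flatten terms buckets that carry a size-1 top_hits sub-aggregation."""
    stats = []
    for bucket in buckets:
        # Each bucket holds exactly one hit because top_hits uses size: 1.
        platform = bucket["platform"]["hits"]["hits"][0]["_source"]["platform"]
        stats.append({
            "id": bucket["key"],
            "name": platform["name"],
            "url": platform["url"],
            "count": bucket["doc_count"],
        })
    return stats


buckets = [
    {
        "key": 7,
        "doc_count": 529939,
        "platform": {
            "hits": {
                "hits": [{
                    "_source": {
                        "platform": {
                            "id": 7,
                            "name": "Facebook",
                            "url": "http://facebook.com",
                        }
                    }
                }]
            }
        },
    }
]

stats = platform_stats(buckets)
print(stats)
```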
If you don't necessarily need to get the value of platform.id, you could get away with a single aggregation instead using a script that concatenates the two fields name and url:
aggs: {
platforms: {
terms: {script: 'doc["platform.name"].value + "," + doc["platform.url"].value'}
}
}

Filter Elasticsearch

How do I filter fields within an array that is inside of an object?
Sample:
{
_index: "consult",
_type: "user",
_id: "TlgRL71xRyq-0guJTGA9WQ",
_score: 1,
_source: {
token: "1113",
userlist: [
{
id: "1",
nome: "Mark"
},
{
id: "2",
nome: "Joe"
}
]
}
}
You can access the object's properties by using a fully qualified path (i.e. "dot notation"). For example, here is a term filter looking for a specific id value:
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "userlist.id": 1
        }
      }
    }
  }
}
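The dotted path works because, unless the field is mapped as nested, Elasticsearch indexes an array of objects by flattening it into one multi-valued field per path. The plain-Python sketch below mimics that flattening for the sample document and checks a term against it. It is an illustration of the idea, not Elasticsearch's actual implementation; the function names flatten and term_matches are made up, and the comparison uses the string "1" because the sample stores ids as strings.

```python
def flatten(source, prefix=""):
    """Collect leaf values under dotted paths, merging arrays of objects."""
    flat = {}
    for key, value in source.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Recurse into the object and merge its leaves by path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


def term_matches(source, field, value):
    """True if any value stored under the dotted field equals `value`."""
    return value in flatten(source).get(field, [])


source = {
    "token": "1113",
    "userlist": [
        {"id": "1", "nome": "Mark"},
        {"id": "2", "nome": "Joe"},
    ],
}

print(flatten(source))
print(term_matches(source, "userlist.id", "1"))  # True
```

One consequence of this flattening is that the link between id and nome within the same array element is lost, so a query combining both fields can match values from different elements.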

Resources