Say I am creating a search engine for a photo sharing social network and the documents of the site have the following schema
{
"id": 123456,
"name": "Foo",
"num_followers": 123456,
"num_photos": 123456
}
I would like my search results to satisfy the following requirements:
Only return results where the search query string matches the "name" field in the document
Rank the search results by number of followers descending
In the case where multiple customers have the same number of followers, rank by number of photos descending
For example, say I have the following documents in my index:
{
"id": 1,
"name": "Customer",
"num_followers": 3,
"num_photos": 27
}
{
"id": 2,
"name": "Customer",
"num_followers": 25,
"num_photos": 1
}
{
"id": 3,
"name": "Customer",
"num_followers": 8,
"num_photos": 2
}
{
"id": 4,
"name": "Customer",
"num_followers": 8,
"num_photos": 5
}
{
"id": 5,
"name": "FooBar",
"num_followers": 10000,
"num_photos": 20000
}
If I search "Customer" in the search bar of the site, the ES hits should be in the following order:
{
"id": 2,
"name": "Customer",
"num_followers": 25,
"num_photos": 1
}
{
"id": 4,
"name": "Customer",
"num_followers": 8,
"num_photos": 5
}
{
"id": 3,
"name": "Customer",
"num_followers": 8,
"num_photos": 2
}
{
"id": 1,
"name": "Customer",
"num_followers": 3,
"num_photos": 27
}
I'm assuming I will need some sort of compound query to create this "tiebreaker" logic. What clauses should I be using? If anyone has an example of something similar, that would be amazing. Thanks in advance.
This sounds like a pretty standard sorting use case. Elasticsearch can sort on multiple fields in a predefined priority order. See documentation here.
GET /my_index/_search
{
"sort" : [
{ "num_followers" : {"order" : "desc"}},
{ "num_photos" : "desc" }
],
"query" : {
"term" : { "name" : "Customer" }
}
}
Obviously this is just a simple term query; based on the wording of your question, you may want a full-text query such as match instead.
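To make the tiebreaker behavior concrete, here is a minimal Python sketch. The helper names build_search_body and rank_locally are hypothetical, and the match query assumes that name is an analyzed text field. The local sort only mimics the ordering that the sort clause asks Elasticsearch to produce; it does not talk to a cluster.

```python
def build_search_body(name_query):
    """Build a search body that matches on name and sorts by two fields."""
    return {
        "query": {"match": {"name": name_query}},
        "sort": [
            {"num_followers": {"order": "desc"}},  # primary sort key
            {"num_photos": {"order": "desc"}},     # tiebreaker
        ],
    }


def rank_locally(docs, name_query):
    """Mimic the same ordering in plain Python (exact name match only)."""
    matching = [d for d in docs if d["name"] == name_query]
    return sorted(matching, key=lambda d: (-d["num_followers"], -d["num_photos"]))


docs = [
    {"id": 1, "name": "Customer", "num_followers": 3, "num_photos": 27},
    {"id": 2, "name": "Customer", "num_followers": 25, "num_photos": 1},
    {"id": 3, "name": "Customer", "num_followers": 8, "num_photos": 2},
    {"id": 4, "name": "Customer", "num_followers": 8, "num_photos": 5},
    {"id": 5, "name": "FooBar", "num_followers": 10000, "num_photos": 20000},
]

body = build_search_body("Customer")
ranked_ids = [d["id"] for d in rank_locally(docs, "Customer")]
print(ranked_ids)  # [2, 4, 3, 1]
```

The second sort entry is only consulted when two documents have the same num_followers, which is why documents 4 and 3 are ordered by num_photos.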
Is it possible to run a search on the results of another search? For example:
// index: A
{ "ID": 1, "status": "done" }
{ "ID": 2, "status": "processing" }
{ "ID": 3, "status": "done" }
{ "ID": 4, "status": "done" }
// index: B
{ "ID": 1, "user": 1, "value": 10 }
{ "ID": 1, "user": 2, "value": 3 }
{ "ID": 2, "user": 1,"value": 1 }
{ "ID": 3, "user": 1, "value": 3 }
{ "ID": 4, "user": 1, "value": 7 }
Q1: Search in index "A" status == "done" and return the ID
RES: 1,3,4
Q2: From the results in Q1 search value > 5 and return the ID
RES: 1,4
My current solution is to run two queries: download the results of "Q1", then run a second search for "Q2". But this is very complicated because there are 30k results.
To me, the problem looks like a traditional union of filters across two indexes, a sort of join like the ones in relational databases. I'm not sure of the exact solution, but I recently used a plug-in for joins that might help: https://siren.io/siren-federate-20-0-introducing-a-scalable-inner-join-for-elasticsearch/
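Without a plug-in, the two-step approach from the question can be kept manageable by sending the Q1 IDs to index B in batches. Below is a minimal Python sketch, assuming the IDs from Q1 have already been collected on the client. The helper names and the batch size of 1000 are illustrative assumptions, not recommended values, and the in-memory filter at the end only mimics what the generated query asks for.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_q2_body(ids, min_value):
    """Build the Q2 search body: ID is one of `ids` and value > min_value."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"ID": ids}},
                    {"range": {"value": {"gt": min_value}}},
                ]
            }
        }
    }


# IDs returned by Q1 (status == "done" in index A), taken from the example.
done_ids = [1, 3, 4]
bodies = [build_q2_body(batch, 5) for batch in chunked(done_ids, 1000)]

# In-memory check of the same logic against the example documents of index B.
index_b = [
    {"ID": 1, "user": 1, "value": 10},
    {"ID": 1, "user": 2, "value": 3},
    {"ID": 2, "user": 1, "value": 1},
    {"ID": 3, "user": 1, "value": 3},
    {"ID": 4, "user": 1, "value": 7},
]
wanted = set(done_ids)
result_ids = sorted({d["ID"] for d in index_b if d["ID"] in wanted and d["value"] > 5})
print(result_ids)  # [1, 4]
```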
The query that needs to be executed is:
dslContext.select(
jsonObject(
key("id").value(ENTITY.ID),
key("name").value(ENTITY.NAME),
key("attributes").value(
coalesce(
select(
jsonArrayAgg(
jsonObject(
key("id").value(ATTRIBUTE.ID),
key("name").value(ATTRIBUTE.NAME),
key("indexValue").value(ATTRIBUTE.INDEX_VALUE)
)
)
).from(ATTRIBUTE)
.where(ATTRIBUTE.ENTITY_ID.eq(ENTITY.ID))
.orderBy(ATTRIBUTE.INDEX_VALUE.asc()),
jsonArray()
)
)
)
).from(ENTITY).fetchInto(EntityDto.class)
Response for the above query:
[
{
"id": 2,
"name": "Address",
"attributes": [
{
"id": 3,
"name": "Pincode",
"indexValue": 4
},
{
"id": 4,
"name": "Country",
"indexValue": 3
},
{
"id": 5,
"name": "City",
"indexValue": 2
},
{
"id": 6,
"name": "Address",
"indexValue": 1
}
]
}
]
The attributes are not sorted in ascending order of indexValue.
How can I sort the attributes in ascending order?
Use the ORDER BY clause on JSON_ARRAYAGG itself, rather than on the enclosing subquery:
jsonArrayAgg(...).orderBy(...)
I want to get all the distinct records by "departmentNo".
Please see the index data below (it is dummy data):
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 1, "employeeName": "vijay", ...}
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 2, "employeeName": "rathod", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 3, "employeeName": "ajay", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 4, "employeeName": "kamal", ...}
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 5, "employeeName": "rahul", ...}
I want the below output.
{'departmentNo': 1, 'departmentName': 'Food', 'departmentLoc': "I1", "departmentScore": "5", "employeeid" : 1, "employeeName": "vijay", ...}
{'departmentNo': 2, 'departmentName': 'Non-Food', 'departmentLoc': "I2", "departmentScore": "6", "employeeid" : 3, "employeeName": "ajay", ...}
I tried to get the data in the hits section, but didn't find a way.
So I tried an aggregation, using the query below:
{
"size": 0,
"aggs": {
"Group_By_Dept": {
"terms": {
"field": "departmentNo"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
}
The above query returns the data, but I want all the distinct records, with support for pagination and sorting.
In Elasticsearch 6.0 we could use bucket_sort, but I am using 5.6.7, so I can't use bucket_sort.
Is there any other way to do this?
It would be good if I could get the data in the hits section.
(I don't want to change my index mapping. The mapping shown here is a dummy one, but the use case is the same.)
You can do that by using field collapsing:
{
"query": { ... },
"from": 153,
"size": 27,
"collapse": {
"field": "departmentNo"
}
}
This leaves only one document for each distinct value of that field. You can control which document that is with a standard sort: the top-sorted document within each collapsed group is returned.
Please note that there is additional functionality called inner hits, which you may want to use in the future. Be aware that it multiplies document fetches and negatively affects performance.
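As a minimal sketch of how sorting and pagination combine with collapsing, the Python helpers below build such a request body and mimic its effect on the sample data. The function names, the match_all query, and the choice of employeeid as the sort field are assumptions made for illustration; the sort decides which document represents each departmentNo.

```python
def build_collapse_body(page, page_size, sort_field="employeeid", order="asc"):
    """Build a paginated, sorted search body collapsed on departmentNo."""
    return {
        "query": {"match_all": {}},
        "from": page * page_size,
        "size": page_size,
        "sort": [{sort_field: {"order": order}}],
        "collapse": {"field": "departmentNo"},
    }


def collapse_locally(docs, sort_field="employeeid"):
    """Mimic collapsing in plain Python: keep the first-sorted doc per department."""
    seen = set()
    result = []
    for doc in sorted(docs, key=lambda d: d[sort_field]):
        if doc["departmentNo"] not in seen:
            seen.add(doc["departmentNo"])
            result.append(doc)
    return result


docs = [
    {"departmentNo": 1, "employeeid": 1, "employeeName": "vijay"},
    {"departmentNo": 1, "employeeid": 2, "employeeName": "rathod"},
    {"departmentNo": 2, "employeeid": 3, "employeeName": "ajay"},
    {"departmentNo": 2, "employeeid": 4, "employeeName": "kamal"},
    {"departmentNo": 1, "employeeid": 5, "employeeName": "rahul"},
]

body = build_collapse_body(page=0, page_size=10)
kept = [d["employeeName"] for d in collapse_locally(docs)]
print(kept)  # ['vijay', 'ajay']
```

Keep in mind that the total hit count in a collapsed response still counts the matching documents before collapsing, so it cannot be used directly as the number of distinct departments when computing page counts.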
I have documents in the following format:
{
"id": number,
"chefId": number,
"name": String,
"ingredients": List<String>,
"isSpecial": boolean
}
Here is a list of 5 documents:
{
"id": 1,
"chefId": 1,
"name": "Roasted Potatoes",
"ingredients": ["Potato", "Onion", "Oil", "Salt"],
"isSpecial": false
},
{
"id": 2,
"chefId": 1,
"name": "Dauphinoise potatoes",
"ingredients": ["Potato", "Garlic", "Cream", "Salt"],
"isSpecial": true
},
{
"id": 3,
"chefId": 2,
"name": "Boiled Potatoes",
"ingredients": ["Potato", "Salt"],
"isSpecial": true
},
{
"id": 4,
"chefId": 3,
"name": "Mashed Potatoes",
"ingredients": ["Potato", "Butter", "Milk"],
"isSpecial": false
},
{
"id": 5,
"chefId": 4,
"name": "Hash Browns",
"ingredients": ["Potato", "Onion", "Egg"],
"isSpecial": false
}
I will be doing a search where "Potatoes" is contained in the name field. Like this:
{
"query": {
"wildcard": {
"name": {
"value": "*Potatoes*"
}
}
}
}
But I also want to add some extra criteria when returning documents:
If the ingredients contain onion or milk, then return the documents. So documents with the id 1 and 4 will be returned. Note that this means that we have documents returned where chef ids are 1 and 3.
Then, for the documents where we haven't already got another document with the same chef id, return where the isSpecial flag is set to true. So only document 3 will be returned. 2 wouldn't be returned as we already have a document where the chef id is equal to one.
Is it possible to do this kind of chaining in Elasticsearch? I would like to be able to do this in a single query so that I can avoid adding logic to my (Java) code.
You can't express that sort of logic in a single Elasticsearch query. You could write a tricky query with aggregations, post_filter and so on to fetch all the data you need in one request, and then transform it in your Java application.
But the best (and most maintainable) approach is to use two queries.
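A minimal sketch of the two-query approach, written in Python for brevity (the same structure carries over to Java). The helper names are hypothetical, and the lowercase ingredient terms assume that the ingredients field is analyzed with a lowercasing analyzer; adjust them to your mapping. The first query finds the onion or milk dishes, and the second finds special dishes from chefs that the first query did not return.

```python
def build_first_query(name_pattern, ingredients):
    """Step 1: name matches the pattern and any of the ingredients is present."""
    return {
        "query": {
            "bool": {
                "must": [{"wildcard": {"name": {"value": name_pattern}}}],
                "filter": [{"terms": {"ingredients": ingredients}}],
            }
        }
    }


def build_second_query(name_pattern, seen_chef_ids):
    """Step 2: special dishes from chefs not already returned by step 1."""
    return {
        "query": {
            "bool": {
                "must": [{"wildcard": {"name": {"value": name_pattern}}}],
                "filter": [{"term": {"isSpecial": True}}],
                "must_not": [{"terms": {"chefId": sorted(seen_chef_ids)}}],
            }
        }
    }


first = build_first_query("*Potatoes*", ["onion", "milk"])

# Stand-in for the hits of the first query, taken from the question's example.
first_hits = [{"id": 1, "chefId": 1}, {"id": 4, "chefId": 3}]
seen = {hit["chefId"] for hit in first_hits}

second = build_second_query("*Potatoes*", seen)
print(second["query"]["bool"]["must_not"])
```

The application then concatenates the hits of both queries. Because every chef returned by the first query is excluded from the second, no document can appear twice.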
I've started learning ElasticSearch and I was wondering if somebody could help me shortcut the process by providing some examples of how I would build a couple of queries.
Here's my example schema...
PUT /sales/_mapping
{
"sale": {
"properties": {
"productCode": {"type": "string"},
"productTitle": {"type": "string"},
"quantity" : {"type": "integer"},
"unitPrice" : {"type": "double"}
}
}
}
POST /sales/sale/1
{"productCode": "A", "productTitle": "Widget", "quantity" : 5, "unitPrice": 5.50}
POST /sales/sale/2
{"productCode": "B", "productTitle": "Gizmo", "quantity" : 10, "unitPrice": 1.10}
POST /sales/sale/3
{"productCode": "C", "productTitle": "Spanner", "quantity" : 5, "unitPrice": 9.00}
POST /sales/sale/4
{"productCode": "A", "productTitle": "Widget", "quantity" : 15, "unitPrice": 5.40}
POST /sales/sale/5
{"productCode": "B", "productTitle": "Gizmo", "quantity" : 20, "unitPrice": 1.00}
POST /sales/sale/6
{"productCode": "B", "productTitle": "Gizmo", "quantity" : 30, "unitPrice": 0.90}
POST /sales/sale/7
{"productCode": "B", "productTitle": "Gizmo", "quantity" : 40, "unitPrice": 0.80}
POST /sales/sale/8
{"productCode": "C", "productTitle": "Spanner", "quantity" : 100, "unitPrice": 7.50}
POST /sales/sale/9
{"productCode": "C", "productTitle": "Spanner", "quantity" : 200, "unitPrice": 5.50}
What query would I need to generate the following results?
a). Show the number of documents grouped by product code
Product code  Title    Count
A             Widget   2
B             Gizmo    4
C             Spanner  3
b). Show the total units sold by product code, i.e.
Product code  Title    Total units sold
A             Widget   20
B             Gizmo    100
C             Spanner  305
TIA
You can accomplish that using aggregations, in particular the terms aggregation. It can be done in a single request by including the aggregations in your query. To instruct ES to generate analytic data based on aggregations, include the aggregations object (or aggs) and specify within it the types of aggregation you would like ES to run on your data.
{
"query": {
"match_all": {}
},
"aggs": {
"group_by_product": {
"terms": {
"field": "productCode"
},
"aggs": {
"units_sold": {
"sum": {
"field": "quantity"
}
}
}
}
}
}
Running that query returns, besides the hits from your search (in this case a match_all), an additional object within the response that holds the resulting aggregations. For example:
{
...
"hits": {
"total": 6,
"max_score": 1,
"hits": [ ... ]
},
"aggregations": {
"group_by_product": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "b",
"doc_count": 3,
"units_sold": {
"value": 60
}
},
{
"key": "a",
"doc_count": 2,
"units_sold": {
"value": 20
}
},
{
"key": "c",
"doc_count": 1,
"units_sold": {
"value": 5
}
}
]
}
}
}
I omitted some details from the response object for brevity and to highlight the important part, which is the aggregations object. The aggregated data consists of buckets, each representing a distinct product code (identified by the key field) found in your documents. doc_count holds the number of documents per product code, and the units_sold object holds the total units sold for that product code.
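Turning those buckets into the rows asked for in the question takes only a little client code. Here is a minimal Python sketch, assuming the response has already been parsed into a dict; the function name buckets_to_rows is hypothetical, and the sample values are copied from the example response above rather than computed from the question's data.

```python
def buckets_to_rows(response, agg_name="group_by_product", sum_name="units_sold"):
    """Flatten terms-aggregation buckets into (key, doc_count, sum) tuples."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"], b[sum_name]["value"]) for b in buckets]


# Trimmed copy of the example response shown above.
response = {
    "aggregations": {
        "group_by_product": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "b", "doc_count": 3, "units_sold": {"value": 60}},
                {"key": "a", "doc_count": 2, "units_sold": {"value": 20}},
                {"key": "c", "doc_count": 1, "units_sold": {"value": 5}},
            ],
        }
    }
}

rows = buckets_to_rows(response)
for key, count, units in rows:
    print(key, count, units)
```

The doc_count column answers part a) of the question and the units_sold column answers part b), so a single request covers both.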
One important thing to keep in mind is that to run aggregations on string or text fields, you need to enable the fielddata setting in the field mapping, as it is disabled by default on all text-based fields. To update the mapping, for example for the product code field, send a PUT request to the corresponding mapping type within the index:
PUT http://localhost:9200/sales/sale/_mapping
{
"properties": {
"productCode": {
"type": "string",
"fielddata": true
}
}
}
(more info about the fielddata setting)