Elasticsearch sorting by multiple conditions

I have an index with simple data, and I need to filter and sort it as described below.
Records look like this:
{
"name": "Product ABC variant XYZ subvariant JKL",
"date": "2023-01-03T10:34:39+01:00"
}
I'm searching the name field for: "Product FGH"
1) Get records with an exact match (field name) and sort them by date (field date) DESC.
2) If nothing is found in 1), or if there is no exact match but there are similar records, sort the remaining records by the default score.
Is it possible to do this in one Elasticsearch request? What should the whole query look like?
Thanks

What you are looking for is running Elasticsearch queries conditionally, which is not possible in a single query. You need to fire the first query and, if it doesn't return any hits, fire the second one.
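A minimal sketch of that two-step approach, written in Python. The `run_query` callable is hypothetical: it stands for whatever client call sends a search body and returns the list of hits. The field names `name.keyword` (for the exact match) and `name` (for the full-text match) are assumptions about the mapping, not something stated in the question.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]
SearchBody = Dict[str, Any]


def exact_query(term: str) -> SearchBody:
    """Exact match on an (assumed) keyword sub-field, newest documents first."""
    return {
        "query": {"term": {"name.keyword": term}},
        "sort": [{"date": {"order": "desc"}}],
    }


def similar_query(term: str) -> SearchBody:
    """Full-text match, left in the default relevance (_score) order."""
    return {"query": {"match": {"name": term}}}


def search_with_fallback(run_query: Callable[[SearchBody], List[Hit]], term: str) -> List[Hit]:
    """Run the exact query first; only if it returns no hits, run the similar query."""
    hits = run_query(exact_query(term))
    if hits:
        return hits
    return run_query(similar_query(term))
```

The cost is one extra round trip whenever the exact query comes back empty.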

Using a script_score query, you can do what you want. For an exact match, convert the date to milliseconds and return it as the score; for a non-exact match, simply return _score.
Exact matches will be sorted by the date field, descending.
Non-exact matches will be sorted by _score.
For example:
Mapping:
{
"mappings": {
"properties": {
"name" : {"type": "keyword"},
"date" : {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}
}
}
}
Insert:
PUT func/_doc/1
{
"name" : "Product ABC variant XYZ subvariant JKL",
"date" : "2023-01-03 10:34:39"
}
PUT func/_doc/2
{
"name" : "Product ABC variant XYZ subvariant JKL",
"date" : "2022-12-03 10:33:39"
}
PUT func/_doc/3
{
"name" : "Product ABC",
"date" : "2022-11-03 10:33:39"
}
PUT func/_doc/4
{
"name" : "Product ABC",
"date" : "2023-01-03 10:33:39"
}
Query:
GET /func/_search
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "if (doc['name'].value == params.search_term) { return doc['date'].value.toInstant().toEpochMilli(); } else return _score",
"params": {
"search_term": "Product ABC"
}
}
}
}
}
Output:
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 1672742040000,
"hits": [
{
"_index": "func",
"_id": "4",
"_score": 1672742040000,
"_source": {
"name": "Product ABC",
"date": "2023-01-03 10:33:39"
}
},
{
"_index": "func",
"_id": "3",
"_score": 1667471640000,
"_source": {
"name": "Product ABC",
"date": "2022-11-03 10:33:39"
}
},
{
"_index": "func",
"_id": "1",
"_score": 1,
"_source": {
"name": "Product ABC variant XYZ subvariant JKL",
"date": "2023-01-03 10:34:39"
}
},
{
"_index": "func",
"_id": "2",
"_score": 1,
"_source": {
"name": "Product ABC variant XYZ subvariant JKL",
"date": "2022-12-03 10:33:39"
}
}
]
}
}
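One caveat may be worth checking before relying on this: Elasticsearch scores are 32-bit floats, so an epoch-millisecond value returned from the script is rounded. In the output above, document 4's score is printed as 1672742040000, while the exact epoch value for 2023-01-03 10:33:39 (read as UTC, which is an assumption here) is 1672742019000; that difference is consistent with this rounding. The Python sketch below only simulates the rounding with the standard library and does not talk to Elasticsearch.

```python
import struct
from datetime import datetime, timezone


def to_float32(value: float) -> float:
    """Round a number to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


def epoch_millis(text: str) -> int:
    """Parse 'yyyy-MM-dd HH:mm:ss' as UTC and return milliseconds since the epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1000


doc4 = epoch_millis("2023-01-03 10:33:39")  # date of document 4
doc1 = epoch_millis("2023-01-03 10:34:39")  # date of document 1, one minute later

print(doc4, to_float32(doc4))
print(doc1, to_float32(doc1))
# True means the two timestamps cannot be told apart once stored as a float score.
print(to_float32(doc4) == to_float32(doc1))
```

If both documents were exact matches, dates this close together could tie, so a coarser unit (for example seconds since a recent reference date) may be safer when the ordering of near-simultaneous records matters.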

Related

Query on Elasticsearch with multiple criteria

I have this document in Elasticsearch:
{
"_index" : "master",
"_type" : "_doc",
"_id" : "q9IGdXABeXa7ITflapkV",
"_score" : 0.0,
"_source" : {
"customer_acct" : "64876457056",
"ssn_number" : "123456789",
"name" : "Julie",
"city" : "NY"
}
}
I want to query the master index with customer_acct and ssn_number to retrieve the entire document. I also want to disable scoring and relevance. I have used the query below:
curl -X GET "localhost/master/_search/?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"term": {
"customer_acct": {
"value":"64876457056"
}
}
}
}'
I need to include a second criterion, ssn_number, in the term query as well. How would I do that? I also want to turn off scoring and relevance; is that possible? I am new to Elasticsearch.
First, you need to define a proper mapping for your index. Your customer_acct and ssn_number values are numeric, but you are storing them as strings. Looking at your sample, you would need the long type to store them. Then you can use filter context in your query, since you don't need score and relevance in your results. Read more about filter context in the official ES documentation; the snippet below is quoted from it.
In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated. Filter context is mostly used for
filtering structured data,
which is exactly your use-case.
1. Index Mapping
{
"mappings": {
"properties": {
"customer_acct": {
"type": "long"
},
"ssn_number" :{
"type": "long"
},
"name" : {
"type": "text"
},
"city" :{
"type": "text"
}
}
}
}
2. Index sample docs
{
"name": "Smithe John",
"city": "SF",
"customer_acct": 64876457065,
"ssn_number": 123456790
}
{
"name": "Julie",
"city": "NY",
"customer_acct": 64876457056,
"ssn_number": 123456789
}
3. Main search query to filter without the score (it uses only a filter clause)
{
"query": {
"bool": {
"filter": [
{
"term": {
"customer_acct": 64876457056
}
},
{
"term": {
"ssn_number": 123456789
}
}
]
}
}
}
The above search query gives the following result (note that the score is 0):
{
"took": 186,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "so-master",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"name": "Smithe John",
"city": "SF",
"customer_acct": 64876457056,
"ssn_number": 123456789
}
}
]
}
}
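When the criteria vary from request to request, the same filter-only body can be assembled programmatically. This is a small Python sketch; the helper name `build_filter_query` is made up for illustration, and it only builds the request body rather than sending it.

```python
from typing import Any, Dict, Mapping


def build_filter_query(criteria: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a search body whose term clauses all run in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in criteria.items()]
            }
        }
    }


body = build_filter_query({"customer_acct": 64876457056, "ssn_number": 123456789})
print(body)
```

Each entry in the mapping becomes one term clause, and all clauses must match because they sit together in the filter array.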

ElasticSearch Range query

I have created the index by using the following mapping:
PUT test1
{
"mappings": {
"type1": {
"properties": {
"age": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 32766
}
}
}
}
}
}
}
I added the following documents to the index:
PUT test1/type1/1/_create
{
"age":50
}
PUT test1/type1/2/_create
{
"age":100
}
PUT test1/type1/3/_create
{
"age":150
}
PUT test1/type1/4/_create
{
"age":200
}
I have used the following range query to fetch result:
GET test1/_search
{
"query": {
"range" : {
"age" : {
"lte" : 150
}
}
}
}
It gives me the following response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test1",
"_type": "type1",
"_id": "2",
"_score": 1,
"_source": {
"age": 100
}
},
{
"_index": "test1",
"_type": "type1",
"_id": "3",
"_score": 1,
"_source": {
"age": 150
}
}
]
}
}
The above response does not show the document with age 50; it shows only the documents with age 100 and 150, even though 50 is also less than 150. What is wrong here?
Can anyone help me get a valid result?
In my schema the age field type is text, and I don't want to change it.
How can I get a valid result?
Because the age field type is text, the range query uses alphabetical (lexicographic) order. So the results are correct:
"100"<"150"
"150"="150"
"50">"150"
If you are ingesting only numbers into the age field, you should change its type to a numeric type, or add another inner field with a numeric type, just as you did with the raw inner field.
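The same ordering can be reproduced outside Elasticsearch. The Python sketch below compares the four ages first as strings and then as integers; it only illustrates lexicographic versus numeric comparison and is not how Elasticsearch evaluates the range internally.

```python
ages = ["50", "100", "150", "200"]

# Compared as text: character by character, so "50" sorts after "150".
as_text = [age for age in ages if age <= "150"]

# Compared as numbers: the result the question expected.
as_numbers = [age for age in ages if int(age) <= 150]

print(as_text)
print(as_numbers)
```

The text comparison keeps only "100" and "150", which matches the two hits in the response above.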
UPDATE: Tested on a local system, and it works.
NOTE: Ideally, you would want the mappings to be correct, but if there is no other choice and you are not the person who decides on the mapping, you can still achieve it as follows.
For ES version 6.3 onwards, try this:
GET test1/type1/_search
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "Integer.parseInt(doc['age.raw'].value) <= 150",
"lang": "painless"
}
}
}
}
}
}
Sources to refer:
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/query-dsl-script-query.html
https://discuss.elastic.co/t/painscript-script-cast-string-as-int/97034
The type of your age field in the mapping is set to text. That is the reason it uses dictionary ordering, where "50" > "150". Please use the long data type. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Elasticsearch is giving unnecessary records on match query

I want to get all the documents whose dateofbirth contains the substring "-11-09".
This is my Elasticsearch query:
{ "query": { "bool" : { "must": { "match": { "dobdata": ".*-11-09.*"} } } } }
And the result I am getting is:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 5.0782137,
"hits": [
{
"_index": "userindexv1",
"_type": "usertype",
"_id": "58f9a9d1acf8c47037000038",
"_score": 5.0782137,
"_source": {
"fullname": "Eshwar ",
"fullname_raw": "Eshwar ",
"mobile1": "7222222256",
"uid": "UIDS1010",
"mobile2": "",
"classname": "Class 5",
"classname_raw": "Class 5",
"divid": 63,
"category": "S",
"dobdata": "2010-11-09"
}
},
{
"_index": "userindexv1",
"_type": "usertype",
"_id": "57960b35acf8c4c43000002c",
"_score": 1.259227,
"_source": {
"fullname": "Sindhu ",
"fullname_raw": "Sindhu ",
"mobile1": "9467952335",
"uid": "UIDS1006",
"mobile2": "",
"classname": "class 1s Group for class g",
"classname_raw": "class 1s Group for class g",
"divid": 63,
"category": "S",
"dobdata": "2012-11-08"
}
},
{
"_index": "userindexv1",
"_type": "usertype",
"_id": "58eb62d2acf8c4d43300002f",
"_score": 1.1471639,
"_source": {
"fullname": "Himanshu ",
"fullname_raw": "Himanshu ",
"mobile1": "9898785484",
"uid": "",
"mobile2": "",
"classname": "Play Group",
"classname_raw": "Play Group",
"divid": 63,
"category": "S",
"dobdata": "2012-11-08"
}
},
{
"_index": "userindexv1",
"_type": "usertype",
"_id": "580dbe5bacf8c4b82300002a",
"_score": 1.1471639,
"_source": {
"fullname": "Sai Bhargav ",
"fullname_raw": "Sai Bhargav ",
"mobile1": "9739477159",
"uid": "",
"mobile2": "7396226318",
"classname": "class 1s Group for class g",
"classname_raw": "class 1s Group for class g",
"divid": 63,
"category": "S",
"dobdata": "2012-11-07"
}
}
]
}}
I am getting records whose dateofbirth does not contain the string "-11-09". I tried to work around it, but I was not able to find a solution.
I am new to Elasticsearch. I want only the first record. Can anyone please help me out? Sorry for my bad English.
I faced the same problem, and I solved it by doing two things.
1. Changed the format of the date of birth from Y-m-d to YMd and made the field not_analyzed.
2. Used a wildcard query instead of a match query:
{
"query": {
"wildcard": {
"dobdata": {
"value": "*Nov09*"
}
}
}
}
It solved my problem. I hope it will solve yours too.
To get results with month-day = 11-09, use the following query.
The mapping for dobdata is:
"dobdata": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
Query is:
curl -XGET "http://localhost:9200/userindexv1/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"wildcard": {
"dobdata.keyword":"*-11-09*"
}
}
}'
Also, use a multi-field mapping instead of separate _raw fields.
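A rough way to see why the match query returned extra records while the wildcard query on dobdata.keyword does not: a match query is analyzed into tokens, and a document matches if it shares any token with the query, whereas a wildcard on a keyword field is tested against the whole stored string. The Python sketch below approximates this. The `rough_tokens` function is a crude stand-in for the analyzer (an assumption, not the real tokenizer), and `fnmatchcase` stands in for the wildcard matching.

```python
import re
from fnmatch import fnmatchcase
from typing import List


def rough_tokens(text: str) -> List[str]:
    """Crude analyzer stand-in: split on non-alphanumeric characters and lowercase."""
    return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def match_like(query: str, value: str) -> bool:
    """A match query (default OR operator) hits when any query token is in the value."""
    return bool(set(rough_tokens(query)) & set(rough_tokens(value)))


def wildcard_like(pattern: str, value: str) -> bool:
    """A wildcard on a keyword field is compared against the entire value."""
    return fnmatchcase(value, pattern)


for dob in ["2010-11-09", "2012-11-08", "2012-11-07"]:
    print(dob, match_like(".*-11-09.*", dob), wildcard_like("*-11-09*", dob))
```

Under this approximation the query text reduces to the tokens "11" and "09", so every November date shares the token "11" with it, while the wildcard pattern accepts only the value that actually contains "-11-09".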
@K Sathish, so you use Elasticsearch 2.x? A string type is not very convenient for dates. I suggest changing the datatype from string to date, so that you can also use range queries. Once it is a date type, you can write your query in Elasticsearch 2.x this way:
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc.dobdata.date.monthOfYear == 11 && doc.dobdata.date.dayOfMonth == 9"
}
}
}
}
}

Elasticsearch query scores all documents 1.0. Why?

I'm using ElasticSearch 2.4.1. When I execute the following query, all documents are scored 1.0. Why?
I get the same behavior if I remove the "bool" and just do a match on one field.
Query:
{
"query" :
{
"bool": {
"must" : [
{"match" : { "last" : { "query" : "SMITH" , "fuzziness": 2.0}} }
],
"should" : [
{"match" : {"first" :{ "query" : "JOE", "fuzziness": 1.0, "boost": 99.0}}}
]
}
}
}
Explain for one match gives me:
1.0 = sum of:
1.0 = ConstantScore(+(last:1mith^0.8 last:1smith^0.8 last:4mith^0.8 last:amith^0.8 last:asmith^0.8 last:bsmith^0.8 last:csmith^0.8 last:dsmith^0.8 last:emith^0.8 last:esmith^0.8 last:fsmith^0.8 last:hmith^0.8 last:hsmith^0.8 last:imith^0.8 last:ismith^0.8 last:jmith^0.8 last:jsmith^0.8 last:ksmith^0.8 last:lsmith^0.8 last:msith^0.8 last:msmith^0.8 last:nsmith^0.8 last:omith^0.8 last:osmith^0.8 last:psmith^0.8 last:qsmith^0.8 last:rsmith^0.8 last:saith^0.8 last:samith^0.8 last:scmith^0.8 last:seith^0.8 last:shith^0.8 last:simith^0.8 last:simth^0.8 last:skith^0.8 last:slith^0.8 last:smaith^0.8 last:smath^0.8 last:smdith^0.8 last:smeth^0.8 last:smfith^0.8 last:smich^0.8 last:smidh^0.8 last:smidth^0.8 last:smieth^0.8 last:smigh^0.8 last:smiht^0.8 last:smiih^0.8 last:smiith^0.8 last:smith) (first:aoe^0.6666666 first:bjoe^0.6666666 first:boe^0.6666666 first:coe^0.6666666 first:djoe^0.6666666 first:doe^0.6666666 first:eoe^0.6666666 first:foe^0.6666666 first:goe^0.6666666 first:hoe^0.6666666 first:ioe^0.6666666 first:j0e^0.6666666 first:jae^0.6666666 first:jbe^0.6666666 first:jce^0.6666666 first:jee^0.6666666 first:jeo^0.6666666 first:jge^0.6666666 first:jhe^0.6666666 first:jhoe^0.6666666 first:jie^0.6666666 first:jioe^0.6666666 first:jke^0.6666666 first:jle^0.6666666 first:jme^0.6666666 first:jne^0.6666666 first:jnoe^0.6666666 first:joa^0.6666666 first:joae^0.6666666 first:job^0.6666666 first:jobe^0.6666666 first:joc^0.6666666 first:joce^0.6666666 first:jod^0.6666666 first:jode^0.6666666 first:joe first:joea^0.6666666 first:joeb^0.6666666 first:joec^0.6666666 first:joed^0.6666666 first:joee^0.6666666 first:joef^0.6666666 first:joeg^0.6666666 first:joeh^0.6666666 first:joei^0.6666666 first:joej^0.6666666 first:joek^0.6666666 first:joel^0.6666666 first:joem^0.6666666 first:joen^0.6666666)^99.0), product of:
1.0 = boost
1.0 = queryNorm
0.0 = match on required clause, product of:
0.0 = # clause
0.0 = weight(_type:mytype in 327) [], result of:
0.0 = score(doc=327,freq=1.0), with freq of:
1.0 = termFreq=1.0
Type mapping:
{
"ourindex1": {
"mappings": {
"people": {
"properties": {
"city": {
"type": "string"
},
"first": {
"type": "string"
},
"last": {
"type": "string"
},
"middle": {
"type": "string"
},
"state": {
"type": "string"
},
"street": {
"type": "string"
},
"suffix": {
"type": "string"
},
"suite": {
"type": "string"
},
"territory": {
"type": "string"
},
"zip5": {
"type": "string"
}
}
}
}
}
}
Edit: Simplified reproduction:
Download a clean version of Elasticsearch 2.4.1 and start it up.
Create a new index with:
POST /newindex/people
{"first" : "JOE", "last": "SMITH", "street" : "1 FIRST STREET", "city" : "LOS ANGELES", "state" : "CA", "middle" : ""}
Issue the following query:
{ "query" : {"match" : { "last" : { "query" : "SMITHX", "fuzziness": 1.0} } }}
When I do this, the document returned is scored 1.0, and explain says something about ConstantScore.
Edit 2: It appears my reproduction steps included an unintentional lie.
The library my app uses to communicate with Elasticsearch (elastic4s) appears to mangle the query so that it becomes:
{"query" : { "query" : {"match" : { "last" : { "query" : "SMITHX", "fuzziness": 1.0} } }}}
(Note the extra "query". This mangled query returns the results I'd expect, but with score = 1.0.) I thought I had already tried executing the query directly with curl, but evidently not.
This is happening because of the doubled query keyword. Basically, it works like this: the inner query selects the hits and produces something like this:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.30685285,
"hits": [
{
"_index": "my_index",
"_type": "people",
"_id": "2",
"_score": 0.30685285,
"_source": {
"first": "JOHN",
"last": "SMITHS",
"street": "2 SECOND STREET",
"city": "LA",
"state": "CA",
"middle": ""
}
},
{
"_index": "my_index",
"_type": "people",
"_id": "1",
"_score": 0.30685282,
"_source": {
"first": "JOE",
"last": "SMITH",
"street": "1 FIRST STREET",
"city": "LOS ANGELES",
"state": "CA",
"middle": ""
}
}
]
}
}
which is a fully correct response with proper scores. But then the outer query is applied, which doesn't change the result set; it only "eats" the scores and replaces them with 1.0. So, you need to fix your usage of elastic4s.

Get specific fields from index in elasticsearch

I have an index in Elasticsearch.
Sample structure:
{
"Article": "Article7645674712",
"Genre": "Genre92231455",
"relationDesc": [
"Article",
"Genre"
],
"org": "user",
"dateCreated": {
"date": "08/05/2015",
"time": "16:22 IST"
},
"dateModified": "08/05/2015"
}
From this index I want to retrieve selected fields: org and dateModified.
I want a result like this:
{
"took": 265,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 28,
"max_score": 1,
"hits": [
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "3",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "08/05/2015"
}
}
},
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "4",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "10/05/2015"
}
}
}
]
}
}
How do I query Elasticsearch to get only specific fields?
You can retrieve only a specific set of fields in the result hits using the _source parameter like this:
curl -XGET 'localhost:9200/couchrecords/couchbaseDocument/_search?_source=org,dateModified'
Or in this format, adding the _source array to whatever query you already have (match_all here):
curl -XPOST localhost:9200/couchrecords/couchbaseDocument/_search -d '{
"_source": ["doc.org", "doc.dateModified"],
"query": {
"match_all": {}
}
}'
That's easy. Consider any query of this format:
{
"query": {
...
}
}
You just need to add the fields parameter to your query, which in your case results in the following:
{
"query": {
...
},
"fields" : ["org","dateModified"]
}
Alternatively, use source filtering with the _source parameter:
{
"_source" : ["org","dateModified"],
"query": {
...
}
}
Check ElasticSearch source filtering.
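If the search body is built in application code, the source filter can be attached to any existing query. The Python sketch below uses a made-up helper name, `with_source_filter`, and only constructs the body; the field paths `doc.org` and `doc.dateModified` follow the nested layout shown in the desired result above.

```python
from typing import Any, Dict, Iterable


def with_source_filter(body: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the search body that asks for only the given _source fields."""
    filtered = dict(body)  # shallow copy, so the caller's body is left unchanged
    filtered["_source"] = list(fields)
    return filtered


base = {"query": {"match_all": {}}}
search_body = with_source_filter(base, ["doc.org", "doc.dateModified"])
print(search_body)
```

The query part stays whatever it was; only the _source key is added.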
