Manipulate returned fields in Elasticsearch

Is there a way to manipulate (for example concatenate) returned fields from a query?
This is how I created my index:
PUT /megacorp/employee/1
{
  "first_name": "John",
  "last_name": "Smith",
  "age": 25,
  "about": "I love to go rock climbing",
  "interests": [ "sports", "music" ]
}
And this is how I query it:
GET /megacorp/employee/_search
{
  "query": { "match_all": {} }
}
The response is this:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}
That's all working fine.
What I want is to concatenate two fields from the _source and display them in the output as a new field.
first_name and last_name should be combined into a new field, "full_name". I can't figure out how to do that without creating a new field in my index. I have looked at "copy_to", but it requires you to explicitly set the store property in the mapping, and you have to explicitly ask for the stored field in the query. The main downside, though, is that when you do both of those things, first_name and last_name are returned comma separated. I would like a nice single string: "John Smith".
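For reference, a copy_to setup of the kind described above would look roughly like this (a sketch, assuming a fresh 1.x-style index with the employee type and the field names used above):

PUT /megacorp
{
  "mappings": {
    "employee": {
      "properties": {
        "first_name": { "type": "string", "copy_to": "full_name" },
        "last_name":  { "type": "string", "copy_to": "full_name" },
        "full_name":  { "type": "string", "store": true }
      }
    }
  }
}

GET /megacorp/employee/_search
{
  "query": { "match_all": {} },
  "fields": [ "full_name" ]
}

With that mapping, the stored full_name comes back as [ "John", "Smith" ] rather than as a single joined string, which is exactly the drawback described above.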

One option is a script field that reads both values from _source:

GET /megacorp/employee/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "combined": {
      "script": "_source['first_name'] + ' ' + _source['last_name']"
    }
  }
}

And you need to enable dynamic scripting.

You can use script_fields to achieve that:

GET /megacorp/employee/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "full_name": {
      "script": "[doc['first_name'].value, doc['last_name'].value].join(' ')"
    }
  }
}

You need to make sure dynamic scripting is enabled in order for this to work.
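Note that both scripts above use the legacy inline scripting syntax from older Elasticsearch releases, which is why dynamic scripting has to be enabled. On recent versions (5.x and later) the default scripting language is Painless and no special settings are needed; a rough equivalent, sketched for a typeless 7.x+ index, would be:

GET /megacorp/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "full_name": {
      "script": {
        "lang": "painless",
        "source": "params['_source']['first_name'] + ' ' + params['_source']['last_name']"
      }
    }
  }
}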

Related

Elasticsearch sorting by multiple conditions

I have an index with simple data that I need to filter and sort as described below.
Records look like this:
{
  "name": "Product ABC variant XYZ subvariant JKL",
  "date": "2023-01-03T10:34:39+01:00"
}
I'm searching the name field for "Product FGH", and I want to:
1. Get records with an exact match (field name) and sort them by date (field date) DESC.
2. If nothing is found in 1), or if there is no exact match but there are similar records, sort the remaining records by the default score.
Is it possible to do this in one Elasticsearch request, and what would the whole query look like?
Thanks
What you are looking for is running Elasticsearch queries conditionally, which is not possible in a single query; you would need to fire the first query and, if it doesn't return any hits, fire the second one.
However, using script_score you can get the ordering you want: convert the date to milliseconds and assign it to the _score field for exact matches; for non-exact matches, simply return the existing _score.
Exact matches will then be sorted by the date field descending.
Non-exact matches will be sorted by the _score field.
For example:
Mapping:
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}
Insert:
PUT func/_doc/1
{
  "name": "Product ABC variant XYZ subvariant JKL",
  "date": "2023-01-03 10:34:39"
}

PUT func/_doc/2
{
  "name": "Product ABC variant XYZ subvariant JKL",
  "date": "2022-12-03 10:33:39"
}

PUT func/_doc/3
{
  "name": "Product ABC",
  "date": "2022-11-03 10:33:39"
}

PUT func/_doc/4
{
  "name": "Product ABC",
  "date": "2023-01-03 10:33:39"
}
Query:
GET /func/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "if (doc['name'].value == params.search_term) { return doc['date'].value.toInstant().toEpochMilli(); } else return _score",
        "params": {
          "search_term": "Product ABC"
        }
      }
    }
  }
}
Output:
{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 1672742040000,
    "hits": [
      {
        "_index": "func",
        "_id": "4",
        "_score": 1672742040000,
        "_source": {
          "name": "Product ABC",
          "date": "2023-01-03 10:33:39"
        }
      },
      {
        "_index": "func",
        "_id": "3",
        "_score": 1667471640000,
        "_source": {
          "name": "Product ABC",
          "date": "2022-11-03 10:33:39"
        }
      },
      {
        "_index": "func",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "Product ABC variant XYZ subvariant JKL",
          "date": "2023-01-03 10:34:39"
        }
      },
      {
        "_index": "func",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "Product ABC variant XYZ subvariant JKL",
          "date": "2022-12-03 10:33:39"
        }
      }
    ]
  }
}

Querying Elasticsearch on multiple criteria

I have this document in Elasticsearch:
{
  "_index": "master",
  "_type": "_doc",
  "_id": "q9IGdXABeXa7ITflapkV",
  "_score": 0.0,
  "_source": {
    "customer_acct": "64876457056",
    "ssn_number": "123456789",
    "name": "Julie",
    "city": "NY"
  }
}
I want to query the master index by customer_acct and ssn_number to retrieve the entire document, and I want to disable scoring and relevance. I have used the query below:
curl -X GET "localhost/master/_search/?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "customer_acct": {
        "value": "64876457056"
      }
    }
  }
}'
How would I include the second criterion, ssn_number, in the term query as well? And is it possible to turn off scoring and relevance? I am new to Elasticsearch.
First, you need to define a proper mapping for your index. Your customer_acct and ssn_number are numeric, but you are storing them as strings; looking at your sample values, you should use long for both. Then you can just use a filter context in your query, since you don't need score and relevance in your result. Read more about filter context in the official ES docs; as the snippet below from that link puts it:

In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data,

which is exactly your use case.
1. Index Mapping
{
  "mappings": {
    "properties": {
      "customer_acct": {
        "type": "long"
      },
      "ssn_number": {
        "type": "long"
      },
      "name": {
        "type": "text"
      },
      "city": {
        "type": "text"
      }
    }
  }
}
2. Index sample docs
{
  "name": "Smithe John",
  "city": "SF",
  "customer_acct": 64876457065,
  "ssn_number": 123456790
}

{
  "name": "Julie",
  "city": "NY",
  "customer_acct": 64876457056,
  "ssn_number": 123456789
}
3. Main search query to filter without the score
{
  "query": {
    "bool": {
      "filter": [                  --> only filter clause
        {
          "term": {
            "customer_acct": 64876457056
          }
        },
        {
          "term": {
            "ssn_number": 123456789
          }
        }
      ]
    }
  }
}
The above search query gives the result below:
{
  "took": 186,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "so-master",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.0,             --> notice the score is 0
        "_source": {
          "name": "Smithe John",
          "city": "SF",
          "customer_acct": 64876457056,
          "ssn_number": 123456789
        }
      }
    ]
  }
}
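If you prefer to run it with curl, as in your original attempt, the same filtered search looks roughly like this (a sketch assuming the index is called master and Elasticsearch listens on the default port 9200):

curl -X GET "localhost:9200/master/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "customer_acct": 64876457056 } },
        { "term": { "ssn_number": 123456789 } }
      ]
    }
  }
}'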

Get specific fields from an index in Elasticsearch

I have an index in Elasticsearch.
Sample structure:
{
  "Article": "Article7645674712",
  "Genre": "Genre92231455",
  "relationDesc": [
    "Article",
    "Genre"
  ],
  "org": "user",
  "dateCreated": {
    "date": "08/05/2015",
    "time": "16:22 IST"
  },
  "dateModified": "08/05/2015"
}
From this index I want to retrieve only selected fields: org and dateModified.
I want the result to look like this:
{
  "took": 265,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 28,
    "max_score": 1,
    "hits": [
      {
        "_index": "couchrecords",
        "_type": "couchbaseDocument",
        "_id": "3",
        "_score": 1,
        "_source": {
          "doc": {
            "org": "user",
            "dateModified": "08/05/2015"
          }
        }
      },
      {
        "_index": "couchrecords",
        "_type": "couchbaseDocument",
        "_id": "4",
        "_score": 1,
        "_source": {
          "doc": {
            "org": "user",
            "dateModified": "10/05/2015"
          }
        }
      }
    ]
  }
}
How can I query Elasticsearch to get only these specific fields?
You can retrieve only a specific set of fields in the result hits using the _source parameter like this:
curl -XGET localhost:9200/couchrecords/couchbaseDocument/_search?_source=org,dateModified
Or in this format:
curl -XPOST localhost:9200/couchrecords/couchbaseDocument/_search -d '{
  "_source": ["doc.org", "doc.dateModified"],   <---- you just need to add this
  "query": {
    "match_all": {}                             <----- or whatever query you have
  }
}'
That's easy. Consider any query of this format:
{
  "query": {
    ...
  },
}
You just need to add the fields parameter to your query, which in your case results in the following:
{
  "query": {
    ...
  },
  "fields": ["org", "dateModified"]
}
Alternatively, use source filtering:

{
  "_source": ["org", "dateModified"],
  "query": {
    ...
  }
}

See the Elasticsearch documentation on source filtering.
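Note that the top-level fields parameter only works on older Elasticsearch versions (it was renamed stored_fields in 5.0 and only returns fields explicitly marked as stored), so on current clusters source filtering is the safer choice. It also accepts an object with include patterns; a sketch against the couchrecords index from the question:

GET /couchrecords/_search
{
  "_source": {
    "includes": [ "doc.org", "doc.dateModified" ]
  },
  "query": {
    "match_all": {}
  }
}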

Fuzzy not functioning as expected (one term search, see example)

Consider the following results from:
curl -XGET 'http://localhost:9200/megacorp/employee/_search' -d '{
  "query": {
    "match": {
      "last_name": "Smith"
    }
  }
}'
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing on the weekends.",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.30685282,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}
Now when I execute the following query:
curl -XGET 'http://localhost:9200/megacorp/employee/_search' -d '{
  "query": {
    "fuzzy": {
      "last_name": {
        "value": "Smitt",
        "fuzziness": 1
      }
    }
  }
}'
This returns NO results, despite the Levenshtein distance between "Smith" and "Smitt" being 1. The same thing happens with a value of "Smit". If I use a fuzziness value of 2, I get results. What am I missing here?
I assume that the last_name field you are querying is an analyzed string. The indexed term will therefore be smith and not Smith.

Returns NO results despite the Levenshtein distance of "Smith" and "Smitt" being 1.

The fuzzy query doesn't analyze the search term, so your Levenshtein distance is actually not 1 but 2:
Smitt -> Smith
Smith -> smith
Try using this mapping, and your query with fuzziness = 1 will work:
PUT /megacorp/employee/_mapping
{
  "employee": {
    "properties": {
      "last_name": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}
Hope this helps
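On Elasticsearch 5.x and later, string with not_analyzed has been replaced by the keyword type, so the equivalent mapping would be roughly the following (a sketch assuming a typeless 7.x+ index; as with the mapping above, the type of an existing field cannot be changed in place, so you would normally set this up before indexing or reindex afterwards):

PUT /megacorp/_mapping
{
  "properties": {
    "last_name": { "type": "keyword" }
  }
}

With a keyword field the term is indexed verbatim as Smith, so a fuzzy value of Smitt matches with fuzziness 1.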

Which field matched query in multi_match search in Elasticsearch?

I have a query with multi_match in Elasticsearch:
{
  "query": {
    "multi_match": {
      "query": "luk",
      "fields": [
        "xml_string.autocomplete",
        "state"
      ]
    }
  },
  "size": 10,
  "fields": [
    "xml_string",
    "state"
  ]
}
It works great; the result returns the expected value:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.41179964,
    "hits": [
      {
        "_index": "documents",
        "_type": "document",
        "_id": "11",
        "_score": 0.41179964,
        "fields": {
          "xml_string": "Lukas bla bla bla",
          "state": "new"
        }
      }
    ]
  }
}
I've searched a lot, but I am not able to find out which field matched the query (whether it was xml_string or state).
I have found a solution: I used the highlight feature and it works great.
This is what my curl looks like:
curl -X GET 'http://xxxxx.com:9200/documents/document/_search?load=false&size=10&pretty' -d '{
  "query": {
    "multi_match": {
      "query": "123",
      "fields": ["some_field", "another_field"]
    }
  },
  "highlight": {
    "fields": {
      "some_field": {},
      "another_field": {}
    }
  },
  "size": 10,
  "fields": ["field", "another_field"]
}'
As far as I know there is no feature that tells you which field matched the query.
But you can use the explain feature for debugging your query. You only have to add the parameter &explain=true to your query. With it you will see, for each hit, an explanation of why it is in the result set, from which you can work out which field matched the query.
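For example, explain can be set either as a URL parameter or in the request body; a sketch using the index and query from the question:

GET /documents/document/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "luk",
      "fields": [ "xml_string.autocomplete", "state" ]
    }
  }
}

Each hit then carries an _explanation section showing which fields contributed to its score.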
