Search the first element of a multivalue text field in Elasticsearch

I want to search on the first element of an array field in my Elasticsearch documents, but I can't figure out how to do it.
As a test I created a new index with fielddata=true, but I still didn't get the response I wanted.
Mapping
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
Values
name : ["John", "Doe"]
My request
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "doc['name'][0]=params.param1",
"params" : {
"param1" : "john"
}
}
}
}
}
}
}
The error response I get:
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."

You can use a script field in your search request to return the first element as a scripted value:
{
"script_fields": {
"firstElement": {
"script": {
"lang": "painless",
"inline": "params._source.name[0]"
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64391432",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"firstElement": [
"John" <-- note this
]
}
}
]
You can use a Painless script to create a script field that returns a customized value for each document in the query results.
Note that to compare two values in a script you need the equality operator '==' (it yields true if the two values are equal and false otherwise); '=' is assignment.
Below is a working example with index mapping, index data, search query, and search result.
Index Mapping:
{
"mappings":{
"properties":{
"name":{
"type":"text",
"fielddata":true
}
}
}
}
Index data:
{
"name": [
"John",
"Doe"
]
}
Search Query:
{
"script_fields": {
"my_field": {
"script": {
"lang": "painless",
"source": "params['_source']['name'][0] == params.params1",
"params": {
"params1": "John"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"my_field": [
true <-- note this
]
}
}
]
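If you want the comparison to act as a filter (returning only documents whose first element matches) rather than as a returned field, be aware of a caveat: a script query can only read doc values (doc['...']), not _source, and doc values of a multi-valued field come back sorted, so doc['name.keyword'][0] is the lexicographically smallest value rather than the first array element. A minimal sketch using the keyword sub-field from the original mapping, for illustration only:
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc['name.keyword'].size() > 0 && doc['name.keyword'][0] == params.param1",
            "params": {
              "param1": "Doe"
            }
          }
        }
      }
    }
  }
}
For ["John", "Doe"] this matches on "Doe", not "John", which is why converting the data to nested (see the next answer) is the reliable way to target the first element.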

Arrays of objects do not work as you would expect: you cannot query
each object independently of the other objects in the array. If you
need to be able to do this then you should use the nested data type
instead of the object data type.
You can use the script shown in my other answer if you just want to compare the value of the first element of the array to some other value. But based on your comments, it looks like your use case is quite different.
If you want to search on the first element of the array, you need to convert your data into nested form. With plain arrays of objects you can't refer to "the first element" or "the last element" at search time.
Below is a working example with index mapping, index data, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"name": {
"type": "nested"
}
}
}
}
Index Data:
{
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
{
"booking_id": 1,
"name": [
{
"first": "Adam Simith",
"second": "John Doe"
}
]
}
{
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
Search Query:
{
"query": {
"nested": {
"path": "name",
"query": {
"bool": {
"must": [
{
"match_phrase": {
"name.first": "John Doe"
}
}
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9400072,
"_source": {
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0.9400072,
"_source": {
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
}
]
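If your existing documents still hold the plain ["John", "Doe"] array, one way to move them into this nested shape is to reindex through an ingest pipeline that splits the array. A rough sketch with hypothetical names (old_index holds the original data, new_index has the nested mapping above, split_name is the pipeline id):
PUT _ingest/pipeline/split_name
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "def n = ctx.name; ctx.name = ['first': n.size() > 0 ? n[0] : null, 'second': n.size() > 1 ? n[1] : null]"
      }
    }
  ]
}
POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index", "pipeline": "split_name" }
}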

Related

How to search over all fields and return every document containing that search in elasticsearch?

I have a problem with searching in Elasticsearch.
I have an index with multiple documents, each with several fields. I want to be able to search over all the fields with a single query and have it return every document that contains the queried value. I found that simple_query_string worked well for this, but it does not return consistent results. In my index I have documents with several fields that contain dates, for example:
"revisionDate" : "2008-01-01T00:00:00",
"projectSmirCreationDate" : "2008-07-01T00:00:00",
"changedDate" : "1971-01-01T00:00:00",
"dueDate" : "0001-01-01T00:00:00",
Those are just a few examples. However, when I search, for example:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "2008"
}
}
}
It only returns two documents. This is a problem because I have many more documents than just two that contain the value "2008" in their fields.
I also have problems searching file names.
In my index there are fields that contain file names like this:
"fileName" : "testPDF.pdf",
"fileName" : "demo.pdf",
"fileName" : "demo.txt",
When I query:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "demo"
}
}
}
I get no results.
But if I query:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "demo.txt"
}
}
}
I get the proper result.
Is there a better way to search across all documents and fields than what I did? I want it to return all the documents matching the query, not just two or zero.
Any help would be greatly appreciated.
Elasticsearch uses the standard analyzer if no analyzer is specified. Since no analyzer is specified on "fileName", demo.txt gets tokenized as:
{
"tokens": [
{
"token": "demo.txt",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
}
]
}
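You can verify this tokenization yourself with the _analyze API; the request below reproduces the output above:
POST _analyze
{
  "analyzer": "standard",
  "text": "demo.txt"
}
Because the dot sits between word characters, the standard tokenizer keeps demo.txt as a single token, so only the full term can match.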
So searching for demo will not return any result, but searching for demo.txt will.
You can instead use a wildcard query to search for documents having demo in fileName:
{
"query": {
"wildcard": {
"fileName": {
"value": "demo*"
}
}
}
}
Search Result:
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"fileName": "demo.pdf"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"fileName": "demo.txt"
}
}
]
Since revisionDate, projectSmirCreationDate, changedDate, and dueDate are all of type date, you cannot do a partial search on them.
You can use multi-fields to add an additional field (of type text) to each of them. Modify your index mapping as shown below:
{
"mappings": {
"properties": {
"changedDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"projectSmirCreationDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"dueDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"revisionDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
}
}
}
}
Index Data:
{
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
{
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
Search Query:
{
"query": {
"multi_match": {
"query": "2008"
}
}
}
Search Result:
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
}
]
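With those .raw text sub-fields in place, the original simple_query_string approach can also be pointed at them. A sketch (the field pattern is an assumption, adjust it to your mapping):
GET new_document-20_v2/_search
{
  "size": 1000,
  "query": {
    "simple_query_string": {
      "query": "2008",
      "fields": ["*.raw"]
    }
  }
}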

Matching the stored values and queries in Elastic Search

I have a keyword field inside a nested field "name" in Elasticsearch.
The name field contains two values:
Jagannathan Rajagopalan
Rajagopalan
If I query "Rajagopalan", I should get only item #2.
If I query the complete "Jagannathan Rajagopalan", I should get #1.
How do I achieve this?
You need an exact match on the keyword field; keyword fields are not analyzed, so a match or term query against them performs an exact search. Below is a working example for your use case.
Index mapping
{
"mappings": {
"properties": {
"name": {
"type": "nested",
"properties": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Index sample docs
{
"name" : {
"keyword" : "Jagannathan Rajagopalan"
}
}
And another doc
{
"name" : {
"keyword" : "Jagannathan"
}
}
And search query
{
"query": {
"nested": {
"path": "name",
"query": {
"bool": {
"must": [
{
"match": {
"name.keyword": "Jagannathan Rajagopalan"
}
}
]
}
}
}
}
}
Search result
"hits": [
{
"_index": "key",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"name": {
"keyword": "Jagannathan Rajagopalan"
}
}
}
]
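Because name.keyword is a keyword field (not analyzed), the same exact match can also be expressed with a term query, which the answer above refers to; an equivalent sketch:
{
  "query": {
    "nested": {
      "path": "name",
      "query": {
        "term": {
          "name.keyword": "Jagannathan Rajagopalan"
        }
      }
    }
  }
}
Querying "Rajagopalan" the same way would then only return a document whose stored value is exactly "Rajagopalan", which is the behaviour asked for.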

Is it possible to boost suggestions based on the Elasticsearch index

I have multiple indices that I want suggestions from, but I want to score/order the suggestions based on the index they're from. I've successfully boosted searches based on indices (using indices_boost), but this doesn't seem to work for suggestions. I tried something like:
GET index1,index2/_search
{
"indices_boost" : [
{ "index1" : 9 },
{ "index2" : 1 }
],
"suggest": {
"mySuggest":{
"text":"someText",
"completion": {
"field":"suggestField",
"size":6
}
}
}
}
Is this doable?
At the moment I've resorted to sorting the suggestions in code.
I believe you can use a category boost in a context suggester to achieve the desired behavior. You need to attach a special category context to each suggestion document, which can be exactly the same as the index name.
How to use category context to boost suggestions
The mapping may look like this:
PUT food
{
"mappings": {
"properties" : {
"suggestField" : {
"type" : "completion",
"contexts": [
{
"name": "index_name",
"type": "category"
}
]
}
}
}
}
For demonstration purposes I will create another index, exactly like the one above but with name movie. (Index names can be arbitrary.)
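For reference, a sketch of that second mapping, mirroring the food index above:
PUT movie
{
  "mappings": {
    "properties" : {
      "suggestField" : {
        "type" : "completion",
        "contexts": [
          {
            "name": "index_name",
            "type": "category"
          }
        ]
      }
    }
  }
}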
Let's add the suggest documents:
PUT food/_doc/1
{
"suggestField": {
"input": ["timmy's", "starbucks", "dunkin donuts"],
"contexts": {
"index_name": ["food"]
}
}
}
PUT movie/_doc/2
{
"suggestField": {
"input": ["star wars"],
"contexts": {
"index_name": ["movie"]
}
}
}
Now we can run a suggest query with our boosts set:
POST food,movie/_search
{
"suggest": {
"my_suggestion": {
"prefix": "star",
"completion": {
"field": "suggestField",
"size": 10,
"contexts": {
"index_name": [
{
"context": "movie",
"boost": 9
},
{
"context": "food",
"boost": 1
}
]
}
}
}
}
}
Which will return something like this:
{
"suggest": {
"my_suggestion": [
{
"text": "star",
"offset": 0,
"length": 4,
"options": [
{
"text": "star wars",
"_index": "movie",
"_type": "_doc",
"_id": "2",
"_score": 9.0,
"_source": ...,
"contexts": {
"index_name": [
"movie"
]
}
},
{
"text": "starbucks",
"_index": "food",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": ...,
"contexts": {
"index_name": [
"food"
]
...
Why didn't indices_boost work?
It seems the indices_boost parameter only affects search hits, not suggestions. _suggest used to be a standalone endpoint but was deprecated, and that is probably the source of the confusion.
Hope that helps!

How to turn an array of object to array of string while reindexing in elasticsearch?

Let's say the source index has a document like this:
{
"name":"John Doe",
"sport":[
{
"name":"surf",
"since":"2 years"
},
{
"name":"mountainbike",
"since":"4 years"
}
]
}
How can I discard the "since" information so that, once reindexed, the field contains only the sport names? Like this:
{
"name":"John Doe",
"sport":["surf","mountainbike"]
}
Note that it would be fine if the resulting field kept the same name, but it's not mandatory.
I don't know which version of Elasticsearch you're using, but here is a solution based on pipelines, introduced with ingest nodes in ES 5.0:
1) A script processor extracts the name value from each subobject and collects them into another field (here, sports)
2) The previous sport field is then removed with a remove processor
You can use the simulate pipeline API to test it:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "random description",
"processors": [
{
"script": {
"lang": "painless",
"source": "ctx.sports =[]; for (def item : ctx.sport) { ctx.sports.add(item.name) }"
}
},
{
"remove": {
"field": "sport"
}
}
]
},
"docs": [
{
"_index": "index",
"_type": "doc",
"_id": "id",
"_source": {
"name": "John Doe",
"sport": [
{
"name": "surf",
"since": "2 years"
},
{
"name": "mountainbike",
"since": "4 years"
}
]
}
}
]
}
which outputs the following result:
{
"docs": [
{
"doc": {
"_index": "index",
"_type": "doc",
"_id": "id",
"_source": {
"name": "John Doe",
"sports": [
"surf",
"mountainbike"
]
},
"_ingest": {
"timestamp": "2018-07-12T14:07:25.495Z"
}
}
}
]
}
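Once the simulated output looks right, the same processors can be stored as a named pipeline and applied during the actual reindex. A sketch, assuming hypothetical index names source_index and dest_index:
PUT _ingest/pipeline/sport_names
{
  "description": "keep only sport names",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.sports =[]; for (def item : ctx.sport) { ctx.sports.add(item.name) }"
      }
    },
    {
      "remove": {
        "field": "sport"
      }
    }
  ]
}
POST _reindex
{
  "source": { "index": "source_index" },
  "dest": { "index": "dest_index", "pipeline": "sport_names" }
}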
There may be a better solution, as I haven't used pipelines much, or you could do this with Logstash filters before submitting the documents to your Elasticsearch cluster.
For more information about pipelines, take a look at the reference documentation for ingest nodes.

Name searching in ElasticSearch

I have an index in Elasticsearch with a field name where I store the whole name of a person: first name and surname. I want to perform full-text search over that field, so I have indexed it with an analyzer.
My issue now is that if I search:
"John Rham Rham"
and the index contains "John Rham Rham Luck", that value gets a higher score than "John Rham Rham".
Is there any possibility to score the exact value higher than the value with more words in the string?
Thanks in advance!
I worked out a small example (assuming you're running ES 5.x, because of the difference in scoring):
DELETE test
PUT test
{
"settings": {
"similarity": {
"my_bm25": {
"type": "BM25",
"b": 0
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "text",
"similarity": "my_bm25",
"fields": {
"length": {
"type": "token_count",
"analyzer": "standard"
}
}
}
}
}
}
}
POST test/test/1
{
"name": "John Rham Rham"
}
POST test/test/2
{
"name": "John Rham Rham Luck"
}
GET test/_search
{
"query": {
"function_score": {
"query": {
"match": {
"name": {
"query": "John Rham Rham",
"operator": "and"
}
}
},
"functions": [
{
"script_score": {
"script": "_score / doc['name.length'].getValue()"
}
}
]
}
}
}
This code does the following:
Replaces the default BM25 similarity with a custom one, tweaking the b parameter (field-length normalisation)
-- You could also change the similarity to 'classic' to go back to TF/IDF, which doesn't have this normalisation (see the sketch at the end of this answer)
Creates an inner field for your name field, which counts the number of tokens in it
Divides the score by that token count
This will result in:
"hits": {
"total": 2,
"max_score": 0.3596026,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.3596026,
"_source": {
"name": "John Rham Rham"
}
},
{
"_index": "test",
"_type": "test",
"_id": "2",
"_score": 0.26970196,
"_source": {
"name": "John Rham Rham Luck"
}
}
]
}
}
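For completeness, a sketch of the 'classic' similarity variant mentioned in the first bullet above (ES 5.x mapping syntax with a test type); whether plain TF/IDF scoring then ranks the exact name first is worth verifying against your own data:
PUT test
{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "text",
          "similarity": "classic"
        }
      }
    }
  }
}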
Not sure if this is the best way of doing it, but it may point you in the right direction :)
