How to search over all fields and return every document containing the search term in Elasticsearch?

I have a problem with searching in Elasticsearch.
I have an index with multiple documents, each with several fields. I want to run a single query over all the fields and get back every document that contains the value specified in the query. I found that simple_query_string works well for this, but it does not return consistent results. In my index I have documents with several fields that contain dates. For example:
"revisionDate" : "2008-01-01T00:00:00",
"projectSmirCreationDate" : "2008-07-01T00:00:00",
"changedDate" : "1971-01-01T00:00:00",
"dueDate" : "0001-01-01T00:00:00",
Those are just a few examples. However, when I query, for example:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "2008"
}
}
}
It returns only two documents. This is a problem because many more than two of my documents contain the value "2008" in their fields.
I also have a problem searching file names.
In my index there are fields that contain file names like this:
"fileName" : "testPDF.pdf",
"fileName" : "demo.pdf",
"fileName" : "demo.txt",
When I query:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "demo"
}
}
}
I get no results.
But if I query:
GET new_document-20_v2/_search
{
"size": 1000,
"query": {
"simple_query_string" : {
"query": "demo.txt"
}
}
}
I get the proper result.
Is there a better way to search across all documents and fields? I want the query to return every matching document, not just two or zero.
Any help would be greatly appreciated.

Elasticsearch uses the standard analyzer if no analyzer is specified. Since no analyzer is specified on "fileName", demo.txt is tokenized to a single token:
{
"tokens": [
{
"token": "demo.txt",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
}
]
}
Because the index contains only the token demo.txt, searching for demo returns nothing, while searching for demo.txt returns the document.
You can instead use a wildcard query to find documents whose fileName starts with demo:
{
"query": {
"wildcard": {
"fileName": {
"value": "demo*"
}
}
}
}
The search result will be:
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"fileName": "demo.pdf"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"fileName": "demo.txt"
}
}
]
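To make the difference between the two lookups concrete, here is a toy Python model of an inverted index. It is only a sketch: it assumes each file name ends up as one lowercased token (as the analyze output above shows for demo.txt), and the helper names term_lookup and wildcard_lookup are made up for illustration, not Elasticsearch APIs.

```python
from fnmatch import fnmatchcase

# Sample file names from the question, keyed by a hypothetical document id.
file_names = {1: "testPDF.pdf", 2: "demo.pdf", 3: "demo.txt"}

# Assumption: each file name is indexed as a single lowercased token.
inverted_index = {}
for doc_id, name in file_names.items():
    inverted_index.setdefault(name.lower(), []).append(doc_id)


def term_lookup(term):
    """Exact token lookup: matches only when the whole token equals the term."""
    return sorted(inverted_index.get(term, []))


def wildcard_lookup(pattern):
    """Pattern lookup: every token in the index is tested against the pattern."""
    matches = []
    for token, doc_ids in inverted_index.items():
        if fnmatchcase(token, pattern):
            matches.extend(doc_ids)
    return sorted(matches)


print(term_lookup("demo"))       # []
print(term_lookup("demo.txt"))   # [3]
print(wildcard_lookup("demo*"))  # [2, 3]
```

The wildcard lookup has to scan the tokens rather than jump straight to one, which is roughly why wildcard queries tend to cost more than exact term matches.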
Since revisionDate, projectSmirCreationDate, changedDate, and dueDate are all of type date, you cannot do a partial search on them.
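The reason is that Elasticsearch stores a date internally as a number (milliseconds since the epoch, in UTC), not as the original string. A small Python sketch of that conversion shows why the text "2008" has nothing to match against; the helper to_epoch_millis is only an illustration and assumes the value carries no timezone offset and is treated as UTC.

```python
from datetime import datetime, timezone


def to_epoch_millis(iso_value: str) -> int:
    """Convert an offset-less ISO-8601 timestamp to UTC epoch milliseconds."""
    parsed = datetime.strptime(iso_value, "%Y-%m-%dT%H:%M:%S")
    # Assumption: a value without an offset is interpreted as UTC.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)


revision_date = to_epoch_millis("2008-01-01T00:00:00")
print(revision_date)                 # 1199145600000
print("2008" in str(revision_date))  # False
```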
You can use multi-fields to add a sub-field of type text to each of these fields. Modify your index mapping as shown below:
{
"mappings": {
"properties": {
"changedDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"projectSmirCreationDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"dueDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
},
"revisionDate": {
"type": "date",
"fields": {
"raw": {
"type": "text"
}
}
}
}
}
}
Index Data:
{
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
{
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
Search Query:
{
"query": {
"multi_match": {
"query": "2008"
}
}
}
Search Result:
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
}
]

Related

conditionally query for fields in elasticsearch

I'm new to Elasticsearch. Before posting this question I searched for help, but I still don't understand how to write the query I need.
I have a set of documents to query. Some of them have the field "dueDate" and others have "plannedCompleteDate", but no single document has both. I want to write a query that conditionally queries whichever field a document has and returns all matching documents.
Below are sample documents of each type. My query should return results from both kinds of document, so it needs to check for field existence and return the document:
"_source": {
...
"plannedCompleteDate": "2019-06-30T00:00:00.000Z",
...
}
"_source": {
...
"dueDate": "2019-07-26T07:00:00.000Z",
...
}
You can combine range queries inside a bool query to achieve your use case.
Here is a working example with index mapping, data, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"plannedCompleteDate": {
"type": "date",
"format": "yyyy-MM-dd"
},
"dueDate": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
Index Data:
{
"plannedCompleteDate": "2019-05-30"
}
{
"plannedCompleteDate": "2020-06-30"
}
{
"dueDate": "2020-05-30"
}
Search Query:
{
"query": {
"bool": {
"should": [
{
"range": {
"plannedCompleteDate": {
"gte": "2020-01-01",
"lte": "2020-12-31"
}
}
},
{
"range": {
"dueDate": {
"gte": "2020-01-01",
"lte": "2020-12-31"
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "65808850",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"plannedCompleteDate": "2020-06-30"
}
},
{
"_index": "65808850",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"dueDate": "2020-05-30"
}
}
]
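The logic of that query can be sketched in plain Python: a bool query that has only should clauses matches a document when at least one clause matches, and a range clause on a field the document lacks simply does not match. The helper names in_range and should_match are invented for this sketch; the data is the sample data above.

```python
docs = [
    {"plannedCompleteDate": "2019-05-30"},
    {"plannedCompleteDate": "2020-06-30"},
    {"dueDate": "2020-05-30"},
]

# (field, gte, lte) triples mirroring the two range clauses in the query.
clauses = [
    ("plannedCompleteDate", "2020-01-01", "2020-12-31"),
    ("dueDate", "2020-01-01", "2020-12-31"),
]


def in_range(doc, field, gte, lte):
    """A range clause never matches a document that lacks the field."""
    value = doc.get(field)
    # yyyy-MM-dd strings compare correctly in lexicographic order.
    return value is not None and gte <= value <= lte


def should_match(doc, range_clauses):
    """With only should clauses, at least one of them has to match."""
    return any(in_range(doc, *clause) for clause in range_clauses)


hits = [doc for doc in docs if should_match(doc, clauses)]
print(hits)  # [{'plannedCompleteDate': '2020-06-30'}, {'dueDate': '2020-05-30'}]
```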

Elasticsearch: index boost with completion suggester

Is it possible to use index boost with the completion suggester in Elasticsearch? I have tried many different ways, but none seems to work. I haven't found anything in the documentation saying that it does not work for the completion suggester. Example:
POST index1,index2/_search
{
"suggest" : {
"name_suggest" : {
"text" : "my_query",
"completion" : {
"field" : "name_suggest",
"size" : 7,
"fuzzy" :{}
}
}
},
"indices_boost" : [
{ "index1" : 2 },
{ "index2" : 1.5 }
]
}
The above does not return boosted scores. The scores are the same as when running it without the indices_boost parameter.
I tried a few options, but none of them worked directly. As a workaround, you can define the weight of a document at index time and use it to rank the boosted document higher. A complete example follows.
Index mapping (the same for index1 and index2):
{
"mappings": {
"properties": {
"suggest": {
"type": "completion"
},
"title": {
"type": "keyword"
}
}
}
}
Index doc 1 with a weight in index-1:
{
"suggest": {
"input": [
"Nevermind",
"Nirvana"
],
"weight": 30
}
}
A similar doc is inserted in index-2 with a lower weight:
{
"suggest": {
"input": [
"Nevermind",
"Nirvana"
],
"weight": 10
}
}
A simple search will now order the results by weight:
{
"suggest": {
"song-suggest": {
"prefix": "nir",
"completion": {
"field": "suggest"
}
}
}
}
And the search result:
{
"text": "Nirvana",
"_index": "index-1",
"_type": "_doc",
"_id": "1",
"_score": 34.0,
"_source": {
"suggest": {
"input": [
"Nevermind",
"Nirvana"
],
"weight": 30
}
}
},
{
"text": "Nirvana",
"_index": "index-2",
"_type": "_doc",
"_id": "1",
"_score": 30.0,
"_source": {
"suggest": {
"input": [
"Nevermind",
"Nirvana"
],
"weight": 10
}
}
}
]
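The idea behind the workaround can be sketched in a few lines of Python: give the same suggestion a different index-time weight per index, then rank matching suggestions by that weight. This toy model only reproduces the ordering, not the actual _score values Elasticsearch reports, and the suggest function is a made-up stand-in for the completion suggester.

```python
# One entry per index, mirroring the two documents indexed above.
entries = [
    {"index": "index-2", "input": ["Nevermind", "Nirvana"], "weight": 10},
    {"index": "index-1", "input": ["Nevermind", "Nirvana"], "weight": 30},
]


def suggest(prefix, suggestion_entries):
    """Return (weight, index, text) for each matching entry, heaviest first."""
    prefix = prefix.lower()
    matches = []
    for entry in suggestion_entries:
        for text in entry["input"]:
            if text.lower().startswith(prefix):
                matches.append((entry["weight"], entry["index"], text))
                break  # one option per document is enough for this sketch
    return sorted(matches, key=lambda match: match[0], reverse=True)


print(suggest("nir", entries))
# [(30, 'index-1', 'Nirvana'), (10, 'index-2', 'Nirvana')]
```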

search first element of a multivalue text field in elasticsearch

I want to search on the first element of an array in Elasticsearch documents, but I can't work out how.
For testing, I created a new index with fielddata=true, but I still didn't get the response I wanted.
Document
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
Values
name : ["John", "Doe"]
My request
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "doc['name'][0]=params.param1",
"params" : {
"param1" : "john"
}
}
}
}
}
}
}
Incoming Response
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
You can use the following script_fields request to return the first element as a scripted field:
{
"script_fields": {
"firstElement": {
"script": {
"lang": "painless",
"source": "params._source.name[0]"
}
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64391432",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"firstElement": [
"John" <-- note this
]
}
}
]
You can use a Painless script to create a script field that returns a customized value for each document in the results of a query.
In the script, use the equality operator '==' (not the assignment operator '=') to compare two values; the result is a boolean that is true if the two values are equal and false otherwise.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings":{
"properties":{
"name":{
"type":"text",
"fielddata":true
}
}
}
}
Index data:
{
"name": [
"John",
"Doe"
]
}
Search Query:
{
"script_fields": {
"my_field": {
"script": {
"lang": "painless",
"source": "params['_source']['name'][0] == params.params1",
"params": {
"params1": "John"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"fields": {
"my_field": [
true <-- note this
]
}
}
]
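What the script field computes per document can be mimicked in Python as a rough sketch. The function first_element_equals is hypothetical and simply reads the first array value from the document source and compares it with ==; note that the comparison is case-sensitive, so "john" would not equal "John".

```python
docs = [{"_id": "1", "_source": {"name": ["John", "Doe"]}}]


def first_element_equals(source, field, expected):
    """Compare the first value of an array field with the expected value."""
    values = source.get(field) or []
    return bool(values) and values[0] == expected


# One boolean per document, like the my_field script field above.
my_field = {doc["_id"]: first_element_equals(doc["_source"], "name", "John") for doc in docs}
print(my_field)  # {'1': True}

# The comparison is exact, so a lowercased parameter does not match.
print(first_element_equals(docs[0]["_source"], "name", "john"))  # False
```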
Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this then you should use the nested data type instead of the object data type.
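A short Python sketch of why that happens: with the plain object type, an array of objects is flattened into one multi-value list per sub-field, so the link between values that came from the same object is lost. The function flatten_object_array and the two-object sample array are illustrative assumptions, not Elasticsearch code.

```python
def flatten_object_array(field, objects):
    """Flatten an array of objects into one list of values per sub-field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


# Hypothetical document whose "name" field holds two objects.
names = [
    {"first": "John Doe", "second": "abc"},
    {"first": "Adam Simith", "second": "John Doe"},
]

flat = flatten_object_array("name", names)
print(flat)
# {'name.first': ['John Doe', 'Adam Simith'], 'name.second': ['abc', 'John Doe']}
```

After flattening, nothing records that "John Doe" and "abc" belonged to the same object, which is the information the nested type keeps.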
You can use the script shown in my other answer if you just want to compare the value of the first element of the array to some other value. But based on your comments, it looks like your use case is quite different.
If you want to search the first element of the array, you need to convert your data into nested form. With arrays of objects, you can't refer to "the first element" or "the last element" at search time.
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"name": {
"type": "nested"
}
}
}
}
Index Data:
{
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
{
"booking_id": 1,
"name": [
{
"first": "Adam Simith",
"second": "John Doe"
}
]
}
{
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
Search Query:
{
"query": {
"nested": {
"path": "name",
"query": {
"bool": {
"must": [
{
"match_phrase": {
"name.first": "John Doe"
}
}
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 0.9400072,
"_source": {
"booking_id": 2,
"name": [
{
"first": "John Doe",
"second": "abc"
}
]
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0.9400072,
"_source": {
"booking_id": 3,
"name": [
{
"first": "John Doe",
"second": "Adam Simith"
}
]
}
}
]

Elasticsearch Date parsing error in 7.x version

I'm using Elasticsearch 7.1 and I have defined the format in my index mapping as below:
"ManufacturerDate": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'|| yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
}
But I'm getting a date parsing error when searching against the date "2020-07-09T00:12:22.011-00:00". The format yyyy-MM-dd'T'HH:mm:ss.SSSXXX is already defined as one of the accepted formats.
The error is:
Failed to parse date field [2020-07-09T00:12:22.011-00:00] with format [yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSSXXX]:
Can anyone please help?
Here is a working example with mapping and search query.
To learn more about the date data type, refer to the Elasticsearch documentation on the date field type.
The search query below finds exact date values.
To return documents that contain terms within a provided range, see the range query documentation.
Mapping :
{
"mappings": {
"properties": {
"ManufacturerDate": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
}
}
}
}
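Formats separated by || are tried in turn until one of them parses the value. As a rough Python analogue (using strptime patterns, not Java date patterns, so treat the pattern strings as assumptions), the fallback looks like this. The first pattern expects the literal text "ZZ" at the end, which is what the quoted 'ZZ' in the mapping means, and the second expects a numeric offset.

```python
from datetime import datetime

# Assumed strptime counterparts of the two patterns in the mapping.
PATTERNS = [
    "%Y-%m-%dT%H:%M:%S.%fZZ",  # value must end with the literal text "ZZ"
    "%Y-%m-%dT%H:%M:%S.%f%z",  # value must end with an offset such as -00:00
]


def parse_with_fallback(value, patterns=PATTERNS):
    """Try each pattern in order and return the first successful parse."""
    for pattern in patterns:
        try:
            return datetime.strptime(value, pattern)
        except ValueError:
            continue  # this pattern did not fit, try the next one
    raise ValueError(f"no pattern matched {value!r}")


parsed = parse_with_fallback("2020-07-09T00:12:22.011-00:00")
print(parsed.isoformat())
```

The colon inside the offset is only accepted by %z on Python 3.7 or later.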
Search Query:
{
"query": {
"term": {
"ManufacturerDate": {
"value": "2020-07-09T00:12:22.011-00:00"
}
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
}
]
Update 1:
You can also use a constant score query.
Search query:
{
"query": {
"constant_score": {
"filter": {
"term": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
},
"boost": 1.2
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 1.2,
"_source": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
}
]
Update 2: Changing the order of the patterns makes the query work (using ES version 7.2).
Mapping:
{
"mappings": {
"properties": {
"ManufacturerDate": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSXXX||yyyy-MM-dd'T'HH:mm:ss.SSS'ZZ'||yyyy-MM-dd'T'HH:mm:ss.SSS"
}
}
}
}
Index data:
{
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
Search Query:
{
"query": {
"constant_score": {
"filter": {
"term": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
},
"boost": 1.2
}
}
}
Search Result :
"hits": [
{
"_index": "my_index5",
"_type": "_doc",
"_id": "1",
"_score": 1.2,
"_source": {
"ManufacturerDate": "2020-07-09T00:12:22.011-00:00"
}
}
]

Is it possible to boost suggestions based on the Elasticsearch index

I have multiple indices that I want suggestions from, but I want to score/order the suggestions based on the index they're from. I've successfully boosted searches based on indices (using indices_boost), but this doesn't seem to work for suggestions. I tried something like:
GET index1,index2/_search
{
"indices_boost" : [
{ "index1" : 9 },
{ "index2" : 1 }
],
"suggest": {
"mySuggest":{
"text":"someText",
"completion": {
"field":"suggestField",
"size":6
}
}
}
}
Is this doable?
At the moment I've resorted to sorting the suggestions in code.
I believe you can use category boost in the context suggester to achieve the desired behavior. You need to attach a category context to each suggestion document; its value can be exactly the index name.
How to use category context to boost suggestions
The mapping may look like this:
PUT food
{
"mappings": {
"properties" : {
"suggestField" : {
"type" : "completion",
"contexts": [
{
"name": "index_name",
"type": "category"
}
]
}
}
}
}
For demonstration purposes I will create another index, exactly like the one above but with name movie. (Index names can be arbitrary.)
Let's add the suggest documents:
PUT food/_doc/1
{
"suggestField": {
"input": ["timmy's", "starbucks", "dunkin donuts"],
"contexts": {
"index_name": ["food"]
}
}
}
PUT movie/_doc/2
{
"suggestField": {
"input": ["star wars"],
"contexts": {
"index_name": ["movie"]
}
}
}
Now we can run a suggest query with our boosts set:
POST food,movie/_search
{
"suggest": {
"my_suggestion": {
"prefix": "star",
"completion": {
"field": "suggestField",
"size": 10,
"contexts": {
"index_name": [
{
"context": "movie",
"boost": 9
},
{
"context": "food",
"boost": 1
}
]
}
}
}
}
}
Which will return something like this:
{
"suggest": {
"my_suggestion": [
{
"text": "star",
"offset": 0,
"length": 4,
"options": [
{
"text": "star wars",
"_index": "movie",
"_type": "_doc",
"_id": "2",
"_score": 9.0,
"_source": ...,
"contexts": {
"index_name": [
"movie"
]
}
},
{
"text": "starbucks",
"_index": "food",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": ...,
"contexts": {
"index_name": [
"food"
]
...
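The scores in that response are consistent with the suggestion score being the suggestion weight multiplied by the boost of the matching context. Here is a small Python sketch of that arithmetic. It assumes a weight of 1 for documents indexed without an explicit weight (which agrees with the 9.0 and 1.0 above), and the suggest function is a toy stand-in, not the real suggester.

```python
# Suggestion documents from the example; weight 1 is an assumed default.
docs = [
    {"index": "food", "inputs": ["timmy's", "starbucks", "dunkin donuts"], "weight": 1},
    {"index": "movie", "inputs": ["star wars"], "weight": 1},
]

# Per-context boosts, as passed in the "contexts" part of the query.
boosts = {"movie": 9, "food": 1}


def suggest(prefix, entries, context_boosts):
    """Score each matching document as weight * boost of its context."""
    options = []
    for entry in entries:
        boost = context_boosts.get(entry["index"])
        if boost is None:
            continue  # contexts not listed in the query are filtered out
        for text in entry["inputs"]:
            if text.startswith(prefix):
                score = entry["weight"] * boost
                options.append({"text": text, "index": entry["index"], "score": score})
                break
    return sorted(options, key=lambda option: option["score"], reverse=True)


for option in suggest("star", docs, boosts):
    print(option)
# {'text': 'star wars', 'index': 'movie', 'score': 9}
# {'text': 'starbucks', 'index': 'food', 'score': 1}
```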
Why didn't indices_boost work?
It seems the indices_boost parameter only affects search hits, not suggestions. _suggest used to be a standalone endpoint but was deprecated, which is probably the source of the confusion.
Hope that helps!
