Elasticsearch: How to highlight any field in a document that contains the searched string?

I am trying to build a full-text search query over thousands of documents with a dynamic structure.
But highlighting works only for explicitly named fields.
If I search over _all or _source, no highlighted result is shown.
I have already tried many variations and searched online, but with no success.
Basic query:
POST tracking*/_search
{
"query": {
"query_string": {
"query": "ci1483967534008.6100622#czcholsint372_te"
}
},
"highlight": {
"require_field_match": false
}
}
will return:
"hits": {
"total": 8,
"max_score": 13.482396,
"hits": [
{
"_index": "tracking-2017.01.09",
"_type": "cyclone",
"_id": "Cyclone1-UAT-ci1483967534008.6100622#czcholsint372_te-Messaging.Message.MessageUnpackaged.Request",
"_score": 13.482396,
"_source": {
... truncated ...
"received": "2017-01-09T13:12:14.008Z",
"tags": [],
"@timestamp": "2017-01-09T13:12:14.008Z",
"size": "3169",
"pairing": " ci1483967534008.6100622#czcholsint372_te <60a93b9-159835b287e-159835b79041a66cd1#ip.net> ErpExJets_RDC1_ProcessPurchaseOrder_9.4.1_20170109131207169 ErpExJets_RDC1_ProcessPurchaseOrder_9.4.1_20170109131207169",
}
},
but there is no highlight, even though the searched string is in the pairing field.
Is it possible at all?
Thanks
Reddy

The Elasticsearch documentation mentions this in a note: "In order to perform highlighting, the actual content of the field is required. If the field in question is stored (has store set to true in the mapping) it will be used, otherwise, the actual _source will be loaded and the relevant field will be extracted from it."
So unless you have _all enabled, use the following query:
{
"query": {
"query_string": {
"query": "ci1483967534008.6100622#czcholsint372_te"
}
},
"highlight": {
"require_field_match": false,
"fields": {
"pairing": {}
}
}
}
If you have _all enabled for the source documents, use the following:
{
"query": {
"query_string": {
"query": "ci1483967534008.6100622#czcholsint372_te"
}
},
"highlight": {
"require_field_match": false,
"fields": {
"_all": {}
}
}
}
Hope this helps.
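If the field names are not known in advance, one option is a wildcard pattern in the highlight fields. Below is a minimal Python sketch that only builds the request body. The helper name build_highlight_query is hypothetical, and the sketch assumes your Elasticsearch version accepts wildcard patterns such as "*" as highlight field names and that the matching fields are available in _source or stored.

```python
import json


def build_highlight_query(text, highlight_fields=("*",)):
    """Return a search body that highlights matches in the given fields."""
    return {
        "query": {"query_string": {"query": text}},
        "highlight": {
            # Allow a match on any field to highlight the fields listed below.
            "require_field_match": False,
            # "*" is a wildcard field pattern (assumption: supported by your version);
            # an empty dict means "use the default highlighter settings".
            "fields": {name: {} for name in highlight_fields},
        },
    }


body = build_highlight_query("ci1483967534008.6100622#czcholsint372_te")
print(json.dumps(body, indent=2))
```

Passing highlight_fields=["pairing"] reproduces the first query of this answer, so the same helper covers both the named-field and the wildcard case.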

Related

Elasticsearch query match + term boolean

I have documents in elasticsearch index with a "type" field, like this:
[
{
"id": 1,
"serviceDescription": "a bunch of text",
"serviceTitle": "title",
"serviceTags":["tag1","tag2"],
"type":"service"
},
{
"id": 2,
"companyDescription": "a bunch of text more",
"companyTitle": "title",
"companyTags":["tag1","tag2"],
"type":"company"
},...
]
I want to run a match query across all docs in my index, like this:
body = {
"query": {
"match": {
"_all":"sequencing"
}
}
}
but add a filter to only return results where the "type" field equals "service".
As far as I understand your question, you want to search for the query string sequencing across all fields. For that
you can use the multi_match query, which builds on the match query to allow multi-field queries.
If no fields are provided, the multi_match query defaults to the
index.query.default_field index settings, which in turn defaults to *.
This extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then
combined to build a query.
Search Query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "bunch of text"
}
}
],
"filter": {
"term": {
"type": "service"
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "64867032",
"_type": "_doc",
"_id": "1",
"_score": 0.8630463,
"_source": {
"id": 1,
"serviceDescription": "a bunch of text",
"serviceTitle": "title",
"serviceTags": [
"tag1",
"tag2"
],
"type": "service"
}
}
]
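The same bool structure can be assembled programmatically. Here is a small Python sketch that builds the request body shown above; the function name match_with_type_filter and its optional fields parameter are hypothetical and not part of any Elasticsearch client.

```python
import json


def match_with_type_filter(text, doc_type, fields=None):
    """Build a bool query: full-text multi_match plus an exact filter on "type"."""
    multi_match = {"query": text}
    if fields:
        # Optional: restrict the search to specific fields instead of the default.
        multi_match["fields"] = list(fields)
    return {
        "query": {
            "bool": {
                # The must clause contributes to the relevance score.
                "must": [{"multi_match": multi_match}],
                # The filter clause only includes or excludes documents; it is not scored.
                "filter": {"term": {"type": doc_type}},
            }
        }
    }


body = match_with_type_filter("sequencing", "service")
print(json.dumps(body, indent=2))
```

Keeping the type restriction in filter rather than must means it does not change the ordering of the results, only which documents are eligible.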

Sort based on the service time of stores

My project contains some stores with their working time and I index them in ElasticSearch. Now there are some scenarios in my product:
Whenever the client requests for the stores which are available now, I use the following range filter:
bool: {
must: [
{ range: {startTime: { lte: now}} },
{ range: {endTime: { gte: now}} }
]
}
Let's call the result Online stores.
When the client requests all stores, I have to return all the documents, but sorted: online stores first, then the other stores.
I can do that with two queries, one for online and another for offline stores, but I want to do it in a single query. Any idea?
You can achieve this by using should as an "optional" clause:
If the bool query is in a query context and has a must or filter
clause then a document will match the bool query even if none of the
should queries match. In this case these clauses are only used to
influence the score.
The bool query takes a more-matches-is-better approach, so the score
from each matching must or should clause will be added together to
provide the final _score for each document.
The query might look like this:
POST my-should/doc/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"should": {
"bool": {
"must": [
{
"range": {
"startTime": {
"lte": "2018-06-24T16:39:59"
}
}
},
{
"range": {
"endTime": {
"gte": "2018-06-22T16:39:59"
}
}
}
],
"_name": "Online"
}
}
}
}
}
The must part of this bool query (here match_all) defines which documents match, and the should part boosts those that also match the additional criteria.
Note that here we used Named Queries to highlight that the "Online" part of the query was matched to a document. The response could look like this:
"hits": [
{
"_index": "my-should",
"_type": "doc",
"_id": "BKgZLWQBERN2JBe1CQ5t",
"_score": 3,
"_source": {
"startTime": "2018-06-23T16:39:59",
"endTime": "2018-06-23T16:39:59"
},
"matched_queries": [
"Online"
]
},
{
"_index": "my-should",
"_type": "doc",
"_id": "BagaLWQBERN2JBe12A7y",
"_score": 1,
"_source": {
"startTime": "2018-06-20T16:39:59",
"endTime": "2018-06-21T16:39:59"
}
}
]
Hope that helps!
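For reference, the query above can be generated with the timestamp as a parameter. This is only a sketch: online_first_query is a hypothetical helper, and the default value "now" assumes startTime and endTime are date fields, so that date math applies as in the question's own range filter.

```python
import json


def online_first_query(now="now"):
    """Match every store, but score stores that are open at `now` higher."""
    online = {
        "bool": {
            "must": [
                {"range": {"startTime": {"lte": now}}},
                {"range": {"endTime": {"gte": now}}},
            ],
            # Named query: matching hits list "Online" under matched_queries.
            "_name": "Online",
        }
    }
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},  # every document matches
                "should": online,           # optional clause, only adds to the score
            }
        }
    }


print(json.dumps(online_first_query(), indent=2))
```

Since results are ordered by _score by default, the stores matching the should clause come first without an explicit sort.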

Sort keyword field array within ElasticSearch document by relevance

I've got an ElasticSearch index that looks something like this:
{
"mappings": {
"article": {
"properties": {
"title": { "type": "string" },
"tags": {
"type": "keyword"
}
}
}
}
}
And data that looks something like this:
{ "title": "Something about Dogs", "tags": ["articles", "dogs"] },
{ "title": "Something about Cats", "tags": ["articles", "cats"] },
{ "title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"] }
If I search for dog, I get the first and third documents, as I'd expect. And I can weight the search documents the way I like (in reality, I'm using a function_score query to weight on a bunch of fields irrelevant to this question).
What I'd like to do is sort the tags field so that the most relevant tags are returned first, without affecting the sort order of the documents themselves. So I'm hoping for a result like this:
{ "title": "Something about Dog Food", "tags": ["dogs", "dogfood", "articles"] }
Instead of what I get now:
{ "title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"] }
The documentation on sort and function score doesn't cover my case. Any help appreciated. Thanks!
You cannot reorder the _source of a document (your array of tags) according to how well each value matches. One way of doing this is to use nested fields with inner_hits, which lets you sort the matching nested objects.
My suggestion is to turn your tags into a nested field (I chose keyword there just for simplicity, but you can also use text with the analyzer of your choice):
PUT test
{
"mappings": {
"article": {
"properties": {
"title": {
"type": "string"
},
"tags": {
"type": "nested",
"properties": {
"value": {
"type": "keyword"
}
}
}
}
}
}
}
And use this kind of query:
GET test/_search
{
"_source": {
"exclude": "tags"
},
"query": {
"bool": {
"must": [
{
"match": {
"title": "dogs"
}
},
{
"nested": {
"path": "tags",
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"match": {
"tags.value": "dogs"
}
}
]
}
},
"inner_hits": {
"sort": {
"_score": "desc"
}
}
}
}
]
}
}
}
Where you try to match on the tags nested field value for the same text you try to match on title. Then, using inner_hits sorting, you can actually sort the nested values based on their inner scoring.
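Because the same text has to appear in both the title clause and the nested tags clause, it can help to generate the body from a single parameter. A Python sketch of that idea, mirroring the query above (tags_by_relevance_query is a hypothetical name, and the field names are the ones from the mapping in this answer):

```python
import json


def tags_by_relevance_query(text):
    """Build the nested query above with one search text used in both places."""
    nested = {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "should": [
                        {"match_all": {}},                  # keep every tag in inner_hits
                        {"match": {"tags.value": text}},    # matching tags score higher
                    ]
                }
            },
            # Return the nested tags ordered by their own score.
            "inner_hits": {"sort": {"_score": "desc"}},
        }
    }
    return {
        "_source": {"exclude": "tags"},  # same source filter as in the query above
        "query": {"bool": {"must": [{"match": {"title": text}}, nested]}},
    }


print(json.dumps(tags_by_relevance_query("dogs"), indent=2))
```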
@Val's suggestion is very good, but only as long as simple substring matching (i1.indexOf(params.search)) is enough to decide which tags are "relevant". The biggest advantage of his solution is that you don't have to change the mapping.
The big advantage of my solution is that you are actually using Elasticsearch's real search capabilities to determine the "relevant" tags. The drawback is that you need a nested field instead of a regular simple keyword.
What you get from a search call are the source documents. The documents in the response are returned in exactly the same form as when you indexed them, which means that if you indexed ["articles", "dogs", "dogfood"], you'll always get that array in that unaltered form.
One way to get around this is to declare a script_field that applies a small script to sort your array and return the result of that sort.
What the script does is simply move the terms that contain the search term to the front of the list:
{
"_source": ["title"],
"query" : {
"match_all": {}
},
"script_fields" : {
"sorted_tags" : {
"script" : {
"lang": "painless",
"source": "return params._source.tags.stream().sorted((i1, i2) -> i1.indexOf(params.search) > -1 ? -1 : 1).collect(Collectors.toList())",
"params" : {
"search": "dog"
}
}
}
}
}
This will return something like the following. As you can see, the sorted_tags array lists the matching terms first.
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "tests",
"_type": "article",
"_id": "1",
"_score": 1,
"_source": {
"title": "Something about Dog Food"
},
"fields": {
"sorted_tags": [
"dogfood",
"dogs",
"articles"
]
}
}
]
}
}
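The reordering idea itself is easy to check outside Elasticsearch. The following Python sketch is not the Painless script above; it is a stand-alone illustration of "tags containing the search term first", written with a stable sort key (sort_tags is a hypothetical helper):

```python
def sort_tags(tags, search):
    """Move tags that contain `search` to the front, keeping original order otherwise."""
    # The key is False for matching tags and True for the rest; False sorts first.
    # sorted() is stable, so tags keep their relative order within each group.
    return sorted(tags, key=lambda tag: search not in tag)


print(sort_tags(["articles", "dogs", "dogfood"], "dog"))
# ['dogs', 'dogfood', 'articles']
```

Note that the order inside the matching group differs from the response shown above: here "dogs" stays ahead of "dogfood" because the original order is preserved within each group.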

Elasticsearch query_string search complex keyword by its terms

Now, I know that a keyword field is not supposed to contain unstructured text, but let's say that for some reason such text was written into a keyword field.
When searching such documents using match or term queries, the document is not found, but when searching using query_string the document is found by a partial match (a "term" inside the keyword). I don't understand how this is possible when the Elasticsearch documentation clearly states that a keyword is indexed as is, without being tokenized into terms.
Example:
My index mapping:
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"full_text": {
"type": "text"
},
"exact_value": {
"type": "keyword"
}
}
}
}
}
Then I put a document in:
PUT my_index/my_type/2
{
"full_text": "full text search",
"exact_value": "i want to find this trololo!"
}
And imagine my surprise when I get a document by keyword term, not a full match:
GET my_index/my_type/_search
{
"query": {
"match": {
"exact_value": "trololo"
}
}
}
- no result;
GET my_index/my_type/_search
{
"query": {
"term": {
"exact_value": "trololo"
}
}
}
- no result;
POST my_index/_search
{"query":{"query_string":{"query":"trololo"}}}
- my document is returned(!):
"hits": {
"total": 1,
"max_score": 0.27233246,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 0.27233246,
"_source": {
"full_text": "full text search",
"exact_value": "i want to find this trololo!"
}
}
]
}
When you run a query_string query in Elasticsearch like the one below:
POST index/_search
{
"query": {
"query_string": {
"query": "trololo"
}
}
}
This actually searches the _all field, which, unless you specify otherwise, is analyzed by the standard analyzer in Elasticsearch.
If you specify the field in the query as follows, you won't get records for the keyword field:
POST my_index/_search
{
"query": {
"query_string": {
"default_field": "exact_value",
"query": "trololo"
}
}
}
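To make the difference concrete, here is a rough Python illustration of which terms end up in the index in each case. It is not Elasticsearch code: analyze_text is a deliberately simplified stand-in for an analyzed field (lowercase, split on anything that is not an ASCII letter or digit), and index_keyword models a keyword field that stores the whole value as one term.

```python
import re


def analyze_text(value):
    """Simplified stand-in for an analyzed field such as _all."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def index_keyword(value):
    """A keyword field keeps the entire value as a single term."""
    return [value]


doc_value = "i want to find this trololo!"
analyzed_terms = analyze_text(doc_value)
keyword_terms = index_keyword(doc_value)

print("trololo" in analyzed_terms)  # True: the analyzed copy contains the token
print("trololo" in keyword_terms)   # False: only the full string is a term
```

This is why the query against the analyzed copy of the value finds the document, while a query restricted to exact_value only matches the complete string.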

Elastic Search fulltext search query and filters

I want to perform a full-text search, but I also want to apply one or more filters. This is the simplified structure of my document when searching with /things/_search?q=*foo*:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "things",
"_type": "thing",
"_id": "63",
"_score": 1,
"fields": {
"name": [
"foo bar"
],
"description": [
"this is my description"
],
"type": [
"inanimate"
]
}
}
]
}
}
This works well enough, but how do I combine filters with a query? Let's say I want to search for "foo" in an index with multiple documents, but I only want to get those with type == "inanimate"?
This is my attempt so far:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*foo*"
}
},
"filter": {
"bool": {
"must": {
"term": { "type": "inanimate" }
}
}
}
}
}
}
When I remove the filter part, it returns an accurate set of document hits. But with this filter-definition it does not return anything, even though I can manually verify that there are documents with type == "inanimate".
Since you have not defined an explicit mapping, and the term query looks for an exact match, you need to add "index": "not_analyzed" to the type field; then your query will work.
This will give you the correct documents:
{
"query": {
"match": {
"type": "inanimate"
}
}
}
But this is not the solution; you need to define an explicit mapping, as I said.
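A sketch of what such an explicit mapping body could look like, built in Python. The helper not_analyzed_mapping is hypothetical, and the "string" plus "index": "not_analyzed" syntax assumes an Elasticsearch version from before 5.x, which is also what the filtered query in the question implies; later versions use the keyword type for the same purpose.

```python
import json


def not_analyzed_mapping(doc_type, field):
    """Mapping body that indexes `field` as a single, unanalyzed term (pre-5.x syntax)."""
    return {
        doc_type: {
            "properties": {
                # not_analyzed: the value is indexed exactly as given, so a term
                # filter on the same exact value can match it.
                field: {"type": "string", "index": "not_analyzed"}
            }
        }
    }


mapping = not_analyzed_mapping("thing", "type")
print(json.dumps(mapping, indent=2))
```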
