Position as result, instead of highlighting - elasticsearch

I'm trying to get positions instead of highlighted text as the result of an Elasticsearch query.
Create the index:
PUT /test/
{
"mappings": {
"article": {
"properties": {
"text": {
"type": "text",
"analyzer": "english"
},
"author": {
"type": "text"
}
}
}
}
}
Put a document:
PUT /test/article/1
{
"author": "Just Me",
"text": "This is just a simple test to demonstrate the audience the purpose of the question!"
}
Search the document:
GET /test/article/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"text": {
"query": "simple test",
"_name": "must"
}
}
}
],
"should": [
{
"match_phrase": {
"text": {
"query": "need help",
"_name": "first",
"slop": 2
}
}
},
{
"match_phrase": {
"text": {
"query": "purpose question",
"_name": "second",
"slop": 3
}
}
},
{
"match_phrase": {
"text": {
"query": "don't know anything",
"_name": "third"
}
}
}
],
"minimum_should_match": 1
}
},
"highlight": {
"fields": {
"text": {}
}
}
}
When I run this search, I get a result like this:
This is just a simple test to <em>demonstrate</em> the audience the purpose of the <em>question</em>!
I'm not interested in getting the matches wrapped in em tags. Instead, I want to get the positions of all the matches, like this:
"hits": [
{ "start_offset": 30, "end_offset": 40 },
{ "start_offset": 74, "end_offset": 81 }
]
Hope you get my idea!

To get the offset positions of a word in a text, you should add a term vector to your index mapping (see the term_vector documentation). As the documentation explains, you have to enable this parameter at index time, in the field mapping:
"term_vector": "with_positions_offsets_payloads"
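To retrieve the offsets for a specific document, follow the term vectors documentation page: the _termvectors API returns, for each term of the field, its positions together with start_offset and end_offset.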
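Once the mapping stores term vectors, the offsets come back from the _termvectors API rather than from the search itself. Below is a minimal Python sketch of the client-side step, assuming you already fetched the response for one document (for example with GET /test/article/1/_termvectors). The helper extract_offsets is hypothetical, and sample is a hand-written dict in the shape of a term vectors response (term_vectors, then the field, then terms, then tokens), not captured output; its keys are the english analyzer's stemmed terms.

```python
from typing import Dict, Iterable, List


def extract_offsets(termvectors: Dict, field: str, terms: Iterable[str]) -> List[Dict[str, int]]:
    """Collect start/end offsets of the given (already analyzed) terms.

    `termvectors` is assumed to follow the shape of a _termvectors response:
    {"term_vectors": {field: {"terms": {term: {"tokens": [...]}}}}}
    """
    field_terms = termvectors.get("term_vectors", {}).get(field, {}).get("terms", {})
    hits = []
    for term in terms:
        for token in field_terms.get(term, {}).get("tokens", []):
            hits.append({"start_offset": token["start_offset"],
                         "end_offset": token["end_offset"]})
    # Sort by where each match starts in the original text
    return sorted(hits, key=lambda h: h["start_offset"])


# Hand-written, hypothetical sample for the indexed text
# "This is just a simple test to demonstrate the audience the purpose of the question!"
sample = {
    "term_vectors": {
        "text": {
            "terms": {
                "simpl": {"tokens": [{"position": 4, "start_offset": 15, "end_offset": 21}]},
                "test": {"tokens": [{"position": 5, "start_offset": 22, "end_offset": 26}]},
                "purpos": {"tokens": [{"position": 11, "start_offset": 59, "end_offset": 66}]},
                "question": {"tokens": [{"position": 14, "start_offset": 74, "end_offset": 82}]},
            }
        }
    }
}

print(extract_offsets(sample, "text", ["question", "simpl", "test"]))
```

Deciding which terms to pass is left to the caller: the term vectors API reports every term of the field, independently of which query matched the document.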

Related

How to exclude search words in a synonym filter in Elasticsearch

While adding table and tables as synonyms in a synonym filter in Elasticsearch, I need to filter out the results for table fan. How can I achieve this in Elasticsearch?
Could we build a taxonomy of inclusion and exclusion list filters in the index settings, rather than in run-time queries?
GET <indexName>/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"<fieldName>": {
"query": "table fan", // <=== the operator below is applied between table (and its synonyms) AND fan (and its synonyms)
"operator": "AND"
}
}
}
]
}
}
}
You can use the query above to exclude all documents that contain both 'table' and 'fan' (or their corresponding synonyms).
Or, if you want to combine multiple logical operators (e.g. give me all the documents that contain neither "table fan" nor "ac"), you can use simple_query_string:
GET <indexName>/_search
{
"query": {
"bool": {
"must_not": [
{
"simple_query_string": {
"query": "(table + fan) | ac", // <=== '+'='and', '|'='or', '-'='not'
"fields": [
"<fieldName>" // <==== use multiple field names, wildcard also supported
]
}
}
]
}
}
}
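The boolean logic of this must_not clause can be checked with a small Python simulation. This is not Elasticsearch code: doc_terms is a hypothetical, simplified analyzer (whitespace split plus the table/tables synonym, expanded at index time), excluded mirrors "(table + fan) | ac", and the sample documents are made up.

```python
# Hypothetical index-time synonym expansion: "table" <-> "tables"
SYNONYMS = {"table": {"tables"}, "tables": {"table"}}


def doc_terms(text):
    """Lowercased whitespace tokens plus their synonyms (simplified analyzer)."""
    terms = set()
    for word in text.lower().split():
        terms.add(word)
        terms |= SYNONYMS.get(word, set())
    return terms


def excluded(text):
    """Mirror of the must_not clause "(table + fan) | ac": True if the doc is dropped."""
    terms = doc_terms(text)
    return ("table" in terms and "fan" in terms) or "ac" in terms


docs = ["tables fan", "table chair", "ac unit", "fan heater"]
kept = [d for d in docs if not excluded(d)]
print(kept)
```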
Here is a working example with the index mapping, index data, search query and search result:
Index Mapping:
{
"settings": {
"index": {
"analysis": {
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms": [
"table, tables"
]
}
},
"analyzer": {
"synonym_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym_filter"
]
}
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "synonym_analyzer",
"search_analyzer": "standard"
}
}
}
}
Analyze API
POST /<indexName>/_analyze
{
"analyzer" : "synonym_analyzer",
"text" : "table fan"
}
The following tokens are generated:
{
"tokens": [
{
"token": "table",
"start_offset": 0,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "tables",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 0
},
{
"token": "fan",
"start_offset": 6,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Index Data:
{ "title": "table and fan" }
{ "title": "tables and fan" }
{ "title": "table fan" }
{ "title": "tables fan" }
{ "title": "table chair" }
Search Query:
{
"query": {
"bool": {
"must": {
"match": {
"title": "table"
}
},
"filter": {
"bool": {
"must_not": [
{
"match_phrase": {
"title": "table fan"
}
},
{
"match_phrase": {
"title": "table and fan"
}
}
]
}
}
}
}
}
You can also use a match query with "operator": "AND" in place of the match_phrase queries:
{
"query": {
"bool": {
"must": {
"match": {
"title": "table"
}
},
"filter": {
"bool": {
"must_not": [
{
"match": {
"title": {
"query": "table fan",
"operator": "AND"
}
}
}
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "synonym",
"_type": "_doc",
"_id": "2",
"_score": 0.06783115,
"_source": {
"title": "table chair"
}
}
]
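The search result above can be reproduced with a rough Python simulation of both searches (the match_phrase version and the match/AND version). It is a sketch under simplifying assumptions, not Elasticsearch code: index_analyze and search_analyze are hypothetical stand-ins that split on word characters instead of running the real standard tokenizer, and synonyms share the position of the original token, as in the Analyze API output.

```python
import re

SYNONYMS = {"table": {"tables"}, "tables": {"table"}}


def index_analyze(text):
    """synonym_analyzer stand-in: word split, lowercase, synonyms at the same position."""
    tokens = []
    for pos, word in enumerate(re.findall(r"\w+", text.lower())):
        tokens.append((pos, word))
        for syn in sorted(SYNONYMS.get(word, set())):
            tokens.append((pos, syn))
    return tokens


def search_analyze(text):
    """standard search_analyzer stand-in: no synonym expansion at query time."""
    return re.findall(r"\w+", text.lower())


def positions_by_term(tokens):
    index = {}
    for pos, term in tokens:
        index.setdefault(term, set()).add(pos)
    return index


def match_query(tokens, query, operator="or"):
    index = positions_by_term(tokens)
    found = [term in index for term in search_analyze(query)]
    return all(found) if operator == "and" else any(found)


def match_phrase(tokens, phrase):
    # Every query term must sit at consecutive positions after some start
    index = positions_by_term(tokens)
    terms = search_analyze(phrase)
    return any(
        all(start + i in index.get(term, set()) for i, term in enumerate(terms))
        for start in index.get(terms[0], set())
    )


docs = ["table and fan", "tables and fan", "table fan", "tables fan", "table chair"]
analyzed = {d: index_analyze(d) for d in docs}

phrase_hits = [d for d in docs
               if match_query(analyzed[d], "table")
               and not match_phrase(analyzed[d], "table fan")
               and not match_phrase(analyzed[d], "table and fan")]

and_hits = [d for d in docs
            if match_query(analyzed[d], "table")
            and not match_query(analyzed[d], "table fan", operator="and")]

print(phrase_hits, and_hits)
```

Because the synonyms are only added at index time (the search_analyzer is standard), a document containing tables still matches the query terms table and fan.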
Update 1:
Could we build a taxonomy of inclusion and exclusion lists filters in settings rather than at run time queries in elastic search
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. Refer to the ES documentation on mapping to understand what a mapping is used to define.
Please also refer to the documentation on dynamic templates, which allow you to define custom mappings that are applied to dynamically added fields.
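As a sketch of what the dynamic template suggestion could look like, here is an index body built in Python and serialized with json.dumps. It assumes the typeless (7.x) mapping format used in the working example and reuses its synonym analyzer; the template name strings_with_synonyms is hypothetical. Every new field whose JSON value is a string would then be analyzed with the synonyms at index time; the exclusion itself still happens in the query.

```python
import json

# Index body combining the synonym analyzer from the working example with a
# dynamic template. "strings_with_synonyms" is a hypothetical template name.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "synonym_filter": {"type": "synonym", "synonyms": ["table, tables"]}
            },
            "analyzer": {
                "synonym_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "synonym_filter"],
                }
            },
        }
    },
    "mappings": {
        "dynamic_templates": [
            {
                "strings_with_synonyms": {
                    # Applies to every new field whose JSON value is a string
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text",
                        "analyzer": "synonym_analyzer",
                        "search_analyzer": "standard",
                    },
                }
            }
        ]
    },
}

# Body for: PUT <indexName>
print(json.dumps(index_body, indent=2))
```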

ElasticSearch array data match multiple properties in nested element with AND condition

I'm facing a problem where I have two documents, each containing an array of objects. I'd like to search for the one document that contains a nested object matching two properties at the same time (both in the same object), but I always get both documents.
I created the documents with:
POST /respondereval/_doc
{
"resp_id": "1236",
"responses": [
{"key": "meta","text":"abc"},
{"key": "property 1", "text": "yes"},
{"key": "property 2", "text": "yes"}
]
}
POST /respondereval/_doc
{
"resp_id": "1237",
"responses": [
{"key": "meta","text":"abc"},
{"key": "property 1", "text": "no"},
{"key": "property 2", "text": "yes"}
]
}
I defined the index mapping like this, to prevent ES from flattening the objects:
PUT /respondereval
{
"mappings" : {
"properties": {
"responses" : {
"type": "nested"
}
}
}
}
Now I'd like to search for the first document (resp_id 1236) with the following query:
GET /respondereval/_search
{
"query": {
"nested": {
"path": "responses",
"query": {
"bool": {
"must": [
{ "match": { "responses.key": "property 1" } },
{ "match": { "responses.text": "yes" } }
]
}
}
}
}
}
This should return only one document: the one with an object that matches both conditions at the same time.
Unfortunately, it always returns both documents. I assume that at some point ES still flattens the values of the nested object arrays into something like this (simplified):
resp_id 1236: "key": ["meta", "property 1", "property 2"], "text": ["abc", "yes", "yes"]
resp_id 1237: "key": ["meta", "property 1", "property 2"], "text": ["abc", "no", "yes"]
which both contain "property 1" and "yes".
What is the correct way to solve this, so that only documents are returned that contain an element in the objects array matching both conditions ("key": "property 1" AND "text": "yes") at the same time?
The problem is with your mapping. responses.key and responses.text are dynamically mapped as text fields, which use the standard analyzer by default.
The standard analyzer splits text into tokens on word boundaries, such as whitespace. So
property 1 is tokenized as:
{
"tokens": [
{
"token": "property",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "1",
"start_offset": 9,
"end_offset": 10,
"type": "<NUM>",
"position": 1
}
]
}
The same happens for property 2.
In the second document, the object {"key": "property 2", "text": "yes"} matches both clauses: yes matches its text, and property 1 matches its key through the shared token property, because a match query combines the analyzed tokens with OR by default. Hence both documents are returned.
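A small Python simulation (not Elasticsearch code) shows why the text fields return both documents while the keyword sub-fields return only one. standard_tokens is a rough, hypothetical stand-in for the standard analyzer, match_or mimics a match query with its default OR operator, match_keyword mimics exact matching on a keyword sub-field, and nested_hit requires both clauses to match inside the same object, as the nested query does.

```python
import re


def standard_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def match_or(field_value, query):
    """match query, default operator OR: any query token among the field's tokens."""
    return bool(set(standard_tokens(query)) & set(standard_tokens(field_value)))


def match_keyword(field_value, query):
    """match on a keyword sub-field: the whole value must be equal."""
    return field_value == query


def nested_hit(doc, key_query, text_query, matcher):
    # nested query: both clauses must match inside the SAME object
    return any(matcher(obj["key"], key_query) and matcher(obj["text"], text_query)
               for obj in doc["responses"])


docs = [
    {"resp_id": "1236", "responses": [{"key": "meta", "text": "abc"},
                                      {"key": "property 1", "text": "yes"},
                                      {"key": "property 2", "text": "yes"}]},
    {"resp_id": "1237", "responses": [{"key": "meta", "text": "abc"},
                                      {"key": "property 1", "text": "no"},
                                      {"key": "property 2", "text": "yes"}]},
]

text_ids = [d["resp_id"] for d in docs if nested_hit(d, "property 1", "yes", match_or)]
keyword_ids = [d["resp_id"] for d in docs if nested_hit(d, "property 1", "yes", match_keyword)]
print(text_ids, keyword_ids)
```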
To make it work, query the keyword sub-fields, which dynamic mapping creates by default for string values and which are not analyzed:
{
"query": {
"nested": {
"path": "responses",
"query": {
"bool": {
"must": [
{ "match": { "responses.key.keyword": "property 1" } },
{ "match": { "responses.text.keyword": "yes" } }
]
}
}
}
}
}
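A more appropriate option is a match_phrase query on the key, so that property and 1 must appear together and in order: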
{
"query": {
"nested": {
"path": "responses",
"query": {
"bool": {
"must": [
{ "match_phrase": { "responses.key": "property 1" } },//phrase queries
{ "match": { "responses.text": "yes" } }
]
}
}
}
}
}
Have you tried the must query directly, without the nested query and path?
{
"query": {
"bool": {
"must": [
{
"match": {
"responses.key": "property 1"
}
},
{
"match": {
"responses.text": "yes"
}
}
]
}
}
}

How to boost specific terms in elastic search?

If I have the following mapping:
PUT /book
{
"settings": {},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"author": {
"type": "text"
}
}
}
}
How can I boost specific authors higher than others?
For example, with the documents and query below:
PUT /book/_doc/1
{
"title": "car parts",
"author": "john smith"
}
PUT /book/_doc/2
{
"title": "car",
"author": "bob bobby"
}
PUT /book/_doc/3
{
"title": "soap",
"author": "sam sammy"
}
PUT /book/_doc/4
{
"title": "car designs",
"author": "joe walker"
}
GET /book/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "car" }},
{ "match": { "title": "parts" }}
]
}
}
}
How do I make my search return books by "joe walker" at the top of the search results?
One solution is to use function_score.
From the function_score documentation: "The function_score allows you to modify the score of documents that are retrieved by a query."
Based on your mappings, try running this query, for example:
GET book/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"title": "car"
}
},
{
"match": {
"title": "parts"
}
}
]
}
},
"functions": [
{
"filter": {
"match": {
"author": "joe walker"
}
},
"weight": 30
}
],
"max_boost": 30,
"score_mode": "max",
"boost_mode": "multiply"
}
}
}
The query inside function_score is the same should query that you used.
Now we want to take all the results from the query and give more weight (a higher score) to joe walker's books, so that his books are prioritized over the others.
To achieve that, we created a function (inside functions) that applies a weight of 30 to each document returned by the query that also matches the joe walker filter; with "boost_mode": "multiply", that weight is multiplied into the document's query score.
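The arithmetic can be sketched in Python. This is a simplified model, not the actual implementation: it assumes a document that matches no function keeps a function score of 1, that max_boost caps the function score, and that boost_mode multiply multiplies it into the query score. The query scores are made-up numbers, not real BM25 values.

```python
def final_score(query_score, matches_filter, weight=30.0, max_boost=30.0):
    """Score of one document under the function_score query above (simplified).

    Assumption: a document that matches no function keeps a function score of 1.
    """
    function_score = weight if matches_filter else 1.0
    function_score = min(function_score, max_boost)  # "max_boost": 30 caps it
    return query_score * function_score               # "boost_mode": "multiply"


# Hypothetical query scores; real BM25 values will differ.
results = {
    "car parts": (1.2, False),
    "car": (0.6, False),
    "car designs": (0.5, True),  # author "joe walker" matches the filter
}
ranked = sorted(results, key=lambda t: final_score(*results[t]), reverse=True)
print(ranked)
```

Because the weight multiplies the score, joe walker's book moves above the others unless their query scores are more than 30 times higher.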
You can tune the weight and the other parameters.
Hope it helps

ElasticSearch Query: How to do inline "calculation" between fields, and then use it as boost variable?

I have a Books index with documents like these:
{
"title": "To Kill a Mockingbird",
"summary": "To Kill a Mockingbird takes place in Alabama during the Depression..",
"type": "book",
"views": 36
},
{
"title": "The Genius of Birds",
"summary": "The Genius Of Birds shines a new light on a genuinely underrated kind..",
"type": "book",
"views": 10
},
{
"title": "Handbook of Bird Biology",
"summary": "The Handbook of Bird Biology is an essential reference for birdwatchers..",
"type": "book",
"views": 27
}
In ElasticSearch v5.1, this is my current simple query, which works on its own:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"type": "book"
}
}
]
}
},
"must": {
"multi_match": {
"query": "the bird",
"fields": [
"title",
"summary"
]
}
}
}
}
}
(Searching for the words the bird from the fields: title, summary where the type must be book)
This gives me a simple result based on the title and summary fields, but I need to modify it a little more.
Is it possible to modify the Query to look something like:
..
"must": {
"multi_match": {
"query": "the bird",
"fields": [
"title^(0.1*views)",
"summary"
]
}
}
..
I don't know what this is called in ES, but basically I want to boost a field (title) by the value of another field (views).
Or in the simplest form, something like:
field1^(field2)
Thanks, Aarchit Saxena, for the hint in the comment section. Now I know it is called field_value_factor, and by exploring further from there, I've finally managed to build the query I needed.
The original query (above) now looks like this:
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"type": "book"
}
}
]
}
},
"must": {
"multi_match": {
"query": "the bird",
"fields": [
"title",
"summary"
]
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "views",
"factor": 1,
"modifier": "none",
"missing": 1
}
}
],
"boost": 1,
"boost_mode": "multiply"
}
}
}
Thank you.
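For reference, a Python sketch of how field_value_factor combines with the query score in the final query: the function value is modifier(factor * views), missing stands in for an absent views field, and boost_mode multiply multiplies it into the query score. The query scores are hypothetical, and only the none and log1p (base-10 logarithm of 1 + value) modifiers are modeled.

```python
import math

# Subset of the field_value_factor modifiers
MODIFIERS = {
    "none": lambda v: v,
    "log1p": lambda v: math.log10(v + 1),
}


def field_value_factor(doc, field="views", factor=1.0, modifier="none", missing=1.0):
    """modifier(factor * doc[field]), using `missing` when the field is absent."""
    value = doc.get(field, missing)
    return MODIFIERS[modifier](factor * value)


def final_score(query_score, doc, **kwargs):
    # Single function, "boost": 1, "boost_mode": "multiply"
    return 1.0 * query_score * field_value_factor(doc, **kwargs)


books = {"The Genius of Birds": {"views": 10}, "Handbook of Bird Biology": {"views": 27}}
# Hypothetical relevance scores from the multi_match query
query_scores = {"The Genius of Birds": 1.5, "Handbook of Bird Biology": 1.0}
scores = {title: final_score(query_scores[title], books[title]) for title in books}
print(scores)
```

With modifier none the score grows linearly with views, so a heavily viewed book can dominate relevance; log1p dampens that. Note also that the function multiplies the whole document score, not only the title part as in the title^(0.1*views) idea.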

Bool AND search in properties in ElasticSearch

I have a very small dataset of documents in ES:
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request:
curl -XPOST 'http://localhost:9200/teams/aff/_search' -H 'Content-Type: application/json' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I get a wrong result.
ES returns
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took code=red from the first record and position=P from the second record.
How can I specify that the search must match both terms in the same record (whether or not that record is part of a list of records)?
In fact, the correct answer is only document 1, John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index a document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly creates an inner object for your field (team in this particular example) and flattens it into a structure like
{
"team.code": ["red", "green"],
"team.position": ["S", "P"]
}
So you lose the association between each code and its position. To avoid this, you need to explicitly put a nested mapping, index your documents as usual, and query them with a nested query.
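The flattening can be simulated in a few lines of Python (a sketch, not Elasticsearch internals). flatten mimics how an object array is indexed without a nested mapping, object_query and nested_query are hypothetical helpers, and plain string equality stands in for the match queries, which is enough for these single-word values. The data is the question's dataset.

```python
def teams(doc):
    """The team field may hold a single object or a list of objects."""
    return doc["team"] if isinstance(doc["team"], list) else [doc["team"]]


def flatten(doc):
    """How an object array is indexed without a nested mapping: one list per sub-field."""
    flat = {}
    for obj in teams(doc):
        for key, value in obj.items():
            flat.setdefault(f"team.{key}", []).append(value)
    return flat


def object_query(doc, code, position):
    """bool/must on the flattened fields: each clause may match a different object."""
    flat = flatten(doc)
    return code in flat["team.code"] and position in flat["team.position"]


def nested_query(doc, code, position):
    """nested query: both clauses must match inside the same team object."""
    return any(t["code"] == code and t["position"] == position for t in teams(doc))


people = [
    {"name": "John", "team": {"code": "red", "position": "P"}},
    {"name": "Jack", "team": {"code": "red", "position": "S"}},
    {"name": "Emily", "team": {"code": "green", "position": "P"}},
    {"name": "Grace", "team": {"code": "green", "position": "P"}},
    {"name": "Steven", "team": [{"code": "green", "position": "S"}, {"code": "red", "position": "S"}]},
    {"name": "Josephine", "team": {"code": "red", "position": "S"}},
    {"name": "Sydney", "team": [{"code": "red", "position": "S"}, {"code": "green", "position": "P"}]},
]

object_hits = [p["name"] for p in people if object_query(p, "red", "P")]
nested_hits = [p["name"] for p in people if nested_query(p, "red", "P")]
print(object_hits, nested_hits)
```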
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
POST so/nest/
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will return no hits, because no single team object in this document has both code red and position P.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}
