Elastic search nested match_phrase issue - elasticsearch

We are running a match_phrase query on nested objects, where each nested object holds only a string value.
We want to find occurrences of a phrase within a single string.
Let's suppose:
1) The mapping is as follows.
"attr": {
"type": "nested",
"properties": {
"attr": {
"type": "multi_field",
"fields": {
"attr": { "type": "string", "index": "analyzed", "include_in_all": true, "analyzer": "keyword" },
"untouched": { "type": "string", "index": "analyzed", "include_in_all": false, "analyzer": "not_analyzed" }
}
}
}
}
2) The data looks like this.
Object A:
"attr": [
{
"attr": "beverage"
},
{
"attr": "apple wine"
}
]
Object B:
"attr": [
{
"attr": "beverage"
},
{
"attr": "apple"
},
{
"attr": "wine"
}
]
3) With a query like
{
"query": {
"match": {
"_all": {
"query": "apple wine",
"type": "phrase"
}
}
}
}
We expect only Object A to match, but Object B is returned as well.
Any suggestions would be appreciated.

In your case, separate array values need large gaps between their token positions so that a phrase cannot match across two values.
There is a configurable gap between instances of the same field, but its default value is 0 (from Elasticsearch 2.0 onward the setting is named position_increment_gap and defaults to 100).
You should change it in the field mapping:
"attr": { "type": "string",
"index": "analyzed",
"include_in_all": true,
"analyzer": "keyword",
"position_offset_gap": 100
}
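To see why the gap matters, here is a minimal Python sketch of how token positions behave for a multi-valued field. It is a simplified model and not Lucene's actual implementation: tokens are split on whitespace, and the helper names index_positions and phrase_matches are made up for this illustration.

```python
def index_positions(values, gap):
    """Assign a position to every token of a multi-valued field.

    Simplified model: tokens are lowercased and split on whitespace, and
    `gap` extra positions are skipped between consecutive values.
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the position gap between two array values
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def phrase_matches(positions, phrase):
    """Return True if the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms:
        return False
    return any(
        all(start + k in positions.get(term, []) for k, term in enumerate(terms))
        for start in positions.get(terms[0], [])
    )


object_a = ["beverage", "apple wine"]
object_b = ["beverage", "apple", "wine"]

for gap in (0, 100):
    for name, values in (("A", object_a), ("B", object_b)):
        hit = phrase_matches(index_positions(values, gap), "apple wine")
        print(f"gap={gap} object {name}: {hit}")
```

With a gap of 0, "apple" and "wine" from two separate values of Object B sit at adjacent positions, so the phrase matches. With a gap of 100 only Object A matches.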

You will also need to tell the query to search all terms in one nested doc:
"query": {
"nested": {
"path": "attr",
"query": {
"match": {
"attr": {
"query": "apple wine",
"operator": "and"
}
}
}
}
}
A good source of information is http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
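If the query is assembled in application code, the nested query above can be built with a small helper. This is only a sketch: the function name nested_all_terms_query is hypothetical, and the path and field values are simply the ones used in the answer.

```python
import json


def nested_all_terms_query(path, field, text):
    """Build a nested match query that requires every term of `text`."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "match": {
                        # "operator": "and" means all terms must match
                        # inside the same nested document
                        field: {"query": text, "operator": "and"}
                    }
                },
            }
        }
    }


print(json.dumps(nested_all_terms_query("attr", "attr", "apple wine"), indent=2))
```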

Related

Elasticsearch: Full text search

I'm trying to build an Elasticsearch full-text search query for the text "Gold Cartier watches" across multiple fields.
I have to follow this rule: first find all "Gold" documents. From the retrieved "Gold" documents, find all "Cartier" documents, and from those, find all "watches" documents.
This is my multi_match query:
{
"query": {
"multi_match": {
"query": "Fred or rose",
"fields": [
"name",
"status",
"categories.name",
"brand.name",
"reference"
]
}
}
}
Here is my mapping:
{
"product": {
"mappings": {
"product": {
"dynamic_date_formats": [],
"properties": {
"available": {
"type": "text"
},
"brand": {
"properties": {
"available": {
"type": "text"
},
"name": {
"type": "keyword"
},
"shopProductBrands": {
"properties": {
"available": {
"type": "text"
},
"priority": {
"type": "integer"
},
"slug": {
"type": "keyword"
}
}
},
"slug": {
"type": "keyword"
}
}
},
"categories": {
"type": "nested",
"properties": {
"available": {
"type": "text"
},
"brand": {
"properties": {
"available": {
"type": "text"
},
"name": {
"type": "keyword"
},
"slug": {
"type": "keyword"
}
}
},
"name": {
"type": "keyword"
},
"parent": {
"type": "keyword"
},
"slug": {
"type": "keyword"
}
}
},
"createdAt": {
"type": "date",
"format": "date_time_no_millis"
},
"longDescription": {
"type": "text",
"analyzer": "french_search"
},
"name": {
"type": "text",
"boost": 15,
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "french_search"
},
"purchasePrice": {
"type": "double"
},
"rawPrice": {
"type": "double"
},
"reference": {
"type": "keyword",
"boost": 10
},
"shortDescription": {
"type": "text",
"boost": 3,
"analyzer": "french_search"
},
"slug": {
"type": "keyword"
},
"status": {
"type": "text"
},
"updatedAt": {
"type": "date",
"format": "date_time_no_millis"
}
}
}
}
}
}
My search retrieves all "Gold", "Cartier" and "watches" documents combined.
How can I build a query that follows my rule?
Thanks
I'm not sure that there's an easy solution. I think the closest you can get is to use cross_fields with "operator": "and" and only search fields that have the same analyzer. Can you add "french_search" versions of each of these fields?
cross_fields analyzes the query string into individual terms, then
looks for each term in any of the fields, as though they were one big
field.
However:
The cross_field type can only work in term-centric mode on fields that
have the same analyzer. ... If there are multiple groups, they are
combined with a bool query.
So this query:
{
"query": {
"multi_match": {
"type": "cross_fields",
"query": "gold Cartier watches",
"fields": [
"name",
"status",
"categories.name",
"brand.name",
"reference"
]
}
}
}
Will become something like this:
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "gold Cartier watches",
"fields": ["name"]
}
},
{
"multi_match": {
"query": "gold Cartier watches",
"fields": ["status"]
}
},
{
"multi_match": {
"query": "gold Cartier watches",
"fields": [
"categories.name",
"brand.name",
"reference"
]
}
}
]
}
}
That query is too loose, but adding "operator": "and" or "minimum_should_match": "100%" would be too strict.
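The grouping behind that expansion can be sketched in Python. The analyzer assignments below are read off the mapping in the question and are assumptions of this sketch: text fields without an explicit analyzer are taken to use "standard", and the keyword fields are treated as one group. The helper names are hypothetical.

```python
# Analyzer per searched field, as read from the mapping in the question
# (assumption: "status" uses the default "standard" analyzer and all
# keyword fields fall into a single group).
FIELD_ANALYZERS = {
    "name": "french_search",
    "status": "standard",
    "categories.name": "keyword",
    "brand.name": "keyword",
    "reference": "keyword",
}


def group_fields_by_analyzer(fields, analyzers):
    """Group fields that share an analyzer, keeping first-seen order."""
    groups = {}
    for field in fields:
        groups.setdefault(analyzers[field], []).append(field)
    return list(groups.values())


def expanded_query(text, fields, analyzers):
    """Build one multi_match clause per analyzer group, joined by bool/should."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {"query": text, "fields": group}}
                    for group in group_fields_by_analyzer(fields, analyzers)
                ]
            }
        }
    }


fields = ["name", "status", "categories.name", "brand.name", "reference"]
print(group_fields_by_analyzer(fields, FIELD_ANALYZERS))
```

Three groups come out of this, which is why the terms are not required to co-occur across all five fields.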
It's not pretty or efficient, but you could do application-side term parsing and build a boolean query. Something like this:
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "gold",
"fields": [
"name",
"status",
...
"reference"
]
}
},
{
"multi_match": {
"query": "Cartier",
"fields": [
"name",
"status",
...
"reference"
]
}
}
...
]
}
}
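A minimal sketch of that application-side parsing in Python, assuming a plain whitespace split is good enough for the search box input. The function name all_terms_query is hypothetical; the field list is the one from the question.

```python
import json


def all_terms_query(text, fields):
    """Build a bool query with one required multi_match clause per term."""
    return {
        "query": {
            "bool": {
                "must": [
                    # each term must match in at least one of the fields
                    {"multi_match": {"query": term, "fields": list(fields)}}
                    for term in text.split()
                ]
            }
        }
    }


FIELDS = ["name", "status", "categories.name", "brand.name", "reference"]
print(json.dumps(all_terms_query("gold Cartier watches", FIELDS), indent=2))
```

One caveat: categories is mapped as nested in the question, so categories.name may need to be wrapped in a nested query rather than listed directly.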
You can also use the boolean operators of the query_string query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_boolean_operators
The preferred operators are + (this term must be present) and - (this term must not be present). All other terms are optional. For example, this query:
quick brown +fox -news
states that:
fox must be present
news must not be present
quick and brown are optional — their presence increases the relevance
The familiar boolean operators AND, OR and NOT (also written &&, || and !) are also supported but beware that they do not honor the usual precedence rules, so parentheses should be used whenever multiple operators are used together. For instance, the previous query could be rewritten as:
((quick AND fox) OR (brown AND fox) OR fox) AND NOT news
You can also use boosting to give a specific term more weight in the results: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_boosting
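Applied to the "Gold Cartier watches" case, the + operator can be added to every term before the string is sent as a query_string query. The sketch below assumes the terms contain no characters that are reserved in the query_string syntax (no escaping is done), and the function name required_terms_query is hypothetical.

```python
def required_terms_query(text, fields):
    """Prefix every term with '+' so that query_string requires all of them."""
    required = " ".join("+" + term for term in text.split())
    return {
        "query": {
            "query_string": {
                "query": required,
                "fields": list(fields),
            }
        }
    }


query = required_terms_query("Gold Cartier watches", ["name", "status", "reference"])
print(query["query"]["query_string"]["query"])
```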

Elasticsearch query and Kibana not working as expected

I am trying to learn Elasticsearch and I am using Kibana to visualise things. I cannot seem to figure out what is wrong with my mapping and queries though.
I am trying to store photo metadata (iptc data). And I have the following mapping for it:
{
"settings": {
"index": {
"analysis": {
"filter": {},
"analyzer": {
"keyword_analyzer": {
"filter": [
"lowercase",
"asciifolding",
"trim"
],
"char_filter": [],
"type": "custom",
"tokenizer": "keyword"
},
"edge_ngram_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "edge_ngram_tokenizer"
},
"edge_ngram_search_analyzer": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 5,
"token_chars": [
"letter"
]
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"photo_added": {
"type": "date",
"index": true,
"format": "yyyy-MM-dd' 'H:m:s"
},
"photo_id": {
"type": "long",
"index": true
},
"photo_owner": {
"type": "long",
"index": true
},
"project": {
"type": "long",
"index": true
},
"iptc": {
"type": "nested",
"properties": {
"caption/abstract": {
"type": "text",
"index": true
},
"copyright notice": {
"type": "text",
"index": true
},
"keywords": {
"type": "text",
"index": true,
"fields": {
"keywordstring": {
"type": "text",
"analyzer": "keyword_analyzer"
},
"edgengram": {
"type": "text",
"analyzer": "edge_ngram_analyzer",
"search_analyzer": "edge_ngram_search_analyzer"
},
"completion": {
"type": "completion"
},
"keyword": {
"type": "keyword"
}
}
},
"object name": {
"type": "text",
"index": true
},
"province/state": {
"type": "text",
"index": true
},
"sub-location": {
"type": "text",
"index": true
},
"time created": {
"type": "text",
"index": true
},
"urgency": {
"type": "text",
"index": true
},
"writer/editor": {
"type": "text",
"index": true
}
}
}
}
}
}
}
The thing is: I want a query that searches the keywords and the caption for the search text. Whenever the search text is found in the keywords, the score should be boosted, because that indicates the photo is more relevant. So I formulated the following query (where value is the search text):
GET /photos/_search
{
"query": {
"dis_max": {
"queries": [
{
"fuzzy": {
"iptc.keywords": {
"value": "value",
"fuzziness": 1,
"boost": 1
}
}
},
{
"fuzzy": {
"iptc.caption/abstract": {
"value": "value",
"fuzziness": 1
}
}
}
]
}
}
}
However it does not seem to find any matches despite the fact that the value is in the documents... And I cannot seem to construct a simple match query that will match against the exact text... for example:
GET /photos/doc/_search?error_trace=true
{
"query": {
"match": {
"iptc.caption/abstract": "exact value from one of the documents"
}
}
}
This returns 0 results, even though the search text appears verbatim in a document. I don't know what to do now. To make matters worse (I am nearly bald from the frustration this is causing me), Kibana seems to act up as well. I am almost sure it is something really simple (the document date is within the last 5 years), but filtering for the exact copy-pasted value returns 0 results, as shown in the screenshot.
I am going crazy here. Does anyone know how to fix this, or what on earth I am doing wrong?
I found the solution which is in the documentation of Elastic.
Because nested documents are indexed as separate documents, they can only be accessed within the scope of the nested query, the nested/reverse_nested aggregations, or nested inner hits.
Documentation
So I constructed the following query which works.
{
"query": {
"nested": {
"path": "iptc",
"query": {
"bool": {
"should": [
{
"dis_max": {
"queries": [
{
"fuzzy": {
"iptc.keywords": {
"value": "Feyenoord",
"boost": 1
}
}
},
{
"fuzzy": {
"iptc.caption/abstract": {
"value": "Feyenoord",
"fuzziness": 1
}
}
}
]
}
}
]
}
}
}
}
}
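For reuse with arbitrary search text, the same nested query can be generated in Python. This is a sketch and not the exact query above: it drops the single-clause bool/should wrapper, and the keyword_boost default of 2.0 is an assumed value chosen so that keyword hits score higher than caption hits.

```python
def nested_fuzzy_query(value, keyword_boost=2.0, fuzziness=1):
    """Search iptc.keywords and iptc.caption/abstract inside the nested scope."""
    return {
        "query": {
            "nested": {
                "path": "iptc",
                "query": {
                    "dis_max": {
                        "queries": [
                            {
                                "fuzzy": {
                                    "iptc.keywords": {
                                        "value": value,
                                        "fuzziness": fuzziness,
                                        # assumed boost so keyword matches rank higher
                                        "boost": keyword_boost,
                                    }
                                }
                            },
                            {
                                "fuzzy": {
                                    "iptc.caption/abstract": {
                                        "value": value,
                                        "fuzziness": fuzziness,
                                    }
                                }
                            },
                        ]
                    }
                },
            }
        }
    }


query = nested_fuzzy_query("Feyenoord")
```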

Match document if it contains multiple nested documents in elasticsearch

I have a document that contains arrays of nested documents. I need to return a match only if the document contains all of the specified nested documents.
Here is the relevant part of the mapping:
"element": {
"dynamic": "false",
"properties": {
"tenantId": {
"type": "string",
"index": "not_analyzed"
},
"fqn": {
"type": "string",
"index": "not_analyzed"
},
"id": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"type": {
"type": "string",
"index": "not_analyzed"
},
"location": {
"type": "string",
"index": "not_analyzed"
},
"tags": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"dataSourceId": {
"type": "long",
"index": "not_analyzed"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
The goal is to be able to return elements that contain all of a list of tags (although the element is permitted to contain additional tags beyond the search requirement).
Here is what I have so far:
{
"query": {
"bool": {
"filter": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must": [
{
"bool": {
"must":{
"term": { "tags.name": "name1" },
"term": { "tags.value": "value1" }
}
}
},
{
"bool": {
"must":{
"term": { "tags.name": "name2" },
"term": { "tags.value": "value2" }
}
}
}
]
}
}
}
}
}
}
}
The problem with this approach is that it returns 0 hits with multiple tag values (it works fine for a single value). I believe that this is because the query is requiring that a tag have multiple names and values in order to match, which obviously can't happen. Does anyone know how to query for elements that contain all of a list of tags?
edit: this is using elasticsearch 5.0
We figured it out. The answer was to create two nested queries, instead of having two clauses to the same nested query.
{
"query":{
"bool":{
"must":[{
"nested":{
"path":"tags",
"query":{
"bool":{
"must":[
{"term":{"tags.name":"name1"}},
{"term":{"tags.value":"value1"}}
]
}
}
}
},
{
"nested":{
"path":"tags",
"query":{
"bool":{
"must":[
{"term":{"tags.name":"name2"}},
{"term":{"tags.value":"value2"}}
]
}
}
}
}]
}
}
}
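For a list of tags of any length, the one-nested-query-per-tag pattern above can be generated programmatically. A minimal Python sketch, where the function name has_all_tags_query and the (name, value) tuple input are assumptions of this example:

```python
def has_all_tags_query(tags):
    """Build a bool/must with one nested query per required (name, value) tag."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": "tags",
                            "query": {
                                "bool": {
                                    "must": [
                                        # both terms must hold for the SAME nested tag
                                        {"term": {"tags.name": name}},
                                        {"term": {"tags.value": value}},
                                    ]
                                }
                            },
                        }
                    }
                    for name, value in tags
                ]
            }
        }
    }


query = has_all_tags_query([("name1", "value1"), ("name2", "value2")])
```

Each nested clause is evaluated against the tag documents independently, so different tags can satisfy different clauses while the parent element must satisfy all of them.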

Elastic search - match phrase query on copy_to field

I'm trying to fix a multi-term search, specifically so that a document containing "This awesome excursion" is found when a user types "This awesom". I managed to get this working with a match phrase query in place of the query_string query used previously.
However, this now breaks searching on compound fields. To find users in the system I created a copy_to field that salutation, first name, middle name and last name are copied into. It looks like this:
"contact": {
"properties": {
"contact_full_name": {
"type": "string"
},
"first_name": {
"copy_to": "contact_full_name",
"fields": {
"raw": {
"analyzer": "case_insensitive_sort",
"ignore_above": 10922,
"type": "string"
}
},
"type": "string"
},
"last_name": {
"copy_to": "contact_full_name",
"fields": {
"raw": {
"analyzer": "case_insensitive_sort",
"ignore_above": 10922,
"type": "string"
}
},
"type": "string"
},
"middle_name": {
"copy_to": "contact_full_name",
"fields": {
"raw": {
"analyzer": "case_insensitive_sort",
"ignore_above": 10922,
"type": "string"
}
},
"type": "string"
},
"salutation": {
"copy_to": "contact_full_name",
"fields": {
"raw": {
"analyzer": "case_insensitive_sort",
"ignore_above": 10922,
"type": "string"
}
},
"type": "string"
}
}
}
Now, I then post the following query:
{
"fields": "id",
"from": 0,
"query": {
"filtered": {
"query": {
"bool": {
"minimum_should_match": "1",
"must_not": {
"term": {
"deleted": "true"
}
},
"should": {
"match": {
"contact_full_name": {
"query": "John G",
"type": "phrase_prefix"
}
}
}
}
}
}
},
"size": 50
}
And I get no results. If I use "John" as the term, the result is returned fine. I suspect it has to do with the order in which the fields are copied. Is there a way to work around this, or do I need a different kind of search to make this work?
The values stored:
first name: John
last name: Smith
middle name: Glenn
salutation: Mr

How can I retrieve matching children only?

Consider a very simple model where we have locations and each location can have zero or more events. A location would have properties such as name, description and geo point data (lon/lat). An event should be attached to one location (its parent) and should have a name and description.
{
"location" : {
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" },
"geo": { "type": "geo_point" },
"exhibits": {
"type": "nested",
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
}
}
What I want to be able to do, is to query for the child documents (events) performing a full text search on their names and descriptions. I would like to get the matching events back and be able to also get their parent location's name. I would also like to narrow down the result set by location's coordinates. I don't want to get any events that do not match the query. Is that possible in Elastic Search? What types of queries should I use?
I have tried putting events as an array property under location (see above) and using the nested query but it does not return the kind of results I want (I think it returns the whole location, including all events, even the ones that do not match my query). I have tried putting events into a separate index (mapping?) providing the _parent property and then performing the top_children query on locations, but I don't get any results.
{
"exhibit": {
"_parent": { "type": "locations" },
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
Could anyone shed some light? I don't know where to begin...
Here's the working solution to my problem, perhaps it will be useful to somebody.
Location mapping:
{
"location" : {
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" },
"geo": { "type": "geo_point" }
}
}
}
Exhibit mapping:
{
"exhibit": {
"_parent": { "type": "locations" },
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
Query:
{
"fields": [ "_parent", "name", "_source" ],
"query": {
"bool": {
"should": [
{ "text": { "name": "candy" } },
{ "text": { "description": "candy" } }
]
}
},
"filter": {
"and": [
{
"terms" : {
"_parent": [ "4e7089a9b97d640b30695b7a", "4e7089eeb97d640b30695b7b" ]
}
},
{ "range": { "start": { "lte": "2011-09-22" } } },
{ "range": { "end": { "gte": "2011-09-22" } } }
]
}
}
You should query using the _parent field and pass it an array of the IDs of the locations to which you want to limit the exhibits.
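A sketch of building that request in Python for a given set of location IDs. It mirrors the legacy query shape shown above (the text query and the top-level and filter come from the original and may not be accepted by newer Elasticsearch versions); the function name exhibits_for_locations is hypothetical.

```python
def exhibits_for_locations(text, location_ids, day):
    """Search exhibits by text, limited to the given parent location IDs
    and to exhibits that are running on `day` (an ISO date string)."""
    return {
        "fields": ["_parent", "name", "_source"],
        "query": {
            "bool": {
                "should": [
                    {"text": {"name": text}},
                    {"text": {"description": text}},
                ]
            }
        },
        "filter": {
            "and": [
                # restrict children to the chosen parent locations
                {"terms": {"_parent": list(location_ids)}},
                {"range": {"start": {"lte": day}}},
                {"range": {"end": {"gte": day}}},
            ]
        },
    }


query = exhibits_for_locations(
    "candy",
    ["4e7089a9b97d640b30695b7a", "4e7089eeb97d640b30695b7b"],
    "2011-09-22",
)
```

The location IDs would typically come from a first query against the locations, for example one filtered by coordinates.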
