Elasticsearch Nested More Like This Query

Is it possible to perform a More Like This query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html) on text inside a nested datatype (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html)?
The document that I'd like to query (I have no control over its format, since the data is owned by another party) looks something like this:
{
"communicationType": "Email",
"timestamp": 1497633308917,
"textFields": [
{
"field": "Subject",
"text": "This is the subject of the email"
},
{
"field": "To",
"text": "to-email#domain.com"
},
{
"field": "Body",
"text": "This is the body of the email"
}
]
}
I would like to perform a More Like This query on the body of the email. Before, the documents used to look like this:
{
"communicationType": "Email",
"timestamp": 1497633308917,
"textFields": {
"subject": "This is the subject of the email",
"to": "to-email#domain.com",
"body": "This is the body of the email"
}
}
And I was able to perform a More Like This query on the email body like this:
{
"query": {
"bool": {
"must": {
"more_like_this": {
"fields": ["textFields.body"],
"like": "This is a similar body of an email",
"min_term_freq": 1
}
},
"filter": [
{ "term": { "communicationType": "Email" } },
{ "range": { "timestamp": { "gte": 1497633300000 } } }
]
}
}
}
But now that data source has been deprecated, and I need to perform an equivalent query on the new data source, which has the email body in the nested datatype. I only want to compare the text to the "text" fields that have a "field" of "Body".
Is this possible? If so, what would the query look like? And would there be a major performance hit from running the query on the nested datatype compared to the earlier non-nested document? Even after applying the timestamp and communicationType filters, there will still be tens of millions of documents that each query would need to compare the like text against, so performance matters.

Actually, it turned out to be straightforward to use a More Like This query inside a nested query:
{
"query": {
"bool": {
"must": {
"nested": {
"path": "textFields",
"query": {
"bool": {
"must": {
"more_like_this": {
"fields": ["textFields.text"],
"like": "This is a similar body of an email",
"min_term_freq": 1
}
},
"filter": {
"term": { "textFields.field": "Body" }
}
}
}
}
},
"filter": [
{
"term": {
"communicationType": "Email"
}
},
{
"range": {
"timestamp": {
"gte": 1497633300000
}
}
}
]
}
},
"min_score": 2
}
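As a minimal sketch, the same request body can be assembled in Python so that the like text, the nested "field" value, and the timestamp floor become parameters. The function name nested_mlt_query and its parameter names are hypothetical, and the sketch uses the "like" parameter from the question's original query. It only builds a dict and does not contact a cluster.

```python
import json


def nested_mlt_query(like_text, field_value="Body", since_ms=1497633300000):
    """Build a nested More Like This request body as a plain dict.

    The helper name and its parameters are illustrative assumptions;
    the field names come from the documents shown in the question.
    """
    return {
        "query": {
            "bool": {
                "must": {
                    "nested": {
                        "path": "textFields",
                        "query": {
                            "bool": {
                                # Score by similarity to the supplied text.
                                "must": {
                                    "more_like_this": {
                                        "fields": ["textFields.text"],
                                        "like": like_text,
                                        "min_term_freq": 1,
                                    }
                                },
                                # Only consider nested entries tagged as the body.
                                "filter": {
                                    "term": {"textFields.field": field_value}
                                },
                            }
                        },
                    }
                },
                # Non-scoring filters on the parent document.
                "filter": [
                    {"term": {"communicationType": "Email"}},
                    {"range": {"timestamp": {"gte": since_ms}}},
                ],
            }
        },
        "min_score": 2,
    }


if __name__ == "__main__":
    body = nested_mlt_query("This is a similar body of an email")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Keeping the "Body" condition in filter context means it narrows the nested entries without contributing to the score.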

Related

Search-as-you-type inside arrays

I am trying to implement a search-as-you-type query inside an array.
This is the structure of the documents:
{
"guid": "6f954d53-df57-47e3-ae9e-cb445bd566d3",
"labels":
[
{
"name": "London",
"lang": "en"
},
{
"name": "Llundain",
"lang": "cy"
},
{
"name": "Lunnainn",
"lang": "gd"
}
]
}
and so far this is what I have come up with:
{
"query": {
"multi_match": {
"fields": ["labels.name"],
"query": name,
"type": "phrase_prefix"
}
}
}
which works exactly as requested.
The problem is that I would also like to search by language.
What I tried is:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
but these queries act on separate values of the array.
So, for example, I would like to search only the Welsh language (cy). That means the query containing the city name should match only values that have "cy" in the "lang" field.
How do I write this kind of query?
Internally, Elasticsearch flattens arrays of JSON objects, so it can't correlate the lang and name of a specific element in the labels array. If you want this kind of correlation, you'll need to index your documents differently.
The usual way to do this is to use the nested data type with a matching nested query.
The query would end up looking something like this:
{
"query": {
"nested": {
"path": "labels",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
}
}
But note that you'll need to also specify nested mappings for your labels, e.g.:
"properties": {
"labels": {
"type": "nested",
"properties": {
"name": {
"type": "text"
/* you might want to add other mapping-related configuration here */
},
"lang": {
"type": "keyword"
}
}
}
}
Other ways to do this include:
Indexing each label as a separate document, repeating the guid field
Using parent/child documents
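The flattening described at the start of this answer can be illustrated with a small pure-Python simulation. This is only a rough model of the behaviour, not how Elasticsearch is implemented: the function names are hypothetical, lowercasing stands in for text analysis, and a prefix check stands in for phrase_prefix matching.

```python
doc = {
    "labels": [
        {"name": "London", "lang": "en"},
        {"name": "Llundain", "lang": "cy"},
        {"name": "Lunnainn", "lang": "gd"},
    ]
}


def flatten(document):
    """Mimic object-type indexing: one multi-valued field per leaf key."""
    flat = {}
    for label in document["labels"]:
        for key, value in label.items():
            flat.setdefault("labels." + key, []).append(value.lower())
    return flat


def matches_flattened(document, name_prefix, lang):
    """Each condition is checked against the whole array independently."""
    flat = flatten(document)
    name_ok = any(n.startswith(name_prefix) for n in flat["labels.name"])
    lang_ok = lang in flat["labels.lang"]
    return name_ok and lang_ok


def matches_nested(document, name_prefix, lang):
    """Both conditions must hold within the same array element."""
    return any(
        label["name"].lower().startswith(name_prefix) and label["lang"] == lang
        for label in document["labels"]
    )


if __name__ == "__main__":
    # "london" belongs to the "en" label, yet the flattened check accepts "gd".
    print(matches_flattened(doc, "london", "gd"))  # True
    print(matches_nested(doc, "london", "gd"))     # False
```

The flattened check succeeds because the name and the language are found in different array elements, which is the cross-element match the question describes. The per-element check is what the nested mapping and nested query provide.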
You should use the nested datatype in the mapping instead of the object datatype. For a detailed explanation, refer to:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
So, you should define the mapping of your field something like this:
{
"properties": {
"labels": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"lang": {
"type": "keyword"
}
}
}
}
}
After this, you can query using a nested query:
{
"query": {
"nested": {
"path": "labels",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
}
}

Multi match query with terms lookup searching multiple indices elasticsearch 6.x

All,
I am working on building a NEST 6.x query that takes a search term and looks in different fields in different indices.
This is what I have so far, but it is not returning the results I expect.
Please see the details below.
Indices used:
dev-sample-search
user-agents-search
The search should work as follows:
The value in the query field (27921093) is searched against the fields agentNumber, customerName, fileNumber, and documentid (these are all analyzed fields).
The search should limit the documents to the agentNumbers that the user sampleuser#gmail.com has access to (sample data for user-agents-search is added below).
agentNumber, customerName, fileNumber, documentid, and status are part of the index dev-sample-search.
The status field is defined as a keyword.
The fields in the user-agents-search index are all keywords.
Sample user-agents-search index data:
{
"id": "sampleuser#gmail.com",
"user": "sampleuser#gmail.com",
"agentNumber": [
"123.456.789",
"1011.12.13.14"
]
}
Sample dev-sample-search index data:
{
"agentNumber": "123.456.789",
"customerName": "Bank of america",
"fileNumber":"test_file_1123",
"documentid":"1234456789"
}
GET dev-sample-search/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"multi_match": {
"type": "best_fields",
"query": "27921093",
"operator": "and",
"fields": [
"agentNumber",
"customerName",
"fileNumber",
"documentid^10"
]
}
}
],
"filter": [
{
"bool": {
"must": [
{
"terms": {
"agentNumber": {
"index": "user-agents-search",
"type": "_doc",
"user": "sampleuser#gmail.com",
"path": "agentNumber"
}
}
},
{
"bool": {
"must_not": [
{
"term": {
"status": {
"value": "pending"
}
}
},
{
"term": {
"status": {
"value": "cancelled"
}
}
},
{
"term": {
"status": {
"value": "app cancelled"
}
}
}
],
"should": [
{
"term": {
"status": {
"value": "active"
}
}
},
{
"term": {
"status": {
"value": "terminated"
}
}
}
]
}
}
]
}
}
]
}
}
}
I see a couple of things that you may want to look at:
In the terms lookup query, "user": "sampleuser#gmail.com" should be "id": "sampleuser#gmail.com".
If at least one of the should clauses inside the filter must match, set "minimum_should_match": 1 on the bool query that contains the should clauses.
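The two suggestions above can be sketched as a small Python helper that builds the filter portion of the request. The helper name agent_access_filter is hypothetical, and the index, type, field, and status values are taken from the question. The sketch only constructs a dict and has not been run against a 6.x cluster.

```python
def agent_access_filter(user_id):
    """Build the filter restricting hits to a user's agents and statuses.

    Hypothetical helper: applies the terms lookup by document ID and an
    explicit minimum_should_match on the status clauses.
    """
    excluded = ["pending", "cancelled", "app cancelled"]
    allowed = ["active", "terminated"]
    return {
        "bool": {
            "must": [
                {
                    "terms": {
                        "agentNumber": {
                            "index": "user-agents-search",
                            "type": "_doc",
                            # Terms lookup fetches a document by its ID,
                            # so the key is "id" rather than "user".
                            "id": user_id,
                            "path": "agentNumber",
                        }
                    }
                },
                {
                    "bool": {
                        "must_not": [
                            {"term": {"status": {"value": s}}} for s in excluded
                        ],
                        "should": [
                            {"term": {"status": {"value": s}}} for s in allowed
                        ],
                        # Require at least one allowed status to match.
                        "minimum_should_match": 1,
                    }
                },
            ]
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(agent_access_filter("sampleuser#gmail.com"), indent=2))
```

The returned dict would take the place of the single entry in the "filter" array of the original request.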

Elasticsearch filter on nested set

I'm having trouble figuring out how to filter on nested sets. I have this in my index:
PUT /testing
PUT /testing/_mapping/product
{
"product": {
"properties": {
"features": { "type": "nested" }
}
}
}
POST /testing/product
{
"productid": 123,
"features": [
{
"name": "Weight",
"nameslug": "weight",
"value": "10",
"valueslug": "10-kg"
},
{
"name": "Weight",
"nameslug": "weight",
"value": "12",
"valueslug": "12-kg"
}
]
}
I need to filter on value but I get the valueslug from the url. So far I have the following code:
POST _search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "features",
"query": {
"bool": {
"filter": [
{
"range": {
"features.value": { "gte": ??? }
}
}
]
}
}
}
}
]
}
}
}
The difficult part is resolving the valueslug to the actual value. I have looked into Script Query using doc_value, but the problem is that the script is executed within the current nested document. It would be possible by executing two queries, but I am trying to avoid that if possible.
I get the feeling that the solution lies in how the documents are structured, but I have no clue how I could structure this any differently.
I hope someone can point me in the right direction.
Thanks in advance!

Elasticsearch: Conditionally filter query on fields if they exist in multi-index query

I have a query for a general search which spans multiple indices. Some of the indices have a field called is_published and some have a field called date_review, some have both.
I'm struggling to write a query which will search across fields and filter on the fields mentioned above but only if they exist. I have managed to achieve what I want on the individual fields using missing and/or exists, but it excludes the other variants.
In English, I want to keep documents in the result where:
is_published is true OR the field does not exist
date_review is in the future OR the field does not exist
So, if a document has is_published and it's false, remove it. If a document has date_review in the past, remove it. If it has is_published == false and date_review is in the future, remove it.
I hope this makes sense?
For the purpose of answering, assume the documents might look like this:
// Has `is_published` flag
{
"label": "My document",
"body": "Lorem ipsum doler et sum.",
"is_published": true
}
// Has `date_review` flag
{
"label": "My document",
"body": "Lorem ipsum doler et sum.",
"date_review": "2017-01-01"
}
// Has both `is_published` and `date_review` flags
{
"label": "My document",
"body": "Lorem ipsum doler et sum.",
"is_published": true,
"date_review": "2017-01-01"
}
At the moment, my [unfiltered] query looks like this:
{
"index": "index-1,index-2,index-3",
"type": "item",
"body": {
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "my search phrase",
"type": "phrase_prefix",
"fuzziness": null,
"fields": [
"label^3",
"body"
]
}
},
"filter": []
}
}
}
}
Very grateful for any pointers.
Thanks.
You can try a query like this one:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "my search phrase",
"type": "phrase_prefix",
"fuzziness": null,
"fields": [
"label^3",
"body"
]
}
},
"filter": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"missing": {
"field": "is_published"
}
},
{
"term": {
"is_published": true
}
}
]
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"missing": {
"field": "date_review"
}
},
{
"range": {
"date_review": {
"gt": "now"
}
}
}
]
}
}
]
}
}
}
}
}
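The query above relies on the filtered query and the missing filter, both of which were removed in Elasticsearch 5.0. As a hedged sketch for later versions, the same "field is absent OR condition holds" logic can be written with a bool query whose filter wraps must_not/exists. The helper names below are hypothetical, the "fuzziness": null entry is omitted, and the code only builds the request body as a dict.

```python
def optional_field_filter(field, condition):
    """Keep documents where `field` is absent OR `condition` holds."""
    return {
        "bool": {
            "should": [
                # Replacement for the removed "missing" filter.
                {"bool": {"must_not": {"exists": {"field": field}}}},
                condition,
            ],
            "minimum_should_match": 1,
        }
    }


def search_body(phrase):
    """Build a bool query: scored multi_match plus non-scoring filters."""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": phrase,
                        "type": "phrase_prefix",
                        "fields": ["label^3", "body"],
                    }
                },
                "filter": [
                    optional_field_filter(
                        "is_published", {"term": {"is_published": True}}
                    ),
                    optional_field_filter(
                        "date_review", {"range": {"date_review": {"gt": "now"}}}
                    ),
                ],
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(search_body("my search phrase"), indent=2))
```

Each entry in the filter array must match, so a document is dropped as soon as it has is_published set to false or a date_review in the past, which matches the rules listed in the question.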

ElasticSearch - multi-match with filter - returns no result

I have a problem: my search returns zero results when I add a filter to my JSON request:
{
"body":
{
"query":{
"multi_match":
{
"query":"Joe Jerick Aparments",
"fields":["name","Category","address","description"]}
},
"filter":
{
"source":"Category":"Apartments"
}
}
}
First things first:
Yes, there is already data.
Yes, there is no error.
Yes, there are no misspelled words.
Thanks!
{
index: "stores",
type: "stores",
id: "1",
body: {
name: "Joe Jerick Apartments",
Category: "Apartments",
address: "Somewhere down the road",
description: "Best apartment yet!"
}
}
I didn't catch this in my earlier comment, but if the fields you're querying are nested inside body in the stored document (not just in what is retrieved), you'll need a nested query to reach the fields listed. I'm not sure whether you're describing your mapping or what a match_all query returns.
If this is the case, you'll need body to be mapped as "nested", and then your query would look something like this:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "Joe Jerick Apartments",
"fields": [
"body.name",
"body.Category",
"body.address",
"body.description"
]
}
},
"filter": {
"term": {
"body.Category": "Apartments"
}
}
}
}
}
Alternatively, you could re-import your records with a flattened structure:
{
"id": "1",
"name": "Joe Jerick Apartments",
"Category": "Apartments",
"address": "Somewhere down the road",
"description": "Best apartment yet!"
}
Try this query instead:
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "Joe Jerick Apartments",
"fields": [
"name",
"Category",
"address",
"description"
]
}
},
"filter": {
"term": {
"Category": "Apartments"
}
}
}
}
}
