Elasticsearch: match phrase query on copy_to field

I'm trying to fix multi-term search, specifically so that a document containing "This awesome excursion" is found when a user types "This awesom". I managed to get this working with a match phrase query in place of the query_string query used previously.
However, this breaks searching on compound fields. For finding users in the system I created a copy_to field that is populated from salutation, first name, middle name and last name. The mapping looks like this:
"contact": {
"properties": {
"contact_full_name": {
"type": "string"
},
"first_name": {
"copy_to": "contact_full_name",
"fields": {
"raw": {
"analyzer": "case_insensitive_sort",
"ignore_above": 10922,
"type": "string"
}
},
"type": "string"
},
"last_name": {
"copy_to": "contact_full_name",
"fields": {
"raw": {
"analyzer": "case_insensitive_sort",
"ignore_above": 10922,
"type": "string"
}
},
"type": "string"
},
"middle_name": {
"copy_to": "contact_full_name",
"fields": {
"raw": {
"analyzer": "case_insensitive_sort",
"ignore_above": 10922,
"type": "string"
}
},
"type": "string"
},
"salutation": {
"copy_to": "contact_full_name",
"fields": {
"raw": {
"analyzer": "case_insensitive_sort",
"ignore_above": 10922,
"type": "string"
}
},
"type": "string"
}
}
}
I then post the following query:
{
"fields": "id",
"from": 0,
"query": {
"filtered": {
"query": {
"bool": {
"minimum_should_match": "1",
"must_not": {
"term": {
"deleted": "true"
}
},
"should": {
"match": {
"contact_full_name": {
"query": "John G",
"type": "phrase_prefix"
}
}
}
}
}
}
},
"size": 50
}
And I get no results. If I use "John" as the term, it returns the result fine. I have a feeling it has to do with the order in which the fields are copied. Is there a way to work around this, or do I need to use a different kind of search to make this work?
The values stored:
first name: John
last name: Smith
middle name: Glenn
salutation: Mr

Related

Elasticsearch Not Returning Expected Results For Singular vs Plural

I'm currently unable to get any hits for a particular search term, and it's a bit perplexing to me:
Term: navy flower
The query winds up looking like:
(name: "navy flower"~5 OR sku: "navy flower"~10 OR description: "navy flower"~5)
No hits.
If I change the term to: navy flowers
I get 3 hits with it.
The mappings I currently have set up on the index are as follows:
{
"mappings": {
"_doc": {
"properties": {
"active": {
"type": "long"
},
"description": {
"type": "text"
},
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"sku": {
"type": "text"
},
"upc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
I must be missing something obvious for the match not to work on the singular vs plural form of the word.
As per your index mapping, you have not specified any analyzer, which means Elasticsearch uses the standard analyzer by default. The standard analyzer does not do stemming, as by default it has only two token filters:
Lower Case Token Filter
Stop Token Filter (by default disabled)
To support your use case, you need a stemmer token filter in the analyzer. You can create a custom analyzer and configure it on the required field:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"stemmer"
]
}
}
}
},
"mappings": {
"properties": {
"active": {
"type": "long"
},
"description": {
"type": "text"
},
"id": {
"type": "integer"
},
"name": {
"type": "text",
"analyzer": "my_analyzer"
},
"sku": {
"type": "text"
},
"upc": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
After this, you can search with the query below:
GET test/_search?q=(name:"navy flower"~5 OR sku: "navy flower"~10 OR description: "navy flower"~5)
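To see why the stemmer matters for the phrase match, here is a minimal Python sketch of the idea. It is a toy model, not Elasticsearch's analysis chain: analyze and phrase_matches are hypothetical helpers, the regex split stands in for the standard tokenizer, and the "stemmer" only strips one trailing "s", which is far cruder than the real stemmer token filter.

```python
import re


def analyze(text, stem=False):
    """Toy analyzer: split on non-alphanumerics, lowercase, optionally stem."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]
    if stem:
        # Hypothetical, deliberately naive stemmer: drop one trailing "s".
        tokens = [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]
    return tokens


def phrase_matches(query, document, stem=False):
    """True if the analyzed query tokens appear as a contiguous run in the document."""
    q = analyze(query, stem)
    d = analyze(document, stem)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Without stemming, "flower" and "flowers" are different index terms.
print(phrase_matches("navy flower", "Navy Flowers Print"))
# With the same stemming applied to query and document, both become "flower".
print(phrase_matches("navy flower", "Navy Flowers Print", stem=True))
```

The point of the sketch is that the same analyzer has to run at index time and at search time, which is why the documents need to be re-indexed after the mapping change.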

Elasticsearch query for multiple fields terms and multiple fields must not match query

My requirements are:
I need to use multiple fields in must_not match conditions.
I need to use multiple fields in a terms query.
When I run the query below, I get the following error for must_not:
"reason": "[match] query doesn't support multiple fields, found [app_rating] and [satisfaction_rating]"
For the terms on multiple fields I also get an error:
"reason": "Expected [START_OBJECT] under [should], but got a [START_ARRAY] in [MyBuckets]",
How can I correct the query?
{
"size":0,
"_source":["comments.keyword"],
"query":{
"bool": {
"must": [
{"match":{"source.keyword": "ONA"}}
],
"must_not":[
{"match":{"app_rating":"0","satisfaction_rating":"0","usability_rating": "0"}}
]
}
},
"aggs": {
"MyBuckets": {
"should":[{
"terms": {
"fields": ["comments.keyword"]
}
},
{
"terms":{
"fields": ["app_rating"]
}
},
{
"terms":{
"fields": ["satisfaction_rating"]
}
},
{
"terms":{
"fields": ["usability_rating"]
}
}
],
"order":{
"_count": "desc"
},
"size": "10"
}
}
}
Below are the sample mapping details:
{
"mapping": {
"properties": {
"Id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"app_rating": {
"type": "long"
},
"comments": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"conversation_rating": {
"type": "long"
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"satisfaction_rating": {
"type": "long"
},
"source": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"timestamp": {
"type": "long"
},
"usability_rating": {
"type": "long"
}
}
}
}
You can't use multiple fields in a single match query, so you have to add one match clause per field inside must_not. You are also trying to run terms aggregations on several fields, but the syntax is not correct (each terms aggregation takes a single field), which is what causes the exception.
Can you provide your index mapping and sample docs, so that I can provide a working example?
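As a rough sketch of what a corrected request body could look like, the Python below builds one match clause per rating field and one terms aggregation per field. The helper name build_search_body and the aggregation names (by_app_rating and so on) are made up for illustration, and the sketch assumes the goal is to exclude a document when any of the three ratings is 0; if only documents where all three are 0 should be excluded, the clauses would need to be combined differently.

```python
import json

RATING_FIELDS = ["app_rating", "satisfaction_rating", "usability_rating"]
BUCKET_FIELDS = ["comments.keyword"] + RATING_FIELDS


def build_search_body(source="ONA", bucket_size=10):
    """Build a search body with one clause and one aggregation per field."""
    # One match query per field: a match query accepts exactly one field.
    must_not = [{"match": {field: 0}} for field in RATING_FIELDS]

    # One terms aggregation per field, each with a hypothetical name.
    aggs = {
        "by_" + field.replace(".", "_"): {
            "terms": {
                "field": field,
                "size": bucket_size,
                "order": {"_count": "desc"},
            }
        }
        for field in BUCKET_FIELDS
    }

    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [{"match": {"source.keyword": source}}],
                "must_not": must_not,
            }
        },
        "aggs": aggs,
    }


print(json.dumps(build_search_body(), indent=2))
```

The printed JSON can be sent as the body of a _search request; each aggregation then returns its own list of buckets.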

Elasticsearch query for multiple terms

I am trying to create a search query that allows searching by name and type.
I have indexed the values, and a record in Elasticsearch looks like this:
{
_index: "assets",
_type: "asset",
_id: "eAOEN28BcFmQazI-nngR",
_score: 1,
_source: {
name: "test.png",
mediaType: "IMAGE",
meta: {
content-type: "image/png",
width: 3348,
height: 1890,
},
createdAt: "2019-12-24T10:47:15.727Z",
updatedAt: "2019-12-24T10:47:15.727Z",
}
}
So how would I create, for example, a query that finds all assets that have the name "test" and are images?
I tried a multi_match query, but that did not return the correct results:
{
"query": {
"multi_match" : {
"query": "*test* IMAGE",
"type": "cross_fields",
"fields": [ "name", "mediaType" ],
"operator": "and"
}
}
}
The query above returns 0 results, and if I change the operator to "or" it returns all the assets of type IMAGE.
Any suggestions would be greatly appreciated. TIA!
EDIT: Added Mapping
Below is the mapping:
{
"assets": {
"aliases": {},
"mappings": {
"properties": {
"__v": {
"type": "long"
},
"createdAt": {
"type": "date"
},
"deleted": {
"type": "date"
},
"mediaType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"meta": {
"properties": {
"content-type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"width": {
"type": "long"
},
"height": {
"type": "long"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"originalName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updatedAt": {
"type": "date"
}
}
},
"settings": {
"index": {
"creation_date": "1575884312237",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "nSiAoIIwQJqXQRTyqw9CSA",
"version": {
"created": "7030099"
},
"provided_name": "assets"
}
}
}
}
You are unnecessarily using a wildcard expression for this simple query.
First, change the analyzer on the name field.
You need a custom analyzer that replaces . with a space, because the default standard analyzer doesn't do that. Then a search for test finds test.png, since both test and png are in the inverted index. The main benefit is that you avoid regex queries, which are very costly.
Here is the updated mapping with a custom analyzer that does this. Update your mapping and re-index all the documents.
{
"aliases": {},
"mappings": {
"properties": {
"__v": {
"type": "long"
},
"createdAt": {
"type": "date"
},
"deleted": {
"type": "date"
},
"mediaType": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"meta": {
"properties": {
"content-type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"width": {
"type": "long"
},
"height": {
"type": "long"
}
}
},
"name": {
"type": "text",
"analyzer" : "my_analyzer"
},
"originalName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"updatedAt": {
"type": "date"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"replace_dots"
]
}
},
"char_filter": {
"replace_dots": {
"type": "mapping",
"mappings": [
". => \\u0020"
]
}
}
},
"index": {
"number_of_shards": "1",
"number_of_replicas": "1"
}
}
}
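The effect of the replace_dots char filter can be illustrated with a small Python sketch. This is only a toy model: the functions replace_dots and tokens are hypothetical, and a plain whitespace split stands in for the standard tokenizer, which (as described above) keeps test.png as a single token.

```python
def replace_dots(text):
    """Mimic the mapping char filter: turn every '.' into a space."""
    return text.replace(".", " ")


def tokens(text, use_char_filter):
    """Toy tokenization: optionally apply the char filter, then split on whitespace."""
    if use_char_filter:
        text = replace_dots(text)
    return text.split()


# Without the char filter the file name stays one term, so "test" is not in the index.
print(tokens("test.png", use_char_filter=False))
# With the char filter both "test" and "png" become separate terms.
print(tokens("test.png", use_char_filter=True))
```

Note that the analyzer in the mapping above has no lowercase filter, so in this setup the terms keep their original case.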
Second, you should change your query to a bool query as below:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "test"
}
},
{
"match": {
"mediaType.keyword": "IMAGE"
}
}
]
}
}
}
This uses must with two match queries, which means it returns documents only when every clause of the must query matches.
I already tested my solution by creating the index, inserting a few sample docs and querying them. Let me know if you need any help.
Did you try best_fields?
{
"query": {
"multi_match" : {
"query": "Will Smith",
"type": "best_fields",
"fields": [ "name", "mediaType" ],
"operator": "and"
}
}
}

Elasticsearch: Full text search

I'm trying to build an Elasticsearch full-text search query with the following text "Gold Cartier watches" on multiple fields.
I have to follow this rule: first find all "Gold" documents. From the retrieved "Gold" documents, find all "Cartier" documents, and from those, find all "watches" documents.
This is my multi_match query:
{
"query": {
"multi_match": {
"query": "Fred or rose",
"fields": [
"name",
"status",
"categories.name",
"brand.name",
"reference"
]
}
}
}
Here is my mapping:
{
"product": {
"mappings": {
"product": {
"dynamic_date_formats": [],
"properties": {
"available": {
"type": "text"
},
"brand": {
"properties": {
"available": {
"type": "text"
},
"name": {
"type": "keyword"
},
"shopProductBrands": {
"properties": {
"available": {
"type": "text"
},
"priority": {
"type": "integer"
},
"slug": {
"type": "keyword"
}
}
},
"slug": {
"type": "keyword"
}
}
},
"categories": {
"type": "nested",
"properties": {
"available": {
"type": "text"
},
"brand": {
"properties": {
"available": {
"type": "text"
},
"name": {
"type": "keyword"
},
"slug": {
"type": "keyword"
}
}
},
"name": {
"type": "keyword"
},
"parent": {
"type": "keyword"
},
"slug": {
"type": "keyword"
}
}
},
"createdAt": {
"type": "date",
"format": "date_time_no_millis"
},
"longDescription": {
"type": "text",
"analyzer": "french_search"
},
"name": {
"type": "text",
"boost": 15,
"fields": {
"raw": {
"type": "keyword"
}
},
"analyzer": "french_search"
},
"purchasePrice": {
"type": "double"
},
"rawPrice": {
"type": "double"
},
"reference": {
"type": "keyword",
"boost": 10
},
"shortDescription": {
"type": "text",
"boost": 3,
"analyzer": "french_search"
},
"slug": {
"type": "keyword"
},
"status": {
"type": "text"
},
"updatedAt": {
"type": "date",
"format": "date_time_no_millis"
}
}
}
}
}
}
My search will retrieve all "Gold", "Cartier" and "watches" documents combined.
How can I build a query that follows my rule?
Thanks
I'm not sure that there's an easy solution. I think the closest you can get is to use cross_fields with "operator": "and" and only search fields that have the same analyzer. Can you add "french_search" versions of each of these fields?
cross_fields analyzes the query string into individual terms, then
looks for each term in any of the fields, as though they were one big
field.
However:
The cross_field type can only work in term-centric mode on fields that
have the same analyzer. ... If there are multiple groups, they are
combined with a bool query.
So this query:
{
"query": {
"multi_match": {
"type": "cross_fields",
"query": "gold Cartier watches",
"fields": [
"name",
"status",
"categories.name",
"brand.name",
"reference"
]
}
}
}
Will become something like this:
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "gold Cartier watches",
"fields": ["name"]
}
},
{
"multi_match": {
"query": "gold Cartier watches",
"fields": ["status"]
}
},
{
"multi_match": {
"query": "gold Cartier watches",
"fields": [
"categories.name",
"brand.name",
"reference"
]
}
}
]
}
}
That query is too loose, but adding "operator": "and" or "minimum_should_match": "100%" would be too strict.
It's not pretty or efficient, but you could do application-side term parsing and build a boolean query. Something like this:
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "gold",
"fields": [
"name",
"status",
...
"reference"
]
}
},
{
"multi_match": {
"query": "Cartier",
"fields": [
"name",
"status",
...
"reference"
]
}
}
...
]
}
}
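The application-side term parsing described above could be sketched in Python roughly as follows. The function name build_all_terms_query is made up for illustration, and the sketch assumes that splitting on whitespace is good enough for the search box input.

```python
import json


def build_all_terms_query(text, fields):
    """Build a bool query that requires every whitespace-separated term.

    Each term gets its own multi_match clause, so a term may match in any
    of the fields, but all terms have to match somewhere in the document.
    """
    must = [
        {"multi_match": {"query": term, "fields": list(fields)}}
        for term in text.split()
    ]
    return {"query": {"bool": {"must": must}}}


fields = ["name", "status", "categories.name", "brand.name", "reference"]
body = build_all_terms_query("gold Cartier watches", fields)
print(json.dumps(body, indent=2))
```

Because every clause sits under must, the result narrows with each extra word, which is the "Gold, then Cartier, then watches" behaviour asked for in the question.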
You can use this approach
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_boolean_operators
The preferred operators are + (this term must be present) and - (this term must not be present). All other terms are optional. For example, this query:
quick brown +fox -news
states that:
fox must be present
news must not be present
quick and brown are optional — their presence increases the relevance
The familiar boolean operators AND, OR and NOT (also written &&, || and !) are also supported but beware that they do not honor the usual precedence rules, so parentheses should be used whenever multiple operators are used together. For instance, the previous query could be rewritten as:
((quick AND fox) OR (brown AND fox) OR fox) AND NOT news
You can also use boosting to give more weight to a specific term: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_boosting
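Applied to the "Gold Cartier watches" case, the + operator could be used along these lines. This is a minimal Python sketch: build_required_terms_query is a hypothetical helper, and it assumes the user input contains no characters that are reserved in query_string syntax (those would need escaping first).

```python
import json


def build_required_terms_query(text, fields):
    """Prefix every term with '+' so that query_string treats it as required."""
    required = " ".join("+" + term for term in text.split())
    return {"query": {"query_string": {"query": required, "fields": list(fields)}}}


body = build_required_terms_query("gold Cartier watches", ["name", "status", "reference"])
print(json.dumps(body, indent=2))
```

This keeps the term handling inside Elasticsearch instead of building one clause per term in the application.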

Search by size of object type field elastic search

I have a mapping like this:
{
"post": {
"properties": {
"author_gender": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs"
},
"author_link": {
"type": "string",
"index": "no"
},
"content": {
"type": "string"
},
"mentions": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string"
},
"profile_image_url": {
"type": "string"
},
"screen_name": {
"type": "string"
}
}
}
}
}
I need to search by the size of the mentions object. I have tried this:
{
"filter": {
"script": {
"script": "doc['mentions'].values.length == 2"
}
}
}
This does not work. It gives an error:
nested: ElasticSearchIllegalArgumentException[No field found for
[mentions] in mapping with types [post]];
I have also tried replacing the script part with doc['mentions.id'].value.length == 2. That also fails, with:
nested: ArrayIndexOutOfBoundsException[10];
How can I query records whose mentions object has size 2?
The Elasticsearch guide recommends using size() instead of length for objects. So try this:
{
"filter": {
"script": {
"script": "doc['mentions.id'].values.size() == 2"
}
}
}
All the best.
Elasticsearch does not store an array of objects the way you might expect. Instead, a separate array of values is stored for each property of the objects.
That is why you must use a "sub array" (a sub-field of the object) instead.
For me, it works like this:
ES version - 7.11.1
I have mapping like this:
{
"items": {
"properties": {
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"quantity": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
My Query is like this:
{
"filter": {
"script": {
"script": "doc['items.description.keyword'].size() == 2"
}
}
}
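The storage model described above, with one array of values per object property rather than an array of objects, can be pictured with a short Python sketch. It is a simplified illustration and not how Elasticsearch is implemented; flatten_object_array and the sample values are made up.

```python
def flatten_object_array(field, objects):
    """Turn a list of objects into one list of values per dotted sub-field."""
    columns = {}
    for obj in objects:
        for key, value in obj.items():
            columns.setdefault(f"{field}.{key}", []).append(value)
    return columns


# Hypothetical post with two mentions.
mentions = [
    {"id": 11, "screen_name": "alice"},
    {"id": 22, "screen_name": "bob"},
]
columns = flatten_object_array("mentions", mentions)

print(columns)                      # one list per sub-field
print("mentions" in columns)        # there is no entry for the parent object itself
print(len(columns["mentions.id"]))  # counting a sub-field gives the number of values
```

In this picture a script can only count the values of a concrete sub-field such as mentions.id, so objects that lack that sub-field are not counted.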
I don't know why, but it seems that Elasticsearch does not store an array as an array of distinct objects. This pattern worked for me. Instead of
"script": "doc['some_array'].values.size()"
I used this:
"script": "doc['some_array.any_field'].size()"
This should work:
"script": "doc['mentions.id'].size() == 2"
or
"script": "doc['mentions.id'].length == 2"
