ElasticSearch Bool Query and Keyword Search - performance

We are executing queries similar to the one below against our Elasticsearch instance. The query runs against an index (mapping below) that contains approximately 3.4 million records. The data we are querying consists of strings of encrypted words, no more than 10,000 characters in length. We encrypt the words we are searching for and use the result as the search keyword. The search takes an incredibly long time (over a minute) to return results. Any help or suggestions on tuning our index or query would be appreciated.
The index mapping:
{
  "messagewords": {
    "aliases": {},
    "mappings": {
      "properties": {
        "MessageId": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "Words": {
          "type": "text"
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1649868562656",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "YFVbbow0R66dP3uR4hF9aQ",
        "version": {
          "created": "7060299"
        },
        "provided_name": "messagewords"
      }
    }
  }
}
The query:
{
  "from": 0,
  "_source": [
    "MessageId"
  ],
  "size": 10000,
  "track_total_hits": true,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "query_string": {
                  "query": " ((Words:\"*nsrFHeMTTBOeIUvkMrYDoA==sr8O8Rpnxn0hOZ88Mbtu4g==pUniFgw3thZ8lXlj68jHqw==XKin211F6GVXm/QzvB+iLQ==HYzhyEJpcldxo3h8Sea+yA==SwmUP1KNAG4YqGdg/KlLdw==nsrFHeMTTBOeIUvkMrYDoA==*\"))"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

The slowness comes from the query_string with leading and trailing wildcards, which has to examine a huge number of terms. Try the whitespace analyzer:
"Words": {"type": "text", "analyzer": "whitespace"}
together with a match_phrase query:
"match_phrase": {"Words": "nsrFHeMTTBOeIUvkMrYDoA== ... SwmUP1KNAG4YqGdg/KlLdw== nsrFHeMTTBOeIUvkMrYDoA=="}
Please note that you'll have to separate the encoded tokens with spaces for this to work.
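That splitting can be done client-side before indexing and before building the query. A sketch, assuming every encrypted token ends with base64 `==` padding, as all the tokens in the example query do:

```python
import re

def split_encoded_tokens(blob: str) -> str:
    """Split a run of concatenated base64 tokens into a space-separated
    string, assuming each token ends with '==' padding (as in the example
    query). The whitespace analyzer then indexes each token as one term."""
    tokens = re.findall(r".+?==", blob)
    return " ".join(tokens)

def match_phrase_query(blob: str) -> dict:
    """Build the match_phrase query body from the raw concatenated blob."""
    return {"query": {"match_phrase": {"Words": split_encoded_tokens(blob)}}}

blob = "nsrFHeMTTBOeIUvkMrYDoA==sr8O8Rpnxn0hOZ88Mbtu4g=="
print(split_encoded_tokens(blob))
# nsrFHeMTTBOeIUvkMrYDoA== sr8O8Rpnxn0hOZ88Mbtu4g==
```

The same helper has to run over the `Words` values at index time, so the stored text and the query text are tokenized identically.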

Related

Elasticsearch replacing cross_fields with combined field and fuzzy

We have an index which was previously searching a few fields such as this:
"query": {
  "bool": {
    "filter": [
      {
        "term": {
          "eventvisibility": "public"
        }
      }
    ],
    "should": [
      {
        "multi_match": {
          "query": "keyword",
          "fields": [
            "eventname",
            "venue.name",
            "venue.town"
          ],
          "type": "cross_fields",
          "minimum_should_match": "3<80%"
        }
      },
      {
        "match": {
          "eventdescshort": {
            "query": "keyword",
            "minimum_should_match": "2<80%"
          }
        }
      }
    ],
    "minimum_should_match": 1
  }
}
This works, but often fails due to spelling mistakes, with letters left off the keyword or transposed.
So I was hoping to implement fuzzy searching. As fuzziness doesn't work with cross_fields, I created a new field in the index:
"mappings": {
  "event": {
    "properties": {
      "basic_search": {
        "type": "text",
        "analyzer": "nameanalyzer"
      },
      "eventname": {
        "type": "text",
        "copy_to": "basic_search",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        },
        "analyzer": "nameanalyzer"
      },
      "venue": {
        "properties": {
          "name": {
            "type": "text",
            "copy_to": "basic_search",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            },
            "analyzer": "nameanalyzer"
          },
          ...snip (all fields previously in cross_fields now have copy_to: basic_search)...
}
And our analyzer is as follows:
"nameanalyzer": {
  "filter": [
    "lowercase",
    "stop",
    "english_possessive_stemmer",
    "english_minimal_stemmer",
    "synonym",
    "asciifolding",
    "word_delimiter"
  ],
  "char_filter": "html_strip",
  "type": "custom",
  "tokenizer": "standard"
}
I've now run a test search, as follows:
{
  "query": {
    "fuzzy": {
      "basic_search": {
        "value": "carers fair"
      }
    }
  }
}
However, this is not giving me any matches at all.
I just get:
"type": "MatchNoDocsQuery",
"description": "MatchNoDocsQuery(\"empty BooleanQuery\")",
I know I can't see the contents of the basic_search field in _source, so how can I debug and know why this isn't matching?
The fuzzy query doesn't analyze the query text before searching, so it should generally be avoided here.
Excerpt from ES Doc below :
fuzzy query: The elasticsearch fuzzy query type should generally be avoided. Acts much like a term query. Does not analyze the query text first.
Please try below query:
{
  "query": {
    "match": {
      "basic_search": {
        "query": "carers fair",
        "fuzziness": "AUTO"
      }
    }
  }
}
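The match query analyzes "carers fair" into separate terms and applies fuzziness to each term, unlike fuzzy, which treats the whole input as a single unanalyzed term. With "AUTO", Elasticsearch picks the allowed edit distance from the term length; a small sketch of that rule:

```python
def auto_fuzziness(term: str) -> int:
    """Edit distance chosen by Elasticsearch's "AUTO" fuzziness,
    based on the length of the query term."""
    n = len(term)
    if n <= 2:
        return 0  # 1-2 chars: must match exactly
    if n <= 5:
        return 1  # 3-5 chars: one edit allowed
    return 2      # longer terms: two edits allowed

print(auto_fuzziness("carers"))  # 6 chars -> 2
```

So for "carers fair", "carers" gets two edits and "fair" gets one, and either term matching is enough for the match query to return the document.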

How to query for phrases (shingles) in Elasticsearch

I have the following string "Word1 Word2 StopWord1 StopWord2 Word3 Word4".
When I query for this string using ["bool"]["must"]["match"], I would like to return all text that matches "Word1Word2" and/or "Word3Word4".
I have created an analyzer that I would like to use for indexing and searching.
Using analyze API, I have confirmed that indexing is being done correctly. The shingles returned are "Word1Word2" and "Word3Word4"
I want to query so that text matching "Word1Word2" and/or "Word3Word4" are returned. How can I do this dynamically - meaning, I don't know up front how many shingles will be generated, so I don't know how many match_phrase to code up in a query.
"should": [
  { "match_phrase": { "content": phrases[0] } },
  { "match_phrase": { "content": phrases[1] } }
]
To query for shingles (and unigrams), you could set up your mappings to handle them cleanly in separate fields. In the example below, the title.shingles sub-field will be used to analyze and retrieve shingles, while the root title field handles unigrams.
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "my_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_shingle_filter"
          ]
        }
      }
    }
  }
}
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "shingles": {
            "type": "string",
            "analyzer": "my_shingle_analyzer"
          }
        }
      }
    }
  }
}
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "<your query string>"
        }
      },
      "should": {
        "match": {
          "title.shingles": "<your query string>"
        }
      }
    }
  }
}
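Note that this removes the need to know the shingle count up front: at search time the query string is run through title.shingles' own analyzer, which generates the shingles for you. If you still want to build the should clauses client-side, as in the asker's snippet, here is a sketch with hypothetical helper names, mirroring my_shingle_filter's settings (shingle size 2, lowercase, no unigrams):

```python
def shingles(text: str, size: int = 2) -> list[str]:
    """Word shingles as the shingle filter produces them with
    min_shingle_size = max_shingle_size = 2, output_unigrams = false
    (lowercased, space-joined)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def should_clauses(text: str) -> list[dict]:
    """One match_phrase clause per shingle, like the hand-written
    'should' array in the question."""
    return [{"match_phrase": {"content": s}} for s in shingles(text)]

print(shingles("Word1 Word2 Word3"))
# ['word1 word2', 'word2 word3']
```

This client-side approach works, but letting the mapping's analyzer do it server-side keeps the query a plain match and stays correct if the shingle settings change.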
Ref. Elasticsearch: The Definitive Guide.... (Note: the string field type in this example is from older ES versions; on 5.x and later use text.)

How does type ahead in ElasticSearch work on multiple words and partial text match

I would like to explain with an example.
Documents in my Elasticsearch dataset have a field 'product_name'.
One document has product_name = 'Anmol Twinz Biscuit'.
When the user types (a) 'Anmol Twin', (b) 'Twin Anmol', (c) 'Twinz Anmol', or (d) 'Anmol Twinz', I want this specific record returned as a search result.
However, this works only if I specify the complete words in the search query. Partial matches are not working. Thus (a) and (b) are not returning the desired result.
Mapping defined (obtained by _mapping query)
{
  "sbis_product_idx": {
    "mappings": {
      "items": {
        "properties": {
          "category_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "product_company": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "product_id": {
            "type": "long"
          },
          "product_name": {
            "type": "text"
          },
          "product_price": {
            "type": "float"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": true,
            "max_input_length": 50
          }
        }
      }
    }
  }
}
Query being used:
{
  "_source": "product_name",
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Twin Anmol",
      "fields": [ "product_name", "product_company" ],
      "operator": "and"
    }
  }
}
The document in ES
{
  "_index": "sbis_product_idx",
  "_type": "misc",
  "_id": "107996",
  "_version": 1,
  "_score": 0,
  "_source": {
    "suggest": {
      "input": [
        "Anmol",
        "Twinz",
        "Biscuit"
      ]
    },
    "category_name": "Other Product",
    "product_company": "Anmol",
    "product_price": 30,
    "product_name": "Anmol Twinz Biscuit",
    "product_id": 107996
  }
}
Result
"hits": {
  "total": 0,
  "max_score": null,
  "hits": []
}
Mistake in query / mapping?
I created the index with your mapping, indexed the example document, and just changed the operator in your query from and to or; it now returns results for all four query combinations.
Find below my query
{
  "_source": "product_name",
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Anmol Twinz",
      "fields": [ "product_name", "product_company" ],
      "operator": "or" --> changed it to `or`
    }
  }
}
With the and operator, your query tries to find every term of the search query in the index; some of them, like Twin, are not complete tokens in ES, hence you were not getting results for them. When you change the operator to or, a match on any one token is enough.
Note: if you want to match on partial tokens like Twin or Twi, you need to use n-gram tokens, as explained in the official ES docs https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html, and that is a completely different design.
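To see what the n-gram route would index, here is a sketch of edge n-grams (the variant commonly used for type-ahead); the settings min_gram=2 and max_gram=10 are assumptions for illustration, not from the question:

```python
def edge_ngrams(term: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Edge n-grams anchored at the start of the token, like the
    edge_ngram tokenizer produces (assumed min_gram=2, max_gram=10),
    lowercased as a lowercase filter would leave them."""
    term = term.lower()
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]

print(edge_ngrams("Twinz"))
# ['tw', 'twi', 'twin', 'twinz']
```

Since twin is among the grams indexed for Twinz, a query for the partial token Twin would then find the document.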

How to create and add values to a standard lowercase analyzer in elastic search

I've been around the houses with this for the past few days, trying things in various orders, but can't figure out why it's not working.
I am trying to create an index in Elasticsearch with an analyzer that is the same as the "standard" analyzer but retains upper-case characters when records are stored.
I create my analyzer and index as follows:
PUT /upper
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "rebuilt_standard": {
            "tokenizer": "standard",
            "filter": [
              "standard"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "rebuilt_standard"
        }
      }
    }
  }
}
Then add two records to test like this...
POST /upper/doc
{
  "text": "TEST"
}
Add a second record...
POST /upper/doc
{
  "text": "test"
}
Using /upper/_settings gives the following:
{
  "upper": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "upper",
        "creation_date": "1537788581060",
        "analysis": {
          "analyzer": {
            "rebuilt_standard": {
              "filter": [
                "standard"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "s4oDgdsFTxOwsdRuPAWEkg",
        "version": {
          "created": "6030299"
        }
      }
    }
  }
}
But when I search with the following query I still get two matches, both the upper- and lower-case documents, which must mean the analyzer is not applied when I store the records.
Search like so...
GET /upper/_search
{
  "query": {
    "term": {
      "text": {
        "value": "test"
      }
    }
  }
}
Thanks in advance!
First things first: you set your analyzer on the title field instead of the text field (your search is on the text property, and you are indexing docs with only a text property).
"properties": {
  "title": {
    "type": "text",
    "analyzer": "rebuilt_standard"
  }
}
try
"properties": {
  "text": {
    "type": "text",
    "analyzer": "rebuilt_standard"
  }
}
and keep us posted ;)
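For intuition on why both documents matched: since the mapping only covered title, the text field was dynamically mapped with the default standard analyzer, which lowercases tokens at index time, so both "TEST" and "test" were stored as the token test and the term query for "test" hit both. A toy sketch of just that case-handling difference (not the full analysis chain):

```python
def standard_like(text: str, keep_case: bool) -> list[str]:
    """Rough sketch: the default standard analyzer lowercases tokens,
    while the question's rebuilt_standard (standard tokenizer, no
    lowercase filter) keeps the original case."""
    tokens = text.split()
    return tokens if keep_case else [t.lower() for t in tokens]

# Default mapping: both docs end up as the same indexed token.
print(standard_like("TEST", keep_case=False))  # ['test']
# rebuilt_standard on the right field: case survives, term "test" misses.
print(standard_like("TEST", keep_case=True))   # ['TEST']
```

Once the analyzer is attached to text, a term query for "test" should match only the lower-case document.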

Aggregating over _field_names in elasticsearch 5

I'm trying to aggregate over field names in ES 5 as described in Elasticsearch aggregation on distinct keys, but the solution described there is not working anymore.
My goal is to get the keys across all the documents. Mapping is the default one.
Data:
PUT products/product/1
{
  "param": {
    "field1": "data",
    "field2": "data2"
  }
}
Query:
GET _search
{
  "aggs": {
    "params": {
      "terms": {
        "field": "_field_names",
        "include": "param.*",
        "size": 0
      }
    }
  }
}
I get the following error: Fielddata is not supported on field [_field_names] of type [_field_names]
After looking around, it seems the only way in ES 5.x and later to get the unique field names is through the mappings endpoint. Since you cannot aggregate on _field_names, you may need to slightly change your data format, because the mapping endpoint returns every field regardless of nesting.
My personal problem was getting unique keys for various child/parent documents.
I found that if you prefix your field names in the format prefix.field, the mapping endpoint will automatically nest the information for you.
PUT products/product/1
{
  "param.field1": "data",
  "param.field2": "data2",
  "other.field3": "data3"
}
GET products/product/_mapping
{
  "products": {
    "mappings": {
      "product": {
        "properties": {
          "other": {
            "properties": {
              "field3": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "param": {
            "properties": {
              "field1": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "field2": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Then you can grab the unique fields based on the prefix.
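That step can be scripted: walk the properties tree from the _mapping response and collect dotted leaf-field names. A sketch, assuming the response shape shown above:

```python
def field_names(properties: dict, prefix: str = "") -> list[str]:
    """Recursively collect dotted leaf-field names from the
    'properties' tree of a _mapping response."""
    names = []
    for field, spec in properties.items():
        path = f"{prefix}{field}"
        if "properties" in spec:  # object field: recurse into children
            names.extend(field_names(spec["properties"], path + "."))
        else:                     # leaf field: record its dotted path
            names.append(path)
    return sorted(names)

# 'properties' subtree of the _mapping response above (multi-fields omitted)
mapping = {
    "param": {"properties": {"field1": {"type": "text"}, "field2": {"type": "text"}}},
    "other": {"properties": {"field3": {"type": "text"}}},
}
print(field_names(mapping))
# ['other.field3', 'param.field1', 'param.field2']
```

Filtering the result by prefix (e.g. names starting with "param.") then gives the unique keys per logical document type.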
This is probably because setting size: 0 is not allowed anymore in ES 5. You have to set a specific size now.
POST _search
{
  "aggs": {
    "params": {
      "terms": {
        "field": "_field_names",
        "include": "param.*",
        "size": 100 <--- change this
      }
    }
  }
}