Elasticsearch exception for a multi_match query of type "phrase" when using a combination of numbers and letters without a space - elasticsearch

I am getting an exception for the query below:
"multi_match": {
"query": "\"73a\"",
"fields": [],
"type": "phrase",
"operator": "AND",
"analyzer": "custom_analyzer",
"slop": 0,
"prefix_length": 0,
"max_expansions": 50,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1.0
}
The exception I am getting:
error" : {
"root_cause" : [
{
"type" : "illegal_state_exception",
"reason" : "field \"log_no.keyword\" was indexed without position data; cannot run SpanTermQuery (term=73)"
},
{
"type" : "illegal_state_exception",
"reason" : "field \"airplanes_data.keyword\" was indexed without position data; cannot run SpanTermQuery (term=73)"
}
],
Notes:
1) When I change the type from "phrase" to "best_fields", I get no error and proper results for "query": "\"73a\"".
2) Using type "phrase" with a space between the number and the letter, e.g. "query": "\"73 a\"", also returns results without an error.
My question is: why, with type "phrase", do I get an error when there is no space between a number-and-letter combination in the query, e.g. "query": "\"443abx\"" or "query": "\"73222aaa\""?
I am new to Elasticsearch. Any help is appreciated. Thanks :)
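One way to see what is going on: the error mentions SpanTermQuery (term=73), which suggests that custom_analyzer (whose definition isn't shown in the question) splits "73a" into the separate tokens "73" and "a", as a word_delimiter-style filter would. You could confirm the tokenization with the _analyze API; my_index here is a placeholder for your index name:
POST my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "73a"
}
If "73a" comes back as multiple tokens, the phrase query needs position data to enforce adjacency, and keyword sub-fields such as log_no.keyword are indexed without positions, which is exactly what the illegal_state_exception reports. best_fields works because it issues term-level queries that don't require positions.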

Related

Elasticsearch fuzzy query seems to ignore Brazilian stopwords

I have stopwords for Brazilian Portuguese configured in my index, but if I search for the term "ios" (it's an iOS course), a bunch of other documents are returned, because the term "nos" (a Brazilian stopword) seems to be treated as a valid term for the fuzzy query.
But if I search just for the term "nos", nothing is returned. Wouldn't I expect the iOS course to be returned by the fuzzy query? I'm confused.
Is there any alternative to this? The main purpose here is that when a user searches for "ios", documents matching only through a stopword like "nos" won't be returned, while I can maintain the fuzziness for other, more complex searches made by users.
An example query:
GET /index/_search
{
  "explain": true,
  "query": {
    "bool" : {
      "must" : [
        {
          "terms" : {
            "document_type" : [
              "COURSE"
            ],
            "boost" : 1.0
          }
        },
        {
          "multi_match" : {
            "query" : "ios",
            "type" : "best_fields",
            "operator" : "OR",
            "slop" : 0,
            "fuzziness" : "AUTO",
            "prefix_length" : 0,
            "max_expansions" : 50,
            "zero_terms_query" : "NONE",
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}
Part of the explain output:
"description": "weight(corpo:nos in 52) [PerFieldSimilarity], result of:",
[Image: stopword configuration]
thanks
I tried adding a prefix length, but I want the stopwords to be ignored.
I believe the correct way to handle stopwords by language is shown below:
PUT idx_teste
{
  "settings": {
    "analysis": {
      "filter": {
        "brazilian_stop_filter": {
          "type": "stop",
          "stopwords": "_brazilian_"
        }
      },
      "analyzer": {
        "teste_analyzer": {
          "tokenizer": "standard",
          "filter": ["brazilian_stop_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "teste_analyzer"
      }
    }
  }
}
POST idx_teste/_analyze
{
  "analyzer": "teste_analyzer",
  "text": "course nos advanced"
}
Look term "nos" was removed.
{
  "tokens": [
    {
      "token": "course",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "advanced",
      "start_offset": 11,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
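With this analyzer in place, "nos" is removed both at index time and at query time (the field defines no separate search_analyzer), so a fuzzy search for "ios" can no longer match documents through the stopword. A minimal sketch against the idx_teste index defined above:
GET idx_teste/_search
{
  "query": {
    "multi_match": {
      "query": "ios",
      "fields": ["name"],
      "fuzziness": "AUTO"
    }
  }
}
Since "nos" never reaches the inverted index, the fuzzy expansion of "ios" has nothing to match it against.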

I am new to ES and have a multi_match query, and want to consider a field based on its availability

{
  "multi_match": {
    "query": "TEST",
    "fields": [
      "description.regexkeyword^1.0",
      "logical_name.regexkeyword^1.0",
      "logical_table_name.regexkeyword^1.0",
      "physical_name.regexkeyword^1.0",
      "presentation_name.regexkeyword^1.0",
      "table_name.regexkeyword^1.0"
    ],
    "type": "best_fields",
    "operator": "AND",
    "slop": 0,
    "prefix_length": 0,
    "max_expansions": 50,
    "lenient": false,
    "zero_terms_query": "NONE",
    "boost": 1
  }
}
There is a field, edited_description. If edited_description exists in a document, then edited_description.regexkeyword^1.0 should be considered; otherwise description.regexkeyword^1.0 should be used.
You can't define an if condition in a multi_match query. But what you can do is look at your problem statement in a different way: if both edited_description and description exist, then a match in the edited_description field should be given a higher preference.
This can be achieved by setting a slightly higher boost value for the edited_description field.
{
  "multi_match": {
    "query": "TEST",
    "fields": [
      "description.regexkeyword^1.0",
      "edited_description.regexkeyword^1.2",
      "logical_name.regexkeyword^1.0",
      "logical_table_name.regexkeyword^1.0",
      "physical_name.regexkeyword^1.0",
      "presentation_name.regexkeyword^1.0",
      "table_name.regexkeyword^1.0"
    ],
    "type": "best_fields",
    "operator": "AND",
    "slop": 0,
    "prefix_length": 0,
    "max_expansions": 50,
    "lenient": false,
    "zero_terms_query": "NONE",
    "boost": 1
  }
}
This will result in documents having a match in edited_description being ranked higher. You can adjust the boost value to your needs.
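If you want to check how the boost changes the ranking, one option is to run the search with explain enabled and compare the per-field score contributions (a sketch; my_index is a placeholder and the field list is abbreviated):
GET my_index/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "TEST",
      "fields": [
        "description.regexkeyword^1.0",
        "edited_description.regexkeyword^1.2"
      ],
      "type": "best_fields",
      "operator": "AND"
    }
  }
}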

Elasticsearch mixed number and string multi_match query failing

I am trying to build a query that accepts a string containing words and numbers, and searches for those values in fields in my index that hold double values and strings. For example:
Fields: Double doubleVal, String stringVal0, String stringVal1, String doNotSearchVal
Example search string: "person 10"
I am trying to get all documents containing "person" or "10" in any of the fields doubleVal, stringVal0 and stringVal1. This is my example query:
{
  "query": {
    "multi_match" : {
      "query": "person 10",
      "fields" : [
        "doubleVal^1.0",
        "stringVal0^1.0",
        "stringVal1^1.0"
      ],
      "type" : "best_fields",
      "operator" : "OR",
      "slop" : 0,
      "prefix_length" : 0,
      "max_expansions" : 50,
      "zero_terms_query" : "NONE",
      "auto_generate_synonyms_phrase_query" : true,
      "fuzzy_transpositions" : true,
      "boost" : 1.0
    }
  }
}
(This query was generated by Spring Data Elasticsearch.)
When I run this query, I get this error (I've removed any identifying information):
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: [query removed]",
        "index_uuid": "index_uuid",
        "index": "index_name"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "index_name",
        "node": "node_value",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: [query removed]",
          "index_uuid": "index_uuid",
          "index": "index_name",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "For input string: \"person 10\""
          }
        }
      }
    ]
  },
  "status": 400
}
I do not want to split apart the search string. If there is a way to rewrite the query so that it works as expected, I would prefer to do it that way.
You should try setting the parameter lenient to true; format-based errors, such as providing a text query value for a numeric field, will then be ignored.
You can achieve this in Spring Data by using the builder method:
.lenient(true)
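For reference, the raw query body produced by the builder should then look roughly like this (a sketch based on the original query, with lenient added):
{
  "query": {
    "multi_match": {
      "query": "person 10",
      "fields": [
        "doubleVal^1.0",
        "stringVal0^1.0",
        "stringVal1^1.0"
      ],
      "type": "best_fields",
      "operator": "OR",
      "lenient": true
    }
  }
}
With lenient set, the number_format_exception from parsing "person 10" against doubleVal is ignored, and the terms can still match the two string fields.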

Elasticsearch: simple_query_string and multi-word synonyms

I have a field with the following search_analyzer:
"name_search_en" : {
"filter" : [
"english_possessive_stemmer",
"lowercase",
"name_synonyms_en",
"english_stop",
"english_stemmer",
"asciifolding"
],
"tokenizer" : "standard"
}
name_synonyms_en is a synonym_graph filter that looks like this:
"name_synonyms_en" : {
"type" : "synonym_graph",
"synonyms" : [
"beach bag => straw bag,beach bag",
"bicycle,bike"
]
}
Running the following multi_match query, the synonyms are correctly applied:
{
  "query": {
    "multi_match": {
      "query": "beach bag",
      "auto_generate_synonyms_phrase_query": false,
      "type": "cross_fields",
      "fields": [
        "brand.en-US^1.0",
        "name.en-US^1.0"
      ]
    }
  }
}
Here is the _validate explanation output. Both beach bag and straw bag are present, as expected, in the raw query:
"explanations" : [
{
"index" : "d7598351-311f-4844-bb91-4f26c9f538f3",
"valid" : true,
"explanation" : "+((((+name.en-US:straw +name.en-US:bag) (+name.en-US:beach +name.en-US:bag))) | (brand.en-US:beach brand.en-US:bag)) #DocValuesFieldExistsQuery [field=_primary_term]"
}
]
I would expect the same from the following simple_query_string:
{
  "query": {
    "simple_query_string": {
      "query": "beach bag",
      "auto_generate_synonyms_phrase_query": false,
      "fields": [
        "brand.en-US^1.0",
        "name.en-US^1.0"
      ]
    }
  }
}
but the straw bag synonym is not present in the raw query:
"explanations" : [
{
"index" : "d7598351-311f-4844-bb91-4f26c9f538f3",
"valid" : true,
"explanation" : "+((name.en-US:beach | brand.en-US:beach)~1.0 (name.en-US:bag | brand.en-US:bag)~1.0) #DocValuesFieldExistsQuery [field=_primary_term]"
}
]
The problem seems to be related to multi-term synonyms only. If I search for bike, the bicycle synonym is correctly present in the query:
"explanations" : [
{
"index" : "d7598351-311f-4844-bb91-4f26c9f538f3",
"valid" : true,
"explanation" : "+(Synonym(name.en-US:bicycl name.en-US:bike) | brand.en-US:bike)~1.0 #DocValuesFieldExistsQuery [field=_primary_term]"
}
]
Is this the expected behaviour (meaning multi-term synonyms are not supported for this query)?
By default, simple_query_string has the WHITESPACE flag enabled, so the input text is tokenized on whitespace before analysis. That's why the synonym filter doesn't handle multi-word synonyms correctly. The following query disables all flags, making multi-word synonyms work as expected:
{
  "query": {
    "simple_query_string": {
      "query": "beach bag",
      "auto_generate_synonyms_phrase_query": false,
      "flags": "NONE",
      "fields": [
        "brand.en-US^1.0",
        "name.en-US^1.0"
      ]
    }
  }
}
This unfortunately does not play well with the minimum_should_match parameter. A full discussion and more details can be found at https://discuss.elastic.co/t/simple-query-string-and-multi-terms-synonyms/174780
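If disabling everything is too drastic, flags can also be enabled selectively as a pipe-separated list, which keeps some operators while still leaving WHITESPACE off (a sketch; worth checking the resulting query with _validate on your version):
{
  "query": {
    "simple_query_string": {
      "query": "beach bag",
      "auto_generate_synonyms_phrase_query": false,
      "flags": "AND|OR|PREFIX",
      "fields": [
        "brand.en-US^1.0",
        "name.en-US^1.0"
      ]
    }
  }
}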

Can I ignore "failed to find nested object under path" in ElasticSearch if I get results?

We have an index whose mapping includes nested fields. In our Java class these fields are lists of objects, and sometimes the lists can be empty (so in the JSON structure we get e.g. {... "some_nested_field": [], ...}).
When we run a query we do get results as expected, but also an error:
"failures": [
{
"shard": 0,
"index": ".kibana",
"node": "ZoEuUdkORpuBSNs7gqiv1Q",
"reason": {
"type": "query_shard_exception",
"reason": """
failed to create query: {
"nested" : {
"query" : {
"bool" : {
"must" : [
{
"match" : {
"foobar.name" : {
"query" : "brlo",
"operator" : "OR",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"path" : "foobar",
"ignore_unmapped" : false,
"score_mode" : "avg",
"boost" : 1.0
}
}
""",
"index_uuid": "xrFCunLNSv6AER_KwNMHSA",
"index": ".kibana",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [foobar]"
}
}
}
Can I assume that this error is caused by records with empty lists, and ignore it? Or does this indicate an internal error and possibly missing results from my query? Is there a way to avoid this error?
UPDATE:
This is an example of the query we're executing:
GET /_search
{
  "query": {
    "nested": {
      "path": "mynested",
      "query": {
        "bool": {
          "should" : [
            { "match" : { "mynested.name": "foo" } },
            { "match" : { "mynested.description": "bar" } },
            { "match" : { "mynested.category": "baz" } }
          ],
          "minimum_should_match" : 1
        }
      }
    }
  }
}
The response from ES reports 10 successful shards and one failure:
{
  "took": 889,
  "timed_out": false,
  "_shards": {
    "total": 11,
    "successful": 10,
    "skipped": 0,
    "failed": 1,
    "failures": [...]
And we do get hits back:
"hits": {
"total": 234450,
"max_score": 11.092936,
"hits": [ ...
Looks like you have Kibana installed. The error message says that it can't find a nested object under the path foobar in the index .kibana, which is the one Kibana uses:
"index_uuid": "xrFCunLNSv6AER_KwNMHSA",
"index": ".kibana",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [foobar]"
}
When doing a simple GET /_search, all Elasticsearch indices are queried, including .kibana, which is probably not what you wanted.
To ignore this particular index, you can use the Multiple Indices search capability and run a query like:
GET /*,-.kibana/_search
Hope that helps!
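On the "is there a way to avoid this error" part: the nested query also accepts an ignore_unmapped option (visible as "ignore_unmapped" : false in the failure above), which makes indices that lack the nested mapping return no hits instead of failing the shard. A sketch based on your query:
GET /_search
{
  "query": {
    "nested": {
      "path": "mynested",
      "ignore_unmapped": true,
      "query": {
        "bool": {
          "should": [
            { "match": { "mynested.name": "foo" } }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}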
