Using wildcards in the field names of a multi_match query to match all fields - elasticsearch

The documentation section "Using Wildcards in Field Names" says that field names can be specified with wildcards, for example:
{
  "multi_match": {
    "query": "Quick brown fox",
    "fields": "*_title"
  }
}
Now I'd like to query all fields in the index, so I tried using * to match every field, but I got an error. The query:
{
  "multi_match": {
    "query": "nanjing jianye",
    "fields": ["*"]
  }
}
Error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Invalid format: \"nanjing jianye\""
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "mycustomer",
"node": "YHKU1KllRJW-BiH9g5-McQ",
"reason": {
"type": "illegal_argument_exception",
"reason": "Invalid format: \"nanjing jianye\""
}
}
]
},
"status": 400
}
How can I match all fields using a wildcard instead of listing each field explicitly?
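One thing that may be worth trying, assuming the "Invalid format" error comes from a non-text field (for example a date field) being asked to parse the query text: multi_match accepts a lenient flag that tells it to ignore format-based failures. The sketch below only builds the request body and does not talk to a cluster; the helper name multi_match_all_fields is made up for illustration.

```python
import json


def multi_match_all_fields(text, lenient=True):
    """Build a search body whose multi_match targets every field via "*"."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["*"],
                # Assumption: the failure is caused by non-text fields (such
                # as a date) rejecting the text; lenient ignores those errors.
                "lenient": lenient,
            }
        }
    }


body = multi_match_all_fields("nanjing jianye")
print(json.dumps(body, indent=2))
```

If the error persists with lenient set, the cause is probably something other than a format mismatch, and the mapping of the index is the next place to look.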

Related

ElasticSearch painless filter script on text fields not working

I want to apply an equality (exact match) filter using a Painless script in Elasticsearch. I cannot use a term query directly because the field I want to check is a text field (not keyword), so I tried match_phrase. This is my mapping, which I can't change:
{
"my_index": {
"aliases": {},
"mappings": {
"properties": {
"my_field": {
"type": "text"
}
}
},
"settings": {
"index": {
"max_ngram_diff": "60",
"number_of_shards": "8",
"blocks": {
"read_only_allow_delete": "false",
"write": "false"
},
"analysis": {...}
}
}
}
}
I tried this query, following this guide:
{
"size": 10,
"index": "my_index",
"body": {
"query": {
"bool": {
"should": [{
"match_phrase": {
"my_field": {
"query": "MY_VALUE",
"boost": 1.5,
"slop": 0
}
}
}],
"must": [],
"filter": [{
"script": {
"script": {
"lang": "painless",
"source": "doc['my_field'] == 'MY_VALUE'"
}
}
}],
"minimum_should_match": 1
}
}
}
}
Anyway, I got this error:
body:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
"doc['my_field'] === 'MY_VALUE'",
" ^---- HERE"
],
"script": "doc['my_field'] === 'MY_VALUE'",
"lang": "painless",
"position": {
"offset": 4,
"start": 0,
"end": 30
}
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "my_index",
"node": "R99vOHeORlKsk9dnCzcMeA",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
"doc['my_field'] === 'MY_VALUE'",
" ^---- HERE"
],
"script": "doc['my_field'] === 'MY_VALUE'",
"lang": "painless",
"position": {
"offset": 4,
"start": 0,
"end": 30
},
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [my_field] in mapping with types []"
}
}
}
]
},
"status": 400
}
It seems that doc doesn't contain text fields (I tried with other non-text fields and it works!)
Here they say that:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
And here they say that:
text fields are searchable by default, but by default are not
available for aggregations, sorting, or scripting. Set fielddata=true
on your_field_name in order to load fielddata in memory by uninverting
the inverted index.
But I can't change the mapping.
How can I access text fields in a Painless filter script?
(This is similar to ElasticSearch exact match on text field with script but more specific on the filtering script)
ScriptQuery only supports doc_values.
Doc values are the on-disk data structure, built at document index time, which makes this data access pattern possible. They store the same values as the _source but in a column-oriented fashion that is way more efficient for sorting and aggregations. Doc values are supported on almost all field types, with the notable exception of text and annotated_text fields.
As per discussion here
https://github.com/elastic/elasticsearch/issues/30984
Accessing the _source field is slow and something that we don't want to expose in the ScriptQuery because it would need to be accessed on every document, making the search very inefficient.
So you will either need to add a keyword sub-field to the mapping and reindex the data, or enable fielddata on the text field, which can consume a lot of memory.
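The keyword sub-field option could look roughly like the sketch below. It only builds the mapping and the search body as Python dictionaries. The sub-field name raw and both helper names are assumptions chosen for illustration, and documents indexed before the mapping change would still need to be reindexed before the sub-field has values.

```python
import json

SUBFIELD = "raw"  # hypothetical sub-field name, not taken from the question


def mapping_with_keyword_subfield(field, subfield=SUBFIELD):
    """Mapping where a text field also gets a keyword sub-field (doc values)."""
    return {
        "properties": {
            field: {
                "type": "text",
                "fields": {subfield: {"type": "keyword"}},
            }
        }
    }


def exact_match_script_filter(field, value, subfield=SUBFIELD):
    """Search body with a Painless filter that reads the keyword sub-field."""
    keyword_field = f"{field}.{subfield}"
    source = "doc[params.f].size() > 0 && doc[params.f].value == params.v"
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "script": {
                            "script": {
                                "lang": "painless",
                                "source": source,
                                # Values go in params so the script text stays constant.
                                "params": {"f": keyword_field, "v": value},
                            }
                        }
                    }
                ]
            }
        }
    }


mapping = mapping_with_keyword_subfield("my_field")
search = exact_match_script_filter("my_field", "MY_VALUE")
print(json.dumps(mapping, indent=2))
print(json.dumps(search, indent=2))
```

Once such a sub-field exists, a plain term query on my_field.raw gives the same exact match without a script, which is usually the simpler choice.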

Open Search, exclude field from indexing in mapping

I have the following mapping:
{
"properties": {
"type": {
"type": "keyword"
},
"body": {
"type": "text"
},
"id": {
"type": "keyword"
},
"date": {
"type": "date"
}
}
}
The body field will hold an email message. It is very long and I don't want to index it.
What is the proper way to exclude this field from indexing?
What I tried:
enabled: false - as I understand from the documentation, it applies only to object fields. In my case the field is not an object, so I'm not sure whether I can use it.
index: false/'no' - this breaks the search entirely. My request contains a query plus aggregations with a filter, and the filter contains a range:
date: { gte: someDay.getTime(), lte: 'now' }
P.S. someDay is a certain day in my case.
The error I get after applying index: false in mapping to the body field is the following:
{
"error":
{
"root_cause":
[
{
"type": "number_format_exception",
"reason": "For input string: \"now\""
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards":
[
{
"shard": 0,
"index": "test",
"node": "eehPq21jQsmkotVOqQEMeA",
"reason":
{
"type": "number_format_exception",
"reason": "For input string: \"now\""
}
}
],
"caused_by":
{
"type": "number_format_exception",
"reason": "For input string: \"now\"",
"caused_by":
{
"type": "number_format_exception",
"reason": "For input string: \"now\""
}
}
},
"status": 400
}
I don't see how these are related: the error is about the date field, while I only added the index property to the body field.
I'm using: "@opensearch-project/opensearch": "^1.0.2"
Please help me understand:
how to exclude a field from indexing;
why applying index: false to the body field in the mapping breaks the search with an error about the date field.
You should just change the body field in your mapping to this:
"body": {
  "type": "text",
  "index": false
}
and it should work.
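For reference, here is a minimal sketch of the whole mapping with that change applied, built as a Python dictionary so it can be serialized and sent with whichever client is in use. The helper name email_mapping is made up for illustration. Note that index takes the boolean false, and that a field mapped this way is still kept in _source; it just cannot be searched.

```python
import json


def email_mapping():
    """Mapping from the question with the body field excluded from indexing."""
    return {
        "properties": {
            "type": {"type": "keyword"},
            # Not searchable, but still returned as part of _source.
            "body": {"type": "text", "index": False},
            "id": {"type": "keyword"},
            "date": {"type": "date"},
        }
    }


mapping = email_mapping()
print(json.dumps(mapping, indent=2))
```

It may also be worth checking with GET <index>/_mapping that date is still mapped as date after the index was recreated: a number_format_exception on "now" suggests the field is being parsed as a number, though that is only a guess from the error message.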

Elasticsearch mixed number and string multi_match query failing

I am trying to build a query where I can accept a string containing strings and numbers, and search for those values in fields in my index that contain double values and strings. For example:
Fields: Double doubleVal, String stringVal0, String stringVal1, String doNotSearchVal
Example search string: "person 10"
I am trying to get all documents containing "person" or "10" in any of the fields doubleVal, stringVal0 and stringVal1. This is my example query:
{
"query": {
"multi_match" : {
"query": "person 10",
"fields" : [
"doubleVal^1.0",
"stringVal0^1.0",
"stringVal1^1.0"
],
"type" : "best_fields",
"operator" : "OR",
"slop" : 0,
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
}
}
(This query was generated by Spring Data Elastic)
When I run this query, I get this error: (I've removed any identifying information)
{
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "failed to create query: [query removed]",
"index_uuid": "index_uuid",
"index": "index_name"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "index_name",
"node": "node_value",
"reason": {
"type": "query_shard_exception",
"reason": "failed to create query: [query removed]",
"index_uuid": "index_uuid",
"index": "index_name",
"caused_by": {
"type": "number_format_exception",
"reason": "For input string: \"person 10\""
}
}
}
]
},
"status": 400
}
I do not want to split apart the search string. If there is a way to rewrite the query so that it works in the expected way, I would like to do it that way.
Try setting the lenient parameter to true. Format-based errors, such as providing a text query value for a numeric field, are then ignored.
In Spring Data you can do this with the builder method:
.lenient(true)
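Outside Spring Data, the same flag can be set directly on the request body. The sketch below patches a copy of a trimmed version of the query from the question; with_lenient is a hypothetical helper name, and nothing here contacts a cluster.

```python
import copy
import json


def with_lenient(search_body):
    """Return a copy of a multi_match search body with lenient switched on."""
    patched = copy.deepcopy(search_body)  # leave the caller's dict untouched
    patched["query"]["multi_match"]["lenient"] = True
    return patched


# Trimmed version of the generated query from the question.
original = {
    "query": {
        "multi_match": {
            "query": "person 10",
            "fields": ["doubleVal^1.0", "stringVal0^1.0", "stringVal1^1.0"],
            "type": "best_fields",
            "operator": "OR",
        }
    }
}

patched = with_lenient(original)
print(json.dumps(patched, indent=2))
```

With lenient set, the search string does not have to be split: the clause on doubleVal is simply skipped when the text cannot be parsed as a number, while the string fields are still searched.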

Elasticsearch 6.1 multi index search with nested fields issue

I run a multi-index search (Elasticsearch 6.1.1) across 2 indexes with nested fields:
"uniqueID" is a nested field that exists only in the person index.
"pobox" is a nested field that exists only in the adress index.
I am getting these errors:
"index": "adress", "[nested] failed to find nested object under path [uniqueID]"
"index": "person", "[nested] nested object under path [pobox] is not of nested type"
In my query I search the person index for the uniqueID field, so why do I get an error for the pobox field, which exists only in the adress index? Likewise, the search on the adress index looks for the uniqueID field, which exists only in the person index.
POST http://localhost:9200/person,adress/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"terms": {
"_index": [
"person"
]
}
},
{
"nested": {
"path": "uniqueID",
"query": {
"span_near": {
"clauses": [
{
"span_term": {
"uniqueID.uniqueID.auto": "1"
}
}
],
"slop": 3,
"in_order": true
}
}
}
}
]
}
},
{
"bool": {
"must": [
{
"terms": {
"_index": [
"adress"
]
}
},
{
"nested": {
"path": "pobox",
"query": {
"span_near": {
"clauses": [
{
"span_term": {
"pobox.pobox.auto": "1"
}
}
],
"slop": 3,
"in_order": true
}
}
}
}
]
}
}
]
}
}
}
Error
{
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "failed to create query: { my_query }",
"index_uuid": "9Z0W-P9ZS02kJ7WmOKHPVQ",
"index": "adress"
},
{
"type": "query_shard_exception",
"reason": "failed to create query: { my_query }",
"index_uuid": "EHoxKGhdSmKoYdNgsylotw",
"index": "person"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "adress",
"node": "AEhiq0wvQTGh468sSmDN5g",
"reason": {
"type": "query_shard_exception",
"reason": "failed to create query: { my_query }",
"index_uuid": "9Z0W-P9ZS02kJ7WmOKHPVQ",
"index": "adress",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [uniqueID]"
}
}
},
{
"shard": 0,
"index": "person",
"node": "AEhiq0wvQTGh468sSmDN5g",
"reason": {
"type": "query_shard_exception",
"reason": "failed to create query: { my_query }",
"index_uuid": "EHoxKGhdSmKoYdNgsylotw",
"index": "person",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] nested object under path [pobox] is not of nested type"
}
}
}
]
},
"status": 400
}
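The nested structure you query in one index doesn't exist in the other. As the errors show, each index tries to build the whole query, so the terms clause on _index does not stop an index from failing on the nested clause meant for the other one.
IMO you should split the queries up and use _msearch if you always need both results together.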
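The _msearch suggestion could be sketched as follows. The code only serializes the request payload: one header line naming the index, followed by one line with that index's query. Both helper names are made up for illustration, and the nested queries are copied from the question.

```python
import json


def nested_span_search(path, field, term):
    """Nested span_near query shaped like the clauses in the question."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "span_near": {
                        "clauses": [{"span_term": {field: term}}],
                        "slop": 3,
                        "in_order": True,
                    }
                },
            }
        }
    }


def build_msearch_body(searches):
    """Serialize (index, body) pairs as NDJSON: header line, then body line."""
    lines = []
    for index_name, body in searches:
        lines.append(json.dumps({"index": index_name}))
        lines.append(json.dumps(body))
    # The _msearch payload has to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_msearch_body([
    ("person", nested_span_search("uniqueID", "uniqueID.uniqueID.auto", "1")),
    ("adress", nested_span_search("pobox", "pobox.pobox.auto", "1")),
])
print(payload)
```

The payload would be sent with POST /_msearch and Content-Type: application/x-ndjson. Each index then only sees the nested path it actually has, and the two result sets come back as separate entries in the responses array.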

ES giving error when sorting by distance

I'm trying to sort search results by distance. However, when I try, I get the following error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "sort option [location] not supported"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "roeselaredev",
"node": "2UYlfd7sTd6qlJWgdK2wzQ",
"reason": {
"type": "illegal_argument_exception",
"reason": "sort option [location] not supported"
}
}
]
},
"status": 400
}
The query I sent looks like this:
GET _search
{
"query": {
"match_all": []
},
"sort": [
{
"geo_distance": {
"location": {
"lat": 50.9436034,
"long": 3.1242917
},
"order":"asc",
"unit":"km",
"distance_type":"plane"
}
},
{
"_score": {
"order":"desc"
}
}
]
}
As near as I can tell, I followed the instructions in the documentation to the letter. I'm not getting a malformed query result, just a "not supported" result for the sort-by-distance option. Any ideas as to what I'm doing wrong?
The query DSL is invalid: the OP's query is almost correct but is missing an underscore.
When sorting by distance, the key is _geo_distance and not geo_distance.
Example:
GET _search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 50.9436034,
          "lon": 3.1242917
        },
        "order": "asc",
        "unit": "km",
        "distance_type": "plane"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
