Search documents with a null/empty object field in Elasticsearch

I have an Elasticsearch index with the following mapping. Some documents contain a status object such as {id:1, status:"failed"} and in others status is null. I can't find a way to search for documents having "status.name" in ["failed", "null", "passed"] (docs where status is either failed, passed, or not set/null). For example, a terms query like the one below returns an empty result set.
{
"name":{
"type":"keyword"
},
"status": {
"properties": {
"id": {
"type": "integer"
},
"status": {
"type": "keyword"
}
}
}
}
Query tried:
{
"terms": {
"status.name": [ "failed", "null" ]
}
}
I also tried setting "null_value": "null" in the mapping of status.name.

Use a bool query with only should clauses, so that at least one of the clauses must match. You can query for documents that do not have a field, or that have a null value in that field, by putting an exists query into the must_not clause of a bool query (see Elasticsearch Reference: Exists query).
GET myindex/_search
{
"query": {
"bool": {
"should": [
{"term": {"status.name": {"value": "failed"}}},
{"term": {"status.name": {"value": "passed"}}},
{"bool": {"must_not": {"exists": {"field": "status.name"}}}}
]
}
}
}
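The same query can be built programmatically. The following is a minimal sketch in plain Python (no Elasticsearch client involved); the helper name terms_or_missing is hypothetical, and minimum_should_match is set explicitly so the "at least one clause" requirement still holds if the query is later wrapped in a bool that also has must or filter clauses.

```python
import json


def terms_or_missing(field, values):
    """Build a bool query matching docs whose `field` is one of `values` or has no value."""
    return {
        "bool": {
            "should": [
                # Matches documents where the field holds any of the given values.
                {"terms": {field: list(values)}},
                # Matches documents where the field is missing or null.
                {"bool": {"must_not": {"exists": {"field": field}}}},
            ],
            "minimum_should_match": 1,
        }
    }


query = terms_or_missing("status.name", ["failed", "passed"])
body = {"query": query}
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET myindex/_search.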

Related

Getting illegal_argument_exception: "Fielddata is disabled on text fields by default" in Elasticsearch

I am getting this error when I try to run the query below from Postman:
{ "error": { "root_cause": [ { "type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set
fielddata=true on [ID] in order to load fielddata in memory by
uninverting the inverted index. Note that this can however use
significant memory. Alternatively use a keyword field instead." }
Here is the request
{
"size": 11,
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"search.doc.TypeId": {
"value": 1,
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"sort": [
{
"ID": {
"order": "desc"
}
}
]
}
Based on the error, the ID field you are sorting on is of type text. By default, fielddata is disabled on text fields, so they cannot be used for sorting.
One option, as the error says, is to modify your index mapping so that the text field has fielddata enabled, as shown below. Note that this can use significant memory; the error message itself suggests a keyword field as the alternative.
PUT <index-name>/_mapping
{
"properties": {
"ID": {
"type": "text",
"fielddata": true
}
}
}
Now run the same search query as given in the question to get the desired results.
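The error message in the question also mentions using a keyword field instead of enabling fielddata. The following is a rough sketch of that alternative, written as plain Python dicts. The sub-field name "keyword" is an assumption (it is only a common convention), and documents indexed before the mapping change would need to be reindexed before the sub-field holds any values.

```python
import json

# Assumed multi-field mapping: ID stays a text field and gains a keyword
# sub-field (hypothetical name "keyword") that can be used for sorting.
mapping_update = {
    "properties": {
        "ID": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        }
    }
}


def sort_clause(field, order="desc"):
    """Return a sort section ordering results by `field`."""
    return [{field: {"order": order}}]


# Same filter as in the question, but sorted on the keyword sub-field.
search_body = {
    "size": 11,
    "query": {"bool": {"filter": [{"term": {"search.doc.TypeId": {"value": 1}}}]}},
    "sort": sort_clause("ID.keyword"),
}

print(json.dumps(mapping_update, indent=2))
print(json.dumps(search_body, indent=2))
```

Keep in mind that a keyword field sorts lexicographically, so "10" sorts before "9". If ID holds numbers, a numeric field type is likely the better fit.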

add fuzziness to elasticsearch query

I have a query for an autocomplete/suggestions index that looks like this:
{
"size": 10,
"query": {
"multi_match": {
"query": "'"+search_text+"'",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
This query works exactly as I want it to. However I want to add fuzziness:"AUTO" to this query. I read the documentation and tried adding it like this:
{
"size": 10,
"query": {
"multi_match": {
"query": {
"fuzzy": {
"value": "'"+search_text+"'",
"fuzziness": "AUTO"
}
},
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
]
}
}
}
But I get this error:
```
"type": "parsing_exception",
"reason": "[multi_match] unknown token [START_OBJECT] after [query]",
```
This is causing my query not to work.
There is no need to add a fuzzy query. To add fuzziness to a multi_match query, add the fuzziness parameter directly to the multi_match body, as described in the Elasticsearch multi_match query documentation.
Since you are using bool_prefix as the type, the multi_match query creates a match_bool_prefix query on each field, which analyzes its input and constructs a bool query from the terms. Each term except the last is used in a term query. The last term is used in a prefix query.
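The decomposition described above can be sketched in a few lines of Python. This is only an illustration of the resulting query shape, not how Elasticsearch implements it: the function name is hypothetical, and lowercasing plus whitespace splitting is an assumption standing in for whatever analyzer the field really uses.

```python
def bool_prefix_equivalent(field, text):
    """Approximate the bool query that match_bool_prefix builds for one field."""
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    terms = text.lower().split()
    if not terms:
        raise ValueError("text must contain at least one term")
    *complete_terms, last_term = terms
    # Every term except the last becomes a term query ...
    should = [{"term": {field: term}} for term in complete_terms]
    # ... and the last term becomes a prefix query.
    should.append({"prefix": {field: last_term}})
    return {"bool": {"should": should}}


query = bool_prefix_equivalent("company_name", "sequencing how shin")
single = bool_prefix_equivalent("company_name", "sequensing")
print(query)
print(single)
```

With a single-word input, the only clause produced is the prefix query.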
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"company_name": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"serviceTitle": {
"type": "search_as_you_type",
"max_shingle_size": 3
},
"services": {
"type": "search_as_you_type",
"max_shingle_size": 3
}
}
}
}
Index Data:
{
"company_name":"sequencing how shingles are actually used"
}
Search Query:
{
"size": 10,
"query": {
"multi_match": {
"query": "sequensing how shingles",
"type": "bool_prefix",
"fields": [
"company_name",
"company_name._2gram",
"company_name._3gram"
],
"fuzziness":"auto"
}
}
}
Search Result:
"hits": [
{
"_index": "65153201",
"_type": "_doc",
"_id": "1",
"_score": 1.5465959,
"_source": {
"company_name": "sequencing how shingles are actually used"
}
}
]
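If you want to query for just sequensing and get the above document, you need to change the type of the multi_match query from bool_prefix to another type suited to your use case. With bool_prefix, a single term is also the last term, so it is used in a prefix query, and fuzziness has no effect on that prefix query.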
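The question builds the request body by concatenating search_text into a string wrapped in single quotes. As a hedged alternative, the body can be built as a Python dict and serialized with json.dumps, which takes care of quoting and escaping. The function name autocomplete_query is hypothetical; the fields and parameters are the ones used above.

```python
import json


def autocomplete_query(search_text, fuzziness="AUTO", size=10):
    """Build the bool_prefix multi_match body with fuzziness as a sibling of "query"."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                # The raw user input goes in as a plain string, with no extra quotes.
                "query": search_text,
                "type": "bool_prefix",
                "fields": [
                    "company_name",
                    "company_name._2gram",
                    "company_name._3gram",
                ],
                "fuzziness": fuzziness,
            }
        },
    }


body = autocomplete_query('sequensing how "shingles"')
payload = json.dumps(body)
print(payload)
```

Because the text is never spliced into the JSON by hand, quotes inside the user input cannot break the request.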

Elasticsearch: restrict result to documents with exact match

I am currently trying to restrict the results of an Elasticsearch (5.4) search with the following query:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "apache log Linux",
"type": "most_fields",
"fields": [
"message",
"type"
]
}
},
"filter": {
"term": {
"client": "test"
}
}
}
}
}
This returns every document that contains "apache", "log", or "linux". I want to restrict the results to documents that have a field "client" with the exact specified value, in this case "test". However, this query returns all the documents that contain "test" in that field, so a document with "client": "test client" is also returned.
I want the restriction to be exact, so only the documents with "client": "test" should be returned and not those with "client": "test client".
After testing a bunch of different queries and lots of searching, I can not find a solution to my problem. What am I missing?
Just use the keyword sub-field of your client field. Since this is 5.x, dynamic mapping creates that sub-field by default for string fields:
"filter": {
"term": {
"client.keyword": "test"
}
}
Alternatively, set a mapping on your index that declares the client field as a keyword datatype.
The mapping request could look like this:
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"client": {
"type": "keyword"
}
}
}
}
}
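Why the term filter on the analyzed field also matches "test client" can be illustrated with a small simulation. This is a deliberately simplified sketch: simple_analyze is a hypothetical stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), not the real implementation.

```python
import re


def simple_analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def term_matches_text(field_value, term):
    """A term query on a text field matches if the term equals any indexed token."""
    return term in simple_analyze(field_value)


def term_matches_keyword(field_value, term):
    """A term query on a keyword field matches only the whole, unmodified value."""
    return field_value == term


print(term_matches_text("test client", "test"))
print(term_matches_keyword("test client", "test"))
print(term_matches_keyword("test", "test"))
```

The text field is indexed as the separate tokens test and client, so the term test matches it, while the keyword field is indexed as the single value "test client" and does not.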

Elasticsearch nested significant terms aggregation with background filter

I am having a hard time applying a background filter to a nested significant terms aggregation: the bg_count is always 0.
I'm indexing article views that have ids and timestamps, and have multiple applications on a single index. I want the foreground and background sets to relate to the same application, so I'm trying to apply a term filter on the app_id field both in the bool query and in the background filter. article_views is a nested object since I also want to be able to query on views with a range filter on timestamp, but I haven't got to that yet.
Mapping:
{
"article_views": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
}
}
},
"app_id": {
"type": "string",
"index": "not_analyzed"
}
}
Query:
{
"aggregations": {
"articles": {
"nested": {
"path": "article_views"
},
"aggs": {
"articles": {
"significant_terms": {
"field": "article_views.id",
"size": 5,
"background_filter": {
"term": {
"app_id": "17"
}
}
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"term": {
"app_id": "17"
}
},
{
"nested": {
"path": "article_views",
"query": {
"terms": {
"article_views.id": [
"1",
"2"
]
}
}
}
}
]
}
}
}
As I said, in my result, the bg_count is always 0, which had me worried. If the significant terms is on other fields which are not nested the background_filter works fine.
Elasticsearch version is 2.2.
Thanks
You seem to be hitting an issue where, in your background filter, you'd need to "go back" to the parent context in order to filter on a field of the parent document.
You'd need a reverse_nested query at that point, but that doesn't exist.
One way to circumvent this is to add the app_id field to your nested documents so that you can simply use it in the background filter context.
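The workaround can be sketched as a small preprocessing step before indexing, plus a background filter that targets the copied field. This is an illustration in plain Python: the function names are hypothetical, and it assumes the nested mapping has been extended with a not_analyzed app_id property under article_views.

```python
import copy


def copy_app_id_into_views(doc):
    """Return a copy of `doc` in which every nested article view carries the parent's app_id."""
    out = copy.deepcopy(doc)
    for view in out.get("article_views", []):
        view["app_id"] = out["app_id"]
    return out


def significant_articles_agg(app_id, size=5):
    """Build the nested aggregation with a background filter on the nested app_id copy."""
    return {
        "articles": {
            "nested": {"path": "article_views"},
            "aggs": {
                "articles": {
                    "significant_terms": {
                        "field": "article_views.id",
                        "size": size,
                        # The filter now stays inside the nested document context.
                        "background_filter": {
                            "term": {"article_views.app_id": app_id}
                        },
                    }
                }
            },
        }
    }


source = {"app_id": "17", "article_views": [{"id": "1"}, {"id": "2"}]}
prepared = copy_app_id_into_views(source)
aggs = significant_articles_agg("17")
print(prepared)
```

The original document is left untouched, and the prepared copy is what would be indexed.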

I don't get any documents back from my elasticsearch query. Can someone point out my mistake?

I thought I had figured out Elasticsearch but I suspect I have failed to grok something, and hence this problem:
I am indexing products, which have a huge number of fields, but the ones in question are:
{
"show_in_catalogue": {
"type": "boolean",
"index": "no"
},
"prices": {
"type": "object",
"dynamic": false,
"properties": {
"site_id": {
"type": "integer",
"index": "no"
},
"currency": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "float"
},
"gross_tax": {
"type": "integer",
"index": "no"
}
}
}
}
I am trying to return all documents where "show_in_catalogue" is true, and there is a price with site_id 1:
{
"filter": {
"term": {
"prices.site_id": "1",
"show_in_catalogue": true
}
},
"query": {
"match_all": {}
}
}
This returns zero results. I also tried an "and" filter with two separate terms - no luck.
A subset of one of the documents returned if I have no filters looks like:
{
"prices": [
{
"site_id": 1,
"currency": "GBP",
"value": 595,
"gross_tax": 1
},
{
"site_id": 2,
"currency": "USD",
"value": 745,
"gross_tax": 0
}
]
}
I hope I am OK to omit so much of the document here; I don't believe it to be contingent but I cannot be certain, of course.
Have I missed a vital piece of knowledge, or have I done something terminally thick? Either way, I would be grateful for an expert's knowledge at this point. Thanks!
Edit:
At the suggestion of J.T. I also tried reindexing the documents so that prices.site_id was indexed - no change. Also tried the bool/must filter below to no avail.
To clarify, the reason I'm using an empty query is that the web interface may supply a query string, but the same code is used to simply filter all products. Hence I left in the query, but empty, since that's what Elastica seems to produce with no query string.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}
}
}
You have site_id set as {"index": "no"}. This tells ElasticSearch to exclude the field from the index, which makes it impossible to query or filter on that field. The data will still be stored. The same applies to show_in_catalogue, which is also mapped with "index": "no" in your mapping. Likewise, you can set a field to only be in the index and searchable, but not stored.
I'm new to ElasticSearch as well and can't always grok the questions! I'm actually confused by your query. If you are going to "just filter", then you don't need a query. What I don't understand is your use of two fields inside the term filter. I've never done this. I guess it acts as an OR? Also, if nothing matches, it seems to return everything. If you wanted a query with its results filtered, then you would want to use a filtered query:
-d '{
"query": {
"filtered": {
"query": {},
"filter": {}
}
}
}'
If you just want to apply filters, this is the filter that should work without any "query" necessary:
-d '{
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}'
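For readers on newer versions: the filtered query used above was removed in Elasticsearch 5.0, where the same intent is expressed with a bool query and a filter clause. The following is a minimal sketch under that assumption, built as a Python dict; the function name is hypothetical, and both fields must be indexed for the filter to match anything.

```python
import json


def catalogue_filter_query(site_id):
    """Build a filter-only search: shown in catalogue AND has a price for `site_id`."""
    return {
        "query": {
            "bool": {
                # Filter clauses must all match and do not contribute to scoring.
                "filter": [
                    {"term": {"show_in_catalogue": True}},
                    {"term": {"prices.site_id": site_id}},
                ]
            }
        }
    }


body = catalogue_filter_query(1)
payload = json.dumps(body)
print(payload)
```

Passing the two conditions as separate term clauses avoids putting two fields inside one term filter.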
