I have documents in Elasticsearch with the following structure:
"mappings": {
"document": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"#version": {
"type": "string"
},
"id_secuencia": {
"type": "long"
},
"event": {
"properties": {
"elapsedTime": {
"type": "double"
},
"requestTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"error": {
"properties": {
"errorCode": {
"type": "string",
"index": "not_analyzed"
},
"failureDetail": {
"type": "string"
},
"fault": {
"type": "string"
}
}
},
"file": {
"type": "string",
"index": "not_analyzed"
},
"messageId": {
"type": "string"
},
"request": {
"properties": {
"body": {
"type": "string"
},
"header": {
"type": "string"
}
}
},
"responseTime": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"service": {
"properties": {
"operation": {
"type": "string",
"index": "not_analyzed"
},
"project": {
"type": "string",
"index": "not_analyzed"
},
"proxy": {
"type": "string",
"index": "not_analyzed"
},
"version": {
"type": "string",
"index": "not_analyzed"
}
}
},
"timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
},
"type": {
"type": "string"
}
}
}
}
And I need to retrieve a list of unique values for the field "event.file" (to show in a Kibana Data Table) according to the following criteria:
There is more than one document with the same value for the field "event.file"
All the occurrences for that value of "event.file" have resulted in an error (the field "event.error.errorCode" exists in all of those documents)
For that purpose, the approach I've been testing is a terms aggregation, so that I get one bucket per file name containing all of its documents. What I haven't been able to achieve is dropping the buckets that don't meet the criteria above (if at least one document in a bucket does not have an error, the bucket should be discarded).
Is this the correct approach or is there a better/easier way to get this type of result?
Thanks a lot.
After trying out several queries, I found the following approach (see query below) to be valid for my purpose. The problem I see now is that apparently it is not possible to do this in Kibana, as it has no support for pipeline aggregations (see https://github.com/elastic/kibana/issues/4584).
{
"query": {
"bool": {
"must": [
{
"filtered": {
"filter": {
"exists": {
"field": "event.file"
}
}
}
}
]
}
},
"size": 0,
"aggs": {
"file-events": {
"terms": {
"field": "event.file",
"size": 0,
"min_doc_count": 2
},
"aggs": {
"files": {
"filter": {
"exists": {
"field": "event.file"
}
},
"aggs": {
"totalFiles": {
"value_count": {
"field": "event.file"
}
}
}
},
"errors": {
"filter": {
"exists": {
"field": "event.error.errorCode"
}
},
"aggs": {
"totalErrors": {
"value_count": {
"field": "event.error.errorCode"
}
}
}
},
"exhausted": {
"bucket_selector": {
"buckets_path": {
"total_files":"files>totalFiles",
"total_errors":"errors>totalErrors"
},
"script": "total_errors == total_files"
}
}
}
}
}
}
Again, if I'm missing something, feedback will be appreciated :)
Related
I have been trying to fetch a document using multiple filters.
I'm currently using ES 1.7. Is it possible to use match_phrase twice in a filter?
Example: people documents
q=aaron&address=scarborough - searching for a person by name and address works fine.
{
"query": {
"match_phrase": {
"name": "aaron"
}
},
"filter": {
"bool": {
"must": {
"nested": {
"path": "addresses",
"query": {
"match_phrase": {
"address": "scarborough"
}
}
}
}
}
}
}
q=aaron&phone=813-689-6889 - searching for a person by name and phone number works fine as well.
{
"query": {
"match_phrase": {
"name": "aaron"
}
},
"filter": {
"bool": {
"must": {
"query": {
"match_phrase": {
"phone": "813-689-6889"
}
}
}
}
}
}
However, when I try to use both filters (address and phone) together, I get a "No filter registered for [match_phrase]" error.
For example: q=aaron&address=scarborough&phone=813-689-6889
{
"query": {
"match_phrase": {
"name": "aaron"
}
},
"filter": {
"bool": {
"must": {
"nested": {
"path": "addresses",
"query": {
"match_phrase": {
"address": "scarborough"
}
}
},
"query": {
"match_phrase": {
"phone": "813-689-6889"
}
}
}
}
}
}
The error when using the address and phone filters together:
nested: QueryParsingException[[pl_people] No filter registered for [match_phrase]]; }]","status":400}):
Index mapping (person), as requested:
{
"pl_people": {
"mappings": {
"person": {
"properties": {
"ac_name": {
"type": "string",
"analyzer": "autocomplete"
},
"date_of_birth": {
"type": "date",
"format": "dateOptionalTime"
},
"email": {
"type": "string"
},
"first_name": {
"type": "string",
"fields": {
"na_first_name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"last_name": {
"type": "string",
"fields": {
"na_last_name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"middle_name": {
"type": "string",
"fields": {
"na_middle_name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"name": {
"type": "string",
"fields": {
"na_name": {
"type": "string",
"index": "not_analyzed"
},
"ngram_name": {
"type": "string",
"analyzer": "my_start"
},
"ns_name": {
"type": "string",
"analyzer": "no_stopwords"
}
}
},
"phone": {
"type": "string"
},
"time": {
"type": "date",
"format": "dateOptionalTime"
},
"updated_at": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
}
Maybe you can use a term filter instead of match_phrase as a filter.
See here.
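Alternatively, building on the two snippets that already work for you, you could keep match_phrase and simply list both clauses as separate entries in the bool filter's must array, wrapping the phone clause in a query filter exactly the way your second example already does. This is an untested sketch against ES 1.7, using only the field names from your question, not a verified fix:
{
  "query": {
    "match_phrase": {
      "name": "aaron"
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "addresses",
            "query": {
              "match_phrase": {
                "address": "scarborough"
              }
            }
          }
        },
        {
          "query": {
            "match_phrase": {
              "phone": "813-689-6889"
            }
          }
        }
      ]
    }
  }
}
The error in the combined query most likely comes from putting the two clauses as sibling keys under a single must object, which makes ES try to parse match_phrase as if it were a filter; listing them as an array of separately wrapped clauses avoids that.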
Hello, I have a problem with combining multiple queries in Elasticsearch.
The problem only occurs when I try to combine a multi_match query with a geo_distance query: the multi_match query works when the geo_distance query is not present, and the geo_distance query works when the multi_match query is not present. Each query on its own returns the results I expect, and both result sets contain the documents I would expect to receive when the two queries are executed together. But whenever I execute them together, I receive 0 results.
When I combine the geo_distance query with a simple term query, the search works, so I presume it is a problem with the combination of these two queries.
I would appreciate any ideas.
My query is the following:
{
"query": {
"bool": {
"must": {
"bool": {
"should": {
"multi_match": {
"query": "CompanyName GmbH",
"fields": [
"originalName",
"legalName"
],
"type": "cross_fields",
"operator": "AND"
}
}
}
},
"filter": {
"bool": {
"should": {
"geo_distance": {
"location": [
9.87107,
51.69915
],
"distance": "30.0km",
"distance_type": "arc"
}
}
}
}
}
}
}
The mapping behind all of that is:
{
"customer": {
"aliases": {
},
"mappings": {
"customer-entity": {
"properties": {
"communication": {
"properties": {
"domain": {
"type": "string"
},
"email": {
"type": "string"
},
"landline": {
"type": "string"
},
"mobile": {
"type": "string"
}
}
},
"id": {
"type": "long"
},
"legalName": {
"type": "string",
"store": true
},
"location": {
"type": "geo_point"
},
"operatingModes": {
"type": "string"
},
"originalName": {
"type": "string",
"store": true
}
}
},
"homepage-entity": {
"_parent": {
"type": "customer-entity"
},
"_routing": {
"required": true
},
"properties": {
"customerId": {
"type": "string",
"store": true
},
"id": {
"type": "long"
},
"metas": {
"type": "string",
"store": true
}
}
},
"person-entity": {
"_parent": {
"type": "customer-entity"
},
"_routing": {
"required": true
},
"properties": {
"customerId": {
"type": "string",
"store": true
},
"firstName": {
"type": "string",
"store": true
},
"id": {
"type": "long"
},
"lastName": {
"type": "string",
"store": true
},
"personId": {
"type": "string",
"store": true
}
}
}
},
"settings": {
"index": {
"refresh_interval": "-1",
"number_of_shards": "1",
"creation_date": "1488920698118",
"store": {
"type": "fs"
},
"number_of_replicas": "0",
"uuid": "ZcLN5sxASXGUnKZMg8mBpw",
"version": {
"created": "2040499"
}
}
},
"warmers": {
}
}
}
I'm trying to delete documents with a date earlier than December 1st, but it doesn't look like anything actually gets deleted.
I tried using the delete by query API:
curl -XPOST "http://localhost:9200/mediadata/events/_delete_by_query" -d'
{
"query": {
"range": {
"created_at": {
"lt": "2016-12-01 00:00:00"
}
}
}
}'
Or this syntax:
curl -XDELETE 'http://localhost:9200/mediadata/events/_query' -d ...
I obtain this kind of result:
{"_index":"mediadata","_type":"events","_id":"_delete_by_query","_version":10,"_shards":{"total":3,"successful":2,"failed":0},"created":false}
Thanks in advance.
EDIT: Here is the mapping:
{
"mediadata": {
"mappings": {
"events": {
"properties": {
"channels": {
"properties": {
"kdata": {
"type": "string",
"index": "not_analyzed"
},
"mail": {
"type": "string",
"index": "not_analyzed"
},
"md5": {
"type": "string",
"index": "not_analyzed"
},
"mobile": {
"type": "string",
"index": "not_analyzed"
},
"ssp": {
"type": "string",
"index": "not_analyzed"
}
}
},
"contents": {
"type": "string",
"index": "not_analyzed"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd' 'HH:mm:ss"
},
"editor": {
"type": "string",
"index": "not_analyzed"
},
"end": {
"type": "date",
"format": "yyyy-MM-dd' 'HH:mm:ss"
},
"location": {
"type": "geo_point"
},
"message": {
"type": "string",
"index": "not_analyzed"
},
"price": {
"type": "double"
},
"quantity": {
"type": "long"
},
"query": {
"properties": {
"bool": {
"properties": {
"filter": {
"properties": {
"range": {
"properties": {
"created_at": {
"properties": {
"lt": {
"type": "string"
}
}
}
}
}
}
},
"must": {
"properties": {
"match_all": {
"type": "object"
}
}
}
}
},
"filtered": {
"properties": {
"filter": {
"properties": {
"range": {
"properties": {
"created_at": {
"properties": {
"lt": {
"type": "string"
}
}
}
}
}
}
},
"query": {
"properties": {
"match_all": {
"type": "object"
}
}
}
}
},
"range": {
"properties": {
"created_at": {
"properties": {
"lt": {
"type": "string"
},
"lte": {
"type": "string"
}
}
}
}
}
}
},
"reference": {
"type": "string",
"index": "not_analyzed"
},
"source": {
"type": "string",
"index": "not_analyzed"
},
"start": {
"type": "date",
"format": "yyyy-MM-dd' 'HH:mm:ss"
},
"type": {
"type": "string",
"index": "not_analyzed"
},
"updated_at": {
"type": "date",
"format": "yyyy-MM-dd' 'HH:mm:ss"
}
}
}
}
}
}
Your syntax is indeed correct. In version 5.x, delete by query works as follows:
POST mediadata/events/_delete_by_query?conflicts=proceed
{
"query": {
"range": {
"created_at": {
"gt": "2016-11-02 00:00:00"
}
}
}
}
Now, based on the response that you're getting from ES:
{"_index":"mediadata","_type":"events","_id":"_delete_by_query","_version":10,"_shards":{"total":3,"successful":2,"failed":0},"created":false}
I will assume that you're running version 2.x, where the syntax is different. The response shows that your POST was simply indexed as a regular document with the id _delete_by_query, which means the _delete_by_query endpoint does not exist in your version.
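If you're unsure which version you're running, the root endpoint reports it (assuming the same localhost:9200 node from your curl commands):
curl -XGET "http://localhost:9200/"
The version.number field in the response tells you which of the two syntaxes applies.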
First of all, in version 2.x delete by query is a plugin that you need to install with:
plugin install delete-by-query
Then you run it:
curl -XDELETE "http://localhost:9200/mediadata/events/_query" -d'
{
"query": {
"range": {
"created_at": {
"gt": "2016-11-02 00:00:00"
}
}
}
}'
The response looks like:
{
"took": 0,
"timed_out": false,
"_indices": {
"_all": {
"found": 1,
"deleted": 1,
"missing": 0,
"failed": 0
},
"mediadata": {
"found": 1,
"deleted": 1,
"missing": 0,
"failed": 0
}
},
"failures": []
}
Full example:
PUT mediadata
{
"mappings": {
"events": {
"properties": {
"channels": {
"properties": {
"kdata": {
"type": "string",
"index": "not_analyzed"
},
"mail": {
"type": "string",
"index": "not_analyzed"
},
"md5": {
"type": "string",
"index": "not_analyzed"
},
"mobile": {
"type": "string",
"index": "not_analyzed"
},
"ssp": {
"type": "string",
"index": "not_analyzed"
}
}
},
"contents": {
"type": "string",
"index": "not_analyzed"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd' 'HH:mm:ss"
},
"editor": {
"type": "string",
"index": "not_analyzed"
},
"end": {
"type": "date",
"format": "yyyy-MM-dd' 'HH:mm:ss"
},
"location": {
"type": "geo_point"
},
"message": {
"type": "string",
"index": "not_analyzed"
},
"price": {
"type": "double"
},
"quantity": {
"type": "long"
},
"query": {
"properties": {
"bool": {
"properties": {
"filter": {
"properties": {
"range": {
"properties": {
"created_at": {
"properties": {
"lt": {
"type": "string"
}
}
}
}
}
}
},
"must": {
"properties": {
"match_all": {
"type": "object"
}
}
}
}
},
"filtered": {
"properties": {
"filter": {
"properties": {
"range": {
"properties": {
"created_at": {
"properties": {
"lt": {
"type": "string"
}
}
}
}
}
}
},
"query": {
"properties": {
"match_all": {
"type": "object"
}
}
}
}
},
"range": {
"properties": {
"created_at": {
"properties": {
"lt": {
"type": "string"
},
"lte": {
"type": "string"
}
}
}
}
}
}
},
"reference": {
"type": "string",
"index": "not_analyzed"
},
"source": {
"type": "string",
"index": "not_analyzed"
},
"start": {
"type": "date",
"format": "yyyy-MM-dd' 'HH:mm:ss"
},
"type": {
"type": "string",
"index": "not_analyzed"
},
"updated_at": {
"type": "date",
"format": "yyyy-MM-dd' 'HH:mm:ss"
}
}
}
}
}
PUT mediadata/events/1
{
"created_at" : "2016-11-02 00:00:00"
}
PUT mediadata/events/3
{
"created_at" : "2016-11-03 00:00:00"
}
#The one to delete
PUT mediadata/events/4
{
"created_at" : "2016-10-03 00:00:00"
}
#to verify that the documents are in the index
GET mediadata/events/_search
{
"query": {
"range": {
"created_at": {
"lt": "2016-11-02 00:00:00"
}
}
}
}
DELETE /mediadata/events/_query
{
"query": {
"range": {
"created_at": {
"gt": "2016-11-02 00:00:00"
}
}
}
}
I have this mapping on ES 1.7.3:
{
"customer": {
"aliases": {},
"mappings": {
"customer": {
"properties": {
"addresses": {
"type": "nested",
"include_in_parent": true,
"properties": {
"address1": {
"type": "string"
},
"address2": {
"type": "string"
},
"address3": {
"type": "string"
},
"country": {
"type": "string"
},
"latitude": {
"type": "double",
"index": "not_analyzed"
},
"longitude": {
"type": "double",
"index": "not_analyzed"
},
"postcode": {
"type": "string"
},
"state": {
"type": "string"
},
"town": {
"type": "string"
},
"unit": {
"type": "string"
}
}
},
"companyNumber": {
"type": "string"
},
"id": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string"
},
"status": {
"type": "string"
},
"timeCreated": {
"type": "date",
"format": "dateOptionalTime"
},
"timeUpdated": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
},
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "5",
"creation_date": "1472372294516",
"store": {
"type": "fs"
},
"uuid": "RxJdXvPWSXGpKz8pdcF91Q",
"version": {
"created": "1050299"
},
"number_of_replicas": "1"
}
},
"warmers": {}
}
}
The Spring application generates this query:
{
"query": {
"bool": {
"should": {
"query_string": {
"query": "(addresses.\\*:sample* AND NOT status:ARCHIVED)",
"fields": [
"type",
"name",
"companyNumber",
"status",
"addresses.unit",
"addresses.address1",
"addresses.address2",
"addresses.address3",
"addresses.town",
"addresses.state",
"addresses.postcode",
"addresses.country"
],
"default_operator": "or",
"analyze_wildcard": true
}
}
}
}
}
where "addresses.*:sample*" is the only input. With the plain query string
"query": "(sample* AND NOT status:ARCHIVED)"
the code above works, but it searches all fields of the customer object.
Since I want to search only the address fields, I used the "addresses.*" prefix.
The query only works when the fields of the address object are of string type; the error started occurring after I added the longitude and latitude fields of type double to the address object.
Error:
Parse Failure [Failed to parse source [{
"query": {
"bool": {
"should": {
"query_string": {
"query": "(addresses.\\*:sample* AND NOT status:ARCHIVED)",
"fields": [
"type",
"name",
"companyNumber","country",
"state",
"status",
"addresses.unit",
"addresses.address1",
"addresses.address2",
"addresses.address3",
"addresses.town",
"addresses.state",
"addresses.postcode",
"addresses.country",
],
"default_operator": "or",
"analyze_wildcard": true
}
}
}
}
}
]]
NumberFormatException[For input string: "sample"
Is there a way to search "String" fields within a nested object using addresses.* only?
The solution was to add "lenient": true. As per the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
lenient - If set to true will cause format based failures (like providing text to a numeric field) to be ignored.
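For reference, here is the original query with the flag added; the only change against the query from the question is the extra "lenient" line:
{
  "query": {
    "bool": {
      "should": {
        "query_string": {
          "query": "(addresses.\\*:sample* AND NOT status:ARCHIVED)",
          "fields": [
            "type",
            "name",
            "companyNumber",
            "status",
            "addresses.unit",
            "addresses.address1",
            "addresses.address2",
            "addresses.address3",
            "addresses.town",
            "addresses.state",
            "addresses.postcode",
            "addresses.country"
          ],
          "default_operator": "or",
          "analyze_wildcard": true,
          "lenient": true
        }
      }
    }
  }
}
With lenient enabled, the addresses.* expansion can still touch addresses.latitude and addresses.longitude, but the resulting number-parsing failure on those fields is ignored instead of failing the whole request.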
Some documents have a category field, and in some of them its value equals "-1". I need a query that returns documents which have a category field whose value is not equal to "-1".
I tried this:
GET webproxylog/_search
{
"query": {
"filtered": {
"filter": {
"not":{
"filter": {"and": {
"filters": [
{"term": {
"category": "-1"
}
},
{
"missing": {
"field": "category"
}
}
]
}}
}
}
}
}
}
But it doesn't work; it returns docs that don't have the category field.
EDIT
Mapping:
{
"webproxylog": {
"mappings": {
"accesslog": {
"properties": {
"category": {
"type": "string",
"index": "not_analyzed"
},
"clientip": {
"type": "string",
"index": "not_analyzed"
},
"clientmac": {
"type": "string",
"index": "not_analyzed"
},
"clientname": {
"type": "string",
"index": "not_analyzed"
},
"duration": {
"type": "long"
},
"filetype": {
"type": "string",
"index": "not_analyzed"
},
"hierarchycode": {
"type": "string",
"index": "not_analyzed"
},
"loggingdate": {
"type": "date",
"format": "dateOptionalTime"
},
"reqmethod": {
"type": "string",
"index": "not_analyzed"
},
"respsize": {
"type": "long"
},
"resultcode": {
"type": "string",
"index": "not_analyzed"
},
"url": {
"type": "string",
"analyzer": "slash_analyzer"
},
"user": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
If your category field is a string and is analyzed by default, then your -1 will be indexed as 1 (the minus sign is stripped).
You will need that field to be not_analyzed, or to add a sub-field which is not analyzed (as in my solution below).
Something like this:
DELETE test
PUT /test
{
"mappings": {
"test": {
"properties": {
"category": {
"type": "string",
"fields": {
"notAnalyzed": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST /test/test/1
{"category": "-1"}
POST /test/test/2
{"category": "2"}
POST /test/test/3
{"category": "3"}
POST /test/test/4
{"category": "4"}
POST /test/test/5
{"category2": "-1"}
GET /test/test/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"category.notAnalyzed": {
"value": "-1"
}
}
},
{
"filtered": {
"filter": {
"missing": {
"field": "category"
}
}
}
}
]
}
}
}