I have a problem. In my application I'm using ElasticSearch. I post a JSON object containing a query DSL request to my ElasticSearch server. What I need to do is query a specific index for some data.
This is the query:
{
  "query": {
    "indices": {
      "indices": [
        "index-1"
      ],
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "_all": "this is test"
              }
            }
          ],
          "should": [
            {
              "match": {
                "country": "PL"
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      "no_match_query": "none"
    }
  },
  "from": 0,
  "size": 10,
  "sort": {
    "name": {
      "order": "ASC"
    }
  }
}
The query works fine and returns the data I want. However, in the ElasticSearch logs I can see:
[2015-05-28 22:08:20,942][DEBUG][action.search.type] [index] [twitter][0], node[X_ASxSKmn-Bzmn-nHLQ8g], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@7b98e9ad] lastShard [true]
org.elasticsearch.search.SearchParseException: [twitter][0]: query[MatchNoDocsQuery],from[0],size[10]: Parse Failure [Failed to parse source [HERE_COMES_MY_JSON]]
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:681)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:537)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:509)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:264)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.search.SearchParseException: [twitter][0]: query[MatchNoDocsQuery],from[0],size[10]: Parse Failure [No mapping found for [name] in order to sort on]
It tries to fetch something from the twitter index, which is a standard out-of-the-box index for testing. Why? I specified that I want to search in index-1, not in all of them.
I found a workaround, which is to add:
"ignore_unmapped" : true
to the sort, but it's not really a solution.
I don't know if it matters, but I set up a REST endpoint that I call, and inside my Java app I pass the JSON to ElasticSearch like this:
Client client = new TransportClient(settings);
// prepareSearch() with no index names targets all indices in the cluster
SearchRequestBuilder builder = client.prepareSearch().setSource(json);
SearchResponse response = builder.execute().actionGet();
Does anyone have a clue what is wrong? I would really appreciate any help.
I think you misunderstood what the indices query does: it is meant for cases where one query must run against a given list of indices and a different query must run against the indices that are not in that list.
So everything depends on which indices you run the search against.
For example:
GET /abc,test1,test2,test3/_search
{
  "query": {
    "indices": {
      "indices": ["abc"],
      "query": {
        "bool": {
          "must": [
            ...
will run against the abc, test1, test2 and test3 indices. Indices that match "indices": ["abc"] have the query run against them. The other indices (in my example, test1, test2 and test3) have the query from no_match_query run against them.
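To make the routing concrete, here is a toy Python model of the behaviour described above. It is not Elasticsearch code: route_indices_query is a hypothetical helper that compares exact index names only and ignores aliases or wildcard patterns. It also shows why twitter appears in the question's log: a search built with prepareSearch() and no index names covers every index, so twitter receives the no_match_query ("none", which is the MatchNoDocsQuery in the log line), while the request's sort on name still has to be parsed on that shard.

```python
# Toy model (not Elasticsearch code) of how the legacy `indices` query picks,
# for each searched index, which inner query runs on that index.


def route_indices_query(searched, listed):
    """Map each searched index to "query" or "no_match_query".

    searched: indices the search request targets (all indices when the
              request names none, e.g. prepareSearch() with no arguments).
    listed:   the "indices" array inside the indices query.
    Exact name comparison only; aliases and patterns are ignored here.
    """
    listed_set = set(listed)
    return {
        index: "query" if index in listed_set else "no_match_query"
        for index in searched
    }


# The example request: GET /abc,test1,test2,test3/_search
print(route_indices_query(["abc", "test1", "test2", "test3"], ["abc"]))

# The question's situation: no index named, so twitter is searched as well.
print(route_indices_query(["index-1", "twitter"], ["index-1"]))
```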
So it matters which indices you run your indices query against, and ignoring unmapped fields is the way to go here.
I'm connecting my recommendation service with a product service. The recommendation service, no matter what the parameters are, always returns a list of product IDs sorted by relevancy. Example:
["ID1", "ID2", "ID3"]
The product service owns the Elasticsearch indices that store the product details. The client expects the recommended products together with their details, ordered by relevancy. Hence I'm using this search query:
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "product_id": ["ID1", "ID2", "ID3"]
          }
        }
      ]
    }
  }
}
The problem is that the results of this query are not sorted in the order of the terms values. What changes can I make to achieve this?
P.S.: Any advice or references on Elasticsearch index design, service response formats, or system design for recommendation systems would be very welcome.
The terms query functions as an OR filter that scores the matches in a boolean manner (true -> 1, false -> 0).
That said, you could build an equivalent OR query with a query_string query that boosts the individual IDs, thereby increasing their scores and sorting them higher:
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "default_field": "product_id",
            "query": "ID1^3 OR ID2^2 OR ID3^1"
          }
        }
      ],
      "filter": [
        {
          "terms": {
            "product_id": ["ID1", "ID2", "ID3"]
          }
        }
      ]
    }
  }
}
The boost values above can of course be dynamically changed to account for the varying length of the list of IDs.
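A minimal Python sketch of that dynamic boosting, assuming the product_id field from the question and IDs that contain no query_string reserved characters (escape them otherwise). build_ranked_ids_query is a hypothetical helper: the first ID gets boost n, the last gets boost 1, and the terms filter keeps only the requested IDs.

```python
import json


def build_ranked_ids_query(ids):
    """Build the boosted query_string + terms filter for an ordered ID list.

    The first ID gets boost len(ids) and the last gets boost 1, so IDs that
    rank higher in the recommendation list should score (and sort) higher.
    Assumes IDs contain no query_string reserved characters.
    """
    n = len(ids)
    boosted = " OR ".join(f"{doc_id}^{n - i}" for i, doc_id in enumerate(ids))
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "query_string": {
                            "default_field": "product_id",
                            "query": boosted,
                        }
                    }
                ],
                "filter": [{"terms": {"product_id": list(ids)}}],
            }
        }
    }


print(json.dumps(build_ranked_ids_query(["ID1", "ID2", "ID3"]), indent=2))
```

Because each ID matches a single product, the boosts should dominate the score, but the final order still depends on scoring details, so it is worth checking against real data.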
The problem:
Our log data has 27 to 34 million entries for /event-heartbeat.
I need to filter those entries out so that only meaningful log messages show up in Kibana.
Using Kibana filters with wildcards does not work. So I think I will have to write Query DSL for Elasticsearch 6.4.2 to filter out the event heartbeats.
I have been looking, and I can't find any good explanation of how to do an anti-pattern match, i.e. search for all entries that don't have /event-heartbeat in the message.
Here is the log message:
@timestamp: June 14th 2019, 12:39:09.225
host.name: iislogs-production
source: C:\inetpub\logs\LogFiles\W3SVC5\u_ex19061412.log
offset: 83,944,181
message: 2019-06-14 19:39:06 0.0.0.0 GET /event-heartbeat id=Budrug2UDw 443 - 0.0.0.0 - - 200 0 0 31
prospector.type: log
input.type: log
beat.name: iislogs-production
beat.hostname: MYHOSTNAME
beat.version: 6.4.2
_id: yg6AV2sB0_n
_type: doc
_index: iislogs-production-6.4.2-2019.06.14
_score: -
message is a keyword field, so I can use Painless scripting on it.
I've used the Lucene syntax:
NOT message: "*/event-heartbeat*"
This is the anti-pattern filter that Kibana generates:
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "message": "*event-heartbeat*"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
I've tried the solution proposed below by huglap, and I also adjusted my query based on his comment. I tried it both with term and with match, because technically the field is a keyword (so that I can use Painless scripting on it). The query still returns event-heartbeat log entries.
Here are the two queries I tried, based on the proposed solution below:
GET /iislogs-production-*/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {
        }
      },
      "filter": {
        "bool": {
          "must_not": [
            {
              "term": {
                "message.whitespace": "event-heartbeat"
              }
            }
          ]
        }
      }
    }
  }
}
GET /iislogs-production-*/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {
        }
      },
      "filter": {
        "bool": {
          "must_not": [
            {
              "match": {
                "message.whitespace": "event-heartbeat"
              }
            }
          ]
        }
      }
    }
  }
}
Index Mapping:
https://gist.github.com/zukeru/907a9b2fa2f0d6f91a532b0865131988
Have you thought about a 'must_not' bool query?
Since you're going for the whole set and don't really care about shaping the relevance function, I suggest using a filter instead of a query. You'll get better performance.
{
  "query": {
    "bool": {
      "must": {
        "match_all": {
        }
      },
      "filter": {
        "bool": {
          "must_not": [
            {
              "match": {
                "message.whitespace": "event-heartbeat"
              }
            }
          ]
        }
      }
    }
  }
}
This example assumes you are querying against a text field, thus the use of a 'match' query instead of a 'term' one.
You also need to make sure that the field is analyzed (really, tokenized) according to your goals. The dash in your query term will create problems if you're using a simple or even a standard analyzer: Elasticsearch would break the term into two words. You could try the whitespace analyzer on that field, or just remove the dash from the query.
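A quick way to see what a whitespace-analyzed field actually indexes, sketched in Python under the assumption that message.whitespace is a sub-field using the built-in whitespace analyzer (check the mapping gist to confirm). That analyzer splits only on whitespace and keeps punctuation, which str.split() approximates. On the sample log line the token is /event-heartbeat, with the leading slash, so neither a term nor a match for event-heartbeat would line up with it. The second function is an alternative for the keyword message field: a must_not wrapping a wildcard query on the whole value. Both function names are hypothetical, and leading wildcards can be slow on indices of this size.

```python
# Assumption: message.whitespace is analyzed with the built-in whitespace
# analyzer, which splits on whitespace only and keeps punctuation.
# str.split() is used here as an approximation of that tokenizer.

LINE = ("2019-06-14 19:39:06 0.0.0.0 GET /event-heartbeat "
        "id=Budrug2UDw 443 - 0.0.0.0 - - 200 0 0 31")


def whitespace_tokens(text):
    """Approximate the whitespace analyzer: split on runs of whitespace."""
    return text.split()


tokens = whitespace_tokens(LINE)
print("event-heartbeat" in tokens)   # False: the term the queries used
print("/event-heartbeat" in tokens)  # True: the token actually produced


def exclude_heartbeats_on_keyword(field="message"):
    """Hypothetical helper: drop heartbeat lines via a keyword field.

    must_not + wildcard matches every document except those whose whole
    keyword value contains /event-heartbeat. Leading wildcards are costly.
    """
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"wildcard": {field: "*/event-heartbeat*"}}
                ]
            }
        }
    }


print(exclude_heartbeats_on_keyword())
```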
I know that ElasticSearch has an internal limit on how many clauses you can use in a bool query. This is controlled by the max_clause_count setting in the elasticsearch.yml file.
But I thought that this limit did not apply to the values passed in a search, so that a query like the following, with more than 1024 values in the terms query, would work:
{
  "query": {
    "bool": {
      "should": [
        { "terms": { "id": ["cafe-babe-0000", "cafe-babe-0001", ... ] } }
      ]
    }
  }
}
But this query throws a TooManyClauses exception. So in this case, the number of values in the query also counts toward this limit. Is that correct?
Also, I know that this is not the best way to perform this kind of query, but is it possible to rewrite the previous query so that the limit is not exceeded?
You can use the ids query.
"query": {
  "ids": {
    "values": [ "cafe-babe-0000", "cafe-babe-0001", ... ]
  }
}
To the best of my knowledge, there is no such limit on this query.
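As a sketch of the ids approach in Python: ids_query_batches is a hypothetical helper that builds one ids query body per batch. Setting size matters, since a search returns only 10 hits by default. Splitting into batches of 1024 (the default max_clause_count mentioned in the question) is only a precaution in case you'd rather not depend on the ids query having no limit; with a single batch you get the plain query above.

```python
def ids_query_batches(ids, batch_size=1024):
    """Hypothetical helper: one `ids` query body per batch of IDs.

    Each body would be sent as its own search request and the hits merged
    client-side. `size` is set to the batch length because a search returns
    only 10 hits by default.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    bodies = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        bodies.append({"size": len(batch), "query": {"ids": {"values": batch}}})
    return bodies


all_ids = [f"cafe-babe-{n:04d}" for n in range(2500)]
bodies = ids_query_batches(all_ids)
print(len(bodies), [b["size"] for b in bodies])  # 3 [1024, 1024, 452]
```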
Let's say I have a structure like:
{"account_number":171,"balance":7091,
"firstname":"Nelda","lastname":"Hopper",
"age":39,"gender":"M",
"address":"742 Prospect Place","employer":"Equicom",
"email":"neldahopper@equicom.com",
"city":"Finderne","state":"SC"}
(the data comes from here).
If I write the following query in ElasticSearch:
POST /bank/_search?pretty
{
  "query": {
    "bool": {
      "must": [
        { "range": { "balance": { "gte": 30000 } } }
      ]
    }
  },
  "fields": ["gender", "balance", "age"],
  "aggs": {
    "age_filter": {
      "filter": {
        "match": {
          "age": "30"
        }
      },
      "aggs": {
        "gender_stats": {
          "terms": { "field": "gender" }
        }
      }
    }
  }
}
I'll get (1) 402 results for the main query and (2) an aggregation over the 18 results that pass the "age:30" filter.
I've tried a similar trick in Solr 5.1, but the closest I could get was this:
q=balance:[30000%20TO%20*]&facet=true&facet.field=gender&fq=age:30
with the big difference that the filter is now applied to the main query results, so I get only 18 results in total, and the faceting is then applied to those.
Is there a way to write a Solr query that is entirely equivalent to the ElasticSearch one, i.e. getting the full results and applying the filter only to the aggregation/faceting?
NB: I've tried exclusion by tag:
q={!ex=tagForAge}balance:[30000%20TO%20*]&facet=true&facet.field=gender&fq={!tag="tagForAge"}age:30
but it does not seem to apply to the main query.
Try appending &facet.query=age:30 to your query.
This computes a facet count for a given query (age:30 in your case) over the main result set, without filtering the main results themselves.
For more information check here.
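A Python sketch of the resulting request parameters, built with urllib.parse. Because there is no fq, the main result set stays the full balance range, and each facet.query is counted over those results. Solr accepts facet.query more than once, so to approximate the per-gender split of the ElasticSearch aggregation you can add one facet.query per gender value. The values M and F are assumptions about the data (only M appears in the sample document).

```python
from urllib.parse import parse_qs, urlencode

# q and facet.field come from the question; there is no fq, so the main
# result set is not narrowed to age 30. Each facet.query is counted over
# the main results. Gender values "M" and "F" are assumptions.
params = [
    ("q", "balance:[30000 TO *]"),
    ("facet", "true"),
    ("facet.field", "gender"),
    ("facet.query", "age:30"),
    ("facet.query", "age:30 AND gender:M"),
    ("facet.query", "age:30 AND gender:F"),
]

query_string = urlencode(params)  # spaces become "+", brackets are escaped
print(query_string)

# Round-trip check: repeated facet.query parameters are preserved.
decoded = parse_qs(query_string)
print(decoded["facet.query"])
```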
I'm querying an ElasticSearch database (the Danish CVR registry) using NEST in C#. I'm trying to formulate a query against this schema:
relations: [
  {
    participant: {
      key: 123123
    },
    organisations: [
      {
        organisationName: {
          name: "some string",
          period: {
            from: "SOME DATE",
            to: "SOMEDATE OR NULL"
          }
        },
        ... more of similar objects ..
      }
    ]
  },
  .. more of similar objects ..
]
My problem is that I need to find documents that have a certain participant.key value, while also having a specific organisations.organisationName.name and a missing or null value in organisations.organisationName.period.to.
I know I need a nested query to get documents that have both a null value in the to field and a certain name in the name field, but on top of that I also need the specific key in the participant.key field, and this is where I'm having trouble. Note that all three fields I'm checking must be within the same relations object, and the to and name fields must be within the same organisationName object.
Without the key part, the query as JSON is this:
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "relations.organisations.organisationName",
            "score_mode": "max",
            "query": {
              "bool": {
                "must": [
                  { "match": { "relations.organisations.organisationName.name": "EJERREGISTER" } },
                  { "filtered": { "filter": {
                    "missing": { "field": "relations.organisations.organisationName.period.to" }
                  } } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
I'm hoping someone out there is good at writing these queries in the NEST query DSL. I could also work from a pure ElasticSearch JSON query, but the .NET equivalent would be my preferred option :)
Thanks in advance!
After some experimentation, I came to the conclusion that the right answer to my problem would be a nested query that (1) checks the key and (2) contains another nested query that does the other checks I needed in the organisations.organisationName object.
I couldn't quite verify this, however, because the database I'm querying does not have the relations object mapped as nested (and I can't change that, since it's a government database).
My workaround was to retrieve all relations for my keys and then filter out the remaining objects in memory, as this wasn't much overhead in my scenario.
Edit: as a follow-up, the external database I was using later mapped relations as nested, and it worked as explained above.
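For reference, here is the shape of that query as JSON, built in Python rather than NEST. It assumes relations and relations.organisations.organisationName are both mapped as nested, and it uses must_not plus exists instead of filtered/missing, which were removed in Elasticsearch 5.0; on older versions, keep the filtered/missing clause from the question. build_relation_query is a hypothetical helper name.

```python
import json

REL = "relations"
ORG_NAME = "relations.organisations.organisationName"


def build_relation_query(participant_key, org_name):
    """Outer nested query on relations, inner nested on organisationName.

    The key check and the inner nested clause sit in the same outer nested
    query, so both must match within the same relations object.
    """
    inner = {
        "nested": {
            "path": ORG_NAME,
            "score_mode": "max",
            "query": {
                "bool": {
                    "must": [{"match": {f"{ORG_NAME}.name": org_name}}],
                    # "to" is missing or null: no indexed value exists
                    "must_not": [{"exists": {"field": f"{ORG_NAME}.period.to"}}],
                }
            },
        }
    }
    return {
        "query": {
            "nested": {
                "path": REL,
                "query": {
                    "bool": {
                        "must": [
                            {"term": {f"{REL}.participant.key": participant_key}},
                            inner,
                        ]
                    }
                },
            }
        }
    }


print(json.dumps(build_relation_query(123123, "EJERREGISTER"), indent=2))
```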