Elastic search string filter - does such option exists? - elasticsearch

I am wondering, is there such an option like string filter?
I've recently bumped into the following error:
RequestError(400, 'search_phase_execution_exception',
'too_many_clauses: maxClauseCount is set to 1024')
According to Lucene's documentation, it says:
Use a filter to replace the part of the query that causes the exception.
Do you have any ideas?

The Lucene FAQ mentions a few approaches to overcoming the TooManyClauses exception which doesn't apply to Elasticsearch as before they used to have terms filter separately but now its part of terms query itself.
so below is the example how you could use terms in the filter context:
{
"query": {
"bool": {
"filter": [
{ "term": { "user" : ["kimchy", "elasticsearch"]},
]
}
}
}
If you really need to use a query instead of a filter then you can update
indices.query.bool.max_clause_count: n in the elasticsearch.yml (replace n with the number of clause count that you need) file of each node of the cluster and restart the cluster.
Note that this will increase the
memory requirements for searches that expand to many terms.

Related

Elasticsearch: What is the difference between a match and a term in a filter?

I was following an ES tutorial, and at some point I wrote a query using term in the filter instead the recommended solution using match. My understanding is that match was used in the query part to get scoring, while term was used in the filter part to just remove hits before enter the query part. To my surprise match also works in the filter part.
What is the difference between:
GET blogs/_search
{
"query": {
"bool": {
"filter": {
"match": {
"category.keyword": "News"
}
}
}
}
}
and:
GET blogs/_search
{
"query": {
"bool": {
"filter": {
"term": {
"category.keyword": "News"
}
}
}
}
}
Both returns the same hits, and the score is 0 for all hits.
What is the behaviour or match in a filter clause? I would expect it to yield some score, but it does not.
What I thought:
term : does not analyze either the parameter or the field, and it is a yes/no scenario.
match : analyzes parameter and field and calculates a score of how good they match.
But when using match against a keyword in the filter part of the query, how does it behave?
The match query is a high-level query that resorts to using a term query if it needs to.
Scoring has nothing to do with using match instead of term. Scoring kicks in when you use bool/must/should instead of bool/filter.
Here is how the match query works:
First, it checks the type of the field.
If it's a text field then the value will be analyzed, either with the analyzer specified in the query (if any), or with the search- or index-time analyzer specified in the mapping.
If it's a keyword field (like in your case), then the input is not analyzed and taken "as is"
Since you're using the match query on a keyword field and your input is a single term, nothing is analyzed and the match query resorts to using a term query underneath. This is why you're seeing the same results.
In general, it's always best to use a match query as it is smart enough to know what to do given the field you're querying and the input data you're searching for.
You can read more about the difference between the two here.

Is query context evaluated before filter context in elasticsearch? How to determine the order of evaluation?

I am using the below query :
GET customer/doc/_search?routing=123
{
"query": {
"bool": {
"filter": [
{
"term": {
"location": "Delhi"
}
}
],
"should": [
{
"match_phrase_prefix": {
"phone": {
"query": "650",
"max_expansions": 100
}
}
}
]
}
}
}
The problem is my search on phone isn't working anymore. It used to work fine when I had less data, now every shard has data for multiple locations. Search on phone now requires me to type in 6 or 7 characters at times. (There may be matching phone numbers that have different location but are on this shard)
This is due to max_expansions I am guessing. When I increase it to 500 it does return me search results (not all), but the query becomes slow.
Isn't there a way to force es to apply filter first (and restrict the dataset) and then apply the should clause, so that I get the matching results even with small value of max_expansions?
Any help is appreciated.
It is due to max_expansions. Restricting dataset is not exactly what you may want to do ( Thats also not very straight forward - you may have to use some script which will in turn slowdown query).
When you query for a wildcard expression, Lucene expands the wildcard expression into set of actual terms in your inverted index term dictionary. Now , when you restrict the term expansion to 500 - it might miss a few.
I would consider using prefixes during indexing phase. Prefixes helps to avoid the costly expansion in runtime phase.

Less restrictive search doesn't return any hits in ElasticSearch

The query below returns hits, for example where name is "Balances by bank":
GET /_search
{ "query": {
"multi_match": { "query": "Balances",
"fields": ["name","descrip","notes"]
}
}
}
So why this doesn't return anything? Note that the query is less restrictive, the word is "Balance" and not "Balances" with an s.
GET /_search
{ "query": {
"multi_match": { "query": "Balance",
"fields": ["name","descrip","notes"]
}
}
}
What search would return both?
You need to change your mapping to be able to do that.
If you didn't specified a mapping with specific analyzers when creating your index, elasticsearch will use the default mapping and analyzer.
The default mapping will map each text field as both text and keyword, so you will be able to performe full text search (match part of the string) and keyword search (match the whole string), but it will use the standard analyzer.
With the standard analyzer your example Balances by bank becomes the following list of tokens: [Balances, by, bank], those items are added to the inverted index and elasticsearch can find the documents when you search for any of them.
When you search for just Balance, this term does not exist in the inverted index and elasticsearch returns nothing.
To be able to return both Balance and Balances you need to change your mapping and use the analyzer for the english language, this analyzer will reduce your terms to their stem and match Balance, Balances as also Balancing, Balanced, Balancer etc.
Look at this part of the documentation to see how the analysis process work.
And of course, you can also search for Balance* and it will return both Balance and Balances, but it is a different query.

Application-side Joins Elasticsearch

I have two indexes in Elasticsearch, a system index, and a telemetry index. I'd like to perform queries and aggregations on the telemetry index using filters from the systems index. The systems index is relatively small and only receives new documents occasionally, but the telemetry index is much larger and is constantly receiving new documents. This seems like an ideal situation for using an application-side join.
I tried emulating the example query at the pervious link, but it turns out the filtered query is deprecated as of ES 5.0. (Why is this example in the current documentation?!)
Here are my queries:
GET /system/_search
{
"query": {
"match": {
"name": "George's system"
}
}
}
GET /telemetry/_search
{
"query": {
"bool":{
"must": {
"multi_match": {
"operator": "and",
"fields": ["systemId"]
, [1] }
}
}
}
}
}
The second one fails with a json_parse_exception because for some reason it doesn't like the [ ] characters after "fields".
Can anyone provide a simple example of using application-side joins?
Once such a query is defined (perhaps in Kibana's Dev Tools console) is there a way to visualize it in Kibana?
With elastic there is no way to execute two nested queries like in a relational database where the first query uses the response of the second. The example in the application-side join, means that you are actually making two queries (two different requests to elastic) on the application side.
First query you get the list of ids you need to filter on.
Second query you pass the list of ids that you got to the terms filter.
This works when you have no more than 1024 values for systemId. Because terms query has a limit on the number of terms.
Because this query is not feasible, then you can't visualize it in kibana.
In such case you have to sacrifice a little of space and add the systemId to your mapping.
Good Luck!

Querying large amounts of terms without expanding maxClauseCount

In a data flow of mine, I am trying to retrieve a subset of documents from a previous terms aggregation, but hitting the maxClauseCount limit within my ES cluster. The follow up query is along these lines:
GET dataset/_search
{
"size": 2000,
"query": {
"bool": {
"must": [
(a filter or two)...,
{
"terms":{
"otherid":[
"789e18f2-bacb-4e38-9800-bf8e4c65c206",
"8e6967aa-5b98-483e-b50f-c681c7396a6a",
...
]
}
}
]}
}
}
In my research I've come across a lookup - which sadly we can't use - as well as the ids query.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
From experimentation, it appears that the ids query doesn't share the limit the terms query has (potentially it's not converted into terms clauses). Do any of you know if there's a good way to achieve similar functionality to the ids query without using the ids fields.
My version of ES is 5.0.
Thanks!
instead of using terms use the Terms filter it will solve the issue
OR
index.query.bool.max_clause_count: increase to higher value(*Not Recommended)
http://george-stathis.com/2013/10/18/setting-the-booleanquery-maxclausecount-in-elasticsearch/

Resources