How do boolean predicates work in Elasticsearch query string syntax

I have a question regarding the ES query string syntax. I am searching Logstash log entries containing XML documents, and I'd like to find documents containing certain XML attributes with certain values. When searching for:
id: foobar AND attrName=SomeValue
In my data set this query finds, let's say, 100 documents.
When searching for:
id: foobar AND attrName SomeValue
I get fewer documents. Why is that, when according to the query_string docs the default operator is OR?
When I escape the " character and query like this I get the correct results:
id: foobar AND attrName=\"SomeValue\"
I'm running the query using the following JSON:
{
  "sort": [
    "#timestamp"
  ],
  "query": {
    "query_string": {
      "query": "mySearchText"
    }
  },
  "fields": [
    "_id"
  ],
  "size": 100
}
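For reference, the operator applied between bare terms can also be set explicitly on the query_string query itself; a minimal sketch based on the request above:
{
  "query": {
    "query_string": {
      "query": "mySearchText",
      "default_operator": "AND"
    }
  }
}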
Any tips on how to search in XML documents containing only elements and attributes but no text nodes?
Edit #1: I just stumbled upon another thing I don't understand. Why is this query:
a AND b OR c
different from this one:
a AND (b OR c)
Any tips on how these queries are evaluated?
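If I understand the Lucene classic query parser correctly, it does not apply strict boolean precedence; the two strings end up being interpreted roughly as follows (with + marking a required clause and no prefix marking an optional one):
a AND b OR c    ->  +a +b c
a AND (b OR c)  ->  +a +(b c)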
Edit #2: Okay I think I nailed down what behaviour is confusing me.
When my query string looks like this:
id: foo AND attrName=\"SomeValue\" AND field2:bar
I get all documents where:
- id=foo
- field2=bar
- contain the text attrName AND the text SomeValue
When I change my query to (added parentheses):
id: foo AND (attrName=\"SomeValue\") AND field2:bar
I get all documents where:
- id=foo
- field2=bar
- contain the text attrName OR the text SomeValue
Why is (attrName=\"SomeValue\") evaluated as attrName OR SomeValue, whereas without parentheses it is attrName AND SomeValue?

Related

Elasticsearch: Constant score applied within match query, but after search terms have been analysed?

Imagine I have some documents with the following values contained within a text field called name:
Document1: abc xyz group
Document2: group x/group y
Document3: group 1, group 2, group 3, group 4
Now imagine I'm sending a simple match query to ES for the term 'group':
{
  "query": {
    "match": {
      "name": "group"
    }
  }
}
My desired outcome would be that all 3 documents would return with the same score, no matter how often the term appears, where it appears, etc.
Now, I already know that I can do this by wrapping my match with a constant_score, like so:
{
  "query": {
    "constant_score": {
      "filter": {
        "match": {
          "name": "group"
        }
      },
      "boost": 1
    }
  }
}
BUT, say I now want to query using the search term abc group. In this case, what I want to happen is that Document2 and Document3 will return the same score (they match group), but Document1 will have a better score as it matches both abc and group.
With a constant_score wrapping my match query, documents that contain any of the terms return the same score (i.e. Document1, 2 and 3 return the same score for abc group). If I remove the constant_score, then Document3 has the best score, presumably because it contains more matches with the search text (group appearing 4 times).
It seems as though I need a way of moving the constant_score query to after the match query has analyzed my search text. Effectively causing a query of abc group to be two constant_score queries - one for abc and one for group.
Does anyone know of a way to achieve this?
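One way to approximate this by hand, assuming the search text is split into its terms (here abc and group) before the request is built, would be a bool query with one constant_score clause per term; a minimal sketch:
{
  "query": {
    "bool": {
      "should": [
        { "constant_score": { "filter": { "match": { "name": "abc" } }, "boost": 1 } },
        { "constant_score": { "filter": { "match": { "name": "group" } }, "boost": 1 } }
      ]
    }
  }
}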
I've managed to solve this by utilising Elasticsearch's unique token filter: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-unique-tokenfilter.html
I've added that to my name field in the index mappings, and it looks to be retrieving the desired results without having to worry about constant_score.
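A minimal sketch of what such a mapping could look like; the analyzer name unique_terms is made up and the exact syntax may vary by ES version:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "unique_terms": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "unique"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "unique_terms"
      }
    }
  }
}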
Note, however, that all this does is stop term frequencies from having any effect on the _score - other metrics (such as fieldLength) still have an effect on the results. This isn't, therefore, the equivalent of using a post-analyzed version of constant_score as I hypothesized in the question, but it will suffice for my current requirements.

Elasticsearch simple query string: removing documents containing words

I created a foo example to express what I mean. Suppose we have an index whose documents contain the words Text and Texture.
Then I'd like to select all documents containing the word Text (I'm using the simple query string).
When I use the query "query": "Text", I get areas 1, 2 and 3 from the picture below.
When I use the query "query": "Text -Texture", I get only area 3 from the picture below.
How could I get both areas 2 and 3?
Thanks.
To understand your problem you need to post your query.
Try using a term query:
{
  "query": {
    "term": {
      "myField": "Text"
    }
  }
}
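One caveat: a term query is not analysed, so if myField uses the standard analyzer the indexed token will be lowercased and the query above needs to use the term in its indexed form to match; a minimal sketch under that assumption:
{
  "query": {
    "term": {
      "myField": "text"
    }
  }
}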

elasticsearch: or operator, number of matches

Is it possible to score my searches according to the number of matches when using operator "or"?
Currently my query looks like this:
"query": {
"function_score": {
"query": {
"match": {
"tags.eng": {
"query": "apples banana juice",
"operator": "or",
"fuzziness": "AUTO"
}
}
},
"script_score": {
"script": # TODO
},
"boost_mode": "replace"
}
}
I don't want to use "and" operator, since I want documents containing "apple juice" to be found, as well as documents containing only "juice", etc. However a document containing the three words should score more than documents containing two words or a single word, and so on.
I found a possible solution here https://github.com/elastic/elasticsearch/issues/13806
which uses bool queries. However, I don't know how to access the tokens (in this example: apples, banana, juice) generated by the analyzer.
Any help?
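For inspecting which tokens the analyzer actually produces for a given input, the _analyze API can help; a minimal sketch of a request body sent to the index's _analyze endpoint (the field name is taken from the query above):
{
  "field": "tags.eng",
  "text": "apples banana juice"
}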
Based on the discussions above I came up with the following solution, which is a bit different from what I imagined when I asked the question, but works for my case.
First of all I defined a new similarity:
"settings": {
"similarity": {
"boost_similarity": {
"type": "scripted",
"script": {
"source": "return 1;"
}
}
}
...
}
Then I had the following problem:
a query for "apple banana juice" had the same score for a doc with tags ["apple juice", "apple"] and another doc with tags ["banana", "apple juice"], even though I would like to score the second one higher.
From this other discussion I found out that this issue was caused by the fact that I had a nested field, so I created a plain text field to address it.
But I also wanted to distinguish between a doc with tags ["apple", "banana", "juice"] and another doc with the tag ["apple banana juice"] (all three words in the same tag). The final solution was therefore to keep both fields (a nested field and a text field) for my tags.
Finally, the query consists of a bool query with two should clauses: the first should clause runs against the text field and uses an "or" operator; the second should clause runs against the nested field and uses an "and" operator.
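A minimal sketch of what the resulting query could look like; the field names tags_text (the plain text field) and tags (the nested path) are assumptions:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "tags_text": {
              "query": "apples banana juice",
              "operator": "or"
            }
          }
        },
        {
          "nested": {
            "path": "tags",
            "query": {
              "match": {
                "tags.eng": {
                  "query": "apples banana juice",
                  "operator": "and"
                }
              }
            }
          }
        }
      ]
    }
  }
}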
Although I found a solution for this specific issue, I still face a few other problems when using ES to search tagged documents. The examples in the documentation seem to work very well for full-text search, but does someone know where I can find something more specific to tagged documents?

search elasticsearch fields with dashes in the field name

EDIT: it seems to be an important detail that the field names with dashes have further sub-properties, which are the ones I am trying to search.
I have some elasticsearch documents with dashes in some field names like this:
{
  "item": {
    "item-value": {
      "subvalue": "subvalue"
    },
    "item-name": "name"
  },
  "other_field": "other_value"
}
When I try match queries on "other_field" and "item.item-name", hits are returned. Queries on item.item-value.subvalue return 0 hits every time even when there should be matches.
{"match": {"item.item-subvalue.subvalue": "subvalue"}}
Is there anything else I can manipulate in the query or settings to make this field match without restructuring the documents?
Looks like a typo. {"match": {"item.item-value.subvalue": "subvalue"}}
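For completeness, the corrected clause wrapped in a full request body:
{
  "query": {
    "match": {
      "item.item-value.subvalue": "subvalue"
    }
  }
}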

Is it possible to chain fquery filters in elastic search with exact matches?

I have been having trouble writing a method that will take in various search parameters in elasticsearch. I was working with queries that looked like this:
body:
{
  query: {
    filtered: {
      filter: {
        and: [
          {term: {some_term: "foo"}},
          {term: {is_visible: true}},
          {term: {"term_two": "something"}}
        ]
      }
    }
  }
}
Using this syntax I thought I could chain these terms together and programmatically generate the queries. I was using simple strings, and if there was a term like "person_name" I could split the query in two and say "where person_name matches 'JOHN'" and "where person_name matches 'SMITH'", getting accurate results.
However, I just came across "fquery" upon asking this question:
Escaping slash in elasticsearch
I was not able to use this "and"/"term" filter to search a value with slashes in it, so I learned that I can use fquery to search for the full value, like this:
"fquery": {
  "query": {
    "match": {
      "by_line": "John Smith"
    }
  }
}
But how can I search like this for multiple items? It seems that when I combine fquery and my filtered/filter/and/term queries, my "and" term queries are ignored. What is the best practice for making nested / chained queries using Elasticsearch?
As in the comment below, yes, I can just add fquery to the "and" block like so:
{:filtered=>
  {:filter=>
    {:and=>[
      {:term=>{:is_visible=>true}},
      {:term=>{:is_private=>false}},
      {:fquery=>
        {:query=>{:match=>{:sub_location=>"New JErsey"}}}}]}}}
Why would elasticsearch also return results with "sub_location" = "new York"? I would like to only return "new jersey" here.
A match query analyzes the input and by default it is a boolean OR query if there are multiple terms after the analysis. In your case, "New JErsey" gets analyzed into the terms "new" and "jersey". The match query that you are using will search for documents in which the indexed value of field "sub_location" is either "new" or "jersey". That is why your query also matches documents where the value of field "sub_location" is "new York" because of the common term "new".
To only match for "new jersey", you can use the following version of the match query:
{
  "query": {
    "match": {
      "sub_location": {
        "query": "New JErsey",
        "operator": "and"
      }
    }
  }
}
This will not match documents where the value of field "sub_location" is "New York". But, it will match documents where the value of field "sub_location" is say "York New" because the query finally translates into a boolean query like "York" AND "New". If you are fine with this behaviour, well and good, else read further.
All these issues arise because you are using the default analyzer for the field "sub_location", which breaks the text into tokens at word boundaries and indexes them. If you really do not care about partial matches and want to always match the entire string, you can make use of a custom analyzer with the Keyword Tokenizer and the Lowercase Token Filter. Mind you, going ahead with this approach will require you to re-index all your documents.
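A minimal sketch of such an analyzer; the analyzer name lowercase_keyword is made up, and the exact settings/mappings layout varies between ES versions:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "sub_location": {
        "type": "text",
        "analyzer": "lowercase_keyword"
      }
    }
  }
}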
