Elasticsearch term query does not give any results - elasticsearch

I am very new to Elasticsearch and I have to perform the following query:
GET book-lists/book-list/_search
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"term":{
"title":"Sociology"
}
},
{
"term":{
"idOwner":"17xxxxxxxxxxxx45"
}
}
]
}
}
}
}
}
According to the Elasticsearch API, it is equivalent to pseudo-SQL:
SELECT document
FROM book-lists
WHERE title = "Sociology"
AND idOwner = 17xxxxxxxxxxxx45
The problem is that my document looks like this:
{
"_index":"book-lists",
"_type":"book-list",
"_id":"AVBRSvHIXb7carZwcePS",
"_version":1,
"_score":1,
"_source":{
"title":"Sociology",
"books":[
{
"title":"The Tipping Point: How Little Things Can Make a Big Difference",
"isRead":true,
"summary":"lorem ipsum",
"rating":3.5
}
],
"numberViews":0,
"idOwner":"17xxxxxxxxxxxx45"
}
}
And the Elasticsearch query above doesn't return anything.
Whereas, this query returns the document above:
GET book-lists/book-list/_search
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"term":{
"numberViews":"0"
}
},
{
"term":{
"idOwner":"17xxxxxxxxxxxx45"
}
}
]
}
}
}
}
}
This makes me suspect that the problem has something to do with "title" being used as the name of two different fields.
Is there a way to fix this without renaming any of the fields, or am I missing something else?
Thanks for anyone trying to help.

Your problem is described in the documentation.
I suspect that you don't have any explicit mapping on your index, which means elasticsearch will use dynamic mapping.
For string fields, it will pass the string through the standard analyzer which lowercases it (among other things). This is why your query doesn't work.
Your options are:
1. Specify an explicit mapping on the field so that it isn't analyzed before storing in the index (index: not_analyzed).
2. Clean your term query before sending it to elasticsearch (in this specific query lowercasing will work, but note that the standard analyzer also does other things like remove stop words, so depending on the title you may still have issues).
3. Use a different query type (e.g., query_string instead of term, which will analyze the query before running it).
Looking at the sort of data you are storing you probably need to specify an explicit not_analyzed mapping.
For option three your query would look something like this:
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"query_string":{
"fields": ["title"],
"analyzer": "standard",
"query": "Sociology"
}
},
{
"term":{
"idOwner":"17xxxxxxxxxxxx45"
}
}
]
}
}
}
}
}
Note that the query_string query has special syntax (e.g., OR and AND are not treated as literals) which means you have to be careful what you give it. For this reason explicit mapping with a term filter is probably more appropriate for your use case.
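A minimal sketch of the first option (explicit mapping), written as a Python dict that could be sent as the index-creation body. It assumes the "string" / "not_analyzed" mapping syntax of the Elasticsearch 1.x/2.x versions that still support the filtered query. The helper name is made up for illustration, and an existing index would have to be recreated and reindexed for the new mapping to apply.

```python
import json


def not_analyzed_mapping(doc_type, fields):
    """Build an index-creation body that stores the given string fields verbatim.

    Hypothetical helper; assumes the pre-5.0 "string" / "not_analyzed" syntax.
    """
    properties = {
        name: {"type": "string", "index": "not_analyzed"} for name in fields
    }
    return {"mappings": {doc_type: {"properties": properties}}}


body = not_analyzed_mapping("book-list", ["title", "idOwner"])
print(json.dumps(body, indent=2))
```

With such a mapping the original term filter on "Sociology" should match, because the indexed value is no longer lowercased.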

I have described this issue in this blog.
The issue is caused by the default tokenization in Elasticsearch.
In the same post, I have outlined two solutions.
One is to enable the not_analyzed flag on the required field, and the other is to use the keyword tokenizer.

To expand on solarissmoke's solution, while the contents of that field will be passed through the standard analyzer, your query will not. If you refer to the Elasticsearch documentation on the term query, you will see that term queries are not analyzed.
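The mismatch can be illustrated with a toy inverted index in Python. This is only a rough stand-in for what Elasticsearch does: toy_analyze is a made-up function that lowercases and splits on non-alphanumeric characters, which approximates but does not reproduce the standard analyzer.

```python
import re
from collections import defaultdict


def toy_analyze(text):
    # Rough approximation of an analyzer: lowercase, split on non-alphanumerics.
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


index = defaultdict(set)  # token -> ids of the documents containing it


def index_doc(doc_id, title):
    # The field content is analyzed before it is stored in the index.
    for token in toy_analyze(title):
        index[token].add(doc_id)


def term_lookup(value):
    # Like a term query: the value is looked up exactly as given.
    return index.get(value, set())


def match_lookup(text):
    # Like a match query: the text is analyzed first, then any token may match.
    hits = set()
    for token in toy_analyze(text):
        hits |= index.get(token, set())
    return hits


index_doc(1, "Sociology")
index_doc(2, "Another Sociology Title")

print(term_lookup("Sociology"))   # empty: only "sociology" is in the index
print(term_lookup("sociology"))   # both documents
print(match_lookup("Sociology"))  # both documents
```

Note that the analyzed lookups return document 2 as well, so neither of them behaves like an exact equality test on the whole title.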
The match query is probably more appropriate for your case. By default, the text you query with will be analyzed in the same way as the contents of the title field. The query_string query brings a lot more to the table, and you should review the documentation if you plan on using it.
So again, pretty much what you had, with a small tweak:
GET book-lists/book-list/_search
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"match":{
"title":"Sociology"
}
},
{
"term":{
"idOwner":"17xxxxxxxxxxxx45"
}
}
]
}
}
}
}
}
It is important to note that passing a lowercase version of the terms to the term query (a hack that does not seem like a good idea, given what solarissmoke described about the other features of the standard analyzer, such as the stop filter), using the query_string query, or using the match query is still very different from the SQL query you described:
SELECT document
FROM book-lists
WHERE title = "Sociology"
AND idOwner = 17xxxxxxxxxxxx45
With those Elasticsearch queries, you can match records where idOwner might be the same but title might be something like "Another Sociology Title" which is different from what you would expect with that SQL. Here is some great stuff from the documentation and another stackoverflow post that will elaborate on what was going on, where term queries and filters are appropriate, and getting exact matches:
Elasticsearch : Finding Exact Values
Stackoverflow : Exact (not substring) matching in Elasticsearch

Related

How can we make few tokens to be phrase in elastic search query

I want part of a query to be treated as a phrase. For example, I want to search "Can you show me documents for Hospitality and Airline Industry".
Here I want Airline Industry to be treated as a phrase. I can't find any such setting in multi_match.
Even when we use a multi_match query with "Can you show me documents for Hospitality and \"Airline Industry\"", the default analyzer breaks it into separate tokens. I don't want to change my analyzer settings. I have also found that we can do this with simple_query_string, but then we cannot apply a filter the way we can with multi_match inside a bool query, and I want to filter on certain fields as well.
search_text="Can you show me documents for Hospitality and Airline Industry". Now I want to pass Airline Industry as a phrase and search my indexed documents against 2 fields.
Say I have existing code like this:
if filter:
    # multi_match in query context, restricted by a terms filter
    qry = {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": search_text,
                        "type": "best_fields",
                        "fields": ["TITLE1", "TEXT"],
                        "tie_breaker": 0.3
                    }
                },
                "filter": {"terms": {"GRP_CD": ["1234", "5678"]}}
            }
        }
    }
else:
    # same multi_match without any filter
    qry = {
        "query": {
            "multi_match": {
                "query": search_text,
                "type": "best_fields",
                "fields": ["TITLE1", "TEXT"],
                "tie_breaker": 0.3
            }
        }
    }
But then I realised this code does not handle Airline Industry as a phrase, even though I am passing the search string like this:
"Can you show me documents for Hospitality and \"Airline Industry\""
From the Elasticsearch documentation I learned that there is this query, which might handle it:
qry={"query":{
"simple_query_string":{
"query":"Can you show me documents for Hospitality and \"Airline Industry\"",
"fields":["TITLE1","TEXT"] }
} }
But now my issue is: what if the user wants to apply a filter? With the filter query above I cannot pass a phrase, and a boolean query does not seem possible with simple_query_string.
You can always combine queries using a bool query. Let's go through this case by case. Before that, I would like to clarify one thing about filter. The filter clause of a bool query behaves just like a must clause, with the difference that any query inside the filter clause (even another bool query with must/should clauses) runs in filter context. Filter context means that this part of the query is not considered in the score calculation.
Now let's move on to the cases:
Case 1: Only query and no filters.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
]
}
}
}
Notice that the query is the same as the one in your question. All I have done is wrap it in a bool query. This makes no logical change to the query, but it makes it easier to add queries to the filter clause programmatically.
Case 2: Phrase query with filter.
{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "Can you show me documents for Hospitality and \"Airline Industry\"",
"fields": [
"TITLE1",
"TEXT"
]
}
}
],
"filter": [
{
"terms": {
"GRP_CD": [
"1234",
"5678"
]
}
}
]
}
}
}
This way you can combine query(query context) with the filters.
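Both cases can be produced by one small function on the client side. A minimal Python sketch, where build_search and its parameters are illustrative names rather than part of any Elasticsearch client:

```python
def build_search(search_text, fields, group_codes=None):
    """Wrap simple_query_string in a bool query, adding a terms filter if given."""
    bool_query = {
        "must": [
            {"simple_query_string": {"query": search_text, "fields": fields}}
        ]
    }
    if group_codes:
        # Filter context: restricts the hits without affecting the score.
        bool_query["filter"] = [{"terms": {"GRP_CD": list(group_codes)}}]
    return {"query": {"bool": bool_query}}


search_text = 'Can you show me documents for Hospitality and "Airline Industry"'
unfiltered = build_search(search_text, ["TITLE1", "TEXT"])                    # Case 1
filtered = build_search(search_text, ["TITLE1", "TEXT"], ["1234", "5678"])   # Case 2
```

Because the query is always wrapped in bool, the calling code does not need separate branches for the filtered and unfiltered request.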

How do I do an Anti Match Pattern on Keyword Field Elasticsearch Query 6.4.2

The problem:
Our log data has 27-34 million entries for a /event-heartbeat.
I need to filter those entries out to see just viable log messages in Kibana.
Using Kibana filters with wildcards does not work. Thus, I think I will have to write QueryDSL to do it in version 6.4.2 Elasticsearch to get it to filter out the event heart beats.
I have been looking and I can't find any good explanations on how to do an anti-pattern match so to search for all entries that don't have /event-heartbeat in the message.
Here is the log message:
@timestamp:
June 14th 2019, 12:39:09.225
host.name:
iislogs-production
source:
C:\inetpub\logs\LogFiles\W3SVC5\u_ex19061412.log
offset:
83,944,181
message:
2019-06-14 19:39:06 0.0.0.0 GET /event-heartbeat id=Budrug2UDw 443 - 0.0.0.0 - - 200 0 0 31
prospector.type:
log
input.type:
log
beat.name:
iislogs-production
beat.hostname:
MYHOSTNAME
beat.version:
6.4.2
_id:
yg6AV2sB0_n
_type:
doc
_index:
iislogs-production-6.4.2-2019.06.14
_score:
-
Message is a keyword field so I can do painless scripting on it.
I've used Lucene syntax
NOT message: "*/event-heartbeat*"
This is the anti pattern the kibana filter generates.
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "message": "*event-heartbeat*"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
I've tried the solution proposed below by huglap. Based on his comment I also adjusted the query and tried it two ways, once with term and once with match, because the field is technically a keyword (so that I can do painless scripting on it). Both queries still return event-heartbeat log entries.
Here are the two queries I tried:
GET /iislogs-production-*/_search
{
"query":{
"bool":{
"must":{
"match_all":{
}
},
"filter":{
"bool":{
"must_not":[
{
"term":{
"message.whitespace":"event-heartbeat"
}
}
]
}
}
}
}
}
GET /iislogs-production-*/_search
{
"query":{
"bool":{
"must":{
"match_all":{
}
},
"filter":{
"bool":{
"must_not":[
{
"match":{
"message.whitespace":"event-heartbeat"
}
}
]
}
}
}
}
}
Index Mapping:
https://gist.github.com/zukeru/907a9b2fa2f0d6f91a532b0865131988
Have you thought about a 'must_not' bool query?
Since you're going for the whole set and don't really care about shaping the relevancy function, I suggest using a filter instead of a query. You'll get better performance.
{
"query":{
"bool":{
"must":{
"match_all":{
}
},
"filter":{
"bool":{
"must_not":[
{
"match":{
"message.whitespace":"event-heartbeat"
}
}
]
}
}
}
}
}
This example assumes you are querying against a text field, thus the use of a 'match' query instead of a 'term' one.
You also need to make sure that the field is analyzed (really tokenized) according to your goals. The fact that you have a dash in your query term will create problems if you're using a simple or even a standard analyser. Elasticsearch would break the term in two words. You could try the whitespace analyser on that one or just remove the dash from the query.
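Since the question states that message is mapped as a keyword field, another option that may be worth trying is a wildcard query inside must_not, which is matched against the whole unanalyzed value. The Python sketch below only builds the request body; it has not been run against the index in question, exclude_pattern is a made-up helper name, and a pattern with a leading wildcard can be slow on tens of millions of documents.

```python
import json


def exclude_pattern(field, pattern):
    """Build a bool query that drops documents whose field matches a wildcard pattern."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "must_not": [
                    {"wildcard": {field: pattern}}
                ]
            }
        }
    }


body = exclude_pattern("message", "*/event-heartbeat*")
print(json.dumps(body, indent=2))
```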

Understanding boosting in ElasticSearch

I've been using ElasticSearch for a little bit with the goal of building a search engine and I'm interested in manually changing the IDFs (Inverse Document Frequencies) of each term to match the ones one can measure from the Google Books unigrams.
In order to do that I plan on doing the following:
1) Use only 1 shard (so IDFs are not computed for every shard and they are "global")
2) Get the ttf (total term frequency, which is used to compute the IDFs) for every term by running this query for every document in my index
curl -XGET 'http://localhost:9200/index/document/id_doc/_termvectors?pretty=true' -d '{
"fields" : ["content"],
"offsets" : true,
"term_statistics" : true
}'
3) Use the Google Books unigram model to "rescale" the ttf for every term.
The problem is that, once I've found the "boost" factors I have to use for every term, how can I use this in a query?
For instance, let's consider this example
"query":
{
"bool":{
"should":[
{
"match":{
"title":{
"query":"cat",
"boost":2
}
}
},
{
"match":{
"content":{
"query":"cat",
"boost":2
}
}
}
]
}
}
Does that mean that the IDF of the term "cat" is going to be boosted / multiplied by a factor of 2?
Also, what happens if, instead of searching for one word, I search for a sentence? Would that mean that the IDF of each word is going to be boosted by 2?
I tried to understand the role of the boost parameter (https://www.elastic.co/guide/en/elasticsearch/guide/current/query-time-boosting.html) and t.getBoost(), but that seems a little confusing.
Boost is used when a query has multiple clauses, for example:
{
"bool":{
"should":[
{
"match":{
"clause1":{
"query":"query1",
"boost":3
}
}
},
{
"match":{
"clause2":{
"query":"query2",
"boost":2
}
}
},
{
"match":{
"clause3":{
"query":"query1",
"boost":1
}
}
}
]
}
}
In the above query, clause1 is three times as important as clause3, and clause2 is twice as important as clause3. The scores are not simply multiplied by 3 and 2, because they are normalized when the score is calculated.
Also, if you query with only a single clause, boosting it is not useful.
A usage scenario for boost:
A set of page documents with title and content fields.
You want to search title and content for some terms, and you consider title more important than content. You can then give the title clause a higher boost than the content clause. If one document matches on the title field and another matches on the content field, the boost helps rank the title match ahead of the content match.
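The title-versus-content scenario can be written as a small Python helper that builds the should clauses from a field-to-boost mapping. The function name and the boost values are illustrative assumptions; what matters is the relative size of the boosts, not their absolute values.

```python
def boosted_match(text, field_boosts):
    """Build a bool/should query with one match clause per field, each with its own boost."""
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


# Title matches are weighted twice as heavily as content matches.
query = boosted_match("cat", {"title": 2, "content": 1})
```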

elasticsearch filter query not work

I am trying to make a query with filtering but it fails; a Bad Request comes back as the response:
{
"query":{
"filtered":{
"query":{
"logdate":{
"gte":"01-01-2014"
}
}
}
}
}
I searched the documentation online and my code looks the same as the examples there, but something in it is wrong and I can't figure out what.
Did you see a "query" key used like that inside a filtered query in the Elasticsearch documentation or elsewhere? Use the "filter" key in the filtered query, and wrap the condition in a "range" clause. This is the correct form of your query:
{
"query":{
"filtered":{
"filter":{
"range":{
"logdate":{
"gte":"01-01-2014"
}
}
}
}
}
}
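The corrected structure can also be built from code. A minimal Python sketch with hypothetical helper names: the second wrapper shows the bool/filter form, which takes the place of the filtered query in Elasticsearch 5.0 and later, where filtered is no longer available.

```python
def range_filter(field, gte):
    # The comparison must sit under a "range" clause keyed by the field name.
    return {"range": {field: {"gte": gte}}}


def filtered_query(filter_clause):
    # Form used in the answer above (Elasticsearch 1.x/2.x).
    return {"query": {"filtered": {"filter": filter_clause}}}


def bool_filter_query(filter_clause):
    # Equivalent form for versions where "filtered" has been removed.
    return {"query": {"bool": {"filter": filter_clause}}}


clause = range_filter("logdate", "01-01-2014")
old_style = filtered_query(clause)
new_style = bool_filter_query(clause)
```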

ElasticSearch - Searching with hyphens

Elastic Search 1.6
I want to index text that contains hyphens, for example U-12, U-17, WU-12, t-shirt... and to be able to use a "Simple Query String" query to search on them.
Data sample (simplified):
{"title":"U-12 Soccer",
"comment": "the t-shirts are dirty"}
As there are quite a lot of questions already about hyphens, I tried the following solution already:
Use a Char filter: ElasticSearch - Searching with hyphens in name.
So I went for this mapping:
{
"settings":{
"analysis":{
"char_filter":{
"myHyphenRemoval":{
"type":"mapping",
"mappings":[
"-=>"
]
}
},
"analyzer":{
"default":{
"type":"custom",
"char_filter": [ "myHyphenRemoval" ],
"tokenizer":"standard",
"filter":[
"standard",
"lowercase"
]
}
}
}
},
"mappings":{
"test":{
"properties":{
"title":{
"type":"string"
},
"comment":{
"type":"string"
}
}
}
}
}
Searching is done with the following query:
{"_source":true,
"query":{
"simple_query_string":{
"query":"<Text>",
"default_operator":"AND"
}
}
}
What works:
"U-12", "U*", "t*", "ts*"
What didn't work:
"U-*", "u-1*", "t-*", "t-sh*", ...
So it seems the char filter is not executed on search strings?
What could I do to make this work?
The answer is really simple:
Quote from Igor Motov: Configuring the standard tokenizer
By default the simple_query_string query doesn't analyze the words
with wildcards. As a result it searches for all tokens that start with
i-ma. The word i-mac doesn't match this request because during
analysis it's split into two tokens i and mac and neither of these
tokens starts with i-ma. In order to make this query find i-mac you
need to make it analyze wildcards:
{
"_source":true,
"query":{
"simple_query_string":{
"query":"u-1*",
"analyze_wildcard":true,
"default_operator":"AND"
}
}
}
The quote from Igor Motov is correct: you have to add "analyze_wildcard":true to make it work with wildcards. But it is important to notice that the hyphen actually causes "u-12" to be tokenized into "u" and "12", two separate words.
If preserving the original is important, do not use the mapping char filter. Otherwise it is quite useful.
Imagine that you have "m0-77", "m1-77" and "m2-77". If you search for m*-77 you are going to get zero hits. However, you can replace "-" (hyphen) with AND to connect the two separate words and then search for m* AND 77, which will give you a correct hit.
You can do this on the client side.
For your u-1* case:
{
"query":{
"simple_query_string":{
"query":"u AND 1*",
"analyze_wildcard":true
}
}
}
t-sh*
{
"query":{
"simple_query_string":{
"query":"t AND sh*",
"analyze_wildcard":true
}
}
}
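The client-side rewrite can be as small as the following Python sketch. hyphens_to_and is a hypothetical helper that handles a single hyphenated word; a query containing several words would need the same treatment applied word by word.

```python
def hyphens_to_and(query):
    """Rewrite 'a-b*' as 'a AND b*' so each part is matched as its own token."""
    parts = [part for part in query.split("-") if part]
    return " AND ".join(parts)


print(hyphens_to_and("u-1*"))   # u AND 1*
print(hyphens_to_and("t-sh*"))  # t AND sh*
```

The result is then sent as the "query" value of the simple_query_string request, together with "analyze_wildcard":true as in the examples above.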
If anyone is still looking for a simple workaround to this issue, replace the hyphen with an underscore (_) when indexing data.
For example, O-000022334 should be indexed as O_000022334.
When displaying results, replace the underscore with a hyphen again. This way you can search for "O-000022334" and it will find the correct match.
