Multiple Field search in Elasticsearch - elasticsearch

How can we search on multiple fields in Elasticsearch?
For example, I want to search on subcategory and region. Searching on one field works, but how do we search on multiple fields?
The link below works fine, since it searches on only one field:
http://34c512ba34534fffdfd12abfd69f2458.us-east-1.aws.found.io:9200/episodes/episode/_search?q=sub_cat_seo_url:english-news&sort=pubdate_timestamp:desc
But when I try to search on multiple fields, for example sub_cat_seo_url and region, it does not work.
See these links (not working):
http://34c512ba34534fffdfd12abfd69f2458.us-east-1.aws.found.io:9200/episodes/episode/_search?q=sub_cat_seo_url:english-news,region:1&sort=pubdate_timestamp:desc
http://34c512ba34534fffdfd12abfd69f2458.us-east-1.aws.found.io:9200/episodes/episode/_search?q=sub_cat_seo_url:english-news&region:1&sort=pubdate_timestamp:desc

According to the documentation, searching on multiple fields in a query string should work.
See http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html
That being said, you can also use the following:
http://34c512ba34534fffdfd12abfd69f2458.us-east-1.aws.found.io:9200/episodes/episode/_search?q=%2Bsub_cat_seo_url%3Aenglish-news+%2Bregion%3A1&sort=pubdate_timestamp:desc
NOTE:
The existing mapping makes your field "sub_cat_seo_url" analyzed, using the standard analyzer. Hence, when you search for "english-news", it gets tokenized into "english" and "news", so any document matching either "english" or "news" is a valid match. For example, "telugu-news" is a valid match for your query. Not sure if that is intentional.
For an exact match, you need to mark the field as "not_analyzed" in your mapping.
Note: in a URL, %2B is decoded as '+', whereas '+' is decoded as ' ' (a space).
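As a minimal sketch of the encoding note above, Python's standard urllib.parse can build the q parameter so that each leading '+' survives as %2B. Only the query string itself comes from the URL above; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlencode

# Lucene query string: the leading '+' marks each clause as required.
lucene_query = "+sub_cat_seo_url:english-news +region:1"

# urlencode escapes '+' as %2B, ':' as %3A, and the space as '+'.
encoded = urlencode({"q": lucene_query})
print(encoded)

# Decoding reverses it: %2B becomes '+', and '+' becomes a space.
decoded = parse_qs(encoded)["q"][0]
print(decoded)
```

The encoded value is the same q parameter used in the working URL above.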

Related

How to search exact word in a text in Elastic Search

Let's say I have two texts:
Text 1 - "The fox has been living in the wood cabin for days."
Text 2 - "The wooden hammer is a dangerous weapon."
And I would like to search for the word "wood" without it matching "wooden hammer". How would I do that in Elastic Search or NEST?
The term query is used for exact-match searches. However, it is not recommended for text fields, as the following quote from the term query documentation explains:
To better search text fields, the match query also analyzes your
provided search term before performing a search. This means the match
query can search text fields for analyzed tokens rather than an exact
term.
The term query does not analyze the search term. The term query only
searches for the exact term you provide. This means the term query may
return poor or no results when searching text fields.
The problem with text exact matches, as described in the Term query documentation:
By default, Elasticsearch changes the values of text fields as part of
analysis. This can make finding exact matches for text field values
difficult.
So, the document data is modified (i.e., analyzed) before indexing. How it is modified depends on the index mapping definition for each field, which falls back to the default index analyzer or, failing that, the standard analyzer.
But the default standard analyzer will not change the token "Wooden" to "Wood"; that might happen only if you used stemming for this field.
This means that, if you don't use a different analyzer or stemming, querying with "Wood" shouldn't match the "Wooden" token.
To summarize: indexed data is analyzed before indexing (based on the field mapping definition). The match query analyzes the search query, while the term query doesn't. So you have to choose the field mapping and the search query carefully to suit your use case.
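To make the summary concrete, here is a small Python simulation. It is not Elasticsearch code: analyze() is a simplified stand-in for the standard analyzer (lowercase, split on non-word characters, no stemming), and term_query() and match_query() are hypothetical helpers that only mimic whether the search input is analyzed.

```python
import re

def analyze(text):
    """Simplified stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())

docs = {
    1: "The fox has been living in the wood cabin for days.",
    2: "The wooden hammer is a dangerous weapon.",
}

# Inverted index: token -> ids of the documents containing it.
index = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        index.setdefault(token, set()).add(doc_id)

def term_query(term):
    """Looks up the term exactly as given (no analysis of the input)."""
    return sorted(index.get(term, set()))

def match_query(query):
    """Analyzes the input first, then looks up each resulting token."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return sorted(hits)

print(term_query("wood"))   # only document 1: "wooden" is a different token
print(term_query("Wood"))   # no hits: the index holds lowercase tokens
print(match_query("Wood"))  # document 1: the query is lowercased first
```

In this simulation, "wood" never matches document 2, because without stemming "wooden" is indexed as its own token.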
For some use cases, like storing email addresses, phone numbers, or keyword fields that always have the same value, consider using the keyword type, which is suitable for exact matches in these cases. However, ES recommends:
Avoid using keyword fields for full-text search. Use the text field
type instead.
So, to get a practical solution for your use case, please elaborate on the field mapping you use and what you want to achieve.

Elastic search giving strange results

I am following this tutorial on elastic search.
Two employees have 'about' value as:
"about": "I love to go rock climbing"
"about": "I like to collect rock albums"
I run following query:
GET /megacorp/employee/_search {"query":{"match":{"about":"rock coll"}}}
Both of the above entries are returned, but surprisingly with the same score:
"_score": 0.2876821
Shouldn't the second one have a higher score, as its 'about' value contains both 'rock' and 'coll' while the first one only contains 'rock'?
That totally depends on which analyzer you are using. If you are using the standard or english analyzer, this result is correct. I recommend spending some time with Elasticsearch's Analyze API to get familiar with how each analyzer affects your text.
By the way, if you want the second document to have a higher score, take a look at partial matching.
When we search on a full-text field, we need to pass the query string through the same analysis process as we have when we index a document, to ensure that we are searching for terms in the same form as those that exist in the index.
Analysis process usually consists of normalization and tokenization (the string is tokenized into individual terms by a tokenizer).
As for match Query:
If you run a match query against a full-text field, it will analyze the query string by using the correct analyzer for that field before executing the search. It just looks for the words that are specified.
So, in your match query, Elasticsearch will look for occurrences of the whole separate words rock and/or coll.
Your 2nd document doesn't contain a separate word coll, but it was matched by the word rock.
Conclusion: the 2 documents have the same _score value, because both were matched by the same word, rock.
Elasticsearch analyzes each text field before storing it. The default analyzer (the standard analyzer) splits the text on word boundaries and lowercases it. The output of the analysis process is a list of tokens, which are matched against your query tokens. If any of the tokens matches exactly, the relevant document is returned. That being said, your second document doesn't contain the token coll, and that's why both documents have the same score.
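The token comparison described here can be sketched in a few lines of Python. This is only an approximation: analyze() is a hypothetical stand-in for the standard analyzer (lowercase, split on non-word characters), not the real implementation.

```python
import re

def analyze(text):
    """Simplified stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())

doc1_tokens = analyze("I love to go rock climbing")
doc2_tokens = analyze("I like to collect rock albums")
query_tokens = analyze("rock coll")

# Which query tokens occur as whole tokens in each document?
matched1 = [t for t in query_tokens if t in doc1_tokens]
matched2 = [t for t in query_tokens if t in doc2_tokens]

print(doc2_tokens)
print(matched1, matched2)
```

Both documents match only on rock: coll is a prefix of the token collect, not a token of its own.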
Even if you build your custom analyzer and use stemming, the word collect won't be stemmed as coll.
You can build custom analyzers in which you can specify that tokens should be of length 1 character, then Elasticsearch will consider each single character as a token and you can search for the existence of any character in your documents.

difference between a field and the field.keyword

If I add a document with several fields to an Elasticsearch index, when I view it in Kibana, I get each time the same field twice. One of them will be called
some_field
and the other one will be called
some_field.keyword
Where does this behaviour come from and what is the difference between both of them?
PS: one of them is aggregatable (not sure what that means) and the other (without keyword) is not.
Update: A short answer would be that type: text is analyzed, meaning it is broken up into distinct words when stored, which allows free-text searches on one or more words in the field. The .keyword field takes the same input and keeps it as one large string, meaning it can be aggregated on, and you can use wildcard searches on it. Aggregatable means you can use it in aggregations in Elasticsearch, which resemble a SQL GROUP BY, if you are familiar with that. In Kibana you would probably use the .keyword field with aggregations to count distinct values, etc.
Please take a look at this article about text vs. keyword.
Briefly: since Elasticsearch 5.0, the string type has been replaced by the text and keyword types. Since then, when you do not specify an explicit mapping, for a simple document with a string field:
{
  "some_field": "string value"
}
the dynamic mapping below will be created:
{
  "some_field": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
As a consequence, it will both be possible to perform full-text search on some_field, and keyword search and aggregations using the some_field.keyword field.
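As a rough illustration, the three kinds of request body could look like the Python dictionaries below, which serialize to the JSON sent to _search. The field names come from the mapping above; the variable names and the aggregation name "values" are made up for this sketch.

```python
import json

# Full-text search: the query text is analyzed and matched against tokens.
full_text_search = {"query": {"match": {"some_field": "string"}}}

# Exact search: the whole, unanalyzed value must be equal.
exact_search = {"query": {"term": {"some_field.keyword": "string value"}}}

# Aggregation: one bucket per distinct whole value (similar to GROUP BY).
count_by_value = {
    "size": 0,
    "aggs": {"values": {"terms": {"field": "some_field.keyword"}}},
}

for body in (full_text_search, exact_search, count_by_value):
    print(json.dumps(body, indent=2))
```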
I hope this answers your question.
Look at this issue. There is some explanation of your question in it. Roughly speaking some_field is analyzed and can be used for fulltext search. On the other hand some_field.keyword is not analyzed and can be used in term queries or in aggregation.
I will try to answer your questions one by one.
Where does this behavior come from?
It is introduced in Elastic 5.0.
What is the difference between the two?
some_field is used for full-text search, and some_field.keyword is used for keyword searching.
Full-text searching is used when we want individual tokens of a field's value to be included in the search. For instance, you might search for all the hotel names that have "farm" in them, such as hay farm house, Windy harbour farm house, etc.
Keyword searching is used when we want to include the whole value of the field in the search, and not individual tokens from the value. For example, suppose you are indexing documents with a city field. Aggregating on the analyzed field would produce separate counts for "new" and "york", whereas aggregating on the keyword field produces a single count for "new york", which is usually the expected behavior.
From Elasticsearch 5.0 onwards, strings are mapped as both text and keyword by default.
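The "new" and "york" effect can be mimicked with a short Python simulation. It does not call Elasticsearch: analyze() is a simplified stand-in for the standard analyzer, and the city values are invented sample data.

```python
import re
from collections import Counter

def analyze(text):
    """Simplified stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())

# Invented sample values of a city field, one per document.
cities = ["New York", "New York", "New Delhi"]

# Bucketing on analyzed tokens: count the documents containing each token.
text_buckets = Counter(tok for city in cities for tok in set(analyze(city)))

# Bucketing on the keyword value: count the documents per whole value.
keyword_buckets = Counter(cities)

print(text_buckets)
print(keyword_buckets)
```

The token-based buckets split every city name into its words, while the keyword-based buckets keep "New York" and "New Delhi" as whole values.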

Search by ignore value case checking

In my index I have inserted field values without changing their case (upper or lower). For example, in my Elasticsearch document the field name contains the value Hello World, and I have made the name field not_analyzed for exact matching. But when I search for hello world, this document is not returned by Elasticsearch, possibly due to case sensitivity. I have tried both the term query and the match query, but had no luck.
Please suggest a way, if there is one.
Thanks
The only way you can do this in Elasticsearch is by analyzing the field and using token filters. There is a lowercase token filter available that you should use, but this can't really be done on the fly as in SQL, where you wrap the field being queried in something like LOWER().
To get the effect you desire, I would use something like the keyword tokenizer with the lowercase token filter. If you set this analyzer as the default analyzer for both indexing and searching, your searches will be case insensitive as well.
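A sketch of what such an analyzer definition might look like, written as a Python dictionary for the index settings. The analyzer name lowercase_keyword is made up, and how it is attached to the name field depends on your Elasticsearch version, so that part is left out. The small function below only imitates the analyzer's effect.

```python
import json

# Index settings defining a custom analyzer: the keyword tokenizer keeps
# the whole value as one token, and the lowercase filter lowercases it.
# "lowercase_keyword" is a hypothetical name chosen for this sketch.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

def lowercase_keyword(value):
    """Imitates the analyzer above: one token, lowercased."""
    return [value.lower()]

print(json.dumps(index_settings, indent=2))
print(lowercase_keyword("Hello World") == lowercase_keyword("hello world"))
```

Because the same analysis is applied at index time and at search time, Hello World and hello world end up as the same single token.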

How to query all fields individually with ElasticSearch

As I understand it, ElasticSearch searches on the magic _all field by default. The problem with this seems to be that if a field uses a different index analyzer, the analyzed data from this field is not searched.
I've had success with searching on the fields ['domain', '_all'], but I really need to avoid having to manually specify each field that was analyzed differently. I see that fields supports wildcards, but seemingly not '*' on its own. I could do a*, b*, c*, d*, etc., but this seems a tad inefficient.
The special field "_all" has been discontinued, and copy_to can be used instead, as per the official documentation. This approach lets you create a combined field (managed by Elasticsearch) into which data from other fields is copied, to mimic an _all search.
However, there is an alternative approach: use multi_match with wildcard field names as part of the query. This works much like the earlier mechanism of searching the "_all" field.
{"query":{"multi_match":{"query":"java","fields":["*"]}}}
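Both approaches can be sketched as request bodies, written here as Python dictionaries. The field domain comes from the question; title and all_text are hypothetical names, and the mapping layout assumes a recent Elasticsearch version without mapping types.

```python
import json

# copy_to approach: each source field copies its value into one
# combined field, which is then searched instead of _all.
mapping = {
    "mappings": {
        "properties": {
            "domain": {"type": "text", "copy_to": "all_text"},
            "title": {"type": "text", "copy_to": "all_text"},
            "all_text": {"type": "text"},
        }
    }
}
copy_to_search = {"query": {"match": {"all_text": "java"}}}

# multi_match approach: a wildcard field pattern instead of a field list.
wildcard_search = {"query": {"multi_match": {"query": "java", "fields": ["*"]}}}

for body in (mapping, copy_to_search, wildcard_search):
    print(json.dumps(body, indent=2))
```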

Resources