Is match query case sensitive in elasticsearch? - elasticsearch

I have followed an example from here
The mapping for the index is
{
"mappings": {
"my_type": {
"properties": {
"full_text": {
"type": "string"
},
"exact_value": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
And the document indexed is
{
"full_text": "Quick Foxes!",
"exact_value": "Quick Foxes!"
}
I have noticed something while using a simple match query on the "full_text" field, like below:
{
"query": {
"match": {
"full_text": "quick"
}
}
}
The document matches. It also matches if I use the uppercase term "QUICK" as the search term.
Why is that? By default the tokenizer would have split the text in the "full_text" field into "quick" and "foxes". So how does the match query match the document for upper-cased values?

Because you haven't specified which analyzer to use for the "full_text" field in your index mapping, the default analyzer is used, which is the Standard Analyzer.
Quote from the Elasticsearch docs:
An analyzer of type standard is built using the Standard Tokenizer with the Standard Token Filter, Lower Case Token Filter, and Stop Token Filter.
Before executing the query against your index, Elasticsearch applies the same analyzer configured for the field to your query values. Because the default analyzer includes the Lower Case Token Filter, searching for "Quick", "QUICK" or "quick" results in the same query: the analyzer lower-cases each of them to just "quick".
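You can check this yourself with the _analyze endpoint; a minimal sketch (the request-body syntax shown here is for ES 5.x and later):
GET _analyze
{
  "analyzer": "standard",
  "text": "QUICK Foxes!"
}
This returns the tokens "quick" and "foxes", which are exactly the terms the match query is compared against.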

Related

How to conditionally apply an analyzer at index time to a field that could be one of many languages?

I have documents with a field (e.g. input_text) that contains a string that could be one of 20-odd languages. I have another field that has the short form of the language (e.g. lang).
I want to conditionally apply an analyzer at index time to the text field dependent on what the language is as detected from the language field.
I eventually want a Kibana dashboard with a single word cloud of the most common words in the text field (ie in multiple languages) but only words that have been stemmed and tokenized with stop words removed.
Is there a way to do this?
The Elasticsearch documentation suggests using multiple fields, one per language, and then specifying an analyzer for the appropriate field, but I can't do this as there are some 20 languages and this would overload my nodes.
There is no way to achieve what you want in Elasticsearch (applying an analyzer to field A based on the value of field B).
I would recommend creating one index per language, then creating an index alias that groups all those indices, and querying against that alias.
PUT lang_de
{
"mappings": {
"properties": {
"input_text": {
"type": "text",
"analyzer": "german"
}
}
}
}
PUT lang_en
{
"mappings": {
"properties": {
"input_text": {
"type": "text",
"analyzer": "english"
}
}
}
}
POST _aliases
{
"actions": [
{
"add": {
"index": "lang_*",
"alias": "lang"
}
}
]
}
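Each document is then indexed into the index for its language (e.g. German documents into lang_de), and searches go through the alias, so every language-specific index applies its own analyzer. A minimal sketch, with a placeholder query string:
GET lang/_search
{
  "query": {
    "match": {
      "input_text": "brown foxes"
    }
  }
}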

is there match phrase any query in elasticsearch?

In Elasticsearch, the match_phrase query matches the full phrase.
The match_phrase_prefix query matches the phrase as a prefix.
For example:
"my_field": "confidence ab"
will match "confidence above" and "confidence about".
Is there a "match phrase any" query, like in the example below:
"my_field": "dence ab"
which should match "confidence above" and "confidence about"?
Thanks
There are two ways you can do this:
Store the field values as-is in ES by applying the keyword analyzer type in the mapping, then do a wildcard search
(OR)
Store the field using an ngram tokenizer, then search your data based on your requirement, with or without the standard or keyword search analyzers (see the sketch after this answer)
Note that wildcard searches are usually inefficient performance-wise.
Please let me know your progress based on the above suggestions so that I can help you further if needed.
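A minimal sketch of the ngram option; the index name, analyzer name and gram sizes are only illustrative, and the same trigram analyzer is used for both indexing and searching so that every trigram of the partial phrase must be present in the stored value:
PUT test_ngram
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "trigram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      },
      "analyzer": {
        "trigram_analyzer": {
          "tokenizer": "trigram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "trigram_analyzer"
      }
    }
  }
}
GET test_ngram/_search
{
  "query": {
    "match": {
      "my_field": {
        "query": "dence ab",
        "operator": "and"
      }
    }
  }
}
Because the ngram tokenizer keeps whitespace, "dence ab" is turned into trigrams such as "den", "nce", "ce " and " ab", all of which occur in "confidence above" and "confidence about".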
You need to define the mapping of your field as keyword, like below:
PUT test
{
"mappings": {
"properties": {
"name":{
"type": "keyword"
}
}
}
}
Then search over this field using a wildcard query, like below:
GET test/_search
{
"query": {
"wildcard": {
"name": {
"value": "*dence ab*"
}
}
}
}
Please let me know if you have any problem with this.
In your case, the simplest solution is using a Query string query or a Simple query string query. The latter is less strict about query syntax errors.
First, make sure that your field is mapped with type text. The example below shows the mapping for a field named my_field in the test-index index.
{
"test-index" : {
"mappings" : {
"properties" : {
"my_field" : {
"type" : "text"
}
}
}
}
}
Then, for searching, use a query_string query with wildcards.
{
"query": {
"query_string": {
"fields": ["my_field"],
"query": "*dence ab*"
}
}
}

Elasticsearch copy_to not working on keyword field

I am trying to copy two fields onto a third field, which should have the type 'keyword' (because I want to be able to aggregate by it, and do not need to perform a full-text search).
PUT /test/_mapping/_doc
{
"properties": {
"first": {
"copy_to": "full_name",
"type": "keyword"
},
"last": {
"copy_to": "full_name",
"type": "keyword"
},
"full_name": {
"type": "keyword"
}
}
}
I then post a new document:
POST /test/_doc
{
"first": "Bar",
"last": "Foo"
}
And query it using the composite field full_name:
GET /test/_search
{
"query": {
"match": {
"full_name": "Bar Foo"
}
}
}
And no hits are returned.
If the type of the composite field full_name were text, it would work as expected and as described in the docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
Is it not possible to copy onto a keyword-type field?
The problem is that you use a match query. When you index your docs, you use the keyword type, which according to the ES documentation is "...only searchable by their exact value."
However, when you query that field you use a match query, which uses the standard analyzer. Among other things, that analyzer lower-cases your terms, which causes them to not match anything.
You have a few options I can think of in this case:
Change the field type to text, which will perform the same analysis as the match query.
Create a custom field type with a custom analyzer that performs lower-casing.
Don't query more than a single term at a time, and use a term query instead of match (see the sketch below).
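A minimal sketch of the last option; note that copy_to copies each source value separately, so full_name holds the exact keyword values "Bar" and "Foo", and a term query has to match one of them exactly, including case:
GET /test/_search
{
  "query": {
    "term": {
      "full_name": "Bar"
    }
  }
}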
It seems that the destination field of copy_to must be of type text.

Elasticsearch - match not_analyzed field with partial search term

I have a "name" field - not_analyzed in my elasticsearch index.
Lets say value of "name" field is "some name". My question is, if I want a match for the search term - some name some_more_name someother name because it contains some name in it, then will not_analyzed allow that match to happen, if not, then how can I get a match for the proposed search term?
During indexing, the text of the name field is stored in the inverted index. If this field were analyzed, 2 terms would go into the inverted index: some and name. But as it is not analyzed, only 1 term is stored: some name.
During the search (using a match query), by default your search query is analyzed and tokenized. So there will be several terms: some, name, some_more_name and someother. Then Elasticsearch will look at the inverted index to see if there is at least one term from the search query. But there is only the some name term, so you won't see this document in the result set.
You can play with analyzers using the _analyze endpoint.
Returning to your question, if you want to get a match for the proposed search query, your field must be analyzed.
If you need to keep the non-analyzed version as well, you should use multi-fields:
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "keyword",
"fields": {
"analyzed": {
"type": "text"
}
}
}
}
}
}
}
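With such a mapping you would query the analyzed sub-field; a minimal sketch, assuming the index name my_index from above:
GET my_index/_search
{
  "query": {
    "match": {
      "name.analyzed": "some name some_more_name someother name"
    }
  }
}
The text sub-field is tokenized into some and name at index time, so the analyzed search terms overlap with it and the document is returned, while name itself stays available for exact keyword matching and aggregations.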
Taras has explained it clearly and your issue might already be resolved, but if you still can't change the mapping of your index, you can use this query (I have tested it on ES 5.4):
GET test/_search
{
"query": {
"query_string": {
"default_field": "namekey",
"query": "*some* *name*",
"default_operator": "OR"
}
}
}

how to keep *only* longest term produced by PathHierarchy tokenizer in ElasticSearch?

I need to use the PathHierarchy tokenizer during the indexing stage (so I can generate terms like "a", "a/b", "a/b/c").
But during the search stage I would like to keep only the longest term ("a/b/c"). I need this because Kibana uses query_string type queries, so the query string itself is analyzed.
(The question regarding Kibana queries is here:
do the queries for values analyzed with hierarchical path work correctly in Kibana and ElasticSearch?)
Is it possible to create a custom analyzer which uses the path_hierarchy tokenizer and then applies a filter that keeps only the longest term?
You can use a different analyzer for indexing and searching. Maybe this mapping can help you:
PUT /myindex
{
"mappings": {
"mytype":{
"properties": {
"path": {
"type": "string",
"index_analyzer": " path_hierarchy",
"search_analyzer": "keyword"
}
}
}
}
}
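You can see what the path_hierarchy tokenizer produces at index time with the _analyze endpoint; a minimal sketch (request-body syntax for ES 5.x and later):
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "a/b/c"
}
This returns the terms a, a/b and a/b/c. With the keyword search analyzer from the mapping above, a query value such as a/b/c is kept as a single term at search time rather than being split up, so it is matched only against the full paths that were indexed.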
