Elasticsearch - match not_analyzed field with partial search term - performance

I have a "name" field in my Elasticsearch index that is not_analyzed.
Let's say the value of the "name" field is "some name". If I search for the term "some name some_more_name someother name", I want a match, because the search term contains "some name". Will a not_analyzed field allow that match to happen? If not, how can I get a match for this search term?

During indexing, the text of the name field is stored in the inverted index. If this field were analyzed, two terms would go into the inverted index: some and name. But because it is not analyzed, only one term is stored: some name
During the search (using a match query), your search query is analyzed and tokenized by default, so there will be several terms: some, name, some_more_name and someother. Elasticsearch then looks at the inverted index to see whether it contains at least one term from the search query. But it contains only the term some name, so you won't see this document in the result set.
You can experiment with analyzers using the _analyze endpoint.
Returning to your question: if you want to get a match for the proposed search query, your field must be analyzed.
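To make the mechanics concrete, here is a minimal Python sketch of the matching described above. It is a toy model, not Elasticsearch code: the function names (standard_like_analyze, keyword_analyze, matches) are made up for illustration, and the regex tokenizer only approximates what the standard analyzer does.

```python
import re


def standard_like_analyze(text):
    """Rough stand-in for an analyzed field: lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"[^\w]+", text.lower()) if tok]


def keyword_analyze(text):
    """Stand-in for a not_analyzed field: the whole value is a single term."""
    return [text]


def matches(query, indexed_terms):
    """True if at least one analyzed query term is present among the indexed terms."""
    query_terms = set(standard_like_analyze(query))
    return bool(query_terms & set(indexed_terms))


doc_value = "some name"
query = "some name some_more_name someother name"

# not_analyzed: the index holds only the single term "some name"
not_analyzed_hit = matches(query, keyword_analyze(doc_value))
# analyzed: the index holds the terms "some" and "name"
analyzed_hit = matches(query, standard_like_analyze(doc_value))

print("not_analyzed field matches:", not_analyzed_hit)
print("analyzed field matches:", analyzed_hit)
```

Under these assumptions the not_analyzed field does not match, because no single query term equals "some name", while the analyzed field does.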
If you need to keep the non-analyzed version as well, you should use multi-fields:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "keyword",
          "fields": {
            "analyzed": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

Taras has explained this clearly, and your issue may already be resolved. Still, if you can't change the mapping of your index, you can use this query (I have tested it in ES 5.4):
GET test/_search
{
  "query": {
    "query_string": {
      "default_field": "namekey",
      "query": "*some* *name*",
      "default_operator": "OR"
    }
  }
}

Is there a "match phrase any" query in Elasticsearch?

In Elasticsearch, the match_phrase query matches a full phrase, and the match_phrase_prefix query matches a phrase as a prefix.
For example:
"my_field": "confidence ab"
will match "confidence above" and "confidence about".
Is there a "match phrase any" query that works like the example below?
"my_field": "dence ab"
should match "confidence above" and "confidence about".
Thanks
There are two ways you can do this:
Store the field values as-is in ES by using the keyword type (or the keyword analyzer) in the mapping, then do a wildcard search
(OR)
Store the field using an ngram tokenizer, then search your data according to your requirements, with or without the standard or keyword search analyzers
Wildcard searches are usually inefficient, especially with a leading wildcard.
Please let me know how you get on with these suggestions so that I can help you further if needed.
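The two options can be compared with a small Python sketch. This is only a toy model under stated assumptions: char_ngrams is a hypothetical helper that produces fixed-length character n-grams (a real ngram tokenizer is configured with min_gram and max_gram), the third sample value is invented, and the wildcard search is modelled as a plain substring test on the whole stored value.

```python
def char_ngrams(text, n):
    """All character n-grams of length n, as a set (toy stand-in for an ngram tokenizer)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


values = ["confidence above", "confidence about", "confident abroad"]
fragment = "dence ab"

# Option 1: keyword field + wildcard "*dence ab*" behaves like a substring
# test that has to scan every stored value at search time.
wildcard_hits = [v for v in values if fragment in v]

# Option 2: n-grams are computed once at index time; the search is then a
# plain term lookup. Here n is simply the fragment length (an assumption).
ngram_index = {v: char_ngrams(v, len(fragment)) for v in values}
ngram_hits = [v for v in values if fragment in ngram_index[v]]

print(wildcard_hits)
print(ngram_hits)
```

Both approaches return the same two values here. The difference is where the work happens: the wildcard does it at query time, the n-gram approach at index time, at the cost of a larger index.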
You need to map your field as keyword, like below:
PUT test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      }
    }
  }
}
Then search over this field using a wildcard query, like below:
GET test/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "*dence ab*"
      }
    }
  }
}
Please let me know if you have any problems with this.
In your case, the simplest solution is to use the Query string query or the Simple query string query. The latter is less strict about query syntax errors.
First, make sure that your field is mapped with type text. The example below shows a mapping for a field named my_field in the index test-index.
{
  "test-index": {
    "mappings": {
      "properties": {
        "my_field": {
          "type": "text"
        }
      }
    }
  }
}
Then, for searching, use the query string query with wildcards.
{
  "query": {
    "query_string": {
      "fields": ["my_field"],
      "query": "*dence ab*"
    }
  }
}

Elasticsearch Per-Field Boosts, Wildcard and Explicit Field Matches Conflict

Some queries in Elasticsearch offer wildcard matching for fields with boosts, for example, the simple_query_string query.
In our use-case we would like to boost specific fields while giving all other fields a score of 0. We thought this could be achieved with the following example query pattern:
GET /kibana_sample_data_ecommerce/_search
{
  "profile": "true",
  "query": {
    "simple_query_string": {
      "query": "Eddie",
      "fields": [
        "*^0",
        "customer_full_name^100"
      ]
    }
  }
}
It appears, though, that if several field definitions match the same field name via wildcards, their boosts are multiplied. The above example will yield a score of 0 for all documents, even if they contain the token "Eddie" in the customer_full_name field.
The following example demonstrates that the boosts are multiplied:
GET /kibana_sample_data_ecommerce/_search
{
  "profile": "true",
  "query": {
    "simple_query_string": {
      "query": "Eddie",
      "fields": [
        "*^0.1",
        "customer_full_name^100"
      ]
    }
  }
}
It leads to the expression (customer_full_name:eddie)^10.0 in the profile explanation of the query.
Does that mean that it is not possible to achieve our desired outcome with field boosts? The desired outcome is: all matches in a specific field have their score multiplied by 100, while all documents with matches in other fields are still returned but have a score of 0.
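The multiplication observed above can be modelled in a few lines of Python. This is a sketch of the behaviour as described in this question, not of Elasticsearch internals: effective_boosts is a hypothetical helper, the field list is an illustrative subset, and fnmatch stands in for the field-name wildcard expansion.

```python
import fnmatch


def effective_boosts(field_specs, index_fields):
    """Multiply the boosts of every spec whose pattern matches a field name."""
    boosts = {}
    for spec in field_specs:
        pattern, _, boost = spec.partition("^")
        weight = float(boost) if boost else 1.0
        for field in index_fields:
            if fnmatch.fnmatchcase(field, pattern):
                boosts[field] = boosts.get(field, 1.0) * weight
    return boosts


# Illustrative subset of field names, assumed for this sketch
fields = ["customer_full_name", "customer_first_name"]

zeroed = effective_boosts(["*^0", "customer_full_name^100"], fields)
scaled = effective_boosts(["*^0.1", "customer_full_name^100"], fields)

print(zeroed)  # customer_full_name ends up at 0 because 0 * 100 is 0
print(scaled)  # customer_full_name ends up at about 10, as in the profile output
```

Under this model a wildcard boost of 0 can never be overridden by a later explicit boost, which is consistent with every document scoring 0 in the first query.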

terms for each field vs values for each field in _all elasticsearch

I just started learning Elasticsearch and would like to know the difference between terms and values in the following sentence, which I copied from the Elasticsearch website:
"It is important to note that the _all field combines the original values from each field as a string. It does not combine the terms from each field."
While I understand what a value is, I have been scratching my head over the terms for each field!
Can someone please help me understand what it means?
The paragraph preceding the one you have pasted gives some explanation:
The date_of_birth field in the above example is recognised as a date field and so will index a single term representing 1970-10-24 00:00:00 UTC. The _all field, however, treats all values as strings, so the date value is indexed as the three string terms: "1970", "24", "10".
In other words, the _all field takes the original values from the indexed document and runs them through its own analyzer, producing its own terms which are then stored in the index. It does not use the terms produced by analyzers of other fields.
One example is given in the paragraph I've pasted above. It explains that the date_of_birth field is recognized as a date type, so the field value is analyzed and stored as a single term, 1970-10-24 00:00:00 UTC. So if you try to match the date_of_birth field with a match query like this:
{ "query": { "match": { "date_of_birth": "24 10" } } }
You won't find that document, because the parser won't be able to parse the provided value as a date.
On the other hand, if you run the same query on the _all field, you will find that document:
{ "query": { "match": { "_all": "24 10" } } }
Because, as the documentation suggests, the _all field will include the following text terms: ["1970", "10", "24"].
Let's look at another example. Assume you have the following mapping of user type:
"user": {
  "properties": {
    "nickname": { "type": "keyword" },
    "name": { "type": "text" },
    "age": { "type": "integer" }
  }
}
And you index the following document:
{
  "nickname": "Super-Man",
  "name": "John",
  "age": 25
}
Elasticsearch will analyze the fields of this document according to their types, eventually storing the following terms for each of these fields:
_all: ["super", "man", "john", "25"] - all strings
nickname: ["Super-Man"]
name: ["john"]
age: [25] - integer
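The term lists above can be reproduced with a small Python sketch. It is only an illustration of the idea that _all is built from the original values and analyzed on its own: the analyzer functions are hypothetical stand-ins, and the regex tokenizer only approximates a real text analyzer.

```python
import re


def text_analyze(value):
    """Toy text analyzer: lowercase, split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", str(value).lower()) if t]


def keyword_analyze(value):
    """keyword fields keep the exact value as a single term."""
    return [value]


def integer_analyze(value):
    """integer fields index the numeric value."""
    return [int(value)]


mapping = {"nickname": keyword_analyze, "name": text_analyze, "age": integer_analyze}
doc = {"nickname": "Super-Man", "name": "John", "age": 25}

# Each field produces terms with its own analyzer.
terms = {field: mapping[field](value) for field, value in doc.items()}

# _all joins the ORIGINAL values as one string and analyzes that string itself;
# it does not reuse the per-field terms computed above.
terms["_all"] = text_analyze(" ".join(str(v) for v in doc.values()))

for field, field_terms in terms.items():
    print(field, field_terms)
```

Note that "Super-Man" survives intact only in nickname. In _all it has been split and lowercased, because _all started again from the raw value.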
Therefore, if you try to find this document using a match (or a term) query where nickname equals super, you won't find it. Because the nickname field was indexed as a keyword, you must use the exact string to find it: "Super-Man".
But if you try to find this document using a match query where _all equals super, you will find it.
On the other hand, if you try to find this document using a term query over the _all field with the integer value 25, you won't find it. Again, because the _all field is just a text field:
{ "query": { "term": { "_all": 25 } } }
But running the same query on the age field will return the document:
{ "query": { "term": { "age": 25 } } }

Is match query case sensitive in elasticsearch?

I have followed an example from here
The mapping for the index is
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_text": {
          "type": "string"
        },
        "exact_value": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
And the document indexed is
{
  "full_text": "Quick Foxes!",
  "exact_value": "Quick Foxes!"
}
I have noticed while using a simple match query on the "full_text" field like below
{
  "query": {
    "match": {
      "full_text": "quick"
    }
  }
}
I can see that the document matches. If I use uppercase, that is "QUICK", as the search term, the document also matches.
Why is that? By default, the tokenizer would have split the text in the "full_text" field into "quick" and "foxes". So how does the match query match the document for upper-cased values?
Because you haven't specified which analyzer to use for the "full_text" field in your index mapping, the default analyzer is used. The default is the "Standard Analyzer".
Quote from the Elasticsearch docs:
An analyzer of type standard is built using the Standard Tokenizer with the Standard Token Filter, Lower Case Token Filter, and Stop Token Filter.
Before executing the query against your index, Elasticsearch applies the same analyzer configured for your field to your query values. Because the default analyzer includes the Lower Case Token Filter, searching for "Quick", "QUICK" or "quick" produces the same query: the analyzer lowercases each of them to just "quick".
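A short Python sketch shows why the case of the search term does not matter for the analyzed field. The analyze function is an assumed, simplified stand-in for the standard analyzer (lowercasing plus word splitting only), not the real implementation.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase the text, then keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


# Index time: full_text is analyzed, exact_value is stored as one untouched term.
full_text_terms = set(analyze("Quick Foxes!"))
exact_value_terms = {"Quick Foxes!"}

# Search time: the query string goes through the same analyzer as full_text.
results = {}
for query in ("quick", "Quick", "QUICK"):
    results[query] = any(term in full_text_terms for term in analyze(query))
    print(query, "matches full_text:", results[query])

# The not_analyzed field holds only the exact original string.
exact_hit = "quick" in exact_value_terms
print("quick matches exact_value:", exact_hit)
```

All three spellings reduce to the same term, quick, before the lookup, so they all match full_text. None of them equals the single term stored for exact_value.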

Elasticsearch doesn't return results for a specific term search

I am attempting to run a query with a term filter on a specific term. This is the query I am attempting to run:
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "tags": "sports"
        }
      }
    }
  },
  "sort": {
    "timestamp": "desc"
  }
}
When I run the same query with a different field (ex: "type": "blog_post") it works, so I am confident in the syntax.
I checked to make sure that tags was properly mapped (I checked at "http://server_name/index/_mapping") and it was.
I also checked that there are documents with "tags" : "sports" in Elasticsearch.
Any ideas what the issue could be? It is only that field, all others work, and "tags" is indexed.
What mapping/analyzer have you defined for the field "tags"? If the field is analyzed, the indexed tokens can differ from the original value. The standard analyzer, which is used when you have not defined any analyzer, splits the value into words and lowercases them (it does not stem on its own), while an analyzer that includes a stemmer would index the token "sport" instead of "sports".
A term query or term filter does not analyze its input and looks for an exact match against the indexed tokens. So a search for the term "sports" won't match if the indexed token is different.
You should either change the mapping for tags to "not_analyzed" or change the search query to something other than term, such as the query string query.
Based on the use case you've described, I assume tags is mapped as an array of values. That said, the term filter can only be used for exact matches.
What I would try is the terms filter, which takes a list of values:
"terms" : { "tags" : ["sports"] }
or, to check whether the field has any indexed value at all, the exists filter:
"exists" : { "field" : "tags" }
