Case insensitive query - elasticsearch

I'm using Elasticsearch, and I have this field:
"name": {
"type": "string",
"index": "not_analyzed"
},
I run this query to get, for example, all employees with the name "Charles":
GET company_employee/employee/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"name": "Charles"
}
}
]
}
}
}
The issue is that I need a case-insensitive search. I need to retrieve all "Charles" values, even if the value I provide to the query is ChaRleS, charles, CHarles, etc. What do I need to do?

If reindexing is not an option, that leaves altering your query.
Although the regexp approach does not search case-insensitively by itself, you can get the same effect "manually".
If only the first letter can vary in case, you can get by with this:
GET company_employee/employee/_search
{
"query": {
"regexp": { "name": "[Cc]harles" }
}
}
Otherwise, for a "true" case-insensitive match:
GET company_employee/employee/_search
{
"query": {
"regexp": { "name": "[Cc][Hh][Aa][Rr][Ll][Ee][Ss]" }
}
}
This is in no way efficient, but matches your constraint of not altering the index.
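Writing the character classes by hand gets tedious, so here is a minimal sketch that generates the pattern from the search value. The helper names (case_insensitive_pattern, regexp_query) are made up for illustration, and the set of reserved characters is an assumption based on the regexp syntax documentation, so check it against your version.

```python
# Characters that are reserved in Elasticsearch regexp syntax
# (assumption: taken from the regexp syntax docs, may differ per version).
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def case_insensitive_pattern(value: str) -> str:
    """Expand every cased letter into a two-character class, e.g. 'c' -> '[Cc]'."""
    parts = []
    for ch in value:
        upper, lower = ch.upper(), ch.lower()
        if upper != lower and len(upper) == 1 and len(lower) == 1:
            parts.append(f"[{upper}{lower}]")
        elif ch in RESERVED:
            # Escape reserved characters so they are matched literally.
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


def regexp_query(field: str, value: str) -> dict:
    """Build the request body for the regexp query shown above."""
    return {"query": {"regexp": {field: case_insensitive_pattern(value)}}}


print(regexp_query("name", "Charles"))
```

The resulting dict can be sent as the body of the same _search request. The efficiency caveat still applies, since the pattern is evaluated against the indexed terms.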

Basically, you need your name field to be analyzed, for example like this:
"name": {
"type": "text",
"analyzer": "standard"
}
With such a mapping, all your values will be lowercased at index time, so the search will be case-insensitive.

Related

Elasticsearch 6.3 query with space in a keyword field and not returning all documents

I have the following part of a mapping:
"name": {
"store": "true",
"type": "keyword"
}
and this query:
{
"query":{
"query_string":{
"query":"+(name:John Doe)",
"fields":[
]
}
},
"aggregations":{
"name":{
"terms":{
"field":"name",
"size":10
}
}
}
}
The query should return over 100 results however it only returns a few. If I add quotes to John Doe like this: \"John Doe\" then it returns all the desired results.
I'm wondering why this happens. Isn't it enough that the field is mapped as keyword, so that John Doe is analyzed as a whole and no quotes need to be added? Also, why would it return fewer items without quotes?
Note: In ES 1.4 the same query seems to work fine (although it is not the same data, to be honest, and it uses facets instead of aggregations).
The documentation for query string query clearly states:
If the field is a keyword field the analyzer will create a single term ...
So you don't need to add quotes to your search string. Instead, you need to write your query correctly. Currently your query tries to find the term John in the name field and the term Doe in all other fields! So you must rewrite your query in one of the following ways:
Add parentheses to your search term so the query parser can "understand" that all words must be found in name field:
{
"query": {
"query_string": {
"query": "+(name:(John Doe))",
"fields": [
]
}
},
"aggregations": {
"name": {
"terms": {
"field": "name",
"size": 10
}
}
}
}
Specify the field name in the fields array rather than in the query string:
{
"query": {
"query_string": {
"query": "+(John Doe)",
"fields": [
"name"
]
}
},
"aggregations": {
"name": {
"terms": {
"field": "name",
"size": 10
}
}
}
}
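For reference, the quoted form that the question reports as working can be built programmatically. This is only a sketch: phrase_query_string is a hypothetical helper, and it escapes just backslashes and double quotes inside the value, not the other query_string operators.

```python
import json


def phrase_query_string(field: str, phrase: str) -> dict:
    """Build a query_string body that searches `field` for the quoted phrase."""
    # Escape backslashes first, then double quotes, so the phrase stays one unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return {"query": {"query_string": {"query": f'{field}:"{escaped}"'}}}


body = phrase_query_string("name", "John Doe")
# json.dumps adds the JSON-level escaping (\"John Doe\") seen in the question.
print(json.dumps(body))
```

Letting the JSON serializer add the backslashes avoids hand-escaping mistakes in the request body.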

Elasticsearch: restrict result to documents with exact match

Currently I am trying to restrict the results of Elasticsearch (5.4) with the following query:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "apache log Linux",
"type": "most_fields",
"fields": [
"message",
"type"
]
}
},
"filter": {
"term": {
"client": "test"
}
}
}
}
}
This returns every document that contains "apache", "log", or "linux". I want to restrict the results to documents that have a field "client" with the exact specified value, in this case "test". However, this query returns all the documents that contain "test" in that value, so a document with "client": "test client" is also returned.
I want the restriction to be exact, so only documents with "client": "test" should be returned, not "client": "test client".
After testing a bunch of different queries and lots of searching, I cannot find a solution to my problem. What am I missing?
Just use the keyword part of your client field, since this is 5.x and, by default, the keyword is already there:
"filter": {
"term": {
"client.keyword": "test"
}
}
Set a mapping on your index specifying that your client field is a keyword datatype.
The mapping request could look like this:
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"client": {
"type": "keyword"
}
}
}
}
}
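To see why the term filter on the analyzed field also returns "test client", here is a rough simulation in plain Python. It does not talk to Elasticsearch. The analyze function is only an approximation of the standard analyzer (split into words, lowercase), and the function names are invented for this sketch.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Approximate the standard analyzer: split into words and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_matches_text_field(stored: str, term: str) -> bool:
    """A term filter on an analyzed field compares against each indexed term."""
    return term in analyze(stored)


def term_matches_keyword_field(stored: str, term: str) -> bool:
    """A term filter on a keyword field compares against the whole stored value."""
    return stored == term


clients = ["test", "test client"]
text_hits = [c for c in clients if term_matches_text_field(c, "test")]
keyword_hits = [c for c in clients if term_matches_keyword_field(c, "test")]
print(text_hits)
print(keyword_hits)
```

On the analyzed field both documents contain the term "test", while the keyword comparison only accepts the document whose whole value is "test".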

Exact match in elastic search query

I want to exactly match the string ":Feed:" in a message field and go back a day to pull all such records. The JSON I have also seems to match the plain word " feed ". I am not sure where I am going wrong. Do I need to add "constant_score" to this query JSON? The JSON I currently have is shown below:
{
"query": {
"bool": {
"must": {
"query_string": {
"fields": ["message"],
"query": "\\:Feed\\:"
}
},
"must": {
"range": {
"timestamp": {
"gte": "now-1d",
"lte": "now"
}
}
}
}
}
}
As stated here: Finding Exact Values, since the field was analyzed when it was indexed, you have no way of exact-matching its tokens (":"). Whenever such tokens should be searchable, the mapping should be "not_analyzed" and the data needs to be re-indexed.
If you want to be able to easily match only ":feed:" inside the message field, you might want to customize an analyzer that keeps ":" when tokenizing, so you can query the field with a simple "match" query instead of wildcards.
I was not able to do this with query_string, but managed to do so by creating a custom normalizer and then using a "match" or "term" query.
The following steps worked for me.
Create a custom normalizer (available since version 5.2):
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": ["lowercase"]
}
}
}
}
Create a mapping with a "keyword" sub-field that uses the normalizer:
{
"mappings": {
"default": {
"properties": {
"title": {
"type": "text",
"fields": {
"normalize": {
"type": "keyword",
"normalizer": "my_normalizer"
},
"keyword" : {
"type": "keyword"
}
}
}
}
}
}
Use a match or term query:
{
"query": {
"bool": {
"must": [
{
"match": {
"title.normalize": "string to match"
}
}
]
}
}
}
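The effect of the normalized keyword sub-field can be illustrated with a small simulation. This is a sketch only: it assumes the normalizer is applied both to the stored value and to the query input, and normalize is just a stand-in for the "lowercase" filter in my_normalizer.

```python
def normalize(value: str) -> str:
    """Stand-in for my_normalizer: lowercase the whole value, no tokenizing."""
    return value.lower()


def normalized_match(stored: str, query: str) -> bool:
    """Compare the normalized stored value with the normalized query input."""
    return normalize(stored) == normalize(query)


# Same value in a different case.
print(normalized_match("String To Match", "string to match"))
# Only part of the stored value.
print(normalized_match("String To Match", "string"))
```

Because a keyword field holds the whole value as a single term, this approach is case-insensitive but still requires the complete value to match.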
Use match_phrase:
GET /_search
{
"query": {
"match_phrase": {
"message": "7000-8900"
}
}
}
In Java, use matchPhraseQuery from QueryBuilders:
QueryBuilders.matchPhraseQuery(fieldName, searchText);
A simple solution: use a term query.
GET /_search
{
"query": {
"term": {
"message.keyword": "7000-8900"
}
}
}
Use a term query instead of match_phrase. match_phrase matches against the analyzed words of the sentence stored in the ES document, so it is not an exact match on the whole value.

Why prefix returns documents without the specific prefix?

I want to return only documents whose name starts with "pizza". This is what I've done:
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name": "pizza"
}
}
}
}
}
But I've got these 3 documents:
{
"name": "Viana Pizza",
"city": "Mashhad",
"address": "Vakil abad",
"foods": ["Pizza"],
"salad": true,
"rate": 5.0
}
{
"name": "Pizza Pizza",
"city": "Mashhad",
"address": "Bahar st",
"foods": ["Pizza"],
"salad": true,
"rate": 8.5
}
{
"name": "Reza Pizza",
"city": "Tehran",
"address": "Vali Asr",
"foods": ["Pizza"],
"salad": true,
"rate": 7.5
}
As you can see, only one of them has "pizza" at the beginning of the name field.
What's wrong?
Probably the simplest explanation, given that you didn't provide the actual mapping, is that your "name" field is a "string" and "analyzed" (the default), which means that "Reza Pizza" will be transformed into the terms "reza" and "pizza".
Your filter matches against terms, not against entire field values, because ES analyzes the fields and forms terms when the default mapping is used.
You need to either change your "name" field to "not_analyzed" or add another field that mirrors "name" and is not split into terms. Also, for the lowercase text "pizza" to work in this case, you need to create a custom analyzer.
Below you have the solution with the mirror field:
PUT /pizza
{
"settings": {
"analysis": {
"analyzer": {
"my_keyword_lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "my_keyword_lowercase_analyzer"
}
}
}
}
}
}
}
And in searching you need to use the mirror field:
GET /pizza/restaurant/_search
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name.raw": "pizza"
}
}
}
}
}
That's all about Elasticsearch analyzers. Let's read the documentation on prefix filter:
Filters documents that have fields containing terms with a specified prefix (not analyzed).
Here we can see that this filter matches terms, not the whole field value. When you index a document, ES splits your field values into terms using analyzers. The default analyzer splits the value into words and converts them to lowercase. So all three results have the term pizza in the name field, and the term pizza perfectly matches the prefix pizza. If you want to match the field value as is, I'd suggest mapping the name field as not_analyzed.
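The difference can be reproduced without a cluster. The sketch below only approximates the two analyzers (the default one as "split into words and lowercase", the mirror field as "lowercase the whole value") and uses made-up function names, but it shows why the prefix matches all three names on the analyzed field and only one on the mirror field.

```python
import re
from typing import Callable, List

names = ["Viana Pizza", "Pizza Pizza", "Reza Pizza"]


def default_terms(value: str) -> List[str]:
    """Approximate the default analyzer: one lowercase term per word."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def keyword_lowercase_terms(value: str) -> List[str]:
    """Approximate my_keyword_lowercase_analyzer: one lowercase term for the whole value."""
    return [value.lower()]


def prefix_hits(values: List[str], prefix: str,
                to_terms: Callable[[str], List[str]]) -> List[str]:
    """Return the values that have at least one indexed term starting with prefix."""
    return [v for v in values if any(t.startswith(prefix) for t in to_terms(v))]


print(prefix_hits(names, "pizza", default_terms))
print(prefix_hits(names, "pizza", keyword_lowercase_terms))
```

With per-word terms every name contains a term starting with "pizza", whereas with a single lowercase term per value only "Pizza Pizza" qualifies.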

I don't get any documents back from my elasticsearch query. Can someone point out my mistake?

I thought I had figured out Elasticsearch but I suspect I have failed to grok something, and hence this problem:
I am indexing products, which have a huge number of fields, but the ones in question are:
{
"show_in_catalogue": {
"type": "boolean",
"index": "no"
},
"prices": {
"type": "object",
"dynamic": false,
"properties": {
"site_id": {
"type": "integer",
"index": "no"
},
"currency": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "float"
},
"gross_tax": {
"type": "integer",
"index": "no"
}
}
}
}
I am trying to return all documents where "show_in_catalogue" is true, and there is a price with site_id 1:
{
"filter": {
"term": {
"prices.site_id": "1",
"show_in_catalogue": true
}
},
"query": {
"match_all": {}
}
}
This returns zero results. I also tried an "and" filter with two separate terms - no luck.
A subset of one of the documents returned if I have no filters looks like:
{
"prices": [
{
"site_id": 1,
"currency": "GBP",
"value": 595,
"gross_tax": 1
},
{
"site_id": 2,
"currency": "USD",
"value": 745,
"gross_tax": 0
}
]
}
I hope I am OK to omit so much of the document here; I don't believe it to be contingent but I cannot be certain, of course.
Have I missed a vital piece of knowledge, or have I done something terminally thick? Either way, I would be grateful for an expert's knowledge at this point. Thanks!
Edit:
At the suggestion of J.T. I also tried reindexing the documents so that prices.site_id was indexed - no change. Also tried the bool/must filter below to no avail.
To clarify, the reason I'm using an empty query is that the web interface may supply a query string, but the same code is used to simply filter all products. Hence I left in the query, but empty, since that's what Elastica seems to produce with no query string.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}
}
}
You have site_id set as {"index": "no"}. This tells ElasticSearch to exclude the field from the index which makes it impossible to query or filter on that field. The data will still be stored. Likewise, you can set a field to only be in the index and searchable, but not stored.
I'm new to ElasticSearch as well and can't always grok the questions! I'm actually confused by your query. If you are going to "just filter", then you don't need a query. What I don't understand is your use of two fields inside the term filter. I've never done this. I guess it acts as an OR? Also, if nothing matches, it seems to return everything. If you want a query whose results are then filtered, you would want to use a filtered query:
-d '{
"query": {
"filtered": {
"query": {},
"filter": {}
}
}
}'
If you just want to apply filters, this is the filter that should work without any "query":
-d '{
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}'
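If the filter conditions come from application code, one term clause per field can be generated instead of putting two fields into a single term filter. A minimal sketch, assuming the same top-level "filter" request shape as above; term_filters is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def term_filters(conditions: Dict[str, Any]) -> dict:
    """Wrap each field/value pair in its own term clause inside bool/must."""
    clauses = [{"term": {field: value}} for field, value in conditions.items()]
    return {"filter": {"bool": {"must": clauses}}}


body = term_filters({"show_in_catalogue": True, "prices.site_id": 1})
# json.dumps turns Python's True into JSON's true.
print(json.dumps(body, indent=2))
```

Every clause in "must" has to match, so the conditions are combined with AND. As the first answer points out, a field mapped with "index": "no" still cannot be filtered on.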

Resources