ElasticSearch: How to use an IN equivalent operator in ElasticSearch

SELECT * FROM Customers
WHERE City IN ('Paris','London')
How can I convert the above query to Elasticsearch?

You may use a terms query:
GET _search
{
  "query": {
    "terms": {
      "city": ["Paris", "London"]
    }
  }
}
However, please make sure that your mapping has city marked as not_analyzed, since the terms query looks for exact tokens.
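For reference, a minimal mapping sketch that does this (the index and type names here are assumptions, not from the question):

PUT customers
{
  "mappings": {
    "customer": {
      "properties": {
        "city": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

On Elasticsearch 5+ the equivalent is "type": "keyword".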
To make your searches case-insensitive, there are two ways I can think of:
1. Lowercase your terms while indexing as well as querying; this is the easy way.
2. Create a custom analyzer that lowercases the input without tokenizing it, and use a match query instead of a terms query, since the terms query doesn't work on analyzed fields.
A sample lowercase analyzer would look like this:
"analyzer": {
  "lowercase_analyzer": {
    "type": "custom",
    "tokenizer": "keyword",
    "filter": [
      "lowercase",
      "asciifolding"
    ]
  }
}
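A sketch of how that analyzer could be wired up end to end, with a bool/should of match clauses standing in for IN (the index, type, and field wiring are assumptions):

PUT customers
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "customer": {
      "properties": {
        "city": {
          "type": "string",
          "analyzer": "lowercase_analyzer"
        }
      }
    }
  }
}

GET customers/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "city": "PARIS" } },
        { "match": { "city": "london" } }
      ],
      "minimum_should_match": 1
    }
  }
}

Because the same analyzer runs at index and search time, both clauses lowercase their input, so "PARIS" matches a stored "Paris".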

Your query should look like this.
The POST request (assuming customers is your index; index names must be lowercase):
http://localhost:9200/customers/_search
and the request body:
"query":{
"query_string":{
"query":"City:(Paris London)"
}
}
The field:(value1 value2) syntax of query_string acts like IN in ES. Hope this helps!
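For completeness, a sketch of the same request as a single curl call (index name assumed; the Content-Type header is required on ES 6+):

curl -XPOST 'http://localhost:9200/customers/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "query_string": {
      "query": "City:(Paris London)"
    }
  }
}'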

Related

Elasticsearch: search with wildcard and custom analyzer

Requirement: search with special characters in a text field.
My solution so far: use a wildcard query with a custom analyzer. I want to use wildcards because it seems the easiest way to do partial searches in a long string with multiple search keys. See the ES query below.
I have an index called "invoices" and it has a document with one of the fields as
"searchString" : "I000010-1 000010 3901 North Saginaw Road add 2 Midland MI 48640 US MS Dhoni MSD-Company MSD (777) 777-7777 (333) 333-3333 sandeep#xyz.io msd-company msdhoni Dhoni, MS (3241480)"
Note: this field acts like the deprecated _all field in ES.
Index Mapping for this field:
"searchString": {"type": "text","analyzer": "multi_level_analyzer"},
Analyzer settings:
PUT invoices
{
  "settings": {
    "analysis": {
      "analyzer": {
        "multi_level_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}
My query looks something like this:
GET invoices/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "searchString": {
              "value": "msd-company*",
              "boost": 1.0
            }
          }
        },
        {
          "wildcard": {
            "searchString": {
              "value": "Saginaw*",
              "boost": 1.0
            }
          }
        }
      ]
    }
  }
}
My question:
Earlier, when I was not using a custom analyzer, the above query worked, but I was not able to search for words with special characters like "msd-company".
After attaching the custom analyzer (multi_level_analyzer), the above query fails to return any results. I changed the wildcard query and prepended an asterisk to the search key, and for some reason it works now (following another answer).
I want to know the impact of using "*msd-company*" instead of "msd-company*" in the wildcard query for the text field.
How can I still use the wildcard query "msd-company*" with a custom analyzer?
I'm open to suggestions for any other approach to my problem statement.
I have solved my problem by changing the mapping of the said field to this:
"searchString": {"type": "text", "analyzer": "multi_level_analyzer", "search_analyzer": "standard"},
But since wildcard queries are expensive, I would still like to know if there exists a better solution for my search use case.
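One way to debug cases like this is to inspect the tokens the custom analyzer actually emits, e.g. with the _analyze API; a sketch, with sample text taken from the document above:

GET invoices/_analyze
{
  "analyzer": "multi_level_analyzer",
  "text": "3901 North Saginaw Road MSD-Company"
}

The whitespace tokenizer plus the lowercase filter stores tokens such as saginaw and msd-company. Since wildcard is a term-level query whose value is not analyzed, "Saginaw*" (capital S) cannot match the lowercased token, while "saginaw*" can; that is one reason the same query can behave differently once a lowercasing analyzer is attached.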

How can I index a field using two different analyzers in Elastic search

Say that I have a field "productTitle" which I want my users to use to search for products.
I also want to apply autocomplete functionality, so I'm using an autocomplete_analyzer with the following filter:
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
}
However, at the same time, when users make a search I don't want the edge_ngram to be applied, since it produces a lot of irrelevant results.
For example, when users want to search for "mi" and start typing "m", "mi", they should get the results starting with m, mi as autocomplete options. However, when they actually make the query, they should only get results with the word "mi". Currently they also see results with "mini" etc.
Therefore, is it possible to have "productTitle" indexed using two different analyzers? Is the multi-field type an option for me?
EDIT: Mapping for productTitle
"productTitle" : {
"type" : "string",
"index_analyzer" : "second",
"search_analyzer" : "standard",
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
,
"second" analyzer
"analyzer": {
"second": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trim",
"autocomplete_filter"
]
}
So when I query with:
"filtered": {
  "query": {
    "match": {
      "productTitle": {
        "query": "mi",
        "type": "boolean",
        "minimum_should_match": "2<75%"
      }
    }
  }
}
I also get results like "mini", but I need to get only results that include just "mi".
Thank you
Hmm... as far as I know, there is no way to apply multiple analyzers to the same field, but what you can do is use multi-fields.
Here is an example of how to apply different analyzers to subfields:
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html#_multi_fields_with_multiple_analyzers
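A sketch of such a mapping, reusing the question's "second" analyzer (the subfield name autocomplete is invented; pre-2.0 index_analyzer syntax to match the question):

"productTitle": {
  "type": "string",
  "analyzer": "standard",
  "fields": {
    "autocomplete": {
      "type": "string",
      "index_analyzer": "second",
      "search_analyzer": "standard"
    }
  }
}

Autocomplete-as-you-type queries would then go against productTitle.autocomplete, while the final search goes against productTitle, which holds only whole tokens, so "mi" no longer matches "mini".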
The correct way of preventing what you describe in your question is to specify both analyzer and search_analyzer in your field mapping, like this:
"productTitle": {
"type": "string",
"analyzer": "autocomplete_analyzer",
"search_analyzer": "standard"
}
The autocomplete analyzer will kick in at indexing time and tokenize your title according to your edge_ngram configuration, and the standard analyzer will kick in at search time without applying the edge_ngram stuff.
In this context, there is no need for multi-fields unless you need to tokenize the productTitle field in different ways.
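Putting the pieces together, a minimal sketch of the whole index under those assumptions (index and type names invented; string-type syntax to match the question):

PUT products
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "productTitle": {
          "type": "string",
          "analyzer": "autocomplete_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

With this wiring, "mini" is stored as the tokens mi, min, mini at index time, while the query string itself stays whole; the edge-ngrams are produced only on the indexing side.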

Elasticsearch. How to find phrases if query has no spaces

For example, I have a document with the phrase "Star wars" in the name field.
I would like to search with the DSL using the query "starwars" and get this document.
I am trying something like this:
GET _search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "starwars"
      }
    }
  }
}
How can I do this with Elasticsearch?
I think you would need to update the analyzer on that name field with a custom analyzer that includes the synonym token filter with a synonym for starwars.
The docs on creating a custom analyzer should help you out. Note that the standard analyzer is applied by default if you did not specify any analyzer for the name field in your mapping; you can base your custom analyzer on it and add the synonym token filter to its array of filters. Perhaps give some more thought to how you want the content to be analyzed for your other requirements as well.
With this analyzer update you should be able to use that query and get the result you expect.
Example:
{
  "filter": {
    "my_synonym": {
      "type": "synonym",
      "synonyms": [
        "star wars => starwars"
      ]
    }
  },
  "analyzer": {
    "standard_with_synonym": {
      "tokenizer": "standard",
      "filter": ["standard", "lowercase", "my_synonym", "stop"]
    }
  }
}
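A sketch of how that might sit inside the index settings and be exercised (the index, type, and mapping wiring are assumptions):

PUT films
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": ["star wars => starwars"]
        }
      },
      "analyzer": {
        "standard_with_synonym": {
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "my_synonym", "stop"]
        }
      }
    }
  },
  "mappings": {
    "film": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "standard_with_synonym"
        }
      }
    }
  }
}

GET films/_search
{
  "query": {
    "match_phrase": {
      "name": "starwars"
    }
  }
}

At index time the synonym filter rewrites the token pair star, wars into the single token starwars, so the query's single token lines up with what is stored.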

ElasticSearch - Search for complete phrase only

I am trying to create a search that will return exactly what I requested.
For instance, let's say I have two documents with a field named 'Val'.
The first doc has a value of 'a - Copy'; the second document has 'a - Copy (2)'.
My goal is to search for exactly the value 'a - Copy' and find only the first document in my returned results, not both of them with different similarity rankings.
When I try most of the usual queries, like:
GET test/_search
{
  "query": {
    "match": {
      "Val": {
        "query": "a - copy",
        "type": "phrase"
      }
    }
  }
}
or:
GET /test/doc/_search
{
  "query": {
    "query_string": {
      "default_field": "Val",
      "query": "a - copy"
    }
  }
}
I get both documents every time.
There is very good documentation on finding exact values in ES:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html
It shows you how to use the term filter, and it mentions the problems with analyzed fields, too.
In a nutshell, you need to run a term filter like this (I've put your values in):
GET /test/doc/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "Val": "a - copy"
        }
      }
    }
  }
}
However, this doesn't work with analyzed fields; you won't get any results.
As the guide puts it, to prevent this from happening, we need to tell Elasticsearch that this field contains an exact value by setting it to be not_analyzed. There are multiple ways to achieve that, e.g. custom field mappings.
Yes, you are getting that because your field is, most likely, analyzed and split into tokens.
You need an analyzer similar to this one:
"custom_keyword_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
which uses the keyword tokenizer and the lowercase filter (I noticed you indexed uppercase letters but expect to search with lowercase letters).
Then use a term filter to search your documents.
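A minimal sketch of that whole flow, using the names from the question (same-era filtered-query syntax as the answer above):

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_keyword_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "Val": {
          "type": "string",
          "analyzer": "custom_keyword_analyzer"
        }
      }
    }
  }
}

GET test/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "Val": "a - copy"
        }
      }
    }
  }
}

'a - Copy' is indexed as the single token a - copy, so the term filter matches it exactly, while 'a - Copy (2)' yields a - copy (2) and stays out of the results.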

How to make sure elasticsearch is using the analyzers defined on the mappings?

I have an index in Elasticsearch with several custom analyzers for specific fields. Example:
"titulo" : {
"type" : "string",
"index_analyzer" : "analyzer_titulo",
"search_analyzer" : "analyzer_titulo"
}
analyzer_titulo is this:
"analyzer_titulo":{
"filter":[
"standard",
"lowercase",
"asciifolding",
"stop_filter",
"filter_shingle",
"stemmer_filter"
],
"char_filter":[
"html_strip"
],
"tokenizer":"standard"
}
However, when I try to use the _analyze API to test the analyzer for this field, Elasticsearch seems to ignore the custom analyzer: the two outputs differ, but if my understanding is correct, they should be the same.
What am I missing here? Is there a way to use the _explain API to see which analyzer is used?
PS: Unfortunately I can't post my full mappings (company policy), but I only have one index and one type.
Thanks
I'm not familiar with the tool you're using to test your analyser (I don't know why it's not working), but what you can do is run a query that returns the values sitting in the index:
curl 'http://localhost:9200/myindex/livros/_search?pretty=true' -d '{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "terms": {
      "script": "doc[field].values",
      "params": {
        "field": "titulo"
      }
    }
  }
}'
If your type has many documents in it, you'll want to change the match_all: {} to something more specific.
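Worth noting: the _analyze API defaults to the standard analyzer unless you name an analyzer or a field, which would explain why the two outputs differed. A sketch that targets the field's own analyzer instead (sample text invented for illustration):

curl 'http://localhost:9200/myindex/_analyze?field=titulo&pretty=true' -d 'Os <b>Livros</b> do Brasil'

With field=titulo, Elasticsearch resolves the analyzer from the field mapping (analyzer_titulo here), so the output should show the stripped, shingled, and stemmed tokens rather than plain standard ones.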
