I have an issue with my fuzzy_like_this query.
If my search string contains an apostrophe, the query doesn't match the values in the index that contain one.
Sample:
citrus's => search string
The results do not include the values containing the apostrophe; instead I get matches like
citrus, and so on.
Please help.
Thanks in advance.
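For reference, a fuzzy_like_this query of the kind being described looks roughly like the following; the index and field names here are placeholders, not taken from the original post.
POST my_index/_search
{
  "query": {
    "fuzzy_like_this": {
      "fields": [ "product_name" ],
      "like_text": "citrus's"
    }
  }
}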
Elasticsearch accepts the apostrophe, so please double-check your query.
Six Unicode characters can represent an 'apostrophe' in documents: U+0027, U+2018, U+2019, U+201B, U+0091 or U+0092.
Out of the six, Elasticsearch recognises three of them as an 'apostrophe': U+0027, U+2018 and U+2019.
So I think your apostrophe must be one of the last three characters, which Elasticsearch treats as word boundaries; citrus's is then tokenized as just citrus.
Adding a char_filter to your analyzer might help: the other five characters are all replaced with the proper 'apostrophe' U+0027.
curl -XPUT http://localhost:9200/your_index -d '
{
"settings": {
"analysis": {
"char_filter": {
"mycharfilter": {
"type": "mapping",
"mappings": [
"\\u0091=>\\u0027",
"\\u0092=>\\u0027",
"\\u2018=>\\u0027",
"\\u2019=>\\u0027",
"\\u201B=>\\u0027"
]
}
},
"analyzer": {
"quotes_analyzer": {
"tokenizer": "standard",
"char_filter": [ "mycharfilter" ]
}
}
}
}
}'
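To verify that the filter behaves as expected, you can run the analyzer over a sample string with the _analyze API. A quick sketch (the JSON body form shown here is the one used by more recent Elasticsearch versions, and the ‛ in citrus‛s is U+201B):
curl -XPOST http://localhost:9200/your_index/_analyze -H 'Content-Type: application/json' -d '
{
  "analyzer": "quotes_analyzer",
  "text": "citrus‛s"
}'
With the char filter in place this should come back as a single citrus's token instead of a bare citrus. Also remember that defining the analyzer is not enough on its own: it still has to be assigned to the relevant field in the mapping (or set as the default analyzer) for indexing and searching to pick it up.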
In Elasticsearch, how do I search for an arbitrary substring, perhaps including spaces? (Searching for part of a word isn't quite enough; I want to search any substring of an entire field.)
I imagine it has to be in a keyword field, rather than a text field.
Suppose I have only a few thousand documents in my Elasticsearch index, and I try:
"query": {
"wildcard" : { "description" : "*plan*" }
}
That works as expected--I get every item where "plan" is in the description, even ones like "supplantation".
Now, I'd like to do
"query": {
"wildcard" : { "description" : "*plan is*" }
}
...so that I might match documents with "Kaplan isn't" among many other possibilities.
It seems this isn't possible with wildcard, match prefix, or any other query type I might see. How do I simply search on any substring? (In SQL, I would just do description LIKE '%plan is%')
(I am aware any such query would be slow or perhaps even impossible for large data sets.)
Have you tried the regexp query in Elasticsearch? It sure does sound like something you might be interested in.
I was hoping there might be something built into Elasticsearch for this, given that such a simple substring search seems like a very basic capability (thinking about it, it is strstr() in C, LIKE '%%' in SQL, Ctrl+F in most text editors, String.IndexOf in C#, etc.), but this seems not to be the case. Note that the regexp query doesn't support case insensitivity, so I also needed to pair it with this custom analyzer, so that the indexed values are all lowercase; then I can convert my search string to lowercase as well.
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [ "lowercase" ]
}
}
}
},
"mappings": {
...
"description": {"type": "text", "analyzer": "lowercase_keyword"},
}
}
Example query:
"query": {
"regexp" : { "description" : ".*plan is.*" }
}
Thanks to Jai Sharma for leading me; I just wanted to provide more detail.
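As a side note, newer Elasticsearch releases (7.10 and later) added a case_insensitive option to the regexp query, which removes the need for the lowercase-analyzer trick on those versions. Roughly:
"query": {
  "regexp": {
    "description": {
      "value": ".*Plan is.*",
      "case_insensitive": true
    }
  }
}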
My Elasticsearch queries are not working properly because sometimes (not always) my stored data has spaces substituted with underscores (_). When users search with spaces, they don't get the entries with underscores in the results.
For example, if users search for the string annoying problem they get nothing because annoying_problem is the string stored in the index.
I have many similar problems for other characters as well, such as Ø being replaced with o in the data used to populate my index.
How should I solve this?
Try using stopwords:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "standard",
"stopwords": [ "_"]
}
}
}
}
}
Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/using-stopwords.html
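If the stopwords approach doesn't give the behaviour you expect, another option worth sketching is a mapping char_filter (similar to the apostrophe mapping earlier) that rewrites underscores to spaces at analysis time, so that annoying_problem and annoying problem end up producing the same tokens. A rough example, with my_index and my_analyzer as placeholder names:
curl -XPUT http://localhost:9200/my_index -d '
{
  "settings": {
    "analysis": {
      "char_filter": {
        "underscore_to_space": {
          "type": "mapping",
          "mappings": [ "_=>\\u0020" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [ "underscore_to_space" ],
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}'
The analyzer still has to be assigned to the relevant fields in the mapping, and an extra mapping entry of the same kind could cover the Ø / o substitution mentioned in the question.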
How can I match parts of a word to the parent word? For example, I need to match "eese" or "heese" to the word "cheese".
The best way to achieve this is to use an edgeNGram token filter combined with two reverse token filters. First, you need to define a custom analyzer called reverse_analyzer in your index settings, as shown below. Then you can see that I've declared a string field called your_field with a sub-field called suffix which has our custom analyzer defined.
PUT your_index
{
"settings": {
"analysis": {
"analyzer": {
"reverse_analyzer": {
"tokenizer": "keyword",
"filter" : ["lowercase", "reverse", "substring", "reverse"]
}
},
"filter": {
"substring": {
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 10
}
}
}
},
"mappings": {
"your_type": {
"properties": {
"your_field": {
"type": "string",
"fields": {
"suffix": {
"type": "string",
"analyzer": "reverse_analyzer"
}
}
}
}
}
}
}
Then you can index a test document with "cheese" inside, like this:
PUT your_index/your_type/1
{"your_field": "cheese"}
When this document is indexed, the your_field.suffix field will contain the following tokens:
e
se
ese
eese
heese
cheese
Under the hood, what happens when indexing cheese is the following:
The keyword tokenizer will emit a single token => cheese
The lowercase token filter will put the token in lowercase => cheese
The reverse token filter will reverse the token => eseehc
The substring token filter will produce different tokens of length 1 to 10 => e, es, ese, esee, eseeh, eseehc
Finally, the second reverse token filter will reverse again all tokens => e, se, ese, eese, heese, cheese
Those are all the tokens that will be indexed (a quick way to verify them is shown after the search example below).
So we can finally search for eese (or any suffix of cheese) in that sub-field and find our match:
POST your_index/_search
{
"query": {
"match": {
"your_field.suffix": "eese"
}
}
}
=> Yields the document we've just indexed above.
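To see those tokens for yourself, you can point the _analyze API at the custom analyzer; the JSON body form below is the one accepted by newer Elasticsearch versions (older ones take query-string parameters instead):
POST your_index/_analyze
{
  "analyzer": "reverse_analyzer",
  "text": "cheese"
}
This should return exactly the e, se, ese, eese, heese, cheese tokens listed above.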
You can do it in two ways:
If you need this to happen only for some searches, then in the search box you can pass
*eese* or *heese*
Just put * at the beginning and end of your search word. If you need it for every search, build the query string as
"*#{params[:query]}*"
This will match the parent word and return the result.
There are multiple ways to do this:
The analyzer approach - here you use an nGram tokenizer to break every word into sub-tokens. For the word "cheese" this generates [ "chee", "hees", "eese", "cheese" ] and all kinds of other substrings. The index size grows, but search speed is optimized.
The wildcard query approach - here a scan happens over the inverted index (see the sketch just below). This does not take up additional index space, but it costs more time at search.
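A rough sketch of that wildcard variant, reusing the index and field names from the accepted answer (no special analyzer is required, but expect it to be slower on large indices):
POST your_index/_search
{
  "query": {
    "wildcard": {
      "your_field": "*eese*"
    }
  }
}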
Elasticsearch 1.6
I want to index text that contains hyphens, for example U-12, U-17, WU-12, t-shirt... and to be able to use a "Simple Query String" query to search on them.
Data sample (simplified):
{"title":"U-12 Soccer",
"comment": "the t-shirts are dirty"}
As there are quite a lot of questions already about hyphens, I tried the following solution already:
Use a Char filter: ElasticSearch - Searching with hyphens in name.
So I went for this mapping:
{
"settings":{
"analysis":{
"char_filter":{
"myHyphenRemoval":{
"type":"mapping",
"mappings":[
"-=>"
]
}
},
"analyzer":{
"default":{
"type":"custom",
"char_filter": [ "myHyphenRemoval" ],
"tokenizer":"standard",
"filter":[
"standard",
"lowercase"
]
}
}
}
},
"mappings":{
"test":{
"properties":{
"title":{
"type":"string"
},
"comment":{
"type":"string"
}
}
}
}
}
Searching is done with the following query:
{"_source":true,
"query":{
"simple_query_string":{
"query":"<Text>",
"default_operator":"AND"
}
}
}
What works:
"U-12", "U*", "t*", "ts*"
What didn't work:
"U-*", "u-1*", "t-*", "t-sh*", ...
So it seems the char filter is not executed on search strings?
What could I do to make this work?
The answer is really simple:
Quote from Igor Motov: Configuring the standard tokenizer
By default the simple_query_string query doesn't analyze the words
with wildcards. As a result it searches for all tokens that start with
i-ma. The word i-mac doesn't match this request because during
analysis it's split into two tokens i and mac and neither of these
tokens starts with i-ma. In order to make this query find i-mac you
need to make it analyze wildcards:
{
"_source":true,
"query":{
"simple_query_string":{
"query":"u-1*",
"analyze_wildcard":true,
"default_operator":"AND"
}
}
}
The quote from Igor Motov is correct: you have to add "analyze_wildcard": true in order to make it work with wildcards. But it is important to note that the hyphen actually tokenizes "u-12" into "u" and "12", two separate words.
If preserving the original is important, do not use the mapping char filter. Otherwise it is quite useful.
Imagine that you have "m0-77", "m1-77" and "m2-77": if you search for m*-77 you will get zero hits. However, you can replace the "-" (hyphen) with AND in order to connect the two separated words, and then search for m* AND 77, which will give you a correct hit.
You can do this on the client side.
For your u-* case:
{
"query":{
"simple_query_string":{
"query":"u AND 1*",
"analyze_wildcard":true
}
}
}
And for t-sh*:
{
"query":{
"simple_query_string":{
"query":"t AND sh*",
"analyze_wildcard":true
}
}
}
If anyone is still looking for a simple workaround to this issue, replace the hyphen with an underscore _ when indexing data.
For example, O-000022334 should be indexed as O_000022334.
When searching, replace the underscore with a hyphen again when displaying results. This way you can search for "O-000022334" and it will find the correct match.
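If you would rather not rewrite the stored data by hand, roughly the same effect can be sketched with a mapping char filter (like the myHyphenRemoval filter in the question, but mapping - to _ instead of removing it), so that both indexing and non-wildcard searches see the underscored form. The names below are placeholders:
{
  "settings": {
    "analysis": {
      "char_filter": {
        "hyphen_to_underscore": {
          "type": "mapping",
          "mappings": [ "-=>_" ]
        }
      },
      "analyzer": {
        "default": {
          "type": "custom",
          "char_filter": [ "hyphen_to_underscore" ],
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}
The same analyze_wildcard caveat discussed above still applies to wildcard searches.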
I have the following Elasticsearch query with only a term filter. My query is much more complex, but I am just trying to show the issue here.
{
"filter": {
"term": {
"field": "update-time"
}
}
}
When I pass a hyphenated value to the filter, I get zero results back. But if I try an unhyphenated value, I do get results back. I am not sure whether the hyphen is the issue here, but my scenario makes me believe so.
Is there a way to escape the hyphen so the filter returns results? I have tried escaping the hyphen with a backslash, which I read about on the Lucene forums, but that didn't help.
Also, if I pass a GUID value into this field which is hyphenated and surrounded by curly braces, something like {ASD23-34SD-DFE1-42FWW}, would I need to lowercase the alphabetic characters, and would I need to escape the curly braces too?
Thanks
I would guess that your field is analyzed, which is the default setting for string fields in Elasticsearch. As a result, when it is indexed it is not indexed as the single term "update-time" but as two terms: "update" and "time". That's why your term search cannot find this term. If your field will always contain values that have to be matched completely and as-is, it would be best to define such a field in the mapping as not analyzed. You can do that by recreating the index with a new mapping:
curl -XPUT http://localhost:9200/your-index -d '{
"mappings" : {
"your-type" : {
"properties" : {
"field" : { "type": "string", "index" : "not_analyzed" }
}
}
}
}'
curl -XPUT http://localhost:9200/your-index/your-type/1 -d '{
"field" : "update-time"
}'
curl -XPOST http://localhost:9200/your-index/your-type/_search -d'{
"filter": {
"term": {
"field": "update-time"
}
}
}'
Alternatively, if you want some flexibility in finding records based on this field, you can keep this field analyzed and use text queries instead:
curl -XPOST http://localhost:9200/your-index/your-type/_search -d'{
"query": {
"text": {
"field": "update-time"
}
}
}'
Please keep in mind that if your field is analyzed, then this record will also be found by searching for just the word "update" or the word "time".
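(Note that the text query shown above dates from very old Elasticsearch versions; it was later renamed to match, so with the same placeholder index and type the equivalent request looks roughly like this:)
curl -XPOST http://localhost:9200/your-index/your-type/_search -d '{
  "query": {
    "match": {
      "field": "update-time"
    }
  }
}'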
The accepted answer didn't work for me with Elasticsearch 6.1. I solved it using the "keyword" sub-field that Elasticsearch provides by default on string fields.
{
"filter": {
"term": {
"field.keyword": "update-time"
}
}
}
Based on the answer by @imotov: if you're using spring-data-elasticsearch then all you need to do is mark your field as
@Field(type = FieldType.String, index = FieldIndex.not_analyzed)
instead of
@Field(type = FieldType.String)
Note that you need to drop the index and re-create it with the new mapping, though.