Special Character "- " in the Elastic Search acting - elasticsearch

"-" is acting like a or operator for e.g. I am searching "t-link", then it showing the result containing "t-link" as well as "t", why it is giving two terms, but i interested in the "t-link", why it is happening so? How can i recover from it?

Elasticsearch uses the standard analyzer for strings by default.
Basically, your string is tokenized into two lowercased tokens:
t
link
If you need to know what Elasticsearch does with your fields, use the _analyze API.
$ curl -XGET 'localhost:9200/_analyze?analyzer=standard' -d 't-link'
$ curl -XGET 'localhost:9200/_analyze?analyzer=simple' -d 't-link'
If you don't want that, make sure you define the right mapping for that field and use either the simple analyzer, the keyword analyzer, or no analyzer at all, depending on your requirements. See also the String core type documentation.
$ curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '
{
  "tweet" : {
    "properties" : {
      "message" : {"type" : "string", "analyzer" : "simple"},
      "other" : {"type" : "string", "index" : "not_analyzed"}
    }
  }
}
'
With this mapping, the message field will be analyzed with the simple analyzer and the other field won't be analyzed at all.
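To check that the mapping is actually applied, you can point the _analyze API at a concrete field (a quick sketch reusing the twitter index and fields from the mapping above); the not_analyzed other field should come back as the single token t-link, while the message field (simple analyzer) should still be split on the hyphen:
# sketch: analyze a sample value against each mapped field
$ curl -XGET 'localhost:9200/twitter/_analyze?field=other' -d 't-link'
$ curl -XGET 'localhost:9200/twitter/_analyze?field=message' -d 't-link'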

Related

Find uppercase strings with wildcard

I have a field my_field that is defined like this:
"properties" : {
...
"my_field" : { "type" : "string", "store" : "no", "index" : "not_analyzed" },
...
}
All lowercase strings that are stored in that field can be found with a wildcard:
e.g. kindergarten can be found with my_field:kinder*
but all uppercase strings cannot be found with a wildcard:
e.g. KINDERGARTEN can be found neither with my_field:KINDER* nor with my_field:kinder*
Is that the expected behaviour or am I doing something wrong?
You must set lowercase_expanded_terms to false in order to do case-sensitive search with wildcards. Like this: http://localhost:9200/test/_search?lowercase_expanded_terms=false&q=my_field:KINDER*
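If you prefer the request-body form, the same option can be set on a query_string query. This is just a sketch reusing the index and field names from the question; note that lowercase_expanded_terms only exists in older Elasticsearch versions and was later removed:
# sketch: case-sensitive wildcard search via query_string (older ES versions)
curl -XPOST 'http://localhost:9200/test/_search' -d '{
  "query": {
    "query_string": {
      "query": "my_field:KINDER*",
      "lowercase_expanded_terms": false
    }
  }
}'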
I did a quick test and everything looks correct to me.
I would try to test analysis on that field using the _analyze API to verify that values really aren't lowercased:
curl -XPOST 'http://localhost:9200/test/_analyze?field=my_field' -d 'This Should Be Single Token'
Or try the Index Termlist plugin to see which tokens are actually stored in that field.

What's the reason for specifying only the 'field' option for the Term & Phrase suggesters in Elasticsearch

When using the suggester API, we are forced to specify the field option:
"suggest" : {
"text" : "val",
"sug_name" : {
"term" : {
"field" : "field_name"
}
}
}
Is this field supposed to be a valid field name of some type?
If so, fields can exist only in the context of types, AFAIK.
Why isn't it possible to also specify (at least optionally) the type the field belongs to?
Is your question if "field" has to be a valid field?
YES it does if you want it to find anything, you are welcome to search for fields that dont exist, although that seems an odd thing to do.
Your second question, the answer, I believe, is NO, you can not specify a _type using the _suggest api, you can use a suggest block with the _search api as shown here
curl -s -XPOST 'localhost:9200/_search' -d '{
"query" : {
...
},
"suggest" : {
...
}
}'
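To effectively scope the suggester to one type's documents, a rough sketch would be to run that same search against an index and type endpoint instead of the cluster-wide _search (my_index and my_type here are made-up names):
# sketch: suggest block scoped to a specific index and type
curl -s -XPOST 'localhost:9200/my_index/my_type/_search' -d '{
  "suggest" : {
    "sug_name" : {
      "text" : "val",
      "term" : {
        "field" : "field_name"
      }
    }
  }
}'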

ES Search partial word - ngram?

I am using Elastic Search to index entities that contain two fields: agencyName and agencyAddress.
Let's say I have indexed one entity:
{
"agencyName": "Turismo Viajes",
"agencyAddress": "Av. MaipĂș 500"
}
I would like to be able to search for this entity and get the entity above searching through the agencyName. Different searches could be:
1) urismo
2) Viaje
3) Viajes
4) Turismo
5) uris
The idea is that if I query with those strings I should always get that entity (probably with different score depending on how accurate it is).
For this I thought that nGram would work out, so I defined a global analyzer called phrase in my elasticsearch.yml file.
index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: nGram
        filter: [nGram, lowercase, asciifolding]
And I created the agency index like this:
{
  "possible_clients" : {
    "possible_client" : {
      "properties" : {
        "agencyName" : {
          "type" : "string",
          "analyzer" : "phrase"
        },
        "agencyAddress" : {
          "type": "string"
        }
      }
    }
  }
}
The problem is that when making a call like this:
curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
"query": { "term": { "agencyName": "uris" }}
}'
I don't get any hits. Any ideas what I am doing wrong?
Thanks in advance.
You are using a term query for searching. A term query is never analysed, so changing the analyser will not have any effect. You should use, for example, a match query instead.
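For example, a match query equivalent of the term query from the question (a sketch reusing the index, type and field names above) would be:
# sketch: match query, which does analyse the search input
curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
  "query": { "match": { "agencyName": "uris" } }
}'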
According to the docs, the default value of your tokenizer's max_gram is 2. So you index tu, ur, ri, is, sm, mo, etc.
The term query does not analyze your input, so you are searching for uris, and uris was never indexed.
Try setting a higher max_gram:
ngram tokenizer
ngram tokenfilter
And maybe you should not use both the nGram tokenizer and the nGram filter. I have always used just the filter (because the tokenizer was whitespace).
Here is an edgeNGram filter we had to define; plain nGrams should work just the same (see the analyzer sketch after this example).
"filter" : {
"my_filter" : {
"type" : "edgeNGram",
"min_gram" : "1",
"max_gram" : "20"
}
}
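Putting the pieces together, one possible analyzer definition following that advice (filter-only nGrams on a whitespace tokenizer; the filter name and gram sizes are just an illustration, not a tested configuration) could look like this in elasticsearch.yml:
# sketch: custom analyzer using an nGram token filter instead of an nGram tokenizer
index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: whitespace
        filter: [my_ngram, lowercase, asciifolding]
    filter:
      my_ngram:
        type: nGram
        min_gram: 2
        max_gram: 20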
Hope it helps.

Exact (not substring) matching in Elasticsearch

{"query":{
"match" : {
"content" : "2"
}
}} matches all the documents whole content contains the number 2, however I would like the content to be exactly 2, no more no less - think of my requirement in a spirit of Java's String.equals.
Similarly for the second query I would like to match when the document's content is exactly '3 3' and nothing more or less. {"query":{
"match" : {
"content" : "3 3"
}
}}
How could I do exact (String.equals) matching in Elasticsearch?
Without seeing your index type mapping and sample data, it's hard to answer this directly - but I'll try.
Offhand, I'd say this is similar to this answer here (https://stackoverflow.com/a/12867852/382774), where you simply set the content field's index option to not_analyzed in your mapping:
"url" : {
"type" : "string",
"index" : "not_analyzed"
}
Edit: I wasn't clear enough with my original answer, shown above. I did not mean to imply that you should add the example code to your query, I meant that you need to specify in your index type mapping that the url field is of type string and it is indexed but not analyzed (not_analyzed).
This tells Elasticsearch to not bother analyzing (tokenizing or token filtering) the field when you're indexing your documents - just store it in the index as it exists in the document. For more information on mappings, see http://www.elasticsearch.org/guide/reference/mapping/ for an intro and http://www.elasticsearch.org/guide/reference/mapping/core-types/ for specifics on not_analyzed (tip: search for it on that page).
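Applied to the content field from the question, a minimal mapping sketch (my_index and my_type are made-up names) would be:
# sketch: mark the content field as not_analyzed in the type mapping
curl -XPUT 'http://localhost:9200/my_index/my_type/_mapping' -d '{
  "my_type" : {
    "properties" : {
      "content" : { "type" : "string", "index" : "not_analyzed" }
    }
  }
}'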
Update:
The official documentation tells us that in newer versions of Elasticsearch you can't define a field as "not_analyzed"; instead you should use the "keyword" type.
For the old Elasticsearch versions:
{
  "foo": {
    "type": "string",
    "index": "not_analyzed"
  }
}
For the new versions:
{
  "foo": {
    "type": "keyword",
    "index": true
  }
}
Note that this functionality (the keyword type) is available from Elasticsearch 5.0, and the backward compatibility layer was removed in the Elasticsearch 6.0 release.
Official Doc
You should use a filter instead of match:
{
  "query" : {
    "constant_score" : {
      "filter" : {
        "term" : {
          "content" : 2
        }
      }
    }
  }
}
And you get docs whose content is exactly 2, instead of 20 or 2.1.
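As a quick sanity check (a sketch assuming content is mapped as not_analyzed or keyword, with made-up index and type names), index both values and run the filter; only the exact match should come back:
# sketch: only document 1 should be returned by the term filter below
curl -XPUT 'http://localhost:9200/my_index/my_type/1' -d '{ "content": "2" }'
curl -XPUT 'http://localhost:9200/my_index/my_type/2' -d '{ "content": "20" }'
curl -XPOST 'http://localhost:9200/my_index/my_type/_search' -d '{
  "query" : {
    "constant_score" : {
      "filter" : { "term" : { "content" : "2" } }
    }
  }
}'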

Elasticsearch hyphen issue with term filter

I have the following Elastic Search query with only a term filter. My query is much more complex but I am just trying to show the issue here.
{
  "filter": {
    "term": {
      "field": "update-time"
    }
  }
}
When I pass a hyphenated value to the filter, I get zero results back. But if I try with an unhyphenated value, I get results back. I am not sure if the hyphen is the issue here, but my scenario makes me believe so.
Is there a way to escape the hyphen so the filter would return results? I have tried escaping the hyphen with a back slash which I read from the Lucene forums but that didn't help.
Also, if I pass in a GUID value into this field which is hyphenated and surrounded by curly braces, something like - {ASD23-34SD-DFE1-42FWW}, would I need to lower case the alphabet characters and would I need to escape the curly braces too?
Thanks
I would guess that your field is analyzed, which is the default setting for string fields in Elasticsearch. As a result, when it is indexed it's not stored as the single term "update-time" but instead as 2 terms: "update" and "time". That's why your term search cannot find this term. If your field will always contain values that have to be matched exactly as is, it would be best to define such a field in the mapping as not analyzed. You can do it by recreating the index with the new mapping:
curl -XPUT http://localhost:9200/your-index -d '{
  "mappings" : {
    "your-type" : {
      "properties" : {
        "field" : { "type": "string", "index" : "not_analyzed" }
      }
    }
  }
}'
curl -XPUT http://localhost:9200/your-index/your-type/1 -d '{
  "field" : "update-time"
}'
curl -XPOST http://localhost:9200/your-index/your-type/_search -d'{
  "filter": {
    "term": {
      "field": "update-time"
    }
  }
}'
Alternatively, if you want some flexibility in finding records based on this field, you can keep this field analyzed and use text queries instead:
curl -XPOST http://localhost:9200/your-index/your-type/_search -d'{
  "query": {
    "text": {
      "field": "update-time"
    }
  }
}'
Please keep in mind that if your field is analyzed, this record will also be found by searching for just the word "update" or just the word "time".
The accepted answer didn't work for me with Elasticsearch 6.1. I solved it using the "keyword" sub-field that Elasticsearch provides by default on string fields.
{
  "filter": {
    "term": {
      "field.keyword": "update-time"
    }
  }
}
Based on the answer by @imotov: if you're using spring-data-elasticsearch, then all you need to do is mark your field as:
@Field(type = FieldType.String, index = FieldIndex.not_analyzed)
instead of
@Field(type = FieldType.String)
The problem is that you need to drop the index and re-create it with the new mappings, though.

Resources