elasticsearch mapping analyzer - GET not getting result - elasticsearch

I am trying to create an analyzer that replaces special characters with whitespace and converts tokens to uppercase. Afterwards, searching in lowercase should also work.
Mapping Analyzer:
soundarya#soundarya-VirtualBox:~/Downloads/elasticsearch-2.4.0/bin$ curl -XPUT 'http://localhost:9200/aida' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "uppercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1 "
        }
      }
    }
  }
}
'
{"acknowledged":true}
soundarya#soundarya-VirtualBox:~/Downloads/elasticsearch-2.4.0/bin$ curl -XPOST 'http://localhost:9200/aida/_analyze?pretty' -d '{
  "analyzer": "my_analyzer",
  "text": "My name is Soun*arya?jwnne&yuuk"
}'
The analyzer tokenizes the text as expected, replacing the special characters with whitespace. However, when I search for a word from that text, I get no results.
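As an aside, the pattern_replace char filter defined above only inserts a space after a run of digits that is followed by a hyphen and another digit; the special characters in the sample text are actually split away by the standard tokenizer itself. A quick Python simulation of the char filter's regex (illustrative only, not an Elasticsearch API call; ES's "$1 " replacement is "\1 " in Python):

```python
import re

# The same pattern the char_filter uses: digits, a hyphen,
# and a lookahead for another digit.
pattern = r"(\d+)-(?=\d)"

print(re.sub(pattern, r"\1 ", "123-456-789"))
# hyphens between digits become spaces: "123 456 789"

print(re.sub(pattern, r"\1 ", "Soun*arya?jwnne&yuuk"))
# unchanged: the pattern does not match *, ? or &
```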
soundarya#soundarya-VirtualBox:~/Downloads/elasticsearch-2.4.0/bin$ curl -XGET 'http://localhost:9200/aida/_search' -d '{
  "query": {
    "match": {
      "text": "My"
    }
  }
}'
The GET query above returns no results. Searching in lowercase gives the same empty response:
soundarya#soundarya-VirtualBox:~/Downloads/elasticsearch-2.4.0/bin$ curl -XGET 'http://localhost:9200/aida/_search' -d '{
  "query": {
    "match": {
      "text": "my"
    }
  }
}'
{"took":5,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
Can anyone help me with this? Thank you!

You don't seem to have indexed any data after creating your index. The call to _analyze will not index anything; it simply shows you how the content you send to ES would be analyzed.
First, you need to create your index by specifying a mapping in which you use the analyzer you've defined:
curl -XPUT 'http://localhost:9200/aida' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "uppercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1 "
        }
      }
    }
  },
  "mappings": {                        <--- add a mapping type...
    "doc": {
      "properties": {
        "text": {                      <--- ...with a field...
          "type": "string",
          "analyzer": "my_analyzer"    <--- ...using your analyzer
        }
      }
    }
  }
}'
Then you can index a new real document:
curl -XPOST 'http://localhost:9200/aida/doc' -d '{
  "text": "My name is Soun*arya?jwnne&yuuk"
}'
Finally, you can search:
curl -XGET 'http://localhost:9200/aida/_search' -d '{
  "query": {
    "match": {
      "text": "My"
    }
  }
}'
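Since my_analyzer is applied at both index and search time, "My" and "my" are normalized to the same uppercase token, which is why either query matches once a document is actually indexed. A rough Python illustration of that normalization (a simulation, not Elasticsearch itself; the word-character split is an approximation of the standard tokenizer):

```python
import re

def analyze(text):
    """Rough stand-in for my_analyzer: split on non-word chars, uppercase."""
    return [t.upper() for t in re.findall(r"\w+", text)]

doc_tokens = analyze("My name is Soun*arya?jwnne&yuuk")
print(doc_tokens)
# ['MY', 'NAME', 'IS', 'SOUN', 'ARYA', 'JWNNE', 'YUUK']

# Both query spellings produce the same token, so both match.
print(analyze("My") == analyze("my"))   # True
```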

Related

Elasticsearch not using synonyms from synonym file

I am new to elasticsearch so before downvoting or marking as duplicate, please read the question first.
I am testing synonyms in Elasticsearch (v 2.4.6), which I have installed on Ubuntu 16.04. I supply synonyms through a file named synonym.txt, which I have placed in the config directory. I have created an index synonym_test as follows-
curl -XPOST localhost:9200/synonym_test/ -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "my_synonym_filter"]
        }
      },
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": true,
          "synonyms_path": "synonym.txt"
        }
      }
    }
  }
}'
The index contains two fields- id and some_text. I configure the field some_text with the custom analyzer as follows-
curl -XPUT localhost:9200/synonym_test/rulers/_mapping -d '{
  "properties": {
    "id": {
      "type": "double"
    },
    "some_text": {
      "type": "string",
      "search_analyzer": "my_synonyms"
    }
  }
}'
Then I have inserted some data as -
curl -XPUT localhost:9200/synonym_test/external/5 -d '{
  "id": "5",
  "some_text": "apple is a fruit"
}'
curl -XPUT localhost:9200/synonym_test/external/7 -d '{
  "id": "7",
  "some_text": "english is spoken in england"
}'
curl -XPUT localhost:9200/synonym_test/external/8 -d '{
  "id": "8",
  "some_text": "Scotland Yard is a popular game."
}'
curl -XPUT localhost:9200/synonym_test/external/9 -d '{
  "id": "9",
  "some_text": "bananas contain potassium"
}'
The synonym.txt file contains following-
"britain,england,scotland"
"fruit,bananas"
After doing all this, when I run a query for the term fruit (which should also return the text containing bananas, since they are synonyms in the file), I get only the text containing fruit.
{
  "took": 117,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.8465736,
    "hits": [
      {
        "_index": "synonym_test",
        "_type": "external",
        "_id": "5",
        "_score": 0.8465736,
        "_source": {
          "id": "5",
          "some_text": "apple is a fruit"
        }
      }
    ]
  }
}
I have also tried the following links, but none seem to have helped me -
Synonym analyzer not working ,
Elasticsearch synonym analyzer not working , How to apply synonyms at query time instead of index time in Elasticsearch , how to configure the synonyms_path in elasticsearch and many other links.
So, can anyone please tell me if I am doing anything wrong? Is there anything wrong with the settings or synonym file? I want the synonyms to work (query time) so that when I search for a term, I get all documents related to that term.
Please refer to the following url: Custom Analyzer, which explains how to configure custom analyzers.
If we follow the guides from that documentation, the schema will be as follows:
curl -XPOST localhost:9200/synonym_test/ -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonyms": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "my_synonym_filter"]
        }
      },
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": true,
          "synonyms_path": "synonym.txt"
        }
      }
    }
  }
}'
This currently works on my Elasticsearch instance.
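Query-time synonym expansion can be pictured as token-for-token substitution: each query token is replaced by the set of all terms in its synonym group. A small illustrative Python sketch (not the Elasticsearch implementation; the groups are the ones from the synonym.txt above):

```python
# Each line of the synonym file defines a group of equivalent terms.
SYNONYM_GROUPS = [
    {"britain", "england", "scotland"},
    {"fruit", "bananas"},
]

def expand(token):
    """Return the token plus all of its synonyms."""
    terms = {token}
    for group in SYNONYM_GROUPS:
        if token in group:
            terms |= group
    return terms

# A query for "fruit" effectively searches for any of these terms:
print(sorted(expand("fruit")))   # ['bananas', 'fruit']
```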

Enable stop token filter for standard analyzer

I'm using Elasticsearch 5.1 and I want to enable the Stop Token Filter for the standard analyzer, which is disabled by default.
The documentation describes how to use it in a custom analyzer, but I would like to know how to enable it for the standard analyzer, since the filter is already included.
You have to configure the standard analyzer; see the example below for how to do it with a curl command (taken from the docs here):
curl -XPUT 'localhost:9200/my_index?pretty' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_english": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "my_text": {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "std_english"
            }
          }
        }
      }
    }
  }
}'
curl -XPOST 'localhost:9200/my_index/_analyze?pretty' -d'
{
  "field": "my_text",
  "text": "The old brown cow"
}'
curl -XPOST 'localhost:9200/my_index/_analyze?pretty' -d'
{
  "field": "my_text.english",
  "text": "The old brown cow"
}'
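The difference between the two _analyze calls above can be simulated in a few lines of Python (illustrative only; ES's _english_ list covers more words than the small sample set used here):

```python
# A tiny subset of Elasticsearch's _english_ stopword list, for illustration.
ENGLISH_STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "in"}

def analyze(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in stopwords]

# "my_text" field: plain standard analyzer, no stopwords removed.
print(analyze("The old brown cow"))
# ['the', 'old', 'brown', 'cow']

# "my_text.english" field: std_english removes English stopwords.
print(analyze("The old brown cow", ENGLISH_STOPWORDS))
# ['old', 'brown', 'cow']
```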

How to use stopword elasticsearch

I have Elasticsearch 1.5 running on my server.
Specifically, I want to create three fields:
1. name
2. description
3. nickname
I want to set up stopwords for the description and nickname fields, so that when I insert data into Elasticsearch the unwanted stopwords are removed automatically. I have tried many times but it is not working.
curl -X POST http://127.0.0.1:9200/tryoindex/ -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_english_stemmer": {
          "type": "stemmer",
          "name": "english"
        },
        "snowball": {
          "type": "snowball",
          "language": "English"
        }
      },
      "analyzer": {
        "custom_lowercase_stemmed": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "custom_english_stemmer",
            "snowball"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "_all": { "enabled": true },
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "custom_lowercase_stemmed"
        }
      }
    }
  }
}'
curl -X POST "http://localhost:9200/tryoindex/nama/1" -d '{
  "text": "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your"
}'
curl "http://localhost:9200/tryoindex/nama/_search?pretty=1" -d '{
  "query": {
    "query_string": {
      "query": "Tryolabs running monkeys KANGAROOS and jumping elephants jum is your",
      "fields": ["text"]
    }
  }
}'
Change your analyzer part to
"analyzer": {
  "custom_lowercase_stemmed": {
    "tokenizer": "standard",
    "filter": [
      "stop",
      "lowercase",
      "custom_english_stemmer",
      "snowball"
    ]
  }
}
To verify the changes use
curl -XGET 'localhost:9200/tryoindex/_analyze?analyzer=custom_lowercase_stemmed' -d 'testing this is stopword testing'
and observe the tokens
{
  "tokens": [
    {"token": "test", "start_offset": 0, "end_offset": 7, "type": "<ALPHANUM>", "position": 1},
    {"token": "stopword", "start_offset": 16, "end_offset": 24, "type": "<ALPHANUM>", "position": 4},
    {"token": "test", "start_offset": 25, "end_offset": 32, "type": "<ALPHANUM>", "position": 5}
  ]
}
PS: If you don't want to get the stemmed version of testing, then remove the stemming filters.
You need to use the stop token filter in your analyzer filter chain.

Best way to search/index the data - with and without whitespace

I am having a problem indexing and searching for words that may or may not contain whitespace. Below is an example.
Here is how the mappings are set up:
curl -s -XPUT 'localhost:9200/test' -d '{
  "mappings": {
    "properties": {
      "name": {
        "street": {
          "type": "string",
          "index_analyzer": "index_ngram",
          "search_analyzer": "search_ngram"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "desc_ngram": {
          "type": "edgeNGram",
          "min_gram": 3,
          "max_gram": 20
        }
      },
      "analyzer": {
        "index_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "desc_ngram", "lowercase" ]
        },
        "search_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  }
}'
This is how I built the index:
curl -s -XPUT 'localhost:9200/test/name/1' -d '{ "street": "Lakeshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/2' -d '{ "street": "Sunnyshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/3' -d '{ "street": "Lake View Dr" }'
curl -s -XPUT 'localhost:9200/test/name/4' -d '{ "street": "Shore Dr" }'
Here is an example of the query that is not working correctly:
curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "street": {
              "query": "lake shore dr",
              "type": "boolean"
            }
          }
        }
      ]
    }
  }
}';
If a user searches for "Lake Shore Dr", I want to match only document 1 ("Lakeshore Dr").
If a user searches for "Lakeview Dr", I want to match only document 3 ("Lake View Dr").
So is the issue with how I am setting up the mappings (the tokenizer? edge n-grams vs n-grams? the size of the n-grams?) or with the query (I have tried things like setting minimum_should_match and the analyzer to use)? I have not been able to get the desired results.
Thanks all.
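One way to see why the query above fails: with a keyword tokenizer the whole field value is a single token, so the index holds leading-edge n-grams of the full string, while the search side emits the entire query string as one token. A rough Python simulation of the desc_ngram filter (an assumed simplification, not the Elasticsearch implementation):

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    """Leading-edge n-grams of a single (keyword) token."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

indexed = edge_ngrams("lakeshore dr")   # grams of the whole field value
query = "lake shore dr"                 # keyword tokenizer: one query token

print("lak" in indexed)                 # True
print("lakeshore dr" in indexed)        # True (full value fits in max_gram)
print(query in indexed)                 # False -> the match query finds nothing
```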

How to match on prefix in Elasticsearch

Let's say that in my Elasticsearch index I have a field called "dots" which contains a string of punctuation-separated words (e.g. "first.second.third").
I need to search for e.g. "first.second" and get all entries whose "dots" field is exactly "first.second" or starts with "first.second.".
I have trouble understanding how text querying works; at least I have not been able to create a query that does the job.
Elasticsearch has the Path Hierarchy Tokenizer, which was created exactly for this use case. Here is an example of how to set it up for your index:
# Create a new index with custom path_hierarchy analyzer
# See http://www.elasticsearch.org/guide/reference/index-modules/analysis/pathhierarchy-tokenizer.html
curl -XPUT "localhost:9200/prefix-test" -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix-test-analyzer": {
          "type": "custom",
          "tokenizer": "prefix-test-tokenizer"
        }
      },
      "tokenizer": {
        "prefix-test-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "."
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "dots": {
          "type": "string",
          "analyzer": "prefix-test-analyzer",
          //"index_analyzer": "prefix-test-analyzer", //deprecated
          "search_analyzer": "keyword"
        }
      }
    }
  }
}'
echo
# Put some test data
curl -XPUT "localhost:9200/prefix-test/doc/1" -d '{"dots": "first.second.third"}'
curl -XPUT "localhost:9200/prefix-test/doc/2" -d '{"dots": "first.second.foo-bar"}'
curl -XPUT "localhost:9200/prefix-test/doc/3" -d '{"dots": "first.baz.something"}'
curl -XPOST "localhost:9200/prefix-test/_refresh"
echo
# Test searches.
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
  "query": {
    "term": {
      "dots": "first"
    }
  }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
  "query": {
    "term": {
      "dots": "first.second"
    }
  }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
  "query": {
    "term": {
      "dots": "first.second.foo-bar"
    }
  }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true&q=dots:first.second"
echo
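The prefix-test-analyzer emits one token per hierarchy level, which is why the term queries on "first" and "first.second" both match. A small Python sketch of what the path_hierarchy tokenizer produces (an illustrative simulation, not the ES implementation):

```python
def path_hierarchy(value, delimiter="."):
    """Emit every prefix of the value at each delimiter, plus the full value."""
    parts = value.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]

print(path_hierarchy("first.second.third"))
# ['first', 'first.second', 'first.second.third']
```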
There is also a much easier way, as pointed out in the Elasticsearch documentation:
just use:
{
  "text_phrase_prefix": {
    "fieldname": "yourprefix"
  }
}
or since 0.19.9:
{
  "match_phrase_prefix": {
    "fieldname": "yourprefix"
  }
}
instead of:
{
  "prefix": {
    "fieldname": "yourprefix"
  }
}
Have a look at prefix queries.
$ curl -XGET 'http://localhost:9200/index/type/_search' -d '{
  "query": {
    "prefix": { "dots": "first.second" }
  }
}'
You could use wildcard characters in your query, something like this:
$ curl -XGET 'http://localhost:9200/myapp/_search' -d '{
  "query": {
    "query_string": {
      "query": "dots:first.second*"
    }
  }
}'
more examples about the syntax at: http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html
I was looking for a similar solution - but matching only a prefix. I found #imtov's answer to get me almost there, but for one change - switching the analyzers around:
"mappings": {
  "doc": {
    "properties": {
      "dots": {
        "type": "string",
        "analyzer": "keyword",
        "search_analyzer": "prefix-test-analyzer"
      }
    }
  }
}
instead of
"mappings": {
  "doc": {
    "properties": {
      "dots": {
        "type": "string",
        "index_analyzer": "prefix-test-analyzer",
        "search_analyzer": "keyword"
      }
    }
  }
}
This way, adding:
'{"dots": "first.second"}'
'{"dots": "first.third"}'
will index only these full tokens, without storing the first, second, third tokens.
Yet searching for either
first.second.anyotherstring
first.second
will correctly return only the first entry:
'{"dots": "first.second"}'
Not exactly what you asked for but somehow related, so I thought could help someone.
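This reversed setup can be simulated in Python: the index stores each value as one unbroken keyword token, while the query is expanded into its hierarchy prefixes, and a document matches when its full value appears among those prefixes (an illustrative sketch reusing the path_hierarchy idea from the earlier answer, not actual ES code):

```python
def path_hierarchy(value, delimiter="."):
    """One token per hierarchy level, like ES's path_hierarchy tokenizer."""
    parts = value.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]

# Index side: keyword analyzer, each doc is a single unbroken token.
docs = ["first.second", "first.third"]

def search(query):
    """Search side: expand the query into prefixes; a doc matches
    when its full token equals one of them."""
    query_tokens = set(path_hierarchy(query))
    return [d for d in docs if d in query_tokens]

print(search("first.second.anyotherstring"))  # ['first.second']
print(search("first.second"))                 # ['first.second']
print(search("first"))                        # []
```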
