I just installed Elasticsearch and am testing it out. It looks great, and I need to know something. I have a configuration file elasticsearch.json in the config directory:
{
  "network" : {
    "host" : "127.0.0.1"
  },
  "index" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1,
    "refresh_interval" : "2s",
    "analysis" : {
      "analyzer" : {
        "index_analyzer" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase"]
        },
        "search_analyzer" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase"]
        }
      },
      "filter" : {   // you'll need the Lucene dep for this
        "snowball" : {
          "type" : "snowball",
          "language" : "English"
        }
      }
    }
  }
}
I have indexed a document that contains the word "searching". If I search for the keyword "search", it says nothing was found.
Won't it stem before indexing, or did I miss something in the config?
What does your query look like?
Your config does not look right. Try:
...
"analyzer" : {
  "index_analyzer" : {
    "tokenizer" : "nGram",
    "filter" : ["lowercase", "snowball"]
  },
  "search_analyzer" : {
    "tokenizer" : "nGram",
    "filter" : ["lowercase", "snowball"]
  }
},
"filter" : {
  "snowball" : {
    "type" : "snowball",
    "language" : "English"
  }
}
...
I've had trouble overriding the "default_search" and "default_index" analyzer as well.
This works though.
If need be, you can also set "index_analyzer" at the type level to provide a default for all string fields whose analyzers are unspecified; a sketch of that follows, and then the full working example.
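A minimal sketch of that type-level default (hedged: this targets the pre-2.0 mapping API used in this thread, and it references the a2 analyzer defined in the example below; the working example sets the analyzer per field instead):

curl -XPUT localhost:9200/twitter/tweet/_mapping -d '{
  "tweet" : {
    "index_analyzer" : "a2",
    "properties" : {
      "message" : {"type" : "string"}
    }
  }
}'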
curl -XDELETE localhost:9200/twitter
curl -XPOST localhost:9200/twitter -d '
{
  "index" : {
    "number_of_shards" : 1,
    "analysis" : {
      "filter" : {
        "snowball" : {
          "type" : "snowball",
          "language" : "English"
        }
      },
      "analyzer" : {
        "a2" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["lowercase", "snowball"]
        }
      }
    }
  }
}'
curl -XPUT localhost:9200/twitter/tweet/_mapping -d '{
  "tweet" : {
    "date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
    "properties" : {
      "user" : {"type" : "string"},
      "message" : {"type" : "string", "analyzer" : "a2"}
    }
  }
}'
curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{ "user": "kimchy", "post_date": "2009-11-15T13:12:00", "message": "Trying out searching teaching, so far so good?" }'
curl -XGET localhost:9200/twitter/tweet/_search?q=message:search
curl -XGET localhost:9200/twitter/tweet/_search?q=message:try
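Both searches should now return the document. You can also verify the analysis chain directly with the analyze API (a quick check; the output should contain the stemmed tokens "search" and "teach"):

curl -XGET 'localhost:9200/twitter/_analyze?analyzer=a2&pretty=true' -d 'searching teaching'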
I'm new to ELK. I have a problem with the following search query:
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"should" : [
{"match" : {"cn" : "franc"}},
{"prefix" : {"srt" : "99889300200"}}
]
}
}
}
'
I need to find all documents that satisfy either condition: field "cn" contains "franc" OR field "srt" starts with "99889300200".
Index mapping:
{
"commsrch" : {
"mappings" : {
"properties" : {
"addr" : {
"type" : "text",
"index" : false
},
"cn" : {
"type" : "text",
"analyzer" : "compname"
},
"srn" : {
"type" : "text",
"analyzer" : "srnsrt"
},
"srt" : {
"type" : "text",
"analyzer" : "srnsrt"
}
}
}
}
}
Index settings:
{
"commsrch" : {
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "commsrch",
"creation_date" : "1675079141160",
"analysis" : {
"filter" : {
"ngram_filter" : {
"type" : "ngram",
"min_gram" : "3",
"max_gram" : "4"
}
},
"analyzer" : {
"compname" : {
"filter" : [
"lowercase",
"stop",
"ngram_filter"
],
"type" : "custom",
"tokenizer" : "whitespace"
},
"srnsrt" : {
"type" : "custom",
"tokenizer" : "standard"
}
}
},
"number_of_replicas" : "1",
"uuid" : "C15EXHnaTIq88JSYNt7GvA",
"version" : {
"created" : "8060099"
}
}
}
}
}
The query works properly with only one condition: if it has only the "match" condition, the result has the proper document count, and likewise with only the "prefix" condition.
But with both the "match" and "prefix" conditions together, the result contains only documents matching the "prefix" condition.
I can't find any limitation on mixing "prefix" and "match" in the Elasticsearch docs, but there does seem to be a problem. Please help me find where the problem is.
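For reference, the tokens that the compname analyzer actually emits for the match text can be inspected with the analyze API (a sketch against the index above; note that ngram_filter has min_gram 3 and max_gram 4):

curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "compname",
  "text": "franc"
}'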
Continuing my experiments, I have one more problem.
Example:
Source data:
1st document cn field: "put stone is done"
2nd document cn field: "job one or two"
The mapping and index settings are the same as described in my first post.
Request:
{
"query": {
"bool": {
"should" : [
{"match" : {"cn" : "one"}},
{"prefix" : {"cn" : "one"}}
]
}
}
}
As I understand it, the first document got the higher score because it has more repeats of "one" (the ngram_filter produces "one" from both "stone" and "done"). But I need high scores for documents that have at least one word in field "cn" starting with the string "one". I have experimented with this query:
{
"query": {
"bool": {
"should": [
{"match": {"cn": "one"}},
{
"constant_score": {
"filter": {
"prefix": {
"cn": "one"
}
},
"boost": 100
}
}
]
}
}
}
But it doesn't work properly. What's wrong with my query?
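One way to see why the scores come out this way is to request a per-clause score breakdown with the explain option (a sketch; "explain": true attaches an _explanation tree to every hit):

{
  "explain": true,
  "query": {
    "bool": {
      "should": [
        {"match": {"cn": "one"}},
        {"prefix": {"cn": "one"}}
      ]
    }
  }
}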
I would like to highlight just the ngrams which match, not the whole word.
Example:
term: "Wo"
highlight should be: "<em>Wo</em>nderfull world!"
currently it is: "<em>Wonderfull</em> world!"
Mapping is:
{
"global_search_1495732922733" : {
"mappings" : {
"meeting" : {
"properties" : {
...
"name" : {
"type" : "text",
"analyzer" : "meeteor_index_analyzer",
"search_analyzer" : "meeteor_search_term_analyzer"
},
...
}
}
}
}
}
Analyzers are:
"analysis" : {
"filter" : {
"meeteor_stemmer" : {
"name" : "english",
"type" : "stemmer"
},
"meeteor_ngram" : {
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "15"
}
},
"analyzer" : {
"meeteor_search_term_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "standard"
},
"meeteor_index_analyzer" : {
"filter" : [
"lowercase",
"asciifolding",
"meeteor_ngram"
],
"tokenizer" : "standard"
},
"meeteor_project_id_analyzer" : {
"tokenizer" : "standard"
}
}
},
Concrete example:
curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"name": "Me"
}
},
"highlight":{
"fields": {
"name": {}
}
}
}
'
The result is:
"...highlight" : {
"name" : [
"Sad <em>Meeting</em>"
]
}
The correct way to achieve what you want is to use ngram as a tokenizer, not as a filter. You can do something like this:
"analysis" : {
"filter" : {
"meeteor_stemmer" : {
"name" : "english",
"type" : "stemmer"
}
},
"tokenizer" : {
"meeteor_ngram_tokenizer" : {
"type" : "nGram",
"min_gram" : "2",
"max_gram" : "15"
}
},
"analyzer" : {
"meeteor_search_term_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "standard"
},
"meeteor_index_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"tokenizer" : "meeteor_ngram_tokenizer"
},
"meeteor_project_id_analyzer" : {
"tokenizer" : "standard"
}
}
},
It will generate the highlighting by ngram for you like this:
"...highlight" : {
"name" : [
"Sad <em>Me</em>eting"
]
}
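You can double-check the tokenizer with the analyze API, which should now emit the individual ngrams as tokens (a sketch, assuming the index has been recreated with these settings):

curl -XGET 'localhost:9200/global_search/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "analyzer": "meeteor_index_analyzer",
  "text": "Wonderfull world"
}'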
I have a batch of "smartphones" products in my ES, and I need to query them using the text "smart phone". So I'm looking into the compound word token filter. Specifically, I'm planning to use a custom filter like this:
curl -XPUT 'localhost:9200/_all/_settings' -d '
{
  "analysis" : {
    "analyzer" : {
      "second" : {
        "type" : "custom",
        "tokenizer" : "standard",
        "filter" : ["myFilter"]
      }
    },
    "filter" : {
      "myFilter" : {
        "type" : "dictionary_decompounder",
        "word_list" : ["smart", "phone"]
      }
    }
  }
}
'
Is this the correct approach? Also, I'd like to ask how I can create and add the custom analyser to ES. I looked into several links but couldn't figure out how to do it; I guess I'm looking for the correct syntax.
Thank you
EDIT
I'm running version 1.4.5, and I verified that the custom analyser was added successfully:
{
"test_index" : {
"settings" : {
"index" : {
"creation_date" : "1453761455612",
"analysis" : {
"filter" : {
"myFilter" : {
"type" : "dictionary_decompounder",
"word_list" : [ "smart", "phone" ]
}
},
"analyzer" : {
"second" : {
"type" : "custom",
"filter" : [ "lowercase", "myFilter" ],
"tokenizer" : "standard"
}
}
},
"number_of_shards" : "5",
"number_of_replicas" : "1",
"version" : {
"created" : "1040599"
},
"uuid" : "xooKEdMBR260dnWYGN_ZQA"
}
}
}
}
Your approach looks good. I would also consider adding the lowercase token filter, so that even "Smartphone" (notice the uppercase 'S') will be split into smart and phone.
Then you could create the index with the analyzer like this:
curl -XPUT 'localhost:9200/your_index' -d '
{
"settings": {
"analysis": {
"analyzer": {
"second": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"myFilter"
]
}
},
"filter": {
"myFilter": {
"type": "dictionary_decompounder",
"word_list": [
"smart",
"phone"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "string",
"analyzer": "second"
}
}
}
}
}
'
Here you are creating an index named your_index with a custom analyzer named second, and applying it to the name field.
You can check whether the analyzer is working as expected with the analyze API, like this:
curl -XGET 'localhost:9200/your_index/_analyze?analyzer=second&pretty=true' -d 'LG Android smartphone'
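As a final check, a match query for "smart phone" should now also return the "smartphones" documents, since the decompounder indexes the tokens smart and phone (a sketch, assuming your documents are indexed into your_index):

curl -XGET 'localhost:9200/your_index/_search?pretty' -d '
{
  "query": {
    "match": { "name": "smart phone" }
  }
}'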
Hope this helps!!
I have created a synonym analyser on an index:
curl http://localhost:9200/test_index/_settings?pretty
{
"test_index" : {
"settings" : {
"index" : {
"creation_date" : "1429175067557",
"analyzer" : {
"search_synonyms" : {
"filter" : [ "lowercase", "search_synonym_filter" ],
"tokenizer" : "standard"
}
},
"uuid" : "Zq6Id8xsRWGofJrNCb7M8w",
"number_of_replicas" : "1",
"analysis" : {
"filter" : {
"search_synonym_filter" : {
"type" : "synonym",
"synonyms" : [ "sneakers,pumps" ]
}
}
},
"number_of_shards" : "5",
"version" : {
"created" : "1050099"
}
}
}
}
}
But when I try to use it with the mapping:
curl -XPUT 'http://localhost:9200/test_index/_mapping/product_catalog?pretty' -H "Content-Type: application/json" \
-d '{"product_catalog": {"properties" : {"name": {"type": "string", "include_in_all": true, "analyzer":"search_synonyms"} }}}'
I get the error:
{
"error" : "MapperParsingException[Analyzer [search_synonyms] not found for field [name]]",
"status" : 400
}
I have also tried to just check the analyser with:
curl 'http://localhost:9200/test_index/_analyze?analyzer=search_synonyms&pretty=1&text=pumps'
but still get an error:
ElasticsearchIllegalArgumentException[failed to find analyzer [search_synonyms]]
Any ideas? I may be missing something, but I can't think what.
The analyzer element has to be inside your analysis component. Change your index creation settings as follows:
{
"settings": {
"index": {
"creation_date": "1429175067557",
"uuid": "Zq6Id8xsRWGofJrNCb7M8w",
"number_of_replicas": "0",
"analysis": {
"filter": {
"search_synonym_filter": {
"type": "synonym",
"synonyms": [
"sneakers,pumps"
]
}
},
"analyzer": {
"search_synonyms": {
"filter": [
"lowercase",
"search_synonym_filter"
],
"tokenizer": "standard"
}
}
},
"number_of_shards": "5",
"version": {
"created": "1050099"
}
}
}
}
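With the analyzer nested under analysis, recreate the index and the checks from the question should pass; for instance, the analyze call should now return tokens (including the synonym) instead of the "failed to find analyzer" error:

curl 'http://localhost:9200/test_index/_analyze?analyzer=search_synonyms&pretty=1&text=pumps'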
The mapping of the Elasticsearch index has a custom analyzer attached to it. How can I read the definition of that custom analyzer?
http://localhost:9200/test_namespace/test_namespace/_mapping
"matchingCriteria": {
"type": "string",
"analyzer": "custom_analyzer",
"include_in_all": false
}
My search is not working with the analyzer, which is why I need to know exactly what this analyzer is doing.
The docs explain how to modify an analyzer or attach a new analyzer to an existing index, but I didn't find a way to see what an analyzer does.
Use the _settings API:
curl -XGET 'http://localhost:9200/test_namespace/_settings?pretty=true'
It should generate a response similar to:
{
"test_namespace" : {
"settings" : {
"index" : {
"creation_date" : "1418990814430",
"routing" : {
"allocation" : {
"disable_allocation" : "false"
}
},
"uuid" : "FmX9NrSNSTO2bQM5pd-iQQ",
"number_of_replicas" : "2",
"analysis" : {
"analyzer" : {
"edi_analyzer" : {
"type" : "custom",
"char_filter" : [ "my_pattern" ],
"filter" : [ "lowercase", "length" ],
"tokenizer" : "whitespace"
},
"xml_analyzer" : {
"type" : "custom",
"char_filter" : [ "html_strip" ],
"filter" : [ "lowercase", "length" ],
"tokenizer" : "whitespace"
},
...