elasticsearch percolator stemmer - elasticsearch

I'm attempting to use the percolation function in elasticsearch. It works great but out of the box there is no stemming to handle singular/plurals etc. The documentation is rather thin on this topic so I was wondering if anyone has gotten this working and what settings are required. At the moment I'm not indexing my documents since I'm not searching them, just passing them through the percolator to trigger notifications.

You can use the percolate API to test documents against percolators without indexing them. However, the percolate API requires and index and a type for your doc. This is so that it knows how each field in your document is defined (or mapped).
Analyzers belong to an index, and the fields in a mapping/type definition can use either globally defined analyzers, or custom analyzers defined for your index.
For instance, we could define a mapping for index test, type test using a globally defined analyzer as follows:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"test" : {
"properties" : {
"title" : {
"type" : "string",
"analyzer" : "english"
}
}
}
}
}
'
Or alternatively, you could setup a custom analyzer that belongs just to the test index:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"test" : {
"properties" : {
"title" : {
"type" : "string",
"analyzer" : "my_english"
}
}
}
},
"settings" : {
"analysis" : {
"analyzer" : {
"my_english" : {
"stopwords" : [],
"type" : "english"
}
}
}
}
}
'
Now we can create our percolator, specifying which index it belongs to:
curl -XPUT 'http://127.0.0.1:9200/_percolator/test/english?pretty=1' -d '
{
"query" : {
"match" : {
"title" : "singular"
}
}
}
'
And test it out with the percolate API, again specifying the index and the type:
curl -XGET 'http://127.0.0.1:9200/test/test/_percolate?pretty=1' -d '
{
"doc" : {
"title" : "singulars"
}
}
'
# {
# "ok" : true,
# "matches" : [
# "english"
# ]
# }

Related

Setting enabled to true Elasticsearch

I'm new to elasticsearch. I have an index type as follows
{
"myindex" : {
"mappings" : {
"systemChanges" : {
"_all" : {
"enabled" : false
},
"properties" : {
"autoChange" : {
"type" : "boolean"
},
"changed" : {
"type" : "object",
"enabled" : false
},
"created" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
}
}
}
}
}
}
I'm unable to fetch the details having changed.new = completed. After some research i have found that it's because the changed field is set to enabled : false. and I need to change the same. I tried as follows
curl -X PUT "localhost:9200/myindex/" -H 'Content-Type: application/json' -d' {
"mappings": {
"systemChanges" : {
"properties" : {
"changed" : {
"enabled" : true
}
}
}
}
}'
But I'm getting error as following.
{"error":{"root_cause":[{"type":"index_already_exists_exception","reason":"already exists","index":"myindex"}],"type":"index_already_exists_exception","reason":"already exists","index":"myindex"},"status":400}
How can I change the enabled to true in order to fetch the details of the changed.new field?
you are trying to add an index again with the same name and hence the error.
See the below link for updating a mapping
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
The enabled setting can be updated on existing fields using the PUT mapping API.

How do I configure elastic search to use the icu_tokenizer?

I'm trying to search a text indexed by elasticsearch and the icu_tokenizer but can't get it working.
My testcase is to tokenize the sentence “Hello. I am from Bangkok”, in thai สวัสดี ผมมาจากกรุงเทพฯ, which should be tokenized to the five words สวัสดี, ผม, มา, จาก, กรุงเทพฯ. (Sample from Elasticsearch - The Definitive Guide)
Searching using any of the last four words fails for me. Searching using any of the space separated words สวัสดี or ผมมาจากกรุงเทพฯ works fine.
If I specify the icu_tokenizer on the command line, like
curl -XGET 'http://localhost:9200/icu/_analyze?tokenizer=icu_tokenizer' -d "สวัสดี ผมมาจากกรุงเทพฯ"
it tokenizes to five words.
My settings are:
curl http://localhost:9200/icu/_settings?pretty
{
"icu" : {
"settings" : {
"index" : {
"creation_date" : "1474010824865",
"analysis" : {
"analyzer" : {
"nfkc_cf_normalized" : [ "icu_normalizer" ],
"tokenizer" : "icu_tokenizer"
}
}
},
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "tALRehqIRA6FGPu8iptzww",
"version" : {
"created" : "2040099"
}
}
}
}
The index is populated with
curl -XPOST 'http://localhost:9200/icu/employee/' -d '
{
"first_name" : "John",
"last_name" : "Doe",
"about" : "สวัสดี ผมมาจากกรุงเทพฯ"
}'
Searching with
curl -XGET 'http://localhost:9200/_search' -d'
{
"query" : {
"match" : {
"about" : "กรุงเทพฯ"
}
}
}'
Returns nothing ("hits" : [ ]).
Performing the same search with one of สวัสดี or ผมมาจากกรุงเทพฯ works fine.
I guess I've misconfigured the index, how should it be done?
The missing part is:
"mappings": {
"employee" : {
"properties": {
"about":{
"type": "text",
"analyzer": "icu_analyzer"
}
}
}
}
In the mapping, the document field have to be specified the analyzer to be using.
[Index] : icu
[type] : employee
[field] : about
PUT /icu
{
"settings": {
"analysis": {
"analyzer": {
"icu_analyzer" : {
"char_filter": [
"icu_normalizer"
],
"tokenizer" : "icu_tokenizer"
}
}
}
},
"mappings": {
"employee" : {
"properties": {
"about":{
"type": "text",
"analyzer": "icu_analyzer"
}
}
}
}
}
test the custom analyzer using followings DSLJson
POST /icu/_analyze
{
"text": "สวัสดี ผมมาจากกรุงเทพฯ",
"analyzer": "icu_analyzer"
}
The result should be [สวัสดี, ผม, มา, จาก, กรุงเทพฯ]
My suggestion would be :
Kibana : Dev Tool could help you for effective query crafting

Elasticsearch does not found an existing document using a the DSL

I dont know why, using the URI Search way to search a document is returning the right document, but the document is not found if I use the API DSL.
To reproduce the issue:
Without any index created, I insert this document:
curl http://localhost:9299/integrationtest-index/searchable/ID_XXXX2 -d '{ "ref" : "XXXX2", "field1" : "value1" }'
So the index is created automatically with the default mapping (type searchable):
curl http://localhost:9299/integrationtest-index?pretty
{
"integrationtest-index" : {
"aliases" : { },
"mappings" : {
"searchable" : {
"properties" : {
"field1" : {
"type" : "string"
},
"ref" : {
"type" : "string"
}
}
}
},
"settings" : {
"index" : {
"field1" : "value1",
"ref" : "XXXX2",
"number_of_shards" : "5",
"creation_date" : "1466780216631",
"number_of_replicas" : "1",
"uuid" : "GBj2VF-wQy6JP74AqoIn5g",
"version" : {
"created" : "2020099"
}
}
},
"warmers" : { }
}
}
This query return one document:
curl http://localhost:9299/integrationtest-index/searchable/_search?q=ref:XXXX2
But this other query response that does not exist:
curl -XPOST http://localhost:9299/integrationtest-index/searchable/_search/exists -d '
{
"query": {
"term" : {
"ref" : "XXXX2"
}
}
}'
Why the last query said that the document does not exist?
Environment:
ElasticSearch 2.2.0
Ubuntu 16.04 LTS
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
I have the same problem every few months, so I decided response myself and share my stupids errors.
By default, elasticsearch use index:analyzed, so the query with term does not found any document.
If you use the URI Search way, elasticsearch is executing a query_string and not a term query.
This query is working:
curl -XPOST http://localhost:9299/integrationtest-index/searchable/_search/exists -d '
{
"query": {
"match" : {
"ref" : "XXXX2"
}
}
}'
More information in the documentation, in the section Why doesn’t the term query match my document?

How to tell ElasticSearch to create nested fields

I'm putting data in ES and check the mapping which is created,
I'm executing this:
curl -XPOST 'http://localhost:9200/testnested2/type1/0' -d '{
"p1": ["1","2","3","4"],
"users" : {
"first" : "John",
"last" : "Sm11ith"
}
}'
and this is the schema it created:
{
"testnested2":{
"mappings":{
"type1":{
"properties":{
"p1":{"type":"string"},
"users":{
"properties":{
"first":{"type":"string"},
"last":{"type":"string"}
}
}
}
}
}
}
}
I was wondering if it's possible to tell it that "users" is nested or I have to create the mapping for myself.
I would like that ES could create an shema like this:
curl -XPOST http://180.5.5.93:9200/testnested3 -d '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"type1" : {
"properties" : {
"propiedad1" : { "type" : "string"},
"users" : {
"type" : "nested",
"include_in_parent": true,
"properties": {
"first" : {"type": "string" },
"last" : {"type": "string" }
}
}
}
}
}
}'
By default, the dynamic mapping feature of ElasticSearch will map users as an object instead of nested.
If you want to tune this behavior, you have to define explicitely a users attribute as nested either in :
the type1 mapping
the default mapping of your index. This way, for any type created, the users attribute will be set automatically to nested(see here for default mapping information)

How to create alias for dynamic fields in elasticsearch dynamic templates?

I am using elasticsearch 1.0.2 and using a sample dynamic template in my index. Is there anyway we can derive the field index name from a part of dynamic field Name
This is my template
{"dynamic_templates":[
"dyn_string_fields": {
"match": "dyn_string_*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index" : "analyzed",
"index_name": "{name}"
}
}
}]}
The dynamic templates work and I am able to add fields. Our goal is to add fields with the "dyn_string_" prefix but while searching it should be just the fieldname without the "dyn_string_" prefix. I tested using match_mapping_type to add fields but this will allow any field to be added. Does someone have any suggestions?
I looked at Elasticsearch API and they have a transform feature in 1.3 which allows to modify the document before insertion.(unfortunately I will not be able to upgrade to that version.)
In single template several aliases can be set. For quick example please have a look at this dummy example:
curl -XPUT localhost:9200/_template/test_template -d '
{
"template" : "test_*",
"settings" : {
"number_of_shards" : 4
},
"aliases" : {
"name_for_alias" : {}
},
"mappings" : {
"type" : {
"properties" : {
"id" : {
"type" : "integer",
"include_in_all" : false
},
"test_user_id" : {
"type" : "integer",
"include_in_all" : false
}
}
}
}
}
'
There "name_for_alias" is you simple alias. As parameter there can be defined preset filters if you want use alias for filtering data.
More information can be found here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html

Resources