Elasticsearch does not find an existing document using the DSL - elasticsearch

I don't know why, but searching for a document with URI Search returns the right document, while the same document is not found when I use the query DSL.
To reproduce the issue:
Without any index created, I insert this document:
curl http://localhost:9299/integrationtest-index/searchable/ID_XXXX2 -d '{ "ref" : "XXXX2", "field1" : "value1" }'
So the index is created automatically with the default mapping (type searchable):
curl http://localhost:9299/integrationtest-index?pretty
{
  "integrationtest-index" : {
    "aliases" : { },
    "mappings" : {
      "searchable" : {
        "properties" : {
          "field1" : {
            "type" : "string"
          },
          "ref" : {
            "type" : "string"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "creation_date" : "1466780216631",
        "number_of_replicas" : "1",
        "uuid" : "GBj2VF-wQy6JP74AqoIn5g",
        "version" : {
          "created" : "2020099"
        }
      }
    },
    "warmers" : { }
  }
}
This query returns one document:
curl http://localhost:9299/integrationtest-index/searchable/_search?q=ref:XXXX2
But this other query responds that the document does not exist:
curl -XPOST http://localhost:9299/integrationtest-index/searchable/_search/exists -d '
{
  "query": {
    "term" : {
      "ref" : "XXXX2"
    }
  }
}'
Why does the last query say that the document does not exist?
Environment:
Elasticsearch 2.2.0
Ubuntu 16.04 LTS
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)

I run into the same problem every few months, so I decided to answer it myself and share my silly mistakes.
By default, Elasticsearch indexes string fields as analyzed, so a term query does not find the document.
If you use URI Search, Elasticsearch executes a query_string query, not a term query.
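The URI Search above is therefore roughly equivalent to this explicit query_string request (a sketch for illustration, reusing the index and port from the question):
curl -XPOST http://localhost:9299/integrationtest-index/searchable/_search -d '
{
  "query" : {
    "query_string" : {
      "query" : "ref:XXXX2"
    }
  }
}'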
This query works:
curl -XPOST http://localhost:9299/integrationtest-index/searchable/_search/exists -d '
{
  "query": {
    "match" : {
      "ref" : "XXXX2"
    }
  }
}'
More information is in the documentation, in the section "Why doesn't the term query match my document?"
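If exact matching is what you really need, another option (a sketch for Elasticsearch 2.x; the -v2 index name is hypothetical) is to map the field as not_analyzed, so the term query matches verbatim:
curl -XPUT http://localhost:9299/integrationtest-index-v2 -d '
{
  "mappings" : {
    "searchable" : {
      "properties" : {
        "ref" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    }
  }
}'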

Related

Kibana Create Index Pattern : strange behaviour of wildcard

I have just one index in Elasticsearch, named aa-bb-YYYY-MM.
Documents in this index contain a field I want to use as the date field.
Those documents were inserted by a custom script (not using Logstash).
When creating the index pattern in Kibana:
If I enter aa-bb-*, the date field is not found.
If I enter aa-*, the date field is not found.
If I enter aa*, the date field is found, and I can create the index pattern.
But I really need to group indices by the first two "dimensions". I tried using "_" instead of "-", with the same result.
Any idea what is going on?
It's working for me. I'm on the latest build of the 5.0 release branch (just past the beta1 release). I don't know which version you're on.
I created this index and added 2 docs:
curl --basic -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09' -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "test" : {
      "properties" : {
        "date" : { "type" : "date" },
        "action" : {
          "type" : "text",
          "analyzer" : "standard",
          "fields" : {
            "raw" : { "type" : "keyword" }
          }
        },
        "myid" : { "type" : "integer" }
      }
    }
  }
}'
curl -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09/test/1' -d '{
  "date" : "2015-08-23T00:01:00",
  "action" : "start",
  "myid" : 1
}'
curl -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09/test/2' -d '{
  "date" : "2015-08-23T14:02:30",
  "action" : "stop",
  "myid" : 1
}'
and I was able to create the index pattern with aa-bb-*
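To check what Elasticsearch reports for the date field across a wildcard, you can query the field mapping directly (a sketch; host and credentials follow the examples above):
curl -XGET 'http://elastic:changeme@localhost:9200/aa-bb-*/_mapping/field/date?pretty'
If the field shows up here with type date but Kibana still cannot find it, the problem is more likely on the Kibana side than in the mappings.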

How do I configure Elasticsearch to use the icu_tokenizer?

I'm trying to search text indexed by Elasticsearch with the icu_tokenizer, but I can't get it working.
My test case is to tokenize the sentence "Hello. I am from Bangkok", in Thai สวัสดี ผมมาจากกรุงเทพฯ, which should be tokenized into the five words สวัสดี, ผม, มา, จาก, กรุงเทพฯ. (Sample from Elasticsearch: The Definitive Guide)
Searching using any of the last four words fails for me. Searching using either of the space-separated words สวัสดี or ผมมาจากกรุงเทพฯ works fine.
If I specify the icu_tokenizer on the command line, like
curl -XGET 'http://localhost:9200/icu/_analyze?tokenizer=icu_tokenizer' -d "สวัสดี ผมมาจากกรุงเทพฯ"
it tokenizes to five words.
My settings are:
curl http://localhost:9200/icu/_settings?pretty
{
  "icu" : {
    "settings" : {
      "index" : {
        "creation_date" : "1474010824865",
        "analysis" : {
          "analyzer" : {
            "nfkc_cf_normalized" : [ "icu_normalizer" ],
            "tokenizer" : "icu_tokenizer"
          }
        },
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "tALRehqIRA6FGPu8iptzww",
        "version" : {
          "created" : "2040099"
        }
      }
    }
  }
}
The index is populated with
curl -XPOST 'http://localhost:9200/icu/employee/' -d '
{
  "first_name" : "John",
  "last_name" : "Doe",
  "about" : "สวัสดี ผมมาจากกรุงเทพฯ"
}'
Searching with
curl -XGET 'http://localhost:9200/_search' -d '
{
  "query" : {
    "match" : {
      "about" : "กรุงเทพฯ"
    }
  }
}'
Returns nothing ("hits" : [ ]).
Performing the same search with one of สวัสดี or ผมมาจากกรุงเทพฯ works fine.
I guess I've misconfigured the index; how should it be done?
The missing part is:
"mappings": {
"employee" : {
"properties": {
"about":{
"type": "text",
"analyzer": "icu_analyzer"
}
}
}
}
In the mapping, the analyzer to use has to be specified on the document field:
[Index] : icu
[type] : employee
[field] : about
PUT /icu
{
  "settings": {
    "analysis": {
      "analyzer": {
        "icu_analyzer" : {
          "char_filter": [
            "icu_normalizer"
          ],
          "tokenizer" : "icu_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "employee" : {
      "properties": {
        "about": {
          "type": "text",
          "analyzer": "icu_analyzer"
        }
      }
    }
  }
}
Test the custom analyzer using the following DSL/JSON:
POST /icu/_analyze
{
  "text": "สวัสดี ผมมาจากกรุงเทพฯ",
  "analyzer": "icu_analyzer"
}
The result should be [สวัสดี, ผม, มา, จาก, กรุงเทพฯ]
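With the analyzer wired into the mapping, re-indexing the sample document and repeating the original search (both taken verbatim from the question) should now return a hit:
curl -XPOST 'http://localhost:9200/icu/employee/' -d '
{
  "first_name" : "John",
  "last_name" : "Doe",
  "about" : "สวัสดี ผมมาจากกรุงเทพฯ"
}'
curl -XGET 'http://localhost:9200/icu/_search' -d '
{
  "query" : {
    "match" : {
      "about" : "กรุงเทพฯ"
    }
  }
}'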
My suggestion: the Kibana Dev Tools console can help you craft queries effectively.

Elasticsearch: How do I get metadata using the Image Plugin

I defined metadata in the mapping for the Elasticsearch Image Plugin.
Mapping:
"photo" : {
"mappings" : {
"scenery" : {
"properties" : {
"my_img" : {
"type" : "image",
"feature" : {"FCTH" : { }, ... },
"metadata" : {
"jpeg.image_height" : {"type" : "string","store" : true},
"jpeg.image_width" : {"type" : "string","store" : true}
}
}
}
}
}
}
After indexing, although the search succeeds, no metadata is returned.
How do I get the metadata?
I tried:
curl -XPOST 'localhost:9200/photo/scenery/_search' -d '{
  "query":{
    "image":{
      "my_img":{
        "feature":"CEDD",
        "index":"photo",
        "type":"scenery",
        "id":"0",
        "path":"my_img",
        "hash":"BIT_SAMPLING"
      }
    }
  }
}'
Result:
{"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":5,"max_score":1.0,"hits":[{"_index":"photo","_type":"scenery","_id":"0","_score":1.0, "_source" : {"file_name": "376423.jpg", "my_img": "/9j/4AAQSkZJRgABAQ...
The original data (the base64-encoded image) is probably being returned in the _source field. You can use the fields option instead.
Try this query.
curl -XPOST 'localhost:9200/photo/scenery/_search' -d '{
  "query":{
    ...
  },
  "fields": [ "my_img.metadata.jpeg.image_height", "my_img.metadata.jpeg.image_width" ]
}'
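To spot-check that the metadata was actually stored for a single document, you can also fetch it by id with the fields parameter (a sketch for the 1.x-era GET API the Image Plugin targets; document id 0 is taken from the query above):
curl -XGET 'localhost:9200/photo/scenery/0?fields=my_img.metadata.jpeg.image_height,my_img.metadata.jpeg.image_width&pretty'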

Specify Routing on Index Alias's Term Lookup Filter

I am using Logstash, Elasticsearch and Kibana to allow multiple users to log in and view the log data they have forwarded. I have created index aliases for each user, which restrict their results to contain only their own data.
I'd like to assign users to groups and allow users to view data for the computers in their group. I created a parent-child relationship between the groups and the users, and I created a terms lookup filter on the alias.
My problem is that I receive a RoutingMissingException when I try to apply the alias.
Is there a way to specify the routing for the term lookup filter? How can I lookup terms on a parent document?
I posted the mapping and alias below, but a full gist recreation is available at this link.
curl -XPUT 'http://localhost:9200/accesscontrol/' -d '{
  "mappings" : {
    "group" : {
      "properties" : {
        "name" : { "type" : "string" },
        "hosts" : { "type" : "string" }
      }
    },
    "user" : {
      "_parent" : { "type" : "group" },
      "_routing" : { "required" : true, "path" : "group_id" },
      "properties" : {
        "name" : { "type" : "string" },
        "group_id" : { "type" : "string" }
      }
    }
  }
}'
# Create the logstash alias for cvializ
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "logstash-2014.04.25", "alias" : "cvializ-logstash-2014.04.25" } },
    {
      "add" : {
        "index" : "logstash-2014.04.25",
        "alias" : "cvializ-logstash-2014.04.25",
        "routing" : "intern",
        "filter": {
          "terms" : {
            "host" : {
              "index" : "accesscontrol",
              "type" : "user",
              "id" : "cvializ",
              "path" : "group.hosts"
            },
            "_cache_key" : "cvializ_hosts"
          }
        }
      }
    }
  ]
}'
While attempting to find a workaround for this error, I submitted a bug report to the Elasticsearch team and received an answer from them: it was a bug in Elasticsearch where the filter is applied before the dynamic mapping, causing some erroneous output. I've included their workaround below:
PUT /accesscontrol/group/admin
{
  "name" : "admin",
  "hosts" : ["computer1","computer2","computer3"]
}
PUT /_template/admin_group
{
  "template" : "logstash-*",
  "aliases" : {
    "template-admin-{index}" : {
      "filter" : {
        "terms" : {
          "host" : {
            "index" : "accesscontrol",
            "type" : "group",
            "id" : "admin",
            "path" : "hosts"
          }
        }
      }
    }
  },
  "mappings": {
    "example" : {
      "properties": {
        "host" : {
          "type" : "string"
        }
      }
    }
  }
}
POST /logstash-2014.05.09/example/1
{
  "message" : "my sample data",
  "@version" : "1",
  "@timestamp" : "2014-05-09T16:25:45.613Z",
  "type" : "example",
  "host" : "computer1"
}
GET /template-admin-logstash-2014.05.09/_search
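To verify the filtered alias, you can index a document for a host outside the group (a sketch; computer9 is a hypothetical host that is not in the admin group's hosts list) and search through the alias again:
POST /logstash-2014.05.09/example/2
{
  "message" : "other sample data",
  "@version" : "1",
  "@timestamp" : "2014-05-09T16:30:00.000Z",
  "type" : "example",
  "host" : "computer9"
}
GET /template-admin-logstash-2014.05.09/_search
Only the computer1 document should come back, since the alias filter looks up the admin group's hosts.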

elasticsearch percolator stemmer

I'm attempting to use the percolation function in Elasticsearch. It works great, but out of the box there is no stemming to handle singulars/plurals etc. The documentation is rather thin on this topic, so I was wondering if anyone has gotten this working and what settings are required. At the moment I'm not indexing my documents, since I'm not searching them, just passing them through the percolator to trigger notifications.
You can use the percolate API to test documents against percolators without indexing them. However, the percolate API requires an index and a type for your doc. This is so that it knows how each field in your document is defined (or mapped).
Analyzers belong to an index, and the fields in a mapping/type definition can use either globally defined analyzers, or custom analyzers defined for your index.
For instance, we could define a mapping for index test, type test using a globally defined analyzer as follows:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
  "mappings" : {
    "test" : {
      "properties" : {
        "title" : {
          "type" : "string",
          "analyzer" : "english"
        }
      }
    }
  }
}
'
Or alternatively, you could set up a custom analyzer that belongs just to the test index:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
  "mappings" : {
    "test" : {
      "properties" : {
        "title" : {
          "type" : "string",
          "analyzer" : "my_english"
        }
      }
    }
  },
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "my_english" : {
          "stopwords" : [],
          "type" : "english"
        }
      }
    }
  }
}
'
Now we can create our percolator, specifying which index it belongs to:
curl -XPUT 'http://127.0.0.1:9200/_percolator/test/english?pretty=1' -d '
{
  "query" : {
    "match" : {
      "title" : "singular"
    }
  }
}
'
And test it out with the percolate API, again specifying the index and the type:
curl -XGET 'http://127.0.0.1:9200/test/test/_percolate?pretty=1' -d '
{
  "doc" : {
    "title" : "singulars"
  }
}
'
# {
#   "ok" : true,
#   "matches" : [
#     "english"
#   ]
# }
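To see the stemming that makes this match, you can run the analyzer directly (a sketch using the _analyze API with the my_english analyzer defined above):
curl -XGET 'http://127.0.0.1:9200/test/_analyze?analyzer=my_english&pretty=1' -d 'singulars'
Both singular and singulars reduce to the same stem, which is why the percolator query matches the document.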
