Elasticsearch completion: strange behavior when multiple matches per document

When I use the completion type inside a suggest as described in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-completion.html), I cannot get all the matching words: I only get one matching word per document.
I tested the following commands on Elasticsearch 6.7.2 (the latest version available on AWS at the moment):
Deleting the index in case it exists
curl http://localhost:9200/test -H 'Content-Type: application/json' -X DELETE
Creating the index
curl http://localhost:9200/test -H 'Content-Type: application/json' -X PUT -d '
{
"mappings": {
"page": {
"properties": {
"completion_terms": {
"type": "completion"
}
}
}
}
}
'
Indexing a document
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json' -X PUT -d '
{
"completion_terms": ["restaurant", "restauration", "réseau"]
}'
Check the document exists
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json'
Use the completion
curl -X GET "localhost:9200/test/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
"_source": ["suggestExact"],
"suggest": {
"suggestExact" : {
"prefix" : "res",
"completion" : {
"field" : "completion_terms"
}
}
}
}
'
The result is :
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : 0.0,
"hits" : [ ]
},
"suggest" : {
"suggestExact" : [
{
"text" : "res",
"offset" : 0,
"length" : 3,
"options" : [
{
"text" : "restaurant",
"_index" : "test",
"_type" : "page",
"_id" : "1",
"_score" : 1.0,
"_source" : { }
}
]
}
]
}
}
I'd like to get ALL the matching words, but I get at most one result per document.
In this example, "restauration" and "réseau" are missing.
Am I doing something wrong?

After much searching, I found that this is the intended behavior: the completion suggester suggests documents, not terms.
See in particular https://github.com/elastic/elasticsearch/issues/31738
However, I still have not managed to get term suggestions, even with the term suggester, which seems to be the right tool (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-term.html).
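One possible workaround, sketched here under assumptions rather than taken from the Elasticsearch documentation, is to ask the suggest request to return the completion_terms field in _source and then filter that list client-side by prefix. The helper names fold and matching_terms are hypothetical, and the accent folding is a client-side choice so that "réseau" is treated like "reseau"; it says nothing about how the index itself analyzes the field.

```python
import unicodedata


def fold(text):
    """Lowercase and strip accents so 'réseau' compares like 'reseau'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matching_terms(response, suggest_name, field, prefix):
    """Collect every value of `field` in the suggested documents that starts with `prefix`.

    Assumes the search request included `field` in "_source"; otherwise
    "_source" is empty and nothing is returned.
    """
    folded_prefix = fold(prefix)
    terms = []
    for entry in response.get("suggest", {}).get(suggest_name, []):
        for option in entry.get("options", []):
            values = option.get("_source", {}).get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if fold(value).startswith(folded_prefix) and value not in terms:
                    terms.append(value)
    return terms


# Response shaped like the one in the question, but with the field in _source.
sample_response = {
    "suggest": {
        "suggestExact": [
            {
                "text": "res",
                "options": [
                    {
                        "text": "restaurant",
                        "_id": "1",
                        "_source": {
                            "completion_terms": ["restaurant", "restauration", "réseau"]
                        },
                    }
                ],
            }
        ]
    }
}

all_terms = matching_terms(sample_response, "suggestExact", "completion_terms", "res")
print(all_terms)
```

The suggester still returns one option per document; the extra terms come only from the client-side filter, so this works best when each document carries a short list of terms.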

Related

Why do I have to PUT new documents to a nested URI, if mapping types have been removed?

I'm on Elasticsearch 7.14.0 where mapping types have been removed.
If I run the following:
curl -X PUT "localhost:9200/products/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
I get
{
"error" : "Incorrect HTTP method for uri [/products/1?pretty] and method [PUT], allowed: [POST]",
"status" : 405
}
It seems that Elasticsearch wants me to PUT it to an /index/type/ URI:
curl -X PUT "localhost:9200/pop/products/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
{
"_index" : "pop",
"_type" : "products",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
Why must I use a nested URI indicating a type if mapping types have been removed?
You have to add _doc to your PUT request, as shown below:
curl -X PUT "localhost:9200/products/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
As mentioned in the official Elasticsearch documentation, after mapping types were removed in 7.x you need to add _doc (which is not a document type but the endpoint name) to the URL for the document index, get, and delete APIs.
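A minimal sketch of how that URL shape could be built in client code. The function name typeless_doc_url is hypothetical, and the base URL is the localhost address used in the question.

```python
from urllib.parse import quote


def typeless_doc_url(base_url, index, doc_id):
    """Build <base>/<index>/_doc/<id>, where _doc is the endpoint name, not a type."""
    return "{}/{}/_doc/{}".format(
        base_url.rstrip("/"),
        quote(index, safe=""),
        quote(str(doc_id), safe=""),
    )


url = typeless_doc_url("http://localhost:9200", "products", 1)
print(url)
```

The same URL serves PUT (index), GET, and DELETE for a single document, so only the HTTP method changes between the three calls.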

Elasticsearch-6.x norms false not working

Here is what I have done:
First:
curl -X PUT "localhost:9200/log_20180419"
Second:
curl -X PUT "localhost:9200/log_20180419/_mapping/_doc" -H 'Content-Type: application/json' -d'
{
"properties": {
"title": {
"type": "text",
"norms": false
}
}
}
'
Third:
# Insert data with the Python client (elasticsearch-py)
from elasticsearch import Elasticsearch
es_conn = Elasticsearch()
content_tmp = "acxzcasiuchxzuicbhasuicgzyugas%s"
for i in range(10000):
    result = content_tmp % i
    es_conn.index(index="log_20180419", body={"title": result}, doc_type="_doc")
Fourth:
I query it:
curl -X GET "localhost:9200/cdn_log_20180419/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match":{
"title":"dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999"
}
}
}
'
Result is
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 7.2293553,
"hits" : [
{
"_index" : "cdn_log_20180419",
"_type" : "_doc",
"_id" : "oDR99mIBBZEcRu0i7LlO",
"_score" : 7.2293553,
"_source" : {
"title" : "dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999"
}
}
]
}
}
As you can see, the result still has a _score field, which confuses me.
The documentation is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/norms.html
The norm is only one part of scoring. The norm covers the field length norm and index-time boosting (if you are using that), but term frequency and inverse document frequency (TF/IDF) are independent of it.
If you don't need or want scoring for your query, look into boolean filters or a constant score query.
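As a rough illustration of the constant score option, the request body below wraps a match clause in a constant_score query so that every hit gets the same score. The function name constant_score_match is hypothetical, and the short title value is a made-up sample; the body would be sent to the same _search endpoint as in the question.

```python
import json


def constant_score_match(field, text, boost=1.0):
    """Wrap a match clause in constant_score so every hit scores `boost`."""
    return {
        "query": {
            "constant_score": {
                "filter": {"match": {field: text}},
                "boost": boost,
            }
        }
    }


body = constant_score_match("title", "acxzcasiuchxzuicbhasuicgzyugas9999")
print(json.dumps(body, indent=2))
```

The response still contains a _score field, but it is the constant boost rather than a relevance value.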

Unable to search attachment type field in an ElasticSearch indexed document

Search does not return any results although I do have a document that should match the query.
I have the Elasticsearch mapper-attachments plugin installed per https://github.com/elasticsearch/elasticsearch-mapper-attachments. I have also googled the topic and browsed similar questions on Stack Overflow, but have not found an answer.
Here's what I typed into a Windows 7 command prompt:
c:\Java\elasticsearch-1.3.4>curl -XDELETE localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/_mapping -d{\"contact\":{\"properties\":{\"my_attachment\":{\"type\":\"attachment\"}}}}
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/1 -d{\"my_attachment\":\"SGVsbG8=\"}
{"_index":"tce","_type":"contact","_id":"1","_version":1,"created":true}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "tce",
"_type" : "contact",
"_id" : "1",
"_score" : 1.0,
"_source":{"my_attachment":"SGVsbG8="}
} ]
}
}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty -d{\"query\":{\"term\":{\"my_attachment\":\"Hello\"}}}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Note that the base64 encoded value of "Hello" is "SGVsbG8=", which is the value I have inserted into the "my_attachment" field of the document.
I am assuming that the mapper-attachments plugin has been deployed correctly because I don't get an error executing the mapping command above.
Any help would be greatly appreciated.
What analyzer is running against the my_attachment field?
If it's the standard analyzer (I can't see any listed), then the Hello in the text will be lowercased in the index.
That is, when doing a term search (which does not analyze the query text), try searching for hello:
curl localhost:9200/tce/contact/_search?pretty -d'
{"query":
{"term":
{"my_attachment":"hello"
}}}'
You can also see which terms have been added to the index:
curl 'http://localhost:9200/tce/contact/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "doc[field].values",
"params": {
"field": "my_attachment"
}
}
}
}'
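To make the lowercase point concrete, here is a small offline sketch. It decodes the base64 payload from the question and applies a rough stand-in for analysis (split on word characters, then lowercase). The rough_terms function is hypothetical and only approximates what a lowercasing analyzer does; it is not the actual Elasticsearch implementation.

```python
import base64
import re


def rough_terms(text):
    """Very rough stand-in for an analyzer: split on word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# "SGVsbG8=" is the attachment content indexed in the question.
decoded = base64.b64decode("SGVsbG8=").decode("utf-8")
indexed_terms = rough_terms(decoded)

print(decoded)
print(indexed_terms)
# A term query compares the query text to the indexed terms without analyzing it.
print("Hello" in indexed_terms, "hello" in indexed_terms)
```

Under this approximation the only indexed term is the lowercase form, which is why an unanalyzed term query for the capitalized word finds nothing.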

Elasticsearch index last update time

Is there a way to retrieve from ElasticSearch information on when a specific index was last updated?
My goal is to be able to tell when it was the last time that any documents were inserted/updated/deleted in the index. If this is not possible, is there something I can add in my index modification requests that will provide this information later on?
You can get the modification time from the _timestamp field.
To make it easier to return the timestamp you can set up Elasticsearch to store it:
curl -XPUT "http://localhost:9200/myindex/mytype/_mapping" -d'
{
"mytype": {
"_timestamp": {
"enabled": "true",
"store": "yes"
}
}
}'
If I insert a document and then query on it I get the timestamp:
curl -XGET 'http://localhost:9200/myindex/mytype/_search?pretty' -d '{
"fields" : ["_timestamp"],
"query": {
"query_string": { "query":"*"}
}
}'
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "myindex",
"_type" : "mytype",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"_timestamp" : 1417599223918
}
} ]
}
}
Updating the existing document:
curl -XPOST "http://localhost:9200/myindex/mytype/1/_update" -d'
{
"doc" : {
"field1": "data",
"field2": "more data"
},
"doc_as_upsert" : true
}'
Re-running the previous query shows me an updated timestamp:
"fields" : {
"_timestamp" : 1417599620167
}
I don't know whether anyone is still looking for an equivalent, but here is a workaround using shard stats for users of Elasticsearch 5 and later:
curl -XGET 'http://localhost:9200/_stats?level=shards'
As you'll see, it reports per-index information on commits and/or flushes that you can use to see whether the index has changed.
I hope this helps someone.
Just looked into a solution for this problem. Recent Elasticsearch versions have a <index>/_recovery API.
This returns a list of shards and a field called stop_time_in_millis which looks like it is a timestamp for the last write to that shard.
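For the second half of the question (adding something to the modification requests themselves), one option is to stamp each document with its own modification time and ask for the maximum later. This is a sketch under assumptions: the field name updated_at and both function names are hypothetical, the field is assumed to be mapped as a date, and deletes leave no trace with this approach.

```python
from datetime import datetime, timezone


def with_last_modified(doc, now=None):
    """Return a copy of `doc` carrying an ISO 8601 `updated_at` value (hypothetical field)."""
    stamped = dict(doc)
    moment = now or datetime.now(timezone.utc)
    stamped["updated_at"] = moment.isoformat()
    return stamped


def latest_update_query(field="updated_at"):
    """Search body asking only for the maximum value of the timestamp field."""
    return {"size": 0, "aggs": {"last_update": {"max": {"field": field}}}}


fixed = datetime(2014, 12, 3, 9, 33, 43, tzinfo=timezone.utc)
stamped_doc = with_last_modified({"field1": "data"}, now=fixed)
print(stamped_doc)
print(latest_update_query())
```

Each index or update request would send the stamped document, and the aggregation body would go to the index's _search endpoint whenever the last change time is needed.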

ElasticSearch CouchDB Geo location

I am trying to get Elasticsearch to index a CouchDB river, without luck.
I have a database 'pl1' with only one document '1' in it.
This is a printout of the entire document pretty-printed:
curl -XGET localhost:5984/pl1/1 | python -mjson.tool
{
"_id": "1",
"_rev": "1-0442f3962cffedc2238fcdb28dd77557",
"location": {
"geo_json": {
"coordinates": [
59.70141999133738,
14.162789164118708
],
"type": "point"
},
"lat": 14.162789164118708,
"lon": 59.70141999133738
}
}
I create a couchdb river and index with a catch-all type called all_entries the following way:
curl -XPUT 'localhost:9200/_river/pl1/_meta' -d '
{
"type" : "couchdb",
"couchdb" : {
"host" : "localhost",
"port" : 5984,
"filter" : null,
"db" : "pl1"
},
"index" : {
"index" : "pl1",
"type" : "all_entries",
"bulk_size" : "100",
"bulk_timeout" : "10ms"
}
}'
{"ok":true,"_index":"_river","_type":"pl1","_id":"_meta","_version":1}
To test whether the document was indexed I perform the following query:
curl -XGET localhost:9200/pl1/all_entries/_count?pretty=true
{
"count" : 1,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
But then nothing. I can't figure out how to index the location using a geo_shape type (I have also tried the geo_point format for the data and indexing that, but also with no results).
How do I specify a mapping and a query for this?

Resources