Why do I have to PUT new documents to a nested URI, if mapping types have been removed? - elasticsearch

I'm on Elasticsearch 7.14.0 where mapping types have been removed.
If I run the following:
curl -X PUT "localhost:9200/products/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
I get
{
"error" : "Incorrect HTTP method for uri [/products/1?pretty] and method [PUT], allowed: [POST]",
"status" : 405
}
It seems that elastic wants me PUT it in an /index/type/ URI:
curl -X PUT "localhost:9200/pop/products/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
{
"_index" : "pop",
"_type" : "products",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
I am wondering why I must have a nested URI indicating a type, if mapping types have been removed?

You have to add _doc to your put request call as shown below
curl -X PUT "localhost:9200/products/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
As mentioned in elasticsearch official documentation after mapping types were removed in 7.x, you need to add , _doc (which does not represent a document type rather it represents the endpoint name) for the document index, get, and delete APIs

Related

A mapper_parsing_exception occurred when using the bulk API of Elasticsearch

Elasticsearch version: 8.3.3
Indexing was performed using the following Elasticsearch API.
curl -X POST "localhost:9200/bulk_meta/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index": { "_id": "1"}}
{"mydoc": "index action, id 1 "}
{"index": {}}
{"mydoc": "index action, id 2"}
'
In this case, the following error occurred.
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Malformed content, found extra data after parsing: START_OBJECT"
}
},
"status" : 400
}
I've seen posts asking to add \n, but that didn't help.
You need to remove _doc from the requst.
curl -X POST "localhost:9200/bulk_meta/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"mydoc":"index action, id 1 "}
{"index":{}}
{"mydoc":"index action, id 2"}
'

Elasticseach error with null value for dense vector datatype

I created an index with a dense_vector:
curl -X PUT "localhost:9200/my_index?pretty" -H 'Content-Type: application/json' -d'
{
"mappings": {
"properties": {
"my_vector": {
"type": "dense_vector",
"dims": 3
}
}
}
}
'
When I index a document with a vector it works well:
curl -X PUT "localhost:9200/my_index/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"my_vector" : [0.5, 10, 6]
}
'
BUT when I index a document with a null value for the vector it returns an error:
curl -X PUT "localhost:9200/my_index/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
"my_vector" : null
}
'
The error is:
{
"error" : {
"root_cause" : [
{
"type" : "parsing_exception",
"reason" : "Failed to parse object: expecting token of type [VALUE_NUMBER] but found [END_OBJECT]",
"line" : 5,
"col" : 1
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "parsing_exception",
"reason" : "Failed to parse object: expecting token of type [VALUE_NUMBER] but found [END_OBJECT]",
"line" : 5,
"col" : 1
}
},
"status" : 400
}
How can I handle null value for vector type in ES?
instead of setting it to null you can remove that field from that particular document which is equivalent to setting it as null using the followingrequest
curl --location --request POST 'http://{ip}:9200/my_index/_doc/{docId}/_update' \
--header 'Content-Type: application/json' \
--header 'Content-Type: application/json' \
--data-raw '{
"script" : "ctx._source.remove(\"my_vector\")"
}'

How can curl perform a get request with a data payload?

The introductory materials on ElasticSearch include the following example curl request:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string" : {
"query" : "(new york city) OR (big apple)",
"default_field" : "content"
}
}
}
'
This request has two parameter which I thought were incompatible:
-X GET, which specifies that the request is a GET.
-d [...], which specifies that the request has a data payload.
I thought that specifying a data payload was only possible in a PUT or POST requests, because GET requests do not have any concept of a data payload. Is this a valid curl command? What does it do, exactly?
Above curl request is a valid request, in fact, if you have index and data, then you can check the output of your command.
I tried it in my system and ES index and it gave me proper response.
curl -v -X GET "localhost:9500/querytime/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string" : {
"query" : "(avengers) OR (big apple)",
"default_field" : "movie_name"
}
}
}'
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9500 (#0)
> GET /querytime/_search?pretty HTTP/1.1
> Host: localhost:9500
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 156
>
* upload completely sent off: 156 out of 156 bytes
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 905
<
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.14874382,
"hits" : [
{
"_index" : "querytime",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.14874382,
"_source" : {
"movie_name" : "Avengers: Infinity War"
}
}
]
}
}
As mentioned in the official manual of curl command, if you are using *nix based system, then you can search below in the manual of curl.
-G, --get
When used, this option will make all data specified with -d, --data, --data-binary or --data-urlencode to be used in an
HTTP GET request instead of the POST request that otherwise would be
used. The
data will be appended to the URL with a '?' separator
As explained in this SO answer, it also depends on the web-server to parse the body in GET request.

Bulk index text field containing new lines with cURL

I am trying to bulk index a file with the following format to my elasticsearch index:
{"index":{"_index":"articles","_type":"_doc"}}
{"title":"My Article Title","text":"My article text. \nNext paragraph here."}
Using this command:
curl -s -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/_bulk --data-binary #/data.json
The problem is that the article text in my documents may contain new line characters \n, which breaks the formatting for a cURL bulk index, so I get this error:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
I have been able to bulk index these documents using the javascript API, so I'm hoping it will be possible to do using cURL, as I want to index these documents into my docker image as a part of the build.
I've managed to do it on Elasticsearch 7.3 and Red Hat Enterprise Linux 7 (7.7).
1) Changed .json to .txt and just hit enter after the last line, saved and uploaded on server
[root#host tmp]$ mv data.json data.txt
2) Forced curl to append new line to output
[root#host tmp]$ echo '-w "\n"' >> ~/.curlrc
3) Curled to ES:
[root#host tmp]$ curl -s -XPOST -H 'Content-Type: application/x-ndjson' https://localhost:9200/_bulk -k -u user:pass --data-binary #data.json
{"took":4,"errors":false,"items":[{"index":{"_index":"articles","_type":"_doc","_id":"QdsosG0B3nqkAGly3E6t","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":1,"_primary_term":1,"status":201}}]}
4) Result:
[root#host tmp]$ curl -XGET -H 'Content-Type: application/x-ndjson' https://localhost:9200/articles/_search?pretty -k -u user:pass
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "articles",
"_type" : "_doc",
"_id" : "QdsosG0B3nqkAGly3E6t",
"_score" : 1.0,
"_source" : {
"title" : "My Article Title",
"text" : "My article text. \nNext paragraph here."
}
}
]
}
}

Elasticsearch completion : strange behavior when multiple matches per document

When I use the completion type inside a suggest as described in the ElasticSearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-completion.html), I do not manage to get all the matching words (I only get one matching word per document)
I test the following commands on my ElasticSearch 6.7.2 (which is the latest available on AWS at this moment) :
Deleting the index in case it exists
curl http://localhost:9200/test -H 'Content-Type: application/json' -X DELETE
Creating the index
curl http://localhost:9200/test -H 'Content-Type: application/json' -X PUT -d '
{
"mappings": {
"page": {
"properties": {
"completion_terms": {
"type": "completion"
}
}
}
}
}
'
Indexing a document
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json' -X PUT -d '
{
"completion_terms": ["restaurant", "restauration", "réseau"]
}'
Check the document exists
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json'
Use the completion
curl -X GET "localhost:9200/test/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
"_source": ["suggestExact"],
"suggest": {
"suggestExact" : {
"prefix" : "res",
"completion" : {
"field" : "completion_terms"
}
}
}
}
'
The result is :
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : 0.0,
"hits" : [ ]
},
"suggest" : {
"suggestExact" : [
{
"text" : "res",
"offset" : 0,
"length" : 3,
"options" : [
{
"text" : "restaurant",
"_index" : "test",
"_type" : "page",
"_id" : "1",
"_score" : 1.0,
"_source" : { }
}
]
}
]
}
}
I'd like to get ALL the matching words (here, I get at most one result per document)
In the example, "restauration" and "réseau" are missing
Am I doing something wrong ?
After many searches, I found that this is the intended behavior (that is to "suggest documents", instead of "suggest terms")
Especially, see https://github.com/elastic/elasticsearch/issues/31738
However, I still do not manage to achieve "suggest terms" even with the term suggester which seems to be the correct way (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-term.html)

Resources