Error putting base64 converted string into Elasticsearch

I create a simple mapping:
curl -XPUT 'localhost:9200/ficherosindex?pretty=true' -d '{
"mappings": {
"items": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" },
"body" : { "type": "string" },
"attachments" : { "type": "attachment" }
}}}}'
I PUT a document with the title and the body, leaving attachments empty.
curl -XPUT 'localhost:9200/ficherosindex/items/1' -d '{
"title": "This is a test title",
"body" : "This is the body of the java",
"attachments" : ""
}'
Then I wrote the following script to update the attachments field with the content of the MY_PDF.pdf file, converted to base64.
#!/bin/sh
coded=`cat MY_PDF.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
curl -X POST 'localhost:9200/ficherosindex/items/1/_update?pretty=true' -d '{
"doc" : {
"attachments" : \"${coded}\"
}}'
When I run the script I'm getting the following error:
{
"error" : {
"root_cause" : [ {
"type" : "json_parse_exception",
"reason" : "Unexpected character ('\\' (code 92)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B#6c8caddf; line: 3, column: 30]"
} ],
"type" : "json_parse_exception",
"reason" : "Unexpected character ('\\' (code 92)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B#6c8caddf; line: 3, column: 30]"
},
"status" : 500
}
What am I doing wrong? Maybe I have to change the following lines?
{
"doc" : {
"attachments" : \"${coded}\"
}}'
I also tried this solution with no luck. I have to maintain the order I'm showing: first create the item without the attachments, then use _update to add the content of the PDF to it.
Thanks in advance

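Something like this should do. Inside single quotes the shell neither expands ${coded} nor removes the backslashes, so Elasticsearch receives a literal \"${coded}\" and rejects the backslash. A here-document read by curl through -d @- lets the variable expand. The file is also encoded in one pass without line breaks, because raw newlines are not allowed inside a JSON string:
#!/bin/sh
# Slurp the whole file (-0777) and encode it without line breaks (empty eol argument).
coded=`cat MY_PDF.pdf | perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")'`
curl -XPOST 'localhost:9200/ficherosindex/items/1/_update?pretty=true' -H "Content-Type: application/json" -d @- <<CURL_DATA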
{ "doc": { "attachments": "$coded" }}
CURL_DATA
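A perl-free sketch of the same here-document approach, assuming a `base64` utility is on the PATH (GNU coreutils or the BSD/macOS one). `build_update_body` is a hypothetical helper name, and the small temporary file stands in for MY_PDF.pdf. The `tr -d '\n'` step removes the line wrapping that some `base64` implementations add, since a raw newline inside a JSON string is rejected by the parser.

```shell
#!/bin/sh
# build_update_body is a hypothetical helper, not part of curl or Elasticsearch.
build_update_body() {
  # Encode the whole file at once and drop any line wrapping added by base64.
  coded=$(base64 < "$1" | tr -d '\n')
  # Unquoted heredoc delimiter: $coded is expanded and the double quotes need no backslashes.
  cat <<EOF
{ "doc": { "attachments": "$coded" } }
EOF
}

# Demo with a small stand-in file instead of MY_PDF.pdf.
sample=$(mktemp)
printf 'hi\n' > "$sample"
body=$(build_update_body "$sample")
printf '%s\n' "$body"
rm -f "$sample"

# To send it (requires a running node):
# build_update_body MY_PDF.pdf | curl -XPOST 'localhost:9200/ficherosindex/items/1/_update?pretty=true' \
#   -H 'Content-Type: application/json' -d @-
```

Piping the body into curl with `-d @-` avoids putting a large base64 string on the command line.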


A mapper_parsing_exception occurred when using the bulk API of Elasticsearch

Elasticsearch version: 8.3.3
Indexing was performed using the following Elasticsearch API.
curl -X POST "localhost:9200/bulk_meta/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index": { "_id": "1"}}
{"mydoc": "index action, id 1 "}
{"index": {}}
{"mydoc": "index action, id 2"}
'
In this case, the following error occurred.
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "Malformed content, found extra data after parsing: START_OBJECT"
}
},
"status" : 400
}
I've seen posts asking to add \n, but that didn't help.
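You need to remove _doc from the request path. With _doc in the URL the request is not routed to the bulk API; it appears to be handled as a single-document index request (with _bulk taken as the document ID), so the parser complains about the extra objects after the first one.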
curl -X POST "localhost:9200/bulk_meta/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"mydoc":"index action, id 1 "}
{"index":{}}
{"mydoc":"index action, id 2"}
'
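For larger bulk requests it can be easier to keep the body in a file. The sketch below writes the same four lines as newline-delimited JSON and checks that the last line ends with a newline, which the bulk API requires. The temporary file and the variable names are illustrative only; the commented curl call needs a running node.

```shell
#!/bin/sh
# Write the bulk body as NDJSON: one action line followed by one source line.
bulk_file=$(mktemp)
cat > "$bulk_file" <<'EOF'
{"index":{"_id":"1"}}
{"mydoc":"index action, id 1 "}
{"index":{}}
{"mydoc":"index action, id 2"}
EOF

# Every line, including the last one, must be terminated by a newline.
line_count=$(wc -l < "$bulk_file" | tr -d ' ')
ends_with_newline=$(tail -c 1 "$bulk_file" | wc -l | tr -d ' ')
echo "lines: $line_count, final newline present: $ends_with_newline"

# To send it; --data-binary keeps the newlines intact, whereas -d @file strips them:
# curl -X POST "localhost:9200/bulk_meta/_bulk?pretty" \
#   -H 'Content-Type: application/x-ndjson' --data-binary "@$bulk_file"

rm -f "$bulk_file"
```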

Elastic Search perform calculation on one document

I need to perform a calculation on a field of a specific document. For example, I need to add 50 to a price. I have tried the following options:
curl -X POST "localhost:9200/ex1/ex2/WPatZHgBEd7rI-6ZwNFC/_update?pretty" -H 'Content-Type: application/json' -d'{"doc": {"price": +50}}'
In this case it sets the price to 50. If I try this instead:
curl -X POST "localhost:9200/ex1/ex2/WPatZHgBEd7rI-6ZwNFC/_update?pretty" -H 'Content-Type: application/json' -d'{"doc": {"price": "price"+50}}'
it gives the following error:
{
"error" : {
"root_cause" : [
{
"type" : "json_parse_exception",
"reason" : "Unexpected character ('-' (code 45)): was expecting comma to separate Object entries\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#4c7cecda; line: 1, column: 29]"
}
],
"type" : "json_parse_exception",
"reason" : "Unexpected character ('-' (code 45)): was expecting comma to separate Object entries\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#4c7cecda; line: 1, column: 29]"
},
"status" : 500
}
Use a script to increment a doc's attribute:
POST localhost:9200/ex1/ex2/WPatZHgBEd7rI-6ZwNFC/_update?pretty
{
"script": {
"source": "ctx._source.price += params.increment_by",
"params": {
"increment_by": 50
}
}
}
With cURL:
curl -XPOST "http://localhost:9200/ex1/ex2/WPatZHgBEd7rI-6ZwNFC/_update?pretty" -H 'Content-Type: application/json' -d'{ "script": { "source": "ctx._source.price += params.increment_by", "params": { "increment_by": 50 } }}'
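The "doc" form merges the given values into the document as they are, so {"price": +50} is just the number 50 and no expression is ever evaluated; only a script can read the current value. As a small sketch, the amount can be kept out of the script source and passed as a parameter. `increment_body` is a hypothetical helper name, and the commented curl call needs a running node.

```shell
#!/bin/sh
# increment_body is a hypothetical helper that prints a scripted-update body.
increment_body() {
  # %d formats the argument as an integer, so the value lands in the JSON unquoted.
  printf '{"script":{"source":"ctx._source.price += params.increment_by","params":{"increment_by":%d}}}\n' "$1"
}

body=$(increment_body 50)
printf '%s\n' "$body"

# To send it:
# curl -XPOST "http://localhost:9200/ex1/ex2/WPatZHgBEd7rI-6ZwNFC/_update?pretty" \
#   -H 'Content-Type: application/json' -d "$body"
```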

Using Curl to put data into ES and got Unexpected character ('n' (code 110))

I'm using curl to put data into ES. I have already created a customer index.
The following command is from the ES documentation.
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "John Doe"
}
'
When I do this, I get an error.
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "json_parse_exception",
"reason" : "Unexpected character ('n' (code 110)): was expecting double-quote to start field name\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper#1ec5236e; line: 3, column: 4]"
}
},
"status" : 400
}
I think the part below is the main cause of my error.
"reason" : "Unexpected character ('n' (code 110)): was expecting double-quote to start field name
I have a feeling that I need to use a backslash to escape something. However, my attempt with \' did not work. Any advice?
I made it work as shown below.
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d '
{
\"name\": \"John Doe\" <==== I used "backslash" in front of all the "
}
'
The same command without my comment:
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d '
{
\"name\": \"John Doe\"
}
'
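How much escaping is needed depends on the shell that runs curl. The sketch below shows the behavior of a POSIX shell (sh, bash), where single quotes keep everything literal. In a shell that does not treat single quotes as quoting (Windows cmd.exe is one; the post does not say which shell was used, so this is an assumption), the double quotes end up stripped unless they are backslash-escaped, which would explain both the error and the fix.

```shell
#!/bin/sh
# Inside single quotes a POSIX shell keeps every character literally,
# so the JSON double quotes need no escaping.
single='{"name": "John Doe"}'

# Inside double quotes each inner double quote must be escaped with a backslash.
double="{\"name\": \"John Doe\"}"

# Backslashes inside single quotes are NOT removed and would end up in the payload.
literal_backslashes='{\"name\": \"John Doe\"}'

printf '%s\n' "$single" "$double" "$literal_backslashes"
```

The first two lines printed are identical valid JSON; the third still contains the backslashes.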

null_value mapping in Elasticsearch

I have created a mapping for a tweetb type in a twitter index:
curl -XPUT http://www.mydomain.com:9200/twitter/tweetb/_mapping -d '{
"twitter": {
"mappings": {
"tweetb": {
"properties": {
"message": {
"type": "string",
"null_value": "NA"
}
}
}
}
}
}'
Then, I put one document:
curl -XPUT http://www.mydomain.com:9200/twitter/tweetb/1 -d '{"message": null}'
Then, I tried to get the inserted doc back:
curl -XGET http://www.mydomain.com:9200/twitter/tweetb/1
And that returned:
{
"_index": "twitter",
"_type": "tweetb",
"_id": "1",
"_version": 2,
"found" : true,
"_source" : { "message": null }
}
I was expecting "message" : "NA" in the _source field. However, it looks like "null_value" isn't working. Am I missing something?
The "null_value" mapping option does not change the stored value, so _source still shows null; it only changes the value that is indexed and therefore used in searches.
If you try searching for your "message" using "NA", then it should appear in the results:
curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d '{
"query" : {
"match" : { "message" : "NA" }
}
}'
Note that the hit should still show the actual value, null, in _source. If you now add a second document whose raw value is literally "NA" and repeat the search, both documents should be returned for the query above: one with the value "NA" and the other with null.
The same applies to other queries, because they run against the indexed terms. With the default analyzer, the string field indexes "NA" as the lowercase term "na", and a regexp query is not analyzed, which is why a lowercase n.* matches but N.* (somewhat surprisingly) does not:
curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d '{
"query" : {
"regexp" : { "message" : "n.*" }
}
}'
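One way to avoid the case mismatch is to lowercase the pattern before building the query. This is only a sketch: `regexp_body` is a hypothetical helper, it assumes the field is analyzed into lowercase terms as described above, and blindly lowercasing is only safe for simple patterns without case-sensitive escape sequences.

```shell
#!/bin/sh
# regexp_body is a hypothetical helper. It lowercases the pattern because the
# analyzed "message" field is assumed to hold lowercase terms ("NA" indexed as "na").
regexp_body() {
  pattern=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '{"query":{"regexp":{"message":"%s"}}}\n' "$pattern"
}

query=$(regexp_body 'N.*')
printf '%s\n' "$query"

# To run the search:
# curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d "$query"
```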

How can I boost certain fields over others in elasticsearch?

My goal is to boost the field "name" (see the example below), but I have two problems when I search for "john":
the search also matches {name: "dany", message: "hi bob"}, where name is "dany", and
the search does not boost name over message (rows with name="john" should be at the top).
The gist is on https://gist.github.com/tomaspet262/5535774
(since stackoverflow's form submit returned 'Your post appears to contain code that is not properly formatted as code', which was formatted properly).
I would suggest using query time boosting instead of index time boosting.
# DELETE
curl -XDELETE 'http://localhost:9200/test'
echo
# CREATE
curl -XPUT 'http://localhost:9200/test?pretty=1' -d '{
"settings": {
"analysis" : {
"analyzer" : {
"my_analyz_1" : {
"filter" : [
"standard",
"lowercase",
"asciifolding"
],
"type" : "custom",
"tokenizer" : "standard"
}
}
}
}
}'
echo
# DEFINE
curl -XPUT 'http://localhost:9200/test/posts/_mapping?pretty=1' -d '{
"posts" : {
"properties" : {
"name" : {
"type" : "string",
"analyzer" : "my_analyz_1"
},
"message" : {
"type" : "string",
"analyzer" : "my_analyz_1"
}
}
}
}'
echo
# INSERT
curl localhost:9200/test/posts/1 -d '{"name": "john", "message": "hi john"}'
curl localhost:9200/test/posts/2 -d '{"name": "bob", "message": "hi john, how are you?"}'
curl localhost:9200/test/posts/3 -d '{"name": "john", "message": "bob?"}'
curl localhost:9200/test/posts/4 -d '{"name": "dany", "message": "hi bob"}'
curl localhost:9200/test/posts/5 -d '{"name": "dany", "message": "hi john"}'
echo
# REFRESH
curl -XPOST localhost:9200/test/_refresh
echo
# SEARCH
curl "localhost:9200/test/posts/_search?pretty=1" -d '{
"query": {
"multi_match": {
"query": "john",
"fields": ["name^2", "message"]
}
}
}'
I'm not sure whether this is relevant here, but when testing with such a small amount of data I always use 1 shard instead of the default settings, to rule out scoring differences caused by term statistics being computed per shard.
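A minimal sketch of the single-shard setup, using the `number_of_shards` index setting. `one_shard_settings` is a hypothetical helper name; in the script above the analysis block would have to be merged into the same "settings" object, and the commented curl call needs a running node.

```shell
#!/bin/sh
# one_shard_settings is a hypothetical helper that prints index-creation
# settings for a single primary shard.
one_shard_settings() {
  printf '{"settings":{"number_of_shards":%d}}\n' 1
}

settings=$(one_shard_settings)
printf '%s\n' "$settings"

# To create the test index with it:
# curl -XPUT 'http://localhost:9200/test?pretty=1' -d "$settings"
```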
