Bulk index text field containing new lines with cURL - elasticsearch

I am trying to bulk index a file with the following format to my elasticsearch index:
{"index":{"_index":"articles","_type":"_doc"}}
{"title":"My Article Title","text":"My article text. \nNext paragraph here."}
Using this command:
curl -s -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/_bulk --data-binary @/data.json
The problem is that the article text in my documents may contain newline characters (\n), which breaks the formatting of the cURL bulk request, so I get this error:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
I have been able to bulk index these documents using the JavaScript API, so I'm hoping it is also possible with cURL, as I want to index these documents into my Docker image as part of the build.
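For what it's worth, an escaped \n inside a JSON string value is valid NDJSON; what the bulk parser cannot handle is a raw (unescaped) newline inside a document, because each action line and source line must occupy exactly one line. A rough line-by-line check, sketched here under the assumption that jq is installed and the bulk file is named data.json:
# Every line of a valid bulk body (action lines and source lines alike) must
# parse as a standalone JSON document; a document split by a raw newline will
# fail on at least one of its fragments.
while IFS= read -r line; do
  printf '%s\n' "$line" | jq -e . > /dev/null || { echo "not a standalone JSON document: $line"; break; }
done < data.json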

I've managed to do it on Elasticsearch 7.3 and Red Hat Enterprise Linux 7 (7.7).
1) Renamed .json to .txt and added a newline after the last line (hit Enter), then saved and uploaded it to the server
[root@host tmp]$ mv data.json data.txt
2) Forced curl to append a newline to its output
[root@host tmp]$ echo '-w "\n"' >> ~/.curlrc
3) Curled to ES:
[root@host tmp]$ curl -s -XPOST -H 'Content-Type: application/x-ndjson' https://localhost:9200/_bulk -k -u user:pass --data-binary @data.txt
{"took":4,"errors":false,"items":[{"index":{"_index":"articles","_type":"_doc","_id":"QdsosG0B3nqkAGly3E6t","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":1,"_primary_term":1,"status":201}}]}
4) Result:
[root@host tmp]$ curl -XGET -H 'Content-Type: application/x-ndjson' https://localhost:9200/articles/_search?pretty -k -u user:pass
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "articles",
        "_type" : "_doc",
        "_id" : "QdsosG0B3nqkAGly3E6t",
        "_score" : 1.0,
        "_source" : {
          "title" : "My Article Title",
          "text" : "My article text. \nNext paragraph here."
        }
      }
    ]
  }
}
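If the source documents start out as a JSON array rather than NDJSON, one way to build the bulk body is to flatten them with jq. This is only a sketch under assumptions not stated above: jq is installed, the documents sit in an array in a file called docs.json, and the target index is articles as in the question.
# Emit one compact action line and one compact source line per document; jq
# terminates every output with a newline, which also satisfies the bulk API's
# requirement that the body end with a newline.
jq -c '.[] | {"index":{"_index":"articles"}}, .' docs.json > data.json
curl -s -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/_bulk --data-binary @data.json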

Related

Why do I have to PUT new documents to a nested URI, if mapping types have been removed?

I'm on Elasticsearch 7.14.0 where mapping types have been removed.
If I run the following:
curl -X PUT "localhost:9200/products/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
I get
{
"error" : "Incorrect HTTP method for uri [/products/1?pretty] and method [PUT], allowed: [POST]",
"status" : 405
}
It seems that Elasticsearch wants me to PUT it to an /index/type/ URI:
curl -X PUT "localhost:9200/pop/products/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
{
"_index" : "pop",
"_type" : "products",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
I am wondering why I must have a nested URI indicating a type, if mapping types have been removed?
You have to add _doc to your PUT request, as shown below:
curl -X PUT "localhost:9200/products/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Toast"
}
'
As mentioned in the official Elasticsearch documentation, after mapping types were removed in 7.x you need to use _doc (which does not represent a document type; rather, it is part of the endpoint name) in the document index, get, and delete APIs.
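For reference, the other single-document APIs follow the same typeless pattern; a short sketch against the products index used above (_doc is part of the path, not a mapping type):
# Retrieve and delete by ID through the _doc endpoint.
curl -X GET "localhost:9200/products/_doc/1?pretty"
curl -X DELETE "localhost:9200/products/_doc/1?pretty"
# POST without an ID lets Elasticsearch auto-generate one.
curl -X POST "localhost:9200/products/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Toast"
}
'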

Elasticsearch Bulk API using curl and text file

I'm a beginner with Elasticsearch and am following an "Essential Training" course on LinkedIn Learning. I'm trying to follow along with the bulk loading API; the instructor is using Linux, while I'm on Windows. He created a text file with the data using vi. I just created a text file, pasted the data in, and removed the ".txt" extension. The contents of the file, called reqs, are:
{
"index":{
"_index":"my-test",
"_type":"my-type",
"_id":"1"
}
}{
"col1":"val1"
}{
"index":{
"_index":"my-test",
"_type":"my-type",
"_id":"2"
}
}{
"col1":"val2"
}{
"index":{
"_index":"my-test",
"_type":"my-type",
"_id":"3"
}
}{
"col1":"val3"
}
I've tried saving it both with and without a carriage return (newline) after the last line. I saved it into my Elasticsearch folder (C:\elasticsearch-7.12.0), which is the same directory I'm running the following command from:
c:\elasticsearch-7.12.0>curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@reqs"; echo
When I do this, I'm getting the following error:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
Use the curl command below:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/index-name/_bulk?pretty' --data-binary @reqs.json
reqs.json should look like this:
{"index" : {"_index" : "my-test", "_type" : "my-type", "_id" : "1"}}
{"col1" : "val1"}
{"index" : {"_index" : "my-test", "_type" : "my-type", "_id" : "2"}}
{"col1" : "val2"}
{"index" : {"_index" : "my-test", "_type" : "my-type", "_id" : "3"}}
{"col1" : "val3"}

How can curl perform a get request with a data payload?

The introductory materials on ElasticSearch include the following example curl request:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string" : {
"query" : "(new york city) OR (big apple)",
"default_field" : "content"
}
}
}
'
This request has two parameters that I thought were incompatible:
-X GET, which specifies that the request is a GET.
-d [...], which specifies that the request has a data payload.
I thought that specifying a data payload was only possible in PUT or POST requests, because GET requests have no concept of a data payload. Is this a valid curl command? What does it do, exactly?
The above curl request is valid; in fact, if you have an index with data in it, you can check the output of the command yourself.
I tried it against an index on my system and it returned a proper response.
curl -v -X GET "localhost:9500/querytime/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string" : {
      "query" : "(avengers) OR (big apple)",
      "default_field" : "movie_name"
    }
  }
}'
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9500 (#0)
> GET /querytime/_search?pretty HTTP/1.1
> Host: localhost:9500
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 156
>
* upload completely sent off: 156 out of 156 bytes
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 905
<
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.14874382,
    "hits" : [
      {
        "_index" : "querytime",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.14874382,
        "_source" : {
          "movie_name" : "Avengers: Infinity War"
        }
      }
    ]
  }
}
As mentioned in the official curl manual (on a *nix system, see man curl):
-G, --get
When used, this option will make all data specified with -d, --data, --data-binary or --data-urlencode to be used in an HTTP GET request instead of the POST request that otherwise would be used. The data will be appended to the URL with a '?' separator.
As explained in this SO answer, it also depends on the web server whether the body of a GET request is parsed.
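For completeness, two related variants, sketched against the querytime index from the response above: Elasticsearch accepts the same search body via POST, and curl's -G flag moves -d data onto the query string instead of sending a body (which only suits small queries).
# Same query as a POST request with a JSON body.
curl -s -X POST "localhost:9200/querytime/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "query_string": { "query": "(avengers) OR (big apple)", "default_field": "movie_name" } }
}'
# With -G, the data is URL-encoded onto the query string (here via the q= URI search parameter).
curl -s -G "localhost:9200/querytime/_search" --data-urlencode 'q=movie_name:avengers' --data-urlencode 'pretty=true'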

Elasticsearch completion : strange behavior when multiple matches per document

When I use the completion type inside a suggest query as described in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-completion.html), I cannot get all the matching words (I only get one matching word per document).
I tested the following commands on my Elasticsearch 6.7.2 instance (the latest version available on AWS at the moment):
Deleting the index in case it exists
curl http://localhost:9200/test -H 'Content-Type: application/json' -X DELETE
Creating the index
curl http://localhost:9200/test -H 'Content-Type: application/json' -X PUT -d '
{
  "mappings": {
    "page": {
      "properties": {
        "completion_terms": {
          "type": "completion"
        }
      }
    }
  }
}
'
Indexing a document
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json' -X PUT -d '
{
  "completion_terms": ["restaurant", "restauration", "réseau"]
}'
Check the document exists
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json'
Use the completion
curl -X GET "localhost:9200/test/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
"_source": ["suggestExact"],
"suggest": {
"suggestExact" : {
"prefix" : "res",
"completion" : {
"field" : "completion_terms"
}
}
}
}
'
The result is:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "suggestExact" : [
      {
        "text" : "res",
        "offset" : 0,
        "length" : 3,
        "options" : [
          {
            "text" : "restaurant",
            "_index" : "test",
            "_type" : "page",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : { }
          }
        ]
      }
    ]
  }
}
I'd like to get ALL the matching words (here, I get at most one result per document).
In this example, "restauration" and "réseau" are missing.
Am I doing something wrong?
After much searching, I found that this is the intended behavior (that is, to "suggest documents" rather than "suggest terms").
In particular, see https://github.com/elastic/elasticsearch/issues/31738
However, I still cannot achieve "suggest terms", even with the term suggester, which seems to be the intended way to do it (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-term.html).
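One workaround that is sometimes used (a sketch, not taken from the thread above): index each term as its own small document in a separate index, so that every matching term surfaces as its own suggestion option instead of one option per source document.
# Assumes a hypothetical index test_terms has already been created with
# completion_terms mapped as type "completion", like the mapping above.
curl http://localhost:9200/test_terms/_doc/restaurant -H 'Content-Type: application/json' -X PUT -d '
{ "completion_terms": "restaurant" }'
curl http://localhost:9200/test_terms/_doc/restauration -H 'Content-Type: application/json' -X PUT -d '
{ "completion_terms": "restauration" }'
curl http://localhost:9200/test_terms/_doc/reseau -H 'Content-Type: application/json' -X PUT -d '
{ "completion_terms": "réseau" }'
The same prefix query against test_terms then returns one option per matching term, at the cost of maintaining a second index and keeping it in sync.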

Unable to search attachment type field in an ElasticSearch indexed document

Search does not return any results although I do have a document that should match the query.
I do have the ElasticSearch mapper-attachments plugin installed per https://github.com/elasticsearch/elasticsearch-mapper-attachments. I have also googled the topic and browsed similar questions on Stack Overflow, but have not found an answer.
Here's what I typed into a Windows 7 command prompt:
c:\Java\elasticsearch-1.3.4>curl -XDELETE localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/_mapping -d{\"contact\":{\"properties\":{\"my_attachment\":{\"type\":\"attachment\"}}}}
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/1 -d{\"my_attachment\":\"SGVsbG8=\"}
{"_index":"tce","_type":"contact","_id":"1","_version":1,"created":true}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "tce",
      "_type" : "contact",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{"my_attachment":"SGVsbG8="}
    } ]
  }
}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty -d{\"query\":{\"term\":{\"my_attachment\":\"Hello\"}}}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
Note that the base64 encoded value of "Hello" is "SGVsbG8=", which is the value I have inserted into the "my_attachment" field of the document.
I am assuming that the mapper-attachments plugin has been deployed correctly because I don't get an error executing the mapping command above.
Any help would be greatly appreciated.
What analyzer is running against the my_attachment field?
If it's the standard analyzer (I can't see any listed), then the "Hello" in the text will have been lowercased in the index.
So when doing a term search (which does not apply an analyzer), try searching for hello instead:
curl localhost:9200/tce/contact/_search?pretty -d'
{
  "query": {
    "term": {
      "my_attachment": "hello"
    }
  }
}'
You can also see which terms have been added to the index:
curl 'http://localhost:9200/tce/contact/_search?pretty=true' -d '{
  "query" : {
    "match_all" : { }
  },
  "script_fields": {
    "terms" : {
      "script": "doc[field].values",
      "params": {
        "field": "my_attachment"
      }
    }
  }
}'
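A related point, sketched here rather than taken from the answer above: a match query is analyzed at search time, so the capitalized "Hello" is lowercased before the lookup and should also find the document.
curl localhost:9200/tce/contact/_search?pretty -d'
{
  "query": {
    "match": {
      "my_attachment": "Hello"
    }
  }
}'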
