What is causing the Elasticsearch bulk load to fail? - bash

I can't figure out why I can't bulk load Elasticsearch with JSON. I've done this before but this time I am totally stumped.
I have processed a set of JSON documents into Elastic Bulk Load Format and am trying to bulk load the index I just created (verified created, can be queried, and is empty).
{"create": {"_id": "ef68e997-c616-4b0b-b08e-dfc09f8cb08f"}}
{"id": "ef68e997-c616-4b0b-b08e-dfc09f8cb08f", "title": "My document"}
... repeats for all records
The command I run uses a list of paths to the JSON bulk files and a loop to curl/POST them to Elastic using credentials:
while IFS= read -r "path" < "${DOC_LIST_PATH}"
do
echo "Submitting Elastic formatted docs at ${path} to Elastic index 'docs' ..."
curl \
-X POST \
-H "Content-Type: application/x-ndjson" \
"https://${ES_USER}:${ES_PASSWD}@${ES_HOSTNAME}:${ES_PORT}/docs/_bulk" \
--data-binary "@${path}"
done
I've done all this before and it should work but... it doesn't. I get this error instead:
Submitting Elastic formatted docs at data/docs.json/part-00000.json to Elastic index 'docs' ...
Warning: Couldn't read data from file
Warning: "data/docs.json/part-00000.json",
Warning: this makes an empty POST.
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
... repeats for all files
I have found that the problem is with this bash code, not the data or the bulk load request:
--data-binary "@${path}"
If I replace that with this, it works:
--data-binary "@data/docs.json/part-00000.json"
Making the full working command for a single file:
curl -X POST -H "Content-Type: application/x-ndjson" "https://${ES_USER}:${ES_PASSWD}@${ES_HOSTNAME}:${ES_PORT}/docs/_bulk" --data-binary "@data/docs.json/part-00000.json"
But I need to script this, so this is still maddening. Please help!
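One way to narrow this down, offered as a sketch rather than a confirmed fix: since the hard-coded path works and the variable does not, the value read into path may differ from what is printed, for example through a trailing carriage return if the list file has Windows line endings. That cause is an assumption, and clean_path and post_bulk_files are hypothetical helper names. The sketch strips carriage returns, skips files that cannot be read, and puts the input redirection on done, because a redirection attached to read reopens the list on every iteration.

```shell
# Hypothetical helper: remove carriage returns from a path
# (assumes the list file may have been saved with CRLF line endings).
clean_path() {
  printf '%s' "$1" | tr -d '\r'
}

# Hypothetical helper: POST every readable file named in the list file.
# ES_USER, ES_PASSWD, ES_HOSTNAME and ES_PORT come from the environment,
# as in the original command.
post_bulk_files() {
  list_file=$1
  while IFS= read -r raw_path || [ -n "$raw_path" ]; do
    path=$(clean_path "$raw_path")
    [ -n "$path" ] || continue
    if [ ! -r "$path" ]; then
      echo "Skipping unreadable file: $path" >&2
      continue
    fi
    echo "Submitting Elastic formatted docs at ${path} to Elastic index 'docs' ..."
    curl \
      -X POST \
      -H "Content-Type: application/x-ndjson" \
      "https://${ES_USER}:${ES_PASSWD}@${ES_HOSTNAME}:${ES_PORT}/docs/_bulk" \
      --data-binary "@${path}"
  done < "$list_file"   # redirect the whole loop, not the read command
}
```

It would be called as post_bulk_files "${DOC_LIST_PATH}". If every file is reported as unreadable, the list contents or the working directory are the next things to check.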

Related

Not able to extract Json Data in Apache-NiFi

I wrote the POST command below and am using the "HandleHttpRequest" processor to receive the POST request in Apache NiFi:
curl -v -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"employeeDetails":{"empid":"124","empname": "praveen"}}' http://localhost:7002
The "HandleHttpRequest" processor receives the JSON data, and I can see it when I list the queue.
[Screenshot: HandleHttpRequest processor details]
But I want to extract empid and check whether it is null. I tried the "ExtractText", "ReplaceText", "UpdateAttribute", and "EvaluateJsonPath" processors to fetch the employee details, but none of them worked.
[Screenshot: EvaluateJsonPath processor details]
The EvaluateJsonPath processor reports a "flowfile did not have a valid JSON content" error.
How do I extract the employee data and check whether it is null?
The problem is not related to NiFi. You should post data with CURL like this (change double quote " to single quote ' after -d):
curl -v -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"employeeDetails":{"empid":"124","empname": "praveen"}}' http://localhost:7002
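To see why the quoting matters, here is a small illustration of what the shell does to a JSON body before curl ever receives it. The variable names are made up for the example, and the payload is shortened so it contains no spaces. Inside single quotes the inner double quotes survive; inside double quotes each inner quote ends or starts a quoted segment and is removed.

```shell
# Single quotes: the JSON reaches the command unchanged.
single_quoted='{"empid":"124"}'

# Double quotes: the inner quotes are consumed by the shell.
double_quoted="{"empid":"124"}"

printf '%s\n' "$single_quoted"   # prints {"empid":"124"}
printf '%s\n' "$double_quoted"   # prints {empid:124}
```

The second form is no longer valid JSON, which would explain an invalid JSON error on the receiving side.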
I have the same two processors as you, plus a HandleHttpResponse processor so that my curl command exits cleanly instead of sending additional buffer messages that make EvaluateJsonPath fail with an invalid JSON error (my guess is that this could have been your case as well).
[Screenshot: HTTP request flow]
Also, as @Behrouz Seyedi mentioned, you need to use single quotes in your command. This is my curl command:
curl -v -H "Content-type: application/json" -X POST -d '{"employeeDetails":{"empid":"124","empname": "praveen"}}' http://localhost:7003
[Screenshot: EvaluateJsonPath processor configuration]
[Screenshot: EvaluateJsonPath response, empid]

Deleting a type in Elastic Search using curl

I am trying to delete a type in Elasticsearch using a curl script in a .bat file:
ECHO Running Curl Script
curl -XDELETE "http://localhost/testing/" -d''
pause
The response I got was "No handler found for uri". The Elasticsearch documentation says to use delete by query: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/docs-delete-by-query.html
How can I modify my curl script to use this API on ES 2.3?
Thanks
If you want to use the delete-by-query API to delete all documents of a given type, you can do it like this:
curl -XDELETE "http://localhost/testing/_query?q=_type:typename"
However, you're better off deleting the index and recreating it so you can modify the mapping type as you see fit.
curl -XDELETE "http://localhost/testing/"
curl -XPUT "http://localhost/testing/" -d '{"settings": {...}, "mappings": {...}}'

Load GeoJSON file into Apache CouchDB

I am working on Windows 10 and tried to load a GeoJSON file into my CouchDB database with a curl POST request from cmd, which looks like this:
C:\Program Files\cURL\bin>curl -d @path-to-my-data\data.geojson -H "Content-type: application/json" -X POST http://127.0.0.1:5984/_utils/database.html?-dbName-
and then I get the following error:
{"error":"method_not_allowed","reason":"Only GET,HEAD allowed"}
On http://couchdb-13.readthedocs.org/en/latest/api-basics/ it is said, that "If you use the an unsupported HTTP request type with a URL that does not support the specified type, a 405 error will be returned, listing the supported HTTP methods."
When I try that with a PUT request, I get the same error.
I validated the json with jsonlint so this should not be the problem.
I tried several tutorials like "Three Steps to CouchDB Heaven …" or "Export & Import a Database with CouchDB" but none of them seems to work.
So I am not sure where the problem is. Do I need to make changes to my GeoJSON file, or is it something else?
thanks for your help
The curl command you need looks like this:
curl -H "Content-Type: application/json" -X POST http://localhost:5984/db -d @C:\Users\Name\Desktop\data.geojson

Update to-many association

I have a many-to-many relationship between users and groups, and I would like to know how to update it with Spring Data REST (SDR). This is what I've tried so far after reading the docs.
curl -X POST -H 'Content-Type: text/uri-list' -d 'http://localhost:8080/rest/users/5' http://localhost:8080/rest/groups/1/users
Expected result: Add user 5 to group 1.
Actual result: 405 Method Not Allowed.
curl -X PUT -H 'Content-Type: text/uri-list' -d 'http://localhost:8080/rest/users/5' http://localhost:8080/rest/groups/1/users
Expected result: Replace all members of group 1 with user 5.
Actual result: Works as expected.
curl -X PUT -H 'Content-Type: text/uri-list' -d @members.txt http://localhost:8080/rest/groups/1/users
Where the file members.txt has:
http://localhost:8080/rest/users/5
http://localhost:8080/rest/users/6
http://localhost:8080/rest/users/7
Expected result: Replace all members of group 1 with the users 5, 6 and 7.
Actual result: Only last user (in this case 7) gets added.
Could someone provide an example of how to ADD a single URI to an association? Also, if possible, how do I add or replace an association with multiple URIs?
After re-reading the documentation, it does indeed say POST should add to the collection.
My experience has been to use PATCH to add to the collection.
To further the answer: you should be able to use PUT with Content-Type: text/uri-list and a body containing multiple URIs, one per line (separated by a line break "\n").
Try this:
curl -v -X POST -H "Content-Type: text/uri-list" -d "http://localhost:8080/rest/users/5" http://localhost:8080/rest/groups/1/users
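A possible explanation for "only the last user gets added" when sending members.txt: curl's -d/--data option strips carriage returns and newlines when it reads a file, so the three URIs arrive as one concatenated string, while --data-binary sends the file unchanged. The sketch below shows the difference. The helper names are hypothetical, and whether the server then accepts all three URIs is an assumption that has not been verified against SDR.

```shell
# Hypothetical helper: write one URI per line, the text/uri-list body.
write_members() {
  printf '%s\n' \
    'http://localhost:8080/rest/users/5' \
    'http://localhost:8080/rest/users/6' \
    'http://localhost:8080/rest/users/7' > "$1"
}

# Hypothetical helper: approximates the body that -d @file would send,
# i.e. the file content with carriage returns and newlines removed.
body_as_sent_by_d() {
  tr -d '\r\n' < "$1"
}

# Replace the members of group 1, keeping the line breaks between URIs.
replace_members() {
  curl -X PUT -H 'Content-Type: text/uri-list' \
    --data-binary "@$1" \
    http://localhost:8080/rest/groups/1/users
}
```

After write_members members.txt, running body_as_sent_by_d members.txt prints the three URIs joined together without separators, and replace_members members.txt sends them one per line.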

Bash curl POST a binary variable

How do you POST a binary variable in curl bash?
#!/usr/bin/env bash
IMAGE=$(curl "http://www.google.com/images/srpr/logo3w.png")
curl --data-binary "$IMAGE" --request "POST" "http://www.somesite.com"
Curl seems to corrupt the image when uploading.
Curl has the option to write response to disk and then read from it, but it'd be more efficient to do it solely in memory.
Try to eliminate the variable ... as follows:
curl "http://www.google.com/images/srpr/logo3w.png" | curl --data-binary @- --request "POST" "http://www.somesite.com"
From the curl man page:
If you start the data with the letter @, the rest should be a file name to read the data from, or - if you want curl to read the data from stdin.
EDIT: From the man page, too:
--raw When used, it disables all internal HTTP decoding of content or transfer encodings and instead makes them passed on unaltered, raw. (Added in 7.16.2)
What happens, if applied on either or both sides?
I had a related problem, where I wanted to dynamically curl a file from a given folder.
curl --data-binary directory/$file --request "POST" "http://www.somesite.com"
did not work - uploaded the string "directory/myFile.jar" instead of the actual file.
Adding the @ symbol fixed it:
curl --data-binary @directory/$file --request "POST" "http://www.somesite.com"
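The corruption in the original script most likely comes from holding the image in a shell variable: command substitution drops trailing newlines, and shells such as bash cannot store NUL bytes in a variable. A pipe has neither limitation. The sketch below uses two hypothetical helpers, forward_binary for the upload itself and count_bytes_piped to show that a three-byte sequence containing a NUL survives a pipe intact.

```shell
# Hypothetical helper: forward a download to another server without a variable.
# $1 is the source URL, $2 the destination URL; @- makes curl read the body from stdin.
forward_binary() {
  curl -s "$1" | curl --data-binary @- --request POST "$2"
}

# A pipe carries all three bytes ('a', NUL, 'b') through unchanged.
count_bytes_piped() {
  printf 'a\0b' | wc -c | tr -d ' '
}
```

For example, forward_binary "http://www.google.com/images/srpr/logo3w.png" "http://www.somesite.com" keeps the whole transfer in memory, which was the stated goal.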
