Error handling with curl and elasticsearch - bash

I'm currently developing bash scripts that use Elasticsearch, and I need good error handling.
In this case I try to add a document to Elasticsearch and check whether the operation succeeded.
At first I naively tried this:
response=$(curl -XPOST 'http://localhost:9200/indexation/document' -d '
{
"content":"'"$txt"'",,
"date_treatment":"'"$(date +%Y-%m-%d)"'"
}') && echo ok || echo fail
But curl doesn't work that way: it still returns success (0, which is actually logical, since the HTTP exchange itself completed) even though the JSON request is obviously incorrect (note the double comma on line 3 of the request body) and Elasticsearch reports errors.
So the answer isn't there. Now I think I should analyze the variable $response to catch errors (grep?). I'm posting this question to get hints or solutions for doing this reliably, and to make sure I'm not missing an obvious solution (maybe a curl option I don't know?).
Additional useful things
Parsing JSON with Unix tools
Examples of the content of $response :
success :
{
"_id": "AVQz7Fg0nF90YvJIX_2C",
"_index": "indexation",
"_shards": {
"failed": 0,
"successful": 1,
"total": 1
},
"_type": "document",
"_version": 1,
"created": true
}
error :
{
"error": {
"caused_by": {
"reason": "json_parse_exception: Unexpected character (',' (code 44)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name\n at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@139163f; line: 3, column: 17]",
"type": "json_parse_exception"
},
"reason": "failed to parse",
"root_cause": [
{
"reason": "json_parse_exception: Unexpected character (',' (code 44)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name\n at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@139163f; line: 3, column: 17]",
"type": "json_parse_exception"
}
],
"type": "mapper_parsing_exception"
},
"status": 400
}

A simple workaround is to use the -f/--fail option, which makes curl exit with a non-zero status when the server answers with an HTTP error code.
As per the documentation:
(HTTP) Fail silently (no output at all) on server errors. This is
mostly done to better enable scripts etc to better deal with failed
attempts. In normal cases when an HTTP server fails to deliver a
document, it returns an HTML document stating so (which often also
describes why and more). This flag will prevent curl from outputting
that and return error 22.
This method is not fail-safe and there are occasions where
non-successful response codes will slip through, especially when
authentication is involved (response codes 401 and 407).
example:
response=$(curl -XPOST 'http://localhost:9200/indexation/document' -d '
{
"content":"'"$txt"'",,
"date_treatment":"'"$(date +%Y-%m-%d)"'"
}' -f ) && echo ok || echo fail
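Because -f suppresses the response body, the Elasticsearch error message is lost. One alternative is to keep the body and have curl append the HTTP status code with -w '%{http_code}', then split the two in bash. The following is only a sketch under that assumption; the helper name es_response_ok is made up for illustration and is not part of curl or Elasticsearch.

```shell
#!/bin/bash
# Hypothetical helper. It expects the output of
#   curl -s -w '\n%{http_code}' ...
# that is, the response body followed by a newline and the HTTP status code.
es_response_ok() {
  local raw=$1
  local status=${raw##*$'\n'}   # text after the last newline
  local body=${raw%$'\n'*}      # everything before the last newline
  if [[ $status =~ ^2[0-9][0-9]$ ]]; then
    return 0
  fi
  # Keep the Elasticsearch error visible instead of discarding it.
  printf 'request failed (HTTP %s): %s\n' "$status" "$body" >&2
  return 1
}

# Intended use (not executed here, it needs a running Elasticsearch):
#   raw=$(curl -s -XPOST 'http://localhost:9200/indexation/document' \
#           -d "$json" -w '\n%{http_code}')
#   es_response_ok "$raw" && echo ok || echo fail
```

Recent curl versions (7.76.0 and later) also offer --fail-with-body, which returns error 22 like -f but still prints the body.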

Related

Curl/GraphQL command failing with 200

I am trying to write a shell script that executes a curl against a GraphQL API and I've never interacted with GQL before. I am getting some strange errors and although I understand this community doesn't have access to the GQL server I was hoping someone could take a look at the script and make sure I'm not doing anything flagrantly wrong syntax-wise (both in the shell script layer as well as the GQL query itself).
My script:
#!/bin/bash
BSEE_WEB_SERVER_DNS=https://mybsee.example.com
BSEE_API_KEY=abc123
siteId=1
scanConfigId=456
runScanQuery='mutation CreateScheduleItem { create_schedule_item(input: {site_id: "$siteId" scan_configuration_ids: "$scanConfigId"}) { schedule_item { id } } }'
runScanVariables='{ "input": "site_id": $scanId }}'
runScanOperationName='CreateScheduleItem'
curl -i --request POST \
--url $BSEE_WEB_SERVER_DNS/graphql/v1 \
--header "Authorization: $BSEE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{"query":"$runScanQuery","variables":{$runScanVariables},"operationName":"${runScanOperationName}"}'
And the output when I run the script off the terminal:
HTTP/2 200
<OMITTED RESPONSE HEADERS>
{"errors":[{"message":"Invalid JSON : Unexpected character (\u0027$\u0027 (code 36)): was expecting double-quote to start field name, Line 1 Col 38","extensions":{"code":3}}]}%
I am omitting the HTTP response headers for security and brevity reasons.
I am wondering if my use of quotes/double-quotes is somehow wrong, or if there is anything about the nature of the GQL query itself (via curl) that looks off to anyone.
I verified with the team that manages the server that the HTTP 200 OK response code is correct. 200 shows that the request succeeded to the GQL API, but that GQL is responding with this error to indicate the query itself is incorrect.
We need to fix both the GraphQL syntax and the bash string quoting.
runScanQuery GraphQL operation
Fix the GraphQL syntax. Declare the variables $siteId and $scanConfigId on the CreateScheduleItem operation and use them in the arguments, input: { site_id: $siteId, scan_configuration_ids: $scanConfigId }. The names in the declaration and in the arguments must match:
mutation CreateScheduleItem($siteId: String!, $scanConfigId: String!) {
  create_schedule_item(
    input: { site_id: $siteId, scan_configuration_ids: $scanConfigId }
  ) {
    schedule_item {
      id
    }
  }
}
runScanVariables: JSON
Our mutation expects two variables, which GraphQL will substitute into CreateScheduleItem($siteId: String!, $scanConfigId: String!). Provide the GraphQL variables as JSON, keyed by variable name without the leading $. Here is the expected output after bash variable substitution:
{ "siteId": "1", "scanConfigId": "456" }
Get the bash quoting right
Finally, translate the inputs into bash-friendly syntax. Inside single quotes bash leaves $siteId alone, which is what we want for the GraphQL variable references; we step out of the single quotes only where a bash variable must be expanded:
runScanQuery='mutation CreateScheduleItem($siteId: String!, $scanConfigId: String!) { create_schedule_item(input: {site_id: $siteId, scan_configuration_ids: $scanConfigId}) { schedule_item { id } } }'
runScanVariables='{"siteId":"'"$siteId"'","scanConfigId":"'"$scanConfigId"'"}'
runScanOperationName='CreateScheduleItem'
data='{"query":"'"$runScanQuery"'","variables":'"$runScanVariables"',"operationName":"'"$runScanOperationName"'"}'
Check the results. Paste the terminal output into a code-aware editor like VSCode and expect the editor to parse it correctly. Quote the variables so that echo prints them unchanged:
echo "$runScanQuery" # want string in graphql format
echo "$runScanVariables" # want JSON
echo "$data" # want JSON
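Hand-stitching quotes gets fragile once a value contains a double quote or a backslash. As a rough sketch, the payload can instead be assembled with printf and a small escaping helper. The helper json_escape is hypothetical (not a bash builtin) and only covers backslashes and double quotes; the variable names siteId and scanConfigId are taken from the question.

```shell
#!/bin/bash
# Hypothetical helper: escape backslashes and double quotes so that a string
# can sit inside a JSON string literal. Newlines and other control characters
# are NOT handled, which is acceptable for a one-line query.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}   # \  becomes  \\
  s=${s//\"/\\\"}   # "  becomes  \"
  printf '%s' "$s"
}

siteId=1
scanConfigId=456
# Single quotes keep $siteId and $scanConfigId literal: they are GraphQL
# variable references here, not bash variables.
query='mutation CreateScheduleItem($siteId: String!, $scanConfigId: String!) { create_schedule_item(input: {site_id: $siteId, scan_configuration_ids: $scanConfigId}) { schedule_item { id } } }'

data=$(printf '{"query":"%s","variables":{"siteId":"%s","scanConfigId":"%s"},"operationName":"%s"}' \
  "$(json_escape "$query")" "$(json_escape "$siteId")" \
  "$(json_escape "$scanConfigId")" 'CreateScheduleItem')

printf '%s\n' "$data"
```

The result can then be passed to curl as --data "$data". If jq is installed, building the payload with jq is a sturdier option, since it escapes every character correctly.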
Edit: add a public API example
Here's a complete working example using the public Star Wars API:
#!/bin/bash
filmId=1
data='{"query":"query Query($filmId: ID) { film(filmID: $filmId) { title }}","variables":{"filmId":"'"$filmId"'"}}'
curl --location --request POST 'https://swapi-graphql.netlify.app/.netlify/functions/index' \
--header 'Content-Type: application/json' \
--data "$data"
Responds with {"data":{"film":{"title":"A New Hope"}}}.
In GraphQL it's normal to always get a 200 status code; the client must check the response body for failures.
The reason is simple: in REST, HTTP is part of the protocol and the status code has semantics, but in GraphQL HTTP is not part of the protocol. You can run GraphQL over several transport protocols:
HTTP: the typical scenario
WebSocket: does not provide any "status code like" payload
MQTT: does not provide any "status code like" payload
...
The only way the server tells you something (even failures) is the body.
In your case I suggest using jq to parse the JSON in your bash script and look for the errors property.
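A body check along those lines could look like the sketch below. The function name graphql_failed is invented for this example. It relies on jq when jq is installed and otherwise falls back to a crude grep, which is an assumption made only so that the snippet also runs on machines without jq.

```shell
#!/bin/bash
# Hypothetical helper: return 0 (true) when a GraphQL response body reports
# errors, 1 otherwise.
graphql_failed() {
  local body=$1
  if command -v jq >/dev/null 2>&1; then
    # jq -e exits 0 only if the expression yields true, so this branch treats
    # a non-empty "errors" array, or a body that is not valid JSON, as failure.
    ! printf '%s' "$body" | jq -e '(.errors // []) | length == 0' >/dev/null 2>&1
  else
    # Crude fallback: look for an "errors" key anywhere in the text.
    printf '%s' "$body" | grep -q '"errors"[[:space:]]*:'
  fi
}

ok_body='{"data":{"film":{"title":"A New Hope"}}}'
bad_body='{"errors":[{"message":"Invalid JSON","extensions":{"code":3}}]}'

graphql_failed "$ok_body"  && echo 'first body: failed'  || echo 'first body: ok'
graphql_failed "$bad_body" && echo 'second body: failed' || echo 'second body: ok'
```

In a script, the body would come from something like body=$(curl -s ... --data "$data"), followed by graphql_failed "$body" to decide whether to abort.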
Your error is completely unrelated to GraphQL. You are simply sending invalid JSON.
The error message says Unexpected character (\u0027$\u0027 (code 36)): was expecting double-quote to start field name, Line 1 Col 38
Replace the escaped \u0027 with an apostrophe and you get
Unexpected character ('$' (code 36)): was expecting double-quote to start field name, Line 1 Col 38
So the parser rejects the dollar sign at position 38 of what you send as data to curl:
data='{"query":"'"$runScanQuery"'","variables":'$runScanVariables'
                                                ^
                                                this
First, all field names and string values in JSON must be wrapped in double quotes, not single quotes.
Second, if you want the shell to expand a variable before curl sees it, put it in double quotes, not single quotes.
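The difference between the two quoting styles is easy to see without contacting any server. This small demonstration reuses the variable name runScanOperationName from the question; the names single and double are chosen only for the example.

```shell
#!/bin/bash
runScanOperationName='CreateScheduleItem'

# Single quotes: the shell passes the text through untouched, so the server
# would receive the literal characters ${runScanOperationName}.
single='{"operationName":"${runScanOperationName}"}'

# Double quotes: the shell expands the variable. The inner double quotes that
# JSON requires have to be escaped with a backslash.
double="{\"operationName\":\"${runScanOperationName}\"}"

printf '%s\n' "$single" "$double"
```

Only the second string is the JSON the server expects.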

Indices with ! in their name can't be filtered when restoring

I have an ES cluster with index names like web.analytics.data.api!monthly!2018-07_v0, and I take regular snapshots/backups.
Now, when I want to restore all of them, everything works well. If I want to restore just a specific index, however, ES won't do it. The command I use:
curl -X POST "localhost:9200/_snapshot/s3_backups/20191218_060001/_restore?pretty&wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
"indices": "web.analytics.data.api!monthly!2018-07_v0",
"index_settings": {
"index.number_of_replicas": 0
}
}
'
The result I get is:
{
"snapshot" : {
"snapshot" : "20191218_060001",
"indices" : [ ],
"shards" : {
"total" : 0,
"failed" : 0,
"successful" : 0
}
}
}
Please note that if I use an index without ! in its name (e.g. .kibana), it works well. Any ideas how I can solve that? Preferably without telling developers to rename the indices. The ES in question is version 1.7.3. I am aware it is EOL, but it is what I have to work with right now.
So it was my bad in the end. The index name I was given did not exist (there was a typo in it), but I had been told that ! is problematic, so I did not double-check it. The test indices were picked by me, so of course those names were correct...
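A restore that matches nothing still returns normally with an empty "indices" array, so a script can catch this kind of typo by inspecting the response. The sketch below assumes the response is formatted as in the output shown above (the empty array printed as [ ] on a single line); the function name restore_matched_something is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the restore response lists at least one
# index. An empty "indices" array means the requested name matched nothing.
restore_matched_something() {
  local response=$1
  if printf '%s' "$response" |
     grep -Eq '"indices"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'; then
    echo 'restore matched no indices; check the index name for typos' >&2
    return 1
  fi
  return 0
}

# Intended use (not executed here, it needs a running cluster):
#   response=$(curl -s -X POST "localhost:9200/_snapshot/s3_backups/20191218_060001/_restore?pretty&wait_for_completion=true" \
#                -H 'Content-Type: application/json' -d "$body")
#   restore_matched_something "$response" || exit 1
```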

Backup and restore some records of an elasticsearch index

I wish to take a backup of some records (e.g. only the latest 1 million records) of an Elasticsearch index and restore this backup on a different machine. It would be better if this could be done using available/built-in Elasticsearch features.
I've tried Elasticsearch snapshot and restore (see the following code), but it looks like it takes a backup of the whole index, not of selected records.
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump?pretty=true" -d '
{
"type": "fs",
"settings": {
"compress" : true,
"location": "es_data_dump"
}
}'
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump/snapshot1?wait_for_completion=true&pretty=true" -d '
{
"indices" : "index_name",
"type": "fs",
"settings": {
"compress" : true,
"location": "es_data_dump"
}
}'
The format of backup could be anything, as long as it can be successfully restored on a different machine.
You can use the _reindex API. It accepts any query, so after reindexing you have a new index that contains only the requested records and can serve as the backup. You can then copy it wherever you want.
complete information is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
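As an illustration of that idea, the helper below prints a _reindex request body that copies only documents newer than a given date. Everything specific in it is an assumption: the function name reindex_body, the date field "@timestamp", and the destination index name are placeholders to adapt to the real index.

```shell
#!/bin/bash
# Hypothetical helper: print a _reindex request body that copies only the
# documents whose date field is on or after a given date. The field name
# "@timestamp" is an assumption; use the date field the index really has.
reindex_body() {
  local src=$1 dest=$2 since=$3
  printf '{"source":{"index":"%s","query":{"range":{"@timestamp":{"gte":"%s"}}}},"dest":{"index":"%s"}}' \
    "$src" "$since" "$dest"
}

reindex_body 'index_name' 'index_name_backup' '2020-01-01'
echo

# Intended use (not executed here, it needs a running Elasticsearch):
#   curl -H 'Content-Type: application/json' -X POST 'localhost:9200/_reindex?pretty' \
#        -d "$(reindex_body index_name index_name_backup 2020-01-01)"
```

The smaller index_name_backup index can then be snapshotted with the snapshot commands from the question and restored on the other machine.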
In the end, I fetched the required data using the Python driver because that is what I found easiest for the given use case.
For that, I ran an Elasticsearch query, stored its response in a file in newline-separated format, and later restored the data from it using another Python script. A maximum of 10000 entries is returned per request, along with the scroll ID used to fetch the next 10000 entries, and so on.
from elasticsearch import Elasticsearch

es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
_query = {'match_all': {}}  # placeholder: substitute the query that selects the records to back up
page = es.search(index=['ct_analytics'], body={'size': 10000, 'query': _query, 'stored_fields': '*'}, scroll='5m')
while len(page['hits']['hits']) > 0:
    es_data = page['hits']['hits']  # store this as you like
    scrollId = page['_scroll_id']
    page = es.scroll(scroll_id=scrollId, scroll='5m')  # fetch the next batch

Autocompletion elasticsearch

I'm following along with the tutorial for Elasticsearch's completion suggester. It's pretty easy to get going, but I'm unable to get completions for more than one word. In the example, single incomplete words give great results, e.g.
"Nir" -> "options":[{"text":"Nevermind Nirvana..."
"Nev" -> "options":[{"text":"Nevermind Nirvana..."
But the following fail:
"Nirvana Nev" -> Nothing!
"Nevermind Nir" -> Nothing!
I can get it to work by populating combinatorial options e.g
curl -X PUT "localhost:9200/music/_doc/1?refresh" -H 'Content-Type: application/json' -d'
{
"suggest" : {
"input": [ "Nevermind", "Nirvana", "Nirvana Nevermind", "Nevermind Nirvana" ],
"weight" : 34
},
"title" : "Nevermind by Nirvana"
}
'
But this approach will soon lead to massive variants of text added to the input.
There must be a better way?

Couchdb view Queries

Could you please help me create a view? Below is the requirement:
select * from personaccount where name="srini" and user="pup" order by lastloggedin
I have to send name and user as input to the view, and the data should be sorted by lastloggedin.
Below is the view I have created, but it is not working:
{
"language": "javascript",
"views": {
"sortdatetimefunc": {
"map": "function(doc) {
emit({
lastloggedin: doc.lastloggedin,
name: doc.name,
user: doc.user
},doc);
}"
}
}
}
And this is the curl command I am using:
http://uta:password@localhost:5984/personaccount/_design/checkdatesorting/_view/sortdatetimefunc?key={\"name:srini\",\"user:pup\"}
My questions are:
Sorting is done on the key and I want it on lastloggedin, so I have included that in the emit function as well.
But I am passing only name and user as parameters. Do we need to pass all the parameters that are part of the key?
First of all, thank you for the reply. I have done the same and I am getting errors. Please help.
Could you please try this on your PC? I am posting all the commands:
curl -X PUT http://uta:password@localhost:5984/person-data
curl -X PUT http://uta:password@localhost:5984/person-data/srini -d '{"Name":"SRINI", "Idnum":"383896", "Format":"NTSC", "Studio":"Disney", "Year":"2009", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:44:37+0100"}'
curl -X PUT http://uta:password@localhost:5984/person-data/raju -d '{"Name":"RAJU", "Idnum":"456787", "Format":"FAT", "Studio":"VFX", "Year":"2010", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:50:37+0100"}'
curl -X PUT http://uta:password@localhost:5984/person-data/vihar -d '{"Name":"BALA", "Idnum":"567876", "Format":"FAT32", "Studio":"YELL", "Year":"2011", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:55:37+0100"}'
Here's the view I created, as you suggested:
{
"_id": "_design/persondestwo",
"_rev": "1-0d3b4857b8e6c9e47cc9af771c433571",
"language": "javascript",
"views": {
"personviewtwo": {
"map": "function (doc) {\u000a emit([ doc.Name, doc.Idnum, doc.lastTimeOfCall ], null);\u000a}"
}
}
}
I ran this curl command:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]&endkey=["SRINI","383896",{}]descending=true&include_docs=true
I got this error:
[4] 3000
curl: (3) [globbing] error: bad range specification after pos 99
[5] 1776
[6] 2736
[3] Done descending=true
[4] Done(3) curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]
[5] Done endkey=["SRINI","383896"]
I don't know what this error is.
I have also tried passing the parameters the following way, and it does not help:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?key={\"Name\":\"SRINI\",\"Idnum\": \"383896\"}&descending=true
But I get different errors about escape sequences.
Overall I just want this query to be satisfied through the view:
select * from person-data where Name="SRINI" and Idnum="383896" order by lastTimeOfCall
My concern is how to pass multiple parameters from the curl command, as I get a lot of errors if I do it the above way.
First off, you need to use an array as your key. I would use:
function (doc) {
  emit([ doc.name, doc.user, doc.lastloggedin ], null);
}
This basically outputs all the documents ordered by name, then user, then lastloggedin (the field names must match your documents exactly, including case). You can use the following URL to query.
/_design/checkdatesorting/_view/sortdatetimefunc?startkey=["srini","pup"]&endkey=["srini","pup",{}]&include_docs=true
Second, notice I did not output doc as the value of your query. It takes up much more disk space, especially if your documents are fairly large. Just use include_docs=true.
Lastly, refer to the CouchDB Wiki, it's pretty helpful.
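I just stumbled upon this question. The errors you are getting are caused by not quoting this command:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]&endkey=["SRINI","383896",{}]descending=true&include_docs=true
The & character has a special meaning on the command line (it puts the preceding command in the background) and must be quoted or escaped when it is part of an argument. That is why the shell reports several background jobs. The "[globbing] error" comes from curl itself, which treats [ ] and { } in a URL as glob patterns unless you pass -g (--globoff).
So you should put quotes around the whole URL, escape the quotes inside it, and add -g. The command is also missing an & before descending=true:
curl -g -X GET "http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=[\"SRINI\",\"383896\"]&endkey=[\"SRINI\",\"383896\",{}]&descending=true&include_docs=true"
Note that with descending=true CouchDB reads the view in reverse order, so startkey and endkey have to be swapped; leave descending=true out if you want ascending order by lastTimeOfCall.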
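Another way to sidestep the quoting and globbing problems is to let curl build the query string itself with --get and --data-urlencode, so that brackets, braces, and quotes never appear in the URL argument. This is a sketch only: the array name couch_args is invented for the example, and the credentials and database names are the ones from the question.

```shell
#!/bin/bash
name='SRINI'
idnum='383896'

# JSON arrays used as view keys; {} sorts after any string in CouchDB's
# collation, so the range covers every lastTimeOfCall for this name and Idnum.
startkey="[\"$name\",\"$idnum\"]"
endkey="[\"$name\",\"$idnum\",{}]"

# Keep the arguments in an array so each one stays a single word.
couch_args=(
  --silent --get
  --user 'uta:password'
  --data-urlencode "startkey=$startkey"
  --data-urlencode "endkey=$endkey"
  --data-urlencode 'include_docs=true'
  'http://localhost:5984/person-data/_design/persondestwo/_view/personviewtwo'
)

printf '%s\n' "$startkey" "$endkey"

# Intended use (not executed here, it needs a running CouchDB):
#   curl "${couch_args[@]}"
```

With --get, curl appends the URL-encoded pairs to the URL as ?startkey=...&endkey=...&include_docs=true, so no manual escaping of & is needed.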

Resources