base64 encode and decode of aws command to retrieve some fields - bash

The code below produces the expected results with the username.
es_eh="$(aws cloudtrail --region us-east-1 lookup-events --lookup-attributes AttributeKey=EventSource,AttributeValue=route53.amazonaws.com --max-items 50 --start-time "${start_date}" --end-time "${end_date}" --output json)"
for row in $(echo "${es_eh}" | jq -r '.Events[] | @base64'); do
echo "${row}" | base64 --decode | jq -r '.Username'
done
I don't understand the purpose of base64-encoding and then decoding the same string inside the loop to retrieve the username.
It stops working when I remove the base64 encode and decode:
for row in $(echo "${es_eh}" | jq -r '.Events[]'); do
echo "${row}"| jq -r '.Username'
done

Without the encoding, the output of the first jq spans multiple lines. The loop iterates over those lines and fails, as none of them contains valid JSON on its own. With | @base64, each subobject is encoded onto a single row, and base64 --decode inflates it back into a full JSON object.
To see the rows, try outputting $row before processing it.

When you use $( ) without quotes around it, the result gets split into "words", but the shell's definition of a "word" is almost never what you want (and certainly has nothing to do with the JSON entries you want it split into). This sort of thing is why you should almost never use unquoted expansions.
Converting the output entries to base64 makes them wordlike enough that shell word splitting actually does the right thing. But note: some base64 encoders split their output into lines, which would make each line be treated as a separate "word". If jq's base64 encoding did this, this code would fail catastrophically on large events.
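To make the splitting visible, here is a minimal sketch with a made-up two-event document (the structure is an assumption for illustration only):
es='{"Events":[{"Username":"alice"},{"Username":"bob"}]}'
# Unquoted expansion splits jq's pretty-printed output on whitespace,
# so fragments like '{' and '"Username":' each land in a separate "word":
for row in $(echo "$es" | jq -r '.Events[]'); do echo "word: $row"; done
# With @base64, each event becomes one whitespace-free token that survives
# word splitting and decodes back to a full JSON object inside the loop:
for row in $(echo "$es" | jq -r '.Events[] | @base64'); do
echo "$row" | base64 --decode | jq -r '.Username'
done
The first loop prints fragments; the second prints alice and bob.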

Transforming the for loop into a while loop should fix the problem:
while read -r row; do
echo "${row}" | jq -r '.Username'
done < <(echo "${es_eh}" | jq -c -r '.Events[]')
Note that in the outer jq, I used the -c option to put each output object on a single line.
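For this particular task no shell loop is needed at all; assuming Username is a top-level field of each event (as the original loop implies), a single jq invocation gives the same result:
echo "${es_eh}" | jq -r '.Events[].Username'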

Related

Difference between slurp, null input, and inputs filter

Given the input document:
{"a":1}
{"b":2}
{"c":3,"d":4}
What is the difference between the following jq programs (if any)? They all seem to produce the same output.
jq '[., inputs] | map(to_entries[].value)'
jq -n '[inputs] | map(to_entries[].value)'
jq -s 'map(to_entries[].value)'
In other words, the following (simplified/reduced) invocations seem identical:
jq '[.,inputs]'
jq -n '[inputs]'
jq -s '.'
How are they different? Are there scenarios where one works, but the others don't? Did older versions of jq not support all of them? Is it performance related? Or simply a matter of readability and personal preference?
Bonus points (added later to the question): does the same hold true for the following programs?
jq '., inputs | to_entries[].value'
jq -n 'inputs | to_entries[].value'
jq -s '.[] | to_entries[].value'
jq 'to_entries[].value'
With jq -n '[inputs] ....' and jq '[.,inputs] ....', you are loading the whole file into memory.
A more memory-efficient way to achieve the result as an array is:
jq -n '[inputs | to_entries[].value]'
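For instance, feeding it the sample document from the question:
printf '%s\n' '{"a":1}' '{"b":2}' '{"c":3,"d":4}' | jq -nc '[inputs | to_entries[].value]'
prints [1,2,3,4], building only the array of values rather than slurping the input objects themselves.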
Those first three programs are equivalent, both functionally and in terms of resource utilization, but they obscure the difference between array-oriented and stream-oriented programming.
In a nutshell, think sed and awk. For more details, see e.g. my A Stream-oriented Introduction to jq, and in particular the section On the importance of inputs.
Bonus points: does the same hold true for the following programs:
Referring to the last four numbered examples in the question: (4), (5) and (7) are essentially equivalent; (6) is just silly.
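As a quick check, assuming the sample document is saved as doc.json, (4), (5) and (7) each stream the same four values without slurping:
printf '%s\n' '{"a":1}' '{"b":2}' '{"c":3,"d":4}' > doc.json
jq '., inputs | to_entries[].value' doc.json
jq -n 'inputs | to_entries[].value' doc.json
jq 'to_entries[].value' doc.json
Each prints 1, 2, 3 and 4, one per line; (6) emits the same values but only after reading the entire document into an array.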
If you're looking for a reason why all these variations exist, please bear in mind that input and inputs were late additions in the development of jq. Perhaps they were late additions because jq was originally envisioned as a very simple and for the most part "purely functional" language.
Adding even more cases for the sake of completeness:
From the manual:
--raw-input/-R:
Don't parse the input as JSON. Instead, each line of text is passed to the filter as a string. If combined with --slurp, then the entire input is passed to the filter as a single long string.
This means that on one hand
jq -R -n '[inputs]' and
jq -R '[., inputs]'
both produce an array of strings, as each item provided by inputs (and . if it wasn't silenced by -n) corresponds to a line of text from the input document(s), whereas on the other hand
jq -R -s '.'
slurps all characters from the input document(s) into exactly one long string, newlines included.
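A quick way to see this difference on the question's sample input:
printf '%s\n' '{"a":1}' '{"b":2}' '{"c":3,"d":4}' | jq -Rn '[inputs]'
produces an array of three strings, one per input line, whereas
printf '%s\n' '{"a":1}' '{"b":2}' '{"c":3,"d":4}' | jq -Rs '.'
produces a single long string containing the whole input, embedded newlines and all.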

Jq: How to ignore whitespaces in keys & values

When I run the query in jqplay against the provided JSON, the output looks as expected (demo). But when I try the same query in a shell script and iterate over the result, extra rows appear because of the whitespace inside the values.
Query used in the shell script:
query=$(cat $basename/test.json | jq -r '.DesignCode | to_entries[] | "\(.key):\(.value)"')
for i in $query
do
printf "$i"
done
[Output screenshot omitted]
What is the correct way to write the query?
I'm not sure about the output of your command but, in my experience, shell is a bit confusing when it comes to creating arrays from strings.
A useful workaround I use a lot is forcing shell to recognize the output as an array by compound assignment:
query=( $(jq -r '.DesignCode | to_entries[] | "\(.key):\(.value)"' "$basename/test.json") )
for i in "${query[@]}"
do
printf '%s\n' "$i"
done
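Note, though, that the unquoted command substitution still splits on any whitespace inside the values, which is exactly the problem described in the question. A line-oriented read loop (the same pattern as in the first answer above) keeps each key:value pair intact; a minimal sketch, assuming the same file path:
while IFS= read -r line; do
printf '%s\n' "$line"
done < <(jq -r '.DesignCode | to_entries[] | "\(.key):\(.value)"' "$basename/test.json")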

Generated SHA256 value is different from the expected SHA256 value

The following steps are to be performed:
Base64-decode the ASCII value.
Append the decoded value to 'chirp'.
Generate the SHA-256 of "chirp<decoded_value>".
#!/bin/sh
a=$(echo MDkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDIgADAoR0WUZBTAkZv0Syvt+g5wGpb/HYHh22zAxCNP+ryTQ=|base64 -d)
b="chirp$a"
echo $b
echo -n $b | sha256sum
I am getting a value of:
f62e19108cfb5a91434f1bba9f5384f9039857743aa2c0707efaa092791e4420
But the expected value is:
6a29cb4....
Am I missing anything?
For binary data, like the base64-decoded data you're dealing with here, I would not rely on echo and command substitution: a shell variable cannot hold NUL bytes, so the decoded binary gets mangled before it ever reaches sha256sum. Instead, just pipe the bytes through, like this:
<<<'MDkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDIgADAoR0WUZBTAkZv0Syvt+g5wGpb/HYHh22zAxCNP+ryTQ=' base64 -d | cat <(echo -n chirp) - | sha256sum
That gives me the result that you expect, 6a29cb438954e8c78241d786af874b1c7218490d3024345f6e11932377a932b6.
Here, cat gets two file arguments: the first is a process substitution streaming the word "chirp", and the second (-) forwards the stdout of the previous command (base64 -d).
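An equivalent sketch using a command group instead of process substitution, if that reads more naturally:
{ printf '%s' chirp; base64 -d <<<'MDkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDIgADAoR0WUZBTAkZv0Syvt+g5wGpb/HYHh22zAxCNP+ryTQ='; } | sha256sum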

Parsing kinesis data using bash, jq, sed

I'm hoping to walk through some Kinesis data using bash. Using a command like:
aws kinesis get-records --shard-iterator <long shard info> | jq '[.|.Records[].Data]' | grep \"ey | sed -e 's/^[ \t]*\"//;s/[ \t]*\",$//'
I can get the base64 data from the stream. What I'm having issues with is piping this through base64 so I can see the actual data.
If I send it through using a combination of head -n and tail, I can see individual values, but any attempt to pass through more than 2-3 lines fails. The errors are typically one set of JSON values followed by garbage data. The whole output is typically preceded by
Invalid character in input stream.
To see the JSON values I use <long bash command from above> | xargs base64 -D
-- Caveat: Using bash on OSX
This works (assuming you've copied the base64 data to a file):
while IFS= read -r line; do echo "$line" | base64 -D && printf "\n"; done < <infile>
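If your jq is 1.6 or newer, the @base64d filter can do the decoding inside jq itself, with no shell loop; this sketch assumes the payloads are UTF-8 text (raw binary would be mangled on output):
aws kinesis get-records --shard-iterator <long shard info> | jq -r '.Records[].Data | @base64d'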
I have developed Kines, a friendly CLI for Amazon Kinesis Data Streams. It can be useful for your debugging purposes.
You can install it using pip.
pip install kines
Then you can run the kines walk command on a stream and shard to view the decoded data.
kines walk <stream-name> <shard-id>

Parse JQ output through external bash function?

I want to parse data out of a log file consisting of JSON strings, and I wonder if there's a way to use a bash function to perform custom parsing instead of overloading the jq command.
Command:
tail errors.log --follow | jq --raw-output '. | [.server_name, .server_port, .request_file] | @tsv'
Outputs:
8.8.8.8 80 /var/www/domain.com/www/public
I want to parse 3rd column to cut the string to exclude /var/www/domain.com part where /var/www/domain.com is the document root, and /var/www/domain.com/subdomain/public is the public html section of the site. Therefore I would like to leave my output as /subdomain/public (or from the example /www/public).
I wonder if I can somehow inject a bash function to parse .request_file column? Or how would I do that using jq?
I'm having issues piping out the output of any part of this command that would allow me to do any sort of string manipulation.
Use a BashFAQ #1 while read loop to iterate over the lines, and a BashFAQ #100 parameter expansion to perform the desired modifications:
tail -f -- errors.log \
| jq --raw-output --unbuffered \
'[.server_name, .server_port, .request_file] | @tsv' \
| while IFS=$'\t' read -r server_name server_port request_file; do
printf '%s\t%s\t%s\n' "$server_name" "$server_port" "/${request_file#/var/www/*/}"
done
Note the use of --unbuffered, to force jq to flush its output lines immediately rather than buffering them. This has a performance penalty (so it's not the default), but it ensures that you get output immediately when reading from a potentially slow input source.
That said, it's also easy to remove a prefix in jq, so there's no particular reason to do the above:
tail -f -- errors.log | jq -r '
def withoutPrefix: sub("^([/][^/]+){3}"; "");
[.server_name, .server_port, (.request_file | withoutPrefix)] | @tsv'
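A quick check with a single sample record (values taken from the example output above):
echo '{"server_name":"8.8.8.8","server_port":80,"request_file":"/var/www/domain.com/www/public"}' | jq -r '
def withoutPrefix: sub("^([/][^/]+){3}"; "");
[.server_name, .server_port, (.request_file | withoutPrefix)] | @tsv'
prints 8.8.8.8, 80 and /www/public separated by tabs.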
