Bash: Issue when iterating string with lines [duplicate] - bash

I have the following JSON data in a file named data.json:
[
{"original_name":"pdf_convert","changed_name":"pdf_convert_1"},
{"original_name":"video_encode","changed_name":"video_encode_1"},
{"original_name":"video_transcode","changed_name":"video_transcode_1"}
]
I want to iterate through the array and extract the values from each element in a loop. I came across jq, but I find it difficult to use for iteration. How can I do that?

Use a filter that returns each item in the array, then loop over the results. Be sure to use the compact output option (-c) so that each result is printed on a single line and is treated as one item in the loop.
jq -c '.[]' input.json | while IFS= read -r i; do
    # do stuff with "$i"
done
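One caveat with the pipeline form: by default bash runs the while loop in a subshell, so variables assigned inside it are gone once the loop ends. A minimal sketch of a variant that feeds the loop through process substitution instead. The sample data is copied from the question and embedded so the snippet is self-contained, and the counter variable count is only there for illustration:

```shell
#!/usr/bin/env bash
# Sample data from the question, embedded so the sketch runs on its own.
json='[
{"original_name":"pdf_convert","changed_name":"pdf_convert_1"},
{"original_name":"video_encode","changed_name":"video_encode_1"},
{"original_name":"video_transcode","changed_name":"video_transcode_1"}
]'

count=0
# Process substitution keeps the loop in the current shell,
# so the value of count is still available after "done".
while IFS= read -r item; do
    count=$((count + 1))
    printf 'item %d: %s\n' "$count" "$item"
done < <(jq -c '.[]' <<< "$json")

echo "processed $count items"
```

With a file on disk, the here-string can be replaced by the file name, as in the answer above.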

By leveraging the power of Bash arrays, you can do something like:
# read each item in the JSON array to an item in the Bash array
readarray -t my_array < <(jq --compact-output '.[]' input.json)
# iterate through the Bash array
for item in "${my_array[@]}"; do
original_name=$(jq --raw-output '.original_name' <<< "$item")
changed_name=$(jq --raw-output '.changed_name' <<< "$item")
# do your stuff
done
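The loop above starts jq twice per item. If that becomes a bottleneck, one possible alternative is to let a single jq call emit both fields as tab-separated values and split them with read. This is a sketch under the assumption that no field is empty (read collapses consecutive tabs); the pairs array is an illustrative name:

```shell
#!/usr/bin/env bash
# Sample data from the question, embedded so the sketch runs on its own.
json='[
{"original_name":"pdf_convert","changed_name":"pdf_convert_1"},
{"original_name":"video_encode","changed_name":"video_encode_1"},
{"original_name":"video_transcode","changed_name":"video_transcode_1"}
]'

pairs=()
# @tsv joins each two-element array with a tab; IFS=$'\t' splits it again.
while IFS=$'\t' read -r original_name changed_name; do
    pairs+=("$original_name -> $changed_name")
done < <(jq -r '.[] | [.original_name, .changed_name] | @tsv' <<< "$json")

printf '%s\n' "${pairs[@]}"
```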

jq has a shell formatting option: @sh.
You can use the following to format your JSON data as shell parameters:
cat data.json | jq '. | map([.original_name, .changed_name])' | jq @sh
The output will look like:
"'pdf_convert' 'pdf_convert_1'"
"'video_encode' 'video_encode_1'",
"'video_transcode' 'video_transcode_1'"
To process each row, we need to do a couple of things:
Set the bash for-loop to read the entire row, rather than stopping at the first space (default behavior).
Strip the enclosing double-quotes off of each row, so each value can be passed as a parameter to the function which processes each row.
To read the entire row on each iteration of the bash for-loop, set the IFS variable, as described in this answer.
To strip off the double-quotes, we'll run it through the bash shell interpreter using xargs:
stripped=$(echo $original | xargs echo)
Putting it all together, we have:
#!/bin/bash
function processRow() {
original_name=$1
changed_name=$2
# TODO
}
IFS=$'\n' # Each iteration of the for loop should read until we find an end-of-line
for row in $(cat data.json | jq '. | map([.original_name, .changed_name])' | jq @sh)
do
# Run the row through the shell interpreter to remove enclosing double-quotes
stripped=$(echo $row | xargs echo)
# Call our function to process the row
# eval must be used to interpret the spaces in $stripped as separating arguments
eval processRow $stripped
done
unset IFS # Return IFS to its original value
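A shorter variant of the same idea, as a sketch: with jq's raw output option (-r), each @sh-formatted row is printed without the enclosing double quotes, so the xargs step and the IFS change are not needed. The results array and the body of processRow are illustrative assumptions. As with any use of eval, this relies on @sh quoting every value:

```shell
#!/usr/bin/env bash
# Sample data from the question, embedded so the sketch runs on its own.
json='[
{"original_name":"pdf_convert","changed_name":"pdf_convert_1"},
{"original_name":"video_encode","changed_name":"video_encode_1"},
{"original_name":"video_transcode","changed_name":"video_transcode_1"}
]'

results=()
processRow() {
    # Hypothetical handler: records the two arguments it receives.
    results+=("$1=$2")
}

# Each row looks like: 'pdf_convert' 'pdf_convert_1'
while IFS= read -r row; do
    eval "processRow $row"
done < <(jq -r '.[] | [.original_name, .changed_name] | @sh' <<< "$json")

printf '%s\n' "${results[@]}"
```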

From Iterate over json array of dates in bash (has whitespace)
items=$(echo "$JSON_Content" | jq -c -r '.[]')
for item in ${items[@]}; do
echo $item
# whatever you are trying to do ...
done

Try building your solution around this example.
Example:
jq '[foreach .[] as $item ([[],[]]; if $item == null then [[],.[0]] else [(.[0] + [$item]),[]] end; if $item == null then .[1] else empty end)]'
Input [1,2,3,4,null,"a","b",null]
Output [[1,2,3,4],["a","b"]]

None of the answers here worked for me, out-of-the-box.
What did work was a combination of a few:
projectList=$(echo "$projRes" | jq -c '.projects[]')
IFS=$'\n' # Read till newline
for project in ${projectList[@]}; do
projectId=$(jq '.id' <<< "$project")
projectName=$(jq -r '.name' <<< "$project")
...
done
unset IFS
NOTE: I'm not using the same data as the question does, in this example assume projRes is the output from an API that gives us a JSON list of projects, eg:
{
"projects": [
{"id":1,"name":"Project"},
... // array of projects
]
}

An earlier answer in this thread suggested using jq's foreach, but that may be much more complicated than needed, especially given the stated task. Specifically, foreach (and reduce) are intended for certain cases where you need to accumulate results.
In many cases (including some cases where eventually a reduction step is necessary), it's better to use .[] or map(_). The latter is just another way of writing [.[] | _] so if you are going to use jq, it's really useful to understand that .[] simply creates a stream of values.
For example, [1,2,3] | .[] produces a stream of the three values.
To take a simple map-reduce example, suppose you want to find the maximum length of an array of strings. One solution would be [ .[] | length] | max.
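The points above can be checked from the shell. A small sketch (the variable names are illustrative) showing the stream produced by .[], the equivalence of map(f) and [.[] | f], and the map-reduce example:

```shell
#!/usr/bin/env bash
# .[] emits each element as a separate output, one per line with -c.
stream=$(jq -c '.[]' <<< '[1,2,3]')

# map(f) and [.[] | f] collect the transformed stream back into an array.
mapped=$(jq -c 'map(. * 2)' <<< '[1,2,3]')
collected=$(jq -c '[.[] | . * 2]' <<< '[1,2,3]')

# Map each string to its length, then reduce with max.
longest=$(jq '[.[] | length] | max' <<< '["a","abc","ab"]')

printf 'stream:\n%s\n' "$stream"
printf 'mapped: %s\ncollected: %s\nlongest: %s\n' "$mapped" "$collected" "$longest"
```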

Here is a simple example that works in the zsh shell:
DOMAINS='["google","amazon"]'
arr=$(echo $DOMAINS | jq -c '.[]')
for d in $arr; do
printf "Here is your domain: ${d}\n"
done

I stopped using jq and started using jp, since JMESpath is the same language as used by the --query argument of my cloud service and I find it difficult to juggle both languages at once. You can quickly learn the basics of JMESpath expressions here: https://jmespath.org/tutorial.html
Since you didn't specifically ask for a jq answer but instead, an approach to iterating JSON in bash, I think it's an appropriate answer.
Style points:
I use backticks and those have fallen out of fashion. You can substitute with another command substitution operator.
I use cat to pipe the input contents into the command. Yes, you can also specify the filename as a parameter, but I find this distracting because it breaks my left-to-right reading of the sequence of operations. Of course you can update this from my style to yours.
set -u is not needed for this solution, but it is useful while you are experimenting with bash to get something working. It makes the shell treat any reference to an unset variable as an error, so a misspelled variable name is caught immediately.
Here's how I do it:
#!/bin/bash
set -u
# exploit the JMESpath length() function to get a count of list elements to iterate
export COUNT=`cat data.json | jp "length( [*] )"`
# The `seq` command produces the sequence `0 1 2` for our indexes
# The $(( )) operator in bash produces an arithmetic result ($COUNT minus one)
for i in `seq 0 $((COUNT - 1))` ; do
# The list elements in JMESpath are zero-indexed
echo "Here is element $i:"
cat data.json | jp "[$i]"
# Add or replace whatever operation you like here.
done
Now, it would also be a common use case to pull the original JSON data from an online API and not from a local file. In that case, I use a slightly modified technique of caching the full result in a variable:
#!/bin/bash
set -u
# cache the JSON content in a stack variable, downloading it only once
export DATA=`api --profile foo compute instance list --query "bar"`
export COUNT=`echo "$DATA" | jp "length( [*] )"`
for i in `seq 0 $((COUNT - 1))` ; do
echo "Here is element $i:"
echo "$DATA" | jp "[$i]"
done
This second example has the added benefit that if the data is changing rapidly, you are guaranteed to have a consistent count between the elements you are iterating through, and the elements in the iterated data.
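For readers who do not have jp installed, the same index-based pattern can be written with jq. This is a swapped-in sketch, not the method used in the answer above: jq's length stands in for the JMESPath length() call, and --argjson passes the loop index into the filter. The sample data comes from the question and the names array is illustrative:

```shell
#!/usr/bin/env bash
# Sample data from the question, embedded so the sketch runs on its own.
json='[
{"original_name":"pdf_convert","changed_name":"pdf_convert_1"},
{"original_name":"video_encode","changed_name":"video_encode_1"},
{"original_name":"video_transcode","changed_name":"video_transcode_1"}
]'

# Count the elements once, then address each one by its zero-based index.
count=$(jq 'length' <<< "$json")
names=()
for ((i = 0; i < count; i++)); do
    names+=("$(jq -r --argjson i "$i" '.[$i].original_name' <<< "$json")")
done

printf '%s\n' "${names[@]}"
```

Note that this starts one jq process per element, so it is slower than streaming the array with .[] when the array is large.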

This is what I have done so far
arr=$(echo "$array" | jq -c -r '.[]')
for item in ${arr[@]}; do
original_name=$(echo $item | jq -r '.original_name')
changed_name=$(echo $item | jq -r '.changed_name')
echo $original_name $changed_name
done

Related

What is the proper way of growing a json array using jq inside a bash script?

I am trying to construct a json array using jq element by element. The elements are being generated by a certain process. In this example I am keeping all the elements as the same, let's say {"key_1":1} for simplicity.
declare JSON_ARRAY=[]
total_count=10000
OBJECT="{\"key_1\":1}"
for i in $(seq 0 $total_count); do
JSON_ARRAY=$(echo "$JSON_ARRAY" | jq .[$i]+="$OBJECT")
done
echo "$JSON_ARRAY" | jq -c
I want the output from the code above to be a json array, such as for 3 elements:
[{"key_1":1}, {"key_1":1}, {"key_1":1}]
For smaller values of counter this would work but for large values like 10000 this leads to parse error or Aborted (core dumped). It seems that the script runs out of memory trying to build large arrays. What could be the reasonable approach to doing this using jq? In this case all the elements are identical, but in my situation the elements (each a json object) are being generated in runtime within the loop such as the following:
for i in $(seq 0 $total_count); do
OBJECT=$(build_object)
JSON_ARRAY=$(echo "$JSON_ARRAY" | jq .[$i]+="$OBJECT")
done
If you have a loop (or any sequence of commands) which generates a stream of JSON objects, you can slurp them with jq and get your array for free.
Loop:
for i in $(seq 100); do
echo '{"key": "value"}';
done | jq -s '.' > output.json
Sequence:
{
echo '{"key": "value"}';
echo '{"other_key": "second value"}';
echo '{"last": 42}';
} | jq -s '.' > output.json
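Applied to the question's loop, the same idea might look like the sketch below. The build_object function is a hypothetical stand-in for whatever generates each element at runtime; here it uses jq -n with --argjson so that the object is built by jq rather than by string concatenation:

```shell
#!/usr/bin/env bash
build_object() {
    # Hypothetical generator: emits one compact JSON object per call.
    jq -n -c --argjson n "$1" '{key_1: $n}'
}

# The loop only prints objects; a single jq -s at the end collects them
# into one array, so the array is never re-parsed on every iteration.
result=$(for i in 1 2 3; do build_object "$i"; done | jq -s -c '.')

echo "$result"
```

Because the array is assembled once, the cost grows linearly with the number of elements instead of re-reading the whole array on every iteration.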

jq: insert new objects while reading inputs from json file and bash stdout

I want to insert new json objects in between json objects using bash generated uuid.
input json file test.json
{"name":"a","type":1}
{"name":"b","type":2}
{"name":"c","type":3}
input bash command uuidgen -r
target output json
{"id": "7e3ca7b0-48f1-41fe-9a19-092a62cba0dc"}
{"name":"a","type":1}
{"id": "3f793fdd-ec3b-4306-8153-12f3f9faf2c1"}
{"name":"b","type":2}
{"id": "cbcd759a-37e7-4da7-b7fe-7572f474ec31"}
{"name":"c","type":3}
basic jq program to insert new objects
jq -c '{"id"}, .' test.json
output json
{"id":null}
{"name":"a","type":1}
{"id":null}
{"name":"b","type":2}
{"id":null}
{"name":"c","type":3}
jq program to insert uuid generated from bash:
jq -c '{"id" | input}, .' test.json < <(uuidgen)
Unsure about how to handle two inputs, bash command used to create a value in the new object, and the input file to be transformed (new object inserted in between each object).
I want to process small and large json files up to a few gigabytes each.
I would greatly appreciate some help with a well-designed solution that scales to large files and performs the operations quickly and efficiently.
Thanks in advance.
If the input file is already well-formed JSONL, then a simple bash solution would be:
while IFS= read -r line; do
printf "{\"id\": \"%s\"}\n" "$(uuidgen)"
printf '%s\n' "$line"
done < test.json
This might well be the best trivial solution if test.json is very large and known to be valid JSONL.
If the input file is not already JSONL, then you could still use the above approach by piping in jq -c . test.json. And if ‘read’ is too slow, you could still use the above text-processing approach with awk.
For the record, a single-call-to-jq solution along the lines you have in mind could be constructed as follows:
jq -n -c -R --slurpfile objects test.json '
$objects[] | {"id": input}, .' <(while true ; do uuidgen ; done)
Obviously you cannot "slurp" the unbounded stream of uuidgen values; less obviously perhaps, if you were simply to pipe in the stream, the process will hang.
Since @peak has already covered the jq side of the problem, I'm going to take a shot at doing this more efficiently using Python, still wrapped so it can be called in a shell script.
This assumes that your input is JSONL, with one document per line. If it isn't, consider piping through jq -c . before piping into the below.
#!/usr/bin/env bash
py_prog=$(cat <<'EOF'
import json, sys, uuid
for line in sys.stdin:
print(json.dumps({"id": str(uuid.uuid4())}))
sys.stdout.write(line)
EOF
)
python -c "$py_prog" <in.json >out.json
Here's another approach where jq is handling input as raw string, already muxed by a separate copy of bash.
while IFS= read -r line; do
uuidgen
printf '%s\n' "$line"
done | jq -Rrc '({ "id": . }, input)'
It still has all the performance overhead of calling uuidgen once per input line (plus some extra overhead because bash's read operates one byte at a time) -- but it operates in a fixed amount of memory without needing Python.
If the input was not known in advance to be valid JSONL,
one of the following bash+jq solutions might make sense
since the overhead of counting the number of objects would be relatively small.
If the input is small enough to fit in memory, you could go with a simple solution:
n=$(jq -n 'reduce inputs as $in (0; .+1)' test.json)
for ((i=0; i < $n; i++)); do uuidgen ; done |
jq -n -c -R --slurpfile objects test.json '
$objects[] | {"id": input}, .'
Otherwise, that is, if the input is very large, then one could avoid slurping it as follows:
n=$(jq -n 'reduce inputs as $in (0; .+1)' test.json)
jq -nc --rawfile ids <(for ((i=0; i < $n; i++)); do uuidgen ; done) '
$ids | split("\n") as $ids
| foreach inputs as $in (-1; .+1; {id: $ids[.]}, $in)
' test.json

How to create a dictionary from a text file in bash?

I want to create a dictionary in bash from a text file which looks like this:
H96400275|A
H96400276|B
H96400265|C
H96400286|D
Basically I want a dictionary like this from this file file.txt:
KEYS VALUES
H96400275 = A
H96400276 = B
H96400265 = C
H96400286 = D
I created following script:
#!/bin/bash
declare -a dictionary
while read line; do
key=$(echo $line | cut -d "|" -f1)
data=$(echo $line | cut -d "|" -f2)
dictionary[$key]="$data"
done < file.txt
echo ${dictionary[H96400275]}
However, this does not print A, rather it prints D. Can you please help ?
Associative arrays (dictionaries in your terms) are declared using -A, not -a. For references to indexed (ones declared with -a) arrays' elements, bash performs arithmetic expansion on the subscript ($key and H96400275 in this case); so you're basically overwriting dictionary[0] over and over, and then asking for its value; thus D is printed.
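The overwriting can be reproduced in a few lines. A minimal sketch, assuming bash 4 or later for declare -A; nounset is switched off explicitly because the demonstration depends on unset names evaluating to 0 in arithmetic context:

```shell
#!/usr/bin/env bash
set +u   # with nounset on, the unset names below would be an error instead

# Indexed array: the subscript is evaluated arithmetically, and an unset
# variable name such as H96400275 evaluates to 0.
declare -a indexed=()
indexed[H96400275]=A
indexed[H96400286]=D   # same index 0, so this overwrites A
echo "indexed: ${#indexed[@]} element(s), indexed[0]=${indexed[0]}"

# Associative array: the subscript is used as a literal string key.
declare -A assoc=()
assoc[H96400275]=A
assoc[H96400286]=D
echo "assoc: ${#assoc[@]} element(s), assoc[H96400275]=${assoc[H96400275]}"
```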
To make your script more efficient, you can use read with a custom IFS to avoid calling cut. For example:
declare -A dict
while IFS='|' read -r key value; do
dict[$key]=$value
done < file
echo "${dict[H96400275]}"
See Bash Reference Manual § 6.7 Arrays.
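Once the dictionary is filled, its keys can be listed with "${!dict[@]}". A self-contained sketch, assuming bash 4 or later, that reads the question's sample data from a here-document instead of a file:

```shell
#!/usr/bin/env bash
declare -A dict=()
while IFS='|' read -r key value; do
    dict[$key]=$value
done <<'EOF'
H96400275|A
H96400276|B
H96400265|C
H96400286|D
EOF

# "${!dict[@]}" expands to the keys; their order is not guaranteed.
for key in "${!dict[@]}"; do
    printf '%s = %s\n' "$key" "${dict[$key]}"
done

# Check whether a key exists without relying on its value being non-empty.
if [[ -n ${dict[H96400276]+set} ]]; then
    echo "H96400276 is present"
fi
```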
The only problem is that you have to use -A instead of -a:
-a Each name is an indexed array variable (see Arrays above).
-A Each name is an associative array variable (see Arrays above).
What you want is called an associative array. To declare one, use:
declare -A dictionary

How can I convert a "key: value" sequence into JSON?

Okay, I am trying to write a script that takes the information from yum repolist all and puts it into pretty JSON for me to use in some data collection. Right now the output from my yum command looks like the sample below.
All I have for code right now is just the yum repolist command.
#!/bin/bash -x
yum -v repolist all | grep -B2 -A6 "enabled" | sed 's/[[:space:]]//g' , 's/--//g' , 's/name=name=/name=/g'
the output from that command looks like:
Repo-id: wazuh_repo
Repo-name: Wazuhrepository
Repo-status: enabled
Repo-revision: 1536348945
Repo-updated: FriSep712:35:512018
Repo-pkgs: 73
Repo-size: 920M
Repo-baseurl: https://packages.wazuh.com/3.x/yum/
Repo-expire: 21,600second(s)(last:WedOct3108:59:002018)
There are about 8 entries and the titles are always the same. Can someone explain like I am five how to convert this into JSON? I've read the jq man page and I've read about hashes, but nothing seems to make sense. I know I need a "key"/"value" structure; how do I designate these?
I just want to take the output and make it look like pretty JSON. This is part of a larger script I am writing to help keep on top of the repos we use at work. I am just totally not getting JSON, though.
edit: I would prefer not to use a wrapper function and do/learn the proper way
So, first, so people who don't have yum can test this, let's make a wrapper function:
write_output() { cat <<EOF
Repo-id: wazuh_repo
Repo-name: Wazuhrepository
Repo-status: enabled
Repo-revision: 1536348945
Repo-updated: FriSep712:35:512018
Repo-pkgs: 73
Repo-size: 920M
Repo-baseurl: https://packages.wazuh.com/3.x/yum/
Repo-expire: 21,600second(s)(last:WedOct3108:59:002018)
EOF
}
Notably, all your keys come before the string ": ", and the values come after it. So we want to read line by line, split on colon-space sequences, treat what was in front as a key, and treat what's in back as a value.
Given that:
jq -Rn '[inputs | split(": ")] | reduce .[] as $kv ({}; .[$kv[0]] = $kv[1])' < <(write_output)
...properly emits:
{
"Repo-id": "wazuh_repo",
"Repo-name": "Wazuhrepository",
"Repo-status": "enabled",
"Repo-revision": "1536348945",
"Repo-updated": "FriSep712:35:512018",
"Repo-pkgs": "73",
"Repo-size": "920M",
"Repo-baseurl": "https://packages.wazuh.com/3.x/yum/",
"Repo-expire": "21,600second(s)(last:WedOct3108:59:002018)"
}
...so, how does that work?
jq -R turns on raw input mode; input is parsed as a sequence of raw strings, not as a sequence of JSON documents.
jq -n treats null as the only direct input, so one can then use input and inputs primitives inside the script where needed.
[ inputs ] reads all your lines of input, and puts them into a single array.
[ inputs | split(": ")] changes that from an array of strings to an array of lists -- with content both before and after the ": " sequence.
reduce .[] as $kv ( {}; ... ) starts a reducer, with an initial value of {}, and then feeds each value that .[] evaluates to (which is to say, each item in your list) into that reducer (the ... code) as the $kv variable, replacing the . value each time.
To run this with your yum command as the real input, change < <(write_output) to < <(yum -v repolist all | grep -B2 -A6 "enabled" | sed 's/[[:space:]]//g' , 's/--//g' , 's/name=name=/name=/g').
Here is a slightly more robust variation of @CharlesDuffy's answer. Since the latter provides excellent explanatory notes, further explanations are not given here.
jq -nR '
[inputs | index(": ") as $ix | {(.[:$ix]): .[$ix+2:]}]
| add'
This avoids using split in case the "value" contains ": ". It might, however, be still better not to assume that a space follows the first relevant ":".
Notice also that add is used here instead of reduce, solely for compactness and simplicity.
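The difference between the two approaches shows up as soon as a value contains ": " itself. A small sketch using a made-up input line (note: a: b is not real yum output) that runs both filters on the same text:

```shell
#!/usr/bin/env bash
# Hypothetical input whose value contains a second ": " sequence.
line='note: a: b'

# split(": ") yields ["note","a","b"], so only "a" is kept as the value.
with_split=$(jq -Rnc '[inputs | split(": ")]
    | reduce .[] as $kv ({}; .[$kv[0]] = $kv[1])' <<< "$line")

# index(": ") finds only the first separator, so the whole remainder is kept.
with_index=$(jq -Rnc '[inputs | index(": ") as $ix | {(.[:$ix]): .[$ix+2:]}]
    | add' <<< "$line")

echo "split: $with_split"
echo "index: $with_index"
```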
For these sorts of problems, I would prefer to use a regular expression to match keys and values. Otherwise, I would take an approach similar to Charles's.
$ ... | jq -Rn 'reduce (inputs | capture("(?<k>[^:]+):\\s*(?<v>.+)")) as {$k, $v} ({}; .[$k] = $v)'

Looping through json array not working - jq

I have a JSON array conf=
[ { "fraudThreshold": 4, "fraudTTLSec": 60 }, { "fraudThreshold": 44, "fraudTTLSec": 60 } ]
I want to loop through its items. So I have done the following:
for configy in $(echo "${conf}" | jq -r ".[]"); do
echo configy=$configy
done
The results are:-
configy={
configy="fraudThreshold":
configy=4,
configy="fraudTTLSec":
and so on.
It is splitting the string using spaces and giving the results one by one.
Why is bash showing this weird behavior? Is there any solution to this?
Also, it is giving proper values when I do :
configy=$(echo $conf | jq .[-1])
echo configy=$configy
Result:
configy={ "fraudThreshold": 44, "fraudTTLSec": 60 }
In order to loop through the items in the JSON array using bash, you could write:
echo "${conf}" | jq -cr ".[]" |
while read -r configy
do
echo configy="$configy"
done
This yields:
configy={"fraudThreshold":4,"fraudTTLSec":60}
configy={"fraudThreshold":44,"fraudTTLSec":60}
However there is almost surely a better way to achieve your ultimate goal.
echo "${conf}" | jq -car '.[] | "configy=" + tojson'
produces:
configy={"fraudThreshold":4,"fraudTTLSec":60}
configy={"fraudThreshold":44,"fraudTTLSec":60}
for configy in $(echo "${conf}" | jq -r ".[]"); do
It is splitting the string using spaces and giving the results one by one. Why is bash showing this weird behavior?
This behavior is not weird at all. See the Bash Reference Manual: Word Splitting:
The shell scans the results of parameter expansion, command
substitution, and arithmetic expansion that did not occur within
double quotes for word splitting.
Is there any solution to this?
Mâtt Frëëman and peak presented working solutions; you can slightly optimize them by replacing echo "${conf}" | with <<<"$conf".
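The effect of word splitting can be made visible by counting loop iterations. A sketch using the conf value from the question and the here-string form mentioned above; the counters words and items are illustrative names:

```shell
#!/usr/bin/env bash
conf='[ { "fraudThreshold": 4, "fraudTTLSec": 60 }, { "fraudThreshold": 44, "fraudTTLSec": 60 } ]'

# Unquoted command substitution: the pretty-printed output is split on
# every space and newline, so the loop sees fragments, not objects.
words=0
for fragment in $(jq -r '.[]' <<< "$conf"); do
    words=$((words + 1))
done

# Compact output read line by line: one iteration per array element.
items=0
while IFS= read -r configy; do
    items=$((items + 1))
done < <(jq -c '.[]' <<< "$conf")

echo "for loop iterations: $words"
echo "while loop iterations: $items"
```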

Resources