Generate json file with formatting - bash

I have a curl command which generates JSON output. I want to add a few characters to the generated file so that I can process it further.
Command:
curl -sN --negotiate -u foo:bar "http://hostname/db/tbl_name/" >> db.json
This runs inside a for loop that invokes it for each db and tbl_name combination. Hence it ends up generating a number of JSON outputs (one for each table) concatenated together without any delimiter.
Output looks like:
{"columns":[{"name":"tbl_id","type":"varchar(50)"},{"name":"cret_timestmp","type":"timestamp"},{"name":"updt_timestmp","type":"timestamp"},{"name":"frst_nm","type":"varchar(50)"},{"name":"last_nm","type":"varchar(50)"},{"name":"acct_num","type":"varchar(15)"},{"name":"r_num","type":"varchar(15)"},{"name":"pid","type":"decimal(15,0)"},{"name":"ami_id","type":"varchar(30)"},{"name":"ssn","type":"varchar(9)"},{"name":"client_id","type":"varchar(30)"},{"name":"client_nm","type":"varchar(100)"},{"name":"info","type":"timestamp"},{"name":"rmx","type":"varchar(10)"},{"name":"id","type":"decimal(12,0)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"db_tbl"}{"columns":[{"name":"key","type":"varchar(15)"},{"name":"foo_cd","type":"varchar(10)"},{"name":"foo_nm","type":"varchar(56)"},{"name":"tmc_regn_cd","type":"varchar(10)"},{"name":"tmc_mrkt_cd","type":"varchar(20)"},{"name":"mrkt_grp","type":"varchar(30)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"ss_mv"}{"columns":[{"name":"bar_src_name","type":"string"},{"name":"bar_ent_name","type":"string"},{"name":"from_src","type":"string"},{"name":"reload","type":"string"},{"name":"column_mismatch","type":"string"},{"name":"xx_src_name","type":"string"},{"name":"xx_ent_name","type":"string"}],"database":"db_i","table":"test_table"}
The desired output should start and end with [], and have a "," between the end of one object and the beginning of the next (where the column list starts).
So for ex: if the curl command runs against 3 tables as shown above, then the three generated jsons should be created like :
[{json1},{json2},{json3}]
The numbers 1, 2, 3, etc. correspond to the different tables the curl command runs against in the for loop for a particular db; their JSON should all land in one file, but in the desired format.
instead of what I'm currently getting:
{json1}{json2}{json3}
In the output pasted above, JSON 1 is:
{"columns":[{"name":"tbl_id","type":"varchar(50)"},{"name":"cret_timestmp","type":"timestamp"},{"name":"updt_timestmp","type":"timestamp"},{"name":"frst_nm","type":"varchar(50)"},{"name":"last_nm","type":"varchar(50)"},{"name":"acct_num","type":"varchar(15)"},{"name":"r_num","type":"varchar(15)"},{"name":"pid","type":"decimal(15,0)"},{"name":"ami_id","type":"varchar(30)"},{"name":"ssn","type":"varchar(9)"},{"name":"client_id","type":"varchar(30)"},{"name":"client_nm","type":"varchar(100)"},{"name":"info","type":"timestamp"},{"name":"rmx","type":"varchar(10)"},{"name":"id","type":"decimal(12,0)"},{"name":"ingest_timestamp","type":"string"},
{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"db_tbl"}
JSON 2 is:
{"columns":[{"name":"key","type":"varchar(15)"},{"name":"foo_cd","type":"varchar(10)"},{"name":"foo_nm","type":"varchar(56)"},{"name":"tmc_regn_cd","type":"varchar(10)"},{"name":"tmc_mrkt_cd","type":"varchar(20)"},{"name":"mrkt_grp","type":"varchar(30)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"ss_mv"}
JSON 3 is:
{"columns":[{"name":"bar_src_name","type":"string"},{"name":"bar_ent_name","type":"string"},{"name":"from_src","type":"string"},{"name":"reload","type":"string"},{"name":"column_mismatch","type":"string"},{"name":"xx_src_name","type":"string"},{"name":"xx_ent_name","type":"string"}],"database":"db_i","table":"test_table"}
I hope the requirement is clear. Thanks in advance; I'm looking to achieve this via bash.

Use jq -s.
--slurp/-s: Instead of running the filter for each JSON object in the input, read the entire input stream into a large array
and run the filter just once.
Here's an example:
$ cat file.json
{ "key": "value1" }
{ "key": "value2" }
{ "key":
"value3"}{"key": "value4"}
$ jq -s . < file.json
[
  {
    "key": "value1"
  },
  {
    "key": "value2"
  },
  {
    "key": "value3"
  },
  {
    "key": "value4"
  }
]
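Applied to the question's loop, you can pipe all the curl outputs through jq -s once instead of appending raw output to db.json. A minimal sketch, assuming a hypothetical tables array holding the tbl_name values for one db:

# tables is a hypothetical array of table names; adapt to your loop
for tbl in "${tables[@]}"; do
  curl -sN --negotiate -u foo:bar "http://hostname/db/$tbl/"
done | jq -s . > db.json

Add -c to jq if you want the compact [{json1},{json2},{json3}] form on a single line.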

I'm not sure if I got it correctly, but I think you are looking for something like
echo "[$(cat *.json | paste -sd ',')]" > result.json
This works by creating a string that starts with [ and ends with ], and in the middle there are the contents of the JSON files, concatenated (cat) and separated by commas (with the help of paste). That string is echoed and written to a new file. Note that this assumes each input file holds its JSON on a single line; paste joins lines, so a multi-line document would get a comma inserted at every line break.

Presuming input in valid JSONL format (one JSON document per line of input), you can embed a Python script inside your bash script:
slurpjson_py='
import json, sys
json.dump([json.loads(line.strip()) for line in sys.stdin], sys.stdout, indent=4)
sys.stdout.write("\n")
'
slurpjson() { python -c "$slurpjson_py" "$@"; }
If called as:
slurpjson <<EOF
{ "first": "document", "starting": "here" }
{ "second": "document", "ending": "here" }
EOF
...output is correctly:
[
    {
        "starting": "here",
        "first": "document"
    },
    {
        "second": "document",
        "ending": "here"
    }
]

I managed to achieve this by running the curl command and adding a "," at every line break using
sed 's/$/,/'
And then removed the last "," and added the enclosing [ and ] using:
for i in *; do cat "$i" | sed '$ s/.$//' | awk '{print "["$0"]"}' > "$json_dir/$i"; done
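For comparison, a sed-only sketch that does the wrapping in one pass, assuming the objects sit concatenated on a single line and "}{" never occurs inside a string value:

# db_formatted.json is a hypothetical output name
sed 's/}{/},{/g; s/^/[/; s/$/]/' db.json > db_formatted.json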

Related

How to specify jq output fields from variable in bash?

Given the following (simplified) JSON file:
{
  "data": [
    {
      "datum": "2023-01-11 00:00:00",
      "prijs": "0.005000",
      "prijsZP": "0.161550",
      "prijsEE": "0.181484",
      "prijsTI": "0.160970"
    },
    {
      "datum": "2023-01-11 01:00:00",
      "prijs": "0.000000",
      "prijsZP": "0.155500",
      "prijsEE": "0.175434",
      "prijsTI": "0.154920"
    }
  ]
}
I want to specify in my jq command which fields to retrieve, i.e. only "datum" and "prijsTI". But at another moment this selection will be different.
I use the following command to gather all the fields, but would like to set the field selection via a variable:
cat data.json | jq -r '.data[] | [.datum, .prijsTI] | @csv'
I already tried using arguments, but this did not work :-(
myJQselect=".datum, .prijsTI"
cat data.json | jq -r --arg myJQselect "$myJQselect" '.data[$myHour |tonumber]|[$myJQselect]|@csv'
gives the following result: ".datum, .prijs" instead of the correct values.
Would this be possible?
Thanks,
Jeroen
You can use the --args option to provide a variable number of fields to query, then use the $ARGS.positional array to retrieve them:
jq -r '.data[] | [.[$ARGS.positional[]]] | @csv' data.json --args datum prijsTI
"2023-01-11 00:00:00","0.160970"
"2023-01-11 01:00:00","0.154920"

How to use sed command to replace value in json file

My json file looks like this:
"parameters": {
"$connections": {
"value": {
"azureblob": {
"connectionId": "/subscriptions/2b06d50xxxxxedd021/resourceGroups/Reource1005/providers/Microsoft.Web/connections/azureblob",
"connectionName": "azureblob",
"connectionProperties": {
"authentication": {
"type": "ManagedServiceIdentity"
}
},
"id": "/subscriptions/2b06d502-3axxxxxxedd021/providers/Microsoft.Web/locations/eastasia/managedApis/azureblob"
},
"office365": {
"connectionId": "/subscriptions/2b06d502xxxxxc8-5a8939edd021/resourceGroups/Reource1005/providers/Microsoft.Web/connections/office365",
"connectionName": "office365",
"id": "/subscriptions/2b06d50xxxxxx939edd021/providers/Microsoft.Web/locations/eastasia/managedApis/office365"
}
}
}
}
}
I want to use a sed command to replace the string in connectionId; currently my script is as follows:
script: 'sed -e ''/connectionId/c\ \"connectionId\" : \"/subscriptions/2b06d50xxxxb-92c8-5a8939edd021/resourceGroups/Reourcetest/providers/Microsoft.Web/connections/azureblob\",'' "$(System.DefaultWorkingDirectory)/function-app-actions/templates/copycode.json"'
This script replaces both connectionId lines in the json file with the same azureblob value; how can I replace just part of each string, so that the second connectionId keeps its own value?
I'm new to sed commands; any insight is appreciated.
Edit:
I just want to replace "Resource1005" in both connectionId strings in the json file with "Resourcetest", but I need the rest of each connectionId string to keep its previous value.
So my expected output should look like this:
"connectionId": "/subscriptions/2b06d502-3axxxx8939edd021/resourceGroups/Reourcetest/providers/Microsoft.Web/connections/azureblob"
"connectionId": "/subscriptions/2b06d502-3axxxx8939edd021/resourceGroups/Reourcetest/providers/Microsoft.Web/connections/office365"
If I use the script I mentioned above, it does replace the two Resource1005s, but the other values in the string are also replaced with the same text (I just want to replace the Resource1005 value).
1st solution: With your shown samples and attempts, please try the following GNU awk code. This will print only the edited lines (as per the shown samples), substituting Resource1005 with Resourcetest in the values.
awk -v RS='[[:space:]]+"connectionId": "[^"]*' '
RT{
  sub(/\n+[[:space:]]+/,"",RT)
  sub(/\/Resource1005\//,"/Resourcetest/",RT)
  print RT
}
' Input_file
2nd solution: With sed, you can try the following sed code.
sed -nE 's/(^[[:space:]]+"connectionId": ".*)\/Resource1005\/(.*)/\1\/Resourcetest\/\2/p' Input_file
Common practice is to create template files and change them with sed or something else. Like this for example:
cat template.json
...
"office365": {
  "connectionId": "__CONNECTIONID__",
  "connectionName": "office365",
  "id": "__ID__"
}
...
sed 's|__CONNECTIONID__|/some/path|; s|__ID__|/some/other/path|' template.json > new.json
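Since the file is JSON, a jq-based alternative avoids touching the rest of the string; a sketch, assuming the fragment above sits inside a complete JSON document, and matching the spelling used in the sample file:

# sub() rewrites only the matching segment of each connectionId
jq '(.parameters["$connections"].value[].connectionId) |= sub("/Reource1005/"; "/Reourcetest/")' copycode.json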

jq: iterate over every element of list and replace it with value

I've got this json-file:
{
  "name": "market",
  "type": "grocery",
  "shelves": {
    "upper_one": [
      "23423565",
      "23552352",
      "08789089"
    ]
  }
}
I need to iterate over every element of a list (upper_one) and replace it with another value.
I've tried this code:
#!/bin/bash
for product in $(cat first-shop.json | jq -r '.shelves.upper_one[]')
do
  cat first-shop.json | jq --arg id "$((1 + $RANDOM % 10))" --arg product "$product" -r '.shelves.upper_one[]|select(. == $product)|= $id'
done
But I got this kind of output:
1
23552352
08789089
23423565
10
08789089
23423565
23552352
7
Is it possible to iterate over a list with jq, replace each value with a value from elsewhere (like $id in the code), and print the whole final JSON with the substituted values?
I need this kind of output:
{
  "name": "market",
  "type": "grocery",
  "shelves": {
    "upper_one": [
      "1",
      "10",
      "7"
    ]
  }
}
not just the elements of the "upper_one" list printed three times.
You could try the following script:
#!/usr/bin/env bash
newIds=()
for product in $(jq -r '.shelves.upper_one[]' input.json)
do
  id="$((1 + $RANDOM % 10))"
  newIds+=("$id")
done
jq '.shelves.upper_one = $ARGS.positional' input.json --args "${newIds[@]}"
IMHO it's better to use some scripting language and manipulate objects programmatically. If bash and jq are your only option, this does the job, though it isn't pretty:
$ jq '.shelves.upper_one[] |= (sub("23423565";"1") | sub("23552352";"10") | sub("08789089";"7"))' your.json
{
  "name": "market",
  "type": "grocery",
  "shelves": {
    "upper_one": [
      "1",
      "10",
      "7"
    ]
  }
}
Consider converting the values to numbers with | tonumber.
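For example, a sketch building on the first answer's script (assuming the newIds array built in its loop), converting the replacement values to numbers while assigning them:

# map(tonumber) turns the positional string arguments into numbers
jq '.shelves.upper_one = ($ARGS.positional | map(tonumber))' input.json --args "${newIds[@]}"

This yields upper_one as [1, 10, 7] rather than strings.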

cannot call bash environment variable inside jq

In the below script, I am not able to successfully call the "repovar" variable in the jq command.
cat quayrepo.txt | while read line
do
export repovar="$line"
jq -r --arg repovar "$repovar" '.data.Layer| .Features[] | "\(.Name), \(.Version), $repovar"' severity.json > volume.csv
done
The script uses a text file to loop through the repo names
quayrepo.txt ---> this file has the list of repo names; in this case it contains the value "Reponame1"
sample input severity.json file:
{
  "status": "scanned",
  "data": {
    "Layer": {
      "IndexedByVersion": 3,
      "Features": [
        {
          "Name": "elfutils",
          "Version": "0.168-1",
          "Vulnerabilities": [
            {
              "NamespaceName": "debian:9",
              "Severity": "Medium",
              "Name": "CVE-2016-2779"
            }
          ]
        }
      ]
    }
  }
}
desired output:
elfutils, 0.168-1, Medium, Reponame1
Required output: I need to retrieve the value of my environment variable as the last column in my output csv file
You need to surround $repovar with parentheses, like the other values:
repovar='qweqe'; jq -r --arg repovar "$repovar" '.data.Layer| .Features[] | "\(.Name), \(.Version), \($repovar)"' tmp.json
Result:
elfutils, 0.168-1, qweqe
There's no need for the export.
#!/usr/bin/env bash
while read -r line
do
jq -r --arg repovar "$line" '.data.Layer.Features[] | .Name + ", " + .Version + ", " + $repovar' severity.json
done < quayrepo.txt > volume.csv
with quayrepo.txt as
Reponame1
and severity.json as
{
  "status": "scanned",
  "data": {
    "Layer": {
      "IndexedByVersion": 3,
      "Features": [
        {
          "Name": "elfutils",
          "Version": "0.168-1",
          "Vulnerabilities": [
            {
              "NamespaceName": "debian:9",
              "Severity": "Medium",
              "Name": "CVE-2016-2779"
            }
          ]
        }
      ]
    }
  }
}
produces volume.csv containing
elfutils, 0.168-1, Reponame1
To @peak's point, changing > to >> in ...severity.json >> volume.csv will create a multi-line csv instead of overwriting the file on every iteration and keeping only the last line.
You don't need a while read loop in bash at all; jq itself can loop over your input lines, even when they aren't JSON, letting you run jq only once, not once per line in quayrepo.txt.
jq -rR --slurpfile inJson severity.json <quayrepo.txt >volume.csv '
  ($inJson[0].data.Layer | .Features[]) as $features |
  [$features.Name, $features.Version, .] |
  @csv
'
jq -R specifies raw input, letting jq directly read lines from quayrepo.txt into .
jq --slurpfile varname filename.json reads filename.json into an array of JSON objects parsed from that file. If the file contains only one object, one still needs $varname[0] to refer to it.
@csv converts an array to a CSV output line, correctly handling data with embedded quotes or other oddities that require special processing.
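With the sample quayrepo.txt and severity.json shown earlier, this should produce a volume.csv along the lines of:

"elfutils","0.168-1","Reponame1"

Note that @csv quotes each field, so the format differs slightly from the comma-space output of the earlier answers.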

fetch the number of records from a JSON file using shell

I have a test.txt file in this format
{
  "user": "sthapa",
  "ticket": "LIN-5867_3",
  "start_date": "2018-03-16",
  "end_date": "2018-03-16",
  "demo_nos": [692],
  "service_names": [
    "service1",
    "service2",
    "service3",
    "service4",
    "service5",
    "service6",
    "service7",
    "service8",
    "service9"
  ]
}
I need to look for a tag called demo_nos and provide the count of it.
For example, in the above file "demo_nos": [692] means only one demo no; similarly, if it had "demo_nos": [692,300], then the count would be 2.
So what shell script can I write to fetch and print the count?
The output should say the demo nos = 1 or 2 depending on the values inside the tag [].
i.e. I have a variable in my shell script called market_nos which should hold the count.
The gold standard for manipulating JSON data from the command line is jq:
$ jq '.demo_nos | length' test.txt
1
.demo_nos returns the value associated with the demo_nos key in the object, and that array is piped to the length function which does the obvious.
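To capture that in the variable the question mentions, a small sketch:

# store the count in the market_nos variable from the question
market_nos=$(jq '.demo_nos | length' test.txt)
echo "demo nos = $market_nos"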
I'm assuming you have python and the file is JSON :)
$ cat some.json
{
  "user": "sthapa",
  "ticket": "LIN-5867_3",
  "start_date": "2018-03-16",
  "end_date": "2018-03-16",
  "demo_nos": [692],
  "service_names": [
    "service1",
    "service2",
    "service3",
    "service4",
    "service5",
    "service6",
    "service7",
    "service8",
    "service9"
  ]
}
$ python -c 'import sys,json; print(len(json.load(sys.stdin)["demo_nos"]))' < some.json
1
Not the most elegant solution, but this should do it:
cat test.txt | grep -o -P 'demo_nos.{0,200}' | cut -d'[' -f2 | cut -d']' -f1 | awk -F',' '{ print NF }'
Please note that this is a quick and dirty solution treating the input as raw text, not taking the JSON structure into account. In exceptional cases where the "demo_nos" string also appears elsewhere in the file, the output from the command above might be incorrect.
