Fetch the number of records from a JSON file using shell - shell

I have a test.txt file in this format
{
  "user": "sthapa",
  "ticket": "LIN-5867_3",
  "start_date": "2018-03-16",
  "end_date": "2018-03-16",
  "demo_nos": [692],
  "service_names": [
    "service1",
    "service2",
    "service3",
    "service4",
    "service5",
    "service6",
    "service7",
    "service8",
    "service9"
  ]
}
I need to look for a key called demo_nos and report the count of its elements.
For example, in the above file "demo_nos": [692] means only one demo number; similarly, if it had "demo_nos": [692,300] then the count would be 2.
So what shell script can I write to fetch and print the count?
The output should say the demo nos = 1 or 2, depending on the values inside the brackets [].
i.e. I have a variable in my shell script called market_nos which should hold the count.

The gold standard for manipulating JSON data from the command line is jq:
$ jq '.demo_nos | length' test.txt
1
.demo_nos returns the value associated with the demo_nos key in the object, and that array is piped to the length function, which returns the number of elements.

I'm assuming you have python and the file is JSON :)
$ cat some.json
{
  "user": "sthapa",
  "ticket": "LIN-5867_3",
  "start_date": "2018-03-16",
  "end_date": "2018-03-16",
  "demo_nos": [692],
  "service_names": [
    "service1",
    "service2",
    "service3",
    "service4",
    "service5",
    "service6",
    "service7",
    "service8",
    "service9"
  ]
}
$ python -c 'import sys,json; print(len(json.load(sys.stdin)["demo_nos"]))' < some.json
1
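If you need the count in a variable rather than just printed, the same one-liner can be wrapped in a small function (a sketch; demo_nos_count is a name I made up, and the file is assumed to be valid JSON):

```python
import json

def demo_nos_count(path):
    """Return the number of elements in the top-level "demo_nos" array."""
    with open(path) as f:
        data = json.load(f)
    return len(data["demo_nos"])

# Hypothetical usage, playing the role of the question's market_nos variable:
# market_nos = demo_nos_count("test.txt")
```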

Not the most elegant solution but this should do it
cat test.txt | grep -o -P 'demo_nos.{0,200}' | cut -d'[' -f2 | cut -d']' -f1 | awk -F',' '{ print NF }'
Please note that this is a quick and dirty solution that treats the input as raw text and does not take the JSON structure into account. In the exceptional case where the string "demo_nos" also appears elsewhere in the file, the output of the command above might be incorrect.

Related

How to extract simple text and store into a file?

I'm writing a bash script for Hetzner's cloud API, and I need to store the server's ID in a text file. After running the command, it outputs the below:
{
  "server": {
    "id": 12345678,
    "name": "servertest-101",
    "status": "initializing",
    "created": "2020-09-18T09:22:21+00:00",
This is just a snippet, but that's from the first line of the response.
How can I extract and store that value?
The API returns JSON. You've not given much information, but use jq to parse it:
$ cat myinput.json
{
  "server": {
    "id": 12345678,
    "name": "servertest-101",
    "status": "initializing",
    "created": "2020-09-18T09:22:21+00:00"
  }
}
$ jq -r .server.id myinput.json
12345678
redirect to a file:
$ jq -r .server.id myinput.json > myoutputfile
$ cat myoutputfile
12345678
You can pipe output of your command to process it further as this:
cat yourjson.json | grep -m 1 -E -o '\"id\": [0-9]+' | cut -d" " -f 2 > yourtextfile.txt
First, get your JSON content, then send it through the grep command, which extracts only the "id": 12345678 part using a regular expression. Then pipe this result to the cut command, which splits it on the space and selects the second field, which is your value. Lastly, redirect the result of the job to the desired text file.
If you are sure that your value is going to always be the first number in the input, you can just simply select it by grep:
cat yourjson.json | grep -m 1 -E -o '[0-9]+' > output.txt
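If jq isn't installed, a JSON-aware fallback is a few lines of Python (a sketch; server_id is a name I made up, and myinput.json comes from the answer above):

```python
import json

def server_id(path):
    """Parse the API response file and return the nested server id."""
    with open(path) as f:
        return json.load(f)["server"]["id"]

# Hypothetical usage, mirroring `jq -r .server.id myinput.json`:
# print(server_id("myinput.json"))
```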

How to extract text between two patterns with sed/awk

I know this has been asked 1000 times here, but I read a lot of similar questions and still did not manage to find the right way to do this. I need to extract a number from a line that looks like this:
{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}
Expected output:
2034.2
This version number will not always be the same, but the rest of the line should.
I have tried working with sed but I am new to this and failed:
sed -e 's/version":[\(.*\),"description/\1/'
output:
sed: -e expression #1, char 35: unterminated `s' command
I think the issue is that there are too many special characters involved in the line and I did not write the command very well.
Since it's JSON, you should use JSON-aware tools to process it. If you prefer, for example, awk, the way is to use GNU awk's JSON extension. This is a small how-to.
First download and compile appropriate versions of GNU awk, Gawkextlib and gawk-json. That's pretty straightforward, actually, just ./configure and make. Then, write some code:
awk '
@load "json"                             # enable the JSON extension
{
    lines = lines $0                     # read records and buffer them into var lines
    if (json_fromJSON(lines, data) == 1) {   # once the JSON object is complete,
        for (i in data["info"]["version"])   # that seems to be an array, so all
            print data["info"]["version"][i] # elements are output
        lines = ""                       # once done with the first JSON object,
    }                                    # reset the var for more lines
}' file
Output this time:
2034.2
Explained a bit more:
The JSON file structure can vary from one line to multiple lines, for example:
{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}
or:
{
  "version": "4.9.123M",
  "info": {
    "version": [
      2034.2
    ],
    "description": ""
  },
  "status": "OK"
}
so we need to buffer the JSON lines with lines=lines $0 until variable lines holds a whole valid object. We use the extension function json_fromJSON() to test that validity in if(json_fromJSON(lines,data)==1). Once validated, the object gets unpacked and stored into the array data. For this particular object the structure of the array is:
data["version"]="4.9.123M"
data["info"]["version"][1]="2034.2"
data["info"]["description"]=""
data["status"]="OK"
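As a cross-check, the same structure falls out of Python's standard json module (a side-by-side sketch, not part of the gawk extension answer):

```python
import json

line = '{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}'
doc = json.loads(line)

# Mirrors the gawk array layout shown above:
#   doc["version"]         -> "4.9.123M"
#   doc["info"]["version"] -> [2034.2]
for v in doc["info"]["version"]:  # info.version is an array, so print each element
    print(v)
```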
We could examine the object and produce some output of it with this recursive array scanning function:
awk '
@load "json"
function scan(a, p,   q) {       # a is an array, p the path to it, q is qnd *
    if (isarray(a))
        for (i in a) {
            q = p (p == "" ? "" : "->") i
            scan(a[i], q)
        }
    else
        print p ":" a
}
{
    lines = lines $0
    if (json_fromJSON(lines, data) == 1)
        scan(data)
}' file.json
Output:
status:OK
version:4.9.123M
info->version->1:2034.2
info->description:
*) quick'n dirty
Here is a brief example of how to output JSON from an array: https://stackoverflow.com/a/58109715/4162356
If the version is always enclosed in [] and no other [ or ] is present in the line, you can try this logic:
STR='{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}'
echo "$STR" | awk -F'[' '{print $2}' | awk -F']' '{print $1}'
Simplest Way
Try grep when you want to extract simple text:
echo '{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}' | grep -o "\[.*\]" | sed -e 's/\[\|\]//g'
This should do:
STR='{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}'
echo "$STR" | awk -F'[][]' '{print $2}'
2034.2

Generate json file with formatting

I have a curl command which generates JSON output. I want to add a few characters to the generated file to be able to process it further.
Command:
curl -sN --negotiate -u foo:bar "http://hostname/db/tbl_name/" >> db.json
This runs in a for loop over db and tbl_name combinations. Hence it ends up generating a number of JSON outputs (one per table) concatenated together without any delimiter.
Output looks like :
{"columns":[{"name":"tbl_id","type":"varchar(50)"},{"name":"cret_timestmp","type":"timestamp"},{"name":"updt_timestmp","type":"timestamp"},{"name":"frst_nm","type":"varchar(50)"},{"name":"last_nm","type":"varchar(50)"},{"name":"acct_num","type":"varchar(15)"},{"name":"r_num","type":"varchar(15)"},{"name":"pid","type":"decimal(15,0)"},{"name":"ami_id","type":"varchar(30)"},{"name":"ssn","type":"varchar(9)"},{"name":"client_id","type":"varchar(30)"},{"name":"client_nm","type":"varchar(100)"},{"name":"info","type":"timestamp"},{"name":"rmx","type":"varchar(10)"},{"name":"id","type":"decimal(12,0)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"db_tbl"}{"columns":[{"name":"key","type":"varchar(15)"},{"name":"foo_cd","type":"varchar(10)"},{"name":"foo_nm","type":"varchar(56)"},{"name":"tmc_regn_cd","type":"varchar(10)"},{"name":"tmc_mrkt_cd","type":"varchar(20)"},{"name":"mrkt_grp","type":"varchar(30)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"ss_mv"}{"columns":[{"name":"bar_src_name","type":"string"},{"name":"bar_ent_name","type":"string"},{"name":"from_src","type":"string"},{"name":"reload","type":"string"},{"name":"column_mismatch","type":"string"},{"name":"xx_src_name","type":"string"},{"name":"xx_ent_name","type":"string"}],"database":"db_i","table":"test_table"}
The desired output starts and ends with []. I also want a "," between the end of one object and the beginning of the next, where its column list starts.
So for ex: if the curl command runs against 3 tables as shown above, then the three generated jsons should be created like :
[{json1},{json2},{json3}]
The numbers 1, 2, 3, etc. correspond to the different tables the curl command runs against in the for loop for a particular db; their JSON should all end up in one file, in the desired format,
instead of what I'm currently getting :
{json1}{json2}{json3}
In the output pasted above, JSON 1 is :
{"columns":[{"name":"tbl_id","type":"varchar(50)"},{"name":"cret_timestmp","type":"timestamp"},{"name":"updt_timestmp","type":"timestamp"},{"name":"frst_nm","type":"varchar(50)"},{"name":"last_nm","type":"varchar(50)"},{"name":"acct_num","type":"varchar(15)"},{"name":"r_num","type":"varchar(15)"},{"name":"pid","type":"decimal(15,0)"},{"name":"ami_id","type":"varchar(30)"},{"name":"ssn","type":"varchar(9)"},{"name":"client_id","type":"varchar(30)"},{"name":"client_nm","type":"varchar(100)"},{"name":"info","type":"timestamp"},{"name":"rmx","type":"varchar(10)"},{"name":"id","type":"decimal(12,0)"},{"name":"ingest_timestamp","type":"string"},
{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"db_tbl"}
JSON 2 is :
{"columns":[{"name":"key","type":"varchar(15)"},{"name":"foo_cd","type":"varchar(10)"},{"name":"foo_nm","type":"varchar(56)"},{"name":"tmc_regn_cd","type":"varchar(10)"},{"name":"tmc_mrkt_cd","type":"varchar(20)"},{"name":"mrkt_grp","type":"varchar(30)"},{"name":"ingest_timestamp","type":"string"},{"name":"incr_ingest_timestamp","type":"string"}],"database":"db_i","table":"ss_mv"}
JSON 3 is :
{"columns":[{"name":"bar_src_name","type":"string"},{"name":"bar_ent_name","type":"string"},{"name":"from_src","type":"string"},{"name":"reload","type":"string"},{"name":"column_mismatch","type":"string"},{"name":"xx_src_name","type":"string"},{"name":"xx_ent_name","type":"string"}],"database":"db_i","table":"test_table"}
I hope the requirement is clear; thanks in advance. I'm looking to achieve this via bash.
Use jq -s.
--slurp/-s: Instead of running the filter for each JSON object in the input, read the entire input stream into a large array
and run the filter just once.
Here's an example:
$ cat file.json
{ "key": "value1" }
{ "key": "value2" }
{ "key":
"value3"}{"key": "value4"}
$ jq -s < file.json
[
  {
    "key": "value1"
  },
  {
    "key": "value2"
  },
  {
    "key": "value3"
  },
  {
    "key": "value4"
  }
]
I'm not sure if I got it correctly, but I think you are looking for something like
echo "[$(cat *.json | paste -sd ',')]" > result.json
This works by creating a string that starts with [ and ends with ], and in the middle, there are the contents of the json files concatenated (cat) and separated by commas (with the help of paste). That string is echoed and written to a new file.
Presuming input in valid JSONL format (one JSON document per line of input), you can embed a Python script inside your bash script:
slurpjson_py='
import json, sys
json.dump([json.loads(line.strip()) for line in sys.stdin], sys.stdout, indent=4)
sys.stdout.write("\n")
'
slurpjson() { python -c "$slurpjson_py" "$@"; }
If called as:
slurpjson <<EOF
{ "first": "document", "starting": "here" }
{ "second": "document", "ending": "here" }
EOF
...output is correctly:
[
    {
        "starting": "here",
        "first": "document"
    },
    {
        "second": "document",
        "ending": "here"
    }
]
I managed to achieve this by running the curl command and appending a "," to every line using
sed 's/$/,/'
and then removing the last "," and adding the enclosing [] using:
for i in *; do cat $i | sed '$ s/.$//' | awk '{print "["$0"]"}' > $json_dir/$i; done
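If the concatenated objects aren't neatly one per line, Python's json.JSONDecoder.raw_decode can split the {json1}{json2}{json3} stream without any sed/awk fix-ups (a sketch under that assumption; slurp_concatenated is a made-up name):

```python
import json

def slurp_concatenated(text):
    """Split back-to-back JSON objects ({json1}{json2}...) into a Python list."""
    decoder = json.JSONDecoder()
    objs, idx = [], 0
    while idx < len(text):
        obj, end = decoder.raw_decode(text, idx)  # parse one object, get its end offset
        objs.append(obj)
        idx = end
        while idx < len(text) and text[idx].isspace():  # skip whitespace between objects
            idx += 1
    return objs

# json.dumps(slurp_concatenated(raw)) then yields the desired [{...},{...},{...}] form.
```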

Bash: Need to replace different email addresses within a file

I'm trying to mask PII in a file (.json).
The file contains different email addresses and I would like to change them with other different email addresses.
For example:
"results":
[{ "email1@domain1.com",
"email2@domain2.com",
"email3@domain3.com",
"email4@domain4.com",
"email5@domain5.com" }]
I need to change them to:
"results":
[{ "mockemail1@mockdomain1.com",
"mockemail2@mockdomain2.com",
"mockemail3@mockdomain3.com",
"mockemail4@mockdomain4.com",
"mockemail5@mockdomain5.com" }]
Using sed and regex I have been able to change the addresses to one of the mock email addresses, but I would like to change each email to a different mock email.
The mock email addresses are stored in a file. To get a random address I use:
RandomEmail=$(shuf -n 1 Mock_data.csv | cut -d "|" -f 3)
Any ideas? Thanks!
input.json
You've got your JSON file (add a trailing newline at the end, not shown in this example, or bash's read won't process the last line correctly):
"results":
[{ "email1@domain1.com",
"email2@domain2.com",
"email3@domain3.com",
"email4@domain4.com",
"email5@domain5.com" }]
substitutions.txt
(add a trailing newline at the end, not shown in this example, or bash's read won't process the last line correctly)
domain1.com;mockdomain1.com
domain2.com;mockdomain2.com
domain3.com;mockdomain3.com
domain4.com;mockdomain4.com
domain5.com;mockdomain5.com
script.sh
#!/bin/bash
while read _line; do
    unset _ResultLine
    while read _subs; do
        _strSearch=$(echo "$_subs" | cut -d";" -f1)
        _strReplace=$(echo "$_subs" | cut -d";" -f2)
        if [ "$(echo "$_line" | grep "@$_strSearch")" ]; then
            echo "$_line" | awk -F"\t" -v strSearch="$_strSearch" -v strReplace="$_strReplace" \
                '{sub(strSearch,strReplace); print $1}' >> output.json
            _ResultLine="ok"
        fi
    done < substitutions.txt
    [ "$_ResultLine" != "ok" ] && echo "$_line" >> output.json
done < input.json
output.json
"results":
[{ "email1@mockdomain1.com",
"email2@mockdomain2.com",
"email3@mockdomain3.com",
"email4@mockdomain4.com",
"email5@mockdomain5.com" }]
I saved the first file with the emailX@domainX.com addresses to /tmp/1. I created a file /tmp/2 with the mock emails:
mockemail1@mockdomain1.com
mockemail2@mockdomain2.com
mockemail3@mockdomain3.com
mockemail4@mockdomain4.com
mockemail5@mockdomain5.com
First I extract the list of email addresses from /tmp/1 and shuffle the mock emails. Then, using paste, I join the emails with the shuffled mock emails as columns. Then I convert the lines from the format "email mockemail" into the sed argument s/email/mockemail/; and pass the whole thing to sed, which substitutes each email with a random mock email, reading /tmp/1 on stdin.
sed "$(paste <(cat /tmp/1 | sed -n '/@/{s/.*"\(.*@.*\.com\)".*/\1/;/^$/d;p;}') <(shuf /tmp/2) | sed 's#\(.*\)\t\(.*\)#s/\1/\2/#' | tr '\n' ';')" </tmp/1
This produces:
"results":
[{ "mockemail1@mockdomain1.com",
"mockemail3@mockdomain3.com",
"mockemail5@mockdomain5.com",
"mockemail4@mockdomain4.com",
"mockemail2@mockdomain2.com" }]
Given these input files:
$ cat file1
"results":
[{ "email1@domain1.com",
"email2@domain2.com",
"email3@domain3.com",
"email4@domain4.com",
"email5@domain5.com" }]
$ cat file2
foo|bar|mockemail1@mockdomain1.com|etc
foo|bar|mockemail2@mockdomain2.com|etc
foo|bar|mockemail3@mockdomain3.com|etc
foo|bar|mockemail4@mockdomain4.com|etc
foo|bar|mockemail5@mockdomain5.com|etc
all you need is:
$ shuf file2 | awk 'NR==FNR{a[NR]=$3;next} /@/{$2=a[++c]} 1' FS='|' - FS='"' OFS='"' file1
"results":
[{ "mockemail2@mockdomain2.com",
"mockemail4@mockdomain4.com",
"mockemail5@mockdomain5.com",
"mockemail1@mockdomain1.com",
"mockemail3@mockdomain3.com" }]
Quick and dirty implementation with python:
Assumption:
you have well-formed JSON input:
{
  "results":
  [
    "email1@domain1.com",
    "email2@domain2.com",
    "email3@domain3.com",
    "email4@domain4.com",
    "email5@domain5.com"
  ]
}
you can validate your JSON at this address https://jsonformatter.curiousconcept.com/
code:
import json
import sys

input_message = sys.stdin.read()
json_dict = json.loads(input_message)
results = []
for elem in json_dict['results']:
    results.append("mock" + elem)
results_dict = {}
results_dict['results'] = results
print(json.dumps(results_dict))
command:
$ echo '{"results":["email1@domain1.com","email2@domain2.com","email3@domain3.com","email4@domain4.com","email5@domain5.com"]}' | python jsonConvertor.py
{"results": ["mockemail1@domain1.com", "mockemail2@domain2.com", "mockemail3@domain3.com", "mockemail4@domain4.com", "mockemail5@domain5.com"]}
A friend of mine suggested the following elegant solution that works in two parts:
Substitute email addresses with a string.
sed -E -i 's/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b/EMAIL_TO_REPLACE/g' data.json
Iterate the file, and on each iteration substitute the 1st appearance of the string with a random email from the file:
for email in $(egrep -o EMAIL_TO_REPLACE data.json) ; do
sed -i '0,/EMAIL_TO_REPLACE/s//'"$(shuf -n 1 Mock_data.csv | cut -d "|" -f 3)"'/' data.json ;
done
And that's it.
Thanks Elina!
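The two passes can also be collapsed into one with Python's re.sub and a replacement function, which draws a fresh mock address for every match (a sketch; mask_emails is a made-up name, and mock_emails stands in for the third column of Mock_data.csv):

```python
import random
import re

# Same pattern the sed pass above uses (an assumption: all addresses fit it).
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b')

def mask_emails(text, mock_emails):
    """Replace every email address in text with a randomly drawn mock address."""
    return EMAIL_RE.sub(lambda match: random.choice(mock_emails), text)
```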

Text replace in a file, on the 5th line, from position 18 to position 145

I have this text file:
{
    "name": "",
    "auth": true,
    "username": "rtorrent",
    "password": "d5275b68305438499f9660b38980d6cef7ea97001efe873328de1d76838bc5bd15c99df8b432ba6fdcacbff82e3f3c4829d34589cf43236468d0d0b0a3500c1e"
}
Now, I want to be able to replace the d5275b68305438499f9660b38980d6cef7ea97001efe873328de1d76838bc5bd15c99df8b432ba6fdcacbff82e3f3c4829d34589cf43236468d0d0b0a3500c1e value, using sed for example. (The string always has exactly the same length, but the value can differ.)
I've tried this using sed:
sed -i 5s/./new-string/18 file.json
That basically replaces text on the 5th line, starting at position 18. But I want to replace exactly the text from position 18 up to position 145, strictly what's inside the "". The command above cuts off the ", at the end of the line, and if it's run multiple times, the string grows longer every time.
Any help is really appreciated.
You can use for example awk for it:
$ awk -v var="new_string" 'NR==5{print substr($0,1,17) var substr($0,146);next}1' file
{
    "name": "",
    "auth": true,
    "username": "rtorrent",
    "password": "new_string"
}
but there are better tools for changing a value in a JSON, jq for example:
$ jq '.password="new_string"' file
{
  "name": "",
  "auth": true,
  "username": "rtorrent",
  "password": "new_string"
}
Edit: When passing a shell variable $var to awk and jq:
$ var="new_string"
$ awk -v var="$var" 'NR==5{print substr($0,1,17) var substr($0,146);next}1' file
and
$ jq --arg var "$var" '.password=$var'
Edit2: There is always sed:
$ sed -i "5s/\"[^\"]*\"/\"$var\"/2" file
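Since the file is JSON anyway, a position-independent alternative is to load, edit, and re-dump it in Python (a sketch; set_password is a made-up name, and note that json.dump reformats the file):

```python
import json

def set_password(path, new_password):
    """Load the JSON config, replace the password field, and write it back."""
    with open(path) as f:
        cfg = json.load(f)
    cfg["password"] = new_password
    with open(path, "w") as f:
        json.dump(cfg, f, indent=4)
```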
