How to convert a txt file to a JSON object using shell? - bash

I have a text file, which I want to convert to a json object:
MAX_PDQPRIORITY: 80
DS_MAX_QUERIES: 50
DS_MAX_SCANS: 1048576
DS_NONPDQ_QUERY_MEM: 100000 KB
DS_TOTAL_MEMORY: 1000000 KB
My script produces the wrong output and I have to edit it into JSON manually.
How do I use shell to make this change?
Desired output:
[
{
"MAX_PDQPRIORITY":"80",
"DS_MAX_QUERIES":"50",
"DS_MAX_SCANS":"1048576",
"DS_NONPDQ_QUERY_MEM":"100000",
"DS_TOTAL_MEMORY":"1000000"
}
]
Script:
#!/bin/bash
# date:2019-02-02
# informix Show mgmdy .
LANG=EN
pathfile='/home/ampmon/agents/zabbix-agent/script/informix/text'
#expect mgm.#expect |grep -Ev 'Password:|spawn|Invalid' >$pathfile/mgm1.txt
cat $pathfile/mgm1.txt|grep MGM -A 8|grep -Ev 'MGM|-|^$' >$pathfile/mgm.txt
check=`cat $pathfile/mgm.txt|wc -l`
if [ $check -eq 0 ];then
echo "No query results"
exit 1
fi
MAX_PDQPRIORITY=($(cat $pathfile/mgm.txt|grep MAX_PDQPRIORITY |awk -F[:] '{print $2}'|awk '{print $1*1.00}'))
DS_MAX_QUERIES=($(cat $pathfile/mgm.txt|grep DS_MAX_QUERIES |awk -F[:] '{print $2}'|awk '{print $1}'))
DS_MAX_SCANS=($(cat $pathfile/mgm.txt|grep DS_MAX_SCANS |awk -F[:] '{print $2}'|awk '{print $1}'))
DS_NONPDQ_QUERY_MEM=($(cat $pathfile/mgm.txt|grep DS_NONPDQ_QUERY_MEM |awk -F[:] '{print $2}'|awk '{print $1}'))
DS_TOTAL_MEMORY=($(cat $pathfile/mgm.txt|grep DS_TOTAL_MEMORY |awk -F[:] '{print $2}'|awk '{print $1}'))
printf '\t[\n'
printf '\t\t{\n'
printf "\t\t\t \"MAX_PDQPRIORITY\":\"${MAX_PDQPRIORITY}\",\"DS_MAX_QUERIES\":\"${DS_MAX_QUERIES}\",\"DS_MAX_SCANS\":\"${DS_MAX_SCANS}\",\"DS_NONPDQ_QUERY_MEM\":\"${DS_NONPDQ_QUERY_MEM}\",\"DS_TOTAL_MEMORY\":\"${DS_TOTAL_MEMORY}\"}\n"
printf "\t]\n"
My current output:
[
{
","DS_NONPDQ_QUERY_MEM":"100000","DS_TOTAL_MEMORY":"1000000"}ES":"50
]
Can someone help me?

If jq is available, please try:
jq -s -R '[[ split("\n")[] | select(length > 0) | split(": +";"") | {(.[0]): .[1]}] | add]' input.txt
Output:
[
{
"MAX_PDQPRIORITY": "80",
"DS_MAX_QUERIES": "50",
"DS_MAX_SCANS": "1048576",
"DS_NONPDQ_QUERY_MEM": "100000 KB",
"DS_TOTAL_MEMORY": "1000000 KB"
}
]
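Note that this keeps the KB suffix, unlike your desired output. If you want it stripped, a small variation works (a sketch, assuming the unit is always a trailing " KB"):
jq -s -R '[[ split("\n")[] | select(length > 0) | split(": +";"") | {(.[0]): (.[1] | sub(" KB$"; ""))}] | add]' input.txt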
As an alternative, if python happens to be an option, the following will work as well:
#!/bin/bash
python -c '
import re
import json
import collections as cl
records = []
with open("input.txt") as f:
    od = cl.OrderedDict()
    for line in f:
        key, val = re.split(r":\s*", line.rstrip("\r\n"))
        od[key] = val
    records.append(od)
print(json.dumps(records, indent=4))
'
Hope this helps.

For a simple translation, try using awk; it only reads the file once:
BEGIN {
    print "{"
}
{
    name = substr($1, 1, length($1)-1)
    value = $2
    printf "%s\t\"%s\":\"%s\"", (NR > 1) ? ",\n" : "", name, value
}
END {
    print "\n}"
}
This strips the trailing colon from field 1, then prints each name/value pair surrounded by double quotes, emitting the comma before every entry but the first so the last entry has no trailing comma. It also silently drops the units (KB), as your sample output indicates.
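Assuming the program is saved as tojson.awk (an illustrative name), run it on the extracted file:
awk -f tojson.awk "$pathfile/mgm.txt"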

Or perl, with the JSON module:
perl -MJSON -lne '
@F = split(/:?\s+/);
$data{$F[0]} = $F[1]
} END {
print encode_json [\%data]
' file
Without the module:
perl -lne '
@F = split(/:?\s+/);
push @data, sprintf(q{"%s":"%s"}, map {s/"/\\"/g; $_} @F[0,1]);
} END {
print "[{", join(",", @data), "}]";
' file
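Either way, the result is a single JSON line like this (key order may vary between runs, since perl hashes are unordered):
[{"MAX_PDQPRIORITY":"80","DS_MAX_QUERIES":"50","DS_MAX_SCANS":"1048576","DS_NONPDQ_QUERY_MEM":"100000","DS_TOTAL_MEMORY":"1000000"}]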

Related

linux: extract specific words using awk, grep or sed

Looking to extract Specific Words from each line
Nov 2 11:25:51 imau03ftc CSCOacs_TACACS_Accounting 0687979272 1 0 2016-11-02 11:25:51.250 +13:00 0311976914 3300 NOTICE Tacacs-Accounting: TACACS+ Accounting with Command, ACSVersion=acs-5.6.0.22-B.225, ConfigVersionId=145, Device IP Address=10.107.32.53, CmdSet=[ CmdAV=show controllers <cr> ], RequestLatency=0, Type=Accounting, Privilege-Level=15, Service=Login, User=nc-rancid, Port=tty1, Remote-Address=172.26.200.204, Authen-Method=TacacsPlus, AVPair=task_id=8280, AVPair=timezone=NZDT, AVPair=start_time=1478039151, AVPair=priv-lvl=1, AcctRequest-Flags=Stop, Service-Argument=shell, AcsSessionID=imau03ftc/262636280/336371030, SelectedAccessService=Default Device Admin, Step=13006 , Step=15008 , Step=15004 , Step=15012 , Step=13035 , NetworkDeviceName=CASWNTHS133, NetworkDeviceGroups=All Devices:All Devices, NetworkDeviceGroups=Device Type:All Device Types:Corporate, NetworkDeviceGroups=Location:All Locations, Response={Type=Accounting; AcctReply-Status=Success; }
Looking to extract
Nov 2 11:25:51 show controllers User=nc-rancid NetworkDeviceName=CASWNTHS133
I can use awk, grep or sed.
I have tried a few combinations, like
sudo tail -n 20 /var/log/tacacs/imau03ftc-accounting.log | grep -oP 'User=\K.*' & 'NetworkDeviceName=\K.*'
sudo tail -n 20 /var/log/tacacs/imau03ftc-accounting.log | sudo awk -F" " '{ print $1 " " $3 " " $9 " " $28}'
I can add a few more lines, but most of them have the same format.
Thanks
Try to run this:
sudo tail -n 20 /var/log/tacacs/imau03ftc-accounting.log > tmpfile
Then execute this script:
#!/bin/sh
while read i
do
str=""
str="$(echo $i |awk '{print $1,$2,$3}')"
str="$str $(echo $i |awk 'match($0, /CmdAV=([^<]+)/) { print substr( $0, RSTART,RLENGTH ) }'|awk -F "=" '{print $2}')"
str="$str $(echo $i |awk 'match($0, /User=([^,]+)/) { print substr( $0, RSTART, RLENGTH ) }')"
str="$str $(echo $i |awk 'match($0, /NetworkDeviceName=([^,]+)/) { print substr( $0, RSTART, RLENGTH ) }')"
echo $str
done < tmpfile
Output:
Nov 2 11:25:51 show controllers User=nc-rancid NetworkDeviceName=CASWNTHS133
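The same extraction can also be done in a single awk pass, avoiding the four subshells per line; a sketch, assuming every line carries CmdAV=, User= and NetworkDeviceName= fields as in the sample:
sudo tail -n 20 /var/log/tacacs/imau03ftc-accounting.log |
awk '{
    match($0, /CmdAV=[^<]+/)
    cmd = substr($0, RSTART + 6, RLENGTH - 6)   # text after "CmdAV="
    sub(/ +$/, "", cmd)                         # drop the space before "<cr>"
    match($0, /User=[^,]+/)
    user = substr($0, RSTART, RLENGTH)
    match($0, /NetworkDeviceName=[^,]+/)
    dev = substr($0, RSTART, RLENGTH)
    print $1, $2, $3, cmd, user, dev
}'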

find lines that aren't matching in a directory with grep

I have this bash script:
function getlist() {
grep -E 'pattern' ../fileWithInput.js | sed "s#^regexPattern#\1 \2#" | grep -v :
}
getlist | while read line; do
method=$(echo $line | awk '{ print $1 }')
uri=$(echo $line | awk '{ print $2 }')
`grep "$method" -vr .
#echo method: $method uri: $uri
done
Question:
Currently I have many 'pattern' strings. How do I check them against a directory and output only the 'pattern' strings that don't match?
Example of what I have in fileWithInput.js:
'foo','bar','hello'.
~/repo/anotherDirectory:
'foo','bar'.
How to print only strings from fileWithInput.js that are not in ~/repo/anotherDirectory?
The final output has to be like this:
'hello': 0 matches.
Please help with grep command to do this. Or maybe you have another idea. Thanks for attention and have a nice day!
file1.txt
'foo','bar','hello'.
filem.txt
'foo','bar'.
With awk:
awk 'BEGIN{RS="[,\\.]"} NR==FNR{a[$0];next} {delete a[$0]} END{for(i in a){print i": 0 matches."}} ' file1.txt filem.txt
code breakdown:
BEGIN{RS="[,\\.]"} # Record seperator , or .
NR==FNR{a[$0];next} # store values ina array a and skip from next process
{delete a[$0]} # delete from array if file1 exists in file2
END{
for(i in a){
print i": 0 matches."} # print missing items
}
output:
'hello': 0 matches.
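Since grep was also an option, a grep/comm variant gives the same result; a sketch, assuming bash process substitution and single-quoted tokens as in the samples:
comm -23 <(grep -o "'[^']*'" file1.txt | sort -u) \
         <(grep -o "'[^']*'" filem.txt | sort -u) |
sed 's/$/: 0 matches./'
comm -23 prints the tokens unique to file1.txt, here 'hello'.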

I'm not able to fetch variable data using a for loop in a shell script (ksh)

Now I have code that works on this file type:
cat myfile.txt
XSAP_SM1_100 COR-REV-SAPQ-P09 - 10/14/2013 -
SCHEDULE XSAP_SM1_100#COR-REV-SAPQ-P09 TIMEZONE Europe/Paris
ON RUNCYCLE RULE1 "FREQ=WEEKLY;BYDAY=WE"
EXCEPT RUNCYCLE CALENDAR2 FR1DOFF -1 DAYS
EXCEPT RUNCYCLE SIMPLE3 11/11/2011
AT 0530
:
XSAP_SM1_100#CORREVSAPQP09-01
AT 0640 TIMEZONE Europe/Paris
XSAP_SM1_100#CORREVSAPQP09-02
AT 0645 TIMEZONE Europe/Paris
Code is
awk 'BEGIN { RS=":"; FS="\n"}
NR==2 {
for(i=1;i<=NF;++i) {
if($i !~ /^$/) {
split($i,tmp,"#")
i=i+1
split($i,tmp2," ")
printf "\"%s\",\"%s\",\"%s\"\n", tmp[1],tmp[2],tmp2[2]
}
}
}'
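For myfile.txt this produces the expected rows: the second record (everything after the lone :) is split on newlines, and each job line is paired with the AT line that follows it:
"XSAP_SM1_100","CORREVSAPQP09-01","0640"
"XSAP_SM1_100","CORREVSAPQP09-02","0645"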
But I have another file type. I'll be executing this command on thousands of files in a for loop, but I have consolidated them, and for the type below it's not working as expected.
cat testing.txt
ODSSLT_P09 COR-ODS-SMT9-B01 - 12/29/2015 -
SCHEDULE ODSSLT_P09#COR-ODS-SMT9-B01 TIMEZONE UTC
ON RUNCYCLE RULE1 "FREQ=DAILY;"
AT 0505
PRIORITY 11
:
ODSSLT_P09#CORODSSMT9001-01
UNTIL 2355 TIMEZONE Asia/Shanghai
EVERY 0100
ODSSLT_P09#CORODSSMT9001-02
AT 2355
EVERY 0100
ODSSLT_P09#CORODSSMT9001-03
ODSSLT_P09#CORODSSMT9001-04
UNTIL 2355 TIMEZONE Asia/Shanghai
EVERY 0100
EOF
Expected output for this file:
"ODSSLT_P09","CORODSSMT9001-01",""
"ODSSLT_P09","CORODSSMT9001-02","2355"
"ODSSLT_P09","CORODSSMT9001-03",""
"ODSSLT_P09","CORODSSMT9001-04",""
Actual output from the code is
| grep -v -i -w -E
"CONFIRMED|DEADLINE|DAY|DAYS|EVERY|NEEDS|OPENS|PRIORITY|PROMPT|UNTIL|AWSBIA291I|END|FOLLOWS" |
awk 'BEGIN { RS=":"; FS="\n"}
NR==2 {for(i=1;i<=NF;++i) {
if($i !~ /^$/) {
split($i,tmp,"#")
i=i+1
split($i,tmp2," ")
printf "\"%s\",\"%s\",\"%s\"\n", tmp[1],tmp[2],tmp2[2]
}}}'
output just gives:
"ODSSLT_P09","CORODSSMT9001-01",""
"AT 2355","",""
"ODSSLT_P09","CORODSSMT9001-04",""
The best solution would be a small awk program doing everything (awk will loop through the input, so write something without a while).
Since you have tagged with ksh and not bash or linux, I do not trust your version of awk.
First try joining the lines and splitting them again, except for the AT lines. I hope no line will contain the string EOL, so I will join with an EOL marker.
sed 's/$/EOL/' myfile.txt |
tr -d "\n" |
sed -e 's/EOLAT/ AT/g' -e 's/EOL/\n/g'
Perhaps your sed version will not understand the \n; in that case, replace it with a real newline.
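On myfile.txt, for example, this joins each job line with the AT line that followed it:
XSAP_SM1_100#CORREVSAPQP09-01 AT 0640 TIMEZONE Europe/Paris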
I know what I want to do with the sed output, so I will filter before sed and change the sed commands.
foundcolon="0";
grep -E "^:$|XSAP|AT" myfile.txt |
sed 's/$/EOL/' |
tr -d "\n" |
sed -e 's/EOLAT//g' -e 's/EOL/\n/g' -e 's/#/ /g' |
while read -r xsap corr numm rest_of_line; do
if [ "${foundcolon}" = "0" ]; then
if [ "${xsap}" = ":" ]; then
foundcolon="1"
fi
continue
fi
printf '"%s","%s","%s"\n' "${xsap}" "${corr}" "${numm}";
done
Using another sed option, sed -e '/address1/,/address2/ d', will make it even simpler:
grep -E "^:$|XSAP|AT" myfile.txt |
sed 's/$/EOL/' |
tr -d "\n" |
sed -e 's/EOLAT//g' -e 's/EOL/\n/g' -e '1,/^:$/ d' -e 's/#/ /g' |
while read -r xsap corr numm rest_of_line; do
printf '"%s","%s","%s"\n' "${xsap}" "${corr}" "${numm}";
done
Here's a more or less pure awk solution, which produces literally the requested output for the given input file. It suffers from having no knowledge of the problem domain.
awk '
/^:/ { start=1; next }
! start {next}
$1 == "AT" {
split(last,a,/#/)
printf "\"%s\",\"%s\",\"%s\"\n", a[1], a[2], $2
last=""
next
}
{
last=$0
}' data
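The expected output also lists jobs that have no AT line at all (with an empty time field). A variation that flushes the pending job whenever a new job name appears handles that; a sketch, assuming the lines containing # after the lone : are always job names:
awk '
function emit() {
    if (last != "") {
        split(last, a, /#/)
        printf "\"%s\",\"%s\",\"%s\"\n", a[1], a[2], at
    }
    last = ""; at = ""
}
/^:/ { start = 1; next }
! start { next }
/#/ { emit(); last = $0; next }
$1 == "AT" { at = $2 }
END { emit() }' testing.txt
On testing.txt this prints all four CORODSSMT9001 rows, with "2355" only on -02.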

Insert a date in a column using awk

I'm trying to format a date in a column of a csv.
The input is something like: 28 April 1966
And I'd like this output: 1966-04-28
which can be obtain with this code:
date -d "28 April 1966" +%F
So now I thought of mixing awk and this code to format the entire column, but I can't figure out how.
Edit:
Example of input (separators "|" are in fact tabs):
1 | 28 April 1966
2 | null
3 | null
4 | 30 June 1987
Expected output :
1 | 1966-04-28
2 | null
3 | null
4 | 30 June 1987
A simple way is
awk -F '\\| ' -v OFS='| ' '{ cmd = "date -d \"" $3 "\" +%F 2> /dev/null"; cmd | getline $3; close(cmd) } 1' filename
That is:
{
cmd = "date -d \"" $3 "\" +%F 2> /dev/null" # build shell command
cmd | getline $3 # run, capture output
close(cmd) # close pipe
}
1 # print
This works because date doesn't print anything to its stdout if the date is invalid, so the getline fails and $3 is not changed.
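You can check that behavior directly with GNU date:
$ date -d "null" +%F 2>/dev/null; echo "exit status: $?"
exit status: 1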
Caveats to consider:
For very large files, this will spawn a lot of shells and processes in those shells (one each per line). This can become a noticeable performance drag.
Be wary of code injection. If the CSV file comes from an untrustworthy source, this approach is difficult to defend against an attacker, and you're probably better off going the long way around, parsing the date manually with gawk's mktime and strftime.
EDIT re: comment: To use tabs as delimiters, the command can be changed to
awk -F '\t' -v OFS='\t' '{ cmd = "date -d \"" $3 "\" +%F 2> /dev/null"; cmd | getline $3; close(cmd) } 1' filename
EDIT re: comment 2: If performance is a worry, as it appears to be, spawning processes for every line is not a good approach. In that case, you'll have to do the parsing manually. For example:
BEGIN {
OFS = FS
m["January" ] = 1
m["February" ] = 2
m["March" ] = 3
m["April" ] = 4
m["May" ] = 5
m["June" ] = 6
m["July" ] = 7
m["August" ] = 8
m["September"] = 9
m["October" ] = 10
m["November" ] = 11
m["December" ] = 12
}
$3 !~ /null/ {
split($3, a, " ")
$3 = sprintf("%04d-%02d-%02d", a[3], m[a[2]], a[1])
}
1
Put that in a file, say foo.awk, and run awk -F '\t' -f foo.awk filename.csv.
This should work with your given input
awk -F'\\|' -vOFS="|" '!/null/{cmd="date -d \""$3"\" +%F";cmd | getline $3;close(cmd)}1' file
Output
| 1 |1966-04-28
| 2 | null
| 3 | null
| 4 |1987-06-30
I would suggest using a language that supports parsing dates, like perl:
$ cat file
1 28 April 1966
2 null
3 null
4 30 June 1987
$ perl -F'\t' -MTime::Piece -lane 'print "$F[0]\t",
$F[1] eq "null" ? $F[1] : Time::Piece->strptime($F[1], "%d %B %Y")->strftime("%F")' file
1 1966-04-28
2 null
3 null
4 1987-06-30
The Time::Piece core module allows you to parse and format dates, using the standard format specifiers of strftime. This solution splits the input on a tab character and modifies the format if the second field is not "null".
This approach will be much faster than using system calls or invoking subprocesses, as everything is done in native perl.
Here is how you can do this in pure BASH and avoid calling system or getline from awk:
while IFS=$'\t' read -ra arr; do
[[ ${arr[1]} != "null" ]] && arr[1]=$(date -d "${arr[1]}" +%F)
printf "%s\t%s\n" "${arr[0]}" "${arr[1]}"
done < file
1 1966-04-28
2 null
3 null
4 1987-06-30
Only one date call, and no code injection is possible; see the following:
This script extracts the dates into a temporary file (using awk), processes them with one date call, and merges the results back (using awk).
Code
awk -F '\t' 'match($3,/null/) { $3 = "0000-01-01" } { print $3 }' input > temp.$$
date --file=temp.$$ +%F > dates.$$
awk -F '\t' -v OFS='\t' 'BEGIN {
while ( getline < "'"dates.$$"'" > 0 )
{
f1_counter++
if ($0 == "0000-01-01") {$0 = "null"}
date[f1_counter] = $0
}
}
{$3 = date[NR]}
1' input
One-liner using bash process substitution (no temporary files):
inputfile=/path/to/input
awk -F '\t' -v OFS='\t' 'BEGIN {while ( getline < "'<(date -f <(awk -F '\t' 'match($3,/null/) { $3 = "0000-01-01" } { print $3 }' "$inputfile") +%F)'" > 0 ){f1_counter++; if ($0 == "0000-01-01") {$0 = "null"}; date[f1_counter] = $0}}{$3 = date[NR]}1' "$inputfile"
Details
Here is how it can be used:
# configuration
input=/path/to/input
temp1=temp.$$
temp2=dates.$$
output=output.$$
# create the sample file (optional)
#printf "\t%s\n" $'1\t28 April 1966' $'2\tnull' $'3\tnull' $'4\t30 June 1987' > "$input"
# Extract all dates
awk -F '\t' 'match($3,/null/) { $3 = "0000-01-01" } { print $3 }' "$input" > "$temp1"
# transform the dates
date --file="$temp1" +%F > "$temp2"
# merge csv with transformed date
awk -F '\t' -v OFS='\t' 'BEGIN {while ( getline < "'"$temp2"'" > 0 ){f1_counter++; if ($0 == "0000-01-01") {$0 = "null"}; date[f1_counter] = $0}}{$3 = date[NR]}1' "$input" > "$output"
# print the output
cat "$output"
# cleanup
rm "$temp1" "$temp2" "$output"
#rm "$input"
Caveats
Using "0000-01-01" as a temporary placeholder for invalid (null) dates
The code should be faster than other methods calling date many times, but it reads the input file twice.

Chain multiple awk commands and shell scripts in sequence

I have written an awk/shell script to process an input xml file and output another xml file with the desired elements. While this script works, I would like to simplify it so that I do not use any temporary files and instead pipe the output between commands.
Here's the script.
#extract elements
awk 'BEGIN {FS="[<|>]"} /(elementname).*$/{matchingstring=$0}
{ printf "%s\n", matchingstring}' input.xml > tmp.xml
#sort, uniq, append closing tag (/>)
for i in `cat tmp.xml | awk '{print $2}' |sort | uniq `; do grep -m 1 $i tmp.xml;
done | sort -r | sed "s/>$/\/>/" > tmp2.xml
# Append xml header and root element
awk 'BEGIN {
FS="[<|>]"}
NR==1{
print "<?xml version=\"1\.0\" encoding=\"UTF\-8\"?>"
print "<listofelements>"
};
{ printf "%s\n", $0 }
END { print "</listifelements>";}' tmp2.xml > final.xml
Any inputs would be much appreciated.
One of the improvements would be:
awk 'BEGIN {FS="[<|>]"} /(elementname).*$/{matchingstring=$0}
{ printf "%s\n", matchingstring}' input.xml > tmp.xml
Can be replaced with :
awk '/(elementname).*$/' input.xml > tmp.xml
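(A bare pattern with no action prints each matching line by default; the trailing .*$ in the regex is redundant but harmless.)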
And also this below:
awk 'BEGIN {
FS="[<|>]"}
NR==1{
print "<?xml version=\"1\.0\" encoding=\"UTF\-8\"?>"
print "<listofelements>"
};
{ printf "%s\n", $0 }
END { print "</listifelements>";}' tmp2.xml > final.xml
Can be changed to :
awk 'BEGIN {
print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
print "<listofelements>"}
END {print "</listofelements>";}1' tmp2.xml > final.xml
