Extract json value on regex on bash script - bash

How can i get the values inner depends in bash script?
manifest.py
# Commented lines
{
'category': 'Sales/Subscription',
'depends': [
'sale_subscription',
'sale_timesheet',
],
'auto_install': True,
}
Expected response:
sale_subscription sale_timesheet
The major problem is linebreak, i have already tried | grep depends but i can not get the sale_timesheet value.
Im trying to add this values comming from files into a var, like:
DOWNLOADED_DEPS=($(ls -A $DOWNLOADED_APPS | while read -r file; do cat $DOWNLOADED_APPS/$file/__manifest__.py | [get depends value])
Example updated.

If this is your JSON file:
{
"category": "Sales/Subscription",
"depends": [
"sale_subscription",
"sale_timesheet"
],
"auto_install": true
}
You can get the desired result using jq like this:
jq -r '.depends | join(" ")' YOURFILE.json
This uses .depends to extract the value from the depends field, pipes it to join(" ") to join the array with a single space in between, and uses -r for raw (unquoted) output.

If it is not a json file and only string then you can use below Regex to find the values. If it's json file then you can use other methods like Thomas suggested.
^'depends':\s*(?:\[\s*)(.*?)(?:\])$
demo
you can use egrep for this as follows:
% egrep -M '^\'depends\':\s*(?:\[\s*)(.*?)(?:\])$' pathTo\jsonFile.txt
you can read about grep

As #Thomas has pointed out in a comment, the OPs input data is not in JSON format:
$ cat manifest.py
# Commented lines // comments not allowed in JSON
{
'category': 'Sales/Subscription', // single quotes should be replaced by double quotes
'depends': [
'sale_subscription',
'sale_timesheet', // trailing comma at end of section not allowed
],
'auto_install': True, // trailing comma issue; should be lower case "true"
}
And while the title of the question mentions regex, there is no sign of a regex in the question. I'll leave a regex based solution for someone else to come up with and instead ...
One (quite verbose) awk solution based on the input looking exactly like what's in the question:
$ awk -F"'" ' # use single quote as field separator
/depends/ { printme=1 ; next } # if we see the string "depends" then set printme=1
printme && /]/ { printme=0 ; next} # if printme=1 and line contains a right bracket then set printme=0
printme { printf pfx $2; pfx=" " } # if printme=1 then print a prefix + field #2;
# first time around pfx is undefined;
# subsequent passes will find pfx set to a space;
# since using "printf" with no "\n" in sight, all output will stay on a single line
END { print "" } # add a linefeed on the end of our output
' json.dat
This generates:
sale_subscription sale_timesheet

Related

How to extract text between two patterns with sed/awk

I know this has been asked 1000 times here, but I read a lot of similar questions and still did not manage to find the right way to do this. I need to extract a number from a line that looks like this:
{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}
Expected output:
2034.2
This version number will not always be the same, but the rest of the line should.
I have tried working with sed but I am new to this and failed:
sed -e 's/version":[\(.*\),"description/\1/'
output:
sed: -e expression #1, char 35: unterminated `s' command
I think the issue is that there are too many special characters involved in the line and I did not write the command very well.
Since it's JSON, use should use JSON aware tools for processing it. If you prefer, for example, awk, the way is to use GNU awk's JSON extension. This is a small how-to.
First download and compile appropriate versions of GNU awk, Gawkextlib and gawk-json. That's pretty straightforward, actually, just ./configure and make. Then, write some code:
awk '
#load "json" # enable json extension
{
lines=lines $0 # read json file records and buffer to var lines
if(json_fromJSON(lines,data)==1) { # once the json is complete
for(i in data["info"]["version"]) # that seems to be an array so all elements
print data["info"]["version"][i] # are outputed
lines="" # once done with the first json object
} # reset the var for more lines
}' file
Output this time:
2034.2
Explained a bit more:
The JSON file structure can vary from one line to multiple lines, for example:
{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}
or:
{
"version": "4.9.123M",
"info": {
"version": [
2034.2
],
"description": ""
},
"status": "OK"
}
so we need to buffer the JSON lines with lines=lines $0 until there is a whole valid object in variable lines. We use the extension function json_fromJSON() to determine that validity in if(json_fromJSON(lines,data)==1). While validated the object gets disentangled and stored to array data. For this particular object the structure of the array is:
data["version"]="4.9.123M"
data["info"]["version"][1]="2034.2"
data["info"]["description"]=""
data["status"]="OK"
We could examine the object and produce some output of it with this recursive array scanning function:
awk '
#load "json"
function scan(a,p, q) { # a is array, p path to it, q is qnd *
if(isarray(a))
for(i in a) {
q=p (p==""?"":"->") i
scan(a[i],q)
}
else
print p ":" a
}
{
lines=lines $0
if(json_fromJSON(lines,data)==1)
scan(data) #
}' file.json
Output:
status:OK
version:4.9.123M
info->version->1:2034.2
info->description:
*) quick'n dirty
Here is a brief example of how to output JSON from an array: https://stackoverflow.com/a/58109715/4162356
If the version is always enclosed in [] and no other [ or ] is present in a line ,you can try this logic
STR='{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}'
echo $STR | awk -F'[' '{print $2}' | awk -F']' '{print $1}'
Simplest Way
Try grep when want to extract simple texts
echo "{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}"| grep -o "\[.*\]" | sed -e 's/\[\|\]//g'
This should do:
STR='{"version":"4.9.123M","info":{"version":[2034.2],"description":""},"status":"OK"}'
echo "$STR" | awk -F'[][]' '{print $2}'
2034.2

convert 1 field of awk to base64 and leave the rest intact

I'm creating a one liner where my ldap export is directly converted into a csv.
So far so good but the challange is now that 1 column of my csv needs to contain base64 encoded values. These values are comming as clear text out of the ldap search filter. So I basically need them converted during the awk creation.
What I have is:
ldapsearch | awk -v OFS=',' '{split($0,a,": ")} /^blobinfo:/{blob=a[2]} /^cn:/{serialnr=a[2]} {^mode=a[2]; print serialnr, mode, blob}'
This gives me a csv output as intended but now I need to convert blob to base64 encoded output.
Getline is not available
demo input:
cn: 1313131313
blobinfo: a string with spaces
mode: d121
cn: 131313asdf1313
blobinfo: an other string with spaces
mode: d122
ouput must be like
1313131313,D121,YSBzdHJpbmcgd2l0aCBzcGFjZXM=
where YSBzdHJpbmcgd2l0aCBzcGFjZXM= is the encoded a string with spaces
but now I get
1313131313,D121,a string with spaces
Something like this, maybe?
$ perl -MMIME::Base64 -lne '
BEGIN { $, = "," }
if (/^cn: (.+)/) { $s = $1 }
if (/^blobinfo: (.+)/) { $b = encode_base64($1, "") }
if (/^mode: (.+)/) { print $s, $1, $b }' input.txt
1313131313,d121,YSBzdHJpbmcgd2l0aCBzcGFjZXM=
131313asdf1313,d122,YW4gb3RoZXIgc3RyaW5nIHdpdGggc3BhY2Vz
If you can't use getline and you just need to output the csv (you can't further process the base64'd field), change the order of fields in output and abuse the system's newline. First, a bit modified input data (changed order, missing field):
cn: 1313131313
blobinfo: a string with spaces
mode: d121
blobinfo: an other string with spaces
mode: d122
cn: 131313asdf1313
cn: 131313asdf1313
mode: d122
The awk:
$ awk '
BEGIN {
RS="" # read in a block of rows
FS="\n" # newline is the FS
h["cn"]=1 # each key has a fixed buffer slot
h["blobinfo"]=2
h["mode"]=3
}
{
for(i=1;i<=NF;i++) { # for all fields
split($i,a,": ") # split to a array
b[h[a[1]]]=a[2] # store to b uffer
}
printf "%s,%s,",b[1],b[3] # output all but blob, no newline
system("echo " b[2] "| base64") # let system output the newline
delete b # buffer needs to be reset
}' file # well, I used file for testing, you can pipe
ANd the output:
1313131313,d121,YSBzdHJpbmcgd2l0aCBzcGFjZXMK
131313asdf1313,d122,YW4gb3RoZXIgc3RyaW5nIHdpdGggc3BhY2VzCg==
131313asdf1313,d122,Cg==

Convert a key:value file w/ comments into JSON document with UNIX tools

I have a file in a subset of YAML with data such as the below:
# This is a comment
# This is another comment
spark:spark.ui.enabled: 'false'
spark:spark.sql.adaptive.enabled: 'true'
yarn:yarn.nodemanager.log.retain-seconds: '259200'
I need to convert that into a JSON document looking like this (note that strings containing booleans and integers still remain strings):
{
"spark:spark.ui.enabled": "false",
"spark:spark.sql.adaptive.enabled": "true",
"yarn:yarn.nodemanager.log.retain-seconds", "259200"
}
The closest I got was this:
cat << EOF > ./file.yaml
> # This is a comment
> # This is another comment
>
>
> spark:spark.ui.enabled: 'false'
> spark:spark.sql.adaptive.enabled: 'true'
> yarn:yarn.nodemanager.log.retain-seconds: '259200'
> EOF
echo {$(cat file.yaml | grep -o '^[^#]*' | sed '/^$/d' | awk -F": " '{sub($1, "\"&\""); print}' | paste -sd "," - )}
which apart from looking rather gnarly doesn't give the correct answer, it returns:
{"spark:spark.ui.enabled": 'false',"spark:spark.sql.adaptive.enabled": 'true',"dataproc:dataproc.monitoring.stackdriver.enable": 'true',"spark:spark.submit.deployMode": 'cluster'}
which, if I pipe to jq causes a parse error.
I'm hoping I'm missing a much much easier way of doing this but I can't figure it out. Can anyone help?
Implemented in pure jq (tested with version 1.6):
#!/usr/bin/env bash
jq_script=$(cat <<'EOF'
def content_for_line:
"^[[:space:]]*([#]|$)" as $ignore_re | # regex for comments, blank lines
"^(?<key>.*): (?<value>.*)$" as $content_re | # regex for actual k/v pairs
"^'(?<value>.*)'$" as $quoted_re | # regex for values in single quotes
if test($ignore_re) then {} else # empty lines add nothing to the data
if test($content_re) then ( # non-empty: match against $content_re
capture($content_re) as $content | # ...and put the groups into $content
$content.key as $key | # string before ": " becomes $key
(if ($content.value | test($quoted_re)) then # if value contains literal quotes...
($content.value | capture($quoted_re)).value # ...take string from inside quotes
else
$content.value # no quotes to strip
end) as $value | # result of the above block becomes $value
{"\($key)": "\($value)"} # and return a map from one key to one value
) else
# we get here if a line didn't match $ignore_re *or* $content_re
error("Line \(.) is not recognized as a comment, empty, or valid content")
end
end;
# iterate over our input lines, passing each one to content_for_line and merging the result
# into the object we're building, which we eventually return as our result.
reduce inputs as $item ({}; . + ($item | content_for_line))
EOF
)
# jq -R: read input as raw strings
# jq -n: don't read from stdin until requested with "input" or "inputs"
jq -Rn "$jq_script" <file.yaml >file.json
Unlike syntax-unaware tools, this can never generate output that isn't valid JSON; and it can easily be extended with application-specific logic (f/e, to emit some values but not others as numeric literals rather than string literals) by adding an additional filter stage to inspect and modify the output of content_for_line.
Here's a no-frills but simple solution:
def tidy: sub("^ *'?";"") | sub(" *'?$";"");
def kv: split(":") | [ (.[:-1] | join(":")), (.[-1]|tidy)];
reduce (inputs| select( test("^ *#|^ *$")|not) | kv) as $row ({};
.[$row[0]] = $row[1] )
Invocation
jq -n -R -f tojson.jq input.txt
You can do it all in awk using gsub and sprintf, for example:
(edit to add "," separating json records)
awk 'BEGIN {ol=0; print "{" }
/^[^#]/ {
if (ol) print ","
gsub ("\047", "\042")
$1 = sprintf (" \"%s\":", substr ($1, 1, length ($1) - 1))
printf "%s %s", $1, $2
ol++
}
END { print "\n}" }' file.yaml
(note: though jq is the proper tool for json formatting)
Explanation
awk 'BEGIN { ol=0; print "{" } call awk setting the output line variable ol=0 for "," output control and printing the header "{",
/^[^#]/ { only match non-comment lines,
if (ol) print "," if the output line ol is greater than zero, output a trailing ","
gsub ("\047", "\042") replace all single-quotes with double-quotes,
$1 = sprintf (" \"%s\":", substr ($1, 1, length ($1) - 1)) add 2 leading spaces and double-quotes around the first field (except for the last character) and then append a ':' at the end.
print $1, $2 output the reformatted fields,
ol++ increment the output line count, and
END { print "}" }' close by printing the "}" footer
Example Use/Output
Just select/paste the awk command above (changing the filename as needed)
$ awk 'BEGIN {ol=0; print "{" }
> /^[^#]/ {
> if (ol) print ","
> gsub ("\047", "\042")
> $1 = sprintf (" \"%s\":", substr ($1, 1, length ($1) - 1))
> printf "%s %s", $1, $2
> ol++
> }
> END { print "\n}" }' file.yaml
{
"spark:spark.ui.enabled": "false",
"spark:spark.sql.adaptive.enabled": "true"
}

How to do an insertion of text before a multi-line regex using sed or awk?

Given the following input (not literally what follows, but shown with some meta notation):
... any content can be above the match ...
# ... optional comment above the match ...
# ... optional comment above the match can have spaces before it ...
"<key>": ... any content can follow ...
... any content can be below the match ...
where the match is ^\s*"<key>": where the <key> is a placeholder for an actual string. Note that comments are matched by ^\s*#.*.
I want to insert a string of text before the matched <key> and before any comments that are immediately above the matched <key>. There may be a variable number of comments, or none at all.
I've come up with a solution using sed; however, it is very ugly because it uses a tr hack. I'm hoping for a simpler solution using either sed or awk.
First, here's a test case:
test.txt:
{
# 1a
# 2a
"key1": true,
# 1b
# 2b
"key2": false,
}
Now my present solution involves sed and translating all newlines to a delimiter character ($'\x01') to make it easier to do multi-line operations. My example involves a regex that matches multiple comment lines followed by a key-value pair.
# The string to insert before the match
s='# 1x
# 2x
"keyx": null,
'
# Define the key before which to do the insertion:
Key='key2'
# Normalize that string: s -> ns
ns="$(printf '%s' "$s" | tr '\n' $'\x01')"
# Normalize test.txt
tr '\n' $'\x01' < test.txt |
# Perform the multi-line insertion
sed "s/\(^\|\x01\)\(\(\s*#[^\x01]*\x01\)*\)\(\s*\"$Key\":\)/\1$ns\2\4/" |
# Return to standard form with newlines
tr $'\x01' '\n'
The above code when executed with the test.txt input produces the correct and expected output:
{
# 1a
# 2a
"key1": true,
# 1x
# 2x
"keyx": null,
# 1b
# 2b
"key2": false,
}
How might I improve on what I've done above using sed or awk to make for more maintainable code? Specifically:
Is there another way to do this using sed without the tr hack above?
Is there a simpler way to do this using awk?
Following your update that the input could include either no or varying amounts of comments, this is the edit (due to some problems editing it, I'm having to edit out v1, so if you want it back leave a comment.)
sed doesn't do loops or if/elses really, just labels and branches, so trying to pick a range of lines is a bit more complicated it seems. Or at least for my knowledge level.
export key='key2'
s='# 1x\n# 2x\n"keyx": null,\n'
key_pattern='[[:space:]]*"'"$key"'":'
sed -n '
/'"$key_pattern"'/ {
:b; i\
'"$s"'
p; d
}
/^[[:space:]]*#/ {
h; :a; n; H
/^[[:space:]]*#/ ba
/'"$key_pattern"'/ { x; bb; }
x; p; d;
}
p
'
This script breaks into three types of patterns; where the key_pattern matches but is on its own (no comments before):
/'"$key_pattern"'/ { # here :b creates label b,
:b; i\ # and inserts
'"$s"' # the contents of this line
p; d # print then delete from buffer and start next line
}
When a group of comments is followed by the key_pattern:
/^[[:space:]]*#/ { # if comment found
h; # copy pattern space into hold space
:a; # create label a
n; H # get next line, append to hold space.
/^[[:space:]]*#/ ba # if new line is comment, goto `a`
/'"$key_pattern"'/ { x; bb; } # else if our pattern retrieve hold
# and goto `b`
x; p; d; # retrieve hold space, print and delete
}
And finally, When the line doesn't match anything else:
p; # print line and start next.
The following code comes with these assumptions:
Blank line between keys and data
Curly braces not elsewhere
awk '/key2/{$0 = "# 1x\n# 2x\n\"keyx\": null,\n\n"$0}ORS = RT' RS='[{}\n]\n' input_file
The main focus here is on setting up the RS value so it delimits each record

Parse out key=value pairs into variables

I have a bunch of different kinds of files I need to look at periodically, and what they have in common is that the lines have a bunch of key=value type strings. So something like:
Version=2 Len=17 Hello Var=Howdy Other
I would like to be able to reference the names directly from awk... so something like:
cat some_file | ... | awk '{print Var, $5}' # prints Howdy Other
How can I go about doing that?
The closest you can get is to parse the variables into an associative array first thing every line. That is to say,
awk '{ delete vars; for(i = 1; i <= NF; ++i) { n = index($i, "="); if(n) { vars[substr($i, 1, n - 1)] = substr($i, n + 1) } } Var = vars["Var"] } { print Var, $5 }'
More readably:
{
delete vars; # clean up previous variable values
for(i = 1; i <= NF; ++i) { # walk through fields
n = index($i, "="); # search for =
if(n) { # if there is one:
# remember value by name. The reason I use
# substr over split is the possibility of
# something like Var=foo=bar=baz (that will
# be parsed into a variable Var with the
# value "foo=bar=baz" this way).
vars[substr($i, 1, n - 1)] = substr($i, n + 1)
}
}
# if you know precisely what variable names you expect to get, you can
# assign to them here:
Var = vars["Var"]
Version = vars["Version"]
Len = vars["Len"]
}
{
print Var, $5 # then use them in the rest of the code
}
$ cat file | sed -r 's/[[:alnum:]]+=/\n&/g' | awk -F= '$1=="Var"{print $2}'
Howdy Other
Or, avoiding the useless use of cat:
$ sed -r 's/[[:alnum:]]+=/\n&/g' file | awk -F= '$1=="Var"{print $2}'
Howdy Other
How it works
sed -r 's/[[:alnum:]]+=/\n&/g'
This places each key,value pair on its own line.
awk -F= '$1=="Var"{print $2}'
This reads the key-value pairs. Since the field separator is chosen to be =, the key ends up as field 1 and the value as field 2. Thus, we just look for lines whose first field is Var and print the corresponding value.
Since discussion in commentary has made it clear that a pure-bash solution would also be acceptable:
#!/bin/bash
case $BASH_VERSION in
''|[0-3].*) echo "ERROR: Bash 4.0 required" >&2; exit 1;;
esac
while read -r -a words; do # iterate over lines of input
declare -A vars=( ) # refresh variables for each line
set -- "${words[#]}" # update positional parameters
for word; do
if [[ $word = *"="* ]]; then # if a word contains an "="...
vars[${word%%=*}]=${word#*=} # ...then set it as an associative-array key
fi
done
echo "${vars[Var]} $5" # Here, we use content read from that line.
done <<<"Version=2 Len=17 Hello Var=Howdy Other"
The <<<"Input Here" could also be <file.txt, in which case lines in the file would be iterated over.
If you wanted to use $Var instead of ${vars[Var]}, then substitute printf -v "${word%%=*}" %s "${word*=}" in place of vars[${word%%=*}]=${word#*=}, and remove references to vars elsewhere. Note that this doesn't allow for a good way to clean up variables between lines of input, as the associative-array approach does.
I will try to explain you a very generic way to do this which you can adapt easily if you want to print out other stuff.
Assume you have a string which has a format like this:
key1=value1 key2=value2 key3=value3
or more generic
key1_fs2_value1_fs1_key2_fs2_value2_fs1_key3_fs2_value3
With fs1 and fs2 two different field separators.
You would like to make a selection or some operations with these values. To do this, the easiest is to store these in an associative array:
array["key1"] => value1
array["key2"] => value2
array["key3"] => value3
array["key1","full"] => "key1=value1"
array["key2","full"] => "key2=value2"
array["key3","full"] => "key3=value3"
This can be done with the following function in awk:
function str2map(str,fs1,fs2,map, n,tmp) {
n=split(str,map,fs1)
for (;n>0;n--) {
split(map[n],tmp,fs2);
map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
delete map[n]
}
}
So, after processing the string, you have the full flexibility to do operations in any way you like:
awk '
function str2map(str,fs1,fs2,map, n,tmp) {
n=split(str,map,fs1)
for (;n>0;n--) {
split(map[n],tmp,fs2);
map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
delete map[n]
}
}
{ str2map($0," ","=",map) }
{ print map["Var","full"] }
' file
The advantage of this method is that you can easily adapt your code to print any other key you are interested in, or even make selections based on this, example:
(map["Version"] < 3) { print map["var"]/map["Len"] }
The simplest and easiest way is to use the string substitution like this:
property='my.password.is=1234567890=='
name=${property%%=*}
value=${property#*=}
echo "'$name' : '$value'"
The output is:
'my.password.is' : '1234567890=='
Yore.
Using bash's set command, we can split the line into positional parameters like awk.
For each word, we'll try to read a name value pair delimited by =.
When we find a value, assign it to the variable named $key using bash's printf -v feature.
#!/usr/bin/env bash
line='Version=2 Len=17 Hello Var=Howdy Other'
set $line
for word in "$#"; do
IFS='=' read -r key val <<< "$word"
test -n "$val" && printf -v "$key" "$val"
done
echo "$Var $5"
output
Howdy Other
SYNOPSIS
an awk-based solution that doesn't require manually checking the fields to locate the desired key pair :
approach being avoid splitting unnecessary fields or arrays - only performing regex match via function call when needed
only returning FIRST occurrence of input key value. Subsequent matches along the row are NOT returned
i just called it S() cuz it's the closest letter to $
I only included an array (_) of the 3 test values for demo purposes. Those aren't needed. In fact, no state information is being kept at all
caveat being : key-match must be exact - this version of the code isn't for case-insensitive or fuzzy/agile matching
Tested and confirmed working on
- gawk 5.1.1
- mawk 1.3.4
- mawk-2/1.9.9.6
- macos nawk
CODE
# gawk profile, created Fri May 27 02:07:53 2022
{m,n,g}awk '
function S(__,_) {
return \
! match($(_=_<_), "(^|["(_="[:blank:]]")")"(__)"[=][^"(_)"*") \
? "^$" \
: substr(__=substr($-_, RSTART, RLENGTH), index(__,"=")+_^!_)
}
BEGIN { OFS = "\f" # This array is only for testing
_["Version"] _["Len"] _["Var"] # purposes. Feel free to discard at will
} {
for (__ in _) {
print __, S(__) } }'
OUTPUT
Var
Howdy
Len
17
Version
2
So either call the fields in BAU fashion
- $5, $0, $NF, etc
or call S(QUOTED_KEY_VALUE), case-sensitive, like
As a safeguard, to prevent mis-interpreting null strings
or invalid inputs as $0, a non-match returns ^$
instead of empty string
S("Version") to get back 2.
As a bonus, it can safely handle values in multibyte unicode, both for values and even for keys, regardless of whether ur awk is UTF-8-aware or not :
1 ✜
🤡
2 Version
2
3 Var
Howdy
4 Len
17
5 ✜=🤡 Version=2 Len=17 Hello Var=Howdy Other
I know this is particularly regarding awk but mentioning this as many people come here for solutions to break down name = value pairs ( with / without using awk as such).
I found below way simple straight forward and very effective in managing multiple spaces / commas as well -
Source: http://jayconrod.com/posts/35/parsing-keyvalue-pairs-in-bash
change="foo=red bar=green baz=blue"
#use below if var is in CSV (instead of space as delim)
change=`echo $change | tr ',' ' '`
for change in $changes; do
set -- `echo $change | tr '=' ' '`
echo "variable name == $1 and variable value == $2"
#can assign value to a variable like below
eval my_var_$1=$2;
done

Resources