How to convert YAML with arrays to CSV?

The example:
ProductLine:
  ProductLineName: aa
  ADO_FeedsList:
    - organizationName: bb
      Project:
        - ProjectName: cc
          ProjectFeedsName:
            - dd
            - ee
        - ProjectName: ff
          ProjectFeedsName:
            - gg
            - hh
  OtherInfo: N/A
I expected the following output:
bb,cc,dd
bb,cc,ee
bb,ff,gg
bb,ff,hh
I have tried:
yq -o csv '.ProductLine.ADO_FeedsList[] |[.organizationName] + (.Project[]|.ProjectName)' test.yaml
It outputs:
bb,cc
bb,ff
Then I tried:
yq -o csv '.ProductLine.ADO_FeedsList[] |[.organizationName] + (.Project[]|.ProjectName) + (.Project[]|.ProjectFeedsName[]|[.])' test.yaml
Error: !!seq (ProductLine.ADO_FeedsList.0.Project.0.ProjectFeedsName.0) cannot be added to a !!str (ProductLine.ADO_FeedsList.0.Project.0.ProjectName)
How do I write the command for the ProjectFeedsName array?
I am a new yq user; could you share a method to format this YAML?
Or is there any other way to convert this YAML to CSV?

When adding arrays, make sure that all parts are wrapped in brackets, so that + concatenates arrays instead of trying to add a sequence to a string:
yq -o csv '
  .ProductLine.ADO_FeedsList[] | [.organizationName] + (
    .Project[] | [.ProjectName] + (.ProjectFeedsName[] | [.])
  )
' test.yaml
bb,cc,dd
bb,cc,ee
bb,ff,gg
bb,ff,hh

You could also use gojq, the Go implementation of jq; if you don't mind the way @csv quotes fields, you could consider:
gojq -r --yaml-input '
  .ProductLine.ADO_FeedsList[]
  | [.organizationName] +
    ( .Project[] | [.ProjectName] + (.ProjectFeedsName[]|[.]) )
  | @csv
' test.yaml
If you do mind, then perhaps replacing @csv by join(",") will suffice.
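Spelled out, that variant would be (a sketch, reusing the same expression as above with join swapped in for @csv):
gojq -r --yaml-input '
  .ProductLine.ADO_FeedsList[]
  | [.organizationName] +
    ( .Project[] | [.ProjectName] + (.ProjectFeedsName[]|[.]) )
  | join(",")
' test.yaml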

Related

How can I parse a YAML file using a shell script?

I have a YAML file which also has lists.
YAML File -
configuration:
  account: account1
  warehouse: warehouse1
  database: database1
  object_type:
    schema: schema1
    functions: funtion1
    tables:
      - table: table1
        sql_file_loc: some_path/some_file.sql
      - table: table2
        sql_file_loc: some_path/some_file.sql
I want to store the key-pair values in shell variables and loop through them. For example, the values for account/warehouse/database should go into variables which I can use later on. Also, the values for tables (table1 and table2) and sql_file_loc should go into shell variables which I can use for looping, like below -
for i in $table ;do
echo $i
done
I have tried this code below -
function parse_yaml {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
       -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
       -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'", vn, $2, $3);
      }
   }'
}
And this is the output I get -
configuration_account="account_name"
configuration_warehouse="warehouse_name"
configuration_database="database_name"
configuration_object_type_schema="schema1"
configuration_object_type_functions="funtion1"
configuration_object_type_tables__sql_file_loc="some_path/some_file.sql"
configuration_object_type_tables__sql_file_loc="some_path/some_file.sql"
It doesn't print -
configuration_object_type_tables__table="table1" and
configuration_object_type_tables__table="table2"
Also, for a list it prints two underscores (__), unlike for other objects.
And I want to loop over the values stored in configuration_object_type_tables__table and configuration_object_type_tables__sql_file_loc.
Any help would be appreciated!
Consider using a proper YAML processor such as mikefarah/yq.
It's a one-liner:
yq e '.. | select(type == "!!str") | (path | join("_")) + "=\"" + . + "\""' "$INPUT"
Output
configuration_account="account1"
configuration_warehouse="warehouse1"
configuration_database="database1"
configuration_object_type_schema="schema1"
configuration_object_type_functions="funtion1"
configuration_object_type_tables_0_table="table1"
configuration_object_type_tables_0_sql_file_loc="some_path/some_file.sql"
configuration_object_type_tables_1_table="table2"
configuration_object_type_tables_1_sql_file_loc="some_path/some_file.sql"
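To actually create those variables in your current shell, you could source the generated assignments (a sketch, assuming bash; be aware that sourcing generated code trusts the YAML content):
# Hypothetical sketch: load the key="value" lines emitted above as shell variables
source <(yq e '.. | select(type == "!!str") | (path | join("_")) + "=\"" + . + "\""' "$INPUT")
echo "$configuration_account"                      # -> account1
echo "$configuration_object_type_tables_0_table"   # -> table1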
Also take a look at this cool builtin feature of yq:
yq e -o props "$INPUT"
Output
configuration.account = account1
configuration.warehouse = warehouse1
configuration.database = database1
configuration.object_type.schema = schema1
configuration.object_type.functions = funtion1
configuration.object_type.tables.0.table = table1
configuration.object_type.tables.0.sql_file_loc = some_path/some_file.sql
configuration.object_type.tables.1.table = table2
configuration.object_type.tables.1.sql_file_loc = some_path/some_file.sql
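And for the looping part of your question, one option (a sketch, assuming bash and mikefarah/yq v4) is to have yq emit one table,sql_file_loc pair per line and read the pairs back in a while loop:
# Hypothetical sketch: iterate over the tables list in a shell loop
while IFS=, read -r table sql_file_loc; do
  echo "table=$table sql_file_loc=$sql_file_loc"
done < <(yq e '.configuration.object_type.tables[] | .table + "," + .sql_file_loc' "$INPUT")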
I suggest you try the yq YAML processor, as jpseng mentioned.
About the code you have here: the regex is not matching the "- table" pattern due to the "- " prefix.

jq null unix timestamps parsing issue

I'm trying to parse a big JSON file which I receive using curl.
By following this answer I could parse the following file:
$ cat test.json
{"items": [{"id": 110, "date1": 1590590723, "date2": 1590110000, "name": "somename"}]}
using the following command:
TZ=Europe/Kyiv jq -r '.[] | .[] | .name + "; " + (.date1|strftime("%B %d %Y %I:%M%p")) + "; " + (.date2|strftime("%B %d %Y %I:%M%p"))' test.json
Output is:
somename; May 27 2020 02:45PM; May 22 2020 01:13AM
But when I try to parse the following file using the same command:
$ cat test2.json
{"items": [{"id": 110, "date1": 1590590723, "date2": null, "name": "somename"}]}
Output is:
jq: error (at test2.json:1): strftime/1 requires parsed datetime inputs
I could replace those null values with some valid values using sed before parsing. But maybe there is a better way to skip (ignore) those values, leaving nulls in the output:
somename; May 27 2020 02:45PM; null
You could tweak your jq program so that it reads:
def tod: if type=="number" then strftime("%B %d %Y %I:%M%p") else tostring end;
.[] | .[] | .name + "; " + (.date1|tod) + "; " + (.date2|tod)
An alternative would be:
def tod: (tonumber? | strftime("%B %d %Y %I:%M%p")) // null;
.[] | .[] | "\(.name); \(.date1|tod); \(.date2|tod)"
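For example, with test2.json from above, the first variant would be run like this (same TZ prefix as your original command) and should print the line you asked for:
TZ=Europe/Kyiv jq -r '
  def tod: if type=="number" then strftime("%B %d %Y %I:%M%p") else tostring end;
  .[] | .[] | .name + "; " + (.date1|tod) + "; " + (.date2|tod)
' test2.json
Output:
somename; May 27 2020 02:45PM; null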

Get directory name with grep and remove it

Please, is there any simple way to get only the NAME output from the lines where DATE is more than 5 days ago, and then call another command (rm) with those NAMEs as arguments?
I have the following output from mega-ls path/ -l (mega.nz) command:
FLAGS VERS SIZE DATE NAME
d--- - - 06Feb2020 05:00:01 bk_20200206050000
d--- - - 07Feb2020 05:00:01 bk_20200207050000
d--- - - 08Feb2020 05:00:01 bk_20200208050000
d--- - - 09Feb2020 05:00:01 bk_20200209050000
d--- - - 10Feb2020 05:00:01 bk_20200210050000
d--- - - 11Feb2020 05:00:01 bk_20200211050000
I tried grep, sort, and other approaches, e.g. mega-ls path/ -l | head -n 5, but I don't know how to filter these lines based on the date.
Thank you a lot.
I tried to find a simple way for your request ;)
mega-ls path/ -l | head -n 5 | tr -s ' ' | cut -d ' ' -f6 | grep -v -e '^$' | grep '^bk_20200206.*' | xargs rm -f
Part 1: This is your command (it returns the folder list with extra data)
mega-ls path/ -l | head -n 5
Part 2: Squeezes the extra spaces in the part 1 result
tr -s ' '
Part 3: Uses the cut command to split the part 2 result and return the folder name column
cut -d ' ' -f6
Part 4: Removes empty lines from the part 3 result (left over from the header line)
grep -v -e '^$'
Part 5: This is the search for folder names by date in yyyymmdd format, e.g. 20200206 (replace 20200206 with the real date you need)
grep '^bk_20200206.*'
Part 6: (Very important!) If you need to delete the resulting folders, use this part
xargs rm -f
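If you'd rather compute the five-day cutoff than hardcode the date, here is a sketch (assuming GNU date, the bk_YYYYMMDDhhmmss naming shown above, and MEGAcmd's mega-rm for the actual deletion; adjust to your setup):
# Hypothetical sketch: list bk_* folders older than 5 days, then delete them
cutoff=$(date -d '5 days ago' +%Y%m%d)   # GNU date
mega-ls path/ -l \
  | awk -v cutoff="$cutoff" '$NF ~ /^bk_/ && substr($NF, 4, 8) < cutoff { print $NF }' \
  | xargs -r mega-rm -r                  # xargs -r skips the run if nothing matched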
Best Regards

Bash: concatenated variables derived from text file using grep gives confused output

In my directory, I have multiple NIfTI files (e.g., WIP944_mp2rage-0.75iso_TR5.nii) from my MRI scanner, accompanied by text files (e.g., WIP944_mp2rage-0.75iso_TR5_info.txt) containing information on the acquisition parameters (e.g., "Series description: WIP944_mp2rage-0.75iso_TR5_INV1_PHS_ND"). Based on these parameters (e.g., INV1_PHS_ND), I need to change the NIfTI file name, which is echoed in $niftibase. I used grep to do this. When echoing all variables individually, it gives me what I want, but when I try to concatenate them into one filename, the variables are mixed together instead of being delimited by a dot.
I tried multiple forms of sed to cut away potentially invisible characters, and identified the source of the problems: the "INV1_PHS_ND" part of 'series description' gives me trouble. This is the $struct component, possibly because this part varies in how many fields are extracted: sometimes it is 3 (in the case of INV1_PHS_ND), but it can be 2 as well (INV1_ND). When I introduce this variable into the filename, everything goes haywire.
for infofile in ${PWD}/*.txt; do
    # General characteristics of subjects (i.e., date of session, group number, and subject number)
    reco=$(grep -A0 "Series description:" ${infofile} | cut -d ' ' -f 3 | cut -d '_' -f 1)
    date=$(grep -A0 "Series date:" ${infofile} | cut -c 16-21)
    group=$(grep -A0 "Subject:" ${infofile} | cut -d '^' -f 2 | cut -d '_' -f 1)
    number=$(grep -A0 "Subject:" ${infofile} | cut -d '^' -f 2 | cut -d '_' -f 2)
    ScanNr=$(grep -A0 "Series number:" ${infofile} | cut -d ' ' -f 3)

    # Change name if reco has structural prefix
    if [[ $reco = *WIP944* ]]; then
        struct=$(grep -A0 "Series description: WIP944" ${infofile} | cut -d '_' -f 4,5,6)
        niftibase=$(basename $infofile _info.txt).nii
        #echo ${subStudy}.struct.${date}.${group}.${protocol}.${paradigm}.nii
        echo ${subStudy}.struct.${struct}.${date}.${group}.${protocol}${number}.${paradigm}.n${ScanNr}.nii
        #mv ${niftibase} ${subStudy}.struct.${struct}.${date}.${group}.${protocol}${number}.${paradigm}.n${ScanNr}.nii
    fi
done
This gives me output like this:
.niit47.n4lot.Noc002
.niit47.n5lot.Noc002D
.niit47.n6lot.Noc002
.niit47.n8lot.Noc002
.niit47.n9lot.Noc002
.niit47.n10ot.Noc002
.niit47.n11ot.Noc002D
for all 7 WIP944 files. However, it needs to look like this:
H1.struct.INV2_PHS_ND.190523.Pilot.Noc001.Heat47.n11.nii, where H1, Noc, and Heat47 are loaded in from a setup file.
EDIT: I tried to use awk in the following way:
reco=$(awk 'FNR==8 {print;exit}' $infofile | cut -d ' ' -f 3 | cut -d '_' -f 1)
date=$(awk 'FNR==2 {print;exit}' $infofile | cut -c 15-21)
group=$(awk 'FNR==6 {print;exit}' $infofile | cut -d '^' -f 2 | cut -d '_' -f 1 )
number=$(awk 'FNR==6 {print;exit}' $infofile | cut -d '^' -f 2 | cut -d '_' -f 2)
ScanNr=$(awk 'FNR==14 {print;exit}' $infofile | cut -d ' ' -f 3)
which again gave me the correct output when echoing the variables individually, but not when I tried to combine them: .niit47.n11022_PHS_ND.
I used echo "$struct" | tr -dc '[:print:]' | od -c to see if there were hidden characters due to line endings, which resulted in:
0000000 I N V 2 _ P H S _ N D
0000013
EDIT: This is what the text file looks like:
Series UID: 1.3.12.2.1107.5.2.34.18923.2019052316005066316714852.0.0.0
Study date: 20190523
Study time: 153529.718000
Series date: 20190523
Series time: 160111.750000
Subject: MDC-0153,pilot_003^pilot_003
Subject birth date: 19970226
Series description: WIP944_mp2rage-0.75iso_TR5_INV1_PHS_ND
Image type: ORIGINAL\PRIMARY\P\ND
Manufacturer: SIEMENS
Model name: Investigational_Device_7T
Software version: syngo MR B17
Study id: 1
Series number: 5
Repetition time (ms): 5000
Echo time[1] (ms): 2.51
Inversion time (ms): 900
Flip angle: 7
Number of averages: 1
Slice thickness (mm): 0.75
Slice spacing (mm):
Image columns: 320
Image rows: 320
Phase encoding direction: ROW
Voxel size x (mm): 0.75
Voxel size y (mm): 0.75
Number of volumes: 1
Number of slices: 240
Number of files: 240
Number of frames: 0
Slice duration (ms) : 0
Orientation: sag
PixelBandwidth: 248
I have one of these text files for each NIfTI file. subStudy is hardcoded in a setup file, which is loaded prior to running the for loop. When I echo it, it shows the correct value. I need to change the names of multiple files with a specific prefix, which is stored in $reco.
As confirmed in comments, the input files have DOS carriage returns, which are basically invalid in Unix files. Also, you should pay attention to proper quoting.
As a general overhaul, I would recommend replacing the entire Bash script with a simple Awk script, which is both simpler and more idiomatic.
for infofile in ./*.txt; do  # no need to use ${PWD}
    # Pre-filter with a simple grep; skip files whose series description lacks WIP944
    grep -q '^Series description: [^ _]*WIP944' "$infofile" || continue
    # Still here? Means we want to rename
    suffix="$(awk -F : '
        BEGIN { n = split("Series description:Series date:Subject:Series number", t, /:/)
                for (i = 1; i <= n; i++) f[t[i]] }          # the fields we care about
        { sub(/\r/, "") }                                   # get rid of pesky DOS carriage returns
        $1 in f { x[$1] = substr($0, length($1) + 3) }      # keep the value after ": "
        END {
            split(x["Series description"], t, /_/)
            reco = t[1]; struct = t[4] "_" t[5] "_" t[6]
            date = substr(x["Series date"], 3, 6)           # YYMMDD, like your cut -c 16-21
            split(x["Subject"], t, /\^/); split(t[2], tt, /_/)
            group = tt[1]; number = tt[2]
            ScanNr = x["Series number"]
            ### FIXME: protocol and paradigm are still undefined
            print struct "." date "." group "." protocol number "." paradigm ".n" ScanNr
        }' "$infofile")"
    echo mv "${infofile%_info.txt}.nii" "$subStudy.struct.$suffix"
done
This probably still requires some tweaking (at least "protocol" and "paradigm" are still undefined). Once it seems to print the correct values, you can remove the echo before mv and have it actually rename the files for you.
(Probably still better to test on a copy of your real data files first!)

AWK Script to read from log file

I have a requirement to read certain parameters from a log file and then update a database. I am trying to achieve the first part, i.e. reading from the log file using awk in a shell script.
The log file may consist of the lines below, or more -
[2018-05-22T11:35:17,857] [RQST: rqst_3ADE-5439-598D-1B8B | TB: 9000042] - [588455375] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 769 - requestType="TESTING",partnerName="Test Merchant 123",testId="123456",lob="TEST1_TO_TEST2",tranType="TEST1",paymentType="P2M",amount="110.00",currency="840",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
[2018-05-22T11:35:17,857] [RQST: rqst_2AEF-2339-598D-1B8B | TB: 9000043] - [588455376] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 770 - requestType="TESTING",partnerName="Test Merchant 234",testId="234567",lob="TEST2_TO_TEST3",tranType="TEST2",paymentType="P2M",amount="120.00",currency="850",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
[2018-05-22T11:35:17,857] [RQST: rqst_4EDA-4539-598D-1B8B | TB: 9000044] - [588455377] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 771 - requestType="TESTING",partnerName="Test Merchant 345",testId="345678",lob="TEST3_TO_TEST4",tranType="TEST3",paymentType="P2M",amount="130.00",currency="860",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
I need to apply filters on processor and paymentType and retrieve the values of amount, currency, network, and responseCode into variables in a shell script, which will then be inserted into an Oracle DB table.
I am new to shell scripting and AWK and unable to wrap my head around this. I have tried
awk '/amount/{print}' testAPI.log
however, it returns all rows which contain amount.
Since you didn't specify the expected output, here is a template you can tailor to your needs:
$ awk -F' - ' '{n=split($NF,a,",");
                for(i=1;i<=n;i++) {split(a[i],b,"="); kv[b[1]]=b[2]}}
  kv["processor"]=="\"CBN\"" &&
  kv["paymentType"]=="\"P2M\"" {print kv["amount"],kv["currency"]}' file
"110.00" "840"
"120.00" "850"
"130.00" "860"
You can trim the double quotes as well, but I'm not sure it's needed as is...
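If you do want them trimmed, a sketch: the same template with a gsub added while the values are stored (the quotes then also disappear from the comparisons):
awk -F' - ' '{n=split($NF,a,",");
              for(i=1;i<=n;i++) {split(a[i],b,"="); gsub(/"/,"",b[2]); kv[b[1]]=b[2]}}
  kv["processor"]=="CBN" &&
  kv["paymentType"]=="P2M" {print kv["amount"],kv["currency"]}' file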
I tried it with the three entries in the question; the command below gives you the output you want.
It checks whether $5 is paymentType="P2M" and whether $8 has the value processor="CBN", i.e. the filter you were looking for; substitute the filters you need.
cat testAccelAPI.log | grep -i "\[RQST: rqst" | cut -d ' ' -f 19 | awk -F, '{ if($5=="paymentType=\"P2M\"" && $8=="processor=\"CBN\"") print $5 "=" $6 "="$7 "="$8 "=" $9 "="$10}' | cut -d= -f 4,6,8,9 | tr = " "
