unix parse values from key-value pairs and print nested results

I managed to parse a custom YAML file using the script below, taken from Stefan's answer to "How can I parse a YAML file from a Linux shell script?":
function parse_yaml {
   local prefix=$2
   # fs is the ASCII "file separator" character (octal 034), used as a field delimiter
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}
Output:
$ parse_yaml new_export.yaml
schemas_name="exports"
schemas_tables_name="TEST1"
schemas_tables_description="'"Tracks analysis"
schemas_tables_active_date="2019-01-07 00:00:00"
schemas_tables_columns_name="event_create_ts"
schemas_tables_columns_type="timestamp without time zone"
schemas_tables_columns_name="issue_id"
schemas_tables_columns_type="bigint"
schemas_tables_columns_description="conv id"
schemas_tables_columns_example="21352352"
schemas_tables_columns_name="company_id"
schemas_tables_columns_type="bigint"
schemas_tables_columns_description="'"Tracks analysis"
schemas_tables_columns_example="10001"
schemas_tables_name="TEST2"
schemas_tables_description="This table presents funny encounters"
schemas_tables_active_date="2018-12-18 00:00:00"
schemas_tables_columns_name="instance_ts"
schemas_tables_columns_type="datetime"
schemas_tables_columns_description="|-"
schemas_tables_columns_example="2018-03-03 12:30:00"
schemas_tables_columns_name="address_id"
schemas_tables_columns_type="bigint"
How can I generate a CSV file from this output, using the nested hierarchy of each table and its columns, based on the keys?
Something like below:
exports.TEST1.event_create_ts,"timestamp without time zone"
exports.TEST1.issue_id,bigint,"conv id",21352352
exports.TEST1.company_id,bigint,"'"Tracks analysis",10001
exports.TEST2.instance_ts,datetime,"|-","2018-03-03 12:30:00"
exports.TEST2.address_id,bigint
Any help would be appreciated!
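One possible approach is to post-process the key="value" lines with awk, starting a new CSV row at every schemas_tables_columns_name line and appending the remaining column attributes to it. This is only a sketch: the function name yaml_vars_to_csv is hypothetical, the key names are taken from the sample output above, and the quoting rule (quote anything that is not a plain word or number) is an assumption inferred from the desired output. Embedded double quotes are passed through unescaped, as in the example rows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads key="value" lines on stdin and prints
# one CSV row per column entry. Key names are assumed from the sample output.
yaml_vars_to_csv() {
  awk '
    # Print the row collected so far, if any, and start over.
    function emit() { if (row != "") print row; row = "" }
    # Quote a value unless it is a plain word or number (assumed rule).
    function q(v) { return (v ~ /[^A-Za-z0-9_.]/) ? "\"" v "\"" : v }
    {
      eq  = index($0, "=")
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 2, length($0) - eq - 2)   # strip the outer quotes
      if      (key == "schemas_name")                { emit(); schema = val }
      else if (key == "schemas_tables_name")         { emit(); table = val }
      else if (key == "schemas_tables_columns_name") { emit(); row = schema "." table "." val }
      else if (key ~ /^schemas_tables_columns_/)     { row = row "," q(val) }
    }
    END { emit() }
  '
}
```

It could be used as parse_yaml new_export.yaml | yaml_vars_to_csv > export.csv. Table-level keys such as schemas_tables_description are ignored because they do not belong to a column.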

Related

How can I parse a YAML file using a shell script?

I have a YAML file which also has lists.
YAML File -
configuration:
  account: account1
  warehouse: warehouse1
  database: database1
  object_type:
    schema: schema1
    functions: funtion1
    tables:
      - table: table1
        sql_file_loc: some_path/some_file.sql
      - table: table2
        sql_file_loc: some_path/some_file.sql
I want to store the key-value pairs in shell variables and loop through them. For example, the values for account/warehouse/database should go into variables that I can use later on. The values for tables (table1 and table2) and sql_file_loc should also go into shell variables that I can loop over, like below -
for i in $table ;do
echo $i
done
I have tried this code below -
function parse_yaml {
   local prefix=$2
   # fs is the ASCII "file separator" character (octal 034), used as a field delimiter
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$1" |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}
And this is the output I get -
configuration_account="account_name"
configuration_warehouse="warehouse_name"
configuration_database="database_name"
configuration_object_type_schema="schema1"
configuration_object_type_functions="funtion1"
configuration_object_type_tables__sql_file_loc="some_path/some_file.sql"
configuration_object_type_tables__sql_file_loc="some_path/some_file.sql"
It doesn't print -
configuration_object_type_tables__table="table1" and
configuration_object_type_tables__table="table2"
Also for a list, it prints two underscores(__) unlike other objects.
And I want to loop the values stored in configuration_object_type_tables__table and configuration_object_type_tables__sql_file_loc.
Any help would be appreciated!

Consider using a YAML processor such as mikefarah/yq.
It's a one liner:
yq e '.. | select(type == "!!str") | (path | join("_")) + "=\"" + . + "\""' "$INPUT"
Output
configuration_account="account1"
configuration_warehouse="warehouse1"
configuration_database="database1"
configuration_object_type_schema="schema1"
configuration_object_type_functions="funtion1"
configuration_object_type_tables_0_table="table1"
configuration_object_type_tables_0_sql_file_loc="some_path/some_file.sql"
configuration_object_type_tables_1_table="table2"
configuration_object_type_tables_1_sql_file_loc="some_path/some_file.sql"
Also take a look at this cool builtin feature of yq:
yq e -o props "$INPUT"
Output
configuration.account = account1
configuration.warehouse = warehouse1
configuration.database = database1
configuration.object_type.schema = schema1
configuration.object_type.functions = funtion1
configuration.object_type.tables.0.table = table1
configuration.object_type.tables.0.sql_file_loc = some_path/some_file.sql
configuration.object_type.tables.1.table = table2
configuration.object_type.tables.1.sql_file_loc = some_path/some_file.sql
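To loop over the table names as the question asks, the key="value" lines shown in the first output can be read into Bash arrays. The following is a minimal sketch: the array names and the key patterns are assumptions based on that sample output, and the here-document stands in for the real yq command.

```shell
#!/usr/bin/env bash
# Sketch: collect list values from key="value" lines into Bash arrays.
# The key patterns below are assumptions taken from the sample output.
tables=()
sql_files=()
while IFS='=' read -r key value; do
  value=${value#\"}   # drop the leading quote
  value=${value%\"}   # drop the trailing quote
  case $key in
    *_tables_*_table)        tables+=("$value") ;;
    *_tables_*_sql_file_loc) sql_files+=("$value") ;;
  esac
done <<'EOF'
configuration_account="account1"
configuration_object_type_tables_0_table="table1"
configuration_object_type_tables_0_sql_file_loc="some_path/some_file.sql"
configuration_object_type_tables_1_table="table2"
configuration_object_type_tables_1_sql_file_loc="some_path/some_file.sql"
EOF

# Loop over both arrays in parallel.
for i in "${!tables[@]}"; do
  echo "${tables[$i]} -> ${sql_files[$i]}"
done
```

In a real script the here-document would presumably be replaced by process substitution, e.g. done < <(yq e '...' "$INPUT") with the expression from the one-liner above.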

I suggest you try the yq YAML processor, as jpseng mentioned.
As for the code you have here, the regex does not match the "- table" lines because of the "- " prefix: the key pattern [a-zA-Z0-9_]* cannot match a dash.
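If installing yq is not an option, one workaround is to replace the leading "- " of list items with two spaces before the file reaches parse_yaml, so that the "table" key sits at the same indentation as "sql_file_loc" and the regex can match it. This is a sketch, not real list support: the function name normalize_list_items is hypothetical, and every list item still maps to the same variable name (configuration_object_type_tables__table), so later items overwrite earlier ones if the output is passed to eval.

```shell
#!/usr/bin/env bash
# Hypothetical pre-filter: turn a leading "- " list marker into two spaces,
# keeping the original indentation in front of it.
normalize_list_items() {
  sed 's/^\([[:space:]]*\)- /\1  /'
}
```

For example: normalize_list_items < config.yaml > flat.yaml, followed by parse_yaml flat.yaml (config.yaml and flat.yaml are placeholder file names).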

Subtraction from one column to another using loop in script file

I have the following data file. The file has 16 columns. I want to subtract column $2 from each column (from $3 to $16) and save the result in a separate file.
13.85 0.545417 0.530096 0.539271 0.551338 0.555655 0.548994 0.547872 0.542829 0.537696 0.548105 0.55492 0.522208 0.54062 0.517871 0.574612
25.62 1.14902 1.10938 1.13254 1.16315 1.17326 1.15723 1.15442 1.14388 1.13291 1.15441 1.16761 1.09476 1.13889 1.09123 1.21286
44.56 2.24456 2.164 2.21009 2.27259 2.2943 2.26313 2.25812 2.23368 2.20829 2.2556 2.27972 2.13678 2.22223 2.1265 2.3646
71.36 4.07674 3.92425 4.0065 4.12716 4.16732 4.11178 4.10184 4.05272 4.00639 4.08657 4.1366 3.87901 4.0282 3.87433 4.30779
109.02 6.82479 6.55002 6.70016 6.91858 6.99698 6.88975 6.86855 6.78574 6.70263 6.82276 6.90574 6.48615 6.72953 6.47619 7.19417
160.94 10.8211 10.3596 10.6115 10.9768 11.0954 10.9379 10.8903 10.7438 10.6058 10.845 10.9637 10.2915 10.7044 10.3087 11.4487
226.64 16.1075 15.4102 15.7951 16.3518 16.5692 16.3009 16.2218 15.9761 15.7459 16.148 16.3254 15.3147 15.9319 15.3075 16.9761
290.41 21.4363 20.4612 20.9759 21.7806 22.0759 21.7127 21.6024 21.2401 20.9127 21.4066 21.6855 20.3395 21.1352 20.3578 22.5532
337.36 24.8405 23.7434 24.3358 25.3111 25.6545 25.2228 25.0429 24.5822 24.1328 24.8275 25.0746 23.677 24.5417 23.6901 26.133
For this, I tried to write the following script.
for (( i=3; i <= 16; i++ ))
do
paste file.txt | awk '{print $1, $i-$2}' > output/new_$i.txt
done
but it is not working.

Just use awk alone:
$ awk '{for(i=3;i<=16;i++)$i-=$2}1' file
13.85 0.545417 -0.015321 -0.006146 0.005921 0.010238 0.003577 0.002455 -0.002588 -0.007721 0.002688 0.009503 -0.023209 -0.004797 -0.027546 0.029195
25.62 1.14902 -0.03964 -0.01648 0.01413 0.02424 0.00821 0.0054 -0.00514 -0.01611 0.00539 0.01859 -0.05426 -0.01013 -0.05779 0.06384
...
Explained:
$ awk '{
    for(i=3;i<=16;i++)   # loop over fields 3-16
        $i-=$2           # subtract the 2nd field from each of them
}1' file                 # 1 is an always-true pattern; its default action prints the record

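If you do want to use the method you began with, you need to pass the shell variable to awk. Inside single quotes the shell does not expand $i, so awk sees its own uninitialized variable i and $i evaluates to $0: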
for (( i=3; i <= 16; i++ ))
do
paste file.txt | awk -v i=$i '{print $1, $i-$2}' > output/new_$i.txt
done
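The per-column files can also be produced in a single awk pass, because awk can redirect print to a file name built at run time. The sketch below assumes the output naming from the question (new_3.txt ... new_16.txt); the function name split_differences and its two parameters are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every field from 3 onward, write
# "<col1> <colN - col2>" to <outdir>/new_N.txt in one pass over the input.
split_differences() {
  local infile=$1 outdir=$2
  mkdir -p "$outdir"
  awk -v dir="$outdir" '{
    for (i = 3; i <= NF; i++) {
      out = dir "/new_" i ".txt"
      print $1, ($i - $2) > out   # awk keeps each output file open between records
    }
  }' "$infile"
}
```

Usage would be split_differences file.txt output. This reads the input once instead of 14 times.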

Conditional insert of line breaks to number sequence (preferably using bash, awk, or sed)

I'm trying to add line breaks to a text file each time a subsequent number is smaller than the immediately preceding number (e.g. a break between "72.774" and "7.009") in a text file with this structure:
7.007 28.929 50.851 72.774 7.009 28.932 50.854 72.777 7.015 32.939 54.862 76.784
I want the output to be in this format:
7.007 28.929 50.851 72.774
7.009 28.932 50.854 72.777
7.015 32.939 54.862 76.784
Files do not always have the same number of numerical entries (either in total or before the series begins counting up again) nor are the same number of line breaks required in all text files.
I've been trying to use conditionals in awk or sed but haven't had any luck.
Thank you in advance for any suggestions/solutions.
note: edited to reflect 1st comment.

This may be what you want:
$ awk -v RS=' ' '{printf "%s%s", (NR>1?($0<p?ORS:OFS):""), $0; p=$0}' file
7.007 28.929 50.851 72.774
7.009 28.932 50.854 72.777
7.015 32.939 54.862 76.784

Here's one solution using awk:
{
for (i=1; i<=NF; ++i) {
if ($i < last) {
printf "\n"
last=-1
} else if (last > 0) {
printf " "
}
printf "%s", $i
last = $i
}
}
END { printf "\n" }
Example run:
$ awk -f foo.awk bar.txt
7.007 28.929 50.851 72.774 94.696 116.619 138.542 160.464 182.387 204.309 226.232 248.155 270.077 292 313.922 335.845 357.768 379.69 401.613 423.535 445.458 467.381
7.009 28.932 50.854 72.777 94.699 116.622 138.545 160.467 182.39 204.312 226.235 248.158 270.08 292.003 313.925 335.848 357.771 379.693 401.616 423.538 445.461 467.384 489.306
7.015 32.939 54.862 76.784 102.708 124.631 146.553 168.476 190.398 212.321 234.244 260.167 282.09 308.013 333.937 355.86 377.782 403.706
7.005 28.928 50.85 72.773 94.696 116.618 138.541 160.463 186.387 212.311 234.233 256.156 278.079 300.001 321.924 347.847

Another AWK solution:
awk 'BEGIN { RS=" "; prev=-999 }
{ if ( $1<prev ) { printf "\n%.3f", $1 }
else { printf "%s%.3f", (NR>1 ? " " : ""), $1 }
prev=$1
}
END { printf "\n" }' file

Parsing iw wlan0 scan output

I wrote a WLAN manager script to handle open/ad-hoc/WEP/WPA2 networks. Now I'm trying to parse the output of iw wlan0 scan to add a nice scan feature to my script. My goal is to get output like this:
SSID channel signal encryption
wlan-ap 6 70% wpa2-psk
test 1 55% wep
What I have achieved so far is output like this:
$ iw wlan0 scan | grep 'SSID\|freq\|signal\|capability' | tac
SSID: Koti783
signal: -82.00 dBm
capability: ESS Privacy ShortPreamble SpectrumMgmt ShortSlotTime (0x0531)
freq: 2437
I have been studying bash/sed/awk but haven't yet found a way to achieve what I'm after. What is a good way to do this?

Here is my final solution, based on Sudo_O's answer:
$1 == "BSS" {
MAC = $2
wifi[MAC]["enc"] = "Open"
}
$1 == "SSID:" {
wifi[MAC]["SSID"] = $2
}
$1 == "freq:" {
wifi[MAC]["freq"] = $NF
}
$1 == "signal:" {
wifi[MAC]["sig"] = $2 " " $3
}
$1 == "WPA:" {
wifi[MAC]["enc"] = "WPA"
}
$1 == "WEP:" {
wifi[MAC]["enc"] = "WEP"
}
END {
printf "%s\t\t%s\t%s\t\t%s\n","SSID","Frequency","Signal","Encryption"
for (w in wifi) {
printf "%s\t\t%s\t\t%s\t%s\n",wifi[w]["SSID"],wifi[w]["freq"],wifi[w]["sig"],wifi[w]["enc"]
}
}
Output:
$ sudo iw wlan0 scan | awk -f scan.awk
SSID Frequency Signal Encryption
netti 2437 -31.00 dBm Open
Koti783 2437 -84.00 dBm WPA
WLAN-AP 2462 -85.00 dBm WPA
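The output above lists the frequency, while the question asks for the channel. The channel number can be derived from the centre frequency in MHz: in the 2.4 GHz band it is (freq - 2407) / 5, with 2484 MHz as the special case of channel 14, and in the 5 GHz band it is (freq - 5000) / 5. A small sketch follows; the function name freq_to_channel and the exact range limits are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one frequency in MHz per line and prints
# "<freq> <channel>". Range limits are assumptions for illustration.
freq_to_channel() {
  awk '{
    f = $1
    if (f == 2484)                   ch = 14
    else if (f >= 2412 && f <= 2472) ch = (f - 2407) / 5
    else if (f >= 5160 && f <= 5885) ch = (f - 5000) / 5
    else                             ch = "?"
    print f, ch
  }'
}
```

The same arithmetic could be placed directly in the $1 == "freq:" block of the awk script.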

It's generally bad practice to parse the complex output of programs that is intended for humans to read (rather than for machines to parse).
For example, the output of iw might change depending on the language settings of the system and/or the version of iw, leaving you with a "manager" that only works on your development machine.
Instead, you might use the same interface that iw uses to get its information: the library backend libnl.
You might also want to have a look at the wireless-tools (iwconfig, iwlist, ...), which use the libiw library.

Here is a GNU awk script to get you going that grabs the SSID and the channel for each unique BSS:
/^BSS / {
MAC = $2
}
/SSID/ {
wifi[MAC]["SSID"] = $2
}
/primary channel/ {
wifi[MAC]["channel"] = $NF
}
# Insert new block here
END {
printf "%s\t\t%s\n","SSID","channel"
for (w in wifi) {
printf "%s\t\t%s\n",wifi[w]["SSID"],wifi[w]["channel"]
}
}
It should be easy for you to add the new blocks for signal and encryption, considering all the studying you have been doing.
Save the script to a file such as wifi.awk and run it like this:
$ sudo iw wlan0 scan | awk -f wifi.awk
The output will be in the format requested:
SSID channel
wlan-ap 6
test 1

Here is a simple Bash function which uses exclusively Bash internals and spawns only one sub-shell:
#!/bin/bash
function iwScan() {
# disable globbing to avoid surprises
set -o noglob
# make temporary variables local to our function
local AP S
# read stdin of the function into AP variable
while read -r AP; do
## print lines only containing needed fields
[[ "${AP//'SSID: '*}" == '' ]] && printf '%b' "${AP/'SSID: '}\n"
[[ "${AP//'signal: '*}" == '' ]] && ( S=( ${AP/'signal: '} ); printf '%b' "${S[0]},";)
done
set +o noglob
}
iwScan <<< "$(iw wlan0 scan)"
Output:
-66.00,FRITZ!Box 7312
-56.00,ALICE-WLAN01
-78.00,o2-WLAN93
-78.00,EasyBox-7A2302
-62.00,dlink
-74.00,EasyBox-59DF56
-76.00,BELAYS_Network
-82.00,o2-WLAN20
-82.00,BPPvM
The function can easily be modified to provide additional fields by adding the necessary filter to the while read -r AP loop, e.g.:
[[ "${AP//'last seen: '*}" == '' ]] && ( S=( ${AP/'last seen: '} ); printf '%b' "${S[0]},";)
Output:
-64.00,1000,FRITZ!Box 7312
-54.00,492,ALICE-WLAN01
-76.00,2588,o2-WLAN93
-78.00,652,LN8-Gast
-72.00,2916,WHITE-BOX
-66.00,288,ALICE-WLAN
-78.00,800,EasyBox-59DF56
-80.00,720,EasyBox-7A2302
-84.00,596,ALICE-WLAN08

I use this solution on OpenWrt:
wlan_scan.sh
#!/bin/sh
sudo iw dev wlan0 scan | awk -f wlan_scan.awk | sort
wlan_scan.awk
/^BSS/ {
mac = gensub ( /^BSS[[:space:]]*([0-9a-fA-F:]+).*?$/, "\\1", "g", $0 );
}
/^[[:space:]]*signal:/ {
signal = gensub ( /^[[:space:]]*signal:[[:space:]]*(\-?[0-9.]+).*?$/, "\\1", "g", $0 );
}
/^[[:space:]]*SSID:/ {
ssid = gensub ( /^[[:space:]]*SSID:[[:space:]]*([^\n]*).*?$/, "\\1", "g", $0 );
printf ( "%s %s %s\n", signal, mac, ssid );
}
result
-62.00 c8:64:c7:54:d9:05 a
-72.00 70:72:3c:1c:af:17 b
-81.00 78:f5:fd:be:33:cb c

There is a bug in the awk script above.
The following code does not work if the SSID contains spaces: only the first token of the SSID is stored.
$1 == "SSID:" {
wifi[MAC]["SSID"] = $2
}
When printing $0, $1, $2:
$0: SSID: DIRECT-82-HP OfficeJet 8700
$1: SSID:
$2: DIRECT-82-HP
One possible solution is to take a substr of $0, which contains the leading spaces, the token "SSID: " and the complete multi-token network name.
Any other suggestions?
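Another option is to copy $0 and strip the indentation and the "SSID:" label with sub(), which keeps any spaces inside the name. A minimal sketch, with extract_ssids as a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the full SSID of every "SSID:" line on stdin,
# including names that contain spaces.
extract_ssids() {
  awk '$1 == "SSID:" {
    ssid = $0
    sub(/^[ \t]*SSID:[ \t]*/, "", ssid)   # strip indentation and the label
    print ssid
  }'
}
```

In the script above, the assignment would then become wifi[MAC]["SSID"] = ssid instead of using $2.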

I've taken the awk code from Ari Malinen and reworked it a bit, because the iw output is not stable and may change, and there are other issues such as spaces in the SSID. I put it on GitHub in case I change it in the future.
#!/usr/bin/env awk -f
$1 ~ /^BSS/ {
if($2 !~ /Load:/) { #< Skip the "BSS Load:" line
gsub("(\\(.*|:)", "", $2)
MAC = toupper($2)
wifi[MAC]["enc"] = "OPEN"
wifi[MAC]["WPS"] = "no"
wifi[MAC]["wpa1"] = ""
wifi[MAC]["wpa2"] = ""
wifi[MAC]["wep"] = ""
}
}
$1 == "SSID:" {
# Work around spaces in the SSID
FS=":" #< Change the field separator to ":"; this assumes that
# ":" does not occur in the SSID name
$0=$0
sub(" ", "", $2) #< remove the first whitespace
wifi[MAC]["SSID"] = $2
FS=" "
$0=$0
}
$1 == "capability:" {
for(i=2; i<=NF; i++) {
if($i ~ /0x[0-9]{4}/) {
gsub("(\\(|\\))", "", $i)
if (and(strtonum($i), 0x10))
wifi[MAC]["wep"] = "WEP"
}
}
}
$1 == "WPA:" {
wifi[MAC]["wpa1"] = "WPA1"
}
$1 == "RSN:" {
wifi[MAC]["wpa2"] = "WPA2"
}
$1 == "WPS:" {
wifi[MAC]["WPS"] = "yes"
}
$1 == "DS" {
wifi[MAC]["Ch"] = $5
}
$1 == "signal:" {
match($2, /-([0-9]{2})\.00/, m)
wifi[MAC]["Sig"] = m[1]
}
$1 == "TSF:" {
gsub("(\\(|d|,)", "", $4)
match($5, /([0-9]{2}):([0-9]{2}):/, m)
day = $4
hour = m[1]
min = m[2]
wifi[MAC]["TSF"] = day"d"hour"h"min"m"
}
END {
for (w in wifi) {
if (wifi[w]["wep"]) {
if (wifi[w]["wpa1"] || wifi[w]["wpa2"])
wifi[w]["enc"] = wifi[w]["wpa1"]wifi[w]["wpa2"]
else
wifi[w]["enc"] = "WEP"
}
printf "%s:%s:%s:%s:%s:%s:%s\n", w, wifi[w]["SSID"], wifi[w]["enc"], \
wifi[w]["WPS"], wifi[w]["Ch"], wifi[w]["Sig"], wifi[w]["TSF"]
}
}
Output:
A5FEF2C499BB:test-ssid2:OPEN:no:9:43:0d00h00m
039EFACA9A8B:test-ssid2:WPA1:no:9:33:0d00h00m
038BF3C1988B:test-ssid2:WPA2:no:9:35:0d00h00m
028EF3C2997B:test-ssid2:WPA1:no:9:35:0d00h03m
If you wonder what if($2 !~ /Load:/) does: on some routers the scan output contains a "BSS Load:" line, which must not be mistaken for the start of a new BSS entry.

Extracting multiple parts of a string using bash

I have a caret delimited (key=value) input and would like to extract multiple tokens of interest from it.
For example: Given the following input
$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"
1=A00^35=D^22=101^150=1^33=1
1=B000^35=D^22=101^150=2^33=2
I would like the following output
35=D^150=1^
35=D^150=2^
I have tried the following
$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"|egrep -o "35=[^/^]*\^|150=[^/^]*\^"
35=D^
150=1^
35=D^
150=2^
My problem is that egrep returns each match on a separate line. Is it possible to get one line of output for one line of input? Please note that due to the constraints of the larger script, I cannot simply do a blind replace of all the \n characters in the output.
Thank you for any suggestions. This script is for bash 3.2.25. Any egrep alternatives are welcome. Please note that the tokens of interest (35 and 150) may change, and I am already generating the egrep pattern in the script. Hence a one-liner (if possible) would be great.

You have two options. Option 1 is to change the field separator (IFS) and use set --:
OFS=$IFS
IFS="^ "
line='1=A00^35=D^150=1^33=1'
set -- $line # No quotes here!! Word splitting only applies to the unquoted expansion
IFS="$OFS"
Now you have your values in $1, $2, etc.
Or you can use an array:
tmp=$(echo "1=A00^35=D^150=1^33=1" | sed -e 's:\([0-9]\+\)=: [\1]=:g' -e 's:\^ : :g')
eval value=($tmp)
echo "35=${value[35]}^150=${value[150]}"
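The IFS idea can also be applied line by line with read -a, which produces one output line per input line and avoids eval. The sketch below uses only features that should be available in bash 3.2 (indexed arrays, here-strings, +=); the function name extract_tags and its comma-separated argument are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each input line, print the key=value tokens whose
# key is in the comma-separated list given as $1, each followed by "^".
extract_tags() {
  local wanted=$1 line field out
  local -a fields
  while IFS= read -r line; do
    out=
    IFS='^' read -r -a fields <<< "$line"   # split the line on carets
    for field in "${fields[@]}"; do
      case ",$wanted," in
        *",${field%%=*},"*) out+="$field^" ;;   # key is in the wanted list
      esac
    done
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
    fi
  done
}
```

For example, echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2" | extract_tags 35,150 keeps the tokens in the order in which they appear on each input line.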

To get rid of the newline, you can just echo it again:
$ echo $(echo "1=A00^35=D^150=1^33=1"|egrep -o "35=[^/^]*\^|150=[^/^]*\^")
35=D^ 150=1^
If that's not satisfactory (I think it may give you one line for the whole input file), you can use awk:
pax> echo '
1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLIST=35,150 -F^ ' {
sep = "";
split (LIST, srch, ",");
for (i = 1; i <= NF; i++) {
for (idx in srch) {
split ($i, arr, "=");
if (arr[1] == srch[idx]) {
printf sep "" arr[1] "=" arr[2];
sep = "^";
}
}
}
if (sep != "") {
print sep;
}
}'
35=D^150=1^
35=d^
pax> echo '
1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLIST=1,33 -F^ ' {
sep = "";
split (LIST, srch, ",");
for (i = 1; i <= NF; i++) {
for (idx in srch) {
split ($i, arr, "=");
if (arr[1] == srch[idx]) {
printf sep "" arr[1] "=" arr[2];
sep = "^";
}
}
}
if (sep != "") {
print sep;
}
}'
1=A00^33=1^
1=a00^33=11^
This one allows you to use a single awk script and all you need to do is to provide a comma-separated list of keys to print out.
And here's the one-liner version :-)
echo '1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLST=1,33 -F^ '{s="";split(LST,k,",");for(i=1;i<=NF;i++){for(j in k){split($i,arr,"=");if(arr[1]==k[j]){printf s""arr[1]"="arr[2];s="^";}}}if(s!=""){print s;}}'

Given a file 'in' containing your strings:
$ for i in $(cut -d^ -f2,3 < in);do echo $i^;done
35=D^150=1^
35=D^150=2^
