Sed awk text formatting - bash

I would like to filter and order text files with something like awk or sed. It does not need to be a single command; a small bash script would be fine too.
#
home: address01
name: name01
info: info01
number: number01
company: company01
#
name: name02
company: company02
info: info02
home: home02
#
company: company03
home: address03
name: name03
info: info03
info: info032
number: number03
company: company032
#
name: name04
info: info04
company: company04
number: number04
number: number042
info: info042
I only need name, number, and info. There is always exactly one name, but there can be 0, 1, or 2 number and info lines. The # is the only thing that is consistent and always in the same spot.
output should be:
name01,number01,,info01,
name02,,,info02,
name03,number03,,info03,info032
name04,number04,number042,info04,info042
What I tried so far:
awk -v OFS=',' '{split($0,a,": ")} /^name:/{name=a[2]} /^number:/{number=a[2]} /^info:/{info=a[2]; print name,number,info}' > dump.csv

Consider changing the logic to print on '#' and again in the END block (in case the last block is not terminated with a #), and to reset the saved values after each block so fields cannot leak from one record into the next. Collecting number and info into small arrays also covers the "0, 1 or 2 occurrences" requirement:
awk -v OFS=',' '
/^#/ {
    if (name != "") print name, num[1], num[2], inf[1], inf[2]
    name = num[1] = num[2] = inf[1] = inf[2] = ""
    nn = ni = 0
    next
}
{ split($0, a, ": ") }
/^name:/   { name = a[2] }
/^number:/ { num[++nn] = a[2] }
/^info:/   { inf[++ni] = a[2] }
END { if (name != "") print name, num[1], num[2], inf[1], inf[2] }
' < w.txt > dump.csv
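An alternative sketch (not from the thread): treat '#' itself as the record separator, so each block arrives as one record and nothing can leak between blocks. A single-character RS is POSIX; split("", arr) is the portable way to empty an array. w.txt is the input file name used above:

```shell
# Recreate the sample input from the question.
cat > w.txt <<'EOF'
#
home: address01
name: name01
info: info01
number: number01
company: company01
#
name: name02
company: company02
info: info02
home: home02
#
company: company03
home: address03
name: name03
info: info03
info: info032
number: number03
company: company032
#
name: name04
info: info04
company: company04
number: number04
number: number042
info: info042
EOF

result=$(awk -v RS='#' -v FS='\n' -v OFS=',' '
NF {                                      # skip the empty record before the first #
    name = ""; nn = ni = 0
    split("", num); split("", inf)        # reset the per-block buffers
    for (i = 1; i <= NF; i++) {
        split($i, a, ": ")
        if (a[1] == "name")   name = a[2]
        if (a[1] == "number") num[++nn] = a[2]
        if (a[1] == "info")   inf[++ni] = a[2]
    }
    if (name != "") print name, num[1], num[2], inf[1], inf[2]
}' w.txt)
printf '%s\n' "$result"
```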

Related

bash search and replace a line after a certain line

I have a big yaml file containing multiple declaration blocks, related to different services.
The structure is similar to the following (but repeated for multiple applications):
- name: commerce-api
 type: helm
 version: 0.0.5
I would like to find the block of code that is containing commerce-api and replace the version property value with something else.
The thing is, I wrote this script:
bumpConfig() {
LINE=$(awk "/- name: $1$/{print NR + $2}" "$CONFIG_YML")
sed -i "" -E "${LINE}s/version: $3.*$/version: $4/" "$CONFIG_YML"
}
bumpConfig "commerce-api" 2 "$OLD_APP_VERSION" "$NEW_APP_VERSION"
This kind of allows me to do what I want, but the only problem is that the property version is not always on the third line.
How can I make my script look for the first occurrence of version, given that the service name is commerce-api?
Is this even possible using awk?
Adding some variation to the input file:
$ cat config.yml
- name: commerce-api-skip
 type: helm
 version: 0.0.5
- name: commerce-api
 type: helm
 bogus line1: bogus value1
 version: 0.0.5
 bogus line2: bogus value2
- name: commerce-api-skip-too
 type: helm
 version: 0.0.5
One awk idea:
bumpConfig() {
    awk -v name="$1" -v old="$2" -v new="$3" '
    /- name: / { replace = 0
                 if ($NF == name)
                     replace = 1
               }
    replace && $1 == "version:" { if ($NF == old)
                                      $0 = substr($0, 1, index($0, old) - 1) new
                                }
    1
    ' "${CONFIG_YML}"
}
Taking for a test drive:
CONFIG_YML='config.yml'
name='commerce-api'
OLD_APP_VERSION='0.0.5'
NEW_APP_VERSION='0.0.7'
bumpConfig "${name}" "${OLD_APP_VERSION}" "${NEW_APP_VERSION}"
This generates:
- name: commerce-api-skip
 type: helm
 version: 0.0.5
- name: commerce-api
 type: helm
 bogus line1: bogus value1
 version: 0.0.7
 bogus line2: bogus value2
- name: commerce-api-skip-too
 type: helm
 version: 0.0.5
Once OP is satisfied with the result:
If running GNU awk, the file can be updated 'in place' via: awk -i inplace -v name="$1" ...
Otherwise the output can be saved to a temp file and the temp file then moved over the original: awk -v name="$1" ... > tmpfile && mv tmpfile "${CONFIG_YML}"
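The temp-file route can be sketched end to end; the file contents and version strings below are just stand-ins for the real data:

```shell
# Create a minimal stand-in config file.
CONFIG_YML=config.yml
printf '%s\n' '- name: commerce-api' ' type: helm' ' version: 0.0.5' > "$CONFIG_YML"

# Write the transformed output to a temp file, then move it over the
# original only if awk succeeded.
tmpfile=$(mktemp) &&
awk '/^ version: 0.0.5$/ { $0 = " version: 0.0.7" } 1' "$CONFIG_YML" > "$tmpfile" &&
mv "$tmpfile" "$CONFIG_YML"

cat "$CONFIG_YML"
```

The && chain matters: if awk fails, the original file is left untouched.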
Entirely in sed
sed -i '' "/^- name: $1\$/,/^- name:/ s/version: $3/version: $4/" "$CONFIG_YML"
/^- name: $1\$/,/^- name:/ restricts the s command to just the lines between the requested name and the next - name: line.
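A self-contained sketch of that range-restricted substitution (shown without -i so it prints to stdout; on BSD/macOS use sed -i '' for in-place edits):

```shell
# Two blocks; only the second one should be bumped.
cat > config.yml <<'EOF'
- name: commerce-api-skip
 type: helm
 version: 0.0.5
- name: commerce-api
 type: helm
 version: 0.0.5
EOF

name=commerce-api
# The address range limits s/// to the lines from the matching block
# header up to the next '- name:' line (or EOF).
result=$(sed "/^- name: $name\$/,/^- name:/ s/version: 0.0.5/version: 0.0.7/" config.yml)
printf '%s\n' "$result"
```

Note the \$ inside the double quotes: the shell expands $name but passes a literal $ end-of-line anchor to sed, so commerce-api-skip is not matched.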
#!/bin/bash
OLD_APP_VERSION=0.0.5
NEW_APP_VERSION=0.0.7
CONFIG_YML=config.yml
bumpConfig() {
    gawk -i inplace -v name="$1" -v old="$2" -v new="$3" '
    1
    /^- name: / && $3 == name {
        while (getline > 0) {
            if (/^ version: / && $2 == old)
                $0 = " version: " new
            print
            if (!NF || /^-/ || /^ version: /)
                break
        }
    }
    ' "${CONFIG_YML}"
}
}
bumpConfig commerce-api "${OLD_APP_VERSION}" "${NEW_APP_VERSION}"
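Stripped of -i inplace, the getline loop can be test-driven with plain awk on a two-block sample (single-space indentation as in the question; the other-api block name is made up for the demo):

```shell
cat > config.yml <<'EOF'
- name: commerce-api
 type: helm
 version: 0.0.5
- name: other-api
 type: helm
 version: 0.0.5
EOF

result=$(awk -v name=commerce-api -v old=0.0.5 -v new=0.0.7 '
1                                        # print the current line
/^- name: / && $3 == name {              # matching block header found
    while (getline > 0) {                # consume the block ourselves
        if (/^ version: / && $2 == old)
            $0 = " version: " new
        print
        if (!NF || /^-/ || /^ version: /)
            break                        # end of block, or version handled
    }
}
' config.yml)
printf '%s\n' "$result"
```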

Removing a delimited block of lines when one of them matches a regex pattern with awk

Let's assume the following reprepro distributions file:
Origin: git.sdxlive.com/git/PPA
Label: Ubuntu focal
Suite: focal
Version: 20.04
Codename: focal
Architectures: i386 amd64
Components: stable unstable
Limit: 0
Description: Latest Ubuntu focal 20.04 packages
Contents: .gz .bz2
Tracking: keep
SignWith: xxxxxxxxxxxxxxxxxxxx
Signed-By: xxxxxxxxxxxxxxxxxxxx
ValidFor: 2y 6m
Log: packages.Ubuntu.log

Origin: git.sdxlive.com/git/PPA
Label: Ubuntu groovy
Suite: groovy
Version: 20.10
Codename: groovy
Architectures: i386 amd64
Components: stable unstable
Limit: 0
Description: Latest Ubuntu groovy 20.10 packages
Contents: .gz .bz2
Tracking: keep
SignWith: xxxxxxxxxxxxxxxxxxxx
Signed-By: xxxxxxxxxxxxxxxxxxxx
ValidFor: 2y 6m
Log: packages.Ubuntu.log
The goal is to remove the whole block of lines delimited by 'Origin: ' and an empty line when it contains the line "Codename: ${os_code_name}" where os_code_name is a bash variable.
So the expected output is:
Origin: git.sdxlive.com/git/PPA
Label: Ubuntu groovy
Suite: groovy
Version: 20.10
Codename: groovy
Architectures: i386 amd64
Components: stable unstable
Limit: 0
Description: Latest Ubuntu groovy 20.10 packages
Contents: .gz .bz2
Tracking: keep
SignWith: xxxxxxxxxxxxxxxxxxxx
Signed-By: xxxxxxxxxxxxxxxxxxxx
ValidFor: 2y 6m
Log: packages.Ubuntu.log
Without a variable Codename, we could use for instance the following to remove the block matching the focal Codename:
awk '/^Origin: /{s=x} {s=s $0 RS} /^$/{if(s!~/Codename: focal/) printf "%s",s}' distributions
I could not find a solution to use a variable Codename; I tried to use:
--assign=var="${os_code_name}"
ENVIRON["os_code_name"]
In the first case, I don't know how awk can differentiate between the string 'Codename: ' and the variable var, since we cannot use "$var". The following does not work obviously:
awk --assign=var="${os_code_name}" '/^Origin: /{s=x} {s=s $0 RS} /^$/{if(s!~/Codename: $var/) printf "%s",s}' distributions
In the second case, it is also unsuccessful:
awk '/^Origin: /{s=x} {s=s $0 RS} /^$/{if(s!~/Codename: ENVIRON["os_code_name"]/) printf "%s",s}' distributions
I also checked this answer.
Any suggestion?
Could you please try the following; it is written and tested with the shown samples and should work in all kinds of awk.
os_code_name="focal" ##shell variable
awk -v co="$os_code_name" '
/Origin/{
  if(!foundCo && FNR>1){ print val }
  val=foundCo=""
}
/^Codename/ && $NF==co{
  foundCo=1
}
{
  val=(val?val ORS:"")$0
}
END{
  if(!foundCo){ print val }
}
' Input_file
Explanation: Adding detailed explanation for above.
os_code_name="focal" ##This is a shell variable.
awk -v co="$os_code_name" ' ##Starting awk program from here and setting co variable as value of os_code_name here.
/Origin/{ ##Checking condition if line has Origin string in it then do following.
if(!foundCo && FNR>1){ print val } ##Checking condition if foundCo is NULL and FNR>1 then print val here.
val=foundCo="" ##Nullifying variables here.
}
/^Codename/ && $NF==co{ ##Checking condition if line starts with Codename and last field is equal to variable co.
foundCo=1 ##Setting value for foundCo here.
}
{
val=(val?val ORS:"")$0 ##Creating val which has all lines values from Origin to just before next occurrence of Origin it either gets printed above or gets NULL.
}
END{ ##Starting END block of this awk program from here.
if(!foundCo){ print val } ##Checking condition if foundCo is NULL then print val here.
}
' Input_file ##Mentioning Input_file name here.
You can use an empty RS; this is paragraph mode. Then do not print any record where that codename exists.
awk -v cn="$os_code_name" -v RS="" '!($0 ~ "Codename: " cn){print $0 "\n"}' file
The variable has to be passed in the way your linked answer says. The pattern matching can be done either using ~ /.../ or ~ "..."; using the double quotes is what you have to do here, and "Codename: " cn is the matching string.
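A runnable sketch of the paragraph-mode idea, with the blocks trimmed to three lines each (RS="" needs the blank line between blocks):

```shell
cat > distributions <<'EOF'
Origin: git.sdxlive.com/git/PPA
Codename: focal
Log: packages.Ubuntu.log

Origin: git.sdxlive.com/git/PPA
Codename: groovy
Log: packages.Ubuntu.log
EOF

os_code_name=focal
# RS="" switches awk to paragraph mode: each blank-line-separated block
# is one record, and the dynamic regex "Codename: " cn filters it out.
result=$(awk -v cn="$os_code_name" -v RS= '!($0 ~ "Codename: " cn)' distributions)
printf '%s\n' "$result"
```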

convert 1 field of awk to base64 and leave the rest intact

I'm creating a one-liner where my ldap export is directly converted into a csv.
So far so good, but the challenge now is that one column of my csv needs to contain base64 encoded values. These values come as clear text out of the ldap search filter, so I basically need them converted while awk builds the csv.
What I have is:
ldapsearch | awk -v OFS=',' '{split($0,a,": ")} /^blobinfo:/{blob=a[2]} /^cn:/{serialnr=a[2]} /^mode:/{mode=a[2]; print serialnr, mode, blob}'
This gives me a csv output as intended but now I need to convert blob to base64 encoded output.
Getline is not available
demo input:
cn: 1313131313
blobinfo: a string with spaces
mode: d121
cn: 131313asdf1313
blobinfo: an other string with spaces
mode: d122
output must be like
1313131313,D121,YSBzdHJpbmcgd2l0aCBzcGFjZXM=
where YSBzdHJpbmcgd2l0aCBzcGFjZXM= is the encoded a string with spaces
but now I get
1313131313,D121,a string with spaces
Something like this, maybe?
$ perl -MMIME::Base64 -lne '
BEGIN { $, = "," }
if (/^cn: (.+)/) { $s = $1 }
if (/^blobinfo: (.+)/) { $b = encode_base64($1, "") }
if (/^mode: (.+)/) { print $s, $1, $b }' input.txt
1313131313,d121,YSBzdHJpbmcgd2l0aCBzcGFjZXM=
131313asdf1313,d122,YW4gb3RoZXIgc3RyaW5nIHdpdGggc3BhY2Vz
If you can't use getline and you just need to output the csv (you can't further process the base64'd field), change the order of the fields in the output and abuse the newline printed by the system command. First, a slightly modified input data set (changed order, missing field):
cn: 1313131313
blobinfo: a string with spaces
mode: d121
blobinfo: an other string with spaces
mode: d122
cn: 131313asdf1313
cn: 131313asdf1313
mode: d122
The awk:
$ awk '
BEGIN {
RS="" # read in a block of rows
FS="\n" # newline is the FS
h["cn"]=1 # each key has a fixed buffer slot
h["blobinfo"]=2
h["mode"]=3
}
{
for(i=1;i<=NF;i++) { # for all fields
split($i,a,": ") # split to a array
b[h[a[1]]]=a[2] # store to buffer
}
printf "%s,%s,",b[1],b[3] # output all but blob, no newline
system("echo " b[2] "| base64") # let system output the newline
delete b # buffer needs to be reset
}' file # well, I used file for testing, you can pipe
And the output:
1313131313,d121,YSBzdHJpbmcgd2l0aCBzcGFjZXMK
131313asdf1313,d122,YW4gb3RoZXIgc3RyaW5nIHdpdGggc3BhY2VzCg==
131313asdf1313,d122,Cg==
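If neither getline nor system() is an option, the encoding can be done in awk itself. This is a sketch (not from the thread) of a minimal ASCII-only base64 encoder wired into the cn/mode/blobinfo logic; the file name ldap.txt stands in for the ldapsearch pipe:

```shell
cat > ldap.txt <<'EOF'
cn: 1313131313
blobinfo: a string with spaces
mode: d121
cn: 131313asdf1313
blobinfo: an other string with spaces
mode: d122
EOF

result=$(awk -v OFS=',' '
BEGIN {
    B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    for (i = 1; i < 128; i++)            # ASCII-only lookup table (awk has no ord())
        ord[sprintf("%c", i)] = i
}
function encode(s,    i, n, c1, c2, c3, r) {
    n = length(s)
    for (i = 1; i <= n; i += 3) {
        c1 = ord[substr(s, i, 1)]
        c2 = (i + 1 <= n) ? ord[substr(s, i + 1, 1)] : 0
        c3 = (i + 2 <= n) ? ord[substr(s, i + 2, 1)] : 0
        r = r substr(B64, int(c1 / 4) + 1, 1)                  # c1 >> 2
        r = r substr(B64, (c1 % 4) * 16 + int(c2 / 16) + 1, 1) # ((c1 & 3) << 4) | (c2 >> 4)
        r = r ((i + 1 <= n) ? substr(B64, (c2 % 16) * 4 + int(c3 / 64) + 1, 1) : "=")
        r = r ((i + 2 <= n) ? substr(B64, c3 % 64 + 1, 1) : "=")
    }
    return r
}
{ split($0, a, ": ") }
/^cn:/       { cn = a[2] }
/^blobinfo:/ { blob = encode(a[2]) }
/^mode:/     { print cn, a[2], blob }
' ldap.txt)
printf '%s\n' "$result"
```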

Sed conditional match and execute command with offset

I am looking for a bash command for a conditional replacement with an offset. The existing posts that I've found cover conditional replacement without an offset or with a fixed offset.
Task: If uid contains 8964, then insert the line FORBIDDEN before DOB.
Each TXT file below represents one user, and it contains (in the following order)
some property(ies)
unique uid
some quality(ies)
unique DOB
a random lorem ipsum
I hope I can transform the following files
# file1.txt (uid doesn't match 8964)
admin: false
uid: 123456
happy
movie
DOB: 6543-02-10
lorem ipsum
seo varis lireccuni paccem noba sako
# file2.txt (uid matches 8964)
citizen: true
hasSEAcct: true
uid: 289641
joyful hearty
final debug Juno XYus
magazine
DOB: 1234-05-06
saadi torem lopez dupont
into
# file1.txt (uid doesn't match 8964)
admin: false
uid: 123456
happy
movie
DOB: 6543-02-10
lorem ipsum
seo varis lireccuni paccem noba sako
# file2.txt (uid matches 8964)
citizen: true
hasSEAcct: true
uid: 289641
joyful hearty
final debug Juno XYus
magazine
FORBIDDEN
DOB: 1234-05-06
saadi torem lopez dupont
My try:
If uid contains 8964, then do a second match on DOB and insert FORBIDDEN above it.
sed '/^uid: [0-9]*8964[0-9]*$/{n;/^DOB: .*$/{iFORBIDDEN}}' file*.txt
This gives me an unmatched { error.
sed: -e expression #1, char 0: unmatched `{'
I know that sed '/PAT/{n;p}' will execute {n;p} if PAT is matched, but it seems impossible to put /PAT2/{iTEXT} inside /PAT/{ }.
How can I perform such FORBIDDEN insertion?
$ awk '
/^uid/ && /8964/ {f=1} #1
/^DOB/ && f {print "FORBIDDEN"; f=0} #2
1 #3
' file
If a line starting with "uid" matches "8964", set flag
If a line starts with "DOB" and flag is set, print string and unset flag
print every line
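The flag-based answer can be exercised end to end on a trimmed copy of file2.txt:

```shell
cat > file2.txt <<'EOF'
citizen: true
hasSEAcct: true
uid: 289641
magazine
DOB: 1234-05-06
saadi torem lopez dupont
EOF

result=$(awk '
  /^uid/ && /8964/ { f = 1 }                  # uid line mentions 8964: raise the flag
  /^DOB/ && f { print "FORBIDDEN"; f = 0 }    # first DOB after that: insert the marker
  1                                           # print every input line unchanged
' file2.txt)
printf '%s\n' "$result"
```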
$ awk -v RS='' '/uid: [0-9]*8964/{sub(/DOB/, "FORBIDDEN\nDOB")} 1' file
Alternatively, treat every block separated by a blank line as a single record, then sub in "FORBIDDEN\nDOB" if there's a match. I think the first one's better practice. As a very general rule, once you start thinking in terms of fields/records, it's time for awk/perl.
In my opinion, this is a good use-case for sed.
Here is a GNU sed solution with some explanation:
# script.sed
# Search only inside this range, if it exists.
/^uid:.*8964/,/DOB/ {
  # Insert FORBIDDEN before the line matching /DOB/.
  /DOB/i FORBIDDEN
}
Testing:
▶ gsed -f script.sed FILE2
citizen: true
hasSEAcct: true
uid: 289641
joyful hearty
final debug Juno XYus
magazine
FORBIDDEN
DOB: 1234-05-06
saadi torem lopez dupont
▶ gsed -f script.sed FILE1
admin: false
uid: 123456
happy
movie
DOB: 6543-02-10
lorem ipsum
seo varis lireccuni paccem noba sako
Or on one line:
▶ gsed -e '/^uid:.*8964/,/DOB/{/DOB/i FORBIDDEN' -e '}' FILE*
Tried on GNU sed:
sed -Ee '/^uid:\s*\w*8964\w*$/,/^DOB:/{/^DOB:/iFORBIDDEN' -e '}' file*.txt

AWK print after match with multi search

I have a log as below that I need to parse into a new format:
2018-08-14 12:07:06,410 - MAILER - INFO - Email sent! - (TEMPORARY PASSWORD: cristronaldode ) to ['cristronaldode#eeee.com'] - Message ID: 01010165369da693-216f985f-e1b0-4dc2-bcea-8a2cd275a506-000000 Result: {'MessageId': '01010165369da693-216f985f-e1b0-4dc2-bcea-8a2cd275a506-000000', 'ResponseMetadata': {'HTTPHeaders': {'content-length': '338', 'date': 'Tue, 14 Aug 2018 04:07:05 GMT', 'x-amzn-requestid': '81bbc0c4-9f77-11e8-81fe-8502a68e3b7d', 'content-type': 'text/xml'}, 'RetryAttempts': 0, 'RequestId': '81bbc0c4-9f77-11e8-81fe-8502a68e3b7d', 'HTTPStatusCode': 200}}
output :
2018-08-14 12:07:06,410|TEMPORARY PASSWORD: cristronaldode|cristronaldode#eeee.com|'HTTPStatusCode': 200|
I'm trying to use awk and the match function, but I don't know how to use multiple matches in one line. Thanks.
Update: I was using the command below to parse the fields, but because I'm separating the fields by spaces I would need to correct the field numbers in all lines, so I want other solutions.
awk -F ' ' '{print $1,$2"|"$11,$12,$13,$14,$15,$16,$17,$18,$19"|"$21"|"$48,$49}' | sed -e 's/[()]//g' | sed -e 's/[][]//g'| sed -e 's/}//g'
The field separator of awk can be a regex.
awk -F '[][)(}{ ]*' '{print $1,$2}' file
We now delimit on all the bracket-like characters and count multiple occurrences of those characters as one.
You can figure out which fields you have to use now.
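To see how the regex FS falls out in practice, here it is applied to a shortened copy of the log line (the field numbers 11-13 are specific to this trimmed sample; the full line would need different ones):

```shell
line="2018-08-14 12:07:06,410 - MAILER - INFO - Email sent! - (TEMPORARY PASSWORD: cristronaldode )"

# '[][)(}{ ]+' splits on runs of spaces and bracket-like characters, so
# '(' and ')' vanish together with the surrounding spaces.
result=$(printf '%s\n' "$line" | awk -F '[][)(}{ ]+' '{ print $1, $2, $11, $12, $13 }')
printf '%s\n' "$result"
```

Using + instead of the answer's * avoids the separator ever matching an empty string, which some awks handle differently.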
