convert 1 field of awk to base64 and leave the rest intact - bash

I'm creating a one-liner where my ldap export is directly converted into a CSV.
So far so good, but the challenge now is that one column of my CSV needs to contain base64-encoded values. These values come out of the ldap search filter as clear text, so I basically need them converted while awk builds the output.
What I have is:
ldapsearch | awk -v OFS=',' '{split($0,a,": ")} /^blobinfo:/{blob=a[2]} /^cn:/{serialnr=a[2]} /^mode:/{mode=a[2]; print serialnr, mode, blob}'
This gives me a csv output as intended but now I need to convert blob to base64 encoded output.
Getline is not available
demo input:
cn: 1313131313
blobinfo: a string with spaces
mode: d121
cn: 131313asdf1313
blobinfo: an other string with spaces
mode: d122
The output must be like
1313131313,D121,YSBzdHJpbmcgd2l0aCBzcGFjZXM=
where YSBzdHJpbmcgd2l0aCBzcGFjZXM= is the encoding of "a string with spaces",
but now I get
1313131313,D121,a string with spaces

Something like this, maybe?
$ perl -MMIME::Base64 -lne '
BEGIN { $, = "," }
if (/^cn: (.+)/) { $s = $1 }
if (/^blobinfo: (.+)/) { $b = encode_base64($1, "") }
if (/^mode: (.+)/) { print $s, $1, $b }' input.txt
1313131313,d121,YSBzdHJpbmcgd2l0aCBzcGFjZXM=
131313asdf1313,d122,YW4gb3RoZXIgc3RyaW5nIHdpdGggc3BhY2Vz

If you can't use getline and you just need to output the csv (you can't further process the base64'd field), change the order of the fields in the output and abuse the system's newline. First, slightly modified input data (changed order, missing field):
cn: 1313131313
blobinfo: a string with spaces
mode: d121
blobinfo: an other string with spaces
mode: d122
cn: 131313asdf1313
cn: 131313asdf1313
mode: d122
The awk:
$ awk '
BEGIN {
RS="" # read in a block of rows
FS="\n" # newline is the FS
h["cn"]=1 # each key has a fixed buffer slot
h["blobinfo"]=2
h["mode"]=3
}
{
for(i=1;i<=NF;i++) { # for all fields
split($i,a,": ") # split to a array
b[h[a[1]]]=a[2] # store to buffer
}
printf "%s,%s,",b[1],b[3] # output all but blob, no newline
system("echo " b[2] "| base64") # let system output the newline
delete b # buffer needs to be reset
}' file # well, I used file for testing, you can pipe
And the output:
1313131313,d121,YSBzdHJpbmcgd2l0aCBzcGFjZXMK
131313asdf1313,d122,YW4gb3RoZXIgc3RyaW5nIHdpdGggc3BhY2VzCg==
131313asdf1313,d122,Cg==
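The trailing K and Cg== above are the base64 encoding of the newline that echo appends to the blob. If the output has to match YSBzdHJpbmcgd2l0aCBzcGFjZXM= exactly, one variation (a sketch; shell metacharacters other than spaces in the blob would still need escaping, just as with echo) is to hand the value over with printf '%s', which adds nothing, and let base64's own trailing newline terminate the CSV line:
system("printf '%s' \"" b[2] "\" | base64") # nothing extra gets encoded; base64 still ends the line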

lowercase and remove punctuation from a csv

I have a giant file (6gb) which is a csv and the rows look like so:
"87687","institute Polytechnic, Brazil"
"342424","university of India, India"
"24343","univefrsity columbia, Bogata, Colombia"
and I would like to remove all punctuation and lowercase the second column, yielding:
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
what would be the most efficient way to do this on the terminal?
Tried:
cat TEXTFILE | tr -d '[:punct:]' > OUTFILE
problem: the result is not in lowercase and tr seems to act on both columns, not just the second.
With a real CSV parser in Perl, the robust/reliable way, using just one process.
Since it works line by line, the 6 GB file size should not be an issue.
#!/usr/bin/perl
use strict; use warnings; # harness
use Text::CSV; # load the needed module (install it)
use feature qw/say/; # say = print("...\n")
# create an instance of a new CSV parser
my $csv = Text::CSV->new({ auto_diag => 1 });
# open a File Handle or exit with error
open my $fh, "<:encoding(utf8)", "file.csv" or die "file.csv: $!";
while (my $row = $csv->getline ($fh)) { # parse line by line
$_ = $row->[1]; # parse only column 2
s/[\s[:punct:]]//g; # removes both space(s) and punct(s)
$_ = lc $_; # Lower Case current value $_
$row->[1] = qq/"$_"/; # edit changes and (re)"quote"
say join ",", @$row; # display the whole current row
}
close $fh; # close the File Handle
Output
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
install
cpan Text::CSV
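Assuming the script above is saved as clean_csv.pl (a name picked here only for illustration) next to file.csv, running it is just:
$ perl clean_csv.pl > cleaned.csv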
Here's an approach using xsv and process substitution:
paste -d, \
<(xsv select 1 infile.csv) \
<(xsv select 2 infile.csv | sed 's/[[:blank:][:punct:]]*//g;s/.*/\L&/')
The sed command first removes all blanks and punctuation, then lowercases the entire match.
This also works when the first field contains blanks and commas, and retains quoting where required.
Using sed
$ sed -E ':a;s/([^,]*,)([^ ,]*)[ ,]([[:alpha:]]+)/\1\L\2\3/;ta' input_file
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
I suggest using this awk solution, which should work with any version of awk:
awk 'BEGIN{FS=OFS="\",\""} {
gsub(/[^[:alnum:]"]+/, "", $2); $2 = tolower($2)} 1' file
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
Details:
We make "," input and output field separators in BEGIN block
gsub(/[^[:alnum:]"]+/, "", $2): Strip all non-alphanumeric characters except "
$2 = tolower($2): Lowercase second column
One GNU awk (for gensub()) idea:
awk '
BEGIN { FS=OFS="\"" }
{ $4=gensub(/[^[:alnum:]]/,"","g",tolower($4)) }
1'
This generates:
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
Another sed approach -
sed -E 's/ +//g; s/([^"]),/\1/g; s/"([^"]*)"/"\L\1"/g' file
I don't like how that leaves no flexibility, and makes you rewrite the logic if you find something else you want to remove, though.
Another in awk -
awk -F'[", ]+' '
{ printf "\"%s\",\"", $2;
for(c=3;c<=NF;c++) printf "%s", tolower($c);
print "\"";
}' file
This approach lets you define and add any additional offending characters into the field delimiters without editing your logic.
$: pat=$"[\"',_;:!@#\$%)(* -]+"
$: echo "$pat"
["',_;:!@#$%)(* -]+
$: cat file
"87687","institute 'Polytechnic, Brazil"
"342424","university; of-India, India"
"24343","univefrsity )columbia, Bogata, Colombia"
$: awk -F"$pat" '{printf "\"%s\",\"", $2; for(c=3;c<=NF;c++) printf "%s", tolower($c); print "\"" }' file
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
(I hate the way that lone single quote throws the markup color/format parsing off, lol)
Another way, using Ruby. I edited the data to show that only the second field is modified.
% ruby -r 'csv' -e 'f = open("file");
CSV.parse(f) do |i|
puts "\"" + i[0] + "\",\"" + i[1].downcase.gsub(/[ ,]/,"") + "\"" end'
"8768, 7","institutepolytechnicbrazil"
"342 424","universityofindiaindia"
"243 43","univefrsitycolumbiabogatacolombia"
Using FastCSV gives a huge speedup
gem install fastcsv
% ruby -r 'fastcsv' -e 'f = open("file");
FastCSV.raw_parse(f) do |i|
puts "\"" + i[0] + "\",\"" + i[1].downcase.gsub(/[ ,]/,"") + "\"" end'
"8768, 7","institutepolytechnicbrazil"
"342 424","universityofindiaindia"
"243 43","univefrsitycolumbiabogatacolombia"
Data
% cat file
"8768, 7","institute Polytechnic, Brazil"
"342 424","university of India, India"
"243 43","univefrsity columbia, Bogata, Colombia"
With your shown samples and attempts, please try the following GNU awk code, which uses awk's match function. The regex (^"[^"]*",")([^"]*)(".*)$ used in match creates three capturing groups whose values are stored in arr; those values are then fetched later in the program to build the required output.
awk '
match($0,/(^"[^"]*",")([^"]*)(".*)$/,arr){
gsub(/[^[:alnum:]]+/,"",arr[2])
print arr[1] tolower(arr[2]) arr[3]
}
' Input_file
This might work for you (GNU sed):
sed -E s'/("[^"]*",)/\1\n/;h;s/.*\n//;s/[[:punct:] ]//g;s/.*/"\L&"/;H;g;s/\n.*\n//' file
Divide and rule.
Partition the line into two fields, make a copy, process the second field by removing punctuation and spaces, lowercase and re-quote it, and then re-assemble the fields.
An alternative, perhaps?
sed -E ':a;s/^("[^"]*",".*)[^[:alpha:]"](.*)/\L\1\2/;ta' file
Here is a way to do so in PHP.
Note: PHP will not output double quotes unless a field needs them. After cleaning, the second column has no spaces or special characters, so it will never need quoting; the first column is quoted only if required.
$max_line_length = 100;
if (($fp = fopen("file.csv", "r")) !== FALSE) {
while (($data = fgetcsv($fp, $max_line_length, ",")) !== FALSE) {
$data[1] = strtolower(preg_replace('/[\s[:punct:]]/', '', $data[1]));
fputcsv(STDOUT, $data, ',', '"');
}
fclose($fp);
}

Extract json value on regex on bash script

How can I get the values inside depends in a bash script?
manifest.py
# Commented lines
{
'category': 'Sales/Subscription',
'depends': [
'sale_subscription',
'sale_timesheet',
],
'auto_install': True,
}
Expected response:
sale_subscription sale_timesheet
The major problem is the line break; I have already tried | grep depends but I cannot get the sale_timesheet value.
I'm trying to add these values, coming from files, into a var, like:
DOWNLOADED_DEPS=($(ls -A $DOWNLOADED_APPS | while read -r file; do cat $DOWNLOADED_APPS/$file/__manifest__.py | [get depends value])
Example updated.
If this is your JSON file:
{
"category": "Sales/Subscription",
"depends": [
"sale_subscription",
"sale_timesheet"
],
"auto_install": true
}
You can get the desired result using jq like this:
jq -r '.depends | join(" ")' YOURFILE.json
This uses .depends to extract the value from the depends field, pipes it to join(" ") to join the array with a single space in between, and uses -r for raw (unquoted) output.
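For example, assuming the converted document above is saved as manifest.json:
$ jq -r '.depends | join(" ")' manifest.json
sale_subscription sale_timesheet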
If it is not a JSON file but only a string, then you can use the regex below to find the values. If it is a JSON file, you can use other methods like Thomas suggested.
^'depends':\s*(?:\[\s*)(.*?)(?:\])$
demo
you can use egrep for this as follows:
% egrep -M '^\'depends\':\s*(?:\[\s*)(.*?)(?:\])$' pathTo\jsonFile.txt
you can read about grep
As @Thomas has pointed out in a comment, the OP's input data is not in JSON format:
$ cat manifest.py
# Commented lines // comments not allowed in JSON
{
'category': 'Sales/Subscription', // single quotes should be replaced by double quotes
'depends': [
'sale_subscription',
'sale_timesheet', // trailing comma at end of section not allowed
],
'auto_install': True, // trailing comma issue; should be lower case "true"
}
And while the title of the question mentions regex, there is no sign of a regex in the question. I'll leave a regex based solution for someone else to come up with and instead ...
One (quite verbose) awk solution based on the input looking exactly like what's in the question:
$ awk -F"'" ' # use single quote as field separator
/depends/ { printme=1 ; next } # if we see the string "depends" then set printme=1
printme && /]/ { printme=0 ; next} # if printme=1 and line contains a right bracket then set printme=0
printme { printf pfx $2; pfx=" " } # if printme=1 then print a prefix + field #2;
# first time around pfx is undefined;
# subsequent passes will find pfx set to a space;
# since using "printf" with no "\n" in sight, all output will stay on a single line
END { print "" } # add a linefeed on the end of our output
' json.dat
This generates:
sale_subscription sale_timesheet
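For completeness, one regex-flavoured sketch that works on the Python-style manifest exactly as shown (it assumes the depends block ends at the first ] and that every value sits in single quotes on its own line):
$ sed -n "/'depends':/,/]/p" manifest.py | grep -o "'[^']*'" | tr -d "'" | grep -v '^depends$' | xargs
sale_subscription sale_timesheet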

Split CSV into two files based on column matching values in an array in bash / posh

I have a input CSV that I would like to split into two CSV files. If the value of column 4 matches any value in WLTarray it should go in output file 1, if it doesn't it should go in output file 2.
WLTarray:
"22532" "79994" "18809" "21032"
input CSV file:
header1,header2,header3,header4,header5,header6,header7,header8
"83","6344324","585677","22532","Entitlements","BX","22532:718","36721"
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
output CSV file1:
header1,header2,header3,header4,header5,header6,header7,header8
"83","6344324","585677","22532","Entitlements","BX","22532:718","36721"
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
output CSV file2:
header1,header2,header3,header4,header5,header6,header7,header8
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
I've been looking at awk to filter this (python & perl not an option in my environment) but I think there is probably a much smarter way:
declare -a WLTarray=("22532" "79994" "18809" "21032")
for WLTvalue in "${WLTarray[@]}" #Everything in the WLTarray will go to $filename-WLT.tmp
do
awk -F, '($4=='$WLTvalue'){print}' $filename.tmp >> $filename-WLT.tmp #move the lines to the WLT file
# now filter to remove non matching values? why not just move the rows entirely?
done
With regular awk you can make use of split and substr (to handle double-quote removal for comparison) and split the csv file as you indicate. For example you can use:
awk 'BEGIN { FS=","; s="22532 79994 18809 21032"
split (s,a," ") # split s into array a
for (i in a) # loop over each index in a
b[a[i]]=1 # use value in a as index for b
}
FNR == 1 { # first record, write header to both output files
print $0 > "output1.csv"
print $0 > "output2.csv"
next
}
substr($4,2,length($4)-2) in b { # 4th field w/o quotes in b?
print $0 > "output1.csv" # write to output1.csv
next
}
{ print $0 > "output2.csv" } # otherwise write to output2.csv
' input.csv
Where:
in the BEGIN {...} rule you set the field separator (FS) to break on comma, split the string containing your desired output1.csv field-4 matches into the array a, and then loop over the values in a, using them as indexes in array b (to allow a simple in b membership check);
the first rule is applied to the first records in the file (the header line) which is simply written out to both output files;
the next rule removes the double-quotes surrounding field-4 and then checks if the number in field-4 matches an index in array b. If so the record is written to output1.csv otherwise it is written to output2.csv.
Example Input File
$ cat input.csv
header1,header2,header3,header4,header5,header6,header7,header8
"83","6344324","585677","22532","Entitlements","BX","22532:718","36721"
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
Resulting Output Files
$ cat output1.csv
header1,header2,header3,header4,header5,header6,header7,header8
"83","6344324","585677","22532","Entitlements","BX","22532:718","36721"
"83","1223432","616454","79994","Compliance Stuff","DR","79994:64703","206134"
"83","267216","616457","79994","Compliance Engine","ABC","79994:64703","206020"
$ cat output2.csv
header1,header2,header3,header4,header5,header6,header7,header8
"83","162217","616454","83223","Data Enrichment","IEO","83223:64701","206475"
You can use gawk like this:
test.awk
#!/usr/bin/gawk -f
BEGIN {
split("22532 79994 18809 21032", a)
for(i in a) {
WLTarray[a[i]]
}
FPAT="[^\",]+"
}
NR > 1 {
if ($4 in WLTarray) {
print >> "output1.csv"
} else {
print >> "output2.csv"
}
}
Make it executable and run it like this:
chmod +x test.awk
./test.awk input.csv
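Note that, unlike the first answer, this variant skips the header entirely (NR > 1), so neither output file gets a header line; if the header is wanted in both outputs, a rule along these lines could be added ahead of the NR > 1 block (a sketch):
NR == 1 {
    print > "output1.csv"   # copy the header line to both outputs
    print > "output2.csv"
    next
}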
Using grep with a filter file as input was the simplest answer.
declare -a WLTarray=("22532" "79994" "18809" "21032")
for WLTvalue in "${WLTarray[@]}"
do
awkstring="'\$4 == "\"\\\"$WLTvalue\\\"\"" {print}'"
eval "awk -F, $awkstring input.csv >> output.WLT.csv"
done
grep -v -x -f output.WLT.csv input.csv > output.NonWLT.csv
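If the eval/escaping gymnastics around awkstring feel fragile, roughly the same loop can pass the quoted value in with awk -v instead (a sketch, keeping the file names used above):
declare -a WLTarray=("22532" "79994" "18809" "21032")
for WLTvalue in "${WLTarray[@]}"
do
    # v carries the value with its surrounding double quotes, so $4 == v matches the quoted CSV field
    awk -F, -v v="\"$WLTvalue\"" '$4 == v' input.csv >> output.WLT.csv
done
grep -v -x -f output.WLT.csv input.csv > output.NonWLT.csv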

Convert a key:value file w/ comments into JSON document with UNIX tools

I have a file in a subset of YAML with data such as the below:
# This is a comment
# This is another comment
spark:spark.ui.enabled: 'false'
spark:spark.sql.adaptive.enabled: 'true'
yarn:yarn.nodemanager.log.retain-seconds: '259200'
I need to convert that into a JSON document looking like this (note that strings containing booleans and integers still remain strings):
{
"spark:spark.ui.enabled": "false",
"spark:spark.sql.adaptive.enabled": "true",
"yarn:yarn.nodemanager.log.retain-seconds": "259200"
}
The closest I got was this:
cat << EOF > ./file.yaml
> # This is a comment
> # This is another comment
>
>
> spark:spark.ui.enabled: 'false'
> spark:spark.sql.adaptive.enabled: 'true'
> yarn:yarn.nodemanager.log.retain-seconds: '259200'
> EOF
echo {$(cat file.yaml | grep -o '^[^#]*' | sed '/^$/d' | awk -F": " '{sub($1, "\"&\""); print}' | paste -sd "," - )}
which, apart from looking rather gnarly, doesn't give the correct answer; it returns:
{"spark:spark.ui.enabled": 'false',"spark:spark.sql.adaptive.enabled": 'true',"dataproc:dataproc.monitoring.stackdriver.enable": 'true',"spark:spark.submit.deployMode": 'cluster'}
which, if I pipe to jq causes a parse error.
I'm hoping I'm missing a much much easier way of doing this but I can't figure it out. Can anyone help?
Implemented in pure jq (tested with version 1.6):
#!/usr/bin/env bash
jq_script=$(cat <<'EOF'
def content_for_line:
"^[[:space:]]*([#]|$)" as $ignore_re | # regex for comments, blank lines
"^(?<key>.*): (?<value>.*)$" as $content_re | # regex for actual k/v pairs
"^'(?<value>.*)'$" as $quoted_re | # regex for values in single quotes
if test($ignore_re) then {} else # empty lines add nothing to the data
if test($content_re) then ( # non-empty: match against $content_re
capture($content_re) as $content | # ...and put the groups into $content
$content.key as $key | # string before ": " becomes $key
(if ($content.value | test($quoted_re)) then # if value contains literal quotes...
($content.value | capture($quoted_re)).value # ...take string from inside quotes
else
$content.value # no quotes to strip
end) as $value | # result of the above block becomes $value
{"\($key)": "\($value)"} # and return a map from one key to one value
) else
# we get here if a line didn't match $ignore_re *or* $content_re
error("Line \(.) is not recognized as a comment, empty, or valid content")
end
end;
# iterate over our input lines, passing each one to content_for_line and merging the result
# into the object we're building, which we eventually return as our result.
reduce inputs as $item ({}; . + ($item | content_for_line))
EOF
)
# jq -R: read input as raw strings
# jq -n: don't read from stdin until requested with "input" or "inputs"
jq -Rn "$jq_script" <file.yaml >file.json
Unlike syntax-unaware tools, this can never generate output that isn't valid JSON; and it can easily be extended with application-specific logic (f/e, to emit some values but not others as numeric literals rather than string literals) by adding an additional filter stage to inspect and modify the output of content_for_line.
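As a small illustration of that extensibility (purely hypothetical here, since the question wants every value kept as a string), an extra filter stage after the reduce could turn purely numeric values into JSON numbers:
reduce inputs as $item ({}; . + ($item | content_for_line))
| map_values(if test("^[0-9]+$") then tonumber else . end)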
Here's a no-frills but simple solution:
def tidy: sub("^ *'?";"") | sub(" *'?$";"");
def kv: split(":") | [ (.[:-1] | join(":")), (.[-1]|tidy)];
reduce (inputs| select( test("^ *#|^ *$")|not) | kv) as $row ({};
.[$row[0]] = $row[1] )
Invocation
jq -n -R -f tojson.jq input.txt
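Run against the sample input, that should produce:
{
  "spark:spark.ui.enabled": "false",
  "spark:spark.sql.adaptive.enabled": "true",
  "yarn:yarn.nodemanager.log.retain-seconds": "259200"
}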
You can do it all in awk using gsub and sprintf, for example:
(edit to add "," separating json records)
awk 'BEGIN {ol=0; print "{" }
/^[^#]/ {
if (ol) print ","
gsub ("\047", "\042")
$1 = sprintf (" \"%s\":", substr ($1, 1, length ($1) - 1))
printf "%s %s", $1, $2
ol++
}
END { print "\n}" }' file.yaml
(note: though jq is the proper tool for json formatting)
Explanation
awk 'BEGIN { ol=0; print "{" } call awk setting the output line variable ol=0 for "," output control and printing the header "{",
/^[^#]/ { only match non-comment lines,
if (ol) print "," if the output line ol is greater than zero, output a trailing ","
gsub ("\047", "\042") replace all single-quotes with double-quotes,
$1 = sprintf (" \"%s\":", substr ($1, 1, length ($1) - 1)) add 2 leading spaces and double-quotes around the first field (except for the last character) and then append a ':' at the end.
printf "%s %s", $1, $2 output the reformatted fields without a trailing newline,
ol++ increment the output line count, and
END { print "\n}" }' close by printing a final newline and the "}" footer
Example Use/Output
Just select/paste the awk command above (changing the filename as needed)
$ awk 'BEGIN {ol=0; print "{" }
> /^[^#]/ {
> if (ol) print ","
> gsub ("\047", "\042")
> $1 = sprintf (" \"%s\":", substr ($1, 1, length ($1) - 1))
> printf "%s %s", $1, $2
> ol++
> }
> END { print "\n}" }' file.yaml
{
"spark:spark.ui.enabled": "false",
"spark:spark.sql.adaptive.enabled": "true"
}
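And since the answer itself points at jq as the proper tool, the generated document can be sanity-checked by piping it through jq empty, which exits non-zero on malformed JSON (assuming the awk program above is saved as, say, kv2json.awk):
$ awk -f kv2json.awk file.yaml | jq empty && echo OK
OK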

Display column from empty column (fixed width and space delimited) in bash

I have a log file (in txt) with the following text
UNIT       PHYS STATE LOCATION              INFO
TCSM-1098       SE-NH                       -
ETPE-5-0   1403 SE-OU BCSU-1                ACTV FLTY
ETIP-6     1402 SE-NH                       -
They are delimited by space...
How can I acquire the output like below?
UNIT|PHYS|STATE|LOCATION|INFO
TCSM-1098||SE-NH||-
ETPE-5-0|1403|SE-OU|BCSU-1|ACTV FLTY
ETIP-6|1402|SE-NH||-
Thanks in advance
This is what I've tried so far
cat file.txt | awk 'BEGIN { FS = "[[:space:]][[:space:]]+" } {print $1,$2,$3,$4}' | sed 's/ /|/g'
It produces output like this
|UNIT|PHYS|STATE|LOCATION|INFO|
|TCSM-1098|SE-NH|-|
|ETPE-5-0|1403|SE-OU|BCSU-1|ACTV|FLTY
|ETIP-6|1402|SE-NH|-|
The columns aren't exactly like what I hoped for.
It seems it's not delimited but fixed-width format.
$ perl -ple '
$_ = join "|",
map {s/^\s+|\s+$//g;$_}
unpack ("a11 a5 a6 a22 a30",$_);
' <file.txt
how it works
-p switch : loop over input lines (default var: $_) and print it
-l switch : chomp line ending (\n) and add it to output
-e : inline command
unpack function : takes defined format and input line and returns an array
map function : apply block to each element of array: regex to remove heading trailing spaces
join function : takes delimiter and array and gives string
$_ = : assigns the string to the default var for output
Perl to the rescue!
perl -wE 'my @lengths;
$_ = <>;
push @lengths, length $1 while /(\S+\s*)/g;
$lengths[-1] = "*";
my $f;
say join "|",
map s/^\s+|\s+$//gr,
unpack "A" . join("A", @lengths), $_
while (!$f++ or $_ = <>);' -- infile
The format is not whitespace separated, it's a fixed-width.
The @lengths array will be populated by the widths of the columns taken from the first line of the input. The last column width is replaced with *, as its width can't be deduced from the header.
Then, an unpack template is created from the lengths that's used to parse the file.
$f is just a flag that makes it possible to apply the template to the header line itself.
With GNU awk for FIELDWIDTHS to handle fixed-width fields:
awk -v FIELDWIDTHS='11 5 6 22 99' -v OFS='|' '{$1=$1; gsub(/ *\| */,"|"); sub(/ +$/,"")}1' file
UNIT|PHYS|STATE|LOCATION|INFO
TCSM-1098||SE-NH||-
ETPE-5-0|1403|SE-OU|BCSU-1|ACTV FLTY
ETIP-6|1402|SE-NH||-
I think it's pretty clear and self-explanatory but let me know if you have any questions.
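As an aside (worth verifying against your gawk version's manual), gawk 4.2 and newer also accept * as the last FIELDWIDTHS entry to mean "the rest of the record", which avoids picking an arbitrary 99 for the INFO column:
awk -v FIELDWIDTHS='11 5 6 22 *' -v OFS='|' '{$1=$1; gsub(/ *\| */,"|"); sub(/ +$/,"")}1' file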
Manually, in awk:
$ awk 'BEGIN{split("11 5 6 23 99", cols); }
{s=0;
for (i in cols) {
field = substr($0, s, cols[i]);
s += cols[i];
sub(/^ */, "", field);
sub(/ *$/, "", field);
printf "%s|", field;
};
printf "\n" } ' file
UNIT|PHYS|STATE|LOCATION|INFO|
TCSM-1098||SE-NH||-|
ETPE-5-0|1403|SE-OU|BCSU-1|ACTV FLTY|
ETIP-6|1402|SE-NH||-|
The widths of the columns are set in the BEGIN block, then for each line we take substrings of the line of the required length. s counts the starting position of the current column, the sub() calls remove leading and trailing spaces. The code as such prints a trailing | on each line, but that can be worked around by making the first or last column a special case.
Note that the last field is not like in your output, it's hard to tell where the split between ACTV and FLTY should be. Is that fixed width too, or is the space a separator there?
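If the header line is a trustworthy guide to the column boundaries (the assumption the Perl answer above already makes), the same width-detection idea can be sketched in plain awk; the trimming and the '|' joining follow the expected output, and it assumes no column name reappears earlier in the header line:
awk '
NR == 1 {                                   # derive column start positions from the header
    n = split($0, names, /[[:space:]]+/)
    for (i = 1; i <= n; i++)
        start[i] = index($0, names[i])
}
{
    line = ""
    for (i = 1; i <= n; i++) {
        f = (i < n) ? substr($0, start[i], start[i+1] - start[i]) : substr($0, start[i])
        gsub(/^ +| +$/, "", f)              # trim the padding inside each column
        line = line (i > 1 ? "|" : "") f
    }
    print line
}' file.txt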
