How to align text in columns (centered) without removing delimiter? - bash

I would like to align my source data in columns...
Source:
IP | ASN | Prefix | AS Name | CN | Domain | ISP
109.228.12.96 | 8560 | 109.228.0.0/18 | ONEANDONE | DE | fasthosts.com | Fast Hosts LTD
Goal:
IP            | ASN  | Prefix         | AS Name   | CN | Domain        | ISP
109.228.12.96 | 8560 | 109.228.0.0/18 | ONEANDONE | DE | fasthosts.com | Fast Hosts LTD
I tried different things with the column command... but I end up with double spaces inside:
cat Source.txt | sed 's/ *| */#| /g' | column -s '#' -t
IP             | ASN   | Prefix          | AS Name    | CN  | Domain         | ISP
109.228.12.96  | 8560  | 109.228.0.0/18  | ONEANDONE  | DE  | fasthosts.com  | Fast Hosts LTD
Is there a way to use column without removing the delimiter...or another solution?
Thanks in advance for your help!

You can also do everything in awk. Save the program to pr.awk and run
awk -f pr.awk input.dat
BEGIN {
    FS = "|"
    ARGV[2] = "pass=2"   # a trick to read the file two times
    ARGV[3] = ARGV[1]
    ARGC = 4
    pass = 1
}

function trim(s) {
    sub(/^[[:space:]]+/, "", s)   # remove leading
    sub(/[[:space:]]+$/, "", s)   # and trailing whitespace
    return s
}

pass == 1 {
    for (i = 1; i <= NF; i++) {
        field = trim($i)
        len = length(field)
        w[i] = len > w[i] ? len : w[i]   # find the maximum width
    }
}

pass == 2 {
    line = ""
    for (i = 1; i <= NF; i++) {
        field = trim($i)
        s = i == NF ? field : sprintf("%-" w[i] "s", field)
        sep = i == 1 ? "" : " | "
        line = line sep s
    }
    print line
}
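As an aside, the same two passes are often expressed without the ARGV trick, by naming the file twice on the command line and testing FNR == NR. A minimal sketch of that variant (call it pr2.awk and run awk -f pr2.awk input.dat input.dat):
BEGIN { FS = "|" }
FNR == NR {                          # first pass: collect the column widths
    for (i = 1; i <= NF; i++) {
        f = $i
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", f)
        if (length(f) > w[i]) w[i] = length(f)
    }
    next
}
{                                    # second pass: print the padded fields
    line = ""
    for (i = 1; i <= NF; i++) {
        f = $i
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", f)
        line = line (i > 1 ? " | " : "") (i == NF ? f : sprintf("%-" w[i] "s", f))
    }
    print line
}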

column has an input separator option -s and also an output separator option -o,
so the call looks like
cat file | column -t -s '|' -o '|'
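Note that -o is in the util-linux column (BSD column lacks it). If the padding should match the goal exactly, one variant (a sketch) is to trim the spaces around the delimiter first and let -o put them back:
sed 's/ *| */|/g' Source.txt | column -t -s'|' -o' | '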

Related

How to remove rows from a CSV with no data using AWK

I am working with a large csv in a linux shell that I narrowed down to 3 columns:
Species name, Latitude, and Longitude.
awk -F "\t" '{print $10,","$22,",",$23}' occurance.csv > three_col.csv
The file ends up looking like this:
species               | Lat     | Long    |
----------------------|---------|---------
Leucoraja erinacea    | 41.0748 | 72.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Paralichthys dentatus |         | 73.2354|
Paralichthys dentatus |         |        |
Leucoraja erinacea    | 41.0748 |        |
Brevoortia tyrannus   |         |        |
Brevoortia tyrannus   |         |        |
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
However, this is what I want it to look like (notice that all rows with missing Lat or Long data have been removed):
species               | Lat     | Long    |
----------------------|---------|---------
Leucoraja erinacea    | 41.0748 | 72.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
I've been trying to remove the rows that lack either Lat or Long data, using a line like this:
awk -F "\t" BEGIN '{print $1,$2,$3}' END '{$2!=" " && $3!= " " }' three_col.csv > del_blanks.csv
but it results in this error, even with the small changes I make trying to solve the problem:
awk: line 1: syntax error at or near end of line
How can I get rid of these rows with missing data? Is this something I need a "for" loop for?
Since I don't know what your occurance.csv file looks like, this is a shot in the dark:
awk -F "\t" '$22 && $23 {print $10,","$22,",",$23}' occurance.csv > three_col.csv
The expression $22 && $23 says: both field 22 and field 23 must be non-empty. It is a condition that filters out the lines which don't qualify, and is roughly shorthand for $22 != "" && $23 != "" (with the caveat that a field holding a literal 0 also counts as false).
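A quick illustration of that caveat, on a made-up tab-separated line:
printf 'a\t0\tb\n' | awk -F'\t' '$2 && $3'               # prints nothing: the 0 field is falsy
printf 'a\t0\tb\n' | awk -F'\t' '$2 != "" && $3 != ""'   # prints the line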
awk -F "|" '
{
if (substr($1,1,1) == "-"){
e = ""
}else{
e=FS
}
gsub(/[ \t]+$/, "", $2)
gsub(/[ \t]+$/, "", $3)
if(length($2) !=0 && length($3) !=0){
printf "%s%s%-9s%s%-8s%s\n", $1, FS, $2, FS, $3, e
}
}' file.txt
species               | Lat     | Long    |
----------------------|---------|---------
Leucoraja erinacea    | 41.0748 | 72.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Paralichthys dentatus | 39.0748 | 70.9461|
Brevoortia tyrannus   | 39.0748 | 70.9461|
Perhaps something like this?
mawk '($!NF = $10","$22","$23) !~ /,(,|$)/' FS='\t' OFS=','
You already know that only fields 10/22/23 need to be printed, so you can first overwrite $0 with just those 3 columns, already joined by the separator.
Afterwards a quick regex check suffices: an empty $22 or $23 leaves either two consecutive commas or a trailing comma, so filtering those out saves both the print statement and separate pattern-action blocks.
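Applied to the original extraction step, the full call would presumably be:
mawk '($!NF = $10","$22","$23) !~ /,(,|$)/' FS='\t' OFS=',' occurance.csv > three_col.csv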

How to outer-join two CSV files, using shell script?

I have two CSV files, like the following:
file1.csv
label,"Part-A"
"ABC mn","2.0"
"XYZ","3.0"
"PQR SN","6"
file2.csv
label,"Part-B"
"XYZ","4.0"
"LMN Wv","8"
"PQR SN","6"
"EFG","1.0"
Desired Output.csv
label,"Part-A","Part-B"
"ABC mn","2.0",NA
"EFG",NA,"1.0"
"LMN Wv",NA,"8"
"PQR SN","6","6"
"XYZ","3.0","4.0"
Currently, with the awk command below, I am able to combine the matching entries whose label values appear in both files (like PQR and XYZ), but I am unable to append the entries whose label values are not present in both files:
awk -F, 'NR==FNR{a[$1]=substr($0,length($1)+2);next} ($1 in a){print $0","a[$1]}' file1.csv file2.csv
This solution prints exactly the desired result with any awk.
Please note that the sorting algorithm is taken from the mawk manual.
# SO71053039.awk
#-------------------------------------------------
# insertion sort of A[1..n]
function isort( A,A_SWAP, n,i,j,hold ) {
    n = 0
    for (j in A)
        A_SWAP[++n] = j
    for ( i = 2 ; i <= n ; i++ ) {
        hold = A_SWAP[j = i]
        while ( A_SWAP[j-1] "" > "" hold ) {
            j--
            A_SWAP[j+1] = A_SWAP[j]
        }
        A_SWAP[j] = hold
    }
    # sentinel A_SWAP[0] = "" will be created if needed
    return n
}

BEGIN {
    FS = OFS = ","
    out = "Output.csv"

    # read file 1
    fnr = 0
    while ((getline < ARGV[1]) > 0) {
        ++fnr
        if (fnr == 1) {
            for (i = 1; i <= NF; i++)
                FIELDBYNAME1[$i] = i   # e.g. FIELDBYNAME1["label"] = 1
        }
        else {
            LABEL_KEY[$FIELDBYNAME1["label"]]
            LABEL_KEY1[$FIELDBYNAME1["label"]] = $FIELDBYNAME1["\"Part-A\""]
        }
    }
    close(ARGV[1])

    # read file 2
    fnr = 0
    while ((getline < ARGV[2]) > 0) {
        ++fnr
        if (fnr == 1) {
            for (i = 1; i <= NF; i++)
                FIELDBYNAME2[$i] = i   # e.g. FIELDBYNAME2["label"] = 1
        }
        else {
            LABEL_KEY[$FIELDBYNAME2["label"]]
            LABEL_KEY2[$FIELDBYNAME2["label"]] = $FIELDBYNAME2["\"Part-B\""]
        }
    }
    close(ARGV[2])

    # print the header
    print "label" OFS "\"Part-A\"" OFS "\"Part-B\"" > out

    # get the result
    z = isort(LABEL_KEY, LABEL_KEY_SWAP)
    for (i = 1; i <= z; i++) {
        result_string = sprintf("%s", LABEL_KEY_SWAP[i])
        if (LABEL_KEY_SWAP[i] in LABEL_KEY1)
            result_string = sprintf("%s", result_string OFS LABEL_KEY1[LABEL_KEY_SWAP[i]] OFS (LABEL_KEY_SWAP[i] in LABEL_KEY2 ? LABEL_KEY2[LABEL_KEY_SWAP[i]] : "NA"))
        else
            result_string = sprintf("%s", result_string OFS "NA" OFS LABEL_KEY2[LABEL_KEY_SWAP[i]])
        print result_string > out
    }
}
Call:
awk -f SO71053039.awk file1.csv file2.csv
=> result file Output.csv with content:
label,"Part-A","Part-B"
"ABC mn","2.0",NA
"EFG",NA,"1.0"
"LMN Wv",NA,"8"
"PQR SN","6","6"
"XYZ","3.0","4.0"
I would like to introduce Miller to you. It is a stand-alone binary that can do quite a few things with a number of file formats. You just have to download the archive, put the mlr executable somewhere (preferably in your PATH), and you're done with the installation.
mlr --csv \
    join -f file1.csv -j 'label' --ul --ur \
    then \
    unsparsify --fill-with 'NA' \
    then \
    sort -f 'label' \
    file2.csv
Command parts:
mlr --csv
means that you want to read CSV files and output a CSV format. As another example, if you want to read CSV files and output a JSON format, it would be mlr --icsv --ojson
join -f file1.csv -j 'label' --ul --ur ...... file2.csv
means to join file1.csv and file2.csv on the field label and emit the unmatched records of both files
then is Miller's way of chaining operations
unsparsify --fill-with 'NA'
means to create the fields that didn't exist in each file and fill them with NA. It's needed for the records that had a unique label
then sort -f 'label'
means to sort the records on the field label
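For instance, the mlr --icsv --ojson variant mentioned above applies unchanged to this pipeline (a sketch):
mlr --icsv --ojson \
    join -f file1.csv -j 'label' --ul --ur \
    then unsparsify --fill-with 'NA' \
    then sort -f 'label' \
    file2.csv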
Regarding the updated question: mlr handles the CSV quoting on its own. The only difference from your new expected output is that it removes the superfluous quotes:
label,Part-A,Part-B
ABC mn,2.0,NA
EFG,NA,1.0
LMN Wv,NA,8
PQR SN,6,6
XYZ,3.0,4.0
awk -v OFS=, '{
    if (!o1[$1]) { o1[$1]=$NF; o2[$1]="NA" } else { o2[$1]=$NF }
}
END {
    for (v in o1) { print v, o1[v], o2[v] }
}' file{1,2}
## output
LMN,8,NA
ABC,2,NA
PQR,6,6
EFG,1,NA
XYZ,3,4
I think this will do nicely.
We suggest a gawk script (gawk is the standard Linux awk):
script.awk
NR == FNR {
    valsStr = sprintf("%s,%s", $2, "na");
    rowsArr[$1] = valsStr;
}
NR != FNR && $1 in rowsArr {
    split(rowsArr[$1], valsArr);
    valsStr = sprintf("%s,%s", valsArr[1], $2);
    rowsArr[$1] = valsStr;
    next;
}
NR != FNR {
    valsStr = sprintf("%s,%s", "na", $2);
    rowsArr[$1] = valsStr;
}
END {
    printf("%s,%s\n", "label", rowsArr["label"]);
    for (rowName in rowsArr) {
        if (rowName == "label") continue;
        printf("%s,%s\n", rowName, rowsArr[rowName]);
    }
}
output:
awk -F, -f script.awk input.{1,2}.txt
label,Part-A,Part-B
LMN,na,8
ABC,2,na
PQR,6,6
EFG,na,1
XYZ,3,4
Since your question was titled with "how to do ... in a shell script?" and not necessarily with awk, I'm going to recommend GoCSV, a command-line tool with several sub-commands for processing CSVs (delimited files).
It doesn't have a single command that can accomplish what you need, but you can compose a number of commands to get the correct result.
The core of this solution is the join command which can perform inner (default), left, right, and outer joins; you want an outer join to keep the non-overlapping elements:
gocsv join -c 'label' -outer file1.csv file2.csv > joined.csv
echo 'Joined'
gocsv view joined.csv
Joined
+-------+--------+-------+--------+
| label | Part-A | label | Part-B |
+-------+--------+-------+--------+
| ABC | 2 | | |
+-------+--------+-------+--------+
| XYZ | 3 | XYZ | 4 |
+-------+--------+-------+--------+
| PQR | 6 | PQR | 6 |
+-------+--------+-------+--------+
| | | LMN | 8 |
+-------+--------+-------+--------+
| | | EFG | 1 |
+-------+--------+-------+--------+
The data-part is correct, but it'll take some work to get the columns correct, and to get the NA values in there.
Here's a complete pipeline:
gocsv join -c 'label' -outer file1.csv file2.csv \
| gocsv rename -c 1 -names 'Label_A' \
| gocsv rename -c 3 -names 'Label_B' \
| gocsv add -name 'label' -t '{{ list .Label_A .Label_B | compact | first }}' \
| gocsv select -c 'label','Part-A','Part-B' \
| gocsv replace -c 'Part-A','Part-B' -regex '^$' -repl 'NA' \
| gocsv sort -c 'label' \
> final.csv
echo 'Final'
gocsv view final.csv
which gets us the correct, final, file:
Final pipeline
+-------+--------+--------+
| label | Part-A | Part-B |
+-------+--------+--------+
| ABC | 2 | NA |
+-------+--------+--------+
| EFG | NA | 1 |
+-------+--------+--------+
| LMN | NA | 8 |
+-------+--------+--------+
| PQR | 6 | 6 |
+-------+--------+--------+
| XYZ | 3 | 4 |
+-------+--------+--------+
There's a lot going on in that pipeline; the high points are:
Merge the two label fields
| gocsv rename -c 1 -names 'Label_A' \
| gocsv rename -c 3 -names 'Label_B' \
| gocsv add -name 'label' -t '{{ list .Label_A .Label_B | compact | first }}' \
Pare down to just the 3 columns you want
| gocsv select -c 'label','Part-A','Part-B' \
Add the NA values and sort by label
| gocsv replace -c 'Part-A','Part-B' -regex '^$' -repl 'NA' \
| gocsv sort -c 'label' \
I've made a step-by-step explanation at this Gist.
You mentioned join in the comment on my other answer, and I'd forgotten about this utility:
#!/bin/sh
rm -f *sorted.csv
# Join two files, normally inner-join only, but
# - `-a 1 -a 2`: include "unpaired lines" from file 1 and file 2
# - `-1 1 -2 1`: the first column from each is the "join column"
# - `-o 0,1.2,2.2`: output the "join column" (0) and the second fields from files 1 and 2
join -a 1 -a 2 -1 1 -2 1 -o '0,1.2,2.2' -t, file1.csv file2.csv > joined.csv
# Add NA values
cat joined.csv | sed 's/,,/,NA,/' | sed 's/,$/,NA/' > unsorted.csv
# Sort, pull out header first
head -n 1 unsorted.csv > sorted.csv
# Then sort remainder
tail -n +2 unsorted.csv | sort -t, -k 1 >> sorted.csv
And, here's sorted.csv
+--------+--------+--------+
| label | Part-A | Part-B |
+--------+--------+--------+
| ABC mn | 2.0 | NA |
+--------+--------+--------+
| EFG | NA | 1.0 |
+--------+--------+--------+
| LMN Wv | NA | 8 |
+--------+--------+--------+
| PQR SN | 6 | 6 |
+--------+--------+--------+
| XYZ | 3.0 | 4.0 |
+--------+--------+--------+
As @Fravadona correctly stated in a comment, a proper CSV parser is needed for CSV files that can contain the delimiter, a newline, or double quotes inside a field.
Actually, only two functions are needed: one for unquoting CSV fields into normal awk fields, and one for quoting the awk fields to write the data back out as CSV fields.
I have written a variant of my previous answer (https://stackoverflow.com/a/71056926/18135892) that uses Ed Morton's CSV parser (https://stackoverflow.com/a/45420607/18135892, in the gsub variant that works with any awk) to give an example of proper CSV parsing.
This solution prints the desired result, correctly sorted, with any awk.
Please note that the sorting algorithm is taken from the mawk manual.
# SO71053039_2.awk

# unquote CSV:
# Ed Morton's CSV parser: https://stackoverflow.com/a/45420607/18135892
function buildRec( fpat,fldNr,fldStr,done) {
    CurrRec = CurrRec $0
    if ( gsub(/"/,"&",CurrRec) % 2 ) {
        # The string built so far in CurrRec has an odd number
        # of "s and so is not yet a complete record.
        CurrRec = CurrRec RS
        done = 0
    }
    else {
        # If CurrRec ended with a null field we would exit the
        # loop below before handling it so ensure that cannot happen.
        # We use a regexp comparison using a bracket expression here
        # and in fpat so it will work even if FS is a regexp metachar
        # or a multi-char string like "\\\\" for \-separated fields.
        CurrRec = CurrRec ( CurrRec ~ ("[" FS "]$") ? "\"\"" : "" )
        $0 = ""
        fpat = "([^" FS "]*)|(\"([^\"]|\"\")+\")"
        while ( (CurrRec != "") && match(CurrRec,fpat) ) {
            fldStr = substr(CurrRec,RSTART,RLENGTH)
            # Convert <"foo"> to <foo> and <"foo""bar"> to <foo"bar>
            if ( sub(/^"/,"",fldStr) && sub(/"$/,"",fldStr) ) {
                gsub(/""/, "\"", fldStr)
            }
            $(++fldNr) = fldStr
            CurrRec = substr(CurrRec,RSTART+RLENGTH+1)
        }
        CurrRec = ""
        done = 1
    }
    return done
}

# quote CSV:
# Quote according to https://datatracker.ietf.org/doc/html/rfc4180 rules
function csvQuote(field, sep) {
    if ((field ~ sep) || (field ~ /["\r\n]/)) {
        gsub(/"/, "\"\"", field)
        field = "\"" field "\""
    }
    return field
}

#-------------------------------------------------
# insertion sort of A[1..n]
function isort( A,A_SWAP, n,i,j,hold ) {
    n = 0
    for (j in A)
        A_SWAP[++n] = j
    for ( i = 2 ; i <= n ; i++ ) {
        hold = A_SWAP[j = i]
        while ( A_SWAP[j-1] "" > "" hold ) {
            j--
            A_SWAP[j+1] = A_SWAP[j]
        }
        A_SWAP[j] = hold
    }
    # sentinel A_SWAP[0] = "" will be created if needed
    return n
}

BEGIN {
    FS = OFS = ","

    # read file 1
    fnr = 0
    while ((getline < ARGV[1]) > 0) {
        if (! buildRec())
            continue
        ++fnr
        if (fnr == 1) {
            for (i = 1; i <= NF; i++)
                FIELDBYNAME1[$i] = i   # e.g. FIELDBYNAME1["label"] = 1
        }
        else {
            LABEL_KEY[$FIELDBYNAME1["label"]]
            LABEL_KEY1[$FIELDBYNAME1["label"]] = $FIELDBYNAME1["Part-A"]
        }
    }
    close(ARGV[1])

    # read file 2
    fnr = 0
    while ((getline < ARGV[2]) > 0) {
        if (! buildRec())
            continue
        ++fnr
        if (fnr == 1) {
            for (i = 1; i <= NF; i++)
                FIELDBYNAME2[$i] = i   # e.g. FIELDBYNAME2["label"] = 1
        }
        else {
            LABEL_KEY[$FIELDBYNAME2["label"]]
            LABEL_KEY2[$FIELDBYNAME2["label"]] = $FIELDBYNAME2["Part-B"]
        }
    }
    close(ARGV[2])

    # print the header
    print "label" OFS "Part-A" OFS "Part-B"

    # get the result
    z = isort(LABEL_KEY, LABEL_KEY_SWAP)
    for (i = 1; i <= z; i++) {
        result_string = sprintf("%s", csvQuote(LABEL_KEY_SWAP[i], OFS))
        if (LABEL_KEY_SWAP[i] in LABEL_KEY1)
            result_string = sprintf("%s", result_string OFS csvQuote(LABEL_KEY1[LABEL_KEY_SWAP[i]], OFS) OFS (LABEL_KEY_SWAP[i] in LABEL_KEY2 ? csvQuote(LABEL_KEY2[LABEL_KEY_SWAP[i]], OFS) : "NA"))
        else
            result_string = sprintf("%s", result_string OFS "NA" OFS csvQuote(LABEL_KEY2[LABEL_KEY_SWAP[i]], OFS))
        print result_string
    }
}
Call:
awk -f SO71053039_2.awk file1.csv file2.csv
=> result (superfluous quotes according to CSV rules are omitted):
label,Part-A,Part-B
ABC mn,2.0,NA
EFG,NA,1.0
LMN Wv,NA,8
PQR SN,6,6
XYZ,3.0,4.0
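As a quick sanity check of the RFC 4180 quoting rules, here is a hypothetical BEGIN block (appended to a file that also contains the csvQuote function above), with the expected output in the comments:
BEGIN {
    print csvQuote("plain", ",")         # -> plain
    print csvQuote("a,b", ",")           # -> "a,b"
    print csvQuote("say \"hi\"", ",")    # -> "say ""hi"""
}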

Match string in file1 with string in file2

My data examples are:
1.txt
MTQZ3CODT0SQKGE3QE6B | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05
2.txt
MTQZ3CODT0SQKGE3QE6B | joe#example.com
desired output
joe#example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05
I'm supposed to match the 1st column of 1.txt
and replace it with the 2nd column of 2.txt.
So far I tried:
awk 'BEGIN { while((getline < "file2.txt") > 0) a[$1]=$3 } { $1 = a[$1] } 1' file1.txt
It works well, but after 12 hours of running it has only processed about 1 GB, which seems very slow.
INFO: file1.txt = 7 GB, file2.txt = 4 GB, my memory is 16 GB.
I'm not sure what causes the slowness, but I hope there is a faster way than the awk approach I'm using; that
would be helpful.
Thanks!!
Note: I'm running out of memory. Is there another way to do it
that doesn't keep an array at all?
Also, in my case the lines are in random order, not on the same line numbers!
$ join <(sort 2.txt) <(sort 1.txt) | cut -d' ' -f3-
joe#example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05
If that's not all you need then edit your question to provide more truly representative sample input/output including cases that this doesn't work for.
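Since sort does an external merge sort on temporary files, this approach is not limited by RAM. For files this size it may help to pin down the locale and the temp directory; a sketch, with /big/tmp standing in for whatever scratch space you have:
LC_ALL=C sort -T /big/tmp 2.txt > 2.sorted
LC_ALL=C sort -T /big/tmp 1.txt > 1.sorted
LC_ALL=C join 2.sorted 1.sorted | cut -d' ' -f3-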
You may use this awk:
awk -F ' *\\| *' -v OFS=' | ' '
    FNR == NR {
        map[$1] = $2
        next
    }
    $1 in map {
        $1 = map[$1]
    } 1' 2.txt 1.txt
joe#example.com | j t | j | t | 22312 | stimpy | EST | 8 | 20 | text | list | 0 | | 2002-08-22 13:07:05

awk command to print multiple columns using for loop

I have a single file in which the 1st and 2nd columns contain an item code and name, and the 3rd to 12th columns contain its consumption quantity for 10 consecutive days.
Now I need to convert that into 10 different files. In each, the 1st and 2nd columns should be the same item code and item name, and the 3rd column should contain the consumption quantity of one day.
input file:
Code | Name | Day1 | Day2 | Day3 |...
10001 | abcd | 5 | 1 | 9 |...
10002 | degg | 3 | 9 | 6 |...
10003 | gxyz | 4 | 8 | 7 |...
I need the Output in different file as
file 1:
Code | Name | Day1
10001 | abcd | 5
10002 | degg | 3
10003 | gxyz | 4
file 2:
Code | Name | Day2
10001 | abcd | 1
10002 | degg | 9
10003 | gxyz | 8
file 3:
Code | Name | Day3
10001 | abcd | 9
10002 | degg | 6
10003 | gxyz | 7
and so on....
I wrote code like this:
awk 'BEGIN { FS = "\t" } ; {print $1,$2,$3}' FILE_NAME > file1;
awk 'BEGIN { FS = "\t" } ; {print $1,$2,$4}' FILE_NAME > file2;
awk 'BEGIN { FS = "\t" } ; {print $1,$2,$5}' FILE_NAME > file3;
and so on...
Now I need to write it within a 'for' or 'while' loop, which would be faster...
I don't know the exact code; maybe something like this:
for (( i=3; i<=NF; i++)) ; do awk 'BEGIN { FS = "\t" } ; {print $1,$2,$i}' input.tsv > $i.tsv; done
Kindly help me to get the output as I explained.
If you absolutely need to use a loop in Bash, then your loop can be fixed like this:
for ((i = 3; i <= 10; i++)); do awk -v field=$i 'BEGIN { FS = "\t" } { print $1, $2, $field }' input.tsv > file$i.tsv; done
But it would be really better to solve this using pure awk, without shell at all:
awk -v FS='\t' '
    NR == 1 {
        for (i = 3; i < NF; i++) {
            fn = "file" (i - 2) ".txt";
            print $1, $2, $i > fn;
            print "" >> fn;
        }
    }
    NR > 2 {
        for (i = 3; i < NF; i++) {
            fn = "file" (i - 2) ".txt";
            print $1, $2, $i >> fn;
        }
    }' inputfile
That is, when you're on the first record,
create the output files by writing the header line and a blank line (as specified in your question).
For the 3rd and later records, append to the files.
Note that the code in your question suggests that the fields in the file are separated by tabs, but the example files seem to use | padded with a variable number of spaces. It's not clear which one is your actual case. If it's really tab-separated, then the above code will work. If in fact it's like the example inputs, then change the first line to this:
awk -v OFS=' | ' -v FS='[ |]+' '
bash + cut solution:
input.tsv test content:
Code | Name | Day1 | Day2 | Day3
10001 | abcd | 5 | 1 | 9
10002 | degg | 3 | 9 | 6
10003 | gxyz | 4 | 8 | 7
day_splitter.sh script:
#!/bin/bash
n=$(head -1 "$1" | awk -F'|' '{print NF}')   # total number of fields
for ((i = 3; i <= n; i++))
do
    fn="Day"$((i - 2))                       # file name containing `Day` number
    cut -d'|' -f1,2,"$i" "$1" > "$fn.txt"
done
Usage:
bash day_splitter.sh input.tsv
Results:
$cat Day1.txt
Code | Name | Day1
10001 | abcd | 5
10002 | degg | 3
10003 | gxyz | 4
$cat Day2.txt
Code | Name | Day2
10001 | abcd | 1
10002 | degg | 9
10003 | gxyz | 8
$cat Day3.txt
Code | Name | Day3
10001 | abcd | 9
10002 | degg | 6
10003 | gxyz | 7
In pure awk:
$ awk 'BEGIN{FS=OFS="|"}{for(i=3;i<=NF;i++) {f="file" (i-2); print $1,$2,$i >> f; close(f)}}' file
Explained:
$ awk '
BEGIN {
    FS=OFS="|" }               # set delimiters
{
    for(i=3;i<=NF;i++) {       # loop over the consumption fields
        f="file" (i-2)         # create the filename
        print $1,$2,$i >> f    # append to the target file
        close(f) }             # close the target file, so open file handles never pile up
}' file

Check a Value and Tag according to that value and append in same row using shell

I have a file as
NUMBER|05-1-2016|05-2-2016|05-3-2016|05-4-2016|
0000000 | 0 | 225.993 | 0 | 324|
0003450 | 89| 225.993 | 0 | 324|
0005350 | 454 | 225.993 | 54 | 324|
In the example there are four dates in the header.
I want to check the value under each date for the field 1 'number' rows and tag the values accordingly using the shell:
for example, if a value is between 0 and 100, tag it 'L', and if it is greater than 100, tag it 'H'.
So the output should be like
NUMBER|05-1-2016|05-2-2016|05-3-2016|05-4-2016|05-1-2016|05-2-2016|05-3-2016|05-4-2016|
0000000 | 0 | 225.993 | 0 | 324| L | H | L | H|
0003450 | 89| 225.993 | 0 | 324|L | H | L | H|
0005350 | 454 | 225.993 | 54 | 324|H | H | L | H|
A quick and dirty example that:
sets the input and output field separators (-F and OFS below) to |,
prints the header (the record with NR==1),
for all other records prints fields 1-5, followed by the result of function lh for fields 2-5,
defines the function lh, which returns L for values < 100 and H for all others.
Code:
awk -F \| '
BEGIN {OFS="|"}
NR==1 {print}
NR > 1 {print $1, $2, $3, $4, $5, lh($2), lh($3), lh($4), lh($5) }
function lh(val) { return (val < 100) ? "L" : "H"}
' file.txt
Alternative function lh:
function lh(val) {
result = "";
if (val < 100) {
result = "L";
} else {
result = "H";
}
return result;
}
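One thing this doesn't reproduce is the doubled date columns in your desired header row. If that matters, the NR==1 rule could repeat fields 2-5; a minimal sketch under the same assumptions as the code above:
NR==1 { print $0 $2 OFS $3 OFS $4 OFS $5 OFS }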
