Append a specific identifier to data in a tab delimited text file - bash

Essentially I have something like this:
B3 LPC1030_64571 LPC1283_613422
B2 LPC107_67093 LPC174_1161466 LPC1283_579823 LPC5_2182288 LPC1378_340850 LPC203_5679639 LPC107_67396 LPC107_67535 LPC107_70165 LPC107_77297 LPC107_80176 LPC107_81524 LPC107_88715 AMZ216_267328 AMZ216_268028
B1 ...
For every entry in each Bx row, I want to append ".Bx".

A simple awk script will do that:
awk '{for(i=2;i<=NF;i++){$i=$i "." $1}; print}' <infile
or, more nicely formatted:
awk '{
    for (i = 2; i <= NF; i++)   # NF is the number of fields
    {
        $i = $i "." $1          # append "." and the row label ($1) to every field except the first
    }
    print                       # print the modified record to stdout
}' <infile
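Note that the script above splits on any whitespace and rejoins the fields with single spaces. If the file really is tab-delimited and the tabs should survive, a minimal sketch (the function name append_row_id is made up for illustration) sets both FS and OFS to a tab:

```shell
# Hypothetical helper: reads tab-delimited rows on stdin and appends
# ".<first field>" to every other field, keeping tabs as the separator.
append_row_id() {
    awk 'BEGIN { FS = OFS = "\t" }
         { for (i = 2; i <= NF; i++) $i = $i "." $1; print }'
}

printf 'B3\tLPC1030_64571\tLPC1283_613422\n' | append_row_id
```

Assigning to a field makes awk rebuild the record with OFS, which is why OFS has to be set as well as FS.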

Appending result of function on another field into csv using shell script, awk

I have a csv file stored as a temporary variable in a shell script (*.sh).
Let's say the data looks like this:
Account,Symbol,Price
100,AAPL US,200
102,SPY US,500
I want to add a fourth column, "Type", which is the result of a shell function "foobar". Called from the command line or from within a shell script, it behaves like this:
$ foobar "AAPL US"
"Stock"
$ foobar "SPY US"
"ETF"
How do I add this column to my csv, and populate it with calls to foobar which take the second column as an argument? To clarify, this is my ideal result post-script:
Account,Symbol,Price,Type
100,AAPL US,200,Common Stock
102,SPY US,500,ETF
I see many examples online involving such a column addition using awk, and populating the new column with fixed values, conditional values, mathematical derivations from other columns, etc. - but nothing that calls a function on another field and stores its output.
You may use this awk:
export -f foobar
awk 'BEGIN{FS=OFS=","} NR==1{print $0, "Type"; next} {
cmd = "foobar \"" $2 "\""; cmd | getline line; close(cmd);
print $0, line
}' file.csv
Account,Symbol,Price,Type
100,AAPL US,200,Common Stock
102,SPY US,500,ETF
@anubhava's answer is a good approach, so please don't change the accepted answer; I'm only posting this as an answer because it's too big, and too much in need of formatting, to fit in a comment.
FWIW I'd write his awk script as:
awk '
BEGIN { FS=OFS="," }
NR==1 { type = "Type" }
NR > 1 {
cmd = "foobar \047" $2 "\047"
type = ((cmd | getline line) > 0 ? line : "ERROR")
close(cmd)
}
{ print $0, type }
' file.csv
to:
better protect $2 from shell expansion, and
protect from silently printing the previous value if/when cmd | getline fails, and
consolidate the print statements to 1 line so it's easy to change for all output lines if/when necessary
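The second point is easy to see in isolation. In this sketch the commands "echo found" and "true" are stand-ins for foobar (assumptions for illustration, not part of the original answers); "true" prints nothing, so getline returns 0 and the row is marked ERROR instead of reusing the previous row's value:

```shell
# Stand-in demo: "echo found" produces a line of output, "true" produces none.
check_getline() {
    awk '{
        cmd = ($1 == "ok" ? "echo found" : "true")
        # getline returns 1 on success, 0 at end of output, -1 on error
        result = ((cmd | getline line) > 0 ? line : "ERROR")
        close(cmd)
        print $1, result
    }'
}

printf 'ok\nbad\n' | check_getline
```

The first row should come out as "ok found" and the second as "bad ERROR".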
awk to the rescue!
$ echo "Account,Symbol,Price
100,AAPL US,200
102,SPY US,500" |
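awk -F, 'NR>1{cmd="foobar \"" $2 "\""; cmd | getline type; close(cmd)} {print $0 FS (NR==1?"Type":type)}'
The escaped quotes pass a value such as AAPL US to foobar as a single argument, and close(cmd) releases the pipe after each row.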
Another way not using awk:
paste -d, input.csv <({ read; printf "Type\n"; while IFS=, read -r _ s _; do foobar "$s"; done; } < input.csv)
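The same idea as the paste one-liner can be unrolled into a plain bash loop, which avoids the question of which shell awk uses to run foobar. This is only a sketch: the foobar body below is a hypothetical stand-in for the real function, add_type_column is a made-up name, and the comma splitting assumes no quoted fields containing commas.

```shell
# Hypothetical stand-in for the real foobar function.
foobar() {
    case $1 in
        "AAPL US") echo "Common Stock" ;;
        "SPY US")  echo "ETF" ;;
        *)         echo "Unknown" ;;
    esac
}

# Reads the CSV on stdin and writes it back out with a Type column.
add_type_column() {
    local header account symbol rest
    IFS= read -r header
    printf '%s,Type\n' "$header"
    while IFS=, read -r account symbol rest; do
        printf '%s,%s,%s,%s\n' "$account" "$symbol" "$rest" "$(foobar "$symbol")"
    done
}

printf 'Account,Symbol,Price\n100,AAPL US,200\n102,SPY US,500\n' | add_type_column
```

Because foobar is called directly from bash, no export -f is needed here.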

How to preserve new lines while printing to a text file in shell?

I have to print out some values in a txt file.
they are of the following format
input="Sno;Name;Field1;Field2"
However the output must be:
Sno-Name
FIELDS ALLOCATED:
Field1
Field2
I do it like so:
echo "$input" | awk -F';' '{print $1"-"$2}' >>"$txtfile"
echo "FIELDS ALLOCATED:" >>"$txtfile"
echo "$input" | cut -d';' -f 3,4 | tr ';' '\n' >>"$txtfile"
This is easy. However, the problem is that Field1 or Field2 can contain newlines. Whenever this happens, cut or awk does not see field number 4; the text after the newline is treated as a new record. How can I print the two fields (with newlines preserved) from the given input format?
If the input is well-formed, you can collect input lines until you have four fields.
awk -F ';' 'r { $0 = r ORS $0 }   # prepend the lines collected so far
NF<4 { r = $0; next }             # fewer than four fields: keep collecting
{ print $1 "-" $2
print "FIELDS ALLOCATED:"
print $3; print $4
print ""; r="" }' file
A single GNU awk command can do the job, using FPAT and an empty RS:
input=$'Sno;Name;Field1\nFoo;Field2'
awk -v RS= -v FPAT='[^;]+' '{
printf "%s-%s\nFIELDS ALLOCATED:\n%s\n%s\n", $1, $2, $3, $4}' <<< "$input"
Sno-Name
FIELDS ALLOCATED:
Field1
Foo
Field2
Just change awk's input record separator, RS. The < and > around each field are added for clarity.
EDIT: removed the extra trailing newline by adding ';' at the end of the here-string data, plus another condition.
input="Sno;Name;Fie
ld1;Fi
eld2"
awk 'BEGIN{RS=";"} NR==1{f1=$0};
NR==2{print f1 "-" $0; print "FIELDS ALLOCATED:"}
$0=="\n"{next}
NR>2{print "<" $0 ">"}' <<< "$input;"
Gives:
Sno-Name
FIELDS ALLOCATED:
<Fie
ld1>
<Fi
eld2>
input=$'Sno;Name;Field1\nFoo;Field2'
awk 'BEGIN{ RS = "\n\n+" ; FS = ";" } { print $1"-"$2; for(i=3;i<=NF;i++) {print $i}}' <<<"$input"
Since I don't know how many fields there will be, I added a for loop that runs up to NF and changed RS to a blank line instead of a newline.
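For a single record held in a shell variable, the split can also be done in bash itself rather than awk. This is a sketch of a different technique from the answers above, and the function name format_record is made up for illustration; it relies on read -d '' reading up to a NUL byte so that newlines are not treated as record ends:

```shell
# Sketch: split the whole string on ";" in bash, keeping embedded newlines.
format_record() {
    local parts
    # -d '' makes read consume input up to a NUL byte instead of a newline,
    # so newlines inside a field stay part of that field.
    IFS=';' read -r -d '' -a parts < <(printf '%s\0' "$1")
    printf '%s-%s\nFIELDS ALLOCATED:\n' "${parts[0]}" "${parts[1]}"
    # Print every remaining field on its own line(s).
    printf '%s\n' "${parts[@]:2}"
}

format_record $'Sno;Name;Field1\nFoo;Field2'
```

Like the for loop up to NF, the array slice handles any number of trailing fields.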

Unix row to column format with string prefix and post fix

I need to convert a row of string data to column format, adding specific prefix and suffix strings. The data string in the file has 4 fixed columns (separated by ";"), and each column is further divided into two sections (separated by ":").
E.g.
Source data file:
A100:T100;B100:T200;A200:T300;B200:T400
Output from file should be:
TABa:BatchID=A100:TagId=T100:ProcId=1
TABb:BatchID=B100:TagId=T200:ProcId=2
TABc:BatchID=A200:TagId=T300:ProcId=3
TABd:BatchID=B200:TagId=T400:ProcId=4
So far I have tried the following code:
String="A100:T100;B100:T200;A200:T300;B200:T400"
> File.txt
for deploy in $(echo $String | tr ";" "\n")
do
echo $deploy >> File.txt
done
cat File.txt | awk 'BEGIN { FS=":"; OFS=":" } NR==1{ print "TABa:BatchID="$1,$2 } NR==2{ print "TABb:BatchID="$1,$2 }'
printf handles this:
$ awk -F: '{sub(/\n/,""); printf "TAB%c:BatchID=%s:TagId=%s:ProcId=%i\n",(NR+96),$1,$2,NR }' RS=';' File.txt
TABa:BatchID=A100:TagId=T100:ProcId=1
TABb:BatchID=B100:TagId=T200:ProcId=2
TABc:BatchID=A200:TagId=T300:ProcId=3
TABd:BatchID=B200:TagId=T400:ProcId=4
How it works
-F:
This sets the field separator to a colon: :.
sub(/\n/,"")
This removes newline characters.
printf "TAB%c:BatchID=%s:TagId=%s:ProcId=%i\n",(NR+96),$1,$2,NR
This does all the work. It makes use of the record number, NR, and the first and second fields and prints the output that you want.
RS=';'
This tells awk to use a semicolon, ;, as the record separator.
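If the data is already in a shell variable, as in the question, the intermediate file can be skipped. A minimal sketch, with to_rows as a made-up wrapper name:

```shell
String="A100:T100;B100:T200;A200:T300;B200:T400"

# Hypothetical wrapper: takes the data string as its first argument.
to_rows() {
    printf '%s\n' "$1" |
    awk -F: -v RS=';' '{
        sub(/\n/, "")   # drop the newline that ends the last record
        # NR + 96 maps 1, 2, 3, ... to character codes 97, 98, 99, ... (a, b, c, ...)
        printf "TAB%c:BatchID=%s:TagId=%s:ProcId=%d\n", NR + 96, $1, $2, NR
    }'
}

to_rows "$String"
```

The NR+96 trick only yields letters for the first 26 records; beyond "z" the character codes run into punctuation, so longer inputs would need a different naming scheme.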

How to split string by a delimiter in unix

I have the following string and I want to split it into 3 parts:
Text:
<http://rdf.freebase.com/ns/american_football.football_player.footballdb_id> <http://www.w3.org/2000/01/rdf-schema#label> "footballdb ID"#en
Output should be
$1 = <http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>
$2 = <http://www.w3.org/2000/01/rdf-schema#label>
$3 = "footballdb ID"#en
Basically, I am splitting an RDF-ish tuple into its parts.
I want to do this via a UNIX script, but I do not know sed or awk.
Please help.
If your input fields are tab-separated, this will produce your posted desired output:
$ awk -F'\t' '{ for (i=1;i<=NF;i++) printf "$%d = %s\n", i, $i }' file
$1 = <http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>
$2 = <http://www.w3.org/2000/01/rdf-schema#label>
$3 = "footballdb ID"#en
Alternatively this might be what you want if your fields are not tab-separated:
$ cat tst.awk
{
gsub(/<[^>]+>/,"&\n")
split($0,a,/[[:space:]]*\n[[:space:]]*/)
for (i=1; i in a; i++)
printf "$%d = %s\n", i, a[i]
}
$
$ awk -f tst.awk file
$1 = <http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>
$2 = <http://www.w3.org/2000/01/rdf-schema#label>
$3 = "footballdb ID"#en
If that's not how your input fields are separated and/or not what you want output, update your question to clarify.
read -r A B C <<< "$string"
printf '$1 = %s\n$2 = %s\n$3 = %s\n' "$A" "$B" "$C"
Output:
$1 = <http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>
$2 = <http://www.w3.org/2000/01/rdf-schema#label>
$3 = "footballdb ID"#en
Whatever you use to split the string needs to recognize not only the white space but also the convention that the double quote "protects" the blank space before ID and prevents it from splitting the fields. I fear this computation may be beyond what is possible with sed. You could do it in awk, but awk provides little special advantage here.
You show a space-separated format with quotes. A similar problem is to parse comma-separated format with quotes. Related questions:
Parse CSV with double quote in some cases
How to split csv whose columns may contain ,
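As a rough sketch of quote-aware splitting in awk (an illustration, not taken from any of the answers; the name split_triple and the token pattern are assumptions), one can peel tokens off the line with match(), treating <...> and "..." as single units:

```shell
# Sketch: print each token of a line, treating <...> and "..." as units.
split_triple() {
    awk '{
        line = $0; n = 0
        # Peel off one token at a time: <...>, "..." plus any suffix
        # (such as a language tag), or a bare word.
        while (match(line, /<[^>]*>|"[^"]*"[^ \t]*|[^ \t"<][^ \t]*/)) {
            printf "$%d = %s\n", ++n, substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

printf '%s\n' '<http://www.w3.org/2000/01/rdf-schema#label> "footballdb ID"#en' | split_triple
```

This does not handle escaped quotes inside a quoted string; a real RDF parser would be the safer choice for anything beyond simple lines like the one in the question.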
echo "your string" | awk '{ print $1, $2, $3, $4 }'
awk '{ print "$1 = " $1 "\n$2 = " $2 "\n$3 = " $3 }' filename

How to print selected columns separated by tabs?

I have a txt file with columns separated by tabs and based on that file, I want to create a new file that only contains information from some of the columns.
This is what I have now:
awk '{ print $1, $5 }' filename > newfilename
That works, except that when column 5 contains spaces, e.g. 123 Street, only 123 shows up and Street is treated as another column.
How can I achieve what I'm trying to do?
You can specify the field separator as tab:
awk 'BEGIN { FS = "\t" } ; { print $1, $5 }' filename > newfilename
Or from the command line like this:
awk -F"\t" '{ print $1, $5 }' filename > newfilename
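Both commands above still join the two printed columns with a space, because the comma in print inserts OFS, which defaults to a single space. If the new file should be tab-separated too, a minimal sketch (select_columns is a made-up name) sets OFS alongside FS:

```shell
# Hypothetical helper: keep columns 1 and 5 of tab-separated input,
# and join them with a tab on output as well.
select_columns() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $5 }' "$@"
}

printf 'id1\tb\tc\td\t123 Street\n' | select_columns
```

With no file arguments the function reads standard input; with arguments it behaves like the commands above, e.g. select_columns filename > newfilename.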
What about the simple cut command?
Very simple, yet it does the job. Tab is cut's default delimiter, so no -d option is needed:
cut -f 1,5 filename > newfilename
You can use Bash syntax in the following way:
while IFS=$'\t' read -r -a cols; do
    printf '%s\t%s\n' "${cols[0]}" "${cols[4]}"
done < in.txt > newfile.txt
This will save 1st and 5th columns separated by tabs into the new file.
