Add length of following line to current line in bash - bash

I have a small sample data set test1.faa
>PROKKA_00001_A1#hypothetical#protein
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
>PROKKA_00002_A1#Cystathionine#beta-lyase
MHRFGGMVTAILKGGLDDARRFLERCELFALAESLGGVESLIEHPAIMTHASVPREIREALGISDGLVRLSVGIEDADDLLAELETALA
>PROKKA_00003_A1#hypothetical#protein
MVPIVSAAPVFTLLLTVAVFRRERLTAGRIAAVAVVVPSVILIALGH
and I would like to append the length of the following line to each header line, followed by the sequence line itself, for example:
>PROKKA_00001_A1#hypothetical#protein_92
MTIALHLTAVLAFAALAGCGANDSDPGPGGVTVSEARALDQAAEMLEKRGRSPADENAEQAERLRREQAQARTPGQPPEQALQQDGASAPE
I tried to do this with awk, but it returns the following error:
awk: >PROKKA_00001_A1#hypothetical#protein: No such file or directory
I assume it is related to the > at the beginning? But I need it in the output file.
This is the code I tried:
#!/bin/bash
cat test1.faa | while read line
do
headerline=$(awk '/>/{print $0}' $line)
echo -e "this is the headerline \n ${headerline}"
fastaline=$(awk '!/>/{print $0}' $line)
echo -e "this is the fastaline \n ${fastaline}"
fastaline_length=$(awk -v linelength=$fastaline '{print length(linelength)}')
echo -e "this is length of fastaline \n ${fastaline_length}"
echo "${headerline}_${fastaline_length}"
echo $fastaline
done
Any suggestions on how to do this?

Could you please try the following (assuming your actual Input_file has the same layout as the sample shown).
awk '/^>/{value=$0;next} {print value"_"length($0) ORS $0;value=""}' Input_file

This awk command would do what you want:
awk '
/^>/ {
getline next_line
print $0 "_" length(next_line)
print next_line
}
' test1.faa
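Both answers above assume that every header is followed by exactly one sequence line, as in the sample. If the sequences may be wrapped over several lines (an assumption, since the question does not show such data), the lines have to be collected before the length is known. A minimal sketch of that idea, using a hypothetical file wrapped.faa and an illustrative helper name fasta_add_length:

```shell
# Hypothetical sample in which the first sequence is wrapped over two lines.
cat > wrapped.faa <<'EOF'
>seq1
MTIAL
HLT
>seq2
MVPIV
EOF

# fasta_add_length is an illustrative helper name, not part of the answers above.
fasta_add_length() {
    awk '
    # Print the pending record: header plus total length, then the joined sequence.
    function emit() {
        if (header != "") {
            print header "_" length(seq)
            print seq
        }
    }
    /^>/ { emit(); header = $0; seq = ""; next }   # new header: flush the previous record
         { seq = seq $0 }                          # sequence line: append to the buffer
    END  { emit() }                                # flush the last record
    ' "$@"
}

fasta_add_length wrapped.faa
```

For the sample above this prints >seq1_8, MTIALHLT, >seq2_5 and MVPIV on separate lines. Note that the wrapped sequence is written back as a single line.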

Related

Comma-delimited text: read last two, add them, place them at end of each line

I have a file that looks like this:
GOES-15,167,170,+,3
GOES-14,150,146,-,4
GOES-13,100,100,-,0
GOES-WEST,-160,-170,-,10
I would like to read the last two elements of each line (for example + and 3 on the first line), join them side by side (+3), and put the result at the end of the line after a comma delimiter, like this:
GOES-15,167,170,+,3,+3
Here is what I am trying:
#!/bin/bash
file=weather_sats.txt
while read line
do
ADD=$(awk -F, '{print $4$5}')
sed -i 's/$/,$ADD/' $file
done < $file
exit 0
This doesn't work, since I get "$ADD" at end of each line.
This might do what you wanted.
awk -F, '{print $0","$(NF-1)$NF}' file.txt
Use pure awk:
awk -F, 'BEGIN { OFS="," } {print $0, $4$5 }' weather_sats.txt
That produces the required output.
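As for why the attempt in the question prints a literal $ADD: the shell does not expand variables inside single quotes, so sed receives the text $ADD unchanged. The awk commands above also only print to standard output, so updating the file itself needs an extra step. A minimal sketch, assuming a shortened copy of the question's data and that mktemp is available on the system:

```shell
# Hypothetical copy of the first two lines of the question's data.
cat > weather_sats.txt <<'EOF'
GOES-15,167,170,+,3
GOES-14,150,146,-,4
EOF

# Single quotes keep $ADD literal; double quotes let the shell expand it.
ADD='+3'
echo 'value: $ADD'
echo "value: $ADD"

# Write the awk result to a temporary file, then replace the original with it.
tmp=$(mktemp) &&
awk -F, '{ print $0 "," $(NF-1) $NF }' weather_sats.txt > "$tmp" &&
mv "$tmp" weather_sats.txt

cat weather_sats.txt
```

The first echo prints value: $ADD and the second prints value: +3. After the mv, the file holds GOES-15,167,170,+,3,+3 and GOES-14,150,146,-,4,-4.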

Extract specific substring in shell

I have a file which contains following line:
ro fstype=sd timeout=10 console=ttymxc1,115200 show=true
I'd like to extract the value of the fstype attribute ("sd") and store it in a variable.
I did the job using bash
IFS=" " read -a args <<< file
for arg in ${args[@]}; do
if [[ "$arg" =~ "fstype" ]]; then
id=$(cut -d "=" -f2 <<< "$arg")
echo $id
fi
done
and following awk command in another shell script:
awk -F " " '{print $2}' file | cut -d '=' -f2
Because the position of the 'fstype' argument and the file content can differ, how can I do the same thing in a way that stays compatible across shell scripts?
Could you please try the following.
awk 'match($0,/fstype=[^ ]*/){print substr($0,RSTART+7,RLENGTH-7)}' Input_file
Or, to handle any string before the =, try the following:
awk '
match($0,/fstype=[^ ]*/){
val=substr($0,RSTART,RLENGTH)
sub(/.*=/,"",val)
print val
val=""
}
' Input_file
With sed:
sed 's/.*fstype=\([^ ]*\).*/\1/' Input_file
Explanation of the awk code:
awk ' ##Start the awk program.
match($0,/fstype=[^ ]*/){ ##Use match() to find fstype= followed by everything up to the next space in the current line.
val=substr($0,RSTART,RLENGTH) ##Set val to the matched substring, which starts at RSTART and is RLENGTH characters long.
sub(/.*=/,"",val) ##Remove everything up to and including = from val.
print val ##Print val.
val="" ##Reset val.
}
' Input_file ##Name of the input file.
Any time you have tag=value pairs in your data I find it best to start by creating an array (f[] below) that maps those tags (names) to their values:
$ awk -v tag='fstype' -F'[ =]' '{for (i=2;i<NF;i+=2) f[$i]=$(i+1); print f[tag]}' file
sd
$ awk -v tag='console' -F'[ =]' '{for (i=2;i<NF;i+=2) f[$i]=$(i+1); print f[tag]}' file
ttymxc1,115200
With the above approach you can do whatever you like with the data just by referencing it by its name as the index in the array, e.g.:
$ awk -F'[ =]' '{
for (i=2;i<NF;i+=2) f[$i]=$(i+1)
if ( (f["show"] == "true") && (f["timeout"] < 20) ) {
print f["console"], f["fstype"]
}
}' file
ttymxc1,115200 sd
If your data has more than 1 row and there can be different fields on each row (doesn't appear to be true for your data) then add delete f as the first line of the script.
If the key and value can be matched by the regex fstype=[^ ]*, grep can be used with the -o option, which prints only the matched part of the line.
$ grep -o 'fstype=[^ ]*' file
fstype=sd
In addition, \K can be used in the regex together with the -P option (note that this option is only available in GNU grep).
Whatever is matched to the left of \K is not printed by -o.
Therefore, the expression below extracts the value only.
$ grep -oP 'fstype=\K[^ ]*' file
sd
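The question also asks for the value to end up in a shell variable. A minimal sketch using command substitution, assuming a hypothetical file cmdline.txt that holds the question's line. It uses sed -n with the p flag so that lines without fstype= produce no output at all, and it avoids the GNU-only grep -P:

```shell
# Hypothetical sample file mirroring the line from the question.
printf '%s\n' 'ro fstype=sd timeout=10 console=ttymxc1,115200 show=true' > cmdline.txt

# -n suppresses default output; the p flag prints only lines where the substitution succeeded.
fstype=$(sed -n 's/.*fstype=\([^ ]*\).*/\1/p' cmdline.txt)

echo "$fstype"
```

With the sample line this prints sd, regardless of where fstype= appears on the line.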

Shell script to match a string and print the next string on aix machine

I have a following line as input.
Parsing events:hostname='tom';Ipaddress='10.10.10.1';situation_name='sgd_abc_app_a';type='General';
Each line contains many such fields separated by semicolons, and always starts with Parsing events:.
I want to extract only sgd_abc_app_a, i.e. the value that belongs to situation_name.
Thanks
Kulli
Try
sed -n 's/^.*situation_name=//p' input_file| awk -F "'" '{print $2}'
For your request, it would work no matter the position of situation_name
$ awk '/situation_name/{match($0,/situation_name=[^;]+/); print substr($0,RSTART+16,RLENGTH-17)}' file
sgd_abc_app_a
awk solution:
s="Parsing events: hostname='tom';Ipaddress='10.10.10.1';situation_name='sgd_abc_app_a';type='General';"
awk -F'[=;]' '{ gsub("\047","",$6); print $6 }' <<< "$s"
Or with sed:
sed -n "s/^Parsing events:.*situation_name='\([^']*\).*/\1/p" <<< "$s"
The output:
sgd_abc_app_a
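The awk solution above relies on situation_name being the third field ($6 after splitting on = and ;). A minimal sketch that looks the key up by name instead, so the position does not matter. It feeds the line through printf and a pipe rather than a here-string, on the assumption that here-strings may not be available in every shell on AIX. The variable names key, line and value are illustrative:

```shell
line="Parsing events:hostname='tom';Ipaddress='10.10.10.1';situation_name='sgd_abc_app_a';type='General';"

value=$(printf '%s\n' "$line" | awk -F';' -v key='situation_name' '
{
    sub(/^Parsing events:[ ]*/, "")          # strip the fixed prefix; awk re-splits the fields
    for (i = 1; i <= NF; i++) {
        eq = index($i, "=")                   # position of the first = in this field, 0 if none
        if (eq && substr($i, 1, eq - 1) == key) {
            v = substr($i, eq + 1)            # everything after the =
            gsub("\047", "", v)               # remove the single quotes (octal 047)
            print v
        }
    }
}')

echo "$value"
```

For the sample line this prints sgd_abc_app_a. Changing key to Ipaddress would print 10.10.10.1 instead.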

AWK - Print complete input string after comparison

I have a file a.text:
hello world
my world
hello universe
I want to print the complete string if the second word is "world":
[root@sc-rdops-vm18-dhcp-57-128:/var/log] cat a | awk -F " " '{if($2=="world") print $1}'
hello
my
But the output which I want is:
[root@sc-rdops-vm18-dhcp-57-128:/var/log] cat a | awk -F " " '{if($2=="world") print <Something here>}'
hello world
my world
Any pointers on how I can do this?
Thanks in advance.
awk '{if ($2=="world") {print}}' file
Output:
hello world
my world
First off, since you are writing a single if statement, you can use the awk 'filter{commands;}' pattern, like so
awk -F " " '$2=="world" { print <Something here> }'
To print the entire line you can use print $0
awk -F " " '$2=="world"{print $0}' file
which can be written as
awk -F " " '$2=="world"{print}' file
But {print} is the default action, so it can be omitted after the filter like this:
awk -F " " '$2=="world"' file
Or even without the -F option, since the space is the default FS value
awk '$2=="world"' file
If you want / have to use awk to solve your problem:
awk '$0~/world/' file.txt
If a line (i.e., $0) matches the pattern "world" (i.e., ~/world/), the entire line is printed.
If you only want to check the second column for world:
awk '$2 == "world"' file.txt
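The two forms are not interchangeable: the regex test matches "world" anywhere on the line, while the field comparison requires the second field to be exactly "world". A small sketch that makes the difference visible, assuming a hypothetical file a.txt with one extra line ("world peace") added to the question's data:

```shell
# The question's three lines plus one hypothetical line where "world" is the first word.
printf '%s\n' 'hello world' 'my world' 'hello universe' 'world peace' > a.txt

# Regex match against the whole line: also prints "world peace".
awk '$0 ~ /world/' a.txt

# Exact comparison against the second field: prints only "hello world" and "my world".
awk '$2 == "world"' a.txt
```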

Find and update(append) csv with shell script

Input file:
ID,Name,Values
1,A,vA|A2
2,B,VB
Expected output:
1,A,vA|VA2|vA3
2,B,VB
Search the file for a given ID and then append a given value to the Values field.
Use case: append 'testvalue' to the Values field of the row with ID = 1.
The problem is: how do I cache the line that was found?
sed's s command can be used for substitution; I also tried sed's p (print) command, but to no avail.
Just set x to the ID of the row you want to update and v to the value to append:
# Append vA3 to the entry with ID==1
$ awk -F, '$1==x{$0=$0"|"v}1' x=1 v="vA3" file
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
# Append TEST_VALUE to the entry with ID==2
$ awk -F, '$1==x{$0=$0"|"v}1' x=2 v="TEST_VALUE" file
ID,Name,Values
1,A,vA|A2
2,B,VB|TEST_VALUE
Explanation:
-F, sets the field separator to be a comma.
$1==x checks whether the line we are looking at has the ID we want to change, where $1 is the first field on each line and x is the variable we define.
If that condition is true, the block {$0=$0"|"v} is executed. $0 is the variable containing the whole line, so we are just appending the string "|" and the value of the variable v to the end of the line.
The trailing 1 is just an awk shortcut for printing the line. The 1 is a condition that always evaluates to true, and since no block is provided awk executes the default block {print $0}. Written out explicitly, the script would be awk -F, '$1==x{$0=$0"|"v}{print $0}' x=1 v="vA3" file.
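The same logic can be wrapped in a small shell function so that the ID and the value come from the caller. This is only a sketch: append_value and infile.csv are hypothetical names, and the variables are passed with -v rather than as trailing assignments. Note that awk interprets backslash escapes in -v values, so a value containing backslashes would need extra care.

```shell
# Hypothetical copy of the question's input.
cat > infile.csv <<'EOF'
ID,Name,Values
1,A,vA|A2
2,B,VB
EOF

# Usage: append_value ID VALUE FILE (illustrative helper name)
append_value() {
    # Append "|VALUE" to every line whose first field equals ID; print all lines.
    awk -F, -v id="$1" -v val="$2" '$1 == id { $0 = $0 "|" val } 1' "$3"
}

append_value 1 vA3 infile.csv
```

For the sample file the second output line becomes 1,A,vA|A2|vA3 while the header and the row with ID 2 pass through unchanged.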
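The following script does something similar to what you need. It is written in pure bash.
#!/usr/bin/bash
[ $# -ne 2 ] && echo "Arg missing" && exit 1
# Read each line verbatim (no backslash processing, no whitespace trimming).
while IFS= read -r l; do
# ${l%%,*} removes everything from the first comma on, leaving the ID field.
[ "${l%%,*}" = "$1" ] && l="$l|$2"
printf '%s\n' "$l"
done <infile
Run it as script <ID> <VALUE>. Example: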
$ ./script 1 va3
ID,Name,Values
1,A,vA|A2|va3
2,B,VB
$ cat infile
ID,Name,Values
1,A,vA|A2
2,B,VB
Or maybe this?
awk '/vA/ { $NF=$NF"|VA2" } 1' FS=, OFS=,
$ echo "1,A,vA
2,B,VB" | awk '/vA/ { $NF=$NF"|VA2" } 1' FS=, OFS=,
1,A,vA|VA2
2,B,VB
Edit 1: awk only recently gained support for in-place editing (GNU awk's -i inplace), but for your requirement it is best to go with the sed solution that Kent has posted.
$ cat file
ID,Name,Values
1,A,vA|A2
2,B,VB
$ awk '$1==1 { $NF=$NF"|vA3" } 1' FS=, OFS=, file
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
Are you looking for this?
kent$ echo "1,A,vA
2,B,VB"|sed '/vA/s/$/|VA2/'
1,A,vA|VA2
2,B,VB
EDIT: check the ID, then replace
kent$ echo "ID,Name,Values
1,A,vA|A2
2,B,VB"|sed 's/^1,.*/&|vA3/'
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
& stands for the matched part of the line. That would be what you meant by "cache".
sed ' 1 a\ |VA2|vA3 ' file1.txt

Resources