How to alter number of columns [with awk] only if a string is in the 1st column of the line while printing changed line and whole text - bash

I want to reduce the number of columns, keeping only the first and last one, on each line that contains a >.
I then want to print the whole file, with those lines changed, like this:
>TRF [name1]
AAAAAAAAAAAAAAAAAAAAAAAAAAATTGGA
ATGGGGGGGGGGGGGGGGGGGGGGGGGC
I have tried with this code but it only returns the changed lines. Thanks.
awk '$1 ~ />/ { print $1" "$NF}' file

You can use:
awk '$1 ~ />/ { $0 = $1 " " $NF} 1' file
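The trailing 1 is a pattern that is always true; with no action attached, awk applies the default action, print, so every input line is printed, whether it was modified or not.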
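To see the effect of the trailing 1, here is a minimal sketch run against a small sample file. The file contents, including the extra middle columns on the header line, are assumptions made for illustration.

```shell
# Build a small sample file (hypothetical contents).
tmp=$(mktemp)
printf '%s\n' '>TRF 12 34 [name1]' 'AAAATTGGA' 'ATGGGC' > "$tmp"

# Lines whose first field contains ">" are rewritten to "first last";
# the trailing 1 then prints every line, changed or not.
result=$(awk '$1 ~ />/ { $0 = $1 " " $NF } 1' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The header line comes out as `>TRF [name1]`, and the two sequence lines pass through untouched.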

Related

awk: select first column and value in column after matching word

I have a .csv where each row corresponds to a person (first column), followed by the attributes and values that are available for that person. I want to extract the names and values of a particular attribute for the persons who have that attribute. The file is structured as follows:
name,attribute1,value1,attribute2,value2,attribute3,value3
joe,height,5.2,weight,178,hair,
james,,,,,,
jesse,weight,165,height,5.3,hair,brown
jerome,hair,black,breakfast,donuts,height,6.8
I want a file that looks like this:
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
Using this earlier post, I've tried a few different awk methods but am still having trouble getting both the first column and then whatever column has the desired value for the attribute (say height). For example the following returns everything.
awk -F "height," '{print $1 "," FS$2}' file.csv
I could grep only the rows with height in them, but I'd prefer to do everything in a single line if I can.
You may use this awk:
cat attrib.awk
BEGIN {
FS=OFS=","
print "name,attribute,value"
}
NR > 1 && match($0, k "[^,]+") {
print $1, substr($0, RSTART+1, RLENGTH-1)
}
# then run it as
awk -v k=',height,' -f attrib.awk file
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
# or this one
awk -v k=',weight,' -f attrib.awk file
name,attribute,value
joe,weight,178
jesse,weight,165
With your shown samples, please try the following awk code, written and tested in GNU awk. It sets RS (the record separator) to the regex ^[^,]*,height,[^,]* and prints RT, the GNU awk variable holding the text that matched RS, whenever it is non-empty.
awk -v RS='^[^,]*,height,[^,]*' 'RT{print RT}' Input_file
I'd suggest a sed one-liner:
sed -n 's/^\([^,]*\).*\(,height,[^,]*\).*/\1\2/p' file.csv
One awk idea:
awk -v attr="height" '
BEGIN { FS=OFS="," }
FNR==1 { print "name", "attribute", "value"; next }
{ for (i=2;i<=NF;i+=2) # loop through even-numbered fields
if ($i == attr) { # if field value is an exact match to the "attr" variable then ...
print $1,$i,$(i+1) # print current name, current field and next field to stdout
next # no need to check rest of current line; skip to next input line
}
}
' file.csv
NOTE: this assumes the input value (height in this example) will match exactly (including same capitalization) with a field in the file
This generates:
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
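If the capitalization in the file cannot be relied on, one possible variation is to lowercase both sides before comparing. This is only a sketch, not part of the answer above, and the sample rows with Height and HEIGHT are invented for the example.

```shell
tmp=$(mktemp)
printf '%s\n' 'name,attribute1,value1,attribute2,value2' \
              'joe,Height,5.2,weight,178' \
              'james,,,,' \
              'jesse,weight,165,HEIGHT,5.3' > "$tmp"

result=$(awk -v attr="height" '
BEGIN  { FS = OFS = ","; attr = tolower(attr) }   # normalize the search key once
FNR==1 { print "name", "attribute", "value"; next }
{
    for (i = 2; i <= NF; i += 2)                  # attribute names sit in even-numbered fields
        if (tolower($i) == attr) {                # case-insensitive exact match
            print $1, $i, $(i+1)
            next
        }
}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The attribute is printed as it appears in the file, so the output keeps the original spelling (Height, HEIGHT).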
With a perl one-liner:
$ perl -lne '
print "name,attribute,value" if $.==1;
print "$1,$2" if /^(\w+).*(height,\d+\.\d+)/
' file
output
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
awk accepts variable-value arguments following a -v flag before the script. Thus, the name of the required attribute can be passed into an awk script using the general pattern:
awk -v attr=attribute1 ' {} ' file.csv
Inside the script, the passed value is referenced by the variable name, in this case attr.
Your criteria are to print column 1 (the name), the column whose header matches the requested value, and the column immediately after it (holding the corresponding values).
Thus, the following script lets you fish out the column headed "attribute1" and its next neighbour:
awk -v attr=attribute1 'BEGIN {FS=","} NR==1 {for (i=1;i<=NF;i++) if ($i == attr) col=i} {print $1","$col","$(col+1)}' data.txt
result:
name,attribute1,value1
joe,height,5.2
james,,
jesse,weight,165
jerome,hair,black
Another column (attribute3):
awk -v attr=attribute3 'BEGIN {FS=","} NR==1 {for (i=1;i<=NF;i++) if ($i == attr) col=i} {print $1","$col","$(col+1)}' data.txt
result:
name,attribute3,value3
joe,hair,
james,,
jesse,hair,brown
jerome,height,6.8
Just change the value of the -v attr= argument for the required column.

convert white space to tab on first line of a tab delimited file

I have multiple tab delimited files with the same column headers. However, the headers (1st row of the files) are delimited by white spaces instead of tabs. How can I convert the white space to tab on first line of a tab delimited file?
You can use sed for one line only:
sed -i.bak $'1s/ /\t/g' file.csv
Sounds like you can use awk:
awk -v OFS='\t' 'NR == 1 { $1 = $1 } 1' file
Assigning the first field of the first line $1 to itself causes awk to reformat the line, inserting the output field separator OFS (defined as a tab character). 1 is the shortest true condition, so awk does the default: { print } for every line.
To overwrite "in-place", use a temp file:
awk -v OFS='\t' 'NR == 1 { $1 = $1 } 1' file > tmp && mv tmp file
Note that this will interpret any number of spaces as a single field separator.
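As a small illustration of that last point, the sketch below feeds awk a header separated by runs of several spaces. The file contents are made up for the example.

```shell
tmp=$(mktemp)
# Header separated by runs of spaces; data row already tab-delimited.
printf 'id   name  score\n1\tann\t9\n' > "$tmp"

# $1 = $1 forces awk to rebuild line 1 with OFS between fields;
# default field splitting treats each run of blanks as one separator.
result=$(awk -v OFS='\t' 'NR == 1 { $1 = $1 } 1' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Each run of spaces in the header becomes exactly one tab, whereas a substitution that replaces every single space would emit one tab per space.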

How to replace the empty place with next line content in shell script

1,n1,abcd,1234
2,n2,abrt,5666
,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
,k1,yyyy,5234
4,22,yyyy,5234
The above is my input file, abc.txt. Wherever the first column is missing, I want it filled with the first-column value of the next row.
example:
3,h2,yyyy,123x
3,h2,yyyy,123y
I want output like below,
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x // the missing first-column value is filled with 3, taken from the next row
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
How can I implement this with awk or some other tool in a shell script? Please help.
Using awk you can do:
awk -F, '$1 ~ /^ *$/ {
p=p RS $0
next
}
p!="" {
gsub(RS " +", RS $1, p)
sub("^" RS, "", p)
print p
p=""
} 1' file
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
I would reverse the file, and then replace the value from the previous line:
tac filename | awk 'BEGIN {FS=OFS=","} $1 ~ /^[[:blank:]]*$/ {$1 = prev} {print; prev=$1}' | tac
This will also fill in missing values on multiple lines.
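A minimal sketch of the reversed-file idea with two consecutive lines missing their first column. It assumes tac is available (GNU coreutils), uses invented sample rows, and sets OFS to a comma so that lines rebuilt after assigning $1 keep their commas.

```shell
tmp=$(mktemp)
printf '%s\n' '1,n1,abcd,1234' ' ,h2,yyyy,123x' ' ,h2,yyyy,123w' '3,h2,yyyy,123y' > "$tmp"

# Read the file bottom-up so the "next" row is seen first, copy its first
# field into any blank first field, then restore the original order.
result=$(tac "$tmp" |
    awk 'BEGIN { FS = OFS = "," } $1 ~ /^[[:blank:]]*$/ { $1 = prev } { print; prev = $1 }' |
    tac)
printf '%s\n' "$result"
rm -f "$tmp"
```

Because prev is updated after every line, a filled-in value carries upward through any number of consecutive blank rows.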
With GNU sed:
$ sed '/^ ,/{N;s/ \(.*\n\)\([^,]*\)\(.*\)/\2\1\2\3/}' infile
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
The sed command does the following:
/^ ,/ { # If the line starts with 'space comma'
N # Append the next line
# Extract the value before the comma, prepend to first line
s/ \(.*\n\)\([^,]*\)\(.*\)/\2\1\2\3/
}
BSD sed would require an extra semicolon before the closing brace.
This only works with non-contiguous lines with missing values.

join all lines that have the same first column to the same line

IE:
File:
1234:abcd
1234:930
1234:999999
194:keee
194:284
194:222222
Result:
1234:abcd:930:999999
194:keee:284:222222
I have exhausted my brain to the best of my knowledge and can't come up with a way. Sorry to bother you guys!
$ awk -F: '$1==last {printf ":%s",$2; next} NR>1 {print "";} {last=$1; printf "%s",$0;} END{print "";}' file
1234:abcd:930:999999
194:keee:284:222222
How it works
-F:
This tells awk to use a : as the field separator.
$1==last {printf ":%s",$2; next}
If the first field of this line is the same as the first field of the last line, print a colon followed by field 2. Then, skip the rest of the commands and start over with the next line.
NR>1 {print "";}
If we get here, this line has a new, not-yet-seen value in the first field. If this is not the first line, we finish the previous output line by printing a newline character.
{last=$1; printf "%s",$0;}
Update the variable last with the new value of field 1. Then, print this line.
END{print "";}
After we reach the end of the file, print one last newline character.
Combining non-consecutive lines
Consider this test file:
$ cat testfile2
3:abcd
4:abcd
10:123
3:999
4:999
10:123
Apply this awk script:
$ awk -F: '{a[$1]=a[$1]":"$2;} END{for (x in a) print x ":" substr(a[x],2);}' testfile2
3:abcd:999
4:abcd:999
10:123:123
In this approach, the lines will not necessarily come out in any particular order. If order is important, you may want to pipe this output to sort.
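If the keys should come out in the order they first appear rather than sorted, one option is to record that order in a second array. This is a sketch of that idea; the array names order and n are introduced here for illustration.

```shell
tmp=$(mktemp)
printf '%s\n' '3:abcd' '4:abcd' '10:123' '3:999' '4:999' '10:123' > "$tmp"

result=$(awk -F: '
!($1 in a) { order[++n] = $1 }          # remember each key the first time it appears
           { a[$1] = a[$1] ":" $2 }     # append field 2 to that key, with a leading colon
END        { for (i = 1; i <= n; i++) print order[i] a[order[i]] }
' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Iterating over the numeric index in END avoids the unspecified ordering of for (x in a).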

search a keyword in file and replace line next to it

I have a file which requires modification via a shell script.
Need to do the following:
1. search for a keyword in the file.
2. replace the next line to this keyword with my supplied line text.
for e.g., my file has the following text:
(some text)
(some text)
(text_to_search)
(text_to_replace)
(some text)
(some text)
(some text)
I need to search the file for (text_to_search) and rewrite the file so that the following line, (text_to_replace), is replaced, leaving the remaining content untouched.
How can this be done?
Regards
awk ' zap==1 {print "my new line goes here"; zap=0; next}
/my pattern/ {zap=1; print $0; next}
{print $0} ' infile > newfile
Assuming I got what you wanted....
var="this is the line of text to insert"
awk -v lin="$var" ' zap==1 {print lin ; zap=0; next}
/my pattern/ {zap=1; print $0; next}
{print $0} ' infile > newfile
Here -v lin="$var" defines the awk variable lin and initializes it from the shell variable var, so the replacement text can be supplied from outside the awk script.
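For a quick check of this approach, the sketch below runs it end to end on a throwaway file. The file contents and the KEY pattern are placeholders chosen for the example.

```shell
tmp=$(mktemp)
printf '%s\n' 'alpha' 'KEY' 'old line' 'omega' > "$tmp"

var="this is the line of text to insert"
result=$(awk -v lin="$var" '
zap == 1 { print lin; zap = 0; next }   # line after the match: print the replacement instead
/KEY/    { zap = 1; print; next }       # matching line: keep it and flag the next line
         { print }                      # everything else passes through
' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Note that awk interprets backslash escape sequences in -v values, so replacement text containing backslashes may need a different mechanism, such as reading it from ENVIRON.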
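sed -i '/pattern/r /dev/stdin' file
The command waits for you to type the new line interactively and then press Ctrl+D. Note that r appends the typed text after the matching line; it does not replace the line that follows.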
cat file
first
second
third
fourth
fiveth
var="inserted"
sed '/second/a\'"$var" file
first
second
inserted
third
fourth
fiveth
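The sed commands above add text after the matching line. If the goal is to replace the line that follows the match, as the question asks, one possible sketch uses n to step to the next line and s to overwrite it. The file contents and the replacement text are sample values, and the replacement must not contain an unescaped / or &.

```shell
tmp=$(mktemp)
printf '%s\n' 'first' 'second' 'third' 'fourth' > "$tmp"

# On a line matching /second/: n prints it and loads the next line,
# then s replaces that whole line with the new text.
result=$(sed '/second/{n;s/.*/inserted/;}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Here the line third is replaced by inserted, while every other line is left as it was.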
