How to print the line number where a string appears in a file? - shell

I have a specific word, and I would like to find out what line number in my file that word appears on.
This is happening in a c shell script.
I've been trying to play around with awk to find the line number, but so far I haven't been able to. I want to assign that line number to a variable as well.

Using grep
To look for word in file and print the line number, use the -n option to grep:
grep -n 'word' file
This prints both the line number and the line on which it matches.
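As a quick sketch (the file name and contents here are invented for the demo), the number can be split off with cut:

```shell
# Build a small sample file (hypothetical contents, for illustration only)
printf 'alpha\nword here\nomega\n' > /tmp/demo.txt

# -n prefixes each matching line with its line number
grep -n 'word' /tmp/demo.txt
# → 2:word here

# Keep only the number by cutting at the first colon
n=$(grep -n 'word' /tmp/demo.txt | cut -d: -f1)
echo "$n"
# → 2
```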
Using awk
This will print the number of each line on which the word word appears in the file:
awk '/word/{print NR}' file
This will print both the line number and the line on which word appears:
awk '/word/{print NR, $0}' file
You can replace word with any regular expression that you like.
How it works:
/word/
This selects lines containing word.
{print NR}
For the selected lines, this prints the line number (NR means Number of the Record). You can change this to print any information that you are interested in. Thus, {print NR, $0} would print the line number followed by the line itself, $0.
Assigning the line number to a variable
Use command substitution:
n=$(awk '/word/{print NR}' file)
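Note that if the word occurs on several lines, the substitution captures every matching line number. A sketch with an invented sample file:

```shell
# Sample file with two matches (contents invented for the demo)
printf 'foo\nword\nbaz\nword\n' > /tmp/demo.txt

# All matching line numbers
n=$(awk '/word/{print NR}' /tmp/demo.txt)
echo $n
# → 2 4

# First match only: exit as soon as one number is printed
first=$(awk '/word/{print NR; exit}' /tmp/demo.txt)
echo "$first"
# → 2
```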
Using shell variables as the pattern
Suppose that the regex that we are looking for is in the shell variable url:
awk -v x="$url" '$0~x {print NR}' file
And:
n=$(awk -v x="$url" '$0~x {print NR}' file)
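For example (file and pattern both invented for this sketch):

```shell
# Sample file and pattern (invented for the demo)
printf 'home\nhttp://example.com/x\nend\n' > /tmp/demo.txt
url='example'

# Pass the shell variable in with -v so awk can use it as a regex
n=$(awk -v x="$url" '$0 ~ x {print NR}' /tmp/demo.txt)
echo "$n"
# → 2
```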

Sed
You can use the sed command
sed -n '/pattern/=' file
Explanation
The -n suppresses normal output so the matching lines themselves are not printed. sed first matches /pattern/, and the = command then prints the current line number. Note that this prints the line number of every line that contains the pattern.
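For instance (sample file invented for the demo):

```shell
# Sample file with two matching lines (invented for this sketch)
printf 'aa\npattern\ncc\npattern\n' > /tmp/demo.txt

# Print only the line numbers of the matching lines
sed -n '/pattern/=' /tmp/demo.txt
# → 2
# → 4
```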

Use the NR Variable
Given a file containing:
foo
bar
baz
use the built-in NR variable to find the line number. For example:
$ awk '/bar/ { print NR }' /tmp/foo
2

To find the line numbers on which the first column matches RRBS:
awk '$1 ~ /RRBS/ {print NR}' ../../bak/bak.db
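A minimal sketch of matching on the first column only (sample path and contents invented for the demo):

```shell
# Two-column sample file (invented for this sketch)
printf 'AAA x\nRRBS_1 y\nBBB z\n' > /tmp/demo.db

# Print the line number wherever the first field matches RRBS
awk '$1 ~ /RRBS/ {print NR}' /tmp/demo.db
# → 2
```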

Prepend text to specific line numbers with variables

I have spent hours trying to solve this. There are a bunch of answers as to how to prepend to all lines or specific lines but not with a variable text and a variable number.
while [ $FirstVariable -lt $NextVariable ]; do
#sed -i "$FirstVariables/.*/$FirstVariableText/" "$PWD/Inprocess/$InprocessFile"
cat "$PWD/Inprocess/$InprocessFile" | awk 'NR==${FirstVariable}{print "$FirstVariableText"}1' > "$PWD/Inprocess/Temp$InprocessFile"
FirstVariable=$[$FirstVariable+1]
done
Essentially I am looking for a particular string delimiter and then figuring out where the next one is and appending the first result back into the following lines... Note that I already figured out the logic I am just having issues prepending the line with the variables.
Example:
This >
Line1:
1
2
3
Line2:
1
2
3
Would turn into >
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
You can do all of that with the awk one-liner below.
Assuming your pattern starts with Line, this script can be used:
> awk '{if ($1 ~ /Line/ ){var=$1;print $0;}else{ if ($1 !="")print var $1}}' $PWD/Inprocess/$InprocessFile
Line1:
Line1:1
Line1:2
Line1:3
Line2:
Line2:1
Line2:2
Line2:3
Here is how the above script works:
If the first field of a record contains the word Line, it is copied into an awk variable var and the record is printed as-is. From the next record onwards, if the record is not empty, var is prepended to it and the result is printed, producing the desired output.
If you need to pass the variables dynamically from the shell to awk, you can use the -v option, like below:
awk -v var1="$FirstVariable" -v var2="$FirstVariableText" 'NR==var1{print var2}1' "$PWD/Inprocess/$InprocessFile" > "$PWD/Inprocess/Temp$InprocessFile"
The way you addressed the problem, bash extracts a line and awk then manipulates that one line, so the file is parsed over and over by both. The whole thing can actually be done with a single awk script:
awk '/^Line/{str=$1; print; next}{print (NF ? str $0 : "")}' inputfile > outputfile
or
awk 'BEGIN{RS="";ORS="\n\n";FS=OFS="\n"}{gsub(FS,OFS $1)}1' inputfile > outputfile
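A quick run of the first one-liner on the sample data from the question (file paths invented for the demo):

```shell
# Recreate the sample input from the question
printf 'Line1:\n1\n2\n3\nLine2:\n1\n2\n3\n' > /tmp/in.txt

# Remember the current header in str; prepend it to each non-empty data line
awk '/^Line/{str=$1; print; next}{print (NF ? str $0 : "")}' /tmp/in.txt
# → Line1:
# → Line1:1
# → Line1:2
# → Line1:3
# → Line2:
# → Line2:1
# → Line2:2
# → Line2:3
```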

Use sed or awk to remove lines not adjacent to pattern in bash

I need to delete any line that is not surrounding a ">" symbol.
Here is some sample data:
sample1.fasta
>R00003
ATCATACTACTACG
sample2.fasta
sample3.fasta
sample4.fasta
>R00003
ATACTACGTA
sample7.fasta
>R00003
ATGCATCAT
sample8.fasta
>R00003
AATCATCGACCT
sample9.fasta
sample10.fasta
>R00003
AGCATCTCAGTC
I tried using awk to help reveal the issue:
awk '{/fasta/?f++:f=0} f==2' R3.fasta
This returns:
sample3.fasta
sample10.fasta
This is diagnostic in that it shows where duplicates are present. However, I want to remove the lines that are not flanking the ">" symbol on either side. This does not remove them and only displays the second.
The result I expect is:
sample1.fasta
>R00003
ATCATACTACTACG
sample4.fasta
>R00003
ATACTACGTA
sample7.fasta
>R00003
ATGCATCAT
sample8.fasta
>R00003
AATCATCGACCT
sample10.fasta
>R00003
AGCATCTCAGTC
Where the lines not flanking the ">" symbol have been removed.
Seems like the plain grep will suffice for this:
grep '^>' -C1 file | grep -v ^--$
First print one line above and one line below each line starting with > (use context -C1), then just filter out the -- lines that grep inserts to separate each context.
But if you prefer awk:
awk '/^>/{print a ORS $0; getline; print} {a=$0}' file
Keep the previous line in a, and when a line starts with >, print the previous line (in a), the current line, and the next line (which we get with getline).
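To see the grep variant in action on a reduced sample (file name and contents invented for the demo; s2.fasta has no > after it, so it is dropped):

```shell
# Reduced sample: s2.fasta is not adjacent to any > line
printf 's1.fasta\n>R1\nAAAA\ns2.fasta\ns3.fasta\n>R2\nCCCC\n' > /tmp/R.fasta

grep '^>' -C1 /tmp/R.fasta | grep -v '^--$'
# → s1.fasta
# → >R1
# → AAAA
# → s3.fasta
# → >R2
# → CCCC
```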
From the example, it appears that you want all non-fasta lines and, for the fasta lines, you want only the last fasta file before the next >. In that case, try:
$ awk 'f && !/fasta/{print f; f=""} /fasta/{f=$0; next} END{if(f)print f} 1' R3.fasta
sample1.fasta
>R00003
ATCATACTACTACG
sample4.fasta
>R00003
ATACTACGTA
sample7.fasta
>R00003
ATGCATCAT
sample8.fasta
>R00003
AATCATCGACCT
sample10.fasta
>R00003
AGCATCTCAGTC
How it works
f && !/fasta/{print f; f=""}
If the variable f is set and the current line does not contain fasta, then print f and erase its current value.
/fasta/{f=$0; next}
If the current line contains fasta, then save the current line to variable f, skip the rest of the commands, and jump to the next line.
END{if(f)print f}
If f is still set to something after we have reached the end of the file, then print it.
1
For all other lines, print them.
another awk
$ awk '{if (/fasta/) f=$0;
else {if(f) print f; f=""; print}}' file
This doesn't depend on the number of lines in between groups.
With sed
sed ':A;/fasta$/!d;N;/\n>/!{s/.*\n//;bA};N' infile
sed '
:A         # label to jump back to
/fasta$/!d # delete lines that do not end with fasta
N          # append the next line to the pattern space
/\n>/!{    # if that appended line does not start with >
s/.*\n//   # delete the first line of the pattern space
bA}        # and jump back to A
N          # otherwise append one more line; all three are printed
' infile

How to replace the specific line occurence in file in bash

Hello, I need a little help with my personal project. I have something like this:
sourceFile:
something,something,something,something,something,something,
something,something,something,something,something,something,
something,something,something,something,something,something,
I need to write my variable after the last , on a specific line (I have a different value for every line).
resultFile:
something,something,something,something,something,something,result1
something,something,something,something,something,something,result2
something,something,something,something,something,something,result3
I used this:
sed -i "$numberOfLine,/,/ s/,/,$actualDeparture/6" $fileName
but the result is:
badResultFile:
something,something,something,something,something,something,result1
something,something,something,something,something,something,result2result1
something,something,something,something,something,something,result3result2
I don't know why I have result2 and result1 in the second line, and I'm really desperate because I don't know how to fix this.
I would use awk:
awk '{ print $0 "result" NR }' sourceFile
print $0 "result" NR prints each line, then the string result, and then the current record (line) number (NR)
Example:
% cat file.txt
something,something,something,something,something,something,
something,something,something,something,something,something,
something,something,something,something,something,something,
% awk '{ print $0 "result" NR }' file.txt
something,something,something,something,something,something,result1
something,something,something,something,something,something,result2
something,something,something,something,something,something,result3
With your address range $numberOfLine,/,/, all lines starting from $numberOfLine up to the next line containing , are processed.
And you don't need to count the number of , in your s command; just replace $ (end of line) with your variable's value.
To process each line individually, try this:
sed -i "$numberOfLine s/$/$actualDeparture/" "$fileName"
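A sketch of that fix (GNU sed assumed for -i without a backup suffix; file path and variable values invented for the demo):

```shell
# Sample file and variables (invented for this sketch)
printf 'a,b,\nc,d,\n' > /tmp/f.csv
numberOfLine=2
actualDeparture='result2'

# Append the value at the end of line $numberOfLine only
sed -i "$numberOfLine s/$/$actualDeparture/" /tmp/f.csv
cat /tmp/f.csv
# → a,b,
# → c,d,result2
```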

How to set FS to eof?

I want to read the whole file, not line by line. How can I change the field separator to an eof symbol?
I do:
awk "^[0-9]+∆DD$1[PS].*$" $(ls -tr)
$1 - param (some integer), .* - message that I want to print. There is a problem: the message can contain \n. As written, this code prints only the first line of the file. How can I scan the whole file instead of line by line?
Can I do this using awk, sed, grep? Script must have length <= 60 characters (include spaces).
Assuming you mean record separator, not field separator, with GNU awk you'd do:
gawk -v RS='^$' '{ print "<" $0 ">" }' file
Replace the print with whatever you really want to do and update your question with some sample input and expected output if you want help with that part too.
The portable way to do this, by the way, is to build up the record line by line and then process it in the END section:
awk '{rec = rec (NR>1?RS:"") $0} END{ print "<" rec ">" }' file
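For example (file invented for the demo):

```shell
# Two-line sample file (invented for this sketch)
printf 'one\ntwo\n' > /tmp/f.txt

# Accumulate every line into rec, then handle the whole file at once in END
awk '{rec = rec (NR>1 ? RS : "") $0} END{ print "<" rec ">" }' /tmp/f.txt
# → <one
# → two>
```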
using nf = split(rec,flds) to create fields if necessary.

Cut and replace bash

I have to process a file with data organized like this
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
etc
Columns can have different length but lines always have the same number of columns.
I want to be able to cut a specific column of a given line and change it to the value I want.
For example I'd apply my command and change the file to
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
I know how to select a specific line with sed and then cut the field but I have no idea on how to replace the field with the value I have.
Thanks
Here's a way to do it with awk:
Going with your example, if you wanted to replace the 3rd field of the 1st line:
awk 'BEGIN{FS=OFS=":"} {if (NR==1) {$3 = "XXXX"}; print}' input_file
Input:
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Output:
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Explanation:
awk: invoke the awk command
'...': everything enclosed in single quotes is the set of instructions given to awk
BEGIN{FS=OFS=":"}: Use : as delimiters for both input and output. FS stands for Field Separator. OFS stands for Output Field Separator.
if (NR==1) {$3 = "XXXX"};: If Number of Records (NR) read so far is 1, then set the 3rd field ($3) to "XXXX".
print: print the current line
input_file: name of your input file.
If instead what you are trying to accomplish is simply replace all occurrences of CCC with XXXX in your file, simply do:
sed -i 's/CCC/XXXX/g' input_file
Note that this will also replace partial matches, such as ABCCCDD -> ABXXXXDD
This might work for you (GNU sed):
sed -r 's/^(([^:]*:?){2})CCC/\1XXXX/' file
or
awk -F: -vOFS=: '$3=="CCC"{$3="XXXX"};1' file
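A quick check of the conditional variant on the sample data (file path invented for the demo):

```shell
# Sample from the question
printf 'AAAAA:BB:CCC:EEEE:DDDD\nFF:III:JJJ:KK:LLL\n' > /tmp/f.txt

# Replace the 3rd field only when it is exactly CCC
awk -F: -v OFS=: '$3=="CCC"{$3="XXXX"} 1' /tmp/f.txt
# → AAAAA:BB:XXXX:EEEE:DDDD
# → FF:III:JJJ:KK:LLL
```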
