sed insert line after a match only once [duplicate] - shell

UPDATED:
Using sed, how can I insert (NOT SUBSTITUTE) a new line on only the first match of keyword for each file.
Currently I have the following but this inserts for every line containing Matched Keyword and I want it to only insert the New Inserted Line for only the first match found in the file:
sed -ie '/Matched Keyword/ i\New Inserted Line' *.*
For example:
Myfile.txt:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
changed to:
Line 1
Line 2
Line 3
New Inserted Line
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6

You can sort of do this in GNU sed:
sed '0,/Matched Keyword/s//New Inserted Line\n&/'
But it's not portable. Since portability is good, here it is in awk:
awk '/Matched Keyword/ && !x {print "Text line to insert"; x=1} 1' inputFile
Or, if you want to pass a variable to print:
awk -v "var=$var" '/Matched Keyword/ && !x {print var; x=1} 1' inputFile
These both insert the text line before the first occurrence of the keyword, on a line by itself, per your example.
Remember that with both sed and awk, the matched keyword is a regular expression, not just a keyword.
UPDATE:
Since this question is also tagged bash, here's a simple solution that is pure bash and doesn't required sed:
#!/bin/bash
n=0
while read line; do
if [[ "$line" =~ 'Matched Keyword' && $n = 0 ]]; then
echo "New Inserted Line"
n=1
fi
echo "$line"
done
As it stands, this as a pipe. You can easily wrap it in something that acts on files instead.

If you want one with sed*:
sed '0,/Matched Keyword/s//Matched Keyword\nNew Inserted Line/' myfile.txt
*only works with GNU sed

This might work for you:
sed -i -e '/Matched Keyword/{i\New Inserted Line' -e ':a;n;ba}' file
You're nearly there! Just create a loop to read from the Matched Keyword to the end of the file.
After inserting a line, the remainder of the file can be printed out by:
Introducing a loop place holder :a (here a is an arbitrary name).
Print the current line and fetch the next into the pattern space with the ncommand.
Redirect control back using the ba command which is essentially a goto to the a place holder. The end-of-file condition is naturally taken care of by the n command which terminates any further sed commands if it tries to read passed the end-of-file.
With a little help from bash, a true one liner can be achieved:
sed $'/Matched Keyword/{iNew Inserted Line\n:a;n;ba}' file
Alternative:
sed 'x;/./{x;b};x;/Matched Keyword/h;//iNew Inserted Line' file
This uses the Matched Keyword as a flag in the hold space and once it has been set any processing is curtailed by bailing out immediately.

If you want to append a line after first match only, use AWK instead of SED as below
awk '{print} /Matched Keyword/ && !n {print "New Inserted Line"; n++}' myfile.txt
Output:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
New Inserted Line
Line 4
This line contains the Matched Keyword and other stuff
Line 6

Related

Replace k-th to n-th characters in 1st line and last line using bash?

I want to replace some characters in header and footer of a file. If say, I want to replace 5th to 9th character how do I do it? I need to use bash or a shell command.
I want to do something like this
s="abcdabcd"
s=s=s[0]+"12"+s[4:]
>a12dabcd
I have a string of exact length I can substitute and the start and end of replacement. I want to put the generated replacement back into the file.
Example:
I have this header:
HEADER 22aabbccdd23aabbccdd
I get these start and end indices : 2,10
I get this string: xyz56789
I want this: HEADER 22xyz5678923aabbccdd
to replace the existing 1st line in the file.
This can be done with Perl:
perl -i -lpe 'if ($. == 1 || eof) { substr($_, 1, 2) = "12" }' input.txt
-i: modify file in place
-l: automatically strip newlines from input and add them back on output
-p: iterate over lines of the input file and print them back out
-e CODE: what to do for each line
First we check whether the current line number ($.) is 1 (i.e. we're processing the first line of the file) or we have reached the end of the file (i.e. the line currently being processed is the last line of the file). If the condition is true, we take the substring of the current line ($_) starting from offset 1 of length 2 and set it to "12".
Simply with sed:
input.txt:
$ cat input.txt
22aabbccdd23aabbccdd
asasdfsdfd234234234234
$ sed -Ei '1 s/(..).{8}/\1xyz56789/' input.txt
Result:
22xyz5678923aabbccdd
asasdfsdfd234234234234

UNIX Search in specific column for user specified code and output entire line

I'm working on a program that searches a medication list and returns a report as requested by the user. So i am trying to search this list for a code that the user inputs and then return the relevant information.
EX. (medcode) (doseage)
commA6314 ifosfamide 30
home5341209 urokinase 6314
When i search the file i only want it to return the line if it finds a match in columns 6-12 (6314 for the first line) but at the moment it will return both lines since the second line also contains 6314. All of the answers i saw used text processing utilities like awk, sed or perl and one of the conditions of the program is not to use any of these utilities.
The programs expected output:
Enter medication code?
6314
See Generic name g/G or Dose d/D?
g
ifosfamide
What i am getting currently:
Enter medication code?
6314
See Generic name g/G or Dose d/D?
g
ifosfamide
urokinase
so it is also displaying information about the second medication because 6314 is also contained in the columns for doseage.
Using bash
To match 6314 but only if it starts in column 6 using just bash, try:
$ while read -r line; do [[ "$line" =~ ^.{5}6314 ]] && echo "$line"; done <infile
commA6314 ifosfamide 30
This reads lines from the file one-by-one. The line is echoed to output only if it matches the regex ^.{5}6314 which requires that 6314 appear starting at the sixth character from the start of the line.
To print just the second word on the line but only if the first word matches your number position six:
$ while read -r code name extra; do [[ "$code" =~ ^.{5}6314 ]] && echo "$name"; done <infile
ifosfamide
Using grep
To match 6314 but only if it starts in column 6, try:
$ grep -E '^.{5}6314' infile
commA6314 ifosfamide 30
Here, ^ specifies the beginning of a line and .{5} matches any five characters. Thus ^.{5}6314 matches 6314 but only if it starts as the sixth character on the line.
Using awk
$ awk '"6314" == substr($0, 6, 4)' infile
commA6314 ifosfamide 30
Here, substr($0, 6, 4) selects four characters from the line starting at the sixth. If this equals 6314, then the line is printed.
Using sed
$ sed -En '/^.{5}6314/p' infile
commA6314 ifosfamide 30
-n tells sed not to print unless we explicitly ask it to. /^.{5}6314/p tells sed to print any line that, starting at the sixth character, matches 6314.
Try this using just bash :
while read -r line; do
[[ ${line%% *} == *6314* ]] && echo "$line"
done < input_file
It search only in the medication column.
explanations
${line%% *}
is a bash parameter expansion, it keep only the first 'word' before the first space

Replace some lines in fasta file with appended text using while loop and if/else statement

I am working with a fasta file and need to add line-specific text to each of the headers. So for example if my file is:
>TER1
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>TER2
AGCATGCTAGCTAGACGACTCGATCGCATGCTC
>URC1
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>URC2
AGCATGCTACCTAGTCGACTCGATCGCATGCTC
>UCR3
AGCATGCTAGCTAGTCGACTCGATGGCATGCTC
I want a while loop that will read through each line; for those with a > at the start, I want to append |population: plus the first three characters after the >. So line one would be:
>TER1|population:TER
etc.
I can't figure out how to make this work. Here my best attempt so far.
filename="testfasta.fa"
while read -r line
do
if [[ "$line" == ">"* ]]; then
id=$(cut -c2-4<<<"$line")
printf $line"|population:"$id"\n" >>outfile
else
printf $line"\n">>outfile
fi
done <"$filename"
This produces a file with the original headers and following line each on a single line.
Can someone tell me where I'm going wrong? My if and else loop aren't working at all!
Thanks!
You could use a while loop if you really want,
but sed would be simpler:
sed -e 's/^>\(...\).*/&|population:\1/' "$filename"
That is, for lines starting with > (pattern: ^>),
capture the next 3 characters (with \(...\)),
and match the rest of the line (.*),
replace with the line as it was (&),
and the fixed string |population:,
and finally the captured 3 characters (\1).
This will produce for your input:
>TER1|population:TER
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>TER2|population:TER
AGCATGCTAGCTAGACGACTCGATCGCATGCTC
>URC1|population:URC
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>URC2|population:URC
AGCATGCTACCTAGTCGACTCGATCGCATGCTC
>UCR3|population:UCR
AGCATGCTAGCTAGTCGACTCGATGGCATGCTC
Or you can use this awk, also producing the same output:
awk '{sub(/^>.*/, $0 "|population:" substr($0, 2, 3))}1' "$filename"
You can do this quickly in awk:
awk '$1~/^>/{$1=$1"|population:"substr($1,2,3)}{}1' infile.txt > outfile.txt
$ awk '$1~/^>/{$1=$1"|population:"substr($1,2,3)}{}1' testfile
>TER1|population:TER
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>TER2|population:TER
AGCATGCTAGCTAGACGACTCGATCGCATGCTC
>URC1|population:URC
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>URC2|population:URC
AGCATGCTACCTAGTCGACTCGATCGCATGCTC
>UCR3|population:UCR
AGCATGCTAGCTAGTCGACTCGATGGCATGCTC
Here awk will:
Test if the record starts with a > The $1 looks at the first field, but $0 for the entire record would work just as well in this case. The ~ will perform a regex test, and ^> means "Starts with >". Making the test: ($1~/^>/)
If so it will set the first field to the output you are looking for (using substr() to get the bits of the string you want. {$1=$1"|population:"substr($1,2,3)}
Finally it will print out the entire record (with the changes if applicable): {}1 which is shorthand for {print $0} or.. print the entire record.

Comment out line, only if previous line contains matching string

Looking for a solution for a bash script using sed or awk to comment out a line, only if the previous line contains a matching string.
For example, a file containing:
...
if [ $V1 -gt 100 ]; then
some specific commands
else
some other specific commands
fi
...
I'd like to comment out the line containing else but ONLY if the previous line contains specific.
I've attempted piping multiple sed commands along with grep commands to no avail.
sed -E '/specific/{n;s/^([[:blank:]]*)else$/\1#else/}'
Output
...
if [ $V1 -gt 100 ]; then
some specific commands
#else
some other commands
fi
...
A retrospection
/specific/ look for the line containing the pattern specific
n add the next line to the pattern space. n auto prints the current pattern space.
Check if the next line is (one_or_more_spaces)else,if yes, substitute the line with a (one_or_more_spaces_found_previously)#else. Remember () is for pattern reuse and \1 is the previously matched pattern reused.
-E enable extended regex
-i is for inplace edit of the actual file
You can use this awk solution:
awk '/specific/{p=NR} NR==p+1{p=0; if (/^[[:blank:]]*else/) $0 = "#" $0} 1' file
if [ $V1 -gt 100 ]; then
some specific commands
#else
some other commands
fi
In this block /specific/p=NR we find specific and store current line # in p
Next block is executed for very next line due to p == NR+1 condition
We rest p=0 and if that line has else at start with optional whitespaces before we just comment it out.

shell: how to read a certain column in a certain line into a variable

I want to extract the first column of the last line of a text file. Instead of output the content of interest in another file and read it in again, can I just use some command to read it into a variable directly?
For exampole, if my file is like this:
...
123 456 789(this is the last line)
What I want is to read 123 into a variable in my shell script. How can I do that?
One approach is to extract the line you want, read its columns into an array, and emit the array element you want.
For the last line:
#!/bin/bash
# ^^^^- not /bin/sh, to enable arrays and process substitution
read -r -a columns < <(tail -n 1 "$filename") # put last line's columns into an array
echo "${columns[0]}" # emit the first column
Alternately, awk is an appropriate tool for the job:
line=2
column=1
var=$(awk -v line="$line" -v col="$column" 'NR == line { print $col }' <"$filename")
echo "Extracted the value: $var"
That said, if you're looking for a line close to the start of a file, it's often faster (in a runtime-performance sense) and easier to stick to shell builtins. For instance, to take the third column of the second line of a file:
{
read -r _ # throw away first line
read -r _ _ value _ # extract third value of second line
} <"$filename"
This works by using _s as placeholders for values you don't want to read.
I guess with "first column", you mean "first word", do you?
If it is guaranteed, that the last line doesn't start with a space, you can do
tail -n 1 YOUR_FILE | cut -d ' ' -f 1
You could also use sed:
$> var=$(sed -nr '$s/(^[^ ]*).*/\1/p' "file.txt")
The -nr tells sed to not output data by default (-n) and use extended regular expressions (-r to avoid needing to escape the paranthesis otherwise you have to write \( \))). The $ is an address that specifies the last line. The regular expression anchors the beginning of the line with the first ^, then matches everything that is not a space [^ ]* and puts that the result into a capture group ( ) and then gets rid of the rest of the line .* by replacing the line with the capture group \1, then print p to print the line.

Resources