Add file content in another file after first match only - bash

Using bash, I have this line of code that adds the content of a temp file into another file, after a specific match:
sed -i "/text_to_match/r ${tmpFile}" ${fileName}
I would like it to add the temp file content only after the FIRST match.
I tried using addresses:
sed -i "0,/text_to_match//text_to_match/r ${tmpFile}" ${fileName}
But it doesn't work, saying that "/" is an unknown command.
I can make addresses work if I use a standard replacement "s/to_replace/with_this/", but I can't make it work with this sed command.
It seems like I can't use addresses if my sed command starts with / instead of a letter.
I'm not stuck with addresses, as long as I can insert the temp file content into another file only once.

You're getting that error because if you have an address range (ADDR1,ADDR2) you can't put another address after it: sed expects a command there and / is not a command.
You'll want to use some braces here:
$ seq 20 > file
$ echo "new content" > tmpFile
$ sed '0,/5/{/5/ r tmpFile
}' file
outputs the new text only after the first line with '5'
1
2
3
4
5
new content
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
I found I needed to put a newline after the filename. I was getting this error otherwise
sed: -e expression #1, char 0: unmatched `{'
It appears that sed takes the whole rest of the line as the filename.
Probably more tidy to write
sed '0,/5/ {
/5/ r tmpFile
}' file
Full transparency: I don't use sed except for very simple tasks. In reality I would use awk for this job
awk '
{print}
!seen && $0 ~ patt {
while ((getline line < f) > 0) print line
close(f)
seen = 1
}
' patt="5" f=tmpFile file
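Since the question asks for an in-place edit, the awk approach above could be wrapped in a small function that writes to a temporary file and copies the result back. This is only a sketch under some assumptions: the function name insert_after_first_match and the demo paths are made up for illustration, and mktemp is assumed to be available.

```shell
# Hypothetical helper (the name is illustrative): insert the contents of
# file $2 after the first line of file $3 that matches the awk regex $1.
insert_after_first_match() {
    tmp=$(mktemp) || return 1
    if awk -v patt="$1" -v f="$2" '
        { print }
        !seen && $0 ~ patt {
            # "> 0" stops the loop at end of file and on read errors
            while ((getline line < f) > 0) print line
            close(f)
            seen = 1
        }
    ' "$3" > "$tmp"; then
        cat "$tmp" > "$3"    # copy back, keeping the permissions of "$3"
    fi
    rm -f "$tmp"
}

# Demo on throwaway files
demo_dir=$(mktemp -d)
seq 20 > "$demo_dir/file"
echo "new content" > "$demo_dir/tmpFile"
insert_after_first_match 5 "$demo_dir/tmpFile" "$demo_dir/file"
head -n 7 "$demo_dir/file"
```

Line 15 also matches the regex 5, but the seen flag ensures the content is inserted only once.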

Glenn Jackman provided an excellent answer explaining why the OP's attempt did not work.
In continuation to Glenn Jackman's answer, if you want to have the command on a single line, you should use branching so that the r command is at the end.
Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be followed by a <semicolon>, optional <blank> characters, and another editing command. However, when an s editing command is used with the w flag, following it with another command in this manner produces undefined results. [source: POSIX sed Standard]
The r,R,w,W commands parse the filename until end of the line. If whitespace, comments or semicolons are found, they will be included in the filename, leading to unexpected results.[source: GNU sed manual]
which gives:
sed -e '1,/pattern/{/pattern/ba};b;:a;r rfile' file
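Another way to keep everything on one command line, not taken from the answers here, is to split the script over several -e options: the fragments are joined with newlines, so the filename of r ends where its fragment ends. A minimal sketch, with demo file names chosen only for illustration:

```shell
demo_dir=$(mktemp -d)
seq 20 > "$demo_dir/file"
echo "new content" > "$demo_dir/rfile"

# The r command ends at the end of its -e fragment, so the closing brace
# goes in a second -e. 1,/5/ is used here instead of the GNU-only 0,/5/;
# the two differ only when line 1 itself matches the pattern.
result=$(cd "$demo_dir" && sed -e '1,/5/{/5/r rfile' -e '}' file)
printf '%s\n' "$result" | head -n 7
```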

GNU sed also allows s///e to shell out. So there's this one-liner using Glenn's tmpFile and file.
sed '0,/5/{//{p;s/.*/cat tmpFile/e}}' file
// to repeat the previous pattern match (helps if it's longer than /5/)
p to print the matching line
s/.*/cat tmpFile/e to replace the pattern space with the shell command cat tmpFile, then (the e flag) execute it and put its output in the stream

You have 2 forward slashes together, right next to each other in the second sed example.

Related

Is there anyway to insert new lines in-between two patterns

Is there anyway to insert new lines in-between 2 specific patterns of characters? I want to insert a new line every time "butterfly" occurs in a text file, however I want this new line to be inserted between the "butter" and "fly". For example butter\nfly
I also want to find the length of each line after splitting.
Eg:
if textfile contains:
fgsccgewvdhbejbecbecboubutterflybvdcvhkebcjl
vdjchvhecbihbutterflyglehblejkbedkbutterflyr
Then, I want a result like the following:
29 fgsccgewvdhbejbecbecboubutter
33 flybvdcvhkebcjlvdjchvhecbihbutter
22 flyglehblejkbedkbutter
4 flyr
I believe one way to tackle it would be to insert a new line using "sed" everywhere "butter" occurs and is followed by "fly", strip out all blank lines using grep with the -v flag, and then get the length of each line. However, even after trying a lot, I am unable to get the correct answer.
sed's s command and awk can work together (note that this keeps the original line breaks, so text is not joined across input lines):
sed -e "s/butterfly/butter\\nfly/g" < input.txt | awk '{ print length, $0 }'
This might work for you (GNU sed & bash):
sed -Ez 's/\n//g;s/(butter)(fly)/\1\n\2/g;s/^.*$/l=&;printf "%d %s\n" ${#l} &/meg' file
Slurp the file into memory using the -z sed option. Remove all existing newlines and then insert new ones between butter and fly. Using the m, g and e flags of the sed substitute command, split into separate lines and using bash make a variable l and via printf print the required format.
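For comparison, the same join-then-split idea can be sketched in plain awk, without GNU sed extensions. The variable names (text, piece) and the demo path are illustrative only, and the whole input is assumed to fit in memory.

```shell
demo_dir=$(mktemp -d)
printf '%s\n' \
    'fgsccgewvdhbejbecbecboubutterflybvdcvhkebcjl' \
    'vdjchvhecbihbutterflyglehblejkbedkbutterflyr' > "$demo_dir/input.txt"

result=$(awk '
    { text = text $0 }    # join all input lines into one string
    END {
        # put a newline between "butter" and "fly", then split on it
        gsub(/butterfly/, "butter\nfly", text)
        n = split(text, piece, "\n")
        for (i = 1; i <= n; i++) print length(piece[i]), piece[i]
    }
' "$demo_dir/input.txt")
printf '%s\n' "$result"
```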

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by delimiting string ---
aa
bbb
---
cccc
dd
I am writing a bash script to read the file and assign the first part to var part1 and the second part to var part2:
part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd
How would you suggest writing this in bash?
You can use awk:
foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"
This would read the file twice, but handling text files in the shell, i.e. storing their content in variables should generally be limited to small files. Given that your file is small, this shouldn't be a problem.
Note: Depending on your actual task, you may be able to just use awk for the whole thing. Then you don't need to store the content in shell variables, and read the file twice.
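If reading the file twice is a concern, one alternative (not part of the answer above) is to read it once and split it with parameter expansion. This sketch assumes the file is small and that the --- line is neither the first nor the last line; nl and content are helper variables introduced here for illustration.

```shell
demo_dir=$(mktemp -d)
printf '%s\n' aa bbb --- cccc dd > "$demo_dir/file.txt"

# A variable holding a single newline character
nl='
'
content=$(cat "$demo_dir/file.txt")
part1=${content%%"$nl---$nl"*}    # everything before the first "---" line
part2=${content#*"$nl---$nl"}     # everything after the first "---" line

printf 'part1:\n%s\n' "$part1"
printf 'part2:\n%s\n' "$part2"
```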
A solution using sed:
foo=$(sed '/^---$/q;p' -n file.txt)
bar=$(sed '1,/^---$/b;p' -n file.txt)
The -n command line option tells sed to not print the input lines as it processes them (by default it prints them). sed runs a script for each input line it processes.
The first sed script
/^---$/q;p
contains two commands (separated by ;):
/^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
p - print the current line.
The second sed script
1,/^---$/b;p
contains two commands:
1,/^---$/b - starting with line 1 until the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
p - print the current line;
Using csplit:
csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}" && sed -i '/---/d' foo_bar*
With coreutils 8.22 or later, the --suppress-matched option can be used and the sed step is not required:
csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}"

Sed/Awk to delete second occurrence of string - platform independent

I'm looking for a line in bash that would work on both linux as well as OS X to remove the second line containing the desired string:
Header
1
2
...
Header
10
11
...
Should become
Header
1
2
...
10
11
...
My first attempt was using the deletion option of sed:
sed -i '/^Header.*/d' file.txt
But well, that removes the first occurrence as well.
"How to delete the matching pattern from given occurrence" suggests using something like this:
sed -i '/^Header.*/{2,$d}' file.txt
But on OS X that gives the error
sed: 1: "/^Header.*/{2,$d}": extra characters at the end of d command
Next, I tried substitution, where I know how to use 2,$, followed by deletion of the resulting empty lines:
sed -i '2,$s/^Header.*//' file.txt
sed -i '/^\s*$/d' file.txt
This works on Linux, but on OS X, as mentioned in "sed command with -i option failing on Mac, but works on Linux", you'd have to use
sed -i '' '2,$s/^Header.*//' file.txt
sed -i '' '/^\s*$/d' file.txt
And this one in return doesn't work on Linux.
My question then: isn't there a simple way to make this work in any Bash? It doesn't have to be sed, but it should be as shell independent as possible, and I need to modify the file itself.
Since this is file-dependent and not line-dependent, awk can be a better tool.
Just keep a counter on how many times this happened:
awk -v patt="Header" '$0 == patt && ++f==2 {next} 1' file
This skips a line that is exactly equal to the given pattern, but only the second time it occurs. All other lines are printed normally.
I would recommend using awk for this:
awk '!/^Header/ || !f++' file
This prints all lines that don't start with "Header". Short-circuit evaluation means that if the left hand side of the || is true, the right hand side isn't evaluated. If the line does start with Header, the second part !f++ is only true once.
$ cat file
baseball
Header and some other stuff
aardvark
Header for the second time and some other stuff
orange
$ awk '!/^Header/ || !f++' file
baseball
Header and some other stuff
aardvark
orange
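Because the question requires modifying the file itself and sed -i differs between Linux and OS X, one portable option is to let awk write to a temporary file and copy the result back. A minimal sketch, assuming mktemp is available; the demo path and the counter name n are illustrative. This variant removes only the second line starting with Header.

```shell
demo_dir=$(mktemp -d)
printf '%s\n' Header 1 2 Header 10 11 > "$demo_dir/file.txt"

tmp=$(mktemp)
# Skip only the second line that starts with "Header"; print everything else.
awk '/^Header/ && ++n == 2 { next } 1' "$demo_dir/file.txt" > "$tmp" &&
    cat "$tmp" > "$demo_dir/file.txt"
rm -f "$tmp"
cat "$demo_dir/file.txt"
```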
This might work for you (GNU sed):
sed -i '1b;/^Header/d' file
Ignore the first line and then remove any occurrence of a line beginning with Header.
To remove subsequent occurrences of the first line regardless of the string, use:
sed -ri '1h;1b;G;/^(.*)\n\1$/!P;d' file

Add multiple lines in file using bash script

Using a bash script, I am trying to insert a line in a file (eventually there will be 4 extra lines, one after the other).
I am trying to implement the answer by iiSeymour to the thread:
Insert lines in a file starting from a specific line
which I think is the same comment that dgibbs made in his own thread:
Bash: Inserting a line in a file at a specific location
The line after which I want to insert the new text is very long, so I save it in a variable first:
field1=$(head -2 file847script0.xml | tail -1)
The text I want to insert is:
insert='newtext123'
When running:
sed -i".bak" "s/$field1/$field1\n$insert/" file847script0.xml
I get the error:
sed: 1: "s/<ImageAnnotation xmln ...": bad flag in substitute command: 'c'
I also tried following the thread
sed throws 'bad flag in substitute command'
but the command
sed -i".bak" "s/\/$field1/$field1\n$insert/" file847script0.xml
still gives me the same error:
sed: 1: "s/\/<ImageAnnotation xm ...": bad flag in substitute command: 'c'
I am using Mac OS X 10.5.
Any idea what I am doing wrong? Thank you!
Good grief, just use awk. No need to worry about special characters in your replacement text or random single-character commands and punctuation.
In this case it looks like all you need is to print some new text after the 2nd line so that's just:
$ cat file
a
b
c
$ insert='absolutely any text you want, including newlines
slashes (/), backslashes (\\), whatever...'
$ awk -v insert="$insert" '{print} NR==2{print insert}' file
a
b
absolutely any text you want, including newlines
slashes (/), backslashes (\), whatever...
c
Isn't it easier to do it by line number? If you know it's the second line or the nth line (and grep will tell you line numbers if you are pattern matching) then you can simply use sed to find the correct line and then append a new line (or 4 new lines).
cat <<EOF > testfile
one two three
four five six
seven eight nine
EOF
sed -re '2a\hello there' testfile
will output
one two three
four five six
hello there
seven eight nine
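The one-line form 2a\hello there relies on a GNU sed extension. Since the question mentions Mac OS X, here is a sketch of the POSIX form of the a command, where the text starts on the line after the backslash; it should behave the same with GNU and BSD sed. The demo directory is illustrative only.

```shell
demo_dir=$(mktemp -d)
cat <<EOF > "$demo_dir/testfile"
one two three
four five six
seven eight nine
EOF

# POSIX form of the append command: backslash, newline, then the text.
result=$(sed -e '2a\
hello there' "$demo_dir/testfile")
printf '%s\n' "$result"
```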

'grep +A': print everything after a match [duplicate]

I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after http://www.yahoo.com, so the result looks like this:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number of where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3:http://www.yahoo.com
But I don't know how to get the part of the file after line 3. I also know that grep has a -A flag that prints the lines after a match, but you have to specify how many lines you want. I am wondering whether there is a way around that, something like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.
AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1, and then skip that line (the next command will jump to the next line, thus skip any further processing on the current line). Without the next command, the line yahoo will be printed.
The second part is a short hand for:
y != 0 { print }
Which means, for each line, if variable y is non-zero, we print that line. In AWK, a variable that is referenced before being assigned is created automatically and is either zero or the empty string, depending on context. Before yahoo is encountered, y is 0, so the script prints nothing. After yahoo is encountered, y is 1, so every line after that is printed.
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt
This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and the script passed with -e means "from the first appearance of a line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).
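If escaping the URL for a regexp feels error-prone, a plain string comparison in awk avoids it. This is a sketch rather than part of the answer above: the variable names url and found are made up, and it assumes the URL occupies a whole line exactly as written.

```shell
demo_dir=$(mktemp -d)
printf '%s\n' http://www.google.com http://www.bing.com http://www.yahoo.com \
    http://www.baidu.com http://www.yandex.com > "$demo_dir/file1"

# Print every line after the first line that is exactly equal to the URL.
# A string comparison is used, so "." and "/" need no escaping.
awk -v url='http://www.yahoo.com' '
    found { print }
    $0 == url { found = 1 }
' "$demo_dir/file1" > "$demo_dir/file2"
cat "$demo_dir/file2"
```

The print rule comes first, so the matching line itself is not printed.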
awk '/yahoo/ ? c++ : c' file1
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com
This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.
Using this script:
# Get the line number of the first line containing "yahoo"
index=$(grep -n "yahoo" filepath | head -n 1 | cut -d':' -f1)
# Get the total number of lines in the file
totallines=$(wc -l < filepath)
# Subtract the index from the total to get the number of lines after the match
result=$((totallines - index))
# Print the matching line and the $result lines after it
grep -A "$result" "yahoo" filepath
