I have a big file with content separated by headers.
How would I go about substituting the text between the line after a particular header and the line before a specific string (non-inclusive) in the file via sed or awk?
Example: I want to replace the text between the line after header1 and the line before string_near_end_of_file (non-inclusive) with EXAMPLE in the file below...
...
header1
#######
content1
header2
#######
content2
...
string_near_end_of_file
...
Intended outcome:
...
header1
#######
EXAMPLE
string_near_end_of_file
...
I know that sed '/pattern1/,/pattern2/c\substitution_string' file replaces the lines from pattern1 through pattern2 (inclusive) with substitution_string, and that sed -n '/pattern/{n;p}' file prints the line after the pattern match, but I'm not sure how to put these together to achieve my stated goal.
Any help would be greatly appreciated!
You can use this awk command:
awk '/string_near_end_of_file/{p=0} !p; $1=="header1"{p=1; print "#######\nEXAMPLE"}' file
Output:
...
header1
#######
EXAMPLE
string_near_end_of_file
...
GNU sed:
sed -e '/header1/,/string_near/{/header1/{n;s/$/\nEXAMPLE/;n};/string_near/!d}' file
In the range:
/header1/{n; - print the header1 line and read the ####### line
s/$/\nEXAMPLE/; - append a newline and EXAMPLE to the end of the ####### line
n} - print the ####### and EXAMPLE lines and read the next line
/string_near/!d; - delete every other line in the range except string_near
sed is awkward when you need to repeat the /START/,/END/ patterns inside the block. You can sometimes use the empty regex // to mean the last regex used, but any nested regexes limit its use in /START/,/END/ ranges.
Note that some sed implementations require semicolons before closing curly braces.
Note also that I append to the end of the line with s/$/\nEXAMPLE/ because it is much easier to write inline on the command line than sed's c, a, or i commands, which require a newline in the command string.
Awk solution (assuming that the header line is always followed by the ####### line):
awk -v repl="EXAMPLE" '!f && /header1/{ f=1; n=NR; getline nl; r=$0 ORS nl }
/string_near_end_of_file/{ f=0; print r ORS repl ORS $0; next }
f && NR>n{ next }1' file
The output:
...
header1
#######
EXAMPLE
string_near_end_of_file
...
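The awk answers above hard-code the header, the end marker, and the replacement text. Here is a minimal sketch of a reusable shell function that takes them as arguments instead. The name replace_between and its argument order are made up for illustration, and it assumes the header is matched as a whole line, the end marker is an awk regex, and the ####### underline directly follows the header:

```shell
# Hypothetical helper: replace_between HEADER END TEXT [FILE...]
# HEADER is compared with the whole line, END is an awk regex, and the
# line right after HEADER (the underline) is kept. Reads stdin if no FILE.
replace_between() {
    _hdr=$1 _end=$2 _repl=$3
    shift 3
    awk -v hdr="$_hdr" -v end="$_end" -v repl="$_repl" '
        state == 0 && $0 == hdr { print; state = 1; next }  # print the header line
        state == 1 { print; print repl; state = 2; next }     # keep the underline, insert TEXT
        state == 2 && $0 ~ end { state = 3 }                  # END found: stop skipping
        state == 2 { next }                                   # drop the old content
        { print }                                             # everything else unchanged
    ' "$@"
}
```

Usage: replace_between header1 string_near_end_of_file EXAMPLE file. If the end marker never appears, everything after the underline is dropped, so this assumes the marker is present.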
Related
I have a text file with every other line ending with a % character. I want to find the pattern "% + newline" and replace it with "%". In other words, I want to delete the newline character right after the % and not the other newline characters.
For example, I want to change the following:
abcabcabcabc%
123456789123
abcabcabcabc%
123456789123
to
abcabcabcabc%123456789123
abcabcabcabc%123456789123
I've tried the following sed command, to no avail.
sed 's/%\n/%/g' < input.txt > output.txt
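By default sed can't match that newline: it reads input one line at a time and removes the trailing newline before running the script, so the pattern space never contains the \n that %\n is looking for.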
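A quick way to see this (a sketch using two made-up sample lines): the first command leaves its input unchanged, while the second joins the lines, because N puts an embedded newline into the pattern space that the \n in the regex can then match.

```shell
# The pattern space never ends in a newline, so %\n cannot match here:
printf '%s\n' 'abc%' '123' | sed 's/%\n/%/g'
# N appends the next line after an embedded newline, which %\n can match:
printf '%s\n' 'abc%' '123' | sed 'N;s/%\n/%/'
```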
With any awk in any shell on every UNIX box for any number of lines ending in %, consecutive or not:
$ awk '{printf "%s%s", $0, (/%$/ ? "" : ORS)}' file
abcabcabcabc%123456789123
abcabcabcabc%123456789123
and with consecutive % lines:
$ cat file
now is the%
winter of%
our%
discontent
$ awk '{printf "%s%s", $0, (/%$/ ? "" : ORS)}' file
now is the%winter of%our%discontent
Your data sample implies that there are never two or more consecutive lines ending with %.
In that case, you may use
sed '/%$/{N;s/\n//}' file.txt > output.txt
It works as follows:
/%$/ - finds all lines ending with %
{N;s/\n//} - a block:
N - adds a newline to the pattern space, then appends the next line of input to the pattern space
s/\n// - removes a newline in the current pattern space.
In portable sed that supports any number of continued lines:
parse.sed
:a
# While the pattern space ends in '%':
/%$/ {
# append the next line,
N
# remove the embedded newline,
s/\n//
# and if a newline was removed, go back to label 'a'
ta
}
Run it like this:
sed -f parse.sed infile
Output when infile contains your input and the input from Ed Morton's answer:
abcabcabcabc%123456789123
abcabcabcabc%123456789123
now is the%winter of%our%discontent
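If you would rather not keep a separate parse.sed file, the same script can be passed inline. This is a sketch that relies on each -e argument being treated as a separate script line (GNU and BSD sed behave this way), so the label and the ta branch each still end at a newline:

```shell
# parse.sed given inline: one -e per script line keeps the label portable.
printf '%s\n' 'now is the%' 'winter of%' 'our%' 'discontent' |
    sed -e ':a' -e '/%$/{' -e 'N' -e 's/\n//' -e 'ta' -e '}'
```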
I have a file that looks like this:
ABCDEFGH
ABCDEFGH
ABC
ABCDEFGH
ABCDEFGH
ABCD
ABCDEFGH
Most of the lines have a fixed length of 8. But there are some lines in between that have a length less than 8. I need a simple line of code that appends each of those short lines to its previous line.
I have tried the following code but it takes lots of memory when working with large files.
cat FILENAME | awk 'BEGIN{OFS=FS="\t"}{print length($1), $1}' | tr '\n' '\t' | sed 's/8/\n/g' | awk 'BEGIN{OFS="";FS="\t"}{print $2, $4}'
The output I expect:
ABCDEFGH
ABCDEFGHABC
ABCDEFGH
ABCDEFGHABCD
ABCDEFGH
If perl is an option for you, try:
perl -0777 -pe 's/(\n)(.{1,7})$/$2/mg' filename
The -0777 option tells perl to slurp the whole file into a single string.
With the /m modifier, the pattern (\n)(.{1,7})$ matches a non-empty line shorter than 8 characters together with the newline before it, capturing the newline in $1 and the line's text in $2.
The replacement $2 drops that preceding newline, so the short line is appended to the previous line.
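sed <FILENAME '$!N;/\n.\{8\}/!s/\n//;P;D'
$!N; - append the next line to the pattern space (skipped on the last line, where some seds would quit without printing)
/\n.\{8\}/ - does the second line contain at least 8 characters?
!s/\n//; - no: join the two lines
P - print the first line of the pattern space
D - delete the first line of the pattern space and start the next cycle with what remains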
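Print each line without its trailing newline, and when the current line has length 8, first print a newline to end the previous line; shorter lines therefore end up appended to the previous line.
The first and last lines are special: the first needs no leading newline, and END adds the final one.
awk 'NR==1 {printf "%s", $0; next}
length($0)==8 {printf "\n"}
{printf "%s", $0}
END { printf "\n" }' FILENAME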
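The pipeline in the question turns the whole file into one long line before splitting it again, which is where the memory goes. A sketch of a hypothetical join_short function that streams the file in portable awk instead, keeping only the line currently being assembled, with the width passed as a parameter. Unlike the length($0)==8 test above, it treats any line of at least WIDTH characters as a full line (an assumption):

```shell
# Hypothetical helper: join_short WIDTH [FILE...]
# Appends every line shorter than WIDTH characters to the line before it.
# Only the line being built is held in memory.
join_short() {
    _width=$1
    shift
    awk -v w="$_width" '
        NR > 1 && length($0) >= w + 0 { print buf; buf = $0; next }  # full line: flush the previous one
        { buf = buf $0 }                                            # first or short line: append
        END { if (NR > 0) print buf }                              # flush the last line
    ' "$@"
}
```

Usage: join_short 8 FILENAME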
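If you have GNU sed 4.2 or later (with support for the -z option), you can try the following.
EDIT (see comments): this is inferior to the other answers; for example, it does not join two consecutive short lines.
sed -rz 's/\n(.{0,7})\n/\1\n/g' FILENAME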
If you like old traditional tools, you can use ed, the standard text editor:
printf '%s\n' 'g/^.\{0,7\}$/-,.j' wq | ed -s filename
Suppose I have an input file with lines of text:
line 1
line 2
line 3
line 4
line 2
Now suppose I would like to check whether my input file contains
line 2
line 3
and remove that block of text if it is found. This would give:
line 1
line 4
line 2
Note that I don't want to remove just every occurrence of line 2 or line 3; but only if they are found one after another. (In reality I want to check for a block of 5 lines, and not just any block of code between two placeholders, but let's keep the example simple).
I looked into awk, but it gets complicated very quickly (I haven't finished this, since I feel it is not the right approach and it will explode with 5 lines...):
awk '/line 2/ {if (line0) {print line0; line0=""}; line0=$0}' input.txt
One way with GNU awk for multi-char RS and RT:
$ awk -v RS='(^|\n)line 2\nline 3\n' '{ORS=(RT ~ /^\n/ ? "\n" : "")} 1' file
line 1
line 4
line 2
With any awk:
$ cat file
line 2
line 3
line 1
line 2
line 3
line 4
line 2
line 3
$ awk '
{ rec = rec $0 RS }
END {
rec = RS rec
gsub(/\nline 2\nline 3\n/,RS,rec)
gsub(/^\n|\n$/,"",rec)
print rec
}
' file
line 1
line 4
The above assumes you want to match using regexps since that's what your posted code does. If you want literal string matches instead, that's doable too with some massaging:
$ cat tst.awk
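{ rec = rec $0 RS }
END {
rec = RS rec                               # rec now starts and ends with RS
while ( beg = index(rec, RS block RS) ) {
out = out substr(rec, 1, beg-1)            # keep everything before the block
rec = substr(rec, beg+length(block)+1)     # resume at the RS that ended the block
}
out = out rec
print substr(out, 2, length(out)-2)        # drop the added leading and trailing RS
}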
$ awk -v block='line 2\nline 3' -f tst.awk file
line 1
line 4
Not awk, but this is straightforward with Perl 5, as @triplee pointed out. With the five-line input file you showed above as foo.txt:
perl -0777 -pe 's{^line 2\nline 3\n}{}gm' foo.txt
produces the desired three-line output.
Explanation:
-0777 causes perl to read the entire input as one string (see perlrun).
The /m modifier on the regex causes ^ to match at the beginning of a line (see perlre).
Edit: ^ also matches at the very beginning of the file, so a block is detected there even though no newline precedes it.
The separators between the lines are matched as literal \n characters because, with the /m modifier, $ matches just before a \n without consuming it. It is therefore easier just to match the \n.
Thanks to this U&L SE answer by Stéphane Chazelas for the basics.
With GNU sed:
sed -z 's/line 2\nline 3\n//g;s/line 2\nline 3\n$//' infile
This might work for you (GNU sed):
sed '/^line 2$/!b;N;/^line 3$/Md;P;D' file
If a line does not match the string line 2, print it and begin the next cycle. Otherwise, append the following line and if that does match the string line 3, delete both lines. Otherwise, print then delete the first line and repeat.
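Since the real goal is a 5-line block, here is a sketch of a streaming alternative in portable awk (the function name remove_block is hypothetical). Instead of slurping the file, it keeps a sliding window of as many lines as the block has and drops the window whenever it equals the block line for line. Lines are compared as literal strings, and the block is assumed to be non-empty:

```shell
# Hypothetical helper: remove_block BLOCK [FILE...]
# BLOCK holds the lines to remove, separated by newlines. awk -v turns \n
# into a newline (and processes other backslash escapes too).
# Only as many lines as BLOCK has are buffered at any time.
remove_block() {
    _block=$1
    shift
    awk -v block="$_block" '
        BEGIN { n = split(block, want, "\n") }
        {
            win[++len] = $0                 # add the new line to the window
            if (len < n) next               # window not full yet
            same = 1
            for (i = 1; i <= n; i++)
                if ((win[i] "") != (want[i] "")) { same = 0; break }
            if (same) { len = 0; next }     # whole block matched: drop it
            print win[1]                    # oldest line cannot start a match
            for (i = 1; i < len; i++) win[i] = win[i + 1]
            len--
        }
        END { for (i = 1; i <= len; i++) print win[i] }
    ' "$@"
}
```

Usage: remove_block 'line 2\nline 3' file. For a 5-line block, pass the five lines separated by \n.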
Line 1 of a csv file has the values separated by a comma like so:
word1,word2,word3,word4,word5
but needs to be wrapped with quotations like below:
"word1","word2","word3","word4","word5"
I would like a command to address line 1 only and leave the rest of the file alone.
Consider this test file:
$ cat file.csv
word1,word2,word3,word4,word5
12345,12346,12347,12348,12349
To put quotes around the items in the first line only:
$ sed '1 { s/^/"/; s/,/","/g; s/$/"/ }' file.csv
"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349
How it works
1 { ... }
This tells sed to perform the commands in braces only on line 1.
s/^/"/
This puts a quote at the start of the line.
s/,/","/g
This replaces each comma with quote-comma-quote.
s/$/"/
This puts a quote at the end of the line.
awk alternative approach:
awk -F, 'NR==1{ gsub(/[^,]+/,"\"&\"",$0) }1' file
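NR==1 - consider only the 1st record
gsub(/[^,]+/,"\"&\"",$0) - wrap every run of non-comma characters in double quotes (& is the matched text)
1 - a true condition, so every record is printed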
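One difference between the two answers above that is easy to miss, shown here on a made-up line with an empty field: the sed version quotes empty fields, while the gsub() version only wraps non-empty runs of characters, so empty fields stay unquoted.

```shell
# sed: every comma becomes "," so the empty field is quoted too
printf '%s\n' 'a,,b' | sed '1 { s/^/"/; s/,/","/g; s/$/"/; }'    # prints "a","","b"
# awk: [^,]+ needs at least one character, so the empty field is skipped
printf '%s\n' 'a,,b' | awk 'NR==1{ gsub(/[^,]+/,"\"&\"") }1'     # prints "a",,"b"
```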
Here is an example file:
somestuff...
all: thing otherthing
some other stuff
What I want to do is append to the line that starts with all:, like this:
somestuff...
all: thing otherthing anotherthing
some other stuff
This works for me
sed '/^all:/ s/$/ anotherthing/' file
The first part is the pattern to find, and the second part is an ordinary sed substitution that uses $ for the end of the line.
If you want to change the file in place, use the -i option (GNU sed; BSD and macOS sed need -i '')
sed -i '/^all:/ s/$/ anotherthing/' file
Or you can redirect it to another file
sed '/^all:/ s/$/ anotherthing/' file > output
You can append the text to $0 in awk if it matches the condition:
awk '/^all:/ {$0=$0" anotherthing"} 1' file
Explanation
/patt/ {...} if the line matches the pattern given by patt, then perform the actions described within {}.
In this case: /^all:/ {$0=$0" anotherthing"} if the line starts (represented by ^) with all:, then append anotherthing to the line.
1 as a true condition, triggers the default action of awk: print the current line (print $0). This will happen always, so it will either print the original line or the modified one.
Test
For your given input it returns:
somestuff...
all: thing otherthing anotherthing
some other stuff
Note you could also provide the text to append in a variable:
$ awk -v mytext=" EXTRA TEXT" '/^all:/ {$0=$0mytext} 1' file
somestuff...
all: thing otherthing EXTRA TEXT
some other stuff
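If the prefix itself comes from a variable, a regex like /^all:/ can misbehave when the prefix contains characters such as . or *. A sketch of a hypothetical helper that passes both the prefix and the text with -v and matches the prefix literally with index():

```shell
# Hypothetical helper: append_to_prefixed PREFIX TEXT [FILE...]
# PREFIX must appear at column 1 (index() == 1); it is not a regex.
# Note that awk -v still processes backslash escapes in PREFIX and TEXT.
append_to_prefixed() {
    _prefix=$1 _text=$2
    shift 2
    awk -v prefix="$_prefix" -v text="$_text" '
        index($0, prefix) == 1 { $0 = $0 text }   # line starts with PREFIX: append TEXT
        1                                         # print every line
    ' "$@"
}
```

Usage: append_to_prefixed 'all:' ' anotherthing' file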
This should work for you
sed -e 's_^all: .*_& anotherthing_' file
The s (substitute) command rewrites the lines that match a regular expression. Here _ is used as the delimiter instead of /, and & stands for the matched string.
Here is another simple solution using sed.
$ sed -i 's/^all.*/& anotherthing/' filename.txt
Explanation:
^all.* matches a line that starts with 'all', from the start of the line to its end.
& represents the match (i.e. the complete line that starts with 'all').
sed replaces the match with itself followed by ' anotherthing', which appends the word to the line.
In bash:
while IFS= read -r line; do  # IFS= keeps leading/trailing blanks, -r keeps backslashes
[[ $line == all:* ]] && line+=" anotherthing"
printf '%s\n' "$line"
done < filename
Solution with awk:
awk '{if ($1 ~ /^all/) print $0, "anotherthing"; else print $0}' file
Simply: if the first field starts with all, print the row followed by "anotherthing" (print puts OFS, a space, between them); otherwise print the row unchanged.