Sed replace regex with regex - bash

Given the following file.txt:
this is line 1
# this is line 2
this is line 3
I would like to use sed to replace the lines with # at the beginning with \e[31m${lineContent}\e[0m. This will color that particular line. Additionally, I need the color \e[31m to be in a variable, color. (The desired output of this example would be having the second line colored). I have the following:
function colorLine() {
cat file.txt | sed ''/"$1"/s//"$(printf \e[31m $1 \e[0m)"/g''
}
colorLine "#.*"
The variable color is not included in what I have so far, as I am not sure how to go about that part.
The output of this is (with the second line being red):
this is line 1
#.*
this is line 3
It is apparently interpreting the replace string literally. My question is how do I use the matched line to generate the replace string?
I understand that I could do something much easier like appending \e[31m to the beginning of all lines that start with #, but it is important to use sed with the regexes.

colorLine() {
sed "/$1/s//"$'\e[31m&\e[0m/' file.txt
}
colorLine "#.*"
Multiple fixes, but it uses $1 to identify the pattern from the arguments to the function, and then uses ANSI-C quoting to encode the escape sequences — and fixes the color reset sequence which was (originally) missing the [ after the escape sequence. It also avoids the charge of "UUoC — Useless Use of cat".
The fixed file name is not exactly desirable, but fixing it is left as an exercise for the reader.
What if I needed \e[31m to be a variable, $color. How do I change the quoting?
I have a colour-diff script which contains (in Perl notation — I've translated it to Bash notation using ANSI C quoting as before):
reset=$'\e[0m'
black=$'\e[30;1m' # 30 = Black, 1 = bold
red=$'\e[31;1m' # 31 = Red, 1 = bold
green=$'\e[32;1m' # 32 = Green, 1 = bold
yellow=$'\e[33;1m' # 33 = Yellow, 1 = bold
blue=$'\e[34;1m' # 34 = Blue, 1 = bold
magenta=$'\e[35;1m' # 35 = Magenta, 1 = bold
cyan=$'\e[36;1m' # 36 = Cyan, 1 = bold
white=$'\e[37;1m' # 37 = White, 1 = bold
With those variables around, you can create your function as you wish:
colorLine() {
sed "/$1/s//$blue&$reset/" file.txt
}
Where you set those variables depends on where you define your function. For myself, I'd probably make a script rather than a function, with full-scale argument parsing, and go from there. YMMV
Take a look at List of ANSI color escape sequences to get a more comprehensive list of colours (and other effects — including background and foreground colours) and the escape sequence used to generate it.
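To tie the pieces above together, here is a minimal sketch (not the answerer's exact code) of a function that takes the pattern, the color and the file name as arguments. The name color_line and the argument order are assumptions made for illustration. The escape sequences are built with printf so the snippet also runs in a plain POSIX sh. It assumes the pattern contains no unescaped / characters.

```shell
# Escape sequences built with printf (octal 033 is ESC).
red=$(printf '\033[31m')
reset=$(printf '\033[0m')

# color_line PATTERN COLOR FILE  (hypothetical helper, not from the answer above)
color_line() {
    # The address /PATTERN/ selects the lines; the empty regex in s// reuses
    # that same pattern, and & stands for the text it matched.
    sed "/$1/s//$2&$reset/" "$3"
}

# Usage: color_line "#.*" "$red" file.txt
```

Because the color is an ordinary argument, any of the variables defined earlier (blue, green, and so on) can be passed in its place.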

With GNU sed and Kubuntu 16.04.
foo="#.*"
sed 's/'"$foo"'/\x1b[31m&\x1b[0m/' file

I'd trick grep to do it for me this way:
function colorLine() {
GREP_COLORS="mt=31" grep --color=always --context=$(wc -l <file.txt) --no-filename "$1" file.txt
}
Split-out of the trick:
GREP_COLORS="mt=31": SGR substring for matching non-empty text in any matching line. Here it generates \e[31m (red) before the matched string and resets to the default color after it.
--color=always: always colorize, even in a non-interactive shell.
--context=$(wc -l <file.txt): output as many context lines as there are lines in the file (so all lines).
--no-filename: do not print the file name.

An awk version
black='\033[30;1m'
red='\033[31;1m'
green='\033[32;1m'
yellow='\033[33;1m'
blue='\033[34;1m'
magenta='\033[35;1m'
cyan='\033[36;1m'
white='\033[37;1m'
color=$cyan
colorLine() { awk -v s="$1" -v c=$color '$0~s {$0=c$0"\033[0m"}1' file.txt; }
colorLine "#.*"
You can add the file as a variable as well:
color=$cyan
file="file.txt"
colorLine() { awk -v s="$1" -v c=$color '$0~s {$0=c$0"\033[0m"}1' $file; }
colorLine "#.*"
Not every awk recognizes \e, so the escape character is written in octal as \033.
A more dynamic version:
colorLine() {
temp=$2;
col=${!temp};
awk -v s="$1" -v c=$col '$0~s {$0=c$0"\033[0m"}1' $3; }
colorLine "#.*" green file.txt
Then you have colorLine pattern color file

Related

How to replace a whole line (between 2 words) using sed?

Suppose I have text as:
This is a sample text.
I have 2 sentences.
text is present there.
I need to replace whole text between two 'text' words. The required solution should be
This is a sample text.
I have new sentences.
text is present there.
I tried using the below command but its not working:
sed -i 's/text.*?text/text\
\nI have new sentence/g' file.txt
With your shown samples, please try the following. sed doesn't support lazy matching in regex. With awk's RS you can do the substitution, for your shown samples only. Create a variable val that holds the new value; a simple substitution in awk will then do the rest to get your expected output.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file
The code above prints the output to the terminal. Once you are happy with the results and want to save the output into Input_file itself, try the following code.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file > temp && mv temp Input_file
You have already solved your problem using awk, but in case anyone else will be looking for a sed solution in the future, here's a sed script that does what you needed. Granted, the script is using some advanced sed features, but that's the fun part of it :)
replace.sed
#!/usr/bin/env sed -nEf
# This pattern determines the start marker for the range of lines where we
# want to perform the substitution. In our case the pattern is any line that
# ends with "text." — the `$` symbol meaning end-of-line.
/text\.$/ {
# [p]rint the start-marker line.
p
# Next, we'll read lines (using `n`) in a loop, so mark this point in
# the script as the beginning of the loop using a label called `loop`.
:loop
# Read the next line.
n
# If the last read line doesn't match the pattern for the end marker,
# just continue looping by [b]ranching to the `:loop` label.
/^text/! {
b loop
}
# If the last read line matches the end marker pattern, then just insert
# the text we want and print the last read line. The net effect is that
# all the previous read lines will be replaced by the inserted text.
/^text/ {
# Insert the replacement text
i\
I have a new sentence.
# [print] the end-marker line
p
}
# Exit the script, so that we don't hit the [p]rint command below.
b
}
# Print all other lines.
p
Usage
$ cat lines.txt
foo
This is a sample text.
I have many sentences.
I have many sentences.
I have many sentences.
I have many sentences.
text is present there.
bar
$
$ ./replace.sed lines.txt
foo
This is a sample text.
I have a new sentence.
text is present there.
bar
Substitute
sed -i 's/I have 2 sentences\./I have new sentences./g' file.txt
sed -i 's/[A-Z]\s[a-z].*/I have new sentences./g' file.txt
Insert
sed -i -e '2iI have new sentences.' -e '2d' file.txt
I need to replace whole text between two 'text' words.
If I understand, first text. (with a dot) is at the end of first line and second text at the beginning of third line. With awk you can get the required solution adding values to var s:
awk -v s='\nI have new sentences.\n' '/text.?$/ {s=$0 s;next} /^text/ {s=s $0;print s;s=""}' file
This is a sample text.
I have new sentences.
text is present there.
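The same idea can also be written with a flag instead of an accumulator. This is only a sketch under the assumptions of the sample: the start marker is a line ending in "text." and the end marker is a line beginning with "text". The function name replace_between is made up for illustration. If the end marker never appears, everything after the start marker is dropped.

```shell
# replace_between NEW_LINE FILE  (hypothetical helper)
replace_between() {
    awk -v new="$1" '
        /text\.$/ { print; print new; skip = 1; next }  # start marker: keep it, emit the new line
        /^text/   { skip = 0 }                          # end marker: resume printing
        !skip                                           # print lines outside the skipped block
    ' "$2"
}

# Usage: replace_between "I have new sentences." file.txt
```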

How to replace text in file between known start and stop positions with a command line utility like sed or awk?

I have been tinkering with this for a while but can't quite figure it out. A sample line within the file looks like this:
"...~236 characters of data...Y YYY. Y...many more characters of data"
How would I use sed or awk to replace spaces with a B character only between positions 236 and 246? In that example string it starts at character 29 and ends at character 39 within the string. I would want to preserve all the text preceding and following the target chunk of data within the line.
For clarification based on the comments, it should be applied to all lines in the file and expected output would be:
"...~236 characters of data...YBBYYY.BBY...many more characters of data"
With GNU awk:
$ awk -v FIELDWIDTHS='29 10 *' -v OFS= '{gsub(/ /, "B", $2)} 1' ip.txt
...~236 characters of data...YBBYYY.BBY...many more characters of data
FIELDWIDTHS='29 10 *' means 29 characters for first field, next 10 characters for second field and the rest for third field. OFS is set to empty, otherwise you'll get space added between the fields.
With perl:
$ perl -pe 's/^.{29}\K.{10}/$&=~tr| |B|r/e' ip.txt
...~236 characters of data...YBBYYY.BBY...many more characters of data
^.{29}\K match and ignore first 29 characters
.{10} match 10 characters
e flag to allow Perl code instead of string in replacement section
$&=~tr| |B|r convert space to B for the matched portion
Use this Perl one-liner with substr and tr. Note that this uses the fact that you can assign to substr, which changes the original string:
perl -lpe 'BEGIN { $from = 29; $to = 39; } (substr $_, ( $from - 1 ), ( $to - $from + 1 ) ) =~ tr/ /B/;' in_file > out_file
To change the file in-place, use:
perl -i.bak -lpe 'BEGIN { $from = 29; $to = 39; } (substr $_, ( $from - 1 ), ( $to - $from + 1 ) ) =~ tr/ /B/;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
I would use GNU AWK in the following way. For simplicity's sake, say file.txt contains
S o m e s t r i n g
and we want to change the spaces from position 5 (inclusive) to position 10 (inclusive); then
awk 'BEGIN{FPAT=".";OFS=""}{for(i=5;i<=10;i+=1)$i=($i==" "?"B":$i);print}' file.txt
output is
S o mBeBsBt r i n g
Explanation: I set the field pattern (FPAT) to any single character and the output field separator (OFS) to the empty string, so every field holds a single character and I do not get superfluous spaces when printing. I use a for loop to access the desired fields, and for each one I check whether it is a space: if it is, I assign B, otherwise I keep the original value. Finally I print the whole changed line.
Using GNU awk:
awk -v strt=29 -v end=39 '{ ram=substr($0,strt,(end-strt+1));gsub(" ","B",ram);print substr($0,1,(strt-1)) ram substr($0,(end+1)) }' file
Explanation:
awk -v strt=29 -v end=39 '{ # Pass the start and end character positions as strt and end respectively
ram=substr($0,strt,(end-strt+1)); # Extract the 29th to the 39th characters of the line (inclusive) into variable ram
gsub(" ","B",ram); # Replace spaces with B in ram
print substr($0,1,(strt-1)) ram substr($0,(end+1)) # Rebuild the line incorporating ram and print the result
}' file
This is certainly a suitable task for perl, and saddens me that my perl has become so rusty that this is the best I can come up with at the moment:
perl -e 'local $/=\1;while(<>) { s/ /B/ if $. >= 236 && $. <= 246; print }' input;
Another awk but using FS="":
$ awk 'BEGIN{FS=OFS=""}{for(i=29;i<=39;i++)sub(/ /,"B",$i)}1' file
Output:
"...~236 characters of data...YBBYYY.BBY...many more characters of data"
Explained:
$ awk ' # yes awk yes
BEGIN {
FS=OFS="" # set empty field delimiters
}
{
for(i=29;i<=39;i++) # between desired indexes
sub(/ /,"B",$i) # replace space with B
# if($i==" ") # couldve taken this route, too
# $i="B"
}1' file # implicit output
With sed :
sed '
H
s/\(.\{236\}\)\(.\{11\}\).*/\2/
s/ /B/g
H
g
s/\n//g
s/\(.\{236\}\)\(.\{11\}\)\(.*\)\(.\{11\}\)/\1\4\3/
x
s/.*//
x' infile
When you have an input string without \r, you can use:
sed -r 's/(.{236})(.{10})(.*)/\1\r\2\r\3/;:a;s/(\r.*) (.*\r)/\1B\2/;ta;s/\r//g' input
Explanation:
First put \r around the area that you want to change.
Next introduce a label to jump back to.
Next replace a space between 2 markers.
Repeat until all spaces are replaced.
Remove the markers.
In your case, where the length doesn't change, you can do without the markers.
Replace a space after 236..245 characters and try again when it succeeds.
sed -r ':a; s/^(.{236})([^ ]{0,9}) /\1\2B/;ta' input
This might work for you (GNU sed):
sed -E 's/./&\n/245;s//\n&/236;h;y/ /B/;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file
Divide the problem into 2 lines, one with spaces and one with B's where there were spaces.
Then using pattern matching make a composite line from the two lines.
N.B. The newline can be used as a delimiter as it is guaranteed not to be in sed's pattern space.
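For checking any of the answers above on a short line, a slow but transparent POSIX sketch with cut and tr may help. It splits each line into head, middle and tail by 1-based inclusive positions and translates spaces only in the middle part. The name replace_span is hypothetical, and the sketch assumes single-byte characters and a start position greater than 1.

```shell
# replace_span FROM TO FILE  (hypothetical helper; positions are 1-based, inclusive)
replace_span() (
    from=$1 to=$2 file=$3
    while IFS= read -r line || [ -n "$line" ]; do
        head=$(printf '%s\n' "$line" | cut -c "1-$((from - 1))")        # text before the span
        mid=$(printf '%s\n' "$line" | cut -c "$from-$to" | tr ' ' 'B')  # span, spaces become B
        tail=$(printf '%s\n' "$line" | cut -c "$((to + 1))-")           # text after the span
        printf '%s%s%s\n' "$head" "$mid" "$tail"
    done < "$file"
)

# Usage on the shortened sample line, where the span is characters 30 to 39:
# replace_span 30 39 ip.txt
```

It spawns several processes per line, so for large files the awk or perl answers are the better choice.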

sed removing # and ; comments from files up to certain keyword

I have files from which comments and blank lines need to be removed, but only up to a keyword. The line number of the keyword varies. Is it possible to limit several chained sed substitutions based on a keyword?
This removes all comments and white spaces from file :
sed -i -e 's/#.*$//' -e 's/;.*$//' -e '/^$/d' file
For example something like this :
# string1
# string2
some string
; string3
; string4
####
<Keyword_Keep_this_line_and_comments_white_space_after_this>
# More comments that need to be here
; etc.
sed -i '1,/Keyword/{/^[#;]/d;/^$/d;}' file
I would suggest using awk and setting a flag when you reach your keyword:
awk '/Keyword/ { stop = 1 } stop || !/^[[:blank:]]*([;#]|$)/' file
Set stop to true when the line contains Keyword. Do the default action (print the line) when stop is true or when the line doesn't match the regex. The regex matches lines whose first non-blank character is a semicolon or hash, or blank lines. It's slightly different to your condition but I think it does what you want.
The command prints to standard output so you should redirect to a new file and then overwrite the original to achieve an "in-place edit":
awk '...' input > tmp && mv tmp input
Use grep -n keyword to get the line number that contains the keyword.
Use sed -i -e '1,N s/#..., where N is the line number that contains the keyword, to only remove comments on lines 1 to N.
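The two steps above could be sketched like this. The function name strip_comments_until is made up, the keyword is treated as a fixed string, and the result is printed to standard output rather than edited in place. To be safe, this sketch stops one line before the keyword so the keyword line itself is never modified, and it leaves the file unchanged when the keyword is missing. Both of those are my choices, not something the question specifies.

```shell
# strip_comments_until KEYWORD FILE  (hypothetical helper)
strip_comments_until() (
    keyword=$1 file=$2
    # Step 1: line number of the first line containing the keyword.
    n=$(grep -n -F -e "$keyword" "$file" | head -n 1 | cut -d: -f1)
    # No keyword, or keyword on line 1: nothing to strip.
    if [ -z "$n" ] || [ "$n" -le 1 ]; then
        cat "$file"
        exit
    fi
    last=$((n - 1))
    # Step 2: strip # and ; comments, then drop blank lines, on lines 1..last only.
    sed -e "1,${last}s/[#;].*//" -e "1,${last}{/^[[:space:]]*\$/d;}" "$file"
)

# Usage: strip_comments_until Keyword file > tmp && mv tmp file
```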

Replace the last character in string

How can I just replace the last character (it's a }) from a string? I need everything before the last character but replace the last character with some new string.
I tried many things with awk and sed but didn't succeed.
For example:
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
}'
should become:
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
\\cf2 Its red now
}'
This replaces the last occurrence of:
}
with
\\cf2 Its red now
}
sed would do this:
# replace '}' in the end
echo '\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural \f0 }' | sed 's/}$/\\cf2 Its red now}/'
# replace any last character
echo '\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural \f0 }' | sed 's/\(.\)$/\\cf2 Its red now\1/'
Replacing the trailing } could be done like this (with $ as the PS1 prompt and > as the PS2 prompt):
$ str="...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
> \\f0
> }"
$ echo "$str"
...\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural
\f0
}
$ echo "${str%\}}\cf2 It's red now
}"
...\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural
\f0
\cf2 It's red now
}
$
The first 3 lines assign your string to my variable str. The next 4 lines show what's in the string. The 2 lines:
echo "${str%\}}\cf2 It's red now
}"
contain a (grammar-corrected) substitution of the material you asked for, and the last lines echo the substituted value.
Basically, ${str%tail} removes the string tail from the end of $str; I remember % ends in 't' for tail (and the analogous ${str#head} has hash starting with 'h' for head).
See shell parameter expansion in the Bash manual for the remaining details.
If you don't know the last character, you can use a ? metacharacter to match the end instead:
echo "${str%?}and the extra"
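Wrapped in a small function, the ${str%?} form looks like this. It is just a sketch; replace_last_char is a hypothetical name, and printf is used instead of echo so that backslashes in the string are not interpreted.

```shell
# replace_last_char STRING REPLACEMENT  (hypothetical helper)
replace_last_char() {
    # ${1%?} is STRING with its final character removed; the replacement is appended.
    printf '%s%s\n' "${1%?}" "$2"
}

# Usage: replace_last_char 'abc}' 'X}'   prints abcX}
```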
First make a string with newlines
str=$(printf "%s\n%s\n%s" '\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural' '\\f0' "}'")
Now you look for the last } in your string and replace it including a newline.
The $ makes sure it will only replace it on the last line; & stands for the matched string.
echo "${str}" |sed '$ s/}[^}]$/\\\\cf2 Its red now\n&/'
The above solution only works when the } is at the last line. It becomes more difficult when you also want to support str2:
str2=$(printf "Extra } here.\n%s\nsome other text" "${str}")
You cannot match the } on the last line. Removing the address $ for the last line will result in replacing all } characters (I added a } at the beginning of str2). You only want to replace the last one.
Replacing once is forced with ..../1. Replacing the last and not the first is done by reversing the order of lines with tac. Since you will tac again after the replacement, you need to use a different order in your sed replacement string.
echo "${str2}" | tac |sed 's/}[^}]$/&\n\\\\cf2 Its red now/1' |tac
In awk:
$ awk ' BEGIN { RS=OFS=FS="" } $NF="\\\\cf2 Its red now\n}"' file
RS="" sets RS to an empty record (change it to suit your needs)
OFS=FS="" separates characters each to its own field
$NF="\\\\cf2 Its red now\n}" replaces the char in the last field ($NF=}) with the quoted text
awk '{sub(/\\f0/,"\\f0\n\\\\\cfs Its red now")}1' file
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
\\cfs Its red now
}'

Converting traditional line breaks to Markdown double-space newlines

I've just learned how to do real line breaks in Markdown, with two spaces at the end of the line. I have a lot of files that I want to convert to this way of doing things because getting used to it is going to make my life a lot easier when using Markdown tools such as Pandoc.
These files currently look like this:
Roses are red
Violets are blue
Bananas are yellow
Oranges are orange
I'd like to transform paragraphs with more than one line so that the result would look like this:
Roses are red<space><space>
Violets are blue
Bananas are yellow
Oranges are orange
Sadly my linux fu is not up to the task. I have \n end of lines. Here's how I would start it:
for i in *; do sed -e 's/\n/ /g' "$i"; done
I have absolutely no idea on how to differentiate line breaks followed by empty lines which shouldn't be modified (line 2), from line breaks followed by text which should be modified by sed (line 1). Also, empty lines (line 3) should be ignored. Could someone please help me?
To do this reliably, you need a markdown parser. (I believe the awk-based solutions will insert spaces at the end of lines in code blocks, too, which you don't want.) Using pandoc 1.11.1 or later, you can do this:
pandoc -fmarkdown_strict+hard_line_breaks -t markdown_strict
Note that if you plan to use pandoc as your markdown processor, you can simply leave your files as they are, and use either markdown+hard_line_breaks or markdown_strict+hard_line_breaks as your input format.
$ awk '
{
if (NF) {
head = tail
tail = "<space><space>"
}
else {
head = ""
tail = ""
}
printf "%s%s%s", head, (NR>1?ORS:""), $0
}
END { print "" }
' file
Roses are red<space><space>
Violets are blue
Bananas are yellow
Oranges are orange
Just change tail = "<space><space>" to tail = "  " (two spaces).
change empty lines
do you mean this? I used xx to make it easier to see in output:
kent$ awk '{$0=$0"xx"}7' f
Roses are redxx
Violets are bluexx
xx
Bananas are yellowxx
xx
Oranges are orangexx
so, each "new line" will be replaced with two 'x' with newline. if this is what you are looking for, you could do:
awk '{$0=$0"  "}7' file
without changing empty lines
if you want to ignore empty lines (for empty line don't do any substitution):
check this out:
kent$ awk '$0{$0=$0"xx"}7' f
Roses are redxx
Violets are bluexx
Bananas are yellowxx
Oranges are orangexx
so you see above the double x didn't show on empty lines. you could use the command:
awk '$0{$0=$0"  "}7' file
EDIT
kent$ awk 'NR==1{p=$0;next}{p=p&&$0?p"xx":p; print p;p=$0}END{print $0}' f
Roses are redxx
Violets are blue
Bananas are yellow
Oranges are orange
check the above one-liner, all empty lines and previous line of empty lines are ignored. the last line of the file is ignored too.
This might work for you (GNU sed):
sed '$!N;/^\s*\n\|\n\s*$/!s/\n/<space><space>&/;P;D' file
This keeps 2 lines in the pattern space. If the first or second lines are empty i.e the start or end of a paragraph, prints out the first line unchanged. If however they are not, then it prefixes the newline by the desired string.

Resources