Insert multiple lines and keep their indentation with sed - bash

I have some Python functions that I want to insert in a file. Inserting multiple lines in itself works well using a variable and some \n, but the indentation isn't kept. Because it's Python code, that's a big issue: the code can't work as it is.
Here is what I tried:
cat sed-insertlines.sh
#!/bin/bash
read -r -d '' lines_to_insert << 'EOF'
def string_cleanup(x, notwanted):\n
    for item in notwanted:\n
        x = re.sub(item, '', x)\n
    return x\n
EOF
lines_to_insert=$(echo ${lines_to_insert} )
sed -i "/import re # Regular Expression library/a $lines_to_insert" sed-insertlines.txt
But here is what I get in the end when I cat sed-insertlines.txt:
#!/bin/python
import re # Regular Expression library
def string_cleanup(x, notwanted):
for item in notwanted:
x = re.sub(item, '', x)
return x
def string_replace(i_string, pattern, newpattern):
    string_corrected = re.sub(pattern, newpattern, i_string)
    return string_corrected
Lines are there but the indentation is gone.

First, let's get the data cleanly into a shell variable. Here's one way:
lines_to_insert=$(cat<<'EOF'
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
EOF
)
Note that there are no \n added; you can just use the text you want to insert unmodified with the sole restriction that it can't contain a line consisting of exactly EOF (and if it does, you can change the here-doc delimiter.) Unfortunately, the later use of sed will modify the text by interpreting some backslash-sequences.
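As a quick check of where the indentation went in the original attempt, here is a minimal sketch comparing a quoted and an unquoted expansion of the captured text. The variable names quoted and unquoted are made up for this illustration; the unquoted echo mirrors the `echo ${lines_to_insert}` line from the question.

```shell
#!/usr/bin/env bash
lines_to_insert=$(cat <<'EOF'
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
EOF
)

# Quoted expansion: newlines and leading spaces survive.
quoted=$(printf '%s\n' "$lines_to_insert")

# Unquoted expansion (deliberately wrong): the shell word-splits the text,
# and echo joins the words back with single spaces.
unquoted=$(echo $lines_to_insert)

printf 'quoted:\n%s\n\nunquoted:\n%s\n' "$quoted" "$unquoted"
```

The unquoted form collapses everything onto one line, which is why the text has to stay inside double quotes until it reaches sed.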
The correct syntax for the sed a command would be the following:
sed -i '/^import re/a \
def string_cleanup(x, notwanted):\
    for item in notwanted:\
        x = re.sub(item, '', x)\
    return x
'
(The commonly-seen sed 'a line to insert' is not POSIX standard, and does not allow you to put leading spaces on the line. The correct syntax is as shown above: an a followed by whitespace, followed by a continuation marker and a newline.)
Note that every line except the last ends with a continuation marker (a trailing backslash). We could have put those in the text above, but that would defeat the goal of allowing you to use precisely the text you want inserted.
Instead, when we interpolate the shell variable into the sed command, we'll insert the backslashes using the global search-and-replace syntax:
# The following works with bash 4.3 and up.
# Note the doubled backslash after the a: inside double quotes the shell
# removes a lone backslash-newline pair, so \\ is needed for sed to
# actually receive the continuation marker.
sed -i.bak "/^import re/a \\
${lines_to_insert//$'\n'/$'\\\n'}
" sed-insertlines.txt
# Prior to v4.3, quoting worked differently in replacement
# patterns, and there was a bug with `$'...'` quoting. The
# following will work with all bashes I tested (starting with v3.2):
nl=$'\n' bsnl=$'\\\n'
sed -i.bak "/^import re/a \\
${lines_to_insert//$nl/$bsnl}
" sed-insertlines.txt
Another solution is to use the mapfile command to read the lines into an array:
mapfile -t lines_to_insert <<'EOF'
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
EOF
Now we can add the backslashes using printf:
sed -i.bak "/^import re/a \\
$(printf '%s\\\n' "${lines_to_insert[@]}")
" sed-insertlines.txt
(The search-and-replace syntax would work on the array as well, but I think the printf command is more readable.)
Unfortunately, that adds an extra newline after the text because all of the lines in the original text were continued. If that's undesired, it could easily be removed in the second solution by inserting the backslash and newline at the beginning of the printf instead of the end, making a slightly less-readable command:
sed -i.bak "/^import re/a $(printf '\\\n%s' "${lines_to_insert[@]}")
" sed-insertlines.txt
Finally, based on a nice answer by Benjamin W, here's a version which uses the sed r command and process substitution (to avoid a temporary file):
sed '/^import re/r '<(cat<<'EOF'
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
EOF
) sed-insertlines.txt
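The variable-based variants from this answer can be tried without touching a real file. The following is only a sketch under a few assumptions: GNU sed, bash 4 or later (for mapfile), and a throwaway temporary file standing in for sed-insertlines.txt. The names target, escaped, line_array, prefixed and result are illustrative.

```shell
#!/usr/bin/env bash
# Throwaway stand-in for sed-insertlines.txt (hypothetical sample content).
target=$(mktemp)
printf '%s\n' '#!/bin/python' \
    'import re # Regular Expression library' \
    'def string_replace(i_string, pattern, newpattern):' \
    '    return re.sub(pattern, newpattern, i_string)' > "$target"

lines_to_insert=$(cat <<'EOF'
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
EOF
)

# Variant 1: parameter expansion turns every newline into backslash-newline.
nl=$'\n' bsnl=$'\\\n'
escaped=${lines_to_insert//$nl/"$bsnl"}

# Variant 2: mapfile plus printf, with the backslash-newline placed first.
mapfile -t line_array <<< "$lines_to_insert"
prefixed=$(printf '\\\n%s' "${line_array[@]}")

# No -i here, so the sample file itself is left unchanged.
result=$(sed "/^import re/a \\
$escaped
" "$target")
printf '%s\n' "$result"
rm -f "$target"
```

Both variants produce the same escaped text; the printf form simply carries the leading continuation marker with it, which is why it can follow the a directly.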

I would use the sed r command, which inserts the contents of a file after the current cycle:
#!/bin/bash
# Write code to be inserted into 'insertfile' with proper indentation
cat <<'EOF' > insertfile
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
EOF
# Sed with r command
sed -i '/import re # Regular Expression library/r insertfile' sed-insertlines.txt
# Remove temp file
rm -f insertfile
resulting in
import re # Regular Expression library
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
def string_replace(i_string, pattern, newpattern):
    string_corrected = re.sub(pattern, newpattern, i_string)
    return string_corrected

Awk solution for this in case you're interested :
python_file:
#!/bin/python
import re # Regular Expression library
def string_replace(i_string, pattern, newpattern):
    string_corrected = re.sub(pattern, newpattern, i_string)
    return string_corrected
Our Script
#!/bin/bash
read -rd '' lines_to_insert << 'EOF'
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
EOF
awk -v from_shell="$lines_to_insert" '
{
    if ($0 ~ /import re # Regular Expression library/){
        printf "%s\n%s\n",$0,from_shell
    }
    else{
        print $0
    }
}' python_file
Output:
#!/bin/python
import re # Regular Expression library
def string_cleanup(x, notwanted):
    for item in notwanted:
        x = re.sub(item, '', x)
    return x
def string_replace(i_string, pattern, newpattern):
    string_corrected = re.sub(pattern, newpattern, i_string)
    return string_corrected
Note :
I have removed the \ns from the $lines_to_insert.

Related

Use `sed` to replace text in code block with output of command at the top of the code block

I have a markdown file that has snippets of code resembling the following example:
```
$ cat docs/code_sample.sh
#!/usr/bin/env bash
echo "Hello, world"
```
This means there's a file at the location docs/code_sample.sh, whose contents are:
#!/usr/bin/env bash
echo "Hello, world"
I'd like to parse the markdown file with sed (awk or perl works too) and replace the bottom section of the code snippet with whatever the above bash command evaluates to, for example whatever cat docs/code_sample.sh evaluates to.
Perl to the rescue!
perl -0777 -pe 's/(?<=```\n)^(\$ (.*)\n\n)(?^s:.*?)(?=```)/"$1".qx($2)/meg' < input > output
-0777 slurps the whole file into memory
-p prints the input after processing
s/PATTERN/REPLACEMENT/ works similarly to a substitution in sed
/g replaces globally, i.e. as many times as it can
/m makes ^ match start of each line instead of start of the whole input string
/e evaluates the replacement as code
(?<=```\n) means "preceded by three backquotes and a newline"
(?^s:.*?) changes the behaviour of . to match newlines as well, so it matches (frugally because of the *?) the rest of the preformatted block
(?=```) means "followed by three backquotes"
qx runs the parameter in a shell and returns its output
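For comparison, the same idea can be sketched as a plain bash loop. This is only an illustration, not a replacement for the Perl one-liner: refresh_snippets is a hypothetical name, the fence is assumed to be a line consisting of exactly three backquotes, and, like the Perl version, it executes whatever follows the `$ ` prompt, so it should only be run on trusted input.

```shell
#!/usr/bin/env bash
# refresh_snippets (hypothetical helper): reads markdown on stdin and writes
# it to stdout, replacing the body of each fenced block that starts with a
# "$ command" line by the current output of that command.
refresh_snippets() {
    local in_block=0 skipping=0 line
    while IFS= read -r line; do
        if [[ $line == '```' ]]; then
            # A fence toggles the block state and ends any skipping.
            in_block=$(( 1 - in_block ))
            skipping=0
            printf '%s\n' "$line"
        elif (( in_block )) && [[ $line == '$ '* ]]; then
            # Keep the prompt line, add a blank line, then run the command.
            printf '%s\n\n' "$line"
            bash -c "${line#\$ }" </dev/null
            # Drop the stale content up to the closing fence.
            skipping=1
        elif (( ! skipping )); then
            printf '%s\n' "$line"
        fi
    done
}
```

Usage would look like `refresh_snippets < input.md > output.md`.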
A sed-only solution is easier if you have the GNU version with an e command.
That said, here's a quick, simplistic, and kinda clumsy version I knocked out that doesn't bother to check the values of previous or following lines - it just assumes your format is good, and bulls through without any looping or anything else. Still, for my example code, it worked.
I started by making an a, a b, and an x that is the markup file.
$: cat a
#! /bin/bash
echo "Hello, World!"
$: cat b
#! /bin/bash
echo "SCREW YOU!!!!"
$: cat x
```
$ cat a
foo
bar
" b a z ! "
```
```
$ cat b
foo
bar
" b a z ! "
```
Then I wrote s which is the sed script.
$: cat s
#! /bin/env bash
sed -En '
  /^```$/,/^```$/ {
    # for the lines starting with the $ prompt
    /^[$] / {
      # save the command to the hold space
      x
      # write the ``` header to the pattern space
      s/.*/```/
      # print the fabricated header
      p
      # swap the command back in
      x
      # the next line should be blank - add it to the current pattern space
      N
      # first print the line of code as-is with the (assumed) following blank line
      p
      # scrub the $ (prompt) off the command
      s/^[$] //
      # execute the command - store the output into the pattern space
      e
      # print the output
      p
      # put the markdown footer back
      s/.*/```/
      # and print that
      p
    }
    # for the (to be discarded) existing lines of "content"
    /^[^`$]/d
  }
' "$@"
It does the job and might get you started.
$: s x
```
$ cat a
#! /bin/bash
echo "Hello, World!"
```
```
$ cat b
#! /bin/bash
echo "SCREW YOU!!!!"
```
Lots of caveats - better to actually check that the $ follows a line of backticks and is followed by a blank line, maybe make sure nothing bogus could be in the file to get executed... but this does what you asked, with (GNU) sed.
Good luck.
A rare case when use of getline would be appropriate:
$ cat tst.awk
state == "importing" {
    while ( (getline line < $NF) > 0 ) {
        print line
    }
    close($NF)
    state = "imported"
}
$0 == "```" { state = (state ? "" : "importing") }
state != "imported" { print }
$ awk -f tst.awk file
See http://awk.freeshell.org/AllAboutGetline for getline uses and caveats.

Sed fails to update long text

Consider test file csf.conf:
CC_DENY = ""
Running the command:
sed -i -E 's/(CC_DENY *= *")[^"]+/\1AR,BE,CL,CN,CO,CS,ES,FR,GR,HK,IT,KO,PA,PE,PH,PL,RS,RU,SG,SK,TH,UA,VN,AE,AF,AL,AS,AZ,BA,BD,BF,BH,BJ,BN,CI,DJ,EG,EH,ER,ET,GM,GN,GW,IQ,IR,IS,JO,KG,KM,KW,KZ,LB,LY,MC,MK,ML,MR,MV,MY,NE,NG,OM,PK,PS,QA,SA,SD,SL,SN,SO,SY,TD,TJ,TM,TN,TR,UZ,XK,YE,YT/g' csf.conf
Does not replace the match inside the file. Output should look like this:
CC_DENY="AR,BE,CL,CN,CO,CS,ES,FR,GR,HK,IT,KO,PA,PE,PH,PL,RS,RU,SG,SK,TH,UA,VN,AE,AF,AL..."
Sed v4.2.2, same result on Debian 8, and Centos 7
This has nothing to do with long text, your regexp just doesn't match the content of your file. Change [^"]+ to [^"]* so it'll match even when there's nothing between the double quotes "". Look:
$ cat csf.conf
CC_DENY = ""
$ sed -E 's/(CC_DENY *= *")[^"]+/\1foo/' csf.conf
CC_DENY = ""
$ sed -E 's/(CC_DENY *= *")[^"]*/\1foo/' csf.conf
CC_DENY = "foo"
Regarding the comment below from the OP that this sed command works:
$ cat file
LF_SPI = ""
$ sed -E 's/(LF_SPI *= *\")[^\"]+/\1blah/g' file
LF_SPI = ""
Clearly and predictably, no it does not. It simply can't because the regexp metacharacter + means 1 or more so [^\"]+ states there must be at least one non-" after the " and that just does not exist in the input file. There is no reason to escape the double quotes btw.
Suppose the current variable value in the file is empty. Then your regular expression doesn't match, because [^"]+ means "any character except a double quote, repeated one or more times".
You might fix it by replacing + quantifier with * (zero or more times). But suppose the value contains a double quote:
CC_DENY = "\""
Then the [^"]* will match everything until it gets to the double quote within the value.
Thus, I suggest the following command:
# Put the variable value here
value='AR,BE\\" ... YE,YT';
sed -i -r 's/^( *CC_DENY *= *").*"/\1'"$value"'"/' csf.conf
Also note that the expression above anchors the match to the beginning of the line. Without the anchor, the match could start in the wrong place if text like CC_DENY = "... also appears inside the value in the configuration file: CC_DENY = "SOMETHING_CC_DENY = \"value\"".
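A small sketch of the anchored command, run as a filter so that no file is modified. set_cc_deny is a hypothetical helper name, and the short country list is just sample data. It assumes the new value contains no /, & or backslash, since sed treats those specially in the replacement.

```shell
#!/usr/bin/env bash
# set_cc_deny (hypothetical helper): reads config lines on stdin and writes
# them to stdout with the CC_DENY value replaced by the first argument.
# Assumption: the value contains no /, & or backslash.
set_cc_deny() {
    local value=$1
    sed -E 's/^( *CC_DENY *= *").*"/\1'"$value"'"/'
}

# Empty value in the file.
printf '%s\n' 'CC_DENY = ""' | set_cc_deny 'AR,BE,CL'
# prints: CC_DENY = "AR,BE,CL"

# Existing value that itself contains an escaped double quote.
printf '%s\n' 'CC_DENY = "a\"b"' | set_cc_deny 'AR,BE,CL'
# prints: CC_DENY = "AR,BE,CL"
```

Because `.*"` is greedy, it runs to the last double quote on the line, so both the empty value and the value with an embedded quote are replaced as a whole.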
Sed is certainly the wrong tool for this:
#!/usr/bin/awk -f
BEGIN {
FS = OFS = "\42"
}
$2 = "AR,BE,CL,CN,CO,CS,ES,FR,GR,HK,IT,KO,PA,PE,PH,PL,RS,RU,SG,SK,TH,UA,VN," \
"AE,AF,AL,AS,AZ,BA,BD,BF,BH,BJ,BN,CI,DJ,EG,EH,ER,ET,GM,GN,GW,IQ,IR,IS,JO,KG," \
"KM,KW,KZ,LB,LY,MC,MK,ML,MR,MV,MY,NE,NG,OM,PK,PS,QA,SA,SD,SL,SN,SO,SY,TD,TJ," \
"TM,TN,TR,UZ,XK,YE,YT"

Bash sed deleting lines with words existing in another pattern

I've got console output, something like:
SECTION/foo
SECTION/fo1
SECTION/fo3
Foo = N
Fo1 = N
Fo2 = N
Fo3 = N
Bar = Y
as an output, I want to have:
Foo = N
Fo1 = N
Fo3 = N
Any (simple) solution?
Thanks in advance!
Using awk you can do:
awk -F' *[/=] *' '$1 == "SECTION" {a[tolower($2)]} tolower($1) in a' file
Foo = N
Fo1 = N
Fo3 = N
Description:
We split each line using custom field separator as ' *[/=] *' which means / or = surrounded with 0 or more spaces on each side.
When first field is SECTION then we store each lowercase column 2 into an array a
Later when lowercase first column is found in array a then we print each line (default action).
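The same two steps can be sketched in pure bash with an associative array. This is an illustration only: filter_sections is a hypothetical name, it needs bash 4 or later (for declare -A and the ${var,,} lowercase expansion), and it assumes the exact "Name = Value" spacing shown in the question.

```shell
#!/usr/bin/env bash
# filter_sections (hypothetical helper): reads the console output on stdin
# and prints only the "Name = Value" lines whose name was announced earlier
# by a SECTION/name line, compared case-insensitively.
filter_sections() {
    local -A wanted=()
    local line key
    while IFS= read -r line; do
        if [[ $line == SECTION/* ]]; then
            # Step 1: remember the lowercased section name.
            key=${line#SECTION/}
            if [[ -n $key ]]; then
                wanted[${key,,}]=1
            fi
        elif [[ $line == *' = '* ]]; then
            # Step 2: print the line if its lowercased name was remembered.
            key=${line%% = *}
            if [[ -n $key && -n ${wanted[${key,,}]:-} ]]; then
                printf '%s\n' "$line"
            fi
        fi
    done
}
```

It would be called as `filter_sections < file`.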
Perl to the rescue!
perl -ne ' $h{ ucfirst $1 } = 1 if m(SECTION/(.*));
print if /(.*) = / && $h{$1};
' < input
A hash table is created from lines containing SECTION/. If the line contains = and its left hand side is stored in the hash, it gets printed.
This might work for you (GNU sed):
sed -nr '/SECTION/H;s/.*/&\n&/;G;s/\n.*/\L&/;/\n(.*) .*\n.*\/\1/P' file
Collect all SECTION lines in the hold space (HS). Double the line and delimit by a newline. Append the collected lines from the HS and convert everything from the first newline to the end to lowercase. Using a backreference match the variable to the section suffix and if so print only the first line i.e. the original line unadulterated.
N.B. the -n invokes the grep-like nature of sed and the -r reduces the number of backslashes needed to write a regexp.
awk '$1 ~ /Foo|Fo1|Fo3/' file
Foo = N
Fo1 = N
Fo3 = N

Replacing quotation marks with "``" and "''"

I have a document containing many " marks, but I want to convert it for use in TeX.
TeX uses 2 ` marks for the beginning quote mark, and 2 ' marks for the closing quote mark.
I only want to make changes to these when " appears on a single line in an even number (e.g. there are 2, 4, or 6 "'s on the line). For example:
"This line has 2 quotation marks."
--> ``This line has 2 quotation marks.''
"This line," said the spider, "Has 4 quotation marks."
--> ``This line,'' said the spider, ``Has 4 quotation marks.''
"This line," said the spider, must have a problem, because there are 3 quotation marks."
--> (unchanged)
My sentences never break across lines, so there is no need to check on multiple lines.
There are few quotes with single quotes, so I can manually change those.
How can I convert these?
This is my one-liner, which works for me:
awk -F\" '{if((NF-1)%2==0){res=$0;for(i=1;i<NF;i++){to="``";if(i%2==0){to="'\'\''"}res=gensub("\"", to, 1, res)};print res}else{print}}' input.txt >output.txt
And here is a long version of this one-liner with comments:
BEGIN {
    FS = "\""                       # set field separator to double quote before any line is read
}
{
    if ((NF-1) % 2 == 0) {          # if the number of double quotes on the line is even
        res = $0                    # save original line to res variable
        for (i = 1; i < NF; i++) {  # for each double quote
            to = "``"               # replace current occurrence of double quote by ``
            if (i % 2 == 0) {       # if it is a closing quote, replace by ''
                to = "''"
            }
            # replace the first remaining " by to in res (gensub is specific to GNU awk)
            res = gensub("\"", to, 1, res)
        }
        print res                   # print resulting line
    } else {
        print                       # print original line when nothing to change
    }
}
You may run this script by:
awk -f replace-quotes.awk input.txt >output.txt
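For readers without gawk, the same even-count rule can be sketched in plain bash. This is an illustration under the question's own assumption that quotes never span lines; texquote is a hypothetical name.

```shell
#!/usr/bin/env bash
# texquote (hypothetical helper): reads text on stdin and, on lines with an
# even number of double quotes, replaces them alternately with `` and ''.
# Lines with an odd number of double quotes are printed unchanged.
texquote() {
    local line rest out only open
    while IFS= read -r line; do
        # Keep only the double quotes so their count is the string length.
        only=${line//[^\"]/}
        if (( ${#only} % 2 )); then
            printf '%s\n' "$line"
            continue
        fi
        out='' rest=$line open=1
        while [[ $rest == *\"* ]]; do
            out+=${rest%%\"*}          # text before the next double quote
            if (( open )); then out+='``'; else out+="''"; fi
            open=$(( 1 - open ))       # alternate opening and closing
            rest=${rest#*\"}           # drop everything up to that quote
        done
        printf '%s\n' "$out$rest"
    done
}
```

It would be run as `texquote < input.txt > output.txt`.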
Here's my one-liner using repeated sed's:
cat file.txt | sed -e 's/"\([^"]*\)"/`\1`/g' | sed '/"/s/`/\"/g' | sed -e 's/`\([^`]*\)`/``\1'\'''\''/g'
(note: it won't work correctly if there are already back-ticks (`) in the file but otherwise should do the trick)
EDIT:
Removed back-tick bug by simplifying, now works for all cases:
cat file.txt | sed -e 's/"\([^"]*\)"/``\1'\'\''/g' | sed '/"/s/``/"/g' | sed '/"/s/'\'\''/"/g'
With comments:
cat file.txt # read file
| sed -e 's/"\([^"]*\)"/``\1'\'\''/g' # initial replace
| sed '/"/s/``/"/g' # revert `` to " on lines with extra "
| sed '/"/s/'\'\''/"/g' # revert '' to " on lines with extra "
Using awk
awk '{n=gsub("\"","&")}!(n%2){while(n--){n%2?Q=q:Q="`";sub("\"",Q Q)}}1' q=\' in
Explanation
awk '{
n=gsub("\"","&") # set n to the number of quotes in the current line
}
!(n%2){ # if there are even number of quotes
while(n--){ # as long as we have double-quotes
n%2?Q=q:Q="`" # alternate Q between a backtick and single quote
sub("\"",Q Q) # replace the next double quote with two of whatever Q is
}
}1 # print out all other lines untouched'
q=\' in # set the q variable to a single quote and pass the file 'in' as input
Using sed
sed '/^\([^"]*"[^"]*"[^"]*\)*$/s/"\([^"]*\)"/``\1'\'\''/g' in
This might work for you:
sed 'h;s/"\([^"]*\)"/``\1'\'\''/g;/"/g' file
Explanation:
Make a copy of the original line h
Replace pairs of "'s s/"\([^"]*\)"/``\1'\'\''/g
Check for odd " and if found revert to original line /"/g

Grep search strings with line breaks

How to use grep to output occurrences of the string 'export to excel' in the input files given below? Specifically, how to handle the line breaks that happen in between the search strings? Is there a switch in grep that can do this or some other command probably?
Input files:
File a.txt:
blah blah ... export to
excel ...
blah blah..
File b.txt:
blah blah ... export to excel ...
blah blah..
Do you just want to find files that contain the pattern, ignoring linebreaks, or do you want to actually see the matching lines?
If the former, you can use tr to convert newlines to spaces:
tr '\n' ' ' < a.txt | grep 'export to excel'
If the latter you can do the same thing, but you may want to use the -o flag to only print the actual match. You'll then want to adjust your regex to include any extra context you want.
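For the first case (just finding which files contain the phrase), the tr approach can be wrapped in a small loop. This is a sketch: find_phrase_files is a hypothetical name, and the pattern is loosened to accept any run of whitespace between the words, which is an assumption beyond the literal phrase.

```shell
#!/usr/bin/env bash
# find_phrase_files (hypothetical helper): prints the name of every file
# given as an argument that contains "export to excel", even when the
# phrase is broken across lines.
find_phrase_files() {
    local f
    for f in "$@"; do
        # Join the lines with spaces, then search the single long line.
        if tr '\n' ' ' < "$f" | grep -qE 'export[[:space:]]+to[[:space:]]+excel'; then
            printf '%s\n' "$f"
        fi
    done
}
```

Called as `find_phrase_files a.txt b.txt`, it should list both sample files from the question.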
I don't know how to do this in grep. I checked the man page for egrep(1) and it can't match with a newline in the middle either.
I like the solution @Laurence Gonsalves suggested, of using tr(1) to wipe out the newlines. But as he noted, it will be a pain to print the matching lines if you do it that way.
If you want to match despite a newline and then print the matching line(s), I can't think of a way to do it with grep, but it would be not too hard in any of Python, AWK, Perl, or Ruby.
Here's a Python script that solves the problem. I decided that, for lines that only match when joined to the previous line, I would print a --> arrow before the second line of the match. Lines that match outright are always printed without the arrow.
This is written assuming that /usr/bin/python is Python 2.x. You can trivially change the script to work under Python 3.x if desired.
#!/usr/bin/python
import re
import sys
s_pat = "export\s+to\s+excel"
pat = re.compile(s_pat)
def print_ete(fname):
    try:
        f = open(fname, "rt")
    except IOError:
        sys.stderr.write('print_ete: unable to open file "%s"\n' % fname)
        sys.exit(2)
    prev_line = ""
    i_last = -10
    for i, line in enumerate(f):
        # is ete within current line?
        if pat.search(line):
            print "%s:%d: %s" % (fname, i+1, line.strip())
            i_last = i
        else:
            # construct extended line that included previous
            # note newline is stripped
            s = prev_line.strip("\n") + " " + line
            # is ete within extended line?
            if pat.search(s):
                # matched ete in extended so want both lines printed
                # did we print prev line?
                if not i_last == (i - 1):
                    # no so print it now
                    print "%s:%d: %s" % (fname, i, prev_line.strip())
                # print cur line with special marker
                print "--> %s:%d: %s" % (fname, i+1, line.strip())
                i_last = i
        # make sure we don't match ete twice
        prev_line = re.sub(pat, "", line)
try:
    if sys.argv[1] in ("-h", "--help"):
        raise IndexError # print help
except IndexError:
    sys.stderr.write("print_ete <filename>\n")
    sys.stderr.write('grep-like tool to print lines matching "%s"\n' %
            "export to excel")
    sys.exit(1)
print_ete(sys.argv[1])
EDIT: added comments.
I went to some trouble to make it print the correct line number on each line, using a format similar to what you would get with grep -Hn.
It could be much shorter and simpler if you don't need line numbers, and you don't mind reading in the whole file at once into memory:
#!/usr/bin/python
import re
import sys
# This pattern not compiled with re.MULTILINE on purpose.
# We *want* the \s pattern to match a newline here so it can
# match across multiple lines.
# Note the match group that gathers text around ete pattern uses a character
# class that matches anything but "\n", to grab text around ete.
s_pat = "([^\n]*export\s+to\s+excel[^\n]*)"
pat = re.compile(s_pat)
def print_ete(fname):
    try:
        text = open(fname, "rt").read()
    except IOError:
        sys.stderr.write('print_ete: unable to open file "%s"\n' % fname)
        sys.exit(2)
    for s_match in re.findall(pat, text):
        print s_match
try:
    if sys.argv[1] in ("-h", "--help"):
        raise IndexError # print help
except IndexError:
    sys.stderr.write("print_ete <filename>\n")
    sys.stderr.write('grep-like tool to print lines matching "%s"\n' %
            "export to excel")
    sys.exit(1)
print_ete(sys.argv[1])
grep -A1 "export to" filename | grep -B1 "excel"
I have tested this a little and it seems to work:
sed -n '$b; /export to excel/{p; b}; N; /export to\nexcel/{p; b}; D' filename
You can allow for some extra white space at the end and beginning of the lines like this:
sed -n '$b; /export to excel/{p; b}; N; /export to\s*\n\s*excel/{p; b}; D' filename
Use gawk. Set the record separator to excel, then check for "export to".
gawk -vRS="excel" '/export.*to/{print "found export to excel at record: "NR}' file
or
gawk '/export.*to.*excel/{print}
/export to/&&!/excel/{
s=$0
getline line
if (line~/excel/){
printf "%s\n%s\n",s,line
}
}' file
