How to extract multiple lines between header and footer in bash [duplicate]

If I have a bash variable that contains the following string:
my signed 1.5 tag
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTZbQlAAoJEF0+sviABDDrZbQH/09PfE51KPVPlanr6q1v4/Ut
LQxfojUWiLQdg2ESJItkcuweYg+kc3HCyFejeDIBw9dpXt00rY26p05qrpnG+85b
hM1/PswpPLuBSr+oCIDj5GMC2r2iEKsfv2fJbNW8iWAXVLoWZRF8B0MfqX/YTMbm
ecorc4iXzQu7tupRihslbNkfvfciMnSDeSvzCpWAHl7h8Wj6hhqePmLm9lAYqnKp
8S5B/1SSQuEAjRZgI4IexpZoeKGVDptPHxLLS38fozsyi0QyDyzEgJxcJQVMXxVi
RUysgqjcpT8+iQM1PblGfHR4XAhuOqN5Fx06PSaFZhqvWFezJ28/CLyX5q+oIVk=
=EFTF
-----END PGP SIGNATURE-----
commit ca82a6dff817ec66f44342007202690a93763949
Author: Scott Chacon <schacon@gee-mail.com>
Date: Mon Mar 17 21:52:11 2008 -0700
Change version number
In bash, how can I extract the base64 signature and store it in a new variable, so that it will contain exactly
iQEcBAABAgAGBQJTZbQlAAoJEF0+sviABDDrZbQH/09PfE51KPVPlanr6q1v4/Ut
LQxfojUWiLQdg2ESJItkcuweYg+kc3HCyFejeDIBw9dpXt00rY26p05qrpnG+85b
hM1/PswpPLuBSr+oCIDj5GMC2r2iEKsfv2fJbNW8iWAXVLoWZRF8B0MfqX/YTMbm
ecorc4iXzQu7tupRihslbNkfvfciMnSDeSvzCpWAHl7h8Wj6hhqePmLm9lAYqnKp
8S5B/1SSQuEAjRZgI4IexpZoeKGVDptPHxLLS38fozsyi0QyDyzEgJxcJQVMXxVi
RUysgqjcpT8+iQM1PblGfHR4XAhuOqN5Fx06PSaFZhqvWFezJ28/CLyX5q+oIVk=
=EFTF
I tried various combinations of sed and awk, but could not get any of them to work while preserving the line breaks.

Using bash parameter expansions
signature=${string#*$'\n'-----BEGIN PGP SIGNATURE-----$'\n'}
signature=${signature#*$'\n\n'}
signature=${signature%%$'\n'-----END PGP SIGNATURE-----*}
The first assignment removes everything from the beginning of the string up to and including the line consisting of -----BEGIN PGP SIGNATURE-----. The second one removes everything up to and including the first blank line, which separates armor headers such as Version: from the data. The third removes everything from the -----END PGP SIGNATURE----- line to the end of the string. The remaining string is the base64 signature.
Explanation of parameter expansion forms used in the answer:
${var#pattern} is replaced by the content of the variable var with the shortest matching pattern deleted (from the beginning), if the pattern matches the leading portion of the content of the variable var.
${var%%pattern} is replaced by the content of the variable var with the longest matching pattern deleted (from the end), if the pattern matches the trailing portion of the content of the variable var.
For detailed information on all forms of bash parameter expansion, see the Shell Parameter Expansion section of the Bash manual.
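A minimal, self-contained sketch of the three expansions in action. The AAAA/BBBB/=CCCC lines and the trailing commit line are made-up placeholders, and the sample assumes a blank line follows the Version header:

```shell
#!/usr/bin/env bash
# Sample input; the base64 lines are placeholders for illustration only.
string=$'my signed 1.5 tag\n-----BEGIN PGP SIGNATURE-----\nVersion: GnuPG v1\n\nAAAA\nBBBB\n=CCCC\n-----END PGP SIGNATURE-----\ncommit 1234'

# 1. Drop everything up to and including the BEGIN line.
signature=${string#*$'\n'-----BEGIN PGP SIGNATURE-----$'\n'}
# 2. Drop the armor headers, i.e. everything up to and including the first blank line.
signature=${signature#*$'\n\n'}
# 3. Drop the END line and everything after it.
signature=${signature%%$'\n'-----END PGP SIGNATURE-----*}

printf '%s\n' "$signature"
```

Because no external command or command substitution is involved, the line breaks inside the signature are kept exactly as they were.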

sed -n '/---/,/---/{/---/d;/^Version/{n;n};p}'
Between the --- lines, d drops the BEGIN and END marker lines themselves, and n;n skips the Version header and the blank line after it. Every other line in the range is printed with p.
The regexes could obviously be made a bit more strict.

Assuming the empty lines around the signature are always present:
awk -v RS= 'sub(/\n-----END PGP SIGNATURE-----/, ""){print; exit}'
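-v RS= selects paragraph mode: records are separated by one or more blank lines
sub(/\n-----END PGP SIGNATURE-----/, "") is the condition; it is true only for the record in which the substitution succeeds
print; exit prints the modified record and quits
\n is included in the pattern so that the newline before the END line is removed as well; print already appends ORS, which is a newline by default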
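As a rough usage sketch, the awk command can be fed from a variable with a here-string and its output captured with a command substitution. The sample string below is made up (placeholder base64 lines) and assumes blank lines both before the data and after the END line:

```shell
#!/usr/bin/env bash
# Placeholder input with blank lines around the signature block.
string=$'my signed 1.5 tag\n-----BEGIN PGP SIGNATURE-----\nVersion: GnuPG v1\n\nAAAA\nBBBB\n=CCCC\n-----END PGP SIGNATURE-----\n\ncommit 1234'

# Paragraph mode: the record containing the END line is the signature paragraph.
signature=$(awk -v RS= 'sub(/\n-----END PGP SIGNATURE-----/, ""){print; exit}' <<<"$string")

printf '%s\n' "$signature"
```

The command substitution strips the final newline that print adds, so the variable ends with the last signature line.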

Assuming a bash variable str holds the mentioned string, would you please try:
pat=$'-----BEGIN PGP SIGNATURE-----\n.*\n\n([^-]+)\n-----END PGP SIGNATURE-----'
if [[ $str =~ $pat ]]; then
  signature=${BASH_REMATCH[1]}
  echo "$signature"
fi

Related

Replace string containing slash using SED command [duplicate]

I am trying the code below to replace the string /IRM/I with E/IRM/I, but the file is processed with no error and no transformation. I assume I'm using the escape character incorrectly to allow the forward slash. Any help is much appreciated.
sed -i '/\/IRM\/IE\/IRM\/I/g'
A sed command needs to specify an operation (like s to substitute), and that operation requires a delimiter. You don't need to use a slash as that delimiter.
printf '%s\n' 'This is a test: </IRM/I>' | \
sed -e 's#/IRM/I#E/IRM/I#g'
...correctly emits as output:
This is a test: <E/IRM/I>
Note that we added an s at the beginning of your sed expression, and followed it up with a # -- a delimiter that isn't contained anywhere in the source or replacement strings, so you don't need to escape it as you would /.

Split string using \r\n using IFS in bash

I would like to split a string that contains \r\n in bash, but the carriage return and \n cause problems. Can anyone give me a hint for a different IFS? I tried IFS=' |\' too.
input:
projects.google.tests.inbox.document_01\r\nprojects.google.tests.inbox.document_02\r\nprojects.google.tests.inbox.global_02
Code:
IFS=$'\r'
inputData="projects.google.tests.inbox.document_01\r\nprojects.google.tests.inbox.document_02\r\nprojects.google.tests.inbox.global_02"
for line1 in ${inputData}; do
line2=`echo "${line1}"`
echo ${line2} //Expected one by one entry
done
Expected:
projects.google.tests.inbox.document_01
projects.google.tests.inbox.document_02
projects.google.tests.inbox.global_02
inputData=$'projects.google.tests.inbox.document_01\r\nprojects.google.tests.inbox.document_02\r\nprojects.google.tests.inbox.global_02'
while IFS= read -r line; do
  line=${line%$'\r'}
  echo "$line"
done <<<"$inputData"
Note:
The string is defined as string=$'foo\r\n', not string="foo\r\n". The latter does not put an actual CRLF sequence in your variable. See ANSI C-like strings on the bash-hackers' wiki for a description of this syntax.
${line%$'\r'} is a parameter expansion which strips a literal carriage return off the end of the contents of the variable line, should one exist.
The practice for reading an input stream line-by-line (used here) is described in detail in BashFAQ #1. Unlike iterating with for, it does not attempt to expand your data as globs.
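If the entries are wanted in an array rather than processed one by one, a possible variant is sketched below. It assumes bash 4 or later (for the mapfile builtin), and the shortened input names are made up for the example:

```shell
#!/usr/bin/env bash
# ANSI C quoting puts real CR and LF characters into the variable.
inputData=$'a.document_01\r\na.document_02\r\na.global_02'

# Read one array element per line; -t strips the trailing newline.
mapfile -t lines <<<"$inputData"

# Strip one trailing carriage return from every element, if present.
lines=("${lines[@]%$'\r'}")

printf '%s\n' "${lines[@]}"
```

As with the while read loop, nothing here is subject to word splitting or glob expansion.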
The following awk command could help with your question.
awk '{gsub(/\\r\\n/,RS)} 1' Input_file
OR
echo "$var" | awk '{gsub(/\\r\\n/,RS)} 1'
Output will be as follows.
projects.google.tests.inbox.document_01
projects.google.tests.inbox.document_02
projects.google.tests.inbox.global_02
Explanation: gsub performs a global substitution; its form is gsub(regex, replacement, target), where target defaults to the current line. Here the regex /\\r\\n/ matches the literal four characters \r\n (each backslash is escaped so that it is taken literally), and every match is replaced with RS, the record separator, whose default value is a newline. The trailing 1 is a condition that is always true; since no action is given for it, awk performs the default action, which is to print the current line.
EDIT: With a variable you could use as following.
var="projects.google.tests.inbox.document_01\r\nprojects.google.tests.inbox.document_02\r\nprojects.google.tests.inbox.global_02"
echo "$var" | awk '{gsub(/\\r\\n/,RS)} 1'
projects.google.tests.inbox.document_01
projects.google.tests.inbox.document_02
projects.google.tests.inbox.global_02

How to pass string literal containing newlines to grep from bash script

I am trying to pass the "strings" from a file as input to grep using the -F (fixed string) parameter.
From the grep man page, the expected format is newline-separated:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
How can this be done in bash? I have:
#!/bin/bash
INFILE=$1
DIR=$2
# Create a newline-separated string array
STRINGS="";
while read -r string; do
STRINGS+=$'\n'$string;
done < <(strings $INFILE);
cd $DIR
for file in *; do
grep -Frn \"$STRINGS\" .
done;
But grep reports error at run-time regarding input formatting. Grep is interpreting the passed string arguments as parameters -- hence the need to pass them as one large string literal.
Debugging bash with -x and passing the first parameter (INFILE) as the script itself gives:
+ grep -Frn '"' '#!/bin/bash' 'INFILE=$1' 'DIR=$2' [...]
Try the following:
#!/bin/bash
inFile=$1
dir=$2
# Read all lines output by `strings` into a single variable using
# a command substitution, $(...).
# Note that trailing newlines are trimmed, but grep still recognizes
# the last line.
strings="$(strings "$inFile")"
cd "$dir"
for file in *; do
  grep -Frn "$strings" .
done
strings outputs each string found in the target file on its own line, so you can use its output as-is, via a command substitution ($(...)).
On a side note: strings is used to extract strings from binary files, and strings are only included if they're at least 4 ASCII(!) characters long and are followed by a newline or NUL.
Note that while the POSIX spec for strings does mandate locale-awareness with respect to character interpretation, both GNU strings and BSD/macOS strings recognize 7-bit ASCII characters only.
If, by contrast, your search strings come from a text file from which you want to strip empty and blank lines, use strings="$(awk 'NF>0' "$inFile")"
Double-quote your variable references and command substitutions to ensure that their values are used as-is.
Do not use \" unless you want to pass a literal " char. to the target command - as opposed to an unquoted one that has syntactical meaning to the shell.
In your particular case, \"$STRINGS\" breaks down as follows:
An unquoted reference to variable $STRINGS - because the enclosing " are \-escaped and therefore literals.
The resulting string - "<value-of-$STRINGS>" - is then, due to $STRINGS being unquoted, subject to word-splitting (and globbing), i.e., split into multiple arguments by whitespace. As a result, because grep expects the search term(s) as a single argument, the command breaks.
Do not use all-uppercase shell variable names in order to avoid conflicts with environment variables and special shell variables.
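To see the newline-separated pattern list at work in isolation, here is a small sketch. The temporary directory, file names and search strings are all made up for the demonstration:

```shell
#!/usr/bin/env bash
# Hypothetical demo data in a throwaway directory.
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmp/one.txt"
printf 'gamma\n'       > "$tmp/two.txt"

# Two fixed strings in ONE argument, separated by a newline.
patterns=$'alpha\ngamma'

# Double-quoted, the variable reaches grep as a single pattern list;
# a line is reported if it contains either string.
matches=$(grep -Frn -- "$patterns" "$tmp" | sort)
printf '%s\n' "$matches"

rm -rf "$tmp"
```

Leaving the quotes off "$patterns" would split it into two arguments, and grep would then treat the second one as a file name instead of a pattern.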

Shell - check if file contains string with newlines inside [duplicate]

I've declared a string with two newlines inside it:
somestring=$'\n##### Branch FREEZE enable/disable\nRelease:'
I have a $file with a text inside like this
###############################
##### Branch RELEASE enable/disable
Release: disable
##### Branch FREEZE enable/disable
Freeze: disable
##### Mail list #####
I am trying to find out whether the file contains this string, including both newlines, with the command
if grep -q "$somestring" "$file"; then
  echo "found the string"
fi
But the result is always positive when there is a newline inside the string.
How can I make it work correctly with newlines inside?
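grep patterns are matched against individual lines, so there is no way for a pattern to match a newline found in the input. A pattern argument that contains newlines is instead treated as a list of alternative patterns; the leading newline in $somestring produces an empty pattern, which matches every line, hence the always-positive result.
Try pcregrep instead of regular grep:
pcregrep -M "pattern1.*\n.*pattern2" filename
The -M option allows it to match across multiple lines, so you can search for newlines as \n.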
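When the search string is literal text rather than a regex and the file is small enough to hold in a variable, a plain bash substring test is another option. This is only a sketch of that alternative, not what the answer above proposes; the temporary file and the needle_hit string are made up, while needle_miss is the string from the question:

```shell
#!/usr/bin/env bash
# Throwaway file with contents modelled on the question.
file=$(mktemp)
printf '%s\n' '##### Branch RELEASE enable/disable' 'Release: disable' \
              '##### Branch FREEZE enable/disable' 'Freeze: disable' > "$file"

needle_hit=$'disable\n##### Branch FREEZE enable/disable\nFreeze:'
needle_miss=$'\n##### Branch FREEZE enable/disable\nRelease:'

# $(<file) reads the whole file (trailing newlines are stripped).
contents=$(<"$file")

# Quoting the needle makes [[ ]] compare it literally, newlines included.
if [[ $contents == *"$needle_hit"* ]]; then hit=yes; else hit=no; fi
if [[ $contents == *"$needle_miss"* ]]; then miss=yes; else miss=no; fi
echo "hit=$hit miss=$miss"

rm -f "$file"
```

Here the second test is negative because the line after the FREEZE header starts with Freeze:, not Release:.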

Delete all comments in a file using sed

How would you use sed to delete all comments (introduced by #) from a file, while leaving alone a '#' that appears inside a string?
This helped out a lot except for the string portion.
If # always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
Strings can likely span lines
A regular expression can't tell the difference between apostrophes and single quotes
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
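To compare the blunt variant with the quote-aware one, here is a small sketch on made-up input (the three sample lines are hypothetical, and the pre-conditions listed above are assumed to hold):

```shell
#!/usr/bin/env bash
# Hypothetical input: a # inside a string, a trailing comment, a whole-line comment.
input=$'x = "a # b"  # real comment\ny = 1 # another\n# whole line'

# Blunt variant: cuts at the first # on the line, even inside the string.
naive=$(sed 's:#.*$::' <<<"$input")

# Quote-aware variant: only cuts at a # with no double quote after it.
quoted_aware=$(sed 's:#[^"]*$::' <<<"$input")

printf 'naive:\n%s\n\nquote-aware:\n%s\n' "$naive" "$quoted_aware"
```

The blunt version truncates the first line in the middle of the string, while the quote-aware version keeps "a # b" intact and still drops all three comments. Both leave trailing whitespace and an empty line behind.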
This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
/#/!b if the line does not contain a # bail out
s/^/\n/ insert a unique marker (\n)
ta;:a jump to a loop label (resets the substitute true/false flag)
s/\n$//;t if marker at the end of the line, remove and bail out
s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker forward of it and loop.
s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.
Since the asker provided no sample input, I will assume a couple of cases, and that the input file is a Bash script, because bash is used as the tag of the question.
Case 1: entire line is the comment
The following should be sufficient in most cases:
sed '/^\s*#/d' file
It matches any line that has zero or more leading white-space characters (space, tab, or a few others, see man isspace), followed by a #, and deletes the line with the d command.
Any lines like:
# comment started from beginning.
# any number of white-space character before
# or 'quote' in "here"
They will be deleted.
But
a="foobar in #comment"
will not be deleted, which is the desired result.
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#[^\"']*$//" file
[^\"'] is used to prevent quoted string confusion; however, it also means that comments containing a quotation mark ' or " will not be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
To remove comment lines (lines whose first non-whitespace character is #) but not shebang lines (lines whose first characters are #!):
sed '/^[[:space:]]*#[^!]/d; /#$/d' file
The first argument to sed is a string containing a sed program consisting of two delete-line commands of the form /regex/d. Commands are separated by ;. The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines. It does not handle trailing comments.
The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:
sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${MYSTRING}"
Example:
# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
HERE
)
printf "\nBEFORE removal:\n\n${S0}\n\n"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf "\nAFTER removal:\n\n${S1}\n\n"
Output:
$ bash test.sh
BEFORE removal:
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
AFTER removal:
#!/usr/bin/env bash
echo 'FOO' # trailing comment
Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#] for the trivial case -- a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\. -- that's not a literal dot, that's a literal backslash, followed by a dot metacharacter which matches any character.
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*" or similarly for single-quoted strings '\(\\.\|[^\']\)*'.
Piecing all of this together, your sed script could look something like this:
s/^\(\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/
But because the script needs to be quoted, and it contains both single and double quotes, there is one more complication. Recall that the shell lets you glue adjacent quoted strings together: "foo"'bar' becomes foobar -- foo in double quotes, and bar in single quotes. Thus you can include a single quote by putting it in double quotes adjacent to your single-quoted string -- '"foo"'"'" is "foo" in single quotes next to ' in double quotes, thus "foo"'; and "' can be expressed as '"' adjacent to "'". And so a string containing both kinds of quotes, foo"'bar, can be quoted as 'foo"' adjacent to "'bar" or, perhaps more realistically for this case, 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar', yielding 'foo"'"'"'bar'.
sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/' file
This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.
Alas, if you may have multi-line quoted strings, this will not work; sed, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.
As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or the rather common $# and ${#param}.
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.
sed 's:^#\(.*\)$:\1:g' filename
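Assuming each comment line starts with a single #, the above command strips the leading # but keeps the rest of the line, so it uncomments the line rather than deleting the comment. To delete such lines entirely, use sed '/^#/d' filename instead.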
