Passing zsh function parameter to grep -E - bash

Background...
Trying to find which commit(s) last touched a specific file.
I can do this on the CLI piping from git-log to grep but I'm trying to wrap this in a zsh function, more for ease of memory.
Here's my function, and then here is the output I'd like to generate with it.
# match lines from git log that start with commit or include the
# filename I'm interested in and then pipe back through grep to color the output
glpg() {
\git log --name-only | \grep -E ‘“$1"|^commit\s\S' | \grep -B1 --color -E ‘$1'
}
Desired usage and output
dwight:assets (add-analytics*) $ glpg clickouts
commit 6662418b8e68e478b95e7254faa6406abdada30f
web/assets/app/viewmodels/clickouts.js
web/assets/app/views/clickouts.html
web/client/app/viewmodels/clickouts.js
web/client/app/views/clickouts.html
--
commit cee37549f613985210c9caf90a48e2cca28d4412
web/client/app/viewmodels/clickouts.js
web/client/app/views/clickouts.html
--
commit df9ea8cd90ff80b89a0c7e2b0657141b105d5e7e
web/client/app/viewmodels/clickouts.js
web/client/app/views/clickouts.html

Three problems.
You use Unicode apostrophes and quotes, ‘ and “. Replace them with ASCII single quotes and double quotes.
You can't use \s and \S to mean space and non-space with a standard (POSIX) grep. Use ' ' and [^ ] instead to be portable.
The argument should be referenced as "$1", including the double quotes ("$@" if you ever want the list of all arguments).
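Putting those three fixes together, an untested sketch of the corrected function (assuming the filename fragment is passed as the first argument, $1) might be:
glpg() {
  # match filename lines containing the search term, plus commit header lines,
  # then grep again to colorize and keep one line of context above each match
  \git log --name-only | \grep -E "$1|^commit [^ ]" | \grep -B1 --color -E "$1"
}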

Related

MacOS Bash Script using sed and grep doesn't work because of grep: repetition-operator operand invalid error

I tried to update this line in pubspec.yaml on macOS
version: 0.2.6-alpha+26
#!/bin/bash
version=$(grep -oE '(?<=version: )[^ ]+' pubspec.yaml)
version=$(echo $version | sed 's/([0-9].[0-9].[0-9])-alpha+([0-9])/\1-alpha+\1/')
sed -i '' "s/version:.*/version: $version/" pubspec.yaml
This script is intended to increment the patch number of the version in the file pubspec.yaml. However, the script contains errors that cause it to throw the following error messages:
grep: repetition-operator operand invalid
sed: 1: "s/([0-9].[0-9].[0-9])-a ...": \1 not defined in the RE
I'd like it to simply work.
It looks like your grep statement derives from this bump_version.sh.
But the macOS grep does not recognize the GNU-grep -P option.
(Aside: The ?<= syntax comes from Perl regular expressions. See "Lookbehind assertions" here.)
But you can drop the regex gymnastics by splitting the line on the dots & dashes, which the Bash "read" command digests easily via IFS:
#!/usr/bin/env bash
grep version pubspec.yaml | (
IFS=' .-' read label major minor patch build;
sed -i '' "s/version:.*/version: $major.$minor.$((patch+1))-$build/" pubspec.yaml
)
Notes:
This script favors piped output over variables & redirection.
I didn't know that IFS can take multiple characters & applied C.Duffy's tip (to omit tr)!
Reusing variables through Process Substitution is left as an exercise for the reader.
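For the curious, a rough sketch of that process-substitution variant (untested, with the same assumptions as above: a single version: line and BSD sed's -i '' syntax) could be:
#!/usr/bin/env bash
# Split the version line on spaces, dots and dashes into reusable variables,
# feeding read via process substitution instead of a pipe.
IFS=' .-' read -r label major minor patch build < <(grep version pubspec.yaml)
# Bump the patch number and rewrite the line in place.
sed -i '' "s/version:.*/version: $major.$minor.$((patch+1))-$build/" pubspec.yaml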

Bash command - how to grep and then truncate but keep grep-ed part?

I am trying to splice out a particular piece of string. I used:
myVar=$(grep --color 'GACCT[ATCG]*AGGTC' FILE.txt | cat)
then, I used the code below to remove everything before and after my desired portion.
myVar1=$(echo ${myVar##*GACCT})
echo ${myVar1%%AGGTC*}
The code works; however, it cuts off the GACCT and AGGTC at the beginning and end of the desired fragment that I want to keep. Is there any way to cut the beginning and end off while still keeping the GACCT and AGGTC?
Thank you!
If you have a GNU grep, you can make use of
myVar=$(grep --color=never -oP 'GACCT\K[ATCG]+(?=AGGTC)' FILE.txt)
See the online demo:
#!/bin/bash
s='GACCTAAATTTGGGCCCAGGTC'
# Original script
myVar=$(grep --color 'GACCT[ATCG]*AGGTC' <<< "$s" | cat)
myVar1=$(echo ${myVar##*GACCT})
echo ${myVar1%%AGGTC*}
# => AAATTTGGGCCC
# My suggestion:
grep --color=never -oP 'GACCT\K[ATCG]+(?=AGGTC)' <<< "$s"
# => AAATTTGGGCCC
With --color=never, your matches are not colored.
The -o option outputs only the matched text, and the -P option enables the PCRE regex engine. It is necessary here since the regex pattern contains PCRE-specific operators, like \K and (?=...).
More details
GACCT - a literal string
\K - operator that makes the regex engine "forget" what has been consumed
[ATCG]+ - one or more letters from the set
(?=AGGTC) - a positive lookahead that requires an AGGTC string immediately to the right of the current location.
Note you can get this result with pcregrep, too, if you install it:
myVar=$(pcregrep -o 'GACCT\K[ATCG]+(?=AGGTC)' FILE.txt)
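Conversely, if you did want to keep the flanking GACCT and AGGTC in the output (as the question literally asks), plain -o without \K and the lookahead already does that:
grep --color=never -oE 'GACCT[ATCG]*AGGTC' <<< 'GACCTAAATTTGGGCCCAGGTC'
# => GACCTAAATTTGGGCCCAGGTC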

Append text to top of file using sed doesn't work for variable whose content has "/" [duplicate]

This question already has answers here:
Using different delimiters in sed commands and range addresses
(3 answers)
Closed 1 year ago.
I have a Visual Studio project, which is developed locally. Code files have to be deployed to a remote server. The only problem is the URLs they contain, which are hard-coded.
The project contains URLs such as ?page=one. For the link to be valid on the server, it must be /page/one.
I've decided to replace all URLs in my code files with sed before deployment, but I'm stuck on slashes.
I know this is not a pretty solution, but it's simple and would save me a lot of time. The total number of strings I have to replace is fewer than 10. The total number of files that have to be checked is ~30.
An example describing my situation is below:
The command I'm using:
sed -f replace.txt < a.txt > b.txt
replace.txt which contains all the strings:
s/?page=one&/pageone/g
s/?page=two&/pagetwo/g
s/?page=three&/pagethree/g
a.txt:
?page=one&
?page=two&
?page=three&
Content of b.txt after I run my sed command:
pageone
pagetwo
pagethree
What I want b.txt to contain:
/page/one
/page/two
/page/three
The easiest way would be to use a different delimiter in your search/replace lines, e.g.:
s:?page=one&:pageone:g
You can use any character as a delimiter that's not part of either string. Or, you could escape it with a backslash:
s/\//foo/
Which would replace / with foo. You'd want to use the escaped backslash in cases where you don't know what characters might occur in the replacement strings (if they are shell variables, for example).
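Applied to the question's replace.txt, and with the replacements adjusted to produce the desired /page/... form, that could look something like:
s:?page=one&:/page/one:g
s:?page=two&:/page/two:g
s:?page=three&:/page/three:g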
The s command can use any character as a delimiter; whatever character comes after the s is used. I was brought up to use a #. Like so:
s#?page=one&#/page/one#g
A very useful but lesser-known fact about sed is that the familiar s/foo/bar/ command can use any punctuation, not only slashes. A common alternative is s#foo#bar#, from which it becomes obvious how to solve your problem.
Add \ before special characters:
s/\?page=one&/\/page\/one/g
etc.
In a system I am developing, the string to be replaced by sed is text input from a user, which is stored in a variable and passed to sed.
As noted earlier in this post, if the string substituted into the sed command block contains the delimiter that sed is using, then sed terminates with a syntax error. Consider the following example:
This works:
$ VALUE=12345
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345
This breaks:
$ VALUE=12345/6
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
sed: -e expression #1, char 21: unknown option to `s'
Replacing the default delimiter is not a robust solution in my case as I did not want to limit the user from entering specific characters used by sed as the delimiter (e.g. "/").
However, escaping any occurrences of the delimiter in the input string would solve the problem.
Consider the below solution of systematically escaping the delimiter character in the input string before having it parsed by sed.
Such escaping can be implemented as a replacement using sed itself, this replacement is safe even if the input string contains the delimiter - this is since the input string is not part of the sed command block:
$ VALUE=$(echo ${VALUE} | sed -e "s#/#\\\/#g")
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345/6
I have converted this to a function to be used by various scripts:
escapeForwardSlashes() {
# Validate parameters
if [ -z "$1" ]
then
echo -e "Error - no parameter specified!"
return 1
fi
# Perform replacement
echo ${1} | sed -e "s#/#\\\/#g"
return 0
}
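For completeness, a quick (hypothetical) session showing the function plugged into the earlier example:
$ VALUE=12345/6
$ VALUE=$(escapeForwardSlashes "$VALUE")   # VALUE is now 12345\/6
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345/6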
This line should work for your 3 examples:
sed -r 's#\?(page)=([^&]*)&#/\1/\2#g' a.txt
I used -r to save some escaping.
The line should be generic for your one, two, three case; you don't have to do the substitution 3 times.
Test with your example (a.txt):
kent$ echo "?page=one&
?page=two&
?page=three&"|sed -r 's#\?(page)=([^&]*)&#/\1/\2#g'
/page/one
/page/two
/page/three
replace.txt should be
s/?page=/\/page\//g
s/&//g
Please see this article:
http://netjunky.net/sed-replace-path-with-slash-separators/
Just using | instead of /
Great answer from Anonymous. Backslash escaping solved my problem when I tried to escape quotes in HTML strings.
So if you use sed to return some HTML templates (on a server), use double backslash instead of single:
var htmlTemplate = "<div style=\\"color:green;\\"></div>";
A simpler alternative is using AWK, as in this answer:
awk '$0="prefix"$0' file > new_file
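That awk one-liner prepends a prefix; adapted to the question's data, a rough (untested) awk sketch could be:
# replace a leading "?page=" with "/page/" and strip the trailing "&"
awk '{ sub(/^[?]page=/, "/page/"); sub(/&$/, "") } 1' a.txt
# => /page/one
# => /page/two
# => /page/three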
You may use an alternative regex delimiter in a search pattern (address) by backslashing it:
sed '\,{some_path},d'
For the s command:
sed 's,{some_path},{other_path},'
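A small self-contained illustration of both forms, using made-up paths:
printf '%s\n' /usr/local/bin /tmp | sed '\,/usr/local,d'
# => /tmp
echo /usr/local/bin | sed 's,/usr/local,/opt,'
# => /opt/bin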

Removing duplicate entries from files on the basis of substring postfixes

Let's say that I have the following text in a file:
foo.bar.baz
bar.baz
123.foo.bar.baz
pqr.abc.def
xyz.abc.def
abc.def.ghi.jkl
def.ghi.jkl
How would I remove duplicates from the file, on the basis of postfixes? The expected output without duplicates would be:
bar.baz
pqr.abc.def
xyz.abc.def
def.ghi.jkl
(Consider foo.bar.baz and bar.baz. The latter is a substring postfix of the former, so only bar.baz remains. However, neither of pqr.abc.def and xyz.abc.def is a substring postfix of the other, so both remain.)
Try this:
#!/bin/bash
INPUT_FILE="$1"
in="$(cat "$INPUT_FILE")"
out="$in"
# drop every line that ends with some other line as a dotted suffix
for line in $in; do
  out=$(echo "$out" | grep -v "\.$line\$")
done
echo "$out"
You need to save it to a script (e.g. bashor.sh), make it executable (chmod +x bashor.sh) and call it with your input file as the first argument:
./bashor.sh path/to/input.txt
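Run against the sample input from the question, it should print:
bar.baz
pqr.abc.def
xyz.abc.def
def.ghi.jkl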
Use sed to escape the string for regular expressions, prefix ., postfix $, and pipe this into GNU grep (-f - doesn't work with BSD grep, e.g. on a Mac).
sed 's/[^-A-Za-z0-9_]/\\&/g; s/^/./; s/$/$/' test.txt | grep -vf - test.txt
I just used the regular-expression escaping from another answer and didn't think about whether it is reasonable. On first sight it seems fine, but it escapes too much, though this is probably not an issue.
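For reference, on the sample input the first sed stage should generate roughly this pattern list (one anchored pattern per line), which grep -v -f - then uses to drop every line ending in some other line as a dotted suffix:
.foo\.bar\.baz$
.bar\.baz$
.123\.foo\.bar\.baz$
.pqr\.abc\.def$
.xyz\.abc\.def$
.abc\.def\.ghi\.jkl$
.def\.ghi\.jkl$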

In a bash script, how do I sanitize user input?

I'm looking for the best way to take a simple input:
echo -n "Enter a string here: "
read -e STRING
and clean it up by removing non-alphanumeric characters, lowercasing it, and replacing spaces with underscores.
Does order matter? Is tr the best / only way to go about this?
As dj_segfault points out, the shell can do most of this for you. Looks like you'll have to fall back on something external for lower-casing the string, though. For this you have many options, like the perl one-liner elsewhere in these answers, etc., but I think tr is probably the simplest.
# first, strip underscores
CLEAN=${STRING//_/}
# next, replace spaces with underscores
CLEAN=${CLEAN// /_}
# now, clean out anything that's not alphanumeric or an underscore
CLEAN=${CLEAN//[^a-zA-Z0-9_]/}
# finally, lowercase with TR
CLEAN=`echo -n $CLEAN | tr A-Z a-z`
The order here is somewhat important. We want to get rid of underscores, plus replace spaces with underscores, so we have to be sure to strip underscores first. By waiting to pass things to tr until the end, we know we have only alphanumeric and underscores, and we can be sure we have no spaces, so we don't have to worry about special characters being interpreted by the shell.
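As a worked example with a made-up input string, each step transforms the value like so:
STRING='Foo Bar_Baz 42!'
CLEAN=${STRING//_/}                  # Foo BarBaz 42!
CLEAN=${CLEAN// /_}                  # Foo_BarBaz_42!
CLEAN=${CLEAN//[^a-zA-Z0-9_]/}       # Foo_BarBaz_42
CLEAN=`echo -n $CLEAN | tr A-Z a-z`  # foo_barbaz_42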
Bash can do this all on its own, thank you very much. If you look at the section of the man page on Parameter Expansion, you'll see that bash has built-in substitutions, substring, trim, rtrim, etc.
To eliminate all non-alphanumeric characters, do
CLEANSTRING=${STRING//[^a-zA-Z0-9]/}
That's Occam's razor. No need to launch another process.
For Bash >= 4.0:
CLEAN="${STRING//_/}" && \
CLEAN="${CLEAN// /_}" && \
CLEAN="${CLEAN//[^a-zA-Z0-9]/}" && \
CLEAN="${CLEAN,,}"
This is especially useful for creating container names programmatically using docker/podman. However, in this case you'll also want to remove the underscores:
# Sanitize $STRING for a container name
CLEAN="${STRING//[^a-zA-Z0-9]/}" && \
CLEAN="${CLEAN,,}"
After a bit of looking around it seems tr is indeed the simplest way:
export CLEANSTRING="`echo -n "${STRING}" | tr -cd '[:alnum:] [:space:]' | tr '[:space:]' '-' | tr '[:upper:]' '[:lower:]'`"
Occam's razor, I suppose.
You could run it through perl.
export CLEANSTRING=$(perl -e 'print join( q//, map { s/\s+/_/g; lc } split /[^\s\w]+/, $ENV{STRING} )')
I'm using ksh-style subshell here, I'm not totally sure that it works in bash.
That's the nice thing about shell, is that you can use perl, awk, sed, grep....
