Escape "./" when using sed - bash

I wanted to use grep to exclude words from $lastblock by using a pipeline, but I found that grep works only for files, not for stdout output.
So, here is what I'm using:
lastblock="./2.json"
echo $lastblock | sed '1,/firstmatch/d;/.json/,$d'
I want to exclude ./ and .json, keeping only what is between.
This sed command is correct for this purpose, but how to escape the ./ replacing firstmatch so it can work?
Thanks in advance!

Use bash's Parameter Substitution
lastblock="./2.json"
name="${lastblock##*/}" # strips from the beginning until last / -> 2.json
base="${name%.*}" # strips from the last . to the end -> 2

but I found that grep works only for files, not for stdout output.
Here it is (if your grep supports the -P flag):
lastblock="./2.json"
echo "$lastblock" | grep -Po '(?<=\./).*(?=\.)'
but how to escape the ./
With sed(1), escape it using a backslash \
lastblock="./2.json"
echo "$lastblock" | sed 's/^\.\///;s/\..*$//'
Or use a different delimiter like a pipe |
sed 's|^\./||;s|\..*$||'
with awk
lastblock="./2.json"
echo "$lastblock" | awk -F'[./]+' '{print $2}'
Starting from bash v3, regular expression pattern matching is supported using the =~ operator inside the [[ ... ]] keyword.
lastblock="./2.json"
regex='^\./([[:digit:]]+)\.json'
[[ $lastblock =~ $regex ]] && echo "${BASH_REMATCH[1]}"
Although a parameter expansion (P.E.) should suffice for this purpose.

I wanted to use grep to exclude words from $lastblock by using a pipeline, but I found that grep works only for files, not for stdout output.
Nonsense. grep works the same for the same input, regardless of whether it is from a file or from the standard input.
So, here is what I'm using:
lastblock="./2.json"
echo $lastblock | sed '1,/firstmatch/d;/.json/,$d'
I want to exclude ./ and .json, keeping only what is between. This sed
command is correct for this purpose,
That sed command is nowhere near correct for the stated purpose. It has this effect:
delete every line from the very first one up to and including the next subsequent one that matches the regular expression /firstmatch/, AND
delete every line from the first one matching the regular expression /.json/ to the last one of the file (and note that . is a regex metacharacter).
To remove part of a line instead of deleting a whole line, use an s/// command instead of a d command. As for escaping, you can escape a character to sed by preceding it with a backslash (\), which itself must be quoted or escaped to protect it from interpretation by the shell. Additionally, most regex metacharacters lose their special significance when they appear inside a character class, which I find to be a more legible way to include them in a pattern as literals. For example:
lastblock="./2.json"
echo "$lastblock" | sed 's/^[.]\///; s/[.]json$//'
That says to remove the literal characters ./ appearing at the beginning of the (any) line, and, separately, to remove the literal characters .json appearing at the end of the line.
Alternatively, if you want to modify only those lines that both start with ./ and end with .json then you can use a single s command with a capturing group and a backreference:
lastblock="./2.json"
echo "$lastblock" | sed 's/^[.]\/\(.*\)[.]json$/\1/'
That says that on lines that start with ./ and end with .json, capture everything between those two and replace the whole line with the captured part alone.
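The difference between the two forms only shows up on lines that have just one of the two parts. A minimal sketch, assuming some made-up sample names (./notes.txt and 3.json are hypothetical, not from the question):

```shell
# Three sample lines: only the first both starts with ./ and ends with .json.
lines='./2.json
./notes.txt
3.json'

# Two independent s commands: each part is removed wherever it occurs.
two_step=$(printf '%s\n' "$lines" | sed 's/^[.]\///; s/[.]json$//')

# One s command with a capturing group: only lines matching both parts change.
one_step=$(printf '%s\n' "$lines" | sed 's/^[.]\/\(.*\)[.]json$/\1/')

printf 'two commands:\n%s\n' "$two_step"
printf 'one command:\n%s\n' "$one_step"
```

With the two-command form all three lines are trimmed (2, notes.txt, 3); with the single command only ./2.json is rewritten and the other two lines pass through untouched.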

You can use another character like '#' when you want to avoid slashes.
You can remember a part that matches and use it in the replacement.
Use [.] so that the dot is matched literally instead of as any character.
echo "$lastblock" | sed -r 's#[.]/(.*)[.]json#\1#'

Solution!
Just discovered today the tr command thanks to this legendary, unrelated answer.
When searching all over Google for how to exclude "." and "/", 100% of StackOverflow answers didn't help.
So, to escape characters from the output of a command, just append this pipe:
| tr -d "{character-emoji-anything-you-want-to-exclude}"
So, a full working and simple sample:
echo "./2.json" | tr -d "/" | tr -d "." | tr -d "json"
And done!
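One caveat worth checking before relying on this: tr -d treats its argument as a set of characters, not as a string, so tr -d "json" deletes every j, s, o and n wherever they occur. It gives the wanted result for ./2.json only because what remains contains none of those letters. A small sketch with a made-up file name (./jason2.json is hypothetical) compares it with the parameter expansion shown earlier:

```shell
# tr -d removes each listed character everywhere, not the literal string "json".
name="./jason2.json"   # hypothetical name, chosen to contain j, s, o, n
with_tr=$(echo "$name" | tr -d "/" | tr -d "." | tr -d "json")

# Parameter expansion strips only the leading path and the trailing suffix.
base=${name##*/}
with_pe=${base%.json}

echo "tr: $with_tr"
echo "parameter expansion: $with_pe"
```

Here the tr pipeline leaves only a2, while the parameter expansion keeps jason2.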

Related

How to convert separators using regex in bash

How do I modify my bash file to achieve the expected result shown below ?
#!/bin/bash
filename=$1
var="$(<$filename)" | tr -d '\n'
sed -i 's/;/,/g' $var
Convert this input file
a,b;c^d"e}
f;g,h!;i8j-
To this output file
a,b,c,d,e,f,g,h,i,j
How to convert separators using regex in bash
You would, well, literally, do exactly that - convert any of the separators using regex. This consists of steps:
most importantly, figure out the exact definition of what constitutes a "separator"
writing a regex for it
writing an algorithm for it
running and testing the code
For example, assuming a separator is a sequence of any of the \n,;^"}!8- characters, you could do:
sed -zi 's/[\n,;^"}!8-]\+/,/g; s/,$/\n/' input_file
Or do something similar by first converting the newlines with tr '\n' , (for example, when -z is not available with your sed) and then passing the result of tr to sed. The second regex puts a trailing newline on the output instead of a trailing ,.
Additionally, in your code:
var is unset on the sed line, because the parts of a | pipeline run in subshells.
var=$(<$filename) contains the contents of the file, whereas sed wants a filename as argument, not file contents.
var=.... | ... pipes the result of the assignment to tr. The output of an assignment is empty, so that line produces nothing, and its output is unused.
Remember to check bash scripts with shellcheck.
For a somewhat portable solution, maybe try
tr -cs A-Za-z , <input_file | sed '$s/,$/\n/' >output_file
The use of \n to force a final newline is still not entirely reliable; there are some sed versions which interpret the sequence as a literal n.
You'd move output_file back on top of input_file after this command if you want to replace the original.
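As a hedged sketch of the whole round trip, the variant below sidesteps \n in the sed replacement altogether: it trims the trailing comma with parameter expansion and lets printf supply the final newline. The sample file is created with mktemp purely for illustration, and the names input_file, tmp_file and joined are assumptions.

```shell
# Create a sample input (stand-in for the real file).
input_file=$(mktemp)
printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' > "$input_file"

# Turn every run of non-letters (including newlines) into a single comma.
joined=$(tr -cs 'A-Za-z' ',' < "$input_file")

# Drop the trailing comma, add exactly one newline, then replace the original.
tmp_file=$(mktemp)
printf '%s\n' "${joined%,}" > "$tmp_file" && mv "$tmp_file" "$input_file"

result=$(cat "$input_file")
echo "$result"
rm -f "$input_file"
```

Like the tr command above, this treats every non-letter as a separator, which matches the sample input but may be too broad for other data.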

Insert the contents of the variable in SED command [duplicate]

If I run these commands from a script:
#my.sh
PWD=bla
sed 's/xxx/'$PWD'/'
...
$ ./my.sh
xxx
bla
it is fine.
But, if I run:
#my.sh
sed 's/xxx/'$PWD'/'
...
$ ./my.sh
$ sed: -e expression #1, char 8: Unknown option to `s'
I read in tutorials that to substitute environment variables from shell you need to stop, and 'out quote' the $varname part so that it is not substituted directly, which is what I did, and which works only if the variable is defined immediately before.
How can I get sed to recognize a $var as an environment variable as it is defined in the shell?
Your two examples look identical, which makes problems hard to diagnose. Potential problems:
You may need double quotes, as in sed 's/xxx/'"$PWD"'/'
$PWD may contain a slash, in which case you need to find a character not contained in $PWD to use as a delimiter.
To nail both issues at once, perhaps
sed 's#xxx#'"$PWD"'#'
In addition to Norman Ramsey's answer, I'd like to add that you can double-quote the entire string (which may make the statement more readable and less error prone).
So if you want to search for 'foo' and replace it with the content of $BAR, you can enclose the sed command in double-quotes.
sed 's/foo/$BAR/g'
sed "s/foo/$BAR/g"
In the first, $BAR will not expand correctly while in the second $BAR will expand correctly.
Another easy alternative:
Since $PWD will usually contain a slash /, use | instead of / for the sed statement:
sed -e "s|xxx|$PWD|"
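To see the effect of the delimiter choice without touching the real $PWD, here is a minimal sketch using a stand-in variable (dir and its value are hypothetical):

```shell
dir="/home/yourname"   # hypothetical value standing in for $PWD

# With "/" as the delimiter, the slashes inside $dir end the expression early,
# so sed reports an error and writes nothing to stdout.
slash_out=$(echo 'xxx' | sed "s/xxx/$dir/" 2>/dev/null) || echo "slash delimiter failed"

# With "|" as the delimiter, the same value is inserted unchanged.
pipe_out=$(echo 'xxx' | sed "s|xxx|$dir|")
echo "$pipe_out"
```

This only moves the problem: a value that itself contains | would break the second command in the same way.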
You can use other characters besides "/" in substitution:
sed "s#$1#$2#g" -i FILE
1. Bad way: change the delimiter
sed 's/xxx/'"$PWD"'/'
sed 's:xxx:'"$PWD"':'
sed 's#xxx#'"$PWD"'#'
These may not be the final answer:
you cannot know which characters will occur in $PWD; it could contain /, : or #.
If the delimiter character occurs in $PWD, it will break the expression.
The good way is to escape the special character in $PWD.
2. Good way: escape the delimiter
for example:
try to replace URL as $url (has : / in content)
x.com:80/aa/bb/aa.js
in string $tmp
URL
A. use / as delimiter
escape / as \/ in var (before use in sed expression)
## step 1: try escape
echo ${url//\//\\/}
x.com:80\/aa\/bb\/aa.js #escape fine
echo ${url//\//\/}
x.com:80/aa/bb/aa.js #escape not success
echo "${url//\//\/}"
x.com:80\/aa\/bb\/aa.js #escape fine, notice `"`
## step 2: do sed
echo $tmp | sed "s/URL/${url//\//\\/}/"
URL
echo $tmp | sed "s/URL/${url//\//\/}/"
URL
OR
B. use : as delimiter (more readable than /)
escape : as \: in var (before use in sed expression)
## step 1: try escape
echo ${url//:/\:}
x.com:80/aa/bb/aa.js #escape not success
echo "${url//:/\:}"
x.com\:80/aa/bb/aa.js #escape fine, notice `"`
## step 2: do sed
echo $tmp | sed "s:URL:${url//:/\:}:g"
x.com:80/aa/bb/aa.js
With your question edit, I see your problem. Let's say the current directory is /home/yourname ... in this case, your command below:
sed 's/xxx/'$PWD'/'
will be expanded to
sed 's/xxx//home/yourname//'
which is not valid. You need to put a \ character in front of each / in your $PWD if you want to do this.
Actually, the simplest thing (in GNU sed, at least) is to use a different separator for the sed substitution (s) command. So, instead of s/pattern/'$mypath'/ being expanded to s/pattern//my/path/, which will of course confuse the s command, use s!pattern!'$mypath'!, which will be expanded to s!pattern!/my/path!. I’ve used the bang (!) character (or use anything you like) which avoids the usual, but-by-no-means-your-only-choice forward slash as the separator.
Dealing with VARIABLES within sed
[root@gislab00207 ldom]# echo domainname: None > /tmp/1.txt
[root@gislab00207 ldom]# cat /tmp/1.txt
domainname: None
[root@gislab00207 ldom]# echo ${DOMAIN_NAME}
dcsw-79-98vm.us.oracle.com
[root@gislab00207 ldom]# cat /tmp/1.txt | sed -e 's/domainname: None/domainname: ${DOMAIN_NAME}/g'
--- Below is the result -- very funny.
domainname: ${DOMAIN_NAME}
--- You need to single quote your variable like this ...
[root@gislab00207 ldom]# cat /tmp/1.txt | sed -e 's/domainname: None/domainname: '${DOMAIN_NAME}'/g'
--- The right result is below
domainname: dcsw-79-98vm.us.oracle.com
VAR=8675309
echo "abcde:jhdfj$jhbsfiy/.hghi$jh:12345:dgve::" |\
sed 's/:[0-9]*:/:'$VAR':/1'
where VAR contains what you want to replace the field with
I had a similar problem: I had a list and had to build a SQL script based on a template (which contained #INPUT# as the element to replace):
for i in LIST
do
awk "sub(/\#INPUT\#/,\"${i}\");" template.sql >> output
done
If your replacement string may contain other sed control characters, then a two-step substitution (first escaping the replacement string) may be what you want:
PWD='/a\1&b$_' # these are problematic for sed
PWD_ESC=$(printf '%s\n' "$PWD" | sed -e 's/[\/&]/\\&/g')
echo 'xxx' | sed "s/xxx/$PWD_ESC/" # now this works as expected
For me, replacing some text in a file with the value of an environment variable using sed works only with quoting like the following:
sed -i 's/original_value/'"$MY_ENVIRONMENT_VARIABLE"'/g' myfile.txt
BUT when the value of MY_ENVIRONMENT_VARIABLE contains a URL (e.g. https://andreas.gr), the above did not work.
THEN use a different delimiter:
sed -i "s|original_value|$MY_ENVIRONMENT_VARIABLE|g" myfile.txt

Text processing in bash - extracting information between multiple HTML tags and outputting it into CSV format [duplicate]

I can't figure how to tell sed dot match new line:
echo -e "one\ntwo\nthree" | sed 's/one.*two/one/m'
I expect to get:
one
three
instead I get original:
one
two
three
sed is a line-based tool. I don't think there is an option for this.
You can use h/H(hold), g/G(get).
$ echo -e 'one\ntwo\nthree' | sed -n '1h;1!H;${g;s/one.*two/one/p}'
one
three
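For readers new to the hold space, the same one-liner can be written out one command per line. This is only a restatement of the command above with each step annotated, not a different technique:

```shell
# 1h     : on line 1, copy the pattern space into the hold space
# 1!H    : on every other line, append a newline plus the line to the hold space
# ${...} : on the last line, fetch the accumulated text back (g),
#          run the substitution on it, and print the result (p)
out=$(printf 'one\ntwo\nthree\n' | sed -n '
1h
1!H
${
  g
  s/one.*two/one/p
}')
echo "$out"
```

By the time the last line is reached, the hold space contains the whole input with its embedded newlines, which is why . can match across what used to be line boundaries.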
Maybe you should try vim
:%s/one\_.*two/one/g
If you use GNU sed, you can match any character, including line break chars, with a plain dot. The manual says:
.
Matches any character, including newline.
All you need to use is a -z option:
echo -e "one\ntwo\nthree" | sed -z 's/one.*two/one/'
# => one
# three
However, one.*two might not be what you need since * is always greedy in POSIX regex patterns. So, one.*two will match the leftmost one, then any 0 or more chars as many as possible, and then the rightmost two. If you need to remove one, then any 0+ chars as few as possible, and then the leftmost two, you will have to use perl:
perl -i -0 -pe 's/one.*?two//sg' file # Non-Unicode version
perl -i -CSD -Mutf8 -0 -pe 's/one.*?two//sg' file # S&R in a UTF8 file
The -0 option sets the input record separator to the NUL character, so a file that contains no NUL bytes is read as a whole and not line by line (-0777 is the conventional way to request true slurp mode), -i enables in-place file modification, the s flag makes . match any char including line break chars, and .*? matches any 0 or more chars as few as possible due to the non-greedy *?. The -CSD -Mutf8 part makes sure your input is decoded and the output is re-encoded correctly.
You can use python this way:
$ echo -e "one\ntwo\nthree" | python3 -c 'import re, sys; s=sys.stdin.read(); s=re.sub("(?s)one.*two", "one", s); print(s, end="")'
one
three
$
This reads the whole of Python's standard input (sys.stdin.read()), then substitutes "one" for "one.*two" with the dot-matches-all setting enabled (using (?s) at the start of the regular expression), and then prints the modified string (end="" prevents print from adding an extra newline).
This might work for you:
<<<$'one\ntwo\nthree' sed '/two/d'
or
<<<$'one\ntwo\nthree' sed '2d'
or
<<<$'one\ntwo\nthree' sed 'n;d'
or
<<<$'one\ntwo\nthree' sed 'N;N;s/two.//'
Sed does match all characters (including the \n) using a dot . but usually it has already stripped the \n off, as part of the cycle, so it is no longer present in the pattern space to be matched.
Only certain commands (N,H and G) preserve newlines in the pattern/hold space.
N appends a newline to the pattern space and then appends the next line.
H does exactly the same except it acts on the hold space.
G appends a newline to the pattern space and then appends whatever is in the hold space too.
The hold space is empty until you place something in it so:
sed G file
will insert an empty line after each line.
sed 'G;G' file
will insert 2 empty lines etc etc.
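A short sketch of N and G in action, using made-up input, may make the descriptions above more concrete. Once N has pulled the next line in, the embedded newline is an ordinary character that s can match:

```shell
# N joins each pair of lines in the pattern space; the s command then
# replaces the embedded newline with a space.
paired=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')
echo "$paired"

# G appends a newline plus the (empty) hold space, so 2 input lines become 4.
line_count=$(printf 'a\nb\n' | sed G | wc -l)
echo "lines after G: $line_count"
```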
How about two sed calls:
(get rid of the 'two' first, then get rid of the blank line)
$ echo -e 'one\ntwo\nthree' | sed 's/two//' | sed '/^$/d'
one
three
Actually, I prefer Perl for one-liners over Python:
$ echo -e 'one\ntwo\nthree' | perl -pe 's/two\n//'
one
three
The discussion below is based on GNU sed.
sed operates in a line-by-line manner, so it is not possible to tell it to make the dot match a newline directly. However, there are some tricks that can achieve this. You can use a loop structure (of sorts) to put all the text in the pattern space, and then do the operation.
To put everything in the pattern space, use:
:a;N;$!ba;
To make "dot match newline" indirectly, you use:
(\n|.)
So the result is:
root@u1804:~# echo -e "one\ntwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
Note that in this case, (\n|.) matches newline and all characters. See below example:
root@u1804:~# echo -e "oneXXXXXX\nXXXXXXtwo\nthree" | sed -r ':a;N;$!ba;s/one(\n|.)*two/one/'
one
three
root@u1804:~#
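The :a;N;$!ba loop is useful on its own whenever the whole input has to be treated as one string. As a hedged illustration (joining lines with commas is a made-up task, not part of the question), here it is with each command passed as a separate -e, a form that does not rely on ; being accepted after a label:

```shell
# :a    defines a label
# N     appends the next input line to the pattern space
# $!ba  branches back to label a unless the last line has been read
# After the loop the pattern space holds the whole input, newlines included.
joined=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')
echo "$joined"
```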

How to grep information?

What I have:
test
more text
#user653434 text and so
test
more text
#user9659333 text and so
I'd like to filter this text and finally get the following list as .txt file:
user653434
user9659333
It's important to get the names without "#" sign.
Thx for help ;)
Using grep -P (requires GNU grep):
$ grep -oP '(?<=#)\w+' File
user653434
user9659333
-o tells grep to print only the match.
-P tells grep to use Perl-style regular expressions.
(?<=#) tells grep that # must precede the match but the # is not included in the match.
\w+ matches one or more word characters. This is what grep will print.
To change the file in place with grep:
grep -oP '(?<=#)\w+' File >tmp && mv tmp File
Using sed
$ sed -En 's/^#([[:alnum:]]+).*/\1/p' File
user653434
user9659333
And, to change the file in place:
sed -En -i.bak 's/^#([[:alnum:]]+).*/\1/p' File
-E tells sed to use the extended form of regular expressions. This reduces the need to use escapes.
-n tells sed not to print anything unless we explicitly ask it to.
-i.bak tells sed to change the file in place while leaving a backup file with the extension .bak.
The leading s in s/^#([[:alnum:]]+).*/\1/p tells sed that we are using a substitute command. The command has the typical form s/old/new/ where old is a regular expression and sed replaces old with new. The trailing p is an option to the substitute command: the p tells sed to print the resulting line.
In our case, the old part is ^#([[:alnum:]]+).*. Starting from the beginning of the line, ^, this matches # followed by one or more alphanumeric characters, ([[:alnum:]]+), followed by anything at all, .*. Because the alphanumeric characters are placed in parens, this is saved as a group, denoted \1.
The new part of the substitute command is just \1, the alphanumeric characters from above which comprise the user name.
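Since the question asks for the list as a .txt file, here is a hedged end-to-end sketch. It uses a basic-regex spelling of the same substitution so that it does not depend on -E, works in a temporary directory, and writes to users.txt, which is an assumed file name:

```shell
# Recreate the sample input in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/File" <<'EOF'
test
more text
#user653434 text and so
test
more text
#user9659333 text and so
EOF

# Keep only the name after a leading "#" and write the list to a separate file.
sed -n 's/^#\([[:alnum:]][[:alnum:]]*\).*/\1/p' "$workdir/File" > "$workdir/users.txt"

users=$(cat "$workdir/users.txt")
echo "$users"
rm -rf "$workdir"
```

Writing to a separate file leaves the original untouched, which avoids the need for -i or a backup.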
With GNU grep:
grep -Po '^#\K[^ ]*' file
Output:
user653434
user9659333
See: The Stack Overflow Regular Expressions FAQ

Sed and dollar sign

"The Unix Programming Environment" states that '$' used in regular expression in sed means end-of-the line which is fine for me, because
cat file | sed 's/$/\n/'
is interpreted as "add newline at the end of each line".
The question arises, when I try to use the command:
cat file | sed '$d'
Shouldn't this line remove each line instead of the last one? In this context, dollar sign means end of the LAST line. What am I getting wrong?
$ is treated as regex anchor when used in pattern in s command e.g.
s/$/\n/
However in $d, $ is not a regex anchor, it is address notation that means the last line of the input, which is deleted using the d command.
Also note that cat is unnecessary in your last command. It can be used as:
sed '$d' file
In the second usage, there is no regular expression. The $ there is an address, meaning the last line.
Note that a regex in sed must be inside delimiters (/, or in an s command another character such as ;, : or ~), not just inside quotes.
/regex/
ex:
sed '/foo/s/bar/bux/g' file
or
~regex~
ex:
sed 's~dd~s~' file
but not 'regex'. So $ in '$d' won't be considered as regex by sed. '$d' acts like an address which points out the last line.
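The two meanings of $ can be seen side by side, and even combined in one command. A minimal sketch with made-up three-line input:

```shell
# "$" as an address: act on the last line only (here, delete it).
last_deleted=$(printf 'a\nb\nc\n' | sed '$d')

# "$" as a regex anchor inside s: end of EVERY line.
all_marked=$(printf 'a\nb\nc\n' | sed 's/$/!/')

# Both at once: on the last line only, append "!" at the end of the line.
last_marked=$(printf 'a\nb\nc\n' | sed '$s/$/!/')

printf '%s\n---\n%s\n---\n%s\n' "$last_deleted" "$all_marked" "$last_marked"
```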
