sed: remove parentheses from string - bash

I am using a Mac.
I'm trying to remove all parentheses, ( and ), from a string using sed.
Input: this string contains (parentheses)
Desired output: this string contains parentheses
I've tried:
sed -E 's/[\)\(]//g'
but whether I escape the parentheses or not, I still only get a match (and consequently removal) for the first one.
EDIT: the problem was with the input string:
A close paren is ASCII 41, whereas my input has ASCII 239 which explains what's failing. Even more confusingly this equates to an acute accent. Closer examination shows that the ) can't be selected without the following 'space'.
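When a pattern removes one parenthesis but not the other, dumping the raw bytes of the input is a quick way to spot a look-alike character. A minimal sketch, assuming a POSIX shell with od available; show_bytes is a hypothetical helper name, and the fullwidth parenthesis is only an example of a look-alike, not necessarily the character in the input above:

```shell
# Hypothetical helper: print the bytes of a string in hex, space-separated.
show_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Plain ASCII parentheses are the single bytes 28 and 29.
show_bytes '()'

# A UTF-8 fullwidth right parenthesis (U+FF09) is three bytes: ef bc 89.
show_bytes "$(printf '\357\274\211')"
```

Any byte above 7f in the dump means the character is not plain ASCII, so [()] will not match it.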

tr with the -d (delete) flag is my go-to for removing one or more characters. From the man page:
The tr utility copies the standard input to the standard output with substitution or deletion of selected characters.
echo -n 'this string contains (parentheses)' | tr -d '()'
# this string contains parentheses

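Just DON'T use backslashes (outside a bracket expression, typing \( meta-fies the paren in basic regex) or -Extended pattern matching (where a bare paren is the metacharacter and needs the backslash to UN-meta-fy it). Inside a bracket expression, parentheses are literal and need no escaping: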
$: echo "this string contains (parentheses)" | sed 's/[)(]//g'
this string contains parentheses
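The approaches from both answers can be put side by side. This is only a sketch; the function names are made up for illustration, and it assumes the input really contains ASCII parentheses:

```shell
# Three ways to delete ASCII parentheses from a string passed as $1.
strip_parens_sed()     { printf '%s\n' "$1" | sed 's/[()]//g'; }     # basic regex
strip_parens_sed_ere() { printf '%s\n' "$1" | sed -E 's/[()]//g'; }  # extended regex
strip_parens_tr()      { printf '%s\n' "$1" | tr -d '()'; }          # no regex at all

strip_parens_sed     'this string contains (parentheses)'
strip_parens_sed_ere 'this string contains (parentheses)'
strip_parens_tr      'this string contains (parentheses)'
```

Inside a bracket expression the parentheses are literal in both regex flavours, so the same pattern works with and without -E.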

Related

Replace exact matching word containing special character

I am trying to replace a word in a string which contains same word with special character in it.
Example:
string="this is a joke. this is a poor-joke. this is a joke-club"
I want to replace only the standalone word joke with coke, not the occurrences that are joined to another word by a special character such as a hyphen.
The command below replaces every occurrence of joke:
[chandu#mynode ~]$ echo $string | sed "s/joke/coke/g;"
this is a coke. this is a poor-coke. this is a coke-club
I tried using sed "s/\<joke\>/coke/g;"
but even this replaces every occurrence.
Expected output:
this is a coke. this is a poor-joke. this is a joke-club
You can match the beginning and end of the word yourself if you want to treat - as a word character (note that \| in a basic regex is a GNU sed extension):
$ sed 's/\(^\|[^a-zA-Z-]\)joke\([^a-zA-Z-]\|$\)/\1coke\2/g' <<<"$string"
this is a coke. this is a poor-joke. this is a joke-club
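The same idea can be written with extended regex syntax, which avoids the GNU-only \| and should also work with BSD/macOS sed. A sketch only; replace_standalone_joke is a hypothetical name, and treating letters, digits, underscore and hyphen as word characters is an assumption about what counts as part of a word here:

```shell
# Replace "joke" only when it is not glued to a word character or a hyphen.
# Reads lines on stdin.
replace_standalone_joke() {
  sed -E 's/(^|[^[:alnum:]_-])joke([^[:alnum:]_-]|$)/\1coke\2/g'
}

printf '%s\n' 'this is a joke. this is a poor-joke. this is a joke-club' |
  replace_standalone_joke
```

Because each match consumes its surrounding delimiter characters, two target words separated by a single character (for example joke joke) would need a second pass.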
Using perl and look-around to detect favorable leading (space) and trailing (space or period) characters around the word joke:
$ echo $string | perl -p -e 's/(?<=[ ])joke(?=[. ])/coke/g'
Output.
this is a coke. this is a poor-joke. this is a joke-club
Unfortunately in your case, the hyphen separates the string into different words.
i.e.: if I change your string to:
string='this is a joke. this is a poorjoke. this is a jokeclub'
and I'm running the command:
echo $string | sed 's/\bjoke\b/coke/g'
(where \b stands for: word boundary), I get the following result:
this is a coke. this is a poorjoke. this is a jokeclub
But when I'm applying this same command on your string, I get (as you do):
this is a coke. this is a poor-coke. this is a coke-club
So, in your particular case, I'd try something like:
echo $string | sed 's/\([^-]\)\(joke\)\([^-]\)/\1coke\3/g'
Which produces the following result:
this is a coke. this is a poor-joke. this is a joke-club

Remove word from url

I need to remove /%(tenant_id)s from this source:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
To make it look like this:
https://ext.an1.test.dev:8776/v3
I'm trying to do this with sed, but without success:
curl ....... | jq -r .endpoints[].url | grep '8776/v3' | sed -e 's/[/%(tenant_id)s] //g'
I still get the unchanged string back:
https://ext.an1.test.dev:8776/v3/%(tenant_id)s
You seem to be confused about the meaning of square brackets.
curl ....... |
jq -r '.endpoints[].url' |
sed -n '\;8776/v3;s;/%.*;;p'
fixes the incorrect regex, loses the useless grep, and somewhat simplifies the processing by switching to a different delimiter. To protect against (fairly unlikely) shell wildcard matches on the text in the jq search expression, I also added single quotes around that.
In some more detail, sed -n avoids printing input lines, and the address expression \;8776/v3; selects only input lines which match the regex 8776/v3; we use ; as the delimiter around the regex, which (somewhat obscurely) requires the starting delimiter to be backslashed. Then, we perform the substitution: again, we use ; as the delimiter so that slashes and percent signs in the regex do not need to be escaped. The p flag on the substitution causes sed to print lines where the substitution was performed successfully; we remove the g flag, as we don't expect more than one match per input line. The substitution replaces everything after the first occurrence of /% with nothing.
(Equivalently, with slash delimiters, you would have to backslash all literal slashes: sed -n '/8776\/v3/s/\/%.*//p'.)
For the record, square brackets in regular expressions form a character class; the expression [abc] matches a single character which can be one of a, b, or c. Perhaps review the tips on the Stack Overflow regex tag info page for a quick rerun on this and other common beginner mistakes.
Besides the incorrect square brackets, your regex specified a space after s, which is unlikely to be there. Other than that, your regex should work fine if you are sure the string you want to remove is always exactly /%(tenant_id)s. (Many regex dialects require round parentheses to be escaped, but sed without -E or -r is not one of those.)
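The sed command from this answer can be tried without curl or jq by feeding it sample lines. A sketch under stated assumptions: strip_tenant_lines is a hypothetical wrapper name, and the second URL is invented purely to show a line that the address expression skips:

```shell
# Print only lines matching 8776/v3, with everything from the first "/%" removed.
# Reads lines on stdin.
strip_tenant_lines() {
  sed -n '\;8776/v3;s;/%.*;;p'
}

# The second URL below is made up for illustration.
printf '%s\n' \
  'https://ext.an1.test.dev:8776/v3/%(tenant_id)s' \
  'https://ext.an1.test.dev:8774/v2.1' |
  strip_tenant_lines
```

Only the first line is printed, shortened to end at /v3; the second line never matches the address and is suppressed by -n.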
If you've managed to get the address into a variable then one parameter expansion idea:
$ myaddr='https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
$ echo "${myaddr%/*}"
https://ext.an1.test.dev:8776/v3
$ mynewaddr="${myaddr%/*}"
$ echo "${mynewaddr}"
https://ext.an1.test.dev:8776/v3
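Note that ${myaddr%/*} drops the last path component whatever it is. If only the exact suffix should be removed, the suffix can be quoted inside the expansion so it is taken literally. A small sketch; strip_exact_suffix is a hypothetical helper name:

```shell
# Remove the literal suffix /%(tenant_id)s from $1, if present.
strip_exact_suffix() {
  addr=$1
  suffix='/%(tenant_id)s'
  # Quoting "$suffix" makes the shell treat it as a literal string, not a glob.
  printf '%s\n' "${addr%"$suffix"}"
}

strip_exact_suffix 'https://ext.an1.test.dev:8776/v3/%(tenant_id)s'
```

An address that does not end in that suffix is printed unchanged.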

grep for a variable content with a dot

I found many similar questions about my issue, but I still haven't found one that solves it.
I need to grep for the content of a variable followed by a dot, but it doesn't work when I escape the dot after the variable. For example:
The file content is
item.
newitem.
My variable contains item. and I want to grep for the exact word, so I believe I must use -w rather than -F, but this command doesn't give the correct output:
cat file | grep -w "$variable\."
Do you have suggestions please?
Update: I have to correct my scenario. My file contains some FQDNs and, for some reason, I have to look for hostname. including the dot.
Unfortunately grep -wF doesn't work:
My file is
hostname1.domain.com
hostname2.domain.com
and the command
cat file | grep -wF hostname1.
doesn't show any output. I have to find another solution and I'm not sure that grep could help.
If $variable contains item., you're searching for item.\. which is not what you want. In fact, you want -F which interprets the pattern literally, not as a regular expression.
var=item.
echo $'item.\nnewitem.' | grep -F "$var"
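On its own, -F still matches item. as a substring of newitem. If the whole line has to equal the variable, -x can be combined with -F. A sketch, assuming one item per line as in the sample file; match_whole_line is a hypothetical name:

```shell
# Print stdin lines that are exactly equal to the literal string in $1.
match_whole_line() {
  grep -Fx -e "$1"
}

printf '%s\n' 'item.' 'newitem.' 'itemX' | match_whole_line 'item.'
```

Here the dot is matched literally (so itemX is skipped) and the match must span the whole line (so newitem. is skipped).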
Try:
grep "\b$word\."
\b: word boundary before the word
\.: a literal dot (being a non-word character, the dot also ends the word)
The following awk solution may also help:
awk -v var="item." '$0==var' Input_file
You are expanding the variable and appending \. to it, which results in calling
cat file | grep -w "item.\.".
Since grep accepts files as parameters, calling grep "item\." file should do.
from man grep
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
and
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it's not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
With -w, because the last character of the pattern is a ., the match must be followed by a non-word character (anything outside [A-Za-z0-9_]); however, the next character is d, so nothing matches.
grep '\<hostname1\.'
should work, as \< ensures the previous character is not a word constituent.
You can dynamically construct the search pattern and then call grep
rexp='^hostname1\.'
grep "$rexp" file.txt
The single quotes tell bash not to interpret special characters in the variable. Double quotes tell bash to allow replacing $rexp with its value. The caret ( ^ ) in the expression tells grep to look for lines starting with 'hostname1.'
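When the prefix comes from a variable, its dots would have to be escaped before being used in a regex. One way to sidestep that is a plain string comparison instead of a regex. This sketch swaps in awk's index() function for the grep approach above; lines_starting_with is a hypothetical name, and the extra sample hostnames are invented:

```shell
# Print stdin lines that begin with the literal string in $1.
# Note: awk -v interprets backslash escapes in the value, which is harmless
# for hostnames but matters for strings containing backslashes.
lines_starting_with() {
  awk -v p="$1" 'index($0, p) == 1'
}

# hostname10 and xhostname1 are made-up entries to show what is excluded.
printf '%s\n' 'hostname1.domain.com' 'hostname10.domain.com' 'xhostname1.domain.com' |
  lines_starting_with 'hostname1.'
```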

search for all string literals in an xcode project using the command line

I know I can use command + shift + f to search an Xcode project, and that I can choose to search using regex in the search criteria bar and find the string literals in the project using the regex (\"[\w\s]+\"). But I'm trying to automate this process in a Bash script. So my question is: how can I perform a similar search from the command line that outputs a list of all the string literals in my project? I've been messing around with grep without much success.
You may use a grep command like
grep -o '"[^"\]*\(\\.[^"\]*\)*"' file > outfile
See an online demo:
s='"String1 \"here\""
"String2"'
grep -o '"[^"\]*\(\\.[^"\]*\)*"' <<< "$s";
The "[^"\]*\(\\.[^"\]*\)*" POSIX BRE pattern matches ", then any 0+ chars other than \ and ", then any 0+ occurrences of \ and then any char followed with any 0+ chars other than \ and " and then a " char.
If you can use a PCRE enabled grep, you may use a safer expression that will only start matching at the first unescaped double quote:
grep -oP '(?<!\\)(?:\\{2})*\K"[^"\\]*(?:\\.[^"\\]*)*"' file > outfile
Here, "[^"\\]*(?:\\.[^"\\]*)*" is the PCRE equivalent of the above POSIX BRE pattern and (?<!\\)(?:\\{2})*\K makes sure the first " is not escaped: (?<!\\) matches a location not preceded with \, then (?:\\{2})* matches 0 or more occurrences of double backslashes and then \K omits the text matched so far.
See an online demo:
s='A culprit \" escaped quote and "String1 \"here\""
"String2"'
grep -oP '(?<!\\)(?:\\{2})*\K"[^"\\]*(?:\\.[^"\\]*)*"' <<< "$s";
Both yield the following output:
"String1 \"here\""
"String2"
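To cover a whole project rather than one file, the BRE pattern above can be combined with a recursive search. A sketch under stated assumptions: list_string_literals is a hypothetical name, the *.swift and *.m globs are guesses at the project's source file types, and -r and --include are extensions offered by GNU grep and macOS grep rather than POSIX:

```shell
# List the distinct double-quoted string literals found in source files under $1.
list_string_literals() {
  grep -rho --include='*.swift' --include='*.m' \
    '"[^"\]*\(\\.[^"\]*\)*"' "$1" | sort -u
}

# Example call, assuming the current directory is the project root:
# list_string_literals . > outfile
```

The -h flag drops file names from the output, and sort -u removes duplicates so each literal appears once.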

Bash: remove semicolons from a line in a CSV-file

I have a CSV file with a few hundred lines, and many (but not all) of these lines contain data (Klas/Lesgroep:;;T2B1) which I want to extract.
e.g. ;;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;
I want to delete the semicolons in front of Klas/Lesgroep, but the number of semicolons varies. How can I delete these semicolons in Bash?
I'm not a native speaking Englishman so I hope it's clear to you
To remove any nonempty run of ; chars. that come directly before literal Klas/Lesgroep:
With GNU or BSD/macOS sed:
$ sed -E 's|;+(Klas/Lesgroep)|\1|' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
The s function performs string substitution (replacement):
The 1st argument is a regex (regular expression) that specifies what part of the line to match,
and the 2nd argument specifies what to replace the matching part with.
Note how I've chosen | as the regex/argument delimiter instead of the customary /, because that allows unescaped use of / chars. inside the regex.
;+ matches one or more directly adjacent ; chars.
(Klas/Lesgroep) matches literal Klas/Lesgroep and by enclosing it in (...) - making it a capture group - the match is remembered and can be referenced as \1 - the 1st capture group in the regex - in the replacement argument to s.
The net effect is that all ; chars. directly preceding Klas/Lesgroep are removed.
POSIX-compliant form:
$ sed 's|;\{1,\}\(Klas/Lesgroep\)|\1|' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
POSIX requires the less powerful and antiquated BRE syntax, where duplication symbol + must be emulated as \{1,\}, and, generally, metacharacters (, ), {, } must be \-escaped.
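Since only some lines of the file contain the field, the POSIX form can be combined with -n and the p flag so that only the lines where the substitution happened are printed. A sketch; strip_before_klas is a hypothetical name and the second sample line is invented:

```shell
# Print only lines containing ";...;Klas/Lesgroep", with that run of ";" removed.
# Reads lines on stdin.
strip_before_klas() {
  sed -n 's|;\{1,\}\(Klas/Lesgroep\)|\1|p'
}

# The second line is a made-up row without the field; it is not printed.
printf '%s\n' ';;;Klas/Lesgroep:;;T2B1;;' 'a;b;c' | strip_before_klas
```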
With sed you can search for lines containing at least one semicolon followed by Klas/Lesgroep and, if found, replace the semicolons at the start of the line with nothing (this assumes only semicolons precede Klas/Lesgroep on the line):
$ sed '/;;*Klas\/Lesgroep/s/^;*//g' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
To remove all ; characters from a file, we can use the sed command. sed is a stream editor for modifying text.
$ sed 's/find/replace/g' file
The g flag (global replacement) tells sed to replace all occurrences of the string in each line, not just the first.
So to remove ; just find it and replace it with nothing.
sed 's/;//g' file.csv
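Removing every semicolon also deletes the field separators, which differs from stripping only the leading run. The two can be compared on a shortened sample line. A sketch; both function names are hypothetical:

```shell
# Delete every ";" on each stdin line.
remove_all_semicolons()     { sed 's/;//g'; }

# Delete only the run of ";" at the start of each stdin line.
remove_leading_semicolons() { sed 's/^;*//'; }

printf '%s\n' ';;;Klas/Lesgroep:;;T2B1;;' | remove_all_semicolons
printf '%s\n' ';;;Klas/Lesgroep:;;T2B1;;' | remove_leading_semicolons
```

The first prints Klas/Lesgroep:T2B1 with the columns fused together, while the second keeps the separators after the colon intact.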

Resources