Delete unknown amount of regexps using sed - bash

I'm trying to get a bunch of regular expressions for a file (one per line) and then fit those regexps into something like this /$regexp/d . I'm trying it this way:
while read line;do sed "/$line/d" to_delete.file >> output;done < to_delete.txt
But it says me 'unknown command', even if I change the delimiter.
--- EDIT
The to_delete.txt file has slashes but i'm already scraping them and that's where i find the error.

To avoid problem with / in regex sed is allow to use another separator, so you can use e.g. sed "\|$line|d".
Secondary if you put script into double-quotes you shoud add space between address range and action e.g. "\|$line| d"
But I see a general mistake in the script. The loop will print into output all to_delete.file (exept 1 line with regexp) by each loop. I suppose it is not the thing what OP wants.
If you'd like to exclude content of to_delete.txt from to_delete.file it can be easy done by grep
grep -vFf "to_delete.txt" "to_delete.file" > output

Related

Append text to top of file using sed doesn't work for variable whose content has "/" [duplicate]

This question already has answers here:
Using different delimiters in sed commands and range addresses
(3 answers)
Closed 1 year ago.
I have a Visual Studio project, which is developed locally. Code files have to be deployed to a remote server. The only problem is the URLs they contain, which are hard-coded.
The project contains URLs such as ?page=one. For the link to be valid on the server, it must be /page/one .
I've decided to replace all URLs in my code files with sed before deployment, but I'm stuck on slashes.
I know this is not a pretty solution, but it's simple and would save me a lot of time. The total number of strings I have to replace is fewer than 10. A total number of files which have to be checked is ~30.
An example describing my situation is below:
The command I'm using:
sed -f replace.txt < a.txt > b.txt
replace.txt which contains all the strings:
s/?page=one&/pageone/g
s/?page=two&/pagetwo/g
s/?page=three&/pagethree/g
a.txt:
?page=one&
?page=two&
?page=three&
Content of b.txt after I run my sed command:
pageone
pagetwo
pagethree
What I want b.txt to contain:
/page/one
/page/two
/page/three
The easiest way would be to use a different delimiter in your search/replace lines, e.g.:
s:?page=one&:pageone:g
You can use any character as a delimiter that's not part of either string. Or, you could escape it with a backslash:
s/\//foo/
Which would replace / with foo. You'd want to use the escaped backslash in cases where you don't know what characters might occur in the replacement strings (if they are shell variables, for example).
The s command can use any character as a delimiter; whatever character comes after the s is used. I was brought up to use a #. Like so:
s#?page=one&#/page/one#g
A very useful but lesser-known fact about sed is that the familiar s/foo/bar/ command can use any punctuation, not only slashes. A common alternative is s#foo#bar#, from which it becomes obvious how to solve your problem.
add \ before special characters:
s/\?page=one&/page\/one\//g
etc.
In a system I am developing, the string to be replaced by sed is input text from a user which is stored in a variable and passed to sed.
As noted earlier on this post, if the string contained within the sed command block contains the actual delimiter used by sed - then sed terminates on syntax error. Consider the following example:
This works:
$ VALUE=12345
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345
This breaks:
$ VALUE=12345/6
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
sed: -e expression #1, char 21: unknown option to `s'
Replacing the default delimiter is not a robust solution in my case as I did not want to limit the user from entering specific characters used by sed as the delimiter (e.g. "/").
However, escaping any occurrences of the delimiter in the input string would solve the problem.
Consider the below solution of systematically escaping the delimiter character in the input string before having it parsed by sed.
Such escaping can be implemented as a replacement using sed itself, this replacement is safe even if the input string contains the delimiter - this is since the input string is not part of the sed command block:
$ VALUE=$(echo ${VALUE} | sed -e "s#/#\\\/#g")
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345/6
I have converted this to a function to be used by various scripts:
escapeForwardSlashes() {
# Validate parameters
if [ -z "$1" ]
then
echo -e "Error - no parameter specified!"
return 1
fi
# Perform replacement
echo ${1} | sed -e "s#/#\\\/#g"
return 0
}
this line should work for your 3 examples:
sed -r 's#\?(page)=([^&]*)&#/\1/\2#g' a.txt
I used -r to save some escaping .
the line should be generic for your one, two three case. you don't have to do the sub 3 times
test with your example (a.txt):
kent$ echo "?page=one&
?page=two&
?page=three&"|sed -r 's#\?(page)=([^&]*)&#/\1/\2#g'
/page/one
/page/two
/page/three
replace.txt should be
s/?page=/\/page\//g
s/&//g
please see this article
http://netjunky.net/sed-replace-path-with-slash-separators/
Just using | instead of /
Great answer from Anonymous. \ solved my problem when I tried to escape quotes in HTML strings.
So if you use sed to return some HTML templates (on a server), use double backslash instead of single:
var htmlTemplate = "<div style=\\"color:green;\\"></div>";
A simplier alternative is using AWK as on this answer:
awk '$0="prefix"$0' file > new_file
You may use an alternative regex delimiter as a search pattern by backs lashing it:
sed '\,{some_path},d'
For the s command:
sed 's,{some_path},{other_path},'

Use awk to extract value from a line

I have these two lines within a file:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
where I'd like to get the following as output using awk or sed:
3
50000
Using this sed command does not work as I had hoped, and I suspect this is due to the presence of the quotes and delimiters in my line entry.
sed -n '/WORD1/,/WORD2/p' /path/to/file
How can I extract the values I want from the file?
awk -F'[<>]' '{print $3}' input.txt
input.txt:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
Output:
3
50000
sed -e 's/[a-zA-Z.<\/>= \-]//g' file
Using sed:
sed -E 's/.*limit"*>([0-9]+)<.*/\1/' file
Explanation:
.* takes care of everything that comes before the string limit
limit"* takes care of both the lines, one with limit" and the other one with just limit
([0-9]+) takes care of matching numbers and only numbers as stated in your requirement.
\1 is actually a shortcut for capturing pattern. When a pattern groups all or part of its content into a pair of parentheses, it captures that content and stores it temporarily in memory. For more details, please refer https://www.inkling.com/read/introducing-regular-expressions-michael-fitzgerald-1st/chapter-4/capturing-groups-and
The script solution with parameter expansion:
#!/bin/bash
while read line || test -n "$line" ; do
value="${line%<*}"
printf "%s\n" "${value##*\>}"
done <"$1"
output:
$ ./ltags.sh dat/ltags.txt
3
50000
Looks like XML to me, so assuming it forms part of some valid XML, e.g.
<root>
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
</root>
You can use Perl's XML::Simple and do something like this:
perl -MXML::Simple -E '$xml = XMLin("file"); say $xml->{"first-value"}->{"content"}; say $xml->{"second-value-limit"}'
Output:
3
50000
If the XML structure is more complicated, then you may have to drill down a bit deeper to get to the values you want. If that's the case, you should edit the question to show the bigger picture.
Ashkan's awk solution is straightforward, but let me suggest a sed solution that accepts non-integer numbers:
sed -n 's/[^>]*>\([.[:digit:]]*\)<.*/\1/p' input.txt
This extracts the number between the first > character of the line and the following <. In my RE this "number" can be the empty string, if you don't want to accept an empty string please add the -r option to sed and replace \([.[:digit:]]*\) by ([.[:digit:]]+).

How to parse a config file using sed

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
If there are hundreds and hundreds of lines it might be more efficient to first search for lines containing .us. and then do the string replacement... AWK is another good choice or pipe grep into sed
cat INPUT_FILE | grep "\.us\." | sed 's/\.us\./\./g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do with with the address syntax (technically you can embed the second sed into the first statement as well just can't remember syntax)
sed -n '/\(prod\|test\).us.[^=]*=/p' FILE | sed 's/\.us\./\./g'
We should probably do something cleaner. If the format is always environment.region.param we could look at forcing this only to occur on the text PRIOR to the equal sign.
sed -n 's/^\([^,]*\)\.us\.\([^=]\)=/\1.\2=/g'
This will only work on lines starting with any number of chars followed by '.' then 'us', then '.' and then anynumber prior to '=' sign. This way we won't potentially modify '.us.' if found within a "value"

How to append to specific lines in a flat file using shell script

I have a flat file that contains something like this:
11|30646|654387|020751520
11|23861|876521|018277154
11|30645|765418|016658304
Using shell script, I would like to append a string to certain lines in this file, if those lines contain a specific string.
For example, in the above file, for lines containing 23861, I would like to append a string "Processed" at the end, so that the file becomes:
11|30646|654387|020751520
11|23861|876521|018277154|Processed
11|30645|765418|016658304
I could use sed to append the string to all lines in the file, but how do I do it for specific lines ?
I'd do it this way
sed '/\|23861\|/{s/$/|Something/;}' file
This is similar to Marcelo's answer but doesn't require extended expressions and is, I think, a little cleaner.
First, match lines having 23861 between pipes
/\|23861\|/
Then, on those lines, replace the end-of-line with the string |Something
{s/$/|Something/;}
If you want to do more than one of these you could simply list them
sed '/\|23861\|/{s/$/|Something/;};/\|30645\|/{s/$/|SomethingElse/;}' file
Use the following awk-script:
$ awk '/23861/ { $0=$0 "|Processed" } {print}' input
11|30646|654387|020751520
11|23861|876521|018277154|Processed
11|30645|765418|016658304
or, using sed:
$ sed 's/\(.*23861.*$\)/\1|Processed/' input
11|30646|654387|020751520
11|23861|876521|018277154|Processed
11|30645|765418|016658304
Use the substitution command:
sed -i~ -E 's/(\|23861\|.*)/\1|Processed/' flat.file
(Note: the -i~ performs the substitution in-place. Just leave it out if you don't want to modify the original file.)
You can use the shell
while read -r line
do
case "$line" in
*23681*) line="$line|Processed";;
esac
echo "$line"
done < file > tempo && mv tempo file
sed is just a stream version of ed, which has a similar command set but was designed to edit files in place (allegedly interactively, but you wouldn't want to use it that way unless all you had was one of these). Something like
field_2_value=23861
appended_text='|processed'
line_match_regex="^[^|]*|$field_2_value|"
ed "$file" <<EOF
g/$line_match_regex/s/$/$appended_text/
wq
EOF
should get you there.
Note that the $ in .../s/$/... is not expanded by the shell, as are $line_match_regex and $appended_text, because there's no such thing as $/ - instead it's passed through as-is to ed, which interprets it as text to substitute ($ being regex-speak for "end of line").
The syntax to do the same job in sed, should you ever want to do this to a stream rather than a file in place, is very similar except that you don't need the leading g before the regex address:
sed -e "/$line_match_regex/s/$/$appended_text/" "$input_file" >"$output_file"
You need to be sure that the values you put in field_2_value and appended_text never contain slashes, because ed's g and s commands use those for delimiters.
If they might do, and you're using bash or some other shell that allows ${name//search/replace} parameter expansion syntax, you could fix them up on the fly by substituting \/ for every / during expansion of those variables. Because bash also uses / as a substitution delimiter and also uses \ as a character escape, this ends up looking horrible:
appended_text='|n/a'
ed "$file" <<EOF
g/${line_match_regex//\//\\/}/s/$/${appended_text//\//\\/}/
wq
EOF
but it does work. Nnote that both ed and sed require a trailing / after the replacement text in s/search/replace/ while bash's ${name//search/replace} syntax doesn't.

How to use sed to test and then edit one line of input?

I want to test whether a phone number is valid, and then translate it to a different format using a script. This far I can test the number like this:
sed -n -e '/(0..)-...\s..../p' -e '/(0..)-...-..../p'
However, I don't just want to test the number and output it, I would like to remove the brackets, dashes and spaces and output that.
Is there any way to do that using sed? Or should I be using something else, like AWK?
I'm not sure why you're using a 0 in that position. You're saying "a zero followed by any two characters" in the area code position. Is that really what you mean?
Anyway, you want to use the sed substitution operator with the p command in conjunction with the -n switch. Here's one way to do it:
sed -n 's/(\([0-9][0-9][0-9]\))\s\?\([0-9][0-9][0-9]\)[- ]\([0-9][0-9][0-9][0-9]\)/\1\2\3/p'
You can also use something as simple as egrep to validate lines and tr to remove the characters you don't want to see:
egrep "\([0-9]+\)[0-9.-]+" <file> |tr -d '()\-'
Note that it will only work if you don't want to keep any of those characters.
This is a more succinct version of Jonathan Feinberg's answer. It uses extended regular expressions to avoid having to do all the escaping that the curly braces would require (in addition to moving the escaping of parentheses from the special ones to the literal ones).
sed -r 's/\(([[:digit:]]{3})\)\s?([[:digit:]]{3})[ -]([[:digit:]]{4})/\1\2\3/'
this suggestion depends on how your number format looks like , for example, i assume phone number like this
echo "(703) 234 5678" | awk '
{
for(i=1;i<=NF;i++){
gsub(/\(|\)/,"",$i) # remove ( and )
if ($i+0>=0 ){ # check if it more than 0 and a number
print $i
}
if (){
# some other checks
}
}
}
'
do it systematically, and you don't have to waste time crafting out complex regex

Resources