How to use sed command to change string in a file - windows

OS: Windows 10
Tool: git bash
I want to use sed command to change the version string in some files.
In git bash, I tried below command and it works.
$ sed -i 's/1.0.0.21/1.0.0.22/g' ../fossa/PluginManifest.xml
Then I put sed command in a script file, like below:
$ cat UpdateVersion.sh
echo $1
echo $2
sed -i 's/$1/$2/g' ../fossa/PluginManifest.xml
And then I execute below command:
$ source UpdateVersion.sh 1.0.0.21 1.0.0.22
1.0.0.21
1.0.0.22
When I check the file, I find the version string is not changed. Why?

In general, any and all input should be validated, sanitized, and/or encoded as appropriate before use, especially input being passed to a command/control interface, such as a shell-executed sed.
In your example, the following may be appropriate (with the dots escaped, as suggested by anubhava and mashuptwice, because an unescaped . instructs the regular expression engine to match any character):
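if [[ ! ( $1 =~ ^[0-9.]+$ && $2 =~ ^[0-9.]+$ ) ]]; then
    echo "Invalid version syntax" 1>&2
    return 1 2>/dev/null || exit 1 # return when sourced, exit when run as a script
fi

# Escape the dots so sed matches them literally.
ver1=${1//./\\.}
ver2=${2//./\\.}

# Double quotes let the shell expand $ver1 and $ver2 before sed sees them.
sed -i "s/\b$ver1\b/$ver2/" ../fossa/PluginManifest.xml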
Note that $ver1 is surrounded by \bs to ensure that sed matches on word boundaries (e.g., sed s/1/a/g would match on 1, (1), 12, 21, 121, and 212, replacing all 1s with a, while sed s/\b1\b/a/g would only match on 1, (1), and other 1s with word boundaries on both sides).
Consider reviewing the manpages for bash and sed, as well as the regular expressions tutorial here: https://www.regular-expressions.info/.
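To tie this back to the original script: inside single quotes the shell passes $1 and $2 to sed literally, so nothing in the file matches; inside double quotes the shell expands them first. Below is a minimal sketch, assuming a hypothetical helper named update_version that prints the result to stdout instead of editing in place (so it does not depend on a particular sed's -i syntax). The sample file content is made up for illustration.

```shell
# update_version OLD NEW FILE: print FILE with OLD replaced by NEW.
# (Hypothetical helper; OLD and NEW are assumed to be dotted version numbers.)
update_version() {
    # Escape the dots so they match literally: 1.0.0.21 -> 1\.0\.0\.21
    old_re=$(printf '%s\n' "$1" | sed 's/\./\\./g')
    # Double quotes: the shell expands $old_re and $2 before sed runs.
    sed "s/$old_re/$2/g" "$3"
}

tmp=$(mktemp)
printf '%s\n' '<Version>1.0.0.21</Version>' '<Other>1x0y0z21</Other>' > "$tmp"
result=$(update_version 1.0.0.21 1.0.0.22 "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The second sample line stays unchanged because the escaped dots no longer match arbitrary characters.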

Related

Bash script: in file, replace characters with 'X' between two given strings using sed [duplicate]

file=mylog.log
search_str="&Name="
end_str="&"
sed -i -E ':a; s/('"$search_str"'X*)[^X'"$end_str"']/\1X/; ta' "$file"
Ex 1:
something&Name=JASON&else
to
something&Name=XXXXX&else
My current sed command actually works fine if I use a literal '&' character instead of '"$end_str"', like this:
sed -i -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$file"
So, to summarize: if a single character follows ^X in the bracket expression, my sed command works fine, but the same command does not work if I use a string instead of a single character.
For example, my sed command won't work in this case :
end_str="\%26"
sed -i -E ':a; s/('"$search_str"'X*)[^X'"$end_str"']/\1X/; ta' "$file"
Eg:
something&Name=JASON_MATTHEW_DONALD%26else
TO
something&Name=XXXXXXXXXXXXXXXXXXXX%26else
Eg:2
something&Name=JASON%26else
TO
something&Name=XXXXX%26else
Please let me know
Place your string variable outside the character class and capture it to check for further substitutions:
sed -i -E ':a; s/('"$search_str"'X*)[^X](.*'"$end_str"')/\1X\2/; ta' "$file"
If the number of X doesn't matter you can simplify it to:
search="Name"
sed "s/$search=[^&]*/$search=XXX/" input.file
This assumes that $search won't contain special characters which have a meaning in sed's regex syntax. If special characters can be a problem you need to prepare the $search variable, as explained here: Is it possible to escape regex metacharacters reliably with sed
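A small sketch of the loop-based command above, wrapped in a hypothetical function mask_between that reads from stdin. It assumes the start and end strings contain no regex metacharacters, and that the end string occurs only once after the name; if it occurs again later on the line, the loop keeps masking up to the last occurrence.

```shell
# mask_between START END: replace each character between START and END with X.
# (Hypothetical helper; START and END must not contain regex metacharacters.)
mask_between() {
    # One -e per command keeps the label syntax portable across sed versions.
    sed -E -e ':a' -e 's/('"$1"'X*)[^X](.*'"$2"')/\1X\2/' -e 'ta'
}

masked=$(printf '%s\n' 'something&Name=JASON%26else' | mask_between '&Name=' '%26')
printf '%s\n' "$masked"
```

Each pass turns the first non-X character after the start string into an X, and the loop ends once the only remaining non-X character is the start of the end string.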
I insist on the point that keeping the length of passwords in logs is a very bad idea (security wise).
Having said that:
First, a character list [...] is not the right tool to match a string. For that we need an alternation (...|...).
end_str="(&|%26)"
But it is quite difficult to express "not a string" in a regex.
not_end_str="([^&%]|%[^2]|%2[^6])"
Using all that we may build a pure bash solution (maybe not fast, but it works).
It prints to stdout to show how it works.
Redirect to a file to store the result.
file=mylog.log
search_str="&Name="
end_str="(&|%26)" # end_str written as an alternation (for reference; not used below).
not_end_str="([^&%]|%[^2]|%2[^6])" # regex negation of end_str.
# Build a regex that splits each line into prefix, name, and rest.
myreg="(.*${search_str}X*)(${not_end_str}*)([&%].*)"
while IFS= read -r line; do
[[ $line =~ $myreg ]] || { printf '%s\n' "$line"; continue; } # pass non-matching lines through.
arr=("${BASH_REMATCH[@]:1}") # drop [0], the whole match.
arr[1]=${arr[1]//?/X} # Replace name with "X"'s.
arr[2]='' # Clear the inner not_end_str group, which repeats part of the name.
printf '%s' "${arr[@]}"; echo # Print modified line.
done <"$file"
Further reading:
Regular expression to match line that doesn't contain a word?
Regular expression that doesn't contain certain string

Delete strings with non-Ukrainian characters bash

Using file structure
foo_11: "Марія"
foo_112: "Superman"
FOOTLONG: "Subway"
foo_13: "Юлія"
I want to remove all lines that don't contain at least one character from the Ukrainian alphabet.
Script:
for i in *.txt;
do
sed '/[^А-ЯЄЇІа-яєїі]+/d' $i >$i.out
mv $i.out $i
done
doesn't do anything. What is wrong?
Using mac bash.
Assuming that your character class defining Ukrainian letters is correct, the following should work:
sed '/[А-ЯЄЇІа-яєїі]/!d' file
[А-ЯЄЇІа-яєїі] matches a Ukrainian letter anywhere on the line.
Note that even the letters that look like the ASCII letters A I a i are actually Ukrainian (Cyrillic) letters with the Unicode code points U+0410, U+0406, U+0430, and U+0456.
! negates the match, meaning that only lines not containing at least 1 Ukrainian letter match.
d deletes those lines.
To put it all together:
for f in *.txt; do
sed -i '' '/[А-ЯЄЇІа-яєїі]/!d' "$f" # -i '' is BSD Sed syntax; GNU sed takes just -i
done
As for what you've tried:
As @StefanHegny points out in a comment on the question, + isn't supported unless sed is run with -E to enable extended regular expressions; without -E, the cumbersome \{1,\} must be used (\+ is only supported by GNU sed, not by the BSD version of sed that macOS comes with).
However, even the fixed version of your command, sed '/[^А-ЯЄЇІа-яєїі]\{1,\}/d', doesn't do what you want: it deletes all lines that contain at least one non-Ukrainian-letter character, which eliminates all of your input lines, given that they all have ASCII-based field names and contain :.
You should double-quote variable references such as $i to protect them from shell expansions: "$i"
BSD Sed does support in-place updating with -i, but - unlike GNU Sed - it requires that an empty option-argument (indicating that no backup of the input file should be made) be specified as a separate argument: -i ''.
Your write-to-a-temp-file-first-then-replace-the-original approach works too, but it's generally better to use the following idiom: sed ... file > file.tmp && mv file.tmp file. Separating the mv command with && ensures that the original file is only replaced if the sed command succeeded.
That said, that doesn't help with logic errors as in the case at hand: despite outputting nothing, sed reports success in this case.
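The temp-file idiom and the /pattern/!d form can be combined as in the sketch below. The function name keep_matching and the sample lines are hypothetical, and the pattern is an ASCII stand-in, because how a Cyrillic range such as [А-Я] is matched depends on the locale sed runs in. The pattern must not contain a / character.

```shell
# keep_matching PATTERN FILE: keep only the lines of FILE that match PATTERN.
# (Hypothetical helper; PATTERN is a basic regex without "/" in it.)
keep_matching() {
    # mv runs only if sed exits successfully, so a failed sed leaves FILE intact.
    sed '/'"$1"'/!d' "$2" > "$2.tmp" && mv "$2.tmp" "$2"
}

demo=$(mktemp)
printf '%s\n' 'foo_11: "abc"' 'foo_112: "123"' 'FOOTLONG: "456"' > "$demo"
keep_matching '"[a-z]' "$demo"   # keep lines whose quoted value starts with a letter
kept=$(cat "$demo")
printf '%s\n' "$kept"
rm -f "$demo"
```

Only the first sample line survives, since it is the only one in which a quote is followed by a lowercase letter.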
This code would achieve what you want (if I understood your question correctly):
grep -i "Я\|Є\|Ї\|І" /folder/file >> /tmp/result
The result is stored in /tmp/result.
Note: I don't know Ukrainian, so I'm sure I did not include all Ukrainian characters; please add or delete the Ukrainian characters you want to match in the construction above.
Note 2: this code is case-insensitive thanks to grep -i, so you only need to add each character once (lowercase or capital).
To put it on your loop it could be:
for i in *.txt;
do
grep -i "Я\|Є\|Ї\|І" "$i" > "$i".out
mv "$i".out "$i"
done
Edit: I edited this answer to make it simpler, and to add a loop to it.

Delete lines where 3rd character equals a number

I have a consistent file with numbers like
0123456
0234566
.
.
.
etc
With bash tools, preferably on the command line, how can I remove each line whose third digit equals 2?
E.g., with cut -c3 I can get the correct digit, but I cannot combine it effectively with sed or something similar. I am not looking for a pattern, only the 3rd digit.
(I have done it in a Python script, but I was wondering how it's done with a one-line bash command.) Thank you!
EDIT: Additionally, what if I want to delete the lines where the third digit does NOT equal 2 (the opposite question)?
You can just do this with sed
sed -i '/^..2/d' file
If you want to do the opposite you can do:
sed -i '/^..[^2]/d' file
since you are dealing with a specific character.
I would use awk:
$ awk -F "" '$3!=2' file
0234566
By setting the field separator to "" (empty, only valid in GNU awk), every character is stored in a different field. Then, $3 != 2 checks whether the 3rd character is not 2 and, if so, the line is printed.
Or with pure bash, using shell parameter expansion ${parameter:offset:length}:
while IFS= read -r line
do
[ "${line:2:1}" != "2" ] && echo "$line"
done < file
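Where GNU awk's empty field separator is not available, substr() should give the same result in any POSIX awk. A minimal sketch, with hypothetical function names, that reads stdin or any files passed as arguments:

```shell
# Print lines whose 3rd character is not 2 (i.e. delete the lines where it is 2).
drop_third_is_2() { awk 'substr($0, 3, 1) != "2"' "$@"; }

# Opposite: print only the lines whose 3rd character is 2.
keep_third_is_2() { awk 'substr($0, 3, 1) == "2"' "$@"; }

printf '%s\n' 0123456 0234566 | drop_third_is_2
```

One difference from the sed version of the opposite case: lines shorter than three characters are dropped by keep_third_is_2, whereas sed '/^..[^2]/d' leaves them in place.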

Using BASH, how to increment a number that uniquely only occurs once in most lines of an HTML file?

The target is always going to be between two characters, 'E' and '/', and there will never be more than one occurrence of this combination (e.g. 'E01/') in a line. It occurs in most lines of the HTML file, and the number will always be between '01' and '90'.
So, I need to programmatically read the file and replace each occurrence of 'Enn/' where 'nn' in 'Enn/' will be between '01' and '90' and must maintain the '0' for numbers '01' to '09' in 'Enn/' while incrementing the existing number by 1 throughout the HTML file.
Is this doable and if so how best to go about it?
Edit: Target lines will be in one or the other formats:
<DT>ProgramName
<DT>Program Name
You can use sed inside BASH as a fantastic one-liner, either:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+(10#\2>=90?0:1)))/ge' FILENAME
or, if you are guaranteed that the number never needs to be capped:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+1))/ge' FILENAME
Basically, you'll be doing an in-place search and replace. The first command will not increment anything at or above 90 (since you didn't specify the exact nature of the overflow condition): E89/ -> E90/, E90/ -> E90/, and if by chance you have E91/, it will remain E91/. Run the command inside a loop to process multiple files.
A small explanation of the above command:
-r states that you'll be using extended regular expressions
-i states to write back to the same file (be careful with overwriting!)
s/search/replace/ge this is the regex command you'll be using
s/ states you'll be using a string search
(.*E) first grouping of all characters up to the last E before the number (case sensitive)
([0-9]{2}) second grouping of numbers 0 through 9, repeated twice (fixed width)
(\/.*) third grouping getting the escaped trailing slash and everything after that
/ (slash separator) denotes end of search pattern and beginning of replacement pattern
printf "format" var this is the expression used for each replacement
\1 place first grouping found here
%02u the replace format for the var
\3 place third grouping found here
$((expression)) BASH arithmetic expression to use in printf format
10#\2 force second grouping as a base 10 number
+(10#\2>=90?0:1) add 0 or 1 to the second grouping based on if it is >= 90 (as used in first command)
+1 add 1 to the second grouping (see second command)
/ge flags: g for global replacement, and e (a GNU sed extension) to execute the result of the substitution as a shell command and replace the line with that command's output
GNU sed and awk are very powerful tools to do this sort of thing.
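The arithmetic at the heart of the one-liner can be tried on its own. The sketch below uses a hypothetical function increment_nn and, as an assumption for portability, strips the leading zero with ${1#0} instead of using the bash-specific 10# prefix; both avoid 08 and 09 being read as invalid octal numbers.

```shell
# increment_nn NN: print NN+1 as two digits, leaving 90 and above unchanged.
# (Hypothetical helper mirroring the capped variant of the sed command.)
increment_nn() {
    n=${1#0}                          # 08 -> 8, so it is not parsed as octal
    [ "$n" -lt 90 ] && n=$((n + 1))   # cap: values of 90 or more stay as they are
    printf '%02d\n' "$n"              # restore the leading zero
}

increment_nn 08
increment_nn 90
```

The first call prints 09 and the second prints 90.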
You can use the following perl one-liner to increment the numbers while maintaining the ones with leading 0s.
perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
$ cat file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
$ perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
You can add the -i option to make changes in-place. I would recommend creating backup before doing so.
Not as elegant as one line sed!
Break the commands used into multiple commands and you can debug your bash or grep or sed.
# find the number
# use -o to grep to just return pattern
# use head -n1 for safety to just get 1 number
n=$(grep -o "E[0-9][0-9]\/" file.html |grep -o "[0-9][0-9]"|head -n1)
# 08 and 09 would be read as invalid octal numbers, so force base 10
n1=10#$n
echo Debug n1=$n1 n=$n
n2=n1
# bash arithmetic done inside (( ))
# as ever with bash bracketing whitespace is needed
(( n2++ ))
echo debug n2=$n2
# use sed with -i -e for in-place edit to replace number
# (written as -ie, GNU sed would treat "e" as a backup suffix)
sed -i -e "s/E$n\//E$(printf '%02d' $n2)\//" file.html
grep "E[0-9][0-9]" file.html
awk might be better. Maybe could do it in one awk command also.
The sed one-liner in other answer is awesome :-)
This works in bash; the (( )) arithmetic and the 10# prefix are not available in plain sh.
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
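As for doing it in one awk command, here is one possible sketch. It assumes at most one Enn/ target per line and the same cap at 90 as the first sed answer; the function name bump_enn and the sample HTML lines are made up for illustration.

```shell
# bump_enn [FILE...]: increment the two-digit number in the first "Enn/" on each line.
# (Hypothetical helper; values of 90 or more are left unchanged.)
bump_enn() {
    awk '{
        if (match($0, /E[0-9][0-9]\//)) {
            n = substr($0, RSTART + 1, 2) + 0      # "08" -> 8 (awk reads it as decimal)
            if (n < 90) n++
            # Rebuild the line: text up to and including E, new number, then "/" onwards.
            $0 = substr($0, 1, RSTART) sprintf("%02d", n) substr($0, RSTART + 3)
        }
        print
    }' "$@"
}

printf '%s\n' '<DT><A HREF="http://example.com/E08/">Program Name</A>' '<DT>No target here' | bump_enn
```

Lines without a target are printed unchanged, so the output can be redirected to a temporary file and moved over the original once it looks right.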

Grep characters before and after match?

Using this:
grep -A1 -B1 "test_pattern" file
will produce one line before and after the matched pattern in the file. Is there a way to display not lines but a specified number of characters?
The lines in my file are pretty big so I am not interested in printing the entire line but rather only observe the match in context. Any suggestions on how to do this?
3 characters before and 4 characters after
$> echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}'
23_string_and
grep -E -o ".{0,5}test_pattern.{0,5}" test.txt
This will match up to 5 characters before and after your pattern. The -o switch tells grep to only show the match and -E to use an extended regular expression. Make sure to put the quotes around your expression, else it might be interpreted by the shell.
You could use
awk '/test_pattern/ {
match($0, /test_pattern/); print substr($0, RSTART - 10, RLENGTH + 20);
}' file
You mean, like this:
grep -o '.\{0,20\}test_pattern.\{0,20\}' file
?
That will print up to twenty characters on either side of test_pattern. The \{0,20\} notation is like *, but specifies zero to twenty repetitions instead of zero or more. The -o says to show only the match itself, rather than the entire line.
I'll never easily remember these cryptic command modifiers so I took the top answer and turned it into a function in my ~/.bashrc file:
cgrep() {
# For files whose lines are tens of thousands of characters long.
# Use cgrep to print 30 characters before and after the search pattern.
if [ $# -eq 2 ] ; then
# Format was 'cgrep "search string" /path/to/filename'
grep -o -P ".{0,30}$1.{0,30}" "$2"
else
# Format was 'cat /path/to/filename | cgrep "search string"'
grep -o -P ".{0,30}$1.{0,30}"
fi
} # cgrep()
Here's what it looks like in action:
$ ll /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
-rw-r--r-- 1 rick rick 25780 Jul 3 19:05 /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
$ cat /tmp/rick/scp.Mf7UdS/Mf7UdS.Source | cgrep "Link to iconic"
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
$ cgrep "Link to iconic" /tmp/rick/scp.Mf7UdS/Mf7UdS.Source
1:43:30.3540244000 /mnt/e/bin/Link to iconic S -rwxrwxrwx 777 rick 1000 ri
The file in question is one continuous 25K line and it is hopeless to find what you are looking for using regular grep.
Notice the two different ways you can call cgrep that parallels grep method.
There is a "niftier" way of creating the function where "$2" is only passed when set which would save 4 lines of code. I don't have it handy though. Something like ${parm2} $parm2. If I find it I'll revise the function and this answer.
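One way to get that shorter form, as a sketch: shift the fixed arguments away and pass whatever is left as "$@", which expands to nothing when no file is given, so grep falls back to stdin. The name cgrep_n and the extra width argument are assumptions for illustration, and -E is used instead of -P since the pattern needs no Perl-specific syntax.

```shell
# cgrep_n WIDTH PATTERN [FILE...]: show up to WIDTH characters around each match.
# (Hypothetical variant of cgrep; PATTERN is an extended regular expression.)
cgrep_n() {
    width=$1 pattern=$2
    shift 2
    # With no FILE arguments "$@" expands to nothing and grep reads stdin.
    grep -E -o ".{0,$width}$pattern.{0,$width}" "$@"
}

printf '%s\n' 'some123_string_and_another' | cgrep_n 3 string
```

Both call styles work with the same body: cgrep_n 30 "Link to iconic" /path/to/file, or cat /path/to/file | cgrep_n 30 "Link to iconic".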
With gawk, you can use the match function:
x="hey there how are you"
echo "$x" |awk --re-interval '{match($0,/(.{4})how(.{4})/,a);print a[1],a[2]}'
ere are
If you are OK with Perl, here is a more flexible solution. The following prints three characters before the pattern, followed by the actual pattern, and then 5 characters after the pattern.
echo hey there how are you |perl -lne 'print "$1$2$3" if /(.{3})(there)(.{5})/'
ey there how
This can also be applied to words instead of just characters. The following prints one word before the actual matching string.
echo hey there how are you |perl -lne 'print $1 if /(\w+) there/'
hey
The following prints one word after the pattern:
echo hey there how are you |perl -lne 'print $2 if /(\w+) there (\w+)/'
how
The following prints one word before the pattern, then the actual word, and then one word after the pattern:
echo hey there how are you |perl -lne 'print "$1$2$3" if /(\w+)( there )(\w+)/'
hey there how
If using ripgrep, this is how you would do it:
rg -o ".{0,5}test_pattern.{0,5}" test.txt
You can use regexp grep for finding + second grep for highlight
echo "some123_string_and_another" | grep -o -P '.{0,3}string.{0,4}' | grep string
23_string_and
With ugrep you can specify -ABC context with option -o (--only-matching) to show the match with extra characters of context before and/or after the match, fitting the match plus the context within the specified -ABC width. For example:
ugrep -o -C30 pattern testfile.txt
gives:
1: ... long line with an example pattern to match. The line could...
2: ...nother example line with a pattern.
On a terminal, the same output is shown with color highlighting. Multiple matches on a line are either summarized with [+nnn more], or shown individually with context and the column number when option -k (--column-number) is given.
The context width is the number of Unicode characters displayed (UTF-8/16/32), not just ASCII.
I personally do something similar to the posted answers. But since the dot key, like any keyboard key, can be tapped or held down, and I often don't need a lot of context (if I needed more I might use lines, as with grep -C, but like you I often don't want the lines before and after), I find it much quicker to type one dot per character of context: tap the key a few times for a little context, or hold it down for more.
e.g. echo zzzabczzzz | grep -o '.abc..'
This shows the abc pattern with one character before it and two after (in regex syntax, a dot matches any character). Others used the dot too, but with curly braces to specify repetition.
If I wanted to be strict about between 0 and x characters, or exactly y characters, then I'd use the curly braces, and -P, as others have done.
There is a setting for whether the dot matches a newline, but you can look into that if it's a concern or of interest.

Resources