Find and divide all numbers in a file using sed - macos

I am trying to find all numbers in a JSON file and replace each with half its original value, using sed on macOS. For example, here I search for 2010 and replace it with 1005:
file="data.json"
sed -i '' -E 's,([^0-9]|^)2010([^0-9]|$),\1 1005\2,g' "$file"
I would like to find all number instances, and replace them with half values of themselves. It would need to work on decimals, eg: 2009 would become 1004.5, 10.5 would become 5.25.
I'm aware a naive pattern could match each individual digit, so it would probably need to match whole numbers that have non-numeric characters on either side.
edit: I would like it to be flexible and work on all forms of text files, not just JSON files. (.txt, .html, .rtf etc...)

You may use Perl with a regex with e modifier:
perl -pe 's{(?<!\d)(\d+(?:\.\d+)?)(?!\d)}{$1/2}ge' file
To modify the file inline, add -i option:
perl -i -pe 's{(?<!\d)(\d+(?:\.\d+)?)(?!\d)}{$1/2}ge' file
perl -pi.bak -e 's{(?<!\d)(\d+(?:\.\d+)?)(?!\d)}{$1/2}ge' file # To save a backup of the original file
See the online demo:
s="abc_2010_and+2009+or-10.5"
perl -pe 's{(?<!\d)(\d+(?:\.\d+)?)(?!\d)}{$1/2}ge' <<< "$s"
# => abc_1005_and+1004.5+or-5.25
The (?<!\d)(\d+(?:\.\d+)?)(?!\d) regex matches
(?<!\d) - no digit immediately to the left is allowed
(\d+(?:\.\d+)?) - Group 1 ($1): 1+ digits followed by an optional sequence of . and 1+ digits
(?!\d) - no digit immediately to the right is allowed.
The RHS - $1/2 - is an expression that divides the Group 1 value by 2. This works because the e modifier at the end of the substitution makes Perl evaluate the replacement as code rather than as a string.

With GNU awk for multi-char RS and RT it'd just be:
awk -v RS='[0-9]+([.][0-9]+)?' -v ORS= 'RT{$0=$0 RT/2} 1'
e.g. borrowing @Wiktor's example:
$ s="abc_2010_and+2009+or-10.5"
$ awk -v RS='[0-9]+([.][0-9]+)?' -v ORS= 'RT{$0=$0 RT/2} 1' <<< "$s"
abc_1005_and+1004.5+or-5.25
If you want to overwrite an input file then add -i inplace:
awk -i inplace -v RS...1' file
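The awk that ships with macOS is not GNU awk, so a regex RS, RT and -i inplace are not available there. A minimal sketch of the same idea using only POSIX awk features (match(), RSTART, RLENGTH) is shown below; the function name halve_numbers is made up for illustration, and it assumes the default number formatting (%.6g for non-integers) is precise enough for your data.

```shell
# halve_numbers is a hypothetical helper; it reads files or stdin like awk does.
halve_numbers() {
  awk '{
    out = ""
    rest = $0
    # match() finds the leftmost, longest number left in the line
    while (match(rest, /[0-9]+([.][0-9]+)?/)) {
      num = substr(rest, RSTART, RLENGTH)
      # keep the text before the number, then append the halved value
      out = out substr(rest, 1, RSTART - 1) (num / 2)
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$@"
}

printf '%s\n' 'abc_2010_and+2009+or-10.5' | halve_numbers
```

Because non-integer results are converted with %.6g, values with more than six significant digits are rounded; setting CONVFMT in a BEGIN block is one way to widen that.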


How to extract only the English words and leaving the Devanagari words in bash script?

The text file is like this,
#एक
1के
अंकगणित8IU
अधोरेखाunderscore
$thatऔर
%redएकyellow
$चिह्न
अंडरस्कोर#_
The desired text file should be like,
#
1
8IU
underscore
$that
%redyellow
$
#_
This is what I have tried so far, using awk
awk -F"[अ-ह]*" '{print $1}' filename.txt
And the output that I am getting is,
#
1
$that
%red
$
and using this awk -F"[अ-ह]*" '{print $1,$2}' filename.txt and I am getting an output like this,
#
1 े
ं
ो
$that
%red yellow
$ ि
ं
Is there any way to solve this in a bash script?
Using perl:
$ perl -CSD -lpe 's/\p{Devanagari}+//g' input.txt
#
1
8IU
underscore
$that
%redyellow
$
#_
-CSD tells perl that standard streams and any opened files are encoded in UTF-8. -p loops over input files printing each line to standard output after executing the script given by -e. If you want to modify the file in place, add the -i option.
The regular expression matches any codepoints assigned to the Devanagari script in the Unicode standard and removes them. Use \P{Devanagari} to do the opposite and remove the non-Devanagari characters.
Using awk you can do:
awk '{sub(/[^\x00-\x7F]+/, "")} 1' file
#
1
8IU
underscore
$that
%redyellow
See documentation: https://www.gnu.org/software/gawk/manual/html_node/Bracket-Expressions.html
using [\x00-\x7F].
This matches all values numerically between zero and 127, which is the defined range of the ASCII character set. Use a complemented character list [^\x00-\x7F] to match any single-byte characters that are not in the ASCII range.
tr is a very good fit for this task:
LC_ALL=C tr -c -d '[:cntrl:][:graph:]' < input.txt
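It sets the POSIX C locale so that only the ASCII (US English) character set is valid.
It then instructs tr to delete (-d) the complement (-c) of [:cntrl:][:graph:], that is, every character that is neither a control character nor a visible one. Because LC_ALL=C puts every byte outside ASCII outside those classes, all non-ASCII characters are discarded. Note that [:graph:] does not include the space character, so spaces are deleted as well; use [:print:] in place of [:graph:] to keep them.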
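A quick way to see what the tr command keeps is to feed it a line that also contains a space. The sample string below is made up; the second command swaps in [:print:], which is a variation and not part of the answer above.

```shell
# Made-up sample line with a space between two ASCII words.
sample='अधोरेखाunder score'

# [:graph:] excludes the space, so it is deleted along with the Devanagari bytes.
printf '%s\n' "$sample" | LC_ALL=C tr -c -d '[:cntrl:][:graph:]'

# [:print:] is [:graph:] plus the space, so the space survives.
printf '%s\n' "$sample" | LC_ALL=C tr -c -d '[:cntrl:][:print:]'
```

The first command prints underscore and the second prints under score; the trailing newline is kept in both cases because it belongs to [:cntrl:].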

bash script to convert values before an equal sign to lowercase?

I'm trying to achieve the following results:
I have a command that spits out some env vars to the terminal:
./script will output the following:
AWS_OKTA_PROFILE=xxx
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xx
AWS_SECURITY_TOKEN=xx
AWS_SESSION_TOKEN=xx
I want to write this to an output file in another location to be almost the same except like this (but only before the equal sign)
[default]
aws_okta_profile=xxx
aws_access_key_id=xxx
aws_secret_access_key=xx
aws_security_token=xx
aws_session_token=xx
Notice, I'm also prepending [default] to the file.
Thanks!
In addition to sed, awk also provides a simple solution. You can use '=' as the field separator and convert the first field to lowercase with tolower() if the record contains an '=' sign (or you can test NF>1 to check that there is more than one field). The 1 at the end of the rule is shorthand for print. Putting it all together, you can use
awk -F= -v OFS='=' '/=/{$1=tolower($1)}1' file
Example Use/Output
With your input in the file file, you would get:
$ awk -F= -v OFS='=' '/=/{$1=tolower($1)}1' file
[default]
aws_okta_profile=xxx
aws_access_key_id=xxx
aws_secret_access_key=xx
aws_security_token=xx
aws_session_token=xx
awk does not have an edit in-place mechanism (except by non-standard extension), so simply redirect the output to a new file, e.g.
$ awk -F= -v OFS='=' '/=/{$1=tolower($1)}1' file > newfile
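If the [default] line is not already in the input, one option is to print it from a BEGIN block and pipe the command's output straight into awk. In the sketch below, emit_env is a stand-in for ./script and to_profile is a made-up helper name; the sample values are invented.

```shell
# Stand-in for ./script from the question (hypothetical sample output).
emit_env() {
  printf '%s\n' 'AWS_OKTA_PROFILE=xxx' 'AWS_SESSION_TOKEN=Ab=Cd'
}

# Print the [default] header once, then lowercase only the part before the first "=".
to_profile() {
  awk -F= -v OFS='=' 'BEGIN { print "[default]" } /=/ { $1 = tolower($1) } 1'
}

emit_env | to_profile
```

Because the record is rebuilt with OFS set to '=', a value that itself contains '=' or uppercase letters passes through unchanged. To write the file, redirect as before, e.g. emit_env | to_profile > newfile.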
The search and replace with lowercase can be done with sed like this:
#!/usr/bin/env -S sed -f
s/\([^=]\+\)=\([^=]\+\)/\L\1=\E\2/
s/\([^=]\+\)=\([^=]\+\)/: search regex pattern:
\([^=]\+\): capture group of 1 or more characters not an = sign,
=: followed by an = sign,
\([^=]\+\): followed by another captured group of 1 or more characters not an = sign.
/\L\1=\E\2/: Replace matches with this:
\L\1: Lowercase the captured group 1,
=: followed by an = sign,
\E\2: followed by captured group 2 with case unchanged.

Find two string in same line and then replace using sed

I am doing a find and replace using sed in a bash script. I want to search each file for lines containing both of the words files and no. If both words are present on the same line, replace red with green; otherwise do nothing.
sed -i -e '/files|no s/red/green' $file
But I am unable to do so. I am not receiving any error and the file doesn't get updated.
What am I doing wrong here, or what is the correct way of achieving my result?
/files|no/ means to match lines with either files or no, it doesn't require both words on the same line.
To match the words in either order, use /files.*no|no.*files/.
sed -i -r -e '/files.*no|no.*files/s/red/green/' "$file"
Notice that you need another / at the end of the pattern, before s, and the s operation requires / at the end of the replacement.
And you need the -r option to make sed use extended regexp; otherwise you have to use \| instead of just |.
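Before editing files in place, the address can be tried on a few lines without -i. The sketch below uses -E, which enables extended regular expressions in both GNU sed and the BSD sed on macOS; the sample lines are made up.

```shell
# Made-up sample lines; only lines containing both "files" and "no" should change.
sample_lines() {
  printf '%s\n' 'files: no red cars' 'no red files' 'files red' 'nothing red'
}

# Same address and substitution as above, printed to stdout instead of edited in place.
sample_lines | sed -E '/files.*no|no.*files/s/red/green/'
```

Only the first two lines come out with green. Note that /no/ is a plain substring match, so a line containing files and nothing would also qualify.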
This might work for you (GNU sed):
sed '/files/{/no/s/red/green/}' file
or:
sed '/files/!b;/no/s/red/green/' file
This method allows for easy extension e.g. foo, bar and baz:
sed '/foo/!b;/bar/!b;/baz/!b;s/red/green/' file
or fee, fie, foe and fix:
sed '/fee/!b;/fie/!b;/foe/!b;/fix/!b;s/bacon/cereal/' file
An awk version:
awk '/files/ && /no/ {sub(/red/,"green")} 1' file
/files/ && /no/ files and no have to be on the same line, in any order
sub(/red/,"green") replace red with green. Use gsub(/red/,"green") if there are multiple red
1 always true, do the default action, print the line.
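The difference between sub() and gsub() is easiest to see on a line with more than one red. The input line below is made up for illustration.

```shell
# Made-up line containing both words and two occurrences of "red".
line='no files: red and red'

# sub() replaces only the first "red" on a matching line.
printf '%s\n' "$line" | awk '/files/ && /no/ {sub(/red/,"green")} 1'

# gsub() replaces every "red" on a matching line.
printf '%s\n' "$line" | awk '/files/ && /no/ {gsub(/red/,"green")} 1'
```

The first command prints no files: green and red, the second no files: green and green.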

Sed/Awk to delete second occurrence of string - platform independent

I'm looking for a line in bash that would work on both linux as well as OS X to remove the second line containing the desired string:
Header
1
2
...
Header
10
11
...
Should become
Header
1
2
...
10
11
...
My first attempt was using the deletion option of sed:
sed -i '/^Header.*/d' file.txt
But well, that removes the first occurrence as well.
How to delete the matching pattern from given occurrence suggests to use something like this:
sed -i '/^Header.*/{2,$d}' file.txt
But on OS X that gives the error
sed: 1: "/^Header.*/{2,$d}": extra characters at the end of d command
Next, I tried substitution, where I know how to use 2,$, followed by deletion of the resulting empty lines:
sed -i '2,$s/^Header.*//' file.txt
sed -i '/^\s*$/d' file.txt
This works on Linux, but on OS X (as mentioned in "sed command with -i option failing on Mac, but works on Linux") you'd have to use
sed -i '' '2,$s/^Header.*//' file.txt
sed -i '' '/^\s*$/d' file.txt
And this one in return doesn't work on Linux.
My question, then: isn't there a simple way to make this work in any Bash? It doesn't have to be sed, but it should be as shell independent as possible, and I need to modify the file itself.
Since this is file-dependent and not line-dependent, awk can be a better tool.
Just keep a counter on how many times this happened:
awk -v patt="Header" '$0 == patt && ++f==2 {next} 1' file
This skips a line that exactly equals the given pattern when it is seen for the second time. All other lines are printed as usual.
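The awk command writes to standard output, so modifying the file itself needs one more step. A minimal sketch that should behave the same on Linux and macOS is to write to a temporary file and copy the result back; drop_second_header is a made-up helper name, and the demo file is created only for illustration.

```shell
# Hypothetical helper: remove the second line that is exactly "Header" from a file.
drop_second_header() {
  file=$1
  tmp=$(mktemp) || return 1
  if awk -v patt="Header" '$0 == patt && ++f == 2 { next } 1' "$file" > "$tmp"; then
    # Copy the result back over the original so its permissions are kept.
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Demo on a throwaway file.
demo=$(mktemp)
printf '%s\n' Header 1 2 Header 10 11 > "$demo"
drop_second_header "$demo"
cat "$demo"
```

Copying with cat rather than mv keeps the original file's mode and inode, at the cost of not being atomic.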
I would recommend using awk for this:
awk '!/^Header/ || !f++' file
This prints all lines that don't start with "Header". Short-circuit evaluation means that if the left hand side of the || is true, the right hand side isn't evaluated. If the line does start with Header, the second part !f++ is only true once.
$ cat file
baseball
Header and some other stuff
aardvark
Header for the second time and some other stuff
orange
$ awk '!/^Header/ || !f++' file
baseball
Header and some other stuff
aardvark
orange
This might work for you (GNU sed):
sed -i '1b;/^Header/d' file
Ignore the first line and then remove any occurrence of a line beginning with Header.
To remove subsequent occurrences of the first line regardless of the string, use:
sed -ri '1h;1b;G;/^(.*)\n\1$/!P;d' file

Using BASH, how to increment a number that uniquely only occurs once in most lines of an HTML file?

The target is always going to be between two characters, 'E' and '/' and there will never be but one occurrence of this combination, e.g. 'E01/' in most lines in the HTML file and will always be between '01' and '90'.
So, I need to programmatically read the file and replace each occurrence of 'Enn/' where 'nn' in 'Enn/' will be between '01' and '90' and must maintain the '0' for numbers '01' to '09' in 'Enn/' while incrementing the existing number by 1 throughout the HTML file.
Is this doable and if so how best to go about it?
Edit: Target lines will be in one or the other formats:
<DT>ProgramName
<DT>Program Name
You can use sed inside BASH as a fantastic one-liner, either:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+(10#\2>=90?0:1)))/ge' FILENAME
or if you are guaranteed the number is lower than 100:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+1))/ge' FILENAME
Basically, you'll be doing an in-place search and replace. The first command will not increment anything at or above 90 (since you didn't specify the exact nature of the overflow condition), so E89/ -> E90/, E90/ -> E90/, and if by chance you have E91/, it will remain E91/. Put the command inside a loop to process multiple files.
A small explanation of the above command:
-r enables extended regular expressions, so groups and {2} need no backslashes
-i writes the result back to the same file (be careful with overwriting!)
s/search/replace/ge is the substitute command being used
s/ starts the substitute command
(.*E) first group: everything up to and including the E that precedes the two digits (case sensitive)
([0-9]{2}) second group: exactly two digits 0 through 9 (fixed width)
(\/.*) third group: the escaped slash and everything after it
/ (slash separator) ends the search pattern and begins the replacement
printf "format" var is the command built for each replacement
\1 inserts the first group
%02u is the printf format for the number: unsigned, zero-padded to two digits
\3 inserts the third group
$((expression)) is a BASH arithmetic expansion whose result is passed to printf
10#\2 forces the second group to be read as a base 10 number
+(10#\2>=90?0:1) adds 0 or 1 to the second group depending on whether it is >= 90 (as used in the first command)
+1 adds 1 to the second group (see the second command)
/ge flags: g replaces every match on the line, and e (a GNU sed extension) runs the resulting pattern space as a shell command and replaces it with that command's output
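The arithmetic in the replacement can be tried on its own before wiring it into sed. The sketch below is an assumption-laden variant: it avoids the 10# prefix, which is a bash extension, by stripping one leading zero instead, so it also runs in a plain POSIX sh (as far as I know, GNU sed hands the e command to /bin/sh, which is not bash on every system). The helper name increment_nn is made up.

```shell
# increment_nn is a hypothetical helper: takes a two-digit string, prints the next one.
increment_nn() {
  nn=$1
  n=${nn#0}                 # drop one leading zero so 08 and 09 are not read as octal
  if [ "$n" -lt 90 ]; then  # same cap as the first sed command: 90 and above stay put
    n=$((n + 1))
  fi
  printf '%02d\n' "$n"      # zero-pad back to two digits
}

increment_nn 08
increment_nn 89
increment_nn 90
```

This prints 09, 90 and 90, matching the E89/ -> E90/ and E90/ -> E90/ behaviour described for the first command.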
GNU sed and awk are very powerful tools to do this sort of thing.
You can use the following perl one-liner to increment the numbers while maintaining the ones with leading 0s.
perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
$ cat file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
$ perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
You can add the -i option to make changes in-place. I would recommend creating a backup before doing so.
Not as elegant as one line sed!
Break the commands used into multiple commands and you can debug your bash or grep or sed.
# find the number
# -o makes grep print only the matching part
# head -n1 keeps just the first number, for safety
n=$(grep -o "E[0-9][0-9]/" file.html | grep -o "[0-9][0-9]" | head -n1)
# 08 and 09 would be parsed as invalid octal, so force base 10
n1=$((10#$n))
echo "debug: n1=$n1 n=$n"
# bash arithmetic is done inside $(( ))
n2=$((n1 + 1))
echo "debug: n2=$n2"
# use sed with -i -e (GNU sed) for an in-place edit that replaces the number;
# written as -ie, the e would be taken as a backup suffix instead
sed -i -e "s/E$n\//E$(printf '%02d' "$n2")\//" file.html
grep "E[0-9][0-9]" file.html
awk might be better. Maybe could do it in one awk command also.
The sed one-liner in other answer is awesome :-)
This works in bash; the 10# base prefix is not part of POSIX sh.
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
