How to format a US currency string using python or sed - bash

I have numerous invoices that I sent to clients with this string at the bottom:
Total: 1,000.00
or whatever the amount is. Some have 2 figures, some 5 figures, plus the decimal part.
The thing is that the number's format is inconsistent across invoices. Sometimes it's 1.000,00, and it keeps switching the dot and the comma.
So with grep, awk and sed, I am able to extract just the amount from all invoices, without the dollar sign, in order to sum them up to a grand total. But the dot and comma switching confuses python, obviously.
So in python (could be in sed as well), I am looking to convert the third char from the right to a dot and then, from there on, convert every fourth char it finds to a comma.
In other words, it has to separate the digits into groups of 3 from the right and put a comma between each group, except for the first group at the far right, which would be 2 digits separated by a dot.
Hope that is clear enough...

Try this:
yourstring = yourstring[:(len(yourstring)-3)].replace(".",",") + "." + yourstring[-2:]
I tried this on python and I think that works.
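An alternative that avoids tracking which separator is which, sketched in Python under the assumption that every amount ends in exactly two decimal digits (as in the question): drop every non-digit, read the result as cents, and let the format specifier put the separators back. The names parse_amount and format_usd are made up for this illustration.

```python
import re
from decimal import Decimal


def parse_amount(text):
    """Read '1,000.00' or '1.000,00' as a Decimal, assuming two decimal digits."""
    digits = re.sub(r"[^0-9]", "", text)  # keep digits only: '100000'
    return Decimal(digits).scaleb(-2)     # shift the point two places: 1000.00


def format_usd(amount):
    """Format with commas between thousands and a dot before the cents."""
    return f"${amount:,.2f}"


amounts = ["1,000.00", "1.000,00", "12,50"]
total = sum(parse_amount(a) for a in amounts)
print(format_usd(total))  # $2,012.50
```

Using Decimal rather than float keeps the cents exact when many invoices are summed.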

sed 's/$/ /
:coma
s/\([0-9]\)[.]\([0-9]\{3\}\)/\1,\2/g;t coma
:dot
s/\([0-9]\),\([0-9][0-9][^0-9]\)/\1.\2/g;t dot
s/ $//
' YourFile
This uses a general, looping substitution that handles every number on each line.
It first changes every dot used as a thousands separator into a comma, then changes the last comma (the one followed by exactly two digits) into a dot.
A trick is needed for a number at the end of the line: a space is appended to the line first and removed at the end (this could be optimized with a test beforehand).
POSIX compliant.

Well, the simplest way I've found to handle this is using a bit of sed, some bash and, for the final print, printf, which allows easy currency formatting with "%'.2f" (note the ' character, it is mandatory; the grouping it produces depends on the current locale):
# Get rid of every character that is not a digit (each amount becomes a number of cents)
totals=$( echo "$totals" | sed 's/[^0-9]*//g' )
# Sum up the amounts (10# forces base 10, so a value like 050 is not read as octal)
sum=0
for n in $totals; do
sum=$(( sum + 10#$n ))
done
# Put back the commas at each thousand, the dot at decimals and the $ sign
sansdec=$(( ${#sum} - 2 ))
sum="${sum:0:$sansdec}.${sum: -2}"
printf "%s" "\$"
printf "%'.2f\n" "$sum"


Regular expression in bash to match multiple conditions

I would like to implement a regular expression in bash that allows me to verify a series of characteristics on a dataset.
A sample is attached below:
id, date of birth, grade, expulsion, serious misdemeanor
123,2005-01-01,5.36,1,1
582,1999-05-12,8.51,0,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1
The id is required to have only 3 digits, the year of birth must be before 2000, the minimum grade point average is 5.60 with a second decimal place other than 0, and there must be at least one expulsion or serious misdemeanor.
The result of executing the regular expression should be:
582, 1999-05-12, 8.51, 0, 1
I have tried to implement the following regular expression and it does not give me any result.
grep -E "^\d{0,3},[0-2][0-9][0-9][0-9].*,[1-5].[0-5][1-9],[1-9],[1-9]$"
Any idea?
If it is mandatory to use grep, would you please try:
grep -E '^[0-9]{1,3},1[0-9]{3}(-[0-9]{2}){2},(5\.[6-9][1-9]|[6-9]\.[0-9][1-9]|[1-9][0-9]+\.[0-9][1-9]),([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*)[[:space:]]?$' input_file
Result:
582,1999-05-12,8.51,0,1
[0-9]{1,3} matches if id has 1-3 digits. (I have interpreted only 3 digits like that. If it means differently, tweak the regex accordingly.)
1[0-9]{3}(-[0-9]{2}){2} matches if the birth year is before 2000 (exclusive).
(5\.[6-9][1-9]|[6-9]\.[0-9][1-9]|[1-9][0-9]+\.[0-9][1-9]) matches if grade is greater than 5.60 with the second decimal place being other than 0.
([1-9][0-9]*,[0-9]+|[0-9]+,[1-9][0-9]*) matches if either or both of expulsion and serious misdemeanor have a non-zero value.
Regular expressions do not understand numeric values, and they certainly do not understand boolean logic. All it knows is text. You'll need to use an actual programming language like Awk or Perl to do this.
Here's an example:
$ perl -l -a -F, -E'say if length($F[0])>3 || $F[2] < 5.60' foo.txt
123,2005-01-01,5.36,1,1
9274,2001-25-12,9.65,0,0
21,2006-14-05,0.53,4,1
This call to perl splits apart the fields on commas, and then prints the line if the length of the first column is over 3, or the value of the third column is less than 5.60. In other words, it lists the records that fail those two checks.
This is just a starting point, but this is the direction to go.
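To illustrate that direction with all of the stated conditions, here is a rough Python sketch. It assumes the five-column comma-separated layout from the question, reads "only 3 digits" as "at most 3 digits" (the same reading as the grep answer), and treats the grade as always having two decimal places. The function name qualifies is made up for this example.

```python
from decimal import Decimal


def qualifies(line):
    """Return True if a record meets every condition from the question."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 5:
        return False
    ident, born, grade, expulsion, misdemeanor = fields

    # id: digits only, at most three of them (assumed reading of "only 3 digits")
    if not (ident.isdigit() and 1 <= len(ident) <= 3):
        return False

    # year of birth strictly before 2000
    year = born.split("-")[0]
    if not year.isdigit() or int(year) >= 2000:
        return False

    # grade: at least 5.60, second decimal place other than 0
    whole, _, frac = grade.partition(".")
    if not (whole.isdigit() and frac.isdigit() and len(frac) == 2):
        return False
    if Decimal(grade) < Decimal("5.60") or frac[1] == "0":
        return False

    # at least one expulsion or serious misdemeanor
    if not (expulsion.isdigit() and misdemeanor.isdigit()):
        return False
    return int(expulsion) > 0 or int(misdemeanor) > 0


sample = [
    "id, date of birth, grade, expulsion, serious misdemeanor",
    "123,2005-01-01,5.36,1,1",
    "582,1999-05-12,8.51,0,1",
    "9274,2001-25-12,9.65,0,0",
    "21,2006-14-05,0.53,4,1",
]
for record in sample:
    if qualifies(record):
        print(record)  # 582,1999-05-12,8.51,0,1
```

Comparing numbers as numbers keeps each rule readable and easy to change, which is the point made above about regular expressions only seeing text.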

replace a pattern with n number of spaces

I am new to shell scripting and would appreciate any help with the problem below. I have tried to use sed and awk but have been unable to find a solution.
Problem: I have a fixed width file which has amount fields that need to be replaced with spaces/any special character like $ and the record length has to be maintained. The length of amount fields can vary.
For ex. if sample_file.txt has record length of 10 and there are two amount fields starting at 2 and 6 of length 3 and 5 in this file as below:
a234b67890
It has to be modified as:
a$$$b$$$$$
This is for unix server.
Edit:
Also the records can have numeric characters at other positions which shouldn't be updated. So considering the previous example, the updated input is:
a234b678901234567890
And new output should be:
a$$$b$$$$$1234567890
Try using
inp=a234b67890
echo "$inp" | sed 's/[0-9]/$/g'
# gives a$$$b$$$$$
Each digit is replaced by exactly one special character, so the record length is preserved. Note that this replaces every digit in the record, so it only fits input in which all the digits belong to amount fields.
Hope this helps.
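For the edited requirement, where digits outside the amount fields must be left alone, masking by position rather than by character class is one option. A minimal Python sketch, assuming the field positions are known up front as 1-based (start, length) pairs as described in the question; mask_fields is a hypothetical helper name:

```python
def mask_fields(record, fields, fill="$"):
    """Overwrite the given fixed-width fields with a fill character.

    fields is a list of (start, length) pairs, with start counted from 1.
    The record length never changes.
    """
    chars = list(record)
    for start, length in fields:
        begin = start - 1                        # convert to a 0-based index
        end = min(begin + length, len(chars))    # do not run past the record
        for i in range(begin, end):
            chars[i] = fill
    return "".join(chars)


amount_fields = [(2, 3), (6, 5)]  # start 2 length 3, start 6 length 5
print(mask_fields("a234b678901234567890", amount_fields))
# a$$$b$$$$$1234567890
```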

How to separate variables without spaces

My question is similar to this one, but not really.
The issue is that I have variables in my script that will echo/printf control characters directly next to the previous. Unfortunately I have to put spaces between the variables or everything gets misinterpreted, but that's not going to work either, as I can't have spaces between them.
str="25 cents"
one=1
two=2
printf "\x3${one},${two}${str}\x30"\
(without spaces this string messes up)
printf "\x3${one},${two}%s\x30" "${str}" # outputs "5 cents"
So it ends up being either " 25 cents " (wrong), or "5 cents" (wrong x 2)... It should be:
25 cents
I've tried just about everything, escaping the variables, putting them in quotes and no luck. Evidently there's a correct way to handle this that I'm unaware of, so any help is great - thanks.
If what you are trying to do is insert mIRC colour codes into a string -- and you would have made it easier to be helped if you had said so -- then you need to be aware of two things:
The C-style hexadecimal escapes interpreted by GNU printf have the format \x followed by two hexadecimal digits. (You can use just one digit, but only if the next character is not a hexadecimal digit, so it's better to think of it as always being two digits.) A control-C (character code 3) is written \x03; \x30 through \x39 are the character codes for the digits 0 through 9. The translation of the escape code is done by printf, not by the shell, so parameter substitution happens first. So if the value of $one is 1, printf "\x3${one}" will be expanded to printf "\x31" by the shell, and then printf will print the digit 1. I presume that is not what you want, since there are obviously much less roundabout ways to insert the value of a variable, which don't limit the variable to a single decimal digit.
Not all printf implementations handle hexadecimal escapes, and not all shells have a built-in printf. So while you can use \x03 with bash, you might find that it is not portable. All printf implementations should handle octal escapes, though, and 3 is still 3 in octal, but now you need three digits: \003.
The mIRC colour codes have the form control-C followed by up to two numbers separated by a comma. These numbers have a maximum of two digits, and if the next character after the colour code is a digit, you must use the two-digit form. (Coincidentally similar to the hex escape codes above, but it is truly just a coincidence.) So if you wanted the text 25 with foreground colour 1 and background colour 2, you would need to send ^C1,0225^C; if you sent ^C1,225^C, that would be interpreted as foreground colour 1 and background colour 22 (which is not a valid colour code), with the text being 5.
This is mentioned in the mIRC documentation linked above:
Note: if you want to color text that begins with numbers, this syntax requires that you specify the color value as two digits.
So a better printf invocation might be:
printf "\003%02d,%02d%s\003" "$one" "$two" "$str"
Note: It is, of course, possible that my guess about what string you are seeking to produce is completely wrong; it is just a guess based on an off-hand comment which was not deleted. If so, and if you are serious about getting your question answered, I strongly suggest you provide a clearer explanation of precisely what byte-string you are attempting to produce with your printf statement.
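The zero-padding rule is the same in any language. As a small illustration, here is a Python sketch that builds the same string as the printf call above, under the same guess that mIRC colour codes are the goal; mirc_colour is a made-up name:

```python
def mirc_colour(text, fg, bg):
    """Wrap text in a mIRC colour code, always using two-digit colour numbers.

    Two digits are used so that text starting with a digit is not
    swallowed into the colour number.
    """
    return f"\x03{fg:02d},{bg:02d}{text}\x03"


coloured = mirc_colour("25 cents", 1, 2)
print(repr(coloured))  # '\x0301,0225 cents\x03'
```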

AWK - I need to write a one line shell command that will count all lines that

I need to write this solution as an AWK command. I am stuck on the last question:
Write a one line shell command that will count all lines in a file called "file.txt" that begin with a decimal number in parenthesis, containing a mix of both upper and lower case letters, and end with a period.
Example(s):
This is the format of lines we want to print. Lines that do not match this format should be skipped:
(10) This is a sample line from file.txt that your script should count.
(117) And this is another line your script should count.
Lines like this, as well as other non-matching lines, should be skipped:
15 this line should not be printed
and this line should not be printed
Thanks in advance, I'm not really sure how to tackle this in one line.
This is not a homework solution service. But I think I can give a few pointers.
One idea would be to create a counter, and then print the result at the end:
awk '<COND> {c++} END {print c}'
I'm getting a bit confused by the terminology. First you claim that the lines should be counted, but in the examples, it says that those lines should be printed.
Now of course you could do something like this:
awk '<COND>' file.txt | wc -l
The first part will print out all lines that satisfy the condition, and the output is piped to wc -l, which is a separate program that counts the number of lines.
Now as to what the condition <COND> should be, I leave to you. I strongly suggest that you google regular expressions and awk, it shouldn't be too hard.
I think the requirement is very clear
Write a one line shell command that will count all lines in a file called "file.txt" that begin with a decimal number in parenthesis, containing a mix of both upper and lower case letters, and end with a period.
1. begin with a decimal number in parenthesis
2. containing a mix of both upper and lower case letters
3. end with a period
check all three conditions. Note that in 2. it doesn't say "only" so you can have extra class of characters but it should have at least one uppercase and one lowercase character.
The example mixes up the concepts of printing and counting. If that is part of the exercise, it is very poorly worded, or perhaps it assumes that the counting will be done by wc on the piped output of a filtering script. Either way, more attention should have been paid to the wording, especially for a student exercise.
Please comment if anything is not clear and I'll add more details...
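To make the three conditions concrete without giving the awk one-liner away, here is how they could be checked separately in a Python sketch. It assumes "decimal number" means a run of digits such as (10), and that other characters besides letters are allowed; the names matches and count_matching are made up for this example.

```python
import re

STARTS_WITH_NUMBER = re.compile(r"^\(\d+\)")  # condition 1: "(digits)" at the start


def matches(line):
    """Check the three conditions from the exercise on one line."""
    line = line.rstrip("\n")
    return bool(
        STARTS_WITH_NUMBER.match(line)   # 1. begins with a number in parentheses
        and re.search(r"[a-z]", line)    # 2. at least one lower case letter...
        and re.search(r"[A-Z]", line)    #    ...and at least one upper case letter
        and line.endswith(".")           # 3. ends with a period
    )


def count_matching(lines):
    """Count the lines that satisfy all three conditions."""
    return sum(1 for line in lines if matches(line))


sample = [
    "(10) This is a sample line from file.txt that your script should count.",
    "(117) And this is another line your script should count.",
    "15 this line should not be printed",
    "and this line should not be printed",
]
print(count_matching(sample))  # 2
```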

Take token from this bash string/array...not sure which it is

Hi I am writing a bash script and I have a string
foo=1.0.3
What I want to do is examine the '3'. The first thing I did was get rid of the periods with bar=`echo $foo|tr '.' ' '`.
When I do an echo $bar it prints 1 0 3. Now how do I create a variable that holds only the 3? Thank you very much.
As you are no doubt learning about bash, there are many, many ways to achieve your goals. I think @Mat's answer using bar=${foo##*.} is the best so far, although he doesn't explain how or why it works. I strongly recommend you check out the bash tutorial on tldp; it is my go-to source when I have questions like this. For string manipulation, there is a section there that discusses many of the different ways to go about this sort of thing.
For example, if you know that foo is always going to be 5 characters long, you can simply take the fifth character from it:
bar=${foo:4}
That is, bar becomes everything from offset 4 of foo onward, which for a five-character string is just the fifth character (remember, we start counting from zero, not from one).
If you know it is always going to be the last position of foo, then you can just count backwards:
bar=${foo: -1}
Notice there is a space between the colon and the -1; you need that (or parentheses) so that bash does not read :- as the start of a default-value expansion.
To explain @Mat's answer, I had to look at the link I provided above. Apparently the double pound signs (hash mark, octothorpe, whatever you want to call them) in the expression:
${string##substring}
mean: delete the longest match of $substring from the front of $string. So you are looking for the longest match of *. which equates to everything up to and including the last dot. Pretty cool, huh?
This should work:
bar=$(echo $foo|cut -d. -f3)
If you know you only want the part after the last dot (not the third item in a .-separated list) you can also do this:
bar=${foo##*.}
Advantage: no extra process or subshell started.
One way: Build an array and take position 2:
array=(`echo $foo | tr . ' '`)
echo ${array[2]}
This should also work:
echo $foo | awk -F. '{print $3}'
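For comparison, the same two ideas (the part after the last dot versus the third dot-separated item) look like this in a Python sketch; last_token and nth_token are hypothetical helper names:

```python
def last_token(version):
    """Everything after the last dot, like ${foo##*.} in bash."""
    return version.rsplit(".", 1)[-1]


def nth_token(version, n):
    """The n-th dot-separated item, counted from 1, like cut -d. -fN."""
    return version.split(".")[n - 1]


foo = "1.0.3"
print(last_token(foo))    # 3
print(nth_token(foo, 3))  # 3
```

The two only agree when the string has exactly three parts; for 1.0.3.7 the first gives 7 and the second still gives 3.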
