Unexpected arithmetic result with zero padded numbers - shell

I have a problem in my script wherein I'm reading a file and each line has data which is a representation of an amount. The said field always has a length of 12 and it's always a whole number. So let's say I have an amount of 25,000, the data will look like this 000000025000.
Apparently, I have to get the total amount of these lines but the zero prefixes are disrupting the computation. If I add the above mentioned number to a zero value like this:
echo $(( 0 + 000000025000 ))
Instead of getting 25000, I get 10752 instead. I was thinking of looping through 000000025000 and when I finally get a non-zero value, I'm going to substring the number from that index onwards. However, I'm hoping that there must be a more elegant solution for this.

Because 000000025000 starts with 0, the shell interprets it as an octal number, and 25000 in octal is 10752 in decimal.
If you use bash as your shell, you can use the prefix 10# to force the base number to decimal:
echo $(( 10#000000025000 ))
From the bash man pages:
Constants with a leading 0 are interpreted as octal numbers. A leading 0x or 0X denotes hexadecimal. Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.
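For the original task of totalling every line, the 10# prefix can be applied inside a loop. The following is a minimal sketch, assuming bash and one amount per line on standard input; the function name sum_padded and the sample amounts are made up for illustration.

```shell
#!/usr/bin/env bash
# Sum zero-padded whole-number amounts, one per line on stdin.
# sum_padded is a hypothetical helper name, not part of any tool.
sum_padded() {
    local total=0 line
    while IFS= read -r line || [[ -n $line ]]; do
        # Skip blank or malformed lines so that 10# never sees a non-number.
        [[ $line =~ ^[0-9]+$ ]] || continue
        # 10# forces base 10 even though the value has leading zeros.
        total=$(( total + 10#$line ))
    done
    printf '%s\n' "$total"
}

# Two sample amounts: 25000 and 750.
printf '%s\n' 000000025000 000000000750 | sum_padded   # prints 25750
```

With a real file, the call would be something like sum_padded < amounts.txt, where amounts.txt is a placeholder name.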

Using Perl
$ echo "000000025000" | perl -ne ' { printf("%d\n",scalar($_)) } '
25000

Related

bash script problem with understanding how shuf works

I have the following problem understanding this line of code
for NUMBER in $(shuf -i1-$MAX_NUMBER)
Do I understand correctly that this loop takes consecutive numbers up to "$MAX_NUMBER", or does "shuf -i1-" change anything?
shuf -i1-$MAX_NUMBER prints a random permutation of the numbers in the range of 1 to $MAX_NUMBER (i.e., not in sequential order).
This means that in each iteration of the loop, $NUMBER takes the next value of that permutation, so every number from 1 to $MAX_NUMBER is used exactly once, in random order.
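A quick way to see this is to run the loop once as written and once with the output sorted. This is only a sketch: it assumes GNU shuf is installed, and the value 5 for MAX_NUMBER is chosen purely for illustration.

```shell
#!/usr/bin/env bash
MAX_NUMBER=5   # hypothetical value for illustration

# Unsorted: a random permutation, so the order changes between runs.
for NUMBER in $(shuf -i1-"$MAX_NUMBER"); do
    printf '%s ' "$NUMBER"
done
printf '\n'

# Sorted: always 1 2 3 4 5, which shows each number occurs exactly once.
sorted=$(shuf -i1-"$MAX_NUMBER" | sort -n | paste -sd' ' -)
printf '%s\n' "$sorted"
```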

Solved: Grep and Dynamically Truncate at Same Time

Given the following:
for(condition which changes $z)
aptitude show $z | grep -E 'Uncompressed Size: |x' | sed 's/Uncompressed Size: //'
done
That means 3 items are output to the screen ($z, Uncompressed Size, x).
I want all of that to fit on one line, and I define a line as 100 characters.
So ($z, Uncompressed Size, x) must fit on one line, but x is very long and will have to be truncated. The characters "used" by $z and Uncompressed Size therefore have to be counted so that x can be truncated dynamically. I love scripting, and I consider being able to do this an absolute must. All three items change on every iteration, so the length of the first two must be calculated and subtracted from the characters allowed for x; the total across all three items cannot exceed 100 characters.
sed 's/.//5g'
Lmao, sometimes I wish I thought in simpler terms; complicated description + simple solution = simple problem over complicated by interpreter.
Thank you, Barmar
That only leaves giving sed the limit: 100 minus the number of characters used by $z, which is given by ${#z}.
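The length arithmetic described above can be sketched as follows. Note that this swaps sed for bash substring expansion, and that the function name fit_line, its parameter names, and the sample values are all hypothetical; it is one possible shape, not the asker's final script.

```shell
#!/usr/bin/env bash
# Print "name size desc" on one line, truncating desc so that the
# whole line is at most width characters (default 100).
# fit_line and its parameters are hypothetical names.
fit_line() {
    local name=$1 size=$2 desc=$3 width=${4:-100}
    local prefix="$name $size "
    # Characters still available for desc after the first two items.
    local room=$(( width - ${#prefix} ))
    if (( room < 0 )); then
        room=0   # the first two items alone already fill the line
    fi
    printf '%s%s\n' "$prefix" "${desc:0:room}"
}

# With a width of 12, only 4 characters remain for the description.
fit_line pkg 10k abcdefghij 12   # prints: pkg 10k abcd
```

Inside the loop, the call would look something like fit_line "$z" "$size" "$x", with size and x holding the two values extracted from aptitude.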

How many numbers can we store with 1 bit?

I want to know how many characters or numbers can I store in 1 bit only. It will be more helpful if you tell it in octal, hexadecimal.
I want to know how many characters or numbers can I store in 1 bit only.
It is not practical to use a single bit to store numbers or characters. However, you could say:
One integer provided that the integer is in the range 0 to 1.
One ASCII character provided that the character is either NUL (0x00) or SOH (0x01).
The bottom line is that a single bit has two states: 0 and 1. Any value domain with more than two values in the domain cannot be represented using a single bit.
It will be more helpful if you tell it in octal, hexadecimal.
That is not relevant to the problem. Octal and hexadecimal are different textual representations for numeric data. They make no difference to the meaning of the numbers, or (in most cases [1]) the way that you represent the numbers in a computer.
[1] The exception is when you are representing numbers as text; e.g. when you represent the number 42 in a text document as the character '4' followed by the character '2'.
A bit is a "binary digit", or a value from a set of size two. With n bits, the number of distinct values you can represent is 2 raised to the power of n. So, for one bit, 2¹ gives 2. The field in Mathematics is called combinatorics.
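The power-of-two rule is easy to check with shell arithmetic, since shifting 1 left by n places computes 2 to the power of n. A small sketch, with values_for_bits as a made-up helper name:

```shell
#!/usr/bin/env bash
# Number of distinct values representable with the given number of bits.
# values_for_bits is a hypothetical helper name.
values_for_bits() {
    echo $(( 1 << $1 ))   # 1 << n equals 2 to the power of n
}

for bits in 1 2 8; do
    printf '%d bit(s): %d values\n' "$bits" "$(values_for_bits "$bits")"
done
# 1 bit(s): 2 values
# 2 bit(s): 4 values
# 8 bit(s): 256 values
```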

How to determine the number of grouped numbers in a string in bash

I have a string in bash
string="123 abc 456"
Where numbers that are grouped together are considered 1 number.
"123" and "456" would be considered numbers in this case.
How can i determine the number of grouped together numbers?
so
"123"
is determined to be a string with just one number, and
"123 abc 456"
is determined to be a string with 2 numbers.
egrep -o '[0-9]+' <<<"$string" | wc -l
Explanation
egrep: This performs an extended regular expression match on the lines of a given file (or, in this case, a herestring). It usually returns lines of text within the string that contain at least one chunk of text that matches the supplied pattern. However, the -o flag tells it to return only those matching chunks, one per line of output.
'[0-9]+': This is the regular expression that the string is compared against. Here, we are telling it to match successive runs of 1 or more digits, and no other character.
<<< The herestring operator allows us to pass a string into a command as if it were the contents of a file.
| This pipes the output of the previous command (egrep) to become the input for the next command (wc).
wc: This counts the lines, words, and bytes in its input. The -l flag tells it to print only the line count.
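Wrapped in a function, the same pipeline can be tried against both strings from the question. This is a sketch assuming bash; count_numbers is a made-up name, and grep -E is used as the equivalent spelling of egrep.

```shell
#!/usr/bin/env bash
# Count the runs of digits in the string passed as the first argument.
# count_numbers is a hypothetical helper name.
count_numbers() {
    # -o prints each match on its own line; wc -l counts those lines.
    grep -Eo '[0-9]+' <<<"$1" | wc -l
}

count_numbers "123"           # one group of digits
count_numbers "123 abc 456"   # two groups of digits
```

A string with no digits yields 0, because grep prints nothing and wc -l counts zero lines.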
UPDATE: 2018-08-23
Is there any way to adapt your solution to work with floats?
The regular expression that matches both integer numbers and floating point decimal numbers would be something like this: '[0-9]*\.?[0-9]+'. Inserting this into the command above in place of its predecessor, forms this command chain:
egrep -o '[0-9]*\.?[0-9]+' <<<"$string" | wc -l
Focussing now only on the regular expression, here's how it works:
[0-9]: This matches any single digit from 0 to 9.
*: This is an operator that applies to the expression that comes directly before it, i.e. the [0-9] character class. It tells the search engine to match any number of occurrences of the digits 0 to 9 instead of just one, but no other character. Therefore, it will match "2", "26", "4839583", ... but it will not match "9.99" as a singular entity (but will, of course, match the "9" and the "99" that feature within it). As the * operator matches any number of successive digits, this can include zero occurrences (this will become relevant later).
\.: This matches a singular occurrence of a period (or decimal point), ".". The backslash is a special character that tells the search engine to interpret the period as a literal period, because this character itself has special function in regular expression strings, acting as a wildcard to match any character except a line-break. Without the backslash, that's what it would do, which would potentially match "28s" if it came across it, where the "s" was caught by the wildcard period. However, the backslash removes the wildcard functionality, so it will now only match with an actual period.
?: Another operator, like the *, except this one tells the search engine to match the previous expression either zero or one times, but no more. In other words, it makes the decimal point optional.
[0-9]+: As before, this will match digits 0 to 9, the number of which here is determined by the + operator, which stands for at least one, i.e. one or more digits.
Applying this to the following string:
"The value of pi is approximately 3.14159. The value of e is about 2.71828. The Golden Ratio is approximately 1.61803, which can be expressed as (√5 + 1)/2."
yields the following matches (one per line):
3.14159
2.71828
1.61803
5
1
2
And when this is piped through the wc -l command, returns a count of the lines, which is 6, i.e. the supplied string contains 6 occurrences of number strings, which includes integers and floating point decimals.
If you wanted only the floating point decimals, and to exclude the integers, the regular expression is this:
'[0-9]*\.[0-9]+'
If you look carefully, it's identical to the previous regular expression, except for the missing ? operator. If you recall, the ? made the decimal point an optional feature to match; removing this operator now means the decimal point must be present. Likewise, the + operator is matching at least one instance of a digit following the decimal point. However, the * operator before it matches any number of digits, including zero digits. Therefore, "0.61803" would be a valid match (if it were present in the string, which it isn't), and ".33333" would also be a valid match, since the digits before the decimal point needn't be there thanks to the * operator. However, whilst "1.1111" could be a valid match, "1111." would not be, because the + operator dictates that there must be at least one digit following the decimal point.
Putting it into the command chain:
egrep -o '[0-9]*\.[0-9]+' <<<"$string" | wc -l
returns a value of 3, for the three floating point decimals occurring in the string, which, if you remove the | wc -l portion of the command, you will see in the terminal output as:
3.14159
2.71828
1.61803
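The two expressions can be compared side by side on a plain ASCII string. This is a sketch assuming bash; the sample sentence and the helper name count_matches are made up for illustration.

```shell
#!/usr/bin/env bash
# Count how many times the extended regex in $1 matches within string $2.
# count_matches is a hypothetical helper name.
count_matches() {
    grep -Eo -e "$1" <<<"$2" | wc -l
}

sample="pi is 3.14159, e is 2.71828, and .5 plus 2 is 2.5"

# Integers and decimals: 3.14159, 2.71828, .5, 2, 2.5
all=$(count_matches '[0-9]*\.?[0-9]+' "$sample")

# Decimals only: 3.14159, 2.71828, .5, 2.5
floats=$(count_matches '[0-9]*\.[0-9]+' "$sample")

printf 'all=%d floats=%d\n' "$all" "$floats"   # prints: all=5 floats=4
```

The bare 2 is the only match that disappears once the decimal point becomes mandatory, while .5 still counts because the * allows zero digits before the point.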
For reasons I won't go into, matching integers exclusively and excluding floating point decimals is harder to accomplish without Perl-flavoured regular expression matching (which egrep is not). However, since you're really only interested in the number of these occurrences, rather than the matches themselves, we can create a regular expression that doesn't need to worry about accurate matching of integers, as long as it produces the same number of matched items. This expression:
'[^.0-9][0-9]+(\.([^0-9]|$)|[^.])'
seems to be good enough for counting the integers in the string, which includes the 5, 1 and 2 (ignoring, of course, the √ symbol), returning these approximately matching substrings:
√5
1)
/2.
I haven't tested it that thoroughly, however, and only formulated it tonight when I read your comment. But, hopefully, you are beginning to get a rough sense of what's going on.
If you need to know the number of grouped digits in a string, the following may help.
string="123 abc 456"
echo "$string" | awk '{print gsub(/[0-9]+/,"")}'
Explanation: the annotated copy below is for explanation purposes only.
string="123 abc 456" ## Create a variable named string with the value 123 abc 456.
echo "$string" ## Print the value of string with echo.
| ## Pipe that output into the awk command as its input.
awk '{ ## Start the awk program.
print gsub(/[0-9]+/,"") ## gsub replaces every run of digits with "" (the empty string) and returns the number of substitutions made, which equals the number of digit groups present; print outputs that count.
}' ## Close the awk program.
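The command above prints one count per input line. If the input can span several lines and a single total is wanted, the counts can be accumulated and printed at the end. A sketch, with count_digit_groups as a made-up name and invented sample lines:

```shell
#!/usr/bin/env bash
# Total number of digit groups across all lines read from stdin.
# count_digit_groups is a hypothetical helper name.
count_digit_groups() {
    # gsub returns the number of replacements made on the current line;
    # n + 0 makes awk print 0 rather than an empty string for empty input.
    awk '{ n += gsub(/[0-9]+/, "") } END { print n + 0 }'
}

printf '%s\n' "123 abc 456" "no digits here" "7 8 9" | count_digit_groups   # prints 5
```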

Bash - stripping off the last digits of a number - which one is better in terms of semantics?

Consider this string containing an integer
nanoseconds=$(date +%s%N)
when I want to strip off the last six characters, what would be semantically better?
Stripping just the characters off the string
nanoseconds=$(date +%s%N)
milliseconds=${nanoseconds%??????}
or dividing the value by 1000000
milliseconds=$((nanoseconds / 1000000))
EDIT
Sorry for not being clear. It's basically for doing a conversion from nanoseconds to milliseconds. I think I answered my own question...
Both are equivalent, but in general I would consider the former method to be safer. The first method is explicit and does precisely what you want to do: to remove a substring from the back of the string.
The other one is a mathematical operation that relies on correct rounding. Although I cannot imagine where it would fail, I would prefer the first method.
Unless, of course, what you really want is not stripping the last three characters but dividing by 1000 :-)
Post scriptum: hah, of course I know where it would fail. Let value="123". ${value%???} strips the last three digits, as intended, leaving an empty string. $(( value / 1000 )) results in value equal to "0" (a string of length of 1).
EDIT: since we know now that it is not about stripping characters, but rounding, clearly dividing by 1000 is the correct way of approaching the problem :-)
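The difference described in the post scriptum is easy to reproduce. A short sketch, using the same three-character and divide-by-1000 forms as this answer; the variable names are chosen only for illustration.

```shell
#!/usr/bin/env bash
# A value shorter than four digits: the two approaches disagree.
value=123
stripped=${value%???}          # removes the last three characters
divided=$(( value / 1000 ))    # integer division
printf 'stripped=[%s] divided=[%s]\n' "$stripped" "$divided"
# prints: stripped=[] divided=[0]

# A longer value: both approaches agree.
long_value=123456
long_stripped=${long_value%???}
long_divided=$(( long_value / 1000 ))
printf 'stripped=[%s] divided=[%s]\n' "$long_stripped" "$long_divided"
# prints: stripped=[123] divided=[123]
```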
The clearest method when strings are involved is probably substring expansion, in shells that support it.
s=$(LC_TIME=C date +%s.%N)
s=${s::-3}
Fortunately, it appears that GNU date at least defaults to zero-padding for %N, so division should be reliable. (Note that both of these methods truncate rather than round.)
(( s=(10#$(LC_TIME=C date +%s%N))/1000 ))
If you want to round, you can do a bit better than these using printf
printf -v milliseconds %.6f "$(LC_TIME=C date +%s.%N)"
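For the nanoseconds-to-milliseconds conversion the question asks about, truncation and rounding can also be compared using integer arithmetic alone. This is a sketch of a different technique from the printf call above, assuming 64-bit shell arithmetic and non-negative inputs; the function names and the fixed sample timestamp are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a nanosecond count to milliseconds, dropping the remainder.
ns_to_ms_trunc() {
    echo $(( $1 / 1000000 ))
}

# Convert a nanosecond count to milliseconds, rounding half up.
# Adding half a millisecond (500000 ns) before dividing does the rounding.
ns_to_ms_round() {
    echo $(( ($1 + 500000) / 1000000 ))
}

ns=1700000000123987654      # fixed sample instead of $(date +%s%N)
ns_to_ms_trunc "$ns"        # prints 1700000000123
ns_to_ms_round "$ns"        # prints 1700000000124
```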
ksh93's printf supports %N so there's no need for date. The conversion can be automatic. If you have (a modern) ksh available you should definitely use it.
typeset -T MsTime=(
typeset -lF6 .=0
function get {
((.sh.value=$(LC_TIME=C printf '%(%s.%N)T')))
}
)
MsTime milliseconds
print -r "$milliseconds"
