decimal numbers conversion - bash

In bash I am trying to code a conditional with numbers that are decimals (with fractions). Then I found out that bash cannot do decimal arithmetic.
The script that I have is as follows:
a=$(awk '/average TM cross section = / {CCS=$6}; END {printf "%15.4E \n",CCS}' ${names}_$i.out)
a=$(printf '%.2f\n' $a)
echo $a
In the *.out file the numbers are in scientific notation. At the end, echo $a gives me a number such as 245.35 (the value varies between files). So I was wondering how to change the output number 245.35 into 24535 so I can use it in a bash conditional.
I tried to multiply and that obviously did not work. Can anyone help with this conversion?

You might do best to use something other than bash for your arithmetic -- call out to something with a bit more power. You might find the following links either inspiring or horrifying: http://blog.plover.com/prog/bash-expr.html ("Arithmetic expressions in shell scripts") and http://blog.plover.com/prog/spark.html ("Insane calculations in bash"); I'm afraid this is the sort of thing you're liable to end up with if you seriously try to do bash-based arithmetic. In particular, the to_rational function in the second of those articles includes some code for splitting up decimals using regular expressions, though he's doing something more complicated with them than it sounds like you do.
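As a hedged sketch of that advice: rather than converting to an integer at all, you can delegate the decimal comparison itself to awk and have the shell branch only on awk's exit status. The threshold 100 here is a made-up example value, not something from the question:

```shell
a=245.35
# awk exits 0 when the comparison holds, non-zero otherwise,
# so the shell "if" can branch on it directly
if awk -v x="$a" 'BEGIN { exit !(x > 100) }'; then
    echo "above threshold"
else
    echo "below threshold"
fi
```

This keeps all floating-point work inside awk, which the shell script is already invoking anyway.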

Per our extended conversation
a=$(awk '/average TM cross section = / {CCS=$6}; END {printf "%15d\n",CCS * 100}' ${names}_$i.out)
Now your output will be an integer.
Note that awk is well designed for processing large files and testing logic. It is likely that all or most of your processing could be done in one awk process. If you're processing large amounts of data, the time savings can be significant.
I hope this helps.

As per the info you provided, this is not related to any arithmetic operation.
Treat it as a string: find the decimal point and remove it. That's what I understand.
http://www.cyberciti.biz/faq/unix-linux-replace-string-words-in-many-files/
http://www.thegeekstuff.com/2010/07/bash-string-manipulation/
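In that spirit, a minimal sketch of the string approach using bash parameter expansion alone (no external tools), assuming the value contains exactly one decimal point:

```shell
a=245.35
a=${a/./}   # delete the "." from the string
echo "$a"   # 24535
```

The result is a plain integer string, which bash arithmetic and conditionals can handle directly.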


nested associative arrays in bash [duplicate]

This question already has answers here: Multidimensional associative arrays in Bash (2 answers). Closed 2 years ago.
Can one construct an associative array whose elements contain arrays in bash? For instance, suppose one has the following arrays:
a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)
Can one create an associate array to access these variables? For instance,
declare -A letters
letters[a]=$a
letters[b]=$b
letters[c]=$c
and then access individual elements by a command such as
letter=${letters[a]}
echo ${letter[1]}
This mock syntax for creating and accessing elements of the associate array does not work. Do valid expressions accomplishing the same goals exist?
This is the best non-hacky way to do it, but you're limited to accessing single elements. Indirect variable expansion is another option, but you'd still have to store every element set in an array. If you want some form of anonymous arrays, you'd need a random parameter name generator: if you don't use a random name for an array, there's no point referencing it from an associative array. And of course I wouldn't like using external tools to generate random anonymous variable names. It would be funny whoever does it.
#!/bin/bash
a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)
declare -A letters
function store_array {
local var=$1 base_key=$2 values=("${@:3}")
for i in "${!values[@]}"; do
eval "$1[\$base_key|$i]=\${values[i]}"
done
}
store_array letters a "${a[@]}"
store_array letters b "${b[@]}"
store_array letters c "${c[@]}"
echo "${letters[a|1]}"
I think the more straightforward answer is "No, bash arrays cannot be nested."
Anything that simulates nested arrays is actually just creating fancy mapping functions for the keyspace of the (single layered) arrays.
Not that that's bad: it may be exactly what you want, but especially when you don't control the keys into your array, doing it properly becomes harder.
Although I like the solution given by @konsolebox of using a delimiter, it ultimately falls over if your keyspace includes keys like "p|q".
It does have a nice benefit in that you can operate transparently on your keys, as in array[abc|def] to look up the key def in array[abc], which is very clear and readable.
Because it relies on the delimiter not appearing in the keys, this is only a good approach when you know what the keyspace looks like now and in all future uses of the code. This is only a safe assumption when you have strict control over the data.
If you need any kind of robustness, I would recommend concatenating hashes of your array keys. This is a simple technique that is very likely to avoid collisions, although they remain possible if you are operating on carefully crafted data.
To borrow a bit from how Git handles hashes, let's take the first 8 characters of the sha512sums of keys as our hashed keys.
If you feel nervous about this, you can always use the whole sha512sum, since there are no known collisions for sha512.
Using the whole checksum makes sure that you are safe, but it is a little bit more burdensome.
So, if I want the semantics of storing an element in array[abc][def] what I should do is store the value in array["$(keyhash "abc")$(keyhash "def")"] where keyhash looks like this:
function keyhash () {
echo "$1" | sha512sum | cut -c-8
}
You can then pull out the elements of the associative array using the same keyhash function.
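As a usage sketch of that scheme (assuming bash 4+ for associative arrays and a system with sha512sum, e.g. GNU coreutils), storing and retrieving the equivalent of array[abc][def]:

```shell
#!/bin/bash
declare -A array

# first 8 characters of the sha512sum of the key, as described above
keyhash() {
    echo "$1" | sha512sum | cut -c-8
}

# store a value under the "nested" key abc/def
array["$(keyhash "abc")$(keyhash "def")"]="hello"

# retrieve it by recomputing the same pair of hashed keys
echo "${array["$(keyhash "abc")$(keyhash "def")"]}"   # hello
```

Because keyhash is deterministic, any code path that knows the logical keys can reconstruct the physical key without extra bookkeeping.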
Funnily, there's a memoized version of keyhash you can write which uses an array to store the hashes, preventing extra calls to sha512sum, but it gets expensive in terms of memory if the script takes many keys:
declare -A keyhash_array
function keyhash () {
if [ "${keyhash_array["$1"]}" == "" ];
then
keyhash_array["$1"]="$(echo "$1" | sha512sum | cut -c-8)"
fi
echo "${keyhash_array["$1"]}"
}
A length inspection on a given key tells me how many layers deep it looks into the array, since that's just len/8, and I can see the subkeys for a "nested array" by listing keys and trimming those that have the correct prefix.
So if I want all of the keys in array[abc], what I should really do is this:
for key in "${!array[@]}"
do
if [[ "$key" == "$(keyhash "abc")"* ]];
then
# do stuff with "$key" since it's a key directly into the array
:
fi
done
Interestingly, this also means that first level keys are valid and can contain values. So, array["$(keyhash "abc")"] is completely valid, which means this "nested array" construction can have some interesting semantics.
In one form or another, any solution for nested arrays in Bash is pulling this exact same trick: produce a (hopefully injective) mapping function f(key,subkey) which produces strings that can be used as array keys.
This can always be applied further as f(f(key,subkey),subsubkey) or, in the case of the keyhash function above, I prefer to define f(key) and apply to subkeys as concat(f(key),f(subkey)) and concat(f(key),f(subkey),f(subsubkey)).
In combination with memoization for f, this is a lot more efficient.
In the case of the delimiter solution, nested applications of f are necessary, of course.
With that known, the best solution that I know of is to take a short hash of the key and subkey values.
I recognize that there's a general dislike for answers of the type "You're doing it wrong, use this other tool!" but associative arrays in bash are messy on numerous levels, and run you into trouble when you try to port code to a platform that (for some silly reason or another) doesn't have bash on it, or has an ancient (pre-4.x) version.
If you are willing to look into another language for your scripting needs, I'd recommend picking up some awk.
It provides the simplicity of shell scripting with the flexibility that comes with more feature rich languages.
There are a few reasons I think this is a good idea:
GNU awk (the most prevalent variant) has fully fledged associative arrays which can nest properly, with the intuitive syntax of array[key][subkey]
You can embed awk in shell scripts, so you still get the tools of the shell when you really need them
awk is stupidly simple at times, which puts it in stark contrast with other shell replacement languages like Perl and Python
That's not to say that awk is without its failings. It can be hard to understand when you're first learning it because it's heavily oriented towards stream processing (a lot like sed), but it's a great tool for a lot of tasks that are just barely outside of the scope of the shell.
Note that above I said that "GNU awk" (gawk) has multidimensional arrays. Other awks actually do the trick of separating keys with a well-defined separator, SUBSEP. You can do this yourself, as with the array[a|b] solution in bash, but nawk has this feature builtin if you do array[key,subkey]. It's still a bit more fluid and clear than bash's array syntax.
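To illustrate the SUBSEP form (hedged: the nested array[key][subkey] syntax needs GNU awk 4+, but the comma-subscript form below works in any POSIX awk, since the subscripts are simply joined with SUBSEP internally):

```shell
# portable "multidimensional" awk array via comma subscripts
awk 'BEGIN {
    a["abc", "def"] = "hello"   # stored under key "abc" SUBSEP "def"
    print a["abc", "def"]
}'
```

This is the same delimiter trick as bash's array[a|b], but with the delimiter chosen and handled by awk itself.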
For those stumbling on this question when looking for ways to pass command line arguments within a command line argument, an encoding such as JSON can be useful, as long as the consumer agrees to use the encoding.
# Usage: $0 --toolargs '["arg 1", "arg 2"]' --otheropt
toolargs="$2"
v=()
while read -r line; do v+=("${line}"); done < <(jq -r '.[]' <<< "${toolargs}")
sometool "${v[@]}"
nestenc='{"a": ["a", "aa "],
"b": ["b", "bb", "b bb"],
"c d": ["c", "cc ", " ccc", "cc cc"]
}'
index="c d"
letter=()
while read -r line; do
letter+=("${line}")
done < <(jq -r ".\"${index}\"[]" <<< "${nestenc}")
for c in "${letter[@]}" ; do echo "<<${c}>>" ; done
The output follows.
<<c>>
<<cc>>
<<ccc>>
<<cc cc>>

global option available to set all numbers in script as decimal?

I just discovered the problem doing arithmetic using vars with leading 0's. I found the solution for setting individual vars to decimal using:
N=016
N=$((10#$N)) # force decimal (base 10)
echo $((N + 2))
# result is 18, ok
But I have multiple vars in my script that may or may not take a leading zero when run. I wonder if there is a global option that can be set to specify that all numbers in the script are to be interpreted as decimal? Or would there be a potential problem with doing so that I perhaps did not take into account?
I thought the set command might have such an option but after referring to the man page I did not read anything that looked like it would do the job.
As far as I can tell, this is an (unfortunate) convention established by the B language that a leading 0 introduces an octal number.
By looking at the bash sources, it seems that this convention is hard-coded in several places (lib/sh/strtol.c, builtins/common.c and concerning that specific case in expr.c, function strlong). So to answer to your question, no there isn't a global option to set all numbers as decimal.
If you have numbers in base 10 potentially prefixed by 0 that you want to perform calculations on, you might use the ${N#0} notation to refer to them.
sh$ N=010
sh$ echo $((${N#0}+0))
10
I don't know if this is more readable, or even less error-prone, than the solution you proposed in your question, though.
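One caveat worth sketching (my observation, not part of the answer above): ${N#0} strips only a single leading zero, so with more than one zero the value is still parsed as octal, whereas 10#$N handles any number of leading zeros:

```shell
n=0010
echo $((10#$n + 2))    # 12: 0010 is forced to decimal 10
echo $((${n#0} + 2))   # 10: ${n#0} leaves "010", parsed as octal 8
```

For untrusted or variable-width input, the 10#$N form is the safer of the two.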

printf column alignment issue

Can someone help me understand printf's alignment function. I have tried reading several examples on Stack and general Google results and I'm still having trouble understanding its syntax. Here is essentially what I'm trying to achieve:
HOLDING 1.1.1.1 Hostname Potential outage!
SKIPPING 1:1:1:1:1:1:1:1 Hostname Existing outage!
I'm sorry, I know this is more of a handout than my usual questions. I really don't know how to start here. I have tried using echo -e "\t" in the past, which works for horizontal placement but not alignment. I have also used a much more complex tput solution with a for loop, but that will not work easily in this situation.
I just discovered printf's capability, though, and it seems like it will do what I need, but I don't understand the syntax. Maybe something like this?
A="HOLDING"
B="1.1.1.1"
C="Hostname"
D="Potential outage"
for (( j=1; j<=10; j++ )); do
printf "%-10s" "$A" "$B" "$C" "$D"
echo
done
Those variables would be fed in from a db, though. I still don't really understand the printf syntax. Please help!
* ALSO *
Off topic question, what is your incentive for responding? I'm fairly new to stack exchange. Do some of you get anything out of it other than reputation. Careers 2.0? or something else? Some people have ridiculous stats on this site. Just curious what the drive is.
The string %-10s can be broken up into multiple parts:
% introduces a conversion specifier, i.e. how to format an argument
- specifies that the field should be left aligned.
10 specifies the field width
s specifies the data type, string.
Bash printf format strings mimic those of the C library function printf(3), and this part is described in man 3 printf.
Additionally, when Bash printf is given more arguments than conversion specifiers, it reuses the format string for the remaining arguments, so that printf "%-10s" foo bar is equivalent to printf "%-10s" foo; printf "%-10s" bar. This is what lets you pass all the arguments in the same command, with %-10s applied to each of them.
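Putting that together for the two-line layout shown in the question (the field widths here are made-up values; size them to your longest IPv6 address and hostname):

```shell
# one conversion specifier per column; "-" left-aligns each field
printf "%-10s %-17s %-10s %s\n" "HOLDING"  "1.1.1.1"         "Hostname" "Potential outage!"
printf "%-10s %-17s %-10s %s\n" "SKIPPING" "1:1:1:1:1:1:1:1" "Hostname" "Existing outage!"
```

Each row uses the same format string, so the columns line up regardless of how long the individual values are, up to the chosen widths.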
As for people's motivation, you could try the meta site, which is dedicated to questions about stackoverflow itself.

Good style for splitting lengthy expressions over lines

If the following is not the best style, what is for the equivalent expression?
if (some_really_long_expression__________ && \
some_other_really_long_expression)
The line continuation feels ugly. But I'm having a hard time finding a better alternative.
The parser doesn't need the backslashes in cases where the continuation is unambiguous. For example, using Ruby 2.0:
if true &&
true &&
true
puts true
end
#=> true
The following are some more-or-less random thoughts about the question of line length from someone who just plays with Ruby. Nor have I had any training as a software engineer, so consider yourself forewarned.
I find the problem of long lines is often more the number of characters than the number of operations. The former can be reduced by (drum-roll) shortening variable names and method names. The question, of course, is whether the application of a verbosity filter (aka babbling, prattling or jabbering filter) will make the code harder to comprehend. How often have you seen something fairly close to the following (without \)?
total_cuteness_rating = cats_dogs_and_pigs.map {|animal| \
cuteness_calculation(animal)}.reduce {|cuteness_accumulator, \
cuteness_per_animal| cuteness_accumulator + cuteness_per_animal}
Compare that with:
tot_cuteness = pets.map {|a| cuteness(a)}.reduce(&:+)
Firstly, I see no benefit of long names for local variables within a block (and rarely for local variables in a method). Here, isn't it perfectly obvious what a refers to in the calculation of tot_cuteness? How good a memory do you need to remember what a is when it is confined to a single line of code?
Secondly, whenever possible use the short form for enumerables followed by a block (e.g., reduce(&:+)). This allows us to comprehend what's going on in microseconds, here as soon as our eyes latch onto the +. Same for .to_i, .to_s or .to_f. True, reduce {|tot, e| tot + e} isn't much longer, but we're forcing the reader's brain to decode two variables as well as the operator, when + is really all it needs.
Another way to shorten lines is to avoid long chains of operations. That comes at a cost, however. As far as I'm concerned, the longer the chain, the better. It reduces the need for temporary variables, reduces the number of lines of code and--possibly of greatest importance--allows us to read across a line, as most humans are accustomed, rather than down the page. The above line of code reads, "To calculate total cuteness, calculate each pet's cuteness rating, then sum those ratings". How could it be more clear?
When chains are particularly long, they can be written over multiple lines without using the line-continuation character \:
array.each { |e| blah, blah, ..., blah }
     .map { |a| blah, blah, ..., blah }
     .reduce { |i| blah, blah, ..., blah }
That's no less clear than separate statements. I think this is frequently done in Rails.
What about the use of abbreviations? Which of the following names is most clear?
number_of_dogs
number_dogs
nbr_dogs
n_dogs
I would argue the first three are equally clear, and the last no less clear if the writer consistently prefixes variable names with n_ when that means "number of". Same for tot_, and so on. Enough.
One approach is to encapsulate those expressions inside meaningful methods. And you might be able to break it into multiple methods that you can later reuse.
Other than that, it is hard to suggest anything with the little information you gave. You might be able to get rid of the if statement using command objects or something like that, but I can't tell if it makes sense for your code because you didn't show it.
Ismael's answer works really well in Ruby (and perhaps other languages too) for two reasons:
Ruby has very low overhead for creating methods, due to the lack of type definitions
It allows you to decouple such logic for reuse, future adaptability and testing
Another option I'll toss out is create logic equations and store the result in a variable e.g.
# these are short logic equations testing x, but you can apply the same to longer expressions
number_gt_5 = x > 5
number_lt_20 = x < 20
number_eq_11 = x == 11
if (number_gt_5 && number_lt_20 && !number_eq_11)
# do some stuff
end

Sprintf equivalent in Mathematica?

I don't know why Wikipedia lists Mathematica as a programming language with printf. I just couldn't find the equivalent in Mathematica.
My specific task is to process a list of data files with padded numbers, which I used to do in bash with
fn=$(printf "filename_%05d" $n)
The closest function I found in Mathematica is PaddedForm. And after some trial and error, I got it with
"filename_" <> PaddedForm[ Round##, 4, NumberPadding -> {"0", ""} ]&
It is very odd that I have to use the number 4 to get the result similar to what I get from "%05d". I don't understand this behavior at all. Can someone explain it to me?
And is it the best way to achieve what I used to in bash?
I wouldn't use PaddedForm for this. In fact, I'm not sure that PaddedForm is good for much of anything. Instead, I'd use good old ToString, Characters and PadLeft, like so:
toFixedWidth[n_Integer, width_Integer] :=
StringJoin[PadLeft[Characters[ToString[n]], width, "0"]]
Then you can use StringForm and ToString to make your file name:
toNumberedFileName[n_Integer] :=
ToString@StringForm["filename_``", toFixedWidth[n, 5]]
Mathematica is not well-suited to this kind of string munging.
EDIT to add: Mathematica proper doesn't have the required functionality, but the java.lang.String class has the static method format() which takes printf-style arguments. You can call out to it using Mathematica's JLink functionality pretty easily. The performance won't be very good, but for many use cases you just won't care that much:
Needs["JLink`"];
LoadJavaClass["java.lang.String"];
LoadJavaClass["java.util.Locale"];
sprintf[fmt_, args___] :=
String`format[Locale`ENGLISH, fmt,
MakeJavaObject /@
Replace[{args},
{x_?NumericQ :> N@x,
x : (_Real | _Integer | True |
False | _String | _?JavaObjectQ) :> x,
x_ :> MakeJavaExpr[x]},
{1}]]
You need to do a little more work, because JLink is a bit dumb about Java functions with a variable number of arguments. The format() method takes a format string and an array of Java Objects, and Mathematica won't do the conversion automatically, which is what the MakeJavaObject is there for.
I've run into the same problem quite a bit, and decided to code my own function. I didn't do it in Java but instead just used string operations in Mathematica. It turned out quite lengthy, since I actually also needed %f functionality, but it works, and now I have it as a package that I can use at any time. Here's a link to the GitHub project:
https://github.com/vlsd/MathPrintF
It comes with installation instructions (really just copying the directory somewhere in the $Path).
Hope this will be helpful to at least some.
You could also define a function which passes all arguments to StringForm[] and use IntegerString or the padding functions as previously mentioned:
Sprintf[args__] := StringForm[args] // ToString;
file = Sprintf["filename_``", IntegerString[n, 10, 5]];
IntegerString does exactly what you need. In this case it would be
IntegerString[x,10,5]
I agree with Pillsy.
Here's how I would do it.
Note the handy cat function, which I think of as kind of like sprintf (minus the placeholders that StringForm provides): it works like Print (you can print any concatenation of expressions without converting to String) but generates a string instead of sending it to stdout.
cat = StringJoin@@(ToString/@{##})&;
pad[x_, n_] := If[StringLength@cat[x] >= n, cat[x],
cat@@PadLeft[Characters@cat[x], n, "0"]]
cat["filename_", pad[#, 5]]&
This is very much like Pillsy's answer but I think cat makes it a little cleaner.
Also, I think it's safer to have that conditional in the pad function -- better to have the padding wrong than the number wrong.
