why are function calls in Perl loops so slow? - performance

I was writing a file parser in Perl, so had to loop through file. File consists of fixed length records and I wanted to make a separate function that parses given record and call that function in a loop. However, final result turned to be slow with big files and my guess was that I shouldn't use external function. So I made some dummy tests with and without function call in a loop:
[A]
foreach (1 .. 10000000) {
$a = &get_string();
}
sub get_string {
return sprintf("%s\n", 'abc');
}
[B]
foreach (1 .. 10000000) {
$a = sprintf "%s\n", 'abc';
}
Measuring showed that A code runs about 3-4 times slower than code B. I knew beforehand that code A was supposed to run slower but still I was surprised that difference is that big. Also tried to run similar tests with Python and Java. In Python code A equivalent was about 20% slower than B and Java code was runing more or less at the same speed (as expected). Changing function from sprintf to something else didn't show any significant difference.
Is there any way to help Perl run such loops faster? Am I doing something totaly wrong here or is it Perl's feature that function calls are such overhead?

Perl function calls are slow. It sucks because the very thing you want to be doing, decomposing your code into maintainable functions, is the very thing that will slow your program down. Why are they slow? Perl does a lot of things when it enters a subroutine, a result of it being extremely dynamic (ie. you can mess with a lot of things at run time). It has to get the code reference for that name, check that it is a code ref, set up a new lexical scratchpad (to store my variables), a new dynamic scope (to store local variables), set up #_ to name a few, check what context it was called in and pass along the return value. Attempts have been made to optimize this process, but they haven't paid out. See pp_entersub in pp_hot.c for the gory details.
Also there was a bug in 5.10.0 slowing down functions. If you're using 5.10.0, upgrade.
As a result, avoid calling functions over and over again in a long loop. Especially if its nested. Can you cache the results, perhaps using Memoize? Does the work have to be done inside the loop? Does it have to be done inside the inner-most loop? For example:
for my $thing (#things) {
for my $person (#persons) {
print header($thing);
print message_for($person);
}
}
The call to header could be moved out of the #persons loop reducing the number of calls from #things * #persons to just #things.
for my $thing (#things) {
my $header = header($thing);
for my $person (#persons) {
print $header;
print message_for($person);
}
}

If your sub has no arguments and is a constant as in your example, you can get a major speed-up by using an empty prototype "()" in the sub declaration:
sub get_string() {
return sprintf(“%s\n”, ‘abc’);
}
However this is probably a special case for your example that do not match your real case. This is just to show you the dangers of benchmarks.
You'll learn this tip and many others by reading perlsub.
Here is a benchmark:
use strict;
use warnings;
use Benchmark qw(cmpthese);
sub just_return { return }
sub get_string { sprintf "%s\n", 'abc' }
sub get_string_with_proto() { sprintf "%s\n", 'abc' }
my %methods = (
direct => sub { my $s = sprintf "%s\n", 'abc' },
function => sub { my $s = get_string() },
just_return => sub { my $s = just_return() },
function_with_proto => sub { my $s = get_string_with_proto() },
);
cmpthese(-2, \%methods);
and its result:
Rate function just_return direct function_with_proto
function 1488987/s -- -65% -90% -90%
just_return 4285454/s 188% -- -70% -71%
direct 14210565/s 854% 232% -- -5%
function_with_proto 15018312/s 909% 250% 6% --

The issue you are raising does not have anything to do with loops. Both your A and B examples are the same in that regard. Rather, the issue is the difference between direct, in-line coding vs. calling the same code via a function.
Function calls do involve an unavoidable overhead. I can't speak to the issue of whether and why this overhead is costlier in Perl relative to other languages, but I can provide an illustration of a better way to measure this sort of thing:
use strict;
use warnings;
use Benchmark qw(cmpthese);
sub just_return { return }
sub get_string { my $s = sprintf "%s\n", 'abc' }
my %methods = (
direct => sub { my $s = sprintf "%s\n", 'abc' },
function => sub { my $s = get_string() },
just_return => sub { my $s = just_return() },
);
cmpthese(-2, \%methods);
Here's what I get on Perl v5.10.0 (MSWin32-x86-multi-thread). Very roughly, simply calling a function that does nothing is about as costly as directly running our sprintf code.
Rate function just_return direct
function 1062833/s -- -70% -71%
just_return 3566639/s 236% -- -2%
direct 3629492/s 241% 2% --
In general, if you need to optimize some Perl code for speed and you're trying to squeeze out every last drop of efficiency, direct coding is the way to go -- but that often comes with a price of less maintainability and readability. Before you get into the business of such micro-optimizing, however, you want to make sure that your underlying algorithm is solid and that you have a firm grasp on where the slow parts of your code actually reside. It's easy to waste a lot of effort working on the wrong thing.

The perl optimizer is constant-folding the sprintf calls in your sample code.
You can deparse it to see it happening:
$ perl -MO=Deparse sample.pl
foreach $_ (1 .. 10000000) {
$a = &get_string();
}
sub get_string {
return "abc\n";
}
foreach $_ (1 .. 10000000) {
$a = "abc\n";
}
- syntax OK

Related

How can I use "map" in "perl" to return a hash reference whose key is looked up from an array reference whose value is looked up from another array?

I've searched other many Stack questions on map however this requirement is particular and well try as I might I cannot quite get the solution I am looking for, or I think that does exist.
This question is simply about performance.
As limited, background, this code segment used in decoding incoming tokens so it's used on every web request and therefore the performance is critical and I know "map" can be used so want to use it.
Here is a trimmed down but nevertheless fully working code segment which I am currently using and works perfectly well:
use strict;
use Data::Dumper qw (Dumper);
my $api_token = { array => [ 'user_id', 'session_id', 'expiry' ], max => 3, name => 'session' };
my $token_got = [ 9923232345812112323, 1111323232000000465, 1002323001752323232 ];
my $rt;
for (my $i=0; $i<scalar #{$api_token->{array}}; $i++) {
$rt->{$api_token->{array}->[$i]} = $token_got->[$i];
}
$rt->{type} = $api_token->{name};
print Dumper ($rt) . "\n";
The question is:
What is the absolute BEST POSSIBLE PERL CODE to replicate the foreach statement above in terms of performance?
Looks like you only need a hash slice
my %rt;
#rt{ #{ $api_token->{array} } } = #$token_got;
Or, if the hash reference is needed
my $rt;
#{ $rt } { #{ $api_token->{array} } } = #$token_got;
or with the newer postfix dereferencing, on both array and hash slices, perhaps a bit nicer
my $rt;
$rt->#{ $api_token->{array}->#* } = #$token_got;
One can also do it using
List::MoreUtils::mesh, and in one statement
my $rt = { mesh #{ $api_token->{array} }, #$token_got };
or with pairwise from the same library
my $rt = { pairwise { $a, $b } #{ $api_token->{array} }, #$token_got };
These go via C code if the library gets installed with List::MoreUtils::XS.
Benchmarked all above, with the tiny datasets from the question (realistic though?), and whatever implementation mesh/pairwise have they are multiple times as slow as the others.
On an old laptop with v5.26
Rate use_pair use_mesh use_href use_post use_hash
use_pair 373639/s -- -36% -67% -67% -68%
use_mesh 580214/s 55% -- -49% -49% -51%
use_href 1129422/s 202% 95% -- -1% -5%
use_post 1140634/s 205% 97% 1% -- -4%
use_hash 1184835/s 217% 104% 5% 4% --
On a server with v5.36 the numbers are around 160%--170% against pairwise (with mesh being a bit faster than it, similarly to above)
Of the others, on the laptop the hash-based one is always a few percent quicker, while on a server with v5.36 they are all very close. Easy to call it a tie.
The following is edit by OP, who timed a 61% speedup (see comments)
CHANGED CODE:
#rt{ #{ $api_token->{array} } } = #$token_got; ### much faster onliner replaced the loop. #zdim credit

How to efficiently store and restore part of an object (a hash) in Perl

I am extending Chess::Play, a little chess framework. I want to implement a method that undoes a move. In do_move() I save relevant state information for using it later to restore the state:
sub do_move {
my ($self, $move) = #_;
# Do some things and call the super method ...
# Return everything needed for restoring the current state.
return (
BOARD => [#{$self->{BOARD}],
CASTLE_OK => {%{$self->{CASTLE_OK}},
COLOR_TO_MOVE => $self->{COLOR_TO_MOVE},
# ... Other properties omitted for brevity.
);
}
It is, of course, crucial to make deep copies of those properties that are not scalars. The original author is doing more or less the same internally in other places.
In undo_move() I restore the state. A shallow copy is enough because the state information is no longer needed.
sub undo_move {
my ($self, $state) = #_;
$self->{BOARD} = $state->{BOARD};
$self->{CASTLE_OK} = $state->{CASTLE_OK};
$self->{COLOR_TO_MOVE} = $state->{COLOR_TO_MOVE};
}
Alternatively, I could assign to a hash slice:
my #keys = keys %$state;
# Using just %$state on the right-hand side is a bug because
# it returns the values in arbitrary order. Therefore a hash
# slice must be used although the entire state has to be copied
# back.
#{$self}{#keys} = #{$state}{#keys};
But is that really more efficient? Or will Perl internally make a shallow copy of $self for that, copying also the hash slots that I don't want/need to touch?
If using the hash slice is more efficient, would it then be better save the state in do_move() with something like this:
return Storable::dclone(#{$self}{qw(BOARD CASTLE_OK COLOR_TO_MOVE)});
It is key for this application that the code is efficient, not elegant or idiomatic, because it will be executed millions of times.
In reality my state information has 8 properties, one array of 144 integers, one array of 16 integers, one hash with 4 keys, one with two keys, and four scalars.
Yes, I should try out both versions and compare the performance. I will do that. But I am interested in what Perl is doing internally here and how to solve such problems in general.
Edit: Normally, chess engines do not save state but rather just undo the modifications to the board. But that is more complicated than it looks at first glance because of castling and en passant captures. My assumption is at the moment that the shorter the code, the faster it will be, so that most of it runs inside Perl's C code and is not interpreted as Perl bytecode.
Update:   Added benchmarks. In short: the fastest way to copy (this) data is by copying items one by one. It is clearly faster than using a slice, and for this data much faster than using Storable.
I see two problems with copying data by hand, piece by piece
return (
BOARD => [#{$self->{BOARD}],
CASTLE_OK => {%{$self->{CASTLE_OK}},
COLOR_TO_MOVE => $self->{COLOR_TO_MOVE},
# ... Other properties omitted for brevity.
);
As for efficiency, this code manually dereferences and then constructs back arrays and hashes; no work is left undone, and all data gets copied. I don't see how it could be more efficient than the same job done by the fine tuned C code in Storable, regarded to be fast.†
More importantly, the code copies only "one-level-deep," so to say -- if the arrayref $self->{BOARD} has any references for values then there is a problem, and worse yet it'll be a quiet problem. I assume that that is not the case here but it still leaves me itchy allowing for a potential bug, should that change (the proverbial 6 months later, naturally).
So in principle I'd readily go with Storable.†
However, there's a special case, as clarified in comments: there is absolutely no reason to worry about the depth of the copy as there can only be plain arrays and hashes; and, only a subset of the data structure is needed.
First, I don't see why a hash slice would be (measurably) faster than copying item by item; each key still has to be dereferenced, and its value copied over. I don't know the implementation but I'd expect a "slice" to be a syntax feature, for which the work is done the way we do it manually.
While there may be some optimization with a slice, given that all used keys are known in that statement, when you go key by key the slicing is done already since keys are given one by one.
On a million repetitions? Measure. My gut feeling is that only copying by hand could be faster, but again by a faint measure. Or try
$self->{$_} = $state->{$_} for keys %$state;
to avoid constructing an array (#keys). The postfix-for loop (as a statement modifier) has no scope built so that's another teeny-tiny benefit.
What leaves the question of how to return from do_move, and I'd still go for Storable. Even as there is some extra work of generating a slice the
whole copying would have to be faster in that old C code. — nah, see benchmark -- One can't extract a part of a hashref but it has to be rebuilt and that voids possible advantage of the library, for this simple data at least. (Worse, it turns out that manual copy is faster than Storable even for copying the whole hashref, by a factor of 2. The data here is so simple that "by hand" method has no work to do.)
I haven't measured any of this, and that will of course answer the question. — see benchmarks
Conclusion
The data here is so simple that a manual copy is much faster than using Storable, so do as shown in the question. (And since one has to rebuild only a part of the hashref using Storable is even less effective.) But measure on real data.
Then, rebuild data to return (in undo_move) "by hand," see first benchmark below
Edit   This discussion really needs measurements. Here are basic benchmarks
Change parts of a hashref, using "slice" vs writing each key-value "by hand"
use warnings;
use strict;
use feature 'say';
use Storable qw(dclone);
use Benchmark qw(cmpthese);
my $runfor = shift // 3; # second to run-for, see cmpthese(...)
my $data = {
one => [10..12], two => { a => 1, b => 2 },
more => 'some',
};
my $slice = { # use this to change $data
one => [2..6], two => { A => 10, B => 20 },
};
sub by_slice {
my ($data, $slice) = #_;
my #keys = keys %$slice;
#{$data}{#keys} = #{$slice}{#keys};
return $data;
}
sub by_hand {
my ($data, $slice) = #_;
for (qw(one two)) {
$data->{$_} = $slice->{$_}
}
# This can only be faster, but barely
#$data->{$_} = $slice->{$_} for qw(one two);
return $data;
}
my $d1 = dclone $data;
my $d2 = dclone $data;
cmpthese(-$runfor, {
by_slice => sub { my $res_data = by_slice($data, $slice) },
by_hand => sub { my $res_data = by_hand ($data, $slice) },
});
Running this for 10 seconds (progname 10), on a very old laptop with perl 5.16 (CentOS 7)
Rate by_slice by_hand
by_slice 709738/s -- -25%
by_hand 940751/s 33% --
So doing it "by hand" is indeed a bit faster, as guessed but some more. At least in this simple test-case, but I'd expect that no less (and perhaps more?) with more complex data. A part of this is due to constructing that extra array in the 'slice' case.
Copying a part of a complex data structure, by hand vs using Storable
Well, this is a no-contest since there is no efficient way to extract a "part" of a hashref, but one has to rebuild it by pulling out wanted key => value pairs and constructing a hashref with them. Then, while using Storable we have to do all that and call the function to copy data. Given that by-hand copy in this case is very simple it beats the library use hands down (by a factor of 3)
Comparing
sub by_lib {
my ($data) = #_;
return dclone( { one => $data->{one}, two => $data->{two} } );
#return dclone( { map { $_ => $data->{$_} } qw(one two) } ); # little slower
}
sub by_hand {
my ($data) = #_;
return {
one => [ #{ $data->{one} } ],
two => { %{ $data->{two} } },
}
}
by using them in the benchmark program above yields
Rate by_lib by_hand
by_lib 157838/s -- -75%
by_hand 627359/s 297% --
Note added— Just so, I also compared the by-hand-copy with Storable for copying the whole data structure -- and the lib is slower by a factor of 2.
Two morals come out of that, for me: with simple data the overheads of a general purpose library (which has to do a lot of extra work for its generality) are beat by the manual copy; and, needing the slice indeed hurts the library use further, going from x2 to x3.
Speaking of libraries and speed, also see footnote†.
† There is also JSON::XS, which serializes data and comes out about twice as fast as Storable in my test. Thus still half the speed of the manual copying, for simple data.
With it you'd need to JSON-encode data for return-ing from a sub, and to decode on receiving the return
use JSON::XS qw(encode_json decode_json);
sub do_move {
...
return encode_json ...;
}
and
my $ret = decode_json do_move(...);

Performance of local variable vs. array access

I was doing some benchmarking of Perl performance, and ran into a case that I thought was somewhat odd. Suppose you have a function which uses a value from an array multiple times. In this case, you often see some code as:
sub foo {
my $value = $array[17];
do_something_with($value);
do_something_else_with($value);
}
The alternative is not to create a local variable at all:
sub foo {
do_something_with($array[17]);
do_something_else_with($array[17]);
}
For readability, the first is clearer. I assumed that performance would be at least equal (or better) for the first case too - array lookup requires a multiply-and-add, after all.
Imagine my surprise when this test program showed the opposite. On my machine, re-doing the array lookup is actually faster than storing the result, until I increase ITERATIONS to 7; in other words, for me, creating a local variable is only worthwhile if it's used at least 7 times!
use Benchmark qw(:all);
use constant { ITERATIONS => 4, TIME => -5 };
# sample array
my #array = (1 .. 100);
cmpthese(TIME, {
# local variable version
'local_variable' => sub {
my $index = int(rand(scalar #array));
my $val = $array[$index];
my $ret = '';
for (my $i = 0; $i < ITERATIONS; $i ++) {
$ret .= $val;
}
return $ret;
},
# multiple array access version
'multi_access' => sub {
my $index = int(rand(scalar #array));
my $ret = '';
for (my $i = 0; $i < ITERATIONS; $i ++) {
$ret .= $array[$index];
}
return $ret;
}
});
Result:
Rate local_variable multi_access
local_variable 245647/s -- -5%
multi_access 257907/s 5% --
It's not a HUGE difference, but it brings up my question: why is it slower to create a local variable and cache the array lookup, than to do the lookup again? Reading other S.O. posts, I've seen that other languages / compilers do have the expected outcome, and sometimes even transform these into the same code. What is Perl doing?
I've done more poking around at this today, and what I've determined is that scalar assignment of any sort is an expensive operation, relative to the overhead of one-deep array lookup.
This seems like it's just restating the initial question, but I feel I have found more clarity. If, for example, I modify my local_variable subroutine to do another assignment like so:
my $index = int(rand(scalar #array));
my $val = 0; # <- this is new
$val = $array[$index];
my $ret = '';
...the code suffers an additional 5% speed penalty beyond the single-assignment version - even though it does nothing but a dummy assignment to the variable.
I also tested to see if scope caused setup/teardown of $var to impede performance, by switching it to global instead of local scoped one. The difference is negligible (see comments to #zdim above), pointing away from construct/destruct as the performance bottleneck.
In the end, my confusion was based on faulty assumptions that scalar assignment should be fast. I am used to working in C, where copying a value to a local variable is an extremely quick operation (1-2 asm instructions).
As it turns out, this is not the case in Perl (though I don't know exactly why, it's ok). Scalar assignment is a relatively "slow" operation... Whatever Perl internals are doing to get at the nth element of an Array object is actually quite fast by comparison. The "multiply and add" I mentioned in the initial post is still far less work than the code for scalar assignment.
That is why it takes so many lookups to match the performance of caching the result: simply assigning to the "cache" variable is ~7 times slower (for my setup).
Let's first turn the statement: Caching the lookup is expected to be faster as it avoids the repeated lookups, even as it does cost some, and it starts being faster once more than 7 lookups are done. Now that's not so shocking, I think.
As to why it's slower for fewer than seven iterations ... I'll guess that the cost of the scalar creation is still greater than those few lookups. It is surely greater than one lookup, yes? How about two, then? I'd say that "a few" may well be a good measure.

Does awk support dynamic user-defined variables?

awk supports this:
awk '{print $(NF-1);}'
but not for user-defined variables:
awk '{a=123; b="a"; print $($b);}'
by the way, shell supports this:
a=123;
b="a";
eval echo \${$b};
How can I achieve my purpose in awk?
OK, since some of us like to eat spaghetti through their nose, here is some actual code that I wrote in the past :-)
First of all, getting a self modifying code in a language that does not support it will be extremely non-trivial.
The idea to allow dynamic variables, function names, in a language that does not support one is very simple. At some state in the program, you want a dynamic anything to self modify your code, and resume execution
from where you left off. a eval(), that is.
This is all very trivial, if the language supports eval() and such equlavant. However, awk does not have such function. Therefore, you, the programmer has to provide a interface to such thing.
To allow all this to happen, you have three main problems
How to get our self so we can modify it
How to load the modified code, and resume from where we left off
Finding a way for the interpreter to accept our modified code
How to get our self so we can modify it
Here is a example code, suitable for direct execution.
This one is the infastrucure that I inject for enviroments running gawk, as it requires PROCINFO
echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}'
Result:
foo
A[1]=[awk]
A[2]=[
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}]
bar
As you can see, as long as the OS does not play with our args, and /proc/ is available, it is possible
to read our self. This may appear useless at first, but we need it for push/pop of our stack,
so that our execution state can be enbedded within the code, so we can save/resume and survive OS shutdown/reboots
I have left out the OS detection function and the bootloader (written in awk), because, if I publish that,
kids can build platform independent polynormal code, and it is easy to cause havoc with it.
how to load the modified code, and resume from where we left off
Now, normaly you have push() and pop() for registers, so you can save your state and play with
your self, and resume from where you left off. a Call and reading your stack is a typical way to get the
memory address.
Unfortunetly, in awk, under normal situations we can not use pointers (with out a lot of dirty work),
or registers (unless you can inject other stuff along the way).
However you need a way to suspend and resume from your code.
The idea is simple. Instead of letting awk in control of your loops and while, if else conditions,
recrusion depth, and functions you are in, the code should.
Keep a stack, list of variable names, list of function names, and manage it your self.
Just make sure that your code always calls self_modify( bool ) constantly, so that even upon sudden failure,
As soon as the script is re-run, we can enter self_modify( bool ) and resume our state.
When you want to self modify your code, you must provide a custom made
write_stack() and read_stack() code, that writes out the state of stack as string, and reads string from
the values out from the code embedded string itself, and resume the execution state.
Here is a small piece of code that demonstrates the whole flow
echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function _(s){return s}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
_(BEGIN_MODIFY"|");print "#foo";_("|"END_MODIFY)
dbg_argv(A);
sub( \
"BEGIN_MODIFY\x22\x5c\x7c[^\x5c\x7c]*\x5c\x7c\x22""END_MODIFY", \
"BEGIN_MODIFY\x22\x7c\x22);print \"#"PROCINFO["pid"]"\";_(\x22\x7c\x22""END_MODIFY" \
,A[2])
print "echo \x22\x22\x7c awk \x27"A[2]"";
print "function bar_"PROCINFO["pid"]"_(s){print \x22""doe\x22}";
print "\x27"
}'
Result:
Exactly same as our original code, except
_(BEGIN_MODIFY"|");print "65964";_("|"ND_MODIFY)
and
function bar_56228_(s){print "doe"}
at the end of code
Now, this may seem useless, as we are only replaceing code print "foo"; with our pid.
But it becomes usefull, when there are multiple _() with separate MAGIC strings to identify BLOCKS,
and a custome made multi line string replacement routine instead of sub()
You msut provide BLOCKS for stack, function list, execution point, as a bare minimum.
And notice that the last line contains bar
This it self is just a sting, but when this code repeatedly gets executed, notice that
function bar_56228_(s){print "doe"}
function bar_88128_(s){print "doe"}
...
and it keeps growing. While the example is intentionally made so that it does nothing useful,
if we provide a routine to call bar_pid_(s) instead of that print "foo" code,
Sudenly it means we have eval() on our hands :-)
Now, isn't eval() usefull :-)
Don't forget to provide a custome made remove_block() function so that the code maintains
a reasonable size, instead of growing every time you execute.
Finding a way for the interpreter to accept our modified code
Normally calling a binary is trivial. However, when doing so from with in awk, it becomes difficult.
You may say system() is the way.
There are two problems to that.
system() may not work on some envoroments
it blocks while you are executing code, trus you can not perform recrusive calls and keep the user happy at the same time.
If you must use system(), ensure that it does not block.
A normal call to system("sleep 20 && echo from-sh & ") will not work.
The solution is simple,
echo ""|awk '{print "foo";E="echo ep ; sleep 20 && echo foo & disown ; "; E | getline v;close(E);print "bar";}'
Now you have a async system() call that does not block :-)
Not at the moment. However, if you provide a wrapper, it is (somewhat hacky and dirty) possible.
The idea is to use # operator, introduced in the recent versions of gawk.
This # operator is normally used to call a function by name.
So if you had
function foo(s){print "Called foo "s}
function bar(s){print "Called bar "s}
{
var = "";
if(today_i_feel_like_calling_foo){
var = "foo";
}else{
var = "bar";
}
#var( "arg" ); # This calls function foo(), or function bar() with "arg"
}
Now, this is usefull on it's own.
Assuming we know var names beforehand, we can write a wrapper to indirectly modify and obtain vars
function get(varname, this, call){call="get_"varname;return #call();}
function set(varname, arg, this, call){call="set_"varname; #call(arg);}
So now, for each var name you want to prrvide access by name, you declare these two functions
function get_my_var(){return my_var;}
function set_my_var(arg){my_var = arg;}
And prahaps, somewhere in your BEGIN{} block,
BEGIN{ my_var = ""; }
To declare it for global access.
Then you can use
get("my_var");
set("my_var", "whatever");
This may appear useless at first, however there are perfectly good use cases, such as
keeping a linked list of vars, by holding the var's name in another var's array, and such.
It works for arrays too, and to be honest, I use this for nesting and linking Arrays within
Arrays, so I can walk through multiple Arrays like using pointers.
You can also write configure scripts that refer to var names inside awk this way,
in effect having a interpreter-inside-a-interpreter type of things, too...
Not the best way to do things, however, it gets the job done, and I do not have to worry about
null pointer exceptions, or GC and such :-)
The $ notation is not a mark for variables, as in shell, PHP, Perl etc. It is rather an operator, which receives an integer value n and returns the n-th column from the input. So, what you did in the first example is not the setting/getting of a variable dynamically but rather a call to an operator/function.
As stated by commenters, you can archive the behavior you are looking for with arrays:
awk '{a=123; b="a"; v[b] = a; print v[b];}'
I had a similar problem to solve, to load the settings from a '.ini' file and I've used arrays to set the variables dynamically.
It works with Awk or Gawk, Linux or Windows (GnuWin32)
gawk -v Settings_File="my_settings_file.ini" -f awk_script.awk <processing_file>
[my_settings_file.ini]
#comment
first_var=foo
second_var=bar
[awk_script.awk]
BEGIN{
FS="=";
while((getline < Settings_File)>0) {
if($0 !~ /^[#;]|^(\s*)$/) {
var_array[$1] = $2;
}
}
print var_array["first_var"];
print var_array["second_var"];
if (var_array["second_var"] == "bar") {
print "works!";
}
}
{
#more processing
}
END {
#finish processing
}

Check Registry Version using Perl

I need to go to the registry and check a programs installed version. I am using perl to a whole lot of others things but the registry checking part isn't working. The program version has to be 9.7 and up so it could be 9.8 or 9.7.5.
When I install the program it shows 9.7.4 but I just need the 9.7 to be checked.
Bellow is me going to DisplayVersion which is a REG_SZ which shows 9.7.4.
OR
I could use VersionMajor and VersionMinor together which is a REG_DWORD. Which for Major is 9 and Minor is 7.
$ProgVersion= `$rootpath\\system32\\reg\.exe query \\\\$ASSET\\HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\{9ACB414D-9347-40B6-A453-5EFB2DB59DFA} \/v DisplayVersion`;
if ($ProgVersion == /9.7/)
This doesn't work I could make it 9.200 and it still works. I tried to use this instead and it still wouldn't work. This next part is assuming that a new client needs to be install if it goes from 9.7. I was trying to use Great than or equal to, but it didn't work.
$ProgVersionMajor= `$rootpath\\system32\\reg\.exe query \\\\$ASSET\\HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\{9ACB414D-9347-40B6-A453-5EFB2DB59DFA} \/v VersionMajor`;
$ProgVersionMinor= `$rootpath\\system32\\reg\.exe query \\\\$ASSET\\HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\{9ACB414D-9347-40B6-A453-5EFB2DB59DFA} \/v VersionMinor`;
if (($ProgVersionMajor=~ /9/) && ($ProgVersionMinor=~ /7/))
Any help on doing this correctly or fixing what I am doing.
Several things:
You don't mention it, but are you using the Perl module Win32::TieRegistry? If not, you should. It'll make handling the Windows registry much easier.
In the Perl documentation, you can look at Version String under Scalar Value Constructors. This will make manipulating version strings much, much easier. Version strings have either more than one decimal place in them, or start with the letter v. I always prefix them with v to make it obvious what it is.
Here's a sample program below showing you how they can be used in comparisons:
#! /usr/bin/env perl
#
use strict;
use warnings;
my $version = v4.10.3;
for my $testVersion (v3.5.2, v4.4.1, v5.0.1) {
if ($version gt $testVersion) {
printf qq(Version %vd is greater than test %vd\n), $version, $testVersion;
}
else {
printf qq(Version %vd is less than test %vd\n), $version, $testVersion;
}
}
Note that I can't just print version strings. I have to use printf and sprintf and use the %vd vector decimal format to print them out. Printing version strings via a regular print statement can cause all sorts of havoc since they're really unicode representations. You put them in a print statement and you don't know what you're getting.
Also notice that you do not put quotes around them! Otherwise, you'll just make them regular strings.
NEW ANSWER
I was trying to find a way to convert a string into a v-string without downloading an optional package like Perl::Version or (Version), and I suddenly read that v-strings are deprecated, and I don't want to use a deprecated feature.
So, let's try something else...
We could simply divide up version numbers into their constituent components as arrays:
v1.2.3 => $version[0] = 1, $version[1] = 2, $version[2] = 3
By using the following bit of code:
my #version = split /\./, "9.7.5";
my #minVersion = split /\./, "9.7"
Now, we can each part of the version string against the other. In the above example, I compare the 9 of #version with the 9 of #version, etc. If #version was 9.6 I would have compared the 6 in #version against the 7 in #minVersion and quickly discovered that #minVersion is a higher version number. However, in both the second parts are 7. I then look at the third section. Whoops! #minVersion consists of only two sections. Thus, #version is bigger.
Here's a subroutine that does the comparison. Note that I also verify that each section is an integer via the /^\d+$/ regular expression. My subroutine can return four values:
0: Both are the same size
1: First Number is bigger
2: Second Number is bigger
undef: There is something wrong.
Here's the program:
my $minVersion = "10.3.1.3";
my $userVersion = "10.3.2";
# Create the version arrays
my $result = compare($minVersion, $userVersion);
if (not defined $results) {
print "Non-version string detected!\n";
}
elsif ($result == 0) {
print "$minVersion and $userVersion are the same\n";
}
elsif ($result == 1) {
print "$minVersion is bigger than $userVersion\n";
}
elsif ($result == 2) {
print "$userVersion is bigger than $minVersion\n";
}
else {
print "Something is wrong\n";
}
sub compare {
my $version1 = shift;
my $version2 = shift;
my #versionList1 = split /\./, $version1;
my #versionList2 = split /\./, $version2;
my $result;
while (1) {
# Shift off the first value for comparison
# Returns undef if there are no more values to parse
my $versionCompare1 = shift #versionList1;
my $versionCompare2 = shift #versionList2;
# If both are empty, Versions Matched
if (not defined $versionCompare1 and not defined $versionCompare2) {
return 0;
}
# If $versionCompare1 is empty $version2 is bigger
if (not defined $versionCompare1) {
return 2;
}
# If $versionCompare2 is empty $version1 is bigger
if (not defined $versionCompare2) {
return 1;
}
# Make sure both are numeric or else there's an error
if ($versionCompare1 !~ /\^d+$/ or $versionCompare2 !~ /\^\d+$/) {
return;
}
if ($versionCompare1 > $versionCompare2) {
return 1;
}
if ($versionCompare2 > $versionCompare1) {
return 2;
}
}
}
Using Win32::TieRegistry
You said in your answer you didn't use Win32::TieRegistry. I just want to show you what it can do for the readability of your program:
Your Way
$ProgVersion= `$rootpath\\system32\\reg\.exe query \\\\$ASSET\\HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\{9ACB414D-9347-40B6-A453-5EFB2DB59DFA} \/v DisplayVersion`;
With Win32::TieRegistry
use Win32::TieRegistry ( TiedHash => '%RegHash', DWordsToHex => 0 );
my $key = $TiedHash->{LMachine}->{Software}->{Wow6432Node}->{Microsoft}->{Windows}->{CurrentVersion}->{Uninstall}->{9ACB414D-9347-40B6-A453-5EFB2DB59DFA}->{Version};
my $programValue = $key->GetValue;
my $stringValue = unpack("L", $programValue);
Or, you can split it up:
my $MSSoftware = $TiedHash->{LMachine}->{Software}->{Wow6432Node}->{Microsoft};
my $uninstall = $MSSoftware->{Windows}->{CurrentVersion}->{Uninstall};
my $programVersion = $uninstall->{9ACB414D-9347-40B6-A453-5EFB2DB59DFA}->{Version};
See how much easier that's to read. You can also use this to test keys too.
(Word 'o Warning: I don't have a Windows machine in front of me, so I didn't exactly check the validity of the code. Try playing around with it and see what you get.)

Resources