Additional hash lookup using 'exists'? - performance

I sometimes access a hash like this:
if(exists $ids{$name}){
$id = $ids{$name};
}
Is that good practice? I'm a bit concerned that it contains two lookups where really one should be done. Is there a better way to check the existence and assign the value?

By checking with exists, you prevent autovivification. See Autovivification : What is it and why do I care?.
UPDATE: As trendels points out below, autovivification does not come into play in the example you posted. I am assuming that the actual code involves multi-level hashes.
Here is an illustration:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my (%hash, $x);
if ( exists $hash{test}->{vivify} ) {
$x = $hash{test}->{vivify}->{now};
}
print Dumper \%hash;
$x = $hash{test}->{vivify}->{now};
print Dumper \%hash;
__END__
C:\Temp> t
$VAR1 = {
'test' => {}
};
$VAR1 = {
'test' => {
'vivify' => {}
}
};
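To avoid autovivification at every level of a nested lookup, each level can be tested before descending into the next. Here is a minimal sketch of that idea; the helper name nested_exists is hypothetical and not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: walk a list of keys and stop at the first missing one,
# so no intermediate hashes are created as a side effect.
sub nested_exists {
    my ($ref, @keys) = @_;
    for my $key (@keys) {
        return 0 unless ref $ref eq 'HASH' && exists $ref->{$key};
        $ref = $ref->{$key};
    }
    return 1;
}

my %hash;
my $found = nested_exists(\%hash, qw(test vivify now));
my $count = keys %hash;    # number of top-level keys after the check
print "found=$found keys=$count\n";
```

Because the loop returns before dereferencing a missing level, %hash should stay empty after the check.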

You could apply Hash::Util's lock_keys to the hash, then perform your assignments within an eval.
#!/usr/bin/perl
use Hash::Util qw/lock_keys/;
my %a = (
1 => 'one',
2 => 'two'
);
lock_keys(%a);
eval {$val = $a{2}}; # this assignment completes
eval {$val = $a{3}}; # this assignment aborts
print "val=$val\n"; # has value 'two'
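If you also want to know whether a lookup failed, you can capture the result of each eval and inspect $@. A small sketch under strict, with made-up variable names:

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %names = (1 => 'one', 2 => 'two');
lock_keys(%names);

my $val;
# Reading an allowed key works as usual; the block returns 1 on success.
my $ok_allowed = eval { $val = $names{2}; 1 };
# Reading a key outside the locked set dies, so eval returns undef and $@ is set.
my $ok_missing = eval { $val = $names{3}; 1 };
my $error = $@;

print "val=$val\n";
print "lookup of key 3 failed: $error" unless $ok_missing;
```

Note that $val keeps the value from the last successful assignment, because the failed lookup dies before anything is assigned.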

You can do it with one lookup like this:
$tmp = $ids{$name};
$id = $tmp if (defined $tmp);
However, I wouldn't bother unless profiling showed that this lookup was a bottleneck.

Performance is rarely important in a case like this; profile first (see "Devel::NYTProf").
But to answer your question:
If the key does not exist in the hash, "exists" is very fast:
if(exists $ids{$name}){
$id = $ids{$name};
}
But if the key does exist, a second lookup is done.
If the key is likely to exist, doing only one lookup will be faster:
$id = $ids{$name};
if($id){
#....
}
See this little benchmark from a Perl mailing list.
#!/usr/bin/perl -w
use strict;
use Benchmark qw( timethese );
use vars qw( %hash );
@hash{ 'A' .. 'Z', 'a' .. 'z' } = (1) x 52;
my $key = 'xx';
timethese 10000000, {
'defined' => sub {
if (defined $hash{$key}) { my $x = $hash{$key}; return $x; };
return 0;
},
'defined_smart' => sub {
my $x = $hash{$key};
if (defined $x) {
return $x;
};
return 0;
},
'exists' => sub {
if (exists $hash{$key}) { my $x = $hash{$key}; return $x; };
return 0;
},
'as is' => sub {
if ($hash{$key}) { my $x = $hash{$key}; return $x; };
return 0;
},
'as is_smart' => sub {
my $x = $hash{$key};
if ($x) { return $x; };
return 0;
},
};
Using a key ('xx') that does not exist shows that 'exists' is the winner.
Benchmark: timing 10000000 iterations of as is, as is_smart, defined, defined_smart, exists...
as is: 1 wallclock secs ( 1.52 usr + 0.00 sys = 1.52 CPU) # 6578947.37/s (n=10000000)
as is_smart: 3 wallclock secs ( 2.67 usr + 0.00 sys = 2.67 CPU) # 3745318.35/s (n=10000000)
defined: 3 wallclock secs ( 1.53 usr + 0.00 sys = 1.53 CPU) # 6535947.71/s (n=10000000)
defined_smart: 3 wallclock secs ( 2.17 usr + 0.00 sys = 2.17 CPU) # 4608294.93/s (n=10000000)
exists: 1 wallclock secs ( 1.33 usr + 0.00 sys = 1.33 CPU) # 7518796.99/s (n=10000000)
Using a key ('x') that does exist shows that 'as is_smart' is the winner.
Benchmark: timing 10000000 iterations of as is, as is_smart, defined, defined_smart, exists...
as is: 3 wallclock secs ( 2.76 usr + 0.00 sys = 2.76 CPU) # 3623188.41/s (n=10000000)
as is_smart: 3 wallclock secs ( 1.81 usr + 0.00 sys = 1.81 CPU) # 5524861.88/s (n=10000000)
defined: 3 wallclock secs ( 3.42 usr + 0.00 sys = 3.42 CPU) # 2923976.61/s (n=10000000)
defined_smart: 2 wallclock secs ( 2.32 usr + 0.00 sys = 2.32 CPU) # 4310344.83/s (n=10000000)
exists: 3 wallclock secs ( 2.83 usr + 0.00 sys = 2.83 CPU) # 3533568.90/s (n=10000000)

If it is not a multi-level hash, you can do this:
$id = $ids{$name} || 'foo';
or if $id already has a value:
$id ||= $ids{$name};
where 'foo' is a default or fall-through value. If it is a multi-level hash you would use 'exists' to avoid the autovivification discussed earlier in the thread or not use it if autovivification is not going to be a problem.
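One caveat with || is that it falls through on any false value, not only on a missing key. If 0 or the empty string can be legitimate values, the defined-or operator // (Perl 5.10 and later) may be a better fit. A short sketch with hypothetical data:

```perl
use strict;
use warnings;

my %ids = (alice => 0, bob => 7);    # hypothetical data; note that 0 is a valid id

my $with_or  = $ids{alice} || 'foo';    # 0 is false, so the default is used
my $with_dor = $ids{alice} // 'foo';    # 0 is defined, so it is kept
my $missing  = $ids{carol} // 'foo';    # key absent, so the default is used

print "$with_or $with_dor $missing\n";
```

The same distinction applies to ||= versus //= when $id may already hold a value.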

When I want high performance, I usually write this idiom to build a hash used as a set:
my %h;
for my $key (@some_vals) {
...
$h{$key} = undef unless exists $h{$key};
...
}
return keys %h;
This code is a little faster than the commonly used $h{$key}++: exists avoids a useless assignment, and undef avoids allocating a value. The best answer for you is: benchmark it! I guess that exists $ids{$name} is a little faster than $id = $ids{$name}, and if you have a high miss ratio, your version with exists can be faster than assigning first and testing afterwards.
For example, if I wanted a fast set intersection, I would write something like this:
sub intersect {
my $h;
@$h{@{shift()}} = ();
my $i;
for (@_) {
return unless %$h;
$i = {};
@$i{grep exists $h->{$_}, @$_} = ();
$h = $i;
}
return keys %$h;
}
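The same set idiom can be shown on a smaller scale. This sketch uses made-up sample lists: it builds a set with a hash slice and then tests membership with exists, which is the core of the intersection above:

```perl
use strict;
use warnings;

my @first  = qw(a b c d);    # sample data
my @second = qw(c d e);      # sample data

# Build a set: the slice assignment creates the keys with undef values.
my %seen;
@seen{@first} = ();

# Membership is a single exists test per element.
my @common = sort grep { exists $seen{$_} } @second;
print "@common\n";
```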

Related

Perl6 : What is the best way for dealing with very big files?

Last week I decided to give Perl6 a try and started to reimplement one of my programs.
I have to say, Perl6 makes object-oriented programming very easy, an aspect I found very painful in Perl5.
My program has to read and store big files, such as whole genomes (up to 3 Gb and more, see Example 1 below) or tabular data.
The first version of the code was written the Perl5 way, iterating line by line ("genome.fa".IO.lines). It was very slow, with an unusable execution time.
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
my $id;
my $s;
for $!file.IO.lines -> $line {
if $line ~~ /^\>/ {
say $id;
if $id.defined {
%!seq{$id} = sequence.new(id => $id, seq => $s);
}
my $l = $line;
$l ~~ s:g/^\>//;
$id = $l;
$s = "";
}
else {
$s ~= $line;
}
}
%!seq{$id} = sequence.new(id => $id, seq => $s);
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
So after a little bit of RTFM, I switched to slurping the file, splitting it on \n, and parsing the result with a for loop. This way I managed to load the data in 2 min. Much better, but not enough. By cheating, that is, by removing as many \n as possible (Example 2), I decreased the execution time to 30 seconds. Quite good, but I was not totally satisfied, because this fasta format is not the most widely used.
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
my $id;
my $s;
say "Slurping ...";
my $f = $!file.IO.slurp;
say "Spliting file ...";
my @lines = $f.split(/\n/);
say "Parsing lines ...";
for @lines -> $line {
if $line !~~ /^\>/ {
$s ~= $line;
}
else {
say $id;
if $id.defined {
%!seq{$id} = seq.new(id => $id, seq => $s);
}
$id = $line;
$id ~~ s:g/^\>//;
$s = "";
}
}
%!seq{$id} = seq.new(id => $id, seq => $s);
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
So I RTFM'd again and discovered the magic of Grammars. The new version runs in 45 seconds whatever the fasta format used. Not the fastest way, but more elegant and stable.
my grammar fastaGrammar {
token TOP { <fasta>+ }
token fasta {<.ws><header><seq> }
token header { <sup><id>\n }
token sup { '>' }
token id { <[\d\w]>+ }
token seq { [<[ACGTNacgtn]>+\n]+ }
}
my class fastaActions {
method TOP ($/){
my @seqArray;
for $<fasta> -> $f {
@seqArray.push: seq.new(id => $f.<header><id>.made, seq => $f<seq>.made);
}
make @seqArray;
}
method fasta ($/) { make ~$/; }
method id ($/) { make ~$/; }
method seq ($/) { make $/.subst("\n", "", :g); }
}
my class fasta {
has Str $.file is required;
has %seq;
submethod TWEAK() {
say "=> Slurping ...";
my $f = $!file.IO.slurp;
say "=> Grammaring ...";
my @seqArray = fastaGrammar.parse($f, actions => fastaActions).made;
say "=> Storing data ...";
for @seqArray -> $s {
%!seq{$s.id} = $s;
}
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
I think I have found a good solution for handling this kind of big file, but performance is still below that of Perl5.
As a newbie in Perl6, I would be interested to know whether there are better ways to deal with big data, or whether there is some limitation due to the Perl6 implementation. In particular, I have two questions:
Are there other Perl6 mechanisms that I'm not aware of yet, or that are not yet documented, for storing huge data from a file (like my genomes)?
Have I reached the maximum performance for the current version of Perl6?
Thanks for reading!
Fasta Example 1 :
>2L
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATG
ATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGAT
...
>3R
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATG
ATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGAT
...
Fasta example 2 :
>2L
GACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCAT...
>3R
TAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCT...
EDIT
I applied the advice of @Christoph and @timotimo and tested with this code:
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
say "=> Slurping / Parsing / Storing ...";
%!seq = slurp($!file, :enc<latin1>).split('>').skip(1).map: {
.head => seq.new(id => .head, seq => .skip(1).join) given .split("\n").cache;
}
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
The program finished in 2.7s, which is great!
I also tried this code on the wheat genome (10 Gb). It finished in 35.2s.
Perl6 is not so slow after all!
Big thanks for the help!
One simple improvement is to use a fixed-width encoding such as latin1 to speed up character decoding, though I'm not sure how much this will help.
As far as Rakudo's regex/grammar engine is concerned, I've found it to be pretty slow, so it might indeed be necessary to take a more low-level approach.
I did not do any benchmarking, but what I'd try first is something like this:
my %seqs = slurp('genome.fa', :enc<latin1>).split('>')[1..*].map: {
.[0] => .[1..*].join given .split("\n");
}
As the Perl6 standard library is implemented in Perl6 itself, it is sometimes possible to improve performance by just avoiding it, writing code in an imperative style such as this:
my %seqs;
my $data = slurp('genome.fa', :enc<latin1>);
my $pos = 0;
loop {
$pos = $data.index('>', $pos) // last;
my $ks = $pos + 1;
my $ke = $data.index("\n", $ks);
my $ss = $ke + 1;
my $se = $data.index('>', $ss) // $data.chars;
my @lines;
$pos = $ss;
while $pos < $se {
my $end = $data.index("\n", $pos);
@lines.push($data.substr($pos..^$end));
$pos = $end + 1
}
%seqs{$data.substr($ks..^$ke)} = @lines.join;
}
However, if the parts of the standard library used have seen some performance work, this might actually make things worse. In that case, the next step to take would be adding low-level type annotations such as str and int and replacing calls to routines such as .index with NQP builtins such as nqp::index.
If that's still too slow, you're out of luck and will need to switch languages, e.g. calling into Perl5 using Inline::Perl5 or into C using NativeCall.
Note that @timotimo has done some performance measurements and written an article about it.
If my short version is the baseline, the imperative version improves performance by 2.4x.
He actually managed to squeeze a 3x improvement out of the short version by rewriting it to
my %seqs = slurp('genome.fa', :enc<latin-1>).split('>').skip(1).map: {
.head => .skip(1).join given .split("\n").cache;
}
Finally, rewriting the imperative version using NQP builtins sped things up by a factor of 17x. Given potential portability issues, writing such code is generally discouraged, but it may be necessary for now if you really need that level of performance:
use nqp;
my Mu $seqs := nqp::hash();
my str $data = slurp('genome.fa', :enc<latin1>);
my int $pos = 0;
my str @lines;
loop {
$pos = nqp::index($data, '>', $pos);
last if $pos < 0;
my int $ks = $pos + 1;
my int $ke = nqp::index($data, "\n", $ks);
my int $ss = $ke + 1;
my int $se = nqp::index($data ,'>', $ss);
if $se < 0 {
$se = nqp::chars($data);
}
$pos = $ss;
my int $end;
while $pos < $se {
$end = nqp::index($data, "\n", $pos);
nqp::push_s(@lines, nqp::substr($data, $pos, $end - $pos));
$pos = $end + 1
}
nqp::bindkey($seqs, nqp::substr($data, $ks, $ke - $ks), nqp::join("", @lines));
nqp::setelems(@lines, 0);
}

Fastest way to populate a hash in perl

I am trying to fill a hash in Perl from a file of around 564k lines, and the code takes about 1.6~2.1 seconds to execute, while the equivalent in C# takes around 0.8 seconds. Is there a better way to do it in Perl?
So far I have tried:
# 1 - this version takes ~1.6 seconds to fill the hash from a file with ~564000 lines
my %voc;
open(F,"<$voc_file");
while(defined(my $line=<F>)) {
chomp($line);
$voc{$line} = 1;
}
close(F);
and this
# 2 - this version takes ~2.1 seconds to fill the hash from a file with ~564000 lines
my %voc;
my @voc_keys;
my @array_of_ones;
open(F,"<$voc_file");
my $voc_keys_index = 0;
while(defined(my $line=<F>)) {
chomp($line);
$voc_keys[$voc_keys_index] = $line;
$array_of_ones[$voc_keys_index] = 1;
$voc_keys_index ++;
}
@voc{@voc_keys} = @array_of_ones;
close(F);
In C#, I am using:
var voc = new Dictionary<String, int>();
foreach (string line in File.ReadLines(pathToVoc_file))
{
var trimmedline = line.TrimEnd(new char[] { '\n' });
voc[trimmedline] = 1;
}
And it takes only 700~800 ms
Definitely avoiding storing 1's as the data and using exists can save time and memory. You can eke out even more by removing the block from the loop:
my %voc;
open(F,"<$voc_file");
chomp, undef $voc{$_} while <F>;
close(F);
Benchmark results (using 20 character lines):
Benchmark: running ikegami, original, statementmodifier, statementmodifier_undef for at least 10 CPU seconds...
ikegami: 10 wallclock secs ( 9.54 usr + 0.46 sys = 10.00 CPU) # 2.10/s (n=21)
original: 10 wallclock secs ( 9.62 usr + 0.45 sys = 10.07 CPU) # 2.09/s (n=21)
statementmodifier: 10 wallclock secs ( 9.61 usr + 0.48 sys = 10.09 CPU) # 2.18/s (n=22)
statementmodifier_undef: 11 wallclock secs ( 9.85 usr + 0.48 sys = 10.33 CPU) # 2.23/s (n=23)
Benchmark:
use strict;
use warnings;
use Benchmark 'timethese';
my $voc_file = 'rand.txt';
sub check {
my ($voc) = @_;
unless (keys %$voc == 564000) {
warn "bad number of keys ", scalar keys %$voc;
}
chomp(my $expected_line = `head -1 $voc_file`);
unless (exists $voc->{$expected_line}) {
warn "bad data";
}
return;
}
timethese(-10, {
'statementmodifier' => sub {
my %voc;
open(F,"<$voc_file");
chomp, $voc{$_} = 1 while <F>;
close(F);
#check(\%voc);
return;
},
'statementmodifier_undef' => sub {
my %voc;
open(F,"<$voc_file");
chomp, undef $voc{$_} while <F>;
close(F);
#check(\%voc);
return;
},
'original' => sub {
my %voc;
open(F,"<$voc_file");
while(defined(my $line=<F>)) {
chomp($line);
$voc{$line} = 1;
}
close(F);
#check(\%voc);
return;
},
'ikegami' => sub {
my %voc;
open(F,"<$voc_file");
while(defined(my $line=<F>)) {
chomp($line);
undef $voc{$line};
}
close(F);
#check(\%voc);
return;
},
});
(Original incorrect answer replaced with this.)
Of course C# is going to be faster.
You could save a little time and some memory by replacing
$voc{$line} = 1; ... if ($voc{$key}) { ... } ...
with
undef $voc{$line}; ... if (exists($voc{$key})) { ... } ...
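Putting the two halves together, here is a minimal sketch of populating the hash with keys only and then testing membership with exists. The in-memory filehandle and the sample words are assumptions that stand in for the real vocabulary file, and this variant has not been benchmarked:

```perl
use strict;
use warnings;

# Stand-in for the vocabulary file: an in-memory filehandle (assumption for the demo).
my $data = "apple\nbanana\ncherry\n";
open my $fh, '<', \$data or die "cannot open: $!";

my %voc;
while (my $line = <$fh>) {
    chomp $line;
    undef $voc{$line};    # create the key without storing a value
}
close $fh;

my $n = keys %voc;
print "keys=$n\n";
print "banana is known\n"   if exists $voc{banana};
print "durian is unknown\n" unless exists $voc{durian};
```

With a real file, the open line would take the file name instead of the scalar reference.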

Why does replacing Perl's s/// with a dummy function using Inline::C cause a significant slow down?

I have an array of strings of about 100,000 elements. I need to iterate through each element and replace some words with other words. This takes a few seconds in pure perl. I need to speed this up as much as I can. I'm testing using the following snippet:
use strict;
my $string = "This is some string. Its only purpose is for testing.";
for( my $i = 1; $i < 100000; $i++ ) {
$string =~ s/old1/new1/ig;
$string =~ s/old2/new2/ig;
$string =~ s/old3/new3/ig;
$string =~ s/old4/new4/ig;
$string =~ s/old5/new5/ig;
}
I know this doesn't actually replace anything in the test string, but it's for speed testing only.
I had my hopes set on Inline::C. I've never worked with Inline::C before but after reading up on it a bit, I thought it was fairly simple to implement. But apparently, even calling a stub function that does nothing is a lot slower. Here's the snippet I tested with:
use strict;
use Benchmark qw ( timethese );
use Inline 'C';
timethese(
5,
{
"Pure Perl" => \&pure_perl,
"Inline C" => \&inline_c
}
);
sub pure_perl {
my $string = "This is some string. Its only purpose is for testing.";
for( my $i = 1; $i < 1000000; $i++ ) {
$string =~ s/old1/new1/ig;
$string =~ s/old2/new2/ig;
$string =~ s/old3/new3/ig;
$string =~ s/old4/new4/ig;
$string =~ s/old5/new5/ig;
}
}
sub inline_c {
my $string = "This is some string. Its only purpose is for testing.";
for( my $i = 1; $i < 1000000; $i++ ) {
$string = findreplace( $string, "old1", "new1" );
$string = findreplace( $string, "old2", "new2" );
$string = findreplace( $string, "old3", "new3" );
$string = findreplace( $string, "old4", "new4" );
$string = findreplace( $string, "old5", "new5" );
}
}
__DATA__
__C__
char *
findreplace( char *text, char *what, char *with ) {
return text;
}
on my Linux box, the result is:
Benchmark: timing 5 iterations of Inline C, Pure Perl...
Inline C: 6 wallclock secs ( 5.51 usr + 0.02 sys = 5.53 CPU) # 0.90/s (n=5)
Pure Perl: 2 wallclock secs ( 2.51 usr + 0.00 sys = 2.51 CPU) # 1.99/s (n=5)
Pure Perl is twice as fast as calling an empty C function. Not at all what I expected! Again, I've never worked with Inline::C before so maybe I am missing something here?
In the version using Inline::C, you kept everything that was in the original pure Perl script and changed just one thing: you replaced Perl's highly optimized s/// with a worse implementation. Invoking your dummy function actually involves work, whereas none of the s/// invocations do much in this case. It is a priori impossible for the Inline::C version to run faster.
On the C side, the function
char *
findreplace( char *text, char *what, char *with ) {
return text;
}
is not a "do nothing" function. Calling it involves unpacking arguments. The string pointed to by text has to be copied to the return value. There is some overhead which you are paying for each invocation.
Given that s/// does no replacements, there is no copying involved in that. In addition, Perl's s/// is highly optimized. Are you sure you can write a better find & replace that is faster to make up for the overhead of calling an external function?
If you use the following implementation, you should get comparable speeds:
sub inline_c {
my $string = "This is some string. It's only purpose is for testing.";
for( my $i = 1; $i < 1000000; $i++ ) {
findreplace( $string );
findreplace( $string );
findreplace( $string );
findreplace( $string );
findreplace( $string );
}
}
__END__
__C__
void findreplace( char *text ) {
return;
}
Benchmark: timing 5 iterations of Inline C, Pure Perl...
Inline C: 6 wallclock secs ( 5.69 usr + 0.00 sys = 5.69 CPU) # 0.88/s (n=5)
Pure Perl: 6 wallclock secs ( 5.70 usr + 0.00 sys = 5.70 CPU) # 0.88/s (n=5)
The one possibility of gaining speed is to exploit any special structure involved in the search pattern and replacements and write something to implement that.
On the Perl side, you should at least pre-compile the patterns.
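As a rough sketch of what pre-compiling could look like, the patterns can be built once with qr// and paired with their replacements. The rule list and sample strings below are made up for illustration. Perl already compiles a constant pattern written directly in s/// only once, so this mostly pays off when the patterns are held in variables:

```perl
use strict;
use warnings;

# Compile each pattern once, outside the loop, and pair it with its replacement.
my @rules = (
    [ qr/old1/i, 'new1' ],
    [ qr/old2/i, 'new2' ],
    [ qr/old3/i, 'new3' ],
);

my @strings = ('An OLD1 value', 'nothing here', 'old2 and old3');    # sample data

# The loop variable aliases each array element, so the array is edited in place.
for my $string (@strings) {
    for my $rule (@rules) {
        my ($pattern, $replacement) = @$rule;
        $string =~ s/$pattern/$replacement/g;
    }
}
print join('|', @strings), "\n";
```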
Also, since your problem is embarrassingly parallel, you are better off looking into chopping up the work into as many chunks as you have cores to work with.
For example, take a look at the Perl entries in the regex-redux task in the Benchmarks Game:
Perl #4 (fork only): 14.13 seconds
and
Perl #3 (fork & threads): 14.47 seconds
versus
Perl #1: 34.01 seconds
That is, some primitive exploitation of parallelization possibilities cuts the run time by roughly 60% (about 2.4 times faster). That problem is not exactly comparable because the substitutions must be done sequentially, but it still gives you an idea.
If you have eight cores, dole out the work to eight cores.
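A very rough sketch of doling out the work with fork and pipes is shown below. It is not the Benchmarks Game code. It assumes a Unix-like system, strings that contain no newlines, and a hypothetical worker count of 4; the sample data is made up:

```perl
use strict;
use warnings;

my @data    = map { "old1 item $_" } 1 .. 8;    # sample data
my $workers = 4;                                 # assumed number of cores
my $chunk   = int((@data + $workers - 1) / $workers);

my @readers;
for my $w (0 .. $workers - 1) {
    my $first = $w * $chunk;
    my $last  = $first + $chunk - 1;
    $last = $#data if $last > $#data;
    my @slice = $first <= $#data ? @data[$first .. $last] : ();

    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                 # child: process one chunk and send it back
        close $reader;
        s/old1/new1/ig for @slice;
        print {$writer} "$_\n" for @slice;
        close $writer;
        exit 0;
    }
    close $writer;                   # parent keeps only the read end
    push @readers, [ $pid, $reader ];
}

# Collect the chunks in their original order, then reap each child.
my @result;
for my $r (@readers) {
    my ($pid, $reader) = @$r;
    chomp(my @lines = <$reader>);
    push @result, @lines;
    close $reader;
    waitpid($pid, 0);
}
print scalar(@result), " strings processed\n";
```

Whether this pays off depends on how expensive the substitutions are compared with the cost of sending the strings back through the pipes, so it is worth measuring on real data.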
Also, consider the following script:
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Fake::Text;
use List::Util qw( sum );
use Time::HiRes qw( time );
use constant INPUT_SIZE => $ARGV[0] // 1_000_000;
run();
sub run {
my @substitutions = (
sub { s/dolor/new1/ig },
sub { s/fuga/new2/ig },
sub { s/facilis/new3/ig },
sub { s/tempo/new4/ig },
sub { s/magni/new5/ig },
);
my @times;
for (1 .. 5) {
my $data = read_input();
my $t0 = time;
find_and_replace($data, \@substitutions);
push @times, time - $t0;
}
printf "%.4f\n", sum(@times)/@times;
return;
}
sub find_and_replace {
my $data = shift;
my $substitutions = shift;
for ( @$data ) {
for my $s ( @$substitutions ) {
$s->();
}
}
return;
}
{
my @input;
sub read_input {
@input
or @input = map fake_sentences(1)->(), 1 .. INPUT_SIZE;
return [ @input ];
}
}
In this case, each invocation of find_and_replace takes about 2.3 seconds on my laptop. The five replications run in about 30 seconds. The overhead is the combined cost of generating the 1,000,000 sentence data set and copying it four times.

Why iterating with $_ takes longer

Based on many books and Internet sources, I was wondering whether $_ is really a faster way of iterating through an array (no new variable is instantiated), but somehow I always get different results. Here's the performance test code:
#!/usr/bin/perl
use Time::HiRes qw(time);
use strict;
use warnings;
# $_ is a default argument for many operators, and also for some control structures.
my $test_array = [1..1000000];
my $number_of_tests = 100;
my $dollar_wins = 0;
my $dollar_wins_sum = 0;
for (my $i = 1; $i <= $number_of_tests; $i++) {
my $odd_void_array = [];
my $start_time_1 = time();
foreach my $item (@{$test_array}) {
if ($item % 2 == 1) {
push (@{$odd_void_array}, $item);
}
}
foreach my $item_odd (@{$odd_void_array}) {
}
my $end_time_1 = time();
$odd_void_array = [];
my $start_time_2 = time();
foreach (@{$test_array}) {
if ($_ % 2 == 1) {
push (@{$odd_void_array}, $_);
}
}
foreach (@{$odd_void_array}) {
}
my $end_time_2 = time();
my $diff = ($end_time_1-$start_time_1) - ($end_time_2-$start_time_2);
if ($diff > 0) {
$dollar_wins ++;
$dollar_wins_sum += $diff;
print "Dollar won ($dollar_wins out of $i) with diff $diff \n";
}
}
print "=================================\n";
print "When using dollar underscore, execution was faster in $dollar_wins cases (".(($dollar_wins/$number_of_tests)*100)."%), with average difference of ".($dollar_wins_sum/$dollar_wins)."\n";
So, I iterate twice (once assigning to my $item, once without). I mostly find that iterating with $_ was faster in only about 20-30% of cases.
Shouldn't iterating without a new variable be faster?
You aren't really benchmarking iteration with different variables.
Your timings include array creation and other calculations.
You only tell which is faster, not by how much.
You have too few iterations to tell anything reliable.
Let's take this better test that actually benchmarks what you are claiming to benchmark:
use strict;
use warnings;
use Benchmark ':hireswallclock', 'cmpthese';
my @numbers = 1..100_000;
cmpthese -3, {
'$_' => sub {
for (@numbers) {
1;
}
},
'my $x' => sub {
for my $x (@numbers) {
1;
}
},
'$x' => sub {
my $x;
for $x (@numbers) {
1;
}
},
}
Result:
       Rate    $_ my $x    $x
$_    107/s    --   -0%   -0%
my $x 107/s    0%    --   -0%
$x    108/s    0%    0%    --
So they are equally fast on my test system (perl 5.18.2 built for i686-linux-thread-multi-64int).
My suspicion is that using $_ is slightly slower than a lexical, as it's a global variable. However, the speed of iteration is equivalent. Indeed, modifying the benchmark…
use strict;
use warnings;
use Benchmark ':hireswallclock', 'cmpthese';
my @numbers = 1..100_000;
cmpthese -3, {
'$_' => sub {
for (@numbers) {
$_ % 2 == 0;
}
},
'my $x' => sub {
for my $x (@numbers) {
$x % 2 == 0;
}
},
'$x' => sub {
my $x;
for $x (@numbers) {
$x % 2 == 0;
}
},
}
… gives
        Rate    $_    $x my $x
$_    40.3/s    --   -1%   -6%
$x    40.6/s    1%    --   -5%
my $x 42.9/s    7%    6%    --
but the effects are still too small to draw any solid conclusion.

DBD::Oracle Column returns number in unusual format

I am querying an Oracle database using DBD::Oracle; however, oddly enough, the decimals are returned as follows:
0.1 will be returned as .1
0.97 will be returned as .97
Any ideas why this is happening?
I am using the latest version of DBI and DBD::Oracle.
That would be the default number format in your session. DBD::Oracle retrieves columns as strings so there will be an implicit TO_CHAR done by Oracle. If you want a different format do a TO_CHAR(column_name, 'FORMAT'). e.g.,
use strict;
use warnings;
use DBI;
use Devel::Peek;
my $h = DBI->connect('dbi:Oracle:host=xxx;sid=xxx','xxx','xxx',
{RaiseError => 1});
eval {
$h->do(q/drop table mje/);
};
$h->do(q/create table mje (a number)/);
my @n = (0.1, 0.97);
my $s = $h->prepare(q/insert into mje values(?)/);
foreach (@n) {
$s->execute($_);
}
$s = $h->prepare(q/select * from mje/);
$s->execute;
while (my @row = $s->fetchrow) {
Dump($row[0]);
}
$s = $h->prepare(q/select TO_CHAR(a, '09D99') from mje/);
$s->execute;
while (my @row = $s->fetchrow) {
Dump($row[0]);
}
outputs
SV = PV(0x9b72810) at 0x9c57ed8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x9d11ec0 ".1"\0
CUR = 2
LEN = 12
SV = PV(0x9b72810) at 0x9c57ed8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x9d18230 ".97"\0
CUR = 3
LEN = 12
SV = PV(0x9b72840) at 0x9c57ff8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x9d0d600 " 00.10"\0
CUR = 6
LEN = 12
SV = PV(0x9b72840) at 0x9c57ff8
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x9d0d600 " 00.97"\0
CUR = 6
LEN = 12
I added the Devel::Peek just to show you they are strings (see the PV) by default so adding the TO_CHAR makes no difference.
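If changing the SQL is not convenient, another option (not part of the answer above, just an alternative) is to format the fetched strings on the Perl side. A small sketch that uses the two values from the question as hard-coded sample input instead of a live database handle:

```perl
use strict;
use warnings;

# Values as DBD::Oracle returned them in the question (strings without a leading zero).
my @fetched = ('.1', '.97');

# sprintf converts each string to a number and renders it with a leading zero.
my @formatted = map { sprintf '%.2f', $_ } @fetched;
print "@formatted\n";
```

The '%.2f' format is an assumption about the desired precision; adjust it to the scale of the column.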

Resources