Why iterating with $_ takes longer - performance

I was wondering based on many books on Internet, if $_ is really faster way of iterating through array (no instantiating of new variable), but somehow I always get different results. Here's the performance code test:
#!/usr/bin/perl
use Time::HiRes qw(time);
use strict;
use warnings;
# $_ is a default argument for many operators, and also for some control structures.
my $test_array = [1..1000000];
my $number_of_tests = 100;
my $dollar_wins = 0;
my $dollar_wins_sum = 0;
for (my $i = 1; $i <= $number_of_tests; $i++) {
my $odd_void_array = [];
my $start_time_1 = time();
foreach my $item (#{$test_array}) {
if ($item % 2 == 1) {
push (#{$odd_void_array}, $item);
}
}
foreach my $item_odd (#{$odd_void_array}) {
}
my $end_time_1 = time();
$odd_void_array = [];
my $start_time_2 = time();
foreach (#{$test_array}) {
if ($_ % 2 == 1) {
push (#{$odd_void_array}, $_);
}
}
foreach (#{$odd_void_array}) {
}
my $end_time_2 = time();
my $diff = ($end_time_1-$start_time_1) - ($end_time_2-$start_time_2);
if ($diff > 0) {
$dollar_wins ++;
$dollar_wins_sum += $diff;
print "Dollar won ($dollar_wins out of $i) with diff $diff \n";
}
}
print "=================================\n";
print "When using dollar underscore, execution was faster in $dollar_wins cases (".(($dollar_wins/$number_of_tests)*100)."%), with average difference of ".($dollar_wins_sum/$dollar_wins)."\n";
So, I have twice iterating (once with assigning to my $item, other without). I get mostly that iterating with $_ was faster in about 20-30% cases.
Shouldn't be iterating without new variable be faster?

You aren't really benchmarking iteration with different variables.
Your timings includes array creation and other calculations.
You only tell which is faster, not by how much.
You have too few iterations to tell anything reliable.
Let's take this better test that actually benchmarks what you are claiming to benchmark:
use strict;
use warnings;
use Benchmark ':hireswallclock', 'cmpthese';
my #numbers = 1..100_000;
cmpthese -3, {
'$_' => sub {
for (#numbers) {
1;
}
},
'my $x' => sub {
for my $x (#numbers) {
1;
}
},
'$x' => sub {
my $x;
for $x (#numbers) {
1;
}
},
}
Result:
Rate $_ my $x $x
$_ 107/s -- -0% -0%
my $x 107/s 0% -- -0%
$x 108/s 0% 0% --
So they are equally fast on my test system (perl 5.18.2 built for i686-linux-thread-multi-64int).
My suspicion is that using $_ is slightly slower than a lexical, as it's a global variable. However, the speed of iteration is equivalent. Indeed, modifying the benchmark…
use strict;
use warnings;
use Benchmark ':hireswallclock', 'cmpthese';
my #numbers = 1..100_000;
cmpthese -3, {
'$_' => sub {
for (#numbers) {
$_ % 2 == 0;
}
},
'my $x' => sub {
for my $x (#numbers) {
$x % 2 == 0;
}
},
'$x' => sub {
my $x;
for $x (#numbers) {
$x % 2 == 0;
}
},
}
… gives
Rate $_ $x my $x
$_ 40.3/s -- -1% -6%
$x 40.6/s 1% -- -5%
my $x 42.9/s 7% 6% --
but the effects are still too small to draw any solid conclusion.

Related

Create a sorted list from n sorted sublists (efficiently)

I was playing around with parallel sorting tonight.
creating sort file
naive-sort ...
1000000
23.61265496
partial-hyper-sort ...
4
7.4924575
simple-hyper-sort ...
1000000
141.7945921
naive-hyper-sort ...
1000000
23.5756172
Two things stand out.
a) naive-hyper-sort is just as fast as ordinary sort
b) The sorting in partial-hyper-sort is 66% faster than ordinary sort.
My problem: partial-hyper-sort is exactly that: "partial". It returns (on my system) 4 sublists, but you want of course one. My attempt to merge them into one (simple-hyper-sort) is an order of magnitude slower than the whole sorting!
So how do I get this faster? And if someone can explain why naive-hyper-sort is not faster than naive-sort, bonus points and a cookie (seriously, a literal cookie).
create-sortfile
unless "tosort.txt".IO.e;
my $start = DateTime.now;
say "naive-sort ...";
say naive-sort.elems;
say DateTime.now - $start;
$start = DateTime.now;
say "partial-hyper-sort ...";
say partial-hyper-sort.elems;
say DateTime.now - $start;
$start = DateTime.now;
say "simple-hyper-sort ...";
say simple-hyper-sort.elems;
say DateTime.now - $start;
$start = DateTime.now;
say "naive-hyper-sort ...";
say naive-hyper-sort.elems;
say DateTime.now - $start;
sub create-sortfile
{
say "creating sort file";
my $to-sort = "tosort.txt".IO.open(:w);
$to-sort.say( ( 10_000 .. 99_999 ).pick )
for ( 1 .. 1_000_000 );
$to-sort.close;
}
sub simple-hyper-sort
{
my $to-sort = "tosort.txt".IO.open( :r );
my $lines = $to-sort.lines;
my $degrees = $*KERNEL.cpu-cores;
my $batch = $lines.elems div $degrees;
my #parts = $lines.batch( $batch ).hyper( :batch(1) ).map({ .sort });
my #index = 0 xx $degrees;
return gather loop
{
my $smallest = Inf;
my $smallest-index = -1;
my $smallest-degree = -1;
for ^$degrees -> $degree
{
my $index = #index[$degree];
if ( $index < $batch )
{
my $value = #parts[$degree;$index];
if $value < $smallest
{
$smallest = $value;
$smallest-index = $index;
$smallest-degree = $degree;
}
}
}
last if $smallest-index < 0;
#index[$smallest-degree]++;
take $smallest;
}
}
sub partial-hyper-sort
{
my $to-sort = "tosort.txt".IO.open( :r );
my $lines = $to-sort.lines;
my $degrees = $*KERNEL.cpu-cores;
my $batch = $lines.elems div $degrees;
my #parts = $lines.batch( $batch ).hyper( :batch(1) ).map({ .sort });
}
multi sub naive-hyper-sort
{
my $to-sort = "tosort.txt".IO.open( :r );
my $lines = $to-sort.lines;
my $degrees = $*KERNEL.cpu-cores;
my $batch = $lines.elems div $degrees;
$lines.hyper( :$batch, :$degrees ).sort;
}
sub naive-sort {
my $to-sort = "tosort.txt".IO.open( :r );
$to-sort.lines.sort;
}
Using .hyper and .race only results in a speedup if there is a parallel implementation of the operation that follows. At the time of writing, there is not a parallel sort implementation in Rakudo, which means that it will fall back to using the regular sort implementation. So, this answers why native-hyper-sort doesn't come out faster right now (however it almost certainly will in the future).
The idea in simple-hyper-sort is along the right lines: break the data up into sublists, sort the sublists, and then merge them. We can therefore parallelize the sorting of the sublists. As you've observed, this achieving a win is dependent on the merge operation itself being fast enough, and so we'd need to carefully optimize that.
It's much easier to write a tight (not to mention correct!) merge operation if it only needs to merge two sublists. Thus, we need to structure the problem in a way that gives us that. This points to a different approach:
Break the list in half
start a task to sort each half
await the two tasks
Merge the results of the two tasks
Note that step 2 involves recursion. We stop recursing when the size of a partition is too small, and use the built-in sort on such partitions. (We can choose to define "too small" by dividing the input list size by the number of CPU cores, along the lines of your example.)
Thus we get a solution like this:
sub parallel-merge-sort {
my $to-sort = "tosort.txt".IO.open( :r );
my $lines = $to-sort.lines;
return do-sort $lines, ceiling($lines.elems / $*KERNEL.cpu-cores);
sub do-sort(#in, $limit) {
if #in.elems < $limit {
#in.sort
}
else {
my $pivot = #in.elems div 2;
merge |await
(start do-sort #in[0..$pivot], $limit),
(start do-sort #in[$pivot^..#in.end], $limit)
}
}
sub merge(#a, #b) {
my #result;
my int $a-idx = 0;
my int $a-elems = +#a;
my int $b-idx = 0;
my int $b-elems = +#b;
my int $r-idx = 0;
while $a-idx < $a-elems && $b-idx < $b-elems {
my $a := #a[$a-idx];
my $b := #b[$b-idx];
if $a before $b {
$a-idx++;
#result[$r-idx++] := $a;
}
else {
$b-idx++;
#result[$r-idx++] := $b;
}
}
if $a-idx < $a-elems {
#result[$r-idx++] := $_ for #a[$a-idx..*];
}
elsif $b-idx < $b-elems {
#result[$r-idx++] := $_ for #b[$b-idx..*];
}
return #result;
}
}
I didn't spend terribly long optimizing this (haven't profiled, etc.), but did take care to use natives and binding in order to reduce allocations. On My Machine, this does give a speedup over the serial sorting, however.
One other easy speedup we can get on this - at the cost of a tad more complexity in the code - comes from realizing that we don't need to slice the input in do-sort until the point that we actually need to send it to the built-in sort:
sub do-sort(#in, $limit, $from = 0, $to = #in.end) {
my $elems = $to - $from;
if $elems < $limit {
#in[$from..$to].sort
}
else {
my $pivot = $from + $elems div 2;
merge |await
(start do-sort #in, $limit, $from, $pivot),
(start do-sort #in, $limit, $pivot + 1, $to)
}
}
Which saves some work; by this point, I measure a factor of two speedup on the machine I'm testing it on, which isn't amazing, but given we've an enforced serial O(n) step, and a bunch more parallelized O(n) steps, over the serial sort algorithm, it's perhaps not so disappointing after all.

Perl6 : What is the best way for dealing with very big files?

Last week I decided to give a try to Perl6 and started to reimplement one of my program.
I have to say, Perl6 is so the easy for object programming, an aspect very painfull to me in Perl5.
My program have to read and store big files, such as whole genomes (up to 3 Gb and more, See example 1 below) or tabulate data.
The first version of the code was made in the Perl5 way by iterating line by line ("genome.fa".IO.lines). It was very slow and unsable for a correct execution time.
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
my $id;
my $s;
for $!file.IO.lines -> $line {
if $line ~~ /^\>/ {
say $id;
if $id.defined {
%!seq{$id} = sequence.new(id => $id, seq => $s);
}
my $l = $line;
$l ~~ s:g/^\>//;
$id = $l;
$s = "";
}
else {
$s ~= $line;
}
}
%!seq{$id} = sequence.new(id => $id, seq => $s);
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
So after a little bit of RTFM, I changed for a slurp on the file, a split on the \n which I parsed with a for loop. This way I managed to load the data in 2 min. Much better but not enough. By cheating, I mean by removing a maximum of \n (Example 2), I decreased the execution time to 30 seconds. Quite good, but not totaly satisfied, by this fasta format is not the most used.
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
my $id;
my $s;
say "Slurping ...";
my $f = $!file.IO.slurp;
say "Spliting file ...";
my #lines = $f.split(/\n/);
say "Parsing lines ...";
for #lines -> $line {
if $line !~~ /^\>/ {
$s ~= $line;
}
else {
say $id;
if $id.defined {
%!seq{$id} = seq.new(id => $id, seq => $s);
}
$id = $line;
$id ~~ s:g/^\>//;
$s = "";
}
}
%!seq{$id} = seq.new(id => $id, seq => $s);
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
So RTFM again and I discovered the magic of Grammar. So new version and an execution time of 45 seconds whatever the fasta format used. Not the fastest way but more elegant and stable.
my grammar fastaGrammar {
token TOP { <fasta>+ }
token fasta {<.ws><header><seq> }
token header { <sup><id>\n }
token sup { '>' }
token id { <[\d\w]>+ }
token seq { [<[ACGTNacgtn]>+\n]+ }
}
my class fastaActions {
method TOP ($/){
my #seqArray;
for $<fasta> -> $f {
#seqArray.push: seq.new(id => $f.<header><id>.made, seq => $f<seq>.made);
}
make #seqArray;
}
method fasta ($/) { make ~$/; }
method id ($/) { make ~$/; }
method seq ($/) { make $/.subst("\n", "", :g); }
}
my class fasta {
has Str $.file is required;
has %seq;
submethod TWEAK() {
say "=> Slurping ...";
my $f = $!file.IO.slurp;
say "=> Grammaring ...";
my #seqArray = fastaGrammar.parse($f, actions => fastaActions).made;
say "=> Storing data ...";
for #seqArray -> $s {
%!seq{$s.id} = $s;
}
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
I think that I found good solution to handle these kind of big files, but performances are still under those of Perl5.
As a newbie in Perl6, I would be interested to know if there is better ways to deal with big data or if there is some limitation due to the Perl6 implementation ?
As a newbie in Perl6, I would ask two questions :
Is there other Perl6 mechanisms that I'm not aware yet, or not yet
documented, for storing huge data from a file (like my genomes) ?
Did I reach the maximum performances for the current version of
Perl6 ?
Thanks for reading !
Fasta Example 1 :
>2L
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATG
ATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGAT
...
>3R
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATG
ATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGAT
...
Fasta example 2 :
>2L
GACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCAT...
>3R
TAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCT...
EDIT
I applied advises of #Christoph and #timotimo and test with code:
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
say "=> Slurping / Parsing / Storing ...";
%!seq = slurp($!file, :enc<latin1>).split('>').skip(1).map: {
.head => seq.new(id => .head, seq => .skip(1).join) given .split("\n").cache;
}
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
The program finished in 2.7s, which is so great !
I also tried this code on the wheat genome (10 Gb). It finished in 35.2s.
Perl6 is not so slow finally !
Big Thank for the help !
One simple improvement is to use a fixed-width encoding such as latin1 to speed up character decoding, though I'm not sure how much this will help.
As far as Rakudo's regex/grammar engine is concerned, I've found it to be pretty slow, so it might indeed be necessary to take a more low-level approach.
I did not do any benchmarking, but what I'd try first is something like this:
my %seqs = slurp('genome.fa', :enc<latin1>).split('>')[1..*].map: {
.[0] => .[1..*].join given .split("\n");
}
As the Perl6 standard library is implemented in Perl6 itself, it is sometimes possible to improve performance by just avoiding it, writing code in an imperative style such as this:
my %seqs;
my $data = slurp('genome.fa', :enc<latin1>);
my $pos = 0;
loop {
$pos = $data.index('>', $pos) // last;
my $ks = $pos + 1;
my $ke = $data.index("\n", $ks);
my $ss = $ke + 1;
my $se = $data.index('>', $ss) // $data.chars;
my #lines;
$pos = $ss;
while $pos < $se {
my $end = $data.index("\n", $pos);
#lines.push($data.substr($pos..^$end));
$pos = $end + 1
}
%seqs{$data.substr($ks..^$ke)} = #lines.join;
}
However, if the parts of the standard library used has seen some performance work, this might actually make things worse. In that case, the next step to take would be adding low-level type annotations such as str and int and replacing calls to routines such as .index with NQP builtins such as nqp::index.
If that's still too slow, you're out of luck and will need to switch languages, eg calling into Perl5 by using Inline::Perl5 or C using NativeCall.
Note that #timotimo has done some performance measurements and wrote an article about it.
If my short version is the baseline, the imperative version improves performance by 2.4x.
He actually managed to squeeze a 3x improvement out of the short version by rewriting it to
my %seqs = slurp('genome.fa', :enc<latin-1>).split('>').skip(1).map: {
.head => .skip(1).join given .split("\n").cache;
}
Finally, rewriting the imperative version using NQP builtins sped things up by a factor of 17x, but given potential portability issues, writing such code is generally discouraged, but may be necessary for now if you really need that level of performance:
use nqp;
my Mu $seqs := nqp::hash();
my str $data = slurp('genome.fa', :enc<latin1>);
my int $pos = 0;
my str #lines;
loop {
$pos = nqp::index($data, '>', $pos);
last if $pos < 0;
my int $ks = $pos + 1;
my int $ke = nqp::index($data, "\n", $ks);
my int $ss = $ke + 1;
my int $se = nqp::index($data ,'>', $ss);
if $se < 0 {
$se = nqp::chars($data);
}
$pos = $ss;
my int $end;
while $pos < $se {
$end = nqp::index($data, "\n", $pos);
nqp::push_s(#lines, nqp::substr($data, $pos, $end - $pos));
$pos = $end + 1
}
nqp::bindkey($seqs, nqp::substr($data, $ks, $ke - $ks), nqp::join("", #lines));
nqp::setelems(#lines, 0);
}

Why does replacing Perl's s/// with a dummy function using Inline::C cause a significant slow down?

I have an array of strings of about 100,000 elements. I need to iterate through each element and replace some words with other words. This takes a few seconds in pure perl. I need to speed this up as much as I can. I'm testing using the following snippet:
use strict;
my $string = "This is some string. Its only purpose is for testing.";
for( my $i = 1; $i < 100000; $i++ ) {
$string =~ s/old1/new1/ig;
$string =~ s/old2/new2/ig;
$string =~ s/old3/new3/ig;
$string =~ s/old4/new4/ig;
$string =~ s/old5/new5/ig;
}
I know this doesn't actually replace anything in the test string, but it's for speed testing only.
I had my hopes set on Inline::C. I've never worked with Inline::C before but after reading up on it a bit, I thought it was fairly simple to implement. But apparently, even calling a stub function that does nothing is a lot slower. Here's the snippet I tested with:
use strict;
use Benchmark qw ( timethese );
use Inline 'C';
timethese(
5,
{
"Pure Perl" => \&pure_perl,
"Inline C" => \&inline_c
}
);
sub pure_perl {
my $string = "This is some string. Its only purpose is for testing.";
for( my $i = 1; $i < 1000000; $i++ ) {
$string =~ s/old1/new1/ig;
$string =~ s/old2/new2/ig;
$string =~ s/old3/new3/ig;
$string =~ s/old4/new4/ig;
$string =~ s/old5/new5/ig;
}
}
sub inline_c {
my $string = "This is some string. Its only purpose is for testing.";
for( my $i = 1; $i < 1000000; $i++ ) {
$string = findreplace( $string, "old1", "new1" );
$string = findreplace( $string, "old2", "new2" );
$string = findreplace( $string, "old3", "new3" );
$string = findreplace( $string, "old4", "new4" );
$string = findreplace( $string, "old5", "new5" );
}
}
__DATA__
__C__
char *
findreplace( char *text, char *what, char *with ) {
return text;
}
on my Linux box, the result is:
Benchmark: timing 5 iterations of Inline C, Pure Perl...
Inline C: 6 wallclock secs ( 5.51 usr + 0.02 sys = 5.53 CPU) # 0.90/s (n=5)
Pure Perl: 2 wallclock secs ( 2.51 usr + 0.00 sys = 2.51 CPU) # 1.99/s (n=5)
Pure Perl is twice as fast as calling an empty C function. Not at all what I expected! Again, I've never worked with Inline::C before so maybe I am missing something here?
In the version using Inline::C, you kept everything that was in the original pure Perl script, and changed just one thing: Additionally, you've replaced Perl's highly optimized s/// with a worse implementation. Invoking your dummy function actually involves work whereas none of the s/// invocations do much in this case. It is a priori impossible for the Inline::C version to run faster.
On the C side, the function
char *
findreplace( char *text, char *what, char *with ) {
return text;
}
is not a "do nothing" function. Calling it involves unpacking arguments. The string pointed to by text has to be copied to the return value. There is some overhead which you are paying for each invocation.
Given that s/// does no replacements, there is no copying involved in that. In addition, Perl's s/// is highly optimized. Are you sure you can write a better find & replace that is faster to make up for the overhead of calling an external function?
If you use the following implementation, you should get comparable speeds:
sub inline_c {
my $string = "This is some string. It's only purpose is for testing.";
for( my $i = 1; $i < 1000000; $i++ ) {
findreplace( $string );
findreplace( $string );
findreplace( $string );
findreplace( $string );
findreplace( $string );
}
}
__END__
__C__
void findreplace( char *text ) {
return;
}
Benchmark: timing 5 iterations of Inline C, Pure Perl...
Inline C: 6 wallclock secs ( 5.69 usr + 0.00 sys = 5.69 CPU) # 0.88/s (n=5)
Pure Perl: 6 wallclock secs ( 5.70 usr + 0.00 sys = 5.70 CPU) # 0.88/s (n=5)
The one possibility of gaining speed is to exploit any special structure involved in the search pattern and replacements and write something to implement that.
On the Perl side, you should at least pre-compile the patterns.
Also, since your problem is embarrassingly parallel, you are better off looking into chopping up the work into as many chunks as you have cores to work with.
For example, take a look at the Perl entries in the regex-redux task in the Benchmarks Game:
Perl #4 (fork only): 14.13 seconds
and
Perl #3 (fork & threads): 14.47 seconds
versus
Perl #1: 34.01 seconds
That is, some primitive exploitation of parallelization possibilities results in a 60% speedup. That problem is not exactly comparable because the substitutions must be done sequentially, but still gives you an idea.
If you have eight cores, dole out the work to eight cores.
Also, consider the following script:
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Fake::Text;
use List::Util qw( sum );
use Time::HiRes qw( time );
use constant INPUT_SIZE => $ARGV[0] // 1_000_000;
run();
sub run {
my #substitutions = (
sub { s/dolor/new1/ig },
sub { s/fuga/new2/ig },
sub { s/facilis/new3/ig },
sub { s/tempo/new4/ig },
sub { s/magni/new5/ig },
);
my #times;
for (1 .. 5) {
my $data = read_input();
my $t0 = time;
find_and_replace($data, \#substitutions);
push #times, time - $t0;
}
printf "%.4f\n", sum(#times)/#times;
return;
}
sub find_and_replace {
my $data = shift;
my $substitutions = shift;
for ( #$data ) {
for my $s ( #$substitutions ) {
$s->();
}
}
return;
}
{
my #input;
sub read_input {
#input
or #input = map fake_sentences(1)->(), 1 .. INPUT_SIZE;
return [ #input ];
}
}
In this case, each invocation of find_and_replace takes about 2.3 seconds my laptop. The five replications run in about 30 seconds. The overhead is the combined cost of generating the 1,000,000 sentence data set and copying it four times.

Alogrithm in using perl to find the value in array - Absolutely Interview Questions

I am asked to do the perl program to find a value(from user input) in array. If matched "its ok". If not matched, then check within the value in the index[0] to index[1] ... index[n]. So then if the value matched to the between two elements then report which is near to these elements might be index[0] or index[1].
Let you explain.
Given array : 10 15 20 25 30;
Get the value from user : 14 (eg.)
Hence 14 matched with in the two elements that is 10(array[0]) - 15(array[1])
Ultimately the check point is do not use more than one for loop and never use the while loop. You need to check one for loop and many of if conditions.
I got the output by which I did here is:
use strict;
use warnings;
my #arr1 = qw(10 15 20 25 30);
my $in = <STDIN>;
chomp($in);
if(grep /$in/, #arr1)
{ } #print "S: $in\n"; }
else
{
for(my $i=0; $i<scalar(#arr1); $i++)
{
my $j = $i + 1;
if($in > $arr1[$i] && $in < $arr1[$j])
{
#print "SN: $arr1[$i]\t$arr1[$j]\n";
my ($inc, $dec) = "0";
my $chk1 = $arr1[$i] + 1;
AGAIN1:
if($in == $chk1)
{ }
else
{ $chk1++; $inc++; goto AGAIN1; }
my $chk2 = $arr1[$j] - 1;
AGAIN2:
if($in == $chk2){ }
else
{ $chk2--; $dec++; goto AGAIN2; }
if($inc > $dec)
{ print "Matched value nearest to $arr1[$j]\n"; }
elsif($inc < $dec)
{ print "Matched value nearest to $arr1[$i]\n"; }
}
}
}
However my question is there a way in algorithm?. Hence if someone can help on this one and it would be appreciated.
Thanks in advance.
You seem determined to make this as complicated as possible :-)
Your specification isn't completely clear, but I think this does what you want:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my #array = qw[10 15 20 25 30];
chomp(my $in = <STDIN>);
if ($in < $array[0]) {
say "$in is less than first element in the array";
exit;
}
if ($in > $array[-1]) {
say "$in is greater than last element in the array";
exit;
}
for (0 .. $#array) {
if ($in == $array[$_]) {
say "$in is in the array";
exit;
}
if ($in < $array[$_]) {
if ($in - $array[$_ - 1] < $array[$_] - $in) {
say "$in is closest to $array[$_ - 1]";
} else {
say "$in is closest to $array[$_]";
}
exit;
}
}
say "Shouldn't get here!";
Using the helper functions any and reduce from the core module List::Util and the built in abs.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce any/;
my #arr1 = qw(10 15 20 25 30);
chomp(my $in = <STDIN>);
if (any {$in == $_} #arr1) {
print "$in is in the array\n";
}
else {
my $i = reduce { abs($in - $arr1[$a]) > abs($in - $arr1[$b]) ? $b : $a} 0 .. $#arr1;
print "$in is closest to $arr1[$i]\n";
}

How can this perl sub be optimised for speed?

The following perl sub is used to store arrays of hashes.
Each hash to be stored is first checked for uniqueness using a given key, if a hash exists on the array with the same key value then it's not stored.
How can this perl sub be optimised for speed?
Example use:
my #members;
...
$member= {};
$hash->{'name'}='James';
hpush('name', \#members,$member);
The sub:
sub hpush {
# push a set of key value pairs onto an array as a hash, if the key doesn't already exist
if (#_ != 3) {
print STDERR "hpush requires 3 args, ".#_." given\n";
return;
}
my $uniq = shift;
my $rarray = shift;
my $rhash = shift;
my $hash = ();
#print "\nHash:\n";
for my $key ( keys %{$rhash} ) {
my $valuea = $rhash->{$key};
#print "key: $key\n";
#print "key=>value: $key => $valuea\n";
$hash->{ $key} = $valuea;
}
#print "\nCurrent Array:\n";
for my $node (#{$rarray}) {
#print "node: $node \n";
for my $key ( keys %{$node} ) {
my $valueb = $node->{$key};
#print "key=>value: $key => $valueb\n";
if ($key eq $uniq) {
#print "key=>value: $key => $valueb\n";
if (($valueb =~ m/^[0-9]+$/) && ($hash->{$key} == $valueb)) {
#print "Not pushing i $key -> $valueb\n";
return;
} elsif ($hash->{$key} eq $valueb) {
#print "Not pushing s $key -> $valueb\n";
return;
}
}
}
}
push #{$rarray}, $hash;
#print "Pushed\n";
}
Note that the perl isn't mine and I'm a perl beginner
This code is rather... not very efficient. First, it copies $rhash to $hash, with a for loop... for some reason. Then it loops through the hash keys, instead of simply using the hash key that it's looking for. Then it does two equivalent checks, apparently some attempt to distinguish numbers from non-numbers and selecting the appropriate check (== or eq). This is all unnecessary.
This code below should be roughly equivalent. I've trimmed it down hard. This should be as fast as it is possible to get it.
use strict;
use warnings;
hpush('name', \#members,$member);
sub hpush {
my ($uniq, $rarray, $rhash) = #_;
for my $node (#{$rarray}) {
if (exists $node->{$uniq}) {
return if ($node->{$uniq} eq $rhash->{$uniq});
}
}
push #{$rarray}, $rhash;
}

Resources