I have about 30 text files with the structure
wordleft1|wordright1
wordleft2|wordright2
wordleft3|wordright3
...
The total size of the files is about 1 GB with about 32 million lines of word combinations.
I tried a few approaches to load them as fast as possible and store the combinations within a hash
$hash{$wordleft} = $wordright
Opening file by file and reading line by line takes about 42 seconds. I then store the hash with the Storable module
store \%hash, $filename
Loading the data again
$hashref = retrieve $filename
reduces the time to about 28 seconds. I use a fast SSD drive and a fast CPU and have enough RAM to hold all the data (it takes about 7 GB).
I'm searching for a faster way to load this data into the RAM (I can't keep it there for a few reasons).
You could try using Dan Bernstein's CDB file format using a tied hash, which will require minimal code change. You may need to install CDB_File. On my laptop, the cdb file is opened very quickly and I can do about 200-250k lookups per second. Here is an example script to create/use/benchmark a cdb:
test_cdb.pl
#!/usr/bin/env perl
use warnings;
use strict;
use Benchmark qw(:all) ;
use CDB_File 'create';
use Time::HiRes qw( gettimeofday tv_interval );
scalar #ARGV or die "usage: $0 number_of_keys seconds_to_benchmark\n";
my ($size) = $ARGV[0] || 1000;
my ($seconds) = $ARGV[1] || 10;
my $t0;
tic();
# Create CDB
my ($file, %data);
%data = map { $_ => 'something' } (1..$size);
print "Created $size element hash in memory\n";
toc();
$file = 'data.cdb';
create %data, $file, "$file.$$";
my $bytes = -s $file;
print "Created data.cdb [ $size keys and values, $bytes bytes]\n";
toc();
# Read from CDB
my $c = tie my %h, 'CDB_File', 'data.cdb' or die "tie failed: $!\n";
print "Opened data.cdb as a tied hash.\n";
toc();
timethese( -1 * $seconds, {
'Pick Random Key' => sub { int rand $size },
'Fetch Random Value' => sub { $h{ int rand $size }; },
});
tic();
print "Fetching Every Value\n";
for (0..$size) {
no warnings; # Useless use of hash element
$h{ $_ };
}
toc();
sub tic {
$t0 = [gettimeofday];
}
sub toc {
my $t1 = [gettimeofday];
my $elapsed = tv_interval ( $t0, $t1);
$t0 = $t1;
print "==> took $elapsed seconds\n";
}
Output ( 1 million keys, tested over 10 seconds )
./test_cdb.pl 1000000 10
Created 1000000 element hash in memory
==> took 2.882813 seconds
Created data.cdb [ 1000000 keys and values, 38890944 bytes]
==> took 2.333624 seconds
Opened data.cdb as a tied hash.
==> took 0.00015 seconds
Benchmark: running Fetch Random Value, Pick Random Key for at least 10 CPU seconds...
Fetch Random Value: 10 wallclock secs (10.46 usr + 0.01 sys = 10.47 CPU) # 236984.72/s (n=2481230)
Pick Random Key: 9 wallclock secs (10.11 usr + 0.02 sys = 10.13 CPU) # 3117208.98/s (n=31577327)
Fetching Every Value
==> took 3.514183 seconds
Output ( 10 million keys, tested over 10 seconds )
./test_cdb.pl 10000000 10
Created 10000000 element hash in memory
==> took 44.72331 seconds
Created data.cdb [ 10000000 keys and values, 398890945 bytes]
==> took 25.729652 seconds
Opened data.cdb as a tied hash.
==> took 0.000222 seconds
Benchmark: running Fetch Random Value, Pick Random Key for at least 10 CPU seconds...
Fetch Random Value: 14 wallclock secs ( 9.65 usr + 0.35 sys = 10.00 CPU) # 209811.20/s (n=2098112)
Pick Random Key: 12 wallclock secs (10.40 usr + 0.02 sys = 10.42 CPU) # 2865335.22/s (n=29856793)
Fetching Every Value
==> took 38.274356 seconds
It sounds like you do have a good use case for wanting an in-memory perl hash.
For faster storing/retrieving, I would recommend Sereal (Sereal::Encoder/Sereal::Decoder). If your disk storage is slow, you may even want to enable Snappy compression.
Related
I'm trying to understand this difference of 5 seconds between 1000 and 10000 loops extracted from gawk article. Would appreciate your help.
#!/usr/bin/gawk -f
# how long does it take to do a few loops?
BEGIN {
LOOPS=100;
# do the test twice
start=systime();
for (i=0;i<LOOPS;i++) {
}
end = systime();
# calculate how long it takes to do a dummy test
do_nothing = end-start;
# now do the test again with the *IMPORTANT* code inside
start=systime();
for (i=0;i<LOOPS;i++) {
# How long does this take?
while ("date" | getline) {
date = $0;
}
close("date");
}
end = systime();
newtime = (end - start) - do_nothing;
if (newtime <= 0) {
printf("%d loops were not enough to test, increase it\n",
LOOPS);
exit;
} else {
printf("%d loops took %6.4f seconds to execute\n",
LOOPS, newtime);
printf("That's %10.8f seconds per loop\n",
(newtime)/LOOPS);
# since the clock has an accuracy of +/- one second, what is the error
printf("accuracy of this measurement = %6.2f%%\n",
(1/(newtime))*100);
}
exit;
}
[root#krislasnetwork ~]# ./testSpeed.sh
1000 loops took 2.0000 seconds to execute
That's 0.00200000 seconds per loop
accuracy of this measurement = 50.00%
[root#krislasnetwork ~]# vi testSpeed.sh # changing loops from 1000 to 10000
[root#krislasnetwork ~]# ./testSpeed.sh
10000 loops took 15.0000 seconds to execute
That's 0.00150000 seconds per loop
accuracy of this measurement = 6.67%
It appears that each loop during the 10,000 loops is executed faster than each loop during the 1,000. Can you help me understand this behavior?
How do I prevent the GC from provoking copy-on-write, when I fork my process ? I have recently been analyzing the garbage collector's behavior in Ruby, due to some memory issues that I encountered in my program (I run out of memory on my 60core 0.5Tb machine even for fairly small tasks). For me this really limits the usefulness of ruby for running programs on multicore servers. I would like to present my experiments and results here.
The issue arises when the garbage collector runs during forking. I have investigated three cases that illustrate the issue.
Case 1: We allocate a lot of objects (strings no longer than 20 bytes) in the memory using an array. The strings are created using a random number and string formatting. When the process forks and we force the GC to run in the child, all the shared memory goes private, causing a duplication of the initial memory.
Case 2: We allocate a lot of objects (strings) in the memory using an array, but the string is created using the rand.to_s function, hence we remove the formatting of the data compared to the previous case. We end up with a smaller amount of memory being used, presumably due to less garbage. When the process forks and we force the GC to run in the child, only part of the memory goes private. We have a duplication of the initial memory, but to a smaller extent.
Case 3: We allocate fewer objects compared to before, but the objects are bigger, such that the amount of memory allocated stays the same as in the previous cases. When the process forks and we force the GC to run in the child all the memory stays shared, i.e. no memory duplication.
Here I paste the Ruby code that has been used for these experiments. To switch between cases you only need to change the “option” value in the memory_object function. The code was tested using Ruby 2.2.2, 2.2.1, 2.1.3, 2.1.5 and 1.9.3 on an Ubuntu 14.04 machine.
Sample output for case 1:
ruby version 2.2.2
proces pid log priv_dirty shared_dirty
Parent 3897 post alloc 38 0
Parent 3897 4 fork 0 37
Child 3937 4 initial 0 37
Child 3937 8 empty GC 35 5
The exact same code has been written in Python and in all cases the CoW works perfectly fine.
Sample output for case 1:
python version 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2]
proces pid log priv_dirty shared_dirty
Parent 4308 post alloc 35 0
Parent 4308 4 fork 0 35
Child 4309 4 initial 0 35
Child 4309 10 empty GC 1 34
Ruby code
$start_time=Time.new
# Monitor use of Resident and Virtual memory.
class Memory
shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
priv_dirty = '.+?Private_Dirty:\s+(\d+)'
MEM_REGEXP = /#{shared_dirty}#{priv_dirty}/m
# get memory usage
def self.get_memory_map( pids)
memory_map = {}
memory_map[ :pids_found] = {}
memory_map[ :shared_dirty] = 0
memory_map[ :priv_dirty] = 0
pids.each do |pid|
begin
lines = nil
lines = File.read( "/proc/#{pid}/smaps")
rescue
lines = nil
end
if lines
lines.scan(MEM_REGEXP) do |shared_dirty, priv_dirty|
memory_map[ :pids_found][pid] = true
memory_map[ :shared_dirty] += shared_dirty.to_i
memory_map[ :priv_dirty] += priv_dirty.to_i
end
end
end
memory_map[ :pids_found] = memory_map[ :pids_found].keys
return memory_map
end
# get the processes and get the value of the memory usage
def self.memory_usage( )
pids = [ $$]
result = self.get_memory_map( pids)
result[ :pids] = pids
return result
end
# print the values of the private and shared memories
def self.log( process_name='', log_tag="")
if process_name == "header"
puts " %-6s %5s %-12s %10s %10s\n" % ["proces", "pid", "log", "priv_dirty", "shared_dirty"]
else
time = Time.new - $start_time
mem = Memory.memory_usage( )
puts " %-6s %5d %-12s %10d %10d\n" % [process_name, $$, log_tag, mem[:priv_dirty]/1000, mem[:shared_dirty]/1000]
end
end
end
# function to delay the processes a bit
def time_step( n)
while Time.new - $start_time < n
sleep( 0.01)
end
end
# create an object of specified size. The option argument can be changed from 0 to 2 to visualize the behavior of the GC in various cases
#
# case 0 (default) : we make a huge array of small objects by formatting a string
# case 1 : we make a huge array of small objects without formatting a string (we use the to_s function)
# case 2 : we make a smaller array of big objects
def memory_object( size, option=1)
result = []
count = size/20
if option > 3 or option < 1
count.times do
result << "%20.18f" % rand
end
elsif option == 1
count.times do
result << rand.to_s
end
elsif option == 2
count = count/10
count.times do
result << ("%20.18f" % rand)*30
end
end
return result
end
##### main #####
puts "ruby version #{RUBY_VERSION}"
GC.disable
# print the column headers and first line
Memory.log( "header")
# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10)
Memory.log( "Parent", "post alloc")
lab_time = Time.new - $start_time
if lab_time < 3.9
lab_time = 0
end
# start the forking
pid = fork do
time = 4
time_step( time + lab_time)
Memory.log( "Child", "#{time} initial")
# force GC when nothing happened
GC.enable; GC.start; GC.disable
time = 8
time_step( time + lab_time)
Memory.log( "Child", "#{time} empty GC")
sleep( 1)
STDOUT.flush
exit!
end
time = 4
time_step( time + lab_time)
Memory.log( "Parent", "#{time} fork")
# wait for the child to finish
Process.wait( pid)
Python code
import re
import time
import os
import random
import sys
import gc
start_time=time.time()
# Monitor use of Resident and Virtual memory.
class Memory:
def __init__(self):
self.shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
self.priv_dirty = '.+?Private_Dirty:\s+(\d+)'
self.MEM_REGEXP = re.compile("{shared_dirty}{priv_dirty}".format(shared_dirty=self.shared_dirty, priv_dirty=self.priv_dirty), re.DOTALL)
# get memory usage
def get_memory_map(self, pids):
memory_map = {}
memory_map[ "pids_found" ] = {}
memory_map[ "shared_dirty" ] = 0
memory_map[ "priv_dirty" ] = 0
for pid in pids:
try:
lines = None
with open( "/proc/{pid}/smaps".format(pid=pid), "r" ) as infile:
lines = infile.read()
except:
lines = None
if lines:
for shared_dirty, priv_dirty in re.findall( self.MEM_REGEXP, lines ):
memory_map[ "pids_found" ][pid] = True
memory_map[ "shared_dirty" ] += int( shared_dirty )
memory_map[ "priv_dirty" ] += int( priv_dirty )
memory_map[ "pids_found" ] = memory_map[ "pids_found" ].keys()
return memory_map
# get the processes and get the value of the memory usage
def memory_usage( self):
pids = [ os.getpid() ]
result = self.get_memory_map( pids)
result[ "pids" ] = pids
return result
# print the values of the private and shared memories
def log( self, process_name='', log_tag=""):
if process_name == "header":
print " %-6s %5s %-12s %10s %10s" % ("proces", "pid", "log", "priv_dirty", "shared_dirty")
else:
global start_time
Time = time.time() - start_time
mem = self.memory_usage( )
print " %-6s %5d %-12s %10d %10d" % (process_name, os.getpid(), log_tag, mem["priv_dirty"]/1000, mem["shared_dirty"]/1000)
# function to delay the processes a bit
def time_step( n):
global start_time
while (time.time() - start_time) < n:
time.sleep( 0.01)
# create an object of specified size. The option argument can be changed from 0 to 2 to visualize the behavior of the GC in various cases
#
# case 0 (default) : we make a huge array of small objects by formatting a string
# case 1 : we make a huge array of small objects without formatting a string (we use the to_s function)
# case 2 : we make a smaller array of big objects
def memory_object( size, option=2):
count = size/20
if option > 3 or option < 1:
result = [ "%20.18f"% random.random() for i in xrange(count) ]
elif option == 1:
result = [ str( random.random() ) for i in xrange(count) ]
elif option == 2:
count = count/10
result = [ ("%20.18f"% random.random())*30 for i in xrange(count) ]
return result
##### main #####
print "python version {version}".format(version=sys.version)
memory = Memory()
gc.disable()
# print the column headers and first line
memory.log( "header") # Print the headers of the columns
# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10) # Allocate memory
memory.log( "Parent", "post alloc")
lab_time = time.time() - start_time
if lab_time < 3.9:
lab_time = 0
# start the forking
pid = os.fork() # fork the process
if pid == 0:
Time = 4
time_step( Time + lab_time)
memory.log( "Child", "{time} initial".format(time=Time))
# force GC when nothing happened
gc.enable(); gc.collect(); gc.disable();
Time = 10
time_step( Time + lab_time)
memory.log( "Child", "{time} empty GC".format(time=Time))
time.sleep( 1)
sys.exit(0)
Time = 4
time_step( Time + lab_time)
memory.log( "Parent", "{time} fork".format(time=Time))
# Wait for child process to finish
os.waitpid( pid, 0)
EDIT
Indeed, calling the GC several times before forking the process solves the issue and I am quite surprised. I have also run the code using Ruby 2.0.0 and the issue doesn't even appear, so it must be related to this generational GC just like you mentioned.
However, if I call the memory_object function without assigning the output to any variables (I am only creating garbage), then the memory is duplicated. The amount of memory that is copied depends on the amount of garbage that I create - the more garbage, the more memory becomes private.
Any ideas how I can prevent this ?
Here are some results
Running the GC in 2.0.0
ruby version 2.0.0
proces pid log priv_dirty shared_dirty
Parent 3664 post alloc 67 0
Parent 3664 4 fork 1 69
Child 3700 4 initial 1 69
Child 3700 8 empty GC 6 65
Calling memory_object( 1000*1000) in the child
ruby version 2.0.0
proces pid log priv_dirty shared_dirty
Parent 3703 post alloc 67 0
Parent 3703 4 fork 1 70
Child 3739 4 initial 1 70
Child 3739 8 empty GC 15 56
Calling memory_object( 1000*1000*10)
ruby version 2.0.0
proces pid log priv_dirty shared_dirty
Parent 3743 post alloc 67 0
Parent 3743 4 fork 1 69
Child 3779 4 initial 1 69
Child 3779 8 empty GC 89 5
UPD2
Suddenly figured out why all the memory is going private if you format the string -- you generate garbage during formatting, having GC disabled, then enable GC, and you've got holes of released objects in your generated data. Then you fork, and new garbage starts to occupy these holes, the more garbage - more private pages.
So i added a cleanup function to run GC each 2000 cycles (just enabling lazy GC didn't help):
count.times do |i|
cleanup(i)
result << "%20.18f" % rand
end
#......snip........#
def cleanup(i)
if ((i%2000).zero?)
GC.enable; GC.start; GC.disable
end
end
##### main #####
Which resulted in(with generating memory_object( 1000 * 1000 * 10) after fork):
RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc-test.rb 0
ruby version 2.2.0
proces pid log priv_dirty shared_dirty
Parent 2501 post alloc 35 0
Parent 2501 4 fork 0 35
Child 2503 4 initial 0 35
Child 2503 8 empty GC 28 22
Yes, it affects performance, but only before forking, i.e. increase load time in your case.
UPD1
Just found criteria by which ruby 2.2 sets old object bits, it's 3 GC's, so if you add following before forking:
GC.enable; 3.times {GC.start}; GC.disable
# start the forking
you will get(the option is 1 in command line):
$ RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc-test.rb 1
ruby version 2.2.0
proces pid log priv_dirty shared_dirty
Parent 2368 post alloc 31 0
Parent 2368 4 fork 1 34
Child 2370 4 initial 1 34
Child 2370 8 empty GC 2 32
But this needs to be further tested concerning the behavior of such objects on future GC's, at least after 100 GC's :old_objects remains constant, so i suppose it should be OK
Log with GC.stat is here
By the way there's also option RGENGC_OLD_NEWOBJ_CHECK to create old objects from the beginning, but i doubt it's a good idea, but may be useful for a particular case.
First answer
My proposition in the comment above was wrong, actually bitmap tables are the savior.
(option = 1)
ruby version 2.0.0
proces pid log priv_dirty shared_dirty
Parent 14807 post alloc 27 0
Parent 14807 4 fork 0 27
Child 14809 4 initial 0 27
Child 14809 8 empty GC 6 25 # << almost everything stays shared <<
Also had by hand and tested Ruby Enterprise Edition it's only half better than worst cases.
ruby version 1.8.7
proces pid log priv_dirty shared_dirty
Parent 15064 post alloc 86 0
Parent 15064 4 fork 2 84
Child 15065 4 initial 2 84
Child 15065 8 empty GC 40 46
(I made the script run strictly 1 GC, by increasing RUBY_GC_HEAP_INIT_SLOTS to 600k)
I have a big file "values.txt" in which there are a lot of values. These values can be found in the file valeur.txt as follow :
File: "values.txt"
************************************************** *****************************************
2
14
18
20
23
.
.
.
24
******************************************************************************************
I want to write a script that calculate the average of values read from the file values.txt for every 16 seconds
i e : After each 16 seconds I will calculate the average of values that I have read in the file values.txt
for exmpl :
To read the file "value.txt", we will take (16 * 4 = 64) seconds.
So, when the exuction of script is finished, it must show 4 values. because every 16 seconds it will calculate an average.
I tried to do things but the concept of time (16 seconds) I still can't manage it
let "i=0"
let"cum=0"
while read var
do
if [timer] # how I could control the period of 16 seconde ?
then
let "i = i +1"
let "cum = cum + var"
else # 16 seconds have elapsed, so i calculate the averag
let "avrg_var= cum /i"
echo "$avrg_var">> new_file.txt
fi
done < values.txt
Thank you in advance for your answers and suggestions
Try this:
while :
do
let "i=0"
let "cum=0"
while read var
do
let "i = i +1"
let "cum = cum + var"
done < /tmp/values.txt
let "avrg_var= cum /i"
echo "$avrg_var">> new_file.txt
sleep 16
done
This doesn't work on a timer, just sleeps 16 seconds between runs.
Write a script that averages the values once, and then you can run your script with the watch command assuming your on a *nix based system. The default is 2 seconds but you can change it to what ever you want and it will repeatedly run in the terminal until canceled
I have stock data at the tick level and would like to create a rolling list of all ticks for the previous 10 seconds. The code below works, but takes a very long time for large amounts of data. I'd like to vectorize this process or otherwise make it faster, but I'm not coming up with anything. Any suggestions or nudges in the right direction would be appreciated.
library(quantmod)
set.seed(150)
# Create five minutes of xts example data at .1 second intervals
mins <- 5
ticks <- mins * 60 * 10 + 1
times <- xts(runif(seq_len(ticks),1,100), order.by=seq(as.POSIXct("1973-03-17 09:00:00"),
as.POSIXct("1973-03-17 09:05:00"), length = ticks))
# Randomly remove some ticks to create unequal intervals
times <- times[runif(seq_along(times))>.3]
# Number of seconds to look back
lookback <- 10
dist.list <- list(rep(NA, nrow(times)))
system.time(
for (i in 1:length(times)) {
dist.list[[i]] <- times[paste(strptime(index(times[i])-(lookback-1), format = "%Y-%m-%d %H:%M:%S"), "/",
strptime(index(times[i])-1, format = "%Y-%m-%d %H:%M:%S"), sep = "")]
}
)
> user system elapsed
6.12 0.00 5.85
You should check out the window function, it will make your subselection of dates a lot easier. The following code uses lapply to do the work of the for loop.
# Your code
system.time(
for (i in 1:length(times)) {
dist.list[[i]] <- times[paste(strptime(index(times[i])-(lookback-1), format = "%Y-%m-%d %H:%M:%S"), "/",
strptime(index(times[i])-1, format = "%Y-%m-%d %H:%M:%S"), sep = "")]
}
)
# user system elapsed
# 10.09 0.00 10.11
# My code
system.time(dist.list<-lapply(index(times),
function(x) window(times,start=x-lookback-1,end=x))
)
# user system elapsed
# 3.02 0.00 3.03
So, about a third faster.
But, if you really want to speed things up, and you are willing to forgo millisecond accuracy (which I think your original method implicitly does), you could just run the loop on unique date-hour-second combinations, because they will all return the same time window. This should speed things up roughly twenty or thirty times:
dat.time=unique(as.POSIXct(as.character(index(times)))) # Cheesy method to drop the ms.
system.time(dist.list.2<-lapply(dat.time,function(x) window(times,start=x-lookback-1,end=x)))
# user system elapsed
# 0.37 0.00 0.39
option A:
print $fh $hr->{'something'}, "|", $hr->{'somethingelse'}, "\n";
option B:
print $fh $hr->{'something'} . "|" . $hr->{'somethingelse'} . "\n";
The answer is simple, it doesn't matter. As many folks have pointed out, this is not going to be your program's bottleneck. Optimizing this to even happen instantly is unlikely to have any effect on your performance. You must profile first, otherwise you are just guessing and wasting your time.
If we are going to waste time on it, let's at least do it right. Below is the code to do a realistic benchmark. It actually does the print and sends the benchmarking information to STDERR. You run it as perl benchmark.plx > /dev/null to keep the output from flooding your screen.
Here's 5 million iterations writing to STDOUT. By using both timethese() and cmpthese() we get all the benchmarking data.
$ perl ~/tmp/bench.plx 5000000 > /dev/null
Benchmark: timing 5000000 iterations of concat, list...
concat: 3 wallclock secs ( 3.84 usr + 0.12 sys = 3.96 CPU) # 1262626.26/s (n=5000000)
list: 4 wallclock secs ( 3.57 usr + 0.12 sys = 3.69 CPU) # 1355013.55/s (n=5000000)
Rate concat list
concat 1262626/s -- -7%
list 1355014/s 7% --
And here's 5 million writing to a temp file
$ perl ~/tmp/bench.plx 5000000
Benchmark: timing 5000000 iterations of concat, list...
concat: 6 wallclock secs ( 3.94 usr + 1.05 sys = 4.99 CPU) # 1002004.01/s (n=5000000)
list: 7 wallclock secs ( 3.64 usr + 1.06 sys = 4.70 CPU) # 1063829.79/s (n=5000000)
Rate concat list
concat 1002004/s -- -6%
list 1063830/s 6% --
Note the extra wallclock and sys time underscoring how what you're printing to matters as much as what you're printing.
The list version is about 5% faster (note this is counter to Pavel's logic underlining the futility of trying to just think this stuff through). You said you're doing tens of thousands of these? Let's see... 100k takes 146ms of wallclock time on my laptop (which has crappy I/O) so the best you can do here is to shave off about 7ms. Congratulations. If you spent even a minute thinking about this it will take you 40k iterations of that code before you've made up that time. This is not to mention the opportunity cost, in that minute you could have been optimizing something far more important.
Now, somebody's going to say "now that we know which way is faster we should write it the fast way and save that time in every program we write making the whole exercise worthwhile!" No. It will still add up to an insignificant portion of your program's run time, far less than the 5% you get measuring a single statement. Second, logic like that causes you to prioritize micro-optimizations over maintainability.
Oh, and its different in 5.8.8 as in 5.10.0.
$ perl5.8.8 ~/tmp/bench.plx 5000000 > /dev/null
Benchmark: timing 5000000 iterations of concat, list...
concat: 3 wallclock secs ( 3.69 usr + 0.04 sys = 3.73 CPU) # 1340482.57/s (n=5000000)
list: 5 wallclock secs ( 3.97 usr + 0.06 sys = 4.03 CPU) # 1240694.79/s (n=5000000)
Rate list concat
list 1240695/s -- -7%
concat 1340483/s 8% --
It might even change depending on what Perl I/O layer you're using and operating system. So the whole exercise is futile.
Micro-optimization is a fool's game. Always profile first and look to optimizing your algorithm. Devel::NYTProf is an excellent profiler.
#!/usr/bin/perl -w
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
#open my $fh, ">", "/tmp/test.out" or die $!;
#open my $fh, ">", "/dev/null" or die $!;
my $fh = *STDOUT;
my $hash = {
foo => "something and stuff",
bar => "and some other stuff"
};
select *STDERR;
my $r = timethese(shift || -3, {
list => sub {
print $fh $hash->{foo}, "|", $hash->{bar};
},
concat => sub {
print $fh $hash->{foo}. "|". $hash->{bar};
},
});
cmpthese($r);
Unless you are executing millions of these statements, the performance difference will not matter. I really suggest concentrating on performance problems where they do exist - and the only way to find that out is to profile your application.
Premature optimization is something that Joel and Jeff had a podcast on, and whined about, for years. It's just a waste of time to try to optimize something until you KNOW that it's slow.
Perl is a high-level language, and as such the statements you see in the source code don't map directly to what the computer is actually going to do. You might find that a particular implementation of perl makes one thing faster than the other, but that's no guarantee that another implementation might take away the advantage (although they try not to make things slower).
If you're worried about I/O speed, there are a lot more interesting and useful things to tweak before you start worrying about commas and periods. See, for instance, the discussion under Perl write speed mystery.
UPDATE:
I just ran my own test.
1,000,000 iterations of each version took each < 1 second.
10mm iterations of each version took an average of 2.35 seconds for list version vs. 2.1 seconds for string concat version
Have you actually tried profiling this? Only takes a few seconds.
On my machine, it appears that B is faster. However, you should really have a look at Pareto Analysis. You've already wasted far, far more time thinking about this question then you'd ever save in any program run. For problems as trivial as this (character substitution!), you should wait to care until you actually have a problem.
Of the three options, I would probably choose string interpolation first and switch to commas for expressions that cannot be interpolated. This, humorously enough, means that my default choice is the slowest of the bunch, but given that they are all so close to each other in speed and that disk speed is probably going to be slower than anything else, I don't believe changing the method has any real performance benefits.
As others have said, write the code, then profile the code, then examine the algorithms and data structures you have chosen that are in the slow parts of the code, and, finally, look at the implementation of the algorithms and data structures. Anything else is foolish micro-optimizing that wastes more time than it saves.
You may also want to read perldoc perlperf
Rate string concat comma
string 803887/s -- -0% -7%
concat 803888/s 0% -- -7%
comma 865570/s 8% 8% --
#!/usr/bin/perl
use strict;
use warnings;
use Carp;
use List::Util qw/first/;
use Benchmark;
sub benchmark {
my $subs = shift;
my ($k, $sub) = each %$subs;
my $value = $sub->();
croak "bad" if first { $value ne $_->() and print "$value\n", $_->(), "\n" } values %$subs;
Benchmark::cmpthese -1, $subs;
}
sub fake_print {
#this is, plus writing output to the screen is what print does
no warnings;
my $output = join $,, #_;
return $output;
}
my ($x, $y) = ("a", "b");
benchmark {
comma => sub { return fake_print $x, "|", $y, "\n" },
concat => sub { return fake_print $x . "|" . $y . "\n" },
string => sub { return fake_print "$x|$y\n" },
};