Julia parallel computing in IPython Jupyter - parallel-processing

I'm preparing a small presentation in IPython where I want to show how easy it is to do parallel operations in Julia.
It's basically a Monte Carlo pi calculation, described here.
The problem is that I can't make it work in parallel inside an IPython (Jupyter) Notebook; it only uses one core.
I started Julia as: julia -p 4
If I define the functions in the REPL and run them there, it works fine.
@everywhere function compute_pi(N::Int)
    """
    Compute pi with a Monte Carlo simulation of N darts thrown in [-1,1]^2
    Returns estimate of pi
    """
    n_landed_in_circle = 0
    for i = 1:N
        x = rand() * 2 - 1  # uniformly distributed number on x-axis
        y = rand() * 2 - 1  # uniformly distributed number on y-axis
        r2 = x*x + y*y      # radius squared, in radial coordinates
        if r2 < 1.0
            n_landed_in_circle += 1
        end
    end
    return n_landed_in_circle / N * 4.0
end

function parallel_pi_computation(N::Int; ncores::Int=4)
    """
    Compute pi in parallel, over ncores cores, with a Monte Carlo simulation throwing N total darts
    """
    # compute sum of pi's estimated among all cores in parallel
    sum_of_pis = @parallel (+) for i = 1:ncores
        compute_pi(int(N/ncores))
    end
    return sum_of_pis / ncores  # average value
end
 
julia> @time parallel_pi_computation(int(1e9))
elapsed time: 2.702617652 seconds (93400 bytes allocated)
3.1416044160000003
But when I do:
using IJulia
notebook()
And try to do the same thing inside the Notebook it only uses 1 core:
In [5]: @time parallel_pi_computation(int(10e8))
elapsed time: 10.277870808 seconds (219188 bytes allocated)
Out[5]: 3.141679988
So, why isn't Jupyter using all the cores? What can I do to make it work?
Thanks.

Using addprocs(4) as the first command in your notebook should provide four workers for doing parallel operations from within your notebook.
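For example, a minimal sketch of what that first cell could look like (reusing the question's Julia 0.3-era functions, so int() and @parallel are as above):

addprocs(4)      # start 4 worker processes from inside the notebook
nworkers()       # => 4, confirms the workers are available

# re-run the @everywhere / @parallel definitions from the question,
# then time the computation exactly as in the REPL:
@time parallel_pi_computation(int(1e9))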

One way to solve this is to create a kernel that always uses 4 cores. For that, some manual work is required. I assume you are on a Unix machine.
In the folder ~/.ipython/kernels/julia-0.x, you will find the following kernel.json file:
{
    "display_name": "Julia 0.3.9",
    "argv": [
        "/usr/local/Cellar/julia/0.3.9_1/bin/julia",
        "-i",
        "-F",
        "/Users/ch/.julia/v0.3/IJulia/src/kernel.jl",
        "{connection_file}"
    ],
    "language": "julia"
}
If you copy the whole folder (cp -r julia-0.x julia-0.x-p4) and modify the newly copied kernel.json file:
{
    "display_name": "Julia 0.3.9 p4",
    "argv": [
        "/usr/local/Cellar/julia/0.3.9_1/bin/julia",
        "-p",
        "4",
        "-i",
        "-F",
        "/Users/ch/.julia/v0.3/IJulia/src/kernel.jl",
        "{connection_file}"
    ],
    "language": "julia"
}
The paths will probably be different for you. Note that I only gave the kernel a new name and added the command-line argument -p 4.
You should see a new kernel named Julia 0.3.9 p4 which should always use 4 cores.
Also note that this kernel file will not get updated when you update IJulia, so you have to update it manually whenever you update julia or IJulia.
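To double-check that the new kernel really starts with the extra processes, a quick sanity check (sketch only) in the first cell of a notebook running the Julia 0.3.9 p4 kernel:

nprocs()    # => 5: the master process plus 4 workers
nworkers()  # => 4: the workers available to @parallel / pmap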

You can add new kernels using this command:
using IJulia
#for 4 cores
installkernel("Julia_4_threads", env=Dict("JULIA_NUM_THREADS"=>"4"))
#or for 8 cores
installkernel("Julia_8_threads", env=Dict("JULIA_NUM_THREADS"=>"8"))
After restarting VS Code, these options will appear in your kernel selection menu.
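Note that JULIA_NUM_THREADS controls the number of threads inside the kernel's single Julia process (used by Threads.@threads), which is not the same as the worker processes added by -p or addprocs(). A small sketch to verify what the new kernel actually provides:

using Distributed

Threads.nthreads()   # => 4: threads available to Threads.@threads
nworkers()           # => 1: no extra worker processes unless you call addprocs()

# a threaded loop runs on those 4 threads inside one process:
counts = zeros(Int, Threads.nthreads())
Threads.@threads for i in 1:1_000_000
    counts[Threads.threadid()] += 1
end
sum(counts)          # => 1000000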

Related

Why is PyTorch inference faster when preloading all mini-batches to list?

While benchmarking different dataloaders I noticed some peculiar behavior with the PyTorch built-in dataloader. I am running the below code on a cpu-only machine with the MNIST dataset.
It seems that a simple forward pass in my model is much faster when mini-batches are preloaded to a list rather than fetched during iteration:
import torch, torchvision
import torch.nn as nn
import torchvision.transforms as T
from torch.profiler import profile, record_function, ProfilerActivity

mnist_dataset = torchvision.datasets.MNIST(root=".", train=True, transform=T.ToTensor(), download=True)
loader = torch.utils.data.DataLoader(dataset=mnist_dataset, batch_size=128, shuffle=False, pin_memory=False, num_workers=4)

model = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Linear(256, 10))
model.train()

# mini-batches fetched during iteration
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        for (images_iter, labels_iter) in loader:
            outputs_iter = model(images_iter)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

# mini-batches preloaded into a list first
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        train_list = [sample for sample in loader]
        for (images_iter, labels_iter) in train_list:
            outputs_iter = model(images_iter)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
The subset of most interesting output from the Torch profiler is:
Name              Self CPU %   Self CPU  CPU total %  CPU total  CPU time avg  # of Calls
aten::batch_norm       0.02%  644.000us        4.57%  134.217ms     286.177us         469
Self CPU time total: 2.937s

Name              Self CPU %   Self CPU  CPU total %  CPU total  CPU time avg  # of Calls
aten::batch_norm      70.48%     6.888s       70.62%     6.902s      14.717ms         469
Self CPU time total: 9.773s
It seems like aten::batch_norm (batch normalization) is taking significantly more time in the case where samples are not preloaded into a list, but I can't figure out why, since it should be the same operation.
The above was tested on a 4-core CPU with Python 3.8.
If anything, the preloading-to-a-list version should be slightly slower overall due to the overhead of creating the list.
With
torch==1.10.2+cu102
torchvision==0.11.3+cu102
I had the following results:
Self CPU time total: 2.475s
Self CPU time total: 2.800s
Try to reproduce this again using different library versions.

How to eliminate JIT overhead in a Julia executable (with MWE)

I'm using PackageCompiler hoping to create an executable that eliminates just-in-time compilation overhead.
The documentation explains that I must define a function julia_main to call my program's logic, and write a "snoop file", a script that calls functions I wish to precompile. My julia_main takes a single argument, the location of a file containing the input data to be analysed. So to keep things simple my snoop file simply makes one call to julia_main with a particular input file. So I'd hope to see the generated executable run nice and fast (no compilation overhead) when executed against that same input file.
But alas, that's not what I see. In a fresh Julia instance julia_main takes approx 74 seconds for the first execution and about 4.5 seconds for subsequent executions. The executable file takes approx 50 seconds each time it's called.
My use of the build_executable function looks like this:
julia> using PackageCompiler
julia> build_executable("d:/philip/source/script/julia/jsource/SCRiPTMain.jl",
                        "testexecutable",
                        builddir = "d:/temp/builddir4",
                        snoopfile = "d:/philip/source/script/julia/jsource/snoop.jl",
                        compile = "all",
                        verbose = true)
Questions:
Are the above arguments correct to achieve my aim of an executable with no JIT overhead?
Any other advice for me?
Here's what happens in response to that call to build_executable. The lines from Start of snoop file execution! to End of snoop file execution! are emitted by my code.
Julia program file:
"d:\philip\source\script\julia\jsource\SCRiPTMain.jl"
C program file:
"C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c"
Build directory:
"d:\temp\builddir4"
Executing snoopfile: "d:\philip\source\script\julia\jsource\snoop.jl"
Start of snoop file execution!
┌ Warning: The 'control file' contains the key 'InterpolateCovariance' with value 'true' but that is not supported. Pass a value of 'false' or omit the key altogether.
└ @ ValidateInputs d:\Philip\Source\script\Julia\JSource\ValidateInputs.jl:685
Time to build model 20.058000087738037
Saving c:/temp/SCRiPT/SCRiPTModel.jls
Results written to c:/temp/SCRiPT/SCRiPTResultsJulia.json
Time to write file: 3620 milliseconds
Time in method runscript: 76899 milliseconds
End of snoop file execution!
[ Info: used 1313 out of 1320 precompile statements
Build static library "testexecutable.a":
atexit_hook_copy = copy(Base.atexit_hooks) # make backup
# clean state so that any package we use can carelessly call atexit
empty!(Base.atexit_hooks)
Base.__init__()
Sys.__init__() #fix https://github.com/JuliaLang/julia/issues/30479
using REPL
Base.REPL_MODULE_REF[] = REPL
Mod = @eval module $(gensym("anon_module")) end
# Include into anonymous module to not polute namespace
Mod.include("d:\\\\temp\\\\builddir4\\\\julia_main.jl")
Base._atexit() # run all exit hooks we registered during precompile
empty!(Base.atexit_hooks) # don't serialize the exit hooks we run + added
# atexit_hook_copy should be empty, but who knows what base will do in the future
append!(Base.atexit_hooks, atexit_hook_copy)
Build shared library "testexecutable.dll":
`'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root\mingw\bin\gcc.exe' --sysroot 'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root' -shared '-DJULIAC_PROGRAM_LIBNAME="testexecutable.dll"' -o testexecutable.dll -Wl,--whole-archive testexecutable.a -Wl,--no-whole-archive -std=gnu99 '-IC:\Users\philip\AppData\Local\Julia-1.2.0\include\julia' -DJULIA_ENABLE_THREADING=1 '-LC:\Users\philip\AppData\Local\Julia-1.2.0\bin' -Wl,--stack,8388608 -ljulia -lopenlibm -m64 -Wl,--export-all-symbols`
Build executable "testexecutable.exe":
`'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root\mingw\bin\gcc.exe' --sysroot 'C:\Users\Philip\.julia\packages\WinRPM\Y9QdZ\deps\usr\x86_64-w64-mingw32\sys-root' '-DJULIAC_PROGRAM_LIBNAME="testexecutable.dll"' -o testexecutable.exe 'C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c' testexecutable.dll -std=gnu99 '-IC:\Users\philip\AppData\Local\Julia-1.2.0\include\julia' -DJULIA_ENABLE_THREADING=1 '-LC:\Users\philip\AppData\Local\Julia-1.2.0\bin' -Wl,--stack,8388608 -ljulia -lopenlibm -m64`
Copy Julia libraries to build directory:
7z.dll
BugpointPasses.dll
libamd.2.4.6.dll
libamd.2.dll
libamd.dll
libatomic-1.dll
libbtf.1.2.6.dll
libbtf.1.dll
libbtf.dll
libcamd.2.4.6.dll
libcamd.2.dll
libcamd.dll
libccalltest.dll
libccolamd.2.9.6.dll
libccolamd.2.dll
libccolamd.dll
libcholmod.3.0.13.dll
libcholmod.3.dll
libcholmod.dll
libclang.dll
libcolamd.2.9.6.dll
libcolamd.2.dll
libcolamd.dll
libdSFMT.dll
libexpat-1.dll
libgcc_s_seh-1.dll
libgfortran-4.dll
libgit2.dll
libgmp.dll
libjulia.dll
libklu.1.3.8.dll
libklu.1.dll
libklu.dll
libldl.2.2.6.dll
libldl.2.dll
libldl.dll
libllvmcalltest.dll
libmbedcrypto.dll
libmbedtls.dll
libmbedx509.dll
libmpfr.dll
libopenblas64_.dll
libopenlibm.dll
libpcre2-8-0.dll
libpcre2-8.dll
libpcre2-posix-2.dll
libquadmath-0.dll
librbio.2.2.6.dll
librbio.2.dll
librbio.dll
libspqr.2.0.9.dll
libspqr.2.dll
libspqr.dll
libssh2.dll
libssp-0.dll
libstdc++-6.dll
libsuitesparseconfig.5.4.0.dll
libsuitesparseconfig.5.dll
libsuitesparseconfig.dll
libsuitesparse_wrapper.dll
libumfpack.5.7.8.dll
libumfpack.5.dll
libumfpack.dll
libuv-2.dll
libwinpthread-1.dll
LLVM.dll
LLVMHello.dll
zlib1.dll
All done
julia>
EDIT
I was afraid that creating a minimal working example would be hard, but it was straightforward:
TestBuildExecutable.jl contains:
module TestBuildExecutable

Base.@ccallable function julia_main(ARGS::Vector{String}=[""])::Cint
    @show sum(myarray())
    return 0
end

# Function which takes approx 8 seconds to compile. Returns a 500 x 20 array of 1s
function myarray()
    [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     # PLEASE EDIT TO INSERT THE MISSING 496 LINES, EACH IDENTICAL TO THE LINE ABOVE!
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1;
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
end

end # module
SnoopFile.jl contains:
module SnoopFile

currentpath = dirname(@__FILE__)
push!(LOAD_PATH, currentpath)
unique!(LOAD_PATH)

using TestBuildExecutable

println("Start of snoop file execution!")
TestBuildExecutable.julia_main()
println("End of snoop file execution!")

end # module
In a fresh Julia instance, julia_main takes 8.3 seconds for the first execution and half a millisecond for the second execution:
julia> @time TestBuildExecutable.julia_main()
sum(myarray()) = 10000
8.355108 seconds (425.36 k allocations: 25.831 MiB, 0.06% gc time)
0
julia> @time TestBuildExecutable.julia_main()
sum(myarray()) = 10000
0.000537 seconds (25 allocations: 82.906 KiB)
0
So next I call build_executable:
julia> using PackageCompiler
julia> build_executable("d:/philip/source/script/julia/jsource/TestBuildExecutable.jl",
                        "testexecutable",
                        builddir = "d:/temp/builddir15",
                        snoopfile = "d:/philip/source/script/julia/jsource/SnoopFile.jl",
                        verbose = false)
Julia program file:
"d:\philip\source\script\julia\jsource\TestBuildExecutable.jl"
C program file:
"C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c"
Build directory:
"d:\temp\builddir15"
Start of snoop file execution!
sum(myarray()) = 10000
End of snoop file execution!
[ Info: used 79 out of 79 precompile statements
All done
Finally, in a Windows Command Prompt:
D:\temp\builddir15>testexecutable
sum(myarray()) = 1000
D:\temp\builddir15>
which took (by my stopwatch) 8 seconds to run, and it takes 8 seconds to run every time it's executed, not just the first time. This is consistent with the executable doing a JIT compile every time it's run, but the snoop file is designed to avoid that!
Version information:
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 8
JULIA_EDITOR = "C:\Users\Philip\AppData\Local\Programs\Microsoft VS Code\Code.exe"
Looks like you are using Windows.
At some point PackageCompiler.jl will be mature enough on Windows, at which point you can try it.
The solution was indeed to wait for progress on PackageCompilerX, as suggested by @xiaodai.
On 10 Feb 2020, what was formerly PackageCompilerX became the new version 1.0 of PackageCompiler, with a significantly changed API and more thorough documentation.
In particular, the MWE above (adapted to the new PackageCompiler API) now works correctly without any JIT overhead.
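For anyone reproducing this today, a rough sketch of how the MWE maps onto the PackageCompiler ≥ 1.0 API (paths and directory names here are illustrative, not taken from the original post, and the exact requirements on julia_main's signature depend on the PackageCompiler version): TestBuildExecutable becomes a proper package with a Project.toml, and create_app builds the executable, with the snoop file supplied as the precompile_execution_file.

using PackageCompiler

# TestBuildExecutable must be a package directory (with Project.toml)
# whose julia_main entry point is defined as in the MWE above.
create_app("path/to/TestBuildExecutable",          # package directory
           "path/to/TestBuildExecutableCompiled";  # output directory
           precompile_execution_file = "path/to/SnoopFile.jl")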

xarray too slow for performance critical code

I planned to use xarray extensively in some numerically intensive scientific code that I am writing. So far, it makes the code very elegant, but I think I will have to abandon it as the performance cost is far too high.
Here is an example, which creates two arrays and multiplies parts of them together using xarray (with several indexing schemes), and numpy. I used num_comp=2 and num_x=10000:
Line # Hits Time Per Hit % Time Line Contents
4 @profile
5 def xr_timing(num_comp, num_x):
6 1 4112 4112.0 10.1 da1 = xr.DataArray(np.random.random([num_comp, num_x]).astype(np.float32), dims=['component', 'x'], coords={'component': ['a', 'b'], 'x': np.linspace(0, 1, num_x)})
7 1 438 438.0 1.1 da2 = da1.copy()
8 1 1398 1398.0 3.4 da2[:] = np.random.random([num_comp, num_x]).astype(np.float32)
9 1 7148 7148.0 17.6 da3 = da1.isel(component=0).drop('component') * da2.isel(component=0).drop('component')
10 1 6298 6298.0 15.5 da4 = da1[dict(component=0)].drop('component') * da2[dict(component=0)].drop('component')
11 1 7541 7541.0 18.6 da5 = da1.sel(component='a').drop('component') * da2.sel(component='a').drop('component')
12 1 7184 7184.0 17.7 da6 = da1.loc[dict(component='a')].drop('component') * da2.loc[dict(component='a')].drop('component')
13 1 6479 6479.0 16.0 da7 = da1[0, :].drop('component') * da2[0, :].drop('component')
15 @profile
16 def np_timing(num_comp, num_x):
17 1 1027 1027.0 50.2 da1 = np.random.random([num_comp, num_x]).astype(np.float32)
18 1 977 977.0 47.8 da2 = np.random.random([num_comp, num_x]).astype(np.float32)
19 1 41 41.0 2.0 da3 = da1[0, :] * da2[0, :]
The fastest xarray multiplication takes about 150X the time of the numpy version. This is just one of the operations in my code, but I find most of them are many times slower than the numpy equivalent, which is unfortunate as xarray makes the code so much clearer. Am I doing something wrong?
Update: Even da1[0, :].values * da2[0, :].values (which loses many of the benefits of using xarray) takes 2464 time units.
I am using xarray 0.9.6, pandas 0.21.0, numpy 1.13.3, and Python 3.5.2.
Update 2:
As requested by @Maximilian, here is a re-run with num_x=1000000:
Line # Hits Time Per Hit % Time Line Contents
# xarray
9 5 408596 81719.2 11.3 da3 = da1.isel(component=0).drop('component') * da2.isel(component=0).drop('component')
10 5 407003 81400.6 11.3 da4 = da1[dict(component=0)].drop('component') * da2[dict(component=0)].drop('component')
11 5 411248 82249.6 11.4 da5 = da1.sel(component='a').drop('component') * da2.sel(component='a').drop('component')
12 5 411730 82346.0 11.4 da6 = da1.loc[dict(component='a')].drop('component') * da2.loc[dict(component='a')].drop('component')
13 5 406757 81351.4 11.3 da7 = da1[0, :].drop('component') * da2[0, :].drop('component')
14 5 48800 9760.0 1.4 da8 = da1[0, :].values * da2[0, :].values
# numpy
20 5 37476 7495.2 2.9 da3 = da1[0, :] * da2[0, :]
The performance difference has decreased substantially, as expected (only about 10X slower now), but I am still glad that the issue will be mentioned in the next release of the documentation as even this amount of difference may surprise some people.
Yes, this is a known limitation for xarray. Performance sensitive code that uses small arrays is much slower for xarray than NumPy. I wrote a new section about this in our docs for the next version:
http://xarray.pydata.org/en/stable/computation.html#wrapping-custom-computation
You basically have two options:
Write your performance sensitive code on unwrapped arrays, and then wrap them back in xarray data structures. Xarray v0.10 has a new helper function (apply_ufunc) that makes this a little easier. See the link above if you are interested in this.
Use something other than xarray/Python to do your computation. This could also make sense because Python itself adds significant overhead. Julia's AxisArrays.jl looks interesting, though I haven't tried it myself (see the rough sketch after this list).
I suppose option 3 would be to rewrite xarray itself in C++ (e.g., on top of xtensor), but that would be much more involved!
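As a rough illustration of the Julia route mentioned in the second option (a sketch only, not benchmarked; the exact AxisArrays.jl indexing syntax should be checked against its documentation), the example above could look roughly like this:

using AxisArrays

num_comp, num_x = 2, 10_000

# labelled axes playing the role of xarray's dims/coords
da1 = AxisArray(rand(Float32, num_comp, num_x),
                Axis{:component}([:a, :b]),
                Axis{:x}(range(0, stop=1, length=num_x)))
da2 = AxisArray(rand(Float32, num_comp, num_x),
                Axis{:component}([:a, :b]),
                Axis{:x}(range(0, stop=1, length=num_x)))

# select component :a by label and multiply, similar to da1.sel(component='a')
da3 = da1[Axis{:component}(:a)] .* da2[Axis{:component}(:a)]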

Is it possible to vectorize annotation for matplotlib?

As a part of a large QC benchmark I am creating a large number (approx 100K) of scatter plots in a single PDF using PdfPages backend. (See further down for the code)
The issue I am having is that the plotting takes too much time, see output from a custom profiling/debugging effort:
Checkpoint1: Predictions done in 1.110076904296875 millis
Checkpoint2: df created and correlations calculated in 3.108978271484375 millis
Checkpoint3: plotting and accumulating done in 231.31990432739258 millis
Cycle completed in 0.23553895950317383 secs
----------------------
Checkpoint1: Predictions done in 3.718852996826172 millis
Checkpoint2: df created and correlations calculated in 2.353191375732422 millis
Checkpoint3: plotting and accumulating done in 155.93385696411133 millis
Cycle completed in 0.16200590133666992 secs
----------------------
Checkpoint1: Predictions done in 2.920866012573242 millis
Checkpoint2: df created and correlations calculated in 1.995086669921875 millis
Checkpoint3: plotting and accumulating done in 161.8819236755371 millis
Cycle completed in 0.16679787635803223 secs
The plotting time increases by 2-3x if I annotate the points, which is necessary for the use case. As you can see below, I have tried both itertuples() and apply(); switching to apply() did not give a significant change in the timings as far as I can see.
def annotate(row, ax):
    ax.annotate(row.name, (row.exp, row.model),
                xytext=(10, 20), textcoords='offset points',
                arrowprops=dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10"),
                family='sans-serif', fontsize=8, color='darkslategrey')

def plot2File(df, file, seq, z, p, s):
    """ Plot predictions vs experimental """
    plttitle = f"Correlations for {seq}+{z} \n pearson={p} \n spearman={s}"
    ax = df.plot(x='exp', y='model', kind='scatter', title=plttitle, s=40)
    df.apply(annotate, ax=ax, axis=1)
    # for row in df.itertuples():
    #     ax.annotate(row.Index, (row.exp, row.model),
    #                 xytext=(10, 20), textcoords='offset points',
    #                 arrowprops=dict(arrowstyle="-", connectionstyle="arc,angleA=180,armA=10"),
    #                 family='sans-serif', fontsize=8, color='darkslategrey')
    plt.savefig(file, bbox_inches='tight', format='pdf')
    plt.close()
Given the nice explanation by Jeff on a question regarding iterrows() I was wondering if it would be possible to vectorize the annotation process? Or should I ditch using a data frame altogether?

R csv.bz2 Shell Windows counting number of lines

I'm having problems counting the number of lines in a messy csv.bz2 file.
Since this is a huge file, I want to be able to preallocate a data frame before reading the bzip2 file with the read.csv() function.
As you can see in the following tests, my results vary widely, and none of them corresponds with the number of actual rows in the csv.bz2 file.
> system.time(nrec1 <- as.numeric(shell('type "MyFile.csv" | find /c ","', intern=T)))
user system elapsed
0.02 0.00 53.50
> nrec1
[1] 1060906
> system.time(nrec2 <- as.numeric(shell('type "MyFile.csv.bz2" | find /c ","', intern=T)))
user system elapsed
0.00 0.02 10.15
> nrec2
[1] 126715
> system.time(nrec3 <- as.numeric(shell('type "MyFile.csv" | find /v /c ""', intern=T)))
user system elapsed
0.00 0.02 53.10
> nrec3
[1] 1232705
> system.time(nrec4 <- as.numeric(shell('type "MyFile.csv.bz2" | find /v /c ""', intern=T)))
user system elapsed
0.00 0.01 4.96
> nrec4
[1] 533062
The most interesting result is the one I called nrec4, since it takes very little time and returns roughly half the number of rows of nrec1, but I'm totally unsure whether the naive multiplication by 2 will be OK.
I have tried several other methods, including fread() and hsTableReader(), but the former crashes and the latter is so slow that I won't even consider it further.
My questions are:
Which reliable method can I use for counting the number of rows in a csv.bz2 file?
Is it OK to use a formula to calculate the number of rows directly from the csv.bz2 file without decompressing it?
Thanks in advance,
Diego
Roland was right from the beginning.
When using the garbage collector, the illusion of improved performance still remained.
I had to close and restart R to do an accurate test.
Yes, the process is still a little bit faster, by a few seconds (red line), and the increase in RAM consumption is more uniform when using nrows.
But at least in this case it is not worth the effort of trying to optimize the read.csv() call.
It is slow, but it is what it is.
If someone knows of a faster approach, I'm interested.
fread() crashes, just in case anyone suggests it.
Thanks.
Without nrows (Blue Line)
Sys.time()
system.time(storm.data <- read.csv(fileZip,
header = TRUE,
stringsAsFactors = F,
comment.char = "",
colClasses = "character"))
Sys.time()
rm(storm.data)
gc()
With nrows (Red Line)
Sys.time()
system.time(nrec12 <- as.numeric(
shell('type "MyFile.csv.bz2" | find /v /c ""',
intern=T)))
nrec12 <- nrec12 * 2
system.time(storm.data <- read.csv(fileZip,
stringsAsFactors = F,
comment.char = "",
colClasses = "character",
nrows = nrec12))
Sys.time()
rm(storm.data)
gc()
