Gradle task execution time: compileKotlin is way slower than compileJava? - performance

We have a multi-module Gradle project, and some of our modules have Kotlin and Java code.
The compileKotlin tasks take much longer than the compileJava tasks, even though most of our codebase is still Java code - is there a known performance loss when mixing Java and Kotlin sources in one module?
module                      compile task   LOC      execution time
teamecho-global             kotlin            155   10s
teamecho-global             java            1,103    2s
teamecho-base               kotlin          3,899   16s
teamecho-base               java           41,414    9s
teamecho-satisfaction       kotlin          3,600   10s
teamecho-satisfaction       java            6,183    3s
teamecho-dynamic-frontend   kotlin            337    3s
Gradle Build Scan:
https://scans.gradle.com/s/o4avvpmrrg3oi/timeline
What surprises me the most is the 10 seconds spent compiling the 155 LOC in the global module.
By comparison, the dynamic-frontend module takes only about 3 seconds for roughly 330 LOC (which still seems like a lot!?), yet it is still far faster than the 10 seconds for the 155 lines in the global module. Could this be because the dynamic-frontend module has no Java code?
We would like to implement more and more of our code base in Kotlin, but if it makes the Gradle build time so much worse, I am not sure whether we should continue with it. Any tips or pointers on what we can do to decrease the time the compileKotlin tasks take? We must be doing something wrong - or is this performance expected?
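For reference, these are the kinds of gradle.properties flags that are commonly recommended for speeding up Kotlin builds; the set below is only a sketch, and whether any of them actually apply to our setup is an open question:
org.gradle.parallel=true        # build decoupled modules in parallel
org.gradle.caching=true         # reuse task outputs from the Gradle build cache
kotlin.incremental=true         # incremental Kotlin compilation (the default in recent plugin versions)
kapt.use.worker.api=true        # run kapt annotation processing via Gradle workers
kapt.incremental.apt=true       # incremental annotation processing for kapt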

Related

The bytecode size of LendingPool.sol is over 24k

I used the "npm run compile" command to compile the protocol-v2 in the aave. I found that bytecode size of LendingPool.sol is 43,892 bytes. It exceeds the 24k of the contract's max limit of evm. But the protocol-v2 can deploy this contract to ethereum by using hardhat-deploy. I want to know the reason.
The Aave LendingPool.sol was compiled with the optimizer enabled and configured for 200 runs; see the Settings JSON under the link.
The Solidity optimizer removes unused bytecode, optimizes execution paths, replaces multiple copies of the same bytecode chunk with jumps to a single copy, and so on - one of its effects is a reduced bytecode size.
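For illustration, a minimal sketch of what such optimizer settings look like in the compiler's standard-JSON input (the actual file behind the link may contain more fields):
{
  "settings": {
    "optimizer": {
      "enabled": true,
      "runs": 200
    }
  }
}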

Julia package load extremely slow in first run

I'm using Julia 1.5.2 under Linux 5.4.0 and waited around 15 minutes for Pkg.add("DifferentialEquations"). Then I started the kernel in Jupyter Notebook and ran the following code. It took a terrible 1 minute to execute (the actual first time I did this it took 225 s).
t = time()
using Printf
using BenchmarkTools
using OrdinaryDiffEq
using Plots
tt = time() - t
#sprintf("It took %f seconds to import Printf, BenchmarkTools, OrdinaryDiffEq and Plots.", tt)
# It took 58.545894 seconds to import Printf, BenchmarkTools, OrdinaryDiffEq and Plots.
Finally, I did the same as above, but for each package separately. This is the summary:
Printf: 0.004755973815917969
BenchmarkTools: 0.06729602813720703
Plots: 19.99405598640442
OrdinaryDiffEq: 19.001102209091187
I know from here that Pkg was slow in the past, but I think 15 minutes is not a normal installation time at all. However, this is not my big problem.
I know that Julia needs to compile everything every time the kernel is started or some package is loaded. But this is obviously not a compilation time, it's a compilation eternity.
Can anyone figure out why this is so terribly slow? And, if it's normal, wouldn't it be better to provide precompiled packages to Pkg, the way numpy and friends are precompiled in Python? Or at least compile everything once and for all during the first using?
Thank you!
My complete Platform Info:
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
This problem is generally called latency or time-to-first-plot (TTFP) in the Julia community; you can find several discussions using these keywords.
A nice recent analysis of the problem is given in the article "Analyzing sources of compiler latency in Julia: method invalidations".
At the time of writing (end of 2020, stable release v1.5.3), no general solution is available, but strategies of ahead-of-time precompilation of packages instead of pure JIT compilation are being discussed, with marginal success so far.
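One such precompilation strategy is building a custom system image with PackageCompiler.jl. A minimal sketch, assuming the package list and output path below (adjust them to your own environment):
using PackageCompiler

# Bake the slow-to-load packages into a custom system image (slow, but only done once).
create_sysimage([:Plots, :OrdinaryDiffEq]; sysimage_path="sys_diffeq.so")

# Afterwards start Julia (or point the Jupyter kernel at it) with:
#   julia --sysimage sys_diffeq.so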

What's the most effective function in julia to do a task inside a loop?

Good day everyone,
basically I have this:
@sync @parallel for i in 1:100
    V[i] = i
end
And I really have no problems at all. I'm working with Julia 0.6.4 and I've noticed that the @parallel macro doesn't exist in the new releases. So my question is: is there a more efficient way to run that simple task in parallel, maybe in other versions of Julia? Does @distributed do the same?
Thanks.
If you're updating code from 0.6 to 1.0, step through 0.7 first. It's literally exactly the same as 1.0, but with friendly warnings that tell you how to update your code! And yes, in this case, it'll tell you to use @distributed instead of @parallel and to load the Distributed standard library.
julia> @sync @parallel for i in 1:100
           V[i] = i
       end
WARNING: Base.@parallel is deprecated: it has been moved to the standard library package `Distributed`.
Add `using Distributed` to your imports.
in module Main
┌ Warning: `@parallel` is deprecated, use `@distributed` instead.
│ caller = eval(::Module, ::Any) at boot.jl:319
└ @ Core ./boot.jl:319
This was simply a rename, and a rename for a good reason: there are many forms of parallelism and the "most effective" form of parallelism for you will depend upon what task you're doing (in particular its runtime and IO) and the hardware you have available.
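For completeness, a rough 1.0-style equivalent of the snippet above; the worker count is arbitrary, and a SharedArray is assumed so that every process can write into V:
using Distributed
addprocs(4)                       # example: add 4 worker processes
@everywhere using SharedArrays

V = SharedArray{Int}(100)         # shared, so writes from all workers are visible
@sync @distributed for i in 1:100
    V[i] = i
end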

When will compile_to_c with vector types be supported?

I've added a call to Pipeline::compile_to_c() at line 93 of conv_layer.cpp to get the C code generated by Halide.
std::vector<Argument> empty_arg;
// p is defined like "Pipeline p(f_ReLU);"
p.compile_to_c("conv_layer.out.cpp", empty_arg, "f_ReLU");
Building conv_layer.cpp and then running it causes an assertion error at CodeGen_C.cpp:212:
Can't use vector types when compiling to C (yet)
It's very low priority - it would take a lot of work to make it portable, and for not much payoff. Code generated by the C backend is slower to compile and slower to run than code generated via the LLVM backends, so it's not suitable for actually getting high-performance code. I'm not entirely sure, but I believe it's slower because we can't easily express all the aliasing and alignment info in the emitted C code that we can in LLVM bitcode.

golang tool pprof not working properly - same broken output regardless of profiling target

I've previously used the pprof tool without issue and it worked great - now I see output like the following no matter what I profile:
The application being profiled in this example probably makes 40+ function calls, and even more complex apps produce similarly sparse call graphs for both CPU and memory profiling.
The apps I'm trying to profile are all web applications. I profile them for one minute at a time and use wrk to generate 200,000,000+ requests, all returning data and a 2xx response.
pprof suddenly stopped working a few days ago on OS X Yosemite; in an attempt to resolve the issue I recently upgraded to El Capitan, but the result is the same.
Note: this is not just call graphs - the list and top commands produce similarly barren results, but the apps themselves work fine:
(pprof) top
269.97kB of 269.97kB total ( 100%)
flat flat% sum% cum cum%
269.97kB 100% 100% 269.97kB 100%
(pprof)
I am using the following package: "github.com/davecheney/profile" with go v1.5.1
For clarity, here's what I'm doing to generate the profiles:
I import the above package into main.go and place the following at the top of my main func:
defer profile.Start(profile.MemProfile).Stop()
I then build the binary and run it:
go build -o orig /Users/danielwall/www/netlistener/application/adrequest.go /Users/danielwall/www/netlistener/application/cookie.go /Users/danielwall/www/netlistener/application/header.go /Users/danielwall/www/netlistener/application/lex.go /Users/danielwall/www/netlistener/application/main.go /Users/danielwall/www/netlistener/application/publisher_ids.go /Users/danielwall/www/netlistener/application/request.go /Users/danielwall/www/netlistener/application/response.go /Users/danielwall/www/netlistener/application/server.go /Users/danielwall/www/netlistener/application/sniff.go /Users/danielwall/www/netlistener/application/status.go /Users/danielwall/www/netlistener/application/transfer.go
./orig
I then see output like this:
2015/11/16 11:39:49 profile: memory profiling enabled, /var/folders/26/2sj70_sn72l_93j7tf6r07gr0000gn/T/profile614358295/mem.pprof
Now I exercise the app from another terminal:
wrk -d60 -c10 -H "X-Device: desktop" -H "X-Country-Code: GB" "http://localhost:8189/app?id=111&schema=xml2&ad_type=auto&url=http://test.com/&category=bob"
Running 1m test @ http://localhost:8189/app?id=111&schema=xml2&ad_type=auto&url=http://test.com/&category=bob
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 414.09us 0.92ms 55.36ms 95.66%
Req/Sec 17.57k 3.19k 22.59k 76.00%
2097764 requests in 1.00m, 684.20MB read
Requests/sec: 34958.03
Transfer/sec: 11.40MB
After 60 seconds, I go back to check my profile:
^C2015/11/16 12:05:20 profile: caught interrupt, stopping profiles
go tool pprof /var/folders/26/2sj70_sn72l_93j7tf6r07gr0000gn/T/profile614358295/mem.pprof
Any ideas what might be happening here, or where I could start troubleshooting? Any help or suggestions welcome.
Your go tool pprof call is missing the binary itself. Call it as
go tool pprof ./orig /path/to/profile.pprof
