How to extract a number from a string in shell

I'd like to extract the number of seconds from the "Total execution time:" line in shell, ideally with a regex. Could someone help me with this?
Here is the target string:
Val Loss: 20.032490309035197
Val Accuracy: 0.13
SystemML Statistics:
Total elapsed time: 80.698 sec.
Total compilation time: 1.325 sec.
Total execution time: 79.373 sec.
Number of compiled MR Jobs: 0.
Number of executed MR Jobs: 0.
Cache hits (Mem, WB, FS, HDFS): 141449/0/0/2.
Cache writes (WB, FS, HDFS): 22097/0/0.
Cache times (ACQr/m, RLS, EXP): 0.151/0.024/0.285/0.000 sec.
HOP DAGs recompiled (PRED, SB): 0/1802.
HOP DAGs recompile time: 1.649 sec.
Functions recompiled: 1.
Functions recompile time: 0.006 sec.
Paramserv func number of workers: 1.
Paramserv func total gradients compute time: 38.000 secs.
Paramserv func total aggregation time: 29.604 secs.
Paramserv func model broadcasting time: 0.008 secs.
Paramserv func total batch slicing time: 0.000 secs.
Total JIT compile time: 20.714 sec.
Total JVM GC count: 228.
Total JVM GC time: 3.195 sec.

You may use this awk command:
awk '/Total execution time:/{print $(NF-1)}' file
79.373

Related

Calculate CPU usage from process.cpu.time

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/internal/scraper/processscraper/documentation.md
I have been using this library, which gives me 3 values for a single process:
user time, system time & wait time.
One example reading is: 0.05, 0.01, 0.00.
How can I calculate the CPU percent of a particular process?
To calculate CPU load/utilization over a period, you compute ("system CPU time used during the period" + "user CPU time used during the period") / "length of the period".
In your case, suppose you take a sample every 2 seconds; then for each sample you calculate:
= ( (process.cpu.time.sys - previous_process.cpu.time.sys) + (process.cpu.time.user - previous_process.cpu.time.user) ) / 2
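As a worked example (using your sample reading as the previous value, and made-up numbers for the next one): if the previous sample was user = 0.05 s, sys = 0.01 s and two seconds later you read user = 0.45 s, sys = 0.05 s, the utilization over that window is ((0.05 - 0.01) + (0.45 - 0.05)) / 2 = 0.22, i.e. about 22% of one core. Multiply by 100 for a percentage, and divide by the number of cores if you want a whole-machine figure.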

get_surface().blit() vs var_screen.blit()

I am wondering: is it more performant to save your pygame "window" surface in a variable and call blit on that variable for every image, or to call pygame.display.get_surface().blit(...) every time?
Games in particular blit lots and lots of PNGs/sprites every frame, so I was wondering whether anyone has experience with the performance of calling the function each time versus saving the "screen" in a variable.
Example one with variable:
screen = pygame.display.get_surface()
while True:
    screen.blit(my_image, (0, 0))
Example two:
while True:
    pygame.display.get_surface().blit(my_image, (0, 0))
Best regards,
Cribber
I took the advice and ran a performance test myself; performance-wise it apparently really doesn't matter.
So with readability as the deciding factor, I will take the variable option with screen.blit().
# assumes pygame is initialised and screen, png, x, y are defined beforehand
i = 0
start_time = time.time()
while i <= 1000:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.quit()
            sys.exit()
    # pygame.display.get_surface().blit(png, (x, y))
    screen.blit(png, (x, y))
    pygame.display.flip()
    i += 1
elapsed_time = time.time() - start_time
1) display.get_surface().blit()
Sum = 14.827981948852539
x100 - Elapsed time: 0.31400012969970703
x1000 - Elapsed time: 2.9339892864227295
x1000 - Elapsed time: 2.897007465362549
x1000 - Elapsed time: 2.9139883518218994
x1000 - Elapsed time: 2.834001064300537
x1000 - Elapsed time: 2.934995651245117
2) screen.blit()
Sum = 14.843550443649292
x100 - Elapsed time: 0.2919886112213135
x1000 - Elapsed time: 2.8539986610412598
x1000 - Elapsed time: 2.914994239807129
x1000 - Elapsed time: 2.926569938659668
x1000 - Elapsed time: 2.9420039653778076
x1000 - Elapsed time: 2.9139950275421143

inv() versus '\' in Julia

I am working on a program that requires me to assign and invert a large 100 x 100 matrix 𝐌 each time, in a loop that repeats roughly 1000 times.
I was originally using the inv() function, but since it is taking a lot of time, I want to optimize my program to make it run faster. Hence I wrote some dummy code as a test of what could be slowing things down:
using LinearAlgebra   # needed for Diagonal on Julia >= 0.7

function test1()
    for i in (1:100)
        𝐌=Diagonal(rand(100))
        inverse_𝐌=inv(B)
    end
end
using BenchmarkTools
@benchmark test1()
---------------------------------
BenchmarkTools.Trial:
memory estimate: 178.13 KiB
allocs estimate: 400
--------------
minimum time: 67.991 μs (0.00% GC)
median time: 71.032 μs (0.00% GC)
mean time: 89.125 μs (19.43% GC)
maximum time: 2.490 ms (96.64% GC)
--------------
samples: 10000
evals/sample: 1
When I use the '\' operator to evaluate the inverse:
function test2()
    for i in (1:100)
        𝐌=Diagonal(rand(100))
        inverse_𝐌=𝐌\Diagonal(ones(100))
    end
end
using BenchmarkTools
@benchmark test2()
-----------------------------------
BenchmarkTools.Trial:
memory estimate: 267.19 KiB
allocs estimate: 600
--------------
minimum time: 53.728 μs (0.00% GC)
median time: 56.955 μs (0.00% GC)
mean time: 84.430 μs (30.96% GC)
maximum time: 2.474 ms (96.95% GC)
--------------
samples: 10000
evals/sample: 1
I can see that inv() is taking less memory than the '\' operator, although the '\' operator is faster in the end.
Is this because I am using an extra identity matrix, Diagonal(ones(100)), in test2()? Does it mean that every time I run my loop, a new portion of memory is being allocated to store the identity matrix?
My original matrix 𝐌 is a tridiagonal matrix. Does inverting a matrix with such a large number of zeros cost more memory allocation? For such a matrix, what is better to use: inv(), the '\' operator, or some other strategy entirely?
P.S.: How does inverting matrices in Julia compare to other languages like C and Python? When I ran the same algorithm in my older program written in C, it took considerably less time, so I was wondering if the inv() function was the culprit here.
EDIT:
So as pointed out, I had made a typo while typing in the test1() function. It's actually:
function test1()
    for i in (1:100)
        𝐌=Diagonal(rand(100))
        inverse_𝐌=inv(𝐌)
    end
end
However, my problem stayed the same: the test1() function allocates less memory but takes more time:
using BenchmarkTools
@benchmark test1()
BenchmarkTools.Trial:
memory estimate: 178.13 KiB
allocs estimate: 400
--------------
minimum time: 68.640 μs (0.00% GC)
median time: 71.240 μs (0.00% GC)
mean time: 90.468 μs (20.23% GC)
maximum time: 3.455 ms (97.41% GC)
samples: 10000
evals/sample: 1
using BenchmarkTools
@benchmark test2()
BenchmarkTools.Trial:
memory estimate: 267.19 KiB
allocs estimate: 600
--------------
minimum time: 54.368 μs (0.00% GC)
median time: 57.162 μs (0.00% GC)
mean time: 86.380 μs (31.68% GC)
maximum time: 3.021 ms (97.52% GC)
--------------
samples: 10000
evals/sample: 1
I also tested some other variants of the test2() function:
function test3()
    for i in (1:100)
        𝐌=Diagonal(rand(100))
        𝐈=Diagonal(ones(100))
        inverse_𝐌=𝐌\𝐈
    end
end

function test4(𝐈)
    for i in (1:100)
        𝐌=Diagonal(rand(100))
        inverse_𝐌=𝐌\𝐈
    end
end
using BenchmarkTools
@benchmark test3()
BenchmarkTools.Trial:
memory estimate: 267.19 KiB
allocs estimate: 600
--------------
minimum time: 54.248 μs (0.00% GC)
median time: 57.120 μs (0.00% GC)
mean time: 86.628 μs (32.01% GC)
maximum time: 3.151 ms (97.23% GC)
--------------
samples: 10000
evals/sample: 1
using BenchmarkTools
@benchmark test4(Diagonal(ones(100)))
BenchmarkTools.Trial:
memory estimate: 179.02 KiB
allocs estimate: 402
--------------
minimum time: 48.556 μs (0.00% GC)
median time: 52.731 μs (0.00% GC)
mean time: 72.193 μs (25.48% GC)
maximum time: 3.015 ms (97.32% GC)
--------------
samples: 10000
evals/sample: 1
test2() and test3() are equivalent. I realised I could avoid the extra memory allocation in test2() by passing the identity matrix in as an argument instead, as done in test4(). It also speeds up the function.
The question you ask is tricky and depends on the context. I can answer you within your context, but if you post your real problem the answer might change.
So, regarding your question: the codes are not equivalent, because in the first you use some matrix B in inv(B), which is undefined (and is probably a global, type-unstable variable). If you change B to 𝐌, the first code is actually a bit faster:
julia> function test1()
           for i in (1:100)
               𝐌=Diagonal(rand(100))
               inverse_𝐌=inv(𝐌)
           end
       end
test1 (generic function with 1 method)
julia> function test2()
           for i in (1:100)
               𝐌=Diagonal(rand(100))
               inverse_𝐌=𝐌\Diagonal(ones(100))
           end
       end
test2 (generic function with 1 method)
julia> using BenchmarkTools
julia> @benchmark test1()
BenchmarkTools.Trial:
memory estimate: 178.13 KiB
allocs estimate: 400
--------------
minimum time: 28.273 μs (0.00% GC)
median time: 32.900 μs (0.00% GC)
mean time: 43.447 μs (14.28% GC)
maximum time: 34.779 ms (99.70% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark test2()
BenchmarkTools.Trial:
memory estimate: 267.19 KiB
allocs estimate: 600
--------------
minimum time: 28.273 μs (0.00% GC)
median time: 33.928 μs (0.00% GC)
mean time: 45.907 μs (15.25% GC)
maximum time: 34.718 ms (99.74% GC)
--------------
samples: 10000
evals/sample: 1
Now, the second thing is that your code uses diagonal matrices, and Julia is smart enough to have specialized methods for inv and \ for this kind of matrix. Their definitions are as follows:
(\)(Da::Diagonal, Db::Diagonal) = Diagonal(Da.diag .\ Db.diag)

function inv(D::Diagonal{T}) where T
    Di = similar(D.diag, typeof(inv(zero(T))))
    for i = 1:length(D.diag)
        if D.diag[i] == zero(T)
            throw(SingularException(i))
        end
        Di[i] = inv(D.diag[i])
    end
    Diagonal(Di)
end
And you can see that such an example is not fully representative of the general case (if the matrices were not diagonal, other methods would be used). You can check which methods are used like this:
julia> @which 𝐌\Diagonal(ones(100))
\(Da::Diagonal, Db::Diagonal) in LinearAlgebra at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\LinearAlgebra\src\diagonal.jl:493
julia> @which inv(𝐌)
inv(D::Diagonal{T,V} where V<:AbstractArray{T,1}) where T in LinearAlgebra at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\LinearAlgebra\src\diagonal.jl:496
and look up the code yourself.
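As a quick sanity check of that specialization, here is a toy example of my own (not from the original benchmarks) showing that both operations stay within the Diagonal type:
julia> using LinearAlgebra   # Diagonal and its inv/\ specializations live here on Julia >= 0.7

julia> inv(Diagonal([2.0, 4.0])) == Diagonal([0.5, 0.25])
true

julia> Diagonal([2.0, 4.0]) \ Diagonal(ones(2)) == Diagonal([0.5, 0.25])
true
Neither call ever builds a dense matrix, which is why both benchmarks above are so cheap.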
I assume that in your real exercise you do not have diagonal matrices. In particular, if you have block matrices you might have a look at the https://github.com/JuliaArrays/BlockArrays.jl package, as it might have optimized methods for your use case. You might also have a look at https://github.com/JuliaMatrices/BandedMatrices.jl.
In summary: you can expect Julia to highly optimize the code for the specific use case, so the detailed specification of your problem matters for getting a definitive answer. If you share it, a more specific answer can be given.
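Since the question mentions that the real 𝐌 is tridiagonal, here is a minimal sketch (my own illustration, assuming the standard LinearAlgebra Tridiagonal type and made-up data, not the original poster's code) of keeping the structure and solving with \ instead of materializing inv(𝐌), which is dense in general:
using LinearAlgebra

n  = 100
dl = rand(n - 1)        # subdiagonal (made-up data)
d  = rand(n) .+ 2.0     # main diagonal, shifted to keep the matrix well conditioned
du = rand(n - 1)        # superdiagonal
𝐌  = Tridiagonal(dl, d, du)
b  = rand(n)

x_solve = 𝐌 \ b          # specialized tridiagonal solve, never forms the inverse
x_inv   = inv(𝐌) * b     # builds a dense 100×100 inverse first, far more allocation
If many right-hand sides share the same 𝐌, factorizing once with lu(𝐌) (or factorize(𝐌)) and reusing the factorization is usually faster still; a full dense inverse is only worth computing when you genuinely need all of its entries.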

Why is iterating through a flattened iterator slow?

Scala 2.11.8
I'm measuring iteration through flattened and non-flattened iterators. I wrote the following JMH benchmark:
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole

@State(Scope.Benchmark)
class SerializeBenchmark {
  var list = List(
    List("test", 12, 34, 56),
    List("test-test-test", 123, 444, 0),
    List("test-test-test-tes", 145, 443, 4333),
    List("testdsfg-test-test-tes", 3145, 435, 333),
    List("test-tessdfgsdt-tessdfgt-tes", 1455, 43, 333),
    List("tesewrt-test-tessdgdsft-tes", 13345, 4533, 3222333),
    List("ewrtes6yhgfrtyt-test-test-tes", 122245, 433444, 322233),
    List("tserfest-test-testtryfgd-tes", 143345, 43, 3122233),
    List("test-reteytest-test-tes", 1121145, 4343, 3331212),
    List("test-test-ertyeu6test-tes", 14115, 4343, 33433),
    List("test-lknlkkn;lkntest-ertyeu6test-tes", 98141115, 4343, 33433),
    List("tkknknest-test-ertyeu6test-tes", 914111215, 488343, 33433),
    List("test-test-ertyeu6test-tes", 1411125, 437743, 93433),
    List("test-test-ertyeu6testo;kn;lkn;lk-tes", 14111215, 5409343, 39823),
    List("telnlkkn;lnih98st-test-ertyeu6test-tes", 1557215, 498343, 3377433)
  )

  @Benchmark
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  @BenchmarkMode(Array(Mode.AverageTime))
  def flattenerd(bh: Blackhole): Any = {
    list.iterator.flatten.foreach(bh.consume)
  }

  @Benchmark
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  @BenchmarkMode(Array(Mode.AverageTime))
  def raw(bh: Blackhole): Any = {
    list.iterator.foreach(_.foreach(bh.consume))
  }
}
After running these benchmarks several times I got the following results:
Benchmark Mode Cnt Score Error Units
SerializeBenchmark.flattenerd avgt 5 10311,373 ± 1189,448 ns/op
SerializeBenchmark.raw avgt 5 3463,902 ± 141,145 ns/op
Almost a 3x difference in performance, and the larger I make the source list, the bigger the difference. Why?
I expected some performance difference, but not 3 times.
I re-ran your test with a few more iterations, running under the hs_gc profiler.
These are the results:
[info] Benchmark Mode Cnt Score Error Units
[info] IteratorFlatten.flattenerd avgt 50 0.708 ± 0.120 us/op
[info] IteratorFlatten.flattenerd:·sun.gc.collector.0.invocations avgt 50 8.840 ± 2.259 ?
[info] IteratorFlatten.raw avgt 50 0.367 ± 0.014 us/op
[info] IteratorFlatten.raw:·sun.gc.collector.0.invocations avgt 50 0 ?
IteratorFlatten.flattenerd had an average of 8 GC cycles during the test runs, where raw had 0. This means that the allocation noise generated by FlattenOps (the wrapper class and its methods, particularly hasNext, which allocates an iterator per list), which is what is needed in order to provide the flatten method on Iterator, hurts the running time.
If I re-run the test and give it a minimum heap size of 2G, the results get closer:
[info] Benchmark Mode Cnt Score Error Units
[info] IteratorFlatten.flattenerd avgt 50 0.615 ± 0.041 us/op
[info] IteratorFlatten.raw avgt 50 0.434 ± 0.064 us/op
The gist of it is: the more you allocate, the more work the GC has to do, the more pauses, and the slower the execution.
Note that these kinds of microbenchmarks are very fragile and may yield different results. Make sure you measure enough allocations for the stats to become significant.

Why is indexing a large matrix 170x slower in Julia 0.5.0 than 0.4.7?

Indexing large matrices seems to be taking FAR longer in 0.5 and 0.6 than in 0.4.7.
For instance:
x = rand(10,10,100,4,4,1000) #Dummy array
tic()
r = squeeze(mean(x[:,:,1:80,:,:,56:800],(1,2,3,4,5)),(1,2,3,4,5))
toc()
Julia 0.5.0 -> elapsed time: 176.357068283 seconds
Julia 0.4.7 -> elapsed time: 1.19991952 seconds
Edit: as requested, I've updated the benchmark to use BenchmarkTools.jl and to wrap the code in a function:
using BenchmarkTools
function testf(x)
    r = squeeze(mean(x[:,:,1:80,:,:,56:800],(1,2,3,4,5)),(1,2,3,4,5));
end
x = rand(10,10,100,4,4,1000) #Dummy array
@benchmark testf(x)
In 0.5.0 I get the following (with huge memory usage):
BenchmarkTools.Trial:
samples: 1
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 23.36 gb
allocs estimate: 1043200022
minimum time: 177.94 s (1.34% GC)
median time: 177.94 s (1.34% GC)
mean time: 177.94 s (1.34% GC)
maximum time: 177.94 s (1.34% GC)
In 0.4.7 I get:
BenchmarkTools.Trial:
samples: 11
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 727.55 mb
allocs estimate: 79
minimum time: 425.82 ms (0.06% GC)
median time: 485.95 ms (11.31% GC)
mean time: 482.67 ms (10.37% GC)
maximum time: 503.27 ms (11.22% GC)
Edit: Updated to use sub in 0.4.7 and view in 0.5.0
using BenchmarkTools
function testf(x)
    r = mean(sub(x, :, :, 1:80, :, :, 56:800));
end
x = rand(10,10,100,4,4,1000) #Dummy array
@benchmark testf(x)
In 0.5.0 it ran for >20 mins and gave:
BenchmarkTools.Trial:
samples: 1
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 53.75 gb
allocs estimate: 2271872022
minimum time: 407.64 s (1.32% GC)
median time: 407.64 s (1.32% GC)
mean time: 407.64 s (1.32% GC)
maximum time: 407.64 s (1.32% GC)
In 0.4.7 I get:
BenchmarkTools.Trial:
samples: 5
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 1.28 kb
allocs estimate: 34
minimum time: 1.15 s (0.00% GC)
median time: 1.16 s (0.00% GC)
mean time: 1.16 s (0.00% GC)
maximum time: 1.18 s (0.00% GC)
This seems repeatable on other machines, so an issue has been opened: https://github.com/JuliaLang/julia/issues/19174
EDIT 17 March 2017: This regression is fixed in Julia v0.6.0. The discussion below still applies if you are using older versions of Julia.
Try running this crude script in both Julia v0.4.7 and v0.5.0 (change sub to view):
using BenchmarkTools
function testf()
    # set seed
    srand(2016)
    # test array
    x = rand(10,10,100,4,4,1000)
    # extract array view
    y = sub(x, :, :, 1:80, :, :, 56:800)    # julia v0.4
    # y = view(x, :, :, 1:80, :, :, 56:800) # julia v0.5
    # wrap mean(y) into a function
    z() = mean(y)
    # benchmark array mean
    @time z()
    @time z()
end
testf()
My machine:
julia> versioninfo()
Julia Version 0.4.7
Commit ae26b25 (2016-09-18 16:17 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.3
My output, Julia v0.4.7:
1.314966 seconds (246.43 k allocations: 11.589 MB)
1.017073 seconds (1 allocation: 16 bytes)
My output, Julia v0.5.0:
417.608056 seconds (2.27 G allocations: 53.749 GB, 0.75% gc time)
410.918933 seconds (2.27 G allocations: 53.747 GB, 0.72% gc time)
It would seem that you may have discovered a performance regression. Consider filing an issue.
