How to find memory and runtime used by a NuSMV model - performance

Given a NuSMV model, how to find its runtime and how much memory it consumed?
So the runtime can be found using this command at system prompt: /usr/bin/time -f "time %e s" NuSMV filename.smv
The above gives the wall-clock time. Is there a better way to obtain runtime statistics from within NuSMV itself?
Also how to find out how much RAM memory the program used during its processing of the file?

One possibility is to use the usage command, which displays both the amount of RAM currently being used, as well as the User and the System time used by the tool since when it was started (thus, usage should be called both before and after each operation which you want to profile).
An example execution:
NuSMV > usage
Runtime Statistics
------------------
Machine name: *****
User time 0.005 seconds
System time 0.005 seconds
Average resident text size = 0K
Average resident data+stack size = 0K
Maximum resident size = 6932K
Virtual text size = 8139K
Virtual data size = 34089K
data size initialized = 3424K
data size uninitialized = 178K
data size sbrk = 30487K
Virtual memory limit = -2147483648K (-2147483648K)
Major page faults = 0
Minor page faults = 2607
Swaps = 0
Input blocks = 0
Output blocks = 0
Context switch (voluntary) = 9
Context switch (involuntary) = 0
NuSMV > reset; read_model -i nusmvLab.2018.06.07.smv ; go ; check_property ; usage
-- specification (L6 != pc U cc = len) IN mm is true
-- specification F (min = 2 & max = 9) IN mm is true
-- specification G !((((max > arr[0] & max > arr[1]) & max > arr[2]) & max > arr[3]) & max > arr[4]) IN mm is true
-- invariant max >= min IN mm is true
Runtime Statistics
------------------
Machine name: *****
User time 47.214 seconds
System time 0.284 seconds
Average resident text size = 0K
Average resident data+stack size = 0K
Maximum resident size = 270714K
Virtual text size = 8139K
Virtual data size = 435321K
data size initialized = 3424K
data size uninitialized = 178K
data size sbrk = 431719K
Virtual memory limit = -2147483648K (-2147483648K)
Major page faults = 1
Minor page faults = 189666
Swaps = 0
Input blocks = 48
Output blocks = 0
Context switch (voluntary) = 12
Context switch (involuntary) = 145

Related

High CPU load on SYN flood

When being under SYN flood attack, my CPU reach to 100% in no time by the kernel proccess named ksoftirqd,
I tried so many mitigations but none solve the problem.
This is my sysctl configurations returned by the sysctl -p:
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
fs.file-max = 10000000
fs.nr_open = 10000000
net.core.somaxconn = 128
net.core.netdev_max_backlog = 2500
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 2048
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_tw_reuse = 1
net.netfilter.nf_conntrack_max = 10485760
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 15
vm.swappiness = 10
net.ipv4.icmp_echo_ignore_all = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_synack_retries = 1
Even after activating the Syn cookies, the CPU stays the same,
The Listen queue of port 443 (the port under attack) is showing 512 SYN_RECV, which is the default backlog limit set by the NGINX.
Which is also wired because the SOMAXCONN is set to a much lower value than 512 (128), so how does it exceed that limit?
SOMAXCONN needs to be the upper boundary for every socket listen and its not..
I read so much and I'm confused,
As far as I understood the SOMAXCONN is the backlog size for both LISTEN and ACCECPT queues,
so what exactly is the tcp_max_syn_backlog?
And how do I calculate each queue size?
I also read that SYN cookies does not activate immediately, but only after reaching the tcp_max_syn_backlog size, is that true?
And if so, it means its value needs to be lower than the SOMAXCONN..
I tried even activating tcp_abort_on_overflow when being under attack but nothing changed,
if its true that the SYN coockies is activate on overflow, applying them togerther result what?
I have 3 gigs of ram that is using only 700MB, my only problem is the CPU load

JCS 1.3 - pre-load cache from disk

I am using Indexed disk cache and JCS 1.3. When I restart, the JCS cache does not seem to pre-load data, instead it does lazy initialization of the cache.
On startup, the stats are as below:
Region Name = triplet_set_1
HitCountRam = 0
HitCountAux = 0
---------------------------LRU Memory Cache
List Size = 0
Map Size = 0
Put Count = 0
Hit Count = 0
Miss Count = 0
---------------------------Indexed Disk Cache
Is Alive = true
Key Map Size = 138832
Data File Length = 72470304
Hit Count = 0
Bytes Free = 0
Optimize Operation Count = 1
Times Optimized = 0
Recycle Count = 0
Recycle Bin Size = 0
Startup Size = 138832
Purgatory Hits = 0
Purgatory Size = 0
Working = true
Alive = false
Empty = true
Size = 0
Region Name = triplet_set_1
HitCountRam = 200
HitCountAux = 100
I was hoping to see a high map size given the fact that data file length is significant.
Thanks a lot

golang slice allocation performance

I stumbled upon an interesting thing while checking performance of memory allocation in GO.
package main
import (
"fmt"
"time"
)
func main(){
const alloc int = 65536
now := time.Now()
loop := 50000
for i := 0; i<loop;i++{
sl := make([]byte, alloc)
i += len(sl) * 0
}
elpased := time.Since(now)
fmt.Printf("took %s to allocate %d bytes %d times", elpased, alloc, loop)
}
I am running this on a Core-i7 2600 with go version 1.6 64bit (also same results on 32bit) and 16GB of RAM (on WINDOWS 10)
so when alloc is 65536 (exactly 64K) it runs for 30 seconds (!!!!).
When alloc is 65535 it takes ~200ms.
Can someone explain this to me please?
I tried the same code at home with my core i7-920 # 3.8GHZ but it didn't show same results (both took around 200ms). Anyone has an idea what's going on?
Setting GOGC=off improved performance (down to less than 100ms). Why?
becaue of escape analysis. When you build with go build -gcflags -m the compiler prints whatever allocations escapes to heap. It really depends on your machine and GO compiler version but when the compiler decides that the allocation should move to heap it means 2 things:
1. the allocation will take longer (since "allocating" on the stack is just 1 cpu instruction)
2. the GC will have to clean up that memory later - costing more CPU time
for my machine, the allocation of 65536 bytes escapes to heap and 65535 doesn't.
that's why 1 bytes changed the whole proccess from 200ms to 30s. Amazing..
Note/Update 2021: as Tapir Liui notes in Go101 with this tweet:
As of Go 1.17, Go runtime will allocate the elements of slice x on stack if the compiler proves they are only used in the current goroutine and N <= 64KB:
var x = make([]byte, N)
And Go runtime will allocate the array y on stack if the compiler proves it is only used in the current goroutine and N <= 10MB:
var y [N]byte
Then how to allocated (the elements of) a slice which size is larger than 64KB but not larger than 10MB on stack (and the slice is only used in one goroutine)?
Just use the following way:
var y [N]byte
var x = y[:]
Considering stack allocation is faster than heap allocation, that would have a direct effect on your test, for alloc equals to 65536 and more.
Tapir adds:
In fact, we could allocate slices with arbitrary sum element sizes on stack.
const N = 500 * 1024 * 1024 // 500M
var v byte = 123
func createSlice() byte {
var s = []byte{N: 0}
for i := range s { s[i] = v }
return s[v]
}
Changing 500 to 512 make program crash.
the reason is very simple.
const alloc int = 65535
0x0000 00000 (example.go:8) TEXT "".main(SB), ABIInternal, $65784-0
const alloc int = 65536
0x0000 00000 (example.go:8) TEXT "".main(SB), ABIInternal, $248-0
the difference is where the slice are created.

Maximum value of PCR

What is the maximum value of Program Clock Reference(PCR) in MPEG?
I understand that it is derived from a 27MHz clock, periodically loaded into a 42bit register.
PCR(i)=PCR_Base(i) * 300 + PCR_Ext(i)
where PCR_Base is loaded into a 33 bits register
PCR_Ext is loaded into a 9-bit register.
So, the maximum value of PCR w.r.t 27MHz clock is:
PCR = (2^33 - 1)*300 + (2^9 - 1) = 2,576,980,374,811.
=> (2,576,980,374,811/27,000,000) = 95443.7s = 1590.7 min = 26.5 hours
The register overflow happens after 26.5 hours of continuous streaming. Is this understanding correct?
PCR_ext(i) value should be 0 .. 299.
So the maximum PCR = (2^33-1)*300+299 = 2,576,980,377,599

Program execution taking almost same usertime on CPU as well as GPU?

The program for finding prime numbers using OpenCL 1.1 gave the following benchmarks :
Device : CPU
Realtime : approx. 3 sec
Usertime : approx. 32 sec
Device : GPU
Realtime - approx. 37 sec
Usertime - approx. 32 sec
Why is the usertime of execution by GPU not less than that of CPU? Is data/task parallelization not occuring?
System specifications :64-bit CentOS 5.3 system with two ATI Radeon 5970 graphics card + Intel Core i7 processor(12 cores)
Your kernel is rather inefficient, I have an adjusted one below for you to consider. As to why it runs better on a cpu device...
Using your algorithm, the work items take varying amounts of time to execute. They will take longer as the numbers tested grow larger. A work group on a gpu will not finish until all of its items are finished some of the hardware will be left idle until the last item is done. On a cpu, it behaves more like a loop iterating over the kernel items, so the difference in cycles needed to compute each item won't drastically affect the performance.
'A' is not used by the kernel. It should not be copied unless it is used. It looks like you wanted to test the A[i] rather then 'i' itself though.
I think the gpu would be much better at FFT-based prime calculations, or even a sieve algorithm.
{
int t;
int i = get_global_id(0);
int end = sqrt(i);
if(i%2){
B[i] = 0;
}else{
B[i] = 1; //assuming only that it should be non-zero
}
for ( t = 3; (t<=end)&&(B[i] > 0) ; t+=2 ) {
if ( i % t == 0 ) {
B[ i ] = 0;
}
}
}

Resources