.gvs (GuideView openmp statistics) file format - performance

Is there a format of *.gvs files, used by GuideView OpenMP performance analyser?
The "guide.gvs" is generated, f.e. by intel's OpenMP'ed programmes with
$ export LD_PRELOAD=<path_to_icc_or_redist>/lib/libiompprof5.so
$ ./openmp_parallelized_prog
$ ls -l guide.gvs

It s a plain text.
Here is an example of such from very short omp programme:
$ cat guide.gvs
*** KAI statistics library k3301
*** Begin Task 0
Environment variables:
OMP_NUM_THREADS : 2
OMP_SCHEDULE : static
OMP_DYNAMIC : FALSE
OMP_NESTED : FALSE
KMP_STATSFILE : guide.gvs
KMP_STATSCOLS : 80
KMP_INTERVAL : 0
KMP_BLOCKTIME : 200
KMP_PARALLEL : 2
KMP_STACKSIZE : 2097152
KMP_STACKOFFSET : 0
KMP_SCHEDULING : <unknown>
KMP_CHUNK : <unknown>
KMP_LIBRARY : throughput
end
System parameters:
start : Wed Nov 1 12:26:52 2010
stop : Wed Nov 1 12:26:52 2010
host : localhost
ncpu : 2
end
Unix process parameters:
maxrss : 0
minflt : 440
majflt : 2
nswap : 0
inblock : 208
oublock : 0
nvcsw : 6
nivcsw : 7
end
Region counts:
serial regions : 2
barrier regions : 0
parallel regions : 1
end
Program execution time (in seconds):
cpu : 0.00 sec
elapsed : 0.04 sec
serial : 0.00 sec
parallel : 0.04 sec
cpu percent : 0.01 %
end
Summary over all regions (has 2 threads):
# Thread #0 #1
Sum Parallel : 0.036 0.027
Sum Imbalance : 0.035 0.026
Min Parallel : 0.036 0.027
Min Imbalance : 0.035 0.026
Max Parallel : 0.036 0.027
Max Imbalance : 0.035 0.026
end
Region #1 (has 2 threads) at main/9 in "/home/user/icc/omp.c":
# Thread #0 #1
Sum Parallel : 0.036 0.027
Sum Imbalance : 0.035 0.026
Min Parallel : 0.036 0.027
Min Imbalance : 0.035 0.026
Max Parallel : 0.036 0.027
Max Imbalance : 0.035 0.026
end
Region #1 (has 2 threads) profile:
# Thread Incl Excl Routine
0,0 : 0.000 0.000 main/9 "/home/user/icc/omp.c"
1,0 : 0.000 0.000 main/9 "/home/user/icc/omp.c"
end
Serial program regions:
Serial region #1 executes for 0.00 seconds
begins at START OF PROGRAM
ends before region #1 (using 2 threads) at main/9 in "/home/user/icc/omp.c"
Serial region #2 executes for 0.00 seconds
begins after region #1 (using 2 threads) at main/9 in "/home/user/icc/omp.c"
ends at END OF PROGRAM
end
Serial region #1 profile:
# Thread Incl Excl Routine
end
Serial region #2 profile:
# Thread Incl Excl Routine
end
Program events (total):
# Thread #0 #1
mppbeg : 1 0
mppend : 1 0
serial : 2 0
mppfkd : 1 0
mppfrk : 1 0
mppjoi : 1 0
mppadj : 1 0
mpptid : 51 50
end
Region #1 (has 2 threads) events:
# Thread #0 #1
mppfrk : 1 0
mppjoi : 1 0
mpptid : 50 50
end
Serial section events:
# Serial #1 #2
mppbeg : 1 0
mppend : 0 1
serial : 1 1
mppfkd : 1 0
mppadj : 1 0
mpptid : 1 0
end
*** end

Related

Why are there extra equations when using extra neurons in GEKKO?

I'm a bit puzzled on why there are extra equations introduced in the optimization problem (using GEKKO) when increasing the amount of neurons in an ANN that e.g. is used within the objective function or in the constraints. I was hoping the find the answer in this paper, but I can't seem to pinpoint the reason.
This is the log of a baseline example I made, using 2 Gekko_NN_SKlearn functions.
----------------------------------------------------------------
APMonitor, Version 1.0.0
APMonitor Optimization Suite
----------------------------------------------------------------
Called files( 55 )
files: overrides.dbs does not exist
Run id : 2022y11m03d15h47m50.642s
COMMAND LINE ARGUMENTS
coldstart: 0
imode : 3
dbs_read : T
dbs_write: T
specs : T
rto selected
Called files( 35 )
READ info FILE FOR VARIABLE DEFINITION: gk_model0.info
SS MODEL INIT 0
Parsing model file gk_model0.apm
Read model file (sec): 1.1181
Initialize constants (sec): 0.
Determine model size (sec): 0.7627999999999999
Allocate memory (sec): 0.0049000000000001265
Parse and store model (sec): 0.7630000000000001
--------- APM Model Size ------------
Each time step contains
Objects : 247
Constants : 0
Variables : 752
Intermediates: 249
Connections : 741
Equations : 745
Residuals : 496
Error checking (sec): 0.2740999999999998
Compile equations (sec): 2.8513
Check for uninitialized intermediates (sec): 0.
------------------------------------------------------
Total Parse Time (sec): 5.7744
SS MODEL INIT 1
SS MODEL INIT 2
SS MODEL INIT 3
SS MODEL INIT 4
Called files( 31 )
READ info FILE FOR PROBLEM DEFINITION: gk_model0.info
Called files( 6 )
Files(6): File Read rto.t0 F
files: rto.t0 does not exist
Called files( 51 )
Read DBS File defaults.dbs
files: defaults.dbs does not exist
Called files( 51 )
Read DBS File gk_model0.dbs
files: gk_model0.dbs does not exist
Called files( 51 )
Read DBS File measurements.dbs
Called files( 51 )
Read DBS File overrides.dbs
files: overrides.dbs does not exist
Number of state variables: 1240
Number of total equations: - 989
Number of slack variables: - 0
---------------------------------------
Degrees of freedom : 251
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter: 1 I: 0 Tm: 2.23 NLPi: 67 Dpth: 0 Lvs: 3 Obj: 6.96E-02 Gap: NaN
--Integer Solution: 1.78E-01 Lowest Leaf: 6.96E-02 Gap: 1.08E-01
Iter: 2 I: 0 Tm: 0.13 NLPi: 6 Dpth: 1 Lvs: 2 Obj: 1.78E-01 Gap: 1.08E-01
Iter: 3 I: 0 Tm: 0.33 NLPi: 5 Dpth: 1 Lvs: 2 Obj: 1.74E-01 Gap: 1.08E-01
Iter: 4 I: 0 Tm: 0.52 NLPi: 11 Dpth: 1 Lvs: 3 Obj: 9.49E-02 Gap: 1.08E-01
--Integer Solution: 1.78E-01 Lowest Leaf: 9.49E-02 Gap: 8.27E-02
Iter: 5 I: 0 Tm: 0.28 NLPi: 5 Dpth: 2 Lvs: 2 Obj: 1.04E+00 Gap: 8.27E-02
--Integer Solution: 1.23E-01 Lowest Leaf: 1.23E-01 Gap: 0.00E+00
Iter: 6 I: 0 Tm: 0.30 NLPi: 5 Dpth: 2 Lvs: 2 Obj: 1.23E-01 Gap: 0.00E+00
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 3.842599999999999 sec
Objective : 0.12267384658102941
Successful solution
---------------------------------------------------
Called files( 2 )
Called files( 52 )
WRITE dbs FILE
Called files( 56 )
WRITE json FILE
Timer # 1 10.41/ 1 = 10.41 Total system time
Timer # 2 3.84/ 1 = 3.84 Total solve time
Timer # 3 0.02/ 191 = 0.00 Objective Calc: apm_p
Timer # 4 0.01/ 99 = 0.00 Objective Grad: apm_g
Timer # 5 0.03/ 191 = 0.00 Constraint Calc: apm_c
Timer # 6 0.00/ 0 = 0.00 Sparsity: apm_s
Timer # 7 0.00/ 0 = 0.00 1st Deriv #1: apm_a1
Timer # 8 0.00/ 99 = 0.00 1st Deriv #2: apm_a2
Timer # 9 0.67/ 1 = 0.67 Custom Init: apm_custom_init
Timer # 10 0.01/ 1 = 0.01 Mode: apm_node_res::case 0
Timer # 11 0.00/ 1 = 0.00 Mode: apm_node_res::case 1
Timer # 12 0.02/ 1 = 0.02 Mode: apm_node_res::case 2
Timer # 13 0.00/ 1 = 0.00 Mode: apm_node_res::case 3
Timer # 14 0.35/ 387 = 0.00 Mode: apm_node_res::case 4
Timer # 15 1.51/ 198 = 0.01 Mode: apm_node_res::case 5
Timer # 16 0.00/ 0 = 0.00 Mode: apm_node_res::case 6
Timer # 17 0.01/ 99 = 0.00 Base 1st Deriv: apm_jacobian
Timer # 18 0.00/ 99 = 0.00 Base 1st Deriv: apm_condensed_jacobian
Timer # 19 0.00/ 1 = 0.00 Non-zeros: apm_nnz
Timer # 20 0.00/ 0 = 0.00 Count: Division by zero
Timer # 21 0.00/ 0 = 0.00 Count: Argument of LOG10 negative
Timer # 22 0.00/ 0 = 0.00 Count: Argument of LOG negative
Timer # 23 0.00/ 0 = 0.00 Count: Argument of SQRT negative
Timer # 24 0.00/ 0 = 0.00 Count: Argument of ASIN illegal
Timer # 25 0.00/ 0 = 0.00 Count: Argument of ACOS illegal
Timer # 26 0.00/ 1 = 0.00 Extract sparsity: apm_sparsity
Timer # 27 0.00/ 13 = 0.00 Variable ordering: apm_var_order
Timer # 28 0.00/ 1 = 0.00 Condensed sparsity
Timer # 29 0.00/ 0 = 0.00 Hessian Non-zeros
Timer # 30 0.00/ 1 = 0.00 Differentials
Timer # 31 0.00/ 0 = 0.00 Hessian Calculation
Timer # 32 0.00/ 0 = 0.00 Extract Hessian
Timer # 33 0.00/ 1 = 0.00 Base 1st Deriv: apm_jac_order
Timer # 34 0.02/ 1 = 0.02 Solver Setup
Timer # 35 1.87/ 1 = 1.87 Solver Solution
Timer # 36 0.00/ 202 = 0.00 Number of Variables
Timer # 37 0.00/ 105 = 0.00 Number of Equations
Timer # 38 0.02/ 14 = 0.00 File Read/Write
Timer # 39 0.00/ 0 = 0.00 Dynamic Init A
Timer # 40 0.00/ 0 = 0.00 Dynamic Init B
Timer # 41 0.00/ 0 = 0.00 Dynamic Init C
Timer # 42 1.12/ 1 = 1.12 Init: Read APM File
Timer # 43 0.00/ 1 = 0.00 Init: Parse Constants
Timer # 44 0.76/ 1 = 0.76 Init: Model Sizing
Timer # 45 0.00/ 1 = 0.00 Init: Allocate Memory
Timer # 46 0.76/ 1 = 0.76 Init: Parse Model
Timer # 47 0.27/ 1 = 0.27 Init: Check for Duplicates
Timer # 48 2.85/ 1 = 2.85 Init: Compile Equations
Timer # 49 0.00/ 1 = 0.00 Init: Check Uninitialized
Timer # 50 0.00/ 1257 = 0.00 Evaluate Expression Once
Timer # 51 0.00/ 0 = 0.00 Sensitivity Analysis: LU Factorization
Timer # 52 0.00/ 0 = 0.00 Sensitivity Analysis: Gauss Elimination
Timer # 53 0.00/ 0 = 0.00 Sensitivity Analysis: Total Time
When I change the amount of neurons of 1 function from [25,20,20,10] to [50,40,40,40] I get the following log:
----------------------------------------------------------------
APMonitor, Version 1.0.0
APMonitor Optimization Suite
----------------------------------------------------------------
Called files( 55 )
files: overrides.dbs does not exist
Run id : 2023y02m04d11h08m15.999s
COMMAND LINE ARGUMENTS
coldstart: 0
imode : 3
dbs_read : T
dbs_write: T
specs : T
rto selected
Called files( 35 )
READ info FILE FOR VARIABLE DEFINITION: gk_model0.info
SS MODEL INIT 0
Parsing model file gk_model0.apm
Read model file (sec): 1.4879
Initialize constants (sec): 0.
Determine model size (sec): 0.9460000000000002
Allocate memory (sec): 0.01529999999999987
Parse and store model (sec): 0.7256999999999998
--------- APM Model Size ------------
Each time step contains
Objects : 342
Constants : 0
Variables : 1037
Intermediates: 344
Connections : 1026
Equations : 1030
Residuals : 686
Error checking (sec): 0.2522000000000002
Compile equations (sec): 2.8817999999999997
Check for uninitialized intermediates (sec): 0.
------------------------------------------------------
Total Parse Time (sec): 6.3089
SS MODEL INIT 1
SS MODEL INIT 2
SS MODEL INIT 3
SS MODEL INIT 4
Called files( 31 )
READ info FILE FOR PROBLEM DEFINITION: gk_model0.info
Called files( 6 )
Files(6): File Read rto.t0 F
files: rto.t0 does not exist
Called files( 51 )
Read DBS File defaults.dbs
files: defaults.dbs does not exist
Called files( 51 )
Read DBS File gk_model0.dbs
files: gk_model0.dbs does not exist
Called files( 51 )
Read DBS File measurements.dbs
Called files( 51 )
Read DBS File overrides.dbs
files: overrides.dbs does not exist
Number of state variables: 1715
Number of total equations: - 1369
Number of slack variables: - 0
---------------------------------------
Degrees of freedom : 346
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter: 1 I: 0 Tm: 2.84 NLPi: 53 Dpth: 0 Lvs: 3 Obj: 2.69E-01 Gap: NaN
--Integer Solution: 4.43E-01 Lowest Leaf: 2.69E-01 Gap: 1.74E-01
Iter: 2 I: 0 Tm: 0.14 NLPi: 5 Dpth: 1 Lvs: 2 Obj: 4.43E-01 Gap: 1.74E-01
Iter: 3 I: 0 Tm: 0.55 NLPi: 6 Dpth: 1 Lvs: 2 Obj: 3.79E-01 Gap: 1.74E-01
Iter: 4 I: 0 Tm: 0.96 NLPi: 12 Dpth: 1 Lvs: 3 Obj: 4.17E-01 Gap: 1.74E-01
--Integer Solution: 4.43E-01 Lowest Leaf: 4.17E-01 Gap: 2.62E-02
Iter: 5 I: 0 Tm: 0.56 NLPi: 7 Dpth: 2 Lvs: 2 Obj: 1.18E+00 Gap: 2.62E-02
--Integer Solution: 4.43E-01 Lowest Leaf: 4.17E-01 Gap: 2.62E-02
Iter: 6 I: 0 Tm: 0.36 NLPi: 4 Dpth: 2 Lvs: 1 Obj: 1.04E+00 Gap: 2.62E-02
--Integer Solution: 4.43E-01 Lowest Leaf: 5.39E-01 Gap: -9.53E-02
Iter: 7 I: 0 Tm: 0.36 NLPi: 4 Dpth: 2 Lvs: 1 Obj: 5.39E-01 Gap: -9.53E-02
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 5.816599999999999 sec
Objective : 0.4433267264972657
Successful solution
---------------------------------------------------
Called files( 2 )
Called files( 52 )
WRITE dbs FILE
Called files( 56 )
WRITE json FILE
Timer # 1 13.21/ 1 = 13.21 Total system time
Timer # 2 5.82/ 1 = 5.82 Total solve time
Timer # 3 0.02/ 189 = 0.00 Objective Calc: apm_p
Timer # 4 0.03/ 91 = 0.00 Objective Grad: apm_g
Timer # 5 0.03/ 189 = 0.00 Constraint Calc: apm_c
Timer # 6 0.00/ 0 = 0.00 Sparsity: apm_s
Timer # 7 0.00/ 0 = 0.00 1st Deriv #1: apm_a1
Timer # 8 0.00/ 91 = 0.00 1st Deriv #2: apm_a2
Timer # 9 0.90/ 1 = 0.90 Custom Init: apm_custom_init
Timer # 10 0.01/ 1 = 0.01 Mode: apm_node_res::case 0
Timer # 11 0.01/ 1 = 0.01 Mode: apm_node_res::case 1
Timer # 12 0.03/ 1 = 0.03 Mode: apm_node_res::case 2
Timer # 13 0.00/ 1 = 0.00 Mode: apm_node_res::case 3
Timer # 14 0.33/ 383 = 0.00 Mode: apm_node_res::case 4
Timer # 15 2.01/ 182 = 0.01 Mode: apm_node_res::case 5
Timer # 16 0.00/ 0 = 0.00 Mode: apm_node_res::case 6
Timer # 17 0.06/ 91 = 0.00 Base 1st Deriv: apm_jacobian
Timer # 18 0.02/ 91 = 0.00 Base 1st Deriv: apm_condensed_jacobian
Timer # 19 0.00/ 1 = 0.00 Non-zeros: apm_nnz
Timer # 20 0.00/ 0 = 0.00 Count: Division by zero
Timer # 21 0.00/ 0 = 0.00 Count: Argument of LOG10 negative
Timer # 22 0.00/ 0 = 0.00 Count: Argument of LOG negative
Timer # 23 0.00/ 0 = 0.00 Count: Argument of SQRT negative
Timer # 24 0.00/ 0 = 0.00 Count: Argument of ASIN illegal
Timer # 25 0.00/ 0 = 0.00 Count: Argument of ACOS illegal
Timer # 26 0.00/ 1 = 0.00 Extract sparsity: apm_sparsity
Timer # 27 0.00/ 13 = 0.00 Variable ordering: apm_var_order
Timer # 28 0.00/ 1 = 0.00 Condensed sparsity
Timer # 29 0.00/ 0 = 0.00 Hessian Non-zeros
Timer # 30 0.00/ 1 = 0.00 Differentials
Timer # 31 0.00/ 0 = 0.00 Hessian Calculation
Timer # 32 0.00/ 0 = 0.00 Extract Hessian
Timer # 33 0.00/ 1 = 0.00 Base 1st Deriv: apm_jac_order
Timer # 34 0.02/ 1 = 0.02 Solver Setup
Timer # 35 3.26/ 1 = 3.26 Solver Solution
Timer # 36 0.00/ 200 = 0.00 Number of Variables
Timer # 37 0.02/ 97 = 0.00 Number of Equations
Timer # 38 0.03/ 14 = 0.00 File Read/Write
Timer # 39 0.00/ 0 = 0.00 Dynamic Init A
Timer # 40 0.00/ 0 = 0.00 Dynamic Init B
Timer # 41 0.00/ 0 = 0.00 Dynamic Init C
Timer # 42 1.49/ 1 = 1.49 Init: Read APM File
Timer # 43 0.00/ 1 = 0.00 Init: Parse Constants
Timer # 44 0.95/ 1 = 0.95 Init: Model Sizing
Timer # 45 0.02/ 1 = 0.02 Init: Allocate Memory
Timer # 46 0.73/ 1 = 0.73 Init: Parse Model
Timer # 47 0.25/ 1 = 0.25 Init: Check for Duplicates
Timer # 48 2.88/ 1 = 2.88 Init: Compile Equations
Timer # 49 0.00/ 1 = 0.00 Init: Check Uninitialized
Timer # 50 -0.00/ 1732 = -0.00 Evaluate Expression Once
Timer # 51 0.00/ 0 = 0.00 Sensitivity Analysis: LU Factorization
Timer # 52 0.00/ 0 = 0.00 Sensitivity Analysis: Gauss Elimination
Timer # 53 0.00/ 0 = 0.00 Sensitivity Analysis: Total Time
Hence, a significant amount of extra objects, variables, intermediates, connections, equations and residuals are introduced.
Many thank in advance for your replies!
the paper talks about this a little in section 3.1.3.
The prediction functions used in both TensorFlow and Scikit-learn neural networks use linear algebra to relate the layers and neurons of the neural network to one another. Each neuron in an input layer has a specific weight that corresponds to an output neuron in the following layer. Each neuron in an output layer also has a corresponding bias value as shown in Figure 1.
Figure 1
For each neuron connection, there is a bias, weight, and an additional activation function. An additional activation equation, such as the linear rectifier or hyperbolic tangent functions, is generally used to normalize the activation between 0 and 1.
As you increase the number of neurons from [25,20,20,10] to [50,40,40,40], a new weight, bias, and activation function is being introduced for each new neuron connection. These objects are represented by an increase in the number of equations, variables, etc... in the optimization problem.
Hopefully this helps!

Reduce total parse (total system) time in GEKKO

I have an RTO problem that I want to solve for multiple simulated timesteps with some time-depended parameters. However, I'm struggling with the run-time and noticed that the total system time is relatively large compared to the actual solve time. I was therefore trying to reduce the total parse time, as all the equations remain the same - yet "only" the values of some parameters change with time. A simple example below:
#parameters from simulation
demand = 100
#do RTO
from gekko import GEKKO
# first, create the model
m = GEKKO(remote=False)
# declare additional decision variables
m.u = m.Var(lb=5, ub=25)
m.v = m.Var(lb=0, ub=100)
m.w = m.Var(lb=0, ub=50)
m.b = m.Var(lb=0, ub=1, integer=True)
m.demand = m.Param(demand)
# now add the objective and the constraints
m.Minimize((1-0.8)*m.u*m.b+(1-0.9)*m.v+(1-0.7)*m.w)
m.Equation(m.u*m.b >= 10)
m.Equation(m.u*m.b + m.v + m.w == m.demand)
m.options.SOLVER=1
m.options.DIAGLEVEL = 1
m.solve()
then I capture the results, execute them in the simulation and move on to the next timestep. Now I could just, execute all code above again - with updated parameters (let's say the demand is now 110). But this results in the before mentioned long run-time (the RTO problem needs to be build from scratch every time, while only some parameters change). So I thought the following could work:
m.demand.VALUE = 110
m.solve()
While this does work. It doesn't seem to improve the run-time (total parse time is still relatively long). Below are the display outputs of the actual problem.
First time solving the RTO problem.
----------------------------------------------------------------
APMonitor, Version 1.0.0
APMonitor Optimization Suite
----------------------------------------------------------------
Called files( 55 )
files: overrides.dbs does not exist
Run id : 2022y11m03d13h18m21.919s
COMMAND LINE ARGUMENTS
coldstart: 0
imode : 3
dbs_read : T
dbs_write: T
specs : T
rto selected
Called files( 35 )
READ info FILE FOR VARIABLE DEFINITION: gk_model6.info
SS MODEL INIT 0
Parsing model file gk_model6.apm
Read model file (sec): 0.6602
Initialize constants (sec): 0.
Determine model size (sec): 0.4170999999999999
Allocate memory (sec): 0.
Parse and store model (sec): 0.45140000000000025
--------- APM Model Size ------------
Each time step contains
Objects : 247
Constants : 0
Variables : 752
Intermediates: 249
Connections : 741
Equations : 745
Residuals : 496
Error checking (sec): 0.17809999999999993
Compile equations (sec): 1.9933000000000003
Check for uninitialized intermediates (sec): 0.
------------------------------------------------------
Total Parse Time (sec): 3.7062
SS MODEL INIT 1
SS MODEL INIT 2
SS MODEL INIT 3
SS MODEL INIT 4
Called files( 31 )
READ info FILE FOR PROBLEM DEFINITION: gk_model6.info
Called files( 6 )
Files(6): File Read rto.t0 F
files: rto.t0 does not exist
Called files( 51 )
Read DBS File defaults.dbs
files: defaults.dbs does not exist
Called files( 51 )
Read DBS File gk_model6.dbs
files: gk_model6.dbs does not exist
Called files( 51 )
Read DBS File measurements.dbs
Called files( 51 )
Read DBS File overrides.dbs
files: overrides.dbs does not exist
Number of state variables: 1240
Number of total equations: - 989
Number of slack variables: - 0
---------------------------------------
Degrees of freedom : 251
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter: 1 I: 0 Tm: 1.20 NLPi: 45 Dpth: 0 Lvs: 3 Obj: 9.86E-02 Gap: NaN
--Integer Solution: 2.32E-01 Lowest Leaf: 9.86E-02 Gap: 1.34E-01
Iter: 2 I: 0 Tm: 0.06 NLPi: 4 Dpth: 1 Lvs: 2 Obj: 2.32E-01 Gap: 1.34E-01
Iter: 3 I: 0 Tm: 0.23 NLPi: 6 Dpth: 1 Lvs: 2 Obj: 2.16E-01 Gap: 1.34E-01
Iter: 4 I: 0 Tm: 0.44 NLPi: 12 Dpth: 1 Lvs: 3 Obj: 1.60E-01 Gap: 1.34E-01
--Integer Solution: 2.32E-01 Lowest Leaf: 1.60E-01 Gap: 7.21E-02
Iter: 5 I: 0 Tm: 0.20 NLPi: 6 Dpth: 2 Lvs: 2 Obj: 1.01E+00 Gap: 7.21E-02
--Integer Solution: 2.06E-01 Lowest Leaf: 2.06E-01 Gap: 0.00E+00
Iter: 6 I: 0 Tm: 0.20 NLPi: 5 Dpth: 2 Lvs: 2 Obj: 2.06E-01 Gap: 0.00E+00
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 2.3522999999999996 sec
Objective : 0.20599966381706797
Successful solution
---------------------------------------------------
Called files( 2 )
Called files( 52 )
WRITE dbs FILE
Called files( 56 )
WRITE json FILE
Timer # 1 6.57/ 1 = 6.57 Total system time
Timer # 2 2.35/ 1 = 2.35 Total solve time
Timer # 3 0.01/ 156 = 0.00 Objective Calc: apm_p
Timer # 4 0.01/ 78 = 0.00 Objective Grad: apm_g
Timer # 5 0.01/ 156 = 0.00 Constraint Calc: apm_c
Timer # 6 0.00/ 0 = 0.00 Sparsity: apm_s
Timer # 7 0.00/ 0 = 0.00 1st Deriv #1: apm_a1
Timer # 8 0.01/ 78 = 0.00 1st Deriv #2: apm_a2
Timer # 9 0.42/ 1 = 0.42 Custom Init: apm_custom_init
Timer # 10 0.00/ 1 = 0.00 Mode: apm_node_res::case 0
Timer # 11 0.00/ 1 = 0.00 Mode: apm_node_res::case 1
Timer # 12 0.02/ 1 = 0.02 Mode: apm_node_res::case 2
Timer # 13 0.00/ 1 = 0.00 Mode: apm_node_res::case 3
Timer # 14 0.17/ 317 = 0.00 Mode: apm_node_res::case 4
Timer # 15 0.72/ 156 = 0.00 Mode: apm_node_res::case 5
Timer # 16 0.00/ 0 = 0.00 Mode: apm_node_res::case 6
Timer # 17 0.01/ 78 = 0.00 Base 1st Deriv: apm_jacobian
Timer # 18 0.00/ 78 = 0.00 Base 1st Deriv: apm_condensed_jacobian
Timer # 19 0.00/ 1 = 0.00 Non-zeros: apm_nnz
Timer # 20 0.00/ 0 = 0.00 Count: Division by zero
Timer # 21 0.00/ 0 = 0.00 Count: Argument of LOG10 negative
Timer # 22 0.00/ 0 = 0.00 Count: Argument of LOG negative
Timer # 23 0.00/ 0 = 0.00 Count: Argument of SQRT negative
Timer # 24 0.00/ 0 = 0.00 Count: Argument of ASIN illegal
Timer # 25 0.00/ 0 = 0.00 Count: Argument of ACOS illegal
Timer # 26 0.00/ 1 = 0.00 Extract sparsity: apm_sparsity
Timer # 27 0.00/ 13 = 0.00 Variable ordering: apm_var_order
Timer # 28 0.00/ 1 = 0.00 Condensed sparsity
Timer # 29 0.00/ 0 = 0.00 Hessian Non-zeros
Timer # 30 0.00/ 1 = 0.00 Differentials
Timer # 31 0.00/ 0 = 0.00 Hessian Calculation
Timer # 32 0.00/ 0 = 0.00 Extract Hessian
Timer # 33 0.00/ 1 = 0.00 Base 1st Deriv: apm_jac_order
Timer # 34 0.01/ 1 = 0.01 Solver Setup
Timer # 35 1.39/ 1 = 1.39 Solver Solution
Timer # 36 0.00/ 167 = 0.00 Number of Variables
Timer # 37 0.01/ 84 = 0.00 Number of Equations
Timer # 38 0.01/ 14 = 0.00 File Read/Write
Timer # 39 0.00/ 0 = 0.00 Dynamic Init A
Timer # 40 0.00/ 0 = 0.00 Dynamic Init B
Timer # 41 0.00/ 0 = 0.00 Dynamic Init C
Timer # 42 0.66/ 1 = 0.66 Init: Read APM File
Timer # 43 0.00/ 1 = 0.00 Init: Parse Constants
Timer # 44 0.42/ 1 = 0.42 Init: Model Sizing
Timer # 45 0.00/ 1 = 0.00 Init: Allocate Memory
Timer # 46 0.45/ 1 = 0.45 Init: Parse Model
Timer # 47 0.18/ 1 = 0.18 Init: Check for Duplicates
Timer # 48 1.99/ 1 = 1.99 Init: Compile Equations
Timer # 49 0.00/ 1 = 0.00 Init: Check Uninitialized
Timer # 50 0.01/ 1257 = 0.00 Evaluate Expression Once
Timer # 51 0.00/ 0 = 0.00 Sensitivity Analysis: LU Factorization
Timer # 52 0.00/ 0 = 0.00 Sensitivity Analysis: Gauss Elimination
Timer # 53 0.00/ 0 = 0.00 Sensitivity Analysis: Total Time
Updating one parameter and only calling m.solve() again - is shown in the simple problem above.
----------------------------------------------------------------
APMonitor, Version 1.0.0
APMonitor Optimization Suite
----------------------------------------------------------------
Called files( 55 )
Called files( 55 )
files: overrides.dbs does not exist
Run id : 2022y11m03d13h18m28.729s
COMMAND LINE ARGUMENTS
coldstart: 0
imode : 3
dbs_read : T
dbs_write: T
specs : T
rto selected
Called files( 35 )
READ info FILE FOR VARIABLE DEFINITION: gk_model6.info
SS MODEL INIT 0
Parsing model file gk_model6.apm
Read model file (sec): 0.6901
Initialize constants (sec): 0.
Determine model size (sec): 0.4546999999999999
Allocate memory (sec): 0.
Parse and store model (sec): 0.2824000000000002
--------- APM Model Size ------------
Each time step contains
Objects : 247
Constants : 0
Variables : 752
Intermediates: 249
Connections : 741
Equations : 745
Residuals : 496
Error checking (sec): 0.16720000000000002
Compile equations (sec): 2.0142999999999995
Check for uninitialized intermediates (sec): 0.
------------------------------------------------------
Total Parse Time (sec): 3.6097
SS MODEL INIT 1
SS MODEL INIT 2
SS MODEL INIT 3
SS MODEL INIT 4
Called files( 31 )
READ info FILE FOR PROBLEM DEFINITION: gk_model6.info
Called files( 6 )
Files(6): File Read rto.t0 T
Called files( 51 )
Read DBS File defaults.dbs
files: defaults.dbs does not exist
Called files( 51 )
Read DBS File gk_model6.dbs
Called files( 51 )
Read DBS File measurements.dbs
Called files( 51 )
Read DBS File overrides.dbs
files: overrides.dbs does not exist
Number of state variables: 1240
Number of total equations: - 989
Number of slack variables: - 0
---------------------------------------
Degrees of freedom : 251
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter: 1 I: 0 Tm: 0.26 NLPi: 7 Dpth: 0 Lvs: 3 Obj: 9.35E-02 Gap: NaN
--Integer Solution: 1.21E-01 Lowest Leaf: 9.35E-02 Gap: 2.71E-02
Iter: 2 I: 0 Tm: 0.22 NLPi: 5 Dpth: 1 Lvs: 2 Obj: 1.21E-01 Gap: 2.71E-02
--Integer Solution: 1.21E-01 Lowest Leaf: 9.35E-02 Gap: 2.71E-02
Iter: 3 I: 0 Tm: 0.32 NLPi: 10 Dpth: 1 Lvs: 1 Obj: 1.03E+00 Gap: 2.71E-02
Iter: 4 I: 0 Tm: 0.25 NLPi: 8 Dpth: 1 Lvs: 1 Obj: 1.20E-01 Gap: 2.71E-02
--Integer Solution: 1.21E-01 Lowest Leaf: 1.86E-01 Gap: -6.58E-02
Iter: 5 I: 0 Tm: 0.37 NLPi: 15 Dpth: 2 Lvs: 1 Obj: 1.86E-01 Gap: -6.58E-02
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 1.4365000000000006 sec
Objective : 0.12065435497282542
Successful solution
---------------------------------------------------
Called files( 2 )
Called files( 52 )
WRITE dbs FILE
Called files( 56 )
WRITE json FILE
Timer # 1 5.64/ 1 = 5.64 Total system time
Timer # 2 1.44/ 1 = 1.44 Total solve time
Timer # 3 0.00/ 91 = 0.00 Objective Calc: apm_p
Timer # 4 0.00/ 45 = 0.00 Objective Grad: apm_g
Timer # 5 0.01/ 91 = 0.00 Constraint Calc: apm_c
Timer # 6 0.00/ 0 = 0.00 Sparsity: apm_s
Timer # 7 0.00/ 0 = 0.00 1st Deriv #1: apm_a1
Timer # 8 0.00/ 45 = 0.00 1st Deriv #2: apm_a2
Timer # 9 0.44/ 1 = 0.44 Custom Init: apm_custom_init
Timer # 10 0.00/ 1 = 0.00 Mode: apm_node_res::case 0
Timer # 11 0.00/ 1 = 0.00 Mode: apm_node_res::case 1
Timer # 12 0.01/ 1 = 0.01 Mode: apm_node_res::case 2
Timer # 13 0.00/ 1 = 0.00 Mode: apm_node_res::case 3
Timer # 14 0.10/ 187 = 0.00 Mode: apm_node_res::case 4
Timer # 15 0.46/ 90 = 0.01 Mode: apm_node_res::case 5
Timer # 16 0.00/ 0 = 0.00 Mode: apm_node_res::case 6
Timer # 17 0.00/ 45 = 0.00 Base 1st Deriv: apm_jacobian
Timer # 18 0.00/ 45 = 0.00 Base 1st Deriv: apm_condensed_jacobian
Timer # 19 0.00/ 1 = 0.00 Non-zeros: apm_nnz
Timer # 20 0.00/ 0 = 0.00 Count: Division by zero
Timer # 21 0.00/ 0 = 0.00 Count: Argument of LOG10 negative
Timer # 22 0.00/ 0 = 0.00 Count: Argument of LOG negative
Timer # 23 0.00/ 0 = 0.00 Count: Argument of SQRT negative
Timer # 24 0.00/ 0 = 0.00 Count: Argument of ASIN illegal
Timer # 25 0.00/ 0 = 0.00 Count: Argument of ACOS illegal
Timer # 26 0.00/ 1 = 0.00 Extract sparsity: apm_sparsity
Timer # 27 0.00/ 13 = 0.00 Variable ordering: apm_var_order
Timer # 28 0.00/ 1 = 0.00 Condensed sparsity
Timer # 29 0.00/ 0 = 0.00 Hessian Non-zeros
Timer # 30 0.01/ 1 = 0.01 Differentials
Timer # 31 0.00/ 0 = 0.00 Hessian Calculation
Timer # 32 0.00/ 0 = 0.00 Extract Hessian
Timer # 33 0.00/ 1 = 0.00 Base 1st Deriv: apm_jac_order
Timer # 34 0.01/ 1 = 0.01 Solver Setup
Timer # 35 0.84/ 1 = 0.84 Solver Solution
Timer # 36 0.00/ 102 = 0.00 Number of Variables
Timer # 37 0.00/ 51 = 0.00 Number of Equations
Timer # 38 0.12/ 14 = 0.01 File Read/Write
Timer # 39 0.00/ 0 = 0.00 Dynamic Init A
Timer # 40 0.00/ 0 = 0.00 Dynamic Init B
Timer # 41 0.00/ 0 = 0.00 Dynamic Init C
Timer # 42 0.69/ 1 = 0.69 Init: Read APM File
Timer # 43 0.00/ 1 = 0.00 Init: Parse Constants
Timer # 44 0.45/ 1 = 0.45 Init: Model Sizing
Timer # 45 0.00/ 1 = 0.00 Init: Allocate Memory
Timer # 46 0.28/ 1 = 0.28 Init: Parse Model
Timer # 47 0.17/ 1 = 0.17 Init: Check for Duplicates
Timer # 48 2.01/ 1 = 2.01 Init: Compile Equations
Timer # 49 0.00/ 1 = 0.00 Init: Check Uninitialized
Timer # 50 0.00/ 505 = 0.00 Evaluate Expression Once
Timer # 51 0.00/ 0 = 0.00 Sensitivity Analysis: LU Factorization
Timer # 52 0.00/ 0 = 0.00 Sensitivity Analysis: Gauss Elimination
Timer # 53 0.00/ 0 = 0.00 Sensitivity Analysis: Total Time
Many thanks in advance for your ideas.
Here are a few ideas to improve the compile time speed:
If the problem is a simulation, it can be run with IMODE=7 to compile the model once and run through all the timesteps sequentially. This is often the fastest option for simulation.
If the problem has degrees of freedom (find optimal parameter to minimize objective) then it is also possible to set up the problem as IMODE=6 to compile the model once and it solves all time steps simultaneously.
The model re-compiles every time m.solve() is called. Keeping a compiled version of the model is built into the REPLAY parameter, but it has little documentation and is built for exactly reenacting a sequence from historical data. If you'd like the option to keep a compiled version of the model for the next run, please add it as a feature request in GitHub.
Using the simple model shows that the model re-compile time doesn't change significantly between runs with different values of demand.
from gekko import GEKKO
from numpy import random
import numpy as np
import matplotlib.pyplot as plt
import time
# first, create the model
m = GEKKO(remote=False)
# declare additional decision variables
m.u = m.Var(lb=5, ub=25)
m.v = m.Var(lb=0, ub=100)
m.w = m.Var(lb=0, ub=50)
m.b = m.Var(lb=0, ub=1, integer=True)
m.demand = m.Param()
# now add the objective and the constraints
m.Minimize((1-0.8)*m.u*m.b+(1-0.9)*m.v+(1-0.7)*m.w)
m.Equation(m.u*m.b >= 10)
m.Equation(m.u*m.b + m.v + m.w == m.demand)
m.options.SOLVER=1
m.options.DIAGLEVEL = 0
tt = [] # total time
ts = [] # solve time
for i in range(101):
start = time.time()
m.demand.value = 50+random.rand()*50
m.solve(disp=False)
tt.append(time.time()-start)
ts.append(m.options.SOLVETIME)
plt.figure(figsize=(8,4))
plt.plot(tt,label='Total Time')
plt.plot(ts,label='Solve Time')
plt.plot(np.array(tt)-np.array(ts),label='Compile Time')
plt.ylim([0,0.03]); plt.grid(); plt.ylabel('Time (sec)')
plt.savefig('timing.png',dpi=300)
plt.legend()
plt.show()
The compile and solve times are very fast for this simple problem. If you can post the full RTO problem, we can give more specific suggestions to improve solve time.

Rocksdb concurrent Put performance

I would like to show some experimental results about Rocksdb Put performance. The fact that single-threaded put throughput is slower than two-threaded put throughput. It is wired because it uses the default skiplist as memtable, and this data structure supports concurrent writes.
Here is my testing code.
uint64_t nthread = 2;
uint64_t nkeys = 16000000;
std::thread threads[nthread];
std::atomic<uint64_t> idx(1000000);
for (int t = 0; t < nthread; t++) {
threads[t] = std::thread([db, &idx, nthread, nkeys, &write_option_disable] {
WriteBatch batch;
for (int i = 0; i < nkeys / nthread; i++) {
std::string key = "WVERIFY" + std::to_string(idx.fetch_add(1));
std::string value = "MOCK";
auto ikey = rocksdb::Slice(key);
auto ivalue = rocksdb::Slice(value);
db->Put(write_option_disable, ikey, ivalue);
}
return 0;
});
}
for (auto& t : threads) {
t.join();
}
Besides, here are the results I got.
// Single thread
Uptime(secs): 8.4 total, 8.3 interval
Flush(GB): cumulative 1.170, interval 1.170
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 1.17 GB write, 143.35 MB/s write, 0.00 GB read, 0.00 MB/s read, 8.1 seconds
Interval compaction: 1.17 GB write, 144.11 MB/s write, 0.00 GB read, 0.00 MB/s read, 8.1 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
Block cache LRUCache#0x564742515ea0#7011 capacity: 8.00 MB collections: 1 last_copies: 0 last_secs: 2e-05 secs_since: 8
Block cache entry stats(count,size,portion): Misc(1,0.00 KB,0%)
** File Read Latency Histogram By Level [default] **
** DB Stats **
Uptime(secs): 8.4 total, 8.3 interval
Cumulative writes: 16M writes, 16M keys, 16M commit groups, 1.0 writes per commit group, ingest: 1.63 GB, 199.80 MB/s
Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 16M writes, 16M keys, 16M commit groups, 1.0 writes per commit group, ingest: 1669.88 MB, 200.85 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
// 2 threads
Uptime(secs): 31.4 total, 31.4 interval
Flush(GB): cumulative 0.183, interval 0.183
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.67 GB write, 21.84 MB/s write, 0.97 GB read, 31.68 MB/s read, 10.2 seconds
Interval compaction: 0.67 GB write, 21.87 MB/s write, 0.97 GB read, 31.72 MB/s read, 10.2 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
Block cache LRUCache#0x5619fb7bbea0#6183 capacity: 8.00 MB collections: 1 last_copies: 0 last_secs: 1.9e-05 secs_since: 31
Block cache entry stats(count,size,portion): Misc(1,0.00 KB,0%)
** File Read Latency Histogram By Level [default] **
** DB Stats **
Uptime(secs): 31.4 total, 31.4 interval
Cumulative writes: 16M writes, 16M keys, 11M commit groups, 1.4 writes per commit group, ingest: 0.45 GB, 14.67 MB/s
Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 16M writes, 16M keys, 11M commit groups, 1.4 writes per commit group, ingest: 460.94 MB, 14.69 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
===========================update==========================
This is my Rocksdb's setting.
DB* db;
Options options;
BlockBasedTableOptions table_options;
rocksdb::WriteOptions write_option_disable;
write_option_disable.disableWAL = true;
// Optimize RocksDB. This is the easiest way to get RocksDB to perform well
options.IncreaseParallelism();
options.OptimizeLevelStyleCompaction();
// create the DB if it's not already present
options.create_if_missing = true;
The atomic idx shared between two threads can introduced non-trivial overhead. Try inserting random values from each thread, and maybe increase the number of threads.

native memory leak - how to find callstack of allocation source

Based on following output of !address -summary command, I think I have got a native memory leak. In order to deterine the callstack on where these allocations are happening, I am following article at http://www.codeproject.com/KB/cpp/MemoryLeak.aspx
0:000> !address -summary
TEB 7efdd000 in range 7efdb000 7efde000
TEB 7efda000 in range 7efd8000 7efdb000
TEB 7efd7000 in range 7efd5000 7efd8000
TEB 7efaf000 in range 7efad000 7efb0000
TEB 7efac000 in range 7efaa000 7efad000
ProcessParametrs 00441b78 in range 00440000 00540000
Environment 004407f0 in range 00440000 00540000
-------------------- Usage SUMMARY --------------------------
TotSize ( KB) Pct(Tots) Pct(Busy) Usage
551a000 ( 87144) : 04.16% 14.59% : RegionUsageIsVAD
5b8d3000 ( 1499980) : 71.53% 00.00% : RegionUsageFree
2cc3000 ( 45836) : 02.19% 07.68% : RegionUsageImage
4ff000 ( 5116) : 00.24% 00.86% : RegionUsageStack
0 ( 0) : 00.00% 00.00% : RegionUsageTeb
1c040000 ( 459008) : 21.89% 76.87% : RegionUsageHeap
0 ( 0) : 00.00% 00.00% : RegionUsagePageHeap
1000 ( 4) : 00.00% 00.00% : RegionUsagePeb
0 ( 0) : 00.00% 00.00% : RegionUsageProcessParametrs
0 ( 0) : 00.00% 00.00% : RegionUsageEnvironmentBlock
Tot: 7fff0000 (2097088 KB) Busy: 2471d000 (597108 KB)
0:000> !heap -s
LFH Key : 0x7fdcf95f
Termination on corruption : DISABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00440000 00000002 453568 436656 453568 62 54 32 0 0 LFH
006b0000 00001002 64 16 64 4 2 1 0 0
002b0000 00041002 256 4 256 2 1 1 0 0
00620000 00001002 64 16 64 5 2 1 0 0
00250000 00001002 64 16 64 4 2 1 0 0
007d0000 00041002 256 4 256 0 1 1 0 0
005c0000 00001002 1088 388 1088 7 17 2 0 0 LFH
02070000 00041002 256 4 256 1 1 1 0 0
02270000 00041002 256 144 256 0 1 1 0 0 LFH
04e10000 00001002 3136 1764 3136 384 36 3 0 0 LFH
External fragmentation 21 % (36 free blocks)
-----------------------------------------------------------------------------
But when I run !heap -p –a command, I don’t get any callstack, just the following. Any ideas how to get callstack of allocations source?
0:000> !heap -p -a 0218e008
address 0218e008 found in
_HEAP # 4e10000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
0218e000 001c 0000 [00] 0218e008 000d4 - (busy)
You should use deleaker. It's powerful tool for debuging.
use valgrind for linux and deleaker for windows.
If you don't get a call stack from !heap -p -a
The reason can be that you have not used gflags correctly
Remeber to use correct name including .exe
Try to start it inteactivly and go to the image tab, might be easier
Try with page heap, that also gives call stack
I know nothing about Windows, but at least on Unix systems a debugger (like gdb on Linux) is useful to understand callstacks.
And you could also circumvent some of your issues by using e.g. Boehm's conservative garbage collector. On many systems you can also hunt memory leaks with the help of valgrind

How can I measure the performance and TCP RTT of my server code?

I created a basic TCP server that reads incoming binary data in protocol buffer format, and writes a binary msg as response. I would like to benchmark the the roundtrip time.
I tried iperf, but could not make it send the same input file multiple times. Is there another benchmark tool than can send a binary input file repeatedly?
If you have access to a linux or unix machine1, you should use tcptrace. All you need to do is loop through your binary traffic test while capturing with wireshark or tcpdump file.
After you have that .pcap file2, analyze with tcptrace -xtraffic <pcap_filename>3. This will generate two text files, and the average RTT stats for all connections in that pcap are shown at the bottom of the one called traffic_stats.dat.
[mpenning#Bucksnort tcpperf]$ tcptrace -xtraffic willers.pcap
mod_traffic: characterizing traffic
1 arg remaining, starting with 'willers.pcap'
Ostermann's tcptrace -- version 6.6.1 -- Wed Nov 19, 2003
16522 packets seen, 16522 TCP packets traced
elapsed wallclock time: 0:00:00.200709, 82318 pkts/sec analyzed
trace file elapsed time: 0:03:21.754962
Dumping port statistics into file traffic_byport.dat
Dumping overall statistics into file traffic_stats.dat
Plotting performed at 15.000 second intervals
[mpenning#Bucksnort tcpperf]$
[mpenning#Bucksnort tcpperf]$ cat traffic_stats.dat
Overall Statistics over 201 seconds (0:03:21.754962):
4135308 ttl bytes sent, 20573.672 bytes/second
4135308 ttl non-rexmit bytes sent, 20573.672 bytes/second
0 ttl rexmit bytes sent, 0.000 bytes/second
16522 packets sent, 82.199 packets/second
200 connections opened, 0.995 conns/second
11 dupacks sent, 0.055 dupacks/second
0 rexmits sent, 0.000 rexmits/second
average RTT: 67.511 msecs <------------------
[mpenning#Bucksnort tcpperf]$
The .pcap file used in this example was a capture I generated when I looped through an expect script that pulled data from one of my servers. This was how I generated the loop...
#!/usr/bin/python
from subprocess import Popen, PIPE
import time
for ii in xrange(0,200):
# willers.exp is an expect script
Popen(['./willers.exp'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
time.sleep(1)
You can adjust the sleep time between loops based on your server's accept() performance and the duration of your tests.
END NOTES:
A Knoppix Live-CD will do
Filtered to only capture test traffic
tcptrace is capable of very detailed per-socket stats if you use other options...
================================
[mpenning#Bucksnort tcpperf]$ tcptrace -lr willers.pcap
1 arg remaining, starting with 'willers.pcap'
Ostermann's tcptrace -- version 6.6.1 -- Wed Nov 19, 2003
16522 packets seen, 16522 TCP packets traced
elapsed wallclock time: 0:00:00.080496, 205252 pkts/sec analyzed
trace file elapsed time: 0:03:21.754962
TCP connection info:
200 TCP connections traced:
TCP connection 1:
host c: myhost.local:44781
host d: willers.local:22
complete conn: RESET (SYNs: 2) (FINs: 1)
first packet: Tue May 31 22:52:24.154801 2011
last packet: Tue May 31 22:52:25.668430 2011
elapsed time: 0:00:01.513628
total packets: 73
filename: willers.pcap
c->d: d->c:
total packets: 34 total packets: 39
resets sent: 4 resets sent: 0
ack pkts sent: 29 ack pkts sent: 39
pure acks sent: 11 pure acks sent: 2
sack pkts sent: 0 sack pkts sent: 0
dsack pkts sent: 0 dsack pkts sent: 0
max sack blks/ack: 0 max sack blks/ack: 0
unique bytes sent: 2512 unique bytes sent: 14336
actual data pkts: 17 actual data pkts: 36
actual data bytes: 2512 actual data bytes: 14336
rexmt data pkts: 0 rexmt data pkts: 0
rexmt data bytes: 0 rexmt data bytes: 0
zwnd probe pkts: 0 zwnd probe pkts: 0
zwnd probe bytes: 0 zwnd probe bytes: 0
outoforder pkts: 0 outoforder pkts: 0
pushed data pkts: 17 pushed data pkts: 33
SYN/FIN pkts sent: 1/1 SYN/FIN pkts sent: 1/0
req 1323 ws/ts: Y/Y req 1323 ws/ts: Y/Y
adv wind scale: 6 adv wind scale: 1
req sack: Y req sack: Y
sacks sent: 0 sacks sent: 0
urgent data pkts: 0 pkts urgent data pkts: 0 pkts
urgent data bytes: 0 bytes urgent data bytes: 0 bytes
mss requested: 1460 bytes mss requested: 1460 bytes
max segm size: 792 bytes max segm size: 1448 bytes
min segm size: 16 bytes min segm size: 32 bytes
avg segm size: 147 bytes avg segm size: 398 bytes
max win adv: 40832 bytes max win adv: 66608 bytes
min win adv: 5888 bytes min win adv: 66608 bytes
zero win adv: 0 times zero win adv: 0 times
avg win adv: 14035 bytes avg win adv: 66608 bytes
initial window: 32 bytes initial window: 40 bytes
initial window: 1 pkts initial window: 1 pkts
ttl stream length: 2512 bytes ttl stream length: NA
missed data: 0 bytes missed data: NA
truncated data: 0 bytes truncated data: 0 bytes
truncated packets: 0 pkts truncated packets: 0 pkts
data xmit time: 1.181 secs data xmit time: 1.236 secs
idletime max: 196.9 ms idletime max: 196.9 ms
throughput: 1660 Bps throughput: 9471 Bps
RTT samples: 18 RTT samples: 24
RTT min: 43.8 ms RTT min: 0.0 ms
RTT max: 142.5 ms RTT max: 7.2 ms
RTT avg: 68.5 ms RTT avg: 0.7 ms
RTT stdev: 35.8 ms RTT stdev: 1.6 ms
RTT from 3WHS: 80.8 ms RTT from 3WHS: 0.0 ms
RTT full_sz smpls: 1 RTT full_sz smpls: 3
RTT full_sz min: 142.5 ms RTT full_sz min: 0.0 ms
RTT full_sz max: 142.5 ms RTT full_sz max: 0.0 ms
RTT full_sz avg: 142.5 ms RTT full_sz avg: 0.0 ms
RTT full_sz stdev: 0.0 ms RTT full_sz stdev: 0.0 ms
post-loss acks: 0 post-loss acks: 0
segs cum acked: 0 segs cum acked: 9
duplicate acks: 0 duplicate acks: 1
triple dupacks: 0 triple dupacks: 0
max # retrans: 0 max # retrans: 0
min retr time: 0.0 ms min retr time: 0.0 ms
max retr time: 0.0 ms max retr time: 0.0 ms
avg retr time: 0.0 ms avg retr time: 0.0 ms
sdv retr time: 0.0 ms sdv retr time: 0.0 ms
================================
You can always stick a shell loop around a program like iperf. Also, assuming iperf can read from a file (thus stdin) or programs like ttcp, could allow a shell loop catting a file N times into iperf/ttcp.
If you want a program which sends a file, waits for your binary response, and then sends another copy of the file, you probably are going to need to code that yourself.
You will need to measure the time in the client application for a roundtrip time, or monitor the network traffic going from, and coming to, the client to get the complete time interval. Measuring the time at the server will exclude any kernel level delays in the server and all the network transmission times.
Note that TCP performance will go down as the load goes up. If you're going to test under heavy load, you need professional tools that can scale to thousands (or even millions in some cases) of new connection/second or concurrent established TCP connections.
I wrote an article about this on my blog (feel free to remove if this is considered advertisement, but I think it's relevant to this thread): http://synsynack.wordpress.com/2012/04/09/realistic-latency-measurement-in-the-application-layers
As a very simple highlevel tool netcat comes to mind ... so something like time (nc hostname 1234 < input.binary | head -c 100) assuming the response is 100 bytes long.

Resources