Measure performance of a VPS intended for hosting a website

I just bought, for one month, an extremely cheap VPS with 16 GB RAM and 6 cores (from Contabo).
Now my question is: how can I get some benchmark results in order to compare it with other VPSes, such as those Hostinger provides?
I did a Geekbench benchmark on it and the results can be seen here: https://browser.geekbench.com/v4/cpu/15852309
The problem with Geekbench is that I feel it's not really web-oriented, as the scores are influenced by the GPU as well.
What should I use to compare these VPSes against each other?
Would the plan be enough to host a Magento 2 website, or possibly more?

For web-server performance, the network, disk (random read) and CPU performance are the most important factors.
I like to benchmark and compare each one separately.
For disk I/O performance, you can use sysbench:
apt install sysbench
sysbench fileio --file-num=4 prepare
sysbench fileio --file-num=4 --file-test-mode=rndrw run
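When you are done, the test files left behind by the prepare step can be removed with sysbench's cleanup command:
sysbench fileio --file-num=4 cleanup   # delete the files created by 'prepare'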
For CPU performance, you can use stress-ng:
apt install stress-ng
stress-ng -t 5 -c 2 --metrics-brief
-c 2 uses 2 logical processors. Adjust if necessary.
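If you're unsure how many logical processors the VPS exposes, you can let nproc supply the number:
nproc                                          # prints the number of logical processors
stress-ng -t 5 -c "$(nproc)" --metrics-brief   # stress all of them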
For network performance, you can use speedtest-cli:
apt install speedtest-cli
speedtest-cli
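By default speedtest-cli picks the nearest server. To approximate what your visitors will see, you can also test against a server near them; the server ID below is just a placeholder:
speedtest-cli --list             # list nearby server IDs
speedtest-cli --server 1234      # replace 1234 with an ID from the list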
Example output:
# sysbench fileio --file-num=4 --file-test-mode=rndrw run
<skip>
Throughput:
read, MiB/s: 45.01
written, MiB/s: 30.00
# stress-ng -t 5 -c 2 --metrics-brief
stress-ng: info: [14993] dispatching hogs: 2 cpu
stress-ng: info: [14993] successful run completed in 5.00s
stress-ng: info: [14993] stressor      bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info: [14993]                         (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info: [14993] cpu               3957      5.00      9.99      0.00       790.92       396.10
# speedtest-cli
Retrieving speedtest.net configuration...
Testing from <skip> ...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Uganda Hosting Limited (Helsinki) [0.20 km]: 1.807 ms
Testing download speed................................................................................
Download: 575.68 Mbit/s
Testing upload speed................................................................................................
Upload: 499.89 Mbit/s

Related

Package loading time increases dramatically when changing Julia DEPOT_PATH

I'm new to Julia and, after a suggestion, I started using the nightly build 1.6.0-DEV.1371 due to the high loading time of packages in Julia 1.5.2.
So I tried to change the default directory of DEPOT_PATH and copied all files from ~/.julia to /opt/julia (owned by my user). Now, if I start Julia using the default directory and run the code below, it takes about 4 seconds. However, when I start Julia with the new directory (JULIA_DEPOT_PATH=/opt/julia julia), the same code takes an incredible 83 seconds. The same happens with Julia 1.5.2 (17s in the default directory, 200s in the new directory).
This is the code I'm using to measure the time.
t = time(); using Plots; time() - t
Is there some explanation about this strange (and annoying) behavior?
My Platform Info:
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
And I'm using an SSD.
There is a good chance that your new JULIA_DEPOT_PATH now mixes in files left over from your previous Julia installation. The normal approach is to point JULIA_DEPOT_PATH at an empty folder and then install the packages there.
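A minimal sketch of that approach (the /opt/julia-depot path is illustrative):
export JULIA_DEPOT_PATH=/opt/julia-depot   # must be an empty directory you own
mkdir -p "$JULIA_DEPOT_PATH"
julia -e 'using Pkg; Pkg.add("Plots")'     # reinstall packages into the fresh depot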
Regarding startup times, there are the following things you can do:
Precompile the package (this also happens automatically the first time you use a new version of the package):
using Pkg
pkg"precompile"
Once Plots is precompiled, it still takes around 10s to load each time.
If that time is unacceptable, bake Plots into a Julia system image:
using PackageCompiler
create_sysimage(:Plots, sysimage_path="sys_plots.so", precompile_execution_file="precompile_plots.jl")
For precompile_plots.jl you could use commands that you run regularly, or perhaps the test sets from Plots.jl.
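A minimal sketch of such a file, created from the shell; the plotting commands are placeholders, so substitute whatever you actually run:
cat > precompile_plots.jl <<'EOF'
# exercise the code paths you want baked into the image
using Plots
p = plot(rand(10), rand(10))
savefig(p, tempname() * ".png")
EOF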
Once done, you will start Julia with the following command:
julia --sysimage sys_plots.so
Building the system image takes a long time; however, once it is done, loading Plots.jl will be a matter of milliseconds.
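As a rough before/after check, you can reuse the timing one-liner from the question:
julia -e 't = time(); using Plots; println(time() - t)'                         # default image
julia --sysimage sys_plots.so -e 't = time(); using Plots; println(time() - t)' # custom image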
More information can be found here:
package compiler tutorial https://julialang.github.io/PackageCompiler.jl/dev/examples/plots/
excellent tutorial video https://live.juliacon.org/talk/Z8TE39

"go test -cpuprofile" does not generate a full trace

Issue
I have a go package, with a test suite.
When I run the test suite for this package, the total runtime is ~7 seconds:
$ go test ./mydbpackage/ -count 1
ok mymodule/mydbpackage 7.253s
However, when I add a -cpuprofile=cpu.out option, the sampling does not cover the whole run:
$ go test ./mydbpackage/ -count 1 -cpuprofile=cpu.out
ok mymodule/mydbpackage 7.029s
$ go tool pprof -text -cum cpu.out
File: mydbpackage.test
Type: cpu
Time: Aug 6, 2020 at 9:42am (CEST)
Duration: 5.22s, Total samples = 780ms (14.95%) # <--- depending on the runs, I get 400ms to 1s
Showing nodes accounting for 780ms, 100% of 780ms total
      flat  flat%   sum%        cum   cum%
         0     0%     0%      440ms 56.41%  testing.tRunner
      10ms  1.28%  1.28%      220ms 28.21%  database/sql.withLock
      10ms  1.28%  2.56%      180ms 23.08%  runtime.findrunnable
         0     0%  2.56%      180ms 23.08%  runtime.mcall
...
Looking at the collected samples :
# sample from another run :
$ go tool pprof -traces cpu.out | grep "ms " # get the first line of each sample
10ms runtime.nanotime
10ms fmt.(*readRune).ReadRune
30ms syscall.Syscall
10ms runtime.scanobject
10ms runtime.gentraceback
...
# 98 samples collected, for a total sum of 1.12s
The issue I see is: for some reason, the sampling profiler stops collecting samples, or is blocked/slowed down at some point.
Context
go version is 1.14.6, platform is linux/amd64
$ go version
go version go1.14.6 linux/amd64
This package contains code that interacts with a database, and the tests are run against a live PostgreSQL server.
One thing I tried: t.Skip() internally calls runtime.Goexit(), so I replaced calls to t.Skip and variants with a simple return; but it didn't change the outcome.
Question
Why aren't more samples collected? Is there some known pattern that blocks/slows down the sampler, or terminates the sampler earlier than it should?
@Volker guided me to the answer in his comments:
-cpuprofile creates a profile in which only goroutines actively using the CPU are sampled.
In my use case : my go code spends a lot of time waiting for the answers of the postgresql server.
Generating a trace using go test -trace=trace.out, and then extracting a network blocking profile using go tool trace -pprof=net trace.out > network.out yielded much more relevant information.
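A sketch of that workflow end to end (the file names are arbitrary):
go test ./mydbpackage/ -count 1 -trace=trace.out   # record a runtime trace instead of a CPU profile
go tool trace -pprof=net trace.out > network.out   # extract the network blocking profile
go tool pprof -text -cum network.out               # inspect where time is spent waiting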
For reference, on top of opening the complete trace using go tool trace trace.out, here are the values you can pass to -pprof= :
from go tool trace docs :
net: network blocking profile
sync: synchronization blocking profile
syscall: syscall blocking profile
sched: scheduler latency profile

Intel MPI benchmark fails when # bytes > 128: IMB-EXT

I just installed Linux and Intel MPI to two machines:
(1) Quite old (~8 years old) SuperMicro server, which has 24 cores (Intel Xeon X7542 X 4). 32 GB memory.
OS: CentOS 7.5
(2) New HP ProLiant DL380 server, which has 32 cores (Intel Xeon Gold 6130 X 2). 64 GB memory.
OS: OpenSUSE Leap 15
After installing the OS and Intel MPI, I compiled the Intel MPI benchmark and ran it:
$ mpirun -np 4 ./IMB-EXT
It is quite surprising that I get the same error when running IMB-EXT and IMB-RMA, even though the two machines have different OSes and everything else (even the GCC version used to compile the Intel MPI benchmark is different: on CentOS I used GCC 6.5.0, and on OpenSUSE I used GCC 7.3.1).
On the CentOS machine, I get:
#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.05         0.00
            4         1000        30.56         0.13
            8         1000        31.53         0.25
           16         1000        30.99         0.52
           32         1000        30.93         1.03
           64         1000        30.30         2.11
          128         1000        30.31         4.22
and on the OpenSUSE machine, I get:
#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.04         0.00
            4         1000        14.40         0.28
            8         1000        14.04         0.57
           16         1000        14.10         1.13
           32         1000        13.96         2.29
           64         1000        13.98         4.58
          128         1000        14.08         9.09
When I don't use mpirun (which means there is only one process running IMB-EXT), the benchmark runs through, but Unidir_Put needs >= 2 processes, so that doesn't help much. I also find that the functions using MPI_Put and MPI_Get are much slower than I expected (from my experience). Using MVAPICH on the OpenSUSE machine did not help either; its output is:
#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.03         0.00
            4         1000        17.37         0.23
            8         1000        17.08         0.47
           16         1000        17.23         0.93
           32         1000        17.56         1.82
           64         1000        17.06         3.75
          128         1000        17.20         7.44
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 49213 RUNNING AT iron-0-1
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Update: I tested Open MPI, and it runs through smoothly (although my application does not recommend using Open MPI, and I still don't understand why Intel MPI or MVAPICH doesn't work...):
#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.06 0.00
4 1000 0.23 17.44
8 1000 0.22 35.82
16 1000 0.22 72.36
32 1000 0.22 144.98
64 1000 0.22 285.76
128 1000 0.30 430.29
256 1000 0.39 650.78
512 1000 0.51 1008.31
1024 1000 0.84 1214.42
2048 1000 1.86 1100.29
4096 1000 7.31 560.59
8192 1000 15.24 537.67
16384 1000 15.39 1064.82
32768 1000 15.70 2086.51
65536 640 12.31 5324.63
131072 320 10.24 12795.03
262144 160 12.49 20993.49
524288 80 30.21 17356.93
1048576 40 81.20 12913.67
2097152 20 199.20 10527.72
4194304 10 394.02 10644.77
Is there any chance that I am missing something when installing MPI, or when installing the OS on these servers? Actually, I assume that the OS is the problem, but I'm not sure where to start...
Thanks a lot in advance,
Jae
Although this question is well written, you were not explicit about:
Intel MPI benchmark (please add header)
Intel MPI
Open MPI
MVAPICH
supported host network fabrics - for each MPI distribution
selected fabric while running the MPI benchmark (see the sketch after this list)
Compilation settings
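As a sketch of how to capture the fabric information, Intel MPI can log its fabric selection at startup; the variable names below are from 2018-era Intel MPI releases and may differ in yours, so treat them as assumptions to verify:
I_MPI_DEBUG=5 mpirun -np 4 ./IMB-EXT         # print fabric/provider selection while reproducing the failure
I_MPI_FABRICS=shm:tcp mpirun -np 4 ./IMB-EXT # force shared memory + TCP to rule out RDMA problems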
Debugging this kind of trouble with disparate host machines, multiple Linux distributions and compiler versions can be quite hard. Remote debugging on StackOverflow is even harder.
First of all, ensure reproducibility. That seems to be the case here. Of the many debugging approaches, the one I would recommend is to reduce the complexity of the system as a whole, test smaller sub-systems, and start shifting responsibility to third parties: you can replace self-compiled executables with software packages provided by distribution package repositories or by third parties like Conda.
Intel recently started to provide its libraries through YUM/APT repos as well as through Conda and PyPI. I found that this helps a lot with reproducible deployments of HPC clusters and even of runtime/development environments. I recommend using these repos for CentOS 7.5.
YUM/APT repository for Intel MKL, Intel IPP, Intel DAAL, and Intel® Distribution for Python* (for Linux*):
Installing Intel® Performance Libraries and Intel® Distribution for Python* Using YUM Repository
Installing Intel® Performance Libraries and Intel® Distribution for Python* Using APT Repository
Conda* package/ Anaconda Cloud* support (Intel MKL, Intel IPP, Intel DAAL, Intel Distribution for Python):
Installing Intel Distribution for Python and Intel Performance Libraries with Anaconda
Available Intel packages can be viewed here
Install from the Python Package Index (PyPI) using pip (Intel MKL, Intel IPP, Intel DAAL)
Installing the Intel® Distribution for Python* and Intel® Performance Libraries with pip and PyPI
I do not know much about OpenSUSE Leap 15.

Vultr virtual CPU vs DigitalOcean Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz

When I run a uname -ar command on the Vultr command line, I see the following:
Linux my.vultr.account.com 4.12.10-coreos #1 SMP Tue Sep 5 20:29:13
UTC 2017 x86_64 Virtual CPU a7769a6388d5 GenuineIntel GNU/Linux
On DigitalOcean I get:
Linux master 4.11.11-coreos #1 SMP Tue Jul 18 23:06:59 UTC 2017 x86_64
Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz GenuineIntel GNU/Linux
I don't know what the difference means. Is a virtual CPU worse than, the same as, or better than the "Intel(R) Xeon(R)" I see in the DigitalOcean output?
The real Intel Xeon E5-2650 v4 is a CPU with 12 cores. Depending on your VPS configuration, you get some number of cores assigned from that CPU; hence "virtual CPUs".
Regarding the specs of Vultr, the official response from Vultr support is:
"We do not provide specific information on the CPUs we offer. They are all late-model Intel Xeon CPUs."
The a7769a6388d5 is a 2.4 GHz virtual CPU, according to:
wget freevps.us/downloads/bench.sh -O - -o /dev/null|bash
From there on, it can be any of a variety of 2.4 GHz Intel Xeon E5s from the v2, v3 or v4 generations. You can get to the bottom of it with:
cat /proc/cpuinfo
Family 6 Model 61 Stepping 2 = Broadwell? etc.
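A quick way to pull out just the identifying fields, deduplicated across cores:
grep -E '^(vendor_id|cpu family|model|model name|stepping)' /proc/cpuinfo | sort -u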
Tip: CPU speed is not the best way to compare VPSes, though. Focus more on I/O speed, datacenter location, uplink speed and ping times.

elasticsearch JDBC river: java.lang.OutOfMemoryError: unable to create new native thread

I am using elasticsearch 1.4.2 with the river plugin on an AWS instance with 8 GB RAM. Everything was working fine for a week, but then the river plugin [plugin=org.xbib.elasticsearch.plugin.jdbc.river.JDBCRiverPlugin version=1.4.0.4] stopped working, and I was also unable to log in to the server over SSH. After a server restart, SSH login worked fine, and when I checked the elasticsearch logs I found this error:
[2015-01-29 09:00:59,001][WARN ][river.jdbc.SimpleRiverFlow] no river mouth
[2015-01-29 09:00:59,001][ERROR][river.jdbc.RiverThread ] java.lang.OutOfMemoryError: unable to create new native thread
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: unable to create new native thread
After restarting the service, everything works normally, but after a certain interval the same thing happens. Can anyone tell me what the reason and the solution could be? If any other details are required, please let me know.
When I checked the number of file descriptors using
sudo ls /proc/1503/fd/ | wc -l
I could see that it keeps increasing: it was 320 and has now reached 360. And
sudo grep -E "^Max open files" /proc/1503/limits
shows 65535.
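To confirm the leak, you could watch the descriptor count grow over time (1503 is the elasticsearch PID from the commands above):
watch -n 60 'sudo ls /proc/1503/fd | wc -l'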
processor info
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping : 4
microcode : 0x415
cpu MHz : 2500.096
cache size : 25600 KB
siblings : 8
cpu cores : 4
memory
MemTotal: 62916320 kB
MemFree: 57404812 kB
Buffers: 102952 kB
Cached: 3067564 kB
SwapCached: 0 kB
Active: 2472032 kB
Inactive: 2479576 kB
Active(anon): 1781216 kB
Inactive(anon): 528 kB
Active(file): 690816 kB
Inactive(file): 2479048 kB
Do the following
Run the following two commands as root:
ulimit -l unlimited
ulimit -n 64000
In /etc/elasticsearch/elasticsearch.yml make sure you uncomment or add a line that says:
bootstrap.mlockall: true
In /etc/default/elasticsearch, uncomment (or add) the line MAX_LOCKED_MEMORY=unlimited, and also set ES_HEAP_SIZE to a reasonable number. Make sure it's high enough that you don't starve elasticsearch, but it should generally be no more than half the memory on your system, and definitely no more than ~30 GB. I have it set to 8g on my data nodes.
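For example, the relevant lines in /etc/default/elasticsearch would then look something like this (8g is the value from my data nodes; size it to your own workload):
ES_HEAP_SIZE=8g             # JVM heap, at most ~half of system RAM
MAX_LOCKED_MEMORY=unlimited # allow the heap to be locked in memory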
One way or another, the process is obviously being starved of resources. Give your system plenty of memory, and give elasticsearch a good part of that.
I think you also need to analyze your server log, maybe in /var/log/messages.
