Huge performance hit on a simple Go server with Docker

I've tried several things to get to the root of this, but I'm clueless.
Here's the Go program. It's just one file and has a /api/sign endpoint that accepts POST requests. These POST requests have three fields in the body, which get logged to a sqlite3 database. Pretty basic stuff.
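For context, a minimal sketch of roughly what such a server looks like (handler, field, and table names here are assumptions, not the original code; the table is assumed to exist already):
package main

import (
    "database/sql"
    "log"
    "net/http"

    _ "github.com/mattn/go-sqlite3"
)

var db *sql.DB

func signHandler(w http.ResponseWriter, r *http.Request) {
    if r.Method != "POST" {
        http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
        return
    }
    // Three body fields, appended as one row to a sqlite3 table.
    _, err := db.Exec("INSERT INTO signatures(name, email, comment) VALUES(?, ?, ?)",
        r.FormValue("name"), r.FormValue("email"), r.FormValue("comment"))
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.WriteHeader(http.StatusCreated)
}

func main() {
    var err error
    db, err = sql.Open("sqlite3", "dco.sqlite3")
    if err != nil {
        log.Fatal(err)
    }
    http.HandleFunc("/api/sign", signHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}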
I wrote a simple Dockerfile to containerize it. Uses golang:1.7.4 to build the binary and copies it over to alpine:3.6 for the final image. Once again, nothing fancy.
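The Dockerfile is roughly a two-stage build along these lines (a sketch from memory; paths and the binary name are guesses, and the cgo/static-linking details needed for the sqlite3 driver on Alpine are glossed over):
FROM golang:1.7.4 AS build
COPY . /go/src/app
RUN cd /go/src/app && go build -o /server .

FROM alpine:3.6
COPY --from=build /server /server
EXPOSE 8080
ENTRYPOINT ["/server"]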
I use wrk to benchmark performance. With 8 threads and 1k connections for 50 seconds (wrk -t8 -c1000 -d50s -s post.lua http://server.com/api/sign) and a lua script to create the POST requests, I measured the number of requests per second in different situations. In all cases, I run wrk from my laptop and the server is a DigitalOcean VPS (2 vCPUs, 2 GB RAM, SSD, Debian 9.4) that's very close to me.
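post.lua is just a few lines of wrk's Lua scripting API, something like the following (the field names are made up here):
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/x-www-form-urlencoded"
wrk.body = "name=Jane+Doe&email=jane%40example.com&comment=hello"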
Directly running the binary produced 2979 requests/sec.
Docker (docker run -it -v $(pwd):/data -p 8080:8080 image) produced 179 requests/sec.
As you can see, the Docker version is over 16x slower than running the binary directly. Everything else is the same during both experiments.
I've tried the following things and there is practically no improvement in performance in the Docker version:
Tried using host networking instead of bridge. There was a slight increase to around 190 requests/sec, but it's still miserable.
Tried increasing the limit on the number of file descriptors in the container version with --ulimit nofile=262144:262144. No improvement.
Tried different go versions, nothing.
Tried debian:9.4 for the final image instead of alpine:3.7 in the hope that it's musl that's performing terribly. Nothing here either.
(Edit) Tried running the container without a mounted volume and there's still no performance improvement.
I'm out of ideas at this point. Any help would be much appreciated!

Using an in-memory sqlite3 database completely solved all performance issues!
db, err = sql.Open("sqlite3", "file:dco.sqlite3?mode=memory")
I knew there was a disk I/O penalty associated with Docker's abstractions (even on Linux; I've heard it's worse on macOS), but I didn't know it would be ~16x.
Edit: Using an in-memory database isn't really an option most of the time. So I found another sqlite-specific solution. Before all database operations, do this to switch sqlite to WAL mode instead of the default rollback journal:
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
This dramatically improved the Docker version's performance to over 2.7k requests/sec!
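In Go, a minimal way to do that is to execute the pragmas right after opening the database (a sketch; adapt the error handling to taste):
// Switch sqlite to WAL mode and relax fsync behaviour before serving requests.
if _, err := db.Exec("PRAGMA journal_mode=WAL"); err != nil {
    log.Fatal(err)
}
if _, err := db.Exec("PRAGMA synchronous=NORMAL"); err != nil {
    log.Fatal(err)
}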

Related

How to correctly dockerize and continuously integrate 20GB raw data?

I have an application that uses about 20GB of raw data. The raw data consists of binaries.
The files rarely - if ever - change. Changes only happen if there are errors within the files that need to be resolved.
The simplest way to handle this would be to put the files in their own git repository and create a base image from that, then build the application on top of the raw data image.
Having a 20GB base image for a CI pipeline is not something I have tried and does not seem to be the optimal way to handle this situation.
The main reason for my approach here is to avoid extra deployment complexity.
Is there a best practice, "correct" or more sensible way to do this?
Huge mostly-static data blocks like this are, to me, probably the one big exception to the “Docker images should be self-contained” rule. I'd suggest keeping this data somewhere else and downloading it separately from the core docker run workflow.
I have had trouble in the past with multi-gigabyte images. Operations like docker push and docker pull in particular are prone to hanging up on the second gigabyte of individual layers. If, as you say, this static content changes rarely, there’s also a question of where to put it in the linear sequence of layers. It’s tempting to write something like
FROM ubuntu:18.04
ADD really-big-content.tar.gz /data
...
But even the ubuntu:18.04 image changes regularly (it gets security updates fairly frequently; your CI pipeline should explicitly docker pull it), and when it does, a new build will have to transfer this entire, unchanged 20 GB block again.
Instead I would put the data somewhere like an AWS S3 bucket or similar object storage. (This is a poor match for source control systems, which (a) want to keep old content forever and (b) tend to be optimized for text rather than binary files.) Then I'd have a script that runs on the host that downloads that content, and then mount the corresponding host directory into the containers that need it.
curl -LO http://downloads.example.com/really-big-content.tar.gz
tar xzf really-big-content.tar.gz
docker run -v $PWD/really-big-content:/data ...
(In Kubernetes or another distributed world, I’d probably need to write a dedicated Job to download the content into a Persistent Volume and run that as part of my cluster bring-up. You could do the same thing in plain Docker to download the content into a named volume.)
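In plain Docker, the named-volume variant could look roughly like this (an untested sketch reusing the hypothetical URL from above):
# Create a named volume and fill it once with the static content
docker volume create really-big-content
docker run --rm -v really-big-content:/data alpine \
  sh -c "wget -qO- http://downloads.example.com/really-big-content.tar.gz | tar xz -C /data"
# Later, mount the pre-populated volume into the application containers
docker run -v really-big-content:/data ...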

How to use distcc to preprocess and compile everything remotely only?

Background:
I have a 128-core server which I would like to use as a build server.
I have a bunch of client machines, each of them a not-so-new and not-so-powerful PC. (Can't upgrade! Not in my control.)
What I did:
I followed the distcc documentation.
I installed a virtual machine on the server with exactly the same compiler version and the same distcc version -- basically the same distribution as on the client machines.
After configuring the clients and the server, I can build remotely. I can verify this using the distccmon-text tool: on the server there are 8 threads started by the distcc daemon, waiting for build jobs to arrive. This was good as a first step.
Second Step: Since the client machines are dual-core machines while the server offers 128-cores, and not all clients will be compiling at the same time, I wanted to offload as much of the build as possible to the build-server.
Problems:
First Problem: distcc, no matter how I try to configure it, always tries to distribute the build jobs equally between the client and the server, even though my configuration file looks as shown below:
# --- /etc/distcc/hosts -----------------------
# See the "Hosts Specification" section of
# "man distcc" for the format of this file.
#
# By default, just test that it works in loopback mode.
# 127.0.0.1
172.24.26.208/8,cpp,lzo
localhost/0
As per the distcc documentation, this should give higher priority to the build server and lower priority to localhost, since localhost comes later in the list. It should also give 8 jobs to the build server and 0 jobs to localhost. But that doesn't happen: upon typing make -j8, it starts 4 threads on localhost and 4 on the remote. Not good.
Second Problem: What you would also notice is that the pre-processing is being done on the local machine. For this there is a tool that comes with distcc, called "distcc-pump" or pump mode, which can be used like this:
time pump make CC="distcc gcc" CXX="distcc g++" -j8
Unfortunately, pump mode or not, the pre-processing still happens entirely on the localhost. Sad.
Note: At no point does the distcc program, with the configurations I listed here, throw any errors or warnings, either on the server or on the clients.
Versions:
gcc 4.4.5
distcc 3.2rc1.2
(Before someone suggests "upgrade the software!": newer versions are most likely not possible for me. Anyway, this version of distcc offers the features that I need. Also, I could upgrade the server virtual machine, but then there would be a compiler version mismatch between the clients and the server. The clients I cannot upgrade.)
Any suggestions, feedback on how to improve this setup/(fix the problems) are most welcome.
EDIT: these solutions do not work; I'm leaving the answer up so that no one else proposes them again.
Try:
removing the line concerning localhost in /etc/distcc/hosts, cf. https://superuser.com/questions/568133/force-most-compilation-to-a-remote-host-with-distcc
or specifying 127.0.0.1 rather than localhost in /etc/distcc/hosts, cf. another problem solved with that substitution in https://distcc.github.io/faq.html
distcc actually differentiates between remote and local CPUs. But contrary to your interpretation, in the hosts file the IP address 127.0.0.1 is considered a remote CPU, and a distccd server is expected to be running there. Any job count you define in the hosts file applies only to these server nodes.
According to the man page, "localhost" is interpreted specially. This is what seems to not work for you. An alternative syntax is --localslots=<int>. Have you tested this?
Additionally, distcc runs jobs on the local host (the one where you start the driver program). First, all linking is done there. Second, when you specify a certain parallelism with make -jN, all jobs exceeding the available number of remote jobs are run on your local host, too - in addition to the workload distribution part of distcc. The option --localslots limits these. The man page does not mention localhost explicitly here. And then there are the jobs that fail on the server and are repeated locally.
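For example, a hosts file along the following lines (numbers are guesses; syntax per "man distcc") caps the local slots while keeping the compile jobs on the server:
$ cat ~/.distcc/hosts
--localslots=2 --localslots_cpp=16 172.24.26.208/8,cpp,lzo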
For the given 128-core server I would use the number of cores in the hosts file and start only that number of compile jobs:
$ cat ~/.distcc/hosts
172.24.26.208/128,cpp,lzo
$ make -j 128
...
Then I would expect to not see any compile jobs on the local machine.
The man page has some more words regarding recommended job numbers. Search for the section(s) starting with "distcc spreads the jobs across both local and remote CPUs".

Postgres: After importing production database (with replication) to my local machine, I notice network packets being sent and received from macbook

I've been a MySQL guy, and now I'm working with Postgres, so I am learning. I'm wondering if someone can tell me why my postgres process on my MacBook is sending and receiving data over my network. I am just noticing this for the first time - so maybe it's been going on before and I just never noticed that postgres does this.
What has me a bit nervous, is that I pulled down a production datadump from our server which is set up with replication and I imported it to my local postgres db. The settings in my postgresql.conf don't indicate replication is turned on. So it shouldn't be streaming out to anything, right?
If someone has some insight into what may be happening, or why postgres is sending/receiving packets, I'd love to hear the easy answer (and the complex one if there's more to what's happening).
This is a postgres install via Homebrew on MacOSX.
Thanks in advance!
Some final thoughts: It's entirely possible, I guess, that Mac's activity monitor also shows local 'network' traffic stats. Maybe this isn't going out to the internets.....
In short, I would not expect replication to be enabled for a DB that was dumped from a server that had it, as long as the server it was restored to has no replication configured at all.
More detail:
Normally, to get a local copy of a database in Postgres, one would do a pg_dump of the remote database (this could be done from your laptop, pointing at your server), followed by a createdb on your laptop to create the database stub, and then a pg_restore pointed at the dump to populate its contents. [Edit: Re-reading your post, it seems like you may have done this, but meant that the dump you used had replication enabled.]
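For reference, that workflow is roughly the following (host, user, and database names are placeholders):
pg_dump -Fc -h your-server -U app_user -f proddb.dump proddb
createdb proddb_local
pg_restore -d proddb_local proddb.dump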
That would be entirely local (assuming no connections into the DB from off-box), so long as you didn't explicitly setup any replication or anything else that would go off-box. Can you elaborate on what exactly you mean by importing with replication?
Also, if you're concerned about remote traffic coming from Postgres, try running this command a few times over the period of a minute or two (when you are seeing the traffic):
netstat | grep postgres
In general, replication in Postgres is configured at the server level, and has to do with things such as the master server shipping WAL files to the standby server (for streaming replication). You would almost certainly have had to set up entries in postgresql.conf and pg_hba.conf to ensure that the standby server had access (such as a replication entry in the latter conf file). Assuming you didn't do steps such as this, I think it can pretty safely be concluded that there's no replication going on (especially in conjunction with double-checking via netstat).
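(For reference, a standby's access would typically show up as a pg_hba.conf line like the one below, with placeholder values; if nothing of the sort exists, that's one more sign there's no replication.)
host    replication    replicator    10.0.0.5/32    md5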
You might also double-check the Postgres log to see if it's doing anything replication related. In a default install, that'd probably be in /var/log/postgresql (although I'm not 100% sure if Homebrew installs put it somewhere else).
If it's UDP traffic, to and from a high port, it's likely to be PostgreSQL's internal statistics collector.
These are pre-bound to prevent interference and should not be accessible outside of PostgreSQL.

Symfony2 Slow Initialization Time

I have Symfony2 running on an Ubuntu Server 12.04 (64-bit) VM (VirtualBox). The host is a MacBook Pro. For some reason I am getting really long request times in development mode (app_dev.php). I know it's slower in dev mode, but I'm talking 5-7 seconds per request (sometimes even slower). On my Mac I get request times of 200 ms or so in development mode.
After looking at my timeline in the Symfony2 profiler, I noticed that ~95% of the request time is "initialization time". What is this? What are some reasons it could be so slow?
This issue only applies to Symfony2 in dev mode, not any other sites I'm running on the VM, and not even to Symfony2 in production mode.
I saw this (http://stackoverflow.com/questions/11162429/whats-included-in-the-initialization-time-in-the-symfony2-web-profiler), but it doesn't seem to answer my questions.
I had 5-30 sec responses from Symfony2 by default; now it's ~500 ms in the dev environment. I modified the following things in php.ini:
set realpath_cache_size = 4M (or more)
set realpath_cache_ttl = 7200
disabled Xdebug completely (verify with phpinfo)
enabled and configured OPcache (or APC) correctly
restarted Apache so that php.ini is reloaded
And voilà, responses went under 2 seconds in dev mode!
Before: 6779 ms
After: 1587 ms
Symfony2 reads classes from thousands of files, and that's a slow process. With a small PHP realpath cache, file paths that aren't already cached have to be resolved one by one on every request in the dev environment. The default realpath cache is simply too small for Symfony2. In prod this is not a problem, of course.
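Put together, the relevant php.ini fragment looks roughly like this (values as above; the opcache line assumes the bundled OPcache extension):
realpath_cache_size = 4M
realpath_cache_ttl = 7200
opcache.enable = 1
; and make sure the xdebug extension is not loaded, e.g. comment out:
; zend_extension = xdebug.so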
Cache metadata:
Caching the metadata (e.g. mappings) is also very important for further performance boost:
doctrine:
    orm:
        entity_managers:
            default:
                metadata_cache_driver: apc
                query_cache_driver: apc
                result_cache_driver: apc
You need to enable APCu for this. It's APC without the bytecode cache, since OPcache already handles opcode caching. OPcache has been built in since PHP 5.5.
---- After: 467 ms ----
(in prod environment the same response is ~80 ms)
Please note, this project uses 30+ bundles, has tens of thousands of lines of code and almost a hundred of its own services, so 0.5 s is quite good on a local Windows environment with just a few simple optimizations.
I figured out the cause of the problem (and it's not Symfony2). For some reason, on the Ubuntu VM the modification times on certain files are incorrect (i.e. in the future, etc.). When Symfony2 checks these times with filemtime() against its registry, it decides the cache is no longer fresh and rebuilds the whole thing. I haven't been able to figure out why this happens yet.
I also needed to disable Xdebug (v2.2.21) to fix Apache max-timeout page loads on my MacBook. It was installed using MacPorts:
sudo port install php54-xdebug.
With Xdebug enabled, every page ran past the maximum loading time and ended with a fatal error about exceeding the max execution time. With it disabled, everything loads fine in a reasonable time. I noticed this using MAMP, which has no Xdebug enabled by default, where Apache works as fast as usual. I may switch to another debugger; that's a pity, because Xdebug worked fine before.
Config:
MacOSX 10.6.8
macports 2.1.3
Apache 2.2.24
php 5.4
We have the same problem.
Here we see 10 seconds and more for every request.
I found that if I remove the following lines in bootstrap.php.cache, response times return to normal (298 ms):
foreach ($meta as $resource) {
    if (!$resource->isFresh($time)) {
        return false;
    }
}
It's possible that we have wrong modification times, but we don't know how to fix them. Does somebody know a solution?
As said at https://stackoverflow.com/a/12967229/6108843, the reason for such behavior might be the Ubuntu VM settings. You should sync date and time between the host and guest OS as explained at https://superuser.com/questions/463106/virtualbox-how-to-sync-host-and-guest-time.
The file modification date changes to the host's value when you upload a file to the VM via FTP, which is why filemtime() returns the wrong value.
You can move APP/var/cache to /dev/shm/YourAppName/var/cache. But it's good to keep the built container in local files too, for IDE autocomplete and code validation. In app/AppKernel.php:
public function getCacheDir()
{
    return $this->getVarOrShmDir('cache/' . $this->getEnvironment());
}

public function getLogDir()
{
    return $this->getVarOrShmDir('logs');
}

private function getVarOrShmDir($dir)
{
    $result = dirname(__DIR__) . '/var/' . $dir;

    if (
        in_array($this->environment, ['dev', 'test'], true) &&
        empty($_GET['warmup']) && // to force using real directory add ?warmup=1 to URL
        is_dir($result) && // first time create real directory, later use shm
        file_exists('/bin/mount') && shell_exec('mount | grep vboxsf') // only for VirtualBox
    ) {
        $result = '/dev/shm/' . 'YourAppName' . '/' . $dir . '/' . $this->getEnvironment();
    }

    return $result;
}
I disabled Xdebug and it resulted in a decrease in loading time from 17 s (yeah...) to 0.5 s.
I had problems as well with slow page loads in development, which can be extremely frustrating when you're tweaking CSS or something similar.
After a bit of digging I found that, for me, the problem was caused by Assetic, which was recompiling all assets on every page load:
http://symfony.com/doc/current/cookbook/assetic/asset_management.html#dumping-asset-files-in-the-dev-environment
By disabling the use of the Assetic controller I was able to drastically speed up my page loads. However, as the link above states, this comes at the cost of having to regenerate your assets whenever you change them (or set a watch on them).
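Concretely, that means setting the following in config_dev.yml (the standard Symfony2 option described on the cookbook page linked above) and dumping assets yourself, e.g. with php app/console assetic:dump --watch:
assetic:
    use_controller: false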
In app_dev, all the caches and autoloading start from scratch, and what I found to be slowest in dev is the ORM. I shy away from using the ORM and focus mainly on DBAL because of it, although I probably shouldn't. The ORM is used quite a bit in SF2, and my guess is it's what's slowing you down most in dev. Look at the difference between your dev config and prod config. Some tweaks to your dev config can make development much snappier and more enjoyable; just try to be aware of what you're doing. For example, turning off the Twig controller and then modifying a lot of templates will be kind of frustrating, because you'll need to keep clearing your cache. But like you mentioned, it's dev only, and when it's time to go live, Symfony will speed up for you.

MySQL database backup: performance issues

Folks,
I'm trying to set up a regular backup of a rather large production database (half a gig) that has both InnoDB and MyISAM tables. I've been using mysqldump so far, but I find that it's taking increasingly longer periods of time, and the server is completely unresponsive while mysqldump is running.
I wanted to ask for your advice: how do I either
Make mysqldump backup non-blocking - assign low priority to the process or something like that, OR
Find another backup mechanism that will be better/faster/non-blocking.
I know of the existence of MySQL Enterprise Backup product (http://www.mysql.com/products/enterprise/backup.html) - it's expensive and this is not an option for this project.
I've read about setting up a second server as a "replication slave", but that's not an option for me either (this requires hardware, which costs $$).
Thank you!
UPDATE: more info on my environment: Ubuntu, latest LAMPP, Amazon EC2.
If replication to a slave isn't an option, you could leverage the filesystem, depending on the OS you're using:
Consistent backup with Linux Logical Volume Manager (LVM) snapshots.
MySQL backups using ZFS snapshots.
The joys of backing up MySQL with ZFS...
I've used ZFS snapshots on a quite large MySQL database (30GB+) as a backup method and it completes very quickly (never more than a few minutes) and doesn't block. You can then mount the snapshot somewhere else and back it up to tape, etc.
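A rough sketch of the LVM route looks like this (volume names and sizes are made up; note that the FLUSH TABLES WITH READ LOCK step has to be held in a single session while the snapshot is taken, which tools like mylvmbackup handle for you):
# Snapshot the logical volume holding the MySQL datadir
lvcreate --snapshot --size 5G --name mysql-snap /dev/vg0/mysql-data
mkdir -p /mnt/mysql-snap
mount /dev/vg0/mysql-snap /mnt/mysql-snap
# Archive the snapshot somewhere safe, then drop it
tar czf /backup/mysql-$(date +%F).tar.gz -C /mnt/mysql-snap .
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap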
Edit: (the previous answer suggested a slave DB to back up from, then I noticed Alex ruled that out in his question.)
There's no reason your replication slave can't run on the same hardware, assuming the hardware can keep up. Grab a source tarball, ./configure --prefix=/dbslave; make; make install; and you'll have a second mysql server living completely under /dbslave.
EDIT2: Replication has a bunch of other benefits as well. For instance, with replication running, you may be able to recover the binlog and replay it on top of your last backup to recover the extra data after certain kinds of catastrophes.
EDIT3: You mention you're running on EC2. Another, somewhat contrived idea to keep costs down is to try setting up another instance with an EBS volume. Then use the AWS api to spin this instance up long enough for it to catch up with writes from the binary log, dump/compress/send the snapshot, and then spin it down. Not free, and labor-intensive to set up, but considerably cheaper than running the instance 24x7.
Try the mk-parallel-dump utility from Maatkit (http://www.maatkit.org/).
Something you might consider is using binary logs here, through a method called 'log shipping'. Just before every backup, issue a command to flush the binary logs, and then you can copy all except the current binary log out via your regular file system operations.
The advantage of this method is that you're not locking up the database at all: when MySQL opens the next binary log in sequence, it releases the file locks on the prior logs, so processing shouldn't be affected. Tar 'em, zip 'em in place, do as you please, then copy them out as one file to your backup system.
Another advantage of using binary logs is that you can restore to a point in time if the logs are available. E.g. you have last year's full backup and every log from then to now, but you want to see what the database looked like on Jan 1st, 2011. You can issue a restore 'until 2011-01-01', and when it stops, you're at Jan 1st, 2011 as far as the database is concerned.
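A rough sketch of what that looks like in practice (paths are examples; log-bin must already be enabled in my.cnf):
mysql -e "FLUSH BINARY LOGS;"
# copy every binlog except the newest one, which MySQL is still writing to
ls /var/lib/mysql/mysql-bin.[0-9]* | head -n -1 | xargs -I{} cp {} /backup/binlogs/
# later, replay on top of the last full backup up to a point in time:
mysqlbinlog --stop-datetime="2011-01-01 00:00:00" /backup/binlogs/mysql-bin.* | mysql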
I've had to use this once to reverse the damage a hacker caused.
It is definitely worth checking out.
Please note... binary logs are USUALLY used for replication. Nothing says you HAVE to.
Adding to what Rich Adams and timdev have already suggested, write a cron job that gets triggered during a low-usage period to perform the backup task they suggested, so as to avoid high CPU utilization.
Check mysql-parallel-dump also.
