OpenLDAP read/write performance

I am using OpenLDAP slapd version 2.4.33 with the slapd configuration below. I am trying to run some tests with the ldapadd and ldapsearch tools to measure its write and read performance, but I am getting really poor results.
System: sun4v sparc SUNW,SPARC-Enterprise-T2000 with 64 GB of memory.
Adding 1500 entries took 1 minute 19 seconds:
time ldapadd -x -D "cn=Manager,dc=my-company,dc=com" -w mypassword -f /tmp/perf1500.ldif > /dev/null 2>&1
ldapadd -x -D "cn=Manager,dc=my-company,dc=com" -w mypassword -f /tmp/perf200.ldif > 1.27s user 0.14s system 1% cpu 1:19.95 total
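For reference, a read test would look something like the following (the myMSISDN value is just a placeholder):
time ldapsearch -x -D "cn=Manager,dc=my-company,dc=com" -w mypassword -b "dc=my-company,dc=com" "(myMSISDN=1234567890)" > /dev/null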
Similar configuration test results are given at http://wiki.zimbra.com/wiki/OpenLDAP_MDB_vs_HDB_performance#MDB_configuration, but I am not even close to those numbers.
MDB configuration:
database mdb
directory /home/myuser/var/openldap-data
suffix "dc=my-company,dc=com"
rootdn "cn=Manager,dc=my-company,dc=com"
rootpw mypassword
index objectClass eq
index myMSISDN eq
maxsize 10737418240
envflags writemap,nometasync
The problem is that the system is not even busy, yet the OpenLDAP performance is very poor. What kind of configuration would give better performance? Is there something obvious that I am missing?
Thanks

Related

Unable to download data using Aspera

I am trying to download data from the European Nucleotide Archive (ENA) using the Aspera CLI, however my downloads keep stalling. I have downloaded several files with the same tool before, but this has been happening for the last month. I usually use the following command:
ascp -QT -P33001 -k 1 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR192/009/ERR1924229/ERR1924229.fastq.gz .
From a post on Beta Science, I learnt that this might be due to not limiting the download speed, and hence I tried using the -l argument, but it was of no help.
ascp -QT -l 300m -P33001 -k 1 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR192/009/ERR1924229/ERR1924229.fastq.gz .
Your command works.
You might be overdriving your local network. How much bandwidth do you have?
Here "-l 300m" sets the target rate to 300 Mbps; if you have less than that available, this can cause such problems.
Try reducing the target rate to what you actually have.
(Are you on wired or WiFi?)
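For example, on a 20 Mbps connection (adjust the value to what you actually have), something like this should avoid overdriving the link:
ascp -QT -l 20m -P33001 -k 1 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR192/009/ERR1924229/ERR1924229.fastq.gz .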

How to make script in bash aware that a server is still busy installing/configuring and wait for reboot?

The issue / dilemma
I am currently busy creating a script to kickstart servers (with CentOS 6.x and CentOS 7.x) remotely. So far the script is working, but it hangs on one minor thing. Well, actually it does not hang, but it does not give detailed information about what is happening. In other words, I am not getting the correct information back in bash about the job having finished correctly.
I have tried various things; however, it keeps repeating the following message endlessly:
servername is still installing and configuring packages...
PING 100.125.150.175 (100.125.150.175) 56(84) bytes of data.
64 bytes from 100.125.150.175: icmp_seq=1 ttl=63 time=0.152 ms
64 bytes from 100.125.150.175: icmp_seq=2 ttl=63 time=0.157 ms
64 bytes from 100.125.150.175: icmp_seq=3 ttl=63 time=0.157 ms
64 bytes from 100.125.150.175: icmp_seq=4 ttl=63 time=0.143 ms
64 bytes from 100.125.150.175: icmp_seq=5 ttl=63 time=0.182 ms
--- 100.125.150.175 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 120025ms
rtt min/avg/max/mdev = 0.143/0.158/0.182/0.015 ms
servername is still installing and configuring packages...
PING 100.125.150.175 (100.125.150.175) 56(84) bytes of data.
64 bytes from 100.125.150.175: icmp_seq=1 ttl=63 time=0.153 ms
64 bytes from 100.125.150.175: icmp_seq=2 ttl=63 time=0.132 ms
64 bytes from 100.125.150.175: icmp_seq=3 ttl=63 time=0.142 ms
etc....
So for some reason it does not continue to the next line of code or perform the next action. Since it's only feedback to me (or another user), it's not a major issue. But it would be nice to get this working and have it provide (detailed) information about the current progress, or what the script/server is actually doing at the moment. Unfortunately that is not the case for the above (last) piece of code.
This is the current code snippet I have (yes, it's a mess):
while true
do
    # keep pinging; while the server still answers, it is still installing
    #ping -c3 -i3 $HWNODEIP > /dev/null
    #ping -c5 -i30 $HWNODEIP > /dev/null
    ping -c5 -i30 $HWNODEIP
    if [ $? -eq 1 ] || [ $? -eq 2 ] || [ $? -eq 68 ]
    then
        # ping failed: the server stopped answering, so it should be rebooting
        echo -e " "
        echo -e "Kickstart part II also done. $HOSTNAME will be rebooted one more time."
        sleep 5
        ######return 0
        echo -e " "
        printf "%s" "Waiting for $HOSTNAME to come back online: "
        while ! ping -c 1 -n -w 30 $HWNODEIP &> /dev/null
        do
            printf "%c" "."
            #sleep 10
        done
        echo -e " "
        echo -e "Reboot is done and $HOSTNAME is back online. Performing final check. Please wait..."
        sleep 10
        echo -e " "
        sudo /usr/local/collectHWdata.pl $HWNODEIP
        ssh root@$HWNODEIP "while ! test -e /root/kickstart-DONE; do sleep 3; done; echo KICKSTART IS DONE\!"
        echo -e " "
        exit
    else
        echo -e " "
        echo -e "$HOSTNAME is still installing and configuring packages..."
    fi
done
Sidenote: I removed > /dev/null #5 for debugging (not that it helped)
I am guessing I am using things incorrectly; I am by no means an experienced scripter and can only do minor stuff, but of course I am doing my best. I have been fooling around with this since last week and still have no result on this part.
What am I trying to achieve?
The server is rebooted after installing the selected CentOS version, creating partitions and setting up the network. This all works. The above snippet runs after that reboot. It then installs the packages I selected, configures various things (like Nagios) and installs/compiles certain Perl modules, plus a few other minor things.
This is all done correctly in the background. I wanted the script (the above piece of code) to report that the server is still busy installing things. Since I lack the knowledge to do that, I chose a different approach: check whether the server is online (in other words, that it's still installing); as long as the server is online, it's obviously still installing/configuring things. After that is done, the server reboots once more, and the script should perform the final 2 commands (as seen in my snippet). However (here is the problem) it never runs those commands, even though the kickstart completes.
So I am guessing I am doing something wrong and might even have messed things up (or got confused along the way). Maybe someone has an idea, a solution or a completely different approach to tackle and fix this problem (or at least I hope so).
What else have I tried so far? Well, I tried various ping commands and I also tried nc (netcat), but without a good result. Every single time I hit a brick wall with the last 2 commands, and it keeps pinging instead of showing that the kickstart was done... I think I have spent several hours (since last week) on this already without getting anywhere.
So I am hoping someone can take a look at this and tell me what I am doing wrong, and maybe there is a better approach (other than pinging the server) to see if it's still busy. Maybe a (remote) check on yum, perl or a service, so that the script knows it's still busy.
Sorry for the long post, but I know that providing as much information as possible, including code examples and results, is appreciated. I hope I have provided adequate information; if not, let me know and I will add as much as I can. As always, I am willing to learn or change my approach.
Thank you for reading my post!
As noted in the comments under the question:
The server may already be rebooted by the time ping -c5 -i30 $HWNODEIP finishes. The command sends 5 packets (-c flag), waiting 30 seconds between packets (-i, the interval flag), so a single run takes roughly two minutes (the output above shows time 120025ms). A server can easily reboot within two minutes, especially if there's an SSD in use. So try lowering the total time it takes this command to complete.
[ $? -eq 68 ] is probably unnecessary. $HWNODEIP is just an IP address, and exit code 68 means a domain name could not be resolved, which doesn't apply to IP addresses.
The if statement could be simplified to
if ! ping -c5 -i30 "$HWNODEIP"
These are minor suggestions, probably not bulletproof. As confirmed by the OP in the comments, lowering the interval helps. There are other small improvements that could be made (like quoting variables), but that's outside the scope of the question, so I'll leave it for now.
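Putting those suggestions together, a rough sketch of the outer loop (untested, with the interval lowered to 5 seconds so one pass takes well under a minute) could look like this:
while true
do
    if ! ping -c5 -i5 "$HWNODEIP" > /dev/null
    then
        echo "Kickstart part II also done. $HOSTNAME will be rebooted one more time."
        break    # leave the loop and continue with the wait-for-reboot part
    fi
    echo "$HOSTNAME is still installing and configuring packages..."
done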

Issue with HDFS command taking 100% cpu

I have an HDFS server to which I am currently streaming data.
I also hit this server regularly with the following type of command to check for certain conditions: hdfs dfs -find /user/cdh/streameddata/ -name *_processed
However, I have started to see this command taking a massive portion of my CPU when monitoring it in top:
cdh 16919 1 99 13:03 ? 00:43:45 /opt/jdk/bin/java -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop -Dhadoop.id.str=cdh -Dhadoop.root.logger=ERROR,DRFA -Djava.library.path=/opt/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.fs.FsShell -find /user/cdh/streameddata/ -name *_processed
This is causing other applications to stall, and is having a massive impact on my application on the whole.
My server has 48 cores, so I did not expect this to be an issue.
Currently, I have not set any additional heap in Hadoop, so it is using the 1000 MB default.
If you think your heap is too small, you can run:
jstat -gcutil 16919 # process ID of the hdfs dfs find command
and look at the value under GCT (total garbage collection time, in seconds) to see how much time you're spending in garbage collection relative to your total run time.
However, if the directory /user/cdh/streameddata/ has hundreds of thousands or millions of files, you may legitimately be crippling your system.
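If the heap does turn out to be the limit, one option (a sketch rather than a tested fix, assuming the stock Hadoop launcher scripts, which honour HADOOP_CLIENT_OPTS) is to raise the client heap just for this command, and to quote the pattern so the local shell doesn't expand it:
HADOOP_CLIENT_OPTS="-Xmx4g" hdfs dfs -find /user/cdh/streameddata/ -name '*_processed'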

Nmap - RTTVAR has grown to over 2.3 seconds, decreasing to 2.0

I have a script that I'm using to build a config for icinga2. The network is large, multiple /13s large. When I run the script I keep getting the "RTTVAR has grown to over 2.3 seconds, decreasing to 2.0" error. I've tried raising my gc_thresh and breaking up the subnets. I've dug through the little info from Google and can't seem to find a fix. If anyone has any ideas, I'd really appreciate it. I'm on Ubuntu 16.04.
My script:
# Find devices and create IP list
i=72
while [ $i -lt 255 ]
do
    echo "$(date) - Scanning xx.$i.0.0/16" >> files/scan.log
    nmap -sn --host-timeout 5 xx.$i.0.0/16 -oG - | awk '/Up$/{print $2}' >> files/ip-list
    let i=i+1
done
My /etc/sysctl.conf
# Force gc to clean-up quickly
net.ipv4.neigh.default.gc_interval = 3600
# Set ARP cache entry timeout
net.ipv4.neigh.default.gc_stale_time = 3600
# Set ARP cache garbage collection thresholds
net.ipv4.neigh.default.gc_thresh3 = 8192
net.ipv4.neigh.default.gc_thresh2 = 4096
net.ipv4.neigh.default.gc_thresh1 = 2048
Edit: added --host-timeout 5, removed -n
I can suggest you use a ping scan. If you want an "overall sight" of your network you can use
nmap -sP -n
It decreases the time a little bit compared to nmap -sn, mainly because -n skips reverse DNS resolution; you can check it with small examples.
As I said in a comment: use --host-timeout and --max-retries and that will improve your performance.
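For example (the values are only a starting point, tune them to your network), the scan line inside the loop could become:
nmap -sn -n --host-timeout 5s --max-retries 1 xx.$i.0.0/16 -oG - | awk '/Up$/{print $2}' >> files/ip-list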

PostgreSQL: improving pg_dump, pg_restore performance

When I began, I used pg_dump with the default plain format. I was unenlightened.
Research revealed to me time and file size improvements with pg_dump -Fc | gzip -9 -c > dumpfile.gz. I was enlightened.
When it came time to create the database anew,
# create tablespace dbname location '/SAN/dbname';
# create database dbname tablespace dbname;
# alter database dbname set temp_tablespaces = dbname;
% gunzip dumpfile.gz # to evaluate restore time without a piped uncompression
% pg_restore -d dbname dumpfile # into a new, empty database defined above
I felt unenlightened: the restore took 12 hours to create a database that's only a fraction of what it will become:
# select pg_size_pretty(pg_database_size('dbname'));
47 GB
Because there are predictions this database will be a few terabytes, I need to look at improving performance now.
Please, enlighten me.
First check that you are getting reasonable I/O performance from your disk setup. Then check that your PostgreSQL installation is appropriately tuned. In particular:
shared_buffers should be set correctly
maintenance_work_mem should be increased during the restore
full_page_writes should be off during the restore
wal_buffers should be increased to 16MB during the restore
checkpoint_segments should be increased to something like 16 during the restore
autovacuum should be disabled during the restore
and you shouldn't have any unreasonable logging on (like logging every statement executed).
If you are on 8.4 also experiment with parallel restore, the --jobs option for pg_restore.
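For example (assuming a custom-format dump file and spare CPU cores; the job count is illustrative):
pg_restore -j 4 -d dbname dumpfile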
Improve pg_dump & pg_restore
PG_DUMP | always use the directory format (-Fd) and the -j option
time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external
PG_RESTORE | always tune postgresql.conf and use the directory format and the -j option
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
Two issues/ideas:
By specifying -Fc, the pg_dump output is already compressed. The compression is not maximal, so you may find some space savings by using "gzip -9", but I would wager it's not enough to warrant the extra time (and I/O) used compressing and uncompressing the -Fc version of the backup.
If you are using PostgreSQL 8.4.x you can potentially speed up the restore from a -Fc backup with the new pg_restore command-line option "-j n" where n=number of parallel connections to use for the restore. This will allow pg_restore to load more than one table's data or generate more than one index at the same time.
I assume you need backups, not a major upgrade of the database.
For backups of large databases you should set up continuous archiving instead of pg_dump.
Set up WAL archiving.
Make your base backups, for example every day, by using:
psql template1 -c "select pg_start_backup('`date +%F-%T``')"
rsync -a --delete /var/lib/pgsql/data/ /var/backups/pgsql/base/
psql template1 -c "select pg_stop_backup()"`
A restore would be as simple as restoring the database and the WAL logs not older than the pg_start_backup time from the backup location and starting Postgres. And it will be much faster.
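A minimal sketch of the recovery side (pre-PostgreSQL-12 style; the paths are illustrative and assume the WAL files were archived next to the base backup):
rsync -a /var/backups/pgsql/base/ /var/lib/pgsql/data/
echo "restore_command = 'cp /var/backups/pgsql/wal/%f %p'" > /var/lib/pgsql/data/recovery.conf
pg_ctl -D /var/lib/pgsql/data start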
zcat dumpfile.gz | pg_restore -d db_name
This removes the full write of the uncompressed data to disk, which is currently your bottleneck.
As you may have guessed simply by the fact that compressing the backup results in faster performance, your backup is I/O bound. This should come as no surprise as backup is pretty much always going to be I/O bound. Compressing the data trades I/O load for CPU load, and since most CPUs are idle during monster data transfers, compression comes out as a net win.
So, to speed up backup/restore times, you need faster I/O. Beyond reorganizing the database to not be one huge single instance, that's pretty much all you can do.
If you're facing issues with the speed of pg_restore, check whether you dumped your data using INSERT or COPY statements.
If you use INSERT (pg_dump called with the --column-inserts parameter), the restore of the data will be significantly slower.
Using INSERT is good for making dumps that are loaded into non-Postgres databases, but for a restore into Postgres omit the --column-inserts parameter when running pg_dump.
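For example (plain-format dumps shown just to illustrate the difference):
pg_dump dbname > dump_copy.sql                      # default: COPY-based, fast to restore
pg_dump --column-inserts dbname > dump_insert.sql   # INSERT-based, portable but much slower to restore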
