I am hosting an OpenStack Newton cluster in high-availability passive mode.
But I have an issue with speed: my VMs are very slow.
An "apt-get update" takes 5 minutes, and I have good network bandwidth.
Could someone help me identify the problem, please?
Thank you :)
I'm using AWS CloudWatch for scaling my application. I created a launch configuration, an Auto Scaling group, and scale-up and scale-down alarms and policies. The problem is that it takes 5 minutes to launch an instance from an AMI. Is there a way to reduce the start-up time from 5 to 2-3 minutes?
No, it isn't possible to speed up the provisioning of a new EC2 instance launched by an Auto Scaling scale-up action.
I think it's important to appreciate all that EC2 is doing in those 5 minutes. It's building a new virtual machine, installing an image of an operating system on it, hooking it up to a network and bringing it into service. That's pretty impressive for 5 minutes' work, if you ask me.
If you need to scale up that quickly, then quite frankly you're doing it wrong. Even with auto scaling, you should always be a little over-provisioned for your expected load. If you start getting close to that over-provisioned limit, then it's time to scale up, as in the sketch below. Don't provision exactly what you need; it won't work well.
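To illustrate scaling early rather than exactly at capacity, here is a hedged boto3 sketch that wires a scale-out policy to a CloudWatch alarm at a conservative 60% CPU threshold. The Auto Scaling group name, region and thresholds are placeholders, not anything from your setup:

    # Sketch: scale out at a conservative CPU threshold so new capacity
    # is ready before the fleet is saturated. Names/region are placeholders.
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # A simple-scaling policy that adds one instance per alarm breach.
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-asg",
        PolicyName="scale-out-early",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,
        Cooldown=300,
    )

    # Fire at 60% average CPU instead of waiting for 90%+.
    cloudwatch.put_metric_alarm(
        AlarmName="cpu-60-scale-out",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
        Statistic="Average",
        Period=60,
        EvaluationPeriods=2,
        Threshold=60.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy["PolicyARN"]],
    )

The point is the threshold choice, not the exact numbers: the alarm fires while you still have headroom, so the 5-minute launch happens before you need the capacity.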
The start-up time depends on a few things:
The availability of resources for your instance type within the availability zone.
The size of the AMI. In the case of a custom AMI, the image may need to be copied to the correct internal storage for the VM.
Initialization procedures. For Windows, some images with user-data scripts can require a reboot to join the domain.
This is an old question, but as I have seen, start times have improved for EC2 over the past years. Some providers, like Google Cloud, can provision servers in under a minute. So if your workload is that demanding, you may want to research the available providers and their operational differences.
My company is running into a network performance problem that seemingly has all of the "experts" we're working with (VMware support, RHEL support, our managed services hosting provider) stumped.
The issue is that network latency between our VMs (even VMs residing on the same physical host) increases with network throughput, by up to 100x or more. For example, without any network load, latency (measured by ping) might be ~0.1 ms. Start transferring a couple of 100 MB files, and latency grows to 1 ms. Initiate a bunch (~20 or so) of concurrent data transfers between two VMs, and the latency between the VMs can increase to upwards of 10 ms. A simplified version of our test is sketched below.
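The test itself is nothing fancy; roughly, it looks like this (a sketch; the peer address and test file are placeholders, and scp assumes key-based auth is already set up between the VMs):

    # Sketch: sample ping latency while ~20 concurrent transfers run.
    # Peer address and test file are placeholders.
    import subprocess, threading

    TARGET = "10.0.0.2"           # placeholder peer VM
    TEST_FILE = "/tmp/100mb.bin"  # placeholder 100 MB test file

    def transfer():
        subprocess.run(["scp", TEST_FILE, TARGET + ":/tmp/"], check=False)

    workers = [threading.Thread(target=transfer) for _ in range(20)]
    for w in workers:
        w.start()

    # Measure latency while the transfers are in flight.
    subprocess.run(["ping", "-c", "30", TARGET])

    for w in workers:
        w.join()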
This is a huge problem for us because we have application server VMs hosting processes that might issue a million or so queries against a database server (a different VM) per hour. Adding a millisecond or two to each query therefore increases our runtime substantially: at 1 million queries, an extra 2 ms per query adds over half an hour of wall-clock time if the queries run serially, sometimes doubling or tripling our expected durations.
We've got what I would think is a pretty standard environment:
ESXi 6.0u2
4 Dell M620 blades with 2x Xeon E5-2650v2 processors and 128GB RAM
SolidFire SAN
And our base VM configuration consists of:
RHEL7, minimal install
Multiple LUNs configured for mount points at /boot, /, /var/log, /var/log/audit, /home, /tmp and swap
All partitions except /boot encrypted with LUKS (over LVM)
Our database server VMs are running Postgres 9.4.
We've already tried the following:
Changing the virtual NIC from VMXNET3 to e1000 and back
Adjusting the RHEL Ethernet stack settings
Using ESXi's "low latency" option for the VMs
Upgrading our hosts and vCenter from ESXi 5.5 to 6.0u2
Creating bare-bones VMs (set up as above with LUKS, etc., but without any of our production services on them) for testing
Moving the datastore from the SSD SolidFire SAN to local (on-blade) spinning storage
None of these improved network latency. The only test that showed expected (non-deteriorating) latency was when we set up a second pair of bare-bones VMs without LUKS encryption. Unfortunately, we need fully encrypted partitions (for which we manage the keys) because we are dealing with regulated, sensitive data.
I don't see how LUKS--in and of itself--can be to blame here. Rather, I suspect that LUKS running with some combination of ESX, our hosting hardware, and/or our VM hardware configuration is to blame.
I performed a test in a much wimpier environment (MacBook Pro, i5, 8GB RAM, VMware Fusion 6.0, CentOS 7 VMs configured similarly with LUKS on LVM, and the same testing scripts) and was unable to reproduce the latency issue. Regardless of how much network traffic I sent between the VMs, latency remained steady at about 0.4 ms. And this was on a laptop with a ton of other things going on!
Any pointers/tips/solutions will be greatly appreciated!
After much scrutiny, and after comparing the non-performing VMs against the performant VMs, we identified the issue as a bad choice for the advanced "Latency Sensitivity" setting.
For our poorly performing VMs, this was set to "Low". After changing the setting to "Normal" and restarting the VMs, latency dropped by ~100x and throughput (which we hadn't originally noticed was also a problem) increased by ~250x!
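For anyone who would rather script the check than click through the vSphere UI, something along these lines should work with pyVmomi. This is a sketch: the connection details and VM name are placeholders, and you should verify the latencySensitivity property against your vSphere API version:

    # Sketch: set a VM's Latency Sensitivity to "normal" via pyVmomi.
    # Host, credentials and VM name are placeholders; SmartConnect may
    # need an sslContext argument for self-signed vCenter certificates.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "db-vm-01")  # placeholder name

    spec = vim.vm.ConfigSpec()
    spec.latencySensitivity = vim.LatencySensitivity(level="normal")
    vm.ReconfigVM_Task(spec)   # returns a task; poll it if you need to wait
    Disconnect(si)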
I have a closed network with a few nodes that are mutually consistent in time. For this I use NTP, with one node as the NTP server. One of the nodes is a dumb box over which I have little control. It runs an SNTP client to synchronize time to the system NTP server. I now need the box to be set to a time that is offset from the system time by an amount that I control. I am trying to find out if this can be done using only the available SNTP client on the box. I will now present my approach, and I would love to hear from anyone who knows whether this can be done.
As far as I can tell, a standard NTP server cannot be made to serve a time that is offset from the server's system time. I will therefore have to write my own implementation. The conceptually simplest NTP server must be a broadcast-only server. My thought is that I will be able to set the SNTP box to listen for broadcasts and then just send NTP broadcast packets set to my custom time, roughly as sketched below.
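To get a feel for what that would involve, here is a rough sketch of the kind of broadcast sender I have in mind (untested; the offset, broadcast address and header fields are simplifications, and a real server would also maintain sane stratum, root delay/dispersion and reference values):

    # Sketch: minimal SNTP broadcast sender (mode 5), serving system
    # time plus a fixed offset. Address and offset are placeholders.
    import socket, struct, time

    NTP_EPOCH_DELTA = 2208988800       # seconds from 1900-01-01 to 1970-01-01
    OFFSET = -30.0                     # placeholder offset in seconds
    BCAST = ("192.168.1.255", 123)     # placeholder broadcast address

    def ntp_timestamp(unix_time):
        t = unix_time + NTP_EPOCH_DELTA
        secs = int(t)
        frac = int((t - secs) * (1 << 32))
        return struct.pack("!II", secs, frac)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

    while True:
        li_vn_mode = (0 << 6) | (4 << 3) | 5     # LI=0, NTPv4, mode 5 (broadcast)
        header = struct.pack("!BBBb", li_vn_mode, 1, 6, -20)  # stratum, poll, precision
        body = struct.pack("!II4s", 0, 0, b"LOCL")  # root delay/dispersion, ref ID
        zero_ts = b"\x00" * 8
        xmit = ntp_timestamp(time.time() + OFFSET)
        # reference, originate, receive, transmit timestamps (48 bytes total)
        sock.sendto(header + body + xmit + zero_ts + zero_ts + xmit, BCAST)
        time.sleep(64)                 # 2**poll seconds between broadcasts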
Are there any NTP server implementations that allow me to do this out of the box?
Can anyone tell me how hard it is to write an SNTP broadcast server, or any other NTP server?
Does anyone know of any tutorials for how to write an NTP server?
Are there any show-stoppers to the scheme I am describing above?
To try to answer the questions that will inevitably come up:
Yes, I am also thinking about a new interface on the box to set the time to a value I specify. But that is not what I am asking about, and no, it will not be much simpler.
I have investigated whether I could just use the time that the box needs as the system time. This is not an option; I will need two different times, one for the system and one for the box.
All insight will be appreciated! Even opinions like "it should be doable."
You could use Jans to serve a fake time. I have no experience with this product, but I know of it from the NTP mailing list. It will allow you to serve a fake time, but it does none of the clock discipline that the reference implementation does.
More info: http://www.vanheusden.com/time/jans/
Jans on its own is not suitable for providing a fake time with an offset, but it can provide the real time plus a lot of test functionality, like time drift and so on.
I used Jans as the source of real time in conjunction with libfaketime on CentOS 6 as a fake NTP server with a positive or negative offset.
Just wget jans-0.3.tgz and run "make" from here:
https://www.vanheusden.com/time/jans/
RPM of libfaketime for CentOs 6 is here:
http://rpm.pbone.net/info_idpl_54489387_distro_centos6_com_libfaketime-0.9.7-1.1.x86_64.rpm.html
or find it for your distro.
Stop the real NTP server if it's running on your Linux box:
service ntpd stop
Run the fake NTP server (for example, 15 days in the past):
LD_PRELOAD=/usr/lib64/libfaketime.so.1 FAKETIME="-15d" ./jans -P 123 -t real
Keep in mind that clients expect the NTP server on port 123; if you need to run it on another port, redirect the traffic with iptables (e.g., a nat-table REDIRECT rule from UDP port 123 to your chosen port). You can verify the served time with the sketch below.
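To check what the fake server is actually serving, a minimal SNTP query like this will print the offset (a sketch; the server address is a placeholder):

    # Sketch: send one SNTP client request and print the server's
    # transmit time and its offset from the local clock.
    import socket, struct, time
    from datetime import datetime, timezone

    NTP_EPOCH_DELTA = 2208988800
    SERVER = ("127.0.0.1", 123)        # placeholder server address

    request = struct.pack("!B47x", (4 << 3) | 3)   # NTPv4, mode 3 (client)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(5)
    sock.sendto(request, SERVER)
    data, _ = sock.recvfrom(512)

    secs, frac = struct.unpack("!II", data[40:48])  # transmit timestamp
    server_time = secs - NTP_EPOCH_DELTA + frac / (1 << 32)
    print("server time:", datetime.fromtimestamp(server_time, tz=timezone.utc))
    print("offset from local clock: %+.1f s" % (server_time - time.time()))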
I am working on a clustered MarkLogic environment where we have 10 nodes. All nodes are shared E&D (evaluator and data) nodes.
The problem we are facing:
When a page is written to MarkLogic, it takes some time (up to 3 seconds) for all the nodes in the cluster to be updated, and if I do a read operation during this window to fetch the previously written page, it is not found.
Has anyone experienced this latency issue and looked at eliminating it? If so, please let me know.
Thanks
It's normal for a new document to only appear after the database transaction commits. But it is not normal for a commit to take 3 seconds.
Which version of MarkLogic Server?
Which OS and version?
Can you describe the hardware configuration?
How large are these documents? All other things equal, update time should be proportional to document size.
Can you reproduce this with a standalone host? That should eliminate cluster-related network latency from the transaction, which might tell you something. Possibly your cluster network has problems, or possibly one or more of the hosts has problems.
If you can reproduce the problem with a standalone host, use system monitoring to see what that host is doing at the time. On Linux I favor something like iostat -Mxz 5 and top, but other tools can also help. The problem could be disk I/O, though it would have to be really slow to result in 3-second commits. Or it might be that your servers are low on RAM, so they are paging during the commit phase.
If you can't reproduce it with a standalone host, then I think you'll have to run similar system monitoring on all the hosts in the cluster. That's harder, but for 10 hosts it is just barely manageable.
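If you want to measure the write-to-visibility gap directly while you test, a probe like this sketch against the MarkLogic REST API can help. The host, port, credentials and document URI are placeholders, and it assumes a REST API instance is configured:

    # Sketch: time a document write, then immediately read it back.
    # Host, credentials and URI are placeholders.
    import time, requests
    from requests.auth import HTTPDigestAuth

    BASE = "http://ml-node-01:8000/v1/documents"   # placeholder endpoint
    AUTH = HTTPDigestAuth("admin", "admin")        # placeholder credentials
    PARAMS = {"uri": "/test/latency-probe.xml"}

    t0 = time.time()
    w = requests.put(BASE, params=PARAMS, data="<probe/>", auth=AUTH,
                     headers={"Content-Type": "application/xml"})
    w.raise_for_status()
    t1 = time.time()
    r = requests.get(BASE, params=PARAMS, auth=AUTH)
    t2 = time.time()
    print("write: %.3fs  read-back: %.3fs  read status: %d"
          % (t1 - t0, t2 - t1, r.status_code))

Pointing the GET at a different node than the PUT (or running it from another host) will tell you whether the delay follows the cluster or a particular machine.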
While using AWS EC2, when I try to launch a Windows Server 2003 instance, it gives me a message asking me to please wait for 15 minutes. In fact, the password is only generated after 25 minutes.
Why does this happen?
Please help.
Regards,
Tushar
There's quite a lot going on behind the scenes when you launch an EC2 instance, and host load, network load and Windows itself can all increase the amount of time it takes between clicking 'Launch' and having the password generated and ready for extraction.
15 minutes is the "usual" amount of time it can take, but I've had VMs take 20-30 minutes to go from cold to ready-to-use. It's the nature of the environment that your VMs run in; sometimes it just takes a bit longer. If your tooling needs to know when the password is actually ready, you can poll for it rather than guess, as sketched below.
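A boto3 sketch of that polling loop (the instance ID and region are placeholders):

    # Sketch: poll until Windows password data is available for an instance.
    # Instance ID and region are placeholders.
    import time
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    instance_id = "i-0123456789abcdef0"   # placeholder

    while True:
        resp = ec2.get_password_data(InstanceId=instance_id)
        if resp["PasswordData"].strip():
            print("Encrypted password is ready; decrypt it with your key pair.")
            break
        print("Not generated yet; waiting 60s...")
        time.sleep(60)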