We have a number of Dell servers that we monitor (particularly the RAID) using SNMP. To monitor the RAID arrays we use these OIDs:
Virtual Disk State : 1.3.6.1.4.1.674.10893.1.20.140.1.1.4
Array Disk State : 1.3.6.1.4.1.674.10893.1.20.130.4.1.4
With those two OIDs we can raise an alarm that tells us whether the virtual disk or the physical disks are having issues.
These OIDs work on all of our Dell servers.
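For reference, we poll these with plain snmpwalk from net-snmp; a minimal sketch, assuming SNMP v2c and a community string of "public" (the hostname and community here are placeholders for our real values):

    # Walk the virtual disk state column of the Dell OpenManage storage MIB
    snmpwalk -v2c -c public dell-server.example.com 1.3.6.1.4.1.674.10893.1.20.140.1.1.4

    # Walk the array (physical) disk state column
    snmpwalk -v2c -c public dell-server.example.com 1.3.6.1.4.1.674.10893.1.20.130.4.1.4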
Now we've got a ProLiant DL380 G7 and we need to set up the same monitoring for it.
I've found MIBs online that apparently work (https://h20566.www2.hpe.com/hpsc/swd/public/detail?swItemId=MTX_ad470f02a64f416eb686f2525e), but there are too many OIDs to sift through.
Does anyone know whether the RAID array can be monitored using SNMP on this ProLiant?
Thanks so much.
Regards.
Francisco.
We have three physical servers. Each server has 2 CPUs (32 cores), 96 TB HDD, and 768 GB RAM. We would like to use these servers in an Elasticsearch cluster.
Each server will be located in a different data center, with the servers connected over a private link.
How can we optimize our configuration for high performance? Also, how should we best run Elasticsearch on these machines? For example, should we use virtualization to create multiple nodes per machine, or not?
Since you have a huge amount of RAM (768 GB) available on each physical server, and the Elasticsearch documentation on heap sizing says the heap shouldn't cross ~32 GB, you will have to use virtualization (or multiple ES processes) to run multiple nodes per physical server for better utilization of your infrastructure; a rough sketch is shown below.
Apart from that, there are various cluster and node settings you can optimize, but since you have not provided them, it's difficult to make recommendations about them.
Another thing to note is that your RAM and disk are huge but the CPU is not in proportion to them, so if you can increase the core count as well, that would be good.
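Here is what multiple nodes on one box can look like without full virtualization (the paths, node names, and ports below are assumptions; in practice you would wrap this in a service manager, and on ES 6.x the transport port setting is transport.tcp.port rather than transport.port):

    # Keep each JVM heap at or below ~30 GB so compressed object pointers stay enabled
    export ES_JAVA_OPTS="-Xms30g -Xmx30g"

    # Node 1
    ./bin/elasticsearch -Enode.name=node-1 -Epath.data=/data/es/node-1 \
        -Ehttp.port=9200 -Etransport.port=9300 -d -p /var/run/es-node-1.pid

    # Node 2 on the same host, with its own data path and ports
    ./bin/elasticsearch -Enode.name=node-2 -Epath.data=/data/es/node-2 \
        -Ehttp.port=9201 -Etransport.port=9301 -d -p /var/run/es-node-2.pid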
I am interested in understanding the way in which the hardware resources (CPU, disk, network, etc.) of an AWS physical server are shared between different applications. Do people have experience with inexplicable performance changes in services running on AWS that you have successfully attributed to another application sharing the physical resources? If so, how did you go about debugging this?
In particular, I am interested in more complicated interactions between the resources, such as CPU->Memory bandwidth. If you run 15 VMs on a single machine, you will surely have worse performance than if you ran 2 VMs.
Perhaps this is a more general question about Xen virtualization, but I don't know if there is some kind of AWS magic happening under the hood that I don't know about.
I am not sure if this is the right forum for this kind of question; if not, it would be helpful if you could point me towards a resource or another forum.
Amazon EC2 instances are not susceptible to "noisy neighbour" problems.
Based upon the Instance Type selected, the EC2 instance receives CPU, Memory and (for some instance types) locally attached disk storage. These resources are dedicated to the instance and will not be impacted by other users or other virtual machines. (Exceptions to this are the t1 and t2 instance types.)
Specifically:
The instance is allocated a number of vCPUs. These are provided to the instance and no other instance can use these vCPUs (see note about t1 and t2 below). The EC2 Instance Type page defines a vCPU as:
Each vCPU is a hyperthread of an Intel Xeon core for M4, M3, C4, C3, R3, HS1, G2, I2, and D2.
The instance is allocated an amount of RAM. No other instance can use this RAM. There is no oversubscription of CPU nor RAM.
The instance might be allocated locally-attached disk storage, known as Instance Store or Ephemeral Storage. This disk storage does not persist when the instance is Stopped or Terminated, so only store temporary data or data that is replicated elsewhere.
The instance is allocated network bandwidth that is dedicated to that instance. No other instance can impact this network bandwidth. The network performance is based upon the selected instance type. Basically, larger instances receive more network performance.
None of the above factors are impacted by other instances (virtual machines) running on the same host.
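If you ever want to sanity-check this from inside an instance, CPU steal time is the usual signal to watch; a quick look with mpstat (from the sysstat package), where the %steal column should stay near zero on fixed-performance instance types:

    # Report CPU utilization every 5 seconds, 3 times; watch the %steal column
    mpstat 5 3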
t1 and t2 instance types
Exceptions to the above statements are:
t1.micro instances "provide a small amount of consistent CPU resources and allow you to increase CPU capacity in short bursts when additional cycles are available".
t2 instances provide burst capacity based upon a system of CPU Credits. CPU Credits are earned at a constant rate depending upon instance type, and these credits can be used to burst the CPU when necessary.
For both these instance types, I would assume that this burst capacity is shared between instances, so it is possible that CPU burst might be impacted by other instances also wishing to burst. The t2 instances, however, would make this 'fair' by only consuming CPU credits when the CPU did actually burst.
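If you run t2 instances, you can watch the credit balance yourself; a sketch using the AWS CLI, where the instance ID and time range are placeholders:

    aws cloudwatch get-metric-statistics \
        --namespace AWS/EC2 \
        --metric-name CPUCreditBalance \
        --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
        --start-time 2016-01-01T00:00:00Z \
        --end-time 2016-01-02T00:00:00Z \
        --period 300 \
        --statistics Average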
Dedicated Instances and Dedicated Hosts
Dedicated instances are "Amazon EC2 instances that run in a virtual private cloud (VPC) on hardware that's dedicated to a single customer." Basically, your AWS account will be the only account running instances on that host computer.
A Dedicated Host is a "physical server with EC2 instance capacity fully dedicated to your use. Dedicated Hosts allow you to use your existing per-socket, per-core, or per-VM software licenses, including Windows Server, Microsoft SQL Server, SUSE, Linux Enterprise Server, and so on." Basically, you pay for the entire host computer and then launch individual instances on the host (at no additional charge).
The use of a Dedicated Instance or a Dedicated Host has no impact on resources allocated to each instance. They would receive the same resources as when running as a normal Shared Instance.
My company is running into a network performance problem that seemingly has all of the "experts" we're working with (VMWare support, RHEL support, our managed services hosting provider) stumped.
The issue is that network latency between our VMs (even VMs residing on the same physical host) increases--up to 100x or more!--with network throughput. For example, without any network load, latency (measured by ping) might be ~0.1ms. Start transferring a couple 100MB files, and latency grows to 1ms. Initiate a bunch (~20 or so) concurrent data transfers between two VMs, and the latency between the VMs can increase to upwards of 10ms.
This is a huge problem for us because we have application server VMs hosting processes that might issue 1 million or so queries against a database server (different VM) per hour. Adding a millisecond or two to each query therefore increases our runtime substantially--sometimes doubling or tripling our expected durations.
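For reference, our load test boils down to something like the following (the hostname is a placeholder; we watch the ping output while the background transfers are in flight):

    # Create a 100 MB test file and measure baseline latency
    dd if=/dev/zero of=/tmp/testfile-100M bs=1M count=100
    ping -c 20 db-vm.example.local

    # Start ~20 concurrent transfers to the other VM
    for i in $(seq 1 20); do
        scp /tmp/testfile-100M db-vm.example.local:/tmp/copy-$i &
    done

    # Latency under load -- this is where we see round trips climb past 10ms
    ping -c 20 db-vm.example.local
    wait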
We've got what I would think is a pretty standard environment:
ESXi 6.0u2
4 Dell M620 blades with 2x Xeon E5-2650v2 processors and 128GB RAM
SolidFire SAN
And our base VM configuration consists of:
RHEL7, minimal install
Multiple LUNs configured for mount points at /boot, /, /var/log, /var/log/audit, /home, /tmp and swap
All partitions except /boot encrypted with LUKS (over LVM)
Our database server VMs are running Postgres 9.4.
We've already tried the following:
Changing the virtual NIC from VMXNET3 to e1000 and back
Adjusting RHEL Ethernet stack settings
Using ESXi's "low latency" option for the VMs
Upgrading our hosts and vCenter from ESX 5.5 to 6.0u2
Creating bare-bones VMs (set up as above with LUKS, etc., but without any of our production services on them) for testing
Moving the datastore from the SSD SolidFire SAN to local (on-blade) spinning storage
None of these improved network latency. The only test that showed expected (non-deteriorating) latency is when we set up a second pair of bare-bones VMs without LUKS encryption. Unfortunately, we need fully encrypted partitions (for which we manage the keys) because we are dealing with regulated, sensitive data.
I don't see how LUKS--in and of itself--can be to blame here. Rather, I suspect that LUKS running with some combination of ESX, our hosting hardware, and/or our VM hardware configuration is to blame.
I performed a test in a much wimpier environment (MacBook Pro, i5, 8GB RAM, VMWare Fusion 6.0, CentOS 7 VMs configured similarly with LUKS on LVM and the same testing scripts) and was unable to reproduce the latency issue. Regardless of how much network traffic I sent between the VMs, latency remained steady at about 0.4ms. And this was on a laptop with a ton of other things going on!
Any pointers/tips/solutions will be greatly appreciated!
After much scrutiny and comparing the non-performing VMs against the performant VMs, we identified the issue as a bad selection for the advanced "Latency Sensitivity" setting.
For our poorly performing VMs, this was set to "Low". After changing the setting to "Normal" and restarting the VMs, latency dropped by ~100x and throughput (which we hadn't originally noticed was also a problem) increased by ~250x!
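For anyone chasing the same problem: the setting is stored in the VM's .vmx file as sched.cpu.latencySensitivity, so one way to audit a whole host from the ESXi shell is to grep for it (the datastore path below is a placeholder, and the key may be absent on VMs where it was never changed from the default):

    # Find VMs whose latency sensitivity has been explicitly set
    grep -H 'sched.cpu.latencySensitivity' /vmfs/volumes/*/*/*.vmx

    # A fixed VM should show:
    # sched.cpu.latencySensitivity = "normal"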
To demonstrate a cross-VM cache side-channel attack, we need the cache to be shared among virtual machines created on the same physical host.
Using OpenStack I am able to create VMs on the same physical host using the KVM hypervisor, but I am unable to share the cache.
If I use the Xen hypervisor instead of KVM, will it help?
How do I make sure the cache is shared between the VMs?
I need some guidance on this.
Either KVM or Xen is fine, depending on your system design and the hardware required.
To add on: the CPU caches will be shared once you spawn the VMs on the same host. Check your CPU specs for the L1, L2, and L3 cache sizes, and also check how much RAM each VM is allocated when you create an instance. The question you posted is very broad.
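As a concrete starting point on KVM/libvirt: the L3 cache is typically shared by all cores on a socket anyway, while L1/L2 are shared only between hyperthread siblings, so you can pin each guest's vCPU to sibling logical CPUs of one physical core. A sketch (the domain names and CPU numbers are assumptions, so check your topology first):

    # Inspect cache sizes and hyperthread sibling pairs
    lscpu
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

    # If the siblings list shows "0,4", pin vCPU 0 of each guest to one sibling
    virsh vcpupin vm-attacker 0 0
    virsh vcpupin vm-victim 0 4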
http://aws.amazon.com/ec2/#pricing
I can't understand this. What is an instance? ("On-Demand Instances let you pay for compute capacity by the hour with no long-term commitments.")
Does this mean that I can use the whole of this as my own server (like a VMware server):
(Extra Large Instance)
15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge
For $0.96 per hour?
Or does it mean just one operation or something? What exactly is an instance?
An instance signifies an operating system instance (a virtual machine). By using virtualization, Amazon (and cloud providers in general) offer you a virtualized environment where OS instances are running. You have full control over that operating system inside that environment. Per hour means that you pay that much for using your OS instance resources for a single hour. I believe that page has almost all the details about pricing.
An instance is a virtual machine. For example you can start up an ubuntu instance and then you can SSH into it and do whatever you want.
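To make that concrete, launching and logging into an instance looks roughly like this with the AWS CLI (the AMI ID, key pair, and public IP are placeholders; m1.xlarge is the type quoted in the question):

    # Launch one m1.xlarge instance
    aws ec2 run-instances \
        --image-id ami-0123456789abcdef0 \
        --instance-type m1.xlarge \
        --key-name my-keypair \
        --count 1

    # Once it is running, SSH in and use it like any other server;
    # you are billed for each hour the instance stays in the running state
    ssh -i my-keypair.pem ec2-user@<public-ip>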