How much traffic flows between Foglight and a monitored SQL Server?

I have been trying to find out how much traffic would result from having a SQL Server instance monitored by Foglight. The Foglight servers and my SQL Servers are in separate data centers, and the Foglight team is concerned that the latency will be too much for their servers to handle. Latency within a data center is <1 ms, and between the two data centers it is 30 to 60 ms.
I would very much appreciate any information on the volume of traffic between the Foglight server and the monitored SQL Servers. I would also appreciate any information on the "cache buffers" that, according to the Foglight team, would overflow and crash the server if latency is too high.
Thank you!

Related

What's the maximum throughput per instance that can be achieved with IBM MQ Advanced for Developers?

I am currently using an IBM MQ Advanced for Developers server to test our client and was able to achieve around 1000 messages per second using the sample consumer written in JMS, which seems pretty slow. Is this a limit of the developer server, and if so, what throughput can be achieved using a licensed production IBM MQ server?
There is no artificial limit associated with IBM MQ Advanced for Developers. It is the same as the licensed production version of IBM MQ.
You don't say what type of machine you were using, whether your messages were persistent, what size they were, or any other qualifying criteria.
You say "client", but I don't know whether you mean "network-attached application" or "driving application". Clearly, if your program is running "client-attached" (MQ parlance for network-attached), then network performance will also come into this.
On my Windows laptop, I get 4500 non-persistent msgs/sec or 2000 persistent msgs/sec using a simple C-language, locally bound program. Over a client connection (just using localhost, not actually going out over a real network connection) I get 2700 non-persistent msgs/sec or 1500 persistent msgs/sec.
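If you want comparable figures for your own environment, a minimal throughput probe like the sketch below is usually enough. It uses the standard JMS 2.0 API; the JNDI names, message count, and 1 KB payload are placeholder assumptions you would swap for your own connection factory and queue.

    import javax.jms.ConnectionFactory;
    import javax.jms.DeliveryMode;
    import javax.jms.JMSContext;
    import javax.jms.JMSProducer;
    import javax.jms.Queue;
    import javax.naming.InitialContext;

    public class MqThroughputProbe {
        public static void main(String[] args) throws Exception {
            int count = 10_000;                                   // messages to send
            InitialContext ctx = new InitialContext();            // assumes JNDI is configured
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/DEV.QUEUE.1");  // placeholder queue name

            try (JMSContext jms = cf.createContext()) {
                JMSProducer producer = jms.createProducer()
                        .setDeliveryMode(DeliveryMode.NON_PERSISTENT); // re-run with PERSISTENT
                String body = "x".repeat(1024);                   // ~1 KB payload

                long start = System.nanoTime();
                for (int i = 0; i < count; i++) {
                    producer.send(queue, body);
                }
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.printf("%d msgs in %.2f s = %.0f msgs/sec%n", count, secs, count / secs);
            }
        }
    }

Running it with both delivery modes, and with the connection factory configured first for local bindings and then for a client connection, gives you the same four kinds of numbers quoted above, measured on your own hardware.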
You should read the MQ Performance Reports for details of the expected rates you can get.
As an ex-MQ performance person I would say: it depends.
At one level you can ask: what can one application process in isolation?
For persistent messages this will come down to the rate at which you can write to the log files.
If you have 10 applications in parallel each putting and getting from their own queue, then you will not get 10 times the throughput - you might get 8 or 9 times the throughput.
If they are all processing the same queue, then the throughput may drop a bit more as the queue usage is serialised.
If only one application is writing to the log, it may see a 1 millisecond response time. If you have 10 applications running concurrently, they may each see a 3 millisecond response time, so individual throughput goes down, but with more threads the overall throughput goes up: one thread at 1 ms gives roughly 1000 msgs/sec, while 10 threads at 3 ms each give roughly 333 msgs/sec per thread but around 3300 msgs/sec overall.
If you have requests coming in over the network, you need to add network time, but you can run more clients and so get improved throughput.
If your application has a delay built in, it may only process a low message rate. You can have lots (1000s) of these and get a high overall throughput.
If your application is putting and getting as fast as possible, you may find that you can run 10-100 instances before the throughput plateaus.
Let's say you want to run your box so it is using 75% of the CPU and the logging is 50% busy.
If you have just MQ on the box, then it can handle more messages than if you also had DB2 on the box (with DB2 using 50% of the CPU).
If you have an application (DB2) hammering the disk, then the MQ throughput will go down.
If you have lots of applications putting to a server queue and one server program, you will find the throughput is limited by the rate at which the server can process work. If it is doing DB2 work, it will be slower than if it were doing no DB2 work. If you find the server queue depth is over 5, then you need more server instances.
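As a rough illustration of adding server instances (a sketch only, using generic JMS; the JNDI names are placeholders and the back-end work is left as a comment), each extra consumer thread below drains the same request queue, which is what keeps the queue depth down when one server program cannot keep up:

    import javax.jms.ConnectionFactory;
    import javax.jms.JMSConsumer;
    import javax.jms.JMSContext;
    import javax.jms.Message;
    import javax.jms.Queue;
    import javax.naming.InitialContext;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ServerPool {
        public static void main(String[] args) throws Exception {
            int instances = 4;                                    // raise this if the queue depth keeps growing
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Queue requests = (Queue) ctx.lookup("jms/SERVER.REQUEST.QUEUE"); // placeholder name

            ExecutorService pool = Executors.newFixedThreadPool(instances);
            for (int i = 0; i < instances; i++) {
                pool.submit(() -> {
                    try (JMSContext jms = cf.createContext()) {
                        JMSConsumer consumer = jms.createConsumer(requests);
                        while (true) {
                            Message m = consumer.receive(1000);   // poll with a 1 s timeout
                            if (m != null) {
                                // do the DB2/back-end work here; the slower this is,
                                // the more instances you need to keep the queue depth low
                            }
                        }
                    }
                });
            }
        }
    }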
As Morag said, see the performance reports, but they are not the clearest reports to understand.

Golden Gate replication is extremely delayed

We are using GoldenGate in production to replicate from an Oracle database into Postgres. Alongside that, GoldenGate also replicates into another Oracle database instance.
The source Oracle database is on our company's internal network.
The target Oracle database is also on our internal network.
Postgres is in the AWS cloud.
Oracle->Oracle replication has no problems and no delay.
Oracle->Postgres replication can have an incredibly large delay: sometimes it grows to as much as one day, and no error is reported.
We have been investigating the problem and cannot find the cause: the network bandwidth is large enough for the data we transfer, there is enough RAM, and CPU usage is only about 20%.
The only difference seems to be the ping between the internal network and AWS: within the internal network the ping is approximately 2 ms, while to AWS it is almost 20 ms.
What could be the cause, and how can we resolve it?
You really should contact Oracle Support on this topic; however, Oracle GoldenGate 12.2 supports Postgres as a target (only).
As for the latency within your replication process: it sounds like Oracle-to-Oracle is working fine within your internal network, and the problem only appears when going Oracle-to-Postgres (AWS cloud).
Do you have lag monitoring configured? LAGINFO (https://docs.oracle.com/goldengate/c1221/gg-winux/GWURF/laginfo.htm#GWURF532) should be configured within your MGR processes. This will provide some baseline lag information for determining how to proceed.
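For reference, a minimal MGR parameter sketch for lag reporting might look like the following; the thresholds are placeholder values to adapt, not recommendations:

    -- mgr.prm (Manager parameter file)
    PORT 7809
    LAGREPORTMINUTES 5        -- how often Manager checks replication lag
    LAGINFOMINUTES 5          -- log an informational message when lag exceeds 5 minutes
    LAGCRITICALMINUTES 30     -- log a warning when lag exceeds 30 minutes

(If you do decide to compress the trail data over the network, that is the COMPRESS option of the RMTHOST parameter in the Extract/data pump parameter file.)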
Are you compressing the trail files?
How much data are you sending? DML stats?
This should get you started on the right path.

Redis cluster-network latency

There is a new Redis cluster setup that a team in my company is working on in order to improve application data caching based on Redis. The setup is as follows: a Redis cluster with one Redis master and many slaves, say 40-50 (which can grow as the application scales), with one Redis instance per virtual machine. I was told this setup helps the applications deployed on every virtual machine query the data in the local Redis instance rather than querying an instance over the network, in order to avoid network latency. Periodically, say every 5 seconds or so, the Redis master is updated with only the data that has been modified, newly created, or deleted (the data is backed by a relational database); this initiates a data sync operation with all the Redis slave instances. The data consumers (the applications deployed on all the virtual machines) read the updated values from the Redis slaves for processing. Is this a correct approach to the network latency problem the applications face when querying a Redis instance within a data center network? Will this setup not create a lot of network traffic when the Redis master syncs data with all its slave nodes?
I couldn't find many answers on this on the internet. Your opinions are much appreciated.
The relevance of this kind of architecture depends a lot on the workload. Here are the important criteria:
the ratio between write and read operations. Obviously, the more read operations, the more relevant the architecture. The main benefit, IMO, is not necessarily the latency gain, but the scalability, the extra reliability it brings, and the lower network resource consumption.
the ratio between the cost of a local Redis access and the cost of a remote Redis access. Do not assume that the only cost of a remote Redis access is the network latency; it is not. On my systems, a local Redis access costs about 50 us (on average, at very low workload), while a remote access costs 120 us (on average, at very low workload). The network latency is about 60 us. Measure the same kind of figures on your own system/network, with your own data.
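If you want to reproduce those local-vs-remote figures on your own setup, here is a minimal sketch using the Jedis client; the host names and key are placeholders, and a real measurement should be taken at your actual workload rather than on an idle system:

    import redis.clients.jedis.Jedis;

    public class RedisAccessCost {
        static void probe(String label, String host) {
            try (Jedis jedis = new Jedis(host, 6379)) {
                jedis.set("probe:key", "value");                 // make sure the key exists
                int n = 10_000;
                long start = System.nanoTime();
                for (int i = 0; i < n; i++) {
                    jedis.get("probe:key");                      // one round trip per GET
                }
                long avgMicros = (System.nanoTime() - start) / 1000 / n;
                System.out.println(label + ": ~" + avgMicros + " us per GET");
            }
        }

        public static void main(String[] args) {
            probe("local slave  ", "127.0.0.1");                 // placeholder hosts
            probe("remote master", "redis-master.internal");
        }
    }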
Here are a few pieces of advice:
do not use a single Redis master with many slave instances; it will limit the scalability of the system. If you want to scale, build a hierarchy of slaves. For instance, have the master replicate to 8 slaves, and have each of those slaves replicate to 8 other slaves running locally on your 64 application servers. If you need to add more nodes, you can tune the replication factor at the master or slave level, or add one more layer to this tree for extreme scalability. It gives you flexibility.
consider using Unix sockets between the application and the local slaves, rather than TCP sockets. It is good for both latency and throughput.
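As a sketch of what each local slave's redis.conf could contain under this design (the master host name, socket path, and permissions are placeholders; on Redis versions before 5.0 the directive is spelled slaveof instead of replicaof):

    # redis.conf on each application server (local slave)
    replicaof redis-master.internal 6379    # pull data from the master (or an intermediate slave)
    replica-read-only yes                   # the local application only reads
    unixsocket /var/run/redis/redis.sock    # local applications connect here instead of TCP
    unixsocketperm 770
    port 6379                               # keep TCP available for replication to/from other nodes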
Regarding your last questions, you really need to evaluate the average local and remote latencies to decide whether this is worth it. Note that the protocol Redis uses to synchronize master and slaves is close to the normal client/server traffic: every SET command applied on the master will also be applied on the slaves. The network bandwidth consumption is therefore similar. So in the end, it is really a matter of how many reads and how many writes you expect.

java.net.SocketException: Connection reset in JMeter

I'm doing a load test on a web application, and with as few as 14-15 users I am getting this connection reset issue. I have ensured the following on my end:
Request retries have been set to 1 in the user.properties file
Stale check is set to true
Test data and LAN connectivity are good
The number of users is low, so JMeter does not need more RAM
So can this be concluded to be an issue in the application design and not an issue with JMeter?
To avoid a long trail of comments, I'll try to summarize and answer.
This issue looks like it comes from the application deployment side.
JMeter ---------------> ( Web server <-> App server <-> DB )
Find out which area the bottleneck is in, using profilers.
The issue could be in any one of the layers below:
Web Server:
If the web server is the bottleneck, try tuning it to handle more load: a larger thread pool, longer timeouts, bigger buffers and queues (see the connector sketch after these layers).
Application Server:
If the application server is the bottleneck, tune your application server. Again, check configurations and any specific settings for handling more load; if required, code improvements should be made.
Database Server:
If the DB is the bottleneck, check queries, indexes, and statistics, and optimize them for your needs. Configuration settings also help sometimes.
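For the web-server tuning mentioned above, if that tier happens to be Tomcat (an assumption; the values below are illustrative, not recommendations), those knobs live on the HTTP connector in server.xml:

    <!-- server.xml: HTTP connector sized for more concurrent load.
         maxThreads  = worker thread pool size
         acceptCount = backlog queue used when all worker threads are busy
         connectionTimeout / keepAliveTimeout are in milliseconds -->
    <Connector port="8080" protocol="HTTP/1.1"
               maxThreads="400"
               acceptCount="200"
               connectionTimeout="20000"
               keepAliveTimeout="15000" />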
For all layers, check server resource utilization. If it is not high, there is room for performance improvement; otherwise vertical/horizontal scaling of the server is required.
You say the problem is that some IDs were not generated in the DB, so you can start with the DB layer for possible bottlenecks.
Hope this helps :)

What can interfere with testing a server's performance?

My HTTP server can't handle load tests: it gives really high latency when multiple connections are made.
Server Configuration:
5 instances (0.5 vCore CPU, 512 MB memory, 20 GB disk each)
A load balancer
10G shared bandwidth
When I transfer a 3.5 MB zip, it takes about 1 second when there is only one connection. However, when over 30 connections are made, it goes up to 20-50 seconds.
I am testing with JMeter on my laptop. Is there a possibility that my testing environment interferes with the load-testing?
If so, what would be a solution to improve my testing environment?
First of all, you need to monitor and pin down the problem(s).
Start off by collecting information on these four layers:
CPU Usage
Memory Usage
Network Usage
I/O Usage
All of them on the OS layer. (Monitoring tools will vary depending on your OS).
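If you prefer to script the collection rather than watch top/PerfMon by hand, a minimal Java sketch like the one below (run on the server under test) can log a sample every second during the test. It only covers the CPU and memory layers; network and disk I/O still need OS-level tools such as iostat/sar or PerfMon counters.

    import com.sun.management.OperatingSystemMXBean;
    import java.lang.management.ManagementFactory;

    public class OsSampler {
        public static void main(String[] args) throws InterruptedException {
            OperatingSystemMXBean os = (OperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean();   // JDK-specific extension
            for (int i = 0; i < 300; i++) {                         // ~5 minutes of samples
                double cpu = os.getSystemCpuLoad();                 // 0.0-1.0, or -1 before the first reading
                long freeMb = os.getFreePhysicalMemorySize() / (1024 * 1024);
                System.out.printf("cpu=%.0f%% freeMem=%d MB%n", cpu * 100, freeMb);
                Thread.sleep(1000);
            }
        }
    }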
Once you have this data and can narrow down the problem path (CPU bound, network latency, I/O latency, or whatever), an answer will emerge. Doing this (if it is the first time you are testing your app) will also give you scaling information about your environment and your application in general.
