I am currently looking at some JProfiler traces from our WebSphere-based application, and am noticing that a significant amount of CPU time is being spent in the class com.ibm.io.async.AsyncLibrary.getCompletionData2.
This is only a guess, but I am wondering whether it is PMI-related (we do have PMI enabled).
My knowledge of PMI is limited, as this is managed by another team.
Is it expected that PMI can have this sort of impact?
(If so) Is the only option to turn it off completely? Or are there some types of data capture that have a particularly high overhead?
PMI has multiple instrumentation levels (none, basic, extended, all, and custom). The basic level should have minimal impact.
The particular class you are referring to should not be related to PMI. I am only guessing here, as these classes are not publicly documented and are used internally by the WAS runtime.
What version of WAS are you on? There were some known issues in this area.
For example, refer to [PK41617: PERFORMANCE OF THE AIO LIBRARY SUFFERS UNDER CERTAIN TYPES OF TRAFFIC FLOW](http://www-01.ibm.com/support/docview.wss?uid=swg1PK41617).
Please check if this is applicable to your environment.
Also skim through [Disabling AIO (Asynchronous Input/Output) in WebSphere Application Server](http://www-01.ibm.com/support/docview.wss?uid=swg21366862).
You could give this a try if this is not a production environment and see whether disabling AIO eliminates the spikes for you. Even if it works well, I would go through the formal IBM support process to find out the exact cause before making the same change in a production environment.
HTH
Manglu
Please explain how containers share hardware node resources with each other.
I feel my node lacks CPU resources, even though I have set the cloudlet limit to the maximum.
I have no real load on my node, but over the last month it has very often stopped responding for short periods and then recovered, with no apparent reason in the logs.
I also feel my provider has little experience with Jelastic administration. Instead of looking for the real reason why the hardware node is overloaded, they just turn it off for a while and then turn it back on.
See my screenshots; they show zero CPU usage.
Is it possible for a hosting provider to "oversell" a hardware node?
I am looking for support from the Jelastic team here.
I know that you requested a reply from Jelastic in particular, but I suppose that it might help to get some insight from a hosting provider as well.
Is it possible for a hosting provider to "oversell" a hardware node?
The Jelastic platform itself does not have any limitations on this. The platform ensures that containers are distributed to the least loaded hardware nodes, but obviously if a hosting provider does not supply sufficient infrastructure / keep adding more, that distribution is worthless (i.e. all hardware is overloaded).
I feel my node lacks CPU resources.
From those graphs it looks like you're hitting approx. 2GHz CPU, which for a LAMP application (right?) seems to be quite high. Are you sure that your bottleneck is CPU? If yes, how did you reach that conclusion / test that assertion?
I also feel my provider has little experience with Jelastic administration.
Most of all, if you feel that your current Jelastic provider is not servicing your needs, did you consider moving to another one? The Jelastic ecosystem has over 30 different hosting providers. You can move your environment to another provider easily with the Export/Import feature and the ratings on the Jelastic Cloud Union site can help you to identify a good quality one in your preferred location.
I'm in the process of choosing a technology for my high-throughput web server. I've created two naive implementations, one in Go and one in Elixir, using Phoenix.
I've deployed these versions on an extra large machine on AWS, and used siege to benchmark their performance.
I've managed to increase Go's performance by setting GOMAXPROCS, but the Elixir version seems to reach its peak performance long before it fully utilizes the machine's CPU or memory.
I couldn't find any documentation or explanation of how to fine-tune Cowboy's behavior in production settings, so that it properly utilizes the machine it runs on and produces the performance everybody talks about...
I'm pretty sure that there is a simple place (file or environment variable) where I can tweak a value or two to produce much better results.
Can anyone tell me where that place may be?
Following the suggestions in the comments, I've re-implemented my project using Plug instead of Phoenix.
With the same functionality (parsing the POST body as JSON, calling DynamoDB, reading from an Amnesia table, and formatting a JSON response), I got much better performance, with far higher resource utilization.
I guess I can still "milk" a few more requests per second (currently I get around 500 requests per second), but it is now on par with the Go implementation of the same thing...
I don't have enough rep to comment directly, so I'll answer here. I'd love to see the numbers you got with Phoenix. Were you running in prod mode? Performance will be much slower if you were running in dev (the default), since code reloading is enabled and runs a check on every request. Vanilla Plug is going to be doing less work than Phoenix, but not much less. A standard Phoenix router/controller should be more or less in line with the Plug code you end up with.
I use several load-testing tools (LoadRunner, JMeter, NeoLoad) to performance-test different applications. I'm wondering whether it is possible to monitor all layers of an application stack. For example, say I have the following data chain:
Loadbalancer <-x-> Application Server <-x-> RMI <-x-> Java Application <-x-> MQ <-x-> Legacy application <-x-> Database
I am interested in monitoring at the points marked with an x in the chain, for example average response times.
Obviously we could simply create a wrapper on all endpoints which would gather the statistics for us; we could then perhaps import those into LoadRunner or other load-testing tools and use them alongside the tools' built-in performance statistics. But maybe there are tools/applications which already do this? (A sketch of the kind of wrapper I mean follows below.)
If not, how should we proceed, in order to gather this kind of statistics?
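To illustrate, here is a minimal sketch (in Java, with hypothetical names) of the kind of wrapper I mean; a real version would feed a stats collector or log file for later import, rather than printing to stdout:

```java
import java.util.concurrent.Callable;

public class TimedCall {

    // Wraps any downstream call (RMI, MQ put, JDBC query, ...) and
    // records how long it took under a label such as "rmi" or "mq".
    public static <T> T measure(String label, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // In practice this would go to a stats collector or log file
            // that the load-testing tool can import afterwards.
            System.out.println(label + " took " + elapsedMs + " ms");
        }
    }
}
```

A call such as `TimedCall.measure("rmi", () -> remoteService.process(request))` (where `remoteService` is a made-up example) would then yield per-hop timings at each marked x.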
The standard for this was supposed to be Application Response Measurement (ARM). It is a cross-language set of APIs that does just what you're looking for. The issue is that the products implementing this spec all tend to be big, expensive, "enterprise"-level monitoring tools. Think multi-week installs, consultants, more infrastructure and lots of buzzwords.
Still, if this is a mission critical app with a mission critical budget, this may be what you need. But you may be able to build your own that does just enough without too much effort. A quick search turns up at least one open source ARM implementation if you still want to use that API.
Another option is to simply to have transactions you can run against each tier of the system to check general responsiveness. For example you can have a static web page on the LB, a no-op tx on the app server, a "hello" servlet on the Java app, put a message directly on the queue, etc. During a performance / load test, these could be hit directly by the load testing tool or you could write a wrapper servlet / application call that does this as a single HTTP (RMI?) call. Running these a few times a minute won't add too much load to the system, but it should help you pinpoint which tier is slower. The nice thing about this approach is that it also works in production, just watch out for security issues.
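For example, the "hello" servlet for the Java tier can be deliberately trivial; this is just a sketch using the standard servlet API:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A deliberately trivial servlet: if hitting this is slow, the problem
// is the container / JVM / host, not your application logic.
public class PingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("hello");
    }
}
```

Because it does no real work, its response time is a clean baseline for that tier.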
For single user kind of test, where you know you have problem (e.g. this tx is "slow"), I have also had pretty good luck with network tracing. It's very tedious, but when you aren't sure what tier is slow, starting up a network trace on a few machines and running a single tx usually gives a good idea of what the system is doing.
I have handled this decomposition a number of ways in the past. The first is at a very low level, using protocol analyzer dumps to find the time points where a conversation leaves tier X and enters tier Y. The second method is examination of the logs of the various tiers; something that can make this quite useful is a common log server for all of your components (syslog, rsyslog, etc.) together with a good log parsing tool, such as the freely available Microsoft Log Parser. The third method is utilization of the audit trail for an application stored in the database. You may find this when working on enterprise service bus style applications, which have a consumer/producer model and a bus to pass information rather than a direct connection; the audit trails I have seen are typically stored in a database and allow the tracking of an individual transaction through the entire application infrastructure. Your load balancer, as a network device, may be out of the hunt on this one.
Note: if you go the protocol analyzer or log route, be sure to synchronize all of your source devices to a common time server. Having one of your collectors (analyzer, app log) off on its timestamps can be a real hair-pulling experience when you get into the analysis phase.
As to how you move your collected data into LoadRunner, that part is very mechanical. The Analysis program supports an interface to import external datapoints. The format is very specific and is documented both in the help and in the online docs. This import process works very well, as I often have to use it to collect statistics from hosts which I do not have direct monitoring access to, but which need to be included as part of the monitored test infrastructure.
James Pulley
Moderator (YahooGroups LoadRunner, Advanced-Loadrunner; GoogleGroups lr-LoadRunner; Linkedin LoadRunner, LoadRunnerByTheHour; SQAForums LoadRunner, WinRunner)
WLDF (WebLogic Diagnostic Framework) enables many performance-related analyses, in particular resource demand tracking and tracing across classes and methods. In that sense it is similar to a profiler; however, it works on the server side and is bound to that particular product/vendor.
Are there any other products (maybe even open/free) which offer similar level of detail? I'm not interested in "conventional" monitoring products such as JMX, VisualVM, Hyperic etc. but in low-level, detailed tracing and request tracking.
Many thanks,
Michael
The free version of AppDynamics (http://appdynamics.com/) is probably your best bet. It does low-level, detailed tracing without the traditional overhead of a profiler (so yes, you can run it in prod).
What makes a site good for high traffic?
Does it have more to do with the hardware/infrastructure, or with how one writes the software (using Java as the example, if it matters)?
I'm wondering how the software changes just because it is expected that billions of users will be on the site, if at all.
My understanding up to this point is that the code doesn't change, but that it is deployed on multiple servers in a cluster, with a load balancer distributing the load; so really, on any one server/deployment, the application is just like any other standard application/website.
I highly recommend reading Jeff Atwood's blog post on micro-optimization. In previous posts he talks about how this site was created and the hardware upgrades it has had (quickly summarized: better hardware improves performance only to the extent that it is faster), but the real speed of a site comes from good programming, and that article should sum up some of your site-programming questions quite well.
Hardware is cheap. Programming is expensive.
There are some programming techniques to make sure your code can handle multiple simultaneous views/updates (one small example below). If you're using an existing framework, much of that work is (hopefully) done for you, but otherwise you're going to find that things which worked for a few hundred hits an hour on one server aren't going to work when you're getting hundreds of thousands of hits and you have to deploy multiple machines behind a load balancer.
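To make that concrete with one small, hypothetical Java example: a naive hit counter that is fine for a single user silently loses updates under concurrent load, while an atomic one does not.

```java
import java.util.concurrent.atomic.AtomicLong;

public class HitCounter {

    // Broken under load: two threads can read the same value and both
    // write back value + 1, losing a hit.
    private long unsafeHits = 0;
    public void unsafeRecordHit() { unsafeHits++; }

    // Safe under load: the increment is a single atomic operation.
    private final AtomicLong hits = new AtomicLong();
    public void recordHit() { hits.incrementAndGet(); }
    public long total() { return hits.get(); }
}
```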
Well, it is primarily an issue of hardware scaling, but there are a few things to keep in mind with respect to the software involved. For example, if you are on a server farm, you'll need to work with a session management server (either via SQL Server or via a state server), which has the implication that your session variables need to be serializable.
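For instance, a minimal sketch of a session-safe object (hypothetical names): everything stored in a session that is persisted out-of-process must be serializable end to end.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: if sessions are shipped to a state server or
// database, this class and everything it references must serialize.
public class ShoppingCart implements Serializable {
    private static final long serialVersionUID = 1L;

    private final List<String> itemIds = new ArrayList<>(); // String is serializable

    public void add(String itemId) { itemIds.add(itemId); }
    public List<String> items() { return itemIds; }
}
```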
But, in the bigger picture, there are a variety of things you would want to do to scale to an enterprise level. For example, it becomes particularly important to abstract your database calls out into a DAL, because you may well need to adopt a middleware package for high-volume environments.
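A minimal sketch of what such a DAL abstraction might look like (all names are illustrative, not from any particular framework): the application codes against the interface only, so the implementation behind it can later be swapped without touching calling code.

```java
// Hypothetical domain object used by the DAL below.
class Customer {
    final long id;
    final String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

// The rest of the application sees only this interface.
interface CustomerDao {
    Customer findById(long id);
    void save(Customer customer);
}

// Today this might be backed by plain JDBC; under high volume the same
// interface can be re-implemented to route through a middleware or
// caching layer with no change to callers, e.g.:
//
//   class JdbcCustomerDao implements CustomerDao { ... }
//   class MiddlewareCustomerDao implements CustomerDao { ... }
```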