What is the difference between "gwan_1" and "gwan -w 1"?

The timeline at gwan.ch/en_timeline.html states that
renaming gwan to gwan_1 starts G-WAN with one worker.
What is the difference between gwan_1 and gwan -w 1?

As seen in the documentation, there is no difference in functionality.
One way (renaming the executable) makes the choice permanent, while the command-line switch allows quick tests.
Historically, these features were introduced for G-WAN tests on machines with thousands of CPU cores. Those tests led to a rewrite of some portions of the G-WAN kernel for higher performance; the result is G-WAN v3.8.
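Concretely (paths assumed), these two invocations are equivalent; only the lifetime of the choice differs:

    mv gwan gwan_1 && ./gwan_1    # permanent: one worker, encoded in the executable's name
    ./gwan -w 1                   # temporary: one worker for this invocation only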

Related

How do I instrument only the actual benchmark of SPEC CPU2006 with Intel's Pin?

I've been trying to instrument SPEC CPU2006 benchmarks using Intel's Pin on Ubuntu. I have a Pintool with a simple cache simulator that counts reads and writes. When running the Pintool on a 'runspec -nonreportable' command for a specific benchmark I get the data I want. However, the results of different benchmarks hardly differ at all. My Pintool doesn't seem to be the problem, as it looks to be working correctly on other applications. I suspect the results come out this way because the Pintool is instrumenting everything, including the setup of the benchmark.
What I've previously done is just run the Pintool on the runspec command. I've also tried to use '--action build' and '--action setup' prior to using runspec to reduce the overhead, but it seems like much of the same setup runs anyway. I know there are monitoring hooks in SPEC CPU2006 where I can run additional commands right before starting a benchmark, and I'm thinking there might be some way in which I can use those, but I'm not sure how. Maybe the 'monitor_wrapper' hook is most appropriate? Maybe I can get hold of the PID somehow and attach my Pintool to the correct process just as the benchmark is starting? Super thankful for any help I can get!
You're probably just instrumenting runspec itself, which runs in a process that creates another process in which the benchmark is run. You have two options: either tell Pin to follow child processes (using the -follow_execv option) or directly inject Pin into the process of the benchmark when it gets created (by running the benchmark using specinvoke instead of runspec).
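As a rough sketch (the tool name mycachesim.so, the config name, and the benchmark are placeholders for your own), the two options would look something like this:

    # Option 1: instrument runspec but let Pin follow into child processes
    pin -follow_execv -t mycachesim.so -- runspec --config=mycfg --nonreportable 401.bzip2

    # Option 2: from the benchmark's run directory, have specinvoke print the exact
    # benchmark command lines without executing them, then run one under Pin directly
    cd $SPEC/benchspec/CPU2006/401.bzip2/run/run_base_ref_mycfg.0000
    specinvoke -n speccmds.cmd
    pin -t mycachesim.so -- <one of the printed commands>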

Performance Testing Tool That Can Produce a Graph

Does anybody know a good testing tool that can produce a graph of CPU cycles and RAM usage?
For example, I will run an application, and while the application is running the testing tool will record CPU cycles and RAM usage and produce a graph as output.
Basically, what I'm trying to test is how heavy a load an application puts on RAM and CPU.
Thanks in advance.
In case this is Windows, the easiest way is probably Performance Monitor (perfmon.exe).
You can configure the counters you are interested in (such as Processor Time, Committed Bytes, etc.) and create a Data Collector Set that measures these counters at the desired interval. There are even templates for a basic System Performance Report, or you can add counters for the particular process you are interested in.
You can schedule the time when you want to execute the sampling, and you will be able to see the result using PerfMon or export it to a file for further processing.
Video tutorial for the basics: http://www.youtube.com/watch?v=591kfPROYbs
A good sample showing how to monitor SQL:
http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
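If you would rather script the collection than click through the UI, the same kind of Data Collector Set can be created from the command line with logman (the counter paths, interval, and output path here are just examples):

    logman create counter MyAppPerf -c "\Processor(_Total)\% Processor Time" "\Memory\Committed Bytes" -si 00:00:05 -o C:\PerfLogs\MyAppPerf
    logman start MyAppPerf
    rem ... run the application under test ...
    logman stop MyAppPerf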
LoadRunner is the best I can think of, but it's very expensive too! Depending on what you are trying to do, there might be cheaper alternatives.
Any tool which can hook into the standard Windows or 'NIX system utilities can do this. This has been a de facto feature set on just about every commercial tool for the past 15 years (HP, IBM, Microfocus, etc). Some of the web-only commercial tools (but not all) and the hosted services offer this as well. For the hosted services you will generally need to punch a hole through your firewall for them to get access to the hosts for monitoring purposes.
On the open source front this is a totally mixed bag. Some have it, some don't. Some support one platform but not others (i.e. support Windows, but not 'NIX, or vice versa).
What tools are you using? It is unfortunately common for people to have performance tools in use and not be aware of their existing toolset's monitoring capabilities.
All of the major commercial performance testing tools have this capability, as well as a fair number of the open source ones. The ability to integrate monitor data with response time data is key to the identification of bottlenecks in the system.
If you have a commercial tool and your staff is telling you that it cannot be done then what they are really telling you is that they don't know how to do this with the tool that you have.
It can be done using JMeter: once you install the agent on the target machine, you just need to add the PerfMon monitor to your test plan.
It will produce 2 result files, the perfmon file and the requests log.
You could also build a plot that compares resource consumption to the load and throughput. The throughput stops increasing when some resource's capacity is exceeded; typically, CPU time increases as the load increases.
JMeter perfmon plugin: http://jmeter-plugins.org/wiki/PerfMon/
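For reference, a typical headless run with the plugin looks something like this (the plan and file names are placeholders; startAgent.sh is the script that ships with the plugin's ServerAgent):

    # on the target machine: start the ServerAgent that feeds the PerfMon listener
    ./startAgent.sh
    # on the load generator: run the plan non-GUI; the PerfMon listener in the
    # plan writes the perfmon file, -l writes the requests log
    jmeter -n -t loadtest.jmx -l requests.jtl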
I know this is an old thread, but I was looking for the same thing today, and as I did not find anything that was simple to use and produced graphs, I made this helper program for ApacheBench:
https://github.com/juanluisbaptiste/apachebench-graphs
It will run apachebench and plot the results and percentile files using gnuplot.
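If you want to do by hand roughly what the script automates, ab can already emit plottable data (the URL and file names are examples):

    # -g writes gnuplot-friendly per-request timings, -e writes a CSV of percentiles
    ab -n 1000 -c 10 -g timings.tsv -e percentiles.csv http://example.com/
    # plot total response time ('ttime'; the date field splits into 5 whitespace
    # tokens, so ttime lands in column 9), skipping the header row
    gnuplot -e "set terminal png; set output 'plot.png'; plot 'timings.tsv' every ::1 using 9 with lines title 'total ms'"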
I hope it helps someone.

Analogs of Intel's Cluster OpenMP

Are there analogs of Intel Cluster OpenMP? This library simulates a shared-memory machine (like SMP or NUMA) while running on a distributed-memory machine (like an Ethernet-connected cluster of PCs).
It allows OpenMP programs to be started directly on a cluster.
I am searching for:
libraries which allow multithreaded programs to run on a distributed cluster,
or libraries (replacements for e.g. libgomp) which allow OpenMP programs to run on a distributed cluster,
or compilers capable of generating cluster code from OpenMP programs, besides Intel C++.
The keyword you want to be searching for is "distributed shared memory"; there's a Wikipedia page on the subject. MOSIX, which became openMOSIX, which is now being developed as part of LinuxPMI, is the closest thing I'm aware of; but I don't have much experience with the current LinuxPMI project.
One thing you need to be aware of is that none of these systems work especially well, performance-wise. (Maybe a more optimistic way of saying it is that it's a tribute to the developers that these things work at all). You can't just abstract away the fact that accessing on-node memory is very very different from memory on some other node over a network. Even making local memory systems fast is difficult and requires a lot of hardware; you can't just hope that a little bit of software will hide the fact that you're now doing things over a network.
The performance ramifications are especially important when you consider that the OpenMP programs you might want to run are almost always going to be written assuming that memory accesses are local and thus cheap, because, well, that's what OpenMP is for. False sharing is bad enough when you're talking about different sockets accessing a common cache line - page-based false sharing across a network is just disastrous.
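To see why that matters, here is a minimal plain-C OpenMP sketch (nothing Cluster-OpenMP-specific; the counts are arbitrary): each thread increments its own logically private counter, but all counters live in one array, so they share a cache line - or, on a page-based DSM system, a page:

    #include <stdio.h>
    #include <omp.h>

    #define NTHREADS 4
    #define ITERS 100000000L

    /* The counters are logically private but physically adjacent, so they
     * share a cache line. Every increment invalidates that line in the other
     * threads' caches; on page-based DSM the same pattern ships whole pages
     * back and forth over the network. Padding each counter out to its own
     * line (or page) removes the contention. */
    long counters[NTHREADS];

    int main(void)
    {
        double t0 = omp_get_wtime();
        #pragma omp parallel num_threads(NTHREADS)
        {
            int me = omp_get_thread_num();
            for (long i = 0; i < ITERS; i++)
                counters[me]++;        /* no data race, but heavy coherence traffic */
        }
        printf("elapsed: %.2f s\n", omp_get_wtime() - t0);
        return 0;
    }

On a multicore box this costs you cache misses; over Ethernet, every "miss" becomes a network round-trip for a whole page, which is why page-based false sharing is so much worse.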
Now, it could well be that you have a very simple program with very little actual shared state, and a distributed shared memory system wouldn't be so bad -- but in that case I've got to think you'd be better off in the long run just migrating the problem away from a shared-memory-based model like OpenMP towards something that'll work better in a cluster environment anyway.

Program is slower when compiled

Any suggestions on why a VB6 program would be slower when compiled than when running in the debugger? I'm compiling it with "Optimize for fast code."
Notes:
I measure performance by running the compiled version and the non-compiled version on the same machine. I based my conclusions on wall-clock time, since 30 minutes vs. 100 minutes is a big enough difference to be visible.
Several months ago, I configured a debugging tool to attach itself to my program whenever it ran. I totally forgot that I had done this.
Special thanks to Process Monitor for making this very obvious.
Turning it off made the program run fast.
AppVerifier, for those who are curious.
You should select the Compile to Native Code option.
The Compile to P-Code option forces your program to run in an interpreted mode, which can be slower.
There are some optimizations in the advanced section. Try them out too.
Some more points to consider:
Are you running the compiled application in the same environment? Is it taking the same data as input?
How did you know that it is slow? What if your timing program is wrong?
How do you measure the performance?
It is hard to judge the performance from what you've said. You have to ensure the running environment is exactly the same to compare performance.
Are you running on the same machine? Do you connect to a DB? Does the DB have the same workload on each run? You need to isolate other factors before reaching such a conclusion.

How to benchmark virtual machines

I am trying to perform a fair comparison of XenServer vs ESX and one comparison I would like to make is performance with multiple VMs. Does anyone know how to go about benchmarking VM performance in a fair way?
On each server I would like to run a fixed number of XP/Vista VMs (for example 8) and have some measure of how quickly each one runs when under load. Ideally I would like some benchmark of the overall system (CPU/Memory/Disk/Network) rather than just one aspect.
It seems to me this is actually a very tricky thing to do and obtain any meaningful results so would be grateful for any suggestions!
I would also be interested to see any existing reports or comparisons that have been published (preferably independent rather than vendor biased!)
As a general answer, VMware (together with other virtualization vendors in the SPEC Virtualization sub-committee) has put together a hypervisor benchmarking suite called VMmark that is available for download. The VMmark website discusses why this benchmark may be useful for comparing hypervisors, including an FAQ and a whitepaper describing the benchmark.
That said, if you are looking for something very specific (e.g., how will it perform under your workload), you may have to roll your own variants of VMmark, especially if you are not trying to do the sorts of things that VMmark benchmarks (e.g., web servers, database servers, file servers, etc.) Nonetheless, the methodology behind its development should be of interest.
Disclaimer: I work for VMware, though not on VMmark.
I don't see why you can't use common benchmarks inside the VMs: WinSAT, PassMark, Futuremark, SiSoftware, etc. Host the VMs on the different hosts and see how it goes.
As an aside, benchmarks that don't closely match your intended usage may actually hinder your evaluation. Depending on the importance of getting this right, you may have to build your own to make it relevant.
Why do you want to bench?
How about some anecdotal evidence?
I'm going to assume this is a test environment, because you're wanting to benchmark on XP/Vista. Please correct me if I'm wrong.
My current test environment is about 20 VMs with varying OSes (2000/XP/Vista/Vista64/Server 2008/Server 2003) in different configurations on a dual quad-core Xeon machine with 8GB RAM (looking to upgrade to 16GB soon), and the slowest machines of all are the Vista ones, primarily due to heavy disk access (this is even with Windows Defender disabled).
Recommendations
- Hardware RAID. Too painful to run Vista VMs otherwise.
- More RAM.
If you're benchmarking and looking to run Vista VMs, I would suggest putting your focus on benchmarking disk access. If there are going to be performance differences elsewhere I doubt they would be of anything significant.
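If you go that route, a scriptable load generator gives you repeatable disk numbers inside each VM. For example (a sketch only - the file path and parameters are arbitrary, and on older guests you would reach for IOMeter instead), Microsoft's diskspd:

    rem 60s test: 1GB file, random I/O, 30% writes, 4 threads, 8 outstanding I/Os per thread, 8K blocks
    diskspd -c1G -d60 -r -w30 -t4 -o8 -b8K C:\test\testfile.dat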
I recently came across VMware's ESX Performance documentation - http://www.vmware.com/pdf/VI3.5_Performance.pdf. There is quite a bit in there on improving performance and benchmarking.
