Linux/Windows performance comparison of MATLAB against other libraries using containers

I need to compare the performance of MATLAB and an open-source alternative (e.g. numpy/scipy) on particular matrix problems on both Windows and Linux. It has been expressly requested that the comparison be executed strictly on the same hardware platform, for obvious reasons of comparability.
My question is, would using containers (Windows and Linux distro images) satisfy this requirement?
I believe setting up two VMs would satisfy the requirement, but using containers would be much less of a hassle and would make the tests easily reproducible on any machine. However, I'm not too familiar with their architecture or with how they access their host's hardware.
Thanks in advance for any help.

Essentially, when talking about performance tests, you'd want your test machines to be as close as possible to the final end-use machine.
While virtualization and containerization offer advantages in terms of ease of setup, reusability and so on, they also introduce a degree of distortion into the metrics obtained in such tests, due to their architecture and implementation.
I don't know the exact degree to which a performance test in a Windows container would differ from one run on a bare-metal Windows machine, or whether such a difference is trivial, but the first paragraph above answers my question.

Related

RTOS Alongside Windows

I have a question about a family of software products, one example of which is INtime, that lets you run a real-time operating system in parallel with Windows.
I have a reasonable grasp of how Windows works, including kernel/driver/application security rings etc. Similarly, I know how an RTOS runs on a dedicated system.
The Simple Question:
How do these go about existing together without fighting over hardware or running into other similar problems? How is the allocation of resources handled, and how is this integrated with Windows?
Slightly more complicated:
What are the steps I would have to take if I wanted to develop something similar myself? Are there any open-source embodiments of this paradigm that I can inspect to glean some more understanding?

Analogs of Intel's Cluster OpenMP

Are there analogs of Intel Cluster OpenMP? This library simulates a shared-memory machine (like SMP or NUMA) while running on a distributed-memory machine (like an Ethernet-connected cluster of PCs).
It allows OpenMP programs to be started directly on a cluster.
I am searching for:
- libraries that allow multithreaded programs to run on a distributed cluster,
- or libraries (replacements for e.g. libgomp) that allow OpenMP programs to run on a distributed cluster,
- or compilers, besides Intel C++, capable of generating cluster code from OpenMP programs.
The keyword you want to be searching for is "distributed shared memory"; there's a Wikipedia page on the subject. MOSIX, which became openMOSIX, which is now being developed as part of LinuxPMI, is the closest thing I'm aware of; but I don't have much experience with the current LinuxPMI project.
One thing you need to be aware of is that none of these systems work especially well, performance-wise. (Maybe a more optimistic way of saying it is that it's a tribute to the developers that these things work at all.) You can't just abstract away the fact that accessing on-node memory is very, very different from accessing memory on some other node over a network. Even making local memory systems fast is difficult and requires a lot of hardware; you can't just hope that a little bit of software will hide the fact that you're now doing things over a network.
The performance ramifications are especially important when you consider that the OpenMP programs you might want to run are almost always going to be written assuming that memory accesses are local and thus cheap, because, well, that's what OpenMP is for. False sharing is bad enough when you're talking about different sockets accessing a common cache line - page-based false sharing across a network is just disastrous.
Now, it could well be that you have a very simple program with very little actual shared state, and a distributed shared memory system wouldn't be so bad -- but in that case I've got to think you'd be better off in the long run just migrating the problem away from a shared-memory-based model like OpenMP towards something that'll work better in a cluster environment anyway.
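To make that last point concrete, here is a minimal sketch (an illustration added here, assuming an MPI implementation such as Open MPI or MPICH is available) of the kind of explicit message-passing code you would migrate towards: the reduction that OpenMP expresses as #pragma omp parallel for reduction(+:sum) becomes per-rank local work plus one explicit communication step, so nothing pretends that remote memory is local.
```cpp
// Build/run with an MPI compiler wrapper, e.g.: mpicxx reduce.cpp -o reduce && mpirun -np 4 ./reduce
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank sums only its own strided slice; no remote memory is ever touched.
    const long n = 100000000L;
    double local = 0.0;
    for (long i = rank; i < n; i += size)
        local += 1.0 / (i + 1);

    // The only cross-node traffic is this single, explicit reduction.
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("sum = %.6f\n", total);

    MPI_Finalize();
    return 0;
}
```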

How to benchmark virtual machines

I am trying to perform a fair comparison of XenServer vs ESX and one comparison I would like to make is performance with multiple VMs. Does anyone know how to go about benchmarking VM performance in a fair way?
On each server I would like to run a fixed number of XP/Vista VMs (for example 8) and have some measure of how quickly each one runs when under load. Ideally I would like some benchmark of the overall system (CPU/Memory/Disk/Network) rather than just one aspect.
It seems to me this is actually a very tricky thing to do and to obtain any meaningful results from, so I would be grateful for any suggestions!
I would also be interested to see any existing reports or comparisons that have been published (preferably independent rather than vendor-biased!)
As a general answer, VMware (together with other virtualization vendors in the SPEC Virtualization sub-committee) has put together a hypervisor benchmarking suite called VMmark that is available for download. The VMmark website discusses why this benchmark may be useful for comparing hypervisors, including an FAQ and a whitepaper describing the benchmark.
That said, if you are looking for something very specific (e.g., how will it perform under your workload), you may have to roll your own variants of VMmark, especially if you are not trying to do the sorts of things that VMmark benchmarks (e.g., web servers, database servers, file servers, etc.) Nonetheless, the methodology behind its development should be of interest.
Disclaimer: I work for VMware, though not on VMmark.
I don't see why you can't use common benchmarks inside the VMs: WinSAT, Passmark, Futuremark, SiSoftware, etc. Host the VMs on the different hosts and see how it goes.
As an aside, benchmarks that don't closely match your intended usage may actually hinder your evaluation. Depending on the importance of getting this right, you may have to build-your-own to make it relevant.
Why do you want to bench?
How about some anecdotal evidence?
I'm going to assume this is a test environment, because you're wanting to benchmark on XP/Vista. Please correct me if I'm wrong.
My current test environment is about 20 VMs with varying OSes (2000/XP/Vista/Vista64/Server 2008/Server 2003) in different configurations on a dual quad-core Xeon machine with 8 GB RAM (looking to upgrade to 16 GB soon), and the slowest machines of all are the Vista ones, primarily due to heavy disk access (this is even with Windows Defender disabled).
Recommendations
- Hardware RAID. Too painful to run Vista VMs otherwise.
- More RAM.
If you're benchmarking and looking to run Vista VMs, I would suggest focusing on benchmarking disk access. If there are performance differences elsewhere, I doubt they would be significant.
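As a rough illustration of that (a sketch added here, not a standard benchmark), something like the following could be run inside each guest and compared across hypervisors; a real disk benchmark would also need to defeat the guest's write cache (fsync / O_DIRECT) and measure random as well as sequential I/O.
```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <vector>

int main() {
    const std::size_t block_bytes = 1 << 20;   // 1 MiB per write
    const std::size_t blocks = 512;            // 512 MiB total
    std::vector<char> buf(block_bytes, 'x');

    const auto t0 = std::chrono::steady_clock::now();
    {
        std::ofstream out("bench.tmp", std::ios::binary);
        for (std::size_t i = 0; i < blocks; ++i)
            out.write(buf.data(), static_cast<std::streamsize>(buf.size()));
    }   // stream is flushed and closed here (the OS cache is not flushed; see the note above)
    const auto t1 = std::chrono::steady_clock::now();

    const double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("wrote %zu MiB in %.2f s (%.1f MiB/s)\n", blocks, secs, blocks / secs);
    std::remove("bench.tmp");
    return 0;
}
```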
I recently came across VMware's ESX Performance documentation - http://www.vmware.com/pdf/VI3.5_Performance.pdf. There is quite a bit in there on improving performance and benchmarking.

How can I connect two or more machines via a TCP (Ethernet) cable to form a network grid?

How can I connect two or more machines to form a network grid, and how can I distribute the workload between the two machines?
What operating systems do I need to run on the machines, and what application should I use to manage the load balancing?
NB: I read somewhere that Google uses cheap machines to perform this feat. How do they connect two network cards ('teaming') and distribute load across the machines?
Good practical examples, with actual code samples, would serve me well.
Pointers to good sites where I can read about this will be highly appreciated.
An excellent place to start is the Beowulf project: basically an open-source cluster built on the Linux OS.
There are several software solutions in this expanding market. The term "cloud computing" is certainly gaining traction to describe what you want to do. Are you wanting a service, or do you want to run it in house?
I'm most familiar with Appistry EAF. It runs on commodity hardware, is available as a free download, and runs on Windows or Linux.
Another is GoGrid - I believe this is only available as a service, but I'm not as familiar with it.
There are many different approaches to parallel processing, and many types of system architectures you could use.
For commodity systems, there are clusters and grids, or you can even form a single system image from several pieces of commodity hardware. There is of course also load balancing, high availability, failover, etc.
It's pretty much impossible to answer this question without more detail. What exactly do you want to do with these systems? The answer is very highly dependent on the application.
You might want to have a look at some of the stuff to do with Folding At Home and the SETI project, and some of their participants' blogs. Here is a pretty amazing cluster that a guy built in an IKEA cabinet:
http://helmer.sfe.se/
Might give you some ideas.
The question is too abstract.
One of the (imaginable) ways is to use MPI, a framework for parallel programming; the Wikipedia page includes examples in C++, and there are bindings for other languages.
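To give a flavour of it (a minimal sketch added here, not taken from the linked page; it assumes an MPI implementation with its mpicxx/mpirun wrappers is installed on each machine), every machine runs a copy of the same program and the workload is split by rank:
```cpp
// e.g. with Open MPI: mpicxx grid.cpp -o grid && mpirun -np 2 --host machineA,machineB ./grid
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1, len = 0;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    // Trivial static load balancing: job i is handled by rank (i % size).
    const int jobs = 10;
    for (int i = 0; i < jobs; ++i)
        if (i % size == rank)
            std::printf("rank %d of %d on %s handles job %d\n", rank, size, host, i);

    MPI_Finalize();
    return 0;
}
```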

Supplying 64 bit specific versions of your software

Would I expect to see any performance gain by building my native C++ Client and Server into 64 bit code?
What sort of applications benefit from having a 64 bit specific build?
I'd imagine anything that makes extensive use of long would benefit, or any application that needs a huge amount of memory (i.e. more than 2 GB), but I'm not sure what else.
Architectural benefits of Intel x64 vs. x86
- larger address space
- a richer register set
- can link against external libraries or load plugins that are 64-bit
Architectural downsides of x64 mode
- all pointers (and thus many instructions too) take up 2x the memory, cutting the effective processor cache size in half in the worst case (see the sketch after this list)
- cannot link against external libraries or load plugins that are 32-bit
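A tiny illustration of the pointer-size point (a sketch added here, not from the answer): the same pointer-carrying struct has roughly twice the footprint when built as x64.
```cpp
#include <cstdio>

struct Node {
    Node* next;   // 4 bytes in an x86 build, 8 bytes in an x64 build
    int   value;  // 4 bytes in both
};

int main() {
    std::printf("sizeof(void*) = %zu\n", sizeof(void*)); // 4 on x86, 8 on x64
    std::printf("sizeof(Node)  = %zu\n", sizeof(Node));  // typically 8 on x86, 16 on x64 (alignment padding)
    return 0;
}
```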
In applications I've written, I've sometimes seen big speedups (30%) and sometimes seen big slowdowns (> 2x) when switching to 64-bit. The big speedups have happened in number crunching / video processing applications where I was register-bound.
The only big slowdown I've seen in my own code when converting to 64-bit is from a massive pointer-chasing application where one compiler made some really bad "optimizations". Another compiler generated code where the performance difference was negligible.
Benefit of porting now
Writing 64-bit-compatible code isn't that hard 99% of the time, once you know what to watch out for. Mostly, it boils down to using size_t and ptrdiff_t instead of int when referring to memory addresses (I'm assuming C/C++ code here). It can be a pain to convert a lot of code that wasn't written to be 64-bit-aware.
Even if it doesn't make sense to make a 64-bit build for your application (it probably doesn't), it's worth the time to learn what it would take to make the build so that at least all new code and future refactorings will be 64-bit-compatible.
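For example (a minimal sketch added here, not code from the answer), being 64-bit-aware mostly means sizes and indices are size_t and pointer differences are ptrdiff_t, rather than assuming int is wide enough for either:
```cpp
#include <cstddef>
#include <vector>

// Sum using size_t for the index: correct even when v.size() exceeds INT_MAX
// in a 64-bit build.
double sum(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Pointer differences are ptrdiff_t, not int.
std::ptrdiff_t span(const double* first, const double* last) {
    return last - first;
}
```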
Before working too hard on figuring out whether there is a technical case for the 64-bit build, you must verify that there is a business case. Are your customers asking for such a build? Will it give you a definitive leg up in competition with other vendors? What is the cost for creating such a build and what business costs will be incurred by adding another item to your accounting, sales and marketing processes?
While I recognize that you need to understand the potential for performance improvements before you can get a handle on competitive advantages, I'd strongly suggest that you approach the problem from the big picture perspective. If you are a small or solo business, you owe it to yourself to do the appropriate due diligence. If you work for a larger organization, your superiors will greatly appreciate the effort you put into thinking about these questions (or will consider the whole issue just geeky excess if you seem unprepared to answer them).
With all of that said, my overall technical response would be that the vast majority of user-facing apps will see no advantage from a 64-bit build. Think about it: how many of the performance problems in your current app come from being processor-bound (or RAM-access-bound)? Is there a performance problem in your current app at all? (If not, you probably shouldn't be asking this question.)
If it is a Client/Server app, my bet is that network latency contributes far more to the performance on the client side (especially if your queries typically return a lot of data). Assuming that this is a database app, how much of your performance profile is due to disk latency times on the server? If you think about the entire constellation of factors that affect performance, you'll get a better handle on whether your specific app would benefit from a 64-bit upgrade and, if so, whether you need to upgrade both sides or whether all of your benefit would derive just from the server-side upgrade.
Not much else, really. Though writing a 64-bit app can have some advantages to you, as the programmer, in some cases. A simplistic example is an application whose primary focus is interacting with the registry. As a 32-bit process, your app would not have access to large swaths of the registry on 64-bit systems.
Continuing #mdbritt's comment, building for 64-bit makes far more sense [currently] if it's a server build, or if you're distributing to Linux users.
It seems that far more Windows workstations are still 32-bit, and there may not be a large customer base for a new build.
On the other hand, many server installs are 64-bit now: RHEL, Windows, SLES, etc. NOT building for them would be cutting off a lot of potential usage, in my opinion.
Desktop Linux users are also likely to be running the 64-bit versions of their favorite distro (most likely Ubuntu, SuSE, or Fedora).
The main obvious benefit of building for 64-bit, however, is that you get around the 3 GB barrier for memory usage.
According to this web page, you benefit most from the extra general-purpose registers of a 64-bit CPU if you use many and/or deeply nested loops.
You can expect gains thanks to the additional registers and the new parameter-passing convention (which is really tied to the additional registers).

Resources