Error in comparing two Intel VTune Amplfier analysis? - parallel-processing

I'm following this video tutorial (from Linux) about VTune Amplifier and I've followed everything, but when he compares the two basic analysis there is this error:
How can I solve this?

All analyses, which are compared must not be opened currently by the Intel VTune Amplifier.
The two analysis r001hs is already opened as the tab 'r001hs' indicates. Close it and retry the comparison. Then it will work.

Related

CUDAGRIND - MEMORY TRANSACTION CHECKING FOR CUDA in windows [duplicate]

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I was looking into Valgrind to help improve my C coding/debugging when I discovered it is only for Linux - I have no other need or interest in moving my OS to Linux so I was wondering if there is a equally good program for Windows.
As jakobengblom2 pointed out, valgrind has a suit of tools. Depending which one you are talking about there are different windows counter parts. I will only mention OSS or free tools here.
1. MemCheck:
Dr. Memory. It is a relatively new tool, works very well on Windows 7. My favorite feature is that it groups the same leaks' allocation stacks in the report.
http://code.google.com/p/drmemory/
I have also used UMDH( http://support.microsoft.com/kb/268343 ) and found it quiet useful and easy to setup. It works from Win2000 to Win7.
AppVerifier is a must have swissknife for windows native code developers, its "memory" checker does similar job
http://msdn.microsoft.com/en-us/library/dd371695%28v=vs.85%29.aspx
2. Callgrind:
My favorite is verysleepy ( http://www.codersnotes.com/sleepy ) It is tiny but very useful and easy to use.
If you need more features, AMD CodeAnalystâ„¢ Performance Analyzer is free:
http://developer.amd.com/documentation/videos/pages/introductiontoamdcodeanalystperformanceanalyzer.aspx
Windows Performance Analysis tools is free from Microsoft, not very easy to use but can get the job done if you are willing to spend the time. http://blogs.microsoft.co.il/blogs/sasha/archive/2008/03/15/xperf-windows-performance-toolkit.aspx
Download:
http://msdn.microsoft.com/en-us/performance/cc752957
3. Massif:
Similar(not quite exact match) free tools on windows are:
VMMap from sysinternals : http://technet.microsoft.com/en-us/sysinternals/dd535533
!heap command in windbg : http://hacksoflife.blogspot.com/2009/06/heap-debugging-memoryresource-leak-with.html
4. Cachegrind:
Above mentioned Windows Performance Tools has certain level of L2 cache miss profiling capability but not quite as good and easy to use as Cachegrind.
5. DRD:
Haven't found anything free and as powerful on Windows yet, the only free tool for windows I can find that is slightly close is the "lock" checker in
AppVerifier:
http://msdn.microsoft.com/en-us/library/dd371695%28v=vs.85%29.aspx
Why not use Valgrind + Wine to debug your Windows app? See
http://wiki.winehq.org/Wine_and_Valgrind
(Chromium uses this to check the Windows version for memory errors; see
build.chromium.org
and look at the experimental or memory waterfalls, and search for wine.)
There's also Dr. Memory, see
dynamorio.org/drmemory.html
Some more good commercial tools:
Purify
Insure++
For Visual C++, try Visual Leak Detector. When I used it, it detected a memory leak from a new call and returned the actual line in source code of the leak. The latest release can be found at http://vld.codeplex.com/.
Development environment for Windows you are using may contain its own tools. Visual Studio, for example, lets you detect and isolate memory leaks in your programs
i would like to list some tool , hope will be useful
read this article for more detail
Purify
Bounds Checker
Coverity (basically its a code analyzer but, it will catch memory leak in static )
Glow Code
dmalloc
ccmalloc
NJAMD
YAMD
Valgrind
mpatrol
Insure++
Try DUMA
There is Pageheap.exe part of the debugging tools for Windows. It's free and is basically a custom memory allocator/deallocator.
See http://support.microsoft.com/kb/286470
In combination with Visual Studio I generally use Visual Leak Detector or simply _CrtDumpMemoryLeaks() which is a win32 api call. Both are nothing fancy but they get the job done.
I had the chance to use Compuware DevPartner Studio in the past and that was really good, but it's quite expensive.
A cheaper solution could be GlowCode, i just worked with a 5.x version and, despite some problems in attaching to a process i needed to debug, it worked quite well.
I've been loving Memory Validator, from a company called Software Verification.
Viusual Studio can help detecting memory leaks itself. See Microsoft Visual C++ Tips and Tricks -> "Memory Leaks" section.
See also this post in SO
Although real tracing is only possible with the Team Edtion of Visual Studio.
See the "Source Test Tools" link on the Software QA Testing and Test Tool Resources page for a list of similar tools.
I've used BoundsChecker,DevPartner Studio and Intel V-Tune in the past for profiling. I liked V-Tune the best; you could emulate various Intel chipsets and it would give you hints on how to optimize for that platform.
Does Jochen Kalmbach's Memory Leak Detector qualify?
PS: The URL to the latest version is buried somewhere in the article's comment thread.
LeakDiag, UMDH, App Verifier, DebugDiag, are all useful tools to improve robustness of code and find memory leaks.
The Boost Test library can detect memory leaks.
How about the Purify?
Try Intel's Inspector XE product which can help you detect both memory and threading issues: http://software.intel.com/en-us/articles/intel-inspector-xe/
Perhaps CodeSnitch would be something you're after? http://www.entrek.com/codesnitch.html
If you are developing with Borland/CodeGear/Embarcadero C++ Builder, you could use CodeGuard.
More or less all Profilers include checking for memory leaks and show you the stack when the memory was allocated.
I can recommend Intels Parallel Inspector. Simple to use and no recompilation needed. The trial version runs for 30 days.
GlowCode and AtromatedQA also include such capabilites. They all offer free trials.
Compuware DevPartner (aka BoundsChecker) in Contrast needs a slowed down "instrumentation" recompile and the application also runs slower when checking for errors. And BoundsChecker can not work with 64 Bit evsrions at all. We gave up on that tool.
The best tool I ever used is DevPartner BoundsChecker - it's not free but it has an evaluation period.
Another memory tool for your list: Memory Validator.
Not free, but nowhere near as expensive as Purify or Boundschecker.
If you're not afraid of mingw, here are some links (some might work with MSVC)...
http://betterlogic.com/roger/?p=1140
We are just completing a Memory Safety checking tool for Windows, that handles GCC and Micrsoft Visual C (not C++ yet), and are looking for Beta testers.
EDIT June 12, 2011: Not Beta anymore, now production for GCC and Microsoft Visual Studio C.
I found this SF project today:
http://sourceforge.net/p/valgrind4win/wiki/Home/
They are porting valgrind to Windows. Probably in several years we will have a reliable valgrind on windows.
Check out this question: Is there a good Valgrind substitute for Windows? . Though general substitute for valgrind is asked, it mainly discusses memory leak detectors and not race conditions detections.
I used Insure++ which does excellent job in finding c++ memory leaks/corruptions and many other bugs like uninitialized variables, pointer errors, strings etc., It also does visual "Code coverage" and run time memory usage etc.. which give more confident on your code.. You can try it for trail version..
You might want to read what Mozilla is doing regarding memory leaks. One tool in their toolbox is the Hans Boehm garbage collector used as memory leak detector.
You can give a try to RuntimeChecker trial ot to IBM Purify trial..
A free solution would be to use the following code in Visual Studio:
#ifdef _DEBUG
#define new DEBUG_NEW
#endif
Just write this in the top of all your cpp files.
This will detect memory leaks of your application whenc stopping debug run and list them in the output window. Double clicking on a memory leaks line will higlight you the line where memory is allocated and never released. This may help you : http://www.flipcode.com/archives/How_To_Find_Memory_Leaks.shtml

Which is better way of profiling using VTUNE: standalone or integrated with MSVC

I am getting certain errors while running VTUNE stand alone, but every thing works fine if I run it from MSVC IDE.
Will there be any reporting inaccuracy if I run VTUNE from inside the MSVC?
Which VTune version do you use? In general MSVS-integrated version of VTune provides exactly the same functionality as "Standalone". And of course there is no difference in accuracy at all.
MSVS vs. "Standalone" choice is a matter of your application and working style. In case you use MSVS for development purposes (and thus you have solution and sources integrated), using MSVS integrated version should be more convinient in terms of "more automation" for source viewpoints. At the same time some people prefer to use profiler in standalone manner, because their solutions are already overloading IDE process, however it's a rare case even for big legacy industry codes.
Side note: across other Parallel Studio tools (VTune, Inspector and Advisor XE, as well as Composer=Compiler+Libraries) - you may really find features enabled in different ways in MSVS and Linux/Standalone. Examples: 1. Inspector debugger integration with cl vs. gcc/gdb 2. Advisor Annotation Wizard vs. Assistance window. However VTune doesn't have even those small variations between hosts.

OpenCL distribution

I'm currently developing an OpenCL-application for a very heterogeneous set of computers (using JavaCL to be specific). In order to maximize performance I want to use a GPU if it's available otherwise I want to fall back to the CPU and use SIMD-instructions. My plan is to implement the OpenCL-code using vector-types because my understanding is that this allows CPUs to vectorize the instructions and use SIMD-instructions.
My question however is regarding which OpenCL-implementation to use. E.g. if the computer has a Nvidia GPU I assume it's best to use Nvidia's library but if no GPU is available I want to use Intel's library to use the SIMD-instructions.
How do I achieve this? Is this handled automatically or do I have to include all libraries and implement some logic to pick the right one? It feels like this is a problem that more people than I are facing.
Update
After testing the different OpenCL-drivers this is my experience so far:
Intel: crashed the JVM when JavaCL tried to call it. After a restart it didn't crash the JVM but it also didn't return any usable
devices (I was using an Intel I7-CPU). When I compiled the
OpenCL-code offline it seemed to be able to do some
auto-vectorization so Intel's compiler seems quite nice.
Nvidia: Refused to install their WHQL-drivers because it claimed I didn't have Nvidia-card (that computer has a Geforce GT 330M). When
I tried it on a different computer I managed to get all the way to
create a kernel but at the first execution it crashed the drivers
(the screen flickered for a while and Windows 7 said it had to
restart the drivers). The second execution caused a bluee-screen of
death.
AMD/ATI: Refused to install 32-bit SDK (I tried that since I will be using a 32-bit JVM) but 64-bit SDK worked well. This is the only
driver which I've managed to execute the code on (after a restart
because at first it gave a cryptic error-message when compiling).
However it doesn't seem to be able to do any implicit vectorization
and since I don't have any ATI GPU I didn't get any performance
increase compared to the Java-implementation. If I use vector-types I
might see some improvements though.
TL;DR None of the drivers seem ready for commercial use. I'm probably better of creating JNI-module with C-code compiled to use SSE-instructions.
First try to understand hosts & devices: http://www.streamcomputing.eu/blog/2011-07-14/basic-concept-hosts-and-devices/
Basically you can just do exactly what you described: check if a certain driver is available and if not, try the next one. What you choose first depends completely on your own preference. I would pick the device I have tested my kernel best on. In JavaCL you can pick the fastest device with JavaCL.createBestContext and CLPlatform.getBestDevice, check the host-code here: http://ochafik.com/blog/?p=501
Know NVidia does not support CPUs via their driver; only AMD and Intel do. Also is targeting multiple devices (say 2 GPUs and a CPU) a bit more difficult.
There is no API providing what you want. however, you can do the following:
i suggest you iterate over clGetPlatformIDs and query for the number of devices (clGetDeviceIDs), and device type for each device;
and pick the platform which has both types.
then build a map in u'r code, that maps for each type the list of platforms supporting it, ordered in some manner.
finally, just get the first item in the list corresponding for CL_DEVICE_TYPE_CPU and the first item corresponding for CL_DEVICE_TYPE_GPU.
if both returned results are equal (platform_cpu == platform_gpu) then pick one of them and use it for both.
if there is a platform supporting both, you will get match as before since you got order lists. then u can also do load balancing if u like on a single platform, like what Intel has.
Sorry for being late to the party, but regarding Intel's implementation behaviour under JavaCL, I'm afraid you've been bitten by a JavaCL bug :
https://github.com/ochafik/nativelibs4java/issues/297
Fixed in JavaCL 1.0.0-RC2 !
Cheers

Debugging Assembly Code (Intel 8086)

I'm in an Assembly class focusing on the intel 8086 architecture (all compiling / linking / execution comes from running DOS on win7 via DOS-Box).
I've finished programming the latest assignment, but as I have yet to program any program successfully the first time through, I am now stuck trying to debug my code.
I have visual studio 2010 and was wondering if there was some built in feature that would help me debug my assembly code, specifically, I'm looking to track the value of a variable.
Failing that, instructions pointing to a DOS-Box debugger (and instructions!) would be much appreciated. (I think I've been able to run codeview debug, but I couldn't figure out how to do what I was looking for).
You are generating 16-bit code, you have to break into a museum to find better tooling. Try Borland's, maybe the debugger included with Turbo C.
Yes, indeed, you can use the debugger in VS to examine pretty much everything. Irvine's site has a section specifically on using the debugger here. You can examine registers, use the watch window, etc. He also has a guide for highlighting asm keywords if you need that.
Edit: as Hans pointed out, if you are using 16-bit instead of 32-bit protected, you'll need different tools. There are several choices, listed here.
Borland's tools for DOS were called tasm, tlink, and tdebug.

How can i access the Intel CPU Counter

Is there any small tool that gives me access to the data gathered by the Intel CPU Counters (like L1/L2 cache misses, branch prediction failures ... you know there are hunderts of them on modern Core2 CPU's).
It must work on Windows (while being able to use it with Solaris, FreeBSD, Linux, MacOSX would of course be nice).
Check out the Intel PCM (Performance Counter Monitor) tool which does exactly what you want to do.
Link: https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
Intel PCM provides a rich API that allows you to instrument your code. Furthermore, to date, PCM is the only tool to read uncore events too.
This thread seems a little old but if you're still interested, I wrote a howto recently on this topic using nothing more than rdmsr and wrmsr in Linux. It only deals with the performance counters on an Intel uncore for Westmere, but the process I described might help you figure out what you need if you haven't already. I'm sure Windows has some equivalent program or function call to RDMSR and WRMSR. The problem is you need to be ring 0 (kernel mode) to read MSRs. I have no idea how to do that in Windows. I won't be able to help with any Windows questions but may be able to answer some MSR-related questions if you have any. I'm by no means an expert though.
PAPI is a very promising lead, however, I believe they discontinued support for Windows (and therefore .NET C#) quite a few years ago.
On the windows front, Visual Studio 2010 Premium comes with performance explorer. If you run any project or binary in instrumentation mode, you can get access to hardware events such as instructions retired.
The results can be somewhat mixed and inconsistent depending external factors, but it integrates with Visual Studio nicely and you get detailed counts (avg, maximum, total) on a per method/module level.
Intel V-tune performance analyzer also exposes these natively. I haven't played with this tool yet but it might be a more flexible API than what Visual Studio 2010 exposes.
You didn't write of your are looking for a application or for a library.
For Windows there is Intel VTune. But this not exactly an small tool. For linux I have used oprofile, which works without kernel patches.
On OS X, Shark lets you get data from the PMCs. I'm not sure what's available on Windows other than Intel's tools (VTune, as mentioned by drhirsch).
Try this
http://icl.cs.utk.edu/papi/
It is a full library that allows you to read any CPU counters data, works both on Windows and Linux [and other OS]
This thread looks pretty old. But still, all the above mentioned counters are available at Intel PCM .These counters can be used as a Microsoft Perfmon plugin or a command prompt interface. The Intel PCM gives informations like L2 and L3 cache hit ratio, cache misses etc.

Resources