I'm trying to create a custom profiler to see performance inside builds, and I want to recreate the same hierarchy tree as the one in the Profiler's CPU Usage view, with the calls and milliseconds.
[screenshot of the Profiler CPU Usage view]
All the info I can find in the documentation and forums says to use the line ProfilerRecorder.StartNew(ProfilerCategory.Internal, "Main Thread", 15); but all that gives me is the number of milliseconds each frame needs to finish.
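For reference, here is a minimal sketch of that approach (the MainThreadTimer component name is mine; the API is Unity's Unity.Profiling), which shows why it only yields one number per frame rather than a call tree:

using Unity.Profiling;
using UnityEngine;

// Sketch: records the "Main Thread" counter and logs its most recent value.
// This produces a single per-frame time, not the per-call hierarchy shown
// in the CPU Usage view.
public class MainThreadTimer : MonoBehaviour
{
    ProfilerRecorder mainThreadRecorder;

    void OnEnable()
    {
        // Keep a rolling window of 15 samples, as in the snippet above.
        mainThreadRecorder = ProfilerRecorder.StartNew(ProfilerCategory.Internal, "Main Thread", 15);
    }

    void OnDisable()
    {
        mainThreadRecorder.Dispose();
    }

    void Update()
    {
        // For timing counters, LastValue is reported in nanoseconds.
        Debug.Log($"Main Thread: {mainThreadRecorder.LastValue * 1e-6f:F2} ms");
    }
}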
I have a .NET Core 3.1 console application running on an Ubuntu 20 x64 server, and I'm randomly experiencing high-CPU episodes (100% across 4 cores).
I'm following the diagnostics guidance and start a trace during the peak time for my app, like:
dotnet-trace collect -p 1039 --providers Microsoft-DotNETCore-SampleProfiler
From the resulting .nettrace file, opened in Visual Studio, I can see the Functions list with a CPU Time for each function.
But as I understand it, the CPU time here is actually wall time: it just measures how long a function call stayed on a thread's stack, whether or not it consumed real CPU resources.
The hottest spots this .nettrace points to are these lines of code (pseudocode):
while (true)
{
    Thread.Sleep(1000); // <--------- hottest spot
    socket.Send(bytes);
}

and

while (true)
{
    manualResetEvent.WaitOne(); // <--------- hottest spot
    httpClient.Post(data);
}
Obviously the two hottest spots above don't consume real CPU resources; they're just idle waiting. Is there any way to trace the functions that used real CPU time, like JetBrains dotTrace provides?
You might want to use an external tool like top. This can help identify which process is actually consuming the CPU.
If your profiler identifies Thread.Sleep() as the hottest spot, chances are that your application is waiting for some external process outside the scope of the profiler.
I would suggest refactoring this code to use async/await, e.g. await Task.Delay(xxx), instead of blocking at the thread level.
This suggestion is based on a partially similar problem described here:
Why Thread.Sleep() is so CPU intensive?
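As a minimal sketch of that refactor (socket and bytes are just the placeholder names from the pseudocode above), the first loop could become:

using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Sketch: the blocking send loop from the question rewritten with
// async/await. Task.Delay returns the thread to the pool instead of
// blocking it, so a sampling profiler no longer charges the whole wait
// to this thread.
class Sender
{
    public async Task SendLoopAsync(Socket socket, byte[] bytes)
    {
        while (true)
        {
            await Task.Delay(1000); // was Thread.Sleep(1000)
            await socket.SendAsync(new ArraySegment<byte>(bytes), SocketFlags.None);
        }
    }
}

The same pattern applies to the second loop: replace ManualResetEvent.WaitOne() with an awaitable primitive such as SemaphoreSlim.WaitAsync().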
I've created tracepoints that capture some raw data. I want to be able to post-process this data and possibly create a new viewer for the Tracing perspective in Eclipse, but I really have no idea where to start. I was hoping to find a document that describes how to create a new viewer for the Eclipse Tracing perspective, how to read the CTF files, and how to graph the results in the view.
Alternatively, I'd just like to read the trace data and add some new trace events with post-processed data.
As background to the question, I want to analyze the trace timestamps and generate statistics about the average throughput and latency. Although I could do this while inserting the tracepoints, I'd like to offload the math to the analysis stage.
Rich
In general, such analysis is better done in post-processing. Doing it at runtime in your traced program may affect performance, to the point where the data you collect is no longer representative of the application's real behaviour!
The Trace Compass documentation, particularly this section, explains how to create new graphical views in Eclipse.
If you want to output a time-graph or XY-chart view, you can also look at the data-driven XML interface. It is more limited in features, but it works straight from the RCP (no need to recompile, no need to set up the development environment).
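As a minimal illustration of the post-processing idea, here is a sketch in C# (deliberately independent of the Trace Compass and CTF APIs; the Interval record is a hypothetical stand-in for timestamps you have already extracted) that computes the average latency and throughput mentioned in the question:

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: offline statistics over (start, end) timestamp pairs extracted
// from a trace. Latency is the per-event duration; throughput is events
// per second over the whole observation window.
record Interval(double StartSec, double EndSec);

static class TraceStats
{
    public static void Report(IReadOnlyList<Interval> intervals)
    {
        double avgLatencyMs = intervals.Average(i => i.EndSec - i.StartSec) * 1000.0;
        double windowSec = intervals.Max(i => i.EndSec) - intervals.Min(i => i.StartSec);
        double throughput = intervals.Count / windowSec;

        Console.WriteLine($"avg latency: {avgLatencyMs:F3} ms, throughput: {throughput:F1} events/s");
    }
}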
I did a trace of my application.
In this report file:
1.
When I select "CUDA -> CUDA Summary" in the drop-down, the table shows:
Runtime API Calls: % Time = 80.66
Launches: % Device Time = 15.46
All the other time percentages are nearly 0%.
So my question here is: where are the remaining 19.34% of Time and 84.54% of Device Time? Or do these two percentages refer to completely different 'Total Time' values?
2.
I used thrust vectors to copy my data back and forth. In the "Memory Copy" section of this report, all the % Time values for memory copies in my run are apparently negligible.
But when I click the 'summary' link of the Runtime API Calls (which has its % Time as high as 80.66), I immediately see the culprit: 'cudaMemcpy', with its 'Capture Time %' as high as 73.75 on the 'Runtime API Calls Summary' page.
So my questions here are:
Does this mean my bottleneck is still those calls to thrust::copy(), even though the "Memory Copies" section of the report doesn't show it?
How can I find the exact function call that is most expensive overall?
How does the timeline feature help with any of this?
CUDA SUMMARY
In the CUDA Summary, the % Time under Runtime API Calls is the percentage of CPU time taken by the CUDA runtime. I do not recall whether the % is capped at 100% (all CPU threads flattened) or whether the maximum is NumCpuCores * 100%.
API CALLS
In order to find the most expensive Runtime API calls, perform the following steps:
1. Navigate to the CUDA Runtime API Calls page.
2. Click the Duration column twice to sort it in descending order.
It is possible to capture the call stack for CUDA Runtime API Calls so you can jump to the source code from the report. This can be enabled in the Activity with the following steps:
1. Navigate to Trace Settings in the Activity.
2. Enable System Trace.
3. Expand the CUDA Trace Settings.
4. Enable Runtime API Trace and set Call Stack Trace = Always.
WARNING: Setting Call Stack Trace to Always increases the API call overhead. Only enable this when the program is CPU limited and you are trying to identify the source code generating the API calls.
The call stack trace can be accessed from any report page that references the API call, using the correlation pane in the bottom-left corner of the page. The screenshot below shows the call stack for the cudaEventSynchronize call in the CUDA Runtime API Calls report page.
It is possible to query for the longest API calls on the Timeline report page using the correlation information for the Process\Thread\Function Calls or Process\CUDA\CUDA Context\Runtime API rows:
1. Click on the row containing the API calls.
2. In the correlation tree, click Row Information\Runtime API.
3. In the table of API calls, click the Duration column twice and scroll to the top of the table.
4. Click on an API call to navigate the timeline view to that call.
The call stack can also be retrieved at this point using the correlation pane.
I'm using EF Code First with a model that has more than 200 entities (WinForms). When I ran my program for the first time, it took a long time to run the first query. I then used pre-generated views to improve performance, and startup time dropped from about 30 seconds to about 12-13 seconds. What options do I have to reduce the time of my first query further?
You don't have many options. First of all, try to use the latest EF version (that means EF6 alpha 2), because there have been some improvements, but it may not be enough. IMHO, add a splash screen to your app and run the "first query" during application startup. WinForms applications can simply have a longer startup time if they perform complex logic; commonly the whole application is initialized during startup so that it runs smoothly once started.
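A minimal sketch of that splash-screen warm-up idea (MyContext and MainForm are hypothetical names standing in for your DbContext subclass and your main form):

using System;
using System.Windows.Forms;

// Sketch: force EF model building (and pre-generated view loading) behind
// a splash screen so the real first query is fast when the main form opens.
static class Program
{
    [STAThread]
    static void Main()
    {
        var splash = new Form { Text = "Loading..." }; // stand-in splash screen
        splash.Show();
        Application.DoEvents(); // let the splash paint before the blocking work

        using (var ctx = new MyContext()) // hypothetical DbContext subclass
        {
            // Triggers model creation and database initialization up front.
            ctx.Database.Initialize(force: false);
        }

        splash.Close();
        Application.Run(new MainForm()); // hypothetical main form
    }
}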
In the Xcode Instruments application there is a Core Data Saves instrument. It can show the "Save duration" for each Core Data save. What units does it use for the save duration? With no humanly perceptible lag, I'm seeing readings ranging from 67 to 6343.
Is this counting microseconds, something related to processor cycles, or multiples of the instrument's sample time?
Save duration: Duration of the save operation in microseconds.
For further info see the Apple doc. (So your readings of 67 to 6343 correspond to roughly 0.07 ms to 6.3 ms, which is consistent with the lack of any perceptible lag.)
Furthermore, if you enable Core Data's debug logging with the launch argument
-com.apple.CoreData.SQLDebug 1
you can also see the save durations in the console; in this case they are expressed in seconds. To enable it, see XCode4 and Core Data: How to enable SQL Debugging.