Interpreting Xcode Time Profiler

I profiled my app for about 38 seconds and selected the 12 seconds that have UI problems. It looks to me like the profiler is telling me that, out of the 12 seconds I have selected, more than 3 seconds are spent removing notification observers. Is this the correct way to interpret these results?

It's telling you it's spending 3 seconds out of 12 doing _CFXNotificationRemoveObservers.
Is that useful?
I would think you'd want to know why it's doing that, and whatever else it is doing as well.
It's giving you a very incomplete picture.
If you simply paused it randomly a few times during that 12 seconds, you would be using this technique.
It tells you not only what the program is doing at the moment you stop it, but also why, which you can read from the stack.
If it is spending any of that time doing I/O or blocking system calls, you will see that too.

Is this the correct way to interpret these results?
Yes, that's correct -- according to your image, your app is spending a lot of time removing observers.

Related

Understanding Xcode build time summary

I am trying to generate the Xcode build time summary for my project so that I can optimize the bottlenecks, as per the attached screenshot.
The total build time shown at the bottom is 135.3 seconds, while the first module, CompileC, takes 449.356 seconds. I know Xcode does some parallelization while building the project, but I am not sure how it calculates this summary time. Can anyone explain this?
I know this is old, but I was looking into this, and I came across this comment by Rick Ballard, an Apple Xcode build system engineer.
Yes – many commands, especially compilation, are able to run in parallel with each other, so multicore machines will indeed finish the build much faster than the time it took to run each of the commands.
In other words, the quoted numbers are core-seconds, not real time, except for the last one. So if you have six cores, your CompileC task might only take 449/6 ≈ 75 seconds of wall-clock time. You've got maybe 660 core-seconds in total, so you'd get about 110 clock-seconds, which looks about right against the 135-second total.

Limit spikes, flatten processor usage?

I have multiple instances of a Ruby script (running on Linux) that does some automated downloading and every 30 minutes it calls "ffprobe" to programmatically evaluate the video download.
Now, during the downloading, my processor usage is at 60%. However, every 30 minutes (when ffprobe runs), it skyrockets to 100% for 1 to 3 minutes, which sometimes ends up crashing other instances of the Ruby program.
Instead, I would like to allocate fewer CPU resources to the processor-heavy ffprobe so that it runs more slowly, i.e. I would like it to use, say, a maximum of 20% of the CPU, and it can run as long as it likes. One might then expect it to take 15 minutes to complete a task that now takes 1-3 minutes. That's fine with me.
This would then prevent my critical downloading program, which should have the highest priority, from crashing.
Thank you!

Give all possible resources to a program

I created a program in C# to work with 2.5 million records in Oracle Express (local instance), parse/split those records and create an additional 5 million records.
I added some code to print times on the screen, and it seems fairly fast: it does all the processing for 1K records every 9 seconds, which means it will take more than 6 hours to finish.
Now, with Task Manager I can see the program is using at most 6% of the CPU and around 50 MB of memory. I understand the OS and Oracle itself need resources to operate, but... is there a way to tell this little program, "hey, it's OK, go ahead and use at least 50% of the CPU; there are 4 GB of RAM, so knock yourself out"?
Note: One of the reasons I'm using a local instance of Oracle Express is to reduce the network bottleneck. Also, I might not run this process very often, but I was intrigued to see whether this was at all possible.
Please forgive my noobness,
Thanks!
The operating system will give your program all the resources it needs. The reason your process is not consuming all the CPU is probably that it is waiting on the I/O subsystem more than on the processor.
If you want to see whether you can consume more CPU cycles, try writing a program that runs a tight infinite loop as fast as possible; you will see the difference in CPU usage.
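For example, a minimal CPU-bound sketch (shown here in C++, but any language will do):
#include <cstdint>
int main()
{
    // volatile keeps the compiler from optimizing the loop away
    volatile std::uint64_t counter = 0;
    for (;;)
    {
        counter = counter + 1;   // pure CPU work, nothing to wait on
    }
}
Run it and Task Manager will show one core pegged near 100%, because this loop never waits on disk, the network, or the database the way your record-processing run does.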
A number of thoughts, not really answers I guess, but:
You could raise the priority of the application's thread; however, it's possible the code may be less efficient than you think, so...
Have you run a profiler on it?
If it's currently a single-threaded app, you could look at parsing the records in batches and running those batches in parallel (there is a sketch of this at the end of this answer).
Without knowing much detail about how the records are split, is it possible to hand more of that work off to Oracle? That way it would matter less whether the database is local or across the network.
If your app is drawing/updating a screen or UI, then that will almost certainly slow the work down. An example: I ran an app that sorted about 10k emails into around 250k rows in a database; when I added an item to a listbox for each line, the time went from short to ridiculous (it crashed out, or I got bored). So, again, offloading the work to a background thread with as few UI updates as possible can help.
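To make the batching idea above concrete, here is a minimal sketch in C++ (rather than the asker's C#); loadRecords and parseRecord are hypothetical stand-ins for the real loading and splitting work:
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>
// Hypothetical stand-ins for the real work against Oracle.
std::vector<std::string> loadRecords()
{
    return std::vector<std::string>(2500000, "some;raw;record");
}
void parseRecord(const std::string& record)
{
    volatile std::size_t n = record.find(';');   // placeholder for the real split/insert
    (void)n;
}
int main()
{
    const std::vector<std::string> records = loadRecords();
    const unsigned int workers = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t batch = (records.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned int w = 0; w < workers; ++w)
    {
        const std::size_t begin = w * batch;
        const std::size_t end = std::min(records.size(), begin + batch);
        if (begin >= end)
            break;
        // Each worker owns a contiguous batch, so no shared state is written.
        pool.emplace_back([&records, begin, end]
        {
            for (std::size_t i = begin; i < end; ++i)
                parseRecord(records[i]);
        });
    }
    for (auto& t : pool)
        t.join();
}
Whether this actually helps depends on where the time really goes; if most of it is spent waiting on Oracle, more threads just queue up behind the same I/O.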

Why does the sleep time of Sleep(1) seem to be variable in Windows?

Last week I needed to test some different algorithmic functions, and to make it easy on myself I added some artificial sleeps and simply measured the wall clock time. Something like this:
start = clock();
for (int i = 0; i < 10000; ++i)
{
    ...
    Sleep(1);
    ...
}
end = clock();
Since the argument of Sleep is expressed in milliseconds, I expected a total wall clock time of about 10 seconds (a bit higher because of the algorithms, but that's not important now), and that was indeed my result.
This morning I had to reboot my PC because of new Microsoft Windows hotfixes, and to my surprise Sleep(1) no longer took about 1 millisecond, but about 0.0156 seconds.
So my test results were completely screwed up, since the total time grew from 10 seconds to about 156 seconds.
We tested this on several PCs, and apparently on some PCs the result of one Sleep was indeed 1 ms, while on other PCs it was 0.0156 seconds.
Then, suddenly, after a while, the time of a Sleep dropped to 0.01 seconds, and an hour later it was back to 0.001 seconds (1 ms).
Is this normal behavior in Windows?
Is Windows 'sleepy' for the first hours after a reboot, gradually getting a finer sleep granularity after a while?
Or are there any other aspects that might explain the change in behavior?
In all my tests no other application was running at the same time (or: at least not taking any CPU).
Any ideas?
OS is Windows 7.
I've not heard about the resolution jumping around like that on its own, but in general the resolution of Sleep follows the clock tick of the task scheduler. So by default it's usually 10 or 15 ms, depending on the edition of Windows. You can set it manually to 1 ms by calling timeBeginPeriod.
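A minimal sketch of that, assuming you link against winmm.lib:
#include <windows.h>
#include <mmsystem.h>   // timeBeginPeriod / timeEndPeriod
#include <cstdio>
int main()
{
    if (timeBeginPeriod(1) != TIMERR_NOERROR)   // request 1 ms scheduler granularity
    {
        std::printf("1 ms timer resolution not available\n");
        return 1;
    }
    DWORD start = GetTickCount();
    for (int i = 0; i < 100; ++i)
        Sleep(1);                               // should now take close to 1 ms per call
    std::printf("100 x Sleep(1) took about %lu ms\n", GetTickCount() - start);
    timeEndPeriod(1);                           // always undo the request you made
    return 0;
}
Without the timeBeginPeriod call, the same loop would typically take well over a second, because each Sleep(1) rounds up to the roughly 15.6 ms tick you measured.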
I'd guess it's the scheduler. Each OS has a certain amount of granularity. If you ask it to do something lower than that, the results aren't perfect. By asking to sleep 1ms (especially very often) the scheduler may decide you're not important and have you sleep longer, or your sleeps may run up against the end of your time slice.
The sleep call is an advisory call. It tells the OS you want to sleep for amount of time X. It can be less than X (due to a signal or something else), or it can be more (as you are seeing).
Another Stack Overflow question has a way to do it, but you have to use winsock.
When you call Sleep, the scheduler stops that thread until it can resume at a time >= the requested sleep time. Sometimes, due to thread priority (which in some cases causes Sleep(0) to make your program hang indefinitely), your program may resume later because more processor cycles were allocated for another thread to do work (mainly OS threads, which have higher priority).
I just wrote some words about the sleep() function in the thread Sleep Less Than One Millisecond. The characteristics of the sleep() function depend on the underlying hardware and on the setting of the multimedia timer interface.
A Windows update may change the behavior; Windows 7, for example, treats things differently compared to Vista. See my comment in that thread and its links to learn more about the sleep() function.
Most likely the sleep timer does not have enough resolution.
What kind of resolution do you get when you call the timeGetDevCaps function as explained in the documentation of the Sleep function?
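Something like this (again linking against winmm.lib) will show what your machine supports:
#include <windows.h>
#include <mmsystem.h>   // timeGetDevCaps, TIMECAPS
#include <cstdio>
int main()
{
    TIMECAPS caps;
    if (timeGetDevCaps(&caps, sizeof(caps)) == MMSYSERR_NOERROR)
    {
        // wPeriodMin is the finest period you can request with timeBeginPeriod
        std::printf("timer period: min %u ms, max %u ms\n", caps.wPeriodMin, caps.wPeriodMax);
    }
    return 0;
}
Note that this reports the range you can request, not the period currently in effect.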
Windows' Sleep granularity is normally 16 ms; you get this unless your program or some other program changes it. If you got 1 ms granularity on some days and 16 ms on others, some other program probably changed the timer period (which affects your program too). I think my LabVIEW installation, for example, does this.

How to force workflow runtime to use more CPU power?

Hello
I have quite an unusual problem, because I think that in my case the workflow runtime doesn't use enough CPU power. The scenario is as follows:
1) I send a lot of messages to queues. I use the EnqueueItem method of the WorkflowRuntime class.
2) I create a new instance of the workflow with the CreateWorkflow method of the WorkflowRuntime class.
3) I wait until the new workflow moves to its first state. Under normal conditions this takes dozens of seconds (the workflow is complicated). When messages are being sent to the queues at the same time (as described in point 1), it takes 1 minute or more.
4) I observe low CPU utilization (8 cores), no more than 15%. I can add that I have a separate process responsible for the workflow logic, and I communicate with it over WCF.
You've got logging, which you think is not a problem, but you don't know. There are many database operations. Those need to block for I/O. Having more cores will only help if different threads can run unimpeded.
I hate to sound like a stuck record, always trotting out the same answer, but you are guessing at what the problem is, and you're asking other people to guess too. People are very willing to guess, but guesses don't work. You need to find out what's happening.
To find out what's happening, the method I use is, get it running under a debugger. (Simplify the problem by going down to one core.) Then pause the whole thing, look at each active thread, and find out what it's waiting for. If it's waiting for some CPU-bound function to complete for some reason, fine - make a note of it. If it's waiting for some logging to complete, make a note. If it's waiting for a DB query to complete, note it. If it's waiting at a mutex for some other thread, note it.
Do this for each thread, and do it several times. Then, you can really say you know what it's doing. When you know what it's waiting for and why, you'll have a pretty good idea how to improve it. That's a variation on this technique.
What are you doing in the work item?
If you have any sort of cross-thread synchronisation (critical sections, etc.), then this could cause you to spend time stalling threads while they wait for resources to become free.
For example, if you are doing any sort of file access, then you are going to spend considerable time blocked waiting for the loads to complete, and this will leave your threads idle a lot of the time. You could throw more threads at the problem, but then you'd end up generating more disk requests, and the resource contention would become even more of a problem.
Those are a couple of potential ideas, but I'd really need to know what you are doing before I can be more useful...
Edit: in answer to your comments...
1) OK
2) You'd perform terribly with 2000 threads working flat out, due to switching overhead. In fact, running 20-25 threads on an 8-core machine may be a bad plan too, because if you get them running at high speed they will spend time stealing each other's runtime, and regular context switches (software thread switches) are very expensive, though they may not be as expensive as the waits your code is suffering.
3) Logging? Do you just submit log entries to an asynchronous queue that spits them out to disk when it has the opportunity, or are they synchronous file writes? If they are asynchronous, can you guarantee that there isn't a maximum number of requests that can be queued before you DO have to wait? And if you do have to wait, how many threads end up in contention for the space that just opened up? There are a lot of ifs there alone.
4) Database operations, even on the best databases, are likely to block if two threads make similar calls into the database simultaneously. A good database is designed to limit this, but it's quite likely that at least some clashing will happen.
Suffice it to say, you will want to get a good thread profiler to see where time is REALLY being lost. Failing that, you will just have to live with the performance or attack the problem in a different way...
WF3 performance is a little on the slow side. If you are using .NET 4, you will get better performance by moving to WF4. Mind you, this means a rewrite, as WF4 is a completely different product.
As for WF3, there is a white paper here that should give you plenty of information on improving things over the standard settings. Look for things like increasing the number of threads used by the DefaultWorkflowSchedulerService, or switching to the ManualWorkflowSchedulerService, and disabling the performance counters, which are enabled by default.
