Longshot: Python and PyTorch code not running on the GPU unless I bump the GPU by doing some other task [memory-management]

Long shot, but has anyone had this issue?
I have code that is set to run on the GPU, but it fails to run (or doesn't start at all) unless I give the GPU a little other work to do, for example by watching YouTube or playing a game.
I suspect some resource or application prioritisation/limitation setting is causing the process VS Code launches to stall, and I wondered whether anyone else has run into this.
I am running the code in a .ipynb notebook in VS Code (not sure whether that contributes to the issue). Sometimes the code just freezes indefinitely, and I usually end up restarting it to get things going again.
The code normally takes about 7 seconds per training epoch and 0.7 seconds per validation epoch. But I was away during the first epoch, came back to find it hadn't even started, opened up YouTube, and it began.
[Screenshot: code timings]
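As an aside on the timings: CUDA kernels launch asynchronously, so wall-clock measurements taken from Python can be misleading unless the GPU is synchronised first, and a stalled launch can look like nothing has started. A minimal timing sketch, where train_one_epoch() is a hypothetical stand-in for the real training loop:

```python
import time
import torch

def train_one_epoch():
    # Hypothetical stand-in for the real training loop.
    pass

# Without synchronize, the timer can stop before the GPU has actually finished
# the queued work (or a hung launch can look like an idle process).
torch.cuda.synchronize()
start = time.perf_counter()
train_one_epoch()
torch.cuda.synchronize()
print(f"training epoch: {time.perf_counter() - start:.2f} s")
```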
I can't think of what settings to change for this, but I have tried a few:
[Screenshot: Windows power options]
Has anyone had a similar issue before? My second theory is that my Python code is using too much GPU RAM, which slows things down to the point where they effectively freeze; then, when I load another application that uses the GPU, the GPU memory has to be reshuffled, and somehow that reshuffling unblocks the GPU and lets my code run again.
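One way to probe both theories from inside Python is to print how much GPU memory the process has actually claimed, and to launch a tiny warm-up op before training so the CUDA context is initialised and the card is nudged out of its idle power state, which is roughly what opening YouTube does as a side effect. A minimal sketch, assuming a single CUDA device:

```python
import torch

device = torch.device("cuda:0")  # assumes a single-GPU setup

total = torch.cuda.get_device_properties(device).total_memory
allocated = torch.cuda.memory_allocated(device)  # memory held by live tensors
reserved = torch.cuda.memory_reserved(device)    # memory held by the caching allocator

print(f"total:     {total / 1e9:.2f} GB")
print(f"allocated: {allocated / 1e9:.2f} GB")
print(f"reserved:  {reserved / 1e9:.2f} GB")

# Tiny warm-up kernel: forces CUDA context creation and wakes the GPU up
# before the first real epoch starts.
x = torch.randn(1024, 1024, device=device)
torch.cuda.synchronize(device)
print("warm-up finished")
```

If reserved is already close to total before the first epoch, the GPU-RAM theory looks more plausible; if the warm-up alone is enough to make training start on time, it points at power management instead.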

Related

Diagnosing Windows program that hangs on startup

I have a new Win10 laptop. I've installed lots of software, including a 25-year-old Codewright editor that I've customized up the wazoo and that I've been installing on all my machines for, well, 25 years. After working for a few days, it suddenly stopped working, and reinstalling it didn't fix it. On startup it puts up a small splash window and normally opens the main window about half a second later (a step that took more than 5 seconds 25 years ago). Now it never gets past the splash: it's not using any CPU, and there's nothing I can do but kill the process.
In the past, I've occasionally gotten my system into a state where Codewright would hang on loading because some other program hadn't terminated correctly, and killing off that other process unfroze it. So there's reason to believe that Codewright is waiting on some global lock that some other malfunctioning piece of software is holding. So I have two questions:
Does this ring a bell? Is there some known failure mode where a program that puts up a splash window and then switches to another window can be blocked by something else going on in the system?
Is there a way to diagnose this, perhaps by finding out what system call it's hanging inside? I tried dtrace.exe, started Codewright, and then stopped tracing, and it produced a 3GB XML file, which is quite a haystack. There's a way to filter it by PID, but since this is a startup problem, I have no idea what the PID will be. Is there a better tool for doing this, or some more appropriate dtrace feature that I missed?
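One way to tame the 3GB trace, for what it's worth, is to stream it and filter by process image name instead of PID. A rough sketch follows; the element and attribute names "Event" and "ProcessName" and the image name "cw32.exe" are placeholders, not the actual dtrace.exe schema, so a small sample of the real file would need to be inspected and the names adjusted:

```python
import xml.etree.ElementTree as ET

TARGET = "cw32.exe"  # placeholder for the editor's process image name

# Stream the trace instead of loading it whole, keeping only events whose
# process-name attribute matches. "Event"/"ProcessName" are guessed names.
with open("filtered.xml", "w", encoding="utf-8") as out:
    for _, elem in ET.iterparse("trace.xml", events=("end",)):
        if elem.tag.endswith("Event"):
            name = (elem.get("ProcessName") or "").lower()
            if TARGET in name:
                out.write(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # release memory as we go
```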
The comment about using Task Manager to create a dump file actually led me to notice the Analyze Wait Chain function there, which I had never seen before, since I haven't used Task Manager much since I switched from Win7. It gave me exactly the answer I wanted: my editor was waiting on something held by an NVIDIA GeForce Experience module. Since I don't use that, I uninstalled it, and I'm back up and running. Thanks for the tip.

Julia slows down after computer upgrade

I am very new to Julia and am not a computer geek so this question may not be very clear. I am happy to add any information as necessary.
I am running Julia in VS Code on Windows. I recently added memory sticks to the PC (going from 128 GB to 256 GB) and found that Julia became significantly slower. I tried several things, in this order: moving the memory sticks to different slots, reinstalling Julia and VS Code, and disabling hyperthreading. Nothing worked.
I then decided to install Windows Server on my computer, hoping that Julia would behave normally under the new system. It was still slow.
Could anyone give me some suggestions on what to do? Thanks!

MinGW compiling excessively slow

Some years ago I started using Qt on both Windows 7 and Ubuntu Linux, and it would always compile fast, with MinGW used on Windows. But in the last couple of years or so, maybe due to updates to both Qt and MinGW, I started noticing a slowdown in compile speed on Windows. I did some research into why MinGW had become so slow compared to Linux (it wasn't before!), and everyone told me that MinGW is simply slower on Windows and that it would be better, if possible, to just use Linux.
Since I wanted to continue my project, I followed the suggestion and have since been using Linux with essentially no problems. The situation now is that I must go back to Windows (meanwhile updated to Windows 10) to make visual corrections for that OS, so I need to work with MinGW again and face the same problem as before.
But for some reason the MinGW slowness seems to have become even worse! Before, I was at least able to compile the app in around 4 minutes; the last time I tried, it had been running for 38 minutes when I gave up and went to sleep, and this for a project that takes only 1:03 minutes to compile on Linux [under the same compile configuration]!
I'm still aware that MinGW is slower, but as quick research into this problem on the web shows, this is just too slow: the comparisons one can find in other threads here on SO report at most 2x-3x more compile time, not 38x+!
So I would like to know what kinds of problems in my Windows setup could cause such extreme slowness. I know I ended up installing at least 4 different versions of MinGW; could that have caused the problem?
It is also interesting that, when compiling with the -j option and watching the Compile Output log in Qt Creator alongside Process Explorer, there are moments when compilation simply pauses for 10 seconds or more: CPU usage drops from ~100% to close to 5%, with nothing happening until compilation suddenly continues. I'm sure these constant pauses account for part of the excessive time, but I have no idea why MinGW is behaving this way.
You might want to check where the time is spent.
There are a lot of tools that let you capture what a certain process is doing; to name just two of them:
ProcMon
XPerf or its successor (the Windows Performance Recorder/Analyzer)
But to analyze the reports generated by these tools you need a rather deep understanding. If this doesn't help, temporarily disable other running services and programs step by step (if you want to know which program causes the problem) or disable them all at once.
Looking at the CPU usage spikes shown by Task Manager or Process Explorer (Sysinternals) might also help identify the components that are blocking your CPU.
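A quick way to see whether those 10-second pauses coincide with the CPU going idle is to sample CPU usage while the build runs. A small sketch using the third-party psutil package; the make command line is a placeholder for the real invocation:

```python
import subprocess
import time

import psutil  # third-party: pip install psutil

# Placeholder command line; substitute the real mingw32-make invocation.
build = subprocess.Popen(["mingw32-make", "-j8"])

while build.poll() is None:
    usage = psutil.cpu_percent(interval=1.0)  # blocks ~1 s, then reports system-wide %
    print(f"{time.strftime('%H:%M:%S')}  cpu {usage:5.1f}%")

print("build finished with exit code", build.returncode)
```

Long stretches near 0-5% while the build process is still alive mark the pauses and give timestamps to correlate with ProcMon or XPerf traces.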
If your antivirus is what makes the compile so slow, you can define exceptions so that the antivirus does not scan certain programs or paths.
So perhaps it is easier to first try the compilation with the antivirus software disabled, or even from a clean live-boot Windows CD.
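If Windows Defender specifically turns out to be the culprit (the machine is now on Windows 10), exclusions for the MinGW/MSYS install and the project tree can be scripted. A rough sketch driving the Add-MpPreference cmdlet from Python; the paths are placeholders, an elevated shell is required, and third-party antivirus products have their own exclusion settings:

```python
import subprocess

# Placeholder paths; point these at the real MinGW/MSYS install and project tree.
paths = [r"C:\MinGW", r"C:\msys64", r"C:\projects\myapp"]

for p in paths:
    # Add-MpPreference -ExclusionPath tells Windows Defender to skip real-time
    # scanning of the given path. Must be run from an elevated process.
    subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         f"Add-MpPreference -ExclusionPath '{p}'"],
        check=True,
    )
```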

Hadoop FileSystem.getFS() pauses for about 2 minutes

I'm having a very strange problem. I'm using the dfs-datastores Pail abstraction to write data to HDFS from Java, though I don't think the Pail piece matters to the problem.
When it calls org.apache.hadoop.fs.FileSystem getFS(java.lang.String path) with a path on my local filesystem, it pauses for about 2 minutes, seemingly doing nothing, then returns. This is on my laptop.
The weird thing is that it worked really fast when I was on the network at my office today, but now that I'm home it's doing it again. I'm running Ubuntu 10.10 64-bit with Java 1.7.
Anyone have any ideas what it's doing? What could be different between being at work and being at home?
UPDATE:
I've been stepping through the code with the debugger, and it seems to be having trouble in Configuration.loadResource(). It calls that function multiple times, and each call takes 5-10 seconds to return.
UPDATE2:
I've narrowed this down a little further. The biggest hang-up seems to be the call to KerberosName.setConfiguration(), which would explain why it runs fast at work, since the Active Directory there acts as a Kerberos server. I don't have one here at home, so it can't find one. Now the question is why in the world it's trying to load the Java Kerberos stuff at all.
I found a solution (or at least a workaround). I installed the krb5-kdc package, and now my little program runs fast without any unexplained pauses. After this I removed krb5-kdc, tested again, and it was still fast. I then removed /etc/krb5.conf and the pause came back. It looks like using the Hadoop library on Ubuntu (at least) requires a /etc/krb5.conf file.
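For reference, a minimal /etc/krb5.conf sketch along these lines should be enough for the client to parse; EXAMPLE.COM is just a placeholder realm, and exact requirements may differ between Hadoop and JVM versions:

```
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_realm = false
    dns_lookup_kdc = false
```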
Maybe this will help someone else.

Autoconf on Windows 7 dreadfully slow

I am working on a project that uses Google's cmockery unit-testing framework. For a while, I was able to build the cmockery project with no problems (e.g. "./configure", then "make && make install"), and it took a reasonable amount of time (1-2 minutes or so). But after working on other miscellaneous tasks on the computer and coming back to rebuild it, the build becomes horrendously slow (e.g. after fifteen minutes ./configure is still checking system variables).
I did a system restore to earlier in the day and it goes back to working properly for a time. I have been very careful about monitoring any changes I make to the system, and have not been able to find any direct correlation between something I am changing and the problem. However, the problem inevitably recurs (usually as soon as I assume I must have accidentally avoided the problem and move on). The only way I am able to fix it is to do a system restore to a time when it was working. (Sometimes restarting the machine works as well, sometimes it does not.)
I imagine that the problem lies in the interaction between the environment and autoconf itself rather than in something specific to cmockery's configuration. Any ideas?
I am using MinGW under Windows 7 Professional.
Make sure that antivirus software is not interfering. Often, antivirus programs monitor every file access; autoconf accesses many files during its operation and is likely to be slowed down drastically.
