How to fix Hyper-V issue - Vagrant

While running vagrant up --provision, I get the following error:

One of the issues that I have occasionally struggled with in Hyper-V is that of time synchronization. Right now, for example, the time on my Windows 10 desktop that I am using to write this article is 9:37 p.m. However, the Hyper-V host that I am watching on another monitor is displaying a time of 9:33 p.m.
On the surface, a few minutes of clock skew might seem like a non-issue. However, there are several reasons why the differing clocks are problematic.
One such reason is that the Kerberos protocol, which is heavily used by the Windows operating system, is time-sensitive. If two systems' clocks are more than a few minutes out of sync with one another, it can cause Kerberos to stop working.
Another reason why clock skew is such a big deal is because multi-tier applications often span multiple Hyper-V hosts. A database server might, for instance, reside on one host, while a Web front end exists on another host, possibly even within a different host cluster.
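To make the Kerberos point concrete: by default Kerberos tolerates a clock difference of five minutes, so the four-minute gap described above is already close to the limit. A minimal sketch of that comparison follows; the function name and the calendar date are made up for illustration, only the two clock times come from the text.

```python
from datetime import datetime, timedelta


def within_skew(clock_a, clock_b, tolerance=timedelta(minutes=5)):
    """Return True if the two clocks differ by no more than `tolerance`."""
    return abs(clock_a - clock_b) <= tolerance


# The date is a placeholder; the times are the 9:37 p.m. desktop clock
# and the 9:33 p.m. Hyper-V host clock mentioned above.
desktop = datetime(2020, 1, 1, 21, 37)
host = datetime(2020, 1, 1, 21, 33)
print(within_skew(desktop, host))  # True: 4 minutes is inside the 5-minute default
```

A gap of six minutes or more would make the same call return False, which is the situation in which Kerberos authentication starts to fail.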

Looking at start_vm.ps1, it seems to use the Start-VM command.
Since this seems to be Windows, the Event Viewer might tell you why it crashes.
I'd have a look at the Hyper-V-Config or Hyper-V-Worker event channels.
The Test-VHD command might also be worth a try.

I am not getting event log and do not understand it.


5.6 GB not enough for Cloudera?

I am running Cloudera Hadoop on my laptop and Oracle VirtualBox VM.
I have given the VM 5.6 GB of my 8 GB of RAM and six of my eight cores as well.
And still I am not able to keep it up and running.
Even without load, services would not stay up and running, and when I try a query, at least Hive will be down within 20 minutes. And sometimes they go down like dominoes: one after another.
More memory seemed to help somewhat: with 3 GB and all services, Hue was blinking red whenever Hue itself managed to come up. And after rebooting it would take 30 to 60 minutes before I managed to get the system up enough to even try running anything on it.
There have been two sensible messages (that I have managed to find):
- Warning of swapping.
- A crash message when the system had used 26 GB of virtual memory, which was not enough.
My dataset is less than one megabyte, so it is hard to understand why the system would go up to dozens of gigabytes, but whatever the reason for that was, it has passed: now the system is running more steadily around the 5.6 GB that I have given it, after closing down a few services: see my answer to myself.
And still it is only more stable, not stable. Right after that I got a swapping warning and Hive went down again. What could be the reason for more or less all Hadoop services going down if the VM starts to swap?
I don't have enough reputation to post the picture here, but when Hive went down again it was swapping 13 pages / second and utilizing 5.9 GB / 5.6 GB. So basically my system starts crashing more or less right after it starts to swap. "428 pages were swapped to disk in the previous 15 minute(s)"
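A rough way to watch the swap-out rate from inside the Linux guest is to sample the pswpout counter in /proc/vmstat twice and divide the difference by the interval. The sketch below only does the parsing and the arithmetic; the function names are my own, and the counter values in the example are invented to match the 428 pages in 15 minutes quoted above.

```python
def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_out_rate(before, after, seconds):
    """Pages swapped out per second between two parsed samples."""
    return (after["pswpout"] - before["pswpout"]) / seconds


# Invented samples taken 900 seconds (15 minutes) apart.
first = parse_vmstat("pswpin 10\npswpout 100\n")
second = parse_vmstat("pswpin 12\npswpout 528\n")
print(swap_out_rate(first, second, 900))
```

On a real system the two samples would come from reading /proc/vmstat at the start and end of the interval.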
I have used the default installation options as far as the hard drive is concerned.
The only addition is a shared folder between Windows and the VM. It works somewhat strangely, locking files all the time, so I use it just like FTP, only for passing files from one system to the other. Thus I can go days without using it, but the systems still crash, so that is not the cause either.
Now that the system is mostly up, services still crash about twice a day: Service Monitor and Hive are about even in how often they crash. After those come Activity Monitor and Event Server, which always appear to crash together. I believe Yarn crashes as well, but it comes back up on its own. Last time Hive crashed first, followed by Service Monitor, Hive (a second time), Activity Monitor and Event Server.
As swap is disk, perhaps the problem is with disk:
# cat /etc/fstab
# swapoff -a
# badblocks -v /dev/VolGroup/lv_swap
Checking blocks 0 to 8388607
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
# badblocks -vw /dev/VolGroup/lv_swap
Checking for bad blocks in read-write mode
From block 0 to 8388607
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
So nothing is wrong with the swap disk, and I have not noticed any disk errors anywhere else either.
Note that you could also check the file system from the Windows side. But I expect that if you make Windows fix your Linux file system, you have a good chance of destroying your Linux with that, so I did my checks somewhat pessimistically, because AFAIK these commands are safe to execute.
About half of the services kept going down, so giving more specifics would be a long story.
I managed to make the system more stable by closing down flume, hbase, impala, ks_indexer, oozie, spark and sqoop, and by giving more memory to some of the remaining services that complained they had not been given enough.
I also fixed a couple of things on the Windows side; I am not sure which of these helped:
- MsMpEng.exe kept my hard drive busy. I didn't have permissions to kill it, but I decreased its priority to lowest possible.
- CcmExec.exe got into a loop on my DVD and kept reading it forever. I solved this by taking the DVD out of the drive. Later on I killed the process tree to keep it from bothering me for a while.
I found these using Windows resource manager.
The VM requires 4GB: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html You should use that.
I am not clear whether you are using the QuickStart VM though. It's set up to run just the essential services and tuned to conserve memory rather than exploit lots of memory.
It sounds like you are running your own installation, on one virtual machine, on your Windows machine. You may be running an entire cluster's worth of services on one desktop machine. Each of these services has master, worker processes, monitoring processes, etc. You don't need most of them.
You also probably have left memory settings at default suitable for a server-class machine of 16+ GB RAM. Remember these services usually run across many machines, not all on one.
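To see why default memory settings can push a small VM into swap, it can help to simply add them up. This is a rough sketch only: the service names and heap sizes below are invented placeholders, not Cloudera defaults, and the 1 GB OS reserve is an assumption.

```python
def memory_headroom_mb(vm_ram_mb, os_reserve_mb, service_heaps_mb):
    """RAM left after the OS reserve and all service heaps.

    A negative result means the services are configured to use more
    memory than the VM has, so the guest will have to swap.
    """
    return vm_ram_mb - os_reserve_mb - sum(service_heaps_mb.values())


# Hypothetical heap sizes in MB, for illustration only.
heaps = {"hive": 1024, "yarn": 2048, "hue": 512, "monitoring": 2048}

# 5734 MB is roughly the 5.6 GB given to the VM in the question.
print(memory_headroom_mb(5734, 1024, heaps))  # -922
```

With real numbers taken from each service's configuration, a negative headroom is a strong hint that some services need smaller heaps or need to be switched off.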
Finally, you're clearly swapping, and that makes things incredibly slow. Remember this is all through a VM too!
Bottom line, use the QuickStart VM if you really want a 1-machine cluster tuned correctly. If you want a real cluster or more services, you need more hardware.
Also consider: cloudera.com/live contains a full CDH 5.1 cluster + sample data, running on demand on AWS. Of course, the advantage of the VM is that you can BYOD, but if you're simply looking for a hands-on Hadoop experience, Live is a great option.

GetPrivateProfileInt on network file on freshly booted machine

After intensive searching why certain workstations wouldn't perform a certain action when just being started up in the morning (...) I've discovered that GetPrivateProfileInt just returns the default value and doesn't bother to set GetLastError to something non-zero when the network-subsystem hasn't activated yet (e.g. because the DHCP client is still trying to get hold of an IP address to use.)
Does this sound familiar to someone? Does anybody happen to know what I should/could do about it?
For now I'll correct by using an alternate default value, and stalling a bit while I get my default value.
GetPrivateProfileInt() is one of those innocent looking Windows API functions that has a ton of code behind it. There's a mass of appcompat code, designed to allow Win3 programs to run on modern versions of Windows. One of the side effects is that it is incredibly slow: it took about 50 msec the last time I profiled it.
Looks like you found a flaw in it. For all I know, it might actually be designed appcompat behavior, emulating the way this API worked 18 years ago. I have no clue of course if that's accurate.
The very best thing you can do is stop using it. A possible workaround is to open the file first so that your program blocks until the service is up and running.
I would check if the file exists and sleep for a few seconds until the file is there. After some number of tries either use the default value or take an appropriate action.
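The check-and-sleep approach could look roughly like this. It is a sketch in Python rather than the Win32 API, with configparser standing in for GetPrivateProfileInt; the function name, the retry count and the delay are all assumptions to be tuned.

```python
import configparser
import os
import time


def read_int_with_retry(path, section, key, default,
                        tries=5, delay=2.0, sleep=time.sleep):
    """Wait for `path` to become visible, then read an integer from it.

    Falls back to `default` if the file never appears or the key is missing.
    """
    for attempt in range(tries):
        if os.path.isfile(path):
            parser = configparser.ConfigParser()
            parser.read(path)
            return parser.getint(section, key, fallback=default)
        if attempt < tries - 1:
            sleep(delay)  # give the network a moment before the next try
    return default
```

The important difference from the original behavior is that the default is only used after the file has had a fair chance to become reachable, so a slow DHCP lease no longer silently changes the result.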

Looking for advice on solving problems that occur only on your machine

I'm stuck trying to debug a problem which only occurs on my machine. It doesn't exhibit on any of the other devs' systems, nor on our production test server. I've tried pretty much everything I can think of short of completely wiping my hard disk and starting from scratch, or sneaking into the office in the middle of the night to swap my computer with someone else's.
This brings to mind the titular question, then: short of those drastic measures, what do you do when trying to resolve issues that no one else has? I'm open to advice that's general or specific.
[Not sure if this should be CW or not.]
Have you attached a debugger to the program to find the exact point of failure? That is what I would do first.
Sometimes third party software can be the root cause of these sorts of issues. Things like Anti-virus software install low-level filesystem and network drivers that can cause random intermittent failures. You can try killing all processes that aren't base OS services and your app.
Depending on your OS there are different tools that you can use to see what's going on under the covers. E.g. on Windows you can use Process Monitor to see what Registry keys it opens, what DLLs get loaded, etc. You can run this on your machine and on the machines where it works, and compare to see if perhaps some required module is missing.
But seriously, use a debugger. That's what they are there for.
Two things:
I start with the obvious: What's different on your box? More memory? Oddball PCI card? Different Microsoft APIs or service packs?
For oddball random software and/or OS crashes:
Check your system for heat issues.
Check your RAM for bad bits.
In this situation, I would try to check out the code and cleanly rebuild it from a different directory to make sure that there are no miscellaneous files in your working directory that are causing a problem.
If you are doing work against a database, I would also try tearing down the database and reconstructing it, possibly using a dump from another developer's machine.
Check the versions of any external third party software - database version, OS version, even software patches.
Look at the configuration on someone else's machine who doesn't have the problem and compare.
Get another developer to sit at your workstation and try to reproduce the problem and also go to their workstation and try it. True story - a fellow developer had a bug that he could only reproduce on his machine...it turns out that he was doing something slightly different in the GUI that no one else was doing (tabbing to a button and then hitting enter, everyone else just hit enter). It never occurred to him that other people might just hit enter to submit, because that "didn't make sense" to him.
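Comparing configurations is easier if each machine's settings are dumped into a simple key/value form first. A minimal sketch of the comparison step; how you collect the values (environment variables, installed package versions, registry keys) is up to you, and the keys and versions below are made up.

```python
def diff_settings(mine, theirs):
    """Return {key: (my_value, their_value)} for every setting that differs."""
    differences = {}
    for key in sorted(set(mine) | set(theirs)):
        a = mine.get(key, "<missing>")
        b = theirs.get(key, "<missing>")
        if a != b:
            differences[key] = (a, b)
    return differences


# Hypothetical settings from the broken machine and a working one.
broken = {"os": "10", "db": "9.6"}
working = {"os": "10", "db": "9.5", "jdk": "8"}
for key, (a, b) in diff_settings(broken, working).items():
    print(f"{key}: mine={a} theirs={b}")
```

Anything that shows up in the output is a candidate cause; identical keys drop out, which keeps the list short.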

Identify a reboot

Is there any "Boot session ID" or (reliable) "Boot timestamp"?
For an installation I need to detect that a scheduled reboot took place indeed.
I guess I could do a dummy MoveFileEx() with MOVEFILE_DELAY_UNTIL_REBOOT, but I had hoped for something easier.
(We have to install a 3rd party package that sometimes behaves erratically after a repair/update. In that state, accessing the device may even lock up the system.)
(Windows XP, Vista, 7)
For things like this, WMI (Windows Management Instrumentation) is often a good starting place. I know you can get current uptime directly through it, which may allow you to determine if a machine recently rebooted.
Here is a blog post with some code samples as well:
http://blogs.technet.com/heyscriptingguy/archive/2004/09/07/how-can-i-tell-if-a-server-has-rebooted.aspx
Depending on your implementation language, you probably just want to pull out the query code from the vbscript.
Apparently Windows has the equivalent of "uptime". Here's more info: http://support.microsoft.com/kb/555737
As I understand it, this should tell you how long ago the system was booted. Will that information solve your problem?
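One way to use the uptime: subtract it from the current time to get a boot timestamp, save that before scheduling the reboot, and compare afterwards. A minimal sketch; how you obtain the uptime (WMI, the uptime tool, or something else) is left to the caller, and the function names and the 60-second tolerance are assumptions.

```python
def boot_time(now, uptime_seconds):
    """Estimate the boot timestamp from the current time and the uptime."""
    return now - uptime_seconds


def rebooted_since(saved_boot_time, now, uptime_seconds, tolerance=60.0):
    """True if the machine booted noticeably later than `saved_boot_time`.

    The tolerance absorbs small differences between two estimates of the
    same boot, for example from clock adjustments.
    """
    return boot_time(now, uptime_seconds) - saved_boot_time > tolerance


# Before the reboot: clock reads 5000 s, machine has been up 4000 s.
saved = boot_time(5000.0, 4000.0)

print(rebooted_since(saved, 5600.0, 4600.0))  # False: same boot session
print(rebooted_since(saved, 9000.0, 100.0))   # True: booted much later
```

The saved value has to live somewhere that survives the reboot, such as a file or a registry value.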
You could search the System event log for event 6009 from the EventLog source - this is the first event recorded after each reboot.
I think the best answer has already been given here: Find out if computer rebooted since the last time my program ran?
That seems to be the simplest way. Use GlobalFindAtom() to see if it exists and create it, with GlobalAddAtom(), if it doesn't. It will persist beyond the execution of your program. If your application runs again, and sees that the atom exists, then it isn't the first run since reboot.
If the computer is restarted, then the atom won't exist, indicating that this is the first run of your program since the reboot.

What causes the MS Windows 'System' Process to go nuts when compiling?

A couple of times recently I have noticed that 'something' is causing the Windows System Process to sit at 50+% and it will not quit until the PC is rebooted. Happening on Win2k and Win XP so far.
This is particularly troublesome because it currently appears to be triggered by MSVC 2005/Incredibuild and rebooting the build servers is not a nice thing.
At the same time the 'System Idle Process' process is holding the rest of the CPU and the build steps themselves seem to be starved. ie. a module that normally takes <5 minutes to compile is currently taking 20+.
I'd take a few guesses at it maybe being the virus checker or TortoiseSVN, but would desperately like some other suggestions.
Edit:
I've been experiencing this as something that is triggered, and the culprit may not be ongoing. That's not to say that some other ongoing process hasn't done something 'stupid' and is managing an active lock up of System while appearing to be idle itself.
System (100% of 1 core), and System Idle Process are sharing 98-100% of the total CPU.
Occasionally mt.exe, link.exe, buildservice would get a look in at 1-2%.
I'm running VNC to view the machine, so it's getting a look in on occasion.
Edit 2:
When I left the previous evening, the build process seemed to be progressing, albeit slowly, but after waiting another 13 hours the 1-hour build process hasn't completed. System is still hogging the 1 core.
My understanding is that the "System" process is the time spent in the kernel (so performing disk I/O, network I/O (you did mention Incredibuild) and the like) -- I'd check for disk fragmentation, virus checkers and possibly look at these on other machines in your Incredibuild cluster.
As the System Idle process runs at "Low" priority, it's a red herring that it'd be "taking up CPU time" -- if anything it's just showing that there is CPU time available. The fact that the processing is stuck on a single processor shows that the process is doing something that is not multi-core aware, or someone has set its thread affinity to 1.
I've noticed the virus checking software that I use can radically slow down compilation, but the slowdown does not extend beyond the end of the build. Turning off advanced and heuristic checking improves this to the extent that I do not have to disable the scanner entirely. I have changed my scanning strategy such that I now use scheduled full scans more than advanced on-the-fly scanning, as it hurts the performance of a number of apps. (n.b. I am using the latest cut of Kaspersky). I'm also using an automated backup tool (AJCBackup) that also needs to be restrained when compiling.
You may also want to consider disabling the Windows Indexing service on drives that are used to create a lot of temporary and object files, as it doesn't provide much value in this context for the amount of performance it costs.
Edit: Have you checked which processes are actually hogging the CPU core and traced them back to a given app?
We've encountered issues with Kaspersky and Incredibuild in our offices - compiles and sometimes links will just hang and never finish.
It only seems to affect some machines though, which is weird, and only Windows XP (Vista seems immune from what I've seen).
Only solution I've found so far is to turn Kaspersky off entirely - so if you find a solution then let me know!
RE: smacl, work from the Windows Search/Indexing Service (WSearch) won't be attributed to the System process's CPU time, it should come from the SearchIndexer.exe/SearchFilterHost.exe services (Vista+).
The majority of activity from System you will see will be in disk activity from the lazy writer and other disk accesses. CPU activity from System will be because of kernel activity such as drivers (ISRs/DPCs) and other kernel-level filters (which could include AV file and process filters).
Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) can aid in viewing CPU usage across processes, including System. You can use the public Microsoft Symbol Server and this resource to get you started.
If you can take a trace with Xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx), I can help you analyze where the CPU time is being spent in the System (kernel) context. Xperf isn't officially supported on XP, but you can take a trace on XP and analyze it on other systems.
Xperf and Process Explorer should be able to shine a spotlight on exactly the module(s) that are causing the runaway CPU usage. Symbols may not even be necessary to diagnose the problem; simply the module name can often point to the component in question that is slowing down your system. For example, high CPU usage from ndis.sys can point to network interrupts, or activity from modules such as aavmker4.sys can point to AV software (Avast! in this case).
And as always, check if there are any updated drivers and AV software for your system.
In my office, a conflict between Incredibuild and Spyware Doctor's Immunize feature caused similar issues. Turning off Immunize solved it for us.
What anti-virus/malware do you use?
I'm having the same hangs when compiling using IncrediBuild in VS2003, on a clean Windows 7 without any anti-virus. It worked fine on the same box in XP and Vista.
