I've been having issues with my parity instance freezing up while syncing. Given enough time it'll generally pick up again but sometimes it will stay where it is for hours. Restarting the instance temporarily solves the problem and it will immediately start syncing like normal.
For example, it might sync to the latest block at 15:00:00 and then not pick up anything until 15:30:00 (or a few hours later) and then start syncing until it catches back up to the top.
Peers are generally 20+ and absolutely nothing new will show up in the console until parity decides to resume (or I restart)
Running off of a windows server instance. Off of an HDD (Not ideal I know but except for these pauses it has been sufficient and the fact that restarting parity fixes the issue makes me believe it is unrelated to the hard drive performance)
Running parity 1.10.8 with the cmd:
parity.exe --jsonrpc-apis all --cache-size 1024 --db-compaction hdd --tracing off --pruning fast
I have a large number of wallets that I query regularly with the RPC APIs to check for changes.
Did anyone encounter anything like this?
That is a known bug and fixed a while ago, if you keep experiencing this, please ugprade to the latest Parity Ethereum 2.x releases!
https://github.com/paritytech/parity-ethereum/releases/latest
Note: I work for Parity Technologies.
Related
Summary
When putting the monitor in sleep mode in Windows 10, Windows seems to execute some tasks that don't get executed with the screen on.
This is interfering with our software, and we need to get rid of it.
Ful story
For a hardware device with a touch screen, I need to be able to turn off the touch screen when it's not in use, for durability reasons. Windows has a message that you can send to turn it off, SC_MONITORPOWER. More specifically:
SendMessage(hwnd, WM_SYSCOMMAND, SC_MONITORPOWER, 2);
This works fine, but when the screen is off, Windows is apparently sometimes performing some tasks that it doesn't do when the screen is on. We are careful to never write anything to the screen in this situation (that causes huge problems when the screen is off, in fact just having a blinking cursor in a DOS box is using up half a core when the screen is off).
Our software requires a callback to be executed every 0.25 ms. We have turned nearly every task, service and several other things in Windows off, and with the screen on, I can run our software for days without ever missing a callback. But with the screen off I get hiccups. The callback already runs at the highest possible priority.
So there is apparently something that we missed when we turned all services and tasks off. There appear to be 2 causes of hiccups:
One happens once every 10-30 hours or so (not sure of the exact time, it seems to vary). But it always happens 5 times, with EXACTLY 5 minutes (at most a few milliseconds off) in between (so in total it happens 5 times in a 25 minute period).
Beside this, we get a single hiccup typically every 4-10 hours, but the time between occurrences doesn't seem to be very constant so there could also be multiple causes.
I'm a bit at a loss here, and running analysis software can easily interfere with our own software, making it harder to detect when these hiccups really occur and when they are caused by running the analysis software.
Interestingly, I have seen this 5-times-every-5-minutes thing also on a completely different system (different hardware, different OS version), when recording audio in Adobe Audition. Audition misses pieces of audio every 5 minutes in this case, and I think it also only happens when the monitor is in sleep mode and you're not logged in remotely.
We have already tried to turn the touch screen off using direct monitor commands like Nircmd does, and it doesn't support those. My guess is that the SC_MONITORPOWER message is triggering more things in Windows, and if we can turn them off, that would fix our problem. Any ideas?
System
Intel i5-8700 with 6 cores, Windows LTSB, no extra software installed except our own.
Never mind, problem solved. It was not an extra task that was being started, it was one of the existing Windows processes that for some reason only causes issues when the display is off. Since killing them is not an option (Windows will just restart them), I've suspended the following processes, and the culprit is one of these (I don't know which one yet):
sihost.exe
igfxEM.exe (I very much suspect this one)
RuntimeBroker.exe
dllhost.exe
taskhostw.exe
explorer.exe
I have to continue testing a bit longer to be absolutely certain, but so far with these tasks all suspended I've not seen a missed callback in the last 38 hours. I don't know yet if there are any drawbacks to suspending all these things, so I'll try to find the cause(s) and suspend only that/those.
Fresh installation - plugins disabled, no malware. Chromium runs without delays. Firefox on every other action, including typing every now and then - takes 3-5 second pauses, sometimes displays a running wheel when the delay is even longer.
Started at certain point in time. De-installation/re-installation didn't cure. Running as a single process didn't cure. Latest version.
Any ideas what can be causing this, specifically for the Firefox?
Upgrade to v67 - I remember reading some glitch has been removed in prior versions that might have been causing this behavior.
Also, another idea would be not to limit your cache to a very low space.
+1 to move this question to SuperUser.
I have just noticed that my C drive is getting full, whereas it still had 30 GB free space 3 days ago.
Given last days activity I can't find any reason for this.
Now I realize that my C free space is getting lower and lower even though there's no current activity on my PC (except that it's turned on).
Every 2 minutes, I lose approximately 100 MB of free space, even though I don't download anything.
I launched my antivirus and I have closed my internet connection in order to see if the free space would stop decreasing, but it continued decreasing at the same pace.
I checked the task manager and notice there was a software running which I think was named "One Drive setup.exe" (during the past weeks, I had many pop up windows saying I had to update onedrive, but there was a problem with the auto update etc... but I didn't car because I don't even know what OneDrive is and I don't think I use it). So I killed this running task.
I thought it had stopped the loss of free space (I even gained 100 MB), but the decrease started again.
Now I connected to Internet again.
I got 300 MB free space back and now it seems constant since 4 minutes. Maybe these little ups and downs can be due to the current antivirus scanning.
But what can explain the loss of 30 GB during the past 2 or 3 days?
Could it be windows update? How can i check this with windows 10?
Could it be a virus or something bad?
Please, answer quickly, i only have 17GB left :-(
Thanks
Which version of Windows OS are you using?
Turn off/ disable System Restore point, this way you will able to recover some space. Other than use CCleaner (https://www.piriform.com/ccleaner/download) to clean your system.
They release patch I believe. But u also can use built in disk cleanup tool(https://support.microsoft.com/en-us/help/4026616/windows-disk-cleanup-in-windows-10).
Also uninstalled OneDrive/Google Drive unless u actively use this service. OneDrive sync with cloud files so that u can use those off line.
I am running Cloudera Hadoop on my laptop and Oracle VirtualBox VM.
I have given 5.6 GB out of mine 8 and six from eight cores as well.
And still I am not able to keep it up and running.
Even without load services would not stay up and running and when I try a query at least Hive will be down within 20 minutes. And sometimes they go down like dominoes: one after another.
More memory seemed to help some: with 3GB and all services, Hue was blinking with red colors when the Hue itself managed to get up. And after rebooting it would takes 30 - 60 minutes before I manage to get the system up enough to even try running anything on it.
There has been two sensible notes (that I have managed to find):
- Warning of swapping.
- Crashing note when the system used 26 GB of virtual memory which was not enough.
My dataset is less than one megabyte, so it is hard to understand why the system would go up to dozens of gigabytes, but for whatever was reason for that has passed: now the system is running more steadily around the 5.6 GB that I have given to it after closing down a few services: see my answer to myself.
And still it is just more stable. Right after I got a warning of swapping and the Hive went down again. What could be reason for more-or-less all Hadoop services going down if the VM starts to swap?
I don't have enough reputation to post the picture to here, but when Hive went down again it was swapping 13 pages / second and utilizing 5.9 GB / 5.6 GB. So basically my system starts crashing more-or-less right after it start to swap. "428 pages were swapped to disk in the previous 15 minute(s)"
I have used default installation options as far as hard drive is concerned.
Only addition is a shared folder between Windows and VM. That works somewhat strangely locking files all the time, so I used it just like FTP and only for passing files from one system to another. Thus I can go days without using it, but systems still crash, so that is not the cause either.
Now that the system is mostly up, services crash still about twice a day: Service Monitor and Hive are quite even with their crashing frequency. After those come Activity Monitor and Event Server, which appear to crash always together. I believe Yarn crashes as well, but it gets up on its own. Last time Hive crashed first, and then it got followed by Service Monitor, Hive (second time), Activity Monitor and Event Server all.
As swap is disk, perhaps the problem is with disk:
# cat /etc/fstab
# swapoff -a
# badblocks -v /dev/VolGroup/lv_swap
Checking blocks 0 to 8388607
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
# badblocks -vw /dev/VolGroup/lv_swap
Checking for bad blocks in read-write mode
From block 0 to 8388607
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
So nothing wrong with swap disk and I have not noticed any disk error anywhere else either.
Note that you could check file system from Windows side also. But I expect that if you make Windows to fix your Linux file system, you have good chances of destroying your Linux with that, so I did my checks somewhat pessimistically, because AFAIK these commands are safe to execute.
About half of the services kept going down, so giving more specifics would be a long story.
I succeeded to get the system more stable by closing down flume, hbase, impala, ks_indexer, oozie, spark and sqoop. And by increasing more memory to some remaining services that complained they had not been given enough memory.
Also I fixed couple of thing on the Windows side, I am not sure which one of these helped:
- MsMpEng.exe kept my hard drive busy. I didn't have permissions to kill it, but I decreased its priority to lowest possible.
- CcmExec.exe got to loop on my DVD and kept reading it for forever. This I solved by taking the DVD out from the drive. Then later on I killed the process tree to keep it from bothering for a while.
I found these using Windows resource manager.
The VM requires 4GB: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html You should use that.
I am not clear whether you are using the QuickStart VM though. It's set up to run just the essential services and tuned to conserve memory rather than exploit lots of memory.
It sounds like you are running your own installation, on one virtual machine, on your Windows machine. You may be running an entire cluster's worth of services on one desktop machine. Each of these services has master, worker processes, monitoring processes, etc. You don't need most of them.
You also probably have left memory settings at default suitable for a server-class machine of 16+ GB RAM. Remember these services usually run across many machines, not all on one.
Finally, you're clearly swapping, and that makes things incredibly slow. Remember this is all through a VM too!
Bottom line, use the QuickStart VM if you really want a 1-machine cluster tuned correctly. If you want a real cluster or more services, you need more hardware.
Also consider: cloudera.com/live contains a full CDH 5.1 cluster + sample data, running on demand on AWS. Of course, the advantage of the VM is that you can BYOD, but if you're simply looking for a hands-on Hadoop experience, Live is a great option.
I am working with a SuSE machine (cat /etc/issue: SUSE Linux Enterprise Server 11 SP1 (i586)) running Postgresql 8.1.3 and the Slony-I replication system (slon version 1.1.5). We have a working replication setup going between two databases on this server, which is generating log shipping files to be sent to the remote machines we are tasked to maintain. As of this morning, we ran into a problem with this.
For a while now, we've had strange memory problems on this machine - the oom-killer seems to be striking even when there is plenty of free memory left. That has set the stage for our current issue to occur - we ran a massive update on our system last night, while replication was turned off. Now, as things currently stand, we cannot replicate the changes out - slony is attempting to compile all the changes into a single massive log file, and after about half an hour or so of running, it trips over the oom-killer issue, which appears to restart the replication package. Since it is constantly trying to rebuild that same package, it never gets anywhere.
My first question is this: Is there a way to cap the size of Slony log shipping files, so that it writes out no more than 'X' bytes (or K, or Meg, etc.) and after going over that size, closes the current log shipping file and starts a new one? We've been able to hit about four megs in size before the oom-killer hits with fair regularity, so if I could cap it there, I could at least start generating the smaller files and hopefully eventually get through this.
My second question, I guess, is this: Does anyone have a better solution for this issue than the one I'm asking about? It's quite possible I'm getting tunnel vision looking at the problem, and all I really need is -a- solution, not necessarily -my- solution.