I'd like your insight so I can understand something as well as I can.
Some days ago I noticed a weird command running. I don't remember its name at the moment, but I do remember its parent process, which was something like:
/usr/lib/systemd/systemd --switched-root --system --deserialize 22
as far as I can recall.
systemd is Linux's service manager, and for some reason it was invoked with the above arguments.
Unfortunately I cannot recall the actual command that was being invoked, but I do remember the parent call shown above. The command comes and goes occasionally at arbitrary intervals, 3-4 times or so.
I googled a bit, but there seems to be no real information about that systemd call. Some post mentioned it is an intentionally undocumented system call.
Can someone shed some light on this, and on how I can investigate it further?
You need no investigation for normal, expected behaviour.
--switched-root --system --deserialize 22
That translates to "Systemd switched roots (i.e. from the initrd to the live running system), it is a --system instance, and it wanted to restore its state from open file descriptor number 22."
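To make that reading concrete, here is a minimal Python sketch that splits such a command line and attaches the meanings given above to each flag. The function name `explain_cmdline` and the `FLAG_MEANINGS` table are made up for illustration; the descriptions only restate the explanation in this answer, and any flag not listed is reported as unknown.

```python
import shlex

# Meanings restate the answer above; this table is illustrative, not exhaustive.
FLAG_MEANINGS = {
    "--switched-root": "systemd switched roots (from the initrd to the live running system)",
    "--system": "this is a --system instance (as opposed to a per-user instance)",
    "--deserialize": "restore state from the given open file descriptor",
}

def explain_cmdline(cmdline):
    """Return a list of (flag, value, meaning) tuples for a systemd command line."""
    args = shlex.split(cmdline)[1:]  # drop the executable path
    result = []
    i = 0
    while i < len(args):
        flag = args[i]
        value = None
        # --deserialize is followed by the file descriptor number
        if flag == "--deserialize" and i + 1 < len(args):
            value = args[i + 1]
            i += 1
        result.append((flag, value, FLAG_MEANINGS.get(flag, "unknown")))
        i += 1
    return result
```

Calling `explain_cmdline("/usr/lib/systemd/systemd --switched-root --system --deserialize 22")` yields three entries, the last one carrying the value "22".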
I have a server on which we execute multiple bash scripts to automate tasks (like copying files to other servers, kicking off backups, etc). It has been working for some months, but today it started to get erratic.
What is happening is that the script gets 'stuck' for a while, and after that it runs with no problem. If I copy and paste the commands one by one into the terminal, they work, so it is not something in the script itself; it seems that something is holding up the bash interpreter (if that makes sense).
Another weird behavior is that the same script will run with no issues eventually. However, as we use Jenkins for automation, the scripts are re-created every time a new job starts.
For example, I created a new script, tst.sh, which only contains an echo. If I try to run it directly, it gets stuck for a while. I tried to debug it with bash -xeav but it does not print my script code, which means that it is not reading it. After a while, the script ran, with no changes. However, creating one script, with the same content and a different name, resurfaces the issue.
My hypothesis is that something prevents the script from being read, and bash just waits until whatever is blocking it finishes. However, I did not see any process holding the file, which means that this may not be the case.
Is there any other thing I should try? My knowledge in bash is pretty basic, so I don't know if there is a flag that may help me on debugging this internally.
I am working on RHEL 8.85, the bash version is GNU bash, version 4.4.20(1)-release (x86_64-redhat-linux-gnu)
UPDATES BASED ON THE COMMENTS
Server resources are OK; there is no significant usage.
The server hardware also works fine; at least, the ops team has not reached out with any known issue.
A reboot makes the issue disappear; however, it reappears after 5 minutes or so.
The issue does not seem to be related to bash profiles and such.
Issue solved, posting this as an answer so people can find it quicker.
Turns out, as multiple users suggested in the comments (thanks to all!!), the problem was caused by a security monitor, which analyzed each of the scripts that were executed. The team changed some settings on that end to prevent it from happening, and so far it is working.
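For anyone who wants to confirm this kind of symptom before chasing it, here is a minimal Python sketch that creates a brand-new script, runs it once, and reports how long that took. The function name `time_fresh_script` and the use of `/bin/sh` as the interpreter are assumptions for illustration; it does not identify the cause, it only measures the delay.

```python
import os
import subprocess
import tempfile
import time

def time_fresh_script(body="echo hello\n", interpreter="/bin/sh"):
    """Create a new script file, run it once, and return (seconds, stdout)."""
    fd, path = tempfile.mkstemp(suffix=".sh")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(body)
        start = time.monotonic()
        proc = subprocess.run(
            [interpreter, path], capture_output=True, text=True, check=True
        )
        elapsed = time.monotonic() - start
        return elapsed, proc.stdout
    finally:
        os.remove(path)  # always clean up the temporary script
```

If a trivial, freshly created script repeatedly takes seconds to start while the server is otherwise idle, that points at something outside the script, such as a scanner inspecting new files.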
I have a new Win10 laptop. I've installed lots of software, including a 25-year-old Codewright editor that I've customized up the wazoo, and that I've been installing on all my machines for, well, 25 years. After working for a few days, it suddenly stopped, and reinstalling it didn't fix it. On startup, it puts up a small splash window, and normally opens the main window a half a second later (that took more than 5 seconds 25 years ago). It's not using any CPU, and there's nothing I can do but kill the process.
In the past, I've occasionally got my system into a state where Codewright would hang on loading, due to some other program that hadn't terminated correctly, and it was unfrozen by killing off that other process. So that's reason to believe that Codewright is waiting at some global lock which some other malfunctioning software is holding. So I have two questions:
Does this ring a bell? Is there some known failure mode where a program putting up a splash window and then switching to another window can be blocked by something else going on in the system?
Is there a way to diagnose this, perhaps by finding out what system call it's hanging inside? I tried dtrace.exe, started Codewright, and then stopped tracing, and it produced a 3GB XML file, which is quite a haystack. There's a way to filter it by PID, but since this is a startup problem, I have no idea what the PID will be. Is there a better tool for doing this, or some more appropriate dtrace feature that I missed?
The comment about using the Task Manager to create a dump file actually led me to notice that there is an Analyze Wait Chain function there that I had never seen before, since I haven't used Task Manager much since I switched from Win7. This gave me exactly the answer I wanted. My editor was waiting for something that was being held by some NVIDIA GeForce Experience module. Since I don't use that, I uninstalled it, and I'm back up and running. Thanks for the tip.
I use Vagrant on macOS with an ubuntu64 16.04 box. Running htop, I can see that the vagrant ssh process uses 530G of virtual memory (in the VIRT column).
Is it the normal behavior of Vagrant? Should I panic? Is it "normal" to have virtually 530G on a mac with 120G of disk and 16G of RAM? Or maybe did I not understand the meaning of VIRT?
The vagrant box runs on VirtualBox and has only 1G of RAM allocated.
Answer by chrisroberts on github:
Hi! I was able to reproduce this behavior, but with any vagrant command executed. The vagrant ssh command is the easiest to see this behavior simply because the process is left running for as long as the ssh session is alive.
The tl;dr version of below is simply: Don't worry about it. VIRT isn't allocated memory. If it were, you would either need massive swap space, or nothing would be working.
So, what's going on here? The vagrant installer includes a small go executable (vagrant) whose job is to set up the current environment with the proper locations of everything it needs: the installer's bin directory, the lib directory for ruby and all the other friends, all the gems, and the vagrant gem itself. Once it has all this configured, it spawns off a new process, the actual Ruby vagrant process.
Because your example was referencing vagrant ssh, and as was previously pointed out (#7296 (comment)) a Kernel.exec happens meaning the Ruby process does not persist, I figured it must be the wrapper that was the culprit. After a bit of searching (mostly to find stackoverflow items saying "don't worry about VIRT") I stumbled upon:
keybase/keybase-issues#1908
They refer to the golang FAQ that talks about a bunch of VIRT being claimed up front and it not being a big deal, but never any absolutes about how much was actually being claimed. A link to lwn was dropped in there (keybase/keybase-issues#1908 (comment)) regarding golang's behavior on startup of claiming a huge chunk of VIRT, but still everything referenced a much lower amount than I was seeing locally. So I decided to go dig into the golang runtime code, and within malloc.go we find the answer:
golang src/runtime/malloc.go
The reason it's happening is the go wrapper used to start vagrant. Because the VIRT you see is simply a reservation and not actually allocated, it's not a problem and not something that should be worried about.
(There are some interesting conversations on the golang ML around the pros and cons of this approach, all pretty great reads).
It's just a copy/paste (and bolded the TLDR), hope it could help someone else.
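To see the difference between reserved and resident memory yourself, here is a minimal Python sketch for Linux. It assumes `/proc/self/status` is available, where `VmSize` corresponds to htop's VIRT and `VmRSS` to RES; the function names are made up for illustration, and this is not how the go runtime does its reservation.

```python
import mmap

def parse_status_kib(status_text):
    """Extract VmSize (VIRT) and VmRSS (RES) in kB from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(rest.split()[0])  # first token is the number
    return fields

def current_memory_kib():
    """Read this process's own VIRT/RES figures (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_status_kib(f.read())

def reserve_without_touching(size_bytes):
    """Map anonymous memory but never write to it, so pages stay non-resident."""
    return mmap.mmap(-1, size_bytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

def demo(size_bytes=256 * 1024 * 1024):
    """Return memory figures before and after an untouched reservation."""
    before = current_memory_kib()
    region = reserve_without_touching(size_bytes)
    after = current_memory_kib()
    region.close()
    return before, after
```

Running `demo()` on a Linux machine should show `VmSize` growing by roughly the mapped size while `VmRSS` barely moves, which is the same effect as the huge VIRT figure above.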
After intensive searching why certain workstations wouldn't perform a certain action when just being started up in the morning (...) I've discovered that GetPrivateProfileInt just returns the default value and doesn't bother to set GetLastError to something non-zero when the network-subsystem hasn't activated yet (e.g. because the DHCP client is still trying to get hold of an IP address to use.)
Does this sound familiar to someone? Does anybody happen to know what I should/could do about it?
For now I'll work around it by using an alternate default value and stalling a bit for as long as I keep getting that default value back.
GetPrivateProfileInt() is one of those innocent looking Windows API functions that has a ton of code behind it. There's a mass of appcompat code, designed to allow Win3 programs to run on modern versions of Windows. One of the side-effects is that it is incredibly slow; it took about 50 msec the last time I profiled it.
Looks like you found a flaw in it. For all I know, it might actually be designed appcompat behavior, emulating the way this API worked 18 years ago. I have no clue of course if that's accurate.
The very best thing you can do is stop using it. A possible workaround is to open the file first so that your program blocks until the service is up and running.
I would check if the file exists and sleep for a few seconds until the file is there. After some number of tries either use the default value or take an appropriate action.
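The retry idea can be sketched like this in Python. Note that this uses Python's `configparser` as a stand-in for GetPrivateProfileInt, and the function name `read_int_when_ready`, the retry count, and the delay are all assumptions chosen for illustration.

```python
import configparser
import os
import time

def read_int_when_ready(path, section, key, default, tries=5, delay=2.0):
    """Wait for the INI file to appear, then read an integer from it.

    Falls back to `default` if the file never shows up or the key is missing.
    """
    for attempt in range(tries):
        if os.path.exists(path):
            break
        if attempt < tries - 1:
            time.sleep(delay)  # give the network a chance to come up
    else:
        return default  # loop finished without finding the file

    parser = configparser.ConfigParser()
    parser.read(path)
    return parser.getint(section, key, fallback=default)
```

The point of the design is that "file not reachable yet" and "key not present" are handled as separate cases, so the caller only sees the default after the file has had a fair chance to appear.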
Is there any "Boot session ID" or (reliable) "Boot timestamp"?
For an installation I need to detect that a scheduled reboot took place indeed.
I guess I could do a dummy MoveFileEx() with MOVEFILE_DELAY_UNTIL_REBOOT, but I had hoped for something easier.
(We have to install a 3rd party package that sometimes behaves erratically after a repair/update. In that state, accessing the device may even lock up the system)
(Windows XP, Vista, 7)
For things like this, WMI (Windows Management Instrumentation) is often a good starting place. I know you can get current uptime directly through it, which may allow you to determine if a machine recently rebooted.
Here is a blog post with some code samples as well:
http://blogs.technet.com/heyscriptingguy/archive/2004/09/07/how-can-i-tell-if-a-server-has-rebooted.aspx
Depending on your implementation language, you probably just want to pull out the query code from the vbscript.
Apparently Windows has the equivalent of "uptime". Here's more info: http://support.microsoft.com/kb/555737
As I understand it, this should tell you how long ago the system was booted. Will that information solve your problem?
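If you go the uptime route, the comparison itself is simple. Here is a minimal Python sketch, assuming you already have the current time and the uptime in seconds from somewhere (a WMI query, a tick counter, or similar; that part is outside this sketch). The function names and the 60-second tolerance are made up for illustration.

```python
def boot_timestamp(now_seconds, uptime_seconds):
    """Estimate when the system booted: current time minus uptime."""
    return now_seconds - uptime_seconds

def rebooted_since(stored_boot, current_boot, tolerance_seconds=60.0):
    """Return True if the current boot time is clearly later than the stored one.

    The tolerance absorbs small differences caused by clock adjustments or
    by sampling the time and the uptime at slightly different moments.
    """
    return current_boot - stored_boot > tolerance_seconds
```

The installer would store `boot_timestamp(...)` before scheduling the reboot and call `rebooted_since(...)` with a freshly computed value afterwards. A large clock change between the two samples would defeat this, so treat it as a heuristic.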
You could search the System event log for event 6009 from the EventLog source - this is the first event recorded after each reboot.
I think the best answer has already been given here: Find out if computer rebooted since the last time my program ran?
That seems to be the simplest way. Use GlobalFindAtom() to see if it exists and create it, with GlobalAddAtom(), if it doesn't. It will persist beyond the execution of your program. If your application runs again and sees that the atom exists, then it isn't the first run since reboot.
If the computer is restarted, then the atom won't exist, indicating that this is the first run of your program since the reboot.