We have several Jenkins pipeline jobs that are taking much longer to complete than we expect. Specific steps seem to "hang" for unwarranted periods of time, while running those same steps manually on another system is significantly faster.
One example job has a step that uses Ruby to recurse through a bunch of directories and run a shell command on each file in those directories. Running it on our Ubuntu 14.04 Jenkins system takes about 50 minutes; running the same command on my desktop Mac takes about 10 seconds.
I did some experimentation on the Jenkins builder by running the Ruby command at the command prompt, and got the same slow result that Jenkins had. I also removed Ruby from the equation by batching up each of the individual shell commands Ruby would have run into a shell script that ran them sequentially. That took a long time as well.
I've read some posts suggesting that STDERR blocking may be the reason. I then experimented with redirecting STDERR and STDOUT to /dev/null, and the commands finish in about 20 seconds. That is what I would expect.
My questions are:
1. Would these slowdowns in execution time be the result of some I/O blocking?
2. What is the best way to fix this? In some cases I may want the output, so redirecting to /dev/null is probably not going to work. Is there a kernel- or OS-level change I can make?
Running on an Ubuntu 14.04 Amazon EC2 instance (r3.large).
Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-108-generic x86_64)
ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-linux]
Yes, transferring huge amounts of data between the slave and the master does lead to performance problems. This applies to storing build artifacts as well as to massive amounts of console output.
For console output, the performance penalty is particularly big if you use the Timestamper plugin. If it is enabled for your job, try disabling it first.
Otherwise, I'd avoid huge amounts of console output in general. Try to restrict the console output to high-level job information that (in case of failure) provides links to further "secondary" log file data.
Using I/O redirection (as you already did) is the proper way to accomplish that, e.g.
mycommand 2>mycommand.stderr.txt 1>mycommand.stdout.txt
This will always work (except for very special cases where you may need to select command-specific options to redirect a console output stream that a command explicitly creates on its own).
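If you do still need the output, a variant of the same idea (a sketch; mycommand and the file names are placeholders) is to keep the full streams in files on the build node, print only a short tail to the console, and archive the files as build artifacts:
mycommand 1>mycommand.stdout.txt 2>mycommand.stderr.txt
tail -n 20 mycommand.stdout.txt    # only a short summary reaches the Jenkins console
# archive mycommand.stdout.txt / mycommand.stderr.txt as artifacts for the rest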
I have a server on which we execute multiple bash scripts to automate tasks (like copying files to other servers, kicking off backups, etc). It has been working for some months, but today it started to get erratic.
What happens is that the script gets "stuck" for a while, and after that it runs with no problem. If I copy and paste the commands one by one into the terminal, they work, so it is not something in the script itself; it seems to be something holding up the bash interpreter (if that makes sense).
Another weird behavior is that the same script will eventually run with no issues. However, as we use Jenkins for automation, the scripts are re-created every time a new job starts.
For example, I created a new script, tst.sh, which contains only an echo. If I try to run it directly, it gets stuck for a while. I tried to debug it with bash -xeav, but it does not print my script code, which means it is not even reading it. After a while, the script ran, with no changes on my side. However, creating a script with the same content and a different name resurfaces the issue.
My hypothesis is that something prevents the script from being read and bash just waits until whatever is blocking it finishes. However, I did not see any process holding the file, so that may not be the case.
Is there anything else I should try? My knowledge of bash is pretty basic, so I don't know if there is a flag that might help me debug this internally.
I am working on RHEL 8.85; the bash version is GNU bash, version 4.4.20(1)-release (x86_64-redhat-linux-gnu).
UPDATES BASED ON THE COMMENTS
Server resources are OK; utilization is low.
The server hardware also seems fine; at least the ops team has not reported any known issue.
A reboot makes the issue disappear; however, it reappears after 5 minutes or so.
The issue does not seem to be related to bash profiles and the like.
Issue solved, posting this as an answer so people can find it quicker.
Turns out, as multiple users suggested in the comments (thanks to all!), the problem was caused by a security monitor that analyzed each of the scripts being executed. The team changed some settings on that end to prevent it from happening, and so far it is working.
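For anyone hitting something similar before the cause is known: strace can show where the hang sits. A minimal sketch, assuming strace is installed and using tst.sh from above (the trace path is a placeholder):
strace -f -tt -o /tmp/tst.trace bash ./tst.sh
# -f follows child processes, -tt adds high-resolution timestamps.
# A long gap between consecutive timestamps, often around openat()/read()
# of the script file, points at whatever is intercepting file access.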
I'm developing a program running on embedded Linux (Debian Buster), and I found that the program sometimes has performance issues. After some debugging, I suspect the issue might not be in my program itself. Instead, at some point the OS starts swapping and my program gets swapped out to the file system.
Therefore, I used the code here to verify. It turns out my program occupies much less physical memory after about 500 seconds, which matches the hypothesis.
Now I want to find which process suddenly takes a lot of memory at that point, but I don't know how.
Is there any way to keep monitoring the memory usage of all processes (or the top 10) on the system and dump it to a log file? Any tools or commands would be good.
Thanks.
"I'm developing a program running on embedded Linux"
It would be helpful if you could specify which embedded Linux you are working on.
Based on that, someone could suggest suitable tools.
For Linux in general, you could use:
top -p [PID]
You can get the PID with, for example:
ps aux | grep <program-name>
(or pidof <program-name>)
I am not sure whether using the command line is a problem for you?
"dump to a log file"
You can redirect the terminal output to a log file with the shell's >> operator; the redirection creates the file, so there is no need for touch. If you only want certain lines, filter them with grep first.
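To make that concrete, here is a minimal polling sketch, assuming the procps ps found on Debian Buster (the 10-second interval, top-10 count, and log path are placeholders):
while true; do
    {
        date '+%F %T'
        ps -eo pid,comm,rss,vsz --sort=-rss | head -n 11    # header plus the top 10 by resident memory
        echo
    } >> /var/log/memlog.txt
    sleep 10
done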
Scenario: I'm logging in remotely to my M1 Mini, and trying to run a Perl script which launches a long stream of instances of my (compiled for arm64 C++) program, one at a time, for testing purposes. Because it will surely take hours to run the full sequence, I nohup the Perl script.
I launched the first test run about 36 hours ago, and it is about one-third completed thus far. This is much slower than I'd expected -- in general my individual tests of the program have shown this machine to be faster than any other machine I have, and this isn't living up to that at all.
I got to thinking about this, and wondered if my code is getting treated as a "background" task, and run on one of the power-efficient "Icestorm" cores rather than one of the fast "Firestorm" cores. Does anyone know how to detect which core a process is running on and/or control which core a process is run on?
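Not an authoritative answer, but two leads worth checking against the man pages on your system: processes launched via ssh/nohup can end up tagged with a background QoS, which macOS tends to schedule on the efficiency cores, and the stock taskpolicy(8) utility can apply or clear that background policy. A sketch (run_tests.pl and <pid> are placeholders):
taskpolicy -B perl run_tests.pl    # launch with the background policy cleared
taskpolicy -B -p <pid>             # or clear it on an already-running process
# For observing core usage, sudo powermetrics can report per-core load,
# though I don't know of a direct per-process "which core" readout on macOS.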
I'm trying to reduce the amount of forking our Ruby/Rails app does. We shell out a lot with backticks, and each of these forks the entire process, which can cause huge memory bloat.
I'm going through the code, identifying the calls that happen most often, and trying to replace them with code that achieves the same thing without a shell call. However, in some cases I suspect it might still be forking under the hood anyway.
Is there a way to detect or log whenever a process forks? I'm using Ubuntu 14.04. A log would be ideal, as I can then keep an eye on it while I run the amended code.
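One sketch uses strace, which is available on Ubuntu 14.04 (<pid> is a placeholder for your app server's process id):
strace -f -e trace=clone,fork,vfork -o forks.log -p <pid>
# -f follows children; every process creation shows up in forks.log as a
# clone()/fork()/vfork() line tagged with the calling pid, so you can watch
# the file while exercising the amended code.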
It must be obvious, but I can't make a use case for Delayed Job work, because of a specific behavior of Ruby's garbage collector: it doesn't free memory back to the OS, so sooner or later the delayed_job process takes all the memory anyway, and the only way out is to restart the delayed_job process.
But if I restart the delayed_job process while a task is currently running, that task will never be completed. There is probably some workaround to restart the task later, but that approach seems ugly to me.
I tried real jobs and some simple computation without any variables, symbols, or links, so I don't think that "my code leaks". Still, every new job increases the memory of the delayed_job process.
Maybe I am using Delayed Job for something it is not designed for? Or could it be an environment problem? (I tried it on a local machine and on a VPS.)
Tested on: Ubuntu 14.04 and Debian 6 (both x86), Rails 3.2, delayed_job 4.0.2, delayed_job_active_record 4.0.1, ruby 2.1.2
I could give some code examples, but, as I mentioned, I tried both a real job and a simple computation, so I won't post them unless they are significant and my mistakes turn out to be fundamental.
Given my conditions (my tasks can run for a couple of minutes, read and write about 100K records to the database, and require a lot of computation; tasks can't be interrupted; and the number of tasks is limited to 10-20 daily), my only guess is to use Resque, because it forks a process every time, so there should be no problem with memory accumulating over time.
So, am I really doing something wrong, or is it the nature of DJ to occupy all memory and require a restart? And if I can't restart it, should I not use this approach at all?
Everything I have read on the internet (not much, by the way) says it is Ruby's GC trouble that it doesn't free memory back to the OS, and some advise profiling the code for unlinked objects (which sounds the most realistic for my case, but I tried a lot with code that doesn't create any objects, and I explicitly set everything to nil and call GC.start).
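Until you either find a leak or move to Resque, one stopgap (a sketch, not anything built into delayed_job; the pid file path, daemon script name, and 1 GB limit are assumptions) is a cron-driven watchdog that restarts the worker only when its resident memory crosses a threshold:
LIMIT_KB=$((1024 * 1024))                  # ~1 GB resident set size
PID=$(cat tmp/pids/delayed_job.pid)
RSS_KB=$(ps -o rss= -p "$PID")
if [ "$RSS_KB" -gt "$LIMIT_KB" ]; then
    # delayed_job traps TERM and exits after the current job finishes,
    # so a restart issued this way should not abort a job mid-run
    # (worth verifying on 4.0.x); on Rails 3.2 the script is usually
    # script/delayed_job rather than bin/delayed_job.
    script/delayed_job restart
fi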