Strange output from Docker image/container? - image

I am fairly new to Docker. When I run an image I usually get "inside" the container, if that makes sense, where I can access the directories I have created.
However, when I ran it recently I got the following output:
top - 15:49:10 up 2:36, 0 users, load average: 0.65, 0.70, 0.71
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.9 us, 2.8 sy, 0.2 ni, 89.2 id, 1.8 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 3930660 total, 370676 free, 1749516 used, 1810468 buff/cache
KiB Swap: 4076540 total, 4076540 free, 0 used. 1550316 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 36536 2968 2604 R 0.0 0.1 0:00.05 top -b -c
top - 15:49:13 up 2:36, 0 users, load average: 0.65, 0.70, 0.71
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.0 us, 2.6 sy, 0.0 ni, 94.2 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 3930660 total, 366860 free, 1753244 used, 1810556 buff/cache
KiB Swap: 4076540 total, 4076540 free, 0 used. 1546536 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 36536 2968 2604 R 0.0 0.1 0:00.05 top -b -c^
from the following docker command:
sudo docker run -i -t ubuntu-latest
I am running Docker 17.12 on Ubuntu 16.04. If possible, I would prefer a solution that doesn't require posting the Dockerfile, because the file contains sensitive information.
Any feedback would be greatly appreciated.

When a container launches, it executes a binary that is either defined in the image or overridden on the command line using the entrypoint and command arguments.
See https://docs.docker.com/engine/reference/builder/#cmd vs https://docs.docker.com/engine/reference/builder/#entrypoint
In this case it looks like the image is configured to run 'top' automatically, which is why the container executes top as PID 1 instead of an interactive bash session. If you could paste just the ENTRYPOINT and CMD lines from your Dockerfile, it would be possible to know exactly what's happening, but you should be able to override them via the CLI with:
sudo docker run --entrypoint /bin/bash -i -t ubuntu-latest
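To see what the image is configured to run without opening the Dockerfile, the image metadata can be inspected. A minimal sketch, assuming the image is tagged ubuntu-latest as in the question; show_start_command is a hypothetical helper name, and sudo may be needed in front of docker depending on the setup:

```shell
# Hypothetical helper: print the ENTRYPOINT and CMD stored in an image.
# Uses `docker inspect` with a Go template; "null" means the field is unset.
show_start_command() {
    docker inspect \
        --format 'ENTRYPOINT={{json .Config.Entrypoint}} CMD={{json .Config.Cmd}}' \
        "$1"
}

# Example (needs a running Docker daemon and an existing image):
#   show_start_command ubuntu-latest
```

If either field mentions top -b -c, that would explain the output in the question.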

What is the priority of background processes in linux environment

I would like to know how the OS prioritises the execution of background processes in Linux.
Suppose I run the command below: would it be executed right away, or would the OS prioritise the execution order?
nohup /bin/bash /tmp/kill_loop.sh &
Thanks
All processes running at the same nice value will get an equal cpu-timeslice.
Here is a simple test that launches 2 processes, both performing the exact same operations. One is launched in the background and the other in the foreground.
dd if=/dev/zero of=/dev/null bs=1 &
dd if=/dev/zero of=/dev/null bs=1
The relevant extract from subsequently running the top command
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1366 root 20 0 1576 532 436 R 100 0.0 0:30.79 dd
1365 root 20 0 1576 532 436 R 100 0.0 0:30.79 dd
Next, if both the processes are restricted to the same CPU,
taskset -c 0 dd if=/dev/zero of=/dev/null bs=1 &
taskset -c 0 dd if=/dev/zero of=/dev/null bs=1
Again the relevant extract from subsequently running the top command shows
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1357 root 20 0 1576 532 436 R 50 0.0 0:38.74 dd
1358 root 20 0 1576 532 436 R 50 0.0 0:38.74 dd
Both processes compete for CPU time slices and are equally prioritised.
Finally,
kill -SIGINT 1357 &
kill -SIGINT 1358 &
kill -SIGINT 1365 &
kill -SIGINT 1366 &
results in similar amounts of data copied and throughput.
25129255+0 records in
25129255+0 records out
25129255 bytes (25 MB) copied, 34.883 s, 720 kB/s
Slight discrepancies in output may occur in the throughput due to differences in the exact moment the individual processes respond to the break-signal and stop running.
However, also note that sched_autogroup_enabled exists.
If enabled, sched_autogroup_enabled makes the scheduler distribute CPU time fairly between individual shells first, dividing the CPU equally among the active shells, and only then between the processes within each shell.
Thus if a shell launches 1 process A,
and another shell launches 2 processes B and C,
then the CPU execution timeslice will typically be distributed as
A <-- 50% <---- shell1 50%
B <-- 25% <-.
C <-- 25% <--`- shell2 50%
(though all 3 processes A, B & C are running at the same nice level.)
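Whether autogrouping is active on a given machine can be checked through /proc. A minimal sketch, assuming a Linux kernel that exposes /proc/sys/kernel/sched_autogroup_enabled (kernels built without autogroup support will not have the file); autogroup_status is a hypothetical helper name:

```shell
# Hypothetical helper: report whether scheduler autogrouping is on.
autogroup_status() {
    f=/proc/sys/kernel/sched_autogroup_enabled
    if [ ! -r "$f" ]; then
        # File missing: kernel too old or built without autogroup support.
        echo "unavailable"
    elif [ "$(cat "$f")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

autogroup_status
```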
Process priorities in the Linux kernel are expressed as nice values.
Refer to the link
http://en.wikipedia.org/wiki/Nice_(Unix)
Nice values range from -20 to +19 and define the process priority, with -20 being the highest priority. User-space processes usually get a default nice value of 0. You can check the nice values of the processes running in your shell with the command below.
ps -al
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1039 1268 16889 0 80 0 - 11656 poll_s pts/8 00:00:08 vim
0 S 1047 1566 17683 0 80 0 - 2027 wait pts/18 00:00:00 arm-linux-andro
0 R 1047 1567 1566 21 80 0 - 9143 ? pts/18 00:00:00 cc1
0 R 1031 1570 15865 0 80 0 - 2176 - pts/24 00:00:00 ps
0 R 1031 17357 15865 99 80 0 - 2597 - pts/24 00:03:29 top
In the output above, the 'NI' column shows the nice values. When I ran a process in the background, it also got a nice value of 0 (top, PID 17357, is that process). That means it is queued just like a foreground process and scheduled the same way.
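The observation above can be reproduced without reading ps output by hand. A minimal sketch, relying on nice printing the current niceness when called without a command; the variable names are illustrative only:

```shell
# Niceness of the current (foreground) shell.
fg_nice=$(nice)

# Ask the same question from a background job and collect the answer
# through a temporary file.
tmp=$(mktemp)
nice > "$tmp" &
wait "$!"
bg_nice=$(cat "$tmp")
rm -f "$tmp"

echo "foreground nice: $fg_nice"
echo "background nice: $bg_nice"
```

Under bash or a plain POSIX sh the two values should match. Some shells can be configured to lower the priority of background jobs (zsh's BG_NICE option, for example), in which case the background value would be higher.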

Puppet agent hangs and eventually gives a memory allocation error

I'm using puppet as a provisioner for Vagrant, and am coming across an issue where Puppet will hang for an extremely long time when I do a "vagrant provision". Building the box from scratch using "vagrant up" doesn't seem to be a problem, only subsequent provisions.
If I turn puppet debug on and watch where it hangs, it seems to stop at various, seemingly arbitrary, points the first of which is:
Info: Applying configuration version '1401868442'
Debug: Prefetching yum resources for package
Debug: Executing '/bin/rpm --version'
Debug: Executing '/bin/rpm -qa --nosignature --nodigest --qf '%{NAME} %|EPOCH?{% {EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n''
Executing this command on the server myself returns immediately.
Eventually, it gets past this and continues. Using the summary option, I get the following, after waiting for a very long time for it to complete:
Debug: Finishing transaction 70191217833880
Debug: Storing state
Debug: Stored state in 9.39 seconds
Notice: Finished catalog run in 1493.99 seconds
Changes:
Total: 2
Events:
Failure: 2
Success: 2
Total: 4
Resources:
Total: 18375
Changed: 2
Failed: 2
Skipped: 35
Out of sync: 4
Time:
User: 0.00
Anchor: 0.01
Schedule: 0.01
Yumrepo: 0.07
Augeas: 0.12
Package: 0.18
Exec: 0.96
Service: 1.07
Total: 108.93
Last run: 1401869964
Config retrieval: 16.49
Mongodb database: 3.99
File: 76.60
Mongodb user: 9.43
Version:
Config: 1401868442
Puppet: 3.4.3
This doesn't seem very helpful to me: the times listed total only about 108 seconds, so where have the other 1385 seconds gone?
Throughout, Puppet seems to be hammering the box, using up a lot of CPU, but still doesn't seem to advance. The memory it uses seems to continually increase. When I kick off the command, top looks like this:
Cpu(s): 10.2%us, 2.2%sy, 0.0%ni, 85.5%id, 2.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4956928k total, 2849296k used, 2107632k free, 63464k buffers
Swap: 950264k total, 26688k used, 923576k free, 445692k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28486 root 20 0 439m 334m 3808 R 97.5 6.9 2:02.92 puppet
22 root 20 0 0 0 0 S 1.3 0.0 0:07.55 kblockd/0
18276 mongod 20 0 788m 31m 3040 S 1.3 0.6 2:31.82 mongod
20756 jboss-as 20 0 3081m 1.5g 21m S 1.3 31.4 7:13.15 java
20930 elastics 20 0 2340m 236m 6580 S 1.0 4.9 1:44.80 java
266 root 20 0 0 0 0 S 0.3 0.0 0:03.85 jbd2/dm-0-8
22717 vagrant 20 0 98.0m 2252 1276 S 0.3 0.0 0:01.81 sshd
28762 vagrant 20 0 15036 1228 932 R 0.3 0.0 0:00.10 top
1 root 20 0 19364 1180 964 S 0.0 0.0 0:00.86 init
To me, this seems fine, there's over 2GB of available memory and plenty of available swap. I have a max open files limit of 1024.
About 10-15 minutes later, still no advance in the console output, but top looks like this:
Cpu(s): 11.2%us, 1.6%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%s
Mem: 4956928k total, 3834376k used, 1122552k free, 64248k buffers
Swap: 950264k total, 24408k used, 925856k free, 445728k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28486 root 20 0 1397m 1.3g 3808 R 99.6 26.7 15:16.19 puppet
18276 mongod 20 0 788m 31m 3040 R 1.7 0.6 2:45.03 mongod
20756 jboss-as 20 0 3081m 1.5g 21m S 1.3 31.4 7:25.93 java
20930 elastics 20 0 2340m 238m 6580 S 0.7 4.9 1:52.03 java
8486 root 20 0 308m 952 764 S 0.3 0.0 0:06.03 VBoxService
As you can see, puppet is now using a lot more memory, and it seems to continue in this fashion. The box it's building has 5GB of RAM, so I wouldn't have expected it to have memory issues. However, further down the line, after a long wait, I do get "Cannot allocate memory - fork(2)".
Running ulimit -a, I get:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 38566
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Which, again looks fine to me...
To be honest, I'm completely at a loss as to how to go about solving this, or what is causing it.
Any help or insight would be greatly appreciated!
EDIT:
So I managed to fix this eventually... It came down to using recurse with a file directive for a large directory. The target directory in question contained around 2GB worth of files, and puppet took a huge amount of time loading this into memory and doing its hashes and comparisons. The first time I stood the server up, the directory was relatively empty so the check was quick, but then other resources were placed in it that increased its size massively, meaning subsequent runs took much longer.
The memory error that eventually was thrown was because, I can only assume, Puppet was loading the whole thing into memory in order to do its stuff...
I found a way around using the recurse function, and am now trying to avoid it like the plague...
Yeah, the problem with the recurse parameter on the file type is that it checks every single file's checksum, which on a massive directory adds up real quick.
As Felix suggests, using checksum => none is one way to fix it. Another is to accomplish the task you're trying to do (say, chmod or chown a whole directory) with an exec that performs the native command, plus an unless that checks whether it has already been done.
Something like:
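define check_mode($mode) {
  # Run chmod only when the current mode differs from the desired one.
  # "=" is used instead of "==" so the test also works when /bin/sh is not bash.
  exec { "/bin/chmod ${mode} ${name}":
    unless => "/bin/sh -c '[ $(/usr/bin/stat -c %a ${name}) = ${mode} ]'",
  }
}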
Taken from http://projects.puppetlabs.com/projects/1/wiki/File_Permission_Check_Patterns
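The check-then-act idea behind the unless guard can be tried directly in a shell before wiring it into a manifest. A minimal sketch, assuming GNU or BusyBox stat (where -c %a prints the octal mode); ensure_mode is a hypothetical helper name, not part of Puppet:

```shell
# Hypothetical helper: chmod a path only when its mode differs from the target.
# Prints "unchanged" or "changed" so the effect of the guard is visible.
ensure_mode() {
    mode="$1"
    path="$2"
    if [ "$(stat -c %a "$path")" = "$mode" ]; then
        echo "unchanged"
    else
        chmod "$mode" "$path" && echo "changed"
    fi
}

# Example:
#   ensure_mode 755 /some/directory
```

Only a single stat call is needed per run, so the cost does not grow with the number of files under the directory.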

What is LOCAL=NO in oracle processes

I was trying to find out which processes are consuming the most memory on my Unix box using the top command:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23421 test 18 0 6408m 2.8g 2.8g D 0.0 23.7 1:03.63 xyz
11874 test 15 0 6378m 1.9g 1.9g S 0.0 16.1 0:05.47 xyz
31217 test 15 0 6379m 1.9g 1.9g R 0.0 16.0 0:44.21 xyz
Since the processes above are each consuming more than 15% of memory, I tried to dig further:
-bash-3.2$ ps 23421 11874 31217
PID TTY STAT TIME COMMAND
23421 ? Ds 1:03 ora_dbw0_xyz
11874 ? Ss 0:05 oraclexyz (LOCAL=NO)
31217 ? Ds 0:46 oraclexyz (LOCAL=NO)
This output shows that Oracle database processes are the ones consuming the memory.
Searching the internet, I found that ora_dbw0 is a database writer process, but I cannot understand what the (LOCAL=NO) processes are and how they are associated with the Oracle database. Please help me understand what these processes are.
(LOCAL=NO) processes are the server processes for connections made through SQL*Net (from localhost or from remote machines) that are not using MTS (Multi-Threaded Server).
Local processes, i.e. connections made on the database server itself using ORACLE_SID, use the Bequeath protocol. In the process list these are shown as:
oracledxxx (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
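To get a quick count of each kind of connection, the process list can be filtered on these markers. A minimal sketch, assuming a ps that accepts -eo args (procps does); count_oracle_connections is a hypothetical helper name:

```shell
# Hypothetical helper: count Oracle server processes by their LOCAL= marker.
# Reads a process listing (one command line per row) on stdin.
count_oracle_connections() {
    awk '
        /\(LOCAL=NO\)/  { net++ }   # connections through SQL*Net
        /\(LOCAL=YES\)/ { beq++ }   # Bequeath connections made on the server
        END { printf "LOCAL=NO: %d\nLOCAL=YES: %d\n", net, beq }
    '
}

# Example:
#   ps -eo args | count_oracle_connections
```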

User processes in D-state leads to a watchdog reset using Linux 2.6.24 and arm processor

Most of the user-space processes end up in D-state after the unit has been running for around 3-4 days. The unit runs on an ARM processor. From the top o/p we can see that the processes in D-state are waiting in the kernel functions "page_fault" and "squashfs_readpage". Ultimately this leads to a watchdog reset. Processes that go into D-state take an unusually long time to recover.
Following is the top o/p when the system ends up in trouble:
top - 12:00:11 up 3 days, 2:40, 3 users, load average: 2.77, 1.90, 1.72
Tasks: 250 total, 3 running, 238 sleeping, 0 stopped, 9 zombie
Cpu(s): 10.0% us, 75.5% sy, 0.0% ni, 0.0% id, 10.3% wa, 0.0% hi, 4.2% si
Mem: 191324k total, 188896k used, 2428k free, 2548k buffers
Swap: 0k total, 0k used, 0k free, 87920k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1003 root 20 0 225m 31m 4044 S 15.2 16.7 0:21.91 user_process_1
3745 root 20 0 80776 9476 3196 D 9.0 5.0 1:31.79 user_process_2
129 root 15 -5 0 0 0 S 7.4 0.0 0:27.65 mtdblockd
4624 root 20 0 3640 256 160 D 6.5 0.1 0:00.20 GetCounters_cus
3 root 15 -5 0 0 0 S 3.2 0.0 43:38.73 ksoftirqd/0
31363 root 20 0 2356 1176 792 R 2.6 0.6 40:09.58 top
347 root 30 10 0 0 0 S 1.9 0.0 28:56.04 jffs2_gcd_mtd3
1169 root 20 0 225m 31m 4044 S 1.9 16.7 39:31.36 user_process_1
604 root 20 0 0 0 0 S 1.6 0.0 27:22.76 user_process_3
1069 root -23 0 225m 31m 4044 S 1.3 16.7 20:45.39 user_process_1
4545 root 20 0 3640 564 468 S 1.0 0.3 0:00.08 GetCounters_cus
64 root 15 -5 0 0 0 D 0.3 0.0 0:00.83 kswapd0
969 root 20 0 20780 1856 1376 S 0.3 1.0 14:18.89 user_process_4
973 root 20 0 225m 31m 4044 S 0.3 16.7 3:35.74 user_process_1
1070 root -23 0 225m 31m 4044 S 0.3 16.7 16:41.04 user_process_1
1151 root -81 0 225m 31m 4044 S 0.3 16.7 23:13.05 user_process_1
1152 root -99 0 225m 31m 4044 S 0.3 16.7 8:48.47 user_process_1
One more interesting observation is that whenever the system runs into this problem, we consistently see the "mtdblockd" process running in the top o/p. We have swap disabled on this unit. There is no apparent memory leak in the unit.
Any idea what the possible reasons could be for the processes getting stuck in D-state?
D-state means the processes are stuck in the kernel in a TASK_UNINTERRUPTIBLE sleep. This is unlikely to be caused by bugs in the Squashfs error-handling code: if a process had exited Squashfs still holding a mutex, the system would quickly grind to a halt as other processes entered Squashfs and slept forever waiting for that mutex. You would also see a low load average and low system time, as most processes would be sleeping. Furthermore, there is no evidence that Squashfs has hit any I/O errors.
Load average (2.77) and system time (75.5%) are extremely high. Coupled with the fact that a lot of processes are in squashfs_readpage (which is completing, but slowly), this indicates the system is thrashing: there is too little memory, and the system is spending all its time constantly (re-)demand-paging pages from disk. That accounts for so many processes being in squashfs_readpage. System time is extremely high because the system is spending most of its time in Squashfs on the CPU-intensive task of decompression. The other processes are stuck in Squashfs waiting on the decompressor mutex (only one process can be decompressing at a time, because the decompressor state is shared).
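To watch which processes are in D-state and where in the kernel they are sleeping, a process listing can be filtered on the state column. A minimal sketch, assuming a procps-style ps that accepts -eo pid,stat,wchan:32,comm (a BusyBox ps on an embedded unit may not support these options); filter_d_state is a hypothetical helper name:

```shell
# Hypothetical helper: keep the header row plus every row whose second
# column (process state) starts with D, i.e. uninterruptible sleep.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Example:
#   ps -eo pid,stat,wchan:32,comm | filter_d_state
```

Running this periodically while the unit degrades would show whether the set of blocked processes keeps growing and whether they stay in the same wait channel.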

Using the top program in bash to extract cpu time into a variable

I have an assignment in bash scripting: measure the CPU time used by a process whose name is passed to the script. I can find the process ID and pass it to top. However, I haven't figured out how to extract the CPU time from top's output. For example:
top is printing out:
top - 00:57:07 up 6:06, 2 users, load average: 0.46, 0.31, 0.55
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.7 us, 0.8 sy, 0.0 ni, 94.6 id, 0.9 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 1928720 total, 1738072 used, 190648 free, 57184 buffers
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3337 amarkovi 20 0 372m 31m 10m R 0.7 1.7 13:28.74 chromium-browse
All I want from this is the TIME+ field assigned to a variable, so I can add up the times and print the total by itself.
I am a noob to bash scripting so please be patient.
thanks,
Do you have to use top? It should be much simpler (once you work out the right options) to use ps to give you just the fields you want, then use grep to select just the processes you want.
Since it's an assignment I don't want to spoil all the fun :D. I'll just point you to some commands that can help you in your endeavour: sed, awk and cut. With these three you can solve it in many ways, enjoy!
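For reference, one possible shape of the awk route looks like this. It is a minimal sketch, assuming top's default column layout (TIME+ as the 11th field) and a TIME+ value in minutes:seconds.hundredths form as in the question; both function names are hypothetical:

```shell
# Print the TIME+ column for the given PID from `top -b` output on stdin.
extract_time_plus() {
    awk -v pid="$1" '$1 == pid { print $11; exit }'
}

# Convert a TIME+ value such as 13:28.74 (minutes:seconds.hundredths)
# into seconds, so several values can be added up.
time_plus_to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Example:
#   cpu_time=$(top -b -n 1 -p "$pid" | extract_time_plus "$pid")
#   time_plus_to_seconds "$cpu_time"
```

For the sample row in the question, 13:28.74 converts to 808.74 seconds.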
