minifi java agent uses high CPU on AIX - apache-nifi

I noticed that the TailFile processor consumes a lot of CPU on the AIX operating system.
Can I do anything to reduce the consumption?
Processors:
- id: xxxxxxxxxxxxxxxxxxxxxxxxxxx
  name: TailFile
  class: org.apache.nifi.processors.standard.TailFile
  max concurrent tasks: 1
  scheduling strategy: TIMER_DRIVEN
  scheduling period: 0 sec
  penalization period: 30 sec
  yield period: 1 sec
  run duration nanos: 0
  auto-terminated relationships list:
  - success
  Properties:
    File Location: Local
    File to Tail: *.log
    Initial Start Position: Beginning of File
    Rolling Filename Pattern:
    tail-base-directory: /WorkingDir85/log/
    tail-mode: Multiple files
    tailfile-lookup-frequency: 10 minutes
    tailfile-maximum-age: 24 hours
    tailfile-recursive-lookup: 'false'

The scheduling period is 0 sec, which basically means run as fast as possible. Setting it to something like '10 ms' or even '1 ms' should lighten the CPU usage.
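For example, the TailFile entry in the config above could be adjusted like this (a sketch based on the configuration shown; '10 ms' is only an illustrative value - tune it to whatever tailing latency is acceptable):

  scheduling strategy: TIMER_DRIVEN
  scheduling period: 10 ms    # was 0 sec, i.e. run as fast as possible

With a 10 ms period the processor is scheduled at most about 100 times per second instead of spinning continuously, which should noticeably reduce CPU on an otherwise idle tail.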

Related

bash script with parallel execution

I am trying to use parallel in a bash script to verify whether S3 paths exist, for multiple paths, by counting the objects in each path. If the object count is zero, the script should continue to the next date in the for loop, but with parallel it is not working as expected.
For the date range I provided in the for loop we actually don't have those folders in the S3 bucket, and in the checkS3Path function I create a 0 KB file when an S3 path doesn't exist, but I don't see those 0 KB files being created after the script runs. In the output I am seeing "S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-03" instead of "S3 Path Doesnt Exists folder1:+2019-10-03". Please see the output below.
Please let me know what might be the issue.
Here is the sample code.
#!/bin/bash
#set -x
s3Bucket=testbucket
version=v20
Array=(folder1 folder2 folder3)

checkS3Path() {
  fldName=$1
  date=$2
  objectNum=$(aws s3 ls s3://${s3Bucket}/${version}/${fldName}/date=${date}/ | wc -l)
  echo $objectNum
  if [ "$objectNum" -eq 0 ]
  then
    echo "S3 Path Doesnt Exists ${fldName}:${date}" >> /app/${fldName}.log
    touch /home/ubuntu/${fldName}_${date}.txt
    continue
  else
    echo "S3 Path Consists CSV Files, Proceeding to next step ${fldName}:${date}"
  fi
}

final() {
  fldName=$1
  date=$2
  checkS3Path $fldName $date
  function2 $fldName $date
  function3 $fldName $date
}

export -f final checkS3Path

for date in 2019-10-{01..03}
do
  # final folder1 $date
  parallel --jobs 4 --eta final ::: "${Array[@]}" ::: +"$date"
done
Here is the output I am seeing.
$ ./test.sh
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-01
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/2.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-01
ETA: 0s Left: 12 AVG: 0.00s local:4/2/100%/1.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-01
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-02
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-02
ETA: 6s Left: 12 AVG: 0.50s local:4/2/100%/0.5s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-02
ETA: 3s Left: 11 AVG: 0.33s local:4/3/100%/0.3s 202
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence this citation notice: run 'parallel --citation'.
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 14 AVG: 0.00s local:4/0/100%/0.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder1:+2019-10-03
ETA: 0s Left: 13 AVG: 0.00s local:4/1/100%/1.0s 202
S3 Path Consists CSV Files, Proceeding to next step folder2:+2019-10-03
ETA: 0s Left: 12 AVG: 0.00s local:4/2/100%/0.5s 202
S3 Path Consists CSV Files, Proceeding to next step folder3:+2019-10-03
ETA: 0s Left: 11 AVG: 0.00s local:4/3/100%/0.3s 202
$
Thanks
If checkS3Path works when run by hand, then you probably just need to:
export s3Bucket=testbucket
export version=v20
Each GNU Parallel job runs in its own shell (started from Perl), which is why you need to export variables if you want them to be visible to the job.
Also look at env_parallel to do this automatically.
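Applied to the script above, only the variable definitions need to change; a sketch:

# Exported so the shells that GNU Parallel starts can see them
export s3Bucket=testbucket
export version=v20

# Everything else (Array, checkS3Path, final, export -f, the for loop) stays the same.

Alternatively, source env_parallel first (for example with '. $(which env_parallel.bash)') and call env_parallel instead of parallel; it copies the calling shell's variables, arrays and functions into each job, so the explicit exports and 'export -f' become unnecessary.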

Ansible: Polling a log file for idle time before server restart

I'd like Ansible to tail a log file and wait for an idle period - say XX seconds during which the log receives no new entries.
If the log is not idle within that window, keep waiting until we do get XX seconds of idle time.
If the idle time elapses then Ansible will restart the server.
Idle time can be calculated by checking the last 2 log entries and the time difference between them.
How can I go about achieving this with Ansible?
There's no need to wait for two log entries XX seconds apart rather than just waiting until XX seconds have passed since the last log entry: if you've gone longer than XX seconds, the next write will satisfy your condition anyway.
On that basis, just do this (for example purposes, XX = 5):
- find: paths="/tmp" patterns="logfile" age="5s" age_stamp="mtime"
  register: modified
  until: modified.matched == 1
  retries: 20
  delay: 5
Just make sure the paths and patterns specifications match only your log file.
If you have a command to print the time difference in seconds, you can use:
- shell: /opt/myscripts/log_difference.sh
  register: result
  until: result.stdout | int > 60
  retries: 5
  delay: 10
This will execute log_difference.sh until either the number of retries is exceeded or the number it prints on stdout is greater than 60.
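Putting either check together with the actual restart, a complete play might look roughly like this (a sketch using the find-based check from above; the host group and service name are made-up placeholders):

- hosts: appservers
  tasks:
    - name: Wait until the log file has been idle for at least 5 seconds
      find:
        paths: /tmp
        patterns: logfile
        age: 5s
        age_stamp: mtime
      register: modified
      until: modified.matched == 1
      retries: 20
      delay: 5

    - name: Restart the application once the log has gone quiet
      service:
        name: myapp        # hypothetical service name
        state: restarted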

Jmeter Distributed Environment : Active Threads Over Time : total not displayed

I have a JMeter distributed environment (localhost and host1)
localhost (master and slave)
host1 (slave).
I have a thread group in my test plan with 10 users and 50 loops.
Running the test as below:
jmeter.bat -t myscript.jmx -n -r -l results.csv
My test runs successfully and I get a total of 20 threads, as shown in the output:
summary + 800 in 30,2s = 26,5/s Avg: 580 Min: 33 Max: 2315 Err: 0 (0,00%) Active: 20 Started: 20 Finished: 0
But when I try to graph "jp@gc - Active Threads Over Time" from the data in results.csv, I only get 10 active threads.
My question is: how can I get a graph with all 20 running threads?
The main problem, I suspect, is that results.csv is not complete and does not contain all the performance information, such as response times.
For JMeter version < 5.0:
This is because you need to add a unique ID per injector to the Thread Group name, as per the documentation:
https://jmeter-plugins.org/wiki/ActiveThreadsOverTime/
For example:
If running only 1 JMeter per machine:
${__machineName()}_My Threadgroup name
If running multiple injectors per machine:
${__P(JVM_ID,1)}_My Threadgroup name
For JMeter version >= 5.0:
It works correctly out of the box, as per this fix:
https://bz.apache.org/bugzilla/show_bug.cgi?id=62684
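As an illustration of the multiple-injectors case, the JVM_ID property read by __P can be supplied on each injector's command line, for instance (a sketch; JVM_ID is simply whatever property name you reference in the Thread Group name):

jmeter-server -JJVM_ID=1    # first injector process on this machine
jmeter-server -JJVM_ID=2    # second injector process on the same machine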

Puppet agent hangs and eventually gives a memory allocation error

I'm using puppet as a provisioner for Vagrant, and am coming across an issue where Puppet will hang for an extremely long time when I do a "vagrant provision". Building the box from scratch using "vagrant up" doesn't seem to be a problem, only subsequent provisions.
If I turn puppet debug on and watch where it hangs, it seems to stop at various, seemingly arbitrary, points, the first of which is:
Info: Applying configuration version '1401868442'
Debug: Prefetching yum resources for package
Debug: Executing '/bin/rpm --version'
Debug: Executing '/bin/rpm -qa --nosignature --nodigest --qf '%{NAME} %|EPOCH?{%{EPOCH}}:{0}| %{VERSION} %{RELEASE} %{ARCH}\n''
Executing this command on the server myself returns immediately.
Eventually, it gets past this and continues. Using the summary option, I get the following, after waiting for a very long time for it to complete:
Debug: Finishing transaction 70191217833880
Debug: Storing state
Debug: Stored state in 9.39 seconds
Notice: Finished catalog run in 1493.99 seconds
Changes:
  Total: 2
Events:
  Failure: 2
  Success: 2
  Total: 4
Resources:
  Total: 18375
  Changed: 2
  Failed: 2
  Skipped: 35
  Out of sync: 4
Time:
  User: 0.00
  Anchor: 0.01
  Schedule: 0.01
  Yumrepo: 0.07
  Augeas: 0.12
  Package: 0.18
  Exec: 0.96
  Service: 1.07
  Total: 108.93
  Last run: 1401869964
  Config retrieval: 16.49
  Mongodb database: 3.99
  File: 76.60
  Mongodb user: 9.43
Version:
  Config: 1401868442
  Puppet: 3.4.3
This doesn't seem very helpful to me, as the times total only about 108 seconds, so where did the other 1385 seconds go?
Throughout, Puppet seems to be hammering the box, using up a lot of CPU, but still doesn't seem to advance. The memory it uses seems to continually increase. When I kick off the command, top looks like this:
Cpu(s): 10.2%us, 2.2%sy, 0.0%ni, 85.5%id, 2.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4956928k total, 2849296k used, 2107632k free, 63464k buffers
Swap: 950264k total, 26688k used, 923576k free, 445692k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28486 root 20 0 439m 334m 3808 R 97.5 6.9 2:02.92 puppet
22 root 20 0 0 0 0 S 1.3 0.0 0:07.55 kblockd/0
18276 mongod 20 0 788m 31m 3040 S 1.3 0.6 2:31.82 mongod
20756 jboss-as 20 0 3081m 1.5g 21m S 1.3 31.4 7:13.15 java
20930 elastics 20 0 2340m 236m 6580 S 1.0 4.9 1:44.80 java
266 root 20 0 0 0 0 S 0.3 0.0 0:03.85 jbd2/dm-0-8
22717 vagrant 20 0 98.0m 2252 1276 S 0.3 0.0 0:01.81 sshd
28762 vagrant 20 0 15036 1228 932 R 0.3 0.0 0:00.10 top
1 root 20 0 19364 1180 964 S 0.0 0.0 0:00.86 init
To me, this seems fine: there's over 2GB of available memory and plenty of available swap. I have a max open files limit of 1024.
About 10-15 minutes later, still no advance in the console output, but top looks like this:
Cpu(s): 11.2%us, 1.6%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 4956928k total, 3834376k used, 1122552k free, 64248k buffers
Swap: 950264k total, 24408k used, 925856k free, 445728k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28486 root 20 0 1397m 1.3g 3808 R 99.6 26.7 15:16.19 puppet
18276 mongod 20 0 788m 31m 3040 R 1.7 0.6 2:45.03 mongod
20756 jboss-as 20 0 3081m 1.5g 21m S 1.3 31.4 7:25.93 java
20930 elastics 20 0 2340m 238m 6580 S 0.7 4.9 1:52.03 java
8486 root 20 0 308m 952 764 S 0.3 0.0 0:06.03 VBoxService
As you can see, puppet is now using a lot more of the memory, and it seems to continue in this fashion. The box it's building has 5GB of RAM, so I wouldn't have expected it to have memory issues. However, further down the line, after a long wait, I do get "Cannot allocate memory - fork(2)"
Running ulimit -a, I get:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 38566
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Which, again looks fine to me...
To be honest, I'm completely at a loss as to how to go about solving this, or what is causing it.
Any help or insight would be greatly appreciated!
EDIT:
So I managed to fix this eventually... It came down to using recurse in a file resource for a large directory. The target directory in question contained around 2GB worth of files, and Puppet took a huge amount of time loading this into memory and doing its hashes and comparisons. The first time I stood the server up the directory was relatively empty, so the check was quick, but other resources were later placed in it that increased its size massively, meaning subsequent runs took much longer.
The memory error that was eventually thrown was, I can only assume, because Puppet was loading the whole thing into memory in order to do its stuff...
I found a way around using recurse, and am now trying to avoid it like the plague...
Yeah, the problem with the recurse parameter on the file type is that it checks every single file's checksum, which on a massive directory adds up real quick.
As Felix suggests, using checksum => none is one way to fix it, another is to accomplish the task you're trying to do (say chmod or chown a whole directory) with an exec performing the native task, with an unless to check if it's already been done.
Something like:
define check_mode($mode) {
  exec { "/bin/chmod $mode $name":
    unless => "/bin/sh -c '[ $(/usr/bin/stat -c %a $name) == $mode ]'",
  }
}
Taken from http://projects.puppetlabs.com/projects/1/wiki/File_Permission_Check_Patterns
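For completeness, the checksum => none route mentioned above might look roughly like this (a sketch; the path, owner and group are made-up placeholders for whatever the manifest actually manages):

file { '/data/large_dir':          # hypothetical directory
  ensure   => directory,
  recurse  => true,
  owner    => 'appuser',           # hypothetical owner
  group    => 'appgroup',          # hypothetical group
  checksum => none,                # skip per-file content checksumming on huge trees
}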

Ruby concurrency: non-blocking I/O vs threads

I am playing around with concurrency in Ruby (1.9.3-p0), and have created a very simple, I/O-heavy proxy task. First, I tried the non-blocking approach:
require 'rack'
require 'rack/fiber_pool'
require 'em-http'
require 'em-synchrony'
require 'em-synchrony/em-http'
proxy = lambda { |*|
  result = EM::Synchrony.sync EventMachine::HttpRequest.new('http://google.com').get
  [200, {}, [result.response]]
}
use Rack::FiberPool, :size => 1000
run proxy
=begin
$ thin -p 3000 -e production -R rack-synchrony.ru start
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 5.602 seconds
HTML transferred: 21900 bytes
Requests per second: 17.85 [#/sec] (mean)
Time per request: 5602.174 [ms] (mean)
=end
Hmm, I thought I must be doing something wrong. An average request time of 5.6s for a task where we are mostly waiting for I/O? I tried another one:
require 'sinatra'
require 'sinatra/synchrony'
require 'em-synchrony/em-http'
get '/' do
  EM::HttpRequest.new("http://google.com").get.response
end
=begin
$ ruby sinatra-synchrony.rb -p 3000 -e production
== Sinatra/1.3.1 has taken the stage on 3000 for production with backup from Thin
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 5.476 seconds
HTML transferred: 21900 bytes
Requests per second: 18.26 [#/sec] (mean)
Time per request: 5475.756 [ms] (mean)
=end
Hmm, a little better, but not what I would call a success. Finally, I tried a threaded implementation:
require 'rack'
require 'excon'
proxy = lambda { |*|
  result = Excon.get('http://google.com')
  [200, {}, [result.body]]
}
run proxy
=begin
$ thin -p 3000 -e production -R rack-threaded.ru --threaded --no-epoll start
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 2.014 seconds
HTML transferred: 21900 bytes
Requests per second: 49.65 [#/sec] (mean)
Time per request: 2014.005 [ms] (mean)
=end
That was really, really surprising. Am I missing something here? Why is EM performing so badly here? Is there some tuning I need to do? I tried various combinations (Unicorn, several Rainbows configurations, etc), but none of them came even close to the simple, old I/O-blocking threading.
Ideas, comments and - obviously - suggestions for better implementations are very welcome.
See how your "Time per request" exactly equals total "Time taken for tests"? This is a reporting arithmetic artifact due to your request count (-n) being equal to your concurrency level (-c). The mean-time is the total-time*concurrency/num-requests. So the reported mean when -n == -c will be the time of the longest request. You should conduct your ab runs with -n > -c by several factors to get reasonable measures.
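To make the arithmetic concrete with the numbers from the first benchmark above: mean = total * concurrency / requests = 5.602 s * 100 / 100 = 5.602 s, which is exactly the reported 'Time per request', so the per-request mean collapses to the wall-clock total whenever -n equals -c.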
You seem to be using an old version of ab, as a relatively current one reports far more detailed results by default. Running directly against Google, I see the same total-time == mean-time when -n == -c, and get more reasonable numbers when -n > -c. You really want to look at requests/sec, the mean across all concurrent requests, and the final service-level breakdown to get a better understanding.
$ ab -c50 -n50 http://google.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking google.com (be patient).....done
Server Software: gws
Server Hostname: google.com
Server Port: 80
Document Path: /
Document Length: 219 bytes
Concurrency Level: 50
Time taken for tests: 0.023 seconds <<== note same as below
Complete requests: 50
Failed requests: 0
Write errors: 0
Non-2xx responses: 50
Total transferred: 27000 bytes
HTML transferred: 10950 bytes
Requests per second: 2220.05 [#/sec] (mean)
Time per request: 22.522 [ms] (mean) <<== note same as above
Time per request: 0.450 [ms] (mean, across all concurrent requests)
Transfer rate: 1170.73 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    2   0.6      3       3
Processing:     8    9   2.1      9      19
Waiting:        8    9   2.1      9      19
Total:         11   12   2.1     11      22
WARNING: The median and mean for the initial connection time are not within a normal deviation
These results are probably not that reliable.
Percentage of the requests served within a certain time (ms)
50% 11
66% 12
75% 12
80% 12
90% 12
95% 12
98% 22
99% 22
100% 22 (longest request) <<== note same as total and mean above
$ ab -c50 -n500 http://google.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking google.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests
Server Software: gws
Server Hostname: google.com
Server Port: 80
Document Path: /
Document Length: 219 bytes
Concurrency Level: 50
Time taken for tests: 0.110 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Non-2xx responses: 500
Total transferred: 270000 bytes
HTML transferred: 109500 bytes
Requests per second: 4554.31 [#/sec] (mean)
Time per request: 10.979 [ms] (mean)
Time per request: 0.220 [ms] (mean, across all concurrent requests)
Transfer rate: 2401.69 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    1   0.7      1       3
Processing:     8    9   0.7      9      13
Waiting:        8    9   0.7      9      13
Total:          9   10   1.3     10      16
Percentage of the requests served within a certain time (ms)
50% 10
66% 11
75% 11
80% 12
90% 12
95% 13
98% 14
99% 15
100% 16 (longest request)
