Prometheus does not scrape all metrics from PCP pmproxy - performance

On my laptop with Fedora 30 I have the Performance Co-Pilot (PCP) daemons installed and running, and Prometheus installed from the package golang-github-prometheus-prometheus-1.8.0-4.fc30.x86_64. In the PCP collector's config I specified the following metric namespaces:
# Performance Metrics Domain Specifications
#
# This file is automatically generated during the build
# Name Id IPC IPC Params File/Cmd
#root 1 pipe binary /var/lib/pcp/pmdas/root/pmdaroot
#pmcd 2 dso pmcd_init /var/lib/pcp/pmdas/pmcd/pmda_pmcd.so
proc 3 pipe binary /var/lib/pcp/pmdas/proc/pmdaproc -d 3
#xfs 11 pipe binary /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
linux 60 pipe binary /var/lib/pcp/pmdas/linux/pmdalinux
#pmproxy 4 dso pmproxy_init /var/lib/pcp/pmdas/mmv/pmda_mmv.so
mmv 70 dso mmv_init /var/lib/pcp/pmdas/mmv/pmda_mmv.so
#jbd2 122 dso jbd2_init /var/lib/pcp/pmdas/jbd2/pmda_jbd2.so
#kvm 95 pipe binary /var/lib/pcp/pmdas/kvm/pmdakvm -d 95
[access]
disallow ".*" : store;
disallow ":*" : store;
allow "local:*" : all;
When I visit the URL localhost:44323/metrics, the output is very rich and covers many namespaces, e.g. mem, network, kernel, filesys, hotproc, etc. However, when I scrape it with Prometheus, with the job defined as:
scrape_configs:
  - job_name: 'pcp'
    scrape_interval: 10s
    sample_limit: 0
    static_configs:
      - targets: ['127.0.0.1:44323']
I see the target status UP, but in the console only two metric namespaces are available for querying: hinv and mem. I tried copying other metric names from the /metrics page, but those queries result in the error 'No datapoints found.' Initially I thought the problem might be a limit on the number of samples per target or a too-short scrape interval (I originally set it to 1s), but hinv and mem are not next to each other, and the metrics in between them (e.g. filesys, kernel) are omitted. What could be the reason for that?
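For reference, a quick way to list the metric families that pmproxy actually exposes, so they can be compared against what the Prometheus console offers for querying (a small sketch, using the host and port from the scrape config above):
# Strip comments and label sets from the exposition output, leaving one
# metric family name per line.
curl -s http://127.0.0.1:44323/metrics | grep -v '^#' | sed 's/[{ ].*//' | sort -u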

I have not found the exact cause of the problem, but it must have been version-specific, because after I downloaded and launched the latest version (2.19) the problem was gone: with exactly the same config, Prometheus was reading all metrics from the target.
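For anyone hitting the same thing, this is roughly how the standalone release can be fetched and pointed at the unchanged config (a sketch; the URL follows the usual naming of the official release tarballs, so adjust the version and architecture as needed):
# Download, unpack and run Prometheus 2.19 with the existing scrape config.
curl -LO https://github.com/prometheus/prometheus/releases/download/v2.19.0/prometheus-2.19.0.linux-amd64.tar.gz
tar xzf prometheus-2.19.0.linux-amd64.tar.gz
cd prometheus-2.19.0.linux-amd64
./prometheus --config.file=prometheus.yml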

Adding another answer, because I have just seen this issue again in another environment, where Prometheus v2.19 was pulling metrics via PMAPI from PCP v5 on CentOS 7 servers. In the Prometheus config file the scrape was configured as a single job with multiple metric domains, i.e.:
- job_name: 'pcp'
  file_sd_configs:
    - files: [...]
  metrics_path: '/metrics'
  params:
    target: ['kernel', 'mem', 'disk', 'network', 'mounts', 'lustre', 'infiniband']
When there was a problem with one metric domain, usually lustre or infiniband due to the lack of corresponding hardware on the host, only the kernel metrics were collected and no others.
The issue was fixed by splitting the scrape job into multiple jobs with only one target each, i.e.:
- job_name: 'pcp-kernel'
  file_sd_configs:
    - files: [...]
  metrics_path: '/metrics'
  params:
    target: ['kernel']
- job_name: 'pcp-mem'
  file_sd_configs:
    - files: [...]
  metrics_path: '/metrics'
  params:
    target: ['mem']
[...]
This way, metrics from the core domains were always scraped successfully despite one or all of the extra ones failing. Such a setup seems more robust; however, it makes the target status view busier, because there are more scrape jobs.
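A quick way to check each metric domain in isolation, before wiring it into a scrape job, is to query pmproxy directly with the same target parameter the jobs above use (a sketch; the host and port are the defaults from the original question, and the parameter name is assumed to match the params block above):
# Probe each PMAPI metric domain separately; a failing domain (e.g. missing
# lustre or infiniband hardware) shows up here without hiding the others.
for domain in kernel mem disk network mounts lustre infiniband; do
    echo "== ${domain} =="
    curl -fsS "http://127.0.0.1:44323/metrics?target=${domain}" | head -n 3
done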

Related

failed to add service - already in use error

I compiled the SFML library and my app on Raspbian with this tutorial: https://github.com/oomek/sfml-pi. After this I moved the shared objects and the app to a buildroot system for a Raspberry Pi 4. I chose the DISPMANX version; my goal was to run the app without an X server.
When I try to run the app, I get the error failed to add service - already in use?. I know there are many similar topics, and I tried these solutions:
Commenting out dtoverlay=vc4-kms-v3d in config.txt -> this line didn't exist in my config.
Changing gpu_mem to 128 -> no improvement.
My config.txt:
# Please note that this is only a sample, we recommend you to change it to fit
# your needs.
# You should override this file using a post-build script.
# See http://buildroot.org/manual.html#rootfs-custom
# and http://elinux.org/RPiconfig for a description of config.txt syntax
# We always use the same names, the real used variant is selected by
# BR2_PACKAGE_RPI_FIRMWARE_{DEFAULT,X,CD} choice
start_file=start.elf
fixup_file=fixup.dat
kernel=zImage
# To use an external initramfs file
#initramfs rootfs.cpio.gz
# Disable overscan assuming the display supports displaying the full resolution
# If the text shown on the screen disappears off the edge, comment this out
disable_overscan=1
# How much memory in MB to assign to the GPU on Pi models having
# 256, 512 or 1024 MB total memory
gpu_mem_256=128
gpu_mem_512=128
gpu_mem_1024=128
gpu_mem_1024=192
gpu_mem=128
# fixes rpi (3B, 3B+, 3A+, 4B and Zero W) ttyAMA0 serial console
dtoverlay=miniuart-bt
In buildroot I set opengl from gst1-plugins-base with dispmanx, gles2, egl and wayland. I didn't set mesa-3d.
Any idea how I can make my app work? Should I add something to my config.txt?

Lighttpd closes connection when system time is changed

These are some of the parameters of my lighttpd config file.
server.modules += ( "mod_wstunnel", "mod_auth")
wstunnel.debug = 4
wstunnel.server.max-read-idle = 86400
#wstunnel.ping-interval = 5
#wstunnel.timeout = 30
When I open my web application, the connection is created properly using websocket and connects to my C++ server.
All functionality works, except for one thing.
One requirement of my application is to change the system time of the machine, but when the system time is changed, the connection is closed and the log file shows:
2019-02-12 14:04:10: (gw_backend.c.308) released proc: pid: 0 socket: tcp:127.0.0.1:10002 load: 0
I want to maintain the connection even if the system time is changed.
What other parameters can be used, or what modification is required in these parameters?
System OS : Fedora 26
Lighttpd version : 1.4.49
wstunnel.server.max-read-idle does not exist. Did you test the lighttpd config before running it and look at the error trace? It should have noted wstunnel.server.max-read-idle as an unrecognized directive.
The directives you seek are:
server.max-read-idle
server.max-write-idle
server.max-keep-alive-idle
However, if the time on your server (running lighttpd) is jumping more than a few seconds, then I suggest that is your primary problem.
Also, Fedora 26 reached end-of-life on May 29, 2018. Supported Fedora releases have newer versions of lighttpd. The current version of lighttpd is 1.4.53.
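If it helps, a minimal lighttpd.conf sketch using the directives named above; the 86400-second value simply mirrors what the question attempted and is only an example:
# Idle timeouts, in seconds; these replace the nonexistent wstunnel.server.*
# directive used in the question.
server.max-read-idle       = 86400
server.max-write-idle      = 86400
server.max-keep-alive-idle = 86400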

rsyslog not escaping backslash in JSON

I have an rsyslogd instance running, producing the following JSON from syslog:
{"timegenerated":"2019-01-28T09:24:37.033990+00:00","type":"syslog","host":"REDACTED_HOSTNAME","host-ip":"REDACTED_IP","message":"<190>Jan 28 2019 10:24:35: %ASA-X-XXXXXX: Teardown TCP connection 82257709 for outside:REDACTED_IP\/REDACTED_PORT(LOCAL\ususername) to inside:REDACTED_IP\/REDACTED_PORT duration 0:01:52 bytes XXXX TCP FINs from outside (ususername)"}
This is invalid JSON, as the \ususe is interpreted as a hex representation of a unicode symbol. It should have been escaped as \\ususe.
I noticed on GitHub that there was an open issue (https://github.com/rsyslog/rsyslog/issues/1235), although it mentions another issue that resulted in a merged fix.
Here's some system info:
:~# rsyslogd -version
rsyslogd 8.24.0, compiled with:
PLATFORM: x86_64-pc-linux-gnu
PLATFORM (lsb_release -d):
FEATURE_REGEXP: Yes
GSSAPI Kerberos 5 support: Yes
FEATURE_DEBUG (debug build, slow code): No
32bit Atomic operations supported: Yes
64bit Atomic operations supported: Yes
memory allocator: system default
Runtime Instrumentation (slow code): No
uuid support: Yes
Number of Bits in RainerScript integers: 64
:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 9.4 (stretch)
Release: 9.4
Codename: stretch
The template used to create the JSON document is:
template(name="json_syslog"
         type="list") {
    constant(value="{")
    constant(value="\"timegenerated\":\"") property(name="timegenerated" dateFormat="rfc3339")
    constant(value="\",\"type\":\"syslograw")
    constant(value="\",\"host\":\"") property(name="fromhost")
    constant(value="\",\"host-ip\":\"") property(name="fromhost-ip")
    constant(value="\",\"message\":\"") property(name="rawmsg" format="jsonr")
    constant(value="\"}\n")
}
Is there any functionality in rsyslog that would allow me to fix this, or does it seem like an upstream bug?
I notice you are using format="jsonr" in the template for the message. There is a difference if you use json instead of jsonr, which the documentation describes very briefly as "avoids double escaping the value". Using a template with
constant(value="\",\n\"json\":\"") property(name="rawmsg" format="json")
constant(value="\",\n\"jsonr\":\"") property(name="rawmsg" format="jsonr")
and providing input containing
LOCAL\ususer "abc"
produces the 2 lines
"json":"LOCAL\\ususer \"abc\",
"jsonr":"LOCAL\ususer \"abc\",
in which the json format has escaped the \u into \\u (tested with rsyslog-8.27.0).
If this is not right for you, you can always manipulate the message, for example as follows, adding before your action:
set $.msg2 = replace($rawmsg, "\\u", "\\\\u");
and in your template use
constant(value="\",\"message\":\"") property(name="$.msg2" format="jsonr")
The replace function does a global replace, so you may want to restrict it, for example with
set $.msg2 = replace($rawmsg, "LOCAL\\u", "LOCAL\\\\u");

Flink Error: Could not find or load main class

I'm trying to run these Flink benchmarks:
https://github.com/dataArtisans/flink-benchmarks
I've generated the jar file using Maven with this command:
mvn clean package -Pbuild-jar
Then I'm trying to run the benchmark on a Flink cluster with this command:
./bin/flink run -c org.apache.flink.benchmark.WindowBenchmarks ~/flinkBenchmarks/target/flink-hackathon-benchmarks-0.1.jar
I've used the -c option to specify the entry class of the benchmark (WindowBenchmarks) I want to run.
Finally, I get this error:
# JMH version: 1.19
# VM version: JDK 1.8.0_151, VM 25.151-b12
# VM invoker: /usr/lib/jvm/java-8-oracle/jre/bin/java
# VM options: -Dlog.file=/home/user/flink-1.3.2/flink-dist/target/flink-1.3.2-bin/flink-1.3.2/log/flink-user-client-mypc.log -Dlog4j.configuration=file:/home/user/flink-1.3.2/flink-dist/target/flink-1.3.2-bin/flink-1.3.2/conf/log4j-cli.properties -Dlogback.configurationFile=file:/home/user/flink-1.3.2/flink-dist/target/flink-1.3.2-bin/flink-1.3.2/conf/logback.xml -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
# Warmup: 10 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.apache.flink.benchmark.WindowBenchmarks.sessionWindow
# Run progress: 0.00% complete, ETA 00:04:00
# Fork: 1 of 3
Error: Could not find or load main class org.openjdk.jmh.runner.ForkedMain
<forked VM failed with exit code 1>
<stdout last='20 lines'>
</stdout>
<stderr last='20 lines'>
Error: Could not find or load main class org.openjdk.jmh.runner.ForkedMain
</stderr>
# Run complete. Total time: 00:00:00
Benchmark Mode Cnt Score Error Units
The program didn't contain a Flink job. Perhaps you forgot to call execute() on the execution environment.
I don't have any previous experience with Flink and Maven, so I can't figure out what is missing. My first thought was that it's a missing-dependency error, but the dependencies look fine. Any suggestions?
Thank you in advance!
flink-benchmarks is a repository that contains sets of micro benchmarks designed to run on a single machine, not on a cluster. The main functions defined in the various classes (test cases) are JMH runners, not Flink programs. As such, you can either execute the whole benchmark suite (which takes ~1 hour):
mvn -Dflink.version=1.5.0 clean install exec:exec
or, if you want to execute just one benchmark, the best approach is to execute the selected main function manually, for example from your IDE (don't forget about selecting flink.version; the default value for the property is defined in pom.xml).
It is also possible to execute a single benchmark from the console, but I haven't tried it for a very long time.
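If you do want to try it from the console, a minimal sketch (assuming the jar built with -Pbuild-jar is self-contained and that WindowBenchmarks defines a JMH main, as described above) is to launch it with plain java instead of bin/flink, so the forked JMH JVM inherits a classpath that actually contains org.openjdk.jmh.runner.ForkedMain:
# Run one benchmark class directly, bypassing the Flink client entirely.
java -cp ~/flinkBenchmarks/target/flink-hackathon-benchmarks-0.1.jar \
     org.apache.flink.benchmark.WindowBenchmarks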

Why doesn't rsync work when connecting to NCBI's FTP server home page?

I'm currently trying to update some code at work. I previously created a script, with an up-to-date version of Python, that logs a user's path choices on NCBI's FTP website (ftp://ftp.ncbi.nlm.nih.gov/). The log was used to update my file system with updated files (NCBI updates their files weekly, I believe). I basically reinvented the wheel. I want to use rsync now, but I can't seem to get what I want...
rsync -vah --include "*/" --exclude "*" rsync://ftp.ncbi.nlm.nih.gov/
The above command should go to NCBI's server and begin downloading directories. Instead, it prints a list of folders in the NCBI home directory to the shell and then terminates without copying anything.
Here is what the output looks like:
Warning Notice!
You are accessing a U.S. Government information system which includes this
computer, network, and all attached devices. This system is for
Government-authorized use only. Unauthorized use of this system may result in
disciplinary action and civil and criminal penalties. System users have no
expectation of privacy regarding any communications or data processed by this
system. At any time, the government may monitor, record, or seize any
communication or data transiting or stored on this information system.
-------------------------------------------------------------------------------
Welcome to the NCBI rsync server.
- 1000genomes
- bigwig
- bioproject
- biosample
- blast
- cgap
- cn3d
- dbgap
- entrez
- epigenomics
- fa2htgs
- genbank
- gene
- genomes
- hapmap
- mmdb
- ncbi-asn1
- pathogen
- pubchem
- pubmed
- pub
- refseq
- repository
- SampleData
- sequin
- sky-cgh
- snp
- sra
- tech-reports
- toolbox
- tpa
- tracedb
- variation
[user#host ~/bin $]
Whenever I use
rsync://ftp.ncbi.nlm.nih.gov/destination
(basically, if I include any directory inside NCBI's FTP home page) everything seems to work fine.
What should I do here? What is the problem/solution?
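One detail worth noting: when rsync is given only a source and no destination, it lists the source rather than copying it, and a bare rsync:// URL with no module behind it lists the available modules, which matches the output above. A sketch of an actual copy, where genomes is one of the modules from the listing and the local path is purely hypothetical (this include/exclude pair replicates only the directory tree; drop the filters to mirror files as well):
rsync -vah --include='*/' --exclude='*' \
    rsync://ftp.ncbi.nlm.nih.gov/genomes/ /data/ncbi/genomes/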
