Ceph runtime config is not the same as ceph.conf

I am using ceph-deploy to deploy a Ceph cluster. After the deployment finished, I found that the runtime config is not the same as ceph.conf, even though I did not modify the runtime config manually.
[root@sz02 ~]# ceph daemon osd.0 config show | grep rbd_cache
"rbd_cache": "true",
"rbd_cache_writethrough_until_flush": "true",
"rbd_cache_size": "33554432",
"rbd_cache_max_dirty": "25165824",
"rbd_cache_target_dirty": "16777216",
"rbd_cache_max_dirty_age": "1",
"rbd_cache_max_dirty_object": "0",
"rbd_cache_block_writes_upfront": "false",
[root@sz02 ~]# cat /etc/ceph/ceph.conf | grep "rbd cache size"
rbd cache size = 268435456
We can see that rbd_cache_size is different, so I want to know: does the Ceph runtime config read its values from ceph.conf or not? If not, what is the purpose of ceph.conf?
Thanks

While an OSD is starting, it reads /etc/ceph/ceph.conf and applies the parameters found in this file to its runtime config. For any parameter it does not find, it uses the default value described in the docs. So the setting rbd cache size = 268435456 should take effect.
You can do the following:
Restart the OSD daemon.
Check that the setting rbd cache size = 268435456 is under the [client] section in your ceph.conf (see the example below).
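For reference, the relevant fragment of ceph.conf (using the value from the question) would look like this:
[client]
rbd cache size = 268435456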

If you don't want to restart the daemon:
ceph tell osd.0 injectargs '--rbd_cache_size=268435456'
but it is recommended to change it on all OSDs:
ceph tell osd.* injectargs '--rbd_cache_size=268435456'
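Either way, you can confirm the running value afterwards with the same admin-socket command used in the question, e.g.:
ceph daemon osd.0 config show | grep rbd_cache_size
# or query just that one key
ceph daemon osd.0 config get rbd_cache_size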

Related

Unknown processors type "resourcedetection" for "resourcedetection"

Running OT Collector with image ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector:0.58.0
In config.yaml I have:
processors:
  batch:
  resourcedetection:
    detectors: [ env ]
    timeout: 2s
    override: false
The collector is deployed as a sidecar but it keeps failing with
collector server run finished with error: failed to get config: cannot unmarshal the configuration: unknown processors type "resourcedetection" for "resourcedetection" (valid values: [resource span probabilistic_sampler filter batch memory_limiter attributes])
Any idea what is causing this? I haven't found any relevant documentation or questions.
The Resource Detection Processor is part of the otelcol-contrib distribution upstream, so you would need to use otel/opentelemetry-collector-contrib:0.58.0 (or the equivalent on your container registry of choice) for this processor to be available in your collector.
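For example, if the sidecar container is defined by hand, only the image line needs to change (a sketch; the container name, config path and volume are illustrative):
containers:
  - name: otel-collector
    image: otel/opentelemetry-collector-contrib:0.58.0
    args: ["--config=/etc/otelcol/config.yaml"]
    volumeMounts:
      - name: otel-config
        mountPath: /etc/otelcol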

Failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB) in quic golang appengine

I am using Google Cloud App Engine to deploy my quic-go server, but I am getting the error:
failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB).
I am using an app.yaml file together with a Dockerfile, which is as follows:
FROM golang:1.18.3
RUN mkdir /app
ADD . /app
WORKDIR /app
RUN apt-get update && apt-get install -y ffmpeg
CMD sudo --sysctl net.core.rmem_default=15000000
CMD sudo --sysctl net.core.rmem_max=15000000
RUN go build -x server.go
ENV GCS_BUCKETNAME xyz
ENV AI_CLIENT_SSL_CERT /path to cert
ENV AI_CLIENT_SSL_KEY /path to key
ENV GCP_BUCKET_SERVICE_ACCOUNT_CREDS /path to google cloud service account credential
CMD [ "./server" ]
This is my app.yaml:
runtime: custom
env: flex
env_variables:
  GCS_BUCKETNAME : "xyz"
  AI_CLIENT_SSL_CERT : "./path to cert"
  AI_CLIENT_SSL_KEY : "./path to key"
  GCP_BUCKET_SERVICE_ACCOUNT_CREDS : "./path to google cloud credential.json file"
service: streaming-app
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 20
  cpu_utilization:
    target_utilization: 0.85
  target_concurrent_requests: 100
Any sort of help will be appreciated.
sysctl is OS-level configuration, which doesn't fit App Engine's principal use case, and App Engine currently has no way of configuring the underlying sysctl settings. I believe Google Kubernetes Engine may be a better fit for running that server, as App Engine environments have a limited set of configurable settings.
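If you do end up on GKE, one commonly used pattern (a sketch only; the pod name, image and values are illustrative) is a privileged init container that raises the UDP receive buffer limits before the quic-go server starts:
apiVersion: v1
kind: Pod
metadata:
  name: quic-server
spec:
  initContainers:
    - name: set-udp-buffers
      image: busybox
      securityContext:
        privileged: true   # net.core.* is not a "safe" sysctl, so this needs privilege
      command: ["sh", "-c", "sysctl -w net.core.rmem_max=15000000 net.core.rmem_default=15000000"]
  containers:
    - name: server
      image: gcr.io/your-project/quic-server   # hypothetical image for the server above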
Can you tell me the scenarios in which this file is not present in the kernel?
I'm not sure about the scenarios, as I have very little experience with the kernel. That seems like a different question from the original post; you could raise a new StackOverflow question about it.

Gramex redis cache set memory error in azure instance

I am unable to set memory through the config for a Redis Azure instance in Gramex.
It gives this error:
redis.exceptions.ResponseError: unknown command CONFIG, with args beginning with: SET, maxmemory, 1000000000
You can set the size as zero; this will ignore the size limit. However, please ensure that you have set the maxmemory-policy to volatile-lru on the Redis instance. This solution requires Gramex >= 1.67.
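As a rough sketch only (I am assuming the usual gramex.yaml cache layout; check the Gramex caching docs for the exact key names and the path format for an Azure host):
cache:
  default:
    type: redis
    path: your-instance.redis.cache.windows.net:6380:0   # assumption: host:port:db
    size: 0   # per the answer above, 0 ignores the size limit and avoids CONFIG SET maxmemory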

Ambari is not able to start the Namenode

I have a problem with my Ambari server: it is not able to start the NameNode. I'm using HDP 2.0.6 and Ambari 1.4.1. It is worth mentioning that this only happens once I've enabled Kerberos security; when it is disabled there is no error.
The error is:
2015-02-04 16:01:48,680 ERROR namenode.EditLogInputStream (EditLogFileInputStream.java:nextOpImpl(173)) - caught exception initializing http://int-iot-hadoop-fe-02.novalocal:8480/getJournal?jid=integration&segmentTxId=1&storageInfo=-47%3A1493795199%3A0%3ACID-a5152e6c-64ab-4978-9f1c-e4613a09454d
org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Fetch of http://int-iot-hadoop-fe-02.novalocal:8480/getJournal?jid=integration&segmentTxId=1&storageInfo=-47%3A1493795199%3A0%3ACID-a5152e6c-64ab-4978-9f1c-e4613a09454d failed with status code 500
Response message:
getedit failed. java.lang.IllegalArgumentException: Does not contain a valid host:port authority: null at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:211) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:210) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.isValidRequestor(GetJournalEditServlet.java:93) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.checkRequestorOrSendError(GetJournalEditServlet.java:128) at org.apache.hadoop.hdfs.qjournal.server.GetJournalEditServlet.doGet(GetJournalEditServlet.java:174) at
...
It seems the problem is about retrieving the Secondary NameNode HTTP address, which in fact is set to null in hdfs-site.xml (I do not know why):
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>null</value>
</property>
I've tried to set that parameter to the appropriate value, but nothing works:
By manually editing the hdfs-site.xml files and running hdfs namenode, but nothing happens.
By manually editing the hdfs-site.xml files and starting the whole HDFS from Ambari, but nothing happens. What's more, the dfs.namenode.secondary.http-address parameter gets set back to null!
Through Ambari UI > HDFS services > config tab > hdfs-site.xml list > add new property... the problem is that dfs.namenode.secondary.http-address is not listed, and the UI does not allow me to add it because it says... it already exists! :)
I've tried to add the value in /usr/lib/ambari-server/web/data/configuration/hdfs-site.json, thinking this could be the place where Ambari stores the values shown in the UI, but no success.
I've also noted that a site-XXXX.pp file is created under /var/lib/ambari-agent/data/ each time the HDFS service is restarted from the Ambari UI, and I've found that each of these files has:
[root@int-iot-hadoop-fe-02 ~]# cat /var/lib/ambari-agent/data/site-3228.pp | grep dfs.namenode.secondary.http-address
"dfs.namenode.secondary.http-address" => 'null',
I think another candidate file for configuring this property could be /var/lib/ambari-agent/puppet/modules/hdp-hadoop/manifests/params.pp. There is an ### hdfs-site section, but I'm not able to figure out the name of the Puppet variable associated with the dfs.namenode.secondary.http-address property.
Any ideas? Thanks!
I have a workaround to make it work in an Ambari environment:
On the Ambari node, modify:
/usr/lib/ambari-server/web/javascripts/app.js
/usr/lib/ambari-server/web/javascripts/app.js.map
changing from:
{
"name": "dfs.namenode.secondary.http-address",
"templateName": ["snamenode_host"],
"foreignKey": null,
"value": "<templateName[0]>:50090",
"filename": "hdfs-site.xml"
},
to the specific value for your secondary namenode and not the template one:
{
"name": "dfs.namenode.secondary.http-address",
"templateName": ["snamenode_host"],
"foreignKey": null,
"value": "my.secondary.namenode.domain:50090",
"filename": "hdfs-site.xml"
},
Rename /usr/lib/ambari-server/web/javascripts/app.js.gz to /usr/lib/ambari-server/web/javascripts/app.js.gz.old.
Gzip the modified app.js so a new app.js.gz is generated in the same directory (see the commands sketched below).
Refresh your Ambari web UI and force an HDFS restart; this will regenerate the appropriate /etc/hadoop/conf/hdfs-site.xml. If it does not, you could add a new property in the Ambari web UI and then delete it, in order to force the changes when you press the save button.
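The rename and re-gzip steps, roughly (run on the Ambari node):
cd /usr/lib/ambari-server/web/javascripts
mv app.js.gz app.js.gz.old
gzip -c app.js > app.js.gz   # keep the edited app.js and write a fresh app.js.gz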
Hope this helps.
--mLG
Partially fixed: it is necessary to stop all the HDFS services (JournalNode, NameNodes and DataNodes) before editing the hdfs-site.xml file. Then, of course, Ambari's "start button" cannot be used because the configuration would be overwritten... thus it is necessary to restart all the services manually. This is not the definitive solution, since it is desirable that these configuration changes could be made from the Ambari UI...

Warning: Failed to connect to the agentx master agent ([NIL])

I have installed net-snmp 5.7.2 on my system. I have written my app_agent.conf for my application with
agentXSocket udp:X.X.X.X:1610
and exported SNMPCONFIGPATH=path_to_app_agent.conf
I have also written an snmpd.conf in /usr/etc/snmp/snmp.conf:
trap2sink X.X.X.Y
agentXSocket udp:X.X.X.X:1610
I have two more snmpd.conf present in my /etc/snmp/ and /var/net-snmp/
Config from /etc/snmp:
com2sec notConfigUser default public
com2sec notConfigUser v1 notConfigUser
com2sec notConfigUser v1 notConfigUser
view systemview included .1.3.6.1.2.1.1
view systemview included .1.3.6.1.2.1.25.1.1
access notConfigGroup "" any noauth exact systemview none none
pass .1.3.6.1.4.1.4413.4.1 /usr/bin/ucd5820stat
Config from /var/net-snmp:
setserialno 1322276014
ifXTable .1 14:0 18:0x $
ifXTable .2 14:0 18:0x $
ifXTable .3 14:0 18:0x $
engineBoots 14
oldEngineID 0x80001f888000e17f6964b28450
I have started snmpd and snmptrapd. Now in my code I am calling
netsnmp_ds_set_boolean(NETSNMP_DS_APPLICATION_ID, NETSNMP_DS_AGENT_ROLE, 1);
init_agent("app_agent");
init_snmp("app_agent");
init_snmp is throwing a warning
Warning: Failed to connect to the agentx master agent ([NIL]):
I have no idea why. Thanks in advance for any help.
This is basically saying that the sub-agent you wrote failed to connect to the Net-SNMP master agent, as the message suggests. On Linux, by default AgentX will attempt to make the connection over a Unix socket at /var/agentx/master. The following hints might help:
Run your sub-agent with appropriate privileges so it has access to the socket, e.g. via sudo.
Check the socket settings in your snmpd.conf (its location varies) if not already specified, such as agentxsocket /var/agentx/master and agentxperms 777 777 (see the example snippet below).
Restart Net-SNMP for any change to take effect with sudo service snmpd restart; or, as an option, you can stop the service with sudo service snmpd stop and run an instance in debugging mode with snmpd -f -Lo -Dagentx, which will most likely output useful information about the sub-agent connection.
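For reference, the relevant snmpd.conf lines might look roughly like this (standard Net-SNMP directives, values as mentioned above):
master agentx
agentXSocket /var/agentx/master
agentXPerms  777 777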
I ran into this problem just now with quagga and ospfd, and after doing an strace -f -p PID, I noticed this among the output:
connect(14, {sa_family=AF_FILE, path="/var/agentx/master"}, 110) = -1 EACCES (Permission denied)
so I:
$ ls -al /var/agentx/
total 8
drwx------ 2 root root 4096 Sep 12 20:50 .
drwxr-xr-x. 27 root root 4096 Sep 12 20:13 ..
srwxrwxrwx 1 root root 0 Sep 12 20:50 master
and then I:
$ chmod 755 /var/agentx/
and immediately zebra and ospfd had their AgentX subagents connect.
$ tail -10f /var/log/quagga/zebra.log
2014/09/12 20:52:59 ZEBRA: snmp[info]: NET-SNMP version 5.5 AgentX subagent connected
$ tail -10f /var/log/quagga/ospfd.log
2014/09/12 20:52:59 OSPF: snmp[info]: NET-SNMP version 5.5 AgentX subagent connected
This is running quagga-0.99.23-2014062401 on RHEL6. Hope this helps.
I had a similar problem: whether using Unix sockets or tcp:localhost:750, I was still getting the same error message:
/var/log/quagga/ospfd.log: warning, failed to connect to Master AgentX [nill] or [tcp:localhost:750].
I resolved the issue by disabling SELinux.
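To check whether SELinux really is what is blocking the AgentX socket before disabling it for good, the standard RHEL tools are enough:
getenforce          # shows Enforcing / Permissive / Disabled
sudo setenforce 0   # switch to permissive temporarily; does not persist across reboots
# for a permanent change, edit /etc/selinux/config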
This is not the answer to your problem, but I also got the "Warning: Failed to connect to the agentx master agent ([NIL]):" message when my snmpd service didn't start up properly or went down. For my SNMP sub-agent I used the example they provide, example-demon.c, and found that I got this message nonstop (about every second) from agent_check_and_process(0) on every loop iteration.
while (true) {
    agent_check_and_process(0); /* 0 == don't block */
}
This is how I fixed it.
netsnmp_transport *snmpTransport;
while( true ) {
    // Check whether snmpd is still running
    snmpTransport = netsnmp_transport_open_client("agentx", NULL);
    if (snmpTransport == NULL)
    {
        // Master agent just went down?
        if (snmpAgentDown == false)
        {
            snmp_log( LOG_INFO, "Net-SNMP Agent is down\n" );
            snmpAgentDown = true;
        }
        Sleep(5000); // Sleep for 5 sec
    } else
    {
        if (snmpAgentDown)
        {
            snmp_log( LOG_INFO, "Net-SNMP Agent is back up\n" );
            snmpAgentDown = false;
        }
        // Close the test connection
        snmpTransport->f_close(snmpTransport); // This burned me when missing; it is needed
        netsnmp_transport_free(snmpTransport);
        // Process SNMP requests and notifications
        agent_check_and_process( 0 ); // 0 == don't block, 1 == block
        Sleep(1); // Sleep for 1 ms; the thread needs to sleep, but the sub-agent must stay responsive
    }
    i++;
}
Now if snmpd goes down, my app can detect it and skip agent_check_and_process(), which stops the "Warning: Failed to connect to the agentx master agent ([NIL]):" message from ever appearing. If snmpd comes back up, processing resumes.
Final note: I based that code on the subagent_open_master_session() function in the subagent.c file of the net-snmp-5.7.2 package. snmpTransport->f_close(snmpTransport) is also needed; I determined that by following what snmp_close() does at the end of subagent_open_master_session().
The Net-SNMP sub-agent is sometimes unable to read the address of the master agent from the configuration file, so you can also try the following:
/* set the location of master agent */
netsnmp_ds_set_string(NETSNMP_DS_APPLICATION_ID,
NETSNMP_DS_AGENT_X_SOCKET, "udp:X.X.X.X:1610");
Write these lines in the agentx code before calling init_agent().
I solved this problem on Ubuntu 17.07 with the following changes.
Change the config (add a line) to:
view systemview included .1.3.6.1.2.1.1
view systemview included .1.3.6.1.2.1.2
view systemview included .1.3.6.1.2.1.25.1.1
instead of
view systemview included .1.3.6.1.2.1.1
view systemview included .1.3.6.1.2.1.25.1.1
Add the new line master agentx to /etc/snmpd.conf.
Restart the snmpd daemon:
sudo /etc/init.d/snmpd restart or sudo service snmpd restart
