I have set up two CloudWatch alarms, "root-disk-space-utilization" and "data-disk-space-utilization", for my EC2 instance. The code to set "root-disk-space-utilization":
aws cloudwatch put-metric-alarm \
--alarm-name root-disk-space-utilization \
--alarm-description "Alarm when root disk space exceeds $ROOT_DISK_THRESHOLD percent" \
--metric-name DiskSpaceUtilization \
--namespace System/Linux \
--statistic Average \
--period $period \
--threshold $ROOT_DISK_THRESHOLD \
--treat-missing-data notBreaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=Filesystem,Value=$ROOT_DEVICE Name=InstanceId,Value=$val Name=MountPath,Value=$ROOT_PATH \
--evaluation-periods 1 \
--alarm-actions $arn \
--ok-actions $arn \
--unit Percent
Here, ROOT_DEVICE=/dev/sda1 ; DATA_DEVICE=/dev/sdf ; ROOT_PATH=/ ; DATA_PATH=/data .
To set "data-disk-space-utilization":
aws cloudwatch put-metric-alarm \
--alarm-name data-disk-space-utilization \
--alarm-description "Alarm when data disk space exceeds $DATA_DISK_THRESHOLD percent" \
--metric-name DiskSpaceUtilization \
--namespace System/Linux \
--statistic Average \
--period $period \
--threshold $DATA_DISK_THRESHOLD \
--treat-missing-data notBreaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=Filesystem,Value=$DATA_DEVICE Name=InstanceId,Value=$val Name=MountPath,Value=$DATA_PATH \
--evaluation-periods 1 \
--alarm-actions $arn \
--ok-actions $arn \
--unit Percent
With the above code I am able to create the CloudWatch alarms, but neither of them goes into the "Alarm" state. I also tried changing the threshold to 1, just to check whether the alarm would fire, but the state still did not change.
I am unsure whether the code above is correct, and how it is supposed to trigger the alarm.
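One thing worth checking (an assumption, since the question does not confirm it): CloudWatch treats every distinct combination of dimensions as a separate metric, so an alarm only sees data when its --dimensions match exactly what the instance publishes. With --treat-missing-data notBreaching, an alarm on a metric that receives no data does not go into the "Alarm" state. A minimal Python sketch of that comparison follows; the helper names are hypothetical, and the "published" values (for example /dev/xvda1 instead of /dev/sda1) are made up for illustration.

```python
def as_key(dimensions):
    """Normalize a list of {"Name": ..., "Value": ...} dicts into a comparable set."""
    return frozenset((d["Name"], d["Value"]) for d in dimensions)


def differing_dimensions(alarm_dims, metric_dims):
    """Return the (name, value) pairs that appear on only one side."""
    return sorted(as_key(alarm_dims) ^ as_key(metric_dims))


# Dimensions as passed to put-metric-alarm in the question.
alarm = [
    {"Name": "Filesystem", "Value": "/dev/sda1"},
    {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},  # placeholder ID
    {"Name": "MountPath", "Value": "/"},
]

# Hypothetical dimensions of the metric the instance actually publishes.
published = [
    {"Name": "Filesystem", "Value": "/dev/xvda1"},
    {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
    {"Name": "MountPath", "Value": "/"},
]

for name, value in differing_dimensions(alarm, published):
    print(f"mismatch: {name}={value}")
```

The real dimension values can be listed with "aws cloudwatch list-metrics --namespace System/Linux" and compared against the alarm definition in the same way.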
I want to run (or resume) the run_mlm.py script with a specific learning rate, but it doesn't seem like setting it in the script arguments does anything.
os.system(
f"python {script} \
--model_type {model} \
--config_name './models/{model}/config.json' \
--train_file './content/{data}/train.txt' \
--validation_file './content/{data}/test.txt' \
--learning_rate 6e-4 \
--weight_decay 0.01 \
--warmup_steps 6000 \
--adam_beta1 0.9 \
--adam_beta2 0.98 \
--adam_epsilon 1e-6 \
--tokenizer_name './tokenizer/{model}' \
--output_dir './{out_dir}' \
--do_train \
--do_eval \
--num_train_epochs 40 \
--overwrite_output_dir {overwrite} \
--ignore_data_skip"
)
After warm-up, the log indicates that the learning rate tops out at 1e-05—a default from somewhere, I guess, but I'm not sure where (and certainly not 6e-4):
{'loss': 3.9821, 'learning_rate': 1e-05, 'epoch': 0.09}
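For comparison, here is a minimal sketch of the same call built as an argument list and run through subprocess instead of os.system, which makes it easy to print exactly what reaches the script. It is not a confirmed fix for the learning-rate issue. The helper names build_command and run are hypothetical, the values passed for script, model, data and out_dir are placeholders, and only some of the flags from the question are shown for brevity.

```python
import shlex
import subprocess
import sys


def build_command(script, model, data, out_dir, learning_rate="6e-4"):
    """Build the argument list for run_mlm.py (flags copied from the question)."""
    return [
        sys.executable, script,
        "--model_type", model,
        "--config_name", f"./models/{model}/config.json",
        "--train_file", f"./content/{data}/train.txt",
        "--validation_file", f"./content/{data}/test.txt",
        "--learning_rate", learning_rate,
        "--weight_decay", "0.01",
        "--warmup_steps", "6000",
        "--tokenizer_name", f"./tokenizer/{model}",
        "--output_dir", f"./{out_dir}",
        "--do_train",
        "--do_eval",
        "--num_train_epochs", "40",
    ]


def run(cmd):
    """Launch the training script; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


# Placeholder values; substitute the real script path, model and data names.
cmd = build_command("run_mlm.py", "roberta", "corpus", "out")

# Print the exact command line so the learning-rate flag can be verified by eye.
print(shlex.join(cmd))
```

Passing a list avoids shell quoting entirely, so each flag and value arrives at the script as its own argument.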
I installed BPM:
./imcl install \
com.ibm.bpm.ESB.v85_8.6.0.20170918_1207, \
com.ibm.websphere.ND.v85_8.5.5012.20170627_1018 \
-repositories /u01/tmp/BPM/repository/repos_64bit/repository.config \
-acceptLicense \
-installationDirectory /u01/apps/IBM/BPM \
-properties user.wasjava=java8 \
-showVerboseProgress -log silentinstall.log
Then I created a deployment manager profile:
./manageprofiles.sh \
-create \
-adminPassword XXXXXXX \
-profileName Dmgr06 \
-cellName Cell03 \
-serverType DEPLOYMENT_MANAGER \
-adminUserName wasadmin \
-enableAdminSecurity true \
-nodeName CellManager03 \
-profilePath /u01/apps/IBM/BPM/profiles/Dmgr06 \
-personalCertValidityPeriod 15 \
-signingCertValidityPeriod 15 \
-keyStorePassword XXXXXXXX \
-templatePath /u01/apps/IBM/BPM/profileTemplates/management/ \
-startingPort 10000 \
-isDefault
After this I ran the startManager.sh command. I was expecting to see both WebSphere and ESB up and running, but I see only WebSphere.
How do I add the ESB?
I'm planning to add new members to a single-member etcd cluster, but I have run into problems.
I started the first etcd member with the following command:
nohup etcd \
--advertise-client-urls=https://192.168.22.34:2379 \
--cert-file=/etc/kubernetes/pki/etcd/server.crt \
--client-cert-auth=true \
--data-dir=/var/lib/etcd \
--initial-advertise-peer-urls=https://192.168.22.34:2380 \
--initial-cluster=test-master-01=https://192.168.22.34:2380 \
--key-file=/etc/kubernetes/pki/etcd/server.key \
--listen-client-urls=https://0.0.0.0:2379 \
--listen-peer-urls=https://192.168.22.34:2380 \
--name=test-master-01 \
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
--peer-client-cert-auth=true \
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--snapshot-count=10000 \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt &
Then I checked the health of the cluster and it seems to be healthy:
member f13d668ae0cba84 is healthy: got healthy result from https://192.168.22.34:2379
cluster is healthy
I also checked the members:
f13d668ae0cba84: name=test-master-01 peerURLs=http://192.168.22.34:2380 clientURLs=https://192.168.22.34:2379 isLeader=true
Then I tried to add a second member:
etcdctl \
--endpoints=https://127.0.0.1:2379 \
--ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--cert-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key-file=/etc/kubernetes/pki/etcd/healthcheck-client.key \
member add test-master-02 https://192.168.22.37:2380
Added member named test-master-02 with ID 65bec874cca265d8 to cluster
ETCD_NAME="test-master-02"
ETCD_INITIAL_CLUSTER="test-master-01=http://192.168.22.34:2380,test-master-02=https://192.168.22.37:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
Then I started the second etcd member with the following command:
etcd \
--name test-master-02 \
--listen-client-urls https://192.168.22.37:2379 \
--advertise-client-urls https://192.168.22.37:2379 \
--listen-peer-urls https://192.168.22.37:2380 \
--cert-file=/etc/kubernetes/pki/etcd/server.crt \
--client-cert-auth=true \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
--peer-client-cert-auth=true \
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
--key-file=/etc/kubernetes/pki/etcd/server.key \
--initial-cluster-state=existing \
--initial-cluster=test-master-01=https://192.168.22.34:2380,test-master-02=https://192.168.22.37:2380
But I got an error:
etcdmain: error validating peerURLs {ClusterID:bc8c76911939f2de Members:[&{ID:f13d668ae0cba84 RaftAttributes:{PeerURLs:[http://192.168.22.34:2380]} Attributes:{Name:test-master-01 ClientURLs:[https://192.168.22.34:2379]}} &{ID:65bec874cca265d8 RaftAttributes:{PeerURLs:[https://192.168.22.37:2380]} Attributes:{Name: ClientURLs:[]}}] RemovedMemberIDs:[]}: unmatched member while checking PeerURLs
Update
It looks like I don't have this problem when starting the cluster from scratch, without restoring from a snapshot.
I figured out that before adding new members I needed to update my first etcd member, because the member list command returned 127.0.0.1 as its peer URL instead of the value from the etcd config.
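The mismatch behind "unmatched member while checking PeerURLs" can be seen in the output quoted above: the cluster has test-master-01 registered with http://192.168.22.34:2380, while --initial-cluster on the new member uses https:// for the same host. A small Python sketch that compares the two strings; the helper names are hypothetical and the code only illustrates the comparison, not etcd's own validation logic.

```python
def parse_initial_cluster(value):
    """Turn 'name1=url1,name2=url2' into a {name: url} dict."""
    return dict(item.split("=", 1) for item in value.split(",") if item)


def peer_url_mismatches(registered, configured):
    """Return {name: (registered_url, configured_url)} where the two differ."""
    reg = parse_initial_cluster(registered)
    cfg = parse_initial_cluster(configured)
    return {
        name: (reg.get(name), cfg.get(name))
        for name in sorted(set(reg) | set(cfg))
        if reg.get(name) != cfg.get(name)
    }


# What "member add" printed as ETCD_INITIAL_CLUSTER (the cluster's own view).
registered = (
    "test-master-01=http://192.168.22.34:2380,"
    "test-master-02=https://192.168.22.37:2380"
)
# What was passed to --initial-cluster when starting the second member.
configured = (
    "test-master-01=https://192.168.22.34:2380,"
    "test-master-02=https://192.168.22.37:2380"
)

for name, (have, want) in peer_url_mismatches(registered, configured).items():
    print(f"{name}: cluster has {have}, new member was given {want}")
```

etcdctl has a "member update" subcommand for correcting a registered peer URL; its exact argument form depends on the etcdctl API version in use.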
Once one version of a set of scripts that use RRDTool runs, you try more of the same.
I made a version of the Lua script that now collects power/energy info. The related file create_pipower1A_graph.sh is a direct derivative of the error-free sh-file described in "RRDTool, How to get png-files by means of os-execute-call from lua-script?".
The derived sh-file should produce a graph showing the output of 3 inverters and, in parallel, the consumption.
The sh-file for the graphic output is below.
#!/bin/bash
rrdtool graph /home/pi/pipower1.png \
DEF:Pwr_MAC=/home/pi/pipower1.rrd:Power0430:AVERAGE \
DEF:Pwr_SAJ=/home/pi/pipower1.rrd:Power1530:AVERAGE \
DEF:Pwr_STECA=/home/pi/pipower1.rrd:Power2950:AVERAGE \
DEF:Pwr_Cons=/home/pi/pipower1.rrd:Power_Cons:AVERAGE \
LINE1:Pwr_MAC#ff0000:Output Involar \
LINE1:Pwr_SAJ#0000ff:Output SAJ1.5 \
LINE1:Pwr_STECA#5fd00b:Output STECA \
LINE1:Pwr_Cons#00ffff:Consumption \
COMMENT:"\t\t\t\t\t\t\l" \
COMMENT:"\t\t\t\t\t\t\l" \
GPRINT:Pwr_MAC:LAST:"Output_Involar Latest\: %2.1lf" \
GPRINT:Pwr_MAC:MAX:" Max.\: %2.1lf" \
GPRINT:Pwr_MAC:MIN:" Min.\: %2.1lf" \
COMMENT:"\t\t\t\t\t\t\l" \
GPRINT:Pwr_SAJ:LAST:"Output SAJ1.5k Latest\: %2.1lf" \
GPRINT:Pwr_SAJ:MAX:" Max.\: %2.1lf" \
GPRINT:Pwr_SAJ:MIN:" Min.\: %2.1lf" \
COMMENT:"\t\t\t\t\t\t\l" \
GPRINT:Pwr_STECA:LAST:"Output STECA Latest\: %2.1lf" \
GPRINT:Pwr_STECA:MAX:" Max.\: %2.1lf" \
GPRINT:Pwr_STECA:MIN:" Min.\: %2.1lf" \
COMMENT:"\t\t\t\t\t\t\l" \
GPRINT:Pwr_Cons:LAST:"Consumption Latest\: %2.1lf" \
GPRINT:Pwr_Cons:MAX:" Max.\: %2.1lf" \
GPRINT:Pwr_Cons:MIN:" Min.\: %2.1lf" \
COMMENT:"\t\t\t\t\t\t\l" \
--width 700 --height 400 \
--title="Graph B: Power Production & Consumption for last 24 hour" \
--vertical-label="Power(W)" \
--watermark "`date`"
The Lua script again runs without errors, and as a result the rrd-file is updated periodically and the graphic output is triggered, but no graph appears! I tested on 2 different Raspberry Pis, with no difference in behaviour.
Running the sh-file create_pipower1A_graph.sh from the command line produces the following errors:
pi@raspberrypi:~$ sudo /home/pi/create_pipower1A_graph.sh
ERROR: 'I' is not a valid function name
pi@raspberrypi:~$ ./create_pipower1A_graph.sh
ERROR: 'I' is not a valid function name
Question: I am puzzled, because nowhere in the sh-file is an I used as a function command. What is the explanation, and how can this error be fixed?
Your problem is here:
LINE1:Pwr_MAC#ff0000:Output Involar \
LINE1:Pwr_SAJ#0000ff:Output SAJ1.5 \
LINE1:Pwr_STECA#5fd00b:Output STECA \
LINE1:Pwr_Cons#00ffff:Consumption \
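These lines need to be quoted because they contain spaces and hash symbols. Without quotes the shell splits "Output Involar" into two arguments, and rrdtool then tries to interpret the stray word "Involar" as a graph element, which is most likely where the 'I' in the error message comes from: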
LINE1:"Pwr_MAC#ff0000:Output Involar" \
LINE1:"Pwr_SAJ#0000ff:Output SAJ1.5" \
LINE1:"Pwr_STECA#5fd00b:Output STECA" \
LINE1:"Pwr_Cons#00ffff:Consumption" \
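To see the effect of the quotes, the word splitting can be reproduced with Python's shlex module, which follows POSIX shell rules closely enough for this illustration. This is only a sketch of what rrdtool receives as arguments, not of how rrdtool parses them.

```python
import shlex

# One of the original lines, without quotes around the legend.
unquoted = "LINE1:Pwr_MAC#ff0000:Output Involar"

# The corrected line, with the definition quoted.
quoted = 'LINE1:"Pwr_MAC#ff0000:Output Involar"'

# Without quotes the space ends the argument, so "Involar" arrives on its own.
print(shlex.split(unquoted))  # ['LINE1:Pwr_MAC#ff0000:Output', 'Involar']

# With quotes the whole definition stays a single argument.
print(shlex.split(quoted))    # ['LINE1:Pwr_MAC#ff0000:Output Involar']
```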
I've created quite a few RRDTool graphs monitoring various aspects of a Raspberry Pi server.
I'm displaying 36 hours, 10 days, 45 days and 18 months for things like transferred data, CPU temperature, load averages etc.
However, the only "continuous" looking graphs are the 10-day graphs; all the others have gaps in them. I'm recording one data point per minute.
There are 28 (29) images, so rather than put them all here, I've put them on imgur for your perusal.
But here's an example of what I'm talking about:
10-days works fine!
45-days, not so much.
Here's my .rrd creation script:
rrdtool create data.rrd \
--start N --step '60' \
'DS:rx:GAUGE:60:0:U' \
'DS:tx:GAUGE:60:0:U' \
'DS:rxc:COUNTER:60:0:U' \
'DS:txc:COUNTER:60:0:U' \
'DS:wrx:GAUGE:60:0:U' \
'DS:wtx:GAUGE:60:0:U' \
'DS:wrxc:COUNTER:60:0:U' \
'DS:wtxc:COUNTER:60:0:U' \
'RRA:AVERAGE:0.5:1:129600' \
'RRA:AVERAGE:0.5:2:64800' \
'RRA:AVERAGE:0.5:60:14400' \
'RRA:AVERAGE:0.5:300:12960' \
'RRA:AVERAGE:0.5:3600:13140'
rrdtool create load.rrd \
--start N \
--step '60' \
'DS:load:GAUGE:60:0:4' \
'RRA:AVERAGE:0.5:1:129600' \
'RRA:AVERAGE:0.5:2:64800' \
'RRA:AVERAGE:0.5:60:14400' \
'RRA:AVERAGE:0.5:300:12960' \
'RRA:AVERAGE:0.5:3600:13140'
rrdtool create mem.rrd \
--start N \
--step '60' \
'DS:mem:GAUGE:60:0:100' \
'RRA:AVERAGE:0.5:1:129600' \
'RRA:AVERAGE:0.5:2:64800' \
'RRA:AVERAGE:0.5:60:14400' \
'RRA:AVERAGE:0.5:300:12960' \
'RRA:AVERAGE:0.5:3600:13140'
rrdtool create pitemp.rrd \
--start N \
--step '60' \
'DS:pitemp:GAUGE:60:U:U' \
'RRA:AVERAGE:0.5:1:129600' \
'RRA:AVERAGE:0.5:2:64800' \
'RRA:AVERAGE:0.5:60:14400' \
'RRA:AVERAGE:0.5:300:12960' \
'RRA:AVERAGE:0.5:3600:13140'
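For reference, in an RRA:CF:xff:steps:rows definition the steps value counts primary data points (60 s each here), not seconds, so each archive covers rows × steps × step seconds. A short Python sketch of that arithmetic for the archives above; it only restates the numbers from the create commands and is not a diagnosis of the gaps.

```python
STEP_SECONDS = 60  # --step from the create commands

# (steps, rows) taken from each 'RRA:AVERAGE:0.5:steps:rows' line.
RRAS = [(1, 129600), (2, 64800), (60, 14400), (300, 12960), (3600, 13140)]


def coverage_days(steps, rows, step_seconds=STEP_SECONDS):
    """Days of history one RRA holds: rows * steps * step length."""
    return rows * steps * step_seconds / 86400


for steps, rows in RRAS:
    resolution_min = steps * STEP_SECONDS / 60
    print(
        f"steps={steps:>4} rows={rows:>6}: "
        f"one row per {resolution_min:g} min, {coverage_days(steps, rows):g} days"
    )
```

The first two archives each hold 90 days, the third 600 days, and the last two far more than the 18 months being graphed.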
My entire draw script is over 900 lines long, so I'll include only the draw code for one set of graphs here ($RRDTOOL is a variable containing the path /usr/bin/rrdtool):
$RRDTOOL graph /var/www/html/images/graphs/data36h.png \
--title 'Odin Absolute Traffic (eth0)' \
--watermark "Graph Drawn `date`" \
--vertical-label 'Bytes' \
--lower-limit '0' \
--rigid \
--alt-autoscale \
--units=si \
--width '640' \
--height '300' \
--full-size-mode \
--start end-36h \
'DEF:rx=/usr/local/bin/system/data.rrd:rx:AVERAGE' \
'CDEF:cleanrx=rx,UN,PREV,rx,IF' \
'DEF:tx=/usr/local/bin/system/data.rrd:tx:AVERAGE' \
'AREA:rx#00CC00FF:Download\:' \
'GPRINT:rx:LAST:\:%8.2lf %s]' \
'STACK:tx#0000FFFF:Upload\:' \
'GPRINT:tx:LAST:\:%8.2lf %s]\n'
$RRDTOOL graph /var/www/html/images/graphs/data10d.png \
--title 'Odin Absolute Traffic (eth0) 10 days' \
--watermark "Graph Drawn `date`" \
--vertical-label 'Bytes' \
--lower-limit '0' \
--rigid \
--alt-autoscale \
--units=si \
--width '640' \
--height '300' \
--full-size-mode \
--start end-10d \
'DEF:rx=/usr/local/bin/system/data.rrd:rx:AVERAGE' \
'DEF:tx=/usr/local/bin/system/data.rrd:tx:AVERAGE' \
'AREA:rx#00CC00FF:Download\:' \
'GPRINT:rx:LAST:\:%8.2lf %s]' \
'STACK:tx#0000FFFF:Upload\:' \
'GPRINT:tx:LAST:\:%8.2lf %s]\n'
$RRDTOOL graph /var/www/html/images/graphs/data45d.png \
--title 'Odin Absolute Traffic (eth0) 45 days' \
--watermark "Graph Drawn `date`" \
--vertical-label 'Bytes' \
--lower-limit '0' \
--rigid \
--alt-autoscale \
--units=si \
--width '640' \
--height '300' \
--full-size-mode \
--start end-45d \
'DEF:rx=/usr/local/bin/system/data.rrd:rx:AVERAGE' \
'DEF:tx=/usr/local/bin/system/data.rrd:tx:AVERAGE' \
'AREA:rx#00CC00FF:Download\:' \
'GPRINT:rx:LAST:\:%8.2lf %s]' \
'STACK:tx#0000FFFF:Upload\:'
$RRDTOOL graph /var/www/html/images/graphs/data18m.png \
--title 'Odin Absolute Traffic (eth0) 18 month' \
--watermark "Graph Drawn `date`" \
--vertical-label 'Bytes' \
--lower-limit '0' \
--rigid \
--alt-autoscale \
--units=si \
--width '640' \
--height '300' \
--full-size-mode \
--start end-1y6m \
'DEF:rx=/usr/local/bin/system/data.rrd:rx:AVERAGE' \
'DEF:tx=/usr/local/bin/system/data.rrd:tx:AVERAGE' \
'AREA:rx#00CC00FF:Download\:' \
'GPRINT:rx:LAST:\:%8.2lf %s]' \
'STACK:tx#0000FFFF:Upload\:'
And yes, I know that the title on one of the graphs is wrong; I've fixed that, but only after saving all the images to imgur.
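With a --step of 60 seconds, I would choose a heartbeat (mrhb) of 120 s rather than 60 s. With a 60 s heartbeat, rrdtool treats the data as unknown whenever two updates arrive more than 60 s apart, and an update scheduled once a minute will regularly be a second or so late.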
rrdtool create data.rrd \
--start N --step '60' \
'DS:rx:GAUGE:120:0:U' \
'DS:tx:GAUGE:120:0:U' \
'DS:rxc:COUNTER:120:0:U' \
'DS:txc:COUNTER:120:0:U' \
'DS:wrx:GAUGE:120:0:U' \
'DS:wtx:GAUGE:120:0:U' \
'DS:wrxc:COUNTER:120:0:U' \
'DS:wtxc:COUNTER:120:0:U' \
'RRA:AVERAGE:0.5:1:129600' \
'RRA:AVERAGE:0.5:2:64800' \
'RRA:AVERAGE:0.5:60:14400' \
'RRA:AVERAGE:0.5:300:12960' \
'RRA:AVERAGE:0.5:3600:13140'
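The effect of the heartbeat can be sketched in a few lines of Python. This is a simplified model that assumes only the documented rule (a gap between two updates longer than the heartbeat makes that interval unknown); the function name and the update times are hypothetical.

```python
def unknown_intervals(timestamps, heartbeat):
    """Return (previous, current) update pairs whose gap exceeds the heartbeat."""
    return [
        (prev, cur)
        for prev, cur in zip(timestamps, timestamps[1:])
        if cur - prev > heartbeat
    ]


# Hypothetical update times in seconds; the third update arrives 1 s late.
updates = [0, 60, 121, 181]

# With a 60 s heartbeat the late update leaves an unknown interval.
print(unknown_intervals(updates, heartbeat=60))   # [(60, 121)]

# With a 120 s heartbeat the same updates are all accepted.
print(unknown_intervals(updates, heartbeat=120))  # []
```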