couldn't find local name "" in the initial cluster configuration when starting the etcd service

I am starting the etcd (v3.3.15) service with this command:
systemctl start etcd
This is my etcd systemd unit:
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd \
--name ${ETCD_NAME} \
--cert-file=/etc/kubernetes/ssl/kubernetes.pem \
--key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
--peer-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
--peer-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
--trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--initial-advertise-peer-urls ${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
--listen-peer-urls ${ETCD_LISTEN_PEER_URLS} \
--listen-client-urls ${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 \
--advertise-client-urls ${ETCD_ADVERTISE_CLIENT_URLS} \
--initial-cluster-token ${ETCD_INITIAL_CLUSTER_TOKEN} \
--initial-cluster infra1=https://172.19.104.231:2380,infra2=https://172.19.104.230:2380,infra3=https://172.19.150.82:2380 \
--initial-cluster-state new \
--data-dir=${ETCD_DATA_DIR}
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
And this is my etcd config (/etc/etcd/etcd.conf):
# [member]
ETCD_NAME=infra1
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://172.19.150.82:2380"
ETCD_LISTEN_CLIENT_URLS="https://172.19.150.82:2379"
#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.19.150.82:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="https://172.19.150.82:2379"

Your names and IPs don't match. In --initial-cluster, 172.19.150.82 is registered as infra3:
infra3=https://172.19.150.82:2380
but your etcd.conf configures this host as infra1 while it listens and advertises on that same address:
ETCD_NAME=infra1
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://172.19.150.82:2380"
etcd expects its own name to appear in the initial cluster map with a peer URL matching its advertised peer URL, so either rename this member to infra3 or point the infra1 entry of --initial-cluster at this host.
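For illustration, a minimal corrected /etc/etcd/etcd.conf for this host, assuming the machine at 172.19.150.82 is meant to be infra3 (the alternative is to keep ETCD_NAME=infra1 and instead point the infra1 entry of --initial-cluster at this address):
# /etc/etcd/etcd.conf -- sketch; assumes this host should be infra3.
# The name below must match the key used for this host in --initial-cluster.
# [member]
ETCD_NAME=infra3
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://172.19.150.82:2380"
ETCD_LISTEN_CLIENT_URLS="https://172.19.150.82:2379"
# [cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.19.150.82:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="https://172.19.150.82:2379"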

Related

How to run aws bash commands consecutively?

How can I execute the following bash commands consecutively?
aws logs create-export-task --task-name "cloudwatch-log-group-export1" \
--log-group-name "/my/log/group1" \
--from 1488708419000 --to 1614938819000 \
--destination "my-s3-bucket" \
--destination-prefix "my-log-group1"
aws logs create-export-task --task-name "cloudwatch-log-group-export" \
--log-group-name "/my/log/group2" \
--from 1488708419000 --to 1614938819000 \
--destination "my-s3-bucket" \
--destination-prefix "my-log-group2"
The problem with the above commands is that after the first one completes, the script gets stuck at the following output, so the second command is never reached.
{
"taskId": "0e3cdd4e-1e95-4b98-bd8b-3291ee69f9ae"
}
It seems that I should find a way to wait for the cloudwatch-log-group-export1 task to complete.

You would have to create a waiter function that uses describe-export-tasks to get the current status of an export job.
Example of such a function:
wait_for_export() {
    # $1 = export task ID, $2 = optional poll interval in seconds (default 10)
    local task_id=${1}
    local sleep_time=${2:-10}
    while true; do
        # Ask CloudWatch Logs for the current status of the export task
        job_status=$(aws logs describe-export-tasks \
            --task-id "${task_id}" \
            --query "exportTasks[0].status.code" \
            --output text)
        echo "${job_status}"
        [[ ${job_status} == "COMPLETED" ]] && break
        sleep "${sleep_time}"
    done
}
Then you use it:
task_id1=$(aws logs create-export-task \
--task-name "cloudwatch-log-group-export1" \
--log-group-name "/my/log/group1" \
--from 1488708419000 --to 1614938819000 \
--destination "my-s3-bucket" \
--destination-prefix "my-log-group1" \
--query 'taskId' --output text)
wait_for_export ${task_id1}
# second export
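# The lines below are a sketch, not part of the original answer: they chain the
# second export from the question after the first one has COMPLETED.
task_id2=$(aws logs create-export-task \
    --task-name "cloudwatch-log-group-export" \
    --log-group-name "/my/log/group2" \
    --from 1488708419000 --to 1614938819000 \
    --destination "my-s3-bucket" \
    --destination-prefix "my-log-group2" \
    --query 'taskId' --output text)
wait_for_export ${task_id2}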

By default the aws-cli opens its output in a pager (the vim-like view), which is what makes the script appear to hang.
You can avoid it by setting the AWS_PAGER environment variable to "" before executing the aws command:
export AWS_PAGER=""
aws logs create-export-task...
Or you can set it in the AWS config file (~/.aws/config):
[default]
cli_pager=
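The variable can also be set for a single invocation only; a small sketch reusing the first export command from the question:
# One-off: disable the pager just for this command
AWS_PAGER="" aws logs create-export-task \
    --task-name "cloudwatch-log-group-export1" \
    --log-group-name "/my/log/group1" \
    --from 1488708419000 --to 1614938819000 \
    --destination "my-s3-bucket" \
    --destination-prefix "my-log-group1"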

How to test the "DiskSpaceUtilization" alarm for my EC2 instance?

I have set "root-disk-space-utilization" and "data-disk-space-utilization" alarms for my EC2 instance. The code to set "root-disk-space-utilization":
aws cloudwatch put-metric-alarm \
--alarm-name root-disk-space-utilization \
--alarm-description "Alarm when root disk space exceeds $ROOT_DISK_THRESHOLD percent" \
--metric-name DiskSpaceUtilization \
--namespace System/Linux \
--statistic Average \
--period $period \
--threshold $ROOT_DISK_THRESHOLD \
--treat-missing-data notBreaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=Filesystem,Value=$ROOT_DEVICE Name=InstanceId,Value=$val Name=MountPath,Value=$ROOT_PATH \
--evaluation-periods 1 \
--alarm-actions $arn \
--ok-actions $arn \
--unit Percent
Here, ROOT_DEVICE=/dev/sda1 ; DATA_DEVICE=/dev/sdf ; ROOT_PATH=/ ; DATA_PATH=/data .
To set "data-disk-space-utilization":
aws cloudwatch put-metric-alarm \
--alarm-name data-disk-space-utilization \
--alarm-description "Alarm when data disk space exceeds $DATA_DISK_THRESHOLD percent" \
--metric-name DiskSpaceUtilization \
--namespace System/Linux \
--statistic Average \
--period $period \
--threshold $DATA_DISK_THRESHOLD \
--treat-missing-data notBreaching \
--comparison-operator GreaterThanThreshold \
--dimensions Name=Filesystem,Value=$DATA_DEVICE Name=InstanceId,Value=$val Name=MountPath,Value=$DATA_PATH \
--evaluation-periods 1 \
--alarm-actions $arn \
--ok-actions $arn \
--unit Percent
With the above code I am able to create the CloudWatch alarms, but none of them ever goes into the "Alarm" state. I have also tried changing the threshold to 1, just to check whether it would go into the "Alarm" state, but it still did not change.
I am a bit unsure whether my code is correct and how it is supposed to trigger the alarm.
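Not an answer from the thread, but one way to check that the alarm and its actions work independently of the metric is to force the alarm into the ALARM state with set-alarm-state; a sketch using the alarm name from above:
# Temporarily force the alarm into ALARM to verify that the action ($arn) fires;
# CloudWatch will move it back once it re-evaluates the metric.
aws cloudwatch set-alarm-state \
    --alarm-name root-disk-space-utilization \
    --state-value ALARM \
    --state-reason "Manual test of alarm actions"
If the action fires here but the alarm never triggers on its own, the likely suspect is the metric itself: DiskSpaceUtilization in the System/Linux namespace only has data if something (such as the CloudWatch monitoring scripts) is publishing it with exactly the dimensions used in the alarm.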

WebSphere 8.5: ESB is missing. How do I start it together with WebSphere?

I installed BPM:
./imcl install \
com.ibm.bpm.ESB.v85_8.6.0.20170918_1207, \
com.ibm.websphere.ND.v85_8.5.5012.20170627_1018 \
-repositories /u01/tmp/BPM/repository/repos_64bit/repository.config \
-acceptLicense \
-installationDirectory /u01/apps/IBM/BPM \
-properties user.wasjava=java8 \
-showVerboseProgress -log silentinstall.log
Then I created a Deployment Manager profile:
./manageprofiles.sh \
-create \
-adminPassword XXXXXXX \
-profileName Dmgr06 \
-cellName Cell03 \
-serverType DEPLOYMENT_MANAGER \
-adminUserName wasadmin \
-enableAdminSecurity true \
-nodeName CellManager03 \
-profilePath /u01/apps/IBM/BPM/profiles/Dmgr06 \
-personalCertValidityPeriod 15 \
-signingCertValidityPeriod 15 \
-keyStorePassword XXXXXXXX \
-templatePath /u01/apps/IBM/BPM/profileTemplates/management/ \
-startingPort 10000 \
-isDefault
After this I ran the startManager.sh command. I was expecting to see WebSphere and ESB up and running, but I only see WebSphere.
How do I add the ESB?
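Not an answer, just a first diagnostic step as a sketch: the ESB function is typically tied to an ESB-specific profile template rather than the plain management template used above, so it is worth checking which templates the ESB package actually installed (the path is the installation directory from the question):
# List the profile templates shipped with this installation; the exact name of
# the ESB-related template depends on the product version.
ls /u01/apps/IBM/BPM/profileTemplates/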

Have troubles while adding new etcd members

I'm planning to add new members to a single-instance etcd cluster, but I'm running into problems.
I started the first etcd member with the following command:
nohup etcd \
--advertise-client-urls=https://192.168.22.34:2379 \
--cert-file=/etc/kubernetes/pki/etcd/server.crt \
--client-cert-auth=true \
--data-dir=/var/lib/etcd \
--initial-advertise-peer-urls=https://192.168.22.34:2380 \
--initial-cluster=test-master-01=https://192.168.22.34:2380 \
--key-file=/etc/kubernetes/pki/etcd/server.key \
--listen-client-urls=https://0.0.0.0:2379 \
--listen-peer-urls=https://192.168.22.34:2380 \
--name=test-master-01 \
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
--peer-client-cert-auth=true \
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--snapshot-count=10000 \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt &
Then I checked the health of the cluster and it seems to be healthy:
member f13d668ae0cba84 is healthy: got healthy result from https://192.168.22.34:2379
cluster is healthy
I also checked the members:
f13d668ae0cba84: name=test-master-01 peerURLs=http://192.168.22.34:2380 clientURLs=https://192.168.22.34:2379 isLeader=true
Then I tried to add the second member:
etcdctl \
--endpoints=https://127.0.0.1:2379 \
--ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--cert-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key-file=/etc/kubernetes/pki/etcd/healthcheck-client.key \
member add test-master-02 https://192.168.22.37:2380
Added member named test-master-02 with ID 65bec874cca265d8 to cluster
ETCD_NAME="test-master-02"
ETCD_INITIAL_CLUSTER="test-master-01=http://192.168.22.34:2380,test-master-02=https://192.168.22.37:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
Then I started the second etcd member with the following command:
etcd \
--name test-master-02 \
--listen-client-urls https://192.168.22.37:2379 \
--advertise-client-urls https://192.168.22.37:2379 \
--listen-peer-urls https://192.168.22.37:2380 \
--cert-file=/etc/kubernetes/pki/etcd/server.crt \
--client-cert-auth=true \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
--peer-client-cert-auth=true \
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
--key-file=/etc/kubernetes/pki/etcd/server.key \
--initial-cluster-state=existing \
--initial-cluster=test-master-01=https://192.168.22.34:2380,test-master-02=https://192.168.22.37:2380
But I got an error:
etcdmain: error validating peerURLs {ClusterID:bc8c76911939f2de Members:[&{ID:f13d668ae0cba84 RaftAttributes:{PeerURLs:[http://192.168.22.34:2380]} Attributes:{Name:test-master-01 ClientURLs:[https://192.168.22.34:2379]}} &{ID:65bec874cca265d8 RaftAttributes:{PeerURLs:[https://192.168.22.37:2380]} Attributes:{Name: ClientURLs:[]}}] RemovedMemberIDs:[]}: unmatched member while checking PeerURLs
Update
It looks like I don't have this problem when starting a cluster from scratch, without restoring from a snapshot.
I figured out that before adding new members I needed to update my main etcd member, because the member list command reported 127.0.0.1 as its peer URL instead of the address from my etcd config.
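A sketch of that update step (not from the original post), using the v2 etcdctl flags from above and the member ID reported by member list; member update replaces the stale peer URL before the new members are added:
# Fix the peer URL of the existing member before calling "member add"
etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --ca-file=/etc/kubernetes/pki/etcd/ca.crt \
    --cert-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key-file=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    member update f13d668ae0cba84 https://192.168.22.34:2380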

Spark on YARN: Diagnostics: Container killed on request. Exit code is 143

It's a simple Spark job written in Python, and I get a "Diagnostics: Container killed on request. Exit code is 143" error. Here are my code and the script I use to run it.
The size of the input dir is roughly 100 GB, but with a small data file (3 GB) it works fine.
import sys
from pyspark import SparkContext
sc = SparkContext(appName="job_name")
data1 = sc.textFile(sys.argv[1])
d1 = data1.filter(lambda x: "a string" in x)
print(d1.count())
sc.stop()
# --------------- submit script ---------------
input="xxxxx"
output="yyyyy"
hadoop fs -rmr $output
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--master yarn \
--num-executors 100 \
--executor-cores 2 \
--driver-cores 2 \
--executor-memory 8g \
--driver-memory 4g \
stat.py \
$input \
$output
So I wrote a Hadoop Streaming job to do the same thing, and I got:
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
It seems to be the same error as with my Spark code.
Now I suspect that some records in the input may be too large.
Update (2016-12-16)
My Hadoop Streaming job is working ^-^, with a config like this:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.3.2.jar \
-D mapred.reduce.tasks=0 \
-D mapreduce.map.memory.mb=3192 \
-D mapreduce.reduce.memory.mb=3192 \
-D mapreduce.map.java.opts=-Xmx2872m \
-D mapreduce.reduce.java.opts=-Xmx2872m \
-D mapred.child.java.opts=-Xmx2048m \
-D mapreduce.task.io.sort.mb=512 \
-mapper "mapper.py" \
-file mapper.py \
-input $input \
-output $output
But my Spark job still cannot run, no matter how I tune it.
This is the submit config for my Spark job:
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--master yarn \
--num-executors 200 \
--executor-cores 2 \
--driver-cores 2 \
--executor-memory 20g \
--driver-memory 16g \
--verbose \
stat.py \
$input \
$output
Here is my spark-defaults.conf:
spark.yarn.executor.memoryOverhead 4096
spark.yarn.driver.memoryOverhead 4096
spark.memory.storageFraction 0.9
With this information, does anyone know how to tune Spark?
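Not an answer from the thread, but since exit code 143 is just a generic termination signal from YARN, a useful first step is to pull the full container logs for the failed application and look for the real reason (for example an executor exceeding its memory limits). A sketch, where <application_id> is a placeholder for the ID printed by spark-submit:
# Fetch the aggregated YARN container logs and search for the kill reason
yarn logs -applicationId <application_id> | grep -i -B 5 -A 20 "kill"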
