Steps to create an edge node for AWS EMR - amazon-ec2

I have a requirement to create an edge node (ec2) for an AWS EMR cluster. Is there a list of steps that I can follow to make this happen?

Run the commands below as root on your EC2 instance (the edge node). The kylo user below is the service account used in this particular setup; substitute whichever user will run jobs from the edge node.
mkdir -p /usr/lib/spark
mkdir -p /usr/lib/hive-webhcat/share/hcatalog
vi /etc/profile.d/spark.sh
export SPARK_HOME=/usr/lib/spark
export PATH=$SPARK_HOME/bin:$PATH
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_CONF_DIR=/etc/spark/conf
source /etc/profile.d/spark.sh
mkdir -p /etc/hadoop/conf
chown -R kylo:kylo /etc/hadoop/conf
mkdir -p /etc/spark/conf
chown -R kylo:kylo /etc/spark/conf
mkdir -p /usr/share/aws /usr/lib/sqoop /usr/lib/hadoop-yarn /usr/lib/hadoop-mapreduce /usr/lib/hadoop-hdfs /usr/lib/hadoop
chown kylo:kylo /usr/share/aws /usr/lib/sqoop /usr/lib/hadoop-yarn /usr/lib/hadoop-mapreduce /usr/lib/hadoop-hdfs /usr/lib/hadoop
export MASTER_PRIVATE_IP=<MASTER_NODE_IP_ADDRESS>
export PEM_FILE=/home/centos/.ssh/id_rsa
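ssh will refuse a private key with open permissions ("UNPROTECTED PRIVATE KEY FILE"), so if you just copied the key over, lock it down first:
chmod 600 $PEM_FILE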
scp -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/etc/hadoop/conf/core-site.xml /etc/hadoop/conf
scp -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/etc/hadoop/conf/yarn-site.xml /etc/hadoop/conf
scp -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf
scp -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/etc/hadoop/conf/mapred-site.xml /etc/hadoop/conf
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/spark/*' /usr/lib/spark
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/sqoop/*' /usr/lib/sqoop
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/hadoop/*' /usr/lib/hadoop
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/hadoop-yarn/*' /usr/lib/hadoop-yarn
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/hadoop-mapreduce/*' /usr/lib/hadoop-mapreduce
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/lib/hadoop-hdfs/*' /usr/lib/hadoop-hdfs
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/usr/share/aws/*' /usr/share/aws
rsync -avz --delete -e "ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE" hadoop@$MASTER_PRIVATE_IP:'/etc/spark/conf/*' /etc/spark/conf
echo "spark.hadoop.yarn.timeline-service.enabled false" >> /etc/spark/conf/spark-defaults.conf
You might need to ls for this file on the master node since the version could be different
scp -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i $PEM_FILE hadoop@$MASTER_PRIVATE_IP:/usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3-amzn-1.jar /usr/lib/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
You should ls to verify the JAR path
ls /usr/lib/spark/examples/jars/spark-examples_ <HIT TAB>
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --driver-memory 512m --executor-memory 512m --executor-cores 1 /usr/lib/spark/examples/jars/spark-examples_2.11-2.3.1.jar 10
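If only one examples JAR is present, a glob avoids hard-coding the Scala and Spark versions (a convenience sketch; check with ls first that it matches exactly one file):
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --driver-memory 512m --executor-memory 512m --executor-cores 1 /usr/lib/spark/examples/jars/spark-examples_*.jar 10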
Check the YARN UI to verify it ran successfully:
http://<MASTER_NODE>:8088/cluster
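You can also sanity-check the client setup from the edge node itself; a minimal check, assuming the synced Hadoop binaries are on your PATH and the copied configs point at the cluster:
hdfs dfs -ls /
yarn node -list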

Related

Get output from a shell script that does ssh two levels deep

I have two shell scripts like below:
Script1:
ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null my_username@jump_box <<EOF
ls
ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null actual_host <<EOF1
sudo docker ps --format='{{json .}}'
EOF1
EOF
Script2:
details=nothing
ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null my_username@jump_box <<EOF
ls
details=$(ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null actual_host "sudo docker ps --format='{{json .}}'")
EOF
echo "${details}"
I need the docker details in a variable on my local machine so that I can do some operations on it. The first script runs fine and I can see the output of the docker command on my local machine, but the second script doesn't work. It seems to be hung/stuck and doesn't do anything, and I have to forcefully quit it.
As @Gordon Davisson suggested in a comment, use a jump box.
But you can also define it in the ~/.ssh/config file:
Host my_jump_box
    HostName jump_box
    User my_username
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

Host actual
    HostName actual_hostname
    User actual_user
    ProxyJump my_jump_box
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
    RemoteCommand sudo docker ps --format='{{json .}}'
Then you can just use ssh actual.
To fetch the output: details=$(ssh actual).
By the way, your specific problem in script2 is that the heredoc delimiter EOF is unquoted, so the command substitution $(ssh ... actual_host ...) is expanded by your local shell before anything is sent to the jump box; your local machine tries to reach actual_host directly and blocks. The simplest fix is to change script2 to:
#!/bin/bash
details=$(./script1)
echo "$details"
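If you prefer not to touch ~/.ssh/config, the same thing works as a one-liner with ssh -J (OpenSSH 7.3+); a minimal sketch, assuming the host and user names from the config above:
details=$(ssh -q -J my_username@jump_box actual_user@actual_hostname "sudo docker ps --format='{{json .}}'")
echo "$details"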

Unable to start a java process using a gitlab ci pipeline to deploy on EC2 instance

What I'm trying to do is deploy a Java application on an EC2 instance using a GitLab CI pipeline.
After copying the .jar file, it has to start the process.
The deploy steps are the next ones:
deploy:
  stage: 'deploy'
  image: ubuntu
  before_script:
    - apt update
    - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client -y )'
    - eval $(ssh-agent -s)
    - echo "$SSH_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
  script:
    - scp -r -o StrictHostKeyChecking=no target/file.jar user@ip:/home/ubuntu/jars
    - ssh -o StrictHostKeyChecking=no user@ip "sudo pkill -f file.jar"
    - ssh -o StrictHostKeyChecking=no user@ip "sudo nohup java -jar -Dspring.profiles.active=dev -Xms100m -Xmx150m /home/ubuntu/jars/file.jar > /home/ubuntu/jars/file.log 2>&1 &"
In the end the job finishes with a success status, but it does not start the process.
Do you have any ideas how I can fix this issue?
Log:
$ scp -r -o StrictHostKeyChecking=no target/file.jar ubuntu@ip:/home/ubuntu/jars
Warning: Permanently added 'ip' (ECDSA) to the list of known hosts.
$ ssh -o StrictHostKeyChecking=no ubuntu@ip "sudo ps -ef | grep java"
root 9781 1 1 11:15 ? 00:00:27 java -jar -Dspring.profiles.active=dev -Xms100m -Xmx150m /home/ubuntu/jars/file.jar
ubuntu 9991 9990 0 11:41 ? 00:00:00 bash -c sudo ps -ef | grep java
ubuntu 9993 9991 0 11:41 ? 00:00:00 grep java
$ ssh -o StrictHostKeyChecking=no ubuntu@ip "sudo pkill -f file.jar"
$ ssh -o StrictHostKeyChecking=no ubuntu@ip "sudo nohup java -jar -Dspring.profiles.active=dev -Xms100m -Xmx150m /home/ubuntu/jars/file.jar > /home/ubuntu/jars/file.log 2>&1 &"
$ ssh -o StrictHostKeyChecking=no ubuntu@ip "cd /home/ubuntu/jars; ls"
job
file.jar
file.log
Running after_script
Saving cache
Uploading artifacts for successful job
Job succeeded
Update!
Using this command, it worked perfectly:
- ssh -o StrictHostKeyChecking=no ubuntu@ip "sudo su root -c 'nohup java -jar -Dspring.profiles.active=dev -Xms100m -Xmx150m /home/ubuntu/jars/file.jar > /home/ubuntu/jars/file.log 2>&1 &'"
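A plausible explanation: when the ssh command returns, sshd tears down the session, and the backgrounded java process can be reaped along with it; running it under su gives it a fresh session. An alternative sketch that detaches the process explicitly with setsid and closes stdin (untested against this exact setup):
- ssh -o StrictHostKeyChecking=no ubuntu@ip "sudo setsid nohup java -jar -Dspring.profiles.active=dev -Xms100m -Xmx150m /home/ubuntu/jars/file.jar > /home/ubuntu/jars/file.log 2>&1 < /dev/null &"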

(Error while running a command with SSH) command-line: line 0: Bad configuration option

Error Msg:
command-line: line 0: Bad configuration option:
sh '''ssh -i ${rundeck_rsa_key} -o StrictHostKeyChecking=no -o centos@xxxx.net "sudo su -c "sh ./home/centos/releases/xx.sh" rundeck"'''
Broken-down command (I just reformatted the above for convenience):
sh '''ssh -i ${rundeck_rsa_key} -o StrictHostKeyChecking=no
-o centos@xxxx.net "sudo su -c "sh ./home/centos/releases/xx.sh" servc"'''
I'm trying to:
ssh into the server
change user to "servc"
execute the xx.sh shell script
I think there is a syntax error in "sudo su -c "sh ./home/centos/releases/xx.sh" servc"
Do you have any clue?? :D
You can't nest a double-quoted string inside another without escaping the inner quotes. Also note that the stray -o before the hostname is what actually triggers the "Bad configuration option" message: -o expects an ssh configuration option, so ssh tries to parse centos@xxxx.net as one. Drop it and escape the inner quotes.
Try this:
sh '''ssh -i ${rundeck_rsa_key} -o StrictHostKeyChecking=no centos@xxxx.net "sudo su -c \"sh ./home/centos/releases/xx.sh\" rundeck"'''
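An alternative that sidesteps the nested quoting entirely is sudo -u, assuming the sudoers policy lets the connecting user run commands as servc (hypothetical for this setup):
sh '''ssh -i ${rundeck_rsa_key} -o StrictHostKeyChecking=no centos@xxxx.net "sudo -u servc sh ./home/centos/releases/xx.sh"'''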

Weird output observed on executing ssh commands remotely over ProxyCommand

Team, I have two steps to perform:
SCP a shell script file to a remote Ubuntu Linux machine
Execute the uploaded file on the remote machine over an SSH session using ProxyCommand, because I have a bastion server in front.
Code:
scp -i /home/dtlu/.ssh/key.key -o "ProxyCommand ssh -i /home/dtlu/.ssh/key.key lab@api.dev.test.com -W %h:%p" /home/dtlu/backup/test.sh lab@$k8s_node_ip:/tmp/
ssh -o StrictHostKeyChecking=no -i /home/dtlu/.ssh/key.key -o 'ProxyCommand ssh -i /home/dtlu/.ssh/key.key -W %h:%p lab@api.dev.test.com' lab@$k8s_node_ip "uname -a; date;echo "Dummy123!" | sudo -S bash -c 'echo 127.0.1.1 \`hostname\` >> /etc/hosts'; cd /tmp; pwd; systemctl status cachefilesd | grep Active; ls -ltr /tmp/test.sh; echo "Dummy123!" | sudo -Sv && bash -s < test.sh"
Both calls above work fine. I am able to upload test.sh and it also runs, but what is bothering me is the weird output thrown out during the process.
output:
/tmp. <<< expected
[sudo] password for lab: Showing one
Sent message type=method_call sender=n/a destination=org.freedesktop.DBus object=/org/freedesktop/DBus interface=org.freedesktop.DBus member=Hello cookie=1 reply_cookie=0 error=n/a
Root directory /run/log/journal added.
Considering /run/log/journal/df22e14b1f83428292fe17f518feaebb.
Directory /run/log/journal/df22e14b1f83428292fe17f518feaebb added.
File /run/log/journal/df22e14b1f83428292fe17f518feaebb/system.journal added.
So, I don't want /run/log/journal and the other lines which don't correspond to the commands in my script.
Consider adding -q to the scp and ssh commands to reduce the output they might produce. You can also redirect stderr and stdout to /dev/null as appropriate.
For example:
{
scp -q -i /home/dtlu/.ssh/key.key -o "ProxyCommand ssh -i /home/dtlu/.ssh/key.key lab@api.dev.test.com -W %h:%p" /home/dtlu/backup/test.sh lab@$k8s_node_ip:/tmp/
ssh -q -o StrictHostKeyChecking=no -i /home/dtlu/.ssh/key.key -o 'ProxyCommand ssh -i /home/dtlu/.ssh/key.key -W %h:%p lab@api.dev.test.com' lab@$k8s_node_ip "uname -a; date;echo "Dummy123!" | sudo -S bash -c 'echo 127.0.1.1 \`hostname\` >> /etc/hosts'; cd /tmp; pwd; systemctl status cachefilesd | grep Active; ls -ltr /tmp/test.sh; echo "Dummy123!" | sudo -Sv && bash -s < test.sh"
} >&/dev/null
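Note that redirecting the whole group to /dev/null also hides output you do want, such as pwd and the systemctl check. If the journal chatter arrives on stderr (a reasonable assumption for systemd debug messages), a gentler variant keeps stdout visible by changing only the final redirection:
} 2>/dev/null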

rsync with ssh without using credentials stored in ~/.ssh/config

I have a script that transfers files. Every time I run it, it needs to connect to a different host. That's why I'm adding the host as a parameter.
The script is executed as: ./transfer.sh <hostname>
#!/bin/bash -evx
SSH="ssh \
-o UseRoaming=no \
-o UserKnownHostsFile=/dev/null \
-o StrictHostKeyChecking=no \
-i ~/.ssh/privateKey.pem \
-l ec2-user \
${1}"
files=(
file1
file2
)
files="${files[@]}"
# this works
$SSH
# this does not work
rsync -avzh --stats --progress $files -e $SSH:/home/ec2-user/
# also this does not work
rsync -avzh --stats --progress $files -e $SSH ec2-user#$1:/home/ec2-user/
I can properly connect with the ssh command stored in $SSH, but the rsync connection attempts fail because of the wrong key:
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
What would be the correct syntax for the rsync connection?
Put set -x before the rsync line and watch how the arguments are expanded. I believe it will be wrong.
You need to enclose the ssh command with its arguments (without the hostname) in quotes; otherwise the arguments get passed to the rsync command and not to ssh.
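With set -x you would see the unquoted expansion split into separate words, roughly like this (illustrative, not a verbatim trace):
+ rsync -avzh --stats --progress file1 file2 -e ssh -o UseRoaming=no -o UserKnownHostsFile=/dev/null ...
Only the bare word ssh is bound to -e; everything after it is parsed by rsync itself (rsync even has its own -o, meaning --owner), which is why the connection goes out without your key.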
My solution after Jakuje pointed me in the right direction:
#!/bin/bash -evx
host=$1
SSH="ssh \
-o UseRoaming=no \
-o UserKnownHostsFile=/dev/null \
-o StrictHostKeyChecking=no \
-i ~/.ssh/privateKey.pem \
-l ec2-user"
files=(
file1
file2
)
files="${files[@]}"
# transfer all in one rsync connection
rsync -avzh --stats --progress $files -e "$SSH" $host:/home/ec2-user/
# launch setup script
$SSH $host ./setup.sh
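One caveat in this working version: files="${files[@]}" flattens the array into a single space-separated string, so the unquoted $files later relies on word splitting and breaks for filenames containing spaces. A variant that keeps the array intact (drop the flattening assignment):
rsync -avzh --stats --progress -e "$SSH" "${files[@]}" "$host":/home/ec2-user/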
