run hadoop command in bash script

I need to run a hadoop command in a bash script that goes through a bunch of folders on Amazon S3, writes those folder names into a txt file, and then does further processing. The problem is that when I ran the script, it seems no folder names were written to the txt file. I wonder whether the hadoop command takes too long to run and the bash script doesn't wait for it to finish before going ahead with the further processing. If so, how can I make bash wait until the hadoop command has finished before doing the other processing?
Here is my code. I tried both ways; neither works:
1.
listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY#$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print $6}' | cut -f 4- -d / > $FILE_NAME"
echo -e "listing... $listCmd\n"
eval $listCmd
...other process ...
2.
echo -e "list the folders we want to copy into a file"
hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY#$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print $6}' | cut -f 4- -d / > $FILE_NAME
... other process ....
Does anyone know what might be wrong? And is it better to use eval or just run the hadoop command directly, as in the second way?
Thanks.

I would prefer eval in this case; it is prettier to append the next command to this one. I would also break listCmd down into parts, so that you know nothing is going wrong at the grep, awk or cut level.
listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY#$S3_BUCKET/*/*/$mydate > $raw_File"
gcmd="cat $raw_File | grep s3n | awk -F' ' '{print $6}' | cut -f 4- -d / > $FILE_NAME"
echo "Running $listCmd and other commands after that"
otherCmd="cat $FILE_NAME"
eval "$listCmd";
echo $? # This will print the exit status of the $listCmd
eval "$gcmd" && echo "Finished Listing" && eval "$otherCmd"
otherCmd will only be executed if $gcmd succeeds. If you have too many commands that you need to execute, then this becomes a bit ugly. If you roughly know how long it will take, you can insert a sleep command.
eval "$listCmd"
sleep 1800 # This will sleep 1800 seconds
eval "$otherCmd"

Related

Dockerfile RUN echo command runs with /bin/sh instead of /bin/sh -c

This Dockerfile has a Spark download path; it retrieves the file name from it and installs Spark. However, the script fails at the first echo command. It looks like the echo is being run as /bin/sh instead of /bin/sh -c.
How can I execute this echo command using /bin/sh -c? Is this the correct way to implement it? I'm planning to use the same logic for other installations such as Mongo, Node, etc.
FROM ubuntu:18.04
ARG SPARK_FILE_LOCATION="http://www.us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz"
CHAR_COUNT=`echo "${SPARK_FILE_LOCATION}" | awk -F"${DELIMITER}" '{print NF-1}'`
RUN echo $CHAR_COUNT
RUN CHAR_COUNT=`expr $CHAR_COUNT + 1`
RUN SPARK_FILE_NAME=`echo ${SPARK_FILE_LOCATION} | cut -f${CHAR_COUNT} -d"/"`
RUN Dir_name=`tar -tzf $SPARK_FILE_NAME | head -1 | cut -f1 -d"/"`
RUN echo Dir_name
/bin/sh: 1: 'echo http://www.us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz | awk -F/ "{print NF-1}"': not found
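Part of what is going on here: every RUN instruction is executed by a fresh /bin/sh -c, so a shell variable set in one RUN line does not exist in the next one, and the CHAR_COUNT= line without a RUN prefix is not a valid Dockerfile instruction at all. A minimal sketch of keeping the whole derivation inside a single RUN (the download and untar steps would go into the same RUN, after installing wget or curl, which the base image does not ship with):
FROM ubuntu:18.04
ARG SPARK_FILE_LOCATION="http://www.us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz"
# one RUN, one shell: the variable survives from step to step
RUN SPARK_FILE_NAME=$(basename "${SPARK_FILE_LOCATION}") && \
    echo "Archive name: ${SPARK_FILE_NAME}"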

Bash- Running a command on each grep correspondence without stopping tail -n0 -f

I'm currently monitoring a log file, and my ultimate goal is to write a script that uses tail -n0 -f and executes a certain command whenever grep finds a match. My current code:
tail -n 0 -f $logfile | grep -q $pattern && echo $warning > $anotherlogfile
This works but only once, since grep -q stops when it finds a match. The script must keep searching and running the command, so I can update a status log and run another script to automatically fix the problem. Can you give me a hint?
Thanks
Use a while loop:
tail -n 0 -f "$logfile" | while read LINE; do
echo "$LINE" | grep -q "$pattern" && echo "$warning" > "$anotherlogfile"
done
awk will let us continue to process lines and take actions when a pattern is found. Something like:
tail -n0 -f "$logfile" | awk -v pattern="$pattern" '$0 ~ pattern {print "WARN" >> "anotherLogFile"}'
If you need to pass in the warning message and the path to anotherLogFile, you can use more -v flags to awk. You could also have awk take the action itself: it can run commands via the system() function, where you pass it the shell command to run.
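For example, a sketch that passes both the warning text and the output path in with -v and triggers a hypothetical fix script through system() (the script path is illustrative):
tail -n 0 -f "$logfile" | awk -v pattern="$pattern" -v warn="$warning" -v out="$anotherlogfile" '
    $0 ~ pattern {
        print warn >> out           # append the warning to the status log
        fflush()                    # flush so the log updates immediately
        system("/path/to/fix.sh")   # hypothetical fix script, replace with your own
    }'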

Crontab can't execute command in script

I have a shell script I've written that deletes the oldest log file in a directory when the mount point reaches 90% capacity. When I run the script manually it works fine, but when I run it from crontab it cannot seem to execute the actual rm command, even though it executes everything else in the script. See my crontab entry and script below.
0 * * * * /acsmgmt/iselogs/iselogcleanup.sh
#!/bin/bash
df -H | grep /acsmgmt | awk '{ print $4 " " $5 }' | while read output;
do
#!echo $output
usep=$(echo $output | awk '{ print $1 }' | cut -d '%' -f1)
#!echo $usep
if [ $usep -ge 90 ]; then
echo $(date) "Logs cleaned up" >> /tmp/isecleanup.log
rm -v `ls /acsmgmt/iselogs -rt | grep "iselog-" | head -1` >> /tmp/isecleanup.log
else
echo $(date) "No logs to clean up" >> /tmp/isecleanup.log
fi
done
So, the answer is indeed, as I suspected, to always make sure you specify a correct and complete PATH variable in any script called by cron.
(I keep making this same mistake myself, even after years of writing cron scripts -- some versions of cron allow you to specify a default PATH (and other environment variables) for all your scripts, and this can help, but it also needs careful maintenance.)
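A minimal sketch of both options (the PATH value below is a typical default; adjust it for your system). At the top of the script itself:
#!/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
export PATH
# ... rest of the cleanup script ...
Or at the top of the crontab, before the job entries:
PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
0 * * * * /acsmgmt/iselogs/iselogcleanup.sh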

Bash + SSH + Grep not generating output

I'm trying to use the script below to extract values from the df command on remote servers, then record them to a log file. SSH keys are in place and no password is needed (that is not the problem).
It's getting hung up, however, and not spitting back any output.
#!/bin/bash
PATH=/bin:/usr/bin:/usr/sbin
export PATH
SERVERLIST=/opt/scripts/server-list.dat
while IFS='|' read -u 3 hostname; do
echo evaluating $hostname...
SIZE=$(ssh $hostname | df -Pkhl | grep '/Volumes/UserStorage$' | awk '{print $2}')
echo $SIZE
done 3< $SERVERLIST
exit 0
You need to run df on the remote system, not pipe the output of an interactive ssh to it:
SIZE=$(ssh -n $hostname df -Pkhl | grep '/Volumes/UserStorage$' | awk '{print $2}')
Also, use the -n option to ssh to keep it from trying to read from stdin, which would consume the rest of the lines from the server list file.
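Putting both fixes together, a sketch of the whole loop (the log file path is illustrative):
#!/bin/bash
PATH=/bin:/usr/bin:/usr/sbin
export PATH

SERVERLIST=/opt/scripts/server-list.dat
LOGFILE=/opt/scripts/user-storage.log   # illustrative path

while IFS='|' read -u 3 hostname; do
    echo "evaluating $hostname..."
    # -n keeps ssh from reading (and consuming) the rest of the server list on stdin;
    # df runs on the remote host and its output comes back over the connection
    SIZE=$(ssh -n "$hostname" df -Pkhl | grep '/Volumes/UserStorage$' | awk '{print $2}')
    echo "$hostname $SIZE" >> "$LOGFILE"
done 3< "$SERVERLIST"
exit 0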

Grep output of command and use it in "if" statement, bash

Okay so here's another one about the StarMade server.
Previously I had this script for detecting a crash; it would simply search through the logs:
#!/bin/bash
cd "$(dirname "$0")"
if ( grep "[SERVER] SERVER SHUTDOWN" log.txt.0); then
sleep 7; kill -9 $(ps -aef | grep -v grep | grep 'StarMade.jar' | awk '{print $2}')
fi
It would find "[SERVER] SERVER SHUTDOWN" and kill the process after that, however this is not a waterproof method, because with different errors it could be possible that the message doesn't appear, rendering this script useless.
So I have this tool that can send commands to the server, but returns an EOF exception when the server is in a crashed state. I basically want to grab the output of this command, and use it in the if-statement above, instead of the current grep command, in such a way that it would execute the commands below when the grep finds "java.io.EOFException".
I could make it write the output to a file and then grep it from there, but I wonder, isn't there a better/more efficient method to do this?
EDIT: okay, so after a bit of searching I put together the following:
if ( java -jar /home/starmade/StarMade/StarNet.jar xxxxx xxxxx /chat) 2>&1 > /dev/null |grep java.io.EOFException);
Would this be a valid if-statement? I need it to match "java.io.EOFException" in the output of the first command, and if it matches, to execute something with "then" (got that part working).
Not sure this solves your problem, but this line:
ps -aef | grep -v grep | grep 'StarMade.jar' | awk '{print $2}'
could be changed to
ps -aef | awk '/[S]tarMade.jar/ {print $2}'
The [S] prevents awk from finding itself.
Or just use this to get the pid:
pidof StarMade.jar
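As for the if-statement in the edit, a sketch of how it might look, assuming StarNet.jar prints the stack trace on stdout or stderr (the xxxxx placeholders are kept from the question):
if java -jar /home/starmade/StarMade/StarNet.jar xxxxx xxxxx /chat 2>&1 | grep -q 'java\.io\.EOFException'; then
    sleep 7
    kill -9 $(ps -aef | awk '/[S]tarMade.jar/ {print $2}')
fi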
