Executing hadoopfs in shell script - bash

I'm trying to run a bash script containing the following:
#!/bin/bash
CURRENT_HDFS_PATH=`hadoopfs -ls -t -r /$CLEAN_HDFS_PATH | tail -1 | awk -F ' ' '{print $8}'`
echo "Here is the last (most current) file in the history folder to be downloaded=$CURRENT_HDFS_PATH"
The above does not produce any result at all. Please note that CLEAN_HDFS_PATH=/temp/local-*.inprogress
When I use the following in command line:
hadoopfs -ls -t -r '/temp/local-*.inprogress' | tail -1 | awk -F ' ' '{print $8}'
I get the answer from the command line.
What am I doing wrong in my script?
Cheers,

Is the name of the file literally local-*.inprogress? If so, your problem is wildcard expansion within the script. Add double quotes around the variable and see if that works:
CURRENT_HDFS_PATH=`hadoopfs -ls -t -r "/$CLEAN_HDFS_PATH" | tail -1 | awk -F ' ' '{print $8}'`
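To see the difference without hadoop at all, here is a minimal sketch (the file names are invented for the demo): an unquoted variable containing a wildcard is glob-expanded by the local shell before the command ever sees it, while a quoted one passes the literal pattern through for hadoop to expand.

```shell
# Demo: local glob expansion vs. passing the pattern through literally.
cd "$(mktemp -d)"
touch local-1.inprogress local-2.inprogress
PATTERN='local-*.inprogress'
echo unquoted: $PATTERN      # the local shell expands the glob first
echo quoted: "$PATTERN"      # the literal pattern survives
```

This prints `unquoted: local-1.inprogress local-2.inprogress` on the first line and `quoted: local-*.inprogress` on the second.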

Related

syntax error near unexpected token `('

Command below does not run from script:
zcat *|cut -d"," -f1,2 | tr -d "\r" |
awk -F "," '{if (\$1 =="\"word\"" || \$1 =="\"word2\""){printf "\n%s",\$0}else{printf "%s",\$0}}' |
grep -i "resultCode>00000" | wc -l
Error:
./script.sh: command substitution: line 8: syntax error near unexpected token `('
./script.sh: command substitution: line 8: `ssh -t user#ip 'cd "$(ls -td path/* | tail -n1)" && zcat *|cut -d"," -f1,2 | tr -d "\r" | awk -F "," '{if ($1 =="\"word\"" || $1 =="\"word2\""){printf "\n\%s",$0}else{printf "\%s",$0}}'| grep -i "resultCode>00000" | wc -l''
How should I fix the syntax error near the unexpected token?
ssh -t user#ip 'cd "$(ls -td path/* | tail -n1)" &&
zcat *|cut -d"," -f1,2 | tr -d "\r" |
awk -F "," '{if ($1 =="\"word\"" || $1 =="\"word2\""){
printf "\n\%s",$0}else{printf "\%s",$0}}'|
grep -i "resultCode>00000" | wc -l''
There's a mountain of syntax errors here. First off, you can't nest single quotes like this: ''''. That's two single-quoted empty strings next to each other, not single quotes inside single quotes. In fact, there is no way to have single quotes inside single quotes. (It is possible to get them there by other means, e.g. by switching to double quotes.)
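A tiny illustration of the close-escape-reopen trick for getting a literal single quote "inside" a single-quoted string:

```shell
# 'it'\''s' is really three concatenated pieces: 'it' + \' + 's'.
printf '%s\n' 'it'\''s'
```

This prints `it's`.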
If you don't have any particular reason to run all of these commands remotely, the simplest fix is probably to just run the zcat in SSH, and have the rest of the pipeline run locally. If the output from zcat is massive, there could be good reasons to avoid sending it all over the SSH connection, but let's just figure out a way to fix this first.
ssh -t user#ip 'cd "$(ls -td path/* | tail -n1)" && zcat *' |
cut -d"," -f1,2 | tr -d "\r" |
awk -F "," '{if ($1 =="\"word\"" || $1 =="\"word2\""){
printf "\n\%s",$0}else{printf "\%s",$0}}'|
grep -i "resultCode>00000" | wc -l
But of course, you can replace grep | wc -l with grep -c, and probably refactor all of the rest into your Awk script.
ssh -t user#ip 'cd "$(ls -td path/* | tail -n1)" && zcat *' |
awk -F "," '$1 ~ /^\"(word|word2)\"$/ { printf "\n%s,%s", $1, $2; next }
{ printf "%s,%s", $1, $2 }
END { printf "\n" }' |
grep -ic "resultCode>00000"
The final grep can probably also be refactored into the Awk script, but without more knowledge of what your expected input looks like, I would have to guess too many things. (This already rests on some possibly incorrect assumptions.)
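As a quick check of the grep -c substitution mentioned above (sample input invented), the two pipelines produce the same count:

```shell
# grep -c counts matching lines directly, same result as grep | wc -l.
printf 'a\nresultCode>00000\nb\nresultCode>00000\n' | grep -c 'resultCode>00000'
printf 'a\nresultCode>00000\nb\nresultCode>00000\n' | grep 'resultCode>00000' | wc -l
```

Both commands report 2 here.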
If you want to run all of this remotely, the second simplest fix is probably to pass the script as a here document to SSH.
ssh -t user#ip <<\:
cd "$(ls -td path/* | tail -n1)" &&
zcat * |
awk -F "," '$1 ~ /^\"(word|word2)\"$/ { printf "\n%s,%s", $1, $2; next }
{ printf "%s,%s", $1, $2 } END { printf "\n" }' |
grep -ic "resultCode>00000"
:
where again my refactoring of your Awk script may or may not be an oversimplification which doesn't do exactly what your original code did. (In particular, removing DOS carriage returns from the end of the line seems superfluous if you are only examining the first two fields of the input; but perhaps there can be lines which only have two fields, which need to have the carriage returns trimmed. That's easy in Awk as such; sub(/\r/, "").)
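For completeness, a sketch of doing the carriage-return trimming inside Awk itself instead of a separate tr -d "\r" stage (the sample input here is made up):

```shell
# sub(/\r$/, "") strips a trailing DOS carriage return from each record;
# the "|" marker makes it visible that the CR is gone before end-of-line.
printf 'a,b\r\nc,d\r\n' | awk '{ sub(/\r$/, ""); print $0 "|" }'
```

This prints `a,b|` and `c,d|`, with no stray carriage return before the marker.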

How to add a shell script to a Jenkins pipeline

I have the below shell script:
du -sh /bbhome/shared/data/repositories/* |sort -h |tail -20 |
while IFS= read -r line;do
DIR=`echo $line | awk '{print$2}'`
Rep=`cat $DIR/repository-config |grep 'project\|repo' | tr '\n' ' '`
Size=`echo $line | awk '{print $1}' `
echo $Size $Rep
done
How can I run it through Execute shell in Jenkins? I also need to add an ssh command to the env (no need for a password).
Note I don't want to connect to the env and run this shell there, but directly from the Execute shell box.
If I'm not wrong you are using a Freestyle job and not a pipeline job.
Anyway, I think you have to try the following:
ssh -t XXXXX#YYYYY << 'EOF'
du -sh /bbhome/shared/data/repositories/* |sort -h |tail -20 |
while IFS= read -r line;do\
DIR=`echo $line | awk '{print$2}'`\
Rep=`cat $DIR/repository-config |grep 'project\|repo' | tr '\n' ' '`\
Size=`echo $line | awk '{print $1}'`\
echo $Size $Rep\
done
EOF
I've escaped the code inside your while loop using \; if it doesn't work you can use ; instead.
If you want help with using a pipeline job, let me know, but it might be a bit more complex.
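One detail worth noting in the snippet above is the quoted here-document delimiter ('EOF'): with an unquoted delimiter the local shell would expand $DIR, $line etc. before ssh ever runs. A minimal local demonstration (variable name and value invented):

```shell
# Unquoted delimiter: the local shell expands variables inside the heredoc.
# Quoted delimiter: the text is passed through literally.
DIR=/local/value
cat <<EOF
unquoted delimiter: DIR is $DIR
EOF
cat <<'EOF'
quoted delimiter: DIR is $DIR
EOF
```

The first cat prints the expanded value; the second prints the literal text `$DIR`.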

grep search with filename as parameter

I'm working on a shell script.
OUT=$1
here, the OUT variable is my filename.
I'm using grep search as follows:
l=`grep "$pattern " -A 15 $OUT | grep -w $i | awk '{print $8}'|tail -1 | tr '\n' ','`
The issue is that the filename parameter I must pass is test.log. However, I have the folder structure:
test.log
test.log.001
test.log.002
I would ideally like to pass the filename as test.log and have it search all the log files. I know the usual way is to use test.log.* on the command line, but I'm having difficulty replicating that in a shell script.
My efforts:
var=$'.*'
l=`grep "$pattern " -A 15 $OUT$var | grep -w $i | awk '{print $8}'|tail -1 | tr '\n' ','`
However, I did not get the desired result.
Hopefully this will get you closer:
#!/bin/bash
for f in "${1}"*; do
    grep "$pattern" -A15 "$f"
done | grep -w "$i" | awk 'END{print $8}'
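The key point is that the * must stay outside the quotes: "$1"* globs, while "$1*" is a single literal string. A quick check with invented file names:

```shell
# Create test.log plus its rotated siblings and confirm "$1"* matches all.
cd "$(mktemp -d)"
touch test.log test.log.001 test.log.002
set -- test.log            # simulate the script's first argument
printf '%s\n' "$1"*
```

This prints test.log, test.log.001 and test.log.002, one per line.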

How to output awk result to a variable

I need to run a hadoop command to list all live nodes, then reformat the output with awk and store the result in a variable; awk uses a different delimiter each time I call it:
hadoop job -list-active-trackers | sort | awk -F. '{print $1}' | awk -F_ '{print $2}'
it outputs result like this:
hadoop-dn-11
hadoop-dn-12
...
Then I put the whole command in a variable to print the result line by line:
var=$(sudo -H -u hadoop bash -c "hadoop job -list-active-trackers | sort | awk -F "." '{print $1}' | awk -F "_" '{print $2}'")
printf %s "$var" | while IFS= read -r line
do
echo "$line"
done
The awk -F didn't work; it outputs the result as:
tracker_hadoop-dn-1.xx.xsy.interanl:localhost/127.0.0.1:9990
tracker_hadoop-dn-1.xx.xsy.interanl:localhost/127.0.0.1:9390
Why doesn't awk with -F work correctly, and how can I fix it?
var=$(sudo -H -u hadoop bash -c "hadoop job -list-active-trackers | sort | awk -F "." '{print $1}' | awk -F "_" '{print $2}'")
Because you're enclosing the whole command in double quotes, your shell is expanding the variables $1 and $2 before launching sudo. This is what the sudo command looks like (I'm assuming $1 and $2 are empty)
sudo -H -u hadoop bash -c "hadoop job -list-active-trackers | sort | awk -F . '{print }' | awk -F _ '{print }'"
So, you see your awk commands are printing the whole line instead of just the first and 2nd fields respectively.
This is merely a quoting challenge.
var=$(sudo -H -u hadoop bash -c 'hadoop job -list-active-trackers | sort | awk -F "." '\''{print $1}'\'' | awk -F "_" '\''{print $2}'\')
A bash single quoted string cannot contain single quotes, so that's why you see ...'\''... -- to close the string, concatenate a literal single quote, then re-open the string.
Another way is to escape the vars and inner double quotes:
var=$(sudo -H -u hadoop bash -c "hadoop job -list-active-trackers | sort | awk -F \".\" '{print \$1}' | awk -F \"_\" '{print \$2}'")
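A minimal reproduction of the underlying problem, without sudo or hadoop: inside double quotes the *outer* shell expands $1 (unset here, so empty) before bash -c ever runs, while escaping it with \$1 preserves it for the inner awk.

```shell
set --   # make sure $1 is unset for the demo
# broken: the outer shell turns '{print $1}' into '{print }' before bash -c runs
broken=$(echo 'a.b.c' | bash -c "awk -F . '{print $1}'")
# fixed: \$1 survives the outer expansion, so the inner awk sees {print $1}
fixed=$(echo 'a.b.c' | bash -c "awk -F . '{print \$1}'")
echo "broken=$broken fixed=$fixed"
```

This prints `broken=a.b.c fixed=a`: the broken form prints the whole line, the fixed form prints only the first field.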

Assigning deciles using bash

I'm learning bash, and here's a short script to assign deciles to the second column of file $1.
The complicating bit is the use of awk within the script, leading to ambiguous redirects when I run the script.
I would have gotten this done in SAS by now, but like the idea of two lines of code doing the job.
How can I communicate the total number of rows (${N}) to awk within the script? Thanks.
N=$(wc -l < $1)
cat $1 | sort -t' ' -k2gr,2 | awk '{$3=int((((NR-1)*10.0)/"${N}")+1);print $0}'
You can set an awk variable from the command line using -v.
N=$(wc -l < "$1" | tr -d ' ')
sort -t' ' -k2gr,2 "$1" | awk -v n=$N '{$3=int((((NR-1)*10.0)/n)+1);print $0}'
I added tr -d to get rid of the leading spaces that wc -l puts in its result.
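A self-contained sketch of the -v mechanism, applying the same decile formula to invented sample data (four rows, so n=4):

```shell
# awk -v n="$N" makes the shell variable available inside the awk program.
N=4
printf '%s\n' 10 20 30 40 | awk -v n="$N" '{ printf "%d\n", int((((NR-1)*10.0)/n)+1) }'
```

With four input rows this prints 1, 3, 6 and 8, one per line.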
