Run a bash command and split the output by line, not by space

I want to run the following command and get the output into a variable, split into an array by line, not by space:
files=$( hdfs dfs -ls -R $hdfsDir)
The output I get from echo $files is the following:
drwxr-xr-x - pepeuser supergroup 0 2016-05-27 15:03 /user/some/kpi/2015/01/02 -rw-r--r-- 3 pepeuser supergroup 55107934 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00000902148 -rw-r--r-- 3 pepeuser supergroup 49225279 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00001902148
When I do a for over $files, instead of getting the full line in each iteration, I get the individual columns. It prints like the following:
drwxr-xr-x
-
pepeuser
supergroup
What I need the for loop to print is this:
drwxr-xr-x - pepeuser supergroup 0 2016-05-27 15:03 /user/some/kpi/2015/01/02
-rw-r--r-- 3 pepeuser supergroup 55107934 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00000902148
-rw-r--r-- 3 pepeuser supergroup 49225279 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00001902148

If you have bash 4, you can use readarray:
readarray -t files < <(hdfs dfs -ls -R "$hdfsDir")
Otherwise, use read -a to read into an array. IFS=$'\n' sets the field separator to newlines and -d '' tells it to keep reading until it hits a NUL character: effectively, that means it'll read to EOF.
IFS=$'\n' read -d '' -r -a files < <(hdfs dfs -ls -R "$hdfsDir")
You can verify that the array is populated correctly with something like:
printf '[%s]\n' "${files[@]}"
And can loop over the array with:
for file in "${files[@]}"; do
echo "$file"
done
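As a self-contained illustration, here is a runnable sketch of the readarray approach that substitutes a mock_ls function (a stand-in defined here, since hdfs is not assumed to be available) for the real hdfs dfs -ls -R call:

```shell
#!/usr/bin/env bash
# Stand-in for `hdfs dfs -ls -R "$hdfsDir"`: prints two ls-style lines.
mock_ls() {
  printf '%s\n' \
    'drwxr-xr-x - pepeuser supergroup 0 2016-05-27 15:03 /user/some/kpi/2015/01/02' \
    '-rw-r--r-- 3 pepeuser supergroup 55107934 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00000902148'
}

# readarray -t puts each output line into one array element, minus the newline.
readarray -t files < <(mock_ls)

echo "count: ${#files[@]}"
for file in "${files[@]}"; do
  echo "line: $file"
done
```

Swap mock_ls back for the real hdfs command once you are on the cluster.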

Does anybody have a script that counts the number of consecutive files which contain a specific word?

Any resources or advice would help, since I am pretty rubbish at scripting
So, I need to go to this path: /home/client/data/storage/customer/data/2020/09/15
And check to see if there are 5 or more consecutive files that contain the word "REJECTED":
ls -ltr
-rw-rw-r-- 1 root root 5059 Sep 15 00:05 customer_rlt_20200915000514737_20200915000547948_8206b49d-b585-4360-8da0-e90b8081a399.zip
-rw-rw-r-- 1 root root 5023 Sep 15 00:06 customer_rlt_20200915000547619_20200915000635576_900b44dc-1cf4-4b1b-a04f-0fd963591e5f.zip
-rw-rw-r-- 1 root root 39856 Sep 15 00:09 customer_rlt_20200915000824108_20200915000908982_b87b01b3-a5dc-4a80-b19d-14f31ff667bc.zip
-rw-rw-r-- 1 root root 39719 Sep 15 00:09 customer_rlt_20200915000901688_20200915000938206_38261b59-8ebc-4f9f-9e2d-3e32eca3fd4d.zip
-rw-rw-r-- 1 root root 12829 Sep 15 00:13 customer_rlt_20200915001229811_20200915001334327_1667be2f-f1a7-41ae-b9ca-e7103d9abbf8.zip
-rw-rw-r-- 1 root root 12706 Sep 15 00:13 customer_rlt_20200915001333922_20200915001357405_609195c9-f23a-4984-936f-1a0903a35c07.zip
Example of rejected file:
customer_rlt_20200513202515792_20200513202705506_5b8deae0-0405-413c-9a81-d1cc2171fa51REJECTED.zip
What I have so far:
#!/bin/bash
YYYY=$(date +%Y)
MM=$(date +%m)
DD=$(date +%d)
#Set constants
CODE_OK=0
CODE_WARN=1
CODE_CRITICAL=2
CODE_UNKNOWN=3
#Set Default Values
FILE="/home/client/data/storage/customer/data/${YYYY}/${MM}/${DD}"
if [ ! -d "$FILE" ]
then
echo "NO TRANSACTIONS FOUND"
exit $CODE_CRITICAL
fi
You can do something quick in AWK:
$ cat consec.awk
/REJECTED/ {
if (match_line == NR - 1) {
consecutives++
} else {
consecutives = 1
}
if (consecutives == 5) {
print "5 REJECTED"
exit
}
match_line = NR
}
$ touch 1 2REJECTED 3REJECTED 5REJECTED 6REJECTED 7REJECTED 8
$ ls -1 | awk -f consec.awk
5 REJECTED
$ rm 3REJECTED; touch 3
$ ls -1 | awk -f consec.awk
$
This works by matching lines containing REJECTED, counting consecutive matches (checked with match_line == NR - 1, which means "the last matching line was the previous line") and printing "5 REJECTED" when the count of consecutive matches reaches 5.
I've used ls -1 (note digit 1, not letter l) to sort by filename in this example. You could use ls -1rt (digit 1 again) to sort by file modification time, as in your original post.
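If you'd rather avoid awk, the same consecutive-run logic can be sketched in plain bash; check_consecutive is a hypothetical helper (not from the answer above) that reads one name per line from stdin:

```shell
#!/usr/bin/env bash
# Report "5 REJECTED" as soon as five consecutive input names contain REJECTED.
check_consecutive() {
  local run=0 name
  while IFS= read -r name; do
    if [[ $name == *REJECTED* ]]; then
      (( ++run >= 5 )) && { echo "5 REJECTED"; return 0; }
    else
      run=0   # a non-matching name breaks the streak
    fi
  done
  return 1
}

# Same test data as the awk example above.
printf '%s\n' 1 2REJECTED 3REJECTED 5REJECTED 6REJECTED 7REJECTED 8 | check_consecutive
```

On the real data you would pipe ls -1rt (sorted by modification time) into it instead.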

How to iterate through multiple directories with multiple ifs in bash?

Unfortunately I'm quite new at bash. I want to write a script that starts in a main directory, checks all subdirectories one by one for the presence of certain files, and, if those files are present, performs an operation on them. For now, I have written a simplified version to test whether I can do the first part (checking for the files in each directory). This code runs without any errors that I can tell, but it does not echo anything to say that it has successfully found the files which I know are there.
#!/bin/bash
runlist=(1 2 3 4 5 6 7 8 9)
for f in *; do
if [[ -d {$f} ]]; then
#if f is a directory then cd into it
cd "{$f}"
for b in $runlist; do
if [[ -e "{$b}.png" ]]; then
echo "Found {$b}"
#if the file exists then say so
fi
done
cd -
fi
done
Welcome to Stack Overflow.
The following will do the trick (a combination of find, an array, and if/then/else):
# list of files we are looking for
runlist=(1 2 4 8 16 32 64 128)
# find each of the above anywhere below the current directory
# using -maxdepth 2 because, based on your example, you want to look one level down only
# if that's not what you want, take -maxdepth 2 out of the find command
for b in "${runlist[@]}"; do
echo
PATH_TO_FOUND_FILE=$(find . -maxdepth 2 -name "$b.png")
if [ -z "$PATH_TO_FOUND_FILE" ]
then
: # nothing found, stay quiet
else
# You wanted a positive confirmation, so
echo "found $b.png"
# Now do something with the found file. Let's say ls -l; change that to whatever
ls -l $PATH_TO_FOUND_FILE
fi
done
Here is an example run:
mamuns-mac:stack foo$ ls -lR
total 8
drwxr-xr-x 4 foo 1951595366 128 Apr 11 18:03 dir1
drwxr-xr-x 3 foo 1951595366 96 Apr 11 18:03 dir2
-rwxr--r-- 1 foo 1951595366 652 Apr 11 18:15 find_file_and_do_something.sh
./dir1:
total 0
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 1.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 8.png
./dir2:
total 0
-rw-r--r-- 1 foo 1951595366 0 Apr 11 18:03 64.png
mamuns-mac:stack foo$ ./find_file_and_do_something.sh
found 1.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 ./dir1/1.png
found 8.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 ./dir1/8.png
found 64.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 18:03 ./dir2/64.png
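For completeness, here is a plain-bash sketch close to the asker's original loop with the quoting bug fixed: the {$f} forms in the question expand literally (bash's syntax is ${f} or just "$f", not {$f}), which is why nothing was ever echoed. The mktemp sandbox and sample files are only there to make the example self-contained:

```shell
#!/usr/bin/env bash
# Build a sandbox with sample files so the example runs anywhere.
demo=$(mktemp -d)
mkdir "$demo/dir1" "$demo/dir2"
touch "$demo/dir1/1.png" "$demo/dir1/8.png" "$demo/dir2/64.png"
cd "$demo" || exit 1

runlist=(1 2 4 8 16 32 64 128)

found=()
for d in */; do                   # each immediate subdirectory
  for b in "${runlist[@]}"; do    # expand the whole array, not just $runlist
    if [[ -e "$d$b.png" ]]; then
      found+=("$d$b.png")
      echo "Found $d$b.png"
    fi
  done
done
```

Testing "$d$b.png" directly also avoids the cd / cd - dance from the question.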

Is it safe to remove the /tmp/hive/hive folder?

Is it safe to remove the /tmp/hive/hive folder from HDFS, as the hdfs user?
hdfs dfs -rm -r /tmp/hive/hive
The reason for asking is that under /tmp/hive/hive we have thousands of files and we can't delete them.
hdfs dfs -ls /tmp/hive/
Found 7 items
drwx------ - admin hdfs 0 2019-03-05 12:00 /tmp/hive/admin
drwx------ - drt hdfs 0 2019-06-16 14:02 /tmp/hive/drt
drwx------ - ambari-qa hdfs 0 2019-06-16 15:11 /tmp/hive/ambari-qa
drwx------ - anonymous hdfs 0 2019-06-16 08:57 /tmp/hive/anonymous
drwx------ - hdfs hdfs 0 2019-06-13 08:42 /tmp/hive/hdfs
drwx------ - hive hdfs 0 2019-06-13 10:58 /tmp/hive/hive
drwx------ - root hdfs 0 2018-07-17 23:37 /tmp/hive/root
What we have done until now is the following, to remove the files that are older than 10 days; but because there are so many files, the files do not get deleted at all:
hdfs dfs -ls /tmp/hive/hive | tr -s " " | cut -d' ' -f6-8 | grep "^[0-9]" | awk 'BEGIN{ MIN=14400; LAST=60*MIN; "date +%s" | getline NOW } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-WHEN; if(DIFF > LAST){ print "Deleting: "$3; system("hdfs dfs -rm -r "$3) }}'
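The core of that one-liner (select paths whose listed timestamp is older than the cutoff) can be isolated and tested without a live cluster. This is a sketch only: old_paths is a hypothetical helper, it assumes GNU date -d, and it is fed a canned listing with a fixed "now" instead of live hdfs dfs -ls output, so the result is deterministic:

```shell
#!/usr/bin/env bash
# Print field 8 (the path) of every listing line whose date/time (fields 6-7)
# is more than max_age_seconds before now_epoch. Assumes GNU `date -d`.
old_paths() {  # usage: old_paths <now_epoch> <max_age_seconds>   (listing on stdin)
  awk -v now="$1" -v last="$2" '
    $6 ~ /^[0-9][0-9][0-9][0-9]-/ {
      cmd = "date -d \"" $6 " " $7 "\" +%s"
      cmd | getline when
      close(cmd)
      if (now - when > last) print $8
    }'
}

# Canned listing and a fixed "now" of 2019-06-16, with a 10-day cutoff.
printf '%s\n' \
  'drwx------ - hive hdfs 0 2019-01-01 10:58 /tmp/hive/hive/old' \
  'drwx------ - hive hdfs 0 2019-06-13 10:58 /tmp/hive/hive/new' |
  old_paths "$(date -d '2019-06-16 00:00' +%s)" $((10*24*3600))
```

On the cluster you would pipe the real hdfs dfs -ls /tmp/hive/hive output in and pass each printed path to hdfs dfs -rm -r.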

Specify a time range using Unix grep

Hi, I have a few files in HDFS, and I have to extract the files in a specific time range. How can I do that using the Unix grep command?
My HDFS listing looks like this:
-rw-rw-r-- 3 pscore hdpdevs 94461 2014-12-10 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-10_02-07-12-0
-rw-rw-r-- 3 pscore hdpdevs 974422 2014-12-11 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-11_02-07-10-0
-rw-rw-r-- 3 pscore hdpdevs 32854 2014-12-11 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-11_02-07-16-0
-rw-rw-r-- 3 pscore hdpdevs 1936753 2014-12-12 02:07 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-12_02-06-04-0
-rw-rw-r-- 3 pscore hdpdevs 79365 2014-12-12 02:07 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-12_02-06-11-0
I want to extract the files from 2014-12-11 09:00 to 2014-12-12 09:00.
I tried using hadoop fs -ls /dabc | sed -n '/2014-12-11 09:00/ , /2014-12-12 09:00/p' but that doesn't work. Any help? I want to use the grep command for this.
awk '$6FS$7 >= "2014-12-11 09:00" && $6FS$7 <= "2014-12-12 09:00"'
Can I do string comparison in awk? Yes: the test above compares the combined date and time fields lexicographically, which works here because "YYYY-MM-DD HH:MM" strings sort in chronological order.
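To see the awk one-liner working end to end, here it is wrapped in a function and run against a small sample listing (in_range and the shortened sample paths are just for illustration):

```shell
#!/usr/bin/env bash
# awk compares strings lexicographically, and "YYYY-MM-DD HH:MM" sorts
# chronologically, so the range check is a plain string comparison on
# the date ($6) and time ($7) columns of the listing.
in_range() {
  awk -v from="$1" -v to="$2" '($6 " " $7) >= from && ($6 " " $7) <= to'
}

# Sample listing with shortened paths, just for the demo.
printf '%s\n' \
  '-rw-rw-r-- 3 pscore hdpdevs 94461 2014-12-10 02:08 /data/f1' \
  '-rw-rw-r-- 3 pscore hdpdevs 974422 2014-12-11 02:08 /data/f2' \
  '-rw-rw-r-- 3 pscore hdpdevs 1936753 2014-12-12 02:07 /data/f3' |
  in_range '2014-12-11 09:00' '2014-12-12 09:00'
```

Only the 2014-12-12 02:07 line falls inside the requested window, so only that line is printed.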

How to extract date from filename with extension using shell script

I am trying to extract the date from the filenames, for the first two rows only, with extension .log.
For example, my_logFile.txt contains:
abc20140916_1.log
abhgg20140914_1.log
abf20140910_1.log
log.abc_abc20140909_1
The code I tried:
awk '{print substr($1,length($1)-3,4)}' my_logFile.txt
But I am getting output as:
.log
.log
.log
I need output as:
20140916
20140914
Revised query:
I have a txt file containing a number of log file entries. Each line in the txt file is like this:
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 abc20140405_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 aghtff20140404_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 log.pqrs20140403_1
I need to extract the date out of the file names from only the first two rows. Here the filename has a varying number of characters before the date.
The output should be:
20140405
20140404
Will this work for you?
$ head -2 file | grep -Po ' [a-z]+\K[0-9]+(?=.*\.log$)'
20140405
20140404
Explanation
head -2 file gets the first two lines of the file.
grep -Po ' [a-z]+\K[0-9]+(?=.*\.log$)' gets the set of digits in between a block of (space + a-z letters) and (.log + end of line).
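grep -P (Perl regexes) is a GNU extension and isn't available in every grep (some BSD systems lack it). As an alternative sketch, bash's own =~ operator can do the same extraction; extract_dates is a hypothetical helper, not part of the answer above:

```shell
#!/usr/bin/env bash
# Read ls-style lines from stdin, print the 8-digit date from names ending
# in .log, for the first two lines only (the =~ match lands in BASH_REMATCH).
extract_dates() {
  local n=0 line
  while IFS= read -r line && (( n++ < 2 )); do
    [[ $line =~ ([0-9]{8})_[0-9]+\.log$ ]] && echo "${BASH_REMATCH[1]}"
  done
  return 0
}

printf '%s\n' \
  '-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 abc20140405_1.log' \
  '-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 aghtff20140404_1.log' \
  '-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 log.pqrs20140403_1' |
  extract_dates
```

The third line is never examined (only two lines are read), and it would not match the .log$ anchor anyway.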
Try this:
cut -f9 -d " " <file> | grep -o -E "[0-9]{8}"
It worked on my machine:
[root@giam20 ~]# cat sample.txt
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 abc20140405_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 aghtff20140404_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 log.pqrs20140403_1
[root@giam20 ~]# cut -f9 -d " " sample.txt | grep -o -E "[0-9]{8}"
20140405
20140404
20140403
