specify time range using unix grep - bash

Hi, I have a few files in HDFS, and I need to extract the files within a specific time range. How can I do that using the Unix grep command?
My hdfs looks like this:
-rw-rw-r-- 3 pscore hdpdevs 94461 2014-12-10 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-10_02-07-12-0
-rw-rw-r-- 3 pscore hdpdevs 974422 2014-12-11 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-11_02-07-10-0
-rw-rw-r-- 3 pscore hdpdevs 32854 2014-12-11 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-11_02-07-16-0
-rw-rw-r-- 3 pscore hdpdevs 1936753 2014-12-12 02:07 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-12_02-06-04-0
-rw-rw-r-- 3 pscore hdpdevs 79365 2014-12-12 02:07 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-12_02-06-11-0
I want to extract the files from 2014-12-11 09:00 to 2014-12-12 09:00.
I tried hadoop fs -ls /dabc | sed -n '/2014-12-11 09:00/ , /2014-12-12 09:00/p', but that doesn't work. Any help? I want to use the grep command for this.

awk '$6FS$7 >= "2014-12-11 09:00" && $6FS$7 <= "2014-12-12 09:00"'
Can I do string comparison in awk?
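On the follow-up question: yes. awk compares two values as strings when they are not numeric, and because the YYYY-MM-DD HH:MM stamps are zero-padded, string order is the same as chronological order. The sed attempt prints nothing because a /start/,/end/ range only opens on a line that literally contains the start pattern, and no line of the listing contains "2014-12-11 09:00". Below is a minimal sketch of the awk filter run against the listing from the question. The function name in_range is only for illustration, and in practice the input would be piped from hadoop fs -ls rather than from a variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the listing lines whose date and time
# (fields 6 and 7) fall inside the inclusive range [$1, $2].
in_range() {
    awk -v from="$1" -v to="$2" '
        { ts = $6 " " $7 }        # e.g. "2014-12-11 02:08"
        ts >= from && ts <= to    # string comparison; default action prints the line
    '
}

# Sample data copied from the listing in the question.
listing='-rw-rw-r-- 3 pscore hdpdevs 94461 2014-12-10 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-10_02-07-12-0
-rw-rw-r-- 3 pscore hdpdevs 974422 2014-12-11 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-11_02-07-10-0
-rw-rw-r-- 3 pscore hdpdevs 32854 2014-12-11 02:08 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-11_02-07-16-0
-rw-rw-r-- 3 pscore hdpdevs 1936753 2014-12-12 02:07 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-12_02-06-04-0
-rw-rw-r-- 3 pscore hdpdevs 79365 2014-12-12 02:07 /data/bus/pharma/shared/purch/availability_alert/proc/2014-12-12_02-06-11-0'

# Prints only the two entries stamped 2014-12-12 02:07.
printf '%s\n' "$listing" | in_range "2014-12-11 09:00" "2014-12-12 09:00"
```

The two 2014-12-11 files are dropped because 02:08 is earlier than the 09:00 lower bound.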

Related

For loop with if statements isn't working as expected in bash

It only prints the "else" statement for everything but I know for a fact the files exist that it's looking for. I've tried adapting some of the other answers but I thought this should definitely work.
Does anyone know what's wrong with my syntax?
# Contents of script
for ID_SAMPLE in $(cut -f1 metadata.tsv | tail -n +2);
do if [ -f ./output/${ID_SAMPLE} ]; then
echo Skipping ${ID_SAMPLE};
else
echo Processing ${ID_SAMPLE};
fi
done
Additional information
# Output directory
(base) -bash-4.1$ ls -lhS output/
total 170K
drwxr-xr-x 8 jespinoz tigr 185 Jan 3 16:16 ERR1701760
drwxr-xr-x 8 jespinoz tigr 185 Jan 17 18:03 ERR315863
drwxr-xr-x 8 jespinoz tigr 185 Jan 16 23:23 ERR599042
drwxr-xr-x 8 jespinoz tigr 185 Jan 17 00:10 ERR599072
drwxr-xr-x 8 jespinoz tigr 185 Jan 16 13:00 ERR599078
# Example of inputs
(base) -bash-4.1$ cut -f1 metadata.tsv | tail -n +2 | head -n 10
ERR1701760
ERR599078
ERR599079
ERR599070
ERR599071
ERR599072
ERR599073
ERR599074
ERR599075
ERR599076
# Output of script
(base) -bash-4.1$ bash test.sh | head -n 10
Processing ERR1701760
Processing ERR599078
Processing ERR599079
Processing ERR599070
Processing ERR599071
Processing ERR599072
Processing ERR599073
Processing ERR599074
Processing ERR599075
Processing ERR599076
# Checking a directory
(base) -bash-4.1$ ls -l ./output/ERR1701760
total 294
drwxr-xr-x 2 jespinoz tigr 386 Jan 15 21:00 checkpoints
drwxr-xr-x 2 jespinoz tigr 0 Jan 10 01:36 tmp
-f is for checking whether the name is a file, but all your names are directories. Use -d to check that.
if [ -d "./output/$ID_SAMPLE" ]
then
If you want to check whether the name exists with any type, use -e.
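To see the three tests side by side, here is a small sketch that builds a throwaway directory resembling ./output/ and classifies a few paths. The function name classify and the file name plain_file are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what kind of entry a path is.
classify() {
    if [ -d "$1" ]; then
        echo "directory"      # -d: exists and is a directory
    elif [ -f "$1" ]; then
        echo "file"           # -f: exists and is a regular file
    elif [ -e "$1" ]; then
        echo "other"          # -e: exists, whatever its type
    else
        echo "missing"
    fi
}

# Throwaway layout resembling ./output/ from the question.
tmp=$(mktemp -d)
mkdir -p "$tmp/output/ERR1701760"
touch "$tmp/output/plain_file"

classify "$tmp/output/ERR1701760"   # directory
classify "$tmp/output/plain_file"   # file
classify "$tmp/output/ERR599079"    # missing

rm -rf "$tmp"
```

Quoting "$1" also keeps the test working if a sample ID ever contains whitespace.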

How to get a filename list with ncftp?

So I tried
ncftpls -l
which gives me a list
-rw-r--r-- 1 100 ftpgroup 3817084 Jan 29 15:50 1548773401.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817089 Jan 29 15:51 1548773461.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817083 Jan 29 15:52 1548773521.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817085 Jan 29 15:53 1548773582.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817090 Jan 29 15:54 1548773642.tar.gz
But all I want is the timestamp (which is the name of each tar.gz).
How can I get only the list of timestamps?
As requested: all I wanted to do was delete old backups, so awk was a good idea (at least it was effective), even if those weren't the right parameters. My method for deleting old backups is probably not the best, but it works:
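# Note: the three-argument match() is a gawk extension.
# Anything older than 600 seconds counts as expired.
VALIDITY_LIMIT=$(( $(date +%s) - 600 ))
ncftpls *authParams* | awk '{ if (match($9, /^[0-9]+/, a)) print a[0] }' | while read -r fileCreationDate; do
if [ "$fileCreationDate" -lt "$VALIDITY_LIMIT" ]; then
deleteFtpFile "$fileCreationDate"
fi
done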
You can use awk to display only the file names, which are the ninth field of the listing, and strip the extension to leave the timestamp:
ncftpls -l | awk '{ sub(/\.tar\.gz$/, "", $9); print $9 }'
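For the clean-up itself, the filtering can be done in one awk pass. This is only a sketch, assuming the nine-column listing shown in the question and file names of the form <epoch>.tar.gz. The function name old_backups and the cutoff value are hypothetical, and the listing is fed from a variable rather than from a live ncftpls call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an `ls -l`-style listing on stdin and print the
# names of backups whose epoch-timestamp name is older than the cutoff in $1.
old_backups() {
    awk -v cutoff="$1" '
        {
            name = $9                    # ninth field is the file name
            ts = name
            sub(/\.tar\.gz$/, "", ts)    # strip the extension, leaving the timestamp
            if (ts ~ /^[0-9]+$/ && ts + 0 < cutoff + 0) print name
        }
    '
}

# Sample data copied from the listing in the question.
listing='-rw-r--r-- 1 100 ftpgroup 3817084 Jan 29 15:50 1548773401.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817089 Jan 29 15:51 1548773461.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817083 Jan 29 15:52 1548773521.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817085 Jan 29 15:53 1548773582.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817090 Jan 29 15:54 1548773642.tar.gz'

# Prints 1548773401.tar.gz and 1548773461.tar.gz.
printf '%s\n' "$listing" | old_backups 1548773500
```

In real use the cutoff would be computed from date +%s, and each printed name would be handed to whatever delete command is in place.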

grep -l does not behave as expected on piping to xargs [duplicate]

I have a command like grep "\"tool\":\"SEETEST\"" * -l, which works great standalone: it prints a list of the JSON files generated for the selected tool in the current directory.
But if I pipe it to xargs ls like this:
grep "\"tool\":\"SEETEST\"" * -l | xargs ls -lSh
It prints all the files in the current directory!
How do I make it print just the matched filenames and pipe them to ls sorted by size?
If grep finds no matches, xargs receives empty input and still runs ls once with no arguments, so ls lists all files in the current directory:
#----------- current files in the directory
mortiz@florida:~/Documents/projects/bash/test$ ls -ltr
total 8
-rw-r--r-- 1 mortiz mortiz 585 Jun 18 12:13 json.example2
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
#----------- using your command
mortiz@florida:~/Documents/projects/bash/test$ grep "\"title\": \"example\"" * -l
json.example
#-----------adding xargs to the previous command
mortiz@florida:~/Documents/projects/bash/test$ grep "\"title\": \"example\"" * -l | xargs ls -lSh
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
#-----------adding purposely an error on "title"
mortiz@florida:~/Documents/projects/bash/test$ grep "\"titleo\": \"example\"" * -l | xargs ls -lSh
total 8.0K
-rw-r--r-- 1 mortiz mortiz 585 Jun 18 12:13 json.example2
-rw-r--r-- 1 mortiz mortiz 574 Jun 18 12:14 json.example
If you want to use xargs and grep may return no match, add -r (long form --no-run-if-empty, a GNU extension). It stops xargs from running the command when its input is empty, so ls no longer lists every file in the current directory:
grep "\"titleo\": \"example\"" * -l | xargs -r ls -lSh
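A self-contained sketch of the same idea, assuming GNU grep and GNU xargs. The function name list_matches and the sample files are made up for the example. It also passes NUL-separated names (grep -Z, xargs -0) so that file names containing spaces survive the pipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the files in directory $1 that contain pattern $2,
# largest first, and print nothing at all when no file matches.
list_matches() {
    (
        cd "$1" || exit 1
        # -l: print matching file names only; -Z: terminate each name with NUL
        # -0: read NUL-terminated names; -r: do not run ls on empty input
        grep -lZ -- "$2" * | xargs -0 -r ls -lSh
    )
}

# Throwaway sample directory.
tmp=$(mktemp -d)
printf '{"tool":"SEETEST"}\n' > "$tmp/a.json"
printf '{"tool":"OTHER"}\n'   > "$tmp/b.json"

list_matches "$tmp" '"tool":"SEETEST"'   # lists a.json only
list_matches "$tmp" '"tool":"NOPE"'      # prints nothing

rm -rf "$tmp"
```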

Run a bash command and split the output by line not by space

I want to run the following command and store its output in an array, split by line rather than by whitespace:
files=$( hdfs dfs -ls -R $hdfsDir)
The output I get from echo $files is the following:
drwxr-xr-x - pepeuser supergroup 0 2016-05-27 15:03 /user/some/kpi/2015/01/02 -rw-r--r-- 3 pepeuser supergroup 55107934 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00000902148 -rw-r--r-- 3 pepeuser supergroup 49225279 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00001902148
When I loop over $files with for, instead of getting a full line on each iteration, I get a single column. It prints the following:
drwxr-xr-x
-
pepeuser
supergroup
What I need the for loop to print is this:
drwxr-xr-x - pepeuser supergroup 0 2016-05-27 15:03 /user/some/kpi/2015/01/02
-rw-r--r-- 3 pepeuser supergroup 55107934 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00000902148
-rw-r--r-- 3 pepeuser supergroup 49225279 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00001902148
If you have bash 4, you can use readarray:
readarray -t files < <(hdfs dfs -ls -R "$hdfsDir")
Otherwise, use read -a to read into an array. IFS=$'\n' sets the field separator to newlines and -d '' tells it to keep reading until it hits a NUL character: effectively, that means it'll read to EOF.
IFS=$'\n' read -d '' -r -a files < <(hdfs dfs -ls -R "$hdfsDir")
You can verify that the array is populated correctly with something like:
printf '[%s]\n' "${files[@]}"
And can loop over the array with:
for file in "${files[@]}"; do
echo "$file"
done
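Here is a runnable sketch of the readarray approach that does not need a Hadoop installation. The function fake_listing is a stand-in for hdfs dfs -ls -R "$hdfsDir" and simply replays the lines from the question. It also shows a while read loop, which processes the lines one at a time without building an array.

```shell
#!/usr/bin/env bash
# Stand-in for `hdfs dfs -ls -R "$hdfsDir"` so the sketch runs anywhere.
fake_listing() {
    printf '%s\n' \
        'drwxr-xr-x - pepeuser supergroup 0 2016-05-27 15:03 /user/some/kpi/2015/01/02' \
        '-rw-r--r-- 3 pepeuser supergroup 55107934 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00000902148' \
        '-rw-r--r-- 3 pepeuser supergroup 49225279 2016-05-27 15:02 /user/some/kpi/2015/01/02/part-00001902148'
}

# One array element per line (needs bash 4 or later).
readarray -t files < <(fake_listing)
echo "array holds ${#files[@]} lines"

# Streaming alternative: handle each line as it arrives, no array needed.
count=0
while IFS= read -r line; do
    count=$((count + 1))
    printf '%d: %s\n' "$count" "$line"
done < <(fake_listing)
```

The original for loop split on spaces because an unquoted $files undergoes word splitting on every whitespace character, not just newlines.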

How to extract date from filename with extension using shell script

I tried to extract the date from the filenames in the first two rows only, the ones with the .log extension.
For example, my_logFile.txt contains the following filenames:
abc20140916_1.log
abhgg20140914_1.log
abf20140910_1.log
log.abc_abc20140909_1
The code I tried:
awk '{print substr($1,length($1)-3,4)}' my_logFile.txt
But I get this output:
.log
.log
.log
The output I need is:
20140916
20140914
Revised query:
I have a text file listing n log files. Each line in the text file looks like this:
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 abc20140405_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 aghtff20140404_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 log.pqrs20140403_1
I need to extract the date from the file names in the first two rows only. The file names have a varying number of characters before the date.
The output should be:
20140405
20140404
Will this work for you?
$ head -2 file | grep -Po ' [a-z]+\K[0-9]+(?=.*\.log$)'
20140405
20140404
Explanation
head -2 file gets the first two lines of the file.
grep -Po ' [a-z]+\K[0-9]+(?=.*\.log$)' gets the set of digits in between a block of (space + a-z letters) and (.log + end of line).
Try this:
cut -f9 -d " " <file> | grep -o -E "[0-9]{8}"
It worked on my machine:
[root@giam20 ~]# cat sample.txt
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 abc20140405_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 aghtff20140404_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 log.pqrs20140403_1
[root@giam20 ~]# cut -f9 -d " " sample.txt | grep -o -E "[0-9]{8}"
20140405
20140404
20140403
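If grep -P is not available, the same result can be had with head and sed. This is a sketch under the assumption that the names look like <prefix><8-digit date>_<n>.log, as in the sample. The function name extract_dates is made up. Unlike the cut version above, it looks only at the first two rows and skips names that do not end in .log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a listing on stdin, look only at the first two
# lines, and print the 8-digit date from names ending in _<n>.log.
extract_dates() {
    head -n 2 |
        sed -n 's/.*[^0-9]\([0-9]\{8\}\)_[0-9]*\.log$/\1/p'
}

# Sample data copied from the revised question.
sample='-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 abc20140405_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 aghtff20140404_1.log
-rw-rw-rw- 1 abchost abchost 241315175 Apr 16 10:45 log.pqrs20140403_1'

# Prints 20140405 and 20140404.
printf '%s\n' "$sample" | extract_dates
```

Anchoring the pattern at the end of the line keeps the nine-digit size column from being mistaken for a date.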
