Find text files in bash

I have a bunch of emails as text files in multiple directories under one directory. I am trying to write a script where I would type in a date as 3 separate command-line arguments, like so:
findemail 2015 20 04
the format being yyyy/dd/mm, and it would bring up the filenames of all emails that were sent on that day. I am unsure where to start with this, though. I figured I could possibly use find, but I am new to scripting, so I am unsure. Any help would be greatly appreciated!
The timestamp in the email looks like:
TimeStamp: 02/01/2004 at 11:19:02 (still in the same format as the input)

grep -lr "$(printf "^TimeStamp: %02i/%02i/%04i" "$3" "$2" "$1")" path/to/directory
The regex looks for mm/dd/yyyy, matching the timestamp in the emails; with the question's yyyy dd mm argument order, $3 is the month, $2 the day, and $1 the year. Swap the order of $3 and $2 if you want the more sensible European date order.
The command substitution $(command ...) runs command ... and substitutes its output into the surrounding command line. So here, a subshell runs printf to build the regex argument for grep.
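For illustration, this is what that printf produces for the example date from the question (note that bash's printf treats numbers with a leading zero as octal, so a month argument like 08 would need to be written 8 or normalized with 10#):
$ printf "^TimeStamp: %02i/%02i/%04i" 4 20 2015
^TimeStamp: 04/20/2015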
The -l option says to list the names of matching files; the -r option says to traverse a set of directories recursively. (If your grep is too pedestrian to have the -r option, it's certainly not hard to concoct a find expression which does the same; see the sketch below.)
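A rough sketch of that find-based fallback, using the literal pattern the printf above would produce for the example date:
find path/to/directory -type f -exec grep -l "^TimeStamp: 04/20/2015" {} +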

The easiest thing to do would be to use a search utility such as grep. grep has a very useful recursive option (-r) that searches for a string in all the files in a directory and its subdirectories.
Assuming you have your timestamp in a variable called timestamp, then this would return a list of filenames that contain the timestamp:
grep -lr "$timestamp" /Your/Main/Directory/Goes/Here
EDIT: To clarify, this would only search for the exact string, so it needs to be in the exact same format as in the searched text.
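Tying this back to the question, a minimal sketch (assuming the script is invoked with the same yyyy dd mm argument order as in the question):
timestamp=$(printf "%02i/%02i/%04i" "$3" "$2" "$1")
grep -lr "TimeStamp: $timestamp" /Your/Main/Directory/Goes/Here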

Related

How to sort files by modified timestamp in unix for a shellscript to pick them one at a time

I am writing a shell script that picks one file at a time and processes them.
I want the script to pick files in the ascending order of their modified time.
I used the code below to pick .csv files with a particular filename pattern.
for file in /filepath/file*.csv
do
    #mystuff
done
But I expect the script to pick .csv files according to the ascending order of their modified time. Please suggest.
Thanks in advance
If you are sure the file names don't contain any "strange" characters, e.g. newline, you could use the sorting capability of ls and read the output with a while read... loop. This will also work for file names that contain spaces.
ls -tr1 /filepath/file*.csv | while read -r file
do
    mystuff "$file"
done
Note this solution should be preferred over something like
for file in $(ls -tr /filepath/file*.csv) ...
because this will fail if you have a file name that contains a space due to the word-splitting involved here.
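If the names might also contain newlines, a null-delimited pipeline avoids parsing ls entirely. A sketch assuming GNU find and GNU sort, with mystuff being the question's placeholder command:
find /filepath -maxdepth 1 -name 'file*.csv' -printf '%T@ %p\0' | sort -zn |
while IFS= read -r -d '' line
do
    mystuff "${line#* }"    # strip the "epoch-seconds " prefix added by -printf
done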
You can return the results of ls -t as an array. (-t sorts by modified time)
csvs=($(ls -t /filepath/file*.csv))
Then apply your for loop (use "${csvs[@]}"; a bare $csvs would expand to only the first element):
for file in "${csvs[@]}"
do
    #mystuff
done
with your "for" loop:
for file in $(ls -tr /filepath/file*.csv)
do
    mystuff "$file"
done
(As noted above, this form still breaks on file names containing spaces.)

Sort files in directory then execute command on each one of them

I have a directory containing files numbered like this
1>chr1:2111-1111_mask.txt
1>chr1:2111-1111_mask2.txt
1>chr1:2111-1111_mask3.txt
2>chr2:345-678_mask.txt
2>chr2:345-678_mask2.txt
2>chr2:345-678_mask3.txt
100>chr19:444-555_mask.txt
100>chr19:444-555_mask2.txt
100>chr19:444-555_mask3.txt
Each file contains a name like >chr1:2111-1111 on the first line and a series of characters on the second line.
I need to sort the files in this directory numerically, using the number before the > as a guide, then execute the command for each of the files ending in _mask3.
I have this code
ls ./"$INPUT"_temp/*_mask3.txt | sort -n | for f in ./"$INPUT"_temp/*_mask3.txt
do
read FILE
Do something with each file and list the results in output file including the name of the string
done
It works, but when I check the list of the strings inside the output file they are like this
>chr19:444-555
>chr1:2111-1111
>chr2:345-678
why?
So... I'm not sure what "works" means here, as your question stated.
It seems like you have two problems:
1. Your files are not in sorted order.
2. The file names have the leading digits removed.
Addressing 1, your command ls ./"$INPUT"_temp/*_mask3.txt | sort -n | for f in ./"$INPUT"_temp/*_mask3.txt doesn't make a whole lot of sense. You are getting a list of files from ls, and then piping that to sort. That probably gives you the output you are looking for, but then you pipe that to for, which doesn't do what you expect: for iterates over its own word list and never reads standard input, so the sorted list is simply discarded.
In fact you can rewrite your entire script to
for f in ./"$INPUT"_temp/*_mask3.txt
do
    read FILE
    Do something with each file and list the results in output file including the name of the string
done
And you'll have the exact same output. To get this sorted you could do something like:
for f in `ls ./"$INPUT"_temp/*_mask3.txt | sort -n`
do
    read FILE
    Do something with each file and list the results in output file including the name of the string
done
As for the unexpected truncation, that > character in your file name is important to your bash shell, since it directs the stdout of the preceding command to a specified file. You'll need to ensure that when you use the variable $f from your loop, you put quotes around it to keep bash from misinterpreting the file name as a command > file redirection.
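To make that concrete, a small sketch with one of the names from the question; the > becomes dangerous when the name ends up inside re-parsed shell code (e.g. via eval or a generated command string):
f='100>chr19:444-555_mask3.txt'
echo "$f"        # quoted expansion: prints the full name, the > stays literal
eval "echo $f"   # re-parsed as shell code: "100>" is consumed as a redirection,
                 # an empty chr19:444-555_mask3.txt is created, and echo runs with no arguments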

How to do a loop with filenames in shellscript ordered by 'last modified'?

I'm new to Linux, and I've been trying to run a script which processes all the files in a folder using ImageMagick's convert (I'd rather do this task in shell than use mogrify, because as far as I know mogrify doesn't save to different files). The files have to be processed in 'last modified' order, so I used this code:
for file in `ls -1tr {*.jpg,*.png}`; do
# imagemagick processes with the filename...
done
This code breaks for files with spaces, and according to this answer using ls is wrong for these purposes.
I also tried this solution from this response, but apparently I got it totally wrong (It raised an 'ambiguous redirect' error) and I decided I needed help.
while read LINE; do
...
done `(ls -1tr {*.png,*.jpg}`
So how do I get an ordered list of filenames for a loop? (It doesn't necessarily have to be a FOR...IN loop, it can be a WHILE, or anything.)
try this:
for file in `ls -ltr {*.jpg,*.png} | awk '{print $9}'`; do
    # imagemagick processes with the filename...
done
ls -ltr gives 9 columns of output, of which you need only the 9th (the file name); you can extract that using awk.
If you have filenames containing spaces, modify the awk print to print all data from the 9th column on, as sketched below.
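A sketch of that awk modification (it still assumes single spaces within names and still breaks on newlines, so treat ls parsing as a last resort):
ls -ltr {*.jpg,*.png} | awk '{ out = $9; for (i = 10; i <= NF; i++) out = out " " $i; print out }'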

Iterate through list of filenames in order they were created in bash

Parsing the output of ls to iterate through a list of files is bad. So how should I go about iterating through a list of files in the order in which they were first created? I browsed several questions here on SO and they all seem to parse ls.
The embedded link suggests:
Things get more difficult if you wanted some specific sorting that only ls can do, such as ordering by mtime. If you want the oldest or newest file in a directory, don't use ls -t | head -1 -- read Bash FAQ 99 instead. If you truly need a list of all the files in a directory in order by mtime so that you can process them in sequence, switch to perl, and have your perl program do its own directory opening and sorting. Then do the processing in the perl program, or -- worst case scenario -- have the perl program spit out the filenames with NUL delimiters.
Even better, put the modification time in the filename, in YYYYMMDD format, so that glob order is also mtime order. Then you don't need ls or perl or anything. (The vast majority of cases where people want the oldest or newest file in a directory can be solved just by doing this.)
Does that mean there is no native way of doing it in bash? I don't have the liberty to modify the filename to include the time in them. I need to schedule a script in cron that would run every 5 minutes, generate an array containing all the files in a particular directory ordered by their creation time and perform some actions on the filenames and move them to another location.
The following worked, but only because I don't have funny filenames. The files are created by a server, so they will never contain special characters, spaces, newlines, etc.
files=( $(ls -1tr) )
I can write a perl script that would do what I need but I would appreciate if someone can suggest the right way to do it in bash. Portable option would be great but solution using latest GNU utilities will not be a problem either.
sorthelper=()
for file in *; do
    # We need something that can easily be sorted.
    # Here, we use "<date><filename>".
    # Note that this works with any special characters in filenames.
    sorthelper+=("$(stat -n -f "%Sm%N" -t "%Y%m%d%H%M%S" -- "$file")") # Mac OS X only
    # or, on Linux (note the prefix is then "<epoch> ", so adjust the :14 offset below):
    # sorthelper+=("$(stat --printf "%Y %n" -- "$file")")
done
sorted=()
while read -d $'\0' elem; do
    # this strips away the first 14 characters (<date>)
    sorted+=("${elem:14}")
done < <(printf '%s\0' "${sorthelper[@]}" | sort -z)
for file in "${sorted[@]}"; do
    # do your stuff...
    echo "$file"
done
Other than sort and stat, all commands are actual native Bash commands (builtins)*. If you really want, you can implement your own sort using Bash builtins only, but I see no way of getting rid of stat.
The important parts are read -d $'\0', printf '%s\0' and sort -z. All these commands are used with their null-delimiter options, which means that any filename can be processed safely. Also, the use of double quotes in "$file" and "${anarray[*]}" is essential.
*Many people feel that the GNU tools are somehow part of Bash, but technically they're not. So, stat and sort are just as non-native as perl.
With all of the cautions and warnings against using ls to parse a directory notwithstanding, we have all found ourselves in this situation. If you do find yourself needing sorted directory input, then about the cleanest use of ls to feed your loop is ls -opts | while read -r name; do ... This will handle spaces in filenames, etc., without requiring a reset of IFS, due to the nature of read itself. Example:
ls -1rt | while read -r fname; do # where '1' is ONE not little 'L'
So do look for cleaner solutions avoiding ls, but if push comes to shove, ls -opts can be used sparingly without the sky falling or dragons plucking your eyes out.
Let me add the disclaimer to keep everyone happy: if you like newlines inside your filenames -- then do not use ls to populate a loop. If you do not have newlines inside your filenames, there are no other adverse side effects.
Contra: TLDP Bash Howto Intro:
#!/bin/bash
for i in $( ls ); do
    echo item: $i
done
It appears that SO users do not know what the use of contra means -- please look it up before downvoting.
You can try using the stat command piped into sort:
stat -c '%Y %n' * | sort -t ' ' -nk1 | cut -d ' ' -f2-
Update: To deal with filenames containing newlines we can use the %N format in stat, and instead of cut we can use awk, like this:
LANG=C stat -c '%Y^A%N' * | sort -t '^A' -nk1 | awk -F '^A' '{print substr($2,2,length($2)-2)}'
The use of LANG=C is needed to make sure stat uses single quotes only when quoting file names.
^A is the control-A character, typed by pressing Ctrl+V then Ctrl+A.
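A variant of the same idea that avoids typing a literal control character, sketched using bash's $'\x01' quoting (same pipeline, same assumptions):
sep=$'\x01'    # the control-A byte, without typing it literally
LANG=C stat -c "%Y${sep}%N" * | sort -t "$sep" -nk1 | awk -F "$sep" '{print substr($2,2,length($2)-2)}'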
How about a solution with GNU find + sed + sort?
As long as there are no newlines in the file name, this should work:
find . -type f -printf '%T@ %p\n' | sort -k 1nr | sed 's/^[^ ]* //'
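For example, to feed that newest-first list into a loop (still assuming no newlines in the file names):
find . -type f -printf '%T@ %p\n' | sort -k 1nr | sed 's/^[^ ]* //' |
while IFS= read -r file
do
    printf '%s\n' "$file"    # do your processing here
done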
It may be a little more work to ensure it is installed (it may already be, though), but using zsh instead of bash for this script makes a lot of sense. The filename globbing capabilities are much richer, while still using a sh-like language.
files=( *(oc) )
will create an array whose entries are all the file names in the current directory, but sorted by change time. (Use a capital O instead to reverse the sort order). This will include directories, but you can limit the match to regular files (similar to the -type f predicate to find):
files=( *(.oc) )
find is needed far less often in zsh scripts, because most of its uses are covered by the various glob flags and qualifiers available.
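Putting it together, a minimal zsh sketch that iterates over regular files sorted by change time:
#!/usr/bin/env zsh
files=( *(.oc) )        # regular files, most recently changed first (use Oc for oldest first)
for f in "${files[@]}"; do
    print -r -- "$f"    # replace with your processing
done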
I've just found a way to do it with bash and ls (GNU).
Suppose you want to iterate through the filenames sorted by modification time (-t):
while read -r fname; do
    fname=${fname:1:((${#fname}-2))}    # remove the leading and trailing "
    fname=${fname//\\\"/\"}             # remove the \ before any embedded "
    fname=$(echo -e "$fname")           # interpret the escaped characters
    file "$fname"                       # replace `file` with any command you like
done < <(ls -At --quoting-style=c)
Explanation
Given some filenames with special characters, this is the ls output:
$ ls -A
filename with spaces .hidden_filename filename?with_a_tab filename?with_a_newline filename_"with_double_quotes"
$ ls -At --quoting-style=c
".hidden_filename" " filename with spaces " "filename_\"with_double_quotes\"" "filename\nwith_a_newline" "filename\twith_a_tab"
So you have to process each filename a little to get the actual one. Recalling:
${fname:1:((${#fname}-2))} # remove the leading and trailing "
# ".hidden_filename" -> .hidden_filename
${fname//\\\"/\"} # remove the \ before any embedded "
# filename_\"with_double_quotes\" -> filename_"with_double_quotes"
$(echo -e "$fname") # interpret the escaped characters
# filename\twith_a_tab -> filename with_a_tab
Example
$ ./script.sh
.hidden_filename: empty
filename with spaces : empty
filename_"with_double_quotes": empty
filename
with_a_newline: empty
filename with_a_tab: empty
As you can see, file (or whatever command you use) handles each filename correctly.
Each file has three timestamps:
Access time: the file was opened and read. Also known as atime.
Modification time: the file was written to. Also known as mtime.
Inode modification time: the file's status was changed, such as the file had a new hard link created, or an existing one removed; or if the file's permissions were chmod-ed, or a few other things. Also known as ctime.
None of these represents the time the file was created; that information is not saved anywhere. At file creation time, all three timestamps are initialized, and then each one gets updated appropriately: when the file is read, or written to, when the file's permissions are chmoded, or when a hard link is created or destroyed.
So, you can't really list the files according to their file creation time, because the file creation time isn't saved anywhere. The closest match would be the inode modification time.
See the descriptions of the -t, -u, -c, and -r options in the ls(1) man page for more information on how to list files in atime, mtime, or ctime order.
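For reference, the flag combinations that paragraph describes (standard POSIX ls behavior):
ls -lt     # sort by mtime, newest first
ls -ltu    # sort by atime instead
ls -ltc    # sort by ctime instead
ls -ltr    # add -r to any of these to reverse the order, oldest first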
Here's a way using stat with an associative array.
n=0
declare -A arr
for file in *; do
    # modified=$(stat -f "%m" "$file") # For use with BSD/OS X
    modified=$(stat -c "%Y" "$file") # For use with GNU/Linux
    # Ensure stat timestamp is unique
    if [[ $modified == *"${!arr[@]}"* ]]; then
        modified=${modified}.$n
        ((n++))
    fi
    arr[$modified]="$file"
done
files=()
for index in $(IFS=$'\n'; echo "${!arr[*]}" | sort -n); do
    files+=("${arr[$index]}")
done
Since sort sorts lines, $(IFS=$'\n'; echo "${!arr[*]}" | sort -n) ensures the indices of the associative array get sorted by setting the field separator in the subshell to a newline.
The quoting at arr[$modified]="$file" and files+=("${arr[$index]}") ensures that file names with caveats like a newline are preserved.
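The sorted array can then be consumed in the usual way:
for file in "${files[@]}"; do
    printf '%s\n' "$file"    # process each file, oldest timestamp first
done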

Using ssh remote plus grep

I'm running a shell script like this one:
vQtde=`ssh user@server 'ls -lrt /mnta2/gvt/Interfaces/output/BI/sent/*.?${vDiaAnterior}* | grep "${vMDAtual}0[345678]:" | wc -l'`
And it returns an error: ksh: /usr/bin/sh: arg list too long
I know that the same script on the local server returns 9; how can I escape the "" quotes in the remote grep?
The variables are:
vDiaAtual=`date +%d`
vMesAtual=`date +%b`
vMDAtual=" $vMesAtual $vDiaAtual ";
vDiaAnterior=120614
The problem here is not with grep. The problem is the following: the argument /mnta2/gvt/Interfaces/output/BI/sent/*.?${vDiaAnterior}* is expanded by the shell (by ksh in this case) and the resulting list is too big.
It would be better to simply run ls -lrt /mnta2/gvt/Interfaces/output/BI/sent/ and then add an additional grep after it.
Something like:
ls -lrt /mnta2/gvt/Interfaces/output/BI/sent/ | grep "\..${vDiaAnterior}" | grep ...
Based on information regarding that error message, I'm not sure if escaping the quotes is the real issue here.
What is it that you're ultimately trying to do? There's probably a slightly different way to approach it that avoids this problem. It appears that you're trying to count the number of files with a certain "last modified" date. Is this accurate? If so, I highly recommend against using the output of ls to do that. The output is inconsistent between platforms and can even change between versions. The find utility is much better suited for this sort of thing.
Try something like this instead:
dir=/mnta2/gvt/Interfaces/output/BI/sent/
pattern="*.?${vDiaAnterior}*"
time= # Fill this in based on the "last modified" time that you're looking for
find "$dir" -iname "$pattern" -mtime "$time" -exec printf '.' \; | wc -c
You can omit using the extra variables, they're only there to make the code more readable on the webpage.
This will search the given directory for all files with names that match the specified wildcard pattern and with "last modified" times that match whatever you specify. For each match found, the code printf '.' (which prints one dot to stdout) will be run. wc then counts the number of dot characters, which will be equal to the number of matching files found. The benefit of this method is that it minimizes the amount of data that needs to be piped between programs (including between the shell and ls). find handles the wildcard matching internally instead of requiring the shell to expand the wildcard and pass the result to ls. You're also only sending one character per matching file to wc instead of one long line of ls output per match. That should reduce the chances that you encounter the "arg list too long" error.
I resolved the problem this way:
Create a .sh file on the local server that receives the parameters:
#!/usr/local/bin/bash
vDiaAnterior="${1}";
vMDAtual="${2}";
ls -l /mnta2/gvt/Interfaces/output/BI/sent/*.?${vDiaAnterior}AMA | grep "${vMDAtual}[345678]:" | wc -l;
Call it remotely:
ssh user@server ". /mnta1/prod_med1/scriptsf/ver_jobs_3_horas.sh $vDiaAnterior '$vMDAtual'"
Result: 9 Files.
Best Regards,
Cauca
