Bash: Subshell behaviour of ls - bash

I am wondering why I do not get the same output from:
ls -1 -tF | head -n 1
and
echo $(ls -1 -tF | head -n 1)
I am trying to get the most recently modified file, but when I use the command inside $(...) I sometimes get more than one file as the result.
Why is that, and how can I avoid it?

The problem arises because the command substitution is unquoted and the -F flag makes ls append shell special characters to filenames.
-F, --classify
append indicator (one of */=>@|) to entries
Executable files get a * appended.
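When you run
echo $(ls -1 -tF | head -n 1)
then
$(ls -1 -tF | head -n 1)
returns a single filename, but because it is unquoted, the shell applies word splitting and pathname expansion to the result before echo runs. If the file is an executable and its name is also the prefix of another file's name, the appended * turns it into a glob that matches both.
For example, if you have
test.sh
test.sh.backup
then the command substitution returns
test.sh*
which, unquoted, expands to
test.sh test.sh.backup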
Quoting the command substitution prevents this expansion:
echo "$(ls -1 -tF | head -n 1)"
prints
test.sh*
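To see the expansion happen, here is a small bash sketch that rebuilds the test.sh / test.sh.backup example in a throwaway directory. It assumes mktemp -d is available and uses touch -t only to make the modification order deterministic; the directory and timestamps are illustrative.

```shell
#!/usr/bin/env bash
# Reproduce the effect in a scratch directory (file names are just examples).
start_dir=$PWD
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

touch -t 202001010000 test.sh.backup   # older file
touch -t 202001020000 test.sh          # newest file
chmod +x test.sh                       # executable, so ls -F appends '*'

newest=$(ls -1 -tF | head -n 1)        # the literal string "test.sh*"
unquoted=$(echo $newest)               # unquoted: globbed to "test.sh test.sh.backup"
quoted=$(echo "$newest")               # quoted: stays "test.sh*"
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

cd "$start_dir" && rm -rf "$work"
```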

I just found the error:
If you use echo $(ls -1 -tF | head -n 1),
filename globbing may produce additional matches,
because if the result is an executable, it ends with a *.
So echo "$(ls -1 -tF | head -n 1)" avoids this.
I tried to explain in a comment why I use -F, but decided to put it here instead:
I added the following lines to my .bashrc as a shortcut for listing the most recently modified files or directories:
function L {
    h=${1:-1}    # number of entries to show, default 1
    echo "last ${h} modified file(s):"
    # -F marks directories with a trailing /, which grep -v filters out;
    # sed then strips the other -F indicators (* = > @ |) from the names
    export L=$(ls -1 -tF | grep -Fv / | head -n "${h}" | sed 's/[*=>@|]$//')
    ls -l $L
}
function LD {
    h=${1:-1}
    echo "last ${h} modified directories:"
    export LD=$(ls -1 -tF | grep -F / | head -n "$h" | sed 's/[*=>@|]$//')
    ls -ld $LD
}
alias ol='L; xdg-open $L'
alias cdl='LD; cd $LD'
So now I can use L (or L 5) to list the most recently modified file (or the last 5), excluding directories.
And with L; jmacs $L I can open that file in my editor. Previously I used my alias lt='ls -lrt', but then I had to retype the name...
Now, after mkdir ..., I use cdl to change to that directory.

Related

Getting the path to the newest file in a directory with f=$(cd dir | ls -t | head) not honoring "dir"

I would like to get the newest file (a zip file) from a path with this part of code: file=$(cd '/path_to_zip_file' | ls -t | head -1). Instead, I get my .sh file from the directory where I am running the script.
Why can't I get the file from /path_to_zip_file?
Below is the code in my .sh file:
file=$(cd '/path_to_zip_file' | ls -t | head -1)
last_modified=`stat -c "%Y" $file`;
current=`date +%s`
echo $file
if [ $(($current-$last_modified)) -gt 86400 ]; then
echo 'Mail'
else
echo 'No Mail'
fi;
If you were going to use ls -t | head -1 (which you shouldn't), the cd would need to be a prior command (run before ls starts), not a pipeline component (which runs in parallel with ls, in its own subshell, with its stdout connected to ls's stdin):
set -o pipefail # otherwise, a failure of ls is ignored so long as head succeeds
file=$(cd '/path_to_zip_file' && ls -t | head -1)
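A quick way to convince yourself of this, sketched in bash with a throwaway directory standing in for /path_to_zip_file and pwd standing in for ls, so that the directory each command runs in is visible:

```shell
#!/usr/bin/env bash
# A scratch directory stands in for '/path_to_zip_file'.
target=$(mktemp -d) || exit 1

# Every element of a pipeline runs in its own subshell, so the cd only
# changes the directory of the left-hand subshell; pwd never sees it.
wrong=$(cd "$target" | pwd)

# With &&, cd runs first inside the command substitution's subshell and
# the next command inherits the new directory (the calling shell is untouched).
right=$(cd "$target" && pwd)

printf 'pipeline: %s\nsequence: %s\n' "$wrong" "$right"
rmdir "$target"
```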
A better-practice approach might look like:
newest_file() {
local result=$1; shift # first, treat our first arg as latest
while (( $# )); do # as long as we have more args...
[[ $1 -nt $result ]] && result=$1 # replace "result" if they're newer
shift # then take them off the argument list
done
[[ -e $result || -L $result ]] || return 1 # fail if no file found
printf '%s\n' "$result" # more reliable than echo
}
newest=$(newest_file /path/to/zip/file/*)
newest=${newest##*/} ## trim the path to get only the filename
printf 'Newest file is: %s\n' "$newest"
To understand the ${newest##*/} syntax, see the bash-hackers' wiki on parameter expansion.
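As a brief illustration of the # / ## and % operators (the path here is made up):

```shell
#!/usr/bin/env bash
# Hypothetical path, of the kind newest_file would print.
newest=/path/to/zip/file/backup.2016.zip

name=${newest##*/}   # strip the longest prefix matching '*/': backup.2016.zip
rest=${newest#*/}    # strip the shortest such prefix: path/to/zip/file/backup.2016.zip
dir=${newest%/*}     # strip the shortest suffix matching '/*': /path/to/zip/file
printf '%s\n' "$name" "$rest" "$dir"
```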
For more on why using ls in scripts (except for output displayed to humans) is dangerous, see ParsingLs.
Both BashFAQ #99 (How do I get the latest (or oldest) file from a directory?) and BashFAQ #3 (How can I sort or compare files based on some metadata attribute (newest / oldest modification time, size, etc)?) have useful discussion of the larger context in which this question was asked.

Bash - Concatenate files in a directory ordered by date

I need some help with a simple script I'm writing. The script takes as input a directory that contains files like these:
FILENAME20160220.TXT
FILENAME20160221.TXT
FILENAME20160222.TXT
...
The script needs to take the directory as input and concatenate the files into a new file called:
FILENAME.20160220_20160222.TXT
The output filename needs to contain the "Earliest"_"Latest" dates found. This is the script I've written so far, but it doesn't produce the necessary output. Can someone help me fix it?
declare FILELISTING="FILELISTING.TXT"
declare SOURCEFOLDER="/Cat_test/cat_test/"
declare TEMPFOLDER="/Cat_Test/cat_test/temp/"
# Create temporary folder
cd $SOURCEFOLDER
mkdir $TEMPFOLDER
chk_abnd $?
# Move files into temporary folder
mv *.TXT $SOURCEFOLDER $TEMPFOLDER
chk_abnd $?
# Change directory to temporary folder
cd $TEMPFOLDER
chk_abnd $?
# Iterate through files in temp folder and create temporary listing files
for FILE in $TEMPFOLDER
do
echo $FILE >> $FILELISTING
done
# Iterate through the lines of FILELISTING and store dates into array for sorting
while read lines
do
array[$i] = "${$line:x:y}"
(( i++ ))
done <$FILELISTING
# Sort dates in array
for ((i = 0; i < $n ; i++ ))
do
for ((j = $i; j < $n; j++ ))
do
if [ $array[$i] -gt $array[$j] ]
then
t=${array[i]}
array[$i]=${array[$j]}
array[$j]=$t
fi
done
done
# Get first and last date of array and construct output filename
OT_FILE=FILENAME.${array[1]}_${array[-1]}.txt
# Sort files in folder
# Cat files into one
cat *.ACCT > "$OT_FILE.temp"
chk_abnd $?
# Remove Hex 1A
# tr '\x1A' '' < "$OT_FILE.temp" > $OT_FILE
# Cleanup - Remove File Listing
rm $FILE_LISTING
chk_abnd $?
rm $OT_FILE.temp
chk_abnd $?
Assuming that the base list of your files can be identified using FILENAME*.TXT, which is nice and simple, ls can be used to generate an ordered list. By default it is sorted alphabetically in ascending order, and thus (because of the date format you've chosen) in ascending date order.
You can get the earliest and latest dates as follows:
$ earliest=$( ls -1 FILENAME*.TXT | head -1 | cut -c9-16 )
$ echo $earliest
20160220
$ latest=$( ls -1 FILENAME*.TXT | tail -1 | cut -c9-16 )
$ echo $latest
20160222
Therefore your file name can be produced using:
filename="FILENAME.${earliest}_${latest}.TXT"
And the concatenation should be as simple as:
cat $( ls -1 FILENAME*.TXT ) > ${filename}
though if you are writing to the same directory, you may wish to direct the output first to a temporary name that doesn't meet this pattern and then rename it. Perhaps something like:
earliest=$( ls -1 FILENAME*.TXT | head -1 | cut -c9-16 )
latest=$( ls -1 FILENAME*.TXT | tail -1 | cut -c9-16 )
filename="FILENAME.${earliest}_${latest}.TXT"
cat $( ls -1 FILENAME*.TXT ) > temp_${filename}
mv temp_${filename} ${filename}
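If you would rather not parse ls at all, a bash glob already expands in sorted order. The sketch below takes the directory as an argument, as the question asks. It assumes the FILENAMEyyyymmdd.TXT naming from the question (date starting at character offset 8), and the function name concat_by_date is hypothetical:

```shell
#!/usr/bin/env bash
# concat_by_date DIR: concatenate DIR/FILENAMEyyyymmdd.TXT files in date order
# into DIR/FILENAME.<earliest>_<latest>.TXT and print the new name.
concat_by_date() (
    # The body runs in a subshell, so cd and shopt do not leak to the caller.
    cd -- "$1" || exit 1
    shopt -s nullglob
    # A digit must follow the prefix, so an earlier FILENAME.<from>_<to>.TXT
    # output (or the .tmp file below) is never picked up as input.
    files=(FILENAME[0-9]*.TXT)
    if (( ${#files[@]} == 0 )); then
        echo "no input files in $1" >&2
        exit 1
    fi
    # Globs expand in sorted order, and fixed-width yyyymmdd dates sort
    # lexically, so the first and last elements hold the date range.
    earliest=${files[0]:8:8}
    latest=${files[${#files[@]}-1]:8:8}
    out="FILENAME.${earliest}_${latest}.TXT"
    # Write under a temporary name first, then rename.
    cat -- "${files[@]}" > "$out.tmp" && mv -- "$out.tmp" "$out" && printf '%s\n' "$out"
)
```

For example, concat_by_date /Cat_test/cat_test/ would print FILENAME.20160220_20160222.TXT for the three sample files. The FILENAME[0-9]* pattern is a deliberate choice so that rerunning the function does not fold its own earlier output back in.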
Here are some hints; cat does most of the work.
If your filenames have fixed-size date fields, as in your example, lexical sorting is enough.
ls -1 FILENAME* > allfiles
aggname=$(sed -En '1s/^([^0-9]*)([0-9]+).*/\1.\2/p;$s/^[^0-9]*//p' allfiles |
paste -sd_ -)
xargs cat < allfiles > "$aggname"
You can combine the last two steps into one, but this way is more readable.
Don't reinvent the wheel.

Different behavior in bash string comparison while inside a function

There's the following part in a shell script I'm writing:
(Find the latest directory in someDir which is not summary/)
latestDirName=""
for dirName in `ls -lt /user/someDir/ | head -3 | tail -n +2 | awk '{print $9}'`
do
if [ "$dirName" == "summary" ]; then
continue
fi
latestDirName=$dirName
done
Here $dirName is compared against the string summary, while an echo of the dirName variable during the iteration prints summary/.
This comparison works fine when the code is written in a file and executed.
But once the same code is put inside a function and placed in my bashrc, the comparison in the if check doesn't seem to work!
Does this have anything to do with the string being a directory name, or with it containing the /?
What difference does it make when the same code is inside a function?
Code inside bashrc:
findLatestDir()
{
latestDirName=""
for dirName in `ls -lt /user/someDir/ | head -3 | tail -n +2 | awk '{print $9}'`
do
if [ "$dirName" == "summary" ]; then
continue
fi
latestDirName=$dirName
done
}
The scripts are called as follows:
Case #1 (code in file): $ ./findLatestDir.sh
Case #2 (function in bashrc): $ findLatestDir
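Maybe you have an alias or function definition for ls in your .bashrc (for example, one that adds -F and so appends / to directory names) which interacts poorly with your use of ls in the function? Aliases are not expanded in non-interactive scripts by default, which would explain why the standalone script works. If so, explicitly saying /bin/ls in the function may solve the problem.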
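Besides /bin/ls, the command builtin skips shell functions (and, since ls is then no longer the first word, aliases too). A small bash sketch, with a hypothetical ls wrapper function standing in for whatever your .bashrc defines:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, standing in for a definition in ~/.bashrc.
ls() { echo "wrapped ls"; }

plain=$(ls -d /)              # calls the function: "wrapped ls"
bypassed=$(command ls -d /)   # 'command' skips the function: "/"
printf '%s\n%s\n' "$plain" "$bypassed"

unset -f ls                   # remove the demo wrapper again
```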
Parsing ls is never a good idea. Use stat instead: print out the epoch modification time, sort numerically in descending order, then find the first dir that is not "summary".
findLatestDir() (
    cd /some/dir || exit 1
    stat -c $'%Y\t%n' */ |
        sort -rn |
        cut -f2 | {
            while IFS= read -r dir; do
                [[ $dir == "summary/" ]] || break
            done
            echo "$dir"
        }
)
Note that the bash wildcard */, with its trailing slash, limits the results to directories only.
I execute the function in a subshell so the cd command does not affect your current directory.
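stat -c is GNU-specific (BSD stat uses different options). A pure-bash alternative is to compare the candidates with the -nt test operator. This is a sketch; newest_dir is a hypothetical name, and when two directories have the same modification time, the one seen first is kept:

```shell
#!/usr/bin/env bash
# newest_dir DIR: print the name of the most recently modified subdirectory
# of DIR, skipping one called "summary".
newest_dir() {
    local dir newest=
    for dir in "$1"/*/; do
        [[ -d $dir ]] || continue                 # no match: the glob stayed literal
        [[ ${dir%/} == */summary ]] && continue   # skip summary/
        if [[ -z $newest || $dir -nt $newest ]]; then
            newest=$dir
        fi
    done
    [[ -n $newest ]] || return 1                  # nothing found
    newest=${newest%/}
    printf '%s\n' "${newest##*/}"                 # name only, without the path
}
```

For example, newest_dir /user/someDir would print the name of the newest directory there other than summary.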

Different pipeline behavior between sh and ksh

I have isolated the problem to the below code snippet:
Notice below that an empty string gets assigned to LATEST_FILE_NAME when the script is run using ksh, but the value is assigned to $LATEST_FILE_NAME correctly when it is run using sh. This in turn affects the value of $FILE_LIST_COUNT.
The script is written for KornShell (ksh), and I am not sure what might be causing the issue.
When I comment out the tee command in the below line, the ksh script works fine and correctly assigns the value to variable $LATEST_FILE_NAME.
(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH
Kindly consider:
1. Source Code: script.sh
#!/usr/bin/ksh
set -vx # Enable debugging
SCRIPTLOGSDIR=/some/path/Scripts/TEST/shell_issue
SOURCE_FILE_PATH=/some/path/Scripts/TEST/shell_issue
# Log file
Timestamp=`date +%Y%m%d%H%M`
LOG_FILENAME="TEST_LOGS_${Timestamp}.log"
LOG_FILE_PATH="${SCRIPTLOGSDIR}/${LOG_FILENAME}"
## Temporary files
FILE_LIST=FILE_LIST.temp #Will store all extract filenames
FILE_LIST_COUNT=0 # Stores total number of files
getFileListDetails(){
rm -f $SOURCE_FILE_PATH/$FILE_LIST 2>&1 | tee -a $LOG_FILE_PATH
# Get list of all files, Sort in reverse order, and store names of the files line-wise. If no files are found, error is muted.
(cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH
if [[ ! -f $SOURCE_FILE_PATH/$FILE_LIST ]]; then
echo "FATAL ERROR - Could not create a temp file for file list.";exit 1;
fi
LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)";
FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)";
}
getFileListDetails;
exit 0;
2. Output when using shell sh script.sh:
+ getFileListDetails
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ cd /some/path/Scripts/TEST/shell_issue
+ sort -r
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ ls 1.txt 2.txt 3.txt
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
cd $SOURCE_FILE_PATH; head -1 $FILE_LIST
++ cd /some/path/Scripts/TEST/shell_issue
++ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=3.txt
cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l
++ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
++ wc -l
+ FILE_LIST_COUNT=3
exit 0;
+ exit 0
3. Output when using ksh ksh script.sh:
+ getFileListDetails
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ 2>& 1
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ sort -r
+ 1> /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ cd /some/path/Scripts/TEST/shell_issue
+ ls 1.txt 2.txt 3.txt
+ 2> /dev/null
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
+ cd /some/path/Scripts/TEST/shell_issue
+ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=''
+ wc -l
+ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ FILE_LIST_COUNT=0
exit 0;+ exit 0
OK, here goes: this is a tricky and subtle one. The answer lies in how pipelines are implemented. POSIX states that
If the pipeline is not in the background (see Asynchronous Lists), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.
Notice the keyword may. Many shells wait for all commands to complete, e.g. see the bash manpage:
The shell waits for all commands in the pipeline to terminate before returning a value.
Notice the wording in the ksh manpage:
Each command, except possibly the last, is run as a separate process; the shell waits for the last command to terminate.
In your example, the last command is the tee command. Because the stdout of sort is already redirected to ${SOURCE_FILE_PATH}/${FILE_LIST} earlier in the pipeline, tee gets no input and exits immediately. Oversimplifying a little: tee finishes before sort has finished writing the file, ksh only waits for tee, and so your file is probably not completely written by the time you read from it. You can test this (this is not a fix!) by adding a sleep at the end of the whole command:
$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[]
$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; sleep 0.1; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
$ bash -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
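Assuming the intent of the original line was to write the sorted list to the file and also append it to the log, one way to avoid the race is to let tee write the list file itself, so that the command ksh waits for is the one producing the file. A minimal POSIX sh sketch, with a throwaway directory in place of the question's paths:

```shell
#!/bin/sh
# Only the list-building step; variable names follow the question, but the
# scratch directory and files created here are stand-ins.
SOURCE_FILE_PATH=$(mktemp -d) || exit 1
LOG_FILE_PATH=$SOURCE_FILE_PATH/test.log
FILE_LIST=FILE_LIST.temp
touch "$SOURCE_FILE_PATH/1.txt" "$SOURCE_FILE_PATH/2.txt" "$SOURCE_FILE_PATH/3.txt"

# tee is now the last command and is the one writing the list file, so when
# ksh waits only for the last command, the file is already complete.
(cd "$SOURCE_FILE_PATH" && ls *.txt 2>/dev/null) | sort -r |
    tee "$SOURCE_FILE_PATH/$FILE_LIST" >> "$LOG_FILE_PATH"

LATEST_FILE_NAME=$(head -n 1 "$SOURCE_FILE_PATH/$FILE_LIST")
FILE_LIST_COUNT=$(wc -l < "$SOURCE_FILE_PATH/$FILE_LIST")   # no file name in the output
echo "latest=$LATEST_FILE_NAME count=$FILE_LIST_COUNT"

rm -rf "$SOURCE_FILE_PATH"
```

Because tee has written the list before it exits, ksh waiting only for the last command is enough here.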
That being said, here are a few other things to consider:
Always quote your variables, especially when dealing with files, to avoid problems with globbing, word splitting (if your path contains spaces) etc.:
do_something "${this_is_my_file}"
head -1 is deprecated, use head -n 1
If you only have one command on a line, the ending semicolon ; is superfluous...just skip it
LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)"
There is no need to cd into the directory first; just pass the whole path as an argument to head:
LATEST_FILE_NAME="$(head -n 1 "${SOURCE_FILE_PATH}/${FILE_LIST}")"
FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)"
This is called Useless Use Of Cat because the cat is not needed - wc can deal with files. You probably used it because the output of wc -l myfile includes the filename, but you can use e.g. FILE_LIST_COUNT="$(wc -l < "${SOURCE_FILE_PATH}/${FILE_LIST}")" instead.
Furthermore, you will want to read Why you shouldn't parse the output of ls(1) and How can I get the newest (or oldest) file from a directory?.

grep for multiple strings in file on different lines (ie. whole file, not line based search)?

I want to grep for files containing the words Dansk, Svenska and Norsk on any line, with a usable return code (I really only need to know whether the strings are present; my one-liner goes a little further than this).
I have many files with lines in them like this:
Disc Title: unknown
Title: 01, Length: 01:33:37.000 Chapters: 33, Cells: 31, Audio streams: 04, Subpictures: 20
Subtitle: 01, Language: ar - Arabic, Content: Undefined, Stream id: 0x20,
Subtitle: 02, Language: bg - Bulgarian, Content: Undefined, Stream id: 0x21,
Subtitle: 03, Language: cs - Czech, Content: Undefined, Stream id: 0x22,
Subtitle: 04, Language: da - Dansk, Content: Undefined, Stream id: 0x23,
Subtitle: 05, Language: de - Deutsch, Content: Undefined, Stream id: 0x24,
(...)
Here is the pseudocode of what I want:
for all files in directory;
if file contains "Dansk" AND "Norsk" AND "Svenska" then
echo the filename
end
What is the best way to do this? Can it be done on one line?
You can use:
grep -l Dansk * | xargs grep -l Norsk | xargs grep -l Svenska
If you want also to find in hidden files:
grep -l Dansk .* | xargs grep -l Norsk | xargs grep -l Svenska
Yet another way using just bash and grep:
For a single file 'test.txt':
grep -q Dansk test.txt && grep -q Norsk test.txt && grep -l Svenska test.txt
Will print test.txt iff the file contains all three (in any combination). The first two greps don't print anything (-q) and the last only prints the file if the other two have passed.
If you want to do it for every file in the directory:
for f in *; do grep -q Dansk "$f" && grep -q Norsk "$f" && grep -l Svenska "$f"; done
grep -irl word1 * | grep -il word2 `cat -` | grep -il word3 `cat -`
-i makes the search case-insensitive
-r makes the search recurse through directories
-l prints only the names of files that contain a match
`cat -` reads that list of names from the pipe, so the next grep searches only those files.
You can do this really easily with ack:
ack -l 'cats' | ack -xl 'dogs'
-l: return a list of files
-x: take the files from STDIN (the previous search) and only search those files
And you can just keep piping until you get just the files you want.
How to grep for multiple strings in file on different lines (Use the pipe symbol):
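for file in *; do
    test "$(grep -oE 'Dansk|Norsk|Svenska' "$file" | sort -u | wc -l)" -ge 3 && echo "$file"
done
Notes:
Without -E (basic regular expressions), GNU grep needs the pipe escaped as \| to search for Dansk, Norsk or Svenska; the choice of single or double quotes does not change this.
-o prints each match on its own line and sort -u removes repeats, so the count is the number of distinct words found, even if one line contains several languages or a language appears on several lines.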
Walkthrough: http://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/
awk '/Dansk/{a=1} /Norsk/{b=1} /Svenska/{c=1} END{ exit !(a && b && c) }' file
The exit status is 0 only if all three words were found, so you can test it directly in the shell.
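A sketch of a wrapper that reports the result through awk's exit status rather than by printing, so it can be used directly in an if. The function name has_all_three is hypothetical:

```shell
#!/usr/bin/env bash
# has_all_three FILE: exit status 0 only if FILE mentions Dansk, Norsk and
# Svenska (on any lines, in any order).
has_all_three() {
    awk '/Dansk/   { a = 1 }
         /Norsk/   { b = 1 }
         /Svenska/ { c = 1 }
         END { exit !(a && b && c) }' "$1"
}
```

For example: if has_all_three dvd_info.txt; then echo "all three found"; fi (dvd_info.txt is a placeholder name).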
If you have Ruby (1.9+):
ruby -0777 -ne 'print if /Dansk/ and /Norsk/ and /Svenska/' file
This finds lines containing either word (not files containing both) in multiple files:
egrep 'abc|xyz' file1 file2 ..filen
Simply (this also matches lines containing any one of the words):
grep 'word1\|word2\|word3' *
see this post for more info
This is a blending of glenn jackman's and kurumi's answers which allows an arbitrary number of regexes instead of an arbitrary number of fixed words or a fixed set of regexes.
#!/usr/bin/awk -f
# by Dennis Williamson - 2011-01-25
BEGIN {
    for (i=ARGC-2; i>=1; i--) {
        patterns[ARGV[i]] = 0;
        delete ARGV[i];
    }
}
{
    for (p in patterns)
        if ($0 ~ p)
            matches[p] = 1
    # print # the matching line could be printed
}
END {
    for (p in patterns) {
        if (matches[p] != 1)
            exit 1
    }
}
Run it like this:
./multigrep.awk Dansk Norsk Svenska 'Language: .. - A.*c' dvdfile.dat
Here's what worked well for me:
find . -path '*/.svn' -prune -o -type f -exec gawk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
./path/to/file1.sh
./another/path/to/file2.txt
./blah/foo.php
If I just wanted to find .sh files with these three, then I could have used:
find . -path '*/.svn' -prune -o -type f -name "*.sh" -exec gawk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
./path/to/file1.sh
Expanding on kurumi's awk answer, here's a bash function:
all_word_search() {
    gawk '
        BEGIN {
            # all arguments but the last are search words; shift the
            # file name down so that only it remains in ARGV
            for (i = ARGC - 2; i >= 1; i--) {
                search_terms[ARGV[i]] = 0
                ARGV[i] = ARGV[i+1]
                delete ARGV[i+1]
            }
        }
        {
            for (i = 1; i <= NF; i++)
                if ($i in search_terms)
                    search_terms[$i] = 1
        }
        END {
            for (word in search_terms)
                if (search_terms[word] == 0)
                    exit 1
        }
    ' "$@"
    return $?
}
Usage:
if all_word_search Dansk Norsk Svenska filename; then
echo "all words found"
else
echo "not all words found"
fi
I did it in two steps: first make a list of the CSV files in one file, then search them.
With the help of the comments on this page, I came up with two steps that need no script. Just type into a terminal:
$ find /csv/file/dir -name '*.csv' > csv_list.txt
$ grep -l Svenska `cat csv_list.txt` | xargs grep -l Norsk | xargs grep -l Dansk
It prints the names of the files containing all three words, which is exactly what I needed.
Also mind the difference between the backticks (`) and the quotes (' ").
If you only need two search terms, arguably the most readable approach is to run each search and intersect the results:
comm -12 <(grep -rl word1 . | sort) <(grep -rl word2 . | sort)
If you have git installed
git grep -l --all-match --no-index -e Dansk -e Norsk -e Svenska
--no-index searches files in the current directory even if it is not managed by Git, so this command works in any directory, whether or not it is a Git repository.
I had this problem today, and all the one-liners here failed for me because the file names contained spaces.
This is what I came up with that worked:
grep -ril <WORD1> | sed 's/.*/"&"/' | xargs grep -il <WORD2>
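Wrapping each name in double quotes for xargs still breaks on names that themselves contain quotes. A sketch of a NUL-delimited variant, assuming GNU grep (--null) and GNU xargs (-0, -r); the function name files_with_both is hypothetical:

```shell
#!/usr/bin/env bash
# files_with_both WORD1 WORD2 [DIR]: list files under DIR (default .) that
# contain both words, even if their names contain spaces or quotes.
files_with_both() {
    # --null ends each file name with a NUL byte instead of a newline, and
    # xargs -0 splits on NUL, so no name is ever split or re-quoted.
    # -r stops xargs from running the second grep when nothing matched.
    grep -rl --null -e "$1" -- "${3:-.}" | xargs -0 -r grep -l -e "$2" --
}
```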
A simple bash one-liner for an arbitrary word list LIST and the file my_file.txt:
LIST="Dansk Norsk Svenska"
EVAL=$(echo "$LIST" | sed 's/[^ ]* */grep -q & my_file.txt \&\& /g'); eval "$EVAL echo yes || echo no"
Replacing eval with echo reveals that the following command is evaluated:
grep -q Dansk my_file.txt && grep -q Norsk my_file.txt && grep -q Svenska my_file.txt && echo yes || echo no
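If you want to avoid eval (a word containing shell metacharacters would be executed as code), a loop over the words does the same job. A bash sketch with a hypothetical contains_all function; it uses grep -F, so the words are matched as fixed strings rather than regular expressions:

```shell
#!/usr/bin/env bash
# contains_all FILE WORD...: exit status 0 only if FILE contains every WORD.
contains_all() {
    local file=$1 word
    shift
    for word in "$@"; do
        # -F: fixed string, -q: quiet, stop at the first match
        grep -qF -e "$word" -- "$file" || return 1
    done
}
```

For example: LIST=(Dansk Norsk Svenska); contains_all my_file.txt "${LIST[@]}" && echo yes || echo no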

Resources