Listing files in date order with spaces in filenames - bash

I am starting with a file containing a list of hundreds of files (full paths) in a random order. I would like to list the details of the ten latest files in that list. This is my naive attempt:
$ ls -las -t `cat list-of-files.txt` | head -10
That works, as long as none of the filenames contain spaces, but it fails if they do, because those names are split at the spaces and treated as separate files. A file named "hello world" gives me:
ls: hello: No such file or directory
ls: world: No such file or directory
I have tried quoting the names in the original list-of-files file, but the command substitution still splits them at the spaces, and the quotes are treated as part of the filenames:
$ ls -las -t `awk '{print "\"" $0 "\""}' list-of-files.txt` | head -10
ls: "hello: No such file or directory
ls: world": No such file or directory
The only way I can think of doing this is to ls each file individually (using xargs, perhaps), create an intermediate file with the file listings and the date in a sortable order as the first field of each line, and then sort that intermediate file. However, that feels a bit cumbersome and inefficient (hundreds of ls commands rather than one or two). But is that the only way to do it?
Is there any way to pass "ls" a list of files to process, where those files could contain spaces - it seems like it should be simple, but I'm stumped.

Instead of splitting on the default separators (space, tab and newline), you can force bash to split only on newlines by changing IFS:
OIFS=$IFS
IFS=$'\n'
ls -las -t $(cat list-of-files.txt) | head -10
IFS=$OIFS
However, I don't think this code would be more efficient than a loop. In addition, it won't work if the list in list-of-files.txt is too long to fit on a single command line, and names containing glob characters such as * or ? are still expanded as patterns.
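A bash-only variant of the same idea is sketched below, assuming bash 4 or later for mapfile; the function name list_latest is hypothetical. It reads the list into an array, so no IFS juggling is needed and the names are never glob-expanded. It still hands everything to a single ls, so the command-line length limit still applies.

```shell
# list_latest FILE [N]: show the N most recently modified files named in FILE
# (one path per line). Hypothetical helper; requires bash 4+ for mapfile.
list_latest() {
  local list=$1 n=${2:-10}
  local -a files
  # One array element per line: no word splitting, no glob expansion.
  mapfile -t files < "$list"
  # With no arguments ls would list the current directory instead.
  (( ${#files[@]} )) || return 0
  # -d: describe directories themselves rather than their contents.
  ls -ldt -- "${files[@]}" | head -n "$n"
}
```

Usage: list_latest list-of-files.txt 10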

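Try this (GNU xargs; -d '\n' makes each line a single argument, since by default xargs also splits on blanks and treats quotes specially):
xargs -d '\n' -a list-of-files.txt ls -last | head -n 10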

I'm not sure whether this will work, but did you try escaping the spaces with \, using sed or something similar? For example: sed "s/ /\\\\ /g" list-of-files.txt

This worked for me:
xargs -d\\n ls -last < list-of-files.txt | head -10
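One caveat with the xargs answers: if the list is long, xargs may run ls several times, and ls -t only sorts within each batch, so head -10 could end up picking from one batch only. A sketch that avoids this, assuming GNU xargs and GNU stat (whose -c '%Y %n' prints the modification time in epoch seconds followed by the name), sorts all the timestamps in one place; the function name is hypothetical:

```shell
# newest_from_list FILE [N]: print the N most recently modified paths listed
# in FILE, newest first. Hypothetical helper; assumes GNU xargs and GNU stat.
newest_from_list() {
  local list=$1 n=${2:-10}
  # xargs -d '\n' (GNU): one name per line, spaces and quotes taken literally;
  #   -r skips running stat if the list is empty.
  # stat -c '%Y %n' (GNU): "<mtime in epoch seconds> <name>" per file.
  # sort -rn -k1,1: newest first, across however many batches xargs ran.
  # cut -f 2-: drop the timestamp but keep names that contain spaces.
  xargs -r -d '\n' stat -c '%Y %n' -- < "$list" |
    sort -rn -k1,1 |
    head -n "$n" |
    cut -d ' ' -f 2-
}
```

This prints names only; pass them on (for example with xargs -d '\n' ls -ld) if you want the full listing. Names containing newlines will still break it.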

Related

Piping the contents of a file to ls

I have a file called "input.txt" that contains one line:
/bin
I would like to make the contents of the file be the input of the command ls
I tried doing
cat input.txt | ls
but it doesn't output the list of files in the /bin directory
I also tried
ls < input.txt
to no avail.
You are looking for the xargs command.
xargs ls < input.txt
You say you want /bin to be the "input" to ls, but that's not correct; ls doesn't do anything with its input. Instead, you want /bin to be passed as a command-line argument to ls, as if you had typed ls /bin.
Input and arguments are completely different things; feeding text to a command as input is not the same as supplying that text as an argument. The difference can be blurred by the fact that many commands, such as cat, will operate on either their input or their arguments (or both), but even there we find an important distinction: when given arguments, what they actually operate on is the content of the files whose names are passed, not the names themselves.
The xargs command was specifically designed to transform between those two things: it interprets its input as a whitespace-separated list of command-line arguments to pass to some other command. That other command is supplied to xargs as its command-line argument(s), in this case ls.
Thanks to the input redirection provided by the shell via <, the arguments xargs supplies to ls here come from the input.txt file.
There are other ways of accomplishing the same thing; for instance, as long as input.txt does not have so many files in it that they won't fit in a single command line, you can just do this:
ls $(< input.txt)
Both the above command and the xargs version will treat any spaces in the input.txt file as separating filenames, so if you have filenames containing space characters, you'll have to do more work to interpret the file properly. Also, note that if any of the filenames contain wildcard/"glob" characters like ? or * or [...], the $(<...) version will expand them as wildcard patterns, while xargs will not.
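To see the glob difference for yourself, here is a small sketch in bash (since $(< file) is not POSIX); printf '[%s]\n' stands in for ls so that each argument is printed on its own line in brackets. The function name is hypothetical.

```shell
# show_glob_difference FILE: print the arguments each approach would pass to ls.
show_glob_difference() {
  local input=$1
  echo "command substitution:"
  # Unquoted: the text is split on whitespace AND glob-expanded.
  printf '[%s]\n' $(< "$input")
  echo "xargs:"
  # xargs splits on whitespace too, but never expands globs.
  xargs printf '[%s]\n' < "$input"
}
```

In a directory containing a.txt and b.txt, with an input file holding the single line *.txt, the first form prints [a.txt] and [b.txt], while the xargs form prints [*.txt].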
ls takes the filenames from its command line, not its standard input, which | ls and ls < file would use.
If you have only one file listed in input.txt and the filename doesn't contain trailing newlines, it's enough to use (note quotes):
ls "$(cat input.txt)"
Or, in most shells other than a plain POSIX sh (for example bash, ksh and zsh):
ls "$(< input.txt)"
If there are many filenames in the file, you'd want to use xargs, but to deal with whitespace in the names, use -d "\n" (with GNU xargs) to take each line as a filename.
xargs -d "\n" ls < input.txt
Or, if you need to handle filenames with newlines, you can separate them using NUL bytes in the input, and use
xargs -0 ls < input.txt
(This also works even if there's only one filename.)
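As a sketch of the NUL-separated approach: printf '%s\0' (or find /some/dir -print0) writes such a list, and xargs -0 reads it back without any splitting or quote processing. Again printf '[%s]\n' stands in for ls so each argument is visible; both function names and the list.nul file name are hypothetical.

```shell
# write_nul_list FILE: write two awkward names, one containing a space and
# one containing a newline, as a NUL-separated list.
write_nul_list() {
  printf '%s\0' "file one" $'file\ntwo' > "$1"
}

# show_nul_args FILE: each NUL-terminated entry becomes exactly one argument.
show_nul_args() {
  xargs -0 printf '[%s]\n' < "$1"
}
```

Usage: write_nul_list list.nul; show_nul_args list.nul prints [file one] followed by the second name, newline included, inside one pair of brackets.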
Try xargs
cat file | xargs ls

Print list of files in a directory to a text file (but not the text file itself) from terminal

I would like to print all the filenames of every file in a directory to a .txt file.
Let's assume that I had a directory with 3 files:
file1.txt
file2.txt
file3.txt
and I tried using ls > output.txt.
The thing is that when I open output.txt I find this list:
file1.txt
file2.txt
file3.txt
output.txt
Is there a way to avoid printing the name of the file I'm redirecting the output to? Or, better, is there a command that prints the names of all files in a directory except one?
printf '%s\n' * > output.txt
Note that this assumes there's no preexisting output.txt file; if there is, delete it first.
printf '%s\n' * uses globbing (filename expansion) to robustly print the names of all files and subdirectories located in the current directory, line by line.
Globbing happens before output.txt is created via output redirection > output.txt (which still happens before the command is executed, which explains your problem), so its name is not included in the output.
Globbing also avoids the use of ls, whose use in scripting is generally discouraged.
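If output.txt may already exist, a sketch that simply skips it while globbing (bash; write_names is a hypothetical name). The -e check handles an empty directory, where an unmatched * stays literal.

```shell
# write_names [OUT]: write the names in the current directory to OUT
# (default output.txt), one per line, excluding OUT itself.
write_names() {
  local out=${1:-output.txt} f
  local -a names=()
  for f in *; do
    [[ -e $f || -L $f ]] || continue      # '*' stays literal in an empty directory
    [[ $f == "$out" ]] && continue        # skip a pre-existing output file
    names+=( "$f" )
  done
  if (( ${#names[@]} )); then
    printf '%s\n' "${names[@]}" > "$out"
  else
    : > "$out"                            # empty directory: write an empty file
  fi
}
```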
In general, it is not good to parse the output of ls, especially while writing production quality scripts that need to be in good standing for a long time. See this page to find out why: Don't parse ls output
In your example, output.txt is a part of the output in ls > output.txt because shell arranges the redirection (to output.txt) before running ls.
The simplest way to get the right behavior for your case would be:
ls file*txt > output.txt # as long as you are looking for files named that way
or, store the output in a hidden file (or in a normal file in some other directory) and then move it to the final place:
ls > .output.txt && mv .output.txt output.txt
A more generic solution would be using grep -v:
ls | grep -vFx output.txt > output.txt
Or, you can use an array:
files=( * )    # the glob is expanded before output.txt is created
printf '%s\n' "${files[@]}" > output.txt
GNU ls has an ignore option, and the find command can also be used.
Using ls with ignore option
ls -I "output.txt" > output.txt
ls --ignore "output.txt" > output.txt
-I and --ignore are the same option. As the man page puts it, it tells ls not to list implied entries matching shell PATTERN.
Using find
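find . -mindepth 1 -maxdepth 1 \! -name "output.txt" > output.txt
-name finds files/directories whose names match the pattern.
\! -name excludes entries whose names match the pattern. -mindepth 1 -maxdepth 1 keep find from listing "." itself and from descending into subdirectories, which is closer to what ls does.
find . -mindepth 1 -maxdepth 1 \! -name "output.txt" -printf '%P\n' > output.txt
%P strips the starting point ("./") and gives only names.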
The safest way, without assuming anything about the file names, is to use bash arrays (in memory) or a temporary file. A temporary file does not need to be held in memory, so it may be even safer. Something like:
#!/bin/bash
tmp=$(mktemp)    # mktemp replaces the deprecated, Debian-specific tempfile
ls > "$tmp"
mv "$tmp" output.txt
Using ls and awk commands you can get the correct output.
ls -ltr | awk '/txt/ {print $9}' > output.txt
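This prints only the ninth field (the filename) of each line that contains "txt". Note that output.txt itself also matches /txt/, and names containing spaces are cut off at the first space.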
My way would be like:
ls *.txt > output.txt
Note that the shell always expands globs before running the command. In your specific case, the glob expansion goes like this:
# "ls *.txt > output.txt" will be expanded as
ls file1.txt file2.txt file3.txt > output.txt
The reason you get "output.txt" in your final output file is that the shell sets up the redirection before it starts the command: it creates (or truncates) output.txt first and only then runs ls, which therefore finds output.txt in the directory and lists it.
With ls *.txt, the glob was already expanded before the redirection created output.txt, so that name never reaches ls (unless output.txt was left over from an earlier run).
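A small sketch that makes the ordering visible (bash; the function name is hypothetical): the file test runs inside a command whose output is redirected to that same file, and it reports that the file already exists.

```shell
# redirect_order_demo: show that the shell creates the redirection target
# before the command whose output goes there starts running.
redirect_order_demo() {
  local d
  d=$(mktemp -d) || return 1
  # The > redirection is set up first, so output.txt exists when [ -e ] runs.
  if [ -e "$d/output.txt" ]; then echo "exists"; else echo "missing"; fi > "$d/output.txt"
  cat "$d/output.txt"
  rm -rf -- "$d"
}
```

Running redirect_order_demo prints exists.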

To show only file name without the entire directory path

ls /home/user/new/*.txt prints all txt files in that directory. However it prints the output as follows:
[me@comp]$ ls /home/user/new/*.txt
/home/user/new/file1.txt /home/user/new/file2.txt /home/user/new/file3.txt
and so on.
I want to run the ls command not from the /home/user/new/ directory thus I have to give the full directory name, yet I want the output to be only as
[me@comp]$ ls /home/user/new/*.txt
file1.txt file2.txt file3.txt
I don't want the entire path; only the filename is needed. This issue has to be solved using the ls command, as its output is meant for another program.
ls whateveryouwant | xargs -n 1 basename
Does that work for you?
Otherwise you can (cd /the/directory && ls) (yes, parentheses intended)
No need for xargs and all; ls is more than enough, if you run it from inside the directory:
ls -1 *.txt
This lists one file per line.
There are several ways you can achieve this. One would be something like:
for filepath in /path/to/dir/*
do
filename=$(basename "$filepath")
# ... whatever you want to do with the file here
done
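If you are looping anyway, the shell's own parameter expansion can strip the directory without running basename once per file: ${filepath##*/} removes everything up to the last slash. A sketch follows; print_basenames is a hypothetical name, and it assumes the paths do not end in a slash (basename handles that case, this expansion does not).

```shell
# print_basenames PATH...: print each argument with its directory part removed.
print_basenames() {
  local filepath
  for filepath in "$@"; do
    # ##*/ deletes the longest prefix ending in '/', i.e. the whole directory part.
    printf '%s\n' "${filepath##*/}"
  done
}
```

Usage: print_basenames /home/user/new/*.txt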
Use the basename command (with GNU basename, -a lets it take several names at once):
basename -a /home/user/new/*.txt
(cd dir && ls)
will only output filenames in dir. Use ls -1 if you want one per line.
(Changed ; to && as per Sactiw's comment).
You could add a sed script to your command line:
ls /home/user/new/*.txt | sed -r 's/^.+\///'
A fancy way to solve it is to use rev twice, with cut in between:
find ./ -name "*.txt" | rev | cut -d '/' -f1 | rev
The selected answer did not work for me, as I had spaces, quotes and other strange characters in my filenames. To pass each name to basename intact, you can use GNU xargs with a newline delimiter, which turns off its own blank and quote processing:
ls -d /path/to/my/directory/* | xargs -d '\n' -n1 basename
This handles spaces and quotes; only filenames that contain newlines will still break it.
I prefer the basename approach, which fge already answered.
Another way is :
ls /home/user/new/*.txt|awk -F"/" '{print $NF}'
One more, uglier, way is:
ls /home/user/new/*.txt | perl -pe 's/.*\///'
Just hoping to be helpful to someone, as old problems seem to come back every now and again and I always find good tips here.
My problem was to list in a text file all the names of the "*.txt" files in a certain directory without path and without extension from a Datastage 7.5 sequence.
The solution we used is:
ls /home/user/new/*.txt | xargs -n 1 basename | cut -d '.' -f1 > name_list.txt
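Note that cut -d '.' -f1 keeps only the text before the first dot, so a name like report.v2.txt becomes report. A sketch that strips only the directory and the final .txt, using parameter expansion instead (bash; list_stems is a hypothetical name):

```shell
# list_stems DIR: print the names of DIR/*.txt without path or .txt extension.
list_stems() {
  local f
  for f in "$1"/*.txt; do
    [[ -e $f ]] || continue        # no matches: the glob stays literal
    f=${f##*/}                     # drop the directory part
    printf '%s\n' "${f%.txt}"      # drop only the final .txt
  done
}
```

Usage: list_stems /home/user/new > name_list.txt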
There are lots of ways to do that; one you can try is:
ls /home/user/new | grep '\.txt$'
Another method:
cd /home/user/new && ls *.txt
Here is another way:
ls -1 /home/user/new/*.txt|rev|cut -d'/' -f1|rev
You could also pipe to grep and pull out everything after the last forward slash. It looks goofy, but a defensive grep like this is fine, since Unix filenames cannot contain forward slashes anyway.
ls folderpathwithcriteria | grep -P -o -e "[^/]*$"
For when you want to list names in a path but the files have different extensions:
me@server:/var/backups$ ls -1 *.zip && ls -1 *.gz

Get the newest file based on timestamp

I am new to shell scripting, so I need some help with how to go about this problem.
I have a directory which contains files in the following format. The files are in a directory called /incoming/external/data
AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat
As you can see the filename of the file includes a timestamp. i.e. [RANGE]_[YYYYMMDD].dat
What I need to do is find out which of these files has the newest date, using the timestamp in the filename rather than the system timestamp, store that filename in a variable, move the file to another directory, and move the rest to a different directory.
For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
Working Directory: /incoming/external/data
Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
ls
sed
awk (or nawk)
sort
tail
I guess we don't need sed, since we can work with the entire output of ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the AWK command. It allows you to take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscore (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" "), followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to split the filename (which just adds extra complexity, so don't do it), you can replace the space with an underscore again with sed:
sed 's/ /_/'
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.
This should work:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[@]}" otherdir
It won't work if there are spaces in the filenames although you could modify the IFS variable to affect that.
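A sketch that avoids parsing ls entirely, so names with spaces are fine (bash 4+; move_newest_by_name is a hypothetical name). It assumes every relevant file matches [RANGE]_[YYYYMMDD].dat: because the date is fixed-width, comparing the part after the last underscore as a string is enough to find the newest, whatever [RANGE] is.

```shell
# move_newest_by_name SRC NEWEST_DIR OTHER_DIR
# Moves the file with the latest YYYYMMDD in its name to NEWEST_DIR, all other
# matching files to OTHER_DIR, and prints the newest file's name.
move_newest_by_name() {
  local src=$1 newest_dir=$2 other_dir=$3
  local f key newest="" newest_key="" restore
  local -a files others=()

  # Expand the glob with nullglob on, then restore the caller's setting.
  restore=$(shopt -p nullglob) || true
  shopt -s nullglob
  files=( "$src"/*_????????.dat )
  eval "$restore"

  (( ${#files[@]} )) || return 1

  for f in "${files[@]}"; do
    key=${f##*_}                       # "YYYYMMDD.dat"
    if [[ -z $newest || $key > $newest_key ]]; then
      if [[ -n $newest ]]; then others+=( "$newest" ); fi
      newest=$f newest_key=$key
    else
      others+=( "$f" )
    fi
  done

  mv -- "$newest" "$newest_dir"/ || return 1
  if (( ${#others[@]} )); then
    mv -- "${others[@]}" "$other_dir"/ || return 1
  fi
  printf '%s\n' "${newest##*/}"
}
```

Usage, with hypothetical target directories: newest=$(move_newest_by_name /incoming/external/data /path/to/newest /path/to/others)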
Try:
$ ls -lr
Hope it helps.
Use:
ls -r -1 AA_*.dat | head -n 1
(assuming there are no other files matching AA_*.dat)
ls -1 AA* | sort -r | head -1
Due to the naming convention of the files, alphabetical order is the same as date order. In bash, * expands to an alphabetically sorted list of filenames (the bash manual says so under Pathname Expansion), and ls sorts alphabetically too, so the file with the newest date is the last one alphabetically.
Therefore, in bash
mv "$(ls | tail -1)" first-directory
mv * second-directory
Should do the trick.
If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
My solution to this is similar to others, but a little simpler.
ls -tr | tail -1
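What it actually does is rely on ls to sort the output by modification time (oldest first, because of -r), then use tail to get the last listed file name. Note that this picks the most recently modified file, not the one with the newest date in its name.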
This solution will not work if the filename you require has a leading dot (e.g. .profile).
This solution does work if the file name contains a space.

Use lines in a file as filenames for grep?

I have a file which contains filenames (and the full path to them) and I want to search for a word within all of them.
some pseudo-code to explain:
grep keyword <all files specified in files.txt>
or
cat files.txt > grep keyword
cat files.txt | grep keyword
the problem is that I can only get grep to search the filenames, not the contents of the actual files.
cat files.txt | xargs grep keyword
or
grep keyword `cat files.txt`
or (equivalent to previous but harder to mis-read)
grep keyword $(cat files.txt)
should do the trick.
Pitfalls:
If files.txt contains file names with spaces, either solution will malfunction, because "This is a filename.txt" will be interpreted as four files, "This", "is", "a", and "filename.txt". A good reason why you shouldn't have spaces in your filenames, ever.
There are ways around this, but none of them is trivial. (find ... -print0 / xargs -0 is one of them.)
The second (cat) version can result in a very long command line (which might fail when exceeding the limits of your environment). The first (xargs) version handles long input automatically; xargs offers several options to control the details.
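With GNU xargs, -d '\n' makes each line of files.txt a single argument, which handles names with spaces without converting the list first. A sketch (grep_in_listed_files is a hypothetical name): -H makes grep print the filename even when xargs happens to pass it a single file, and -e protects a keyword that starts with a dash.

```shell
# grep_in_listed_files KEYWORD LISTFILE: search every file named in LISTFILE
# (one path per line) for the fixed string KEYWORD.
grep_in_listed_files() {
  local keyword=$1 list=$2
  # -d '\n' (GNU xargs): split only on newlines, no quote processing.
  # -r: skip running grep if the list is empty.
  xargs -r -d '\n' grep -H -F -e "$keyword" -- < "$list"
}
```

Usage: grep_in_listed_files keyword files.txt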
Both of the answers from DevSolar work (tested on Linux Ubuntu), but the xargs version is preferable if there may be many files, since it will avoid running into command line length limits.
so:
cat files.txt | xargs grep keyword
is the way to go
tr '\n' '\0' <files.txt | LANG=C xargs -r0 grep -F keyword
tr will delimit names with NUL character so that spaces not significant (note the corresponding -0 option to xargs).
xargs -r will start a single grep process for a "large" number of files, but not start any grep process if there are no files.
LANG=C means use quick routines for matching, rather than slow locale ones
grep -F means use quick string matching rather than slow regular expression matching
bash, ksh & zsh version:
grep keyword $(<files.txt)
It's been a long time since I last wrote a bash shell script, but you could store the result of the first grep (the one finding all the filenames) in an array and iterate over it, issuing a grep command for each file.
A good starting point should be the bash scripting guide.
