Iterating over parameter list when too many arguments in KSH - shell

Sometimes I have to iterate over all the files in a directory to find something, and for that the usual for i in $(ls *.txt) generally works. But when there are too many files in the folder, the command yields 0403-027 The parameter list is too long (whether it's for, diff, ls, or whatever).
One solution I have found is reading the input line by line with a while read, but then comes the tricky part. At first, I thought the ideal would be something like:
while read file ; do
# do something with file
done < $(find . -type f -name '*.txt')
But that returns a single line with ^J (newline) separators, and of course there is no file by that name. Changing IFS to \n didn't work either.
My current workaround is building a temporary file with all the files I'm interested in and then using the while:
tmpfile=$$.$(date +'%Y%m%d%H%M%S').tmp
find . -type f -name '*.txt' > $tmpfile
while read file ; do
# do something with file
done < $tmpfile ; rm $tmpfile
But that doesn't feel right, and it's so much more code than the first option.
Could someone tell me the right way to execute the first loop?
Thanks!

You need process substitution, not command substitution, in this situation:
while IFS= read -r file ; do
    # do something with file
done < <(find . -type f -name '*.txt')
A <() process substitution basically acts like a file, which you can redirect into the while-loop.
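If you want to convince yourself that <(...) really behaves like a filename, a minimal sketch (assuming bash or ksh93; the two input files are hypothetical):
echo <(true)                              # prints a pseudo-filename such as /dev/fd/63
diff <(sort list1.txt) <(sort list2.txt)  # compare two command outputs without temp files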

Related

How to list just one file from a (bash) shell directory listing

A bit lowly a query but here goes:
bash shell script. POSIX, Mint 21
I just want one/any (mp3) file from a directory. As a sample.
In normal execution (a full run), the code would be:
for f in *.mp3; do
#statements
done
This works fine, but if I wanted to sample just one file from such an array/glob (?) without looping, how might I do that? I don't care which file, just that it is an mp3 from the directory I am working in.
Should I just start this for-loop and then exit (break) after one statement, or is there a neater, more tailored-for-the-job way?
for f in *.mp3; do
#statement
break
done
Ta (cannot believe how dopey I feel asking this one; my forehead will hurt when I see the answers).
Since you are using Linux (Mint), you've got GNU find, so one way to get one .mp3 file from the current directory is:
mp3file=$(find . -maxdepth 1 -mindepth 1 -name '*.mp3' -printf '%f' -quit)
-maxdepth 1 -mindepth 1 causes the search to be restricted to one level under the current directory.
-printf '%f' prints just the filename (e.g. foo.mp3). The -print option would print the path to the filename (e.g. ./foo.mp3). That may not matter to you.
-quit causes find to exit as soon as one match is found and printed.
Another option is to use the Bash : (colon) command and $_ (dollar underscore) special variable:
: *.mp3
mp3file=$_
: *.mp3 runs the : command with the list of .mp3 files in the current directory as arguments. The : command ignores its arguments and does nothing.
mp3file=$_ sets the value of the mp3file variable to the last argument supplied to the previous command (:).
The second option should not be used if the number of .mp3 files is large (hundreds or more) because it will find all of the files and sort them by name internally.
In both cases $mp3file should be checked to ensure that it really exists (e.g. [[ -e $mp3file ]]) before using it for anything else, in case there are no .mp3 files in the directory.
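Putting that together, a minimal sketch of the find variant with the existence check (the messages are just illustrative):
mp3file=$(find . -maxdepth 1 -mindepth 1 -name '*.mp3' -printf '%f' -quit)
if [[ -e $mp3file ]]; then
    echo "sampling: $mp3file"
else
    echo "no .mp3 files in this directory" >&2
fi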
I would do it like this in POSIX shell:
mp3file=
for f in *.mp3; do
    if [ -f "$f" ]; then
        mp3file=$f
        break
    fi
done
# At this point, the variable mp3file contains a filename which
# represents a regular file (or a symbolic link) with the .mp3
# extension, or an empty string if there is no such file.
The fact that you use
for f in *.mp3; do
suggests to me that the MP3s are named without too many strange characters in the filename.
In that case, if you really don't care which MP3, you could:
f=$(ls *.mp3 | head -n 1)   # -n 1: plain head would return up to 10 names
statement
Or, if you want a different one every time:
f=$(ls *.mp3 | sort -R | tail -n 1)
Note: if your filenames get more complicated (including spaces or other special characters), this will not work anymore.
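On GNU systems the random pick can also be done with shuf, under the same caveat about simple filenames:
f=$(ls *.mp3 | shuf -n 1)   # shuf -n 1 prints one random input line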
Assuming you don't have spaces in your filenames (and personally I don't understand why the collective taboo is against using ls in scripts at all, rather than against having spaces in filenames), then:
ls *.mp3 | tr ' ' '\n' | sed -n '1p'

/bin/ls: Argument list too long

I'm attempting to convert a Twitter account of over 10K tweets into another format with a bash script on a maxed-out MBP 16" running the latest macOS.
After running for several minutes and outputting many periods, it says line 43: /bin/ls: Argument list too long. I assume this issue relates to the number of tweets. I could attempt to break the input into smaller pieces as a last resort, but not knowing what the maximum number to avoid the error is, I decided to search for a solution first.
I searched Google and SO and found "bash: /bin/ls: Argument list too long". If my issue is the same, it sounds like replacing "ls" with "find -name" may help. I tried that and got the same error, but perhaps my syntax wasn't correct.
The two lines that use "ls" currently are the following (the first is the one the error currently complains about):
for fileName in `ls ${thisDir}/dotwPosts/p*` ; do
and
printf "`ls ${thisDir}/dotwPosts/p* | wc -l` posts left to import.\n"
I tried changing the first line to the following (which fails with /usr/bin/find: Argument list too long):
for fileName in `find -name ${thisDir}/dotwPosts/p*` ; do
I may need to provide additional code, but I didn't want to make the question too specific to my needs; hopefully it stays general enough for others seeing this common error where the other Stack Overflow answer didn't seem to apply.
The error comes from the shell itself: it expands ${thisDir}/dotwPosts/p* into the full list of names before running ls (or find), and that list exceeds the kernel's argument-size limit. The fix is to never put the expanded list on a command line. To iterate over the files in a directory in bash, print the filenames as a zero-separated stream and read from it. That way you don't need to store all the filenames at once in any place:
find "${thisDir}/dotwPosts/" -maxdepth 1 -type f -name 'p*' -print0 |
while IFS= read -d '' -r file; do
    printf "%s\n" "$file"
done
To get the count, output a single character for each file and count the characters:
find "${thisDir}/dotwPosts/" -maxdepth 1 -type f -name 'p*' -printf . | wc -c
Don't use ` backticks; their use is discouraged (see the Bash Hackers wiki on discouraged and deprecated syntax). Use $(...) instead.
for fileName in $(...) is a common antipattern in bash. If you want to iterate over the output of another command, you should most probably use a while IFS= read -r line loop (see BashFAQ: "How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?").
Try this:
for file in "${thisDir}/dotwPosts/p"*
do
# exclude non plain files
[[ -f $file ]] || continue
# do something with "$file"
...
done
I quoted "${thisDir}/dotwPosts/p", so the variable thisDir can't contain relevant wildcards, but the loop works with blanks in the path. If you want wildcards in thisDir to expand, remove the quotes.

Iterate over specific files in a directory using Bash find

ShellCheck doesn't like my for-over-find loop in Bash.
for f in $(find $src -maxdepth 1 -name '*.md'); do wc -w < "$f" >> $path/tmp.txt; done
It suggests instead:
1 while IFS= read -r -d '' file
2 do
3 let count++
4 echo "Playing file no. $count"
5 play "$file"
6 done < <(find mydir -mtime -7 -name '*.mp3' -print0)
7 echo "Played $count files"
I understand most of it, but some things are still unclear.
In line one: What is '' file?
In line six: What does the empty space do in < < (find)? Are the < redirects, as usual? If they are, what does it mean to redirect into a do block?
Can someone help parse this out? Is this the right way to iterate over files of a certain kind in a directory?
In line one: What is '' file?
According to help read, that '' is an argument to the -d parameter:
-d delim continue until the first character of
DELIM is read, rather than newline
In line six: What does the empty space do in < < (find).
There are two separate operators there. There is <, the standard I/O redirection operator, followed by a <(...) construct, which is a bash-specific construct that performs process substitution:
Process Substitution
Process substitution is supported on systems that
support named pipes (FIFOs) or the /dev/fd method of naming
open files. It takes the form of <(list) or >(list). The
process list is run with its input or output connected
to a FIFO or some file in /dev/fd...
So this is sending the output of the find command into the do loop.
Are the < redirects, as usual? If they are, what does it mean to redirect into do block?
Redirecting into a loop means that any command inside that loop that
reads from stdin will read from the redirected input source. Note that,
unlike piping into the loop (find ... | while ...), which runs the loop
body in a subshell, redirecting from a process substitution keeps the
loop in the current shell, so variables set inside the loop remain
visible after it.
Can someone help parse this out? Is this the right way to iterate over files of a certain kind in a directory?
For the record, I would typically do this by piping find to xargs,
although which solution is best depends to a certain extent on what
you're trying to do. The two examples in your question do completely
different things, and it's not clear what you're actually trying to
accomplish.
But for example:
find $src -maxdepth 1 -name '*.md' -print0 |
xargs -0 -iDOC wc -w DOC
This would run wc on all the *.md files. The -print0 to find
(and the -0 to xargs) permit this command to correctly handle
filenames with embedded whitespace (e.g., This is my file.md). If
you know you don't have any of those, you just do:
find $src -maxdepth 1 -name '*.md' |
xargs -iDOC wc -w DOC
Generally, you need to use find if you want to do a recursive search through a directory tree (although with modern bash, you can set the shell option globstar, as shellcheck suggests). But in this case you've specified -maxdepth 1, so your find command is just listing files which match the pattern "$src"/*.md. That being the case, it is much simpler and more reliable to use the glob (pattern):
for f in "$src"/*.md; do
wc -w < "$f"
done >> "$path"/tmp.txt
(I also quoted all the variable expansions, for safety, and moved the output redirection so it applies to the entire for loop, which is slightly more efficient.)
If you need to use find (because a glob won't work), then you should attempt to use the -exec option to find, which doesn't require fiddling around with other options to avoid mishandled special characters in filenames. For example, you could do this:
find "$src" -maxdepth 1 -name '*.md' -exec do wc -w {} + >> "$path"/tmp.txt
To answer your specific questions:
In IFS= read -r -d '' file, the '' is the argument to the -d option. That option is used to specify the character which delimits lines to be read; by default, a newline character is used so that read reads one line at a time. The empty string is the same as specifying the NUL character, which is what find outputs at the end of each filename if you specify the -print0 option. (Unlike -exec, -print0 is not Posix standard so it is not guaranteed to work with every find implementation, but in practice it's pretty generally available.)
The space between < and <(...) is to avoid creating the token <<, which would indicate a here-document. Instead, it specifies a redirection (<) from a process substitution (<(...)).
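A side-by-side illustration of the two operators (a minimal sketch):
cat <<EOF            # << introduces a here-document: the lines up to EOF are literal stdin
hello from a here-document
EOF
cat < <(echo "hello from a process substitution")   # < <(...) redirects stdin from a command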

Is there a way to hold a selection of multiple file paths in bash? Or any entries in an output at all?

Every time I make a list of files with ls or find, I have to use some heavy pattern matching + redirection to a temporary file to get only the files I want, and then reuse the list on the command line (e.g. to move them, rename them, etc.).
Is there a way to either:
scan a directory and get a list of the stuff there is, and pick what we want (in a way like SCM Breeze does for git), so we can later pipe the selection to any command?
or just filter any output to do this (like a manual grep)?
You can save a file-matching pattern in a variable:
list=*.txt
for example. The variable holds the literal pattern, which the shell expands when you use $list unquoted. That won't work if you have embedded whitespace in any of the filenames, though; there are ways around that using an array, as sketched below.
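A sketch of the array approach, which is whitespace-safe (the mv destination is hypothetical):
files=( *.txt )                  # the glob expands once, at assignment time
printf '%s\n' "${files[@]}"      # inspect the selection, one name per line
mv -- "${files[@]}" /some/dest/  # reuse the same selection with any command
# with no matches the literal pattern remains in the array; shopt -s nullglob avoids that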
If your filenames are in a file and you want them in a variable, just:
read list < filename
(note that read takes only the first line of the file; in bash 4+ you can read every line into an array with mapfile -t list < filename).
Not sure why you would want to pipe those into a command, but if you need it:
echo *.txt | some_command
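If the receiving command expects one name per line, printf is safer than echo here (still assuming no newlines inside the filenames):
printf '%s\n' *.txt | some_command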
I think you will find all you want in BashPitfalls.
This is one example:
while IFS= read -r -d '' file; do
    some_command "$file"
done < <(find . -type f -name '*.mp3' -print0)
And another one:
shopt -s globstar
for file in ./**/*.mp3; do
    some_command "$file"
done

How can I read a list of filenames from a file in bash?

I'm trying to write a bash script that will process a list of files whose names are stored one per line in an input file, something like:
find . -type f -mtime +15 > /tmp/filelist.txt
for F in $(cat /tmp/filelist.txt) ; do
...
done;
My problem is that filenames in filelist.txt may contain spaces, so the snippet above will expand the line
my text file.txt
to three different filenames, my, text and file.txt. How can I fix that?
Use read:
while IFS= read -r F ; do
    echo "$F"
done < /tmp/filelist.txt
Alternatively use IFS to change how the shell separates your list:
OLDIFS=$IFS
IFS="
"
for F in $(cat /tmp/filelist.txt) ; do
    echo "$F"
done
IFS=$OLDIFS
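In bash the embedded newline can be written more readably with ANSI-C quoting; the effect is identical:
IFS=$'\n'   # split expansions on newlines only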
Alternatively (as suggested by #tangens), convert the body of your loop into a separate script, then use find's -exec option to run it for each file found directly.
You can do this without a temporary file using process substitution:
while IFS= read -r F
do
...
done < <(find . -type f -mtime +15)
Use while read:
cat "$FILE" | while read -r line
do
    echo "$line"
done
You can use a redirect (done < "$FILE") instead of cat with a pipe.
You could use the -exec parameter of find and use the file names directly:
find . -type f -mtime +15 -exec <your command here> {} \;
The {} is a placeholder for the file name.
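For example, with wc standing in for the real command (the + terminator, where supported, batches many files per invocation instead of one):
find . -type f -mtime +15 -exec wc -c {} \;   # runs wc once per file
find . -type f -mtime +15 -exec wc -c {} +    # runs wc with many files at once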
Pipe your find command straight into a while read loop:
find . -type f -mtime +15 | while read -r line
do
    printf 'do something with %s\n' "$line"
done
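One caveat with this form: the pipe runs the loop in a subshell, so variables set inside it are lost afterwards, as this small demonstration shows. Use the done < <(find ...) form from an earlier answer if you need the variables.
count=0
find . -type f -mtime +15 | while read -r line
do
    count=$((count + 1))
done
echo "$count"   # still prints 0: the increments happened in a subshell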
I'm not a bash expert by any means (I usually write my scripts in Ruby or Python to be cross-platform), but I would use a regex to escape the spaces in each line before you process it.
For Bash Regex:
http://www.linuxjournal.com/node/1006996
In a similar situation in Ruby (processing a CSV file and cleaning up each line before using it):
File.foreach(csv_file_name) do |line|
  clean_line = line.gsub(/( )/, '\ ')
  # this finds the spaces in the file name and escapes them
  # do more stuff here
end
I believe you can skip the temporary file entirely and just directly iterate over the results of find, i.e.:
for F in $(find . -type f -mtime +15) ; do
...
done;
No guarantees that my syntax is correct, but I'm pretty sure the concept works.
Edit: If you really do have to process the file with a list of filenames and can't simply combine the commands as I did above, then you can change the value of the IFS variable--it stands for Internal Field Separator--to change how bash determines fields. By default it is set to whitespace, so a newline, space, or tab will begin a new field. If you set it to contain only a newline, then you can iterate over the file just as you did before.
