Attempting to convert a twitter account of over 10K tweets into another format with a bash script on a maxed out MBP 16" running the latest macOS.
After running for several minutes outputting many periods it says, line 43: /bin/ls: Argument list too long. Assuming this issue relates to the number of tweets so while I could attempt to break into small pieces as a last resort, not knowing what the max number to avoid the error is, decided to first search for a solution.
Searched Google and SO and found, "bash: /bin/ls: Argument list too long". If my issue is the same it sounds like replacing "ls" with "find -name" may help. Tried and same error, but perhaps not the correct syntax.
The two lines that use "ls" currently are the following (the first is the one the error currently complains about):
for fileName in `ls ${thisDir}/dotwPosts/p*` ; do
and
printf "`ls ${thisDir}/dotwPosts/p* | wc -l` posts left to import.\n"
Tried changing the first line to (with the error saying /usr/bin/find: Argument list too long).
for fileName in `find -name ${thisDir}/dotwPosts/p*` ; do
May need to provide additional code, but didn't want to make the question too specific to my needs and more general hopefully for others seeing this common error where the other stackoverflow answer didn't seem to apply.
To iterate over file in a directory in bash, print the filenames as a zero separated stream and read it. That way you don't need to store all filenames at once in any place:
find "${thisDir}/dotwPosts/" -maxdepth 1 -type f -name 'p*' -print0 |
while IFS= read -d '' -r file; do
printf "%s\n" "$file"
done
To get the count, output a single character for each file and count the characters:
find "${thisDir}/dotwPosts/" -maxdepth 1 -type f -name 'p*' -printf . | wc -c
Don't use ` backticks, they use is discouraged. Bash hackers wiki discouraged and deprecated syntax. Use $(...) instead.
for fileName in $(...) is a common antipattern in bash. Most probably if you want to iterate over output of another command, you should use while IFS= read -r line loop. bashfaq How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
Try this:
for file in "${thisDir}/dotwPosts/p"*
do
# exclude non plain files
[[ -f $file ]] || continue
# do something with "$file"
...
done
I quoted "${thisDir}/dotwPosts/p", so var thisDir can't contain a relevant wildcards, but works with blanks. Otherwise remove the quotes.
Related
A bit lowly a query but here goes:
bash shell script. POSIX, Mint 21
I just want one/any (mp3) file from a directory. As a sample.
In normal execution, a full run, the code would be such
for f in *.mp3 do
#statements
done
This works fine but if I wanted to sample just one file of such an array/glob (?) without looping, how might I do that? I don't care which file, just that it is an mp3 from the directory I am working in.
Should I just start this for-loop and then exit(break) after one statement, or is there a neater way more tailored-for-the-job way?
for f in *.mp3 do
#statement
break
done
Ta (can not believe how dopey I feel asking this one, my forehead will hurt when I see the answers )
Since you are using Linux (Mint) you've got GNU find so one way to get one .mp3 file from the current directory is:
mp3file=$(find . -maxdepth 1 -mindepth 1 -name '*.mp3' -printf '%f' -quit)
-maxdepth 1 -mindepth 1 causes the search to be restricted to one level under the current directory.
-printf '%f' prints just the filename (e.g. foo.mp3). The -print option would print the path to the filename (e.g. ./foo.mp3). That may not matter to you.
-quit causes find to exit as soon as one match is found and printed.
Another option is to use the Bash : (colon) command and $_ (dollar underscore) special variable:
: *.mp3
mp3file=$_
: *.mp3 runs the : command with the list of .mp3 files in the current directory as arguments. The : command ignores its arguments and does nothing.
mp3file=$_ sets the value of the mp3file variable to the last argument supplied to the previous command (:).
The second option should not be used if the number of .mp3 files is large (hundreds or more) because it will find all of the files and sort them by name internally.
In both cases $mp3file should be checked to ensure that it really exists (e.g. [[ -e $mp3file ]]) before using it for anything else, in case there are no .mp3 files in the directory.
I would do it like this in POSIX shell:
mp3file=
for f in *.mp3; do
if [ -f "$f" ]; then
mp3file=$f
break
fi
done
# At this point, the variable mp3file contains a filename which
# represents a regular file (or a symbolic link) with the .mp3
# extension, or empty string if there is no such a file.
The fact that you use
for f in *.mp3 do
suggests to me, that the MP3s are named without to much strange characters in the filename.
In that case, if you really don't care which MP3, you could:
f=$(ls *.mp3|head)
statement
Or, if you want a different one every time:
f=$(ls *.mp3|sort -R | tail -1)
Note: if your filenames get more complicated (including spaces or other special characters), this will not work anymore.
Assuming you don't have spaces in your filenames, (and I don't understand why the collective taboo is against using ls in scripts at all, rather than not having spaces in filenames, personally) then:-
ls *.mp3 | tr ' ' '\n' | sed -n '1p'
I am trying to save the result from find as arrays.
Here is my code:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=`find . -name ${input}`
len=${#array[*]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
I get 2 .txt files under current directory.
So I expect '2' as result of ${len}. However, it prints 1.
The reason is that it takes all result of find as one elements.
How can I fix this?
P.S
I found several solutions on StackOverFlow about a similar problem. However, they are a little bit different so I can't apply in my case. I need to store the results in a variable before the loop. Thanks again.
Update 2020 for Linux Users:
If you have an up-to-date version of bash (4.4-alpha or better), as you probably do if you are on Linux, then you should be using Benjamin W.'s answer.
If you are on Mac OS, which —last I checked— still used bash 3.2, or are otherwise using an older bash, then continue on to the next section.
Answer for bash 4.3 or earlier
Here is one solution for getting the output of find into a bash array:
array=()
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done < <(find . -name "${input}" -print0)
This is tricky because, in general, file names can have spaces, new lines, and other script-hostile characters. The only way to use find and have the file names safely separated from each other is to use -print0 which prints the file names separated with a null character. This would not be much of an inconvenience if bash's readarray/mapfile functions supported null-separated strings but they don't. Bash's read does and that leads us to the loop above.
[This answer was originally written in 2014. If you have a recent version of bash, please see the update below.]
How it works
The first line creates an empty array: array=()
Every time that the read statement is executed, a null-separated file name is read from standard input. The -r option tells read to leave backslash characters alone. The -d $'\0' tells read that the input will be null-separated. Since we omit the name to read, the shell puts the input into the default name: REPLY.
The array+=("$REPLY") statement appends the new file name to the array array.
The final line combines redirection and command substitution to provide the output of find to the standard input of the while loop.
Why use process substitution?
If we didn't use process substitution, the loop could be written as:
array=()
find . -name "${input}" -print0 >tmpfile
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done <tmpfile
rm -f tmpfile
In the above the output of find is stored in a temporary file and that file is used as standard input to the while loop. The idea of process substitution is to make such temporary files unnecessary. So, instead of having the while loop get its stdin from tmpfile, we can have it get its stdin from <(find . -name ${input} -print0).
Process substitution is widely useful. In many places where a command wants to read from a file, you can specify process substitution, <(...), instead of a file name. There is an analogous form, >(...), that can be used in place of a file name where the command wants to write to the file.
Like arrays, process substitution is a feature of bash and other advanced shells. It is not part of the POSIX standard.
Alternative: lastpipe
If desired, lastpipe can be used instead of process substitution (hat tip: Caesar):
set +m
shopt -s lastpipe
array=()
find . -name "${input}" -print0 | while IFS= read -r -d $'\0'; do array+=("$REPLY"); done; declare -p array
shopt -s lastpipe tells bash to run the last command in the pipeline in the current shell (not the background). This way, the array remains in existence after the pipeline completes. Because lastpipe only takes effect if job control is turned off, we run set +m. (In a script, as opposed to the command line, job control is off by default.)
Additional notes
The following command creates a shell variable, not a shell array:
array=`find . -name "${input}"`
If you wanted to create an array, you would need to put parens around the output of find. So, naively, one could:
array=(`find . -name "${input}"`) # don't do this
The problem is that the shell performs word splitting on the results of find so that the elements of the array are not guaranteed to be what you want.
Update 2019
Starting with version 4.4-alpha, bash now supports a -d option so that the above loop is no longer necessary. Instead, one can use:
mapfile -d $'\0' array < <(find . -name "${input}" -print0)
For more information on this, please see (and upvote) Benjamin W.'s answer.
Bash 4.4 introduced a -d option to readarray/mapfile, so this can now be solved with
readarray -d '' array < <(find . -name "$input" -print0)
for a method that works with arbitrary filenames including blanks, newlines, and globbing characters. This requires that your find supports -print0, as for example GNU find does.
From the manual (omitting other options):
mapfile [-d delim] [array]
-d
The first character of delim is used to terminate each input line, rather than newline. If delim is the empty string, mapfile will terminate a line when it reads a NUL character.
And readarray is just a synonym of mapfile.
The following appears to work for both Bash and Z Shell on macOS.
#! /bin/sh
IFS=$'\n'
paths=($(find . -name "foo"))
unset IFS
printf "%s\n" "${paths[#]}"
If you are using bash 4 or later, you can replace your use of find with
shopt -s globstar nullglob
array=( **/*"$input"* )
The ** pattern enabled by globstar matches 0 or more directories, allowing the pattern to match to an arbitrary depth in the current directory. Without the nullglob option, the pattern (after parameter expansion) is treated literally, so with no matches you would have an array with a single string rather than an empty array.
Add the dotglob option to the first line as well if you want to traverse hidden directories (like .ssh) and match hidden files (like .bashrc) as well.
you can try something like
array=(`find . -type f | sort -r | head -2`) , and in order to print the array values , you can try something like echo "${array[*]}"
None of these solutions suited me because I didn't feel like learning readarray and mapfile. Here is what I came up with.
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
# The only change is here. Append to array for each non-empty line.
array=()
while read line; do
[[ ! -z "$line" ]] && array+=("$line")
done; <<< $(find . -name ${input} -print)
len=${#array[#]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
You could do like this:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=(`find . -name '*'${input}'*'`)
for i in "${array[#]}"
do :
echo $i
done
In bash, $(<any_shell_cmd>) helps to run a command and capture the output. Passing this to IFS with \n as delimiter helps to convert that to an array.
IFS='\n' read -r -a txt_files <<< $(find /path/to/dir -name "*.txt")
I am trying to save the result from find as arrays.
Here is my code:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=`find . -name ${input}`
len=${#array[*]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
I get 2 .txt files under current directory.
So I expect '2' as result of ${len}. However, it prints 1.
The reason is that it takes all result of find as one elements.
How can I fix this?
P.S
I found several solutions on StackOverFlow about a similar problem. However, they are a little bit different so I can't apply in my case. I need to store the results in a variable before the loop. Thanks again.
Update 2020 for Linux Users:
If you have an up-to-date version of bash (4.4-alpha or better), as you probably do if you are on Linux, then you should be using Benjamin W.'s answer.
If you are on Mac OS, which —last I checked— still used bash 3.2, or are otherwise using an older bash, then continue on to the next section.
Answer for bash 4.3 or earlier
Here is one solution for getting the output of find into a bash array:
array=()
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done < <(find . -name "${input}" -print0)
This is tricky because, in general, file names can have spaces, new lines, and other script-hostile characters. The only way to use find and have the file names safely separated from each other is to use -print0 which prints the file names separated with a null character. This would not be much of an inconvenience if bash's readarray/mapfile functions supported null-separated strings but they don't. Bash's read does and that leads us to the loop above.
[This answer was originally written in 2014. If you have a recent version of bash, please see the update below.]
How it works
The first line creates an empty array: array=()
Every time that the read statement is executed, a null-separated file name is read from standard input. The -r option tells read to leave backslash characters alone. The -d $'\0' tells read that the input will be null-separated. Since we omit the name to read, the shell puts the input into the default name: REPLY.
The array+=("$REPLY") statement appends the new file name to the array array.
The final line combines redirection and command substitution to provide the output of find to the standard input of the while loop.
Why use process substitution?
If we didn't use process substitution, the loop could be written as:
array=()
find . -name "${input}" -print0 >tmpfile
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done <tmpfile
rm -f tmpfile
In the above the output of find is stored in a temporary file and that file is used as standard input to the while loop. The idea of process substitution is to make such temporary files unnecessary. So, instead of having the while loop get its stdin from tmpfile, we can have it get its stdin from <(find . -name ${input} -print0).
Process substitution is widely useful. In many places where a command wants to read from a file, you can specify process substitution, <(...), instead of a file name. There is an analogous form, >(...), that can be used in place of a file name where the command wants to write to the file.
Like arrays, process substitution is a feature of bash and other advanced shells. It is not part of the POSIX standard.
Alternative: lastpipe
If desired, lastpipe can be used instead of process substitution (hat tip: Caesar):
set +m
shopt -s lastpipe
array=()
find . -name "${input}" -print0 | while IFS= read -r -d $'\0'; do array+=("$REPLY"); done; declare -p array
shopt -s lastpipe tells bash to run the last command in the pipeline in the current shell (not the background). This way, the array remains in existence after the pipeline completes. Because lastpipe only takes effect if job control is turned off, we run set +m. (In a script, as opposed to the command line, job control is off by default.)
Additional notes
The following command creates a shell variable, not a shell array:
array=`find . -name "${input}"`
If you wanted to create an array, you would need to put parens around the output of find. So, naively, one could:
array=(`find . -name "${input}"`) # don't do this
The problem is that the shell performs word splitting on the results of find so that the elements of the array are not guaranteed to be what you want.
Update 2019
Starting with version 4.4-alpha, bash now supports a -d option so that the above loop is no longer necessary. Instead, one can use:
mapfile -d $'\0' array < <(find . -name "${input}" -print0)
For more information on this, please see (and upvote) Benjamin W.'s answer.
Bash 4.4 introduced a -d option to readarray/mapfile, so this can now be solved with
readarray -d '' array < <(find . -name "$input" -print0)
for a method that works with arbitrary filenames including blanks, newlines, and globbing characters. This requires that your find supports -print0, as for example GNU find does.
From the manual (omitting other options):
mapfile [-d delim] [array]
-d
The first character of delim is used to terminate each input line, rather than newline. If delim is the empty string, mapfile will terminate a line when it reads a NUL character.
And readarray is just a synonym of mapfile.
The following appears to work for both Bash and Z Shell on macOS.
#! /bin/sh
IFS=$'\n'
paths=($(find . -name "foo"))
unset IFS
printf "%s\n" "${paths[#]}"
If you are using bash 4 or later, you can replace your use of find with
shopt -s globstar nullglob
array=( **/*"$input"* )
The ** pattern enabled by globstar matches 0 or more directories, allowing the pattern to match to an arbitrary depth in the current directory. Without the nullglob option, the pattern (after parameter expansion) is treated literally, so with no matches you would have an array with a single string rather than an empty array.
Add the dotglob option to the first line as well if you want to traverse hidden directories (like .ssh) and match hidden files (like .bashrc) as well.
you can try something like
array=(`find . -type f | sort -r | head -2`) , and in order to print the array values , you can try something like echo "${array[*]}"
None of these solutions suited me because I didn't feel like learning readarray and mapfile. Here is what I came up with.
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
# The only change is here. Append to array for each non-empty line.
array=()
while read line; do
[[ ! -z "$line" ]] && array+=("$line")
done; <<< $(find . -name ${input} -print)
len=${#array[#]}
echo "found : ${len}"
i=0
while [ $i -lt $len ]
do
echo ${array[$i]}
let i++
done
You could do like this:
#!/bin/bash
echo "input : "
read input
echo "searching file with this pattern '${input}' under present directory"
array=(`find . -name '*'${input}'*'`)
for i in "${array[#]}"
do :
echo $i
done
In bash, $(<any_shell_cmd>) helps to run a command and capture the output. Passing this to IFS with \n as delimiter helps to convert that to an array.
IFS='\n' read -r -a txt_files <<< $(find /path/to/dir -name "*.txt")
What do I want: find all the nginx access log files, iterate them (get some data from them).
I'm stuck at for loop:
#!/bin/bash
logfiles="$(find /var/log/nginx -name 'access.log*')"
for lf in "$logfiles"
do
echo "file"
done
Output is only one "file" word, despite of there are more than one log file. What's wrong?
when you say
for lf in "$logfiles"
your quotes preserve the whitespace within find's output. The quotes, in this case, are incorrect. Removing them will properly iterate over the files:
$ for i in "`find . -iname '*.log'`"; do echo $i; done
./2.log ./3.log ./1.log
$ for i in `find . -iname '*.log'`; do echo $i; done
./2.log
./3.log
./1.log
But there's a much better way: you should stream your data instead of iterating. Consider this pattern:
$ find . -iname '*.log' | xargs -n 1 echo
./2.log
./3.log
./1.log
It's very much worth wrapping your head around xargs, which turns its standard input into additional arguments to add to its own, which it then executes. In this simple case, I'm telling xargs to run the command echo individually for each 1 (-n 1) of the files
There's a few reasons xargs is my go-to iteration operator whenever possible: firstly, it's very smart. Iterating over command output with for i in $(command) requires $(command) to provide your list in the form item1 item2 item3, causing problems if any of the items contain special characters, which are then interpreted by bash as part of the for arguments.
Here is an example of the space which typically becomes special in bash as a valid input field spearator.
$ for i in `find . -iname '*.log'`; do echo $i; done
./4
tricky.log
./2.log
./3.log
./1.log
the file 4 tricky.log, containing a space, has now caused a problem.
xargs can be smart enough to keep them separate. For some cases you can get around it with changing your $IFS, the input field separator. But that gets messy fast. With xargs, you have better options - specifially, xargs can also use the null character to terminate the items in its input stream with the -0 character. Other programs, namely find, can also use the null character in its output to match what xargs expects. In this sense, xargs and find are a great combination:
$ find . -iname '*.log' -print0 | xargs -0 -n 1 echo
./4 tricky.log
./2.log
./3.log
./1.log
But wait, there's more! The next step in your command will surely be to grep the files looking for whatever matching lines you wish to find. If your lines are large, you'll want to parallelize too. xargs can do this as well. You can add more steps ot the pipeline for filtering etc.
Finally, using subshell substitution $() as program arguments can lead to unintended commands when not used very carefully to avoid unintentional arguments in failure cases. I once wrote a script that used $() to find mysql's source directory to do some first-time setup. It said something like remove -r /$(find / -iname mysqldir) . Well, if there's no mysqldir in the expected location that turned into rm -r /. Not what I intended, obviously: d'oh!
That's why I use and encourage others to use xargs whenever possible.
lose the quotes in this line: for lf in $logfiles
But it looks like you may have only one file named access.log
Shellcheck doesn't like my for over find loop in Bash.
for f in $(find $src -maxdepth 1 -name '*.md'); do wc -w < "$f" >> $path/tmp.txt; done
It suggests instead:
1 while IFS= read -r -d '' file
2 do
3 let count++
4 echo "Playing file no. $count"
5 play "$file"
6 done < <(find mydir -mtime -7 -name '*.mp3' -print0)
7 echo "Played $count files"
I understand most of it, but some things are still unclear.
In line one: What is '' file?
In line six: What does the empty space do in < < (find). Are the < redirects, as usual? If they are, what does it mean to redirect into do block?
Can someone help parse this out? Is this the right way to iterate over files of a certain kind in a directory?
In line one: What is '' file?
According to help read, that '' is an argument to the -d parameter:
-d delim continue until the first character of
DELIM is read, rather than newline
In line six: What does the empty space do in < < (find).
There are two separate operators there. There is <, the standard I/O redirection operator, followed by a <(...) construct, which is a bash-specific construct that performs process substitution:
Process Substitution
Process substitution is supported on systems that
support named pipes (FIFOs) or the /dev/fd method of naming
open files. It takes the form of <(list) or >(list). The
process list is run with its input or output connected
to a FIFO or some file in /dev/fd...
So this is is sending the output of the find command into the do
loop.
Are the < redirects, as usual? If they are, what does it mean to redirect into do block?
Redirect into a loop means that any command inside that loop that
reads from stdin will read from the redirected input source. As a
side effect, everything inside that loop runs in a subshell, which has
implications with respect to variable scope: variables set inside the
loop won't be visible outside the loop.
Can someone help parse this out? Is this the right way to iterate over files of a certain kind in a directory?
For the record, I would typically do this by piping find to xargs,
although which solution is best depends to a certain extend on what
you're trying to do. The two examples in your question do completely
different things, and it's not clear what you're actually trying to
accomplish.
But for example:
find $src -maxdepth 1 -name '*.md' -print0 |
xargs -0 -iDOC wc -w DOC
This would run wc on all the *.md files. The -print0 to find
(and the -0 to xargs) permit this command to correctly handle
filenames with embedded whitespace (e.g., This is my file.md). If
you know you don't have any of those, you just do:
find $src -maxdepth 1 -name '*.md' |
xargs -iDOC wc -w DOC
Generally, you need to use find if you want to do a recursive search through a directory tree (although with modern bash, you can set the shell option globstar, as shellcheck suggests). But in this case you've specified -maxdepth 1, so your find command is just listing files which match the pattern "$src"/*.md. That being the case, it is much simpler and more reliable to use the glob (pattern):
for f in "$src"/*.md; do
wc -w < "$f"
done >> "$path"/tmp.txt
(I also quoted all the variable expansions, for safety, and moved the output redirection so it applies to the entire for loop, which is slightly more efficient.)
If you need to use find (because a glob won't work), then you should attempt to use the -exec option to find, which doesn't require fiddling around with other options to avoid mishandled special characters in filenames. For example, you could do this:
find "$src" -maxdepth 1 -name '*.md' -exec do wc -w {} + >> "$path"/tmp.txt
To answer your specific questions:
In IFS= read -r -d '' file, the '' is the argument to the -d option. That option is used to specify the character which delimits lines to be read; by default, a newline character is used so that read reads one line at a time. The empty string is the same as specifying the NUL character, which is what find outputs at the end of each filename if you specify the -print0 option. (Unlike -exec, -print0 is not Posix standard so it is not guaranteed to work with every find implementation, but in practice it's pretty generally available.)
The space between < and <(...) is to avoid creating the token <<, which would indicate a here-document. Instead, it specifies a redirection (<) from a process substitution (<(...)).