> echo "fdp.txtUNE/ser/redaeR/daerorca/bil/rsu\nf.dpu" | sort -s
fdp.txtUNE/ser/redaeR/daerorca/bil/rsu
f.dpu
Since "." is not a field separating character by default, the first 3
characters appear to say:
f <= f (that's fine)
d <= . (in ASCII, "." < "d", but I'm OK with sort deciding letters
come before punctuation)
p <= d (this is problematic)
Even worse, if I remove one letter from the second string, the results
are reversed:
> echo "fdp.txtUNE/ser/redaeR/daerorca/bil/rsu\nf.dp" | sort -s
f.dp
fdp.txtUNE/ser/redaeR/daerorca/bil/rsu
What hideousness is going on here and how do I stop it? I thought "-s"
would suffice, but apparently not.
From what I can tell, 'sort' thinks "f.dpu" > "fdp.t" because "u" > "t". However, that comparison should never be made, since characters before it already differ.
As a note, I get the same results without the "-s".
EDIT: Setting the environment variable LC_ALL to "C" fixes this, but it still bugs me that leaving LC_ALL (the locale) blank yields inconsistent results (different is OK, inconsistent is bad).
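A minimal sketch of that fix, assuming a sort that honors LC_ALL (GNU and BSD sort both do). In the C locale, sort compares raw bytes, so "." (0x2E) orders before "d" (0x64). In many other locales the collation rules give punctuation little or no weight on the first pass, which is a likely explanation for the order reported above.

```shell
#!/usr/bin/env bash
# Force byte-wise comparison for this one command only; the rest of
# the shell session keeps its normal locale.
sorted=$(printf '%s\n' \
  'fdp.txtUNE/ser/redaeR/daerorca/bil/rsu' \
  'f.dpu' | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

Prefixing the assignment to the command (rather than exporting it) keeps the change local to that single sort invocation.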
First, I had to turn on expanded echo:
$ shopt -s xpg_echo
Now, I'll run what you gave:
BASH 3.2:$ echo "fdp.txtUNE/ser/redaeR/daerorca/bil/rsu\nf.dpu" | sort -s
f.dpu
fdp.txtUNE/ser/redaeR/daerorca/bil/rsu
This is the correct sorted order: "." sorts before "d", so f.dpu should come first.
I'm not getting your results. It could be because I'm on Mac OS X and not Linux. However, the -s option on both says "stabilize sort by disabling last-resort comparison". Are there any other shell options you have set that might be causing these issues?
Here's my shopt settings:
$ shopt
cdable_vars off
cdspell off
checkhash off
checkwinsize on
cmdhist on
compat31 off
dotglob off
execfail off
expand_aliases on
extdebug off
extglob on
extquote on
failglob off
force_fignore on
gnu_errfmt off
histappend off
histreedit off
histverify off
hostcomplete on
huponexit off
interactive_comments on
lithist on
login_shell off
mailwarn off
no_empty_cmd_completion off
nocaseglob off
nocasematch off
nullglob off
progcomp on
promptvars on
restricted_shell off
shift_verbose off
sourcepath on
xpg_echo on
The three differences from the default I have are extglob, lithist, and xpg_echo, the last of which I just set in order to get this to work.
Can you think of anything else that could be going on?
I'm trying to work on a set of files with various extensions, but I'm not that experienced with the inner workings of bash. This is what I'm trying to accomplish (stripped down):
DOCUMENT_SOURCE_FILE_PATTERN="*.{yaml,md}";
pandoc -s -f markdown -o combined.html $DOCUMENT_SOURCE_FILE_PATTERN;
results in
pandoc: *.{yaml,md}: openFile: does not exist (No such file or directory)
Whereas when I do it directly
pandoc -s -f markdown -o combined.html *.{yaml,md};
it works perfectly.
The value of $DOCUMENT_SOURCE_FILE_PATTERN is really generated by command line arguments and not hard coded, otherwise the direct approach in the example above would be good enough already.
as requested, here's a fully self contained example
put the below code into a test.sh script within an empty directory
#!/bin/bash
# setup
touch 0001.md
touch 0002.md
touch metadata.yaml
# actual functionality under test
DOCUMENT_SOURCE_FILE_PATTERN="yaml,md";
shopt -s nullglob;
DOCUMENT_SOURCE_FILES=( *.{$DOCUMENT_SOURCE_FILE_PATTERN} );
echo "required logic below:";
echo "${DOCUMENT_SOURCE_FILES[@]}";
echo;
echo "working solution with hardcoding:";
DOCUMENT_SOURCE_FILES=( *.{yaml,md} );
echo "${DOCUMENT_SOURCE_FILES[@]}";
# tear down
rm *.{yaml,md};
Don't try to store a glob string in a variable. Use an array and expand it with quotes. nullglob ensures that the array receives only the expanded list of files, and not the literal glob string when nothing matches.
shopt -s nullglob
document_source_file_pattern=( *.{yaml,md} )
and pass the array as
pandoc -s -f markdown -o combined.html "${document_source_file_pattern[@]}"
As one more level of safety, you could do the following, which runs your pandoc command only if the array is non-empty:
(( ${#document_source_file_pattern[@]} )) &&
pandoc -s -f markdown -o combined.html "${document_source_file_pattern[@]}"
On this line, you are trying to double-expand (first $DOCUMENT_SOURCE_FILE_PATTERN, then the resulting pattern):
DOCUMENT_SOURCE_FILES=( *.{$DOCUMENT_SOURCE_FILE_PATTERN} );
You can't do that directly.
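A small sketch of why this fails: bash performs brace expansion before parameter expansion, so at the time braces are processed the word contains no comma-separated list, only the unexpanded variable. The names ext, from_variable, and literal are illustrative only.

```shell
#!/usr/bin/env bash
ext="yaml,md"

# Brace expansion sees "{$ext}", which has no comma, and leaves it alone.
# Parameter expansion then yields the single literal word "x.{yaml,md}".
from_variable=( x.{$ext} )

# Here the comma is visible to brace expansion, so two words are produced.
literal=( x.{yaml,md} )

echo "from variable: ${#from_variable[@]} word(s): ${from_variable[*]}"
echo "literal:       ${#literal[@]} word(s): ${literal[*]}"
```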
If you trust that $DOCUMENT_SOURCE_FILE_PATTERN isn't going to contain malicious input, then you can achieve what you want using eval:
eval DOCUMENT_SOURCE_FILES=( *.{$DOCUMENT_SOURCE_FILE_PATTERN} )
But instead, you could (should) probably try to do this in a different way, like add the files you want to an array, instead of trying to dynamically create a brace expansion in your code:
# prevent literal globs being added to the array when no files match
shopt -s nullglob
source_files=()
if <whatever condition to add markdown files>; then
source_files+=( *.md )
fi
if <whatever condition to add yaml files>; then
source_files+=( *.yaml )
fi
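If the extensions arrive as a comma-separated string, as in the question, one way to fill the array without eval is to split the string and glob once per extension. This is only a sketch: the variable names are made up, the temporary directory stands in for the real working directory, and it assumes the extensions themselves contain no whitespace or glob characters.

```shell
#!/usr/bin/env bash
shopt -s nullglob   # unmatched globs expand to nothing

# Scratch directory with the files from the question's test script.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 0001.md 0002.md metadata.yaml

extensions="yaml,md"   # assumption: would come from command-line arguments

# Split on commas into an array; IFS is changed only for this read.
IFS=, read -r -a ext_list <<< "$extensions"

source_files=()
for ext in "${ext_list[@]}"; do
  source_files+=( *."$ext" )   # the glob is expanded once per extension
done

printf '%s\n' "${source_files[@]}"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

The resulting array can then be passed to pandoc as "${source_files[@]}", ideally after checking that it is non-empty.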
I've been following this tutorial (the idea can also be found in other posts of SO)
http://www.cyberciti.biz/faq/bash-loop-over-file/
This is my test script:
function getAllTests {
allfiles=$TEST_SCRIPTS/*
# Getting all stests in the
if [[ $1 == "s" ]]; then
for f in $allfiles
do
echo $f
done
fi
}
The idea is to print all files (one per line) in the directory found in TEST_SCRIPTS.
Instead of that this is what I get as an output:
/path/to/dir/*
(The actual path obviously, but this is to convey the idea).
I have tried the following experiment in bash. Doing this
a=(./*)
reads all files in the current directory into a as an array. However, if anything other than ./ is used, it does not work.
How can I use this procedure with a directory other than ./?
When there are no matches, the wildcard is not expanded.
I speculate that TEST_SCRIPTS contains a path which does not exist; but without access to your code, there is obviously no way to diagnose this properly.
Common solutions include shopt -s nullglob which causes the shell to replace the wildcard with nothing when there are no matches; and explicitly checking for the expanded value being equal to the wildcard (in theory, this could misfire if there is a single file named literally * so this is not completely bulletproof!)
By the by, the allfiles variable appears to be superfluous, and you should generally be much more meticulous about quoting. See When to wrap quotes around a shell variable? for details.
function getAllTests {
local nullglob
shopt -q nullglob || nullglob=reset
shopt -s nullglob
# Getting all stests in the # fix sentence fragment?
if [[ $1 == "s" ]]; then
for f in "$TEST_SCRIPTS"/*; do # notice quotes
echo "$f" # ditto
done
fi
# Unset if it wasn't set originally
case $nullglob in 'reset') shopt -u nullglob;; esac
}
Setting and unsetting nullglob inside a single function is probably excessive; most commonly, you would set it once at the beginning of your script, and then write the script accordingly.
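The effect of nullglob on an unmatched wildcard can be seen with a small sketch. The empty temporary directory is only a stand-in for a directory without matching files, and the sketch assumes failglob is off (the default).

```shell
#!/usr/bin/env bash
empty_dir=$(mktemp -d)

# Without nullglob, the unmatched pattern stays in place as a literal word.
shopt -u nullglob
without=( "$empty_dir"/* )

# With nullglob, the unmatched pattern is removed entirely.
shopt -s nullglob
with=( "$empty_dir"/* )

echo "without nullglob: ${#without[@]} word(s)"
echo "with nullglob:    ${#with[@]} word(s)"

rmdir "$empty_dir"
```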
Here is my script
data_dir="/home/data"
shopt extglob
files=!($data_dir/*08142014*)
echo ${files[@]}
for file in $files[@]
do
#blabla
done
/home/data contains multiple files with different date info within file name, thus I should be able to get a list of files that does not contain "08142014".
But I keep getting a syntax error. It seems files is just "!(/home/data/08202014)", while I want a list of file names.
Did I miss anything? Thanks
You can use:
data_dir="/home/data"
shopt -s extglob
files=($data_dir/!(*08142014*))
for file in "${files[@]}"
do
echo "$file"
done
To set extglob you need to use shopt -s extglob
To set array your syntax isn't right
Check how array is correctly iterated
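The three points above can be tried out in a self-contained sketch. The temporary directory and the report_*.csv file names are invented for illustration. Note that shopt -s extglob has to be on its own line before any line that uses !(...), because bash needs the option enabled when it parses the pattern.

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob

# Scratch directory standing in for /home/data.
data_dir=$(mktemp -d)
touch "$data_dir"/report_08142014.csv \
      "$data_dir"/report_08202014.csv \
      "$data_dir"/report_08212014.csv

# !(pattern) matches every name that does NOT match the pattern.
files=( "$data_dir"/!(*08142014*) )

for file in "${files[@]}"; do
  echo "${file##*/}"   # print just the file name
done

rm -rf "$data_dir"
```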
You can use ->
files=`ls $data_dir | grep -v 08142014`
I am trying to loop through files of a list of specified extensions with a bash script. I tried the solution given at Matching files with various extensions using for loop but it does not work as expected. The solution given was:
for file in "${arg}"/*.{txt,h,py}; do
Here is my version of it:
for f in "${arg}"/*.{epub,mobi,chm,rtf,lit,djvu}
do
echo "$f"
done
When I run this in a directory with an epub file in it, I get:
/*.epub
/*.mobi
/*.chm
/*.rtf
/*.lit
/*.djvu
So I tried changing the for statement:
for f in "${arg}"*.{epub,mobi,chm,rtf,lit,djvu}
Then I got:
089281098X.epub
*.mobi
*.chm
*.rtf
*.lit
*.djvu
I also get the same result with:
for f in *.{epub,mobi,chm,rtf,lit,djvu}
So it seems that the "${arg}" argument is unnecessary.
Although either of these statements finds files of the specified extensions and can pass them to a program, I get read errors from the unresolved *. filenames.
I am running this on OS X Mountain Lion. I was aware that the default bash shell was outdated so I upgraded it from 3.2.48 to 4.2.45 using homebrew to see if this was the problem. That didn't help so I am wondering why I am getting these unexpected results. Is the given solution wrong or is the OS X bash shell somehow different from the *NIX version? Is there perhaps an alternate way to accomplish the same thing that might work better in the OS X bash shell?
This may be a BASH 4.2ism. It does not work in my BASH which is still 3.2. However, if you shopt -s extglob, you can use *(...) instead:
shopt -s extglob
for file in *.*(epub|mobi|chm|rtf|lit|djvu)
do
...
done
@David W.:
shopt -s extglob
for f in *.*(epub|mobi|chm|rtf|lit|djvu)
results in:
089281098X.epub
@kojiro:
arg=.
shopt -s nullglob
for f in "${arg}"/*.{epub,mobi,chm,rtf,lit,djvu}
results in:
./089281098X.epub
shopt -s nullglob
for f in "${arg}"*.{epub,mobi,chm,rtf,lit,djvu}
results in:
089281098X.epub
So all of these variations work but I don't understand why. Can either of you explain what is going on with each variation and what ${arg} is doing? I would really like to understand this so I can increase my knowledge. Thanks for the help.
In mine:
for f in *.*(epub|mobi|chm|rtf|lit|djvu)
I didn't include ${arg}, which expands to the value of $arg. The *(...) matches zero or more occurrences of the patterns in the parentheses, here any of the listed extensions. Thus, it matches *.epub.
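Because *(...) also accepts zero occurrences, it can match names that merely end in a dot. A sketch comparing it with @(...), which matches exactly one of the alternatives (the file names are invented for illustration):

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob

workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch book.epub notes. readme.txt

# *(...) = zero or more occurrences, so "notes." matches as well.
zero_or_more=( *.*(epub|mobi) )

# @(...) = exactly one of the alternatives.
exactly_one=( *.@(epub|mobi) )

printf 'zero or more: %s\n' "${zero_or_more[*]}"
printf 'exactly one:  %s\n' "${exactly_one[*]}"

cd / && rm -rf "$workdir"
```

For a fixed list of extensions, @(...) is probably the closer fit.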
Kojiro's:
arg=.
shopt -s nullglob
for f in "${arg}"/*.{epub,mobi,chm,rtf,lit,djvu}
This includes $arg and the slash in the match. Thus, kojiro's results start with ./ because that's what the pattern asks for.
It's like the difference between:
echo *
and
echo ./*
By the way, you could do this with the other expressions too:
echo *.*(epub|mobi|chm|rtf|lit|djvu)
The shell is doing all of the expansion for you. It really has nothing to do with the for statement itself.
A glob has to expand to an existing, found name, or it is left alone with the asterisk intact. If you have an empty directory, *.foo will expand to *.foo. (Unless you use the nullglob Bash extension.)
The problem with your code is that you start with an arg, $arg, which is apparently empty or undefined. So your glob, ${arg}/*.epub expands to /*.epub because there are no files ending in ".epub" in the root directory. It's never looking in the current directory. For it to do that, you'd need to set arg=. first.
In your second example, ${arg}*.epub does expand because $arg is empty, but the other files don't exist, so those globs remain unexpanded. As I hinted at before, one easy workaround would be to activate nullglob with shopt -s nullglob. This is bash-specific, but will cause *.foo to expand to nothing if there is no matching file. For a strict POSIX solution, you would have to filter out unexpanded globs using [ -f "$f" ]. (Then again, if you wanted POSIX, you couldn't use brace expansion either.)
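A sketch of the POSIX-style filter mentioned above, with the patterns written out individually since brace expansion is not available there. The temporary directory and the found variable are illustrative only.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 089281098X.epub

found=""
for f in *.epub *.mobi *.chm; do
  # An unmatched pattern is left as the literal string (e.g. "*.mobi"),
  # which is not an existing file, so the test skips it.
  [ -f "$f" ] || continue
  found="$found$f "
done

echo "found: $found"

cd / && rm -rf "$workdir"
```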
To summarize, the best solutions are to use (most intuitive and elegant):
shopt -s extglob
for f in *.*(epub|mobi|chm|rtf|lit|djvu)
or, in keeping with the original solution given in the referenced thread (which was wrong as stated):
shopt -s nullglob
for f in "${arg}"*.{epub,mobi,chm,rtf,lit,djvu}
This should do it:
for file in $(find ./ -name '*.epub' -o -name '*.mobi' -o -name '*.chm' -o -name '*.rtf' -o -name '*.lit' -o -name '*.djvu'); do
echo $file
done
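Note that iterating over $(find ...) splits the output on whitespace, so file names containing spaces break apart. A possible variant, assuming a find that supports -print0 (GNU and BSD find do), reads NUL-delimited names instead. The temporary directory, the file names, and the count variable are only for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/my book.epub" "$workdir/other.mobi" "$workdir/skip.txt"

count=0
# Process substitution keeps the loop in the current shell, so variables
# set inside it (such as count) survive after the loop ends.
while IFS= read -r -d '' file; do
  echo "$file"
  count=$((count + 1))
done < <(find "$workdir" -type f \( -name '*.epub' -o -name '*.mobi' \) -print0)

echo "matched $count file(s)"
rm -rf "$workdir"
```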
Is there a way to specify that a particular command has case insensitivity, without turning on case insensitivity globally (at least for that shell)?
In my particular case, I have a small app that gives me command line access to a database of email addresses, so I type:
db get email john smith
and it returns back with John Smith's email address. So I've managed to enable completion largely inside the app: setting
COMPREPLY=($(compgen -W "$(db --complete $COMP_CWORD "${COMP_WORDS[@]}")" -- ${COMP_WORDS[COMP_CWORD]}))
works to allow me to tab-complete get and email. However, if I then type j<tab>, it refuses, because in the email database, it's properly capitalised. I'd like to get bash to complete this anyway. (If I use a capital J, it works.)
Failing that, I can have my --complete option change the case of its reply by matching the input, I suppose, but ideally the command line would match the database if at all possible.
Note that I have this working inside the app when using readline, it's only interfacing with bash that seems to be an issue.
Indeed there seems to be no way to have compgen do case-insensitive matching against the word list (-W).
I see the following workarounds:
Simple solution: Translate both the word list and the input token to all-lowercase first.
Note: This is only an option if it's acceptable to have all completions turn into all-lowercase.
complete_lower() {
local token=${COMP_WORDS[$COMP_CWORD]}
local words=$( db --complete $COMP_CWORD "${COMP_WORDS[@]}" )
# Translate both the word list and the token to all-lowercase.
# The character classes are quoted so the shell cannot treat them as globs.
local wordsLower=$( printf %s "$words" | tr '[:upper:]' '[:lower:]' )
local tokenLower=$( printf %s "$token" | tr '[:upper:]' '[:lower:]' )
COMPREPLY=($(compgen -W "$wordsLower" -- "$tokenLower"))
}
Better, but more elaborate solution: Roll your own, case-insensitive matching logic:
complete_custommatch() {
local token=${COMP_WORDS[$COMP_CWORD]}
local words=$( db --complete $COMP_CWORD "${COMP_WORDS[@]}" )
# Turn case-insensitive matching temporarily on, if necessary.
local nocasematchWasOff=0
shopt nocasematch >/dev/null || nocasematchWasOff=1
(( nocasematchWasOff )) && shopt -s nocasematch
# Loop over words in list and search for case-insensitive prefix match.
local w matches=()
for w in $words; do
if [[ "$w" == "$token"* ]]; then matches+=("$w"); fi
done
# Restore state of 'nocasematch' option, if necessary.
(( nocasematchWasOff )) && shopt -u nocasematch
COMPREPLY=("${matches[@]}")
}
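The matching step can also be isolated in a helper whose body runs in a subshell, so nocasematch never leaks into the interactive shell and no save/restore is needed. This is a sketch only: match_prefix_nocase is a made-up name, and the word list is passed in directly in place of the output of db --complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each word that starts with the given prefix,
# ignoring case. The parentheses make the function body a subshell, so the
# shopt change disappears when the function returns.
match_prefix_nocase() (
  shopt -s nocasematch
  prefix=$1
  shift
  for w in "$@"; do
    if [[ $w == "$prefix"* ]]; then
      printf '%s\n' "$w"
    fi
  done
)

matches=$(match_prefix_nocase dO john Don donald RON)
printf '%s\n' "$matches"
```

Inside a completion function, the result could be assigned with something like COMPREPLY=( $(match_prefix_nocase "$token" $words) ), assuming the words contain no whitespace.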
It's a lot easier to just use grep to do all the work; then the case is preserved in the completions, and you don't have to mess with shopt or anything like that. For example:
_example_completions () {
local choices="john JAMES Jerry Erik eMIly alex Don donald donny@example.com RON"
COMPREPLY=( $( echo "$choices" | tr " " "\n" | grep -i "^$2" ) )
}
Here, $choices is your list of words, tr is used to change the spaces between words to newlines so grep can understand them (you could omit that if your words are already newline-delimited), the -i option is case insensitive matching, and "^$2" matches the current word (bash passes it as $2) at the beginning of a line.
$ example dO<tab>
Don donald donny@example.com