In the example below, why do "[ 1]" and "[20]" split fine, but "[10000]" does not?
It appears to be converting the 10000 to 1 and dropping the brackets?
#!/bin/bash
var="[ 1]"
vararray=($var)
echo "var=${var}"
echo "vararray[0]=${vararray[0]}"
var="[20]"
vararray=($var)
echo "var=${var}"
echo "vararray[0]=${vararray[0]}"
var="[10000]"
vararray=($var)
echo "var=${var}"
echo "vararray[0]=${vararray[0]}"
Results...
$ ./bashtest.sh
var=[ 1]
vararray[0]=[
var=[20]
vararray[0]=[20]
var=[10000]
vararray[0]=1 << what?
Presume that you have a file named 1 in your current directory. (This often happens unintentionally, for example if someone wants to run 2>&1 but runs 2>1 instead by mistake.)
[20] does not glob to 1 -- it globs only to 2 or 0.
[ 1], when run with the default IFS value, is word-split into [ and 1], neither of which is a valid glob, so expanding it unquoted doesn't perform any globbing operation at all.
However, [10000] -- just like [01] -- will glob to either 0 or 1, if a file by any of those names exists. In your example scenario, you clearly had a file named 1 in your current working directory.
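A minimal reproduction, run in a scratch directory so that the touch below creates the only file:
$ cd "$(mktemp -d)"
$ touch 1
$ echo [10000]
1
$ echo [20]
[20]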
Don't use unquoted expansion to split strings into arrays.
Instead, use read -r -a vararray <<<"$var", optionally after explicitly setting IFS to contain only the characters you want to split on.
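For example, a minimal sketch of that approach, reusing the var/vararray names from the question:
var="[10000]"
IFS=' ' read -r -a vararray <<<"$var" # IFS applies only to the read; no globbing occurs
echo "vararray[0]=${vararray[0]}"     # prints vararray[0]=[10000]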
I'm attempting to capture filenames removing both a file-extension and suffix, e.g.:
TEST_EXAMPLE_SUFFIX.file
Output = TEST_EXAMPLE
I want to do this on the basis of matching the _SUFFIX part and extracting all characters prior to that (not including _SUFFIX). Ordinarily I would use something like:
FILE_EXT=_SUFFIX
/.+?(?=$FILE_EXT)/
However when piping that together as part of a for-loop:
for t in $(ls *.fastq | sed -e /.+?(?=$READ1_EXT)/)
I get the error:
command substitution: line 14: syntax error near unexpected token `('
What have I done wrong?
Don't parse ls output. You could use bash parameter expansion to achieve what you need:
for t in *_SUFFIX.fastq
do
echo "${t%_SUFFIX.fastq}" #stips _SUFFIX.fastq part
done
References
See shell parameter expansion.
On why not to parse ls output.
Edit:
For working around repeated occurrences, you could do something like this:
Consider that you have two files of interest, Test_R1.file and Test_R2.file, and you expect Test to appear only once in the results. Do something like:
declare -A arry # declaring an associative array
for t in Test_R*.file
do
arry["${t%_R*.file}"]=1
# strips the _R(number).file part and makes it a key of arry
# Remember arry keys are unique.
# The assigned value ('=1') is not relevant here; you can assign any value
done
# We are all set to print the unique filenames
echo "${!arry[#]}"
# "${!arry[#]}" expands to the list of array indices (keys) for arry
You can do this using bash parameter expansion only, assuming a consistent file name format:
for file in *_SUFFIX.fastq; do echo "${file%_*}"; done
The for construct iterates over the .fastq files.
Example:
$ file=TEST_EXAMPLE_SUFFIX.fastq
$ echo "${file%_*}"
TEST_EXAMPLE
I'm practicing writing small bash scripts. I have a habit of accidentally typing "teh" instead of "the". So now I want to start at a directory and go through all the files there to find how many occurrences of "teh" there are.
Here's what I have:
#!/bin/bash
for file in `find` ; do
if [ -f "${file}" ] ; then
for word in `cat "${file}"` ; do
if [ "${word}" = "teh" -o "${word}" = "Teh" ] ; then
echo "found one"
fi
done
fi
done
When I run it, I get
found one
found one
found one
found one
found one
found one
./findTeh: line 6: [: too many arguments
Why am I getting "too many arguments"? Am I not doing the if statement properly?
Thanks
The behavior of test with more than four arguments is, per POSIX, not well-defined and deprecated. That's because, with that many arguments, the contents of your variables can themselves be parsed as operators with meaning to the test command.
Instead, use the following, which will work on all POSIX shells, not only bash:
if [ "$word" = teh ] || [ "$word" = Teh ]; then
echo "Found one"
fi
...or, similarly, a case statement:
case $word in
[Tt]eh) echo "Found one" ;;
esac
Now, let's look at the actual standard underlying the test command:
>4 arguments:
The results are unspecified.
[OB XSI] [Option Start] On XSI-conformant systems, combinations of primaries and operators shall be evaluated using the precedence and associativity rules described previously. In addition, the string comparison binary primaries '=' and "!=" shall have a higher precedence than any unary primary. [Option End]
Note the OB flag: The use of a single test command with more than four arguments is obsolete, and is an optional part of the standard regardless (which not all shells are required to implement).
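To see the failure mode concretely, consider a word that happens to look like a test operator (a sketch; the exact error text may vary by shell):
word='!'
[ "$word" = teh -o "$word" = Teh ]     # bash: [: too many arguments
[ "$word" = teh ] || [ "$word" = Teh ] # unambiguous; simply evaluates to false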
Finally, consider the following revision to your script, with various other bugs fixed (albeit using various bashisms, and thus not portable to all POSIX shells):
#!/bin/bash
# read a filename from find on FD 3
while IFS= read -r -d '' filename <&3; do
# read words from a single line, as long as there are more lines
while read -r -a words; do
# compare each word in the most-recently-read line with the target
for word in "${words[#]}"; do
[[ $word = [Tt]eh ]] && echo "Found one"
done
done <"$filename"
done 3< <(find . -type f -print0)
Some of the details:
By delimiting filenames with NULs, this works correctly with all possible filenames, including files with spaces or newlines in their names, which for file in $(find) does not.
Quoted array expansion, i.e. for word in "${words[@]}", avoids glob expansion; with the old code, if * were a word in a file, then the code would subsequently be iterating over filenames in the current directory rather than over words in a file.
Using while read -a to read in a single line at a time both avoids the aforementioned glob expansion behavior, and also acts to constrain memory usage (if very large files are in play).
A common idiom to avoid this sort of problem is to add "x" on both sides of comparisons:
if [ "x${word}" = "xteh" -o "x${word}" = "xTeh" ] ; then
I'm constructing a bash script file a bit at a time. I'm learning as I
go. But I can't find anything online to help me at this point: I need to
extract a substring from a large string, and the two methods I found using ${} (curly brackets) just won't work.
The first, ${x#y}, doesn't do what it should.
The second, ${x:p} or ${x:p:n}, keeps reporting bad substitution.
It only seems to work with constants.
The ${#x} returns a string length as text, not as a number, meaning it does not work with either ${x:p} or ${x:p:n}.
Fact is, it seems really hard to get bash to do much math at all. Except for the for statements. But that is just counting. And this isn't a task for a for loop.
I've consolidated my script file here as a means of helping you all understand what it is that I am doing. It's for working with PureBasic source files, but you only have to change the grep's "--include=" argument, and it can search other types of text files instead.
#!/bin/bash
home=$(echo ~) # Copy the user's path to a variable named home
len=${#home} # Showing how to find the length. Problem is, this is treated
             # as a string, not a number. Can't find a way to make it over
             # into a number.
echo $home "has length of" $len "characters."
read -p "Find what: " what # Intended to search PureBasic (*.pb?) source files for text matches
grep -rHn $what $home --include="*.pb*" --exclude-dir=".cache" --exclude-dir=".gvfs" > 1.tmp
while read line # this checks for and reads the next line
do # the closing 'done' has the file to be read appended with "<"
a0=$line # this is each line as read
a1=$(echo "$a0" | awk -F: '{print $1}') # this gets the full path before the first ':'
echo $a0 # Shows full line
echo $a1 # Shows just full path
q1=${line#a1}
echo $q1 # FAILED! No reported problem, but failed to extract $a1 from $line.
q1=${a0#a1}
echo $q1 # FAILED! No reported problem, but failed to extract $a1 from $a0.
break # Can't do a 'read -n 1', as it just reads 1 char from the next line.
# Can't do a pause, because it doesn't exist. So just run from the
# terminal so that after break we can see what's on the screen.
len=${#a1} # Can get the length of $a1, but only as a string
# q1=${line:len} # Right command, wrong variable
# q1=${line:$len} # Right command, right variable, but wrong variable type
# q1=${line:14} # Constants work, but all $home's aren't 14 characters long
done < 1.tmp
The following works:
x="/home/user/rest/of/path"
y="~${x#/home/user}"
echo $y
Will output
~/rest/of/path
If you want to use "/home/user" inside a variable, say prefix, you need to use $ after the #, i.e., ${x#$prefix}, which I think is your issue.
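For example, a quick sketch:
prefix=/home/user
x=/home/user/rest/of/path
echo "~${x#$prefix}" # prints ~/rest/of/path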
The help I got was most appreciated. I got it done, and here it is:
#!/bin/bash
len=${#HOME} # Showing how to find the length. Problem is, this is treated
             # as a string, not a number. Can't find a way to make it over
             # into a number.
echo $HOME "has length of" $len "characters."
while :
do
echo
read -p "Find what: " what # Intended to search PureBasic (*.pb?) source files for text matches
a0=""; > 0.tmp; > 1.tmp
grep -rHn "$what" "$HOME" --include="*.pb*" --exclude-dir=".cache" --exclude-dir=".gvfs" >> 0.tmp
while read line # this checks for and reads the next line
do # the closing 'done' has the file to be read appended with "<"
a1=$(echo "$line" | awk -F: '{print $1}') # this gets the full path before the first ':'
a2=${line#$a1":"} # renove path and first colon from rest of line
if [[ $a0 != $a1 ]]
then
echo >> 1.tmp
echo $a1":" >> 1.tmp
a0=$a1
fi
echo " "$a2 >> 1.tmp
done < 0.tmp
cat 1.tmp | less
done
What I don't have yet is an answer as to whether variables can be used in place of constants in the dollar-sign, curly-bracket form where you use colons to mark that you want a substring of that string returned. If it requires constants, then the only choice might be to generate a child script using the variables (which would appear to be constants in the child), execute there, then return the results in an environment variable or temporary file. I did stuff like that with MSDOS a lot. The limitation here is that you then have to make the produced file executable as well, using "chmod +x filename", or call it using "/bin/bash filename".
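As it turns out, variables do work in both the offset and length fields of ${x:p:n} in bash, because both fields are evaluated as arithmetic expressions. A quick sketch (the sample line is made up):
line="/home/user/project/main.pb:42:some matched text"
a1=${line%%:*}       # the path before the first ':'
len=${#a1}           # usable directly as a number
echo "${line:len+1}" # arithmetic context: prints 42:some matched text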
Another bash limitation I found is that you cannot use "sudo" in the script without discontinuing execution of the present script. I guess a way around that is to use sudo to call /bin/bash to call a child script that you produced. I assume then that if the child completes, you return to the parent script where you stopped. Unless you did "sudo -i", "sudo -su", or some other variation where you become super user; then you likely need to do an "exit" to drop the super user overlay.
If you exit the child script still as super user, would typing "exit" put you back to completing the parent script? I suspect so, which makes for some interesting scenarios.
Another question: If doing a "while read line", what can you do in bash to check for a keyboard key press? The "read" option is already taken while in this loop.
I have a directory config with the following file listing:
$ ls config
file one
file two
file three
I want a bash script that will, when given no arguments, iterate over all those files; when given names of files as arguments, I want it to iterate over the named files.
#!/bin/sh
for file in ${@:-config/*}
do
echo "Processing '$file'"
done
As above, with no quotes around the list term in the for loop, it produces the expected output in the no-argument case, but breaks when you pass an argument (it splits the file names on spaces). Quoting the list term (for file in "${@:-config/*}") works when I pass file names, but fails to expand the glob if I don't.
Is there a way to get both cases to work?
For a simpler solution, just modify your IFS variable
#!/bin/bash
IFS=''
for file in ${@:-config/*}
do
echo "Processing '$file'"
done
IFS=$' \n\t'
$IFS is a default shell variable that lists all the separators used by the shell. If you remove the space from this list, the shell won't split on spaces anymore. You should set it back to its default value after your function so that it doesn't cause other functions to misbehave later in your script.
NOTE: This seems to misbehave with dash (I used Debian, where #!/bin/sh links to dash). If you use an empty $IFS, the args passed will be returned as only one file. However, if you put some random value (e.g. IFS=':'), the behaviour will be the one you wanted (except if there is a ':' in your file names).
This works fine with #!/bin/bash, though
Set the positional parameters explicitly if none are given; then the for loop is the same for both cases:
[ $# -eq 0 ] && set -- config/*
for file in "$#"; do
echo "Processing '$file'"
done
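This works because set -- config/* replaces the (empty) positional parameters with the glob matches, and "$@" then expands each parameter as a separate word, spaces and all.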
Put the processing code in a function, and then use different loops to call it:
if [ $# -eq 0 ]
then for file in config/*
do processing_func "$file"
done
else for file in "$#"
do processing_func "$file"
done
fi
Here is a small [but complete] part of my bash script that finds and outputs all files in mydir if they have a prefix from a stored array. The strange thing I notice is that this script works perfectly if I take out the "-maxdepth 1 -name" part; otherwise it only gives me the files with the prefix of the first element in the array.
It would be of great help if someone explained this to me. Sorry in advance if there is something obviously silly that I'm doing. I'm relatively new to scripting.
#!/bin/sh
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in $DIS_ARRAY
do
IN_FILES=`find /mydir -maxdepth 1 -name "$dis*.xml"`
for file in $IN_FILES
do
echo $file
done
done
Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
Expected Output:
/mydir/Abc.xml
/mydir/Ab.xml
/mydir/Ac.xml
/mydir/Bc.xml
/mydir/Cb.xml
/mydir/Dc.xml
The loop is broken either way. The reason why
IN_FILES=`find mydir -maxdepth 1 -name "$dis*.xml"`
works, whereas
IN_FILES=`find mydir "$dis*.xml"`
doesn't is because in the first one, you have specified -name. In the second one, find is listing all the files in mydir. If you change the second one to
IN_FILES=`find mydir -name "$dis*.xml"`
you will see that the loop isn't working.
As mentioned in the comments, the syntax that you are currently using ($DIS_ARRAY) will only give you the first element of the array.
Try changing your loop to this:
for dis in "${DIS_ARRAY[#]}"
The double quotes around the expansion aren't strictly necessary in your specific case, but they are required if the elements in your array contain spaces, as demonstrated in the following test:
#!/bin/bash
arr=("a a" "b b")
echo using '$arr'
for i in $arr; do echo $i; done
echo using '${arr[@]}'
for i in ${arr[@]}; do echo $i; done
echo using '"${arr[@]}"'
for i in "${arr[@]}"; do echo $i; done
output:
using $arr
a
a
using ${arr[@]}
a
a
b
b
using "${arr[#]}"
a a
b b
See this related question for further details.
@TomFenech's answer solves your problem, but let me suggest other improvements:
#!/usr/bin/env bash
DIS_ARRAY=(A B C D)
echo "Array is : "
echo ${DIS_ARRAY[*]}
for dis in "${DIS_ARRAY[#]}"
do
for file in "/mydir/$dis"*.xml
do
if [ -f "$file" ]; then
echo "$file"
fi
done
done
Your shebang line references sh, but your question is tagged bash - unless you need POSIX compliance, use a bash shebang line to take advantage of all that bash has to offer
To match files located directly in a given directory (i.e., if you don't need to traverse an entire subtree), use a glob (filename pattern) and rely on pathname expansion as in my code above - no need for find and command substitution.
Note that the wildcard char. * is UNquoted to ensure pathname expansion.
Caveat: if no matching files are found, the glob is left untouched (assuming the nullglob shell option is OFF, which it is by default), so the loop is entered once, with an invalid filename (the unexpanded glob) - hence the [ -f "$file" ] conditional to ensure that an actual match was found (as an aside: using bashisms, you could use [[ -f $file ]] instead).
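Alternatively (a sketch, not part of the answer above): turning nullglob on makes a non-matching pattern expand to nothing, so the conditional becomes unnecessary:
shopt -s nullglob # non-matching globs now expand to an empty list
for file in "/mydir/$dis"*.xml
do
    echo "$file"
done
shopt -u nullglob # restore the default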