Expand shell glob in variable into array - bash

In a bash script I have a variable containing a shell glob expression that I want to expand into an array of matching file names (nullglob turned on), like in
pat='dir/*.config'
files=($pat)
This works nicely, even for multiple patterns in $pat (e.g., pat="dir/*.config dir/*.conf"); however, I cannot use escape characters in the pattern. Ideally, I would like to be able to do
pat='"dir/*" dir/*.config "dir/file with spaces"'
to include the file named *, all files ending in .config, and the file named file with spaces.
Is there an easy way to do this? (Without eval if possible.)
As the pattern is read from a file, I cannot place it in the array expression directly, as proposed in this answer (and various other places).
Edit:
To put things into context: What I am trying to do is to read a template file line-wise and process all lines like #include pattern. The includes are then resolved using the shell glob. As this tool is meant to be universal, I want to be able to include files with spaces and weird characters (like *).
The "main" loop reads like this:
template_include_pat='^#include (.*)$'
while IFS='' read -r line || [[ -n "$line" ]]; do
    if printf '%s' "$line" | grep -qE "$template_include_pat"; then
        glob=$(printf '%s' "$line" | sed -nrE "s/$template_include_pat/\\1/p")
        cwd=$(pwd -P)
        cd "$targetdir"
        files=($glob)
        for f in "${files[@]}"; do
            printf "\n\n%s\n" "# FILE $f" >> "$tempfile"
            cat "$f" >> "$tempfile" ||
                die "Cannot read '$f'."
        done
        cd "$cwd"
    else
        echo "$line" >> "$tempfile"
    fi
done < "$template"
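If each #include line holds exactly one pattern (an assumption; the question also asks for several quoted patterns per line, which this does not handle), one eval-free option is to expand the variable unquoted while IFS is empty: word splitting is then disabled, but pathname expansion still happens, so spaces inside the pattern are harmless. A file literally named * can be matched with the bracket expression [*]. The helper name expand_one is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: expand ONE glob pattern held in a variable, without eval.
# With an empty IFS the unquoted expansion is not word-split,
# but pathname expansion still applies to it.
shopt -s nullglob

# Hypothetical helper: prints every match of pattern "$1", each followed by NUL.
expand_one() {
  local IFS=            # empty IFS: no word splitting below
  local -a matches
  matches=( $1 )        # deliberately unquoted so the glob is expanded
  if (( ${#matches[@]} )); then
    printf '%s\0' "${matches[@]}"
  fi
}
```

Inside the loop, files=($glob) could then be replaced by a while IFS= read -r -d '' f; do files+=( "$f" ); done < <(expand_one "$glob") loop.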

Using the Python glob module:
#!/usr/bin/env bash
# Takes literal glob expressions as argv; emits a NUL-delimited match list on stdout
expand_globs() {
  # With python -c, sys.argv[0] is "-c", so the patterns start at sys.argv[1]
  python -c '
import sys, glob
for arg in sys.argv[1:]:
    for result in glob.iglob(arg):
        sys.stdout.write("%s\0" % (result,))
' "$@"
}
template_include_pat='^#include (.*)$'
template=${1:-/dev/stdin}
# record the patterns we were looking for
patterns=( )
while read -r line; do
  if [[ $line =~ $template_include_pat ]]; then
    patterns+=( "${BASH_REMATCH[1]}" )
  fi
done <"$template"
results=( )
while IFS= read -r -d '' name; do
  results+=( "$name" )
done < <(expand_globs "${patterns[@]}")
# Let's display our results:
{
  printf 'Searched for the following patterns, from template %q:\n' "$template"
  (( ${#patterns[@]} )) && printf ' - %q\n' "${patterns[@]}"
  echo
  echo "Found the following files:"
  (( ${#results[@]} )) && printf ' - %q\n' "${results[@]}"
} >&2

Related

bash for loop with same order as GNU "ls -v" ("version-number" sort)

In a bash script I want to do a typical "for file in somedir" but I want the files to be processed in the same order that "ls -v" returns them. I know the pitfalls of parsing the output of "ls". Is there some way to replicate "-v" without using "ls"? Thanks.
Assuming that this is "version number" sort order, this is also implemented by GNU sort. Thus, on a GNU platform:
somedir=/foo
while IFS= read -r -d '' filename; do
  printf 'Processing file: %q\n' "$filename"
done < <(set -- "$somedir"/*; [[ -e $1 || -L $1 ]] && printf '%s\0' "$@" | sort -z -V)
If you really want to use a for loop rather than a while loop, parse into an array and iterate over that:
files=( )
while IFS= read -r -d '' filename; do
  files+=( "$filename" )
done < <(set -- "$somedir"/*; [[ -e $1 || -L $1 ]] && printf '%s\0' "$@" | sort -z -V)
for filename in "${files[@]}"; do
  printf 'Processing file: %q\n' "$filename"
done
To explain some of the magic above:
In < <(...), <(...) is a process substitution. It's replaced with a filename which, when read from, will return the output of the code enclosed. Thus, < <(...) will put that process substitution's output as the input to the while read loop. This loop form is described in BashFAQ #1. The reasons to use this kind of redirection instead of piping into the loop are given in BashFAQ #24.
set -- "$somedir"/* replaces the argument list within the current context (that context being the subshell running the process substitution!) with the results of "$somedir"/*; thus, (non-hidden, by default) contents of the directory named in the variable somedir.
[[ -e $1 || -L $1 ]] is true only if that glob expanded to at least one item; if the pattern was left unexpanded (so $1 is the literal "$somedir"/*) and no filesystem object exists by that name, gating output on this condition prevents the process substitution from emitting any output.
sort -z tells sort to delimit elements in both input and output with NULs, a character that cannot appear in filenames; sort -V selects version-number ordering.
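A sketch of the same pipeline wrapped in a function, with one swap: nullglob inside the subshell instead of the [[ -e $1 || -L $1 ]] test, so an empty directory produces zero arguments and no output. It still assumes GNU sort for -z and -V; list_version_sorted is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: NUL-delimited, version-sorted listing of a directory's
# (non-hidden) entries. Requires GNU sort for -z and -V.
list_version_sorted() {
  (
    shopt -s nullglob           # only affects this subshell
    set -- "$1"/*               # empty directory -> zero arguments
    (( $# )) && printf '%s\0' "$@" | sort -z -V
  )
}

# Usage:
#   while IFS= read -r -d '' filename; do
#     printf 'Processing file: %q\n' "$filename"
#   done < <(list_version_sorted "$somedir")
```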

Bash (split) file name comparison fails

In my directory I have files (*fastq.gz.fasta) and directories, whose names contain the filenames (*fastq.gz.fasta-blastdb):
IVC6_Meino.clust.gz.fasta-blastdb
IVC5_Mehiv.clust.gz.fasta-blastdb
....
IVC6_Meino.clust.gz.fasta
IVC5_Mehiv.clust.gz.fasta
....
In a bash script I want to compare the filenames with the directories, using cut on the latter to extract only the filename part. If the two names match I want to do further processing (for now, echo "match" or "no match" respectively).
I have written the following piece of code:
#!/bin/bash
for file in *.fasta
do
for db in *-blastdb
do
echo $file, $db | cut -d '-' -f 1
if [[ $file = "$db | cut -d '-' -f 1" ]]; then
echo "match"
else
echo "no match"
fi
done
done
But it does not detect matches. The output looks like this:
...
IVC6_Meino.clust.gz.fasta, IIIA11_Meova.clust.gz.fasta
no match
IVC6_Meino.clust.gz.fasta, IVC5_Mehiv.clust.gz.fasta
no match
IVC6_Meino.clust.gz.fasta, IVC6_Meino.clust.gz.fasta
no match
The last line should read match as you can see, the strings look the same.
What am I missing?
You can use parameter expansion to do this more easily:
for file in *.fasta
do
for db in *-blastdb
do
echo "$file", "$db"
if [[ "${file%%.fasta}" = "${db%%.fasta-blastdb}" ]]; then
echo "match"
else
echo "no match"
fi
done
done
If you want to fix yours, the problem is the use of $db | cut -d '-' -f 1. On the echo line it looks as if echo prints the shortened names. It doesn't: cut prints them, after receiving echo's output through the pipe. When you write [[ $file = "$db | cut -d '-' -f 1" ]], nothing is executed at all: the quoted right-hand side is just a string, so $file is compared with the value of $db followed by the literal text | cut -d '-' -f 1, which can never match.
You need the $(...) construct to capture the output of a pipeline, and you need echo (or printf) to feed the contents of $db into that pipeline. Quote "$db" so its contents are not subject to word splitting or globbing.
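To make the difference concrete, here is a small sketch (compare_demo is a hypothetical wrapper) that runs the quoted-string comparison, the command-substitution fix, and the parameter-expansion approach on one of the names from the question:

```shell
#!/usr/bin/env bash
# Sketch: three ways of writing the comparison from the question.
compare_demo() {
  local file='IVC6_Meino.clust.gz.fasta'
  local db='IVC6_Meino.clust.gz.fasta-blastdb'

  # 1. A quoted string is just text: nothing runs, so this never matches.
  [[ $file = "$db | cut -d '-' -f 1" ]] && echo match || echo 'no match'

  # 2. $( ) runs the pipeline and substitutes its output.
  [[ $file = "$(printf '%s\n' "$db" | cut -d '-' -f 1)" ]] && echo match || echo 'no match'

  # 3. Parameter expansion strips the trailing "-blastdb" without a subprocess,
  #    and unlike cut -d '-' it still works if the name itself contains a '-'.
  [[ $file = "${db%-blastdb}" ]] && echo match || echo 'no match'
}

compare_demo   # prints: no match / match / match (one per line)
```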
Like so:
for file in *.fasta
do
for db in *-blastdb
do
ts=$(echo "$db" | cut -d '-' -f 1)
echo "$file", "$ts"
if [[ "$file" = "$ts" ]]; then
echo "match"
else
echo "no match"
fi
done
done # this works I think -- not tested...
Please be careful with your quoting with Bash and liberally use ShellCheck.
The structure you have is also not the most efficient: you loop over the *-blastdb glob once for every file matched by *.fasta. If you have a lot of files, that could get really slow.
To solve that, you could rewrite this loop with Bash arrays (best if you have Bash 4+) or use awk:
ext1=.fasta
ext2=.fasta-blastdb
awk 'FNR==NR{
s=$0
sub("\\"ext1"$","",s)
seen[s]=$0
next}
{
s=$0
sub("\\"ext2"$","",s)
if (s in seen)
print seen[s], $0
}
' ext1="$ext1" ext2="$ext2" <(for fn in *$ext1; do echo "$fn"; done) <(for fn in *$ext2; do echo "$fn"; done)
Each glob is expanded only once, and awk uses an array to test whether the basenames are the same.
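For the Bash-array route mentioned above, a sketch assuming Bash 4+ (for associative arrays) and the naming scheme from the question; pair_fasta_with_db is a hypothetical name. As in the awk version, each glob is expanded once and each lookup is a hash access:

```shell
#!/usr/bin/env bash
# Sketch (Bash 4+): pair *.fasta files with *.fasta-blastdb entries
# using an associative array keyed by the common base name.
pair_fasta_with_db() {
  local f db base
  local -A fasta=()
  for f in *.fasta; do
    [[ -e $f ]] || continue            # glob matched nothing
    fasta[${f%.fasta}]=$f              # key: name without ".fasta"
  done
  for db in *.fasta-blastdb; do
    [[ -e $db ]] || continue
    base=${db%.fasta-blastdb}
    if [[ -n ${fasta[$base]+set} ]]; then
      printf '%s %s\n' "${fasta[$base]}" "$db"
    fi
  done
}
```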

bash Shell: lost first element data partially

Using the bash shell, I am trying to read a file line by line.
Every line contains two meaningful file names delimited by "``".
File 1: image_config.txt
bbbbb.mp4``thumb/hashdata.gif
bbbbb.mp4``thumb/hashdata2.gif
Shell Script
#!/bin/bash
filename="image_config.txt"
while IFS='' read -r line || [[ -n "$line" ]]; do
IFS='``' read -r -a array <<< "$line"
if [ "$line" = "" ]; then
echo lineempty
else
file=${array[0]}
hash=${array[2]}
echo $file$hash;
output=$(ffmpeg -v warning -ss 2 -t 0.8 -i $file -vf scale=200:-1 -gifflags +transdiff -y $hash);
echo $output;
# echo ${array[0]}${array[1]}${array[2]}
fi;
done < "$filename"
The first iteration runs successfully, but on the second iteration the variable file has lost the bbbbb from bbbbb.mp4, and the following output comes out:
Output:
user@domain [~/public_html/Videos]$ sh imager.sh
bbbbb.mp4thumb/hashdata.gif
.mp4thumb/hashdata2.gif
.mp4: No such file or directory
lineempty
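Please check out Bash FAQ 89, "I'm using a loop which runs once per line of input but it only seems to run once; everything after the first line is ignored?", which applies to your case. In short, ffmpeg reads from standard input, which inside the loop is image_config.txt, so it consumes input that was meant for read. Give ffmpeg its own input with ffmpeg ... < /dev/null, or use ffmpeg's -nostdin option.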
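A small sketch that reproduces the problem without ffmpeg, assuming cat as a stand-in for any command that reads standard input (count_iterations is a hypothetical name). In "broken" mode cat inherits the loop's input; otherwise it gets /dev/null:

```shell
#!/usr/bin/env bash
# Sketch of BashFAQ 89: a stdin-reading command inside `while read`
# swallows the rest of the loop's input.
count_iterations() {
  local mode=$1 file=$2 n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
    if [[ $mode == broken ]]; then
      cat > /dev/null               # reads the rest of "$file": loop ends early
    else
      cat > /dev/null < /dev/null   # own stdin: the loop keeps its input
    fi
  done < "$file"
  echo "$n"
}
```

On a three-line file, the broken variant reports one iteration and the fixed variant reports three.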
Aside:
There is no point in using the same character twice in IFS.
IFS=\`
is enough.
Check this out:
var='abc``def'
IFS=\`\` read -ra arr <<< "$var"
printf '<%s>\n' "${arr[@]}"
Output:
<abc>
<>
<def>
As you can see, arr[0] is abc, arr[1] is empty and arr[2] is def, rather than arr[0] being abc and arr[1] being def, as one might expect.
Taken from the IFS page of GreyCat and lhunath's Bash Guide:
The IFS variable is used in shells (Bourne, POSIX, ksh, bash) as the input field separator (or internal field separator). Essentially, it is a string of special characters which are to be treated as delimiters between words/fields when splitting a line of input.
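As an IFS-free alternative, a sketch that splits on the two-character separator with parameter expansion, so there is no empty middle field (split_pair is a hypothetical name; it assumes the separator occurs, otherwise both halves are the whole line):

```shell
#!/usr/bin/env bash
# Sketch: split "left``right" with parameter expansion instead of IFS.
split_pair() {
  local line=$1 sep='``' first second
  first=${line%%"$sep"*}    # text before the first ``
  second=${line#*"$sep"}    # text after the first ``
  printf '%s\n%s\n' "$first" "$second"
}

split_pair 'bbbbb.mp4``thumb/hashdata.gif'   # prints bbbbb.mp4, then thumb/hashdata.gif
```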
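Here is how you could do it differently, avoiding a read inside the read loop (ffmpeg's input is also redirected, for the reason described in Bash FAQ 89):
#!/bin/bash
filename="image_config.txt"
while IFS='' read -r line || [[ -n "$line" ]]; do
  if [ "$line" = "" ]; then
    echo lineempty
  else
    # With ` as the field separator, "a``b" has fields a, (empty), b
    file=$(printf '%s\n' "$line" | awk -F '`' '{ print $1 }')
    hash=$(printf '%s\n' "$line" | awk -F '`' '{ print $3 }')
    echo "$file$hash"
    # < /dev/null stops ffmpeg from reading the rest of "$filename"
    output=$(ffmpeg -v warning -ss 2 -t 0.8 -i "$file" -vf scale=200:-1 -gifflags +transdiff -y "$hash" < /dev/null)
    echo "$output"
  fi
done < "$filename"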

Bash script to remove lines containing any of a list of words

I have a large config file that I use to define variables for a script to pull from it, each defined on a single line. It looks something like this:
var val
foo bar
foo1 bar1
foo2 bar2
I have gathered a list of out-of-date variables that I want to remove from the list. I could go through it manually, but I would like to do it with a script, which would at least be more stimulating. The file that contains the values may contain multiple instances. The idea is to find the value and, if it's found, remove the entire line.
Does anyone know if this is possible? I know sed does this but I do not know how to make it use a file input.
#!/bin/bash
shopt -s extglob
REMOVE=(foo1 foo2)
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
while read -r LINE; do
read A B <<< "$LINE"
[[ $A != $PATTERN ]] && echo "$LINE"
done < input_file.txt > output_file.txt
Or (Use with a copy first)
#!/bin/bash
shopt -s extglob
FILE=$1 REMOVE=("${@:2}")
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
SAVE=()
while read -r LINE; do
read A B <<< "$LINE"
[[ $A != $PATTERN ]] && SAVE+=("$LINE")
done < "$FILE"
printf '%s\n' "${SAVE[@]}" > "$FILE"
Running with
bash script.sh your_config_file pattern1 pattern2 ...
Or
#!/bin/bash
shopt -s extglob
FILE=$1 PATTERNS_FILE=$2
readarray -t REMOVE < "$PATTERNS_FILE"
IFS='|' eval 'PATTERN="@(${REMOVE[*]})"'
SAVE=()
while read -r LINE; do
read A B <<< "$LINE"
[[ $A != $PATTERN ]] && SAVE+=("$LINE")
done < "$FILE"
printf '%s\n' "${SAVE[@]}" > "$FILE"
Running with
bash script.sh your_config_file patterns_file
Here's one with sed. Add the words to the array, then run
./script target_filename
(assuming you saved the following in a file called script). It is not very efficient, since sed rewrites the file once per word; concatenating the words into a single regex, as bbonev did, would probably be faster.
#!/bin/bash
declare -a array=("foo1" "foo2")
for i in "${array[@]}";
do
sed -i "/^${i}\s.*/d" "$1"
done
It's actually even simpler with file input.
Given a word file
word1
word2
word3
.....
then the following will do the job
#!/bin/bash
while read -r i;
do
sed -i "/^${i}\s.*/d" "$2"
done < "$1"
usage:
./script wordlist target_file
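Both sed loops rewrite the target file once per word. A single-pass sketch with awk, assuming the word list holds one variable name per line and that only the first field of each config line should be compared (remove_vars is a hypothetical name; it prints to stdout rather than editing in place):

```shell
#!/usr/bin/env bash
# Sketch: print the config without lines whose first field is in the word list.
# Whole-field comparison, so "foo1" does not also remove "foo10".
remove_vars() {
  local words=$1 config=$2
  # FILENAME == ARGV[1] (rather than NR == FNR) still works if the word file is empty.
  awk 'FILENAME == ARGV[1] { drop[$1]; next } !($1 in drop)' "$words" "$config"
}

# Usage: remove_vars wordlist target_file > target_file.new && mv target_file.new target_file
```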

Find lines containing all keywords in bash script

Essentially, I would like something that behaves similarly to:
cat file | grep -i keyword1 | grep -i keyword2 | grep -i keyword3
How can I do this with a bash script that takes a variable-length list of keyword arguments? The script should do a case-insensitive match of lines containing all keywords.
Use this as a script (IGNORECASE is a GNU awk extension, so this needs gawk):
#! /bin/bash
awk -v IGNORECASE=1 -f <(
P=; for k; do [ -z "$P" ] && P="/$k/" || P="$P&&/$k/"; done
echo "$P{print}"
)
and invoke it as
script.sh keyword1 keyword2 keyword3 < file
I don't know if this is efficient, and I think this is ugly, also there might be some utility for that, but:
#!/bin/bash
unset keywords matchlist
keywords=("$@")
for kw in "${keywords[@]}"; do
matchlist="$matchlist /$kw/ &&"
done
matchlist="${matchlist% &&}"
# awk "$matchlist { print; }" < <(tr '[:upper:]' '[:lower:]' <file)
awk "$matchlist { print; }" file
And yes, it needs some robustness regarding special characters and stuff. It's just to show the idea.
Give this a try (note that it prints lines containing any of the keywords):
shopt -s nocasematch
keywords="keyword1|keyword2|keyword3"
while IFS= read -r line; do [[ $line =~ $keywords ]] && printf '%s\n' "$line"; done < file
Edit:
Here's a version that tests for all keywords being present, not just any:
keywords=(keyword1 keyword2 keyword3) # or keywords=("$@")
qty=${#keywords[@]}
# relies on shopt -s nocasematch (above) for case-insensitive matching
while IFS= read -r line
do
  count=0
  for keyword in "${keywords[@]}"
  do
    [[ "$line" =~ $keyword ]] && (( count++ ))
  done
  if (( count == qty ))
  then
    printf '%s\n' "$line"
  fi
done < textlines
Found a way to do this with grep.
KEYWORDS="$*"
MATCH_EXPR="cat file"
for keyword in ${KEYWORDS};
do
MATCH_EXPR="${MATCH_EXPR} | grep -i ${keyword}"
done
eval "${MATCH_EXPR}"
With bash 4.0+ you can use:
shopt -s nocasematch
while IFS= read -r line
do
  f=0 g=0   # reset for every line
  case "$line" in
    *keyword1*) f=1;;&
    *keyword2*) g=1;;&
    *keyword3*)
      [ "$f" -eq 1 ] && [ "$g" -eq 1 ] && printf '%s\n' "$line";;
  esac
done < "file"
shopt -u nocasematch
or gawk:
gawk -v IGNORECASE=1 '/keyword1/ && /keyword2/ && /keyword3/' file
I'd do it in Perl.
For finding all lines that contain at least one of them:
perl -ne'print if /(keyword1|keyword2|keyword3)/i' file
For finding all lines that contain all of them:
perl -ne'print if /keyword1/i && /keyword2/i && /keyword3/i' file
Here is a script called search.sh in bash that will search lines within a file or folder for all keywords specified:
#!/bin/bash
if [ $# -lt 2 ]; then
echo "[-] $0 file_to_search/folder_to_search keyword1 keyword2 keyword3 ..."
exit
fi
i=0
results="" # this will store the cumulative results from each keyword search
for arg in "$@"; do
if [ $i -eq 0 ]; then
# first argument is the file/folder to search
file_to_search="$arg"
i=$(($i + 1))
elif [ $i -eq 1 ]; then
# search the file/folder with first keyword (first search)
results=`grep --color=always -r -n -i "$arg" "$file_to_search"`
i=$(($i + 1))
else
# now keep searching the results from first search for other keywords
results=`echo "$results" | grep --color=always -i "$arg"`
i=$(($i + 1))
fi
done
echo "$results"
Example invocation of the script above, which searches the 'tools.txt' file for the 'python' and 'jira' keywords:
./search.sh tools.txt python jira
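Several answers above build a regex or an eval string from the keywords, which breaks on special characters. A sketch that instead passes the keywords to awk as plain strings and matches them literally with index() after tolower(), so it should run in any POSIX awk (grep_all is a hypothetical name; the file comes first, then the keywords):

```shell
#!/usr/bin/env bash
# Sketch: print lines of a file that contain ALL keywords,
# case-insensitively, treating keywords as literal text (not regexes).
grep_all() {
  local file=$1; shift
  awk '
    BEGIN {
      # ARGV[1 .. ARGC-2] are keywords; ARGV[ARGC-1] is the file.
      n = 0
      for (i = 1; i < ARGC - 1; i++) {
        kw[++n] = tolower(ARGV[i])
        ARGV[i] = ""          # empty entries are skipped as input files
      }
    }
    {
      line = tolower($0)
      for (i = 1; i <= n; i++)
        if (index(line, kw[i]) == 0) next
      print
    }' "$@" "$file"
}

# Usage: grep_all tools.txt python jira
```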
