I have a list of files within a folder, and I want to extract the filenames that match the following pattern and insert them into an array.
The pattern is that the file name always begins with either "MCABC_" or "MCBBC_", then a date, and ends with ".csv".
Examples would be "MCABC_20110101.csv" and "MCBBC_20110304.csv".
Right now, I can only come up with the following solution, which works but is not ideal.
ls | grep -E "MCABC_[ A-Za-z0-9]*|MC221_[ A-Za-z0-9]*"
I read that it is bad to use ls and that I should use a glob instead.
I am completely new to bash scripting. How could I extract the filenames matching the patterns above and insert them into an array? Thanks.
Update: Thanks for the answers; I really appreciate them. I have the following code:
#!/bin/bash
shopt -s nullglob
files=(MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv)
echo ${#files[*]}
echo ${files[0]}
And this is the result that I got when I ran bash testing.sh.
: invalid shell option namesh: line 2: shopt: nullglob
1
(MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv)
However, if I just ran on the command line files=(MC[1-2]21_All_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv) and then echo ${files[*]}, I manage to get the output:
MC121_All_20180301.csv MC121_All_20180302.csv MC121_All_20180305.csv MC221_All_20180301.csv MC221_All_20180302.csv MC221_All_20180305.csv
I am very confused. Why is this happening? (Please note that I am running this on Ubuntu within Windows 10.)
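For what it's worth, that mangled shopt error is the classic symptom of a script saved with Windows (CRLF) line endings: bash sees the option name as "nullglob\r" and rejects it. A sketch of a fix, using a demo.sh stand-in for testing.sh:

```shell
# A script saved with CRLF endings makes bash see the option name as 'nullglob\r'.
printf 'shopt -s nullglob\r\n' > demo.sh   # demo.sh stands in for testing.sh
sed -i 's/\r$//' demo.sh                   # strip the trailing carriage returns in place
bash demo.sh && echo "runs cleanly now"
```

Saving the file with Unix line endings in your editor (or running dos2unix) achieves the same thing.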
I think you can just populate the array directly using a glob:
files=( MC[AB]BC_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )
The "date" part can certainly be improved, since it matches completely invalid dates like 98765432, but maybe that's not a problem.
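Made runnable with nullglob, so that an empty match yields an empty array rather than the literal pattern, the whole thing could look like this:

```shell
# nullglob makes an unmatched glob expand to nothing instead of to itself
shopt -s nullglob
files=( MC[AB]BC_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv )
echo "${#files[@]} matching files"
for f in "${files[@]}"; do
    printf '%s\n' "$f"
done
```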
This will work in BASH.
#!/bin/bash
for file_name in M*
do
line="$line ${file_name%_*}"
done
array=( $line )   # relies on word splitting, so this breaks on names containing spaces
echo "${array[2]}"
Another way :
#!/bin/bash
declare -a files_array
i=0
for file_name in M*
do
files_array[$i]="${file_name%_*}"
(( i++ ))
done
echo "${files_array[2]}"
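A simpler variant (my own suggestion, not part of the answer above): the += assignment appends to an indexed array, so no counter is needed and names with spaces are preserved:

```shell
# += appends one quoted element at a time, so no index counter is needed
files_array=()
for file_name in M*; do
    files_array+=( "$file_name" )
done
echo "${#files_array[@]} entries"
```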
Regards!
This question already has answers here:
Test whether a glob has any matches in Bash
(22 answers)
Closed 1 year ago.
I am trying to change my code to read a .txt file from a directory. That directory contains only one .txt file, but I do not know its name beforehand; I only know it is a .txt file. Is it possible to do that using a shell script?
Below is my current code to read a file but I have to manually specify the file name.
#!/bin/bash
declare -a var
filename='file.txt'
let count=0
while read line; do
var[$count]=$line
((count++))
done < $filename
If you are 100% sure that there is only one matching file just replace:
done < $filename
by:
done < *.txt
Of course this will fail if you have zero or more than one matching file. So, it would be better to test first. For instance with:
tmp=$(shopt -p nullglob || true)
shopt -s nullglob
declare -a filename=(*.txt)
if (( ${#filename[@]} != 1 )); then
printf 'error: zero or more than one *.txt file\n'
else
declare -a var
let count=0
while read line; do
var[$count]=$line
((count++))
done < "${filename[0]}"
fi
eval "$tmp"
The shopt stuff stores the current status of the nullglob option in the variable tmp, enables the option, and restores the initial status at the end. Enabling nullglob is needed here if there is a risk that you have zero *.txt files: without nullglob, the glob would store the literal string *.txt in the array filename.
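The nullglob effect is easy to demonstrate in an empty directory (a sketch, run in a throwaway temp dir):

```shell
# In an empty directory, nullglob turns an unmatched glob into zero words
tmpdir=$(mktemp -d) && cd "$tmpdir"
shopt -s nullglob
files=( *.txt )
echo "${#files[@]}"   # 0; without nullglob it would be 1 (the literal '*.txt')
```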
Your loop could be optimized a bit:
declare -a var
while IFS= read -r line; do
var+=("$line")
done < "${filename[0]}"
IFS= is needed to preserve leading and trailing spaces in the read lines; remove it if this is not what you want. The -r option of read preserves backslashes in the read line; remove it if this is not what you want. The += assignment automatically adds an element at the end of an indexed array, so there is no need to count. If you want to know how many elements your array contains, just use ${#var[@]}, as we did to test the length of the filename array.
Note that storing the content of a text file in a bash array is better done with mapfile if your bash version is recent enough. See this other question, for instance.
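For instance, with a stand-in sample.txt, the whole read loop collapses to one call:

```shell
# mapfile (bash >= 4) reads the whole file into an array in one call;
# -t strips the trailing newline from each element. sample.txt is a stand-in.
printf 'first line\nsecond line\n' > sample.txt
mapfile -t var < sample.txt
echo "${#var[@]}"   # 2
echo "${var[1]}"    # second line
```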
Pathname (glob) expansion can be triggered inside an assignment via command substitution as well:
filename=$(echo -n *.txt)
If you want to make it foolproof (catching the case that the number of matching files is not 1), assign to an array and check its size:
farr=( *.txt )
if (( ${#farr[*]} != 1 ))
then
echo "You lied to me when you said that there is exactly one .txt file"
else
filename=${farr[0]}
fi
You can use this. The head command is used in this context to ensure one result (note that parsing ls output is fragile if filenames contain newlines):
filename=$(ls *.txt | head -n 1)
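A glob-based alternative that avoids parsing ls entirely (a sketch; it picks the alphabetically first match):

```shell
# Expand the glob into the positional parameters and take the first match
set -- *.txt
filename=$1
echo "$filename"
```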
This question already has answers here:
Rename filename to another name
(3 answers)
Closed 7 years ago.
Let's say I have a bunch of files named something like this: bsdsa120226.nai, bdeqa140223.nai, and I want to rename them to 120226.nai, 140223.nai. How can I achieve this using the script below?
#!/bin/bash
name1=`ls *nai*`
names=`ls *nai*| grep -Po '(?<=.{5}).+'`
for i in $name1
do
for y in $names
do
mv $i $y
done
done
Solution:
name1=`ls *nai*`
for i in $name1
do
y=$(echo "$i" | grep -Po '(?<=.{5}).+')
mv $i $y
done
This:
#!/bin/bash
shopt -s extglob nullglob
for file in *+([[:digit:]]).nai; do
echo mv -nv -- "$file" "${file##+([^[:digit:]])}"
done
Remove the echo if you're happy with the mv commands.
Note: this solution does not assume that there are 5 leading characters to delete. It deletes all the leading non-digit characters.
Using only bash, you could do this:
for file in *nai* ; do
echo mv -- "$file" "${file:5}"
done
(Remove the echo when satisfied with the output.)
Avoid ls in scripts, except for displaying information. Use plain globbing instead.
See also How do I do string manipulations in bash? for more string manipulation techniques.
Your script can't work with that structure: if you have 5 files, it will call mv five times for the first file (once for each element in the second list), five times for the second, etc. You'd need to iterate over the two sets of names in lockstep. (It also doesn't deal with things like whitespace in filenames.)
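One way to iterate the two name lists in lockstep (a sketch using the same strip-5-characters rule, not the asker's exact script) is to build both as arrays and pair them by index:

```shell
# Build source and destination name arrays from a glob, then pair them by index
src=( *nai* )
dst=()
for f in "${src[@]}"; do
    dst+=( "${f:5}" )                      # drop the first 5 characters
done
for i in "${!src[@]}"; do
    echo mv -- "${src[$i]}" "${dst[$i]}"   # remove the echo to actually rename
done
```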
You would be better off using rename (prename on some systems) since that allows you to use Perl regular expressions to do the renaming, along the lines of:
prename 's/^.{5}//' *.nai
The reason your script is not behaving is that, for every source file, you're attempting to rename it to every target file.
If you need to limit yourself to using that script, you need to work out the single target file for each source file, something like:
#!/bin/bash
for i in *.nai; do
y=$(echo "$i" | cut -c6-)
mv "$i" "$y"
done
If your system has the rename tool, it's better to go with the simple rename command,
rename 's/^.{5}//' *.nai
It just removes the first 5 characters from the file name.
OR
for i in *.nai; do mv "$i" $(grep -oP '(?<=^.{5}).+' <<< "$i"); done
good day,
I am creating a script to read one level of subfolders/directories of a path. The script is like so:
#loopdir.sh
for i in `ls -d $1`
do
echo $i
done
But when I tried to use it to read /media/My\ Passport/, it reads the argument as two different dirs:
$ ./loopdir.sh /media/My\ Passport/
ls: cannot access /media/My: No such file or directory
Passport/
Try doing this instead (my understanding is that you want to list subdirs; am I right?):
for i in "$1"/*; do
echo "${i%/}"
done
Parsing ls output is a bad idea: it is a tool for interactively looking at file information. Its output is formatted for humans and will cause bugs in scripts. Use globs or find instead. Understand why: http://mywiki.wooledge.org/ParsingLs
And (last but not least): USE MORE QUOTES! They are vital. Also, learn the difference between ', " and `. See http://mywiki.wooledge.org/Quotes and http://wiki.bash-hackers.org/syntax/words
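A quick demonstration of why the quotes matter (the value here is hypothetical):

```shell
# Unquoted expansion undergoes word splitting; quoted expansion stays one word
dir="My Passport"
printf '<%s>\n' $dir     # split into two words: <My> and <Passport>
printf '<%s>\n' "$dir"   # one word: <My Passport>
```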
You need to surround your $i with quotes: echo "$i"
Don't use a bare $i; it will break on spaces. Use "$i" instead.
Sometimes I need to rename a number of files, such as adding a prefix or removing something.
At first I wrote a python script. It works well, and I want a shell version. Therefore I wrote something like this:
$1 - which directory to list,
$2 - what pattern will be replacement,
$3 - replacement.
echo "usage: dir pattern replacement"
for fname in `ls $1`
do
newName=$(echo $fname | sed "s/^$2/$3/")
echo 'mv' "$1/$fname" "$1/$newName&&"
mv "$1/$fname" "$1/$newName"
done
It works, but very slowly, probably because it needs to create a process (here sed and mv), destroy it, and create the same process again just to run it with a different argument. Is that true? If so, how do I avoid it? How can I get a faster version?
I thought of producing all processed names at once (using sed on the whole list), but it still needs mv in the loop.
Please tell me, how do you guys do it? Thanks. If you find my question hard to understand, please be patient; my English is not very good, sorry.
--- update ---
I am sorry for my description. My core question is: if we use some command in a loop, will that lower performance? For example, in for i in {1..100000}; do ls 1>/dev/null; done, creating and destroying a process takes most of the time. So what I want to know is: is there any way to reduce that cost?
Thanks to kev and S.R.I for giving me a rename solution to rename files.
Every time you call an external binary (ls, sed, mv), bash has to fork and exec, and that takes a big performance hit.
You can do everything you want to do in pure bash 4.x and only need to call mv:
pat_rename(){
if [[ ! -d "$1" ]]; then
echo "Error: '$1' is not a valid directory"
return
fi
shopt -s globstar
cd "$1"
for file in **; do
echo mv -- "$file" "${file//$2/$3}"   # remove the echo to actually rename
done
}
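The core of the function is bash's built-in ${var//pattern/replacement} substitution, which replaces sed with no subshell at all (the file name below is illustrative):

```shell
# ${var//pattern/replacement} substitutes every occurrence in pure bash,
# with no sed and no fork
file="holiday_photo_001.jpg"
echo "${file//photo/img}"   # holiday_img_001.jpg
```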
Simplest first. What's wrong with rename?
mkdir tstbin
for i in `seq 1 20`
do
touch tstbin/filename$i.txt
done
rename .txt .html tstbin/*.txt
Or are you using an older *nix machine?
To avoid re-executing sed on each file, you could instead setup two name streams, one original, and one transformed, then sip from the ends:
exec 3< <(ls)
exec 4< <(ls | sed 's/from/to/')
while IFS= read -r -u3 orig && IFS= read -r -u4 to; do
mv "${orig}" "${to}";
done;
I think you can store all of the file names in a file or a string, and use awk and sed to do it in one pass instead of one by one.
Greetings!
These are well-known Bash parameter expansion patterns:
${parameter#word}, ${parameter##word}
and
${parameter%word}, ${parameter%%word}
I need to chop one part from the beginning and another part from the end of the parameter. Could you advise something for me, please?
If you're using Bash version >= 3.2, you can use regular expression matching with a capture group to retrieve the value in one command:
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ [[ $path =~ ^.*/([^/]*)/.*$ ]]
$ echo ${BASH_REMATCH[1]}
ABC
This would be equivalent to:
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ path=$(echo "$path" | sed -n 's|^.*/\([^/]*\)/.*$|\1|p')
$ echo $path
ABC
I don't know that there's an easy way to do this without resorting to sub-shells, something you probably want to avoid for efficiency. I would just use:
> xx=hello_there
> yy=${xx#he}
> zz=${yy%re}
> echo ${zz}
llo_the
If you're not fussed about efficiency and just want a one-liner:
> zz=$(echo ${xx%re} | sed 's/^he//')
> echo ${zz}
llo_the
Keep in mind that this second method starts sub-shells - it's not something I'd be doing a lot of if your script has to run fast.
This solution uses what Andrey asked for and it does not employ any external tool. Strategy: Use the % parameter expansion to remove the file name, then use the ## to remove all but the last directory:
$ path=/path/to/my/last_dir/filename.txt
$ dir=${path%/*}
$ echo $dir
/path/to/my/last_dir
$ dir=${dir##*/}
$ echo $dir
last_dir
I would highly recommend going with bash arrays as their performance is just over 3x faster than regular expression matching.
$ path='/xxx/yyy/zzz/ABC/abc.txt'
$ IFS='/' arr=( $path )
$ echo ${arr[${#arr[@]}-2]}
ABC
This works by telling bash that each element of the array is separated by a forward slash / via IFS='/'. We access the penultimate element of the array by first determining how many elements are in the array via ${#arr[@]}, then subtracting 2 and using that as the index into the array.
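One caveat: because IFS='/' here prefixes a plain assignment rather than a command, IFS stays changed for the rest of the script. A variant (my own suggestion, needing bash ≥ 4.3 for negative indices) scopes the IFS change to the read command itself:

```shell
# IFS='/' applies only to the read command, so the global IFS is untouched
path='/xxx/yyy/zzz/ABC/abc.txt'
IFS='/' read -r -a arr <<< "$path"
echo "${arr[-2]}"   # penultimate element: ABC
```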