looping over a few file extensions in bash


I have a list of files in a folder and I want to work on just a few of them. The folder contains files named file.qc, file.qc.gz, file.qc.stat, file.qc.count and so on.
I want to write a loop in bash that opens only file.qc and file.qc.gz, while ignoring the other extensions (such as .qc.stat or .qc.count).

Just specify multiple globs in your loop:
#!/bin/bash
# Gracefully handle cases where there are no matches
shopt -s nullglob
for f in *.qc *.qc.gz
do
echo "Found: $f"
done
You can also write this shorter as *.qc{,.gz}, which expands to the same thing.
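To see what the brace form expands to, a quick check along these lines may help. It is only a sketch: the scratch directory and the sample.* file names are made up for the demonstration.

```shell
#!/bin/bash
# Illustration only: the scratch directory and file names are hypothetical.
shopt -s nullglob
tmpdir=$(mktemp -d)
touch "$tmpdir"/sample.qc "$tmpdir"/sample.qc.gz \
      "$tmpdir"/sample.qc.stat "$tmpdir"/sample.qc.count

# Brace expansion runs before globbing, so *.qc{,.gz} becomes: *.qc *.qc.gz
matched=()
for f in "$tmpdir"/*.qc{,.gz}; do
    matched+=("${f##*/}")   # keep only the file name
done
matched_list="${matched[*]}"
echo "Found: $matched_list"
rm -rf "$tmpdir"
```

The .qc.stat and .qc.count files are skipped because neither pattern matches them.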

Related

Cycle through a list of terms and batch rename files

EDIT: In the course of working on and reediting this question, I was able to get this to work. However, I'm sure there's a better way to do it, so I'm leaving it up to hear from those more experienced.
Periodically I need to reproduce several dozen copies of a few files. For example, given:
company_a_results_30d.py
company_a_results_90d.py
company_a_results_120d.py
company_a_results_all_time.py
I need to make copies where company_a is replaced with company_b, company_c....etc. (The next step is to find and replace a number of terms within the files, but this I have managed to do with a perl script.)
I'm sure this should be possible with a bash script and mv, but I haven't quite got the hang of it. Something like:
#!/usr/bin/env bash
my_array=(company_b company_c company_d)
for i in "${my_array[@]}"
do
for file in *.py
do
cp "$file" "${file/company_a/$i}"
done
done
I'd prefer a solution compatible with zsh, which is what I use.

bash
Slightly modified from the OP's answer:
#!/usr/bin/env bash
set -x # So you can see what's happening - feel free to omit
company_a_files=(company_a*.py) # <== Save the list of files first
my_array=(company_b company_c company_d)
for i in "${my_array[@]}"
do
for file in "${company_a_files[@]}" # <== Use the saved list
do
cp "$file" "${file/company_a/$i}"
done
done
When the inner loop in the OP's answer runs for file in *.py, the glob will pick up whatever company_b &c. files have already been created. So you wind up with a lot of set -x output like:
+ cp company_b_1.py company_b_1.py
cp: 'company_b_1.py' and 'company_b_1.py' are the same file
Instead, save the glob of company_a files into a shell array first, and then
loop over that array.
perl
As a one-liner for Perl 5.14+:
perl -MFile::Copy=copy -E 'for my $file (@ARGV) { copy $file, $file =~ s/company_a/$_/r foreach qw(company_b company_c company_d) }' company_a*.py
The Perl version switches the loop order compared to the bash version. For each file given on the command line (the for ... @ARGV), it copies from that file to each name-modified file in turn (the foreach).
$file =~ s/company_a/$_/r is a non-destructive (/r) replace in $file (the filename) that changes company_a to $_ (the current value from foreach).

This was the solution I came up with:
#!/usr/bin/env bash
my_array=(company_b company_c company_d)
for i in "${my_array[@]}"
do
for file in *.py
do
cp "$file" "${file/company_a/$i}"
done
done

How do I rename multiple files before the extension in linux?

I want to take a group of files with names like 123456_1_2.mpg and turn it into 123456.mpg how can I do this using terminal commands?

To loop over all the available files you can use a for loop over the file names of the form ??????_?_?.mpg.
To rename the files, you can strip the longest match of a pattern from the end of the string using ${MYVAR%%pattern}, without calling any external command.
This said, your code should look like:
#!/bin/bash
shopt -s nullglob # do nothing if no matches found
for file in ??????_?_?.mpg; do
[[ -f $file ]] || continue # skip if not a regular file
new_file="${file%%_*}.mpg" # compose the new file name
echo mv "$file" "$new_file" # remove echo after testing
done
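The difference between the % and %% operators is easy to check on a single name. The snippet below is a small sketch; the file name is just an example in the question's format.

```shell
#!/bin/bash
# Illustration only: a made-up file name in the question's format.
file="123456_1_2.mpg"

longest="${file%%_*}"    # strip the longest suffix matching _*
shortest="${file%_*}"    # strip the shortest suffix matching _*
new_file="${longest}.mpg"

echo "longest match removed:  $longest"
echo "shortest match removed: $shortest"
echo "new name:               $new_file"
```

With %% everything from the first underscore onward is dropped, which is what the rename needs; a single % would only drop the last underscore-delimited part.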

rename 's/_.*/.mpg/' *mpg
This replaces everything from the first underscore to the end of the name with .mpg, for all files ending in mpg. It relies on the Perl-based rename command, which takes a Perl expression as its first argument.

We can use grep to strip out everything but the first sequence of numbers. The --interactive flag will ask you if you're sure for each move, so you can make sure it's not doing anything you don't expect.
for file in *.mpg; do
mv --interactive "$file" "$(grep -o '^[0-9]\+' <<< "$file")".mpg
done
The regex ^[0-9]\+ translates to "one or more digits at the very start of the string".
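If starting grep once per file is a concern, bash's own regex matching can extract the same leading digits. This is a swapped-in alternative to the grep call, not the answerer's method; the sample file name is only an illustration, and the echo keeps it a dry run.

```shell
#!/bin/bash
# Illustration only: a made-up file name in the question's format.
file="123456_1_2.mpg"

# [[ string =~ regex ]] stores the matched text in BASH_REMATCH[0].
if [[ $file =~ ^[0-9]+ ]]; then
    prefix=${BASH_REMATCH[0]}
    echo mv --interactive "$file" "$prefix.mpg"   # remove echo after testing
fi
```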

grep files based on name prefixes

I have a question on how to approach a problem I've been trying to tackle at multiple points over the past month. The scenario is like so:
I have a base directory with multiple sub-directories all following the same sub-directory format:
A/{B1,B2,B3} where all B* have a pipeline/results/ directory structure under them.
All of these results directories have multiple *.xyz files in them. These *.xyz files have a certain hierarchy based on their naming prefixes. The naming prefixes in turn depend on how far they've been processed. They could be, for example, select.xyz, select.copy.xyz, and select.copy.paste.xyz, where the operations are select, copy and paste. What I wish to do is write a ls | grep or a find that picks these files based on their processing levels.
EDIT:
The processing pipeline goes select -> copy -> paste. The "most processed" file would be the one with the most of those stages as prefixes in its filename. i.e. select.copy.paste.xyz is more processed than select.copy, which in turn is more processed than select.xyz
For example, let's say
B1/pipeline/results/ has select.xyz and select.copy.xyz,
B2/pipeline/results/ has select.xyz
B3/pipeline/results/ has select.xyz, select.copy.xyz, and select.copy.paste.xyz
How can I write a ls | grep/find that picks the most processed file from each subdirectory? This should give me B1/pipeline/results/select.copy.xyz, B2/pipeline/results/select.xyz and B3/pipeline/results/select.copy.paste.xyz.
Any pointer on how I can think about an approach would help. Thank you!

For this answer, we will ignore the upper part A/B{1,2,3} of the directory structure. All files in some .../pipeline/results/ directory will be considered, even if the directory is A/B1/doNotIncludeMe/forbidden/pipeline/results. We assume that the file extension xyz is constant.
A simple solution would be to loop over the directories and check whether the files exist from back to front. That is, check if select.copy.paste.xyz exists first. In case the file does not exist, check if select.copy.xyz exists and so on. A script for this could look like the following:
#! /bin/bash
# print paths of the most processed files
shopt -s globstar nullglob
for d in **/pipeline/results; do
if [ -f "$d/select.copy.paste.xyz" ]; then
echo "$d/select.copy.paste.xyz"
elif [ -f "$d/select.copy.xyz" ]; then
echo "$d/select.copy.xyz"
elif [ -f "$d/select.xyz" ]; then
echo "$d/select.xyz"
else
: # there is no file at all (the no-op keeps the else branch syntactically valid)
fi
done
It does the job, but is not very nice. We can do better!
#! /bin/bash
# print paths of the most processed files
shopt -s globstar nullglob
for dir in **/pipeline/results; do
for file in "$dir"/select{.copy{.paste,},}.xyz; do
[ -f "$file" ] && echo "$file" && break
done
done
The second script does exactly the same thing as the first one, but is easier to maintain, adapt, and so on. Both scripts work with file and directory names that contain spaces or even newlines.
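The second script depends on the order in which the nested braces expand. A one-off check like this sketch shows that the most processed name comes first:

```shell
#!/bin/bash
# Brace expansion is purely textual, so no files need to exist for this check.
candidates=(select{.copy{.paste,},}.xyz)
order="${candidates[*]}"
printf '%s\n' "${candidates[@]}"
```

Because the inner loop breaks at the first existing file, this ordering is what makes it pick the most processed one.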
In case you don't have whitespace in your paths, the following (hacky, but loop-free) script can also be used.
#! /bin/bash
# print paths of the most processed files
shopt -s globstar nullglob
files=(**/pipeline/results/select{.copy{.paste,},}.xyz)
printf '%s\n' "${files[@]}" | sed -r 's#(.*/)#\1 #' | sort -usk1,1 | tr -d ' '
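To follow what the pipeline does, it can be run on a hand-written list. The paths below are hypothetical and typed in the order the glob would produce them (all select.copy.paste.xyz matches first, then select.copy.xyz, then select.xyz); sed -E is used here as the equivalent of GNU sed's -r.

```shell
#!/bin/bash
# Illustration only: hand-written paths standing in for the glob result.
files=(
    B3/pipeline/results/select.copy.paste.xyz
    B1/pipeline/results/select.copy.xyz
    B3/pipeline/results/select.copy.xyz
    B1/pipeline/results/select.xyz
    B2/pipeline/results/select.xyz
    B3/pipeline/results/select.xyz
)

# 1. sed puts a space after the last slash, making the directory field 1.
# 2. sort -usk1,1 sorts on that field only, stably, keeping the first
#    line seen for each directory (the most processed file).
# 3. tr removes the helper space again.
picked=$(printf '%s\n' "${files[@]}" | sed -E 's#(.*/)#\1 #' | sort -usk1,1 | tr -d ' ')
echo "$picked"
```

The final tr step is also why this variant cannot cope with spaces in paths: it would delete them along with the helper space.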

Do not start loop if there is no files in directory?

All,
I am running BASH in Solaris 10
I have the following shell script that loops in a directory depending on the presence of CSV files.
The problem with this piece of code is that it still runs the loop once even if there are no CSV files in that directory, and then calls SQL loader.
SQLLoader then produces a log file because there is no file to process and this is beginning to mess up my directory filling it with log files.
for file in *.csv ;
do
echo "SQLLoader is reading : " $file
sqlldr <User>/<Password>@<DBURL>:<PORT>/<SID> control=sqlloader.ctl log=$inbox/$file.log data=$inbox/$file
done
How do I stop it from entering the loop if there are no CSV files in the $inbox directory?

Say:
shopt -s nullglob
before your for loop.
nullglob is off by default, so when no files match, for file in *.csv leaves the pattern unexpanded and the loop runs once with file set to the literal string *.csv.
Quoting from the documentation:
nullglob
If set, Bash allows filename patterns which match no files to expand to a null
string, rather than themselves.
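The effect can be checked on an empty directory. This is a minimal sketch; the scratch directory merely stands in for $inbox.

```shell
#!/bin/bash
# Illustration only: an empty scratch directory stands in for $inbox.
tmpdir=$(mktemp -d)

shopt -u nullglob
without=0
for file in "$tmpdir"/*.csv; do   # runs once, with the unexpanded pattern
    without=$((without + 1))
done

shopt -s nullglob
with=0
for file in "$tmpdir"/*.csv; do   # pattern expands to nothing, body is skipped
    with=$((with + 1))
done

echo "iterations without nullglob: $without"
echo "iterations with nullglob:    $with"
rm -rf "$tmpdir"
```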

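Use find to search for the files:
for file in `find . -name "*.csv"` ;
Note that find also descends into subdirectories, and that looping over its output this way breaks on file names containing whitespace.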

First off, using nullglob is the correct answer if it is available. However, a POSIX-compliant option is available.
The pattern will be treated as literal text if there are no matches. You can catch this with a small hack:
for file in *.csv; do
[ -f "$file" ] || break
...
done
When there are no matches, file will be set to the literal string *.csv, which is not the name of a file, so -f "$file" will fail. Otherwise, file will be set in turn to the name of each file matching the pattern, and -f "$file" will succeed every time. Note this will work even if there is a file named *.csv. The drawback is that you have to make a redundant test for each existing file.
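A small sketch of that guard, run once on an empty directory and once after adding files, might look like this. The count_csv helper, the scratch directory and the file names are all hypothetical.

```shell
#!/bin/sh
# Illustration only: count_csv, the scratch directory and the file names
# are hypothetical.
tmpdir=$(mktemp -d)

count_csv() {
    n=0
    for file in "$1"/*.csv; do
        [ -f "$file" ] || break   # an unmatched pattern stays literal and is not a file
        n=$((n + 1))
    done
    echo "$n"
}

empty_count=$(count_csv "$tmpdir")
touch "$tmpdir/a.csv" "$tmpdir/b.csv"
full_count=$(count_csv "$tmpdir")

echo "empty directory: $empty_count, with files: $full_count"
rm -rf "$tmpdir"
```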

Shell script to execute executable over numerous files

Hi, I have an executable that sorts some code and reformats it. I have over 200 files to apply it to, with incremental names run001, run002, etc. Is there a quick way to write a shell script to run the executable over all the files? The executable creates a new file called run001an, etc., so just running over all files containing run doesn't work. How do I increment the file number?
Cheers

How about:
for i in ./run*; do
process_the_file "$i"
done
This is valid in Bash and Ksh.

To be more specific with run### files you can have
for file in dir/run[0-9][0-9][0-9]; do
do_something "$file"
done
dir can simply be . or any other directory. If the directory name contains spaces, put double quotes around the directory part only, so that the pattern after it still expands.
In bash, you can use extended patterns to match any number of digits, not just three:
shopt -s extglob
for file in dir/run+([0-9]); do
do_something "$file"
done
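To check that the extended pattern skips the generated run001an style outputs, a throwaway test like the following can be used. It is a sketch only; the scratch directory and file names are invented for the purpose.

```shell
#!/bin/bash
# Illustration only: the scratch directory and file names are hypothetical.
# extglob must be enabled before the line using +(...) is parsed.
shopt -s extglob nullglob
tmpdir=$(mktemp -d)
touch "$tmpdir"/run001 "$tmpdir"/run002 "$tmpdir"/run1234 "$tmpdir"/run001an

inputs=()
for file in "$tmpdir"/run+([0-9]); do
    inputs+=("${file##*/}")   # keep only the file name
done
input_list="${inputs[*]}"
echo "$input_list"
rm -rf "$tmpdir"
```

Only names consisting of run followed by digits are picked up, so outputs written next to the inputs are not processed again.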
