How Can I Loop Edit Multiple Files in Bash script? - bash

I have 40 csv files that I need to edit. 20 have matching format and the names only differ by one character, e.g., docA.csv, docB.csv, etc. The other 20 also match and are named pair_docA.csv, pair_docB.csv, etc.
I have the code written to edit and combine docA.csv and pair_docA.csv, but I'm struggling to write a loop that calls both of the above files, edits them, combines them under the name combinedA.csv, and then goes on to the next pair.
Can anyone help my rudimentary bash scripting? Here's what I have thus far. I've tried in a single for loop, and now I'm trying in 2 (probably 3) for loops. I'd prefer to keep it in a single loop.
set -x
DIR=/path/to/file/location
for file in `ls $DIR/doc?.csv`
do
#code to edit the doc*.csv files ie $file
done
for pairdoc in `ls $DIR/pair_doc?.csv`
do
#code to edit the pair_doc*.csv files ie $pairdoc
done
#still need to combine the files. I have the join written for a single iteration,
#but how do I loop the code to save each join as a different file corresponding
#to combined*.csv

Something along these lines:
#!/bin/bash
dir=/path/to/file/location
cd "$dir" || exit
for file in doc?.csv; do
pair=pair_$file
# "${file#doc}" deletes the prefix "doc"
combined=combined_${file#doc}
cat "$file" "$pair" >> "$combined"
done
ls, on principle, shouldn't be used in a shell script in order to iterate over the files. It is intended to be used interactively and nearly never needed within a script. Also, all-capitalized variable names shouldn't be used as ordinary variables, since they may collide with internal shell variables or environment variables.
Below is a version without changing the directory.
#!/bin/bash
dir=/path/to/file/location
for file in "$dir/"doc?.csv; do
basename=${file#"$dir/"}
pair=$dir/pair_$basename
combined=$dir/combined_${basename#doc}
cat "$file" "$pair" >> "$combined"
done
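Since the question asks for editing and combining in a single loop, the pieces above could be put together roughly as follows. This is only a sketch: edit_csv is a hypothetical placeholder for whatever edit you already have (here it just passes the file through), and combine_pairs is a name made up for illustration. It skips any doc file whose pair is missing and writes combinedA.csv, combinedB.csv, and so on with > rather than >>, so rerunning it does not append a second copy.

```shell
#!/bin/bash

# Hypothetical stand-in for the per-file edit step; replace the body
# with your own sed/awk/cut command. Here it passes the file through.
edit_csv() {
    cat -- "$1"
}

# combine_pairs DIR
# For each DIR/docX.csv that has a DIR/pair_docX.csv, write the edited
# contents of both to DIR/combinedX.csv.
combine_pairs() {
    local dir=$1 file name pair
    for file in "$dir"/doc?.csv; do
        [ -e "$file" ] || continue          # glob matched nothing
        name=${file##*/}                    # e.g. docA.csv
        pair=$dir/pair_$name
        if [ ! -e "$pair" ]; then
            printf 'skipping %s: %s not found\n' "$file" "$pair" >&2
            continue
        fi
        # ${name#doc} strips the "doc" prefix, leaving e.g. A.csv
        { edit_csv "$file"; edit_csv "$pair"; } > "$dir/combined${name#doc}"
    done
}
```

It would be called as combine_pairs /path/to/file/location.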

This might work for you (GNU parallel):
parallel cat {1} {2} \> join_{1}_{2} ::: doc{A..T}.csv :::+ pair_doc{A..T}.csv
Replace cat with your chosen command; {1} stands for each docX.csv file and {2} for the matching pair_docX.csv file.
N.B. X stands for the letters A through T.

Related

Cycle through a list of terms and batch rename files

EDIT: In the course of working on and reediting this question, I was able to get this to work. However, I'm sure there's a better way to do it, so I'm leaving it up to hear from those more experienced.
Periodically I need to reproduce several dozen copies of a few files. For example, given:
company_a_results_30d.py
company_a_results_90d.py
company_a_results_120d.py
company_a_results_all_time.py
I need to make copies where company_a is replaced with company_b, company_c....etc. (The next step is to find and replace a number of terms within the files, but this I have managed to do with a perl script.)
I'm sure this should be possible with a bash script and mv, but I haven't quite got the hang of it. Something like:
#!/usr/bin/env bash
my_array=(company_b company_c company_d)
for i in "${my_array[@]}"
do
for file in *.py
do
cp "$file" "${file/company_a/$i}"
done
done
I'd prefer a solution compatible with zsh, which is what I use.
bash
Slightly modified from the OP's answer:
#!/usr/bin/env bash
set -x # So you can see what's happening - feel free to omit
company_a_files=(company_a*.py) # <== Save the list of files first
my_array=(company_b company_c company_d)
for i in "${my_array[@]}"
do
for file in "${company_a_files[@]}" # <== Use the saved list
do
cp "$file" "${file/company_a/$i}"
done
done
When the inner loop in the OP's answer runs for file in *.py, the glob also picks up whatever company_b, company_c, etc. files have already been created. So you wind up with a lot of set -x output like:
+ cp company_b_1.py company_b_1.py
cp: 'company_b_1.py' and 'company_b_1.py' are the same file
Instead, save the glob of company_a files into a shell array first, and then
loop over that array.
perl
As a one-liner for Perl 5.14+:
perl -MFile::Copy=copy -E 'for my $file (@ARGV) { copy $file, $file =~ s/company_a/$_/r foreach qw(company_b company_c company_d) }' company_a*.py
The Perl version switches the loop order compared to the bash version. For each file given on the command line (the for ... @ARGV), it copies from that file to each name-modified file in turn (the foreach).
$file =~ s/company_a/$_/r is a non-destructive (/r) replace in $file (the filename) that changes company_a to $_ (the current value from foreach).
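The same loop order can be used in the shell. Because the glob in the outer for is expanded once, before any copies exist, nothing has to be saved in an array first. A minimal sketch, assuming every source file name starts with the prefix being replaced; copy_for_companies is a hypothetical helper name, and the new name is built with ${file#prefix} rather than ${file/old/new}:

```shell
#!/bin/bash

# copy_for_companies PREFIX NEW_PREFIX...
# For each PREFIX*.py in the current directory, make one copy per
# NEW_PREFIX with the leading PREFIX swapped out.
copy_for_companies() {
    local prefix=$1 file company
    shift
    for file in "$prefix"*.py; do           # expanded once, up front
        [ -e "$file" ] || continue          # no matching files
        for company in "$@"; do
            # ${file#"$prefix"} is the name with the prefix removed
            cp -- "$file" "$company${file#"$prefix"}"
        done
    done
}
```

For the files in the question it would be called as copy_for_companies company_a company_b company_c company_d.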
This was the solution I came up with:
#!/usr/bin/env bash
my_array=(company_b company_c company_d)
for i in "${my_array[@]}"
do
for file in *.py
do
cp "$file" "${file/company_a/$i}"
done
done

Loop over filenames, rename them via a condition

How do I loop over separate filenames and rename them?
The "task/condition" is:
Cut the first 5 letters and the last 4 letters?
e.g. I have these files:
1212erertugg.jpg
14rtzuzuiopo.jpg
tz7878nhmnop.jpg
etc...
The result should look like this:
rertugg
uzuiopo
8nhmnop
Use parameter expansion to extract the substrings:
#!/bin/bash
for file in 1212erertugg.jpg 14rtzuzuiopo.jpg tz7878nhmnop.jpg ; do
substr=${file:5}
substr=${substr:0:-4}
mv "$file" "$substr"
done
You might need to check that you're not overwriting an existing file, either an original one or one created by the script itself in a previous iteration.
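Such a check might look like the sketch below. It loops over every .jpg in a directory instead of a hard-coded list and refuses to overwrite an existing file. rename_trimmed is a hypothetical name, and the trimming uses the portable ${var#?????} and ${var%????} forms, which drop five leading and four trailing characters just like the substring expansions above.

```shell
#!/bin/bash

# rename_trimmed [DIR]
# Strip the first 5 and last 4 characters from every *.jpg name in DIR
# (default: current directory), never overwriting an existing file.
# The body runs in a subshell so the cd does not affect the caller.
rename_trimmed() (
    cd "${1:-.}" || exit 1
    for file in *.jpg; do
        [ -e "$file" ] || continue          # no .jpg files at all
        # Need at least one character left after removing 5 + 4.
        [ "${#file}" -gt 9 ] || continue
        new=${file#?????}                   # drop the first 5 characters
        new=${new%????}                     # drop the last 4 (".jpg")
        if [ -e "$new" ]; then
            printf 'not renaming %s: %s already exists\n' "$file" "$new" >&2
            continue
        fi
        mv -- "$file" "$new"
    done
)
```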

Bash for loop testing two boolean expressions

Below is a simple bash program. It takes file types as command line arguments and it queries the current directory and prints the files of the type specified.
I would like to be able to query two different file types and therefore need two boolean expressions to represent this.
Below is my code for querying just one file type
#!/bin/bash
for x in $(ls *$1); do
echo $x;
done
Now what I would like to be able to do is (in pseudocode)
command line args fileName .sh .c
for x in (current directory files of *.sh) OR (in current directory files of *.c) do
print .sh files
print.c files
done
I've tried using || and I get syntax errors. I cannot find any evidence that || can be used to combine two expressions in a for loop.
I've tried using two nested for loops but they do not work and yield errors.
Is there any way I can accomplish this using the same for loop system?
Thank you.
Sounds like you want something like:
for extension in "$@"; do
printf 'Files ending in %s:\n' "$extension"
printf '%s\n' *"$extension"
done
This loops through all arguments passed to the script and, for each extension, prints every file ending in it, one per line.
Note that printf is a much more useful tool than echo, as it allows you to control the format of each thing it prints.
ls doesn't do anything useful here either; it is the shell which expands the * to the list of files matching the pattern.
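One caveat: when nothing matches, bash by default leaves the pattern unexpanded, so printf would print the literal pattern (for example *.py). A sketch that guards against this by testing each name before printing; list_by_extension is a hypothetical function name:

```shell
#!/bin/bash

# list_by_extension EXT...
# Print the files in the current directory ending in each EXT.
list_by_extension() {
    local extension file
    for extension in "$@"; do
        printf 'Files ending in %s:\n' "$extension"
        for file in *"$extension"; do
            # An unmatched pattern stays literal; skip it.
            [ -e "$file" ] || continue
            printf '%s\n' "$file"
        done
    done
}
```

In a script, ending the file with list_by_extension "$@" would let it be run as ./script .sh .c.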

Output filename from input in bash

I have this script:
#!/bin/bash
FASTQFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fastq
FASTAFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fasta
clear
for file in $FASTQFILES
do cat $FASTQFILES | perl -e '$i=0;while(<>){if(/^\@/&&$i==0){s/^\@/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' > ~/Programs/ncbi-blast-2.2.29+/DB_files/"${FASTQFILES%.*}.fasta"
mv $FASTAFILES ~/Programs/ncbi-blast-2.2.29+/db/
done
I'm trying to get it to grab the files defined in $FASTQFILES, do the .fastq to .fasta conversion, give the output the same filename as the input, and move it to a new folder. E.g., ~/./DB_files/HELLO.fastq should give a converted ~/./db/HELLO.fasta
The problem is that the output of the conversion is a properly formatted hidden file called .fasta in the first folder instead of the expected one named HELLO.fasta. So there is nothing to mv. I think I'm messing up in the ${FASTQFILES%.*}.fasta argument but I can't seem to fix it.
I see three problems:
One part of your trouble is that you use cat $FASTQFILES instead of cat $file, so every iteration reads all the files instead of just the current one.
You also need to fix the I/O redirection at the end of that line so that it is built from $file, not $FASTQFILES. Since $file already holds the full path (the tilde is expanded when FASTQFILES is assigned), the redirection becomes > "${file%.fastq}.fasta".
The mv command needs to be executed outside the loop.
In fact, when processing a single file at a time, you don't need cat at all (UUOC: Useless Use Of Cat). Simply provide "$file" as an argument to the Perl script.
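Putting these fixes together might look like the sketch below. To keep it self-contained it swaps the Perl one-liner for a short awk filter that assumes every FASTQ record is exactly four lines, and it writes each result straight into the destination directory, so no mv is needed. The function names are hypothetical.

```shell
#!/bin/bash

# Read FASTQ on stdin, write FASTA on stdout.
# Assumes strict 4-line records: @header, sequence, +, quality.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { sub(/^@/, ">"); print }
         NR % 4 == 2 { print }'
}

# convert_all SRC_DIR DEST_DIR
# Convert every SRC_DIR/NAME.fastq to DEST_DIR/NAME.fasta.
convert_all() {
    local src=$1 dest=$2 file name
    for file in "$src"/*.fastq; do
        [ -e "$file" ] || continue          # no .fastq files
        name=${file##*/}                    # strip the directory
        # ${name%.fastq} removes the extension from this one file name
        fastq_to_fasta < "$file" > "$dest/${name%.fastq}.fasta"
    done
}
```

With the paths from the question it would be called as convert_all ~/Programs/ncbi-blast-2.2.29+/DB_files ~/Programs/ncbi-blast-2.2.29+/db.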

Shell script to execute executable over numerous files

Hi, I have an executable that sorts some code and reformats it. I have over 200 files to apply it to, with incremental names run001, run002, etc. Is there a quick way to write a shell script to execute this file over all of them? The executable creates a new file called run001an etc., so just running over all files containing run doesn't work. How do I increment the file number?
Cheers
How about:
for i in ./run*; do
process_the_file "$i"
done
which is valid Bash/Ksh.
To be more specific with run### files you can have
for file in dir/run[0-9][0-9][0-9]; do
do_something "$file"
done
dir can simply be . or any other directory. If the directory name contains spaces, wrap only the directory part in double quotes and leave the pattern unquoted, e.g. "my dir"/run[0-9][0-9][0-9].
In bash, you can use extended patterns to match any number of digits, not just three:
shopt -s extglob
for file in dir/run+([0-9]); do
do_something "$file"
done
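Where extglob is not available, the same filter can be written with a case statement, which also keeps the run001an output files out of the loop. A sketch; run_inputs is a hypothetical helper that only prints the matching paths so they can be fed to the real command:

```shell
#!/bin/bash

# run_inputs DIR
# Print DIR/runNNN paths whose names are "run" followed only by digits.
run_inputs() {
    local dir=$1 file suffix
    for file in "$dir"/run*; do
        suffix=${file##*/run}               # text after "run" in the name
        case $suffix in
            '' | *[!0-9]*) continue ;;      # empty or contains a non-digit
        esac
        printf '%s\n' "$file"
    done
}
```

The output can then be consumed with, for example, run_inputs dir | while IFS= read -r file; do do_something "$file"; done.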
