Naming an output file after the input directory - bash

I'm working with some files that are organized within a folder (named RAW) that contains several other folders with different names, all of them containing files ending with a string like _1 or _2 before the extension (.fq.gz in this case). Below is a sketch of the layout for guidance.
RAW/
    FOLDER1/
        FILE_qwer_1.fq.gz
        FILE_qwer_2.fq.gz
    FOLDER2/
        FILE_tyui_1.fq.gz
        FILE_tyui_2.fq.gz
    OTHER1/
        FILE_asdf_1.fq.gz
        FILE_asdf_2.fq.gz
    ...
So I am basically running a loop over all those directories under RAW, and for each of them running a script that will create an output file, say out.
What I'm trying to accomplish is to name that out file after the folder it belongs to under $RAW (e.g. FOLDER1.eg after processing FILE_qwer_1.fq.gz and FILE_qwer_2.fq.gz above).
The loop below actually works, but as you can imagine, it depends on how many directory levels I am working below the root /, because the -f option of the cut command is hard-coded.
for file1 in ${RAW}/*/*_1.fq.gz; do
    file2="${file1/_1/_2}"
    out="$(echo $file1 | cut -d '/' -f2)"
    bash script_to_be_run.sh $file1 $file2 $out
done
Ideally, the variable out should be named after whatever the first * of the glob matched, followed by a custom extension (e.g. FOLDER1.eg in the first iteration), but I do not really know how to do that, nor whether it is possible.

You can use ${var#prefix} to remove a prefix from the start of a variable.
for file1 in ${RAW}/*/*_1.fq.gz; do
    file2="${file1/_1/_2}"
    out="$(dirname "${file1#$RAW/}")"  # cuts $RAW/ off the beginning, keeps the directory name
    bash script_to_be_run.sh "$file1" "$file2" "$out"
done
(It's a good idea to quote variable expansions in case they contain spaces or other special characters: "$file1" is safer than $file1.)
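If you also want the custom extension from the question appended, you can tack it on in the same assignment; a minimal sketch, assuming the .eg extension from the example above:
out="$(dirname "${file1#$RAW/}").eg"  # RAW/FOLDER1/FILE_qwer_1.fq.gz -> FOLDER1.eg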

Related

Sed function in shell applied to all .gff files in a directory

I am working with .gff3 files, trying to remove the contig sequences at the bottom of many files in a directory. The contig sequences are separated from the rest of the file by a ##FASTA line, and I wish to delete everything below it (DNA sequences in FASTA format).
This script works for one file:
sed '/^##FASTA$/,$d' file1.gff > file1_altered.gff
But I fail when I try to apply it to all files in a directory like this:
for F in directory/input/*; do
    N=$(basename $F) sed '/^##FASTA$/,$d' ${F} > directory/output/$N.gff
done
Any help appreciated!
You are missing a semicolon after N=$(basename $F). The way it is written, it is only a one-shot assignment that applies to the sed command's environment, i.e. N is empty when used in the redirection.
You can avoid using basename entirely if you use the shell's builtin string processing: ${F##*/} removes the longest left part matching */.
for F in directory/input/*; do
    sed '/^##FASTA$/,$d' "${F}" > "directory/output/${F##*/}.gff"
done
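For comparison, the original basename approach also works once the assignment is terminated with a semicolon, so that N is set before the command line is evaluated:
for F in directory/input/*; do
    N=$(basename "$F"); sed '/^##FASTA$/,$d' "$F" > "directory/output/$N.gff"
done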

Comparing two directories to produce output

I am writing a Bash script that will replace the files in folder A (source) with those in folder B (target). But before this happens, I want to record two files:
The first file will contain a list of files in folder B that are newer than in folder A, along with files that are different or orphaned in folder B relative to folder A.
The second file will contain a list of files in folder A that are newer than in folder B, along with files that are different or orphaned in folder A relative to folder B.
How do I accomplish this in Bash? I've tried using diff -qr, but it yields the following output:
Files old/VERSION and new/VERSION differ
Files old/conf/mime.conf and new/conf/mime.conf differ
Only in new/data/pages: playground
Files old/doku.php and new/doku.php differ
Files old/inc/auth.php and new/inc/auth.php differ
Files old/inc/lang/no/lang.php and new/inc/lang/no/lang.php differ
Files old/lib/plugins/acl/remote.php and new/lib/plugins/acl/remote.php differ
Files old/lib/plugins/authplain/auth.php and new/lib/plugins/authplain/auth.php differ
Files old/lib/plugins/usermanager/admin.php and new/lib/plugins/usermanager/admin.php differ
I've also tried this:
(rsync -rcn --out-format="%n" old/ new/ && rsync -rcn --out-format="%n" new/ old/) | sort | uniq
but it doesn't give me the scope of results I require. The struggle here is that the data isn't in the correct format; I just want files, not directories, to show up in the text files, e.g.:
conf/mime.conf
data/pages/playground/
data/pages/playground/playground.txt
doku.php
inc/auth.php
inc/lang/no/lang.php
lib/plugins/acl/remote.php
lib/plugins/authplain/auth.php
lib/plugins/usermanager/admin.php
List of files in directory B (new/) that are newer than directory A (old/):
find new -newermm old
This merely runs find over new/, filtering with -newerXY where X and Y are both set to m (modification time) and the reference is the old directory itself.
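To capture that list in a file, and to restrict it to regular files only (as the question asks), something like this should work; the output file name is just an example:
find new -type f -newermm old > newer-in-new.txt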
Files that are missing in directory B (new/) but are present in directory A (old/):
A=old B=new
diff -u <(find "$B" |sed "s:$B::") <(find "$A" |sed "s:$A::") \
|sed "/^+\//!d; s::$A/:"
This sets variables $A and $B to your target directories, then runs a unified diff on their contents (using process substitution to list each tree with find and strip the directory name with sed so diff isn't confused). The final sed command matches the additions (lines starting with +/), replaces that +/ with the directory name and a slash (the empty s:: pattern reuses the previous regex), and deletes every other line.
Here is a bash script that will create the file:
#!/bin/bash
# Usage: bash script.bash OLD_DIR NEW_DIR [OUTPUT_FILE]
# compare given directories
if [ -n "$3" ]; then  # the optional 3rd argument is the output file
    OUTPUT="$3"
else  # if it isn't provided, build a name, escaping path slashes to underscores
    OUTPUT="${2////_}-newer-than-${1////_}"
fi
{
    find "$2" -newermm "$1"
    diff -u <(find "$2" |sed "s:$2::") <(find "$1" |sed "s:$1::") \
        |sed "/^+\//!d; s::$1/:"
} |sort > "$OUTPUT"
First, this determines the output file, which either comes from the third argument or else is built from the other two, converting slashes to underscores in case they are paths. For example, running bash script.bash /usr/local/bin /usr/bin would write its file list to _usr_bin-newer-than-_usr_local_bin in the current working directory.
This combines the two commands and then ensures they are sorted. There won't be any duplicates, so you don't need to worry about that (if there were, you'd use sort -u).
You can get your first and second files by changing the order of arguments as you invoke this script.
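So, using the question's directories, both requested files could be generated as follows (the output file names are just examples):
bash script.bash old new first-list.txt
bash script.bash new old second-list.txt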

How do I write a bash script to copy files into a new folder based on name?

I have a folder filled with ~300 files. They are named in this form username@mail.com.pdf. I have a list of usernames saved in a file called names.txt, one username per line, and I need about 40 of the files; I would like to copy the ones I need into a new folder.
The file names.txt contains only the username on each line (e.g. eternalmothra), while the PDF file I want to copy over is named eternalmothra@mail.com.pdf.
while read p; do
    ls | grep $p > file_names.txt
done <names.txt
This seems like it should read from the list and, for each line, turn username into username@mail.com.pdf. Unfortunately, it seems like only the last one is saved to file_names.txt.
The second part of this is to copy all the files over:
while read p; do
    mv $p foldername
done <file_names.txt
(I haven't tried that second part yet because the first part isn't working).
I'm doing all this with Cygwin, by the way.
1) What is wrong with the first script that it won't copy everything over?
2) If I get that to work, will the second script correctly copy them over? (Actually, I think it's preferable if they just get copied, not moved over).
Edit:
I would like to add that I figured out how to read lines from a txt file from here: Looping through content of a file in bash
Solution from a comment: your problem is just that echo a > b overwrites the file, while echo a >> b appends to it, so replace
ls | grep $p > file_names.txt
with
ls | grep $p >> file_names.txt
There might be more efficient solutions if the task runs every day, but for a one-shot over 300 files your script is fine.
Assuming you don't have file names with newlines in them (in which case your original approach would not have a chance of working anyway), try this.
printf '%s\n' * | grep -f names.txt | xargs cp -t foldername
The printf is necessary to work around the various issues with ls; passing the list of all the file names to grep in one go produces a list of all the matches, one per line; and passing that to xargs cp performs the copying. (To move instead of copy, use mv instead of cp, obviously; both support the -t option so as to make it convenient to run them under xargs.) The function of xargs is to convert standard input into arguments to the program you run as the argument to xargs.
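The target folder has to exist first, and prefixing the command with echo gives a dry run (this assumes GNU xargs and cp, since -t is a GNU extension):
mkdir -p foldername
printf '%s\n' * | grep -f names.txt | xargs echo cp -t foldername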

Write a shell script that replaces multiple strings in multiple files

I need to search through many files in a directory for a list of keywords and add a prefix to all of them. For example, if various files in my directory contained the terms foo, bar, and baz, I would need to change all instances of these terms to: prefix_foo, prefix_bar, and prefix_baz.
I'd like to write a shell script to do this so I can avoid doing the search one keyword at a time in SublimeText (there are a lot of them). Unfortunately, my shell-fu is not that strong.
So far, following this advice, I have created a file called "replace.sed" with all of the terms formatted like this:
s/foo/prefix_foo/g
s/bar/prefix_bar/g
s/baz/prefix_baz/g
The terminal command it suggests to use with this list is:
sed -f replace.sed < old.txt > new.txt
I was able to adapt this to replace instances within the file (instead of creating a new file) by setting up the following script, which I called inline.sh:
#!/bin/sh -e
in=${1?No input file specified}  # first argument: the file to edit in place
mv "$in" "${bak=.$in.bak}"       # keep the original as a hidden .bak file
shift
"$@" < "$bak" > "$in"            # run the remaining arguments as the filter command
Putting it all together, I ended up with this command:
~/inline.sh old.txt sed -f replace.sed
I tried this and it works for one file at a time. How would I adapt it to search and replace through all of the files in my entire directory?
for f in *; do
    [[ -f "$f" ]] && ~/inline.sh "$f" sed -f ~/replace.sed
done
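If your sed is GNU sed, its -i option can do the in-place edit and the backup itself, so the helper script isn't needed; a sketch using the same replace.sed and the same .bak suffix idea:
for f in *; do
    [[ -f "$f" ]] && sed -i.bak -f ~/replace.sed "$f"
done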
In a script:
#!/bin/bash
files=$(ls -1 your_directory | grep -E keyword)
for i in $files; do
    cp "$i" "prefix_$i"
done
This will, of course, leave the originals where they are.

Rename multiple files, but only rename part of the filename in Bash

I know how I can rename files and such, but I'm having trouble with this.
I only need to rename test-this in a for loop.
test-this.ext
test-this.volume001+02.ext
test-this.volume002+04.ext
test-this.volume003+08.ext
test-this.volume004+16.ext
test-this.volume005+32.ext
test-this.volume006+64.ext
test-this.volume007+78.ext
If you have all of these files in one folder and you're on Linux, you can use:
rename 's/test-this/REPLACESTRING/g' *
The result will be:
REPLACESTRING.ext
REPLACESTRING.volume001+02.ext
REPLACESTRING.volume002+04.ext
...
rename can take a Perl substitution expression as its first argument. The expression here consists of four parts:
s: flag to substitute a string with another string,
test-this: the string you want to replace,
REPLACESTRING: the string you want to replace the search string with, and
g: a flag indicating that all matches of the search string shall be replaced, i.e. if the filename is test-this-abc-test-this.ext the result will be REPLACESTRING-abc-REPLACESTRING.ext.
Refer to man sed for a detailed description of the flags.
Use rename as shown below:
rename test-this foo test-this*
This will replace test-this with foo in the file names. (Note that this is the util-linux rename, whose syntax differs from the Perl rename used above.)
If you don't have rename, use a for loop as shown below:
for i in test-this*
do
    mv "$i" "${i/test-this/foo}"
done
Function
I'm on OSX, which doesn't ship with a rename command. So I created a function in my .bash_profile that takes as its first argument a pattern that should match only once at the start of each file name (whatever comes after it doesn't matter), and replaces it with the text of the second argument.
rename() {
    for i in "$1"*
    do
        mv "$i" "${i/$1/$2}"
    done
}
Input Files
test-this.ext
test-this.volume001+02.ext
test-this.volume002+04.ext
test-this.volume003+08.ext
test-this.volume004+16.ext
test-this.volume005+32.ext
test-this.volume006+64.ext
test-this.volume007+78.ext
Command
rename test-this hello-there
Output
hello-there.ext
hello-there.volume001+02.ext
hello-there.volume002+04.ext
hello-there.volume003+08.ext
hello-there.volume004+16.ext
hello-there.volume005+32.ext
hello-there.volume006+64.ext
hello-there.volume007+78.ext
Without using rename:
find -name test-this\*.ext | sed 'p;s/test-this/replace-that/' | xargs -d '\n' -n 2 mv
The way it works is as follows:
find will, well, find all files matching your criteria. If you pass -name a glob expression, don't forget to escape the *.
Pipe the newline-separated* list of filenames into sed, which will:
a. Print (p) one line.
b. Substitute (s///) test-this with replace-that and print the result.
c. Move on to the next line.
Pipe the newline-separated list of alternating old and new filenames to xargs, which will:
a. Treat newlines as delimiters (-d '\n').
b. Call mv repeatedly with up to 2 (-n 2) arguments each time.
For a dry run, try the following:
find -name test-this\*.ext | sed 'p;s/test-this/replace-that/' | xargs -d '\n' -n 2 echo mv
*: Keep in mind it won't work if your filenames include newlines.
To rename index.htm to index.html:
rename [what you want to rename] [what you want it to be] [match on these files]
rename .htm .html *.htm
This renames index.htm to index.html, and does so for every file matching *.htm in the folder. (This is again the util-linux rename syntax.)
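With the Perl rename from the first answer, the equivalent command anchors the match at the end of the name so that .htm elsewhere in a file name is left alone:
rename 's/\.htm$/.html/' *.htm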
Thanks for your passion and answers. I also found a solution for renaming multiple files in my Linux terminal that directly adds a little counter, which gives me a much better chance of getting good SEO names.
Here is the command:
count=1 ; zmv '(*).jpg' 'new-seo-name--$((count++)).jpg'
I also made a live coding video and published it to YouTube.
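Note that zmv is not a bash command; it is a zsh function that has to be loaded first (autoload -Uz zmv). A plain-bash equivalent of the same renaming, with the pattern taken from the example above, would be:
count=1
for f in *.jpg; do
    mv "$f" "new-seo-name--$((count++)).jpg"
done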
