Renaming files using their content - bash

I have several files which all start with this line:
CREATE PROCEDURE **CHANGING_NAME**
I want to pull the name of the procedure from that line and use it to rename the file. Each file has more content below this first line.
Has anyone done something like this before?
Thanks

Assuming you have all files in one directory:
#!/bin/bash
for i in *.extension
do
    # Assuming the 3rd field of the first line is the new name of the file
    # and **CHANGING_NAME** doesn't contain any spaces or metacharacters
    newname=$(awk 'NR==1 { if (/PROCEDURE/) print $3; exit }' "$i")
    if [ -z "$newname" ]; then
        echo "There is no PROCEDURE in the first line" >&2
        echo "No new name for file $i" >&2
    else
        mv -i -- "$i" "$newname"
    fi
done
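If you'd rather not start awk once per file, here is a minimal pure-bash sketch that reads only the first line with `read`. Like the answer above, it assumes the name is the third whitespace-separated word and contains no spaces; the helper name `procedure_name` is made up for this example, and the loop only prints the `mv` commands.

```shell
#!/bin/bash
# Print the procedure name from the first line of file "$1", or print nothing
# and return non-zero if that line is not "CREATE PROCEDURE <name> ...".
procedure_name() {
    local kw1 kw2 name rest
    # read consumes only the first line; a last line without "\n" still fills the fields
    read -r kw1 kw2 name rest < "$1" || [[ -n $kw1 ]] || return 1
    [[ $kw1 == CREATE && $kw2 == PROCEDURE && -n $name ]] || return 1
    printf '%s\n' "$name"
}

# Dry run: show what would be renamed (remove "echo" to actually rename)
for f in *.extension; do
    [[ -f $f ]] || continue          # also skips the literal pattern if nothing matched
    if new=$(procedure_name "$f"); then
        echo mv -i -- "$f" "$new"
    else
        echo "No PROCEDURE on the first line of $f" >&2
    fi
done
```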

With a lot of care and pretending that the **CHANGING_NAME** is well-formed:
for file in *.files; do mv -i -- "$file" "$(awk '{print $3; exit}' "$file")" ; done
The -i option is there to prevent accidentally overwriting existing files.
This version works with spaces (and many other strange characters except for /):
for file in *.files; do mv -i -- "$file" "$(sed -n '1s/^CREATE PROCEDURE \(.*\)$/\1/p' "$file")"; done
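Because `mv -i` only catches collisions one file at a time (and an empty name just makes `mv` fail), it may be worth checking the whole batch before renaming. A sketch under these assumptions: bash 4+ (for the associative array), a first-line sed extraction like the one above, and a hypothetical helper name `check_targets` that returns non-zero if any name is missing or claimed by more than one file.

```shell
#!/bin/bash
# Check a planned batch rename before running it. For each file argument,
# the target name comes from its first line ("CREATE PROCEDURE <name>").
# Returns non-zero if any target is empty or used by more than one file.
check_targets() {
    local -A seen=()          # target name -> first source file (bash 4+)
    local f target status=0
    for f in "$@"; do
        target=$(sed -n '1s/^CREATE PROCEDURE \(.*\)$/\1/p' "$f")
        if [[ -z $target ]]; then
            echo "no name found in $f" >&2
            status=1
        elif [[ -n ${seen[$target]+set} ]]; then
            echo "$f and ${seen[$target]} both map to '$target'" >&2
            status=1
        else
            seen[$target]=$f
        fi
    done
    return "$status"
}

# Hypothetical usage: only rename if the whole batch is clean
# check_targets *.files && for file in *.files; do ...; done
```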

Since I was never great with awk I might suggest:
#! /bin/bash
#
for i in *.extension
do echo "$i"
# field 1 is CREATE, field 2 is PROCEDURE, field 3 is the name
newname=$(head -n 1 "${i}" | cut -d ' ' -f3)
mv -i -- "${i}" "${newname}"
done
This assumes all files you're looking for have the same extension. If not, and you need the extension, you could use:
#! /bin/bash
#
for i in *
do echo "$i"
ext="${i##*.}"
newname=$(head -n 1 "${i}" | cut -d ' ' -f3)
mv -i -- "${i}" "${newname}"."${ext}"
done
Both assume all the files are in a single directory.
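One caveat with `ext="${i##*.}"`: if a file name contains no dot, the expansion returns the whole name, so a file called `README` would become `NAME.README`. A small guard, sketched here as a hypothetical `with_ext` helper, only appends an extension when the name (ignoring any directory part) has one:

```shell
#!/bin/bash
# Print "$2" followed by the extension of file name "$1", if "$1" has one.
# A name without a dot gets no extension instead of repeating itself.
with_ext() {
    local file=${1##*/}   # ignore any directory part
    local new=$2
    if [[ $file == *.* ]]; then
        printf '%s.%s\n' "$new" "${file##*.}"   # only the last extension is kept
    else
        printf '%s\n' "$new"
    fi
}
```

You would then call it as `mv -i -- "$i" "$(with_ext "$i" "$newname")"`.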

You can try the following, which only prints each file name with its new name:
perl -lanE 'if ($. == 1) { close ARGV; say "$ARGV,$F[2]" if /PROCEDURE/ }' files*
and if satisfied, change it to
perl -lanE 'if ($. == 1) { close ARGV; rename $ARGV, $F[2] if /PROCEDURE/ }' files*

mv -i -- myfile "$(sed -n '1{s/.*PROCEDURE[[:space:]]*//p;q;}' myfile)"
(The sed command looks only at the first line: it deletes everything up to and including the word PROCEDURE plus any whitespace after it, prints what is left, and quits. The command substitution runs it in place, so its output becomes the target file name for mv.)
To move them all and add the extension .ext:
for f in *.ext; do n=$(sed -n '1{s/.*PROCEDURE[[:space:]]*//p;q;}' "$f"); [ -n "$n" ] && mv -i -- "$f" "$n.ext"; done
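If the files may also sit in subdirectories, one option is to let `find` pass batches of names to a small `sh -c` script, so the command substitution runs separately for each file. A dry-run sketch, assuming the same `CREATE PROCEDURE <name>` first line and a hypothetical wrapper `rename_tree` whose optional argument is the starting directory; the new name is placed next to the original file:

```shell
#!/bin/sh
# Print an mv command for every *.ext file under the given directory
# (default: current directory), renaming it to "<name>.ext", where <name>
# follows PROCEDURE on the file's first line. Remove "echo" to rename.
rename_tree() {
    find "${1:-.}" -type f -name '*.ext' -exec sh -c '
        for f do
            # first line only: strip up to PROCEDURE and following blanks, print, quit
            name=$(sed -n "1{s/.*PROCEDURE[[:space:]]*//p;q;}" "$f")
            if [ -z "$name" ]; then
                echo "skipping $f: no PROCEDURE on the first line" >&2
                continue
            fi
            echo mv -i -- "$f" "${f%/*}/$name.ext"
        done
    ' sh {} +
}
```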

Related

Extend filename with word from file

I can change the filename for a file to the first word in the file.
for fname in lrccas1
do
cp $fname $(head -1 -q $fname|awk '{print $1}')
done
But I would like to extend the existing filename instead:
for fname in lrccas1
do
cp $fname $(head -1 -q $fname|awk '{print $1 FILENAME}')
done
I have tried different variations of this, but none seem to work.
Is there an easy solution?
Kind regards Svend
First, let's understand why you did not get the desired result:
head -1 -q $fname|awk '{print $1 FILENAME}'
You are piping the standard output of head into awk, so awk reads standard input and therefore FILENAME is set to the empty string. Asking GNU AWK for FILENAME while it consumes standard input does not make much sense: only data goes through the pipe, and there might not be an input file at all, e.g.
seq 10 | awk '{print $1*10}'
Second, to get the desired result: you have access to the filename and have successfully extracted the word, so you can simply concatenate them:
for fname in lrccas1
do
cp -- "$fname" "$(head -1 -q "$fname" | awk '{print $1}')$fname"
done
Third, be aware that your command copies (cp) rather than renames (mv) the file, and it does not check whether the target name already exists: if it does, it will be overwritten.
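If you want to keep using awk's FILENAME, another option is to let awk open the file itself instead of reading from a pipe, and stop after the first line. A sketch (the function name `prefixed_name` is hypothetical):

```shell
#!/bin/bash
# Print the first word of the file's first line followed by the file name.
# awk opens the file itself, so FILENAME holds the name given on the command line.
prefixed_name() {
    awk 'FNR == 1 { print $1 FILENAME; exit }' "$1"
}

# Hypothetical usage: cp -- "$fname" "$(prefixed_name "$fname")"
```

Note that FILENAME is exactly the argument awk was given, so for a path like `dir/lrccas1` the result would be `worddir/lrccas1`; this sketch only suits files in the current directory.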
You can do it in pure bash (or sh)
for fname in lrccas1
do
read -r word rest < "$fname" && cp "$fname" "$word$fname"
done
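If the goal is really to rename rather than copy, a sketch along the same pure-bash lines could use `mv` and refuse to replace an existing target. The name `prefix_rename` is hypothetical, and the existence check and the `mv` are two separate steps, so another process could still create the target in between:

```shell
#!/bin/bash
# Rename file "$1" to "<first word of its first line>$1", refusing to
# overwrite an existing file. Returns non-zero if nothing was renamed.
prefix_rename() {
    local fname=$1 word rest target
    read -r word rest < "$fname"     # status ignored: a last line without "\n" still fills word
    [[ -n $word ]] || return 1       # empty file, blank first line, or unreadable file
    target=$word$fname
    if [[ -e $target ]]; then
        echo "not overwriting existing $target" >&2
        return 1
    fi
    mv -- "$fname" "$target"
}
```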
This would do what your shell script appears to be trying to do:
awk 'FNR==1{close(out); out=$1 FILENAME} {print > out}' lrccas1
but you might want to consider something like this instead:
awk 'FNR==1{close(out); out=$1 FILENAME "_new"} {print > out}' *.txt
so your newly created files don't overwrite your existing ones. To also remove the originals afterwards:
awk 'FNR==1{close(out); out=$1 FILENAME "_new"} {print > out}' *.txt &&
rm -f *.txt
That assumes your original files have some suffix like .txt or some other way of identifying them. Alternatively, if you have all of your original files in some directory such as $HOME/old, you can put the new files in a new directory such as $HOME/new:
cd "$HOME/old" &&
mkdir -p "$HOME/new" &&
awk -v newDir="$HOME/new" 'FNR==1{close(out); out=newDir "/" $1 FILENAME} {print > out}' * &&
echo rm -f *
Remove the echo when you are done testing and happy with the result.
Try running this (bash):
for fname in file_name
do
cp -- "$fname" "$(head -1 -q "$fname" | awk '{print $1}')$fname"
done

How to add lines at the beginning of a file, whether it is empty or not?

I want to add lines at the beginning of a file, and it works with:
sed -i '1s/^/#INFO\tFORMAT\tunknown\n/' file
sed -i '1s/^/##phasing=none\n/' file
However, it doesn't work when my file is empty. I found these commands:
echo > file && sed '1s/^/#INFO\tFORMAT\tunknown\n/' -i file
echo > file && sed '1s/^/##phasing=none\n/' -i file
but the second one erases what the first one added (and both also erase the existing content if the file isn't empty).
I would like to know how to add lines at the beginning of a file, whether the file is empty or not.
I tried a loop with if [ -s file ] but without success
Thanks!
You can use the insert command (i).
if [ -s file ]; then
sed -i '1i\
#INFO\tFORMAT\tunknown\
##phasing=none' file
else
printf '#INFO\tFORMAT\tunknown\n##phasing=none\n' > file
fi
Note that \t for tab is not POSIX and does not work in all sed implementations (e.g. BSD/Apple sed, where -i also works differently). You can use a raw tab instead, or a variable: tab=$(printf '\t').
You should use sed's i (insert) command:
file='inputFile'
# insert a line break if file is empty
[[ ! -s $file ]] && echo > "$file"
sed -i.bak $'1i\
#INFO\tFORMAT\tunknown
' "$file"
Or you can ditch sed and do it in the shell using printf:
{ printf '#INFO\tFORMAT\tunknown\n'; cat file; } > file.new &&
mv file.new file
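The same temporary-file idea also handles the empty-file case from the question, since `cat` of an empty file adds nothing. A slightly more defensive sketch, with a hypothetical helper `prepend_lines`, takes the lines as arguments, uses `mktemp` for the temporary file, and also works when the file does not exist yet:

```shell
#!/bin/bash
# Prepend each remaining argument as a line to file "$1".
# Works for empty files and creates the file if it does not exist.
# Note: the result gets mktemp's 0600 permissions, not the original file's.
prepend_lines() {
    local file=$1 tmp
    shift
    tmp=$(mktemp "${file}.XXXXXX") || return 1      # temp file next to the target
    {
        printf '%s\n' "$@"                           # the new header lines
        if [[ -e $file ]]; then cat -- "$file"; fi   # then the old content, if any
    } > "$tmp" && mv -- "$tmp" "$file"
}

# Hypothetical usage, with tabs via bash's $'...' quoting:
# prepend_lines file '##phasing=none' $'#INFO\tFORMAT\tunknown'
```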
With plain bash and shell utilities:
#!/bin/bash
header=(
$'#INFO\tFORMAT\tunknown'
$'##phasing=none'
)
mv file file.bak &&
{ printf '%s\n' "${header[@]}"; cat file.bak; } > file &&
rm file.bak
Explicitly creating a new file, then moving it:
#!/bin/bash
printf '#INFO\tFORMAT\tunknown\n' | cat - file > file.new &&
mv file.new file
or slurping the whole content of the file into memory:
#!/bin/bash
printf '#INFO\tFORMAT\tunknown\n%s' "$(<file)" > file
It is trivial with ed if available/acceptable.
printf '%s\n' '0a' $'#INFO\tFORMAT\tunknown' $'##phasing=none' . ,p w | ed -s file
It even creates the file if it does not exist.

Rename files matching pattern in a loop - Bash

I have been trying to rename some specific files based on a table, but with no success. It either renames all files or gives an error.
The directory contains hundreds of files named with long barcodes, and I want to rename only files containing the pattern _1_.
Example
barcode_1_barcode_SL484171.fastq.gz barcode_2_barcode_SL484171.fastq.gz barcode_1_barcode_SL484370.fastq.gz barcode_2_barcode_SL484370.fastq.gz
mytable.txt
oldname newname
barcode_1_barcode_SL484171 Description1
barcode_2_barcode_SL484171 Description1
barcode_1_barcode_SL484370 Description2
barcode_2_barcode_SL484370 Description2
Desired output:
Description1.R1.fastq.gz Description2.R1.fastq.gz
As you can see in the table there are two files per description but I only want to rename the ones with the _1_ pattern.
Code I have tried:
for i in *_1_*.fastq.gz; do read oldname newname; mv "$oldname" "$newname".R1.fastq.gz; done < mytable.txt
for i in $(grep '_1_' mytable.txt); do read -r oldname newname; mv ${oldname} ${newname}.R1.fastq.gz; done < mytable.txt
for i in $(grep '_1_' mytable.txt); do oldname=$(cut -f1 $i);newname=$(cut -f2 $i); ln -s ${oldname} ${newname}.R1.fastq.gz; done
while read -r oldname newname
do
if [[ $oldname =~ "_1_" ]]
then
mv $oldname $newname
fi
done < mytable.txt
Something like this.
#!/usr/bin/env bash
while IFS= read -r files; do ##: loop through the output of find (fed in on the last line)
  while read -ru9 old_name prefix; do ##: loop through the output of `grep 'barcode_1_barcode.*' table.txt` on fd 9
    if [[ $files == *"$old_name"* ]]; then ##: if the filename from find contains the first field of table.txt (space delimited)
      old_filename="${files%.fastq.gz}" ##: extract the filename without the .fastq.gz extension
      extension="${files#"$old_filename"}" ##: extract the extension .fastq.gz without the filename
      # mv -v "$files" "$prefix.R1${extension}"
      printf '%s %s %s ==> %s\n' mv -v "$files" "$prefix.R1${extension}" ##: print the rename instead of performing it
    fi
  done 9< <(grep 'barcode_1_barcode.*' table.txt)
done < <(find . -name 'barcode_1_barcode*.gz' | grep -f <(cut -d' ' -f1 table.txt)) ##: keep only files listed in the first column of table.txt
Output from the OP's sample data/files (with the mv line enabled):
renamed './barcode_1_barcode_SL484370.fastq.gz' -> 'Description2.R1.fastq.gz'
renamed './barcode_1_barcode_SL484171.fastq.gz' -> 'Description1.R1.fastq.gz'
If you're satisfied with the output, either move the # from the front of mv to the front of printf, or just delete the entire printf line and remove the # from mv, so that mv actually renames the files.
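A simpler alternative is to read the table once and rename only the rows whose old name contains `_1_`. This sketch assumes `mytable.txt` has a header line followed by whitespace-separated (spaces or tabs) `oldname newname` pairs, and that each file is named `<oldname>.fastq.gz` in the current directory; `rename_r1` is a hypothetical name, and it only prints the `mv` commands (remove `echo` to run them):

```shell
#!/bin/bash
# Rename <oldname>.fastq.gz to <newname>.R1.fastq.gz for table rows whose
# oldname contains _1_. Usage: rename_r1 mytable.txt   (dry run: prints mv)
rename_r1() {
    local oldname newname rest
    while read -r oldname newname rest; do
        [[ $oldname == *_1_* ]] || continue    # skips the header and the _2_ rows
        if [[ -e $oldname.fastq.gz ]]; then
            echo mv -i -- "$oldname.fastq.gz" "$newname.R1.fastq.gz"
        else
            echo "missing: $oldname.fastq.gz" >&2
        fi
    done < "$1"
}
```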

Extract a line from a text file using grep?

I have a text file called log.txt that logs the file name and the path it was taken from, so something like this:
2.txt
/home/test/etc/2.txt
Basically the file name and its previous location. I want to use grep to grab the file's directory, save it as a variable, and move the file back to its original location.
for var in "$@"
do
if grep "$var" log.txt
then
: # code if found
else
: # code if not found
fi
done
This just prints 2.txt and its directory to the console, since the path line also contains 2.txt.
Thanks.
Maybe flip the logic to make it more efficient?
f=''
while IFS= read -r prev
do case "$prev" in
*/*) [[ -e "$f" ]] && mv -- "$f" "$prev";; # path line: move the remembered file back
*) f="$prev";; # name line: remember the name
esac
done < log.txt
That walks through all the files in the log and, if they exist locally, moves them back. It should be functionally the same without running a grep per file.
If the name is always the same as the last part of the path, why save it in the log at all?
If it is, then:
while IFS= read -r prev
do f="${prev##*/}" # strip the path info
[[ -e "$f" ]] && mv "$f" "$prev"
done < <( grep / log.txt )
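Since each entry in `log.txt` is a name line followed by a path line, another option is to read the lines in pairs. A sketch, assuming every entry has exactly those two lines in that order and the files are in the current directory (`restore_from_log` is a hypothetical name; its optional argument is the log file):

```shell
#!/bin/bash
# Read log entries two lines at a time (file name, then original path)
# and move each file that exists locally back to its original path.
restore_from_log() {
    local name path
    while IFS= read -r name && IFS= read -r path; do
        if [[ -e $name ]]; then
            mv -- "$name" "$path"
        fi
    done < "${1:-log.txt}"
}
```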
Having the file names on the same line would significantly simplify your script. But maybe try something like
# Convert from command-line arguments to lines
printf '%s\n' "$@" |
# Pair up with entries in file
awk 'NR==FNR { f[$0]; next }
FNR%2 { if ($0 in f) p=$0; else p=""; next }
p { print "mv \"" p "\" \"" $0 "\"" }' - log.txt |
sh
Test it by replacing sh with cat and see what you get. If it looks correct, switch back.
More briefly, something similar could perhaps be pulled off with printf '%s\n' "$@" | grep -A 1 -Fxf - log.txt, but you end up having to parse the output to pair up the lines anyway.
Another solution:
for f in $(grep -v "/" log.txt); do
grep -F -- "/$f" log.txt | xargs -I{} cp -- "$f" {}
done
Use grep -q (for "quiet") to suppress grep's output.
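Applied to the loop in the question, `grep -q` keeps the membership test silent, and a second grep can then fetch the path. A sketch, assuming names and paths are on separate lines as in the question; `-F` makes grep treat the name literally (so the `.` in `2.txt` is not a wildcard) and `-x` requires the whole line to match. `restore_named` is a hypothetical name, and the path lookup only checks that a line contains `/name`, so a path like `/x/2.txt.bak` could also match:

```shell
#!/bin/bash
# Usage: restore_named LOGFILE NAME...
# For each NAME with an exact name line in LOGFILE, move it back to the
# first logged path that contains "/NAME".
restore_named() {
    local log=$1 var path
    shift
    for var in "$@"; do
        if grep -qFx -- "$var" "$log"; then                # silent exact-line test
            path=$(grep -F -- "/$var" "$log" | head -n 1)  # first line containing /NAME
            [[ -n $path ]] && mv -- "$var" "$path"
        else
            echo "$var not found in $log" >&2
        fi
    done
}
```

From the script in the question, this would be called as `restore_named log.txt "$@"`.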

Sed replace substring only if expression exist

In a bash script, I am trying to remove the directory name from filenames:
documents/file.txt
direc/file5.txt
file2.txt
file3.txt
So I first try to see if there is a "/" and, if so, delete everything before it:
for i in **/*.scss *.scss; do
echo "$i" | sed -n '^/.*\// s/^.*\///p'
done
But it doesn't work for files in the current directory: it gives me a blank string.
I get:
file.txt
file5.txt
When you only want the filename, use basename instead of sed.
basename /path/to/file
prints file
Your sed attempt is basically fine, but you should print regardless of whether you performed a substitution; take out the -n and the p at the end. (Also there was an unrelated syntax error.)
Also, don't needlessly loop over all files.
printf '%s\n' **/*.scss *.scss |
sed 's%^.*/%%'
This can also be done with awk, using / as the field separator and printing the last field.
Example:
echo "1/2/i.py" | awk 'BEGIN {FS="/"} {print $NF}'
output: i.py
Eventually, I did:
for i in **/*.scss *.scss; do
# for i in *.scss; do
# for i in _hm-globals.scss; do
name=${i##*/} # remove dir name
name=${name%.scss} # remove extension
name=$(echo "$name" | sed -n "s/^_hm-//p") # remove _hm- (with -n and p, names without _hm- become empty)
if [[ $name = *"."* ]]; then
name=$(echo "$name" | sed -n 's/\./-/p') # replace the first . with -
fi
echo "$name" >&2
done
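The same transformation can be done with parameter expansion alone, avoiding a sed process per file. In this sketch (hypothetical function `scss_name`), `${name#_hm-}` leaves names without the `_hm-` prefix unchanged, whereas `sed -n "s/^_hm-//p"` prints nothing for them, and `${name/./-}` replaces only the first dot, like a sed substitution without the `g` flag:

```shell
#!/bin/bash
# Turn a path like "dir/_hm-foo.bar.scss" into "foo-bar" using only
# parameter expansion (no external commands).
scss_name() {
    local name=${1##*/}      # remove directory part
    name=${name%.scss}       # remove .scss extension
    name=${name#_hm-}        # remove _hm- prefix if present
    name=${name/./-}         # replace the first . with -
    printf '%s\n' "$name"
}
```

Usage would be `for i in **/*.scss; do scss_name "$i"; done`. With `shopt -s globstar` (bash 4+), `**/*.scss` already matches the files in the current directory as well, so listing `*.scss` separately would process those files twice.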
