Append date/timestamp to existing files - shell

There are a few files in a directory. I would like to find all the files matching a wildcard pattern and then rename them by appending a date or timestamp field, using a single-line command.
Example:
foo1.txt
foo1.log
foo2.txt
foo2.log
I want to find all .log files and rename them by appending a date field.
Expected output:
foo1.txt
foo1_20210609.log
foo2.txt
foo2_20210609.log

I would use a "for" loop over the wildcard list of matches and then use parameter expansion and command substitution to build the new name:
for file in *.log
do
    echo mv -- "$file" "${file%.log}_$(date +%Y%m%d).log"
done
The pieces in the middle break down as:
mv -- -- invoke "mv" and explicitly tell it that there are no more options; this insulates us from filenames that might start with -i, for example
"${file%.log} -- expands the "$file" variable and removes the ".log" from the end
_ -- just adds the underscore where we want it
$(date +%Y%m%d) -- calls out to the date command and inserts the resulting output
.log -- just adds the ".log" part back at the end
Remove the "echo" if you like the resulting commands.
If you want a static timestamp, then just use that text in place of the whole $(date ...) command substitution.
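For example, with the date from the question hard-coded, the loop (still just a sketch, with the echo left in) becomes:
for file in *.log
do
    echo mv -- "$file" "${file%.log}_20210609.log"
done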
On your sample input, but with today's date, the output is:
mv -- foo1.log foo1_20210610.log
mv -- foo2.log foo2_20210610.log

ls | perl -lne '$date=`date +%Y%m%d`; chomp($date); `mv $_ $_$date`;'
ls lists the files
pipe the list into a perl one-liner
the -e option runs the script given on the command line
the -n option loops over the input line by line
the -l option handles line endings (strips the newline from each input line and appends one on output)
backticks execute unix commands
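Note that this appends the date after the .log extension (foo1.log20210609). If you want the date inserted before the extension, as in the expected output above, here is a minimal sketch of the same idea using perl's built-in rename() instead of backtick mv (this variant is an illustration, not the original answer):
perl -e 'my $date = `date +%Y%m%d`; chomp $date;
         for my $f (glob "*.log") {
             (my $new = $f) =~ s/\.log$/_${date}.log/;
             rename $f, $new or warn "could not rename $f: $!\n";
         }'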

Related

automate bash command for multiple files

I have a directory with multiple files
file1_1.txt
file1_2.txt
file2_1.txt
file2_2.txt
...
And I need to run a command structured like this
command [args] file1 file2
So I was wondering if there was a way to call the command just one time on all the files, instead of having to call it each time on each pair of files.
Use find and xargs, with sort, since the order appears meaningful in your case:
find . -name 'file?_?.txt' | sort | xargs -n2 command [args]
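With the four sample files, this invokes the command once per sorted pair, i.e. roughly:
command [args] ./file1_1.txt ./file1_2.txt
command [args] ./file2_1.txt ./file2_2.txt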
If your command can take multiple pairs of files on the command line then it should be sufficient to run
command ... *_[12].txt
The files in expanded glob patterns (such as *_[12].txt) are automatically sorted so the files will be paired correctly.
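You can check the pairing by printing the expansion:
printf '%s\n' *_[12].txt
file1_1.txt
file1_2.txt
file2_1.txt
file2_2.txt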
If the command can only take one pair of files then it will need to be run multiple times to process all of the files. One way to do this automatically is:
for file1 in *_1.txt; do
    file2=${file1%_1.txt}_2.txt
    [[ -f $file2 ]] && echo command "$file1" "$file2"
done
You'll need to replace echo command with the correct command name and arguments.
See BashFAQ/100 (How do I do string manipulation in bash?) for an explanation of ${file1%_1.txt}.
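For example, with file1=file1_1.txt:
echo "${file1%_1.txt}"        # prints file1
echo "${file1%_1.txt}_2.txt"  # prints file1_2.txt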
#!/bin/bash
cmd () {
    # collect the expanded glob arguments into an array
    local arr=("$@")
    # step through the array two entries at a time
    for ((i=0; i<${#arr[@]}; i+=2))
    do
        n=$((i+1))
        firstFile="${arr[$i]}"
        secondFile="${arr[$n]}"
        echo "pair -- ${firstFile} ${secondFile}"
    done
}
cmd file*_[12].txt
pair -- file1_1.txt file1_2.txt
pair -- file2_1.txt file2_2.txt

Rename first half of all file names in a directory before common string (hyphens to periods)

I need to convert hyphens and underscores to periods for all files in a directory, but only for the first half of each file name. All files include the string L001, which is the point where I need periods before and underscores to remain after.
An example file name:
A1898-MYSE-M-HEE_S19_L001_R1_001.fastq.gz
to
A1898.MYSE.M.HEE.S19_L001_R1_001.fastq.gz
The code I'm working with returns the following error:
line 4: fp: command not found
lp: Error - unable to access "=" - No such file or directory
line 6: new: command not found
mv: cannot stat '*L001*': No such file or directory
I put the following script in the same directory as the files and ran it:
#!/bin/bash
cd $1;
for file in *L001*; do
    fp="${file%L001*}";  # first part
    lp="${file#*L001}";  # last part
    new="${fp//_/.}L001$lp";
    mv "$file" "$new";
done
You can do:
for file in *L001*.fastq.gz; do   # iterate over the required files
    pre=${file%%L001*}            # keep the part before L001 (strip L001 and everything after it)
    post=${file##*L001}           # keep the part after L001 (strip L001 and everything before it)
    # replace every `-` and `_` in `$pre` with `.`, then rename with `mv`
    echo mv -- "$file" "${pre//[-_]/.}L001${post}"
done
Drop echo to do the actual mv-ing.
Example:
% for file in *L001*.fastq.gz; do pre=${file%%L001*}; post=${file##*L001}; echo mv -- "$file" "${pre//[-_]/.}L001${post}"; done
mv -- A1898-MYSE-M-HEE_S19_L001_R1_001.fastq.gz A1898.MYSE.M.HEE.S19.L001_R1_001.fastq.gz
mv -- A1898_MYSE_M-HEE_S19_L001_R1_002.fastq.gz A1898.MYSE.M.HEE.S19.L001_R1_002.fastq.gz
rename 's/[-_](?=.*L001)/./g' *L001*
This rename command applies the perl substitution regexp on all *L001* files and uses the input and respective output to rename files. The regexp replaces all hyphens and underscores -- followed by a lookahead (?=…) of any characters and L001 -- with a single period.
You may need to install the rename package first, since it is non-essential.
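A quick way to preview the effect of the substitution itself, without renaming anything (assuming perl is available):
echo 'A1898-MYSE-M-HEE_S19_L001_R1_001.fastq.gz' | perl -pe 's/[-_](?=.*L001)/./g'
A1898.MYSE.M.HEE.S19.L001_R1_001.fastq.gz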

Remove suffix as well as prefix from path in bash

I have filepaths of the form:
../healthy_data/F35_HC_532d.dat
I want to extract F35_HC_532d from this. I can remove prefix and suffix from this filename in bash as:
for i in ../healthy_data/*; do echo ${i#../healthy_data/}; done # REMOVES PREFIX
for i in ../healthy_data/*; do echo ${i%.dat}; done # REMOVES SUFFIX
How can I combine these so that in a single command I would be able to remove both and extract only the part that I want?
You can use BASH regex for this like this and print captured group #1:
for file in ../healthy_data/*; do
    [[ $file =~ .*/([_[:alnum:]]+)\.dat$ ]] && echo "${BASH_REMATCH[1]}"
done
If you can use Awk, it is pretty simple,
for i in ../healthy_data/*
do
    stringNeeded=$(awk -F/ '{split($NF,temp,"."); print temp[1]}' <<<"$i")
    printf "%s\n" "$stringNeeded"
done
The -F/ splits the input string on the / character, and $NF represents the last field in the string, in this case F35_HC_532d.dat; the split() function is then called with the delimiter . to extract the part before the dot.
The options/functions in the above Awk are POSIX compatible.
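A quick standalone check, feeding the same Awk one sample path:
awk -F/ '{split($NF,temp,"."); print temp[1]}' <<<"../healthy_data/F35_HC_532d.dat"
F35_HC_532d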
Also, bash does not support nested parameter expansions, so you need to do it in two steps, something like below:
tempString="${i#*/*/}"
echo "${tempString%.dat}"
In a single-loop,
for i in ../healthy_data/*; do tempString="${i#*/*/}"; echo "${tempString%.dat}" ; done
In the two-step syntax here, the "${i#*/*/}" part stores F35_HC_532d.dat into the variable tempString, and then we remove the .dat part with "${tempString%.dat}".
If all files end with .dat (as you confirmed) you can use the basename command:
basename -s .dat /path/to/files/*
If there are many(!) of those files, use find to avoid an argument list too long error:
find /path/to/files -maxdepth 1 -name '*.dat' -exec basename -s .dat {} +
For a shell script which needs to deal with any number of .dat files, use the second command.
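For example, on the single sample path:
basename -s .dat ../healthy_data/F35_HC_532d.dat
F35_HC_532d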
Do you count this as one step?
for i in ../healthy_data/*; do
    sed 's#\.[^.]*##' <<< "${i##*/}"
done
You can't strip both a prefix and suffix in a single parameter expansion.
However, this can be accomplished in a single loop using parameter expansion operations only. Just save the prefix stripped expansion to a variable and use expansion again to remove its suffix:
for file in ../healthy_data/*; do
    prefix_stripped="${file##*/healthy_data/}"
    echo "${prefix_stripped%.dat}"
done
If you are on zsh, one way to achieve this without the need for defining another variable is
for i in ../healthy_data/*; do echo "${${i#../healthy_data/}%.dat}"; done
This removes prefix and suffix in one step.
In your specific example the prefix stems from the fact that the files are located in a different directory. You can get rid of the prefix by cding in this case.
(cd ../healthy_data ; for i in *; do echo ${i%.dat}; done)
The (parens) invoke a sub shell process and your current shell stays where it is. If you don't want a sub shell you can cd back easily:
cd ../healthy_data ; for i in *; do echo ${i%.dat}; done; cd -

rename all files of a specific type in a directory

I am trying to use bash to rename all .txt files in a directory that match a specific pattern. My two attempts below either removed the files from the directory or threw an error. Thank you :)
input
16-0000_File-A_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import.txt
16-0002_File-B_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import.txt
desired output
16-0000_File-A_multianno.txt
16-0002_File-B_multianno.txt
Bash attempt 1: this removes the files from the directory
for f in /home/cmccabe/Desktop/test/vcf/overall/annovar/*_classify.txt ; do
    # Grab file prefix.
    p=${f%%_*_}
    bname=`basename $f`
    pref=${bname%%.txt}
    mv "$f" ${p}_multianno.txt
done
Bash attempt 2: Substitution replacement not terminated at (eval 1) line 1.
for f in /home/cmccabe/Desktop/test/vcf/overall/annovar/*_classify.txt ; do
    # Grab file prefix.
    p=${f%%_*_}
    bname=`basename $f`
    pref=${bname%%.txt}
    rename -n 's/^$f/' *${p}_multianno.txt
done
You don't need a loop. rename alone can do this:
rename -n 's/(.*?_[^_]+).*/${1}_multianno.txt/g' /home/cmccabe/Desktop/test/vcf/overall/annovar/*_classify.txt
The meaning of the regular expression is roughly,
capture everything from the start until the 2nd _,
match the rest,
and replace with the captured prefix and append _multianno.txt
With the -n flag, this command will print what it would do without actually doing it.
When the output looks good, remove the -n and rerun.
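To preview the effect of the substitution on one of the sample input names (assuming perl is available):
echo '16-0000_File-A_variant_strandbias_readcount.vcf.hg19_multianno_dbremoved_removed_final_index_inheritence_import.txt' | perl -pe 's/(.*?_[^_]+).*/${1}_multianno.txt/g'
16-0000_File-A_multianno.txt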

Redirecting the result files to different variable file names

I have a folder with, say, ten data files I01.txt, ..., I10.txt. Each file, when processed with the command ./a.out, gives me five output files, namely f1.txt, f2.txt, ... f5.txt.
I have written a simple bash script to run the program on all the files and save the output printed on the screen to a per-file result file using the command
./cosima_peaks_444_temp_muuttuva -args > $counter-res.txt
Using this, I am able to save the on-screen output to the file. But the five files f1 to f5 are overwritten on each run, so they end up holding only the results of the last file run, in this case I10, and the results of the first nine files are lost.
So I want to save the output files (f1 ... f5) of each I*.txt run under different names, such that when the program processes I01.txt with ./a.out it stores
f1>var1-f1.txt , f2>var1-f2.txt... f5 > var1-f5.txt
and then repeats the same for I02 (f1>var2-f1.txt ...).
#!/bin/bash
# echo "for looping over all the .txt files"
echo -e "Enter the name of the file or q to quit "
read dir
if [[ $dir = q ]]
then
    exit
fi
filename="$dir*.txt"
counter=0
if [[ $dir == I ]]
then
    for f in $filename ; do
        echo "output of $filename"
        ((counter++))
        ./cosima_peaks_444_temp_muuttuva $f -m202.75 -c1 -ng0.5 -a0.0 -b1.0 -e1.0 -lg > $counter-res.txt
        echo "counter $counter"
    done
fi
If I understand correctly, you want to pass files l01.txt, l02.txt, ... to a.out and save the output for each execution of a.out to a separate file like f01.txt, f02.txt, .... You could then use a short script that reads each file named l*.txt in the directory and passes the value to a.out, redirecting the output to a file fN.txt (where N is the same number as in the lN.txt filename). This presumes you are passing each filename to a.out and that a.out is not reading the entire directory automatically.
for i in l*.txt; do
    num=$(sed 's/^l\(.*\)[.]txt/\1/' <<<"$i")
    ./a.out "$i" > "f${num}.txt"
done
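A quick check of the number extraction on one name:
sed 's/^l\(.*\)[.]txt/\1/' <<< "l01.txt"
01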
(note: that is a lowercase L in the pattern s/^l.../ and the digit one in the replacement \1)
note: if you do not want the same N from the filename (with its leading '0'), then you can trim the leading '0' from the N value for the output filename.
(you can use a counter as you have shown in your edited post, but you have no guarantee in sort order of the filenames used by the loop unless you explicitly sort them)
note: this presumes NO spaces or embedded newlines or other odd characters in the filenames. If your lN.txt names can have odd characters or spaces, then feeding a while loop with find can avoid the odd-character issues.
With f1 - f5 Created Each Run
You know the format for the output file name, so you can test for the existence of an existing file name and set a prefix or suffix to provide unique names. For example, if your first pass creates filenames 'pass1-f01.txt', 'pass1-f02.txt', then you can check for that pattern (in several ways) and increment your 'passN' prefix as required:
for f in "$filename"; do
num=$(sed 's/l*\(.*\)[.]txt/\1/' <<<"$f")
count=$(sed 's/^0*//' <<<"$num")
while [ -f "pass${count}-f${num}.txt" ]; do
((count++))
done
./a.out "$f" > "pass${count}-f${num}.txt"
done
Give that a try and let me know if that isn't closer to what you need.
(note: the use of the herestring (<<<) is bash-only; if you need a portable solution, pipe the output of echo "$var" to sed, e.g. count=$(echo "$num" | sed 's/^0*//'))
I replaced your cosima_peaks_444_temp_muuttuva with a function myprog.
The OP asked for more explanation, so I put in a lot of comments:
# This function makes 5 output files for testing the construction
function myprog {
    # Fill the test output file f1.txt with the input filename and a datestamp
    echo "Output run $1 on $(date)" > f1.txt
    # The original prog makes 5 output files, so I copy the new testfile 4 times
    cp f1.txt f2.txt
    cp f1.txt f3.txt
    cp f1.txt f4.txt
    cp f1.txt f5.txt
}

# Use the number in the inputfile for making a unique filename and move the output
function move_output {
    # The parameter ${1} is filled with something like I03.txt
    # You can get the number with a sed action, but it is more efficient to use
    # bash functions, even in 2 steps.
    # First step: Cut off from the end as much as possible (%%) starting with a dot.
    Inumber=${1%%.*}
    # Step 2: Remove the I from the Inumber (that is filled with something like "I03").
    number=${Inumber#I}
    # Move all outputfiles from the last run
    for outputfile in f*txt; do
        # Put the number in front of the original name
        mv "${outputfile}" "${number}_${outputfile}"
    done
}

# Start the main processing. You will perform the same logic for all input files,
# so make a loop over all files. I guess all input files start with an "I",
# followed by 2 characters (a number), and .txt. No need to use ls for listing those.
for input in I??.txt; do
    # Call the dummy prog above with the name of the current input file as a parameter
    myprog "${input}"
    # Now finally the show starts.
    # Call the function for moving the 5 outputfiles to another name.
    move_output "${input}"
done
I guess you have the source code of this a.out binary. If so, I would modify it so that it outputs to several fds instead of several files. Then you can solve this very cleanly using redirects:
./a.out 3> fileX.1 4> fileX.2 5> fileX.3
and so on for every file you want to output. Writing to a file or to a (redirected) fd is equivalent in most programs (notable exception: memory mapped I/O, but that is not commonly used for such scripts - look for mmap calls)
Note that this is not very esoteric, but a very well known technique that is regularly used to separate output (stdout, fd=1) from errors (stderr, fd=2).
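To illustrate the idea from the shell side, here is a minimal sketch; the script name, fd numbers and output file names are made up for the example. A stand-in for the modified a.out writes each result stream to its own file descriptor:
#!/bin/bash
# fake_aout.sh - hypothetical stand-in for a modified a.out
# write each result stream to a numbered file descriptor instead of a fixed filename
echo "peaks result"     >&3
echo "residuals result" >&4
echo "summary result"   >&5
The caller then decides where each stream lands:
./fake_aout.sh 3> var1-f1.txt 4> var1-f2.txt 5> var1-f3.txt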
