How to rename multiple files with inserts from a list - bash

I have hundreds of numbered files
file_001.txt
file_002.txt
file_003.txt
...
and the same number of letter strings in a separate file
letters.txt:
abc
def
ghi
...
How can I rename the files in this way:
file_001_abc.txt
file_002_def.txt
file_003_ghi.txt
Thanks very much in advance!

A combination of paste and awk came to my mind first.
paste files.txt letters.txt | awk '{split($1,file,"."); print file[1]"_"$2"."file[2]}'
If you do not have a list of the files, you could replace files.txt with, for example, <(ls -1), but this depends on your specific setup.
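Note that the awk line only prints the new names. A minimal sketch of actually performing the renames, assuming the files sort in the same order as the lines in letters.txt:
paste <(ls file_*.txt) letters.txt | while read -r old suffix; do
    mv "$old" "${old%.txt}_${suffix}.txt"   # file_001.txt -> file_001_abc.txt
done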

count=0
for i in $(cat f1); do                               # f1 holds the list of files to rename
    let count++
    var=$(head -n "$count" letters.txt | tail -n 1)  # pick line number $count
    mv "$i" "${i%.txt}_${var}.txt"
done
Hope this helps
Thanks

I have commented out the line with xargs so you can check the script's output first.
When it looks right, remove the comment.
start=0
while read next; do
(( start = start + 1 ));
printf "mv file_%03d.txt file_%03d_%s.txt\n" ${start} ${start} ${next}
# printf "file_%03d.txt file_%03d_%s.txt" ${start} ${start} ${next} | xargs mv
done < letters.txt

Related

Why is this bash loop failing to concatenate the files?

I am at my wits' end as to why this loop fails to concatenate the files the way I need. Basically, let's say we have the following files:
AB124661.lane3.R1.fastq.gz
AB124661.lane4.R1.fastq.gz
AB124661.lane3.R2.fastq.gz
AB124661.lane4.R2.fastq.gz
What we want is:
cat AB124661.lane3.R1.fastq.gz AB124661.lane4.R1.fastq.gz > AB124661.R1.fastq.gz
cat AB124661.lane3.R2.fastq.gz AB124661.lane4.R2.fastq.gz > AB124661.R2.fastq.gz
What I tried (and didn't work):
Create and save file names (AB124661) to a ID file:
ls -1 *R1*.gz | awk -F '.' '{print $1}' | sort | uniq > ID
This creates an ID file that stores the samples/files name.
Run the following loop:
for i in `cat ./ID`; do cat $i\.lane3.R1.fastq.gz $i\.lane4.R1.fastq.gz \> out/$i\.R1.fastq.gz; done
for i in `cat ./ID`; do cat $i\.lane3.R2.fastq.gz $i\.lane4.R2.fastq.gz \> out/$i\.R2.fastq.gz; done
The loop fails and concatenates into empty files.
Things I tried:
Yes, the ID file is definitely in the folder
When I run it with echo, it shows the cat commands correctly
Any help will be very much appreciated,
Best,
AC
Why are you escaping the >? That's going to result in cat: '>': No such file or directory instead of a redirection.
Don't read lines with for
while IFS= read -r id; do
cat "${id}.lane3.R1.fastq.gz" "${id}.lane4.R1.fastq.gz" > "out/${id}.R1.fastq.gz"
cat "${id}.lane3.R2.fastq.gz" "${id}.lane4.R2.fastq.gz" > "out/${id}.R2.fastq.gz"
done < ./ID
Let's say you have one ID per line stored in the file ./ID:
while read -r line; do
cat "$line".lane3.R1.fastq.gz "$line".lane4.R1.fastq.gz > "$line".R1.fastq.gz
cat "$line".lane3.R2.fastq.gz "$line".lane4.R2.fastq.gz > "$line".R2.fastq.gz
done < ./ID
A pure shell solution could look like this:
for file in *.fastq.gz; do
id=${file%%.*}
[ -e "$id".R1.fastq.gz ] || cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
[ -e "$id".R2.fastq.gz ] || cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
Alternatively:
printf '%s\n' *.fastq.gz | cut -d. -f1 | sort -u |
while IFS= read -r id; do
cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
This solution assumes filenames of interest don't contain newline characters.

Extract a line from a text file using grep?

I have a text file called log.txt that logs the file name and the path it came from, so something like this:
2.txt
/home/test/etc/2.txt
Basically the file name and its previous location. I want to use grep to grab the file's directory, save it as a variable, and move the file back to its original location.
for var in "$@"
do
if grep "$var" log.txt
then
: # code if found
else
: # code if not found
fi
done
This just prints 2.txt and its directory to the console, since the directory line also contains 2.txt.
thanks.
Maybe flip the logic to make it more efficient?
f=''
while read -r prev
do case "$prev" in
*/*) [[ -e "$f" ]] && mv "$f" "$prev";;  # path line: move the remembered file back
*) f="$prev";;                           # plain name line: remember the name
esac
done < log.txt
That walks through all the entries in the log and, if the files exist locally, moves them back. It should be functionally the same, without a grep per file.
If the name is always just the basename of the path, why save it in the log at all? If it is, then:
while read prev
do f="${prev##*/}" # strip the path info
[[ -e "$f" ]] && mv "$f" "$prev"
done < <( grep / log.txt )
Having the file names on the same line would significantly simplify your script. But maybe try something like
# Convert from command-line arguments to lines
printf '%s\n' "$@" |
# Pair up with entries in file
awk 'NR==FNR { f[$0]; next }
FNR%2 { if ($0 in f) p=$0; else p=""; next }
p { print "mv \"" p "\" \"" $0 "\"" }' - log.txt |
sh
Test it by replacing sh with cat and see what you get. If it looks correct, switch back.
Briefly, something similar could perhaps be pulled off with printf '%s\n' "$@" | grep -A 1 -Fxf - log.txt but you end up having to parse the output to pair up the output lines anyway.
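A rough sketch of that pairing, assuming log.txt never contains a bare -- line itself (grep emits -- as a separator between match groups):
printf '%s\n' "$@" | grep -A1 -Fxf - log.txt | grep -v '^--$' |
while IFS= read -r name && IFS= read -r path; do
    [ -e "$name" ] && mv "$name" "$path"   # name line, then path line
done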
Another solution:
for f in $(grep -v "/" log.txt); do
    grep "/$f" log.txt | xargs -I{} cp "$f" {}
done
grep -q (for "quiet") suppresses the output, which is useful when you only need the exit status in an if test.
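A minimal sketch of how grep -q could fit the original loop, assuming each name appears exactly once in log.txt and the path line is the one containing a slash:
for var in "$@"
do
    if grep -qFx "$var" log.txt
    then
        dir=$(grep -F "/$var" log.txt)   # the line holding the full path
        mv "$var" "$dir"
    fi
done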

Find files which share part of a filename

In my current directory there are many files. Some of the files share part of their filename.
e.g.:
XGAE_537493_GSR.FITS
TGFE_537493_RRF.FITS
EGRE_537497_HDR.FITS
TRTE_537497_YUH.FITS
TRXX_537499_YDF.FITS
.
.
Files 1 & 2 would be a match, as would files 3 & 4. File 5 has no match. Therefore, files 1, 2, 3 and 4 would be moved.
I want to move the files which share part of their filename, in order to separate them from the ones that don't.
I am attempting to do this in bash. I googled but couldn't find anything describing quite the process I need. So far, in pseudo-code, I have:
FOR F IN *
IF ${FILE:5:10} MATCHES ANY OTHER ${FILE:5:10}
MOVE ALL MATCHES TO ANOTHER DIRECTORY
Any information to help me move in the right direction would be appreciated.
Try this:
for f in ./*.FITS ; do
    middleBit=$(echo "$f" | cut -d'_' -f 2)              # the shared middle field
    count=$(ls *"$middleBit"*.FITS 2>/dev/null | wc -l)
    if [ "$count" -ge 2 ]                                # more than one file shares it
    then
        for match in *"$middleBit"*.FITS ; do
            mv "$match" ./somewhere
        done
    fi
done
Using an associative array in bash 4 you can do it easily:
#!/bin/bash
declare -A arr
for f in *.FITS; do
k="${f:5:6}"                                        # the shared 6-character key
[[ ${arr[$k]} ]] && mv "$f" /dest/ || arr["$k"]=1   # move if the key was seen before
done
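Note that this moves only the second and later files of each matching group; the first file seen stays put. A two-pass sketch under the same fixed-key assumption that moves every member of a matching group:
#!/bin/bash
declare -A count
for f in *.FITS; do                  # first pass: count how many files share each key
    k="${f:5:6}"
    (( count[$k]++ ))
done
for f in *.FITS; do                  # second pass: move every file whose key occurs twice or more
    k="${f:5:6}"
    (( count[$k] > 1 )) && mv "$f" /dest/
done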
If your file-name structure is fixed, you can scan the names and find duplicates in subfields of the file name with awk.
For example:
$ ls -1 | awk -F_ 'NF==3{f[$2]=(a[$2]++?f[$2] OFS $0:$0)}
END{for(k in f) if(a[k]>1) print f[k]} '
TGFE_537493_RRF.FITS
XGAE_537493_GSR.FITS
You can then pipe the results to a cp command:
$ ... | xargs -I file cp file file.DUP
which adds the suffix .DUP to the duplicate file names, or
$ ... | xargs -I file mv file anotherlocation/
which moves them to anotherlocation/.

Take two at a time in a bash "for file in $list" construct

I have a list of files where two subsequent ones always belong together. I would like a for loop to extract two files from this list per iteration and then work on those two files at a time (for example, let's say I just want to concatenate them, i.e. cat the two files).
In a simple case, my list of files is this:
FILES="file1_mateA.txt file1_mateB.txt file2_mateA.txt file2_mateB.txt"
I could hack around it and say
FILES="file1 file2"
for file in $FILES
do
actual_mateA=${file}_mateA.txt
actual_mateB=${file}_mateB.txt
cat $actual_mateA $actual_mateB
done
But I would like to be able to handle lists where mate A and mate B have arbitrary names, e.g.:
FILES="first_file_first_mate.txt first_file_second_mate.txt file2_mate1.txt file2_mate2.txt"
Is there a way to extract two values out of $FILES per iteration?
Use an array for the list:
files=(fileA1 fileA2 fileB1 fileB2)
for (( i=0; i<${#files[@]} ; i+=2 )) ; do
echo "${files[i]}" "${files[i+1]}"
done
You could read the values from a while loop and use xargs to restrict each read operation to two tokens.
files="filaA1 fileA2 fileB1 fileB2"
while read -r a b; do
echo $a $b
done < <(echo $files | xargs -n2)
You could use xargs(1), e.g.
ls -1 *.txt | xargs -n2 COMMAND
The switch -n2 lets xargs select two consecutive filenames from the pipe output, which are handed down to COMMAND.
To concatenate the 10 files file01.txt ... file10.txt pairwise
one can use
ls *.txt | xargs -n2 sh -c 'cat "$@" > "$1.$2.joined"' dummy
to get the 5 result files
file01.txt.file02.txt.joined
file03.txt.file04.txt.joined
file05.txt.file06.txt.joined
file07.txt.file08.txt.joined
file09.txt.file10.txt.joined
Please see 'info xargs' for an explanation.
How about this:
park=''
for file in $files # wherever you get them from, maybe $(ls) or whatever
do
if [ "$park" = '' ]
then
park=$file
else
process "$park" "$file"
park=''
fi
done
Each odd iteration just stores the value (in park); each even iteration then uses the stored and the current value. Here process is a placeholder; a possible definition is sketched below.
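For the OP's stated goal of concatenating each pair, the placeholder could be defined as a small helper, e.g.:
# Hypothetical helper for the OP's example: concatenate the pair
process() {
    cat "$1" "$2"
}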
Seems like one of those things awk is suited for
$ awk '{for (i = 1; i <= NF; i+=2) if( i+1 <= NF ) print $i " " $(i+1) }' <<< "$FILES"
file1_mateA.txt file1_mateB.txt
file2_mateA.txt file2_mateB.txt
You could then loop over it by setting IFS=$'\n'
e.g.
#!/bin/bash
FILES="file1_mateA.txt file1_mateB.txt file2_mateA.txt file2_mateB.txt file3_mat
input=$(awk '{for (i = 1; i <= NF; i+=2) if( i+1 <= NF ) print $i " " $(i+1) }'
IFS=$'\n'
for set in $input; do
cat "$set" # or something
done
Which will try to do
$ cat file1_mateA.txt file1_mateB.txt
$ cat file2_mateA.txt file2_mateB.txt
And ignore the odd trailing file that has no mate.
You can transform your string into an array and read the new array element by element:
#!/bin/bash
string="first_file_first_mate.txt first_file_second_mate.txt file2_mate1.txt file2_mate2.txt"
array=(${string})
size=${#array[*]}
idx=0
while [ "$idx" -lt "$size" ]
do
echo ${array[$idx]}
echo ${array[$(($idx+1))]}
let "idx=$idx+2"
done
If your string has a delimiter other than space (e.g. ;), you can use the following transformation to an array:
array=(${string//;/ })
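A quick usage sketch with a hypothetical ;-separated list:
string="file1_mateA.txt;file1_mateB.txt;file2_mateA.txt;file2_mateB.txt"
array=(${string//;/ })
echo "${array[0]}" "${array[1]}"   # file1_mateA.txt file1_mateB.txt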
You could try something like this:
echo file1 file2 file3 file4 | while read -d ' ' a; do read -d ' ' b; echo $a $b; done
file1 file2
file3 file4
Or this somewhat cumbersome technique:
echo file1 file2 file3 file4 |tr " " "\n" | while :;do read a || break; read b || break; echo $a $b; done
file1 file2
file3 file4

In bash, how can I print the first n elements of a list?

In bash, how can I print the first n elements of a list?
For example, the first 10 files in this list:
FILES=$(ls)
UPDATE: I forgot to say that I want to print the elements on one line, just like when you print the whole list with echo $FILES.
FILES=(*)
echo "${FILES[#]:0:10}"
Should work correctly even if there are spaces in filenames.
FILES=$(ls) creates a string variable, while FILES=(*) creates an array; see the bash documentation for more examples of using arrays. (Thanks lhunath.)
Why not just this to print the first 50 files:
ls -1 | head -50
FILE="$(ls | head -1)"
This handled spaces in filenames correctly when I tried it.
My way would be:
ls | head -10 | tr "\n" " "
This will print the first 10 lines returned by ls, and then tr replaces all line breaks with spaces. Output will be on a single line.
echo $FILES | awk '{for (i = 1; i <= 10; i++) {print $i}}'
Edit: Ah, missed your comment that you needed them on one line...
echo $FILES | awk '{for (i = 1; i <= 10; i++) {printf "%s ", $i}}'
That one does that.
To do it interactively:
set $FILES && eval eval echo \\\${1..10}
To run it as a script, create foo.sh with the contents
N=$1; shift; eval eval echo \\\${1..$N}
and run it as
bash foo.sh 10 $FILES
An addition to the answers of Ayman Hourieh and Shawn Chin, in case it is needed for something other than the contents of a directory.
In newer versions of bash you can use mapfile to store the directory listing in an array. See help mapfile
mapfile -t files_in_dir < <( ls )
If you want this completely in bash, use printf "%s\n" * instead of ls, or just replace ls with any other command you need.
Now you can access the array as usual and get the data you need.
First element:
${files_in_dir[0]}
Last element (do not forget the space after the colon):
${files_in_dir[@]: -1}
A slice, e.g. 20 elements starting at offset 10 (the second number is a length, not an end index):
${files_in_dir[@]:10:20}
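For the question as asked, the first 10 entries on one line:
echo "${files_in_dir[@]:0:10}"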
Note that for large directories this is far more memory-consuming than the other solutions.
FILES=$(ls)
echo $FILES | fmt -1 | head -10
