awk to add extracted prefix from file to filename - bash

The awk below executes as is, but instead of adding the prefix $x (taken from $1 of rename) to each filename in the directory, it rewrites fields inside each file that matches $p (which is extracted from each text file name). Each $x should be followed by an _ and then the filename. I can see from echo $p that the correct value for the lookup against $2 is extracted, but every file in the directory is left unchanged. Not every file listed in rename will be in the directory, but each file in the directory will always have a match for $p. Maybe there is a better way, as I am not sure what I am doing wrong. Thank you :).
rename (tab-delimited)
00-0000 File-01
00-0001 File-02
00-0002 File-03
00-0003 File-04
file1
File-01_xxxx.txt
file2
File-02_yyyy.txt
desired output
00-0000_File-01-xxxx.txt
00-0001_File-02-yyyy.txt
bash
for file1 in /path/to/folders/*.txt
do
# Grab file prefix
bname=`basename $file1` # strip of path
p="$(echo $bname|cut -d_ -f1,1)" # remove after second underscore
echo $p
# add prefix to matching file
awk -v var="$p" '$2~var{x=$1}(NR=x){print $x"_",$bname}' $file1 rename OFS="\t" > tmp && mv tmp $file1
done

This script :
touch File-01-azer.txt
touch File-02-ytrf.txt
touch File-03-fdfd.txt
touch File-04-dfrd.txt
while read p f;
do
f=$(ls $f*)
mv ${f} "${p}_${f}"
done << EEE
00-0000 File-01
00-0001 File-02
00-0002 File-03
00-0003 File-04
EEE
ls -1
outputs :
00-0000_File-01-azer.txt
00-0001_File-02-ytrf.txt
00-0002_File-03-fdfd.txt
00-0003_File-04-dfrd.txt
You can read the map from a file instead of the here-document by using done < rename_map.txt, or by piping it in with cat rename_map.txt | while read p f; ...
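As a rough sketch of the file-based variant: the block below assumes the map is a tab-delimited file called rename_map.txt (a made-up name) and that the files use the File-01_xxxx.txt naming from the question. Map entries with no matching file are simply skipped. It runs in a scratch directory so it can be tried safely.

```shell
# Demo in a scratch directory; rename_map.txt is a hypothetical file name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch File-01_xxxx.txt File-02_yyyy.txt
printf '%s\t%s\n' 00-0000 File-01 00-0001 File-02 00-0002 File-03 > rename_map.txt

tab=$(printf '\t')                     # split map lines on tabs only
while IFS="$tab" read -r prefix base; do
    for f in "$base"_*.txt; do         # every file that starts with the base name
        [ -e "$f" ] || continue        # no match: the glob stays literal, skip it
        mv -- "$f" "${prefix}_${f}"
    done
done < rename_map.txt

ls -1
```

Using a glob in a for loop instead of ls keeps filenames with spaces intact and makes the "not every entry has a file" case explicit.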

Related

Is there a way to add a suffix to files where the suffix comes from a list in a text file?

So currently the searches are coming up with a single word renaming solution, where you define the (static) suffix within the code. I need to rename based on a text based filelist and so -
I have a list of files in /home/linux/test/ :
1000.ext
1001.ext
1002.ext
1003.ext
1004.ext
Then I have a txt file (labels.txt) containing the labels I want to use:
Alpha
Beta
Charlie
Delta
Echo
I want to rename the files to look like (example1):
1000 - Alpha.ext
1001 - Beta.ext
1002 - Charlie.ext
1003 - Delta.ext
1004 - Echo.ext
How would you write a script which renames all the files in /home/linux/test/ to the list in example1?
Loop over the files and read one label from labels.txt on each iteration, so the two lists are walked in parallel. Split each filename into the prefix and extension, then combine everything to make the new filename.
dir=/home/linux/test
for file in "$dir"/*.ext
do
read -r label
prefix=${file%.*} # remove everything from last .
ext=${file##*.} # remove everything before last .
mv "$file" "$prefix - $label.$ext"
done < labels.txt
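Because the loop pairs files and labels purely by position, a mismatch in their counts would leave some files unrenamed or some labels unused. A minimal sketch of a pre-check, assuming one label per line; the scratch directory and sample names are only for the demo:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 1000.ext 1001.ext 1002.ext
printf '%s\n' Alpha Beta Charlie > labels.txt

set -- *.ext                               # positional parameters hold the file list
nfiles=$#
nlabels=$(( $(wc -l < labels.txt) ))       # arithmetic strips any padding from wc
if [ "$nfiles" -ne "$nlabels" ]; then
    echo "count mismatch: $nfiles files, $nlabels labels" >&2
    exit 1
fi

for file in "$@"; do
    read -r label                          # next label from labels.txt
    mv -- "$file" "${file%.*} - $label.${file##*.}"
done < labels.txt
ls -1
```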
I originally partly got the request wrong, although this step is still useful, because it gives you the filenames you need.
#!/bin/sh
count=1000
cp labels.txt stack
cat > ed1 <<EOF
1p
q
EOF
cat > ed2 <<EOF
1d
wq
EOF
next () {
[ -s stack ] && main
}
main () {
line="$(ed -s stack < ed1)"
echo "${count} - ${line}.ext" >> newfile
ed -s stack < ed2
count=$(($count+1))
next
}
next
Now we just need to move the files:-
cp newfile stack
for i in *.ext
do
newname="$(ed -s stack < ed1)"
mv -v "${i}" "${newname}"
ed -s stack < ed2
done
rm -v ./ed1
rm -v ./ed2
rm -v ./stack
rm -v ./newfile
On the possibility that you don't have exactly the same number of files as labels, I set it up to cycle a couple of arrays in pseudo-parallel.
$: cat script
#!/bin/env bash
lst=( *.ext ) # array of files to rename
mapfile -t labels < labels.txt # array of labels to attach
for ndx in ${!lst[@]} # for each filename's numeric index
do # assign the new name
new="${lst[ndx]/.ext/ - ${labels[ndx%${#labels[@]}]}.ext}"
# show the command to rename the file
echo "mv \"${lst[ndx]}\" \"$new\""
done
$: ls -1 *ext # I added an extra file
1000.ext
1001.ext
1002.ext
1003.ext
1004.ext
1005.ext
$: ./script # loops back if more files than labels
mv "1000.ext" "1000 - Alpha.ext"
mv "1001.ext" "1001 - Beta.ext"
mv "1002.ext" "1002 - Charlie.ext"
mv "1003.ext" "1003 - Delta.ext"
mv "1004.ext" "1004 - Echo.ext"
mv "1005.ext" "1005 - Alpha.ext"
$: ./script > do # use ./script to write ./do
$: ./do # use ./do to change the names
$: ls -1
'1000 - Alpha.ext'
'1001 - Beta.ext'
'1002 - Charlie.ext'
'1003 - Delta.ext'
'1004 - Echo.ext'
'1005 - Alpha.ext'
do
labels.txt
script
You can just remove the echo to have ./script rename the files there.
I renamed labels to labels.txt to match your example.
If you aren't using bash, this will need a call to something like sed or awk. Here's a short awk-based script that will do the same. Note that gensub is a GNU awk extension, so it needs gawk.
$: cat script2
#!/bin/env sh
printf "%s\n" *.ext > files.txt
awk 'NR==FNR{label[i++]=$0}
NR>FNR{ if (! label[i] ) { i=0 } cmd="mv \""$0"\" \""gensub(/[.]ext/, " - "label[i++]".ext", 1)"\"";
print cmd;
# system(cmd);
}' labels.txt files.txt
Uncomment the system line to make it actually do the renames as well.
It does assume your filenames don't have embedded newlines. Let us know if that's a problem.

Cat content of files to .txt files with common pattern name in bash

I have a series of .dat files and a series of .txt files that have a common matching pattern. I want to cat the content of the .dat files into each respective .txt file with the matching pattern in the file name, in a loop. Example files are:
xfile_pr_WRF_mergetime_regionA.nc.dat
xfile_pr_GFDL_mergetime_regionA.nc.dat
xfile_pr_RCA_mergetime_regionA.nc.dat
#
yfile_pr_WRF_mergetime_regionA.nc.dat
yfile_pr_GFDL_mergetime_regionA.nc.dat
yfile_pr_RCA_mergetime_regionA.nc.dat
#
pr_WRF_mergetime_regionA_final.txt
pr_GFDL_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
What I have tried so far is the following (I am trying to cat the content of all files starting with "xfile" to the respective model .txt file):
#
find -name 'xfile*' | sed 's/_mergetime_.*//' | sort -u | while read -r pattern
do
echo "${pattern}"*
cat "${pattern}"* >> "${pattern}".txt
done
Let me make some assumptions:
All filenames contain _mergetime_* substring.
The pattern is the portion such as pr_GFDL and is essential to
identify the file.
Then would you try the following:
declare -A map # create an associative array
for f in xfile_*.dat; do # loop over xfile_* files
pattern=${f%_mergetime_*} # remove _mergetime_* substring to extract pattern
pattern=${pattern#xfile_} # remove xfile_ prefix
map[$pattern]=$f # associate the pattern with the filename
done
for f in *.txt; do # loop over *.txt files
pattern=${f%_mergetime_*} # extract the pattern
[[ -f ${map[$pattern]} ]] && cat "${map[$pattern]}" >> "$f"
done
If I understood you correctly, you want the following:
- xfile_pr_WRF_mergetime_regionA.nc.dat
- yfile_pr_WRF_mergetime_regionA.nc.dat
----> pr_WRF_mergetime_regionA_final.txt
- xfile_pr_GFDL_mergetime_regionA.nc.dat
- yfile_pr_GFDL_mergetime_regionA.nc.dat
----> pr_GFDL_mergetime_regionA_final.txt
- xfile_pr_RCA_mergetime_regionA.nc.dat
- yfile_pr_RCA_mergetime_regionA.nc.dat
----> pr_RCA_mergetime_regionA_final.txt
So here's what you want to do in the script:
Get all .nc.dat files in the directory
Extract the pr_TYPE_mergetime_region part from the file name
Append the _final.txt part to form the output file name
Then append the cat output to that file
So I ended up with the following code:
find *.dat | while read -r pattern
do
output=$(echo "$pattern" | sed -e 's!^[^_]*_!!' -e 's!\.nc\.dat$!!') # drop the xfile_/yfile_ prefix and the .nc.dat suffix
cat "$pattern" >> "${output}_final.txt"
done
And here are the files I ended up with:
pr_GFDL_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
pr_WRF_mergetime_regionA_final.txt
Kindly let me know in the comments if I misunderstood anything or missed anything.
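The same steps can also be sketched with shell parameter expansion alone, without calling sed. This assumes every input name has the form PREFIX_REST.nc.dat where the prefix (xfile, yfile) contains no underscore; the two demo files and their contents are made up:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "x data" > xfile_pr_WRF_mergetime_regionA.nc.dat
echo "y data" > yfile_pr_WRF_mergetime_regionA.nc.dat

for f in *.nc.dat; do
    [ -e "$f" ] || continue            # skip the literal pattern if nothing matches
    stem=${f#*_}                       # drop the leading "xfile_" or "yfile_"
    stem=${stem%.nc.dat}               # drop the trailing ".nc.dat"
    cat -- "$f" >> "${stem}_final.txt" # append to the per-model output file
done

cat pr_WRF_mergetime_regionA_final.txt
```

Since the loop appends, rerunning it adds the data a second time; truncating or removing the _final.txt files first avoids that.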
This seems to be what you are asking for:
concatxy.sh:
#!/usr/bin/env bash
# do not return the pattern if no file matches
shopt -s nullglob
# Iterate all xfiles
for xfile in "xfile_pr_"*".nc.dat"; do
# Regex to extract the common filename part
[[ "$xfile" =~ ^xfile_(.*)\.nc\.dat$ ]]
# Compose the matching yfile name
yfile="yfile_${BASH_REMATCH[1]}.nc.dat"
# Compose the output text file name
txtfile="${BASH_REMATCH[1]}_final.txt"
# Perform the concatenation of xfile and yfile into the .txt file
cat "$xfile" "$yfile" >"$txtfile"
done
Creating populated test files:
preptest.sh:
#!/usr/bin/env bash
# Populating test files
echo "Content of xfile_pr_WRF_mergetime_regionA.nc.dat" >xfile_pr_WRF_mergetime_regionA.nc.dat
echo "Content of xfile_pr_GFDL_mergetime_regionA.nc.dat" >xfile_pr_GFDL_mergetime_regionA.nc.dat
echo "Content of xfile_pr_RCA_mergetime_regionA.nc.dat" >xfile_pr_RCA_mergetime_regionA.nc.dat
#
echo "Content of yfile_pr_WRF_mergetime_regionA.nc.dat" > yfile_pr_WRF_mergetime_regionA.nc.dat
echo "Content of yfile_pr_GFDL_mergetime_regionA.nc.dat" >yfile_pr_GFDL_mergetime_regionA.nc.dat
echo "Content of yfile_pr_RCA_mergetime_regionA.nc.dat" >yfile_pr_RCA_mergetime_regionA.nc.dat
#
#pr_WRF_mergetime_regionA_final.txt
#pr_GFDL_mergetime_regionA_final.txt
#pr_RCA_mergetime_regionA_final.txt
Running test
$ bash ./preptest.sh
$ bash ./concatxy.sh
$ ls -tr1
concatxy.sh
preptest.sh
yfile_pr_WRF_mergetime_regionA.nc.dat
yfile_pr_RCA_mergetime_regionA.nc.dat
yfile_pr_GFDL_mergetime_regionA.nc.dat
xfile_pr_WRF_mergetime_regionA.nc.dat
xfile_pr_RCA_mergetime_regionA.nc.dat
xfile_pr_GFDL_mergetime_regionA.nc.dat
pr_GFDL_mergetime_regionA_final.txt
pr_WRF_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
$ cat pr_GFDL_mergetime_regionA_final.txt
Content of xfile_pr_GFDL_mergetime_regionA.nc.dat
Content of yfile_pr_GFDL_mergetime_regionA.nc.dat
$ cat pr_WRF_mergetime_regionA_final.txt
Content of xfile_pr_WRF_mergetime_regionA.nc.dat
Content of yfile_pr_WRF_mergetime_regionA.nc.dat
$ cat pr_RCA_mergetime_regionA_final.txt
Content of xfile_pr_RCA_mergetime_regionA.nc.dat
Content of yfile_pr_RCA_mergetime_regionA.nc.dat

How to iterate over several files in bash

I have several files to compare. The con and ref files each contain a list of paths to .txt files that should be compared, and the output should contain the variable name of con_vs_ref_1.txt.
con:
/home/POP_xpclr/A.txt
/home/POP_xpclr/B.txt
ref:
/home/POP_xpclr/C.txt
/home/POP_xpclr/D.txt
#!/usr/bin/env bash
XPCLR="/home/Tools/XPCLR/bin/XPCLR"
CON="/home/POP_xpclr/con"
REF="/home/POP_xpclr/ref"
MAPS="/home/POP_xpclr/1"
OUTDIR="/home/POP_xpclr/Results"
$XPCLR -xpclr $CON $REF $MAPS $OUTDIR -w1 0.5 200 1000000 $MAPS -p1 0.95
Comments in code.
# create an MCVE, ie. input files:
cat <<EOF >con
/home/POP_xpclr/A.txt
/home/POP_xpclr/B.txt
EOF
cat <<EOF >ref
/home/POP_xpclr/C.txt
/home/POP_xpclr/D.txt
EOF
# join streams
paste <(
# repeat ref file times con file has lines
seq $(<con wc -l) |
xargs -i cat ref
) <(
# repeat each line from con file times ref file has lines
# from https://askubuntu.com/questions/594554/repeat-each-line-in-a-text-n-times
awk -v max=$(<ref wc -l) '{for (i = 0; i < max; i++) print $0}' con
) |
# ok, we have all combinations of lines
# now read them field by field and do whatever we want
while read -r file1 file2; do
# run the compare function
cmp "$file1" "$file2"
# probably you want something along:
"$XPCLR" -xpclr "$file1" "$file2" "$MAPS" "$OUTDIR" -w1 0.5 200 1000000 "$MAPS" -p1 0.95
done
Looping over the file paths in your con and ref files is pretty easy in bash.
As for "the output should contain the variable name of con_vs_ref_1.txt", you haven't explained what you want very well, but I'll guess that you want the file created to be named according to that formula and inside the output directory. Something like /home/POP_xpclr/Results/A_vs_C_1.txt.
#!/usr/bin/env bash
XPCLR="/home/Tools/XPCLR/bin/XPCLR"
CON="/home/POP_xpclr/con"
REF="/home/POP_xpclr/ref"
MAPS="/home/POP_xpclr/1"
OUTDIR="/home/POP_xpclr/Results"
for FILE1 in $(cat $CON)
do
for FILE2 in $(cat $REF)
do
OUTFILE="$OUTDIR/$(basename ${FILE1%.txt})_vs_$(basename ${FILE2%.txt})_1.txt"
$XPCLR -xpclr $FILE1 $FILE2 $MAPS $OUTFILE -w1 0.5 200 1000000 $MAPS -p1 0.95
done
done
What's this doing...
$(cat $CON) creates a subshell and runs cat to read your CON file, inserting the output (i.e. all the file paths) into the script at that point
for FILE1 in $(cat $CON) creates a loop where all the values read from your CON file are iterated across and assigned to the FILE1 variable one at a time.
for FILE2 in $(cat $REF) as above but with the REF file.
${FILE1%.txt} inserts the value of FILE1 variable, with ".txt" extension removed from the end. This is called parameter expansion.
$(basename ${FILE1%.txt}) makes a subshell as before, basename strips the path of all the leading directories and returns just the filename, which we have already stripped of the ".txt" extension with the parameter expansion.
OUTFILE="$OUTDIR/$(basename ${FILE1%.txt})_vs_$(basename ${FILE2%.txt})_1.txt" combines the above two dot points to create your new file path based on your formula.
do and done are parts of the for loop construct that I hope are pretty self explanatory.
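The name-building steps can be tried in isolation. The sketch below only prints the output path for each pair and does not run XPCLR. It reads the lists with while read rather than $(cat ...), which is an alternative to the answer's loop (not its method) that keeps paths containing spaces intact. The con and ref lists are recreated in a scratch directory for the demo.

```shell
workdir=$(mktemp -d)
printf '%s\n' /home/POP_xpclr/A.txt /home/POP_xpclr/B.txt > "$workdir/con"
printf '%s\n' /home/POP_xpclr/C.txt /home/POP_xpclr/D.txt > "$workdir/ref"
OUTDIR="/home/POP_xpclr/Results"

# basename NAME SUFFIX strips the leading directories and the suffix in one call.
pairs=$(
    while IFS= read -r file1; do
        while IFS= read -r file2; do
            echo "$OUTDIR/$(basename "$file1" .txt)_vs_$(basename "$file2" .txt)_1.txt"
        done < "$workdir/ref"
    done < "$workdir/con"
)
printf '%s\n' "$pairs"
```

Each con path is paired with every ref path, so two lines in each list give four output names.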

How to process tr across all files in a directory and output to a different name in another directory?

mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"
which outputs:
#.txt
ag.txt
bg.txt
bh.txt
bi.txt
bid.txt
dh.txt
dw.txt
er.txt
ha.txt
jo.txt
kc.txt
lfr.txt
lg.txt
ng.txt
pb.txt
r-c.txt
rj.txt
rw.txt
se.txt
sh.txt
vr.txt
wa.txt
is what I have so far. What is missing is the output; I get none. What I really want is to get a list of txt files, use their name up to the extension, process out the "|" and replace it with a LF/CR and put the new file in another directory as [old-name].ics. HALP. THX in advance. - Idiot me.
You can loop over the files and use sed to process the file:
for i in *.txt; do
sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done
No need to use xargs, especially with echo which would risk the filenames getting word split and having globbing apply to them, so could well do the wrong thing.
We use sed's s command to substitute | with \n; the g flag makes the replacement global, so every | on a line is replaced. The output is redirected to the other directory you want, and bash's parameter expansion ${i%.txt} strips the .txt from the end of the name before .ics is added.
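Since the question started from tr, here is a sketch of the same loop with tr doing the replacement. It also avoids relying on \n in a sed replacement, which not every sed implementation treats as a newline. The directory name ics_out and the sample file content are made up for the demo.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir ics_out
printf 'BEGIN:VCALENDAR|END:VCALENDAR\n' > ag.txt

for i in *.txt; do
    [ -e "$i" ] || continue                       # nothing matched the glob
    tr '|' '\n' < "$i" > "ics_out/${i%.txt}.ics"  # tr reads stdin, so redirect the file in
done
cat ics_out/ag.ics
```

tr only reads standard input, which is why the original pipeline of file names produced nothing useful: it was translating the names, not the file contents.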
Here's an awk solution:
$ awk '
FNR==1 { # for first record of every file
close(f) # close previous file f
f="path_to_dir/" FILENAME # new filename with path
sub(/txt$/,"ics",f) } # replace txt with ics
{
gsub(/\|/,"\n") # replace | with \n
print > f }' *.txt # print to new file

Getting different output files

I'm doing a test with these files:
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R2_001.fastq
comp995_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R2_001.fastq
I want to collect the files that share the same code up to the first _ (underscore) and contain the code R1 into separate output files. The output files should be named according to the code up to the first _ (underscore).
This is my code, but I'm having trouble creating the output files.
#!/bin/bash
for i in {900..995}; do
if [[ ${i} -eq ${i} ]]; then
cat comp${i}_*_R1_001.fastq
fi
done
I want to have two outputs:
One output will have all lines from:
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
and its name should be comp900_R1.out
The other output will have lines from:
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
and its name should be comp995_R1.out
Finally, as I said, this is a small test. I want my script to work with a lot of files that have the same characteristics.
Using awk:
ls -1 *.fastq | awk -F_ '$8 == "R1" {system("cat " $0 ">>" $1 "_R1.out")}'
List all *.fastq files and pipe the names into awk, splitting on _. If the eighth field ($8) is R1, cat appends (>>) the file to a file named after the first field ($1) plus _R1.out, which gives comp900_R1.out or comp995_R1.out. It is assumed that no filenames contain spaces or other special characters.
Result:
File comp900_R1.out containing all lines from
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
and file comp995_R1.out containing all lines from
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
My stab at a general solution:
#!/bin/bash
for f in *_R1_*; do
code=$(echo "$f" | cut -d _ -f 1)
cat "$f" >> "${code}_R1.out" # append to the per-code output file
done
Iterates over the files with _R1_ in their names, then appends each file's contents to an output file named after its code.
cut pulls out the code by splitting the filename (-d _) and returning the first field (-f 1).
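A variant sketch that extracts the code with parameter expansion instead of cut, and clears old outputs first so that a rerun does not append the same data twice. The sample files and their one-line contents are made up and created in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "read A" > comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
echo "read B" > comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
echo "read C" > comp900_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
echo "read D" > comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq

rm -f -- *_R1.out                      # start clean so reruns do not append twice
for f in *_R1_*.fastq; do
    [ -e "$f" ] || continue            # nothing matched the glob
    code=${f%%_*}                      # everything before the first underscore
    cat -- "$f" >> "${code}_R1.out"
done
cat comp900_R1.out
```

${f%%_*} removes the longest suffix starting at an underscore, which leaves only the part before the first one and saves a subshell per file.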

Resources