Rename files in a directory; new name depends on part of old name - bash

I want to rename files in a directory whose names match certain suffixes, namely 810 or 814. I want the new names to include a string selected depending on the suffix (for 814, EB_ENROLL_REQ; for 810, EB_BCHG_REQ).
Examples of the input filenames (all in $source_dir) are:
CCRD_LLX_814_20160218043477.EDI810
CCRD_LLX_814_20160218043407.EDI814
helloworld
CCRD_LLX_814_20160218043487.EDI814
test123
files.txt
CCRD_LLX_814_20160218043467.EDI810
I want to read all files in the directory and rename only the files ending with 814 or 810, ignoring the rest.
I tried:
export search_dir=/home/test2
declare -a myArray
myArray[814]=EB_ENROLL_REQ
myArray[810]=EB_BCHG_REQ
for entry in "$search_dir"/*
do
pattern=${entry: -3}
#if ??
mv "$entry" "$entry.XHS.JOBRUNID.${myArray[$pattern]}.$entry.XHE"
done
but didn't get what I need.
The output filename for an 814 file should be, for example:
CCRD_LLX_814_20160218043487.EDI814.XHS.JOBRUNID.EB_ENROLL_REQ.CCRD_LLX_814_20160218043487.EDI814.XHE

Try this:
declare -A myArray # -A, not -a: index by strings
myArray["814"]=EB_ENROLL_REQ # string suffixes, not numeric
myArray["810"]=EB_BCHG_REQ
cd "$search_dir" # Otherwise you have to strip $search_dir out of $entry
for entry in *81[04] # Only work on the files that end with 810 or 814
do
pattern=${entry: -3} # the string "810" or "814"
mv "$entry" "$entry.XHS.JOBRUNID.${myArray[$pattern]}.$entry.XHE"
done
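As a quick check of the `${entry: -3}` expansion used above (note the space before `-3`, which distinguishes it from the `:-` default-value operator), using one of the filenames from the question:

```shell
entry="CCRD_LLX_814_20160218043487.EDI814"
suffix=${entry: -3}   # last three characters of the name
echo "$suffix"        # 814
```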

Related

Rename multiple datetime files in Unix by inserting - and _ characters

I have many files in a directory that I want to rename so that they are recognizable according to a certain convention:
SURFACE_OBS:2019062200
SURFACE_OBS:2019062206
SURFACE_OBS:2019062212
SURFACE_OBS:2019062218
SURFACE_OBS:2019062300
etc.
How can I rename them in UNIX to be as follows?
SURFACE_OBS:2019-06-22_00
SURFACE_OBS:2019-06-22_06
SURFACE_OBS:2019-06-22_12
SURFACE_OBS:2019-06-22_18
SURFACE_OBS:2019-06-23_00
A bash shell loop using mv and parameter expansion could do it:
for file in *:[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]]
do
prefix=${file%:*}
suffix=${file#*:}
mv -- "${file}" "${prefix}:${suffix:0:4}-${suffix:4:2}-${suffix:6:2}_${suffix:8:2}"
done
This loop picks up every file that matches the pattern:
* -- anything
: -- a colon
[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]] -- 10 digits
... and then renames it by inserting dashes and an underscore in the desired locations.
I've chosen the wildcard for the loop carefully so that it matches the "input" files and not the renamed files (once renamed, a file no longer has 10 consecutive digits after the colon, so it cannot be picked up and renamed a second time). Adjust the pattern as needed if your actual filenames have edge cases that defeat it.
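The renaming step can be checked on a single name before running the loop; this sketch uses one of the filenames from the question:

```shell
file="SURFACE_OBS:2019062200"
prefix=${file%:*}   # everything before the colon: SURFACE_OBS
suffix=${file#*:}   # everything after the colon:  2019062200
newname="${prefix}:${suffix:0:4}-${suffix:4:2}-${suffix:6:2}_${suffix:8:2}"
echo "$newname"     # SURFACE_OBS:2019-06-22_00
```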
#!/bin/bash
strindex() {
# get position of character in string
x="${1%%"$2"*}"
[[ "$x" = "$1" ]] && echo -1 || echo "${#x}"
}
get_new_filename() {
# change filenames like: SURFACE_OBS:2019062218
# into filenames like: SURFACE_OBS:2019-06-22_18
src_str="${1}"
# add last underscore 2 characters from end of string
final_underscore_pos=${#src_str}-2
src_str="${src_str:0:final_underscore_pos}_${src_str:final_underscore_pos}"
# get position of colon in string
colon_pos=$(strindex "${src_str}" ":")
# get dash locations relative to colon position
y_dash_pos=${colon_pos}+5
m_dash_pos=${colon_pos}+8
# now add dashes in date
src_str="${src_str:0:y_dash_pos}-${src_str:y_dash_pos}"
src_str="${src_str:0:m_dash_pos}-${src_str:m_dash_pos}"
echo "${src_str}"
}
# accept path as argument or default to /tmp/baz/data
target_dir="${1:-/tmp/baz/data}"
while read -r line ; do
# since file renaming depends on position of colon extract
# base filename without path in case path has colons
base_dir=${line%/*}
filename_to_change=$(basename "${line}")
echo "mv ${line} ${base_dir}/$(get_new_filename "${filename_to_change}")"
# find cmd attempts to exclude files that have already been renamed
done < <(find "${target_dir}" -name 'SURFACE*' -a ! -name '*_[0-9][0-9]')
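The `strindex` helper can be exercised standalone; it prints the 0-based position of the first occurrence of the substring, or -1 if there is none:

```shell
strindex() {
  # get position of a substring within a string
  x="${1%%"$2"*}"
  [[ "$x" = "$1" ]] && echo -1 || echo "${#x}"
}
strindex "SURFACE_OBS:2019062218" ":"   # 11
strindex "SURFACE_OBS:2019062218" "@"   # -1
```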

How to pack csv files having same prefix name into zip in bash

Assume that I have many csv files located in /home/user/test:
123_24112021_DONG.csv
122_24112021_DONG.csv
145_24112021_DONG.csv
123_24112021_FINA.csv
122_24112021_FINA.csv
145_24112021_FINA.csv
123_24112021_INDEM.csv
122_24112021_INDEM.csv
145_24112021_INDEM.csv
As you can see, all files have three unique prefixes:
145
123
122
And I need to create one zip per prefix, containing the matching csv files. Note that in reality I don't know the number of csv files; this is just an example (3 csv files per prefix).
I developed code that extracts the prefixes from all the csv names into a bash array:
for entry in "$search_dir"/*
do
# extract csv files
f1=${entry##*/}
echo $f1
# extract prefix of each file
f2=${f1%%_*}
echo $f2
# add prefix in table
liste_sirets+=($f2)
done
# get uniq prefix in unique_sorted_list
unique_sorted_list=($(printf "%s\n" "${liste_sirets[@]}" | sort -u ))
echo $unique_sorted_list
which give the result :
145
123
122
Now I want to zip each group of three files, defined by their prefix, into the same zip file:
In other word, create 123_24112021_M2.zip which will contains
123_24112021_DONG.csv
123_24112021_FINA.csv
123_24112021_INDEM.csv
and 122_24112021_M2.zip 145_24112021_M2.zip ...
So I developed a loop that iterates over each prefix of the csv files located in the local path, then zips all files having the same prefix:
for i in $unique_sorted_list
do
for j in "$search_dir"/*
do
if $(echo $j| cut -d'_' -f1)==$i
zip -jr $j
done
But it does not work. Any help please, thank you!
Using bash and shell utilities:
#!/bin/bash
printf '%s\n' *_*.csv | cut -d_ -f1 | uniq |
while read -r prefix
do
zip "$prefix".zip "$prefix"_*.csv
done
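A quick sanity check of the prefix extraction, using literal names from the question instead of a glob (`sort -u` is the safe choice here, since `uniq` alone only collapses adjacent duplicates and relies on the glob expansion being sorted):

```shell
prefixes=$(printf '%s\n' 123_24112021_DONG.csv 122_24112021_FINA.csv 123_24112021_INDEM.csv \
  | cut -d_ -f1 | sort -u)
echo "$prefixes"   # 122 and 123, one per line
```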
Update:
It is also requested to group files by date (the second part of the filename):
#!/bin/bash
printf '%s\n' *_*_*.csv | cut -d_ -f2 | sort -u |
while read -r date
do
zip "$date".zip ./*_"$date"_*.csv
done
Using bash 4+ associative arrays:
# declare an associative array
declare -A unq
# store unique prefixes in array unq
for f in *_*.csv; do
unq["${f%%_*}"]=1
done
# iterate through unq and create zip files
for i in "${!unq[#]}"; do
zip "$i" "${i}_"*
done
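A sketch of the same idea with literal names from the question, confirming that the associative-array keys end up being the unique prefixes (note `${!unq[@]}` lists keys in no guaranteed order):

```shell
declare -A unq
for f in 123_24112021_DONG.csv 122_24112021_FINA.csv 123_24112021_INDEM.csv; do
  unq["${f%%_*}"]=1   # key = text before the first underscore
done
echo "${#unq[@]} unique prefixes"   # 2 unique prefixes
```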

BASH: File sorting according to file name

I need to sort 12000 files into 1000 groups according to their names, and create a folder for each group containing that group's files. The name of each file is given in a multi-column format (with _ as separator), where the second column varies from 1 to 12 (the number of the part) and the last column ranges from 1 to 1000 (the number of the system), indicating that 1000 different systems (last column) were initially split into 12 separate parts (second column).
Here is an example for a small subset based on 3 systems divided into 12 parts, 36 files in total.
7000_01_lig_cne_1.dlg
7000_02_lig_cne_1.dlg
7000_03_lig_cne_1.dlg
...
7000_12_lig_cne_1.dlg
7000_01_lig_cne_2.dlg
7000_02_lig_cne_2.dlg
7000_03_lig_cne_2.dlg
...
7000_12_lig_cne_2.dlg
7000_01_lig_cne_3.dlg
7000_02_lig_cne_3.dlg
7000_03_lig_cne_3.dlg
...
7000_12_lig_cne_3.dlg
I need to group these files across the second column of their names (01, 02, 03 .. 12), thus creating 1000 folders, one per system, each containing that system's 12 files, in the following manner:
Folder1, name: 7000_lig_cne_1, contains 12 files: 7000_{01 to 12}_lig_cne_1.dlg
Folder2, name: 7000_lig_cne_2, contains 12 files: 7000_{01 to 12}_lig_cne_2.dlg
...
Folder1000, name: 7000_lig_cne_1000, contains 12 files: 7000_{01 to 12}_lig_cne_1000.dlg
Assuming that all *.dlg files are present within the same dir, I propose a bash loop workflow which only lacks some sorting function (sed, awk??), organized in the following manner:
#set the name of folder with all DLG
home=$PWD
FILES=${home}/all_DLG/7000_CNE
# set the name of protein and ligand library to analyse
experiment="7000_CNE"
#name of the output
output=${home}/sub_folders_to_analyse
#now here all magic comes
rm -r ${output}
mkdir ${output}
# sed sollution
for i in ${FILES}/*.dlg # define this better to suit your needs
do
n=$( <<<"$i" sed 's/.*[^0-9]\([0-9]*\)\.dlg$/\1/' )
# move the file to proper dir
mkdir -p ${output}/"${experiment}_lig$n"
cp "$i" ${output}/"${experiment}_lig$n"
done
! Note: here I set the beginning of each folder name to ${experiment}, and append the number from the final column, $n, at the end. Would it be possible to derive the name of each new folder automatically from the names of the copied files instead? Manually it could be achieved by skipping the second column in the name of the folder:
cp ./all_DLG/7000_*_lig_cne_987.dlg ./output/7000_lig_cne_987
Iterate over files. Extract the destination directory name from the filename. Move the file.
for i in *.dlg; do
# extract last number with your favorite tool
n=$( <<<"$i" sed 's/.*[^0-9]\([0-9]*\)\.dlg$/\1/' )
# move the file to proper dir
echo mkdir -p "folder$n"
echo mv "$i" "folder$n"
done
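The sed extraction of the trailing number can be verified on one name before running the loop:

```shell
i="7000_12_lig_cne_987.dlg"
# greedy .* plus a final non-digit pins the capture to the last run of digits
n=$(sed 's/.*[^0-9]\([0-9]*\)\.dlg$/\1/' <<<"$i")
echo "$n"   # 987
```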
Notes:
Do not use upper case variables in your scripts. Use lower case variables.
Remember to quote variables expansions.
Check your scripts with http://shellcheck.net
Tested on repl
update: for OP's folder-naming convention:
for i in *.dlg; do
base=${i%.dlg}   # drop the extension so it does not end up in the folder name
foldername="$HOME/output/${base%%_*}_${base#*_*_}"
echo mkdir -p "$foldername"
echo mv "$i" "$foldername"
done
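The two expansions can be checked in isolation; stripping the `.dlg` extension first keeps it out of the directory name (a detail worth testing, since `${i#*_*_}` applied to the full filename would leave the extension in):

```shell
i="7000_05_lig_cne_42.dlg"
base=${i%.dlg}                          # 7000_05_lig_cne_42
foldername="${base%%_*}_${base#*_*_}"   # drop the second column
echo "$foldername"                      # 7000_lig_cne_42
```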
This might work for you (GNU parallel):
ls *.dlg |
parallel --dry-run 'd={=s/^(7000_).*(lig.*)\.dlg/$1$2/=};mkdir -p $d;mv {} $d'
Pipe the output of ls command listing files ending in .dlg to parallel, which creates directories and moves the files to them.
Run the solution as is, and when satisfied the output of the dry run is ok, remove the option --dry-run.
The solution could be one instruction:
parallel 'd={=s/^(7000_).*(lig.*)\.dlg/$1$2/=};mkdir -p $d;mv {} $d' ::: *.dlg
Using POSIX shell's built-in grammar only and sort:
#!/usr/bin/env sh
curdir=
# Create list of files with newline
# Safe since we know there is no special
# characters in name
printf -- %s\\n *.dlg |
# Sort the list by 5th key with _ as field delimiter
sort -t_ -k5 |
# Iterate reading the _ delimited fields of the sorted list
while IFS=_ read -r _ _ c d e; do
# Compose the new directory name
newdir="${c}_${d}_${e%.dlg}"
# If we enter a new group / directory
if [ "$curdir" != "$newdir" ]; then
# Make the new directory current
curdir="$newdir"
# Create the new directory
echo mkdir -p "$curdir"
# Move all its files into it
echo mv -- *_"$curdir.dlg" "$curdir/"
fi
done
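The field split driving the loop above can be checked on a single line; `IFS=_ read` assigns the underscore-separated fields, and the first two (7000 and the part number) are deliberately discarded:

```shell
IFS=_ read -r a b c d e <<'EOF'
7000_01_lig_cne_1.dlg
EOF
newdir="${c}_${d}_${e%.dlg}"
echo "$newdir"   # lig_cne_1
```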
Optionally as a sort and xargs arguments stream:
printf -- %s\\n * |
sort -u -t_ -k5 |
xargs -n1 sh -c
'd="lig_cne_${0##*_}"
d="${d%.dlg}"
echo mkdir -p "$d"
echo mv -- *"_$d.dlg" "$d/"
'
Here is a very simple awk script that does the trick in a single sweep.
script.awk
BEGIN{FS="[_.]"} # make field separator "_" or "."
{ # for each filename
dirName=$1"_"$3"_"$4"_"$5; # compute the target dir name from fields
sysCmd = "mkdir -p " dirName"; cp "$0 " "dirName; # prepare bash command
system(sysCmd); # run bash command
}
running script.awk
ls -1 *.dlg | awk -f script.awk
oneliner awk script
ls -1 *.dlg | awk 'BEGIN{FS="[_.]"}{d=$1"_"$3"_"$4"_"$5;system("mkdir -p "d"; cp "$0 " "d);}'
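The awk field split can be verified in isolation: with FS set to underscore-or-dot, $1..$5 of a sample name are 7000, 01, lig, cne, and the system number, so joining $1, $3, $4, $5 yields the target directory name:

```shell
d=$(echo "7000_01_lig_cne_1.dlg" | awk 'BEGIN{FS="[_.]"}{print $1"_"$3"_"$4"_"$5}')
echo "$d"   # 7000_lig_cne_1
```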

Cat content of files to .txt files with common pattern name in bash

I have a series of .dat files and a series of .txt files that have a common matching pattern. I want to cat the content of the .dat files into each respective .txt file with the matching pattern in the file name, in a loop. Example files are:
xfile_pr_WRF_mergetime_regionA.nc.dat
xfile_pr_GFDL_mergetime_regionA.nc.dat
xfile_pr_RCA_mergetime_regionA.nc.dat
#
yfile_pr_WRF_mergetime_regionA.nc.dat
yfile_pr_GFDL_mergetime_regionA.nc.dat
yfile_pr_RCA_mergetime_regionA.nc.dat
#
pr_WRF_mergetime_regionA_final.txt
pr_GFDL_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
What I have tried so far is the following (I am trying to cat the content of all files starting with "xfile" to the respective model .txt file):
#
find -name 'xfile*' | sed 's/_mergetime_.*//' | sort -u | while read -r pattern
do
echo "${pattern}"*
cat "${pattern}"* >> "${pattern}".txt
done
Let me make some assumptions:
All filenames contain _mergetime_* substring.
The pattern is the portion such as pr_GFDL and is essential to
identify the file.
Then would you try the following:
declare -A map # create an associative array
for f in xfile_*.dat; do # loop over xfile_* files
pattern=${f%_mergetime_*} # remove _mergetime_* substring to extract pattern
pattern=${pattern#xfile_} # remove xfile_ prefix
map[$pattern]=$f # associate the pattern with the filename
done
for f in *.txt; do # loop over *.txt files
pattern=${f%_mergetime_*} # extract the pattern
[[ -f ${map[$pattern]} ]] && cat "${map[$pattern]}" >> "$f"
done
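The two-step pattern extraction can be checked on one of the sample names:

```shell
f="xfile_pr_WRF_mergetime_regionA.nc.dat"
pattern=${f%_mergetime_*}   # xfile_pr_WRF
pattern=${pattern#xfile_}   # pr_WRF
echo "$pattern"
```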
If I understood you correctly, you want the following:
- xfile_pr_WRF_mergetime_regionA.nc.dat
- yfile_pr_WRF_mergetime_regionA.nc.dat
----> pr_WRF_mergetime_regionA_final.txt
- xfile_pr_GFDL_mergetime_regionA.nc.dat
- yfile_pr_GFDL_mergetime_regionA.nc.dat
----> pr_GFDL_mergetime_regionA_final.txt
- xfile_pr_RCA_mergetime_regionA.nc.dat
- yfile_pr_RCA_mergetime_regionA.nc.dat
----> pr_RCA_mergetime_regionA_final.txt
So here's what you want to do in the script:
Get all .nc.dat files in the directory
Extract the pr_TYPE_mergetime_region part from the filename
Append the _final.txt part to the output file
Then actually pipe the cat output onto that file
So I ended up with the following code:
find *.dat | while read -r pattern
do
output=$(echo $pattern | sed -e 's![^(pr)]*!!' -e 's!.nc.dat!!')
cat $pattern >> "${output}_final.txt"
done
And here are the files I ended up with:
pr_GFDL_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
pr_WRF_mergetime_regionA_final.txt
Kindly let me know in the comments if I misunderstood anything or missed anything.
Seems like what you ask for:
concatxy.sh:
#!/usr/bin/env bash
# do not return the pattern if no file matches
shopt -s nullglob
# Iterate all xfiles
for xfile in "xfile_pr_"*".nc.dat"; do
# Regex to extract the common filename part
[[ "$xfile" =~ ^xfile_(.*)\.nc\.dat$ ]]
# Compose the matching yfile name
yfile="yfile_${BASH_REMATCH[1]}.nc.dat"
# Compose the output text file name
txtfile="${BASH_REMATCH[1]}_final.txt"
# Perform the concatenation of xfile and yfile into the .txt file
cat "$xfile" "$yfile" >"$txtfile"
done
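The regex capture can be checked on one name; after a successful `[[ ... =~ ... ]]` match, BASH_REMATCH[1] holds the first parenthesized group:

```shell
xfile="xfile_pr_GFDL_mergetime_regionA.nc.dat"
[[ "$xfile" =~ ^xfile_(.*)\.nc\.dat$ ]]
common="${BASH_REMATCH[1]}"
echo "yfile_${common}.nc.dat"   # yfile_pr_GFDL_mergetime_regionA.nc.dat
echo "${common}_final.txt"      # pr_GFDL_mergetime_regionA_final.txt
```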
Creating populated test files:
preptest.sh:
#!/usr/bin/env bash
# Populating test files
echo "Content of xfile_pr_WRF_mergetime_regionA.nc.dat" >xfile_pr_WRF_mergetime_regionA.nc.dat
echo "Content of xfile_pr_GFDL_mergetime_regionA.nc.dat" >xfile_pr_GFDL_mergetime_regionA.nc.dat
echo "Content of xfile_pr_RCA_mergetime_regionA.nc.dat" >xfile_pr_RCA_mergetime_regionA.nc.dat
#
echo "Content of yfile_pr_WRF_mergetime_regionA.nc.dat" > yfile_pr_WRF_mergetime_regionA.nc.dat
echo "Content of yfile_pr_GFDL_mergetime_regionA.nc.dat" >yfile_pr_GFDL_mergetime_regionA.nc.dat
echo "Content of yfile_pr_RCA_mergetime_regionA.nc.dat" >yfile_pr_RCA_mergetime_regionA.nc.dat
#
#pr_WRF_mergetime_regionA_final.txt
#pr_GFDL_mergetime_regionA_final.txt
#pr_RCA_mergetime_regionA_final.txt
Running test
$ bash ./preptest.sh
$ bash ./concatxy.sh
$ ls -tr1
concatxy.sh
preptest.sh
yfile_pr_WRF_mergetime_regionA.nc.dat
yfile_pr_RCA_mergetime_regionA.nc.dat
yfile_pr_GFDL_mergetime_regionA.nc.dat
xfile_pr_WRF_mergetime_regionA.nc.dat
xfile_pr_RCA_mergetime_regionA.nc.dat
xfile_pr_GFDL_mergetime_regionA.nc.dat
pr_GFDL_mergetime_regionA_final.txt
pr_WRF_mergetime_regionA_final.txt
pr_RCA_mergetime_regionA_final.txt
$ cat pr_GFDL_mergetime_regionA_final.txt
Content of xfile_pr_GFDL_mergetime_regionA.nc.dat
Content of yfile_pr_GFDL_mergetime_regionA.nc.dat
$ cat pr_WRF_mergetime_regionA_final.txt
Content of xfile_pr_WRF_mergetime_regionA.nc.dat
Content of yfile_pr_WRF_mergetime_regionA.nc.dat
$ cat pr_RCA_mergetime_regionA_final.txt
Content of xfile_pr_RCA_mergetime_regionA.nc.dat
Content of yfile_pr_RCA_mergetime_regionA.nc.dat

awk to add extracted prefix from file to filename

The awk below executes as is, but it renames fields within each file that matches $p (extracted from each text file) instead of adding the prefix $x (from $1 of rename) to each filename in the directory. Each $x should be followed by a _ and then the filename. I can see from echo $p that the correct lookup value for $2 is extracted, but every file in the directory is left unchanged. Not every entry in rename will have a file in the directory, but each file will always have a match in $p. Maybe there is a better way, as I am not sure what I am doing wrong. Thank you :).
rename (tab-delimited)
00-0000 File-01
00-0001 File-02
00-0002 File-03
00-0003 File-04
file1
File-01_xxxx.txt
file2
File-02_yyyy.txt
desired output
00-0000_File-01-xxxx.txt
00-0001_File-02-yyyy.txt
bash
for file1 in /path/to/folders/*.txt
do
# Grab file prefix
bname=`basename $file1` # strip off the path
p="$(echo $bname|cut -d_ -f1,1)" # keep the part before the first underscore
echo $p
# add prefix to matching file
awk -v var="$p" '$2~var{x=$1}(NR=x){print $x"_",$bname}' $file1 rename OFS="\t" > tmp && mv tmp $file1
done
This script :
touch File-01-azer.txt
touch File-02-ytrf.txt
touch File-03-fdfd.txt
touch File-04-dfrd.txt
while read p f;
do
f=$(ls $f*)
mv ${f} "${p}_${f}"
done << EEE
00-0000 File-01
00-0001 File-02
00-0002 File-03
00-0003 File-04
EEE
ls -1
outputs :
00-0000_File-01-azer.txt
00-0001_File-02-ytrf.txt
00-0002_File-03-fdfd.txt
00-0003_File-04-dfrd.txt
You can feed the mapping from a file instead of the here-document, using done < rename_map.txt or cat rename_map.txt | while ...
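A minimal sketch of the same read loop, printing the new names instead of running mv so the mapping can be checked before touching any files:

```shell
renames=$(while read -r p f; do
  echo "${p}_${f}"
done <<'EEE'
00-0000 File-01
00-0001 File-02
EEE
)
echo "$renames"
```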
