Find groups of files that end with the same 17 characters - bash

I'm grabbing files that have a unique part and a common pattern, and I'm trying to match on the common part. Currently I'm trying with bash, but I can use Python or whatever.
file1_02_01_2021_002244.mp4
file2_02_01_2021_002244.mp4
file3_02_01_2021_002244.mp4
# _02_01_2021_002244.mp4 should be the 'match all files that contain this string'
file1_03_01_2021_092200.mp4
file2_03_01_2021_092200.mp4
file3_03_01_2021_092200.mp4
# _03_01_2021_092200.mp4 is the match
...
file201_01_01_2022_112230.mp4
file202_01_01_2022_112230.mp4
file203_01_01_2022_112230.mp4
# _01_01_2022_112230.mp4 is the match
The goal is to find all files that match from the very end of the filename back to the first unique character, then move them into a folder. The actionable part will be easy; I just need help with the matching.
find -type f $("all that match the same last 17 characters of the file name"); do
do things
done
This is my example directory:
total 28480
drwxr-xr-x 2 user user 64B Feb 24 10:49 dir1
drwxr-xr-x 2 user user 64B Feb 24 10:49 dir2
-rw-r--r-- 2 user user 6.8M Feb 24 08:59 file1_02_01_2021_002244.mp4
-rw-r--r-- 2 user user 468K Feb 24 09:06 file1_03_01_2021_092200.mp4
-rw-r--r-- 2 user user 4.5M Feb 24 08:59 file2_02_01_2021_002244.mp4
-rw-r--r-- 2 user user 665K Feb 24 09:06 file2_03_01_2021_092200.mp4
-rw-r--r-- 1 user user 0B Feb 24 10:49 otherfile1
-rw-r--r-- 1 user user 0B Feb 24 10:49 otherfile2
I've got it to work with suggestions from the answer marked as correct. The Python method could probably work better (especially with file names that have spaces in them), but I'm not proficient enough with Python to make it do everything I want. The script in full is below:
#!/usr/local/bin/bash
# this is my solution
# create array with patterns
aPATTERN=($(find . -type f -name "*.mp4" | sed 's/^[^_]*//' | sort -u))
# iterate through all patterns, do things
for each in "${aPATTERN[@]}"; do
    # create a temp working directory for files that match the pattern
    vDIR=$(gmktemp -d -p "$(pwd)")
    # create array of all files found matching the pattern
    aFIND+=($(find . -mindepth 1 -maxdepth 1 -type f -iname \*"$each"))
    # move all files that match the pattern to the working temp directory
    for file in "${aFIND[@]}"; do
        mv -iv "$file" "$vDIR"
    done
    # reset the found files array, get ready for next pattern
    aFIND=()
done
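Since file names with spaces were a concern: a variant that avoids word-splitting entirely is to read the find output with -print0 and slice off the last 22 characters of each name with parameter expansion (the same 22-character suffix the Python answer below slices). A minimal sketch, assuming bash 4+ and GNU find; the by_suffix/ destination directory is just an example name:

#!/usr/local/bin/bash
# group files by the trailing _DD_MM_YYYY_HHMMSS.mp4 part of the name
while IFS= read -r -d '' file; do
    name=${file##*/}        # strip the leading ./ path
    suffix=${name: -22}     # last 22 characters, e.g. _02_01_2021_002244.mp4
    mkdir -p "by_suffix/$suffix"
    mv -iv "$file" "by_suffix/$suffix/"
done < <(find . -mindepth 1 -maxdepth 1 -type f -name '*.mp4' -print0)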

In Python:
import os

os.chdir("folder_path")
data = [[file[-22:], file] for file in os.listdir()]
output = {}
for pattern, filename in data:
    output.setdefault(pattern, []).append(filename)
print(output)
This will create a dict mapping each pattern to the list of file names that end with it.
Output:
{
'_03_01_2021_092200.mp4': ['file1_03_01_2021_092200.mp4', 'file3_03_01_2021_092200.mp4', 'file2_03_01_2021_092200.mp4'],
'_01_01_2022_112230.mp4': ['file202_01_01_2022_112230.mp4', 'file201_01_01_2022_112230.mp4', 'file203_01_01_2022_112230.mp4'],
'_02_01_2021_002244.mp4': ['file1_02_01_2021_002244.mp4', 'file2_02_01_2021_002244.mp4', 'file3_02_01_2021_002244.mp4']
}

Try playing with this.
First get all patterns, sorted and unique:
find ./data -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u
or, with a regex:
find ./data -type f -regextype sed -regex '.*_[0-9]\{2\}_[0-9]\{2\}_[0-9]\{4\}_[0-9]\{6\}\.mp4$'| sed 's/^[^_]*//'|sort -u
Then iterate over the patterns in a while loop to find the files for each pattern:
while read -r pattern
do
    # find and exec
    find ./data -type f -name "*$pattern" -exec mv {} /to/whatever/you/want/ \;
    # or find and xargs
    find ./data -type f -name "*$pattern" | xargs -I {} mv {} /to/whatever/you/want/
done < <(find ./data -type f -name "*.mp4" | sed 's/^[^_]*//' | sort -u)
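If the destination should be one folder per pattern instead of a single fixed path, the same loop can create it on the fly. A sketch along the same lines; the grouped/ parent directory is only an example name:

while read -r pattern
do
    dest="grouped/${pattern#_}"    # e.g. grouped/02_01_2021_002244.mp4
    mkdir -p "$dest"
    find ./data -type f -name "*$pattern" -exec mv {} "$dest" \;
done < <(find ./data -type f -name "*.mp4" | sed 's/^[^_]*//' | sort -u)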

There are several ways to approach this, including writing a bash script, but if it were me, I'd take the quick and easy road. Use grep and read:
PATTERN=_02_01_2021_002244.mp4
find . -name '*.mp4' | grep "$PATTERN" | while read -r A; do echo "$A"; done
There are probably better ways that I haven't thought of but this gets the job done.

Try this:
#!/bin/bash
while IFS= read -r line
do
    if [[ "$line" == *_+([0-9])_+([0-9])_+([0-9])_+([0-9])\.mp4 ]]
    then
        echo "MATCH: $line"
    else
        echo "no match: $line"
    fi
done < <(/bin/ls -c1)
Remember that it uses globbing, not regex, when you build your pattern.
That is why I did not use [0-9]{2} to match 2 digits: {} does not do that in globbing the way it does in regex.
To use regex, use:
#!/bin/bash
while IFS= read -r line
do
    if [[ $(echo "$line" | grep -cE '_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$') -ne 0 ]]
    then
        echo "MATCH: $line"
    else
        echo "no match: $line"
    fi
done < <(/bin/ls -c1)
This is a more precise match since you can specify how many digits to accept in each sub-pattern.
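The same check can also be done without spawning grep for every line, using bash's built-in =~ operator; a sketch:

#!/bin/bash
while IFS= read -r line
do
    if [[ $line =~ _[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$ ]]
    then
        echo "MATCH: $line"
    else
        echo "no match: $line"
    fi
done < <(/bin/ls -c1)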

Related

Find folder position number in folder listing

Suppose I have a /home folder with these sub-folders:
/home/alex
/home/luigi
/home/marta
I can list and count all sub-folders in a folder like this:
ls 2>/dev/null -Ubad1 -- /home/* | wc -l
3
But I need to find the position (2 in this example) when the folder (or basename) is luigi.
Is this possible in bash?
After William Pursell's comment, this does the job:
ls 2>/dev/null -Ubad1 -- /home/* | awk '/luigi$/{print NR}'
2
Note the $ at the end; it avoids false matches between names like joe and joey.
Thanks.
You could just use a bash shell loop over the wildcard expansion, keeping an index as you go and reporting the index when the base directory name matches:
index=1
for dir in /home/*
do
    if [[ "$dir" =~ /luigi$ ]]
    then
        echo $index
        break
    fi
    ((index++))
done
This reports the position (among the expansion of /home/*) of the "luigi" directory -- anchored with the directory separator / and the end of the line $.
$ find /home/ -mindepth 1 -maxdepth 1 -type d | awk -F/ '$NF=="luigi" {print NR}'
2
$ find /home/ -mindepth 1 -maxdepth 1 -type d | awk -F/ '$NF=="alex" {print NR}'
1
Better to use find to get the subfolders into an array you can iterate with indexes:
subfolders=($(find /home -mindepth 1 -maxdepth 1 -type d))
find returns the full path, so if you just need the basename you can strip the leading directories. Note that bash arrays are 0-based, so luigi (position 2) is at index 1:
luigi_folder=${subfolders[1]##*/}
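With the array in hand you can also search it by index to get the position directly; a sketch (bash arrays are 0-based, so 1 is added to report the same 1-based position as the other answers; note that find's order may differ from ls -U's):

subfolders=($(find /home -mindepth 1 -maxdepth 1 -type d))
for i in "${!subfolders[@]}"; do
    if [[ ${subfolders[$i]##*/} == luigi ]]; then
        echo $((i + 1))
        break
    fi
done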

How to change multiple directories' name by computation on the numbers in the names?

I have some directories. Their names are as follows,
s1_tw
s2_tw
s3_tw
s4_tw
How do I change their names by adding a fixed integer to the number following "s"? How can I change the directories' names to
s22_tw
s23_tw
s24_tw
s25_tw
by changing s1 to s22 (1 + 21 = 22), s2 to s23, etc? (Here adding 21 is expected)
I tried with
for f in s*_tw
do
    for i in `seq 1 1 4`
    do
        mv -i "${f//s${i}/s(${i}+21)}"
    done
done
But I know it is not correct, because I cannot perform the addition in this command. Could you please give some suggestions?
This will rename your directories:
#!/bin/bash
find . -maxdepth 1 -type d -name "s?_tw" -print0 | while IFS= read -r -d '' dir
do
    digit=$(echo "$dir" | sed 's#./s\([0-9]\)_tw#\1#')
    echo "DIGIT=$digit"
    (( newdigit = digit + 21 ))
    echo "NEWDIGIT=$newdigit"
    mv "$dir" "s${newdigit}_tw"
done
The find -print0 with while and read comes from https://mywiki.wooledge.org/BashFAQ/001.
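If the numbers can ever grow past one digit, a variant that extracts the number with parameter expansion instead of a single-digit sed capture needs only small changes; a sketch, assuming names of the form s<number>_tw with no leading zeros:

#!/bin/bash
find . -maxdepth 1 -type d -name "s[0-9]*_tw" -print0 | while IFS= read -r -d '' dir
do
    num=${dir#./s}      # strip the leading ./s
    num=${num%_tw}      # strip the trailing _tw, leaving just the number
    mv "$dir" "s$((num + 21))_tw"
done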

Copy file into directories, and change text inside file to match index of directory

I have the following files in a directory: text.txt and text2.txt
My goal is to:
1) copy these two files into non-existing directories m06/, m07/...m20/.
2) Then, in the file text.txt, in the line containing the string mlist 06 (all the files will contain such a string), I wish to change the "06" to match the index of the directory name (for example, in m13, that line in the text.txt file would be mlist 13).
For goal 1), I got the following script, which works successfully:
#!/bin/bash
mkdir $(printf "m%02i " $(seq 6 20))
find . -maxdepth 1 -type d -name 'm[0-9][0-9]' -print0 | xargs -0 -I {} cp text.txt {}
find . -maxdepth 1 -type d -name 'm[0-9][0-9]' -print0 | xargs -0 -I {} cp text2.txt {}
For goal 2), I wish to implement a command similar to
sed -i -e '/mlist/s/06/index/' ./*/text.inp
where index would correspond to the name of the directory (i.e. index = 13 in the m13/ directory).
How can I make the sed command replace 06 with the correct "index" corresponding to the name of the directory?
This would probably be easier to manage if you used loop syntax instead of one-liners:
#!/bin/sh
for i in $(seq 6 20); do
    # Add a leading 0 and generate the directory name
    i_z=$(printf "%02d" "$i")
    dir="m${i_z}"
    # Create dir
    mkdir -p "$dir"
    # Copy base files into dir
    cp text.txt text2.txt "$dir"
    # Edit the index in the files to match the dir index
    sed -i -e "s/mlist.*/mlist $i_z/g" \
        "${dir}/text.txt" "${dir}/text2.txt"
done
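If other text can follow the number on the mlist line and needs to be preserved, a narrower substitution that only touches the number also works, since every copy starts from the same base file containing mlist 06; for example, the sed line in the loop above could be replaced with:

sed -i -e "s/mlist 06/mlist ${i_z}/" "${dir}/text.txt" "${dir}/text2.txt"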

BASH script more smart with cat

I have multiple files in multiple folders
[tiagocastro#cascudo clean_reads]$ ls
11 13 14 16 17 18 3 4 5 6 8 9
and I want to make a tiny bash script to concatenate the files inside each one:
11]$ ls
FCC4UE9ACXX-HUMcqqTAAFRAAPEI-206_L6_1.fq FCC4UE9ACXX-HUMcqqTAAFRAAPEI-206_L7_1.fq
FCC4UE9ACXX-HUMcqqTAAFRAAPEI-206_L6_2.fq FCC4UE9ACXX-HUMcqqTAAFRAAPEI-206_L7_2.fq
But only L6 with L6 and L7 with L7
Right now I am at a basic level. I want to learn how to do this more smartly, instead of just reproducing in the script the commands I would type in the terminal.
Thank you, everybody, for helping me.
This isn't a free programming service, but you can learn something from the following:
#!/bin/bash
echo2() { echo "$@" >&2; }

get_Lnums() {
    find . -maxdepth 1 -type f -regextype posix-extended -iregex '.*_L[0-9]+_[0-9]+\.fq' -print | grep -oP '_\KL\d+' | sort -u
}

docat() {
    echo2 doing "$(pwd)"
    for lnum in $(get_Lnums)
    do
        echo cat *_${lnum}_*.fq "> new_${lnum}.all"   # remove (comment out) this line when satisfied
        #cat *_${lnum}_*.fq > new_${lnum}.all          # and uncomment this
    done
}

while read -r -d $'\0' dir
do
    (cd "$dir" && docat)   # subshell - no need to cd back
done < <(find . -mindepth 1 -maxdepth 1 -type d -print0)
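If the lane names are known in advance (only L6 and L7, as in the listing above), a much shorter loop does the same job; a sketch to run from the clean_reads directory, assuming every sample folder contains at least one .fq file per lane:

#!/bin/bash
for dir in */; do
    (
        cd "$dir" || exit
        for lane in L6 L7; do
            cat *_"${lane}"_*.fq > "new_${lane}.all"
        done
    )
done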

Recursively count specific files BASH

My goal is to write a script to recursively search through the current working directory and its subdirectories and print out a count of the ordinary files, a count of the directories, a count of block special files, a count of character special files, a count of FIFOs, and a count of symbolic links. I have to use condition tests with [[ ]]. The problem is I am not quite sure how to even start.
I tried something like the following to search for all ordinary files, but I'm not sure how recursion works in bash scripting:
function searchFiles(){
    if [[ -f /* ]]; then
        return 1
    fi
}
searchFiles
echo "Number of ordinary files $?"
but I get 0 as a result. Can anyone help with this?
Why would you not use find?
$ # Files
$ find . -type f | wc -l
327
$ # Directories
$ find . -type d | wc -l
64
$ # Block special
$ find . -type b | wc -l
0
$ # Character special
$ find . -type c | wc -l
0
$ # named pipe
$ find . -type p | wc -l
0
$ # symlink
$ find . -type l | wc -l
0
Something to get you started:
#!/bin/bash
directory=0
file=0
total=0
for a in *
do
    if test -d "$a"; then
        directory=$((directory+1))
    else
        file=$((file+1))
    fi
    total=$((total+1))
    echo "$a"
done
echo "Total directories: $directory"
echo "Total files: $file"
echo "Total: $total"
No recursion here, though; for that you could resort to ls -lR or similar. But then again, if you are going to use an external program, you should use find; that's what it's designed to do.
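If the assignment really does require [[ ]] condition tests, one way to get the recursion without find is bash's globstar option, testing each path as you go; a sketch, assuming bash 4+:

#!/bin/bash
shopt -s globstar dotglob nullglob

files=0 dirs=0 blocks=0 chars=0 fifos=0 links=0

for path in **/*; do
    if   [[ -L $path ]]; then ((links++))    # test -L first: -f and -d follow symlinks
    elif [[ -f $path ]]; then ((files++))
    elif [[ -d $path ]]; then ((dirs++))
    elif [[ -b $path ]]; then ((blocks++))
    elif [[ -c $path ]]; then ((chars++))
    elif [[ -p $path ]]; then ((fifos++))
    fi
done

echo "ordinary files: $files"
echo "directories:    $dirs"
echo "block special:  $blocks"
echo "char special:   $chars"
echo "FIFOs:          $fifos"
echo "symlinks:       $links"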
