moving files to equivalent folders in bash shell

I'm sorry for the very basic question, but I am frankly extremely new to bash and can't seem to work out the below. Any help would be appreciated.
In my working directory '/test' I have a number of files named:
mm(a 10-digit code)_Pool_1_text.csv
mm(same 10-digit code)_Pool_2_text.csv
mm(same 10-digit code)_Pool_3_text.csv
How can I write a loop that would take the first file and put it in a folder at:
/this/that/Pool_1/
the second file at:
/this/that/Pool_2/
and so on?
Thank you :)

Using awk you may not need to create an explicit loop:
awk 'FNR==1 {match(FILENAME,/Pool_[[:digit:]]+/); system("mv " FILENAME " /this/that/" substr(FILENAME, RSTART, RLENGTH) "/")}' mm*_Pool_*_text.csv
the shell glob selects the files (we could use extglob, but I wanted to keep it simple)
awk gets the filenames
we match Pool_ followed by one or more digits
we move the file, using the match to extract the pool name
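If you would rather use an explicit bash loop, a minimal sketch (assuming the Pool_N directories under /this/that/ already exist) could be:
for f in mm*_Pool_*_text.csv; do
    pool=${f#*_}            # strip the leading mm<code>_  -> Pool_1_text.csv
    pool=${pool%_text.csv}  # strip the trailing _text.csv -> Pool_1
    mv -v "$f" "/this/that/$pool/"
done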

Related

bash script to rearrange multiple file names correctly (GoPro)

I have multiple GoPro files in a directory that all need to be renamed, and I would like a bash script to do it. Basically, I want to move the 3rd and 4th characters back to the 12th and 13th spot, with dashes around them. The only other GoPro post I found puts them at the end of the file name, but I really need them in the middle like that. Thank you
Example
Original filename is:
GX010112_1647792633961.MP4
and I need it to look like:
GX0112_164-01-7792633961.MP4
A possible answer with the rename command-line tool. It allows you to rename multiple files according to a Perl regex.
#installation of rename
apt install rename
#rename your files
rename 's/^(.{2})(.{2})(.{8})(.*)/$1$3-$2-$4/' *
REGEX explanation:
The regex is structured as s/ MATCHING PATTERN / REPLACEMENT /
^ : sets the position at the beginning of the string.
(.{2}) : Match the first 2 characters and store them in $1.
(.{2}) : Match the following 2 characters (the 3rd and 4th) and store them in $2.
(.{8}) : Match the following 8 characters (the 5th to the 12th inclusive) and store them in $3.
(.*) : Match the rest of your string and store it in $4.
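If you want to preview the result first, the Perl-based rename installed above typically supports -n (no action: it only prints the planned renames), and restricting the glob to the GX*.MP4 pattern from the example avoids touching unrelated files:
rename -n 's/^(.{2})(.{2})(.{8})(.*)/$1$3-$2-$4/' GX*.MP4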
Hope it helps.
After many hours I figured it out based on this thread:
Rename several gopro files
for file in GX*; do
    file1="${file#*??}"                  # strip the first 2 characters ("GX")
    file2=${file1#*??}                   # strip 2 more characters (the pair to move)
    file3=${file1#*??????}               # strip the first 6 characters of file1
    file4=${file1%*$file2}               # keep what file1 has in front of file2 (the moved pair)
    file5=${file2%*?????????????????}    # strip the last 17 characters of file2
    mv -v "$file" "${file5}${file4}${file3}"
done
I hope it comes in handy for any other GoProers out there!

Count number of distinct folders in find result

I am running a process that produces an arbitrary number of files in an arbitrary number of sub-folders. I am interested in the number of distinct sub-folders and am currently trying to solve this with bash and find (I do not want to use a scripting language).
So far I have:
find models/quarter/ -name settings.json | wc -l
However, this obviously does not consider the structure of the result from find and just counts all files returned.
Sample of the find return:
models/quarter/1234/1607701623/settings.json
models/quarter/1234/1607701523/settings.json
models/quarter/3456/1607701623/settings.json
models/quarter/3456/1607702623/settings.json
models/quarter/7890/1607703223/settings.json
I am interested in the number of distinct folders in the top folder models/quarter, so the appropriate result for the sample above would be 3 (1234, 3456, 7890). It is a requirement that the folders to be counted contain a sub-folder (a Unix timestamp, as you might have recognized) and that the sub-folder contains the file settings.json.
My gut tells me it should be possible, e.g. with awk, but I am certainly no bash pro. Any help is greatly appreciated, thanks.
find models/quarter/ -name settings.json | awk -F\/ '{ if (strftime("%s",$4) == $4) { fil[$3]="" } } END { print length(fil) }'
Using awk (GNU awk, since strftime is a gawk extension). Pass the output of find to awk and set / as the field separator. Check that the 4th field is a timestamp and, if it is, create an array entry with the third field as the index. At the end, print the length of the array fil.
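If the timestamp check is not strictly needed, a plain pipeline gives the same count (a minimal sketch, assuming every settings.json sits at models/quarter/<id>/<timestamp>/settings.json as in the sample):
find models/quarter/ -name settings.json | cut -d/ -f3 | sort -u | wc -l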

Bash Parsing Large Log Files

I am new to bash, awk, and scripting, so please do help me improve.
I have a huge number of text files, each several hundred MB in size. Unfortunately, they are not all fully standardized in any one format, there is a lot of legacy in here, and a lot of junk and garbled text. I wish to check all of these files to find rows with a valid email ID and, if one exists, print the row to a file named after the first character of the email ID. Hence, multiple text files get parsed and organized into files named a-z and 0-9. If the email address starts with a special character, the row gets written to a file called "_" (underscore). The script also trims the rows to remove whitespace, and replaces single and double quotes (this is an application requirement).
My script works fine; there is no error/bug in it, but it is incredibly slow. My question: is there a more efficient way to achieve this? Parsing 30 GB of logs takes me about 12 hours, which is way too much! Will grep/cut/sed/something else be any faster?
Sample txt File
!bar#foo.com,address
#john#foo.com;address
john#foo.com;address µÖ
email1#foo.com;username;address
email2#foo.com;username
email3#foo.com,username;address [spaces at the start of the row]
email4#foo.com|username|address [tabs at the start of the row]
My Code:
awk -F'[,|;: \t]+' '{
    gsub(/^[ \t]+|[ \t]+$/, "")               # trim leading/trailing whitespace
    if (NF>1 && $1 ~ /^[[:alnum:]_.+-]+#[[:alnum:]_.-]+\.[[:alnum:]]+$/)
    {
        gsub(/"/, "DQUOTES")                  # replace double quotes (application requirement)
        gsub("\047", "SQUOTES")               # replace single quotes (application requirement)
        r = gensub("[,|;: \t]+", ":", 1, $0)  # replace the first delimiter run with ":"
        a = tolower(substr(r, 1, 1))          # first character of the email ID
        if (a ~ /^[[:alnum:]]/)
            print r > a                       # a-z / 0-9 output file
        else
            print r > "_"                     # special characters go to the "_" file
    }
    else
        print $0 > "ErrorFile"                # rows without a valid email ID
}' *.txt
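One hedged aside that is not from the original thread: gensub already makes the script GNU-awk-specific, and forcing the C locale often speeds up regex-heavy gawk runs considerably, so it is cheap to try before restructuring anything (organize.awk is a hypothetical file holding the script body above):
LC_ALL=C gawk -F'[,|;: \t]+' -f organize.awk *.txt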

method for merging two files, opinion needed

Problem: I have two folders (one is the Delta folder, where the files get updated, and the other is the Original folder, where the original files exist). Every time a file updates in the Delta folder, I need to merge the file from the Original folder with the updated file from the Delta folder.
Note: the file names in the Delta folder and the Original folder are unique, but the content in the files may be different. For example:
$ cat Delta_Folder/1.properties
account.org.com.email=New-Email
account.value.range=True
$ cat Original_Folder/1.properties
account.org.com.email=Old-Email
account.value.range=False
range.list.type=String
currency.country=Sweden
Now I need to merge Delta_Folder/1.properties with Original_Folder/1.properties, so that my updated Original_Folder/1.properties will be:
account.org.com.email=New-Email
account.value.range=True
range.list.type=String
currency.country=Sweden
The solution I opted for is:
find all *.properties files in the Delta folder and save the list to a temp file (delta-files.txt).
find all *.properties files in the Original folder and save the list to a temp file (original-files.txt).
then I need to get the list of files that are unique in both folders and put those in a loop.
then I need to loop over each file to read each line from a property file (1.properties).
then I need to read each line (delta-line="account.org.com.email=New-Email") from a property file of the Delta folder and split the line on the delimiter "=" into two string variables.
(delta-line-string1=account.org.com.email; delta-line-string2=New-Email;)
then I need to read each line (orig-line="account.org.com.email=Old-Email") from a property file of the Original folder and split the line on the delimiter "=" into two string variables.
(orig-line-string1=account.org.com.email; orig-line-string2=Old-Email;)
if delta-line-string1 == orig-line-string1, then update $orig-line with $delta-line,
i.e.:
if account.org.com.email == account.org.com.email then replace
account.org.com.email=Old-Email in Original_Folder/1.properties with
account.org.com.email=New-Email
Once the loop finishes all the lines in a file, it goes to the next file. The loop continues until it finishes all the unique files in a folder.
For looping I used for loops, for splitting lines I used awk, and for replacing content I used sed.
Overall it is working fine, but it is taking too long (about 4 minutes) to finish each file, because it goes through three loops for every line, splitting the line, finding the variable in the other file, and replacing the line.
I am wondering if there is any way I can reduce the loops so that the script executes faster.
With paste and awk:
File 2:
$ cat /tmp/l2
account.org.com.email=Old-Email
account.value.range=False
currency.country=Sweden
range.list.type=String
File 1:
$ cat /tmp/l1
account.org.com.email=New-Email
account.value.range=True
The command + output:
paste /tmp/l2 /tmp/l1 | awk '{print $NF}'
account.org.com.email=New-Email
account.value.range=True
currency.country=Sweden
range.list.type=String
Or with a single awk command if sorting is not important:
awk -F'=' '{arr[$1]=$2}END{for (x in arr) {print x"="arr[x]}}' /tmp/l2 /tmp/l1
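To run that merge over every pair of matching files, a rough sketch built on the folder names from the question (like the one-liner above, it assumes a single = per line and does not preserve comments or the original key order):
for f in Delta_Folder/*.properties; do
    base=$(basename "$f")
    [ -f "Original_Folder/$base" ] || continue
    awk -F'=' '{arr[$1]=$2} END {for (x in arr) print x"="arr[x]}' \
        "Original_Folder/$base" "$f" > "Original_Folder/$base.tmp" &&
        mv "Original_Folder/$base.tmp" "Original_Folder/$base"
done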
I think your two main options are:
Completely reimplement this in a more featureful language, like perl.
While reading the delta file, build up a sed script. For each line of the delta file, you want a sed instruction similar to:
s/account.org.com.email=.*$/account.org.com.email=value_from_delta_file/g
That way you don't loop through the original files a bunch of extra times. Don't forget to escape & / and \ as mentioned in this answer.
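A rough sketch of that idea, using the file names from the question and assuming the keys and values contain none of the characters that are special to sed (otherwise escape & / \ as noted above):
: > merge.sed                                  # start with an empty sed script
while IFS='=' read -r key value; do
    printf 's/^%s=.*$/%s=%s/\n' "$key" "$key" "$value" >> merge.sed
done < Delta_Folder/1.properties
sed -f merge.sed Original_Folder/1.properties  # one pass over the original file
With GNU sed, adding -i (sed -i -f merge.sed ...) would update the original file in place once you are happy with the output.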
Is using a database at all an option here?
Then you would only have to write code for extracting data from the Delta files (assuming that can't be replaced by a database connection).
It just seems like this is going to keep getting more complicated and slower as time goes on.

Shell script to find files with similar names

I have a directory with multiple files in the form of:
file001_a
file002_a
file002_b
file003_a
Using a shell script, I was wondering what the easiest way would be to list all files within this directory that have duplicates in the first 7 letters; i.e. the output for the above would be:
file002_a
file002_b
any help would be much appreciated!
ls -1 *_* | awk '{fn=substr($0,1,7);a[fn]=a[fn]" "substr($0,8)}END{for(i in a) print i,a[i]}'
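Note that this prints every prefix together with its suffixes; if you only want the files whose first 7 characters occur more than once, listed one per line as in the expected output, a small hedged alternative (assuming plain file names without spaces or newlines) is:
ls | cut -c1-7 | sort | uniq -d | while read -r prefix; do printf '%s\n' "$prefix"*; done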
