Shell script to find files with similar names

I have a directory with multiple files in the form of:
file001_a
file002_a
file002_b
file003_a
Using a shell script, I was wondering what the easiest way would be to list all files within this directory that have duplicates in the first 7 letters; i.e., for the listing above the output would be:
file002_a
file002_b
Any help would be much appreciated!

ls -1 *_* | awk '{p=substr($0,1,7); a[p]=a[p] $0 "\n"; c[p]++} END{for (i in c) if (c[i] > 1) printf "%s", a[i]}'
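An alternative sketch using sort and uniq -d (assuming filenames contain no spaces or newlines): take each 7-character prefix, keep only the prefixes that occur more than once, then list the files that share them.
ls -1 | cut -c1-7 | sort | uniq -d | while read -r prefix
do
    ls -1 "${prefix}"*
done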

Related

moving files to equivalent folders in bash shell

I'm sorry for the very basic question, but I am frankly extremely new at bash and can't seem to work out the below. Any help would be appreciated.
In my working directory '/test' I have a number of files named:
mm(a 10 digit code)_Pool_1_text.csv
mm(same 10 digit code)_Pool_2_text.csv
mm(same 10 digit code)_Pool_3_text.csv
How can I write a loop that would take the first file and put it in a folder at:
/this/that/Pool_1/
the second file at:
/this/that/Pool_2/
etc
Thank you :)
Using awk you may not need to create an explicit loop:
awk 'FNR==1 {match(FILENAME, /Pool_[[:digit:]]+/); system("mv " FILENAME " /this/that/" substr(FILENAME, RSTART, RLENGTH) "/")}' mm*_Pool_*_text.csv
- the shell glob selects the files (we could use extglob, but I wanted to keep it simple)
- awk receives the filenames
- match locates the Pool_<digits> part of each filename
- system moves the file, using substr on the match to extract the pool name
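If you would rather stay in plain bash, here is a minimal sketch using the [[ =~ ]] regex operator (assuming the target /this/that/Pool_N directories from the question already exist):
for f in mm*_Pool_*_text.csv
do
    [[ $f =~ Pool_[0-9]+ ]] && mv "$f" "/this/that/${BASH_REMATCH[0]}/"
done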

Bash: Move multiple files with same name into same directory by renaming

I currently have a bunch of .txt files in separate folders and want to move them all into the same folder, except all the files have the same name. I would like to preserve all the files by adding some sort of number so that each one isn't overwritten, like FolderA/file.txt becomes NewFolder/file_1.txt, and FolderB/file.txt becomes NewFolder/file_2.txt, etc. Is there a clean way to do this using bash? Thanks in advance for your help!
You could do something like this (either in a script or right on the command line):
for i in A B C D E
do
mv Folder$i/file.txt NewFolder/file_$i.txt
done
It won't convert letters to numbers, but it does the basics of what you're looking for in a fairly simple fashion.
Hope this helps.
Following the previous answer, you can add two lines of code in bash to achieve your desired output:
declare -i n=1;
for i in A B C D E
do
mv Folder$i/file.txt NewFolder/file_$n.txt
n+=1
done
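If the folders don't follow a fixed A..E naming pattern, a glob plus an arithmetic counter handles any number of them. A sketch, assuming each folder matching Folder* contains a file.txt:
n=1
for f in Folder*/file.txt
do
    mv "$f" "NewFolder/file_$((n++)).txt"
done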

How to sort "dated" file directory names in Linux

I have a directory on server B that contains 'dated' directories like:
2015-03-01_10.07.11
2015-03-02_10.05.02
2015-02-25_11.05.02
2015-02-24_11.07.05
I need to copy the content of the directory with the latest date.
In my example, I'd have to copy contents of the 2015-03-02_10.05.02 directory.
How would I do that?
Thanks,
These directories sort correctly according to their names, so a plain ls (which sorts by name) already lists them in date order.
The problem then becomes how to capture the sorted listing and extract the first (or last) entry. Either an array or a string with a regex can do this; there are probably many other ways too. For example, look at the find and sort man pages.
I ended up using ls -1 | tail -n 1
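Putting it together, a minimal sketch for copying the newest directory's contents (the /destination path is a hypothetical stand-in; this assumes the dated directories are the only entries in the current directory):
latest=$(ls -1d */ | tail -n 1)   # names sort chronologically, so the last entry is the newest
cp -r "${latest}." /destination/  # dir/. copies the directory's contents, not the directory itself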

bash - reading multiple input files and creating matching output files by name and sequence

I do not know much bash scripting, but I know the task I would like to do would be greatly simplified by it. I would like to test a program against expected output using many test input files.
For example, I have files named "input1.txt, input2.txt, input3.txt..." and expected output in files "output1.txt, output2.txt, output3.txt...". I would like to run my program with each of the input files and output a corresponding "test1.txt, test2.txt, test3.txt...". Then I would do a "cmp output1.txt test1.txt" for each file.
So I think it would start like this, roughly:
for i in input*;
do
./myprog.py < "$i" > someoutputthing;
done
One question I have is: how would I match the numbers in the filename? Thanks for your help.
If the input file name pattern is inputX.txt, you need to remove input from the beginning. You do not have to remove the extension, as you want to use the same extension for the output:
output=output${i#input}
See Parameter Expansion in man bash.
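Plugging that into the loop from the question gives a complete sketch of the test run (myprog.py and the test/output naming come from the question itself):
for i in input*.txt
do
    test="test${i#input}"             # input1.txt -> test1.txt
    ./myprog.py < "$i" > "$test"
    cmp "output${i#input}" "$test"    # compare expected vs. actual output
done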

method for merging two files, opinion needed

Problem: I have two folders (one is the Delta folder, where the files get updated, and the other is the Original folder, where the original files exist). Every time a file updates in the Delta folder, I need to merge the file from the Original folder with the updated file from the Delta folder.
Note: the file names in the Delta folder and the Original folder match, but the content of the files may differ. For example:
$ cat Delta_Folder/1.properties
account.org.com.email=New-Email
account.value.range=True
$ cat Original_Folder/1.properties
account.org.com.email=Old-Email
account.value.range=False
range.list.type=String
currency.country=Sweden
Now, I need to merge Delta_Folder/1.properties with Original_Folder/1.properties so that my updated Original_Folder/1.properties will be:
account.org.com.email=New-Email
account.value.range=True
range.list.type=String
currency.country=Sweden
The solution I opted for is:
1. Find all *.properties files in the Delta folder and save the list to a temp file (delta-files.txt).
2. Find all *.properties files in the Original folder and save the list to a temp file (original-files.txt).
3. Get the list of files common to both folders and loop over them.
4. For each file, read each line from the Delta folder's property file (e.g. delta-line="account.org.com.email=New-Email") and split the line on the "=" delimiter into two string variables (delta-line-string1=account.org.com.email; delta-line-string2=New-Email).
5. Read each line from the Original folder's property file (e.g. orig-line="account.org.com.email=Old-Email") and split it the same way (orig-line-string1=account.org.com.email; orig-line-string2=Old-Email).
6. If delta-line-string1 == orig-line-string1, update $orig-line with $delta-line, i.e. replace account.org.com.email=Old-Email in Original_Folder/1.properties with account.org.com.email=New-Email.
Once the loop finishes all lines in a file, it moves on to the next file, and continues until every common file in the folder is processed.
For looping I used for loops, for splitting lines I used awk, and for replacing content I used sed.
Overall it works, but it takes about 4 minutes to finish each file, because it enters three loops for every line, splitting the line, finding the key in the other file, and replacing the line.
I'm wondering if there is any way to reduce the loops so that the script executes faster.
With paste and awk:
File 2:
$ cat /tmp/l2
account.org.com.email=Old-Email
account.value.range=False
currency.country=Sweden
range.list.type=String
File 1:
$ cat /tmp/l1
account.org.com.email=New-Email
account.value.range=True
The command + output:
paste /tmp/l2 /tmp/l1 | awk '{print $NF}'
account.org.com.email=New-Email
account.value.range=True
currency.country=Sweden
range.list.type=String
Or with a single awk command, if sorting is not important:
awk -F'=' '{arr[$1]=$2}END{for (x in arr) {print x"="arr[x]}}' /tmp/l2 /tmp/l1
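To apply that merge to every common file, a sketch that loops over the Delta folder (folder names taken from the question; it assumes each line has a single "=" and a matching file exists in the Original folder):
for delta in Delta_Folder/*.properties
do
    orig="Original_Folder/$(basename "$delta")"
    # later assignments win, so values from the delta file override the originals
    awk -F'=' '{arr[$1]=$2} END{for (k in arr) print k "=" arr[k]}' "$orig" "$delta" > "$orig.tmp" &&
        mv "$orig.tmp" "$orig"
done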
I think your two main options are:
- Completely reimplement this in a more featureful language, like Perl.
- While reading the delta file, build up a sed script. For each line of the delta file, you want a sed instruction similar to:
s/account.org.com.email=.*$/account.org.com.email=value_from_delta_file/g
That way you don't loop through the original files a bunch of extra times. Don't forget to escape &, /, and \ as mentioned in this answer.
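A rough sketch of that approach, assuming keys and values contain no sed metacharacters (the escaping mentioned above is what makes the general case safe; merge.sed and merged.properties are hypothetical names):
# build one sed command per delta line, then apply them all in a single pass
while IFS= read -r line
do
    key=${line%%=*}
    printf 's/^%s=.*$/%s/\n' "$key" "$line"
done < Delta_Folder/1.properties > merge.sed
sed -f merge.sed Original_Folder/1.properties > merged.properties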
Is using a database at all an option here?
Then you would only have to write code for extracting data from the Delta files (assuming that can't be replaced by a database connection).
It just seems like this is going to keep getting more complicated and slower as time goes on.
