How to write a shell script that reads all the file names in a directory and finds a particular string in the file names?

I need a shell script to find a string in file names like the following one:
FileName_1.00_r0102.tar.gz
And then pick the highest value from multiple occurrences.
I am interested in the "1.00" part of the file name.
I am able to get this part separately in the UNIX shell using the commands:
find /directory/*.tar.gz | cut -f2 -d'_' | cut -f1 -d'.'
1
2
3
1
find /directory/*.tar.gz | cut -f2 -d'_' | cut -f2 -d'.'
00
02
05
00
The problem is there are multiple files with this string:
FileName_1.01_r0102.tar.gz
FileName_2.02_r0102.tar.gz
FileName_3.05_r0102.tar.gz
FileName_1.00_r0102.tar.gz
I need to pick the file with FileName_("highest value")_r0102.tar.gz
But since I am new to shell scripting, I am not able to figure out how to handle these multiple instances in a script.
The script which I came up with just for the integer part is as follows:
#!/bin/bash
for file in /directory/*
file_version = find /directory/*.tar.gz | cut -f2 -d'_' | cut -f1 -d'.'
done
OUTPUT: file_version: command not found
Kindly help.
Thanks!

If you just want the latest version number:
cd /path/to/files
printf '%s\n' *r0102.tar.gz | cut -d_ -f2 | sort -n -t. -k1,2 | tail -n1
If you want the file name:
cd /path/to/files
latest=$(printf '%s\n' *r0102.tar.gz | cut -d_ -f2 | sort -n -t. -k1,2 | tail -n1)
printf '%s\n' *${latest}_r0102.tar.gz
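If your sort is GNU sort, its -V (version sort) flag compares version strings component-wise (so 1.9 sorts before 1.10), which is more robust than a plain numeric sort. A minimal sketch of the same idea, assuming GNU coreutils:
cd /path/to/files
# sort -V orders version strings naturally; tail -n1 keeps the highest
latest=$(printf '%s\n' *r0102.tar.gz | cut -d_ -f2 | sort -V | tail -n1)
printf '%s\n' *"${latest}"_r0102.tar.gz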

You could try the following, which finds all the matching files, sorts the filenames, takes the last in that list, and then extracts the version from the filename.
#!/bin/bash
file_version=$(find ./directory -name "FileName*r0102.tar.gz" | sort | tail -n1 | sed -r 's/.*_(.+)_.*/\1/g')
echo ${file_version}
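For the four example files from the question, this prints 3.05. Note that plain sort compares lexicographically, which works here but would place a hypothetical 10.00 before 2.00; if GNU sort is available, sort -V sidesteps that:
# same pipeline, with version-aware sorting (GNU sort only)
file_version=$(find ./directory -name "FileName*r0102.tar.gz" | sort -V | tail -n1 | sed -r 's/.*_(.+)_.*/\1/')
echo ${file_version}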

I have tried the following script line, which works and gives you what you need:
echo `ls ./*.tar.gz | sort | sed -n '/[0-9]\.[0-9][0-9]/p' | tail -n 1`;

It's unnecessary to parse the filename's version number prior to finding the actual filename. Use GNU ls's -v (natural sort of (version) numbers within text) option:
ls -v FileName_[0-9.]*_r0102.tar.gz | tail -1
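Applied to the four example files from the question, ls -v orders them 1.00, 1.01, 2.02, 3.05, so the pipeline prints:
FileName_3.05_r0102.tar.gz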

Related

sort and get unique files after removing extension of filename

I am trying to remove the part of the filename after the second underscore and get the unique files. I saw many answers and formed a script. The script works fine up to the cut command, but it is not able to give the unique filenames. I have tried the following command, but I am not getting the desired output.
script used:
for filename in /path/to/files/*.gz;
do
fname=$(basename ${filename} | cut -f 1-2 -d "_" | sort | uniq)
echo "${fname}"
done
file example:
filename1_00_1.gz
filename1_00_2.gz
filename2_00_1.gz
filename2_00_2.gz
Required output:
filename1_00
filename2_00
So, with all of that said, how can I get a unique list of files in the required output format?
Thanks a lot in advance.
Apply uniq and sort after you are done printing the files (it's better to identify the uniques first, before sorting them):
for filename in /path/to/files/*.gz;
do
fname=$(basename ${filename} | cut -f 1-2 -d "_" );
echo "${fname}";
done | uniq | sort
Or just do
for filename in /path/to/files/*.gz; do echo ${filename%_*.gz}; done | uniq | sort
for f in *.gz; do echo ${f%_*.gz}; done | sort | uniq
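A compact variant of the same idea, sketched on the assumption that the files live in /path/to/files: parameter expansion strips the directory instead of basename, and sort -u sorts and deduplicates in one step:
for f in /path/to/files/*.gz
do
    f=${f##*/}      # strip the directory part, like basename
    echo "${f%_*}"  # drop the final _field ("_1.gz"), leaving e.g. filename1_00
done | sort -u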

How can I pipe the results of grep to a perl one liner?

I have a grep command that finds the files that need a value replaced. Then I have a perl one-liner that needs to be executed on each file to replace a variable found in that file.
How can I pipe the results of my grep command to the perl one liner?
grep -Irc "/env/file1/" /env/scripts/ | cut -d':' -f1 | sort | uniq
/env/scripts/config/MainDocument.pl
/env/scripts/config/MainDocument.pl2
/env/scripts/config/MainDocument.pl2.bak
perl -p -i.bak -e 's{/env/file1/}{/env/file2/}g' /env/scripts/config/MainDocument.pl
Thanks for your help.
With the $(...) bash syntax:
perl -p -i.bak -e 's{/env/file1/}{/env/file2/}g' $(grep -Irc "/env/file1/" /env/scripts/ | cut -d':' -f1 | sort | uniq)
I'd forget the perl one-liner and use xargs and sed instead.
grep -Irc "/env/file1/" /env/scripts/ | cut -d':' -f1 | sort | uniq | xargs sed -i.bak 's:/env/file1/:/env/file2/:g'
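A hedged variant that copes with unusual file names: grep -l prints just the names of matching files (making the cut/sort/uniq stage unnecessary), and GNU grep's -Z paired with xargs -0 keeps paths containing spaces intact:
# -l: list matching files only; -Z: NUL-terminate names for xargs -0
grep -IrlZ "/env/file1/" /env/scripts/ | xargs -0 perl -p -i.bak -e 's{/env/file1/}{/env/file2/}g'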

using cut on a line having multiple instances of the same delimiter

I am trying to write a generic script which can have different file name inputs.
This is just a small part of my bash script.
For example, let's say folder 444-55 has 2 files:
qq.filter.vcf
ee.filter.vcf
I want my output to be:
qq
ee
I tried this and it worked:
ls /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf | sort | cut -f1 -d "." | xargs -n 1 basename
But let's say I have a folder like this:
/data2/delivery/Stack_overflow/de.1111_2222_3333_23/secondary/444-55/*.filter.vcf
My script's output would then be
de
de
How can I make it generic?
Thank you so much for your help.
Something like this in a script will "cut" it:
for i in /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf
do
basename "$i" | cut -f1 -d.
done | sort
Advantages:
- it does not parse the output of ls, which is frowned upon
- it cuts after having applied the basename treatment, so the cut ignores the full path
- it also sorts last, so the result is guaranteed to be sorted according to the prefix
Just move the basename call earlier in the pipeline:
printf "%s\n" /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf |
xargs -n 1 basename |
sort |
cut -f1 -d.
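If you would rather avoid cut and basename entirely, parameter expansion can do both jobs; a minimal sketch, assuming the same directory layout as above:
for i in /data2/delivery/Stack_overflow/1111_2222_3333_23/secondary/444-55/*.filter.vcf
do
    i=${i##*/}       # strip the leading directories, like basename
    echo "${i%%.*}"  # drop everything from the first dot onward
done | sort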

How to find the most frequent string in a file

I have a question about a bash script. Let's say there is a file which contains lines, and each line has a path to a file and a date. The problem is how to find the most frequent path.
Thanks in advance.
Here's a suggestion
$ cut -d' ' -f1 file.txt | sort | uniq -c | sort -rn | head -n1
# \____________________/   \__/   \_____/   \______/   \______/
# select the file column   sort   print     sort on    print top
#                          files  counts    count      result
Example use:
$ cat file.txt
/home/admin/fileA jan:17:13:46:27:2015
/home/admin/fileB jan:17:13:46:27:2015
/home/admin/fileC jan:17:13:46:27:2015
/home/admin/fileA jan:17:13:46:27:2015
/home/admin/fileA jan:17:13:46:27:2015
$ cut -d' ' -f1 file.txt | sort | uniq -c | sort -rn | head -n1
3 /home/admin/fileA
You can strip out the 3 from the final result with another filtering step.
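Because uniq -c pads the count with a variable number of leading spaces, awk is less fiddly than cut for that final step; for example:
$ cut -d' ' -f1 file.txt | sort | uniq -c | sort -rn | head -n1 | awk '{print $2}'
/home/admin/fileA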
Reverse the lines, cut the beginning (the date), reverse them again, then sort and count unique lines:
cat file.txt | rev | cut -b 22- | rev | sort | uniq -c
If you're absolutely sure you won't have whitespace in your paths, you can avoid rev altogether:
cat file.txt | cut -d " " -f 1 | sort | uniq -c
If the output is too long to inspect visually, aioobe's suggestion of following this with sort -rn | head -n1 will serve you well.
It's worth noting, as aioobe mentioned, that many Unix commands optionally take a file argument. By using it, you can avoid the extra cat command at the beginning, supplying the file as an argument to the next command:
cat file.txt | rev | ... vs rev file.txt | ...
While I personally find the first option both easier to remember and easier to understand, the second is preferred by many (most?) people, as it saves system resources (specifically, the memory and references used by an additional process) and can have better performance in some specific use cases. Wikipedia's cat article discusses this in detail.

awk issue, summing lines in various files

I have a list of files starting with the word "output", and I want to sum up the total number of rows in all the files.
Here's my strategy:
for f in `find outpu*`;do wc -l $f | awk '{x+=$1}END{print $1}' ; done
If, before piping onward, there were a way to do something like >> into a temporary variable, and then run the awk command afterwards, I could accomplish this goal.
Any tips?
Use this to see the per-file details and the sum:
wc -l output*
and this to see only the sum:
wc -l output* | tail -n1 | awk '{print $1}'
Here is some stuff for fun, check it out:
grep -c . out* | cut -d':' -f2- | paste -sd+ | bc
all lines, including empty ones:
grep -c '' out* | cut -d':' -f2- | paste -sd+ | bc
You can play with the grep pattern to put conditions on which lines in the files get counted.
Watch out: the shell expands outpu* before find runs, so this command only matches names in the current directory (and it fails if nothing matches).
One way of doing it:
awk 'END{print NR}' $(find . -name 'outpu*')
Provided there is not such an insane number of matching filenames that it overflows your shell's maximum command-line length.
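A sketch of a whitespace-safe alternative that also sidesteps the command-line length limit, assuming GNU find and xargs and that the files sit in the current directory:
# xargs may run cat several times, but all output feeds a single wc -l
find . -maxdepth 1 -name 'outpu*' -print0 | xargs -0 cat | wc -l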
