Run script on all files in dir sharing a common id - bash

I have some files in a dir:
A573R25.file_1.txt
A573R25.file_2.txt
A573R25.file_3.txt
A573R27.file_1.txt
A573R27.file_2.txt
A573R29.file_1.txt
A573R29.file_2.txt
A573R29.file_3.txt
A573R31.file_1.txt
A573R31.file_2.txt
A573R31.file_3.txt
A573R33.file_1.txt
A573R33.file_2.txt
A573R33.file_3.txt
I want to run a script on all files sharing a common id (e.g. A573R25), with varying text between the id and .txt. For example:
perl my_script.pl A573R25*.txt
However, I want to do this for all files in the dir in a bash script.
Here's what I've tried:
samples+=$(ls -1 *.txt | cut -d '.' -f 1)
for ((i=0;i<${#samples[@]};++i))
do
ls -1 ${samples[i]}*.txt
done
But in each case I get (e.g.):
ls: A573R25: No such file or directory
My expected output for the first id is:
A573R25.file_1.txt
A573R25.file_2.txt
A573R25.file_3.txt
What am I doing wrong?

You need a sort -u in your sample collection, and it needs to be an array assignment:
samples+=( $( ls -1 *.txt | cut -d '.' -f 1 | sort -u ) )
Here is full code and results:
$ unset samples
$ samples+=( $(ls -1 *.txt | cut -d '.' -f 1 | sort -u ) )
$ for ((i=0;i<${#samples[@]};++i)); do ls -1 ${samples[i]}*.txt; done
A573R25.file_1.txt
A573R25.file_2.txt
A573R25.file_3.txt
A573R27.file_1.txt
A573R27.file_2.txt
A573R29.file_1.txt
A573R29.file_2.txt
A573R29.file_3.txt
A573R31.file_1.txt
A573R31.file_2.txt
A573R31.file_3.txt
A573R33.file_1.txt
A573R33.file_2.txt
A573R33.file_3.txt
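If you'd rather avoid parsing ls output entirely, here is a minimal glob-based sketch of the same idea (assumes bash 4+ for associative arrays; perl my_script.pl is the command from the question):
declare -A seen
for f in *.txt; do
    id=${f%%.*}                       # everything before the first dot is the id
    [[ -n ${seen[$id]} ]] && continue # skip ids already handled
    seen[$id]=1
    perl my_script.pl "$id".*.txt     # glob expands to all files for this id
done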

Related

Bash: How do I check (and return) the results of a command filtered by file content

I executed a command on Linux to list all the files & subfiles (with specific format) in a folder.
This command is:
ls -R | grep -e "\.txt$" -e "\.py$"
On the other hand, I have some filenames stored in a .txt file (one per line).
I want to show the result of my previous command, but I want to filter the result using the file called filters.txt.
If the result is in the file, I keep it
Else, I do not keep it.
How can I do it, in bash, in only one line?
I suppose this is something like:
ls -R | grep -e "\.txt$" -e "\.py$" | grep filters.txt
An example of the files:
# filters.txt
README.txt
__init__.py
EDIT 1
I am trying to use a file instead of a list of arguments because I get the error:
'/bin/grep: Argument list too long'
EDIT 2
# The result of the command ls -R
-rw-r--r-- 1 XXX 1 Oct 28 23:36 README.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 __init__.py
-rw-r--r-- 1 XXX 1 Oct 28 23:36 iamaninja.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 donttakeme.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 donttakeme2.txt
What I want as a result:
-rw-r--r-- 1 XXX 1 Oct 28 23:36 README.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 __init__.py
You can use comm (note that comm requires sorted input):
comm -12 <(ls -R | grep -e "\.txt$" -e "\.py$" | sort) <(sort filters.txt)
This will give you the intersection of the two lists.
EDIT
It seems that ls is not great for this; maybe find would be safer:
find . -type f | grep "$(sed ':a;N;$!ba;s/\n/\\|/g' filters.txt)"
That is, take your filters.txt, replace all its newlines with \| using sed, and then grep the file list for all the entries at once.
grep uses \| as alternation in basic regular expressions, so the sed command turns filters.txt into a single pattern that matches any of the entries.
grep -f filters.txt -r .
...where . is your current folder.
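If the goal is to keep only the names listed in filters.txt, another minimal sketch is grep's fixed-string, whole-line matching (assumes the first command emits bare filenames, one per line):
ls -R | grep -e "\.txt$" -e "\.py$" | grep -Fxf filters.txt
-F treats each line of filters.txt as a literal string rather than a regex, and -x requires the whole line to match, so an entry like init.py cannot accidentally match __init__.py.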
You can run this script in the target directory, giving the list file as a single argument.
#!/bin/bash -e
# exit early if awk fails (ie. can't read list)
shopt -s lastpipe
find . -mindepth 1 -type f \( -name '*.txt' -o -name '*.py' \) -print0 |
awk -v exclude_list_file="${1:?no list file provided}" \
'BEGIN {
while ((getline line < exclude_list_file) > 0) {
exclude_list[c++] = line
}
close(exclude_list_file)
if (c==0) {
exit 1
}
FS = "/"
RS = "\000"
}
{
for (i in exclude_list) {
if (exclude_list[i] == $NF) {
next
}
}
print
}'
It prints all paths, recursively, excluding any filename which exactly matches a line in the list file (so lines not ending .py or .txt wouldn’t do anything).
Only the filename is considered, the preceding path is ignored.
It fails immediately if no argument is given or it can't read a line from the list file.
The question is tagged bash, but if you change the shebang to sh, and remove shopt, then everything in the script except -print0 is POSIX. -print0 is common, it’s available on GNU (Linux), BSDs (including OpenBSD), and busybox.
The purpose of lastpipe is to exit immediately if the list file can't be read. Without it, find keeps running until completion (though nothing gets printed).
If you specifically want the ls -l output format, you could change awk to use a null output record separator (add ORS = "\000" to the end of BEGIN, directly below RS="\000"), and pipe awk in to xargs -0 ls -ld.
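Spelled out, that variation is (a sketch; everything else in the script stays the same, and filter.sh / list.txt are hypothetical names for the script and list file):
RS = "\000"
ORS = "\000"
bash filter.sh list.txt | xargs -0 ls -ld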

Get directory name with grep and remove it

Please, is there any simple way to get only the NAME column from lines where DATE is older than 5 days, and then call another command (rm) on those lines with NAME as the argument?
I have the following output from mega-ls path/ -l (mega.nz) command:
FLAGS VERS SIZE DATE NAME
d--- - - 06Feb2020 05:00:01 bk_20200206050000
d--- - - 07Feb2020 05:00:01 bk_20200207050000
d--- - - 08Feb2020 05:00:01 bk_20200208050000
d--- - - 09Feb2020 05:00:01 bk_20200209050000
d--- - - 10Feb2020 05:00:01 bk_20200210050000
d--- - - 11Feb2020 05:00:01 bk_20200211050000
I tried grep, sort and other ways, e.g. mega-ls path/ -l | head -n 5, but I don't know how to filter these lines based on the date.
Thank you a lot.
I tried to find a simple way for your request ;)
mega-ls path/ -l | head -n 5 | tr -s ' ' | cut -d ' ' -f6 | grep -v -e '^$' | grep '^bk_20200206.*' | xargs rm -f
Part 1: This is your command (it returns the folder list plus extra data)
mega-ls path/ -l | head -n 5
Part 2: Squeeze the repeated spaces in the part 1 result
tr -s ' '
Part 3: Use cut to split the part 2 result and return the folder NAME column
cut -d ' ' -f6
Part 4: Remove empty lines from the part 3 result (left over from the header line)
grep -v -e '^$'
Part 5: This is your request: search folder names by date in yyyymmdd format, e.g. 20200206 (replace 20200206 with the date you actually need)
grep '^bk_20200206.*'
Part 6 (Very Important!!): Only add this part if you really want to delete the matching folders
xargs rm -f
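Going further, to avoid hard-coding the date, here is a sketch that computes the cutoff with GNU date and removes every bk_ folder older than 5 days (assumptions: the yyyymmdd embedded in the name is authoritative, and plain rm works on the listed paths; with mega.nz you may need mega-rm instead; the echo makes this a dry run, so drop it to actually delete):
cutoff=$(date -d '5 days ago' +%Y%m%d)
mega-ls path/ -l | tr -s ' ' | cut -d ' ' -f6 | grep '^bk_' |
while read -r name; do
    [ "${name:3:8}" -lt "$cutoff" ] && echo rm "path/$name"
done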
Best Regards

Loop Script from Input File

I have a reference file with device names in it, for example WABEL8499IPM101. I'm using this script to take the base name (without the last 3 digits), look at the reference file, and see what is already used. If 101 is used, it creates a file for me with 102, and 103 if I request 2 total. I'm also trying to figure out how to start at 101 if no name is found when searching the reference file.
I would like to loop this using an input file instead of manually entering bash test.sh WABEL8499IPM 2 each time; I want to build an input file of all the names that need comparing and then produce the output. It would also be nice if, when there is no match, it started creating names at WABEL8499IPM101 instead of just WABEL8499IPM1.
Input file example:
ColumnA (BASE NAME) ColumnB (QUANTITY)
WABEL8499IPM 2
Script:
SRCFILE="~/Desktop/deviceinfo.csv"
LOGDIR="~/Desktop/"
LOGFILE="$LOGDIR/DeviceNames.csv"
# base name, such as "WABEL8499IPM"
device_name=$1
# quantity, such as "2"
quantityNum=$2
# the largest in sequence, such as "WABEL8499IPM108"
max_sequence_name=$(cat $SRCFILE | grep -o -e "$device_name[0-9]*" | sort --reverse | head -n 1)
# extract the last 3digit number (such as "108") from max_sequence_name
max_sequence_num=$(echo $max_sequence_name | rev | cut -c 1-3 | rev)
# create new sequence_name
# such as ["WABEL8499IPM109", "WABEL8499IPM110"]
array_new_sequence_name=()
for i in $(seq 1 $quantityNum);
do
cnum=$((max_sequence_num + i))
array_new_sequence_name+=($(echo $device_name$cnum))
done
#CODE FOR CREATING OUTPUT FILE HERE
#for fn in ${array_new_sequence_name[@]}; do touch $fn; done;
# write log
for sqn in ${array_new_sequence_name[@]};
do
echo $sqn >> $LOGFILE
done
Usage:
bash test.sh WABEL8499IPM 2
Result in the log file:
WABEL8499IPM109
WABEL8499IPM110
Just wrap a loop around your code instead of assuming the args come in on the command line.
# note: ~ does not expand inside double quotes, so use $HOME
SRCFILE="$HOME/Desktop/deviceinfo.csv"
LOGDIR="$HOME/Desktop"
LOGFILE="$LOGDIR/DeviceNames.csv"
while read device_name quantityNum
do max_sequence_name=$( grep -o -e "$device_name[0-9]*" $SRCFILE |
sort --reverse | head -n 1)
max_sequence_num=${max_sequence_name: -3}
max_sequence_num=${max_sequence_num:-100} # start at 101 when the base name isn't in the reference file yet
array_new_sequence_name=()
for i in $(seq 1 $quantityNum)
do cnum=$((max_sequence_num + i))
array_new_sequence_name+=("$device_name$cnum")
done
for sqn in ${array_new_sequence_name[@]};
do echo $sqn >> $LOGFILE
done
done < input.file
I'd maybe pass the input file as the parameter now.
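For example, a hedged tweak of the script's last line so the list file comes in as the first argument (${1:?...} aborts with a usage message when it is missing):
done < "${1:?usage: $0 input-file}"
and then run it as:
bash test.sh input.file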

Cleanest way to get the highest suffix (or prefix) of a certain file type in a set of directories with bash?

I have a set of data files across a number of directories with format
ls lcp01/output/
> dst000.dat dst001.dat ... dst075.dat nn000.dat nn001.dat ... nn036.dat aa000.dat aa001.dat ... aa040.dat
That is to say, there are a set of directories lcp01 through lcp25 with a collection of different data files in their output folders. I want to know what the highest number dstXXX.dat file is in each directory (in the example shown the result would be 75).
I wrote a script which achieves this, but I'm not satisfied with the final step which feels a bit hacky:
#!/bin/bash
for i in `seq -f "%02g" 1 25`; #specify dir extensions 1 through 25
do
echo " "
echo $i
names=($(ls lcp$i/output | grep dst )) #dir containing dst files
NUMS=()
for j in "${names[@]}";
do
temp="$(echo $j | tr -dc '0-9' && printf " ")" # record suffixes for each dst file
NUMS+=("$((10#$temp))") #force base 10 interpretation of dst suffixes
done
numList="$(echo "${NUMS[*]}" | sort -nr | head -n1)"
echo ${numList:(-3)} #print out the last 3 characters of the sorted list - the largest file suffix
done
The final two steps organise the list of output indices, then I show the last 3 characters of that list which will be my largest file number (providing the file numbers are smaller than 100).
Is there a cleaner way of doing this? Ideally I would like more control over the output format, but mainly it's the step of reading the last 3 characters out. I would like to be able to just output the largest number, which should be the last element of the list but I cannot figure out how.
You could do something like the following:
for d in lcp[0-9][0-9]; do find $d -name 'dst*.dat' -print | sort -u | tail -n1; done
The above command will only work if the numbering has the same number of digits (dst001..999.dat), since it is sorted as a string; if that's not the case:
for d in lcp[0-9][0-9]; do echo -n $d: ; find $d -name 'dst*.dat' -print | grep -o '[0-9]*\.dat' | sort -n | tail -n1; done
using filename expansions
for d in lcp*/output; do
files=( $d/dst*.dat )
file=${files[-1]}
[[ -e $file ]] || continue
file=${file#*dst}
echo ${file%.dat}
done
or with the extglob option to restrict the pattern to numbers
shopt -s extglob
... lcp*([0-9])/output
... $d/dst*([0-9]).dat
...
file=${file##*dst*(0)}
...
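Putting those pieces together, a sketch that prints one directory:max pair per folder (assumes bash 4.3+ for the negative array index, which the answer above already relies on; the 10# prefix forces base 10 so zero-padded suffixes aren't read as octal):
shopt -s extglob
for d in lcp*([0-9])/output; do
    files=( "$d"/dst*([0-9]).dat )
    [[ -e ${files[-1]} ]] || continue
    n=${files[-1]##*dst}
    printf '%s: %d\n' "${d%/output}" "$(( 10#${n%.dat} ))"
done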

bash for loop with numerated names

I'm currently working on a maths project and just run into a bit of a brick wall with programming in bash.
Currently I have a directory containing 800 texts files, and what I want to do is run a loop to cat the first 80 files (_01 through to _80) into a new file and save elsewhere, then the next 80 (_81 to _160) files etc.
All the files in the directory are named like so: ath_01, ath_02, ath_03, etc.
Can anyone help?
So far I have:
#!/bin/bash
for file in /dir/*
do
echo ${file}
done
Which just simply lists my files. I know I need to use cat file1 file2 > newfile.txt somehow, but the numbered extension of _01, _02, etc. is confusing me.
Would it help if I changed the file names to use something other than an underscore, like ath.01 etc.?
Cheers,
Since you know ahead of time how many files you have and how they are numbered, it may be easier to "unroll the loop", so to speak, and use copy-and-paste and a little hand-tweaking to write a script that uses brace expansion.
#!/bin/bash
cat ath_{001..080} > file1.txt
cat ath_{081..160} > file2.txt
cat ath_{161..240} > file3.txt
cat ath_{241..320} > file4.txt
cat ath_{321..400} > file5.txt
cat ath_{401..480} > file6.txt
cat ath_{481..560} > file7.txt
cat ath_{561..640} > file8.txt
cat ath_{641..720} > file9.txt
cat ath_{721..800} > file10.txt
Or else, use nested for-loops and the seq command
N=800
B=80
for n in $( seq 1 $B $N ); do
for i in $( seq $n $((n+B - 1)) ); do
cat ath_$i
done > file$((n/B + 1)).txt
done
The outer loop will iterate n through 1, 81, 161, etc. The inner loop will iterate i over 1 through 80, then 81 through 160, etc. The body of the inner loop just dumps the contents of the ith file to standard output, but the aggregated output of the loop is stored in file1.txt, then file2.txt, etc.
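One caveat: seq generates unpadded numbers (ath_1, not ath_01), so if the real names are zero-padded you would generate matching names with seq -f, e.g. for the inner loop (the %03g width is an assumption; match it to however many digits your names actually use):
for i in $( seq -f 'ath_%03g' $n $((n+B - 1)) ); do
    cat $i
done > file$((n/B + 1)).txt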
You could try something like this:
cat "$file" >> "concat_$(( ${file#/dir/ath_} / 80 ))"
with ${file#/dir/ath_} you remove the prefix /dir/ath_ from the filename
with $(( ${file#/dir/ath_} / 80 )) you get the suffix divided by 80 (integer division)
Also change the loop to
for file in /dir/ath_*
So you only get the files you need
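Assembled, that suggestion looks like the following sketch (the 10# base prefix is an addition beyond the original snippets; it guards against zero-padded suffixes like 08 being read as invalid octal):
for file in /dir/ath_*; do
    cat "$file" >> "concat_$(( 10#${file#/dir/ath_} / 80 ))"
done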
If you want groups of 80 files, you'd do best to ensure that the names are sortable; that's why leading zeroes were often used. Assuming that you only have one underscore in the file names, and no newlines in the names, then:
SOURCE="/path/to/dir"
TARGET="/path/to/other/directory"
(
cd $SOURCE || exit 1
ls |
sort -t _ -k2,2n |
awk -v target="$TARGET" \
'{ file[n++] = $1
if (n >= 80)
{
printf "cat"
for (i = 0; i < 80; i++)
printf(" %s", file[i])
printf(" >%s/%s.%.2d\n", target, "newfile", ++number)
n = 0
}
}
END {
if (n > 0)
{
printf "cat"
for (i = 0; i < n; i++)
printf(" %s", file[i])
printf(" >%s/%s.%.2d\n", target, "newfile", ++number)
}
}' |
sh -x
)
The two directories are specified (where the files are and where the summaries should go); the command changes directory to the source directory (where the 800 files are). It lists the names (you could specify a glob pattern if you needed to) and sorts them numerically. The output is fed into awk which generates a shell script on the fly. It collects 80 names at a time, then generates a cat command that will copy those files to a single destination file such as "newfile.01"; tweak the printf() command to suit your own naming/numbering conventions. The shell commands are then passed to a shell for execution.
While testing, replace the sh -x with nothing, or sh -vn or something similar. Only add an active shell when you're sure it will do what you want. Remember, the shell script is in the source directory as it is running.
Superficially, the xargs command would be nice to use; the difficulty is coordinating the output file number. There might be a way to do that with the -n 80 option to group 80 files at a time and some fancy way to generate the invocation number, but I'm not aware of it.
Another option is to use xargs -n to execute a shell script that can deduce the correct output file number by listing what's already in the target directory. This would be cleaner in many ways:
SOURCE="/path/to/dir"
TARGET="/path/to/other/directory"
(
cd $SOURCE || exit 1
ls |
sort -t _ -k2,2n |
xargs -n 80 cpfiles "$TARGET"
)
Where cpfiles looks like:
TARGET="$1"
shift
if [ $# -gt 0 ]
then
old=$(ls -r "$TARGET"/newfile.?? 2>/dev/null | sed -n -e 's/.*newfile\.//p; 1q')
new=$(printf "%.2d" $(( 10#${old:-0} + 1 )))
cat "$@" > "$TARGET/newfile.$new"
fi
The test for zero arguments avoids trouble with xargs executing the command once with zero arguments. On the whole, I prefer this solution to the one using awk.
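One practical note: for xargs to run cpfiles it must be executable and findable via PATH; a hypothetical setup before invoking the outer script would be:
chmod +x cpfiles
PATH=$PWD:$PATH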
Here's a macro for @chepner's first solution, using GNU Make as the templating language:
SHELL := /bin/bash
N = 800
B = 80
fileNums = $(shell seq 1 $$((${N}/${B})) )
files = ${fileNums:%=file%.txt}
all: ${files}
file%.txt : start = $(shell echo $$(( ($*-1)*${B}+1 )) )
file%.txt : end = $(shell echo $$(( $* * ${B} )) )
file%.txt:
cat ath_{${start}..${end}} > $@
To use:
$ make -n all
cat ath_{1..80} > file1.txt
cat ath_{81..160} > file2.txt
cat ath_{161..240} > file3.txt
cat ath_{241..320} > file4.txt
cat ath_{321..400} > file5.txt
cat ath_{401..480} > file6.txt
cat ath_{481..560} > file7.txt
cat ath_{561..640} > file8.txt
cat ath_{641..720} > file9.txt
cat ath_{721..800} > file10.txt
