How to automate concatenating several series of files using bash extended globbing and negation patterns? - bash

Thank you so much for any advice and feedback on this matter.
This is my situation:
I have a directory with several hundred files, all of which start with foo and end with .txt. In between, each name carries a unique identifier of the form "Group#.#", so the files look like so:
foo.Group1.1.txt
foo.Group1.2.txt
foo.Group1.4.txt
foo.Group2.45.txt
.
.
.
foo.Group16.9.txt
The files begin with Group1 and end at Group16. They are simple one-column text files; each file has several thousand lines, and each row is a number.
I want to do a series of concatenations with these files: first concatenate all files except those with the "Group1" ID, then all files except "Group1" and "Group2", then all except "Group1", "Group2", and "Group3", and so on, until I am left with just the last group, "Group16".
In order to do this I use a bash extended globbing expression with a negation syntax to concatenate all files except those that have "Group1" as their ID.
I make a directory "jacks" and output the concatenated file into a txt file within this subdirectory:
cat !(*Group1.*) > jacks/jackknife1.freqs.txt
I can then continue using this command, but adding "Group2" and "Group3" for subsequent concatenations.
cat !(*Group1.*|*Group2.*) > jacks/jackknife2.freqs.txt
cat !(*Group1.*|*Group2.*|*Group3.*) > jacks/jackknife3.freqs.txt
Technically, this works, and with 16 groups it isn't too terrible to do manually.
But I am wondering whether there is a way, perhaps using loops or bash scripting, to automate this process and speed it up?
I would appreciate any advice or leads on this question!
thank you very much,
daniela

Some tries around bash globbing
Try using echo instead of cat to preview what the glob expands to!
touch foo.Group{1..3}.{1..5}.txt
ls -l
total 0
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group1.5.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group2.5.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.1.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.2.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.3.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.4.txt
-rw-r--r-- 1 user user 0 Oct 21 18:37 foo.Group3.5.txt
Then
echo !(*Group1.*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.3.txt foo.Group2.4.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.2.txt foo.Group3.3.txt foo.Group3.4.txt foo.Group3.5.txt
Ok, and
echo !(*Group[23].*)
foo.Group1.1.txt foo.Group1.2.txt foo.Group1.3.txt foo.Group1.4.txt foo.Group1.5.txt
Or
echo !(*Group*(1|3).*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.3.txt foo.Group2.4.txt foo.Group2.5.txt
Or even
echo !(*Group*(1|*.3).*)
foo.Group2.1.txt foo.Group2.2.txt foo.Group2.4.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.2.txt foo.Group3.4.txt foo.Group3.5.txt
and
echo !(*Group*(1|*.[2-4]).*)
foo.Group2.1.txt foo.Group2.5.txt foo.Group3.1.txt foo.Group3.5.txt
I will let you think about the last two samples! ;-)
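As for automating the whole jackknife series: rather than growing the negation pattern by hand, the sequence can be generated with a loop. A minimal sketch, assuming the foo.GroupN.M.txt naming from the question (jackknife k keeps every file whose group number is greater than k, i.e. it excludes Group1 through Groupk):

```shell
#!/usr/bin/env bash
shopt -s nullglob        # if no files match, the loop body is simply skipped
mkdir -p jacks
for k in {1..15}; do
    out="jacks/jackknife${k}.freqs.txt"
    : > "$out"                      # start with an empty output file
    for f in foo.Group*.txt; do
        g=${f#foo.Group}            # strip the "foo.Group" prefix
        g=${g%%.*}                  # keep only the leading group number
        (( g > k )) && cat "$f" >> "$out"
    done
done
```

This avoids extglob entirely: the group number is parsed from each filename with plain parameter expansion and compared numerically, so there is nothing to edit between iterations.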

Related

Calling a bash script function from another bash script doesn't display my shell command

I have 2 bash scripts, one calling the other, but depending on how I call it, it does or does not display the output of my ls command.
script2.sh
#!/bin/bash
function test() {
    i=0
    while IFS= read -r line; do
        IFS=',' read -ra ITEM <<<"$line"
        printf "\n[${i}] ${ITEM}"
        ((i = i + 1))
    done <<<$(ls $1)
    printf "\nPress any other keys to abort.\n\n"
    read -p "Please enter your selection: " ANSWER
    echo $ANSWER
}
script1a.sh WORKS
#!/bin/bash
. ./scripts/bash/script2.sh
PARAM='-lag'
(test $PARAM)
Returns:
[0] total 5
[1] drwxr-xr-x 1 1049089 0 Oct 29 09:10 .
[2] drwxr-xr-x 1 1049089 0 Oct 9 23:11 ..
[3] -rw-r--r-- 1 1049089 87 Jul 6 14:19 .eslintignore
[4] -rw-r--r-- 1 1049089 449 Jul 10 13:56 .forceignore
[5] drwxr-xr-x 1 1049089 0 Oct 29 09:11 .git
Press any other keys to abort.
Please enter your selection:
script1b.sh FAILS
#!/bin/bash
. ./scripts/bash/script2.sh
PARAM='-lag'
myanswer=$(test $PARAM)
Returns:
Please enter your selection:
Anyone knows why this odd behavior and how to get around it? Thanks in advance.
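A likely explanation, for what it's worth: `$(...)` captures everything the function writes to standard output, so in script1b.sh the menu listing ends up inside myanswer instead of on the terminal (the read -p prompt still appears because bash writes it to standard error). One way around it is to send the menu to standard error and reserve standard output for the return value alone. A sketch, which also renames the function since `test` shadows a shell builtin:

```shell
#!/bin/bash
# Menu text goes to stderr (>&2) so it stays visible even inside $(...);
# only the user's answer is written to stdout, i.e. captured by the caller.
select_item() {
    local i=0 line answer
    while IFS= read -r line; do
        printf '\n[%d] %s' "$i" "$line" >&2
        i=$(( i + 1 ))
    done < <(ls $1)          # $1 left unquoted on purpose: it may be an option like -lag
    printf '\nPress any other keys to abort.\n\n' >&2
    read -rp "Please enter your selection: " answer
    echo "$answer"
}

# myanswer=$(select_item -lag)   # the menu still shows; only the answer is captured
```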

How to iterate through multiple directories with multiple ifs in bash?

Unfortunately I'm quite new at bash. I want to write a script that starts in a main directory, checks each subdirectory one by one for the presence of certain files, and, if those files are present, performs an operation on them. For now, I have written a simplified version to test the first part (checking for the files in each directory). This code runs without any errors that I can tell, but it does not echo anything to say that it has found the files which I know are there.
#!/bin/bash
runlist=(1 2 3 4 5 6 7 8 9)
for f in *; do
    if [[ -d {$f} ]]; then
        #if f is a directory then cd into it
        cd "{$f}"
        for b in $runlist; do
            if [[ -e "{$b}.png" ]]; then
                echo "Found {$b}"
                #if the file exists then say so
            fi
        done
        cd -
    fi
done
Welcome to Stack Overflow.
The following will do the trick (a combination of find, an array, and if/then/else):
# list of files we are looking for
runlist=(1 2 4 8 16 32 64 128)
# find each of the above anywhere below the current directory;
# add -maxdepth 1 to the find command if you want to look one level only
for b in "${runlist[@]}"; do
    echo
    PATH_TO_FOUND_FILE=$(find . -name "$b.png")
    if [ -z "$PATH_TO_FOUND_FILE" ]; then
        : # nothing found
    else
        # You wanted a positive confirmation, so
        echo "found $b.png"
        # Now do something with the found file. Let's say ls -l; change that to whatever
        ls -l $PATH_TO_FOUND_FILE
    fi
done
Here is an example run:
mamuns-mac:stack foo$ ls -lR
total 8
drwxr-xr-x 4 foo 1951595366 128 Apr 11 18:03 dir1
drwxr-xr-x 3 foo 1951595366 96 Apr 11 18:03 dir2
-rwxr--r-- 1 foo 1951595366 652 Apr 11 18:15 find_file_and_do_something.sh
./dir1:
total 0
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 1.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 8.png
./dir2:
total 0
-rw-r--r-- 1 foo 1951595366 0 Apr 11 18:03 64.png
mamuns-mac:stack foo$ ./find_file_and_do_something.sh
found 1.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 ./dir1/1.png
found 8.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 17:58 ./dir1/8.png
found 64.png
-rw-r--r-- 1 foo 1951595366 0 Apr 11 18:03 ./dir2/64.png
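Worth adding: the reason the original script stayed silent is the brace placement. `{$f}` expands to the literal characters `{`, the value, and `}` (the dollar sign belongs inside the braces: `${f}`), so `[[ -d {$f} ]]` never matches; and a bare `$runlist` expands to only the first array element. A minimal corrected version of the original loop, as a sketch:

```shell
#!/bin/bash
runlist=(1 2 3 4 5 6 7 8 9)
for f in */; do                        # matches directories only, so no -d test needed
    for b in "${runlist[@]}"; do       # [@] iterates all elements, not just the first
        if [[ -e "$f$b.png" ]]; then   # $f already ends in a slash
            echo "Found $b in $f"
        fi
    done
done
```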

I don't want to print repeated lines based on columns 6 and 7

I don't want to print repeated lines based on columns 6 and 7; sort -u does not seem to help.
cat /tmp/testing:
-rwxrwxr-x. 1 root root 52662693 Feb 27 13:11 /home/something/bin/proxy_exec
-rwxrwxr-x. 1 root root 27441394 Feb 27 13:12 /home/something/bin/keychain_exec
-rwxrwxr-x. 1 root root 45570820 Feb 27 13:11 /home/something/bin/wallnut_exec
-rwxrwxr-x. 1 root root 10942993 Feb 27 13:12 /home/something/bin/log_exec
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
When I try cat /tmp/testing | sort -u -k 6,6 -k 7,7 I get:
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
-rwxrwxr-x. 1 root root 52662693 Feb 27 13:11 /home/something/bin/proxy_exec
Desired output is below, as that is the only file whose month and date columns differ from all the others:
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
To avoid printing repeated lines based on columns 6 and 7 using awk, you could:
$ awk '
++seen[$6,$7]==1 { # count seen instances
keep[$6,$7]=$0 # keep first seen ones
}
END { # in the end
for(i in seen)
if(seen[i]==1) # the ones seen only once
print keep[i] # get printed
}' file # from file or pipe your ls to the awk
Output for given input:
-rwxrwxr-x. 1 root root 137922408 Apr 16 03:43 /home/something/bin/android_exec
Notice: All standard warnings against parsing ls output still apply.
Tried with GNU sed (hardcodes the repeated date):
sed -E '/^\s*(\S+\s+){5}Feb\s+27/d' testing
Tried with GNU awk (assumes the repeated date appears on the first line):
awk 'NR==1{a=$6$7;next} a!=$6$7{print}' testing

How to get a filename list with ncftp?

So I tried
ncftpls -l
which gives me a list
-rw-r--r-- 1 100 ftpgroup 3817084 Jan 29 15:50 1548773401.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817089 Jan 29 15:51 1548773461.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817083 Jan 29 15:52 1548773521.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817085 Jan 29 15:53 1548773582.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817090 Jan 29 15:54 1548773642.tar.gz
But all I want is to check the timestamp (which is the name of each tar.gz).
How do I get only the timestamp list?
As requested: all I wanted to do was delete old backups, so awk was a good idea (at least it was effective), even if those weren't the right parameters. My method for deleting old backups is probably not the best, but it works:
ncftpls *authParams* | awk '{ match($9, /^[0-9]+/, a); print a[0] }' | while read fileCreationDate; do
    VALIDITY_LIMIT=$(( $(date +%s) - 600 ))
    if [ "$fileCreationDate" -lt "$VALIDITY_LIMIT" ]; then
        deleteFtpFile "$fileCreationDate"
    fi
done
You can use awk to print only the file names (your timestamps) from the output; they are in the ninth whitespace-separated field of the long listing:
ncftpls -l | awk '{ print $9 }'
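If you want the bare epoch numbers rather than the full file names, the extension can be stripped in the same awk call. A sketch run against the sample listing from the question (in real use, replace the here-document with the output of your ncftpls -l command):

```shell
# Field $9 of the long listing is the file name; strip ".tar.gz" to get the epoch stamp.
awk '{ sub(/\.tar\.gz$/, "", $9); print $9 }' <<'EOF'
-rw-r--r-- 1 100 ftpgroup 3817084 Jan 29 15:50 1548773401.tar.gz
-rw-r--r-- 1 100 ftpgroup 3817089 Jan 29 15:51 1548773461.tar.gz
EOF
```

This prints 1548773401 and 1548773461, one per line, ready to feed into the date comparison above.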

Get list of files grouped by date

I have a directory that receives new files every day. Now I want to zip those files grouped by date. Is there any way to group/list the files that landed on the same date?
Suppose there are below files in a directory
-rw-r--r--. 1 anirban anirban 1598 Oct 14 07:19 hello.txt
-rw-r--r--. 1 anirban anirban 1248 Oct 14 07:21 world.txt
-rw-rw-r--. 1 anirban anirban 659758 Oct 14 11:55 a
-rw-rw-r--. 1 anirban anirban 9121 Oct 18 07:37 b.csv
-rw-r--r--. 1 anirban anirban 196 Oct 20 08:46 go.xls
-rw-r--r--. 1 anirban anirban 1698 Oct 20 08:52 purge.sh
-rw-r--r--. 1 anirban anirban 47838 Oct 21 08:05 code.java
-rw-rw-r--. 1 anirban anirban 9446406 Oct 24 05:51 cron
-rw-rw-r--. 1 anirban anirban 532570 Oct 24 05:57 my.txt
drwxrwsr-x. 2 anirban anirban 67 Oct 25 05:05 look_around.py
-rw-rw-r--. 1 anirban anirban 44525 Oct 26 17:23 failed.log
So there is no way to group the files by any suffix/prefix, since all are unique. When I run the command I am seeking, I should get a set of lists like the one below, grouped by date:
[ [hello.txt world.txt a] [b.csv] [go.xls purge.sh] [code.java] ... ] and so on.
With that list I will loop through and make each archive:
tar -zvcf Oct_14.tar.gz hello.txt world.txt a
If you have the GNU version of the date command, you can get the date of modification of a file with the -r flag, which can be very useful.
For example, given the file list in your question, date +%b_%d -r hello.txt will output Oct_14.
Using this, you could loop over the files, and build up tar files:
If the tar file doesn't exist, create it with a single file
If the tar file exists, add the file to it
After the loop, zip the tar files
Like this:
#!/usr/bin/env bash
tarfiles=()
for file; do
    tarfile=$(date +%b_%d.tar -r "$file")
    if ! [ -f "$tarfile" ]; then
        tar cf "$tarfile" "$file"
        tarfiles+=("$tarfile")
    else
        tar uf "$tarfile" "$file"
    fi
done
for tarfile in "${tarfiles[@]}"; do
    gzip "$tarfile"
done
Pass the list of files you want to archive as command-line parameters. For example, if /path/to/files is the directory with the files to archive (listed in your question), and you save this script as ~/bin/tar-by-dates.sh, you can use it like this:
cd /path/to/files
~/bin/tar-by-dates.sh *
Create a zero-separated list of (Month_Day.tar FILENAME) pairs and use xargs to add each file to the corresponding archive:
find . -maxdepth 1 -mindepth 1 -type f -printf "%Tb_%Td.tar\0%f\0" | xargs -0 -n 2 tar uf
