Bash: find and concatenate files

I have the following structure:
/home/
├── DIR1/
│ └── file_ab.csv
├── DIR2/
│ └── file_cd.csv
└── DIR3/
    └── file3_ef.csv
Where file_**.csv contains rows of floats, different floats for each DIR.
I want to grab the contents of all of the file_**.csv files and concatenate them.
I found this answer here:
find /home -type f -name '*.csv' -exec cat {} \; > pl_parameters
But I get an empty file called 'pl_parameters'. Why is the file empty? How can I fix this?

find /home/DIR* -name 'file*csv' | xargs cat > output.csv
find /home/DIR* -name 'file*csv' prints the path of every matching file.
xargs cat passes those paths to cat, which prints their contents; the redirection collects everything in output.csv.
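A plain find | xargs pipeline splits paths on whitespace, so it can break on names that contain spaces. The following is a minimal sketch of a NUL-delimited variant, run against a throwaway directory; the directory names and file contents are made up for illustration, and it assumes a find, sort and xargs that support -print0, -z and -0 (GNU and recent BSD versions do).

```bash
#!/usr/bin/env bash
# Build a throwaway copy of the layout from the question (hypothetical data).
demo=$(mktemp -d)
mkdir -p "$demo/DIR1" "$demo/DIR2" "$demo/DIR3 with space"
printf '1.0,2.0\n' > "$demo/DIR1/file_ab.csv"
printf '3.0,4.0\n' > "$demo/DIR2/file_cd.csv"
printf '5.0,6.0\n' > "$demo/DIR3 with space/file_ef.csv"

# -print0 / -0 pass NUL-terminated names, so the space does not split a path.
# sort -z only makes the order reproducible; find itself promises no order.
find "$demo" -type f -name 'file*.csv' -print0 |
    LC_ALL=C sort -z |
    xargs -0 cat > "$demo/output.txt"

combined=$(cat "$demo/output.txt")
printf '%s\n' "$combined"
rm -rf "$demo"
```

The output file is deliberately given a name that the -name pattern cannot match, so find never feeds it back into cat.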

With Bash 4.0+, you can use globstar and a more straightforward command:
shopt -s globstar
cd /home
cat **/*.csv > pl_parameters
With globstar enabled, **/ matches the current directory and every directory beneath it, so **/*.csv picks up .csv files at any depth.
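If the glob matches nothing, cat receives the literal pattern (or, with nullglob, no arguments at all and then waits on standard input). A hedged sketch that collects the matches in an array first and checks that it is non-empty; the temporary directory and its contents are invented for the example, and this is not offered as a diagnosis of the empty file in the question.

```bash
#!/usr/bin/env bash
shopt -s globstar nullglob   # nullglob: an unmatched pattern expands to nothing

# Hypothetical test data in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/DIR1" "$demo/DIR2/nested"
printf '1.0\n' > "$demo/DIR1/file_ab.csv"
printf '2.0\n' > "$demo/DIR2/nested/file_cd.csv"

files=( "$demo"/**/*.csv )   # glob results come back sorted
combined=''
if (( ${#files[@]} == 0 )); then
    echo "no .csv files found" >&2
else
    cat -- "${files[@]}" > "$demo/pl_parameters"
    combined=$(cat "$demo/pl_parameters")
fi

printf '%s\n' "$combined"
rm -rf "$demo"
```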
Your command:
find /home -type f -name '*.csv' -exec cat {} \; > pl_parameters
looks good to me; I'm not sure why you got a zero-byte output file.

Related

Rename files based on their parent directory in Bash

Been trying to piece together a couple previous posts for this task.
The directory tree looks like this:
TEST
|ABC_12345678
3_XYZ
|ABC_23456789
3_XYZ
etc
Each folder within the parent folder "TEST" starts with ABC_\d{8}; the 8 digits are always different. Within each ABC_\d{8} folder there is always a folder named 3_XYZ that contains a file named "MD2_Phd.txt". The goal is to rename each "MD2_PhD.txt" file using the 8-digit ID from the ABC folder name, i.e. "\d{8}_PhD.txt".
After several iterations on bits of code from different posts, this is the best I can come up with:
cd /home/etc/Desktop/etc/TEST
find -type d -name 'ABC_(\d{8})' |
find $d -name "*_PhD.txt" -execdir rename 's/MD2$/$d/' "{}" \;
done
find + bash solution:
find -type f -regextype posix-egrep -regex ".*/TEST/ABC_[0-9]{8}/3_XYZ/MD2_Phd\.txt" \
-exec bash -c 'abc="${0%/*/*}"; fp="${0%/*}/";
mv "$0" "$fp${abc##*_}_PhD.txt" ' {} \;
Viewing results:
$ tree TEST/ABC_*
TEST/ABC_12345678
└── 3_XYZ
└── 12345678_PhD.txt
TEST/ABC_1234ss5678
└── 3_XYZ
└── MD2_Phd.txt
TEST/ABC_23456789
└── 3_XYZ
└── 23456789_PhD.txt
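The parameter expansions inside the bash -c snippet can be tried on their own. This sketch applies them to one example path (the path is illustrative; in the find command the path arrives as $0):

```bash
#!/usr/bin/env bash
path='./TEST/ABC_12345678/3_XYZ/MD2_Phd.txt'   # example path, standing in for $0

abc="${path%/*/*}"   # drop the last two components: ./TEST/ABC_12345678
fp="${path%/*}/"     # drop only the file name:      ./TEST/ABC_12345678/3_XYZ/
id="${abc##*_}"      # drop everything up to the last underscore: 12345678
target="${fp}${id}_PhD.txt"

printf '%s\n' "$target"
```

% removes the shortest matching suffix and ## removes the longest matching prefix, which is why the ID is whatever follows the last underscore of the ABC folder name.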
You are piping find output to another find. That won't work.
Use a loop instead:
dir_re='^.+_([[:digit:]]{8})/'
for file in *_????????/3_XYZ/MD2_PhD.txt; do
    [[ -f $file ]] || continue    # skip the literal pattern if nothing matched
    if [[ $file =~ $dir_re ]]; then
        dir_num="${BASH_REMATCH[1]}"
        new_name="${file%MD2_PhD.txt}${dir_num}_PhD.txt" # replace MD2 at the start of the file name with the 8-digit ID
        echo mv "$file" "$new_name" # remove echo from here once tested
    fi
done

Use find to get all folders that don't have a .git subfolder

How can I use find to list all folders that do not contain a .git folder?
On this structure::
$ tree -a -d -L 2
.
├── a
│ └── .git
├── b
│ ├── b1
│ └── b2
├── c
└── d
└── .git
├── lkdj
└── qsdqdf
This::
$ find . -name ".git" -prune -o -type d -print
.
./a
./b
./b/b1
./b/b2
./c
./d
$
gets all folders except .git.
I would like to get this::
$ find . ...
.
./b
./b/b1
./b/b2
./c
$
It's inefficient (runs a bunch of subprocesses), but the following will do the job with GNU or modern BSD find:
find . -type d -exec test -d '{}/.git' ';' -prune -o -type d -print
If your find is only guaranteed to provide what the POSIX standard specifies, {} may not be expanded when it is part of a larger argument. In that case, have a shell run the test so that {} is its own token, at a further cost in efficiency:
find . -type d -exec sh -c 'test -d "$1/.git"' _ '{}' ';' -prune -o -type d -print
This works by using -exec as a predicate, running a test that find has no built-in support for.
Note the use of the inefficient -exec [...] {} [...] \; rather than the more efficient -exec [...] {} +. The latter passes multiple filenames to each invocation, so it cannot report a per-filename result and always evaluates as true.
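To check the behaviour, the tree from the question can be rebuilt in a temporary directory and the portable variant run against it. This is a sketch for verification only; the output is sorted because find does not guarantee an order.

```bash
#!/usr/bin/env bash
# Recreate the directory tree from the question in a throwaway location.
demo=$(mktemp -d)
mkdir -p "$demo"/a/.git "$demo"/b/b1 "$demo"/b/b2 "$demo"/c \
         "$demo"/d/.git/lkdj "$demo"/d/.git/qsdqdf

# cd inside the command substitution so the paths are relative, as in the question.
result=$(
    cd "$demo" &&
    find . -type d -exec sh -c 'test -d "$1/.git"' _ {} \; -prune -o -type d -print |
    LC_ALL=C sort
)

printf '%s\n' "$result"
rm -rf "$demo"
```

Directories a and d are pruned as soon as the test succeeds, so neither they nor anything beneath them reaches -print.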
If you don't mind using a temporary file, then:
find . -type d -print > all_dirs
grep -Fvxf <(grep '/\.git$' all_dirs | sed 's#/\.git$##') all_dirs | grep -vE '/\.git$|/\.git/'
rm all_dirs
The first step writes all directory paths to the file all_dirs.
The second step filters out the directories that have a .git subdirectory, as well as the .git directories themselves and everything below them. The -x option is necessary because only lines that match in their entirety should be eliminated.
This is a little more efficient than the -exec approach above because it doesn't run so many subprocesses. However, it gives wrong output if any directory name contains a newline character.
If you only want the top-level directories, add the option -maxdepth 1. It is a global option, so it belongs before the tests:
$ find . -maxdepth 1 -type d -exec test -d '{}/.git' ';' -prune -o -type d -print
.
./b
./c
$

Renaming ZIP files according to the parent directory name

For a number of files I want to get the parent directory and append its name to the filename. For example, in the following path:
A/B/C/file.zip
I want to rename file.zip to file_C.zip.
Here is my code. It finds directories that contain no subdirectories and zips the files in them; I then want to rename each archive so that it refers to the parent directory.
find ${WORKDIR} -daystart -mtime +3 -type d -links 2 -exec bash -c 'zip -rm "${1%}".zip "$1"' _ {} \;
Here is a Bash solution:
find "$WORKDIR" -type f -name '*.zip' | while IFS= read -r file
do
    basename=$(basename "$file")
    dirname=$(dirname "$file")
    suffix=$(basename "$dirname")    # name of the parent directory
    if [[ "$basename" != *"_${suffix}.zip" ]]; then
        mv -v "$file" "${dirname}/${basename%.zip}_${suffix}.zip"
    fi
done
The script loops over all *.zip files found in $WORKDIR. For each one it checks whether the filename already ends with a suffix equal to the parent directory name. If it does not, the script renames the file, appending "_{parent_directory_name}" to the filename just before the extension.
Sample Tree
A
├── B
│   ├── abc.zip.zip
│   └── C
│   └── file_C.zip
└── one.zip
Sample Output
‘./t/A/one.zip’ -> ‘./t/A/one_A.zip’
‘./t/A/B/abc.zip.zip’ -> ‘./t/A/B/abc.zip_B.zip’
A
├── B
│   ├── abc.zip_B.zip
│   └── C
│   └── file_C.zip
└── one_A.zip
where WORKDIR=./t.
Note, I deliberately simplified the find command, as it is not important for the algorithm. You can adjust the options according to your needs.
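The name computation in the loop can also be written with parameter expansion alone, which avoids three subprocesses per file. In this sketch, suffixed_zip_name is a hypothetical helper introduced for illustration (it is not part of the answer above), and it assumes every path contains at least one slash, as paths printed by find do.

```bash
#!/usr/bin/env bash
# suffixed_zip_name (hypothetical helper): print the path a .zip file should be
# renamed to, or print nothing if the name already ends in _<parent>.zip.
suffixed_zip_name() {
    local path=$1
    local dir=${path%/*}       # parent directory path
    local base=${path##*/}     # file name
    local parent=${dir##*/}    # parent directory name
    [[ $base == *"_${parent}.zip" ]] && return 0
    printf '%s\n' "${dir}/${base%.zip}_${parent}.zip"
}

suffixed_zip_name 'A/B/C/file.zip'
suffixed_zip_name 'A/B/C/file_C.zip'       # already suffixed: prints nothing
suffixed_zip_name './t/A/B/abc.zip.zip'
```

Inside the while loop, the result could be captured with new=$(suffixed_zip_name "$file") and passed to mv only when it is non-empty.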
The best tool for this job is the rename utility that comes with Perl. (Beware that util-linux also contains a utility named rename. It is not what you want. If you have that on your system, investigate how to get rid of it and replace it with the one that comes with Perl.)
With this utility, it's as simple as
find "$WORKDIR" -name '*.zip' -exec \
rename 's:/([^/]+)/([^/]+)\.zip$:/${1}/${2}_${1}.zip:' '{}' +
You can stick arbitrary Perl code in that first argument, which makes it even more powerful than it looks from this example.
Note that your find command appears to do something unrelated, involving the creation of .zip files.

Create a folder inside other folders using bash

I have a long list of sibling folders. They all start with "0" and are numerically named (001, 002, 003...), but the names are not purely numerical and are not consecutive (for example, I have 0010_foo, 0032_bar, 0150_baz, etc.).
I need to create a new folder (js) inside each of the folders on my list. I'd like to do it recursively using the command line.
I've tried:
$ cd path/to/my/root/folder
$ find 0* -type d -exec mkdir js {} \;
But I get an error for each attempt: "mkdir: js: file exists". Needless to say, there's no directory named js inside my folders, but there are files with a .js extension.
Where is the error in my command and how can I fix it? Thanks!
(Why your find command doesn't work is already explained in bishop's (now deleted) answer — I'm only giving an alternative to find).
You can replace find with a shell for loop, like so:
for i in 0*/; do mkdir "$i"js; done
mkdir js {} tries to create two directories; you want mkdir {}/js.
To prevent find from repeatedly finding your new directory, ignore any directory named js.
find 0* ! -path '*/js' -type d -exec mkdir {}/js \;
I'm not 100% sure of your directory structure after your edit, but give this a whirl:
cd /path/to/my/root/folder
find . -maxdepth 1 ! -path . -type d -exec mkdir -p {}/js \;
Seems to work ok:
$ cd /path/to/my/root/folder
$ tree
.
├── 001
│   └── js
└── 002
$ find . -maxdepth 1 ! -path . -type d -exec mkdir -p {}/js \;
$ tree
.
├── 001
│   └── js
└── 002
    └── js
What this find does: In the current directory (.), it finds sub-directories (-type d), except the current directory itself (! -path .) and any sub-sub-directories (-maxdepth 1). In those found directories, it creates the desired sub-directory (-exec ...). The mkdir -p part creates the directory, along with any missing parents, and does not complain if the directory already exists. find replaces the {} part with the actual directory it found.
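The same find invocation can be exercised on a scratch tree before running it for real. The sketch below uses made-up folder names in the style of the question and hands each directory to sh -c as $1, which keeps {} a separate argument for find implementations that do not expand it inside a longer word.

```bash
#!/usr/bin/env bash
# Hypothetical sibling folders; one already has a js directory, one has a .js file.
demo=$(mktemp -d)
mkdir -p "$demo/0010_foo" "$demo/0032_bar/js" "$demo/0150_baz"
touch "$demo/0010_foo/app.js"

(
    cd "$demo" || exit 1
    # Only direct children of the current directory are visited (-maxdepth 1).
    find . -maxdepth 1 ! -path . -type d -exec sh -c 'mkdir -p "$1/js"' _ {} \;
)

# List the js directories that now exist one level below each folder.
created=$(cd "$demo" && find . -mindepth 2 -maxdepth 2 -type d -name js | LC_ALL=C sort)
printf '%s\n' "$created"
rm -rf "$demo"
```

Because of -p, the folder that already contained js produces no error, and the plain app.js file is left alone.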

Count number of specific file type of a directory and its sub dir in mac

I use ls -l *.filetype | wc -l but it can only find files in current directory.
How can I also count all files with specific extension in its sub dirs?
Thank you very much.
You can do that with the find command:
find . -name "*.filetype" | wc -l
The following compound command, albeit somewhat verbose, guarantees an accurate count because it handles filenames that contain newlines correctly:
total=0; while IFS= read -rd ''; do ((++total)); done < <(find . -name "*.filetype" -print0) && echo "$total"
Note: Before running the compound command above:
First, cd to the directory whose files you want to count.
Change the filetype part as appropriate, e.g. txt.
Demo:
To further demonstrate why piping the results of find to wc -l may produce incorrect results:
Run the following compound command to quickly create some test files:
mkdir -p ~/Desktop/test/{1..2} && touch ~/Desktop/test/{1..2}/a-file.txt && touch ~/Desktop/test/{1..2}/$'b\n-file.txt'
This produces the following directory structure on your "Desktop":
test
├── 1
│   ├── a-file.txt
│   └── b\n-file.txt
└── 2
├── a-file.txt
└── b\n-file.txt
Note: It contains a total of four .txt files, two of which have multi-line filenames, i.e. b\n-file.txt.
On newer versions of macOS, the files named b\n-file.txt appear as b?-file.txt in the Finder; the question mark stands for the newline in the filename.
Then run the following command that pipes the results of find to wc -l:
find ~/Desktop/test -name "*.txt" | wc -l
It incorrectly reports/prints:
6
Then run the following suggested compound command:
total=0; while IFS= read -rd ''; do ((++total)); done < <(find ~/Desktop/test -name "*.txt" -print0) && echo "$total"
It correctly reports/prints:
4
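A shorter way to get the same newline-safe count is to discard everything except the NUL terminators and count those. This sketch rebuilds the four test files in a temporary directory (rather than on the Desktop) and compares both counts; it assumes a tr that accepts '\0', which GNU and BSD tr do.

```bash
#!/usr/bin/env bash
# Same four files as in the demo, but in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo"/{1,2}
touch "$demo"/{1,2}/a-file.txt "$demo"/{1,2}/$'b\n-file.txt'

# Naive: counts lines, so each embedded newline adds a phantom file.
naive=$(( $(find "$demo" -name '*.txt' | wc -l) ))

# NUL-based: keep only the NUL terminators and count them, one per file.
exact=$(( $(find "$demo" -name '*.txt' -print0 | tr -dc '\0' | wc -c) ))

echo "naive=$naive exact=$exact"
rm -rf "$demo"
```

The $(( ... )) wrapper only strips the leading spaces that wc prints on macOS.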
