I want to tar a directory that looks like this:
dir
├── workspace
├── node_modules
└── subfolder
    ├── workspace
    ├── node_modules
    └── other_folder
I want to exclude all folders named node_modules and exclude the top-level folder called workspace, but not sub-folders called workspace.
So what I want to end up with is this:
dir
└── subfolder
    ├── workspace
    └── other_folder
I'm running this command: tar -czf ./output.tar.gz --exclude=node_modules --exclude=./workspace dir/.
But it's removing all folders called workspace and node_modules, so I instead end up with this:
dir
└── subfolder
    └── other_folder
How do I remove only the specific workspace folder that I want, and not all folders with the same name?
For the required case, it is possible to use tar excludes:
--exclude dir/./folder -- applies only to folder directly under dir
--exclude folder -- will exclude folder anywhere in the tree
So it should be possible to use:
tar -czf ./output.tar.gz --exclude=node_modules --exclude=dir/./workspace dir/.
Of course it is also possible to use --files-from and to generate the list with another tool. This is usually preferred when the list could contain a large number of files, as opposed to using xargs.
find dir/. -type f ... | tar -czvf ./output.tar.gz -T -
find has many, many options for including and excluding paths, files, and directories, generally for filtering however you want.
For your case I think it would be:
# exclude all folders named node_modules
# exclude the top level folder called workspace
# but no sub folders called workspace
find dir -type f \
-not -regex '.*/node_modules/.*' -a \
-not -regex 'dir/workspace/.*' \
-exec tar -czf ./output.tar.gz {} +
Instead of -exec you may prefer, for example, find ... -print0 | xargs -0 tar -czf ./output.tar.gz. I think the best would be find ... -print0 | tar -czf ./output.tar.gz --null -T -, as it will not fail when there are too many files, i.e. too many arguments to pass to tar.
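Putting that together for this case, a minimal sketch (assuming GNU find and GNU tar for -print0 and --null):
# same filters as above, streaming the null-delimited file list straight into tar
find dir -type f \
    -not -regex '.*/node_modules/.*' \
    -not -regex 'dir/workspace/.*' \
    -print0 |
  tar -czf ./output.tar.gz --null -T -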
I recreated the dir directory with:
while read -r l; do
mkdir -p "$(dirname "$l")"
touch "$l"
done <<EOF
dir/workspace/1.txt
dir/node_modules/2.txt
dir/subfolder/workspace/3.txt
dir/subfolder/node_modules/4.txt
dir/subfolder/other_folder/5.txt
EOF
then tested it; tar -tf ./output.tar.gz prints:
dir/subfolder/workspace/3.txt
dir/subfolder/other_folder/5.txt
Related
I want to write a script which takes in a folder and deletes all files within subfolders in that folder.
eg:
abc
  a.txt
  b.txt
efg
  e.txt
x.txt
The script, when run, should delete a.txt, b.txt and e.txt, but not x.txt (since it is not inside a folder).
The first thing you want to do when you write a bash script is to decide which command you want to use.
The find command returns all files in a folder, recursively.
find "${dir}" -name "*.txt" -delete
The above command searches dir (the directory stored in a variable) for file names ending with .txt and deletes them.
But what if you want to find files within sub directories only?
You could use:
find "${dir}"/*/ -name "*.txt" -delete
Notice how we added /*/ so that find only looks inside the folders within this folder.
You could additionally add -type f to make sure we are deleting a file and not anything else.
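Putting the two together, a combined command might look like this (a sketch, assuming the target folder is stored in the dir variable):
# delete only regular .txt files that live inside sub-directories of $dir
find "${dir}"/*/ -type f -name "*.txt" -delete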
With the find command:
Sample test folder structure:
$ tree test
test
├── abc
│   ├── a.txt
│   └── b.txt
├── efg
│   └── e.txt
└── x.txt
The crucial command:
find test -mindepth 2 -type f -delete
Viewing results:
$ tree test
test
├── abc
├── efg
└── x.txt
Or this one here:
find */ -name '*.txt' -type f -print0 | xargs -0 rm -f
I have the following structure:
/home/
├── DIR1/
│   └── file_ab.csv
├── DIR2/
│   └── file_cd.csv
└── DIR3/
    └── file3_ef.csv
Where file_**.csv contains rows of floats, different floats for each DIR.
I want to grab the contents of all of the file_**.csv files and concatenate them.
I found this answer here:
find /home -type f -name '*.csv' -exec cat {} \; > pl_parameters
But I get an empty file called 'pl_parameters'. Why is the file empty? How can I fix this?
find /home/DIR* -name 'file*csv' | xargs cat > output.csv
find /home/DIR* -name 'file*csv' gives you the files' absolute paths.
xargs cat will iterate over those files, and cat will print their contents.
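If the paths might contain spaces, a slightly more robust variant of the same idea (a sketch, assuming GNU find and xargs) is to pass null-delimited names:
find /home/DIR* -name 'file*csv' -print0 | xargs -0 cat > output.csv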
With Bash 4.0+, you can use globstar and use a more straight forward command:
shopt -s globstar
cd /home
cat **/*.csv > pl_parameters
**/ expands to the entire directory tree underneath the current directory.
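To see which files the glob will pick up before concatenating anything, you can list them first (a quick check using the same shell options):
shopt -s globstar
cd /home
printf '%s\n' **/*.csv   # prints each matched .csv path on its own line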
Your command:
find /home -type f -name '*.csv' -exec cat {} \; > pl_parameters
looks good to me - not sure why you got a zero-byte output file.
For a number of files I want to get the parent directory and append its name to the filename. For example, in the following path:
A/B/C/file.zip
I want to rename file.zip to file_C.zip.
Here is my code. I have to find directories which do not contain subdirectories, zip the files in them, and rename the result to refer to the parent directory.
find ${WORKDIR} -daystart -mtime +3 -type d -links 2 -exec bash -c 'zip -rm "${1%}".zip "$1"' _ {} \;
Here is a Bash solution:
find "$WORKDIR" -type f -name '*.zip' | while IFS= read -r file
do
basename=$(basename "$file")
dirname=$(dirname "$file")
suffix=$(basename "$dirname")
if [[ "$basename" != *"_${suffix}.zip" ]]; then
mv -v "$file" "${dirname}/${basename%.zip}_${suffix}.zip"
fi
done
The script processes all *.zip files found in $WORKDIR with a loop. In the loop it checks whether $file already has a suffix equal to the parent directory name. If it doesn't have such a suffix, the script renames the file, appending "_{parent_directory_name}" to the filename just before the extension.
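As a quick illustration with the path from the question, A/B/C/file.zip, the pieces work out like this:
basename=$(basename "A/B/C/file.zip")   # file.zip
dirname=$(dirname "A/B/C/file.zip")     # A/B/C
suffix=$(basename "A/B/C")              # C
echo "${basename%.zip}_${suffix}.zip"   # prints file_C.zip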
Sample Tree
A
├── B
│   ├── abc.zip.zip
│   └── C
│       └── file_C.zip
└── one.zip
Sample Output
‘./t/A/one.zip’ -> ‘./t/A/one_A.zip’
‘./t/A/B/abc.zip.zip’ -> ‘./t/A/B/abc.zip_B.zip’
A
├── B
│   ├── abc.zip_B.zip
│   └── C
│       └── file_C.zip
└── one_A.zip
where WORKDIR=./t.
Note, I deliberately simplified the find command, as it is not important for the algorithm. You can adjust the options according to your needs.
The best tool for this job is the rename utility that comes with Perl. (Beware that util-linux also contains a utility named rename. It is not what you want. If you have that on your system, investigate how to get rid of it and replace it with the one that comes with Perl.)
With this utility, it's as simple as
find "$WORKDIR" -name '*.zip' -exec \
  rename 's:/([^/]+)/([^/]+)\.zip$:/${1}/${2}_${1}.zip:' '{}' +
You can stick arbitrary Perl code in that first argument, which makes it even more powerful than it looks from this example.
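If you want to preview the renames before committing, this rename has a dry-run flag, so you could run the same expression with -n first (it only prints what it would do):
find "$WORKDIR" -name '*.zip' -exec \
  rename -n 's:/([^/]+)/([^/]+)\.zip$:/${1}/${2}_${1}.zip:' '{}' +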
Note that your find command appears to do something unrelated, involving the creation of .zip files.
I have a long list of folders that are siblings to each other, they all start with "0" and are numerically named (001, 002, 003...) but names are not only numerical and are not correlative (for example I have 0010_foo, 0032_bar, 0150_baz, etc).
I need to create a new folder (js) inside each of the folders on my list. I'd like to do it recursively using the command line.
I've tried:
$ cd path/to/my/root/folder
$ find 0* -type d -exec mkdir js {} \;
But I get an error for each attempt: "mkdir: js: file exists". Needless to say, there's no directory named js inside my folders, but there are files with a .js extension.
Where is the error in my command and how can I fix it? Thanks!
(Why your find command doesn't work is already explained in bishop's (now deleted) answer — I'm only giving an alternative to find).
You can replace find with a shell for loop, like so:
for i in 0*/; do mkdir "$i"js; done
mkdir js {} tries to create two directories; you want mkdir {}/js.
To prevent find from repeatedly finding your new directory, ignore any directory named js.
find 0* ! -path '*/js' -type d -exec mkdir {}/js \;
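To check which directories will be touched before creating anything, you can run the same find with -print first (a quick sanity check):
find 0* ! -path '*/js' -type d -print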
I'm not 100% sure of your directory structure after your edit, but give this a whirl:
cd /path/to/my/root/folder
find . -maxdepth 1 ! -path . -type d -exec mkdir -p {}/js \;
Seems to work ok:
$ cd /path/to/my/root/folder
$ tree
.
├── 001
│   └── js
└── 002
$ find . -maxdepth 1 ! -path . -type d -exec mkdir -p {}/js \;
$ tree
.
├── 001
│   └── js
└── 002
    └── js
What this find does: In the current directory (.), it finds sub-directories (-type d) -- except the current directory itself (! -path .) and any sub-sub-directories (-maxdepth 1). In those found directories, it creates the desired sub-directory (-exec ...). The mkdir -p part creates the directory and silences any errors about parents not existing. find replaces the {} part with the actual directory it found.
I found a solution for my question in Windows but I'm using Ubuntu: How to copy a directory structure but only include certain files using Windows batch files?
As the title says, how can I recursively copy a directory structure but only include some files? For example, given the following directory structure:
folder1
  folder2
    folder3
      data.zip
      info.txt
      abc.xyz
    folder4
      folder5
        data.zip
      somefile.exe
      someotherfile.dll
The files data.zip and info.txt can appear everywhere in the directory structure. How can I copy the full directory structure, but only include files named data.zip and info.txt (all other files should be ignored)?
The resulting directory structure should look like this:
copy_of_folder1
  folder2
    folder3
      data.zip
      info.txt
    folder4
      folder5
        data.zip
Could you tell me a solution for Ubuntu?
$ rsync --recursive --include="data.zip" --include="*.txt" --filter="-! */" dir_1 copy_of_dir_1
To exclude dir3 regardless of where it is in the tree (even if it contains files that would match the --includes):
--exclude 'dir3/' (before `--filter`)
To exclude dir3 only at a specific location in the tree, specify an absolute path, starting from your source dir:
--exclude '/dir1/dir2/dir3/' (before `--filter`)
To exclude dir3 only when it's in dir2, but regardless of where dir2 is:
--exclude 'dir2/dir3/' (before `--filter`)
Wildcards can also be used in the path elements where * means a directory with any name and ** means multiple nested directories.
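For example, combining one of these with the command above, excluding any dir3 wherever it appears might look like this (a sketch):
rsync --recursive --include="data.zip" --include="*.txt" --exclude='dir3/' --filter="-! */" dir_1 copy_of_dir_1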
To specify only files and dirs to include, run two rsyncs, one for the files and one for the dirs. The problem with getting it done in a single rsync is that when you don't include a dir, rsync won't enter the dir and so won't discover any files in that branch that may be matching your include filter. So, you start by copying the files you want while not creating any dirs that would be empty. Then copy any dirs that you want.
$ rsync --recursive --prune-empty-dirs --include="*.txt" --filter="-! */" dir_1 copy_of_dir_1
$ rsync --recursive --include '/dir1/dir2/' --include '/dir3/dir4/' --filter="-! */" dir_1 copy_of_dir_1
You can combine these if you don't mind that your specified dirs don't get copied if they're empty:
$ rsync --recursive --prune-empty-dirs --include="*.txt" --include '/dir1/dir2/' --include '/dir3/dir4/' --filter="-! */" dir_1 copy_of_dir_1
The --filter="-! */" is necessary because rsync includes all files and folders that match none of the filters (imagine it as an invisible --include filter at the end of the list of filters). rsync checks each item to be copied against the list of filters and includes or excludes the item depending on the first match it finds. If there's no match, it hits that invisible --include and goes on to include the item. We wanted to change this default to --exclude, so we added an exclude filter (the - in -! */), then we negate the match (!) and match all dirs (*/). Since this is a negated match, the result is that we allow rsync to enter all the directories (which, as I mentioned earlier, allows rsync to find the files we want).
We use --filter instead of --exclude for the final filter because --exclude does not allow specifying negated matches with the ! operator.
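An equivalent way to express the same default-to-exclude idea without --filter (a common rsync idiom, shown here only as a sketch for the files from the question) is to explicitly include every directory and then exclude everything else:
rsync --recursive --prune-empty-dirs --include="*/" --include="data.zip" --include="info.txt" --exclude="*" folder1/ copy_of_folder1/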
I don't have a beautiful one liner, but since nobody else has answered you can always:
find . -name 'file_name.extension' -print | cpio -pavd /path/to/receiving/folder
for each specific file, after copying the directories.
(Make sure you're in the original folder first, of course! :) )
Here is a one-liner using rsync:
rsync -a -f"+ info.txt" -f"+ data.zip" -f'-! */' folder1/ copy_of_folder1/
If you already have a file list and want a more scalable solution:
cat file.list | xargs -I{} rsync -a -f"+ {}" -f'-! */' folder1/ copy_of_folder1/
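Here file.list is assumed to contain one filename per line, in the same form as the patterns used above, e.g. (hypothetical contents):
info.txt
data.zip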
cp -pr folder1 copy_of_folder1; find copy_of_folder1 -type f ! \( -name data.zip -o -name info.txt \) -exec rm -f {} \;
First: copy folder1 entirely to copy_of_folder1.
Second: erase all files different from data.zip and info.txt.
At the end, you have your complete structure with only the files data.zip and info.txt.