Shell: Copy list of files with full folder structure stripping N leading components from file names - bash

Consider a list of files (e.g. files.txt) similar to (but not limited to) the following:
/root/
/root/lib/
/root/lib/dir1/
/root/lib/dir1/file1
/root/lib/dir1/file2
/root/lib/dir2/
...
How can I copy the specified files (not any other content from the folders which are also specified) to a location of my choice (e.g. ~/destination) with a) intact folder structure but b) N folder components (in the example just /root/) stripped from the path?
I already managed to use
cp --parents `cat files.txt` ~/destination
to copy the files with an intact folder structure; however, this results in all files ending up in ~/destination/root/... when I'd like to have them in ~/destination/...

I think I found a really nice and concise solution using GNU tar:
tar cf - -T files.txt | tar xf - -C ~/destination --strip-components=1
Note the --strip-components option, which removes the given number of leading path components from each file name on extraction.
One minor problem, though: it seems tar always archives the whole contents of any folder mentioned in files.txt (at least I couldn't find an option to ignore folders), but that is most easily solved by filtering out the folder entries with grep:
grep -v '/$' files.txt > files2.txt
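The grep filter and the tar pipe above can also be chained so no intermediate files2.txt is needed. This is a minimal sketch, assuming GNU tar (or bsdtar) and a throwaway tree under mktemp standing in for /root; the list here holds paths relative to the parent of "root", so --strip-components=1 removes exactly the "root" component. With the absolute paths from the question, tar strips the leading "/" by default, so the same count applies.

```shell
#!/usr/bin/env bash
# Sketch: copy only the listed files and strip the leading "root" component.
# Assumption: a temporary tree stands in for /root; paths in files.txt are
# relative to its parent directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
dest="$tmp/destination"
mkdir -p "$tmp/root/lib/dir1" "$tmp/root/lib/dir2" "$dest"
echo one   > "$tmp/root/lib/dir1/file1"
echo two   > "$tmp/root/lib/dir1/file2"
echo extra > "$tmp/root/lib/dir2/unlisted"   # not in the list, must not be copied

cat > "$tmp/files.txt" <<'EOF'
root/
root/lib/
root/lib/dir1/
root/lib/dir1/file1
root/lib/dir1/file2
root/lib/dir2/
EOF

# Drop directory entries (lines ending in "/") so tar does not recurse into
# them, archive the remaining files to stdout, and unpack with one component
# ("root") stripped from every name.
( cd "$tmp" &&
  grep -v '/$' files.txt | tar cf - -T - | tar xf - -C "$dest" --strip-components=1 )

find "$dest" -type f | sort
```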

This might not be the most graceful solution - but it works:
while IFS= read -r file; do     # read whole lines, so names with spaces survive
    echo "checking for $file"
    if [[ -f "$file" ]]; then
        file_folder=$(dirname "$file")
        # /root/lib/dir1 -> ~/destination/lib/dir1 (and /root -> ~/destination)
        destination_folder="$HOME/destination${file_folder#/root}"
        echo "copying file $file to $destination_folder"
        mkdir -p "$destination_folder"
        cp "$file" "$destination_folder"
    fi
done < files.txt
I had a look at cp and rsync, but it looks like they would work better if you cd into /root first.
However, if you do cd into the correct directory beforehand, you can always run the commands in a subshell so that you are returned to your original location once the subshell has finished.
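The "cd first, but in a subshell" idea can be sketched with cp --parents. This assumes GNU coreutils (--parents is a GNU extension) and uses a temporary tree in place of /root and ~/destination; files.txt holds absolute paths as in the question, and the loop strips the root prefix before copying.

```shell
#!/usr/bin/env bash
# Sketch: cd into the root only inside a subshell, then let GNU cp --parents
# recreate the relative directory structure under the destination.
# Assumption: a temporary tree stands in for /root and ~/destination.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
root="$tmp/root"
dest="$tmp/destination"
mkdir -p "$root/lib/dir1" "$root/lib/dir2" "$dest"
echo one   > "$root/lib/dir1/file1"
echo two   > "$root/lib/dir1/file2"
echo extra > "$root/lib/dir2/unlisted"
printf '%s\n' "$root/" "$root/lib/" "$root/lib/dir1/" \
  "$root/lib/dir1/file1" "$root/lib/dir1/file2" "$root/lib/dir2/" > "$tmp/files.txt"

start=$PWD
(
  cd "$root" || exit 1
  while IFS= read -r path; do
    rel=${path#"$root"/}            # strip the leading root prefix
    [ -f "$rel" ] || continue       # skip directory entries and missing paths
    cp --parents -- "$rel" "$dest"  # creates $dest/lib/dir1/... as needed
  done < "$tmp/files.txt"
)
# The cd happened only in the subshell, so this shell is still where it was.
[ "$PWD" = "$start" ] && echo "still in $PWD"
```

Because the subshell is a separate process, its cd cannot affect the calling shell, which is why no cd back is needed.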

Related

Unpack .tar.gz and modify result files

I wanted to write a bash script that unpacks .tar.gz archives and, for each resulting file, sets an additional attribute with the name of the original archive, just so I know where each unpacked file came from.
I tried to store the names of the files inside the archive in an array and then loop over them.
for archive in "$1"*.tar.gz; do
if [ -f "${archive}" ]
then
readarray -t fileNames < <(tar tzf "$archive")
for file in "${fileNames}"; do
echo "${file}"
tar xvzf "${archive}" -C "$1" --no-wildcards "${file}" &&
attr -s package -V "${archive}" "${file}"
done
fi
done
The result is that only one file is extracted and no extra attribute is set.
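One likely reason only one file is extracted: in bash, "${fileNames}" without [@] expands to the first array element only, so the inner loop runs exactly once. (The attr call also runs relative to the current directory rather than "$1", which is probably why the attribute is not set; the answer below handles that with cd "$1".) A small demonstration, with made-up entry names:

```shell
#!/usr/bin/env bash
# "${array}" is the same as "${array[0]}"; "${array[@]}" yields every element.
fileNames=("pkg/" "pkg/a.txt" "pkg/b.txt")   # hypothetical tar listing

first_only=0
for file in "${fileNames}"; do first_only=$((first_only + 1)); done

every=0
for file in "${fileNames[@]}"; do every=$((every + 1)); done

echo "\${fileNames} -> $first_only iteration(s); \${fileNames[@]} -> $every"
```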
#! /bin/bash
for archive in "$1"*.tar.gz; do
if [ -f "${archive}" ] ; then
# Unpack the archive into subfolder $1
tar xvf "$archive" -C "$1"
# Assign attributes
tar tf "$archive" | (cd "$1" && xargs -t -L1 attr -s package -V "$archive" )
fi
done
Notes:
The script unpacks each archive with a single tar call. This is more efficient than unpacking one file at a time, and it also avoids issues with unpacking folders, which would lead to unnecessary repeated work.
The script uses attr. It would be better to use setfattr, if it is available for the target file system, to set attributes on multiple files with only a few calls (using xargs, with multiple files per command).
It is not clear what the structure of the output folder should be. From the question, it looks as if all archives will be unpacked into the same folder "$1". The solution above assumes that this is the intended behavior, and that each archive contains distinct file names. If each archive is to be unpacked into a different subfolder, it will be easier and more efficient to implement.
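The "one subfolder per archive" variant mentioned in the last note might look like this. It is a sketch under stated assumptions: the sample archives, the "extracted" output folder and the user.package attribute name are illustrative, and the setfattr step only runs if the attr tools are installed and the file system accepts user extended attributes. It uses find -exec ... {} + to batch many paths into each setfattr call, in the same spirit as xargs.

```shell
#!/usr/bin/env bash
# Sketch: one tar call per archive, each unpacked into its own subfolder named
# after the archive, so the folder name itself records the origin.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
src="$tmp/in"; out="$tmp/extracted"          # hypothetical locations
mkdir -p "$src" "$out" "$tmp/d1" "$tmp/d2"
echo a > "$tmp/d1/a.txt"; echo b > "$tmp/d2/b.txt"
tar czf "$src/pkg1.tar.gz" -C "$tmp/d1" a.txt
tar czf "$src/pkg2.tar.gz" -C "$tmp/d2" b.txt

for archive in "$src"/*.tar.gz; do
  [ -f "$archive" ] || continue
  name=$(basename "$archive" .tar.gz)
  mkdir -p "$out/$name"
  tar xzf "$archive" -C "$out/$name"
  # Optional: tag every extracted path, many paths per setfattr call.
  if command -v setfattr >/dev/null 2>&1; then
    find "$out/$name" -mindepth 1 -exec setfattr -n user.package -v "$(basename "$archive")" {} + 2>/dev/null ||
      echo "note: could not set xattrs under $out/$name" >&2
  fi
done
ls -R "$out"
```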

Bash script to separate files into directories, reverse sort and print in an HTML file works on some files but not others

Goal
Separate files into directories according to their filenames, run a Bash script that reverse sorts them and assembles the content into one file (I know steps to achieve this are already documented on Stack Overflow, but please keep reading...)
Problem
Scripts work on all files but two
State
Root directory
dos-18-1-18165-03-for-sql-server-2012---15-june-2018.html
dos-18-1-18165-03-for-sql-server-2016---15-june-2018.html
dos-18-1-18176-03-for-sql-server-2012---10-july-2018.html
dos-18-1-18197-01-for-sql-server-2012---23-july-2018.html
dos-18-1-18197-01-for-sql-server-2016---23-july-2018.html
dos-18-1-18232-01-for-sql-server-2012---21-august-2018.html
dos-18-1-18232-01-for-sql-server-2016---21-august-2018.html
dos-18-1-18240-01-for-sql-server-2012---5-september-2018.html
dos-18-1-18240-01-for-sql-server-2016---5-september-2018.html
dos-18-2-release-notes.html
dos-18-2-known-issues.html
Separate the files into directories according to their SQL Server version or name
ls | grep "^dos-18-1.*2012.*" | xargs -i cp {} dos181-2012
ls | grep "^dos-18-1.*2016.*" | xargs -i cp {} dos181-2016
ls | grep ".*notes.*" | xargs -i cp {} dos-18-2-release-notes
ls | grep ".*known.*" | xargs -i cp {} dos-18-2-known-issues
Result (success)
/dos181-2012:
dos-18-1-18165-03-for-sql-server-2012---15-june-2018.html
dos-18-1-18176-03-for-sql-server-2012---10-july-2018.html
dos-18-1-18197-01-for-sql-server-2012---23-july-2018.html
dos-18-1-18232-01-for-sql-server-2012---21-august-2018.html
dos-18-1-18240-01-for-sql-server-2012---5-september-2018.html
/dos181-2016:
dos-18-1-18165-03-for-sql-server-2016---15-june-2018.html
dos-18-1-18197-01-for-sql-server-2016---23-july-2018.html
dos-18-1-18232-01-for-sql-server-2016---21-august-2018.html
dos-18-1-18240-01-for-sql-server-2016---5-september-2018.html
/dos-18-2-known-issues:
dos-18-2-known-issues.html
/dos-18-2-release-notes:
dos-18-2-release-notes.html
Variables (all follow this pattern)
dos181-2012.sh
file="dos181-2012"
export
dos-18-2-known-issues
file="dos-18-2-known-issues"
export
Reverse sort and assemble (assumes /$file exists; after testing all lines of code I believe this is where the problem lies):
cat $( ls "$file"/* | sort -r ) > "$file"/"$file".html
Result (success and failure)
dos181-2012.html has the correct content in the correct order.
dos-18-2-known-issues.html is empty.
What I have tried
I tried to ignore the two files in the command:
cat $( ls "$file"/* -i (grep ".*known.*" ) | sort -r ) > "$file"/"$file".html
Result: The opposite occurs
dos181-2012.html is empty
dos-18-2-known-issues.html is not empty
Thank you
I am completely baffled. Why do these scripts work on some files but not others? (I can share more information about the file contents if that will help, but the file contents are nearly identical.) Thank you for any insights.
First off, your question is quite incomplete. You start well, showing the input files and directories, but then you talk about variables and $file without showing the code they come from. So I based my answer on the explanation in the first paragraph and on what I could deduce from the rest of the question.
I did this:
#!/bin/bash
cp /etc/hosts dos-18-1-18165-03-for-sql-server-2012---15-june-2018.html
cp /etc/hosts dos-18-1-18165-03-for-sql-server-2016---15-june-2018.html
cp /etc/hosts dos-18-1-18176-03-for-sql-server-2012---10-july-2018.html
cp /etc/hosts dos-18-1-18197-01-for-sql-server-2012---23-july-2018.html
cp /etc/hosts dos-18-1-18197-01-for-sql-server-2016---23-july-2018.html
cp /etc/hosts dos-18-1-18232-01-for-sql-server-2012---21-august-2018.html
cp /etc/hosts dos-18-1-18232-01-for-sql-server-2016---21-august-2018.html
cp /etc/hosts dos-18-1-18240-01-for-sql-server-2012---5-september-2018.html
cp /etc/hosts dos-18-1-18240-01-for-sql-server-2016---5-september-2018.html
cp /etc/hosts dos-18-2-release-notes.html
cp /etc/hosts dos-18-2-known-issues.html
DIRS='dos181-2012 dos181-2016 dos-18-2-release-notes dos-18-2-known-issues'
for DIR in $DIRS
do
    if [ ! -d "$DIR" ]
    then
        mkdir "$DIR"
    fi
done
cp dos-18-1*2012* dos181-2012
cp dos-18-1*2016* dos181-2016
cp *-notes.html dos-18-2-release-notes   # *notes* alone would also match the directory
cp *-issues.html dos-18-2-known-issues
for DIR in $DIRS
do
    /bin/ls -1r "$DIR" > "$DIR"-list.html
done
The cp /etc/hosts commands just create the test files with something in them.
You did not specify how the directory names were produced, so I went with the easy option and listed them in a variable ($DIRS). These could be built from the filenames, but you did not mention that.
Then I created the directories (the first for loop).
Then come the four cp commands. Your code is very complicated for something so basic: the shell expands wildcards for cp just as it does for rm, mv, ls, ..., so there is no need for a complex grep and xargs pipeline to copy files around.
Finally, the last for loop lists the files (ls) one per line (-1, strictly output formatting) in reverse name order (-r). The result of that ls is sent to a file named after the directory with a -list.html suffix; writing to plain $DIR.html would overwrite the original dos-18-2-release-notes.html and dos-18-2-known-issues.html in the current directory.
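A likely cause of the empty file, not covered by the answer: in dos-18-2-known-issues (and dos-18-2-release-notes), the only input file has exactly the name of the output file "$file"/"$file".html. Bash performs the command substitution before the redirection, so ls lists that file, then > truncates it, and cat reads an empty file. In dos181-2012 the output name is new, so nothing is lost on the first run. The sketch below reproduces this and shows one fix; the HTML contents and the "assembled" output folder are made up for the demo.

```shell
#!/usr/bin/env bash
# Reproduces the empty-output effect and shows one possible fix.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir dos181-2012 dos-18-2-known-issues assembled   # "assembled" is hypothetical
echo '<p>june</p>'  > dos181-2012/dos-18-1-18165-03-for-sql-server-2012---15-june-2018.html
echo '<p>july</p>'  > dos181-2012/dos-18-1-18176-03-for-sql-server-2012---10-july-2018.html
echo '<p>issue</p>' > dos-18-2-known-issues/dos-18-2-known-issues.html

# Fix: concatenate in reverse name order without parsing ls, and write the
# result outside the directory being read (so it can never be an input).
assemble() {
  local dir=$1 i
  local files=("$dir"/*.html)
  for (( i = ${#files[@]} - 1; i >= 0; i-- )); do
    if [ -f "${files[i]}" ]; then cat -- "${files[i]}"; fi
  done > "assembled/$dir.html"
}
assemble dos181-2012
assemble dos-18-2-known-issues

# The question's command, for comparison: $(...) lists the single input file,
# then the redirection truncates that same file, so cat reads nothing.
file=dos-18-2-known-issues
cat $( ls "$file"/* | sort -r ) > "$file"/"$file".html
```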

Gzip no such file or directory error, still zips files

I'm just learning shell scripting, specifically in bash. I want to use gzip to take files from a target directory and write the compressed copies to a different directory. I pass the directories on the command line; ext is for the extensions I want to compress, and file will be the new gzipped file. My script compresses the files correctly, to and from the desired directories, but I get a "No such file or directory" error. How do I avoid this?
Current code
cd $1
for ext in $*; do
for file in `ls *.$ext`; do
gzip -c $file > $2/$file.gz
done
done
and my I/O
blackton@ltsp-amd64-charlie:~/Desktop/60256$ bash myCompress /home/blackton/Desktop/ /home/blackton/ txt
ls: cannot access *./home/blackton/Desktop/: No such file or directory
ls: cannot access *./home/blackton/: No such file or directory
gzip: alg: No such file or directory
gzip: proj.txt: No such file or directory
There are two separate things causing problems here.
In your outer loop
for ext in $*; do
done
you are looping over all the command line parameters, using each as the extension to search for - including the directory names.
Since the extension is the third parameter, you only want to run the inner loop once on $3:
for file in `ls *.$3`; do
gzip -c $file > $2/$file.gz
done
The next problem is spaces.
You do not want to run ls here - the wildcard expansion will provide the filenames directly, e.g. for file in *.$3, and it will fill $file with a whole filename at a time. The output from ls is split on each space, so you end up with two filenames alg and proj.txt, instead of one alg proj.txt.
That is not enough by itself, though. You also need to quote $file whenever you use it, so the command expands to gzip -c "alg proj.txt" instead of gzip -c alg proj.txt, which tells gzip to compress two files. In general, all variable expansions that you expect to be a filename should be quoted:
cd "$1"
for file in *."$3"; do
gzip -c "$file" > "$2/$file.gz"
done
One further problem is that if there are no files matching the extension, the wildcard will not expand and the command executed will be
gzip -c "*.txt" > "dir/*.txt.gz"
This will create a file that is literally called "*.txt.gz" in the target directory. A simple way to avoid this would be to check that the original file exists first - this will also avoid accidentally trying to gzip an oddly named directory.
cd "$1"
for file in *."$3"; do
if [ -f "$file" ]; then
gzip -c "$file" > "$2/$file.gz"
fi
done
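An alternative to the -f check is bash's nullglob option: when nothing matches, the pattern expands to nothing and the loop body never runs, so no literal "*.txt.gz" file can appear. This sketch keeps the -f test to skip directories; compress_ext and its argument order (source directory, target directory, extension) are illustrative, mirroring the question's command line. The target is turned into an absolute path first so it still works after the cd.

```shell
#!/usr/bin/env bash
# Sketch: compress every *.EXT file from SRC into DEST as FILE.gz.
compress_ext() {
  local src=$1 dest ext=$3 file
  dest=$(cd "$2" && pwd) || return 1   # absolute path survives the cd below
  (
    shopt -s nullglob                  # no match -> empty list, not "*.ext"
    cd "$src" || exit 1
    for file in *."$ext"; do
      [ -f "$file" ] || continue       # still skip directories named *.ext
      gzip -c -- "$file" > "$dest/$file.gz"
    done
  )
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/src" "$tmp/dst"
echo hello > "$tmp/src/alg proj.txt"
compress_ext "$tmp/src" "$tmp/dst" txt
compress_ext "$tmp/src" "$tmp/dst" csv   # no .csv files: nothing is created
ls "$tmp/dst"
```

Running the shopt inside the subshell keeps nullglob from leaking into the rest of the script, where it can change the meaning of other globs.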
You can try this:
#!/bin/bash
Src=$1
Des=$2
ext="txt"
mkdir -p "$Des" # -p creates the directory if it does not exist (and is silent if it does)
for file in "$Src"/*; do
    if [ -f "$file" ] && [ "${file##*.}" = "$ext" ]; then
        base=$(basename "$file")
        gzip -c "$file" > "$Des/$base.gz"
    fi
done

Collapse nested directories in bash

Often after unzipping a file I end up with a directory containing nothing but another directory (e.g., mkdir foo; cd foo; tar xzf ~/bar.tgz may produce nothing but a bar directory in foo). I wanted to write a script to collapse that down to a single directory, but if there are dot files in the nested directory it complicates things a bit.
Here's a naive implementation:
mv -i $1/* $1/.* .
rmdir $1
The only problem here is that it'll also try to move . and .. and ask overwrite ./.? (y/n [n]). I can get around this by checking each file in turn:
IFS=$'\n'
for file in $1/* $1/.*; do
if [ "$file" != "$1/." ] && [ "$file" != "$1/.." ]; then
mv -i $file .
fi
done
rmdir $1
But this seems like an inelegant workaround. I tried a cleaner method using find:
for file in $(find $1); do
mv -i $file .
done
rmdir $1
But find $1 will also give $1 as a result, which gives an error of mv: bar and ./bar are identical.
While the second method seems to work, is there a better way to achieve this?
Turn on the dotglob shell option, which allows your pattern to match files beginning with a dot (. and .. are still never matched):
shopt -s dotglob
mv -i "$1"/* .
rmdir "$1"
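The same dotglob approach can be wrapped in a reusable function. In this sketch, the shopt runs in a subshell so the option does not leak into the caller, and nullglob guards against an empty inner directory; collapse and the foo/bar layout are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: move everything (including dot files) out of INNER into the current
# directory, then remove the now-empty INNER.
collapse() {
  local inner=$1
  (
    shopt -s dotglob nullglob
    entries=("$inner"/*)            # includes .otherstuff, never . or ..
    if (( ${#entries[@]} )); then
      mv -i -- "${entries[@]}" .
    fi
  ) && rmdir -- "$inner"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/foo/bar"
echo a > "$tmp/foo/bar/stuff"
echo b > "$tmp/foo/bar/.otherstuff"
cd "$tmp/foo" && collapse bar
ls -A "$tmp/foo"
```

rmdir only runs if the move succeeded, and it refuses to delete a directory that is not empty, so a failed or partial move leaves the data in place.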
First, consider that many tar implementations provide a --strip-components option that lets you strip off that first path component. Not sure whether the archive has a single top-level directory?
tar -tf yourball.tar | awk -F/ '!s[$1]++{print$1}'
will show you all the first-level contents. If there is only that one directory, then
tar --strip-components=1 -xf yourball.tar
will extract the contents of that directory into the current directory.
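The two steps (check the first-level contents, then extract with --strip-components=1) can be combined. This is a sketch that assumes GNU tar or bsdtar; extract_flat is an illustrative name. It only strips when there is exactly one top-level entry and that entry is a directory, since stripping a lone top-level file would discard it.

```shell
#!/usr/bin/env bash
# Sketch: strip the top-level directory only if it is the single top entry.
extract_flat() {
  local ball=$1 dest=$2 single
  single=$(tar -tf "$ball" | awk -F/ '
    !seen[$1]++ { tops++ }
    NF > 1      { nested = 1 }
    END         { if (tops == 1 && nested) print "yes"; else print "no" }')
  if [ "$single" = yes ]; then
    tar -xf "$ball" -C "$dest" --strip-components=1
  else
    tar -xf "$ball" -C "$dest"
  fi
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/src/bar" "$tmp/out"
echo a > "$tmp/src/bar/stuff"
echo b > "$tmp/src/bar/.otherstuff"
tar -cf "$tmp/yourball.tar" -C "$tmp/src" bar
extract_flat "$tmp/yourball.tar" "$tmp/out"
ls -A "$tmp/out"
```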
So that's how you can avoid the problem altogether, but it's also a solution to your immediate problem. If you have already extracted the files, so that you have
foo/bar/stuff
foo/bar/.otherstuff
you can do
tar -cf- foo | tar --strip-components=2 -C final_destination -xf-
The --strip-components feature is not part of the POSIX specification for tar, but it is on both the common GNU and OSX/BSD implementations.

Unzip ZIP file and extract unknown folder name's content

My users will be zipping up files which will look like this:
TEMPLATE1.ZIP
|--------- UnknownName
           |------- index.html
           |------- images
                    |------- image1.jpg
I want to extract this zip file as follows:
/mysite/user_uploaded_templates/myrandomname/index.html
/mysite/user_uploaded_templates/myrandomname/images/image1.jpg
My trouble is with UnknownName - I do not know what it is beforehand and extracting everything to the "base" level breaks all the relative paths in index.html
How can I extract from this ZIP file the contents of UnknownName?
Is there anything better than:
1. Extract everything
2. Detect which "new subdirectory" got created
3. mv newsubdir/* .
4. rmdir newsubdir/
If there is more than one subdirectory at UnknownName level, I can reject that user's zip file.
I think your approach is a good one. Step 2 could be improved by extracting to a newly created directory (deleted afterwards), so that "detection" is trivial.
# Bash (minimally tested)
tempdest=$(mktemp -d)
unzip -d "$tempdest" TEMPLATE1.ZIP
dir=("$tempdest"/*)
if (( ${#dir[@]} == 1 )) && [[ -d $dir ]]; then
    # in Bash, etc., scalar $var is the same as ${var[0]}
    mv "$dir"/* /mysite/user_uploaded_templates/myrandomname
else
    echo "rejected"
fi
rm -rf "$tempdest"
The other option I can see, besides the one you suggested, is the unzip -j flag, which junks all stored paths and puts every file directly into the extraction directory. If you know for certain that each of your TEMPLATE1.ZIP files includes an index.html and *.jpg files, then you can do something like:
destdir=/mysite/user_uploaded_templates/myrandomname
unzip -j -d "$destdir" TEMPLATE1.ZIP
mkdir "${destdir}/images"
# the glob must stay outside the quotes so the shell expands it
mv "${destdir}"/*.jpg "${destdir}/images"
It's not exactly the cleanest solution but at least you don't have to do any parsing like you do in your example. I can't seem to find any option similar to patch -p# that lets you specify the path level.
Each zip and unzip command differs, but there's usually a way to list the file contents. From there, you can parse the output to determine the unknown directory name.
On Windows, with the 1996 Wales/Gaily/van der Linden/Rommel version, the command is unzip -l.
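Parsing the listing could look like the sketch below. It assumes Info-ZIP zip/unzip, where unzip -Z1 (zipinfo mode) prints one entry name per line; top_dir, the sample archive and the myrandomname target are illustrative. The demo part is skipped if zip or unzip is not installed.

```shell
#!/usr/bin/env bash
# Sketch: print the single top-level folder name, or nothing if there are
# several top-level entries (so the upload can be rejected).
top_dir() {
  unzip -Z1 "$1" | awk -F/ '
    !seen[$1]++ { tops++; name = $1 }
    NF > 1      { nested = 1 }
    END         { if (tops == 1 && nested) print name }'
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
  mkdir -p "$tmp/src/UnknownName/images" "$tmp/myrandomname"
  echo '<img src="images/image1.jpg">' > "$tmp/src/UnknownName/index.html"
  echo jpg > "$tmp/src/UnknownName/images/image1.jpg"
  (cd "$tmp/src" && zip -qr "$tmp/TEMPLATE1.ZIP" UnknownName)

  top=$(top_dir "$tmp/TEMPLATE1.ZIP")
  if [ -n "$top" ]; then
    unzip -q "$tmp/TEMPLATE1.ZIP" -d "$tmp/unpacked"
    mv "$tmp/unpacked/$top"/* "$tmp/myrandomname"/
  else
    echo "rejected: not exactly one top-level folder" >&2
  fi
else
  echo "zip/unzip not installed; skipping the demo" >&2
fi
```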
Of course, you could just simply allow the unzip to unzip the files to whatever directory it wants, then use mv to rename the directory to what you want it as.
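tempDir=temp.$$
mkdir "$tempDir"
mv "$zipFile" "$tempDir"/
cd "$tempDir" || exit 1
unzip "$(basename "$zipFile")"
unknownDir=(*/)    # should be the only directory here
mv "${unknownDir[0]}" "$whereItShouldBe"    # give an absolute path: we are inside $tempDir
cd ..
rm -rf "$tempDir"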
It's always a good idea to create a temporary directory for these types of operations in case you end up running two instances of this command.
