Bash script to separate files into directories, reverse sort and print in an HTML file works on some files but not others - bash

Goal
Separate files into directories according to their filenames, then run a Bash script that reverse-sorts the files and assembles their content into one file. (I know steps to achieve this are already documented on Stack Overflow, but please keep reading...)
Problem
The scripts work on all files except two
State
Root directory
dos-18-1-18165-03-for-sql-server-2012---15-june-2018.html
dos-18-1-18165-03-for-sql-server-2016---15-june-2018.html
dos-18-1-18176-03-for-sql-server-2012---10-july-2018.html
dos-18-1-18197-01-for-sql-server-2012---23-july-2018.html
dos-18-1-18197-01-for-sql-server-2016---23-july-2018.html
dos-18-1-18232-01-for-sql-server-2012---21-august-2018.html
dos-18-1-18232-01-for-sql-server-2016---21-august-2018.html
dos-18-1-18240-01-for-sql-server-2012---5-september-2018.html
dos-18-1-18240-01-for-sql-server-2016---5-september-2018.html
dos-18-2-release-notes.html
dos-18-2-known-issues.html
Separate the files into directories according to their SQL Server version or name
ls | grep "^dos-18-1.*2012.*" | xargs -i cp {} dos181-2012
ls | grep "^dos-18-1.*2016.*" | xargs -i cp {} dos181-2016
ls | grep ".*notes.*" | xargs -i cp {} dos-18-2-release-notes
ls | grep ".*known.*" | xargs -i cp {} dos-18-2-known-issues
Result (success)
/dos181-2012:
dos-18-1-18165-03-for-sql-server-2012---15-june-2018.html
dos-18-1-18176-03-for-sql-server-2012---10-july-2018.html
dos-18-1-18197-01-for-sql-server-2012---23-july-2018.html
dos-18-1-18232-01-for-sql-server-2012---21-august-2018.html
dos-18-1-18240-01-for-sql-server-2012---5-september-2018.html
/dos181-2016:
dos-18-1-18165-03-for-sql-server-2016---15-june-2018.html
dos-18-1-18197-01-for-sql-server-2016---23-july-2018.html
dos-18-1-18232-01-for-sql-server-2016---21-august-2018.html
dos-18-1-18240-01-for-sql-server-2016---5-september-2018.html
/dos-18-2-known-issues
dos-18-2-known-issues.html
/dos-18-2-release-notes
dos-18-2-release-notes.html
Variables (all follow this pattern)
dos181-2012.sh
file="dos181-2012"
export
dos-18-2-known-issues
file="dos-18-2-known-issues"
export
Reverse sort and assemble (assumes /$file exists; after testing all lines of code I believe this is where the problem lies):
cat $( ls "$file"/* | sort -r ) > "$file"/"$file".html
Result (success and failure)
dos181-2012.html has the correct content in the correct order.
dos-18-2-known-issues.html is empty.
What I have tried
I tried to ignore the two files in the command:
cat $( ls "$file"/* -i (grep ".*known.*" ) | sort -r ) > "$file"/"$file".html
Result: The opposite occurs
dos181-2012.html is empty
dos-18-2-known-issues.html is not empty
Thank you
I am completely baffled. Why do these scripts work on some files but not others? (I can share more information about the file contents if that will help, but the file contents are nearly identical.) Thank you for any insights.

First off, your question is quite incomplete. You start off well, showing the input files and directories, but then you talk about variables and $file without showing the code they originate from. So I based my answer on the explanation in the first paragraph and on what I deduced from the rest of the question.
I did this:
#!/bin/bash
cp /etc/hosts dos-18-1-18165-03-for-sql-server-2012---15-june-2018.html
cp /etc/hosts dos-18-1-18165-03-for-sql-server-2016---15-june-2018.html
cp /etc/hosts dos-18-1-18176-03-for-sql-server-2012---10-july-2018.html
cp /etc/hosts dos-18-1-18197-01-for-sql-server-2012---23-july-2018.html
cp /etc/hosts dos-18-1-18197-01-for-sql-server-2016---23-july-2018.html
cp /etc/hosts dos-18-1-18232-01-for-sql-server-2012---21-august-2018.html
cp /etc/hosts dos-18-1-18232-01-for-sql-server-2016---21-august-2018.html
cp /etc/hosts dos-18-1-18240-01-for-sql-server-2012---5-september-2018.html
cp /etc/hosts dos-18-1-18240-01-for-sql-server-2016---5-september-2018.html
cp /etc/hosts dos-18-2-release-notes.html
cp /etc/hosts dos-18-2-known-issues.html
DIRS='dos181-2012 dos181-2016 dos-18-2-release-notes dos-18-2-known-issues'
for DIR in $DIRS
do
    if [ ! -d "$DIR" ]
    then
        mkdir "$DIR"
    fi
done
cp dos-18-1*2012* dos181-2012
cp dos-18-1*2016* dos181-2016
cp *notes* dos-18-2-release-notes
cp *known* dos-18-2-known-issues
for DIR in $DIRS
do
    /bin/ls -1r "$DIR" > "$DIR.html"
done
The cp commands are just to create the files with something in them.
You did not specify how the directory names were produced, so I went with the easy option and listed them in a variable ($DIRS). These could be built based on the filenames, but you did not mention that.
Then the directories are created (the first for loop).
Then come four cp commands. Your code is very complicated for something so basic: the shell expands wildcards for commands like cp, rm, mv and ls, so there is no need for complex grep and xargs pipelines to copy files around.
Finally, the last for loop lists each directory's files (ls) in one column, with the sort order reversed. The output of that ls is redirected to a ".html" file with the same name as the directory.
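If the end goal is the one from the question (concatenating the files' contents rather than listing their names), the last loop can be adapted. A minimal sketch, assuming filenames contain no spaces or newlines; the demo directory and its contents are invented here:

```shell
#!/bin/sh
# Sketch: concatenate each directory's files in reverse name order
# into one HTML file per directory ('demo' is a made-up path)
mkdir -p demo/dos181-2012
printf 'june\n'      > demo/dos181-2012/a-june.html
printf 'september\n' > demo/dos181-2012/b-september.html
for DIR in demo/*/; do
    DIR=${DIR%/}                     # drop the trailing slash
    # unquoted substitution is safe only for simple names
    cat $(ls -1r "$DIR"/*.html) > "$DIR.html"
done
cat demo/dos181-2012.html            # b-september's line first, then a-june's
```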

Related

How to show subdirectories using SFTP ls command

I'm trying to get subdirectories list in a folder
echo "ls -1 /path/to/folder/*/" | sftp -i /path/to/key user@host | grep -v 'sftp>'
If there is more than one subdirectory I get list of subdirectories:
/path/to/folder/subdirectory1/
/path/to/folder/subdirectory2/
If there is only one subdirectory I get nothing.
Thank you for your suggestions.
Note: using SSH is not allowed
If there is only one subdirectory I get nothing.
You should only get nothing if the only subdirectory is empty, because ls, when given a single directory argument, lists its contents. With the normal ls we could solve this simply by means of the -d option, but unfortunately sftp's ls doesn't have that option. The only way that comes to mind is to filter the desired directories out of a long listing:
echo "ls -l /path/to/folder" | sftp -i /path/to/key user@host | awk '/^d/{print "/path/to/folder/"$NF}'
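To illustrate what that awk filter does, here is the same program run over a canned long listing (the listing lines below are made-up stand-ins for the sftp session's output):

```shell
#!/bin/sh
# Keep only lines starting with 'd' (directories) and print the last
# field (the entry name) with the folder path prepended
printf '%s\n' \
    'drwxr-xr-x  2 user group 4096 Jan  1 12:00 subdirectory1' \
    '-rw-r--r--  1 user group   10 Jan  1 12:00 file.txt' \
    'drwxr-xr-x  2 user group 4096 Jan  1 12:00 subdirectory2' |
    awk '/^d/{print "/path/to/folder/"$NF}'
```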

Rename files in bash based on content inside

I have a directory which has 70000 xml files in it. Each file has a tag which looks something like this, for the sake of simplicity:
<ns2:apple>, <ns2:orange>, <ns2:grapes>, <ns2:melon>. Each file has only one fruit tag, i.e. there cannot be both apple and orange in the same file.
I would like to rename every file (add "1_" to the beginning of its filename) which has one of <ns2:apple>, <ns2:orange>, <ns2:melon> inside it.
I can find such files with egrep:
egrep -r '<ns2:apple>|<ns2:orange>|<ns2:melon>'
So how would this look as a bash script, which I can then use as a cron job?
P.S. Sorry I don't have any bash script draft, I have very little experience with it and the time is of the essence right now.
This may be done with this script:
#!/bin/sh
find /path/to/directory/with/xml -type f | while IFS= read -r f; do
    grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>' "$f" &&
        mv "$f" "$(dirname "$f")/1_$(basename "$f")"
done
But it will rescan the directory each time it runs and prefix 1_ to each file containing one of your tags. This means a lot of excess I/O, and files with the tags will gain another 1_ prefix on every run, resulting in names like 1_1_1_1_file.xml.
Probably you should think more on design, e.g. move processed files to two directories based on whether file has certain tags or not:
#!/bin/sh
# create output dirs
mkdir -p /path/to/directory/with/xml/with_tags/ /path/to/directory/with/xml/without_tags/
find /path/to/directory/with/xml -maxdepth 1 -mindepth 1 -type f | while IFS= read -r f; do
    if grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>' "$f"; then
        mv "$f" /path/to/directory/with/xml/with_tags/
    else
        mv "$f" /path/to/directory/with/xml/without_tags/
    fi
done
Run this command as a dry run first, then remove --dry-run to actually rename the files:
grep -Pl '(<ns2:apple>|<ns2:orange>|<ns2:melon>)' *.xml | xargs rename --dry-run 's/^/1_/'
The command-line utility rename comes in many flavors, and most of them should work for this task. I used rename version 1.601 by Aristotle Pagaltzis. To install rename, simply download its Perl script and place it in your $PATH, or install it with conda, like so:
conda install rename
Here, grep uses the following options:
-P : Use Perl regexes.
-l : Suppress normal output; instead print the name of each input file from which output would normally have been printed.
SEE ALSO:
grep manual
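As a minimal check of the grep half on invented demo files (plain -E is used here, since the alternation needs no Perl-specific regex features):

```shell
#!/bin/sh
# grep -l prints only the names of files that match, one per line
# ('demo_xml' and the tag contents are made up for this demo)
mkdir -p demo_xml
printf '<ns2:apple>\n'  > demo_xml/a.xml
printf '<ns2:grapes>\n' > demo_xml/b.xml
grep -lE '<ns2:apple>|<ns2:orange>|<ns2:melon>' demo_xml/*.xml
```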

Shell Script: How to copy files with specific string from big corpus

I have a small bug and don't know how to solve it. I want to copy files that contain a specific string from a big folder with many files. For this I use grep, ack or (in this example) ag. When I'm inside the folder it matches without a problem, but when I loop over the files in the following script it doesn't loop over the matches. Here is my script:
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while read -d $'\0' file; do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done
SEARCH_QUERY holds the string I want to find inside the files, INPUT_DIR is the folder where the files are located, and OUTPUT_DIR is the folder the found files should be copied to. Is there something wrong with the while loop?
EDIT:
Thanks for the suggestions! I took this one now, because it also looks for files in subfolders and saves a list with all the files.
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" > "output_list.txt"
while IFS= read -r file
do
    echo "${file##*/}"
    cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < "output_list.txt"
Better to implement it like below with a find command:
find "${INPUT_DIR}" -name "*.*" | xargs grep -l "${SEARCH_QUERY}" > /tmp/file_list.txt
while IFS= read -r file
do
    echo "$file"
    cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
or another option (note the glob must stay outside the quotes so the shell can expand it):
grep -l "${SEARCH_QUERY}" "${INPUT_DIR}"/*.* > /tmp/file_list.txt
while IFS= read -r file
do
    echo "$file"
    cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
if you do not mind doing it in just one line, then
grep -lr 'ONE\|TWO\|THREE' | xargs -I xxx -P 0 cp xxx dist/
guide:
-l : print only the names of matching files, nothing else
-r : search recursively from the CWD into all sub-directories
'ONE\|TWO\|THREE' : match any one of the words 'ONE', 'TWO' or 'THREE'
| : pipe the output of grep to xargs
-I xxx : substitute each incoming file name for the placeholder xxx
-P 0 : run the cp commands in parallel, as many at once as possible
cp : copy each file xxx to the dist directory
If I understand the behavior of ag correctly, then to solve the problem in your loop you have to either
adjust the read delimiter to '\n', or
use ag -0 -l to force delimiting by '\0'.
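The delimiter mismatch can be reproduced and fixed with plain GNU grep, which is more widely installed than ag; its -Z option NUL-terminates filenames, matching a NUL read delimiter (bash is required for read -d ''; 'corpus' and 'out' are throwaway demo directories):

```shell
#!/bin/bash
# Demo: NUL-delimited filenames paired with a NUL read delimiter
mkdir -p corpus out
printf 'needle here\n' > corpus/hit.txt
printf 'nothing\n'     > corpus/miss.txt
# -r: recursive, -l: names only, -Z: NUL-terminate each name
grep -rlZ 'needle' corpus | while IFS= read -r -d '' f; do
    cp "$f" "out/${f##*/}"
done
ls out
```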
Alternatively, you can use the following script, that is based on find instead of ag.
while read file; do
echo "$file"
cp "$file" "$OUTPUT_DIR/$file"
done < <(find "$INPUT_DIR" -name "*$SEARCH_QUERY*" -print)

Shell: Copy list of files with full folder structure stripping N leading components from file names

Consider a list of files (e.g. files.txt) similar (but not limited) to
/root/
/root/lib/
/root/lib/dir1/
/root/lib/dir1/file1
/root/lib/dir1/file2
/root/lib/dir2/
...
How can I copy the specified files (not any other content from the folders which are also specified) to a location of my choice (e.g. ~/destination) with a) intact folder structure but b) N folder components (in the example just /root/) stripped from the path?
I already managed to use
cp --parents `cat files.txt` ~/destination
to copy the files with an intact folder structure, however this results in all files ending up in ~/destination/root/... when I'd like to have them in ~/destination/...
I think I found a really nice and concise solution using GNU tar:
tar cf - -T files.txt | tar xf - -C ~/destination --strip-components=1
Note the --strip-components option that allows to remove an arbitrary number of path components from the beginning of the file name.
One minor problem though: tar always seems to archive the whole content of any folder mentioned in files.txt (at least I couldn't find an option to ignore folders), but that is most easily solved using grep:
cat files.txt | grep -v '/$' > files2.txt
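Put end to end, with the directory entries filtered out first, the approach can be sketched like this (GNU tar assumed; the tree and file list below are demo stand-ins using relative paths):

```shell
#!/bin/sh
# Demo: copy only the listed files, stripping the leading 'root/' component
mkdir -p root/lib/dir1 destination
printf 'content\n' > root/lib/dir1/file1
printf '%s\n' 'root/' 'root/lib/dir1/file1' > files.txt
grep -v '/$' files.txt > files2.txt        # drop directory entries
tar cf - -T files2.txt | tar xf - -C destination --strip-components=1
find destination -type f                   # file lands at destination/lib/dir1/file1
```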
This might not be the most graceful solution - but it works:
while IFS= read -r file; do
    echo "checking for $file"
    if [[ -f "$file" ]]; then
        file_folder=$(dirname "$file")
        destination_folder=/destination/${file_folder#/root/}
        echo "copying file $file to $destination_folder"
        mkdir -p "$destination_folder"
        cp "$file" "$destination_folder"
    fi
done < files.txt
I had a look at cp and rsync, but it looks like they would work better if you first cd into /root.
However, if you do cd into the correct directory beforehand, you can always run the command in a subshell, so that you are returned to your original location once the subshell finishes.
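The subshell idea can be sketched like this (GNU cp's --parents option assumed; 'src' and 'dest' are demo directories):

```shell
#!/bin/sh
# cd inside ( ... ) so the caller's working directory is untouched
mkdir -p src/lib/dir1 dest
printf 'x\n' > src/lib/dir1/file1
( cd src && cp --parents lib/dir1/file1 ../dest )
find dest -type f    # the copy landed under dest/ without the src/ prefix
```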

Do actions in each folder from current directory via terminal

I'm trying to run a series of commands on a list of files in multiple directories located directly under the current branch.
An example hierarchy is as follows:
/tmp
|-1
| |-a.txt
| |-b.txt
| |-c.txt
|-2
| |-a.txt
| |-b.txt
| |-c.txt
From my prompt in the /tmp directory, I'm trying to run a command against each a.txt file, renaming it to d.txt.
How do I get it to go into each directory and rename the file? I've tried the following and it won't work:
for i in ./*; do
mv "$i" $"(echo $i | sed -e 's/a.txt/d.txt/')"
done
It just doesn't jump into each directory. I've also tried to get it to create files for me, or folders under each hierarchy from the current directory just 1 folder deep, but it won't work using this:
for x in ./; do
mkdir -p cats
done
OR
for x in ./; do
touch $x/cats.txt
done
Any ideas ?
Place the below script in your base directory
#!/bin/bash
# Move 'a.txt's to 'd.txt's recursively
mover()
{
    CUR_DIR=$(dirname "$1")
    mv "$1" "$CUR_DIR/d.txt"
}
export -f mover
find . -type f -name "a.txt" -exec bash -c 'mover "$0"' {} \;
and execute it.
Note:
If you wish to be a bit more innovative and generalize the script, you could accept the directory to search as a parameter to the script and pass it on to find.
> for i in ./*; do
As per your own description, this will assign ./1 and then ./2 to i. Neither of those matches any of the actual files. You want
for i in ./*/*; do
As a further aside, the shell is perfectly capable of replacing simple strings using glob patterns. This also coincidentally fixes the problem with not quoting $i when you echo it.
mv "$i" "${i%/a.txt}/d.txt"
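Putting both fixes together on a throwaway tree mirroring the example hierarchy:

```shell
#!/bin/sh
# Rename each */a.txt to */d.txt using suffix-stripping expansion:
# ${i%/a.txt} removes the trailing '/a.txt', leaving the directory part
mkdir -p 1 2
touch 1/a.txt 2/a.txt
for i in ./*/a.txt; do
    mv "$i" "${i%/a.txt}/d.txt"
done
ls 1 2
```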
