How to automate dos2unix using shell script? - bash

I have a bunch of xml files in a directory that need to have the dos2unix command performed on them, and new files will be added every so often. Instead of manually performing the dos2unix command on each file every time, I would like to automate it all with a script. I have never even looked at a shell script in my life, but so far I have this from what I have read in a few tutorials:
FILES=/tmp/testFiles/*
for f in $FILES
do
    fname=`basename $f`
    dos2unix *.xml $f $fname
done
However, I keep getting the 'usage' output. I think the problem is that I am not assigning the name of the new file (fname) correctly.

The reason you're getting a usage message is that dos2unix doesn't take the extra arguments you're supplying. It will, however, accept multiple filenames (also via globs). You don't need a loop unless you're processing more files than can be accepted on the command line.
dos2unix /tmp/testFiles/*.xml
Should be all you need, unless you need recursion:
find /tmp/testFiles -name '*.xml' -exec dos2unix {} +
(for GNU find)
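If your find does not support the + terminator to -exec, a more portable (though slower) variant runs one dos2unix process per file:
find /tmp/testFiles -name '*.xml' -exec dos2unix {} \;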

If all files are in one directory (no recursion needed) then you're almost there.
for file in /tmp/testFiles/*.xml ; do
    dos2unix "$file"
done
By default dos2unix should convert in place and overwrite the original.
If recursion is needed you'll have to use find as well:
find /tmp/testFiles -name '*.xml' -print0 | while IFS= read -r -d '' file ; do
    dos2unix "$file"
done
This will work on all files ending in .xml in /tmp/testFiles/ and all of its sub-directories.
If no other steps are required you can skip the shell loop entirely:
Non-recursive:
find /tmp/testFiles -maxdepth 1 -name '*.xml' -exec dos2unix {} +
And for recursive:
find /tmp/testFiles -name '*.xml' -exec dos2unix {} +
In your original command I see you taking the base name of each file and trying to pass that to dos2unix, but your intent is not clear. Later, in a comment, you say you just want to overwrite the files. My solution performs the conversion in place: it creates no backups and overwrites the original with the converted version. I hope this was your intent.
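If you ever do want to keep the originals instead, the common (Waterlander) dos2unix has a new-file mode, -n, which writes the converted output to a separate file. A minimal sketch, with an illustrative .unix.xml naming scheme:
for f in /tmp/testFiles/*.xml; do
    dos2unix -n "$f" "${f%.xml}.unix.xml"   # "$f" is left untouched
done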

mkdir /tmp/testFiles/converted/
for f in /tmp/testFiles/*.xml
do
    # assumes a dos2unix that accepts an "input output" pair of file names
    dos2unix "$f" "${f/testFiles\//testFiles\/converted\/}"
    # or for pure sh:
    # dos2unix "$f" "$(echo "$f" | sed s#testFiles/#testFiles/converted/#)"
done
The result will be saved in the converted/ subdirectory.
The construction ${f/testFiles\//testFiles\/converted\/} (thanks to Rush) or the sed command is used here to insert converted/ before the name of the file:
$ echo /tmp/testFiles/1.xml | sed s#testFiles/#testFiles/converted/#
/tmp/testFiles/converted/1.xml

It is not clear which implementation of dos2unix you are using; there are many different implementations around, and they require different arguments.
On RedHat/Fedora/Suse Linux you could just type
dos2unix /tmp/testFiles/*.xml
On SunOS you are required to give an input and output file name, and the above command would destroy several of your files.
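For such an implementation, one hedged workaround is to convert through a temporary file and move it back over the original, assuming your dos2unix accepts an input/output name pair:
for f in /tmp/testFiles/*.xml; do
    dos2unix "$f" "$f.tmp" && mv "$f.tmp" "$f"
done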

Related

script read file contents and copy files

I wrote a script in bash that should read the contents of a text file, look for the corresponding file for each line, and copy it to another folder. It's not copying all the files, only two of them: the third and the last.
#!/bin/bash
filelist=~/Desktop/file.txt
sourcedir=~/ownCloud2
destdir=~/Desktop/file_out
while read line; do
    find $sourcedir -name $line -exec cp '{}' $destdir/$line \;
    echo find $sourcedir -name $line
    sleep 1
done < "$filelist"
If I run this line directly on the command line, it finds and copies the file.
find ~/ownCloud2 -name 123456AA.pdf -exec cp '{}' ~/Desktop/file_out/123456AA.pdf \;
If I use the script instead, it doesn't work.
I used your exact script and had no problems, with both bash and sh, so maybe you are using another shell in your shebang line.
Use find only when you need to find the file "somewhere" in multiple directories under the search start point.
If you know the exact directory in which the file is located, there is no need to use find. Just use the simple copy command.
Also, if you use "cp -v ..." instead of the "echo", you will see what the command is actually doing, which might help you spot what is wrong.
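For instance, a sketch of the same loop with cp -v substituted for the echo (and with IFS= read -r so that whitespace and backslashes in the list survive intact):
while IFS= read -r line; do
    find "$sourcedir" -name "$line" -exec cp -v '{}' "$destdir/$line" \;
done < "$filelist"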

Rename files in bash based on content inside

I have a directory which has 70000 xml files in it. Each file has a tag which looks something like this, for the sake of simplicity:
<ns2:apple>, <ns2:orange>, <ns2:grapes>, <ns2:melon>. Each file has only one fruit tag, i.e. there cannot be both apple and orange in the same file.
I would like to rename every file (add "1_" to the beginning of the filename) which has one of <ns2:apple>, <ns2:orange>, <ns2:melon> inside of it.
I can find such files with egrep:
egrep -r '<ns2:apple>|<ns2:orange>|<ns2:melon>'
So how would it look as a bash script, which I can then use as a cron job?
P.S. Sorry I don't have any bash script draft; I have very little experience with shell scripting and time is of the essence right now.
This may be done with this script:
#!/bin/sh
find /path/to/directory/with/xml -type f | while IFS= read -r f; do
    grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>' "$f" && mv "$f" "$(dirname "$f")/1_$(basename "$f")"
done
But it will rescan the directory each time it runs and prepend 1_ to each file containing one of your tags. This means a lot of excess IO, and files with certain tags would get another 1_ prefix on every run, resulting in names like 1_1_1_1_file.xml.
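A minimal guard against that re-prefixing, assuming the 1_ naming from above, is to skip files that are already renamed:
find /path/to/directory/with/xml -type f | while IFS= read -r f; do
    case "$(basename "$f")" in 1_*) continue ;; esac   # already processed
    grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>' "$f" && mv "$f" "$(dirname "$f")/1_$(basename "$f")"
done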
Probably you should think more on the design, e.g. move processed files into two directories based on whether the file contains one of the tags or not:
#!/bin/sh
# create output dirs
mkdir -p /path/to/directory/with/xml/with_tags/ /path/to/directory/with/xml/without_tags/
find /path/to/directory/with/xml -maxdepth 1 -mindepth 1 -type f | while IFS= read -r f; do
    if grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>' "$f"; then
        mv "$f" /path/to/directory/with/xml/with_tags/
    else
        mv "$f" /path/to/directory/with/xml/without_tags/
    fi
done
Run this command as a dry run first, then remove --dry-run to actually rename the files:
grep -Pl '(<ns2:apple>|<ns2:orange>|<ns2:melon>)' *.xml | xargs rename --dry-run 's/^/1_/'
The command-line utility rename comes in many flavors; most of them should work for this task. I used rename version 1.601 by Aristotle Pagaltzis. To install rename, simply download its Perl script and place it in your $PATH. Or install rename using conda, like so:
conda install rename
Here, grep uses the following options:
-P : Use Perl regexes.
-l : Suppress normal output; instead print the name of each input file from which output would normally have been printed.
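If your grep lacks -P (some BSD greps do), note that this particular pattern uses no Perl-only features, so the same pipeline should work with plain extended regexes:
grep -El '(<ns2:apple>|<ns2:orange>|<ns2:melon>)' *.xml | xargs rename --dry-run 's/^/1_/'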
SEE ALSO:
grep manual

execute command on files returned by grep

Say I want to edit every .html file in a directory one after the other using vim, I can do this with:
find . -name "*.html" -exec vim {} \;
But what if I only want to edit every html file containing a certain string, one after the other? I can use grep to find the files containing the string, but how can I pipe each one to vim, similar to the find command above? Perhaps I should use something other than grep, or somehow pipe the find command to grep and then exec vim. Does anyone know how to edit files containing a certain string one after the other, in the same fashion as the find example above?
grep -l 'certain string' *.html | xargs vim
This assumes you don't have eccentric file names with spaces etc. in them. If you have to deal with eccentric file names, check whether your grep has a -Z (--null) option to terminate output file names with null bytes (and whether your xargs has a -0 option to read such input), and if so, then:
grep -Zl 'certain string' *.html | xargs -0 vim
If you need to search subdirectories, maybe your version of Bash has support for ** (enabled with shopt -s globstar):
grep -Zl 'certain string' **/*.html | xargs -0 vim
Note: these commands run vim on batches of files. If you must run it once per file, then you need to add -n 1 to the xargs options before you mention vim. If you have GNU xargs, you can use -r to prevent it from running vim when there are no file names in its input (none of the files scanned by grep contain the 'certain string'). Some xargs implementations (BSD xargs, and newer GNU xargs) also accept -o to reopen vim's standard input on the terminal; without it vim may complain that its input is not from a terminal.
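For instance, combining those options (GNU xargs) gives one vim invocation per matching file and no invocation at all when nothing matches:
grep -Zl 'certain string' *.html | xargs -0 -r -n 1 vim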
The variations can be continued as you invent new ways to confuse things.
With find :
find . -type f -name '*.html' -exec bash -c 'grep -q "yourtext" "${1}" && vim "${1}"' _ {} \;
For each file, this calls a bash command that greps the file for yourtext and opens it with vim if the text matches.
Solution with a for loop:
for i in $(find . -type f -name '*.html'); do vim "$i"; done
This should open each file in a separate vim session, one after another as you close the previous one. (Note that this approach breaks on file names containing whitespace.)

Rename all files in a directory by omitting last 3 characters

I am trying to write a bash command that will rename all the files in the current directory by omitting the last 3 characters. I am not sure if it is possible; that's why I am asking here.
I have lots of files named like this: 720-1458907789605.ts
I need to rename all of them by omitting the last 3 characters before the extension, turning 720-1458907789605.ts into 720-1458907789.ts, for every file in the current directory.
Is it possible using bash commands? I am new to bash scripts.
Thank you!
Native bash solution:
for f in *.ts; do
    [[ -f "$f" ]] || continue   # skip anything that is not a regular file
    mv "$f" "${f:: -6}.ts"      # drop the last 6 chars ("NNN.ts") and re-add ".ts"
done
This solution is slow if you have very many files: the star expansion in for takes up memory and time.
Ref: bash substring extraction.
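To see what that expansion does to one of the sample names:
$ f=720-1458907789605.ts
$ echo "${f:: -6}.ts"
720-1458907789.ts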
For a really large data set, a slightly more complex but faster solution is:
find . -maxdepth 1 -type f -name '*.ts' -print0 | while IFS= read -r -d '' f; do
    mv "$f" "${f%???.ts}.ts"
done
With Larry Wall's rename:
rename -n 's/...\.ts$/.ts/' *.ts
If everything looks okay, remove the dry-run option -n.

Move all files from subdirectory into a new directory without overwriting

I want to consolidate into 1 directory files that are in multiple subdirectories.
The following comes close except that the random string is added after the extension; I want it before the extension:
find . -type f -iname "[a-z,0-9]*" -exec bash -c 'mv -v "$0" "./$( mktemp "$( basename "$0" ).XXX" )"' '{}' \;
I've searched through dozens of other posts but nothing addressed the specifics of my situation:
I'm on OS X (so the tooling is BSD-flavored; for example there's no -t option for mv)
Many of the files have identical names, so I need to rename them during the mv (I can't just use the -n option of mv, because then too many files would not get moved)
The files are not all of the same kind, so I need to use find -type f
I want to exclude .DS_Store files, so it seems like a good option is find -type f -iname "[a-z,0-9]*"
I want the renamed files' names to be of the form oldname-random_string.xyz (but I'm also OK with having the files renamed as a sequential list: 00001.xyz, 00002.xyz, etc.)
The files are buried 4 levels down from my master directory:
Master/Top dir
    Dir 2
        Dir 3
            Dir 4
                Dir 5
                    file
For the sake of simplicity I prefer a bash command to a .sh script (but I'm happy with either)
GNU Solution
This uses basically the same command that you were using but I supply a template to mktemp so that the XXX pattern appears just before the suffix. With GNU sed:
find . -type f -iname "[a-z,0-9]*" -exec bash -c 'mv -v "$1" "./$(mktemp -u "$(basename "$1" | sed -E -e '\''s/\.([^.]+)$/.XXX.\1/'\'' -e '\''/XXX/ !s/$/.XXX/'\'')" )"' _ '{}' \;
The key addition above is the use of sed to insert XXX before the suffix in the file name:
sed -E -e 's/\.([^.]+)$/.XXX.\1/' -e '/XXX/ !s/$/.XXX/'
This has two commands. The first puts .XXX before the extension. The second command runs only if the file name has no extension, in which case it appends .XXX to the end of the file name.
In the first command, the source regex consists of two parts. The first is \. which matches a period. The second is ([^.]+)$ which captures the extension into group 1. The substitution replaces this with .XXX.\1 where \1 is sed notation for group 1 which, in our case, is the file's extension.
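A quick demonstration on two sample names, the second of which has no extension:
$ echo 'photo.jpg' | sed -E -e 's/\.([^.]+)$/.XXX.\1/' -e '/XXX/ !s/$/.XXX/'
photo.XXX.jpg
$ echo 'README' | sed -E -e 's/\.([^.]+)$/.XXX.\1/' -e '/XXX/ !s/$/.XXX/'
README.XXX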
OSX Solution
Under OSX, mktemp is not useful because it only supports templates with the XXX part trailing. As a workaround, we can use a bash script that generates non-overlapping file names:
#!/bin/bash
find . -type f -iname "[a-z,0-9]*" -print0 |
while IFS= read -r -d '' fname
do
    new=$(basename "$fname")
    [ "$fname" = "./$new" ] && continue
    [ "$new" = .DS_Store ] && continue
    name=${new%.*}
    ext=${new#"$name"}
    n=0
    new=$(printf '%s.%03i%s' "$name" "$n" "$ext")
    while [ -f "$new" ]
    do
        n=$(($n + 1))
        new=$(printf '%s.%03i%s' "$name" "$n" "$ext")
    done
    mv -v "$fname" "$new"
done
The above uses the find command to get the file names. The option -print0 is used to assure that it works with difficult file names. The while loop reads these file names one by one into the variable fname; fname includes the full path to the source file. The file name without the path is then stored in new. Then two checks are performed: if the source file is already in the current directory, the script continues on to the next iteration; similarly, if the file name is .DS_Store, it is also skipped. (The find command, as given, already skips these files. This line is there just for future flexibility.) Next, the file name is split into two parts: name, and ext, the extension. ext includes the leading period. Finally, a loop checks for files of the form name.NNN.ext and stops at the first one that doesn't yet exist; the source file is moved to a file of that name.
Related Notes Regarding the GNU Solution and its Compatibility
Quoting in the above GNU command is complex. The argument to bash -c needs to be in single-quotes to prevent the calling bash from performing premature variable substitution. In addition, the sed commands need to be in single-quotes when executed by the bash subshell to prevent history expansion from interfering with the use of negation, !, within the sed command.
The OSX (BSD) sed does not support combining commands together with semicolons. Consequently, each command is supplied to sed via a separate -e option.
The OSX (BSD) sed seems to treat + differently from the GNU sed. This incompatibility seems to go away when using the -E (extended regex) option. (The corresponding GNU option is -r but, as an undocumented compatibility feature, GNU sed supports -E also.)
