About: extracting *.gz files and moving the original file to another folder - shell

I am almost new to shell scripting and don't know some commands.
I am trying to write the shell script below; please give me some direction.
1. Read *.gz files from a specific directory
2. Extract them to another folder
3. Move the original file to another folder.
I can do this as three separate shell scripts, but I want it all in one script. This script will then be a cron job that runs every 5 minutes.
I was trying to start as shown below, but I am a bit confused about how to get the file list. I could do that in a separate script, but I want to include it all in one script.
#!/bin/bash
while IFS= read -r file; do
    gzip -c "$file" > "zipdir/$(basename "$file").gz"
done < filelist
-----------------------------------------
PS: Files are created every 5 minutes.

There are several ways to implement what you're looking for (I would consider inotify). Anyhow... this is a very simple implementation:
$ source=~/tmp/source # directory where .gz files will be created
$ target=~/tmp/target # target directory for uncompressed files
$ archive=~/tmp/archive # archive dir for .gz files
$ shopt -s nullglob # avoid returning unexpanded patterns
$ for gz in "${source}"/*.gz ; do gzip -dc "$gz" > "${target}/$(basename "$gz" .gz)" ; mv "$gz" "${archive}/" ; done
$ shopt -u nullglob # reset nullglob
If you know for sure the "source" directory will always contain .gz files, you can skip the shopt calls.
Another solution (not requiring shopt) is this:
find "${source}" -name '*.gz' -print0 | while read -d '' -r gz; do
    gzip -dc "$gz" > "${target}/$(basename "$gz" .gz)"
    mv "$gz" "${archive}/"
done
The first line looks a little complicated because it handles source file names containing spaces.
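If you want everything in a single script you can schedule from cron, here is a minimal sketch along the lines above; the three directory paths and the script name are assumptions, so adjust them to your setup:
#!/bin/bash
# extract_and_archive.sh - decompress new .gz files and archive the originals
source=~/tmp/source    # directory where .gz files appear
target=~/tmp/target    # target directory for uncompressed files
archive=~/tmp/archive  # archive directory for processed .gz files
shopt -s nullglob                  # if there are no .gz files, the loop body never runs
for gz in "${source}"/*.gz; do
    gzip -dc "$gz" > "${target}/$(basename "$gz" .gz)"   # steps 1+2: read and extract
    mv "$gz" "${archive}/"                               # step 3: move the original away
done
A crontab entry (the path is a placeholder) then runs it every 5 minutes:
*/5 * * * * /path/to/extract_and_archive.sh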

Related

Rename files in bash based on content inside

I have a directory which has 70000 xml files in it. Each file has a tag which looks something like this, for the sake of simplicity:
<ns2:apple>, <ns2:orange>, <ns2:grapes>, <ns2:melon>. Each file has only one fruit tag, i.e. there cannot be both apple and orange in the same file.
I would like to rename every file (add "1_" to the beginning of each filename) which has one of: <ns2:apple>, <ns2:orange>, <ns2:melon> inside of it.
I can find such files with egrep:
egrep -r '<ns2:apple>|<ns2:orange>|<ns2:melon>'
So how would it look as a bash script, which I can then use as a cron job?
P.S. Sorry, I don't have any bash script draft; I have very little experience with it and time is of the essence right now.
This may be done with this script:
#!/bin/sh
find /path/to/directory/with/xml -type f | while read -r f; do
    grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>' "$f" && mv "$f" "$(dirname "$f")/1_$(basename "$f")"
done
But it will rescan the directory each time it runs and prepend 1_ to each file containing one of your tags. This means a lot of excess IO, and files with those tags will get another 1_ prefix on each run, resulting in names like 1_1_1_1_file.xml.
You should probably think more about the design, e.g. move processed files into two directories based on whether the file has one of the tags or not:
#!/bin/sh
# create output dirs
mkdir -p /path/to/directory/with/xml/with_tags/ /path/to/directory/with/xml/without_tags/
find /path/to/directory/with/xml -maxdepth 1 -mindepth 1 -type f | while read -r f; do
    if grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>' "$f"; then
        mv "$f" /path/to/directory/with/xml/with_tags/
    else
        mv "$f" /path/to/directory/with/xml/without_tags/
    fi
done
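Either of these scripts can then be scheduled with cron, as the question asks. The script path and schedule below are placeholders; this example runs it hourly:
0 * * * * /path/to/sort_xml_by_tag.sh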
Run this command as a dry run first, then remove --dry-run to actually rename the files:
grep -Pl '(<ns2:apple>|<ns2:orange>|<ns2:melon>)' *.xml | xargs rename --dry-run 's/^/1_/'
The command-line utility rename comes in many flavors. Most of them should work for this task. I used rename version 1.601 by Aristotle Pagaltzis. To install rename, simply download its Perl script and place it into $PATH. Or install rename using conda, like so:
conda install rename
Here, grep uses the following options:
-P : Use Perl regexes.
-l : Suppress normal output; instead print the name of each input file from which output would normally have been printed.
SEE ALSO:
grep manual

Gzip no such file or directory error, still zips files

I'm just learning shell scripting, specifically in bash. I want to be able to use gzip to take files from a target directory and send them to a different directory. I enter the directories on the command line: ext is for the extensions I want to zip and file will be the new zipped file. My script zips the files correctly, to and from the desired directories, but I get a "no such file or directory" error. How do I avoid this?
Current code
cd $1
for ext in $*; do
    for file in `ls *.$ext`; do
        gzip -c $file > $2/$file.gz
    done
done
and my I/O
blackton#ltsp-amd64-charlie:~/Desktop/60256$ bash myCompress /home/blackton/Desktop/ /home/blackton/ txt
ls: cannot access *./home/blackton/Desktop/: No such file or directory
ls: cannot access *./home/blackton/: No such file or directory
gzip: alg: No such file or directory
gzip: proj.txt: No such file or directory
There are two separate things causing problems here.
In your outer loop
for ext in $*; do
done
you are looping over all the command line parameters, using each as the extension to search for - including the directory names.
Since the extension is the third parameter, you only want to run the inner loop once on $3:
for file in `ls *.$3`; do
    gzip -c $file > $2/$file.gz
done
The next problem is spaces.
You do not want to run ls here - the wildcard expansion will provide the filenames directly, e.g. for file in *.$3, and it will fill $file with a whole filename at a time. The output from ls is split on each space, so you end up with two filenames alg and proj.txt, instead of one alg proj.txt.
That is not enough by itself, though. You also need to quote $file whenever you use it, so the command expands to gzip -c "alg proj.txt" instead of gzip -c alg proj.txt, which tells gzip to compress two files. In general, all variable expansions that you expect to be a filename should be quoted:
cd "$1"
for file in *."$3"; do
gzip -c "$file" > "$2/$file.gz"
done
One further problem is that if there are no files matching the extension, the wildcard will not expand and the command executed will be
gzip -c "*.txt" > "dir/*.txt.gz"
This will create a file that is literally called "*.txt.gz" in the target directory. A simple way to avoid this would be to check that the original file exists first - this will also avoid accidentally trying to gzip an oddly named directory.
cd "$1"
for file in *."$3"; do
if [ -f "$file" ]; then
gzip -c "$file" > "$2/$file.gz"
fi
done
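As an alternative sketch, bash's nullglob option makes an unmatched wildcard expand to nothing, so the loop body simply never runs when there are no matching files (this requires bash, not plain sh):
cd "$1"
shopt -s nullglob                 # unmatched globs expand to an empty list
for file in *."$3"; do
    gzip -c "$file" > "$2/$file.gz"
done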
You can try this:
#!/bin/bash
Src=$1
Des=$2
ext="txt"
for file in "$Src"/*; do
    if [ "${file##*.}" = "$ext" ]; then
        base=$(basename "$file")
        mkdir -p "$Des"   # -p ensures creation if the directory does not exist
        gzip -c "$file" > "$Des/$base.gz"
    fi
done
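Hypothetical usage, assuming the script above is saved as compress_txt.sh (the paths are placeholders):
bash compress_txt.sh /home/blackton/Desktop /home/blackton/zipped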

Linux bash script to copy files by list

I'm new to bash and I need some help, please. I have a file called list.txt containing patterns like
1210
1415
1817
What I want to do is write a bash script which will copy every file in my current directory whose name contains one of those patterns into a new directory called toto.
Example of my files in the current directory:
1210_ammm.txt
1415_xdffmslk.txt
1817_lsmqlkksk.txt
201247_kksjdjdjd.txt
The goal is to copy 1210_ammm.txt, 1415_xdffmslk.txt, 1817_lsmqlkksk.txt to toto.
Transferred from an 'answer'.
My list.txt and the toto directory are in my current directory. This is what I tried:
#!/bin/bash
while read p; do                      # read my list file
    for i in `find -name $p -type f`  # find all files matching the pattern
    do
        cp $i toto                    # copy each file found into toto
    done
done < partB.txt
I don't get an error, but it doesn't do the job.
Here is what you need to implement:
read tokens from an input file
for each token
search the files whose name contain said token
for each file found
copy it to toto
To read tokens from the input file, you can use a read command in a while loop (see the Bash FAQ in general, and Bash FAQ 24 in particular).
To search for files whose names contain a string, you can use a for loop and globbing. For example, for file in ./*test*; do echo "$file"; done will print the names of the files in the current directory whose names contain test.
To copy a file, use cp.
You can check this ideone sample for a working implementation.
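For reference, a minimal sketch that puts those steps together, using list.txt and toto from the question (no error handling, and it assumes toto already exists):
#!/bin/bash
# copy every file whose name contains a token from list.txt into toto/
while IFS= read -r token; do            # read tokens from the input file
    for file in ./*"$token"*; do        # files whose names contain the token
        [ -f "$file" ] && cp -- "$file" toto/
    done
done < list.txt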
Use the script below:
cp "$(ls | grep -f list.txt)" toto
ls | grep -f list.txt will grep for the pattern found in list.txt in the ls output.
cp copies the matched files to toto directory.
NOTE: If list.txt and toto are not in current directory, provide absolute paths in the script.
I needed this too. I tried @Zaziln's answer, but it gave me errors. I just found a better answer, and I think others will be interested too.
mapfile -t files < test1.txt
cp -- "${files[@]}" Folder/
I found it on this post --> https://unix.stackexchange.com/questions/106219/copy-files-from-a-list-to-a-folder#106231

Rename files in shell

I've got a folder and file structure like
Folder/1/fileNameOne.ext
Folder/2/fileNameTwo.ext
Folder/3/fileNameThree.ext
...
How can I rename the files such that the output becomes
Folder/1_fileNameOne.ext
Folder/2_fileNameTwo.ext
Folder/3_fileNameThree.ext
...
How can this be achieved in the Linux shell?
How many different ways do you want to do it?
If the names contain no spaces or newlines or other problematic characters, and the intermediate directories are always single digits, and if you have the list of the files to be renamed in a file file.list with one name per line, then one of many possible ways to do the renaming is:
sed 's%\(.*\)/\([0-9]\)/\(.*\)%mv \1/\2/\3 \1/\2_\3%' file.list | sh -x
You should avoid piping the commands to the shell until you're sure they will do what you want; just look at the generated script until it's right.
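For the sample tree in the question, the generated script would contain lines like:
mv Folder/1/fileNameOne.ext Folder/1_fileNameOne.ext
mv Folder/2/fileNameTwo.ext Folder/2_fileNameTwo.ext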
There is also a command called rename — unfortunately, there are several implementations, not all equally powerful. If you've got the one based on Perl (using a Perl regex to map the old name to the new name) you'd be able to use:
rename 's%/(\d)/%/${1}_%' $(< file.list)
Use a loop as follows:
while IFS= read -d $'\0' -r line
do
    mv "$line" "${line%/*}_${line##*/}"
done < <(find Folder -type f -print0)
This method handles spaces, newlines and other special characters in the file names, and the intermediate directories don't necessarily have to be single digits.
This may work if the file name is always the same, i.e. "file":
for i in {1..3}
do
    mv $i/file ${i}_file
done
If you have more directories in a number range, change {1..3} to {x..y}.
I use ${i}_file instead of $i_file because the latter would be treated as a variable named i_file, while we just want i to be the variable with _file as literal text appended to it.
This solution from AskUbuntu worked for me.
Here is a bash script that does that:
Note: This script does not work if any of the file names contain spaces.
#! /bin/bash
# Only go through the directories in the current directory.
for dir in $(find ./ -type d)
do
    # Remove the first two characters.
    # Initially, $dir = "./directory_name".
    # After this step, $dir = "directory_name".
    dir="${dir:2}"
    # Skip if $dir is empty. Only happens when $dir = "./" initially.
    if [ ! $dir ]
    then
        continue
    fi
    # Go through all the files in the directory.
    for file in $(ls -d $dir/*)
    do
        # Replace / with _
        # For example, if $file = "dir/filename", then $new_file = "dir_filename"
        # where $dir = dir
        new_file="${file/\//_}"
        # Move the file.
        mv $file $new_file
    done
    # Remove the directory.
    rm -rf $dir
done
Copy-paste the script in a file.
Make it executable using
chmod +x file_name
Move the script to the destination directory. In your case this should be inside Folder/.
Run the script using ./file_name.

Backup script: How to keep the last N entries?

For a backup script, I need to clean up old backups. How can I keep the last N backups and delete the rest?
A backup is either a single folder or a single file and the script will either keep all backups in folders or files (no mixing).
If possible, I'd like to avoid parsing the output of ls. Even though all the entries in the backup folder should have been created by the backup script, and there should be no funny characters in the entry names, a hacker might be able to create new entries in there.
This should do it (untested!):
#!/usr/bin/env bash
set -o errexit -o noclobber -o nounset -o pipefail
i=0
max=7 # Could be anything you want
while IFS= read -r -d '' -u 9
do
    let ++i
    if [ "$i" -gt "$max" ]
    then
        rm -- "$REPLY"
    fi
done 9< <(find /var/backup -maxdepth 1 -type f -regex '.*/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\.tar\.gz' -print0 | sort -rz)
Explained from the outside in:
Ensure that the script stops at any common errors.
Find all files in /var/backup (and not subdirectories) matching a YYYY-MM-DD.tar.gz format.
Reverse sort these, so the latest are listed first.
Send these to file descriptor 9. This avoids any problems with cat, ssh or other programs which read standard input by default.
Read files one by one from FD 9, separated by NUL.
Count files until you get past your given max.
Nuke the rest from orbit.
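The question says a backup may also be a single folder. A minimal variant for that case (untested, and assuming the backup directories follow the same YYYY-MM-DD naming) is to match directories instead of files, i.e. change the find line to
done 9< <(find /var/backup -mindepth 1 -maxdepth 1 -type d -regex '.*/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' -print0 | sort -rz)
and use rm -r -- "$REPLY" in place of rm -- "$REPLY" so that whole directories are removed.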
