bash script to tar specific files in a directory

I need a bash script to tar half of the files in a directory. The files are .gz files with the naming convention x.gz, where x is a number starting from 1 and ending at 100. I need to tar the first half of the files. How do I do this?

Your question is a little unclear. I assume you have files named x.gz and you want to add 1.gz through 50.gz to a tar file. If that is the case:
tar cjf MyArchive.tar.bz2 {1..50}.gz
The above command will put the first 50 .gz files into a bzip2-compressed archive named MyArchive.tar.bz2 (the j flag selects bzip2).
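If you want to confirm what the brace expansion produces before creating the archive, echo expands it without touching any files. Note that brace expansion is purely textual: the shell generates the names 1.gz through 50.gz whether or not those files exist, and tar will complain about any that are missing.
# preview the argument list that tar will receive
echo {1..50}.gz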

I understand that you have an arbitrary number of files named x.gz in the current directory and you want to tar half of them. But as you can see from the answers, your description is not detailed enough. I tried to provide the most flexible solution.
# list plain files named <number>.gz, sorted numerically (-printf is GNU find)
files=$(find . -maxdepth 1 -mindepth 1 -type f -printf '%f\n' | grep -P '^\d+\.gz$' | sort -n)
n=$(echo "$files" | wc -l)             # total number of matching files
half=$(( n / 2 ))
d=$(echo "$files" | head -n "$half")   # first half of the sorted list
tar czf archive.tar.gz $d              # $d is deliberately unquoted so the names split into arguments
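A minimal array-based variant of the same idea, assuming bash 4+ for mapfile (and GNU find for -printf, as above); it avoids the intermediate variables and word splitting entirely:
# read the sorted names into an array, one element per file
mapfile -t files < <(find . -maxdepth 1 -mindepth 1 -type f -printf '%f\n' | grep -P '^\d+\.gz$' | sort -n)
half=$(( ${#files[@]} / 2 ))
tar czf archive.tar.gz "${files[@]:0:half}"   # first half of the array, names safely quoted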

Here's one way to do this (using ksh, which should be available anywhere bash is).
Save the script below as x.sh, run chmod +x x.sh, and then run it.
#!/bin/ksh
#
#
## first create 100 dummy GZ files
##
x=0
while :
do
    x=$((x+1))
    if [ "$x" -gt 100 ]; then
        break
    fi
    touch "${x}.gz"
done
## next parse the list, sort it and loop tar it stopping at 50.gz
##
for x in `ls *.gz | sed 's/\.gz//g' | sort -n`
do
    if [ "$x" -gt 50 ]; then
        exit 0
    fi
    tar -rvf all-50.tar "${x}.gz"
done
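To verify the result, tar can list the archive contents; a quick sanity check is to count the entries, which should come back as 50:
tar -tf all-50.tar | wc -l   # expect 50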

Related

bash move 500 directories at a time to subdirectory from a total of 160,000 directories

I needed to move a large S3 bucket to a local file store for a variety of reasons; the files were stored as 160,000 directories with subdirectories.
As this is far too many folders to look at with something like a GUI FTP interface, I'd like to move the 160,000 root directories into, say, 320 directories, with 500 directories in each.
I'm a newbie at bash scripting, and I just wrote this up, but I'm scared I'm going to mangle the whole thing and have to redo the transfer. I tested with [[ "$i" -ge 3 ]] and some directories with subdirectories and it looked like it worked okay, but I'm quite nervous. I do not want to retransfer all this data.
i=0
j=0
for file in *; do
    if [[ -d "$file" && ! -L "$file" ]]; then
        ((i++))
        echo "directory $file is being written to assets_$j"
        mv "$file" "./assets_$j/"
        if [[ "$i" -ge 499 ]]; then
            ((j++))
            ((i=0))
        fi
    fi
done
Thanks for the help!
Find all the directories in the current folder, read them in batches of 500, and exec mkdir and mv for each chunk:
j=0
find . -mindepth 1 -maxdepth 1 -type d |
while readarray -n 500 -t files && (( ${#files[@]} )); do
    dest="./assets_$((j++))/"
    echo mkdir -v -p "$dest"
    echo mv -v "${files[@]}" "$dest"
done
The mkdir and mv are prefixed with echo as a dry run; remove the echos once the preview looks right.
On the condition that assets_1, assets_2, etc. do not exist in the working directory yet:
dirs=(./*/)
for (( i=0, j=1; i < ${#dirs[@]}; i+=500, j++ )); do
    echo mkdir ./assets_$j/
    echo mv "${dirs[@]:i:500}" ./assets_$j/
done
If you're happy with the output, remove the echos.
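For reference, ${dirs[@]:i:500} is bash array slicing: it expands to up to 500 elements starting at offset i. A tiny demo with a slice length of 2:
dirs=(a b c d e)
echo "${dirs[@]:0:2}"   # a b
echo "${dirs[@]:2:2}"   # c d
echo "${dirs[@]:4:2}"   # e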
A possible way, but without control over the counter, is:
find . -type d -mindepth 1 -maxdepth 1 -print0 \
    | xargs -0 -n 500 sh -c 'echo mkdir -v ./assets_$$ && echo mv -v "$@" ./assets_$$' _
This derives the assets counter from the PID, which only repeats once the wrap-around is reached (Linux PID recycling).
The order in which find returns entries is slightly different from the glob * (Find command default sorting order).
If you want the sort order to be alphabetical, you can add a simple sort:
find . -type d -mindepth 1 -maxdepth 1 -print0 | sort -z \
    | xargs -0 -n 500 sh -c 'echo mkdir -v ./assets_$$ && echo mv -v "$@" ./assets_$$' _
Note: remove the echos if you are pleased with the output.
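As a quick illustration of how xargs batches the arguments and how each batch runs in its own sh with its own PID (the names here are made up):
printf '%s\n' a b c d e | xargs -n 2 sh -c 'echo "batch in PID $$: $@"' _
# prints something like (PIDs will vary):
#   batch in PID 41231: a b
#   batch in PID 41232: c d
#   batch in PID 41233: e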

Check if files exist in 3 different directories and move them from one to another

I'm quite new to creating shell scripts.
I'm developing a shell script that will back up my files once a day only.
I need to check which *.war files are in three different folders (input folder, production folder, backup folder).
If the same files exist in all three directories, don't perform the backup.
If they don't, move the files in folder 2 to folder 3.
This is what I've done so far.
===============================
TODAY=$(date +%d-%m-%Y)
INPUT=/home/bruno.ogasawara/entrada/
BACKUP=/home/bruno.ogasawara/backup/
PROD=/home/bruno.ogasawara/producao/
DIR1=$(ls $INPUT)
DIR2=$(ls $PROD)
DIR3=$(ls $BACKUP$TODAY)
for i in $DIR1; do
    for j in $DIR2; do
        for k in $DIR3; do
            if [ $i == $j ] && [ $j == $k ]; then
                exit 1
            else
                mv -f $PROD$j $BACKUP$TODAY
            fi
        done
    done
done
mv -f $INPUT*.war $PROD
===============================
The verification is not working. The only thing working is the mv -f $INPUT*.war $PROD at the end.
Where am I missing something or doing something wrong?
Thanks in advance, people.
What I understand is that you want to sync those three folders.
In that case you should not modify the file names, since the file names are what we compare; otherwise you would need md5 or sha checksums. The Linux filesystem already has timestamps, so you don't have to attach the date to the filename.
In your code you used ls to list the files, but parsing ls output is fragile and does not combine well with a for loop in bash (it breaks on names containing whitespace).
A more robust command to get the bare filenames is:
find $DIR -maxdepth 1 -type f -exec basename {} \;
If you want to sync the *.war files across all folders, then you can simply use this:
#!/bin/bash
DIR1=/home/bruno.ogasawara/entrada/
DIR2=/home/bruno.ogasawara/backup/
DIR3=/home/bruno.ogasawara/producao/
cp -n $DIR1/*.war $DIR2
cp -n $DIR1/*.war $DIR3
cp -n $DIR2/*.war $DIR1
cp -n $DIR2/*.war $DIR3
cp -n $DIR3/*.war $DIR1
cp -n $DIR3/*.war $DIR2
-n: checks if the file already exists; it will not overwrite an existing file.
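If the goal really is the original check (back up the production .war files that are not already in today's backup), a minimal sketch could compare sorted name lists instead of three nested loops. The paths follow the question; the use of comm, and the assumption that the .war names contain no whitespace, are mine:
#!/bin/bash
TODAY=$(date +%d-%m-%Y)
INPUT=/home/bruno.ogasawara/entrada/
PROD=/home/bruno.ogasawara/producao/
BACKUP=/home/bruno.ogasawara/backup/

mkdir -p "$BACKUP$TODAY"
# comm -23 prints names present in production but missing from today's backup
missing=$(comm -23 \
    <(cd "$PROD" && ls -- *.war 2>/dev/null | sort) \
    <(cd "$BACKUP$TODAY" && ls -- *.war 2>/dev/null | sort))
for f in $missing; do
    mv -f "$PROD$f" "$BACKUP$TODAY"
done
mv -f "$INPUT"*.war "$PROD"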

Split a folder into multiple subfolders in terminal/bash script

I have several folders, each with between 15,000 and 40,000 photos. I want each of these to be split into subfolders, each with 2,000 files in them.
What is a quick way to do this that will create each folder I need on the go and move all the files?
Currently I can only find how to move the first x items in a folder into a pre-existing directory. In order to use this on a folder with 20,000 items, I would need to create 10 folders manually and run the command 10 times.
ls -1 | sort -n | head -2000 | xargs -i mv "{}" /folder/
I tried putting it in a for loop, but am having trouble getting it to make folders properly with mkdir. Even after I get around that, I need the program to create a folder only at the start of each new group of files; as it stands, it wants to make a new folder for each file.
So... how can I easily move a large number of files into folders with an arbitrary number of files in each one?
Any help would be very... well... helpful!
Try something like this:
for i in `seq 1 20`; do mkdir -p "folder$i"; find . -maxdepth 1 -type f | head -n 2000 | xargs -i mv "{}" "folder$i"; done
Full script version:
#!/bin/bash
dir_size=2000
dir_name="folder"
n=$((`find . -maxdepth 1 -type f | wc -l`/$dir_size+1))
for i in `seq 1 $n`;
do
mkdir -p "$dir_name$i";
find . -maxdepth 1 -type f | head -n $dir_size | xargs -i mv "{}" "$dir_name$i"
done
For dummies:
create a new file: vim split_files.sh
update the dir_size and dir_name values to match your desires
note that the dir_name will have a number appended
navigate into the desired folder: cd my_folder
run the script: sh ../split_files.sh
This solution worked for me on MacOS:
i=0; for f in *; do d=dir_$(printf %03d $((i/100+1))); mkdir -p $d; mv "$f" $d; let i++; done
It creates subfolders of 100 elements each.
This solution can handle names with whitespace and wildcards and can easily be extended to support less straightforward tree structures. It looks for files in all direct subdirectories of the working directory and sorts them into new subdirectories of those. The new directories will be named 0, 1, etc.:
#!/bin/bash
maxfilesperdir=20
# loop through all top level directories:
while IFS= read -r -d $'\0' topleveldir
do
# enter top level subdirectory:
cd "$topleveldir"
declare -i filecount=0 # number of moved files per dir
declare -i dircount=0 # number of subdirs created per top level dir
# loop through all files in that directory and below
while IFS= read -r -d $'\0' filename
do
# whenever file counter is 0, make a new dir:
if [ "$filecount" -eq 0 ]
then
mkdir "$dircount"
fi
# move the file into the current dir:
mv "$filename" "${dircount}/"
filecount+=1
# whenever our file counter reaches its maximum, reset it, and
# increase dir counter:
if [ "$filecount" -ge "$maxfilesperdir" ]
then
dircount+=1
filecount=0
fi
done < <(find . -type f -print0)
# go back to top level:
cd ..
done < <(find . -mindepth 1 -maxdepth 1 -type d -print0)
The find -print0/read combination with process substitution has been stolen from another question.
It should be noted that simple globbing can handle all kinds of strange directory and file names as well. It is however not easily extensible for multiple levels of directories.
The code below assumes that the filenames do not contain linefeeds, spaces, tabs, single quotes, double quotes, or backslashes, and that filenames do not start with a dash. It also assumes that IFS has not been changed, because it uses while read instead of while IFS= read, and because variables are not quoted. Add setopt shwordsplit in Zsh.
i=1; while read l; do mkdir $i; mv $l $((i++)); done < <(ls | xargs -n2000)
The code below assumes that filenames do not contain linefeeds and that they do not start with a dash. -n2000 takes 2000 arguments at a time and {#} is the sequence number of the job. Replace {#} with '{=$_=sprintf("%04d",$job->seq())=}' to pad numbers to four digits.
ls|parallel -n2000 mkdir {#}\;mv {} {#}
The command below assumes that filenames do not contain linefeeds. It uses the implementation of rename by Aristotle Pagaltzis which is the rename formula in Homebrew, where -p is needed to create directories, where --stdin is needed to get paths from STDIN, and where $N is the number of the file. In other implementations you can use $. or ++$::i instead of $N.
ls|rename --stdin -p 's,^,1+int(($N-1)/2000)."/",e'
I would go with something like this:
#!/bin/bash
# outnum generates the name of the output directory
outnum=1
# n is the number of files we have moved
n=0
# Go through all JPG files in the current directory
for f in *.jpg; do
# Create new output directory if first of new batch of 2000
if [ $n -eq 0 ]; then
outdir=folder$outnum
mkdir $outdir
((outnum++))
fi
# Move the file to the new subdirectory
mv "$f" "$outdir"
# Count how many we have moved to there
((n++))
# Start a new output directory if we have sent 2000
[ $n -eq 2000 ] && n=0
done
The answer above is very useful, but there is a very important point for the Mac (10.13.6) terminal: the xargs -i argument is not available there, so I changed the command from the one above to the following, using -I instead.
ls -1 | sort -n | head -2000 | xargs -I '{}' mv {} /folder/
Then, I use the shell script below (referencing tmp's answer):
#!/bin/bash
dir_size=500
dir_name="folder"
n=$((`find . -maxdepth 1 -type f | wc -l`/$dir_size+1))
for i in `seq 1 $n`;
do
mkdir -p "$dir_name$i";
find . -maxdepth 1 -type f | head -n $dir_size | xargs -I '{}' mv {} "$dir_name$i"
done
This is a tweak of Mark Setchell's answer above.
Usage:
bash splitfiles.bash $PWD/directoryoffiles splitsize
It doesn't require the script to be located in the same directory as the files for splitting, it will operate on all files rather than just .jpg, and it allows you to specify the split size as an argument.
#!/bin/bash
# outnum generates the name of the output directory
outnum=1
# n is the number of files we have moved
n=0
if [ "$#" -ne 2 ]; then
echo Wrong number of args
echo Usage: bash splitfiles.bash $PWD/directoryoffiles splitsize
exit 1
fi
# Go through all files in the specified directory
for f in $1/*; do
# Create new output directory if first of new batch
if [ $n -eq 0 ]; then
outdir=$1/$outnum
mkdir $outdir
((outnum++))
fi
# Move the file to the new subdirectory
mv "$f" "$outdir"
# Count how many we have moved to there
((n++))
# Start a new output directory if current new dir is full
[ $n -eq $2 ] && n=0
done
This can be run directly in the terminal:
i=0;
for f in *;
do
d=picture_$(printf %03d $((i/2000+1)));
mkdir -p $d;
mv "$f" $d;
let i++;
done
This script will move all files within the current directory into picture_001, picture_002, and so on, with each newly created folder containing 2000 files.
2000 is the chunk size
%03d is the numeric suffix format, which you can adjust (currently 001, 002, 003)
picture_ is the folder prefix
The script creates each subdirectory as it is needed.
You'll certainly have to write a script for that.
Hints of things to include in your script:
First count the number of files within your source directory:
NBFILES=$(find . -type f -name '*.jpg' | wc -l)
Divide this count by 2000 and add 1 to determine the number of directories to create:
NBDIR=$(( NBFILES / 2000 + 1 ))
Finally, loop through your files and move them across the subdirs.
You'll have to use two nested loops: one to pick and create the destination directory, the other to move 2000 files into this subdir, then create the next subdir and move the next 2000 files into the new one, and so on, as in the sketch below.
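A minimal sketch of those hints put together; the chunk size of 2000 comes from the question, the destdir prefix is an assumption, and it presumes the .jpg names contain no newlines:
#!/bin/bash
NBFILES=$(find . -maxdepth 1 -type f -name '*.jpg' | wc -l)
NBDIR=$(( NBFILES / 2000 + 1 ))
for (( d=1; d<=NBDIR; d++ )); do
    mkdir -p "destdir$d"
    # move up to 2000 of the remaining files into the directory just created
    find . -maxdepth 1 -type f -name '*.jpg' | head -n 2000 \
        | while IFS= read -r f; do mv "$f" "destdir$d/"; done
done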

Please help. I need to add a log file to this multi-threaded rsync script

# Define source, target, maxdepth and cd to source
source="/media"
target="/tmp"
depth=20
cd "${source}"
# Set the maximum number of concurrent rsync threads
maxthreads=5
# How long to wait before checking the number of rsync threads again
sleeptime=5
# Find all folders in the source directory within the maxdepth level
find . -maxdepth ${depth} -type d | while read dir
do
    # Make sure to ignore the parent folder
    if [ `echo "${dir}" | awk -F'/' '{print NF}'` -gt ${depth} ]
    then
        # Strip leading dot slash
        subfolder=$(echo "${dir}" | sed 's#^\./##g')
        if [ ! -d "${target}/${subfolder}" ]
        then
            # Create destination folder and set ownership and permissions to match source
            mkdir -p "${target}/${subfolder}"
            chown --reference="${source}/${subfolder}" "${target}/${subfolder}"
            chmod --reference="${source}/${subfolder}" "${target}/${subfolder}"
        fi
        # Make sure the number of rsync threads running is below the threshold
        while [ `ps -ef | grep -c [r]sync` -gt ${maxthreads} ]
        do
            echo "Sleeping ${sleeptime} seconds"
            sleep ${sleeptime}
        done
        # Run rsync in background for the current subfolder and move on to the next one
        nohup rsync -au "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
    fi
done
# Find all files above the maxdepth level and rsync them as well
find . -maxdepth ${depth} -type f -print0 | rsync -au --files-from=- --from0 ./ "${target}/"
Thank you for all your help. By adding the -v switch to rsync, I solved the problem.
Not sure if this is what you are after (I don't know what rsync is), but can you not just run the script as
./myscript > logfile.log
or
./myscript | tee logfile.log
(i.e., pipe to tee if you want to see the output as it goes along)?
Alternatively... not sure this is what real coders do, but you could append the output of each command in the script to a logfile, e.g.:
# at the beginning define the logfile name:
logfile="logfile"
# remove the file if it exists
if [ -e ${logfile}.log ]; then rm -i ${logfile}.log; fi
# for each command whose output you want to capture, use >> ${logfile}.log
# e.g.:
mkdir -p "${target}/${subfolder}" >> ${logfile}.log
If rsync has several threads with names, I imagine you could write to separate logfiles as >> ${logfile}${thread}.log and concatenate the files into one logfile at the end.
Hope that is helpful? (I am new to answering things, so I apologise if what I post is basic/bad, or if you already considered these ideas!)
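rsync also has a built-in --log-file option, which avoids capturing shell output entirely. A minimal sketch of how the background rsync line in the script above could use it; writing one log per subfolder, and the logdir location, are my choices, not the script's:
logdir="/tmp/rsync-logs"
mkdir -p "${logdir}"
# sanitize the subfolder name so it can serve as a file name
logname=$(echo "${subfolder}" | tr '/' '_')
nohup rsync -au --log-file="${logdir}/${logname}.log" \
    "${source}/${subfolder}/" "${target}/${subfolder}/" </dev/null >/dev/null 2>&1 &
Concatenating them afterwards (cat "${logdir}"/*.log) gives a single combined log, much like the per-thread idea above.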

How to move folders of a certain size on the command line?

In my external HDD I have two partitions, one for Mac and the other for Windows (FAT32). Since my Mac partition is almost full due to a Time Machine backup, I want to move some of my old folders (which contain movies) from the Mac partition to the Windows partition. However, the FAT32 file system only allows files smaller than 4GB, and some of my folders contain files larger than that. I don't want to manually go through each folder, check the size, and then copy and paste the folders of small size.
So my question is:
What is the command for moving all the folders (including the subdirectories) smaller than 4GB to the new partition? Does it have anything to do with the options of the mv command?
Thanks
--- Update 12/7/2014---
I ran find . -mindepth 1 -type d -exec bash -c 'f="$1";set $(du -bs "$f"); \ [[ $1 -lt 4294967296 ]] && echo mv "$f" /dest-dir' - '{}' \; >> output.txt.
The following was the first a few lines of my output:
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.stZBByQJc0/Render
BASH=/bin/bash
BASH_ARGC=([0]="1")
BASH_ARGV=([0]=".")
BASH_EXECUTION_STRING=$'f="$1";set $(du -bs "$f"); \\\n [[ $1 -lt 4294967296 ]] && echo mv "$f" /Volumes/WIN_PANC/movies/'
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="3" [1]="2" [2]="53" [3]="1" [4]="release" [5]="x86_64-apple-darwin14")
BASH_VERSION='3.2.53(1)-release'
CLICOLOR=1
COLORFGBG='15;0'
Those are not the folders I want to move. Am I doing this right?
You can use this find command to list the directories smaller than 4GB and move them (du -s reports size in 1K blocks with GNU du, hence the 4194304 limit):
find . -mindepth 1 -type d -exec bash -c 'f="$1"; read s _ < <(du -s "$f"); \
    [[ $s -lt 4194304 ]] && echo mv "$f" /dest-dir' - '{}' \;
Remove the echo before the mv command once you're satisfied with the listing.
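Note that du units differ between implementations: GNU du defaults to 1K blocks, while BSD/macOS du defaults to 512-byte blocks. Forcing kilobytes with -k keeps the 4194304 threshold meaningful on both; a variant of the same command under that assumption (I also restrict it to top-level folders with -maxdepth 1, which is an assumption about what should be moved):
find . -mindepth 1 -maxdepth 1 -type d -exec bash -c '
    f="$1"
    read -r s _ < <(du -sk "$f")   # size in kilobytes on GNU and BSD alike
    [ "$s" -lt 4194304 ] && echo mv "$f" /dest-dir
' - '{}' \;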
The following code can also do this for you (for when some files are > 4G):
#! /bin/bash
my_files=`ls --almost-all -1v -s -A --block-size=G|sort|sed -e 's#^[0-4]*G##g' -e '$ s#.*##g'`
echo "$my_files" >> my_files.txt
while read -r file; do
echo "MOVING FILE : $file"
mv "$file" "destination_location"
sleep 0.5
done < my_files.txt
rm -rf my_files.txt
Note: in a terminal, change to the directory where all the files to be copied are, then run the script from that terminal. Make sure you replace "destination_location" in the code with the directory you want to move the files to, then execute the script.
Note: You will have to change directory and run the code in each source directory.
