ksh : Need to delete multiple directories quickly and reliably - fork

I have many directories that I need to delete periodically, in as little time as possible.
In addition, for each directory I need to know the deletion status, i.e. whether it was deleted successfully or not.
I need to write this in ksh. Could you please help me out?
The sample code I am using, in which I tried launching rm -rf in the background, is not working:
for var1 in {1..10}
do
    rm -rf <DIR> &
    pid[var1]=$!
done
for my_var in {1..10}
do
    wait ${pid[my_var]}
    if [ $? -ne 0 ]
    then
        echo failed
    else
        echo passed
    fi
done

You are out of luck. The bottleneck here is the filesystem, and you are very unlikely to find a filesystem that performs atomic operations (like directory deletion) in parallel. No amount of fiddling with shell code is going to make the OS or the filesystem do its job faster. There is only one disk, and if every deletion requires a write to disk, it is going to be slow.
Your best bet is to switch to a journaling filesystem that does deletions quickly. I have had good luck with XFS deleting large files (10-40GB) quickly, but I have not tried deleting directories. In any case, your path to improved performance lies in finding the right filesystem, not the right shell script.

This is generally the form that your script should take, with corrections for the serious errors in syntax. However, as Norman noted, it's not going to do what you want. Also, wait isn't going to work in a loop like you seem to intend.
# this script still won't work
find . -type d | while read dir
# or: for dir in "${dirs[@]}"
do
    rm -rf "$dir" &
    pid[++var1]=$!
done
for (( my_var=1; my_var<=var1; my_var++ ))
do
    wait ${pid[my_var]}
    if [ $? -ne 0 ]
    then
        echo failed
    else
        echo passed
    fi
done
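For what it's worth, here is a minimal sketch of the status-reporting part that avoids the pipeline-subshell problem, assuming ksh93 or bash 4+ and a dirs array you populate yourself (as noted above, it won't make the deletions themselves any faster):
#!/bin/ksh
# Hypothetical list of directories to remove; replace with your own.
dirs=(/tmp/scratch1 /tmp/scratch2 /tmp/scratch3)
typeset -A pids                  # associative array: directory -> background PID
for d in "${dirs[@]}"; do
    rm -rf -- "$d" &
    pids[$d]=$!
done
for d in "${dirs[@]}"; do
    if wait "${pids[$d]}"; then  # wait returns the exit status of that PID
        echo "passed: $d"
    else
        echo "failed: $d"
    fi
done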

Related

bash is zipping entire home

I am trying to back up all world* folders from /home/mc/server/ and drop the zipped file in /home/mc/backup/:
#!/bin/bash
moment=$(date +"%Y%m%d%H%M")
backup="/home/mc/backup/map$moment.zip"
map="/home/mc/server/world*"
zipping="zip -r -9 $backup $map"
eval $zipping
The zipped file is created in the backup folder as expected, but when I unzip it, it contains the entire /home dir. I am running this bash script in two ways:
Manually
Using user's crontab
Finally, if I echo $zipping, it correctly prints the command that I need to trigger. What am I missing? Thank you in advance.
There's no reason to use eval here (and no, justifying it on DRY grounds if you want to both log a command line and subsequently execute it does not count as a good reason IMO.)
Define a function and call it with the appropriate arguments:
#!/bin/bash
moment=$(date +"%Y%m%d%H%M")
zipping () {
    output=$1
    shift
    zip -r -9 "$output" "$@"
}
zipping "/home/mc/backup/map$moment.zip" /home/mc/server/world*
(I'll admit, I don't know what is causing the behavior you report, but it would be better to confirm it is not somehow specific to the use of eval before trying to diagnose it further.)
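If the reason for eval was to both log and run the same command line (the DRY concern mentioned above), a bash array keeps that DRY without eval; a sketch using the paths from the question:
moment=$(date +"%Y%m%d%H%M")
cmd=(zip -r -9 "/home/mc/backup/map$moment.zip" /home/mc/server/world*)
printf 'Running: %s\n' "${cmd[*]}"   # log the expanded command line
"${cmd[@]}"                          # execute it with arguments kept intact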

Is it possible to write a cron/ SHELL script that runs only when there is a change in the folder size?

Is it possible to write a cron job or script that runs only when there is a change in a folder's size, i.e. the files inside the folder get changed or a new file gets created, the folder size therefore changes, and the cron job or script runs?
There is no support for such a monitor event in standard cron: cron is strictly time-based.
Assuming that cron is used, this task would need to be handled by a periodically "woken up" job, which could then choose to sleep/exit immediately or do something else, depending on a comparison of the folder with a previously known state.
Now, if cron is removed from the role of being the launch/monitor platform, then there are "non-polling" ways to monitor a filesystem, such as inotify.
If you are just looking for a system daemon to supplement standard cron for this task, see the following alternatives.
incron:
incron is an "inotify cron" system. It works like the regular cron but is driven by filesystem events instead of time periods. It contains two programs, a daemon called "incrond" (analogous to crond) and a table manipulator "incrontab" (like "crontab").
Watcher:
Watcher is a daemon that watches specified files/folders for changes and fires commands in response to those changes. It is similar to incron, however, configuration uses a simpler to read ini file instead of a plain text file. Unlike incron it can also recursively monitor directories. It's also written in Python, making it easier to hack.
You have many solutions:
- Using inotifywait, as an example:
inotifywait -m /path 2>&- | awk '$2 == "CREATE" { print $3; fflush() }' |
while read file; do
echo "$file"
# do something with the file
done
In Ubuntu inotifywait is provided by the inotify-tools package.
- Using incron (an example incrontab line is sketched after this list).
You can see a full example here: http://www.cyberciti.biz/faq/linux-inotify-examples-to-replicate-directories/
- Simple: count the directory's entries, e.g.
ls -1A isempty | wc -l
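For illustration, an incrontab entry is a watched path, an event mask, and a command; the line below is only a sketch (the path and script are made up, and $@/$# are incron's placeholders for the watched directory and the triggering file name, if I recall its format correctly):
/home/user/watched IN_CREATE,IN_CLOSE_WRITE,IN_DELETE /usr/local/bin/on-change.sh $@/$#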
My idiom is usually:
dir=/dir/to/watch
if [ $dir -nt $dir.flag ]; then
touch -r $dir $dir.flag
do_work
fi
This however tests against modification time, not size. The size of a directory is not a very useful concept anyway, as it changes only infrequently.
By the way, $dir.flag cannot be created inside $dir, as that would make $dir change after $dir.flag, so you need to store $dir.flag somewhere else where you have write permission.
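If you really want to key off size rather than modification time, here is a sketch of the same idea using du (the paths are placeholders, and do_work stands for whatever the job should run):
dir=/dir/to/watch
state=/var/tmp/watch.size            # previously known size, stored outside $dir
current=$(du -s "$dir" | cut -f1)
previous=$(cat "$state" 2>/dev/null)
if [ "$current" != "$previous" ]; then
    echo "$current" > "$state"
    do_work
fi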

Slow load time of bash in cygwin

At the moment bash takes about 2 seconds to load. I ran bash with the -x flag and, looking at the output, it seems as though the PATH is being loaded many times in Cygwin. The funny thing is that I use the same file in a Linux environment, where it works fine, without the reload problem. Could the following cause the problem?
if [ `uname -o` = "Cygwin" ]; then
....
fi
As you've noted in your answer, the problem is Cygwin's bash-completion package. The quick and easy fix is to disable bash-completion, and the correct way to do that is to run Cygwin's setup.exe (download it again if you need to) and select to uninstall that package.
The longer solution is to work through the files in /etc/bash_completion.d and disable the ones you don't need. On my system, the biggest culprits for slowing down Bash's load time (mailman, shadow, dsniff and e2fsprogs) all did exactly nothing, since the tools they were created to complete weren't installed.
If you rename a file in /etc/bash_completion.d to have a .bak extension, it'll stop that script being loaded. Having disabled all but a select 37 scripts on one of my systems in that manner, I've cut the average time for bash_completion to load by 95% (6.5 seconds to 0.3 seconds).
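For example, to disable one of the culprits named above (rename it back to restore it):
mv /etc/bash_completion.d/mailman /etc/bash_completion.d/mailman.bak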
In my case it was the Windows domain controller.
I did this to find the issue:
I started with a plain Windows cmd.exe and then typed this:
c:\cygwin\bin\strace.exe c:\cygwin\bin\bash
In my case, I noticed the following sequence:
218 12134 [main] bash 11304 transport_layer_pipes::connect: Try to connect to named pipe: \\.\pipe\cygwin-c5e39b7a9d22bafb-lpc
45 12179 [main] bash 11304 transport_layer_pipes::connect: Error opening the pipe (2)
39 12218 [main] bash 11304 client_request::make_request: cygserver un-available
1404719 1416937 [main] bash 11304 pwdgrp::fetch_account_from_windows: line: <CENSORED_GROUP_ID_#1>
495 1417432 [main] bash 11304 pwdgrp::fetch_account_from_windows: line: <CENSORED_GROUP_ID_#2>
380 1417812 [main] bash 11304 pwdgrp::fetch_account_from_windows: line: <CENSORED_GROUP_ID_#3>
etc...
The key thing was identifying the client_request::make_request: cygserver un-available line. You can see, how after that, cygwin tries to fetch every single group from windows, and execution times go crazy.
A quick google revealed what a cygserver is:
https://cygwin.com/cygwin-ug-net/using-cygserver.html
Cygserver is a program which is designed to run as a background service. It provides Cygwin applications with services which require security arbitration or which need to persist while no other Cygwin application is running.
The solution was to run cygserver-config and then net start cygserver to start the Windows service. Cygwin startup times dropped significantly after that.
All of the answers refer to older versions of bash_completion, and are irrelevant for recent bash_completion.
Modern bash_completion moved most of the completion files to /usr/share/bash-completion/completions by default, check the path on your system by running
# pkg-config --variable=completionsdir bash-completion
/usr/share/bash-completion/completions
There are many files in there, one for each command, but that is not a problem, since they are loaded on demand the first time you use completion with each command. The old /etc/bash_completion.d is still supported for compatibility, and all files from there are loaded when bash_completion starts.
# pkg-config --variable=compatdir bash-completion
/etc/bash_completion.d
Use this script to check if there are any stale files left in the old dir.
#!/bin/sh
COMPLETIONS_DIR="$(pkg-config --variable=completionsdir bash-completion)"
COMPAT_DIR="$(pkg-config --variable=compatdir bash-completion)"
for file in "${COMPLETIONS_DIR}"/*; do
    file="${COMPAT_DIR}/${file#${COMPLETIONS_DIR}/}"
    [ -f "$file" ] && printf '%s\n' "$file"
done
It prints the list of files in compat dir that are also present in the newer (on-demand) completions dir. Unless you have specific reasons to keep some of them, review, backup and remove all of those files.
As a result, the compat dir should be mostly empty.
Now, for the most interesting part - checking why bash startup is slow.
If you just run bash, it will start a non-login, interactive shell; on Cygwin this sources /etc/bash.bashrc and then ~/.bashrc. This most likely doesn't include bash completion, unless you source it from one of those rc files. If you run bash -l (bash --login), start Cygwin Terminal (this depends on your cygwin.bat), or log in via SSH, it will start a login, interactive shell, which sources /etc/profile, ~/.bash_profile, and the aforementioned rc files. The /etc/profile script itself sources all executable .sh files in /etc/profile.d.
You can check how long each file takes to source. Find this code in /etc/profile:
for file in /etc/profile.d/*.$1; do
    [ -e "${file}" ] && . "${file}"
done
Back it up, then replace it with this:
for file in /etc/profile.d/*.$1; do
    TIMEFORMAT="%3lR ${file}"
    [ -e "${file}" ] && time . "${file}"
done
Start bash and you will see how long each file took. Investigate files that take a significant amount of time. In my case, it was bash_completion.sh and fzf.sh (fzf is fuzzy finder, a really nice complement to bash_completion). Now the choice is to disable it or investigate further. Since I wanted to keep using fzf shortcuts in bash, I investigated, found the source of the slowdown, optimized it, and submitted my patch to fzf's repo (hopefully it will be accepted).
Now for the biggest time spender - bash_completion.sh. Basically that script sources /usr/share/bash-completion/bash_completion. I backed up that file, then edited it. Near the end of that file there is a for loop that sources all the files in the compat dir, /etc/bash_completion.d. Again, I added TIMEFORMAT and time, and saw which script was causing the slow startup. It was zzz-fzf (from the fzf package). I investigated and found a subshell ($()) being executed multiple times in a for loop; I rewrote that part without using a subshell, making the script run quickly. I have already submitted my patch to fzf's repo.
The biggest reason for all these slowdowns is that fork is not supported by the Windows process model; Cygwin does a great job emulating it, but it's painfully slow compared to a real UNIX. A subshell or a pipeline that does very little work by itself spends most of its execution time forking. E.g. compare the execution times of time echo msg (0.000s on my Cygwin) vs time echo $(echo msg) (0.042s on my Cygwin) - day and night. The echo command itself takes no appreciable time to execute, but creating a subshell is very expensive. On my Linux system, these commands take 0.000s and 0.001s respectively. Many packages Cygwin ships are developed by people who use Linux or another UNIX and can run on Cygwin unmodified. So naturally these devs feel free to use subshells, pipelines and other features wherever convenient, since they don't see any significant performance hit on their systems, but on Cygwin those shell scripts might run tens or hundreds of times slower.
Bottom line, if a shell script works slowly in Cygwin - try to locate the source of fork calls and rewrite the script to eliminate them as much as possible.
E.g. cmd="$(printf "$1" "$2")" (uses one fork for subshell) can be replaced with printf -v cmd "$1" "$2".
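As a sketch of the kind of rewrite that helps (the loop itself is hypothetical), a per-iteration command substitution can often be replaced with parameter expansion, removing one fork per item:
# Before: one fork per iteration for the $(basename ...) subshell
for f in "$@"; do
    name="$(basename "$f")"
    printf '%s\n' "$name"
done
# After: pure parameter expansion, no fork
for f in "$@"; do
    name="${f##*/}"
    printf '%s\n' "$name"
done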
Boy, this came out really long. Anyone still reading up to here is a real hero. Thanks :)
I know this is an old thread, but after a fresh install of Cygwin this week I'm still having this problem.
Instead of handpicking all of the bash_completion files, I used this line to implement #me_and's approach for anything that isn't installed on my machine. This significantly reduced the startup time of bash for me.
In /etc/bash_completion.d, execute the following:
for i in $(ls|grep -v /); do type $i >/dev/null 2>&1 || mv $i $i.bak; done
New answer for an old thread, relating to the PATH of the original question.
Most of the other answers deal with the bash startup. If you're seeing a slow load time when you run bash -i within the shell, those may apply.
In my case, bash -i ran fast, but anytime I opened a new shell (be it in a terminal or in xterm), it took a really long time. If bash -l is what takes a long time, the login step is the problem.
There are some approaches at the Cygwin FAQ at https://cygwin.com/faq/faq.html#faq.using.startup-slow but they didn't work for me.
The original poster asked about the PATH, which he diagnosed using bash -x. I too found that although bash -i was fast, bash -xl was slow and showed a lot of information about the PATH.
There was such a ridiculously long Windows PATH that the login process kept on running programs and searching the entire PATH for the right program.
My solution: Edit the Windows PATH to remove anything superfluous. I'm not sure which piece I removed that did the trick, but the login shell startup went from 6 seconds to under 1 second.
YMMV.
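To review what the login shell has to wade through, listing the PATH one entry per line makes the superfluous parts easy to spot:
echo "$PATH" | tr ':' '\n' | nl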
My answer is the same as npe's above, but since I just joined, I cannot comment or even upvote it! I hope this post doesn't get deleted, because it offers reassurance for anyone looking for an answer to the same problem.
npe's solution worked for me. There's only one caveat - I had to close all Cygwin processes before I got the best out of it. That includes running Cygwin services, like sshd, and the ssh-agent that I start from my login scripts. Before that, the window for the Cygwin terminal would appear instantly but hang for several seconds before presenting the prompt, and it hung for several seconds upon closing the window. After I killed all processes and started the cygserver service (by the way, I prefer the Cygwin way, 'cygrunsrv -S cygserver', over 'net start cygserver'; I don't know if it makes any practical difference), it starts immediately. So thanks to npe again!
I'm on a corporate network with a pretty complicated setup, and it seems that really kills cygwin startup times. Related to npe's answer, I also had to follow some of the steps laid out here: https://cygwin.com/faq/faq.html#faq.using.startup-slow
Another cause on AD client systems is slow DC replies, commonly observed in configurations with remote DC access. The Cygwin DLL queries information about every group you're in to populate the local cache on startup. You may speed up this process a little by caching your own information in local files. Run these commands in a Cygwin terminal with write access to /etc:
getent passwd $(id -u) > /etc/passwd
getent group $(id -G) > /etc/group
Also, set /etc/nsswitch.conf as follows:
passwd: files db
group: files db
This will limit the need for Cygwin to contact the AD domain controller (DC) while still allowing for additional information to be retrieved from DC, such as when listing remote directories.
After doing that plus starting the cygserver my cygwin startup time dropped significantly.
As someone mentioned above, one possible issue is that the PATH environment variable contains too many paths, and Cygwin will search all of them. I prefer to edit /etc/profile directly and overwrite the PATH variable with only Cygwin-related paths, e.g. PATH="/usr/local/bin:/usr/bin". Add additional paths if you want.
I wrote a Bash function named 'minimizecompletion' for inactivating completion scripts that are not needed.
Completion scripts can add more than one completion specification or have completion specifications for shell builtins, therefore it is not sufficient to compare script names with executable files found in $PATH.
My solution is to remove all loaded completion specifications, load a completion script, and check whether it has added new completion specifications. Depending on this, it is inactivated by adding .bak to the script file name, or activated by removing .bak. Doing this for all 182 scripts in /etc/bash_completion.d results in 36 active and 146 inactive completion scripts, reducing the Bash start time by 50% (but it should be clear this depends on installed packages).
The function also checks inactivated completion scripts, so it can activate them when they are needed for newly installed Cygwin packages. All changes can be undone with the argument -a, which activates all scripts.
# Enable or disable global completion scripts for speeding up Bash start.
#
# Script files in directory '/etc/bash_completion.d' are inactivated
# by adding the suffix '.bak' to the file name; they are activated by
# removing the suffix '.bak'. After processing, all completion scripts
# are reloaded by calling '/etc/bash_completion'.
#
# usage:  [-a]
#         -a  activate all completion scripts
# output: statistic about the total number of completion scripts, number of
#         activated, and number of inactivated completion scripts; the
#         statistic for active and inactive completion scripts can be
#         wrong when 'mv' errors occur
# return: 0   all scripts are checked and completion loading was
#             successful; this does not mean that every call of 'mv'
#             for adding or removing the suffix was successful
#         66  the completion directory or loading script is missing
#
minimizecompletion() {
    local arg_activate_all=${1-}
    local completion_load=/etc/bash_completion
    local completion_dir=/etc/bash_completion.d
    (
        # Needed for executing completion scripts.
        #
        local UNAME='Cygwin'
        local USERLAND='Cygwin'
        shopt -s extglob progcomp
        have() {
            unset -v have
            local PATH="$PATH:/sbin:/usr/sbin:/usr/local/sbin"
            type -- "$1" &>/dev/null && have='yes'
        }
        # Print initial statistic.
        #
        printf 'Completion scripts status:\n'
        printf ' total: 0\n'
        printf ' active: 0\n'
        printf ' inactive: 0\n'
        printf 'Completion scripts changed:\n'
        printf ' activated: 0\n'
        printf ' inactivated: 0\n'
        # Test the effect of execution for every completion script by
        # checking the number of completion specifications after execution.
        # The completion scripts are renamed depending on the result to
        # activate or inactivate them.
        #
        local completions total=0 active=0 inactive=0 activated=0 inactivated=0
        while IFS= read -r -d '' f; do
            ((++total))
            if [[ $arg_activate_all == -a ]]; then
                [[ $f == *.bak ]] && mv -- "$f" "${f%.bak}" && ((++activated))
                ((++active))
            else
                complete -r
                source -- "$f"
                completions=$(complete | wc -l)
                if (( $completions > 0 )); then
                    [[ $f == *.bak ]] && mv -- "$f" "${f%.bak}" && ((++activated))
                    ((++active))
                else
                    [[ $f != *.bak ]] && mv -- "$f" "$f.bak" && ((++inactivated))
                    ((++inactive))
                fi
            fi
            # Update statistic.
            #
            printf '\r\e[6A\e[15C%s' "$total"
            printf '\r\e[1B\e[15C%s' "$active"
            printf '\r\e[1B\e[15C%s' "$inactive"
            printf '\r\e[2B\e[15C%s' "$activated"
            printf '\r\e[1B\e[15C%s' "$inactivated"
            printf '\r\e[1B'
        done < <(find "$completion_dir" -maxdepth 1 -type f -print0)
        if [[ $arg_activate_all != -a ]]; then
            printf '\nYou can activate all scripts with %s.\n' "'$FUNCNAME -a'"
        fi
        if ! [[ -f $completion_load && -r $completion_load ]]; then
            printf 'Cannot reload completions, missing %s.\n' \
                "'$completion_load'" >&2
            return 66
        fi
    )
    complete -r
    source -- "$completion_load"
}
This is an example output and the resulting times:
$ minimizecompletion -a
Completion scripts status:
total: 182
active: 182
inactive: 0
Completion scripts changed:
activated: 146
inactivated: 0
$ time bash -lic exit
logout
real 0m0.798s
user 0m0.263s
sys 0m0.341s
$ time minimizecompletion
Completion scripts status:
total: 182
active: 36
inactive: 146
Completion scripts changed:
activated: 0
inactivated: 146
You can activate all scripts with 'minimizecompletion -a'.
real 0m17.101s
user 0m1.841s
sys 0m6.260s
$ time bash -lic exit
logout
real 0m0.422s
user 0m0.092s
sys 0m0.154s

How to increment directory number in bash?

I have a watchdog implemented in bash that restarts a service under certain conditions and moves the old logs to an old directory.
The problem is that I want to move the logs to old_1, old_2, ... if the previous one exists.
How can I implement this in bash?
You can search for the first non-existing log like this:
#!/bin/bash
num=1
while [[ -f log_$num ]] ; do
    let num++
done
echo Fresh new: log_$num
That is a pain to write, and it has to handle missing folders (which will break choroba's solution, for instance). This is why most systems that keep logs just suffix their names with dates; I encourage you to do the same, as it's easier to handle and also easier to retrieve a log afterwards.
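For completeness, a sketch of both variants applied to the original directory question (the names and the *.log pattern are placeholders): the numbered old_N scheme, and the date suffix suggested above:
# numbered: find the first old_N that doesn't exist yet
n=1
while [[ -d old_$n ]]; do
    (( n++ ))
done
mkdir "old_$n" && mv -- *.log "old_$n"/
# date-suffixed alternative
stamp=$(date +%Y%m%d_%H%M%S)
mkdir "old_$stamp" && mv -- *.log "old_$stamp"/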

Protect against accidental deletion

Today I first saw the potential of a partial accidental deletion of a colleague's home directory (2 hours lost in a critical phase of a project).
I was worried enough about it to start thinking about the problem and a possible solution.
In his case a file named '~' somehow ended up in a test folder, which he afterwards deleted with rm -rf... when rm got to that file, bash expanded it to his home folder (he managed to Ctrl-C almost in time).
A similar problem could happen if one has a file named '*'.
My first thought was to prevent the creation of files with "dangerous names", but that would still not solve the problem, as mv or other corner-case situations could lead to the risky situation as well.
My second thought was creating a listener (I don't know if this is even possible) or an alias of rm that checks what files it processes and, if it finds a dangerous one, skips it and prints a message.
Something similar to this:
take all non-parameter arguments (so to get the files one wants to delete)
cycle on these items
check if the current item is equal to a dangerous item (say, for example, '~' or '*'); I don't know if this works - at this point, is the item already expanded or not?
if so, echo a message and don't do anything to the file
proceed with iteration
Third thought: has anyone already done or dealt with this? :]
There's actually pretty good justification for having critical files in your home directory checked into source control. As well as protecting against the situation you've just encountered it's nice being able to version control .bashrc, etc.
Since the shell probably expands the parameter, you can't really catch 'dangerous' names like that.
You could alias 'rm -rf' to 'rm -rfi' (interactive), but that can be pretty tedious if you actually mean 'rm -rf *'.
You could replace 'rm' with a shell function that does mv "$@" "$HOME/.trash", and have a separate command to empty the trash, but that might cause problems if you really mean to remove the files because of disk quotas or similar.
Or, you could just keep proper backups or use a file system that allows "undeletion".
Accidents do happen. You only can reduce the impact of them.
Both version control (regular checkins) and backups are of vital importance here.
If I can't check in (because it does not work yet), I back up to a USB stick.
And as the deadline approaches, the backup frequency increases, because Murphy strikes at the most inappropriate moment.
One thing I do is always have a file called "-i" in my $HOME.
My other tip is to always use "./*" or find instead of plain "*".
The version control suggestion gets an upvote from me. I'd recommend that for everything, not just source.
Another thought is a shared drive on a server that's backed up and archived.
A third idea is buying everyone an individual external hard drive that lets them back up their local drive. This is a good thing to do because there are two kinds of hard drives: those that have failed and those that will in the future.
You could also create an alias for rm that runs through a simple script that escapes all characters, effectively stopping you from using wildcards. Then create another alias that runs the real rm without escaping. You would only use the second one if you are really sure. But then again, that's kinda the point of rm -rf.
Another option I personally like is to create an alias that redirects through a script and then passes everything on to rm. If the script finds any dangerous characters, it prompts you Y/N whether you want to continue, with N cancelling the operation and Y continuing on as normal.
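A rough sketch of such a wrapper (call it safe-rm and alias rm to it; the list of "dangerous" names is only an example):
#!/bin/bash
# safe-rm: warn before passing suspicious arguments on to the real rm
for arg in "$@"; do
    case $arg in
        '~' | '*' | / | "$HOME")
            read -r -p "About to rm '$arg' - continue? [y/N] " answer
            [[ $answer == [Yy] ]] || { echo "aborted"; exit 1; }
            ;;
    esac
done
exec /bin/rm "$@"
With alias rm='safe-rm' in your shell startup file, plain rm goes through the check, while \rm or /bin/rm bypasses it.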
At one company where I worked, we had a cron job which ran every half an hour and copied all the source code files from everyone's home directory to a backup directory structure elsewhere on the system, just using find.
This wouldn't prevent actual deletion but it did minimise the work lost on a number of occasions.
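Something along those lines could look like this (the paths, the *.c pattern, and the GNU cp --parents option are assumptions):
# copy source files changed since the last run into a mirror tree
# (create /var/backup/.last once by hand before the first run)
find /home -name '*.c' -newer /var/backup/.last -exec cp --parents {} /var/backup/ \;
touch /var/backup/.last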
This is pretty odd behaviour really - why is bash expanding twice?
Once * has expanded to
old~
this~
~
then no further substitution should happen!
I bravely tested this on my mac, and it just deleted ~, and not my home directory.
Is it possible your colleague somehow wrote code that expanded it twice?
e.g.
ls | xargs rm -rf
You may disable file name generation (globbing):
set -f
Escaping special chars in file paths could be done with Bash builtins:
filepath='/abc*?~def'
filepath="$(printf "%q" "${filepath}")"
filepath="${filepath//\~/\\~}"
printf "%s\n" "${filepath}"
I use this in my ~/.bashrc:
alias rm="rm -i"
rm prompts before deleting anything, and the alias can be circumvented either with the -f flag or by escaping, e.g.
\rm file
Degrades the problem yes; solves it no.
