Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I'm looking for a shell script idea to work on for practice with shell scripting. Can you please suggest intermediate ideas to work on?
I'm a developer and I prefer working on an idea that deals with files.
For shell scripting, think of a task that you do frequently - and think how you would automate that task.
You can start off with a basic script that just about does what you need. Then you realize that there are small variations on the task, and you start to allow the script to handle those. And it gently becomes more complex.
Almost all of the scripts I have (some hundreds of them) started off as "I've done that before; how can I avoid having to do it again?".
Can you give an example?
No - because I don't know what tasks you do sufficiently often to be (minor) irritants that could be salved by writing a script.
Yes - because I've got scripts that I wrote incrementally, in an attempt to work around some issue or other in my environment.
One task that I'm working on - still a work in progress - is:
Identify duplicate files
Starting at some nominated directory (default, $HOME), find all the files, and for each file, establish a checksum (MD5, SHA1, SHA256 - it is not critical which) for the file; record the file name and checksum (and maybe device number and inode number).
Establish which checksums are repeated - hence identifying identical files.
Eliminate the unique checksums.
Group the duplicate files together with appropriate identifying information.
This much is fairly easy - it requires some medium-grade shell scripting and you might have to find a command to generate the checksum (but you might be OK with sum or cksum, though neither of those reaches even the level of MD5). I've done this in both shell and Perl.
The hard part - where I've not yet gotten a good solution - is then dealing with the duplicates. I have some 8,500 duplicated hashes, with about 27,000 file names in total. Some of the duplicates are images like smileys used in chat transcripts - there are a lot of that particular image. Others are duplicate PDF files collected from various machines at various times; I need to organize them so I have one copy of the file on disk, with perhaps links in the other locations. But some of the other locations should go - they were convenient ways to get the material from retired machines onto my current machine.
I have not yet got a good solution to the second part.
Here are two scripts from my personal library. They are simple enough not to require a full blown programming language, but aren't trivial, particularly if you aim to get all the details right (support all flags, return same exit code, etc.).
cvsadd
Write a script to perform a recursive cvs add so you don't have to manually add each sub-directory and its files. Make it so it detects the file types and adds the -kb flag for binary files as needed.
For bonus points: Allow the user to optionally specify a list of directories or files to restrict the search to. Handle file names with spaces correctly. If you can't figure out if a file is text or binary, ask the user.
#!/bin/bash
#
# Usage: cvsadd [FILE]...
#
# Mass `cvs add' script. Adds files and directories recursively, automatically
# figuring out if they are text or binary. If no file names are specified, looks
# for unversioned files and directories in the current directory.
svnfind
Write a wrapper around find which performs the same job, recursively finding files matching arbitrary criteria, but ignores .svn directories.
For bonus points: Allow other actions besides the default -print. Support the -H, -L, and -P options. Don't erroneously filter out files which simply happen to contain the substring .svn. Make usage identical to the regular find command.
#!/bin/bash
#
# Usage: svnfind [-H] [-L] [-P] [path...] [expression]
#
# Attempts to behave identically to a plain `find' command while ignoring .svn
# directories. Usage is identical to `find'.
You could try some simple CGI scripting. It can be done in shell and involves a lot of here documents, parsing and extracting of form values, a bit of escaping and whatever you want to do as payload. (I do not recommend exposing such a script to the hostile internet, though.)
Related
I'm having trouble monitoring a file for changes. I need to be able to know when a file changes, and when it does, I need the new line that was added. I intend to parse each line and find ones that match certain criteria, and act on information in those lines. I know the expected number of matching lines ahead of time, but I do not know how many lines in total will be added to the file, or where the matching lines will be.
I've tried 2 packages so far, with no avail.
fsnotify/fsnotify
As fas as I can tell, fsnotify can only tell me when a file is modified, not what the details of the modification was. Since I need to know what exactly was added to the file, this is no good for me.
(As a side-question, can this be run in a loop? The example that I tried exited after just one modification. I need to monitor for multiple modifications.)
hpcloud/tail
This package tries to mimic the Unix tail command, but it seems to have its own issues. The output that I get includes timestamps and other data - I just want the added line, nothing else. Also, it seems to think a file has been modified multiple times, even when it's just one edit. Further, the deal breaker here is that it does not output the last line if the line was not followed by a newline character.
Delegating to tail
I came across this answer, which suggests to delegate this work to the tail command itself, but I need this to work cross-platform (specifically, macOS, Linux and Windows). I don't believe that an equivalent command exists on Windows.
How do I go about tackling this?
#user2515526,
Usually changed diff is out of scope of file watchers' functionality, because, you know, you could change an image, and a watcher would need to keep a track several Mb of a diff in memory, and what if we have thousands of files?
However, as bad as it sounds, this may be exactly the way you want to implement this (sure, depends on your app, etc. - could be fine for text files), i.e. - keeping a map of diffs (1 diff per file) since last modification. Cannot say I like it, but sounds like fsnotify has no support for changes/diffs that you need.
Also, regarding your question about running in a loop, maybe you can get some hints here: https://github.com/kataras/iris/blob/8370d76910cdd8de043753ed81ae080eae8dc798/utils/file.go
Its a framework that allows to build a server that watches for TypeScript file changes. So sounds similar to your case/question.
Cheers,
-D
This question already has answers here:
Monitor directory content change
(2 answers)
Closed 8 years ago.
I want to write a shell script which checks all the logic files placed at /var/www/html/ and, if a user makes any change in those files, sends an alert to an administrator informing them that "User x has made change in f file." I do not have much experience in writing shell scripts, and for this I do not know how to start. Any help will be highly appreciated.
This is answered in superuser: https://superuser.com/questions/181517/how-to-execute-a-command-whenever-a-file-changes
Basically you use inotifywait
Simple, using inotifywait:
while inotifywait -e close_write myfile.py; do ./myfile.py; done
This has a big limitation: if some program replaces myfile.py with a different file, rather than writing to the existing myfile, inotifywait will die. Most editors work that way.
To overcome this limitation, use inotifywait on the directory:
while true; do
change=$(inotifywait -e close_write,moved_to,create .)
change=${change#./ * }
if [ "$change" = "myfile.py" ]; then ./myfile.py; fi
done
The asker has even put a script online for that, which can be called like this:
while sleep_until_modified.sh derivation.tex ; do latexmk -pdf derivation.tex ; done
To answer your question directly, you should probably take a look at the Advanced Bash Scripting Guide. It's about Bash specifically, but you may find it helpful even if you're not using Bash. As far as watching for changes in files, try inotify. There are also tools available to make it usable directly from the command line, which have worked quite nicely in my experience.
Now, there are a few other ways you might approach this problem. You might look at md5deep and ssdeep. They are tools designed for digital forensics which can create a list of cryptographic hashes (in the case of md5deep) or fuzzy hashes (in ssdeep's case) and later scan against that list and tell you which files have appeared, disappeared, changed, moved, etc. Or, if you want to detect potentially unauthorized changes, you might look at a host-based intrusion detection system such as OSSEC. Apparently, these can both scan for file changes and watch for other signs of unauthorized activity.
Want to upgrade my file management productivity by replacing 2 panel file manager with command line (bash or cygwin). Can commandline give same speed? Please advise a guru way of how to do e.g. copy of some file in directory A to the directory B. Is it heavy use of pushd/popd? Or creation of links to most often used directories? What are the best practices and a day-to-day routine to manage files of a command line master?
Can commandline give same speed?
My experience is that commandline copying is significantly faster (especially in the Windows environment). Of course the basic laws of physics still apply, a file that is 1000 times bigger than a file that copies in 1 second will still take 1000 seconds to copy.
..(howto) copy of some file in directory A to the directory B.
Because I often have 5-10 projects that use similar directory structures, I set up variables for each subdir using a naming convention :
project=NewMatch
NM_scripts=${project}/scripts
NM_data=${project}/data
NM_logs=${project}/logs
NM_cfg=${project}/cfg
proj2=AlternateMatch
altM_scripts=${proj2}/scripts
altM_data=${proj2}/data
altM_logs=${proj2}/logs
altM_cfg=${proj2}/cfg
You can make this sort of thing as spartan or baroque as needed to match your theory of living/programming.
Then you can easily copy the cfg from 1 project to another
cp -p $NM_cfg/*.cfg ${altM_cfg}
Is it heavy use of pushd/popd?
Some people seem to really like that. You can try it and see what you thing.
Or creation of links to most often used directories?
Links to dirs are, in my experience used more for software development where a source code is expecting a certain set of dir names, and your installation has different names. Then making links to supply the dir paths expected is helpful. For production data, is just one more thing that can get messed up, or blow up. That's not always true, maybe you'll have a really good reason to have links, but I wouldn't start out that way, just because it is possible to do.
What are the best practices and a day-to-day routine to manage files of a command line master?
( Per above, use standardized directory structure for all projects.
Have scripts save any small files to a directory your dept keeps in the /tmp dir, .
i.e /tmp/MyDeptsTmpFile (named to fit your local conventions) )
It depends. If you're talking about data and logfiles, dated fileNames can save you a lot of time. I recommend dateFmts like YYYYMMDD(_HHMMSS) if you need the extra resolution.
Dated logfiles are very handy, when a current process seems like it is taking a long time, you can look at the log file from a week ago and quantify exactly how long this process took, a week, month, 6 months (up to how much space you can afford). LogFiles should also capture all STDERR messages, so you never have to re-run a bombed program just to see what the error message was.
This is Linux/Unix you're using, right? Read the man page for the cp cmd installed on your machine. I recommend using an alias like alias CP='/bin/cp -pi' so you always copy a file with the same permissions and with the original files' time stamp. Then it is easy to use /bin/ls -ltr to see a sorted list of files with the most recent files showing up at the bottom of the list. (No need to scroll back to the top, when you sort by time,reverse). Also the '-i' option will warn you that you are going to overwrite a file, and this has saved me more than a couple of times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.
I'm learning UNIX/LINUX shell scripting and trying to think about it appropriate usage?
The only thing that comes into mind - it'll be nice for let's say backup operations and logs management....But I'm sure it goes way beyond that...or is it?
I'm sure there are people on this server who use Shell scripting on the daily basis.
Can you tell me what do you use it for in your organization/business?
Thanks:)
Why use shell scripts
Basically, there are any number of tasks related to backup, maintenance, etc. that need to be automated, and shell scripts do that.
You can do quite everything in shell, but it is easy to write ugly and slow scripts.
First domain of expertise of shells is to start and combine other programs. This is exceptionally well suited for:
file manipulations: list, move, copy, compress, archive
text lines manipulation: filter (grep), modify (sed), delete lines (sed), combine files (paste), sort (sort), unify (sort -u)
All those operation are NOT shell operation, but the shell is the glue that put them all together.
file operations are generally combined with flow control instructions (while, if, for)
line operations are combined with pipes | and named pipe mkfifo
Things you can do in less than 20 lines with shell commands.
I personally use it to batch miscellaneous daily/weekly commands and start up long running processes. They can be unwieldy and hard to debug when they get big. Unknown variables evaluate to empty strings (icky).
Scripting languages languages such as Python, Perl, and Ruby become more attractive as the code becomes more complex.
I work on an actively developed software project that runs in a unix environment. Unfortunately it uses a lot of different environment variables for configuration and stashes binary programs, data files, and shared libraries on version dependent paths.
All that is a pain to set up.
But it gets worse: at any given time I might want to work with the stable version, the pretty-stable-but-more-up-to-date version, the bleeding-edge-every-new-feature version, or my personally hacked development version.
Switching between them is a even bigger hassle.
Enter a shell script which insures that I am set up for exactly one version at a time. Ta da!
BTW--The script I use for this makes extensive use of the accepted answer to How do I manipulate $PATH elements in shell scripts?, so you know Stack Overflow works for me in the real world. More over, I've infected several other people with this technology.
I've seen and worked-on full-blown applications (medical records and scheduling processing) written in Korn shell.
Batch programming, PostScript print filters, automatic mailers and automated airline checkin systems, regular stock price tracking, software installers, et al, et al.
Better question = what could not be programmed in Shell?
for our company, we use shell scripts for the following:
backups - it would be very disastrous for us if we lose our data. Various parts of our backup like database backup, offsite backup, continuous backups etc all uses shell scripts that runs daily and some runs once a week.
update dates - we do not use ntp so we rely on sh scripts to update the date due to firewall restrictions.
log cleanup
send emails
I didn't think bash programming was particularly powerful until I saw that the OS startup scripts are all written in it. That made me re-examine my assumptions. I now have several dozen important shell scripts that I've written over the years that automate some common tasks.
For example, I wrote one that polls the current load average, and then executes a provided command if it exceeds a certain value (useful for examining events that only happen once or twice a day).
Another that I wrote iterates through all the mysql databases on the server and outputs a mysqldump for each one into its own appropriately-named .sql file.
Another iterates through a list of homedirs and changes the ownership of all the files under the corresponding public_html dir to match the user who should own them to be compliant with suPHP's restrictions.
Another examines the current hardware configuration and downloads, installs, and configures appropriate software for monitoring the health of the currently-attached RAID controller.
These are all relatively simple tasks that could be done by hand -- but whenever I find myself doing the same task more than once, I write a shell script to automate the process.
I also built a base-64 decoder in bash just to see if I could. It works, but it's terribly slow. I use shell scripting for simple tasks that primarily involve executing other programs. I often use Perl when a significant amount of string processing is required, and I use Python for the more complex scripting tasks. The more languages you know, the better you will be at choosing the right one for the job.
First off; I am not necessarily looking for Delphi code, spit it out any way you want.
I've been searching around (especially here) and found a bit about people looking for ways to compare to directories (inclusive subdirs) though they were using byte-by-byte methods. Second off, I am not looking for a difftool, I am "just" looking for a way to find files which do not match and, just as important, files which are in one directory but not the other and vice versa.
To be more specific: I have one directory (the backup folder) which I constantly update using FindFirstChangeNotification. Though the first time I need to copy all files and I also need to check the backup directory against the original when the applications starts (in case something happened when the application wasn't running or FindFirstChangeNotification didn't catch a file change). To solve this I am thinking of creating a CRC list for the backed up files and then run through the original directory computing the CRC for every file and finally compare the two CRCs. Then somehow look for files which are in one directory and not the other (again; vice versa).
Here's the question: Is this the fastest way? If so, how would one (roughly) get the job done?
You don't necessarily need CRCs for each file, you can just compare the "last modified" date for every file for most normal purposes. It's WAY faster. If you need additional safety, you can also compare the lengths. You get both of these metrics for free with the find functions.
And in your change notification, you should probably add the files to a queue and use a timer object to copy the new queued files every ~30sec or something, so you don't bog down the system with frequent updates/checks.
For additional speed, use the Win32 functions wherever possible, avoid any Delphi find/copy/getfileinfo functions. I'm not familiar with the Delphi framework but for example the C# stuff is WAY WAY WAY slower than the Win32 functions.
Regardless of you "not looking for a difftool", are you opposed to using Cygwin with it's "diff" command for the shell? If you are open to this its quite easy, particularly using diff with the -r "recursive" option.
The following generates the differences between 2 Rails installs on my machine, and greps out not only information about differences between files but also, specifically by grepping for 'Only', finds files in one directory, but not the other:
$ diff -r pgnindex pgnonrails | egrep '^Only|diff'
Only in pgnindex/app/controllers: openings_controller.rb
Only in pgnindex/app/helpers: openings_helper.rb
Only in pgnindex/app/views: openings
diff -r pgnindex/config/environment.rb pgnonrails/config/environment.rb
diff -r pgnindex/config/initializers/session_store.rb pgnonrails/config/initializers/session_store.rb
diff -r pgnindex/log/development.log pgnonrails/log/development.log
Only in pgnindex/test/functional: openings_controller_test.rb
Only in pgnindex/test/unit: helpers
The fastest way to compare one directory on the local machine to a directory on another machine thousands of miles away is exactly as you propose:
generate a CRC/checksum for every file
send the name, path, and CRC/checksum for each file over the internet to the other machine
compare
Perhaps the easiest way to do that is to use rsync with the "--dryrun" or "--list-only" option.
(Or use one of the many applications that use the rsync algorithm,
or compile the rsync algorithm into your application).
cd some_backup_directory
rsync --dryrun myname#remote_host:latest_version_directory .
For speed, the default rsync assumes, as Blindy suggested, that two files with the same name and the same path and the same length and the same modification time are the same.
For extra safety, you can give rsync the "--checksum" option to ignore the length and modification time and force it to compare (the checksum of) the actual contents of the file.