Shell script - Multiple users appending to the same file at the same time

I have a script on my server that reports its performance to the user in a text file. When the same script is executed in parallel by multiple users, the information in the text file gets mixed up. I append many details about the server to the text file, which takes a little under a minute to produce the full output. If I use file locking, will it hurt performance, or is there another approach I should look at?
Please help me on how to proceed.
Thanks
Balakrishnan

You could make use of a message queueing system:
POSIX message queues:
http://www.linuxhowtos.org/manpages/7/mq_overview.htm
Beanstalkd: http://kr.github.io/beanstalkd/
POSIX Message Queue for Ruby: http://rubygems.org/gems/posix_mq
Perl: http://search.cpan.org/~iljatabac/POSIX-RT-MQ-0.03/MQ.pm
Python IPC: http://semanchuk.com/philip/posix_ipc/
Other threads:
Are message queues obsolete in linux?
https://unix.stackexchange.com/questions/70837/linux-command-to-check-posix-message-queue
https://stackoverflow.com/questions/40296/what-is-the-best-free-tool-for-managing-msmq-queues-and-messages
The idea is to create a server process that receives messages and stores them in a buffer, printing a line to the logfile only once it has received a complete line from a given process.
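A minimal sketch of that idea in shell, using a named pipe in place of a full message queue (the FIFO path, log name, and sample message are illustrative, not from the original post):
mkfifo /tmp/perf.fifo                  # illustrative path
# "Server": a read-write open never blocks and keeps a writer attached,
# so the loop does not see EOF when clients disconnect (a Linux behavior).
(
  exec 3<> /tmp/perf.fifo
  while IFS= read -r line <&3; do
    printf '%s\n' "$line" >> performance.log
  done
) &
# Clients: send one complete line per message; writes up to PIPE_BUF
# bytes are atomic on a pipe, so lines from different users never mix.
echo "user=$USER load=$(uptime)" > /tmp/perf.fifo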

FLoM http://sourceforge.net/projects/flom/ can manage the lock you need: it's easy to use, it's fast, the same resource can be locked/unlocked by different users, and it implements a rich lock model.
This example use case could give you some ideas about the tool: http://sourceforge.net/p/flom/wiki/Use%20Case%206/
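A rough sketch of FLoM's command wrapper, following its tutorial examples (the script and log names are illustrative; double-check the exact flags against the FLoM docs):
# Each user runs the report through flom, which serializes the wrapped
# command against everyone else using the same (default) lock resource.
flom -- /usr/local/bin/collect_server_details.sh >> performance.txt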
Cheers
Ch.F.

Related

Limitations on file append when used in a multi-process environment

My process creates a log file and appends a new line at the end of the file by opening it in append mode ("a"), e.g.:
fopen("log.txt", "a");
The order of the writes is not critical, but I need to ensure that fopen always succeeds. My question is: can the call above be executed from multiple processes at the same time on Windows, Linux and macOS without any race condition?
If not, what is the most common and easy way to ensure I can write to the log file? There is file locking, but a separate lock file (e.g. log.txt.lock) is also possible. Could anyone share some insights or resources that go into more detail?
If you do not use any synchronization between the processes, you will very likely hit moments when several processes try to write to the file at once, and the best you can get is a jumble of interleaved strings.
To synchronize work across several processes, use a Lock from the multiprocessing module. It prevents several processes from doing some piece of work simultaneously.
It will look something like this:
import multiprocessing

def do_some_work(lock, path):
    # Only one process at a time may hold the lock and append.
    with lock:
        with open(path, "a") as f:
            f.write("one complete log line\n")

if __name__ == "__main__":
    lock = multiprocessing.Lock()  # create in the parent, pass to children
    workers = [multiprocessing.Process(target=do_some_work,
                                       args=(lock, "log.txt")) for _ in range(4)]
    for w in workers: w.start()
    for w in workers: w.join()
If you need a more detailed example, feel free to ask.
You can also check the example in the official docs.

Reading file in parallel from multiple processes

I'm running multiple processes in parallel, and each of these processes reads the same file in parallel. Some of the processes seem to see a corrupted version of the file if I increase the number of processes to more than 15 or so. What is the recommended way of handling such a scenario?
More details:
The file being read in parallel is actually a Perl script. The multiple jobs are Python processes, and each of them launches this Perl script independently with different input parameters. When the number of jobs is increased, some of them report that the Perl script has invalid syntax (which is not true). Hence, I suspect that those jobs read corrupted versions of the script.
I'm running all of this on a 32core machine.
If any process is also writing to the file, then you need to enforce some synchronization, for example with a global named mutex.
If there is no asynchronous writing going on, I would not expect to see corruption during the reads. Are you opening the files with "r" access? If you're still running into trouble, it might be worth experimenting with a smaller read buffer size, or calling a native Win32 API for the file access.
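On a Unix-like system, one way to get that kind of mutex from the shell is flock(1) from util-linux; a hedged sketch (the lock-file path and script names are illustrative):
# Readers take a shared lock, so they run concurrently with each other
# but never overlap the writer, which takes an exclusive lock.
flock --shared /tmp/myscript.lock perl myscript.pl input1       # each reader job
flock --exclusive /tmp/myscript.lock cp new_version myscript.pl  # the updater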
Good luck!

2-way communication with background process (I/O)

I have a program that runs in the command line (i.e. $ run program starts up a prompt) and performs mathematical calculations. It has its own prompt that takes text input and responds through standard out/error (or creates a separate X window if needed, but this can be disabled). Sometimes I would like to send it small inputs, and other times a large text file with a series of inputs, one per line. This program takes a lot of resources and has a long startup time, so it would be best to have only one instance of it running at a time. I could keep the program's prompt open and supply the input that way, or I can pipe the input in, ending with an exit command (to leave the prompt), which just prints the output. The problem with sending the request along with an exit command is that the program must start up each time (slow ...). Furthermore, the output of this program is sometimes cryptic, and it would be helpful to filter it in some way (e.g. simplify the output, apply ANSI colors, etc.).
This all makes me want to put some two-way I/O filter (or is that a "pipe"? or a "wrapper"?) around the program so that it can run in the background as a single process. I would then communicate with it without having to restart it, all while filtering its output to be more user friendly. I have been looking all over for ideas and am stumped at how to accomplish this in some simple, shell-accessible manner.
Some things I have tried were redirecting stdin and stdout to files, but the program hangs (doesn't quit) and only reads the file once, leaving me unable to continue the communication. I think this is because the prompt is waiting for more user input after the EOF. I thought this could be set up as a local server, but I am uncertain how to begin accomplishing that.
I would love to find some simple way to accomplish this. Additionally, if you can think of a way to do it, is there also a way to attach to or detach from the prompt on request? Any help and ideas would be greatly appreciated.
You could create two named pipes (man mkfifo) and redirect input and output:
myprog < fifoin > fifoout
Then you could open new terminal windows and do this in one:
cat > fifoin
And this in the other:
cat < fifoout
(Or use tee to save the input/output as well.)
To dump a large input file into the program, use:
cat myfile > fifoin
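To address the "reads the file once" problem from the question, a hedged sketch: hold the write end of the input FIFO open so myprog never sees EOF between commands (the fd number is arbitrary, and myprog stands for the asker's program):
mkfifo fifoin fifoout
myprog < fifoin > fifoout &
exec 3> fifoin      # keep a writer attached so myprog never sees EOF
cat fifoout &       # watch (or filter) the responses as they arrive
echo "2+2" >&3      # send a command whenever you like
# later: 'exec 3>&-' closes the pipe, letting myprog see EOF and exit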

What will happen if two different programs try to write to the same file simultaneously?

What will happen if two different programs try to write to the same file simultaneously?
Will one of the programs experience a file lock error?
How should programs be designed to handle this scenario?
What happens when the second app (or thread) opens the file for writing depends on the platform: on Windows, the open typically throws an I/O exception if the first opener did not allow write sharing; on POSIX systems both opens succeed and the writes simply interleave.
Say you have user A and user B. You can let both of them modify the content; however "simultaneous" you want it to be, there will be a small difference in time. So check which user "submitted" their changes first, save those changes, and prompt a clear message to the next user saying "the file has been updated, review the changes before saving... blah blah".
Use a file lock (e.g. Java's FileLock) to avoid the I/O exception when the file is being accessed by multiple threads or processes.
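On Unix-like systems, the shell-level counterpart is advisory locking with flock(1) from util-linux; a minimal sketch (the lock-file name is illustrative):
(
  flock -x 9                              # block until we hold the exclusive lock
  printf '%s\n' "output from program A" >> shared.log
) 9> /tmp/shared.log.lock                 # fd 9 carries the lock for the subshell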

Verify whether ftp is complete or not?

I have an application that continuously polls a folder. Once any file is FTP'd into the folder, the application has to move it to another folder for processing.
Here, we don't have any way to verify whether the FTP transfer is complete or not.
The command "lsof" was suggested in technical forums. It has a file-descriptor column that gives the file's status.
Since this is a FreeBSD command and is not present in old versions of Linux, I want to clarify its usage.
Can you tell me about your experience with file verification, and is there any other alternative solution available?
Also, is there any risk in using this utility?
Appreciate your help in advance.
Thanks,
Mathew Liju
We've done this before in a number of different ways.
Method one:
If you can control the process sending the files, have it send the file itself followed by a sentinel file. For example, send the real file "contracts.doc" followed by a one-byte "contracts.doc.sentinel".
Then have your listener process watch out for the sentinel files. When one of them is created, you should process the equivalent data file, then delete both.
Any data file that's more than a day old and doesn't have a corresponding sentinel file should be removed; it was a failed transmission.
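A hedged sketch of the listener side of method one (the directory layout and the process command are placeholders):
# Process each data file only after its sentinel appears, then remove both.
for s in /var/ftp/incoming/*.sentinel; do
    [ -e "$s" ] || continue                  # glob matched nothing
    data=${s%.sentinel}
    process "$data" && rm -f "$data" "$s"    # 'process' is a placeholder
done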
Method two:
Keep an eye on the files themselves (specifically the last modification date/time). Only process files whose modification time is more than N minutes in the past. That increases the latency of processing the files but you can usually be certain that, if a file hasn't been written to in five minutes (for example), it's done.
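Method two maps naturally onto find's -mmin test; a sketch assuming a five-minute quiet period, matching the example above (directories are illustrative):
# Move only files whose last modification is more than 5 minutes ago.
find /var/ftp/incoming -type f -mmin +5 -print0 |
while IFS= read -r -d '' f; do
    mv "$f" /var/ftp/ready/
done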
Conclusion:
Both those methods have been used by us successfully in the past. I prefer the first but we had to use the second one once when we were not allowed to change the process sending the files.
The advantage of the first one is that you know the file is ready when the sentinel file appears. With both lsof (I'm assuming you're treating files that aren't open by any process as ready for processing) and the timestamps, it's possible that the FTP crashed in the middle and you may be processing half a file.
There are normally three approaches to this sort of problem.
providing a signal file so that when your file is transferred, an additional file is sent to mark that transfer is complete
add an entry to a log file within that directory to indicate a transfer is complete (this really only works if you have a single peer updating the directory, to avoid concurrency issues)
parsing the file to determine completeness, e.g. does the file start with a length field, or is it obviously incomplete? Parsing an incomplete XML file, for instance, will fail due to the missing end element. Depending on your file's size and format, this check can be trivial or very time-consuming (a sketch for the XML case follows this list).
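For the XML case in the third approach, a hedged sketch using xmllint from libxml2 (directory names are illustrative):
# xmllint exits non-zero on an incomplete or ill-formed document.
for f in /var/ftp/incoming/*.xml; do
    if xmllint --noout "$f" 2>/dev/null; then
        mv "$f" /var/ftp/ready/
    fi
done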
lsof would possibly be an option, although you've identified your Linux portability issue. If you use it, note the -F option, which formats the output suitably for processing by other programs rather than for human readers.
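A sketch of the lsof check, assuming "no process has the file open" means "transfer finished" (which, as noted above, a crashed FTP can violate; directories are illustrative):
# lsof exits 0 when at least one process has the file open.
for f in /var/ftp/incoming/*; do
    if ! lsof "$f" > /dev/null 2>&1; then
        mv "$f" /var/ftp/ready/
    fi
done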
EDIT: Pax identified a fourth (!) method I'd forgotten - using the fact that the timestamp of the file hasn't updated in some time.
There is a fifth method: you can also check whether the FTP session is still active. This works if every peer has its own FTP user account; as long as the user is not logged off from FTP, assume the files are not complete.
