Parse a log file remembering last run - Windows

I have a simple script that opens a log file, parses through it looking for specific log entries/keywords, and triggers an alert for each entry that matches.
The problem that I am trying to solve is that I would like to modify the script to remember the alerts that were already sent during the last run, so that when the script re-runs it won't keep re-sending alerts it has already sent.
The coding language is Golang. What is a valid approach to do this? A database sounds like overkill, but I don't know what other alternatives are out there.

It depends on the nature of the log file: a server log (classic) or a transaction log.
Even assuming the former, it depends on its log management (long-term retention, rotation, ...).
Assuming a classic log file whose data is appended (not overwritten), a simple approach would be to record in a separate file the line where each alert is found.
At the next run, if that line matches one stored in that special "flag" file, the alert would not be sent again.
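For the Go side, here is a minimal sketch of that idea. The file names (app.log for the log, alerted.txt for the "flag" file) and the "ERROR" keyword are placeholders, and the real alerting call is stubbed out with a print:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// loadSeen reads the "flag" file of previously alerted lines into a set.
func loadSeen(path string) map[string]bool {
	seen := make(map[string]bool)
	f, err := os.Open(path)
	if err != nil {
		return seen // first run: no state file yet
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		seen[sc.Text()] = true
	}
	return seen
}

func main() {
	const logPath = "app.log"       // assumed log file name
	const statePath = "alerted.txt" // assumed state ("flag") file name

	seen := loadSeen(statePath)

	logFile, err := os.Open(logPath)
	if err != nil {
		panic(err)
	}
	defer logFile.Close()

	state, err := os.OpenFile(statePath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		panic(err)
	}
	defer state.Close()

	sc := bufio.NewScanner(logFile)
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "ERROR") && !seen[line] { // placeholder keyword match
			fmt.Println("ALERT:", line) // stand-in for the real alerting call
			fmt.Fprintln(state, line)   // remember it for the next run
			seen[line] = true
		}
	}
}
```

Another common variation on the same idea is to store just the byte offset where the last scan stopped and continue from there on the next run (re-scanning from the start if the file has shrunk, i.e. was rotated), which keeps the state file to a single number.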

Related

Does SQL*Loader have any functionality that allows for customizing the log file?

I have been asked to create a system for allowing third party companies to dump data into several of our tables. These third parties provide csv files on a periodic basis, and after doing some research it seemed like Oracle themselves had a standard tool for doing so, "sqlldr". I've since gotten it working to an acceptable degree, and we have a job scheduled to run that script once a day.
But one of the third parties supplies really dirty data, of the sort where I can't expect it to always load every row/record (looking like up to about 8% will fail). My boss asked me to forward "all output" from the first few tests to him, and like a moron I also sent the log file.
He has asked that this "report" be modified to include those exceptions that aren't unique constraints along with the line in the input file that caused the exception.
This means that I need data from the log file, but also from the (I believe) reject file in a single document. Rather than write a convoluted shell script to combine those two, does SQL*Loader itself allow any customization that might achieve the same thing? I've read through the Oracle documentation and haven't found anything that suggests this, but I've also learned not to trust it entirely either.
Is this possible? Ideally, the solution would allow me to add values to the reject file that don't exist in the original input file, but I'm also interested in any customization of the log file or reject file.
No.
I was going to stop there, but you can define the name of the log file, which might help with the issue. Most automation with SQL*Loader involves wrapping it within shell scripts; aka "roll your own."

How do you use logrotate with output redirect?

I'm currently running a ruby script which logs its HTTP traffic to stdout. Since I wanted the logs to be persistent, I redirected the output to a log file with ruby ruby_script.rb >> /var/log/ruby_script.log. However, the logs are now getting very large so I wanted to implement logrotate using the following:
"/var/log/ruby_script.log" {
missingok
daily
rotate 10
dateext
}
However, after running logrotate --force -v ruby_script where "ruby_script" is the name of the logrotate.d configuration file, no new file is created for the script to write to, and it writes to the rotated file instead. I'm guessing this behavior happens because the file descriptor that is passed by >> sticks to the file regardless of moving it, and is unrelated to the filename after the first call. Thus, my question is, what is the correct way to achieve the functionality I'm looking for?
Take a look at the copytruncate option.
From man logrotate:
copytruncate: Truncate the original log file to zero size in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program cannot be told to close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
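Applied to the configuration from the question, the rule would look something like this (the existing directives are unchanged; only copytruncate is added):

```
"/var/log/ruby_script.log" {
    missingok
    daily
    rotate 10
    dateext
    copytruncate
}
```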

NodeJS - failed to read newly uploaded file

I was trying to build a system (NodeJS + Express 4) that reads a user-uploaded text file, processes it, and feeds it back to the user. I was trying to use ajax upload, with multer as the parser for multipart data. The whole workflow is supposed to be like this:
User chooses a local file, and clicks the upload button.
Server receives the file and reads it.
Server does some processing with the data.
Server sends the results back.
Every part of the chain works except the server read step - sometimes the file is not read fully even though the server signals that the upload was completed (I have tried multiple libraries, like multer, busboy, and formidable, that trigger the file upload complete event). I have done various experiments, and here's what I found (with a 1000-line file):
fs.readFile sometimes ends prematurely. The result can be anywhere between 100 and 1000 lines.
The missing part is almost always the last small piece; it feels like the pipe was not fully flushed yet. I have tried file sizes between 1,000 and 200,000 lines, and it's always missing the last few hundred lines.
Using streaming (createReadStream, or the byline and line-by-line libraries) almost solved the issue, but sometimes the result is still 'undefined' or missing the last few lines, just a lot less frequently.
Triggering the read twice, the second read is almost guaranteed to return the full 1000 lines.
Is there any way to force NodeJS to 'flush' the uploaded file? Somehow I feel the upload complete event is triggered (regardless of library, and every one of them depends on the file system, I guess) before the last piece of the file has been flushed from the stream. Or maybe there is some other issue - reading static files always gives the correct results. I could use plain HTTP POST forms, but I'd like to use ajax to improve the user experience.
Any thoughts?

Obtaining Dynamically Changing Log Files

Does the problem I am facing have some kind of a fancy name like "Dining philosophers problem" or "Josephus problem" etc etc? This is so that I can do some research on it.
I want to retrieve the latest log file in Windows. The log file is renamed to log.2, log.3, log.4, and so on when it is full (50 MB, let's say), and incoming log entries are written to log.1.
Now, I have a solution to this: I poll the server intermittently to check whether the latest file (log.1) has any changes.
However, I soon found out that log.1 changes to log.2 at an unpredictable time, causing me to miss log data (because I will only retrieve log.1 if log.1 has any changes in its "Date Modified" property).
I hope there is some kind of allegory I can give to make this easy to understand. The closest thing I can relate it to is a stroboscope "freezing" a fan spinning at an unknown frequency: it gives the illusion that the fan is standing still, even though it has actually spun many times. You get the gist.
Thanks in advance.
The solution will be to have your program keep track of the last modified dates for both files log.1 and log.2. When you poll, check log.2 for changes and then check log.1 for changes.
Most of the time, log.2 will not have changed. When it does, you read the updated data there, and then read the updated data in log.1. In code, it would look something like this:
DateTime log1ModifiedDate // saved, and updated whenever it changes
DateTime log2ModifiedDate

if log2.DateModified != log2ModifiedDate
    Read and process data from log.2
    update log2ModifiedDate

if log1.DateModified != log1ModifiedDate
    Read and process data from log.1
    update log1ModifiedDate
I'm assuming that you poll often enough that log.1 won't have rolled over twice such that the file that used to be log.1 is now log.3. If you think that's likely to happen, you'll have to check log.3 as well as log.2 and log.1.
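For illustration, here is a rough sketch of that polling loop in Go (matching the language used in the question at the top of this page). The file names, the 10-second interval, and the print statements standing in for "read and process" are all placeholders:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// checkLog reports whether the file's modification time changed since last seen.
func checkLog(path string, last time.Time) (time.Time, bool) {
	info, err := os.Stat(path)
	if err != nil {
		return last, false // file may not exist yet
	}
	if info.ModTime().After(last) {
		return info.ModTime(), true
	}
	return last, false
}

func main() {
	var log1Mod, log2Mod time.Time // last modification times we have processed

	for {
		// Check the rolled-over file first, then the active one,
		// so data written just before a rollover is not missed.
		if t, changed := checkLog("log.2", log2Mod); changed {
			fmt.Println("log.2 changed, read its new data")
			log2Mod = t
		}
		if t, changed := checkLog("log.1", log1Mod); changed {
			fmt.Println("log.1 changed, read its new data")
			log1Mod = t
		}
		time.Sleep(10 * time.Second) // polling interval; tune to your rollover rate
	}
}
```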
Another way to handle this in Windows is to implement file change notification, which will tell you whenever a file changes in a directory. Those notifications are delivered to your program asynchronously. So rather than polling, you respond to notifications. In .NET, you'd use FileSystemWatcher. With the Windows API, you'd use FindFirstChangeNotification and associated functions. This CodeProject article gives a decent example.
Get the file list, sort it in descending order, take the first file, and read its log lines!

Verify whether FTP is complete or not?

I have an application which polls a folder continuously. Once any file is FTP'd to the folder, the application has to move this file to some other folder for processing.
Here, we don't have any way to verify whether the FTP transfer is complete or not.
The lsof command has been suggested in technical forums. It has a file descriptor column which gives the file's status.
Since this is a FreeBSD command and not present in old versions of Linux, I want to clarify the usage of this command.
Can you guys tell us your experience in file verification and is there any other alternative solution available?
Also, is there any risk in using this utility?
Appreciate your help in advance.
Thanks,
Mathew Liju
We've done this before in a number of different ways.
Method one:
If you can control the process sending the files, have it send the file itself followed by a sentinel file. For example, send the real file "contracts.doc" followed by a one-byte "contracts.doc.sentinel".
Then have your listener process watch out for the sentinel files. When one of them is created, you should process the equivalent data file, then delete both.
Any data file that's more than a day old and doesn't have a corresponding sentinel file should be removed - it was a failed transmission.
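For illustration, a rough Go sketch of the listener side of method one. The /data/incoming directory and the .sentinel suffix are placeholders, and the real processing step is stubbed out with a print:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	incomingDir := "/data/incoming" // assumed drop folder

	// Find sentinel files; each one marks a data file whose transfer is complete.
	matches, err := filepath.Glob(filepath.Join(incomingDir, "*.sentinel"))
	if err != nil {
		panic(err)
	}
	for _, sentinel := range matches {
		dataFile := strings.TrimSuffix(sentinel, ".sentinel")
		if _, err := os.Stat(dataFile); err != nil {
			continue // sentinel without a data file; leave it for cleanup
		}
		fmt.Println("processing:", dataFile) // stand-in for the real processing step
		os.Remove(dataFile)
		os.Remove(sentinel)
	}
}
```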
Method two:
Keep an eye on the files themselves (specifically the last modification date/time). Only process files whose modification time is more than N minutes in the past. That increases the latency of processing the files but you can usually be certain that, if a file hasn't been written to in five minutes (for example), it's done.
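For illustration, a minimal Go sketch of method two, assuming a hypothetical /data/incoming drop folder and a five-minute quiet period:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

func main() {
	const quietPeriod = 5 * time.Minute // "hasn't been written to in five minutes"
	incomingDir := "/data/incoming"     // assumed drop folder

	entries, err := os.ReadDir(incomingDir)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		if e.IsDir() {
			continue
		}
		info, err := e.Info()
		if err != nil {
			continue
		}
		// Only treat the file as complete once it has been idle long enough.
		if time.Since(info.ModTime()) > quietPeriod {
			fmt.Println("ready for processing:", filepath.Join(incomingDir, e.Name()))
		}
	}
}
```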
Conclusion:
Both those methods have been used by us successfully in the past. I prefer the first but we had to use the second one once when we were not allowed to change the process sending the files.
The advantage of the first one is that you know the file is ready when the sentinel file appears. With both lsof (I'm assuming you're treating files that aren't open by any process as ready for processing) and the timestamps, it's possible that the FTP crashed in the middle and you may be processing half a file.
There are normally three approaches to this sort of problem.
providing a signal file so that when your file is transferred, an additional file is sent to mark that transfer is complete
add an entry to a log file within that directory to indicate a transfer is complete (this really only works if you have a single peer updating the directory, to avoid concurrency issues)
parsing the file to determine completeness, e.g. does the file start with a length field, or is it obviously incomplete? For example, parsing an incomplete XML file will result in a parse error due to the missing end element. Depending on your file's size and format, this can be trivial, or it can be very time-consuming.
lsof would possibly be an option, although you've identified your Linux portability issue. If you use this, note the -F option, which formats the output suitable for processing by other programs, rather than being human-readable.
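For illustration, a rough Go sketch of that check, shelling out to lsof. The path is made up, and this only works when the check runs on the same host where the FTP server writes the file:

```go
package main

import (
	"fmt"
	"os/exec"
)

// stillOpen shells out to lsof; lsof normally exits 0 when at least one
// process has the file open and non-zero when none does (or on error),
// so a nil error here is treated as "somebody is still writing".
func stillOpen(path string) bool {
	return exec.Command("lsof", path).Run() == nil
}

func main() {
	path := "/data/incoming/contracts.doc" // assumed path of an incoming file
	if stillOpen(path) {
		fmt.Println("transfer probably still in progress, skip for now")
	} else {
		fmt.Println("no process has it open, safe to move and process")
	}
}
```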
EDIT: Pax identified a fourth (!) method I'd forgotten - using the fact that the timestamp of the file hasn't updated in some time.
There is a fifth method. You can also check whether the FTP session is still active. This will work if every peer has its own FTP user account. As long as the user is not logged off from FTP, assume the files are not complete.
