I need to write tests after an ETL process and need to format the output using standard error messages. What are the standard messages? Is there such a list?
ETL is not a standard or a standardized process; it is rather a reference to a group of common operations. Since there is no standard, there will not be a standard list of messages.
What is the procedure to define new external collectors in Bosun using scollector?
Can we write Python or shell scripts to collect data?
The documentation around this is not quite up to date. You can do it as described in http://godoc.org/bosun.org/cmd/scollector#hdr-External_Collectors, but we also support JSON output, which is better.
Either way, you write something and put it in the external collectors directory, followed by a frequency directory, and then an executable script or binary. Something like:
<external_collectors_dir>/<freq_sec>/foo.sh.
If the frequency directory is 0, then the script is expected to run continuously, and you put a sleep inside the code (this is my preferred method for external collectors). The script outputs the telnet format, or the undocumented JSON format, to stdout. Scollector picks it up and queues that information for sending.
I created an issue to get this documented not long ago https://github.com/bosun-monitor/bosun/issues/1225. Until one of us gets around to that, here is the PR that added JSON https://github.com/bosun-monitor/bosun/commit/fced1642fd260bf6afa8cba169d84c60f2e23e92
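To make this concrete, here is a minimal sketch of a continuously running Python collector (the zero-frequency style described above, placed under <external_collectors_dir>/0/); the metric name, host tag, and 15-second interval are illustrative assumptions, not anything scollector requires:

#!/usr/bin/env python
# Hypothetical continuously running external collector; the metric
# name "example.loadavg" and the host tag are made up.
import sys
import time

while True:
    with open("/proc/loadavg") as f:
        load1 = f.read().split()[0]
    # Telnet-style line: metricname timestamp value tag1=foo
    print("example.loadavg %d %s host=myhost" % (int(time.time()), load1))
    sys.stdout.flush()  # flush so scollector sees each line promptly
    time.sleep(15)      # sleep inside the script instead of exiting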
Adding to what Kyle said, you can take a look at some existing external collectors to see what they output. Here is one written in Java that one of our colleagues wrote to monitor JVM stuff. It uses the text format, which is simply:
metricname timestamp value tag1=foo tag2=bar
If you want to use the JSON format, here is an example from one of our collectors:
{"metric":"exceptional.exceptions.count","timestamp":1438788720,"value":0,"tags":{"application":"AdServer","machine":"ny-web03","source":"NY_Status"}}
And you can also send metadata:
{"Metric":"exceptional.exceptions.count","Name":"rate","Value":"counter"}
{"Metric":"exceptional.exceptions.count","Name":"unit","Value":"errors"}
{"Metric":"exceptional.exceptions.count","Name":"desc","Value":"The number of exceptions thrown per second by applications and machines. Data is queried from multiple sources. See status instances for details on exceptions."}`
Or send error messages to stderr:
2015/08/05 15:32:00 lookup OR-SQL03: no such host
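For illustration, here is a hedged Python sketch that produces the same kinds of output shown above (all metric names, tags, and values are made up): JSON datapoint and metadata lines on stdout, errors on stderr:

#!/usr/bin/env python
# Sketch of a collector using the JSON output format shown above;
# every name and value here is illustrative.
import json
import sys
import time

# One datapoint per line on stdout.
print(json.dumps({"metric": "example.exceptions.count",
                  "timestamp": int(time.time()),
                  "value": 0,
                  "tags": {"application": "AdServer", "machine": "ny-web03"}}))

# Metadata lines use capitalized keys, as in the examples above.
print(json.dumps({"Metric": "example.exceptions.count",
                  "Name": "unit", "Value": "errors"}))

# Errors go to stderr.
sys.stderr.write("lookup OR-SQL03: no such host\n")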
There are several messages from DFSORT, which is internally used by the COBOL program that has several sort operations. I would like to remove those DFSORT messages and retain only those from the COBOL program.
You have three options.
Use the OUTDD(ddname) Enterprise COBOL compiler option to change the DDName used for DISPLAY output.
Use DFSPARM, as you have discovered, to change the DDName SORT uses for its messages when it is invoked (called) from a program (as when using the SORT or MERGE verbs in COBOL).
Use the SORT-MESSAGE special-register.
If your SORT were stand-alone, you could also change the SORT message file by using the OPTION control statement, OPTION MSGDDN=ddname. DFSPARM is the way to provide OPTION for an invoked SORT/MERGE.
You also have Language Environment which can use SYSOUT during the run-unit, for messages from Language Environment (run-time errors, abends, requested information). There is a MSGFILE(ddname) run-time option to get LE to use a different ddname.
The easiest resolution to your problem is to use the OUTDD(ddname) compiler option. Then you don't have to worry about DFSORT (or SyncSORT at a different site) or Language Environment.
You can suppress all DFSORT messages with the MSGPRT option. You can treat multiple invocations of DFSORT differently by specifying a DFSPARM DD with FREE=CLOSE for each invocation.
DFSORT messages from a COBOL program (using an internal sort) can be redirected by specifying the MSGDDN ddname in the program's EXEC step in JCL.
e.g.
//DFSOUTDD DD DISP=SHR,DSN=XXX.DFSOUT
//DFSPARM DD *
MSGDDN=DFSOUTDD
/*
I have a script on my server that displays its performance to the user in a text file. When the same script is executed in parallel by multiple users, the information in the text file gets mixed up. I append many details of the server to the text file, which takes roughly less than a minute to produce the output. If I do file locking, will it hurt performance, or is there anything else I should look into?
You could make use of a message queueing system:
POSIX message queues:
http://www.linuxhowtos.org/manpages/7/mq_overview.htm
Beanstalkd: http://kr.github.io/beanstalkd/
POSIX Message Queue for Ruby: http://rubygems.org/gems/posix_mq
Perl: http://search.cpan.org/~iljatabac/POSIX-RT-MQ-0.03/MQ.pm
Python IPC: http://semanchuk.com/philip/posix_ipc/
Other threads:
Are message queues obsolete in linux?
https://unix.stackexchange.com/questions/70837/linux-command-to-check-posix-message-queue
https://stackoverflow.com/questions/40296/what-is-the-best-free-tool-for-managing-msmq-queues-and-messages
The idea is to create a server process that receives messages and stores them in a buffer, and only prints a line to the logfile once a message from a process forms a complete line.
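As a rough sketch of that idea with the posix_ipc module linked above (the queue name "/perf_log" and the line protocol are made up), each user's script sends complete lines to a queue and a single logger process drains it into the file:

# Sketch using the posix_ipc module linked above; the queue name
# "/perf_log" and the line format are illustrative.
import posix_ipc

QUEUE_NAME = "/perf_log"

def send_line(line):
    # Called by each user's script: send one complete line.
    mq = posix_ipc.MessageQueue(QUEUE_NAME, posix_ipc.O_CREAT)
    mq.send(line.encode("utf-8"))
    mq.close()

def run_logger(path):
    # Single logger process: drain the queue into the text file,
    # so writes from parallel users never interleave mid-line.
    mq = posix_ipc.MessageQueue(QUEUE_NAME, posix_ipc.O_CREAT)
    with open(path, "a") as f:
        while True:
            message, _priority = mq.receive()
            f.write(message.decode("utf-8") + "\n")
            f.flush()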
FLoM http://sourceforge.net/projects/flom/ can manage the lock you need: it's easy to use, it's fast, the same resource can be locked/unlocked by different users and it implements a rich lock model.
This example use case could give you some ideas about the tool: http://sourceforge.net/p/flom/wiki/Use%20Case%206/
I'm streaming data in a Pig script through an executable that returns an XML fragment for each line of input I stream to it. That XML fragment happens to span multiple lines, and I have no control whatsoever over the output of the executable I stream to.
In relation to Use Hadoop Pig to load data from text file w/ each record on multiple lines?, the answer suggested writing a custom record reader. The problem is, this works fine if you want to implement a LoadFunc that reads from a file, but to be able to use streaming it has to implement StreamToPig, and StreamToPig only lets you read one line at a time, as far as I understood.
Does anyone know how to handle such a situation?
If you are absolutely sure, then one option is to manage it internally in the streaming solution. That is to say, you build up the tuple yourself, and when you hit whatever your desired size is, you do the processing and return a value. In general, evalfuncs in Pig have this issue.
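As a sketch of that idea (assuming the fragments end with a known closing tag, here the made-up </record>), you could pipe the executable's output through a wrapper that collapses each fragment onto one line before it reaches StreamToPig:

#!/usr/bin/env python
# Hedged sketch: stream through this wrapper so each multi-line XML
# fragment becomes a single line, which a line-oriented StreamToPig
# implementation can then read as one record. The closing tag
# "</record>" is an assumption about the XML.
import sys

buf = []
for line in sys.stdin:
    buf.append(line.strip())
    if line.strip().endswith("</record>"):
        sys.stdout.write(" ".join(buf) + "\n")
        sys.stdout.flush()
        buf = []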
Using Google protobuf, I am saving my serialized message data to a file; each file contains several messages. We have both C++ and Python versions of the code, so I need to use protobuf functions that are available in both languages. I have experimented with using SerializeToArray and SerializeAsString, and there seem to be the following unfortunate conditions:
SerializeToArray: As suggested in one answer, the best way to use this is to prefix each message with its data size. This would work great for C++, but in Python it doesn't look like this is possible - am I wrong?
SerializeAsString: This generates a serialized string equivalent to its binary counterpart - which I can save to a file, but what happens if one of the characters in the serialization result is \n - how do we find line endings, or the ending of messages for that matter?
Update:
Please allow me to rephrase slightly. As I understand it, I cannot write binary data in C++ because then our Python application cannot read the data, since it can only parse string serialized messages. Should I then instead use SerializeAsString in both C++ and Python? If yes, then is it best practice to store such data in a text file rather than a binary file? My gut feeling is binary, but as you can see this doesn't look like an option.
We have had great success base64-encoding the messages and using a simple \n to separate messages. This will of course depend a lot on your use case - we needed to store the messages in "log" files. There is naturally overhead in encoding/decoding, but this has not even remotely been an issue for us.
The advantage of keeping these messages as line-separated text has so far been invaluable for maintenance and debugging. Figure out how many messages are in a file? wc -l. Find the Nth message? head ... | tail. Figure out what's wrong with a record on a remote system you need to access through 2 VPNs and a Citrix solution? Copy-paste the message and mail it to the programmer.
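A minimal Python sketch of that scheme (MyMessage stands in for your generated protobuf class; the C++ side would mirror it):

# Sketch of the base64-plus-newline scheme; MyMessage is a
# hypothetical generated protobuf class.
import base64

def append_message(path, msg):
    # base64 output contains no newline, so '\n' safely delimits records.
    with open(path, "ab") as f:
        f.write(base64.b64encode(msg.SerializeToString()) + b"\n")

def read_messages(path):
    with open(path, "rb") as f:
        for line in f:
            msg = MyMessage()
            msg.ParseFromString(base64.b64decode(line.rstrip(b"\n")))
            yield msg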
The best practice for concatenating messages in this way is to prepend each message with its size. That way you read in the size (try a 32-bit int or something), then read that number of bytes into a buffer and deserialize it. Then read the next size, and so on.
The same goes for writing, you first write out the size of the message, then the message itself.
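Contrary to the assumption in the question, this is possible in Python too; here is a minimal sketch with the standard struct module (MyMessage again stands in for a hypothetical generated class, and files must be opened in binary mode):

# Length-prefixed framing in Python; MyMessage is a hypothetical
# generated protobuf class.
import struct

def write_message(f, msg):
    data = msg.SerializeToString()
    f.write(struct.pack("<I", len(data)))  # 32-bit little-endian size
    f.write(data)

def read_messages(f):
    while True:
        header = f.read(4)
        if len(header) < 4:
            break  # clean end of file
        (size,) = struct.unpack("<I", header)
        msg = MyMessage()
        msg.ParseFromString(f.read(size))
        yield msg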
See Streaming Multiple Messages on the protobuf documentation for more information.
Protobuf is a binary format, so reading and writing should be done as binary, not text.
If you don't want binary format, you should consider using something other than protobuf (there are lots of textual data formats, such as XML, JSON, CSV); just using text abstractions is not enough.