Pig shell setup: automatically executing Pig scripts - hadoop

Is there a way to automatically run a Pig script when invoking Pig from the command line?
The reason I'm wondering is that I have several import and define statements that I use over and over to set everything up. Is it possible to define this collection of statements somewhere so that when I start Pig, it will automatically execute those lines? I apologize in advance if this is something trivial that I missed in the documentation.

Yes, you can certainly do so from version 0.11 onwards.
You need to use the .pigbootup file.
Here is a nice blog post on setting up the .pigbootup file:
http://hadoopified.wordpress.com/2013/02/06/pig-specify-a-default-script/
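By default Pig looks for the file at $HOME/.pigbootup; it simply contains the statements you want executed on startup. A minimal sketch (the jar path and settings below are illustrative, not from the question):
REGISTER /usr/local/lib/my_udfs.jar;
DEFINE TOKENIZE org.apache.pig.builtin.TOKENIZE();
SET default_parallel 10;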
If you want to include Pig macros from a file, you can use the IMPORT command.
Take a look at http://pig.apache.org/docs/r0.9.1/cont.html#import-macros for reference.
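A sketch of the macro import, assuming a hypothetical macros file at the given path:
IMPORT '/path/to/my_macros.pig';
-- macros defined in my_macros.pig can now be invoked as if declared inline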

Related

naming convention of part files in HDFS

When we run an INSERT INTO command in Hive, the execution creates multiple part files in HDFS,
e.g. part-*-***** or 000000_0, 000001_0, etc., or something else.
Is there a configuration/setting that controls the naming of these part files?
The cluster I work on creates 000000_0, 000001_0, 000000_1, etc. I would like to change this to part- or text- etc. so that it's easier for me to pick these files up and merge them if needed.
If there is a setting that can be set in Hive right before executing the HQL, that would be ideal.
Thanks in advance.
I think you should be able to do it with:
set mapreduce.output.basename = part-;
This won't work. The only way I have found is with a custom file writer.
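If the end goal is just to pick the files up and merge them, one workaround that sidesteps the naming entirely is hadoop fs -getmerge, which concatenates every part file under a directory into a single local file regardless of how the parts are named (the paths here are illustrative):
hadoop fs -getmerge /user/hive/warehouse/my_table /tmp/my_table_merged.txt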

Adding Helper Methods to Mongo Shell

Is there any way of adding "helper" methods to the mongo shell that loads each time you use it?
Basically, whenever you want to query by _id, you have to do something like this:
db.collectionName.findOne({_id: ObjectId('THIS-IS-AN-OBJECTID')})
Whenever I'm going to be doing a lot of command line commands, I alias the ObjectId function to make it easier to type.
var ob = ObjectId;
db.collectionName.findOne({_id: ob('AN-OBJECTID')})
db.collectionName.findOne({_id: ob('ANOTHER-ONE')})
db.collectionName.findOne({_id: ob('ANOTHER')})
It would be great if there were a way to either run a specified piece of JS or add a chunk of code that runs each time mongo is pulled up from the shell. I checked out MongoDB's CLI documentation, but didn't see anything like that available, so I figured I would ask here.
I know there is a possibility of using this nefariously, so this might be a situation where it is unsupported by the mongo shell by default. Perhaps a helper bash script of some sort could launch the shell and then inject keyboard input to create the helper ob function? I'm not sure how this could be tackled personally, but I would love some insight on how to do something like this, either natively or through a helper script of some sort.
If you want code to execute every time you launch the shell, then whatever you place in .mongorc.js will be run on launch:
.mongorc.js File
When starting, mongo checks the user’s HOME directory for a JavaScript file named .mongorc.js. If found, mongo interprets the content of .mongorc.js before displaying the prompt for the first time. If you use the shell to evaluate a JavaScript file or expression, either by using the --eval option on the command line or by specifying a .js file to mongo, mongo will read the .mongorc.js file after the JavaScript has finished processing. You can prevent .mongorc.js from being loaded by using the --norc option.
So simply define your variable association there.
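For example, a minimal ~/.mongorc.js (the findById helper is our own illustration, not a built-in):
// alias ObjectId so ad-hoc queries are shorter to type
var ob = ObjectId;
// hypothetical helper: look a document up by its _id string
function findById(coll, id) {
    return db.getCollection(coll).findOne({_id: ObjectId(id)});
}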
You could also supply a file of your choice along with the --shell option to tell the command that you want the shell opened once any instructions in the file have completed:
mongo --shell file_with_javascript.js
But as mentioned, the .mongorc.js file would still be called (if present) unless the --norc option was also specified.

Setting multiple values for Vim command -complete attribute

I am trying to enable command completion for a custom command that I am setting up for a plugin in the following manner:
command! -complete=shellcmd -nargs=* EScratch call s:ShellScratch(<f-args>)
I would like to enable completion for both shellcmd and file. However, it seems that the -complete attribute only takes one option.
To give a bit more context as to what I am trying to achieve: I am working on a plugin to create a simple scratch buffer. I would like to be able to run a shell command from the command mode and copy the output to the scratch buffer. I have been able to achieve all this but it would be much more productive to have auto completion similar to shell. The complete script can be viewed here https://github.com/ifthikhan/vimscratch/blob/master/plugin/vimscratch.vim. Any pointers will be highly appreciated.
Unfortunately, you can't. If you really need this, you have to either
define two separate commands, e.g. :ScratchShell and :ScratchFile, with the corresponding completions, or
use a -complete=custom[list] and provide your own completion function, where you have to re-implement both sources yourself. Filename completion is actually quite easily done with glob(); I'm not so sure about shell commands (see the sketch below).
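A minimal sketch of the customlist route, reusing the command and function names from the plugin; the completion body itself is an assumption, and getcompletion() only exists in newer Vims (7.4.2011+):
function! s:ScratchComplete(ArgLead, CmdLine, CursorPos)
  " file names matching what has been typed so far
  let l:files = glob(a:ArgLead . '*', 0, 1)
  " shell commands, where getcompletion() is available; empty list otherwise
  let l:cmds = exists('*getcompletion') ? getcompletion(a:ArgLead, 'shellcmd') : []
  return l:files + l:cmds
endfunction
command! -complete=customlist,s:ScratchComplete -nargs=* EScratch call s:ShellScratch(<f-args>)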

PIG - LOAD continue on error

New to Pig.
I'm loading data into a relation like so:
raw_data = LOAD '$input_path/abc/def.*';
It works great, but if it can't find any files matching def.* the entire script fails.
Is there a way to continue with the rest of the script when there are no matches, and just produce an empty set?
I tried to do:
raw_data = LOAD '$input_path/abc/def.*' ONERROR Ignore();
But that doesn't parse.
You could write a custom load UDF that returns either the file or an empty tuple.
http://wiki.apache.org/pig/UDFManual
No, there is no such feature, at least none that I've heard of.
Also, I would say that "producing an empty set" is effectively "not running the script at all".
If you don't want to run a Pig script under some circumstances, then I recommend using wrapper shell scripts or Pig embedding:
http://pig.apache.org/docs/r0.11.1/cont.html
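For the wrapper approach, a minimal sketch (the paths and script name are illustrative): test for matching input with hadoop fs -ls, and only invoke Pig when the glob matches something.
#!/bin/sh
# run the Pig job only if at least one input file matches the glob
if hadoop fs -ls "$input_path/abc/def.*" > /dev/null 2>&1; then
  pig -param input_path="$input_path" my_script.pig
else
  echo "No matching input under $input_path/abc; skipping Pig run."
fi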

Cacti - multi CPU util - multi-line OID

I have the OID: .1.3.6.1.2.1.25.3.3.1.2
I get 24 rows back (I have a 24-core server),
I want to create one graph with all the rows to see the utilization.
Please help me :)
Thanks...
I had the same problem and created a data input method in Perl which uses Net::SNMP.
Get the script here:
https://gist.github.com/1139477
Get the data template here:
https://gist.github.com/1237260
Put the script into $CACTI_HOME/scripts, make sure it's executable, and import the template.
Make sure you have Perl's Net::SNMP installed.
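Illustrative setup steps (the script filename and package names are assumptions; adjust for your system):
# verify the OID answers before wiring it into Cacti (host/community are placeholders)
snmpwalk -v 2c -c public my-server .1.3.6.1.2.1.25.3.3.1.2
# install the script where Cacti expects it and make it executable
cp multi_cpu.pl $CACTI_HOME/scripts/
chmod +x $CACTI_HOME/scripts/multi_cpu.pl
# Net::SNMP from CPAN, or your distro's package (e.g. libnet-snmp-perl on Debian)
cpan Net::SNMP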
Have fun!
Alex.
