How to create a custom HBase command in JRuby? - hadoop

I am new to Hadoop and to IT in general. I want to know whether I can create a custom HBase shell command similar to the built-in scan and put commands. I have a sample JRuby script, client.rb, that takes a table name, column family, and limit as input and outputs the row ID and value. I can find the Ruby scripts for the default commands, such as scan.rb and put.rb, in the $HBASE_HOME/src/main/ruby/shell folder. If I want my custom command's script to live in that folder and be usable from the hbase shell, what do I have to do?
hbase 0.94.10, Hadoop 1.2.1, Distribution: Apache
Seeking help please...

In addition to creating the ruby shell command like you've said, you also need to add said command to shell.rb.
See here for more information.

Related

Use non-built-in bash commands without modifying .bashrc

I'm working on a cluster and using a custom toolkit (specifically the SRA Toolkit). To use it, I first had to download it and unpack it into a specific folder in my directory.
Then I had to modify .bashrc to include the following segment:
# User specific aliases and functions
export PATH="$PATH:/home/MYNAME/APPS/SRATOOLS/bin"
Now I can use tools from the SRA Toolkit on the bash command line, e.g.
prefetch SR111111
My question is: can I use those tools without modifying my .bashrc?
The reason I want to do this is that I wrote a .sh script that takes a long time to run. My cluster uses the Sun Grid Engine job management system, and when I submitted my script to it, the process failed because an SRA Toolkit command I used was not recognized.
EDIT (1):
I modified the location where my prefetch command is, and now it looks like:
/MYNAME/APPS/SRA_TOOLS/bin
which is different from what is in .bashrc:
export PATH="$PATH:/home/MYNAME/APPS/SRATOOLS/bin"
Then I ran what @Darkman suggested (an if/then/else/fi block with the export under else). The output shows that it did not find the SRA Toolkit at first (because the path in .bashrc is different), but it picked it up in the else branch and the script runs normally. Weird. It works on my job management system.
Thanks everybody.
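The if/then/else/fi approach mentioned in the edit can be sketched roughly as follows. This is only a sketch under assumptions, not the suggestion as originally posted (which is not quoted in this thread): the function name ensure_on_path is hypothetical, and the directory in the usage lines is the one from the question and has to match where the toolkit is really unpacked. Because the export happens inside the job script itself, it does not depend on any shell startup file being read by the job system.

```shell
#!/bin/bash
# Hypothetical helper: make sure a tool is reachable, extending PATH only if needed.
ensure_on_path() {
    tool=$1   # command name to look for, e.g. prefetch
    dir=$2    # directory expected to contain it (an assumption; adjust to your install)
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool already on PATH"
    else
        export PATH="$PATH:$dir"
        echo "added $dir to PATH"
    fi
}

# Usage inside the job script, before the first toolkit command:
# ensure_on_path prefetch /home/MYNAME/APPS/SRATOOLS/bin
# prefetch SR111111
```

Calling the tool by its absolute path (for example /home/MYNAME/APPS/SRATOOLS/bin/prefetch) would also avoid touching PATH at all.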

Shell script does not run: simple

I'm writing a shell script to switch between Ruby versions, because my current rvm setup requires me to type 2-3 lines to switch versions, and I'm constantly doing this: I'm writing a Ruby app that requires 2.2.1 and LaTeX documents that require Ruby 1.9.1. My current code probably looks more like pseudocode, so please help me get it to run. Here's the code:
#!/bin/sh
/bin/bash --login
rvm list // this is an external shell command
echo -n Use which one? >
read text
rvm use $text // this is an external shell command
That script is problematic since it will run bash as a login shell and then refuse to run any of those other lines until you exit it.
You probably don't need a shell script for what you're trying to do, just have two aliases set up in your profile:
alias rlist='rvm list'
alias ruse='rvm use'
Then you can enter rlist if you want a list of them, or ruse 2.2.1 (for example) to select one.
Alternatively, as Walter A points out in a comment, you could also hard-code the possibilities assuming you don't want it too dynamic:
alias rbapp='rvm use 2.2.1'
alias rbltx='rvm use 1.9.1'
This has the added advantage of allowing you to do more things at the end if needed:
alias rbltx='rvm use 1.9.1; echo Using Latex ruby'
something that's not normally possible with aliases needing parameters.
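When a parameter is needed and something should also run afterwards, a shell function can cover the case that an alias cannot. A minimal sketch, assuming rvm is already loaded in the interactive shell; the function name rswitch and the message it prints are made up for illustration:

```shell
# Hypothetical function for a shell profile: takes the version as a parameter,
# then runs a follow-up command only if the switch succeeded.
rswitch() {
    if [ -z "$1" ]; then
        echo "usage: rswitch <ruby-version>" >&2
        return 1
    fi
    rvm use "$1" && echo "Now using ruby $1"
}

# Usage:
# rswitch 2.2.1
```

Like the aliases, the function has to be defined in the current shell (not run as a separate script) so that the version switch affects that shell.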

pigrc feature available?

Is there a way to automatically set certain variables when I invoke the interactive Pig Grunt shell? I understand that we could use the define/default commands, but then it is manual. A use case would be setting various variables that point to different HDFS paths. I also understand that such an option can be used when running a Pig file using
pig -param_file -f somefile.pig
But even if I pass -param_file when invoking the Pig shell (pig -param_file ), it does not work.
What I am looking for is something like the ".hiverc" file feature. Do we have one?
According to this JIRA, the feature already exists, but you need to be on pig-0.11.0 (or later) for it to work.

Pig in grunt mode

I have installed cygwin, hadoop and pig in windows. The configuration seems ok, as I can run pig scripts in batch and embedded mode.
When I try to run pig in grunt mode, something strange happens. Let me explain.
I try to run a simple command like
grunt> A = load 'passwd' using PigStorage(':');
When I press Enter, nothing happens. The cursor goes to the next line and the grunt> prompt no longer appears at all. It is as if I were typing in a text editor.
Has anything similar ever happened to you? Do you have any idea how I can solve this?
What you are observing is the expected behavior. I will take the Pig tutorial as an example.
The following command does not result in any activity by pig.
raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
But if you invoke a command that uses the data from the variable raw in a map-reduce job, that is when you will see some action in your Grunt shell. Something along the lines of the second command mentioned there:
clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);
Similarly, your command will not result in any action; you have to use the data from variable A in a way that triggers a map-reduce job to see some action in the Grunt shell:
grunt> A = load 'passwd' using PigStorage(':');
Pig only processes the commands when you use a command that creates output, namely DUMP (to the console) or STORE. You can also use DESCRIBE to get the structure of an alias and EXPLAIN to see the map/reduce plan.
So basically, DUMP A; will give you all the records in A.
Please try running it in the Windows command window:
C:\FAST\JDK64\1.6.0.31/bin/java -Xmx1000m -Dpig.log.dir=C:/cygwin/home/$USERNAME$/nubes/pig/logs -Dpig.log.file=pig.log -Dpig.home.dir=C:/cygwin/home/$USERNAME$/nubes/pig/ -classpath C:/cygwin/home/$USERNAME$/nubes/pig/conf;C;C:/FAST/JDK64/1.6.0.31/lib/tools.jar;C:/cygwin/home/$USERNAME$/nubes/pig/lib/jython-standalone-2.5.3.jar;C:/cygwin/home/$USERNAME$/nubes/pig/conf;C:/cygwin/home/$USERNAME$/nubes/hadoop/conf;C:/cygwin/home/$USERNAME$/nubes/pig/pig-0.11.1.jar org.apache.pig.Main -x local
Replace $USERNAME$ with your user ID.
Modify the class path and conf path accordingly.
It works well in both local and map-reduce mode.
The Pig shell hangs in Cygwin, but a Pig script executes successfully when it is run from a script file, as below:
$pig ./user/input.txt
For local mode:
pig -x local ./user/input.txt
I ran into the same problem yesterday, and I spent a whole day trying to work out whether something was wrong with my Pig installation or my hotkeys before I finally fixed it. It turned out that I had copied the Pig code from another source, and the curly quotation marks in it are not recognized by the Pig command line, which only accepts straight quotation marks, so the input never terminated.
My suggestion is to check for invalid characters in the code, especially when you copy code straight into the command line, since that often causes unexpected faults.
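One way to guard against this is to normalize the quotes before pasting. A minimal sketch using sed; the function name straighten_quotes and the file names in the usage line are hypothetical, and it only handles the four common curly quote characters:

```shell
# Hypothetical helper: replace curly single and double quotes with straight ones.
# Each character is substituted literally, so this does not rely on locale-aware
# bracket expressions.
straighten_quotes() {
    sed -e "s/‘/'/g" -e "s/’/'/g" -e 's/“/"/g' -e 's/”/"/g'
}

# Usage:
# straighten_quotes < copied.pig > clean.pig
```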

How to invoke a Ruby script containing a system command from a cron job?

I have a Ruby script containing a system command, like http://gist.github.com/235833. When I run this script from the shell, it works correctly, but when I add it to my cron job list, it no longer works. The cron job looks like this:
10/* * * * * cd /home/hekin; /usr/bin/ruby my_script.rb
Any idea what's wrong with what I've done? Thank you.
Thank you all for your answers.
It's my mistake.
Since I'm using SSH key forwarding on the local machine, the environment variables related to SSH key forwarding were all present when I executed the script from the shell, but in the cron job context those environment variables are missing.
Try to separate the things that might go wrong. The ones I can think of are:
The cron syntax - is the time value given legal and fitting your shell?
Permissions - execute permissions and read permissions for the relevant directory and file
Quoting - what scope does cron cover? Does it run only the first command?
In order to dissect this, I suggest you first run a really simple cron job, like 'ls'. Next run a single-liner script. Next embed your commands in a shell-script file. Somewhere along these lines you should find the problem.
The problem is your environment. While testing in your shell, the script is fully equipped and boosted by your shell environment. While running under cron, the environment is very, very stripped down.
Where is the destination "." for your script? I guess it will be "/" and not "$HOME", so your script won't be able to write to that location and fails. Try using an absolute path for the destination.
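To reproduce this kind of failure without waiting for cron, the command can be run from an interactive shell with a nearly empty environment. A rough sketch, not an exact replica of cron: the function name run_like_cron is hypothetical, and the PATH and SHELL values are assumptions that may differ from your system's cron defaults.

```shell
# Hypothetical helper: run a command line with a nearly empty environment,
# roughly like cron does. The PATH value is an assumption; check your
# system's cron documentation for the real default.
run_like_cron() {
    env -i HOME="$HOME" PATH=/usr/bin:/bin SHELL=/bin/sh /bin/sh -c "$1"
}

# Usage: variables such as SSH_AUTH_SOCK from the interactive shell are not passed on.
# run_like_cron 'cd /home/hekin; /usr/bin/ruby my_script.rb'
```

If the script fails here in the same way it fails under cron, a missing environment variable is the likely cause, and it can be set explicitly in the crontab or at the top of the script.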
