Is there a way to "automatically" set certain variables when I invoke the Pig Grunt interactive shell? I understand that we could use the define/default commands, but that is manual. A use case would be setting various variables that point to different HDFS paths. I also understand that such an option can be used when calling a Pig file using:
pig -param_file -f somefile.pig
But even if I pass -param_file when invoking the Pig shell (pig -param_file ), it does not work.
What I am looking for is something like a ".hiverc" file feature. Do we have one?
According to this JIRA, the feature already exists, but you need to be on pig-0.11.0 (or later) for it to work.
Related
I'm working on a cluster and using custom toolkits (more specifically, the SRA Toolkit). In order to use it, I first had to download and unpack it into a specific folder in my directory.
Then I had to modify .bashrc to include the following segment:
# User specific aliases and functions
export PATH="$PATH:/home/MYNAME/APPS/SRATOOLS/bin"
Now I can use the SRA Toolkit commands from the bash command line, e.g.
prefetch SR111111
My question is: can I use those tools without modifying my .bashrc?
The reason I want to do that is that I wrote a .sh script that takes a long time to run. My cluster uses the Sun Grid Engine job management system, and when I submitted my script to it, the process failed because an SRA Toolkit command I used was not recognized.
EDIT (1):
I changed the location of my prefetch command, and now it looks like:
/MYNAME/APPS/SRA_TOOLS/bin
which is different from how it is in .bashrc:
export PATH="$PATH:/home/MYNAME/APPS/SRATOOLS/bin"
Then I ran what @Darkman suggested (an if/then/else/fi block with the export under else). The output shows that it didn't find SRATools at first (because the path in .bashrc is different), but it found them via the else branch, and the script is running normally. Weird. It works on my job management system.
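A minimal sketch of the if/then/else approach described in the edit above, meant to sit at the top of the submitted .sh script. The SRA_BIN value simply reuses the path from the question and is an assumption; adjust it to wherever the toolkit was actually unpacked.

```shell
#!/bin/bash
# Assumed install location, copied from the question; change as needed.
SRA_BIN="/home/MYNAME/APPS/SRATOOLS/bin"

if command -v prefetch >/dev/null 2>&1; then
    # The tool is already reachable, so PATH is left untouched.
    echo "prefetch already on PATH"
else
    # Extend PATH for this script (and its child processes) only.
    export PATH="$PATH:$SRA_BIN"
    echo "added $SRA_BIN to PATH"
fi
```

Because the export happens inside the script, it applies only to that job and does not depend on .bashrc being read by the job scheduler.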
Thanks everybody.
Is there any way to hide the output of a sqoop process in the Unix shell?
For example, instead of that output, print some text like "sqoop processing".
Thanks
The way I deal with this for pig scripts (which also tend to give a lot of output, and run for a long time) is as follows:
Rather than running
pig mypath/myscript.pig
I will run
nohup pig mypath/myscript.pig &
In your case that would mean something like
nohup oozie -job something &
This has the additional benefit that it will not stop your query if your SSH connection times out. If you do not use SSH at the moment, this may be an additional required step.
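Note that nohup by itself does not discard the output; when run from a terminal it appends it to a file named nohup.out. If the goal is to show only a short message, one option is to redirect both output streams to a log file. The sketch below assumes a hypothetical helper named run_quiet and uses a stand-in sh -c command in place of a real sqoop invocation.

```shell
# run_quiet is a hypothetical helper, not part of sqoop or Hadoop.
# Usage: run_quiet "message" logfile command [args...]
run_quiet() {
    label=$1
    log=$2
    shift 2
    echo "$label"
    # Send both stdout and stderr of the command to the log file.
    "$@" >"$log" 2>&1
}

demo_log=$(mktemp)
# Stand-in for a real sqoop command line.
shown=$(run_quiet "sqoop processing" "$demo_log" \
    sh -c 'echo "noisy output"; echo "noisy error" >&2')
echo "$shown"
```

The full output stays available in the log file for debugging, and the helper can be combined with nohup and & in the same way as the commands above.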
I am new to Hadoop and to IT in general. I want to know whether I can create a custom HBase command similar to the existing scan and put commands. I have a sample JRuby script, client.rb, that outputs the Row ID and Value, taking Tablename, Family, and Limit as input. I can find the Ruby scripts of the default commands, such as scan.rb and put.rb, in the $HBASE_HOME/src/main/ruby/shell folder. If I want my custom command's script to live in that folder and to use that command in the HBase shell, what do I have to do?
hbase 0.94.10, Hadoop 1.2.1, Distribution: Apache
Seeking help please...
In addition to creating the ruby shell command like you've said, you also need to add said command to shell.rb.
See here for more information.
I have a couple of questions about parameter substitution in Pig.
I am on Pig 0.10.
Can I access Unix environment variables in the Grunt shell? In Hive we can do this via ${env:variable}.
I have a bunch of Pig scripts that are automated and run in batch mode. I use a number of parameters inside them and substitute them from the command line (with either -param or -param_file). When I need to enhance (or debug) a Pig script in Grunt mode, I am left manually replacing the parameters with their values. Is there a better way of handling this situation?
Thanks for the help !
For the first question, Pig does not support reading environment variables directly. Is there any special requirement? You should be able to pass the environment values in through Pig command-line parameters.
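As a rough sketch of passing an environment value through a command-line parameter: the variable INPUT_DIR and the parameter name input_dir below are hypothetical, and the command is printed rather than executed so the snippet can be tried without Pig installed.

```shell
# INPUT_DIR and the parameter name input_dir are hypothetical examples.
INPUT_DIR="${INPUT_DIR:-/user/demo/input}"

# Build the command line; inside myscript.pig the value would be
# referenced as $input_dir.
set -- pig -param "input_dir=$INPUT_DIR" -f myscript.pig

# Print the command instead of running it.
echo "$*"
```

To actually launch the script, replace the final echo with "$@".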
For the second question, Pig currently does not support parameters in Grunt. You can check the issue and discussion in PIG-2122. Aniket Mokashi suggests the following approach:
Store your script line in a file (with $params included).
Start Grunt interactively.
Type run -param a=b -param c=d myscript.pig
I have installed Cygwin, Hadoop, and Pig on Windows. The configuration seems OK, as I can run Pig scripts in batch and embedded mode.
When I try to run pig in grunt mode, something strange happens. Let me explain.
I try to run a simple command like
grunt> A = load 'passwd' using PigStorage(':');
When I press Enter, nothing happens. The cursor goes to the next line and the grunt> prompt does not appear again. It is as if I were typing in a text editor.
Has anything similar ever happened to you? Do you have any idea how I can solve this?
What you are observing is the expected behavior. Take the Pig tutorial as an example.
The following command does not trigger any activity in Pig:
raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
Only when you run a command that actually consumes the data behind the alias raw, and so launches a map-reduce job, will you see some action in the Grunt shell. The second command in the tutorial is along those lines:
clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(query);
Similarly, your command will not trigger any action by itself. You have to use the data from alias A in a way that results in a map-reduce job before you see any activity in the Grunt shell:
grunt> A = load 'passwd' using PigStorage(':');
Pig only processes the commands once you use a command that produces output, namely DUMP (to the console) or STORE. You can also use DESCRIBE to get the structure of an alias and EXPLAIN to see the map/reduce plan.
So, basically, DUMP A; will give you all the records in A.
Please try running it in the Windows command window:
C:\FAST\JDK64\1.6.0.31/bin/java -Xmx1000m -Dpig.log.dir=C:/cygwin/home/$USERNAME$/nubes/pig/logs -Dpig.log.file=pig.log -Dpig.home.dir=C:/cygwin/home/$USERNAME$/nubes/pig/ -classpath C:/cygwin/home/$USERNAME$/nubes/pig/conf;C;C:/FAST/JDK64/1.6.0.31/lib/tools.jar;C:/cygwin/home/$USERNAME$/nubes/pig/lib/jython-standalone-2.5.3.jar;C:/cygwin/home/$USERNAME$/nubes/pig/conf;C:/cygwin/home/$USERNAME$/nubes/hadoop/conf;C:/cygwin/home/$USERNAME$/nubes/pig/pig-0.11.1.jar org.apache.pig.Main -x local
Replace $USERNAME$ with your user ID.
Modify the classpath and conf path accordingly.
It works well in both local and map-reduce mode.
The Pig shell hangs in Cygwin, but a Pig script executes successfully when run from a script file.
As below:
$pig ./user/input.txt
For local mode:
pig -x local ./user/input.txt
I came across the same problem yesterday and spent a whole day trying to figure out what was wrong with my Pig installation or my hotkeys before I finally fixed it. It turned out that I had copied the Pig code from another source, and the curly quotation marks it contained are not recognized on the Pig command line, which accepts only straight quotation marks. As a result, the quoted string never closed and the input statement never ended.
My suggestion is to check that the code contains only valid characters, especially when you paste code directly into the command line, since that is a common source of unexpected faults.
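One way to guard against this is to normalize pasted code before feeding it to Grunt. The sketch below is only an illustration: fix_quotes is a hypothetical helper name, and it assumes the text is UTF-8 encoded.

```shell
# fix_quotes is a hypothetical helper: it reads text on stdin and
# replaces curly single and double quotes with straight ones.
fix_quotes() {
    sed -e "s/‘/'/g" -e "s/’/'/g" -e 's/“/"/g' -e 's/”/"/g'
}

# A statement pasted with curly quotes, as in the problem described above.
pasted="A = load ‘passwd’ using PigStorage(‘:’);"

fixed=$(printf '%s\n' "$pasted" | fix_quotes)
echo "$fixed"
```

The same function can be applied to a whole script file, for example fix_quotes < pasted.pig > clean.pig, where both file names are placeholders.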