HDInsight Hive not finding SerDe jar in ADD JAR statement - hadoop

I've uploaded json-serde-1.1.9.2.jar to the blob store with path "/lib/" and added
ADD JAR /lib/json-serde-1.1.9.2.jar
But I'm getting:
/lib/json-serde-1.1.9.2.jar does not exist
I've tried it without the path and also with the full URL in the ADD JAR statement, with the same result.
Would really appreciate some help on this, thanks!

If you don't include the scheme, Hive will look on the local filesystem (you can see the code around line 768 of the source).
When you include the URI, make sure you use the full form:
ADD JAR wasb:///lib/json-serde-1.1.9.2.jar
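If the jar lives somewhere other than the default container, you can also spell out the fully qualified form (the account and container names here are placeholders for your own):
ADD JAR wasb://mycontainer@myaccount.blob.core.windows.net/lib/json-serde-1.1.9.2.jar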
If that still doesn't work, provide your updated command as well as some details about how you are launching the code. Are you RDP'd into the cluster and running via the Hive shell, or running remotely via PowerShell or some other API?

Related

Nifi path is invalid

I am running a NiFi server. Basically, I have a ListFile processor running perfectly OK on this path: /tmp/nifi/info/daily. Here I can work and run the processor without any issue.
For a specific reason, I had to create another ListFile, whose input directory is the path /tmp/nifi/info/last_month. When I add this second value, it says the path doesn't exist.
I checked the permissions with an ls -l, they are exactly the same, and same group:user, so I'm confused:
drwxr-xr-x. 2 nifi hadoop
I even tried restarting NiFi to see if that was it, but no. Is there any way I can test (other than trying input paths in the config) which access NiFi has? Why doesn't it see the folder?
Thanks.
As @Ben Yaakobi mentioned, I had missed creating the folder on every node.
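One quick way to check which access the nifi service user actually has (assuming NiFi runs as the nifi user, as the ls -l output above suggests) is to list the directory as that user on each node:
sudo -u nifi ls -ld /tmp/nifi/info/last_month
If that fails on any node, the folder needs to be created there too, with the same owner and permissions.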

elephant bird does not exist error while loading json data in pig 0.16

Can anyone help me figure out why I am getting an error while using REGISTER to register the 'elephant bird' jar files to load JSON data?
I am running Pig 0.16 in local mode and get the errors:
/home/shanky/Downloads/elephant-bird-hadoop-compat-4.1.jar' does not exist.
/home/shanky/Downloads/elephant-bird-pig-4.1.jar' does not exist.
Code to load json data:
REGISTER '/home/shanky/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/home/shanky/Downloads/elephant-bird-pig-4.1.jar';
REGISTER '/home/shanky/Downloads/json-simple-1.1.1.jar';
load_tweets = LOAD '/home/shanky/Downloads/data.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
dump load_tweets;
I tried changing the REGISTER statements by removing the quotes and by adding hdfs://, but nothing worked for me.
The quotes shouldn't be included per the Pig documentation (https://pig.apache.org/docs/r0.16.0/basic.html#register-jar), but your syntax did work for me (I'm on 0.12.0-cdh5.12.0, though).
Since you said you tried it without the quotes, some thoughts:
* You mention trying adding hdfs://: are these dependencies on HDFS by any chance? It doesn't seem like it, since they have Downloads in the path, but if they are, you won't be able to reach them running Pig in local mode. If they are on your local filesystem, you should be able to access them with the path as you have it, whether you run locally or not.
* Are the files actually there? Are the permissions right? Etc.
* Assuming you just want to get around the issue for now, have you tried any of the other ways of registering a jar, such as -Dpig.additional.jars.uris=/home/shanky/elephant-bird-hadoop-compat-4.1.jar,/home/shanky/Downloads/elephant-bird-pig-4.1.jar (see the sketch below)?
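For example, a sketch of that last option; the script name load_tweets.pig is hypothetical, and the jar paths are the ones from the question:
pig -Dpig.additional.jars.uris=/home/shanky/elephant-bird-hadoop-compat-4.1.jar,/home/shanky/Downloads/elephant-bird-pig-4.1.jar,/home/shanky/Downloads/json-simple-1.1.1.jar -x local load_tweets.pig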

Using DSE Hadoop/Mahout, NoClassDef of org.w3c.dom.Document

I'm trying to run a simple Hadoop job, but Hadoop is throwing a NoClassDefFoundError on "org/w3c/dom/Document".
I'm trying to run the basic examples from the "Mahout In Action" book (https://github.com/tdunning/MiA).
I do this using nearly the same maven setup but tooled for cassandra use rather than a file data model.
But when I try to run the *-job.jar, it throws a NoClassDefFoundError from the DataStax/Hadoop end.
I'm using 1.0.5-dse of the driver, as that's the only one that supports the current DSE version of Cassandra (1.2.1), if that helps at all, though the issue seems to be deeper.
Attached is a gist with more info included.
There is the maven file, this brief overview, and the console output.
https://gist.github.com/zmarcantel/8d56ae4378247bc39be4
Thanks
Try dropping a jar file that contains the class org.w3c.dom.Document into the $DSE/resource/hadoop/lib/ folder as a workaround.
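For example (a sketch only: xml-apis is one common jar that bundles org.w3c.dom.Document, and the exact version is illustrative):
cp xml-apis-1.4.01.jar $DSE/resource/hadoop/lib/
You will likely need to restart DSE on the affected nodes so the new classpath entry is picked up.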

Hadoop avro correct jar files issue

I'm writing my first Avro job, which is meant to take an Avro file and output text. I tried to reverse engineer it from this example:
https://gist.github.com/chriswhite199/6755242
I am getting the error below though.
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
I looked around and found it was likely an issue with which jar files are being used. I'm running CDH4 with MR1 and am using the jar files below:
avro-tools-1.7.5.jar
hadoop-core-2.0.0-mr1-cdh4.4.0.jar
hadoop-mapreduce-client-core-2.0.2-alpha.jar
I can't post code for security reasons, but it shouldn't need anything not used in the example code. I don't have Maven set up yet either, so I can't follow those routes. Is there something else I can try to get around these issues?
Try using Avro 1.7.3; this is the AVRO-1170 bug.
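For background, that error usually means mixed Hadoop generations: TaskAttemptContext was a class in Hadoop 1.x/MR1 but became an interface in Hadoop 2.x, so code compiled against one does not run against the other. If you do end up setting up Maven, a hedged sketch of the avro-mapred dependency that targets MR1 clusters (the version and hadoop1 classifier here are assumptions; see AVRO-1170 for the details):
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
  <classifier>hadoop1</classifier>
</dependency>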

How to install applications to a WebSphere 7.0 cluster using wsadmin?

I want to deploy to all four processes on a WebSphere cluster with two nodes. Is there a way of doing this with one Jython command, or do I have to call 'AdminControl.invoke' on each one?
The easiest way to install an application using wsadmin is with AdminApp, not AdminControl.
I suggest you download wsadminlib.py (got the link from here).
It has a lot of functions; one of them is installApplication, which also works with clusters.
Edit:
Lately I found out about AdminApplication, which is a script library included in WAS 7 (/opt/IBM/WebSphere/AppServer/scriptLibraries/application/V70).
The documentation in the Info Center is not great, but it's a .py file you can look inside to see what it does.
It is imported automatically into wsadmin, and you can use it without any imports or other configuration.
Worth a check.
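For completeness, a minimal sketch of a one-shot cluster install using plain AdminApp (the EAR path, application name, and cluster name are placeholders):
AdminApp.install('/tmp/myapp.ear', ['-appname', 'myapp', '-cluster', 'MyCluster'])
AdminConfig.save()
Targeting the cluster deploys the application to all of its members, so there is no need to invoke anything once per process.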
@aviram-segal is right, wsadminlib is really helpful for this.
I use the following syntax:
arg = ["-reloadEnabled",
       "-reloadInterval '0'",
       "-cell " + self.cellName,
       "-node " + self.nodeName,
       "-server '" + self.serverName + "'",
       "-appname " + name,
       "-MapWebModToVH", [['.*', '.*', self.virtualHost]]]
AdminApp.install(path, arg)
Where path is the location of your EAR/WAR file.
You can find documentation here