I am trying to run the getting-started sample for loading data into my single-node HDInsight Hadoop cluster. When I run the sample I get the error shown below:
c:\Hadoop\GettingStarted>powershell -ExecutionPolicy unrestricted -F importdata.ps1 w3c
Attempting to import scenario w3c
Path
----
C:\Hadoop\GettingStarted\w3c
Error occurred during initialization of VM
java.nio.charset.IllegalCharsetNameException:
at java.nio.charset.Charset.checkName(Charset.java:273)
at java.nio.charset.Charset.lookup2(Charset.java:458)
at java.nio.charset.Charset.lookup(Charset.java:437)
at java.nio.charset.Charset.defaultCharset(Charset.java:579)
at sun.nio.cs.StreamEncoder.forOutputStreamWriter(StreamEncoder.java:37)
at java.io.OutputStreamWriter.<init>(OutputStreamWriter.java:94)
at java.io.PrintStream.<init>(PrintStream.java:100)
at java.lang.System.initializeSystemClass(System.java:1092)
It seems this issue is related to file write permissions when the data file is created on your machine.
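If that is the case, granting the account running the script modify rights on the sample directory should clear it. A minimal sketch (the path comes from the output above; the Users group is an assumption, adjust to your account, and run from an elevated prompt):
rem grant the local Users group modify rights on the sample directory,
rem inherited by subfolders and files
icacls C:\Hadoop\GettingStarted\w3c /grant Users:(OI)(CI)M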
When I try to run this command it shows an error on one system, but when I run it on another it works. Why is that? I'm using hadoop-3.3.2 from a Lubuntu terminal.
The job is not submitting. It shows an error in the mapred.xml configuration, but everything looks correct; I ran the same job on another system and it works there. What should I do?
hadoop jar tutorial_classes_com.jar WordCount /WordCountApp/input /WordCountApp/output
I have set up a Hadoop single-node cluster with pseudo-distributed operation and YARN running. I am able to use the Spark Java API to run queries as a YARN client. I wanted to go one step further and try Apache Drill on this "cluster". I installed ZooKeeper, which is running smoothly, but I am not able to start Drill and I get this log:
nohup: ignoring input
Error: Could not find or load main class org.apache.drill.exec.server.Drillbit
Any idea?
I am on Windows 10 with JDK 1.8.
The DRILL CLASSPATH is not initialized when drillbit is run on your machine.
To start Drill on a Windows machine, run the sqlline.bat script, for example:
C:\bin\sqlline> sqlline.bat -u "jdbc:drill:zk=local;schema=dfs"
See more info: https://drill.apache.org/docs/starting-drill-on-windows/
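Once sqlline connects, a quick sanity check using the sample data that ships on Drill's classpath (the cp schema) might look like:
0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` LIMIT 3;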
I am trying to execute Hive from the command prompt on my Windows 10 machine, i.e.:
C:\hadoop-2.7.1\hive-2.1.0\bin>hive
It throws an "Error applying authorization policy on hive configuration" error.
Here is the full error:
Error applying authorization policy on hive configuration: Couldn't create directory ${system:java.io.tmpdir}\${hive.session.id}_resources
What could be the problem?
Check whether the namenode and datanode services are running. On Windows, start them with sbin\start-dfs.cmd.
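For example, from the Hadoop home directory (a sketch assuming the stock Windows layout; both NameNode and DataNode should appear in the jps listing):
C:\hadoop-2.7.1>sbin\start-dfs.cmd
C:\hadoop-2.7.1>jps
rem the jps output should include NameNode and DataNode entries;
rem if either is missing, check the logs under logs\ before retrying hive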
I used Amazon EMR to create an emr-4.0.0 cluster.
However, whenever I try to submit a spark application on it, it fails and gives the following error:
File does not exist: hdfs://ip-xx-xx-xxx-xx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1441035668468_0001/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
This is even though earlier in the log it uploads this exact same file without issuing any error message:
2015-08-31 15:43:29,070 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar -> hdfs://ip-xx-xx-xxx-xx.ec2.internal:8020/user/hadoop/.sparkStaging/application_1441035668468_0001/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
(I've verified that the source file indeed exists at /usr/lib/spark/lib/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar on the master machine).
The command I use is:
spark-submit --deploy-mode cluster --master yarn-cluster --class com.sundaysky.ads.spark.cluster.TrackingLogsAnalysis /tmp/oz/AdsTests-1.0-SNAPSHOT.jar
BTW, I've noticed that this uses Java 1.7 (even though it's the newest EMR version by Amazon), but I don't think that is relevant.
Do you have any ideas what could be the issue, or alternatively, how to debug the problem? I've tried many ways of adding parameters to the spark-submit command to get TRACE level messages from yarn-client, but without success.
Thanks,
Oz
So, after talking to Amazon support, in case anyone ever comes across a similar issue:
The specific problem in my case was that my logic jar (not the spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar, which is provided by Amazon) was compiled with Java 8, while the machine only supported Java 7.
This was not reflected in the error log for the step, but rather in the stderr log for the step's container, where the following message appeared:
15/08/31 15:43:41 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread Exception in thread "main" java.lang.UnsupportedClassVersionError: com/xxxxxx/xxxx/xxxxx/xxxxx/MyClass : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
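In other words, the logic jar needs to be built for Java 7. A minimal sketch of the check and the fix from the command line (MyClass is a placeholder for your own class):
# bytecode major version 52 means Java 8, 51 means Java 7
javap -verbose MyClass.class | grep "major version"
# recompile targeting Java 7 bytecode
javac -source 1.7 -target 1.7 MyClass.java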
If you encounter a similar problem, and the step's log files do not provide an answer, you should also look in the container's logs:
Go to Amazon's EMR web page.
Click your cluster to open the Cluster Details screen.
Near the "Log URI" there should be a folder icon; click it to open the logs.
Go to "containers" and keep drilling down to the one matching your task.
Check stderr.gz and stdout.gz for issues.
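Alternatively, since the Log URI points at an S3 location, you can pull the same container logs down with the AWS CLI (a sketch; the bucket name, prefix, and cluster id below are placeholders):
aws s3 cp s3://my-emr-log-bucket/logs/j-XXXXXXXXXXXXX/containers/ . --recursive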
HTH,
Oz
I wrote a simple program to test embedded Pig in Java, running in MapReduce mode.
The Hadoop version on the server I am running is 0.20.2-cdh3u4a, and the Pig version is 0.10.0-cdh3u4a.
When I try to run in local mode, it runs successfully. But when I try to run in MapReduce mode, it gives me an error.
I run my program using the following commands as shown in http://pig.apache.org/docs/r0.9.1/cont.html#embed-java
javac -cp pig.jar EmbedPigTest.java
java -cp pig.jar:.:/etc/hadoop/conf EmbedPigTest input.txt
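For reference, the pattern from the linked docs that I followed looks roughly like this (a simplified sketch, not my exact code):
import java.io.IOException;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class EmbedPigTest {
    public static void main(String[] args) throws IOException {
        // MAPREDUCE mode needs a matching Hadoop configuration on the classpath
        PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
        pigServer.registerQuery("A = LOAD '" + args[0] + "' AS (line:chararray);");
        pigServer.store("A", "output");
    }
}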
My program gives the following error:
Exception in thread "main" java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.<init>(PigServer.java:226)
at org.apache.pig.PigServer.<init>(PigServer.java:215)
at org.apache.pig.PigServer.<init>(PigServer.java:211)
at org.apache.pig.PigServer.<init>(PigServer.java:207)
at WordCount.main(EmbedPigTest.java:9)
Some online resources say that this problem occurs due to a Hadoop version mismatch, but I don't understand what I should do. Suggestions please!
This is happening because you are linking against the wrong jar. Please see the link below; it describes this issue very well.
http://localsteve.wordpress.com/2012/09/30/embedding-pig-for-cdh4-java-apps-fer-realz/
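In short: compile and run against the Pig and Hadoop jars installed on the cluster rather than a standalone pig.jar that bundles its own Hadoop. A sketch, assuming standard CDH install paths (check the actual jar locations under /usr/lib/pig and /usr/lib/hadoop on your machine):
# link against the cluster's own Pig/Hadoop jars so client and cluster versions match
javac -cp "/usr/lib/pig/*:/usr/lib/hadoop/*" EmbedPigTest.java
java -cp "/usr/lib/pig/*:/usr/lib/hadoop/*:/etc/hadoop/conf:." EmbedPigTest input.txt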
I faced the same kind of issue when I tried to use Pig in MapReduce mode without starting the Hadoop services.
Check that all services are running with jps before using Pig in MapReduce mode.
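For example, on a Hadoop 0.20/CDH3 node you would expect output along these lines before launching Pig (PIDs will differ):
$ jps
4723 NameNode
4856 DataNode
5012 SecondaryNameNode
5133 JobTracker
5267 TaskTracker
# if any daemon is missing, start the services first, e.g.:
$ bin/start-all.sh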