MRUnit test case for Driver - hadoop

I have written an MRUnit test with the following code:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "file:///");
conf.set("", "file:///");
conf.set("", "local");
conf.setInt("", 1);
Path input = new Path("input/ncdc/micro");
Path output = new Path("output");
FileSystem fs = FileSystem.getLocal(conf);
fs.delete(output, true); // delete old output
VisitedItemFlattenDriver driver = new VisitedItemFlattenDriver();
int exitCode = driver.run(new String[] {
    input.toString(), output.toString(), "false" });
But when I execute the JUnit test case from Eclipse, I get the following exception:
at java.lang.ProcessBuilder.start(
at org.apache.hadoop.util.Shell.runCommand(
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(
at org.apache.hadoop.util.Shell.execCommand(
at org.apache.hadoop.util.Shell.execCommand(
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(
at org.apache.hadoop.mapreduce.Job$
at org.apache.hadoop.mapreduce.Job$
at Method)
at org.apache.hadoop.mapreduce.Job.submit(
at org.apache.hadoop.mapreduce.Job.waitForCompletion(
I'm not sure what is causing this error, as I just intend to unit test my class:
public class VisitedItemFlattenDriver extends Configured implements Tool {
I would deeply appreciate it if someone could guide me on how to resolve the error.

I tried a couple of options to resolve the problem and spent many hours doing so.
First, I searched for a solution and found the suggestion to add winutils.exe and the .dll files to hadoop/bin. I did that and also set the HADOOP_HOME environment variable.
Somehow the error above got resolved, but I was then stuck on a different error, like below:
It was obvious that the error was due to a compatibility issue. After some more searching I found that it can be resolved by upgrading the JRE from 32-bit to 64-bit.
I had been using JDK 6 32-bit, so I updated to JDK 6 64-bit. That did not resolve my problem. I also tried to use MiniDFSCluster for MRUnit, but that too gave the same error.
Then I switched to JDK 7 64-bit for my code, and the problem was resolved and the test ran successfully.
** Note: I'm using Hadoop version 2.2.0
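The settings this answer changed (Hadoop home, the presence of winutils.exe, and the JVM architecture) can be printed from inside the test JVM before the driver runs. The following is only a minimal sketch: `WinutilsCheck` is a hypothetical helper, not part of Hadoop or MRUnit, and it assumes Hadoop resolves its home directory from the `hadoop.home.dir` system property first and the `HADOOP_HOME` environment variable second.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic helper; not part of Hadoop or MRUnit. */
final class WinutilsCheck {

    /** Expected location of winutils.exe under a Hadoop home directory. */
    static Path winutilsPath(String hadoopHome) {
        return Paths.get(hadoopHome, "bin", "winutils.exe");
    }

    /** Hadoop home as seen by this JVM: system property first, then environment variable (assumed order). */
    static String hadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        return home != null ? home : System.getenv("HADOOP_HOME");
    }

    public static void main(String[] args) {
        // os.arch distinguishes a 32-bit JVM (for example x86) from a 64-bit one (for example amd64).
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("os.arch      = " + System.getProperty("os.arch"));

        String home = hadoopHome();
        if (home == null) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set");
            return;
        }
        Path winutils = winutilsPath(home);
        System.out.println(winutils + " exists: " + Files.isRegularFile(winutils));
    }
}
```

Running it from the same Eclipse launch configuration as the JUnit test matters, because Eclipse may start the test with a different JRE or environment than the command line does.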


unable to save rdd on local filesystem on windows 10

I have a Scala/Spark program that validates XML files in an input directory and then writes a report to a location given by another input parameter (a local filesystem path).
As per the requirements from stakeholders, this program is to run on local machines, hence I am using Spark in local mode.
Until now things were fine; I was using the code below to save my report to a file:
.option("header", "true")
However, this required winutils to be installed and configured on the machines running my program.
Given that we pick up Cloudera updates very often, there was an overhead of changing winutils after every update, as we would be updating the jars to the latest version in our pom file. Hence, I have been asked to remove the dependency on winutils.
After a quick Google search, and after coming across How to save Spark RDD to local filesystem,
I decided to change the above piece of code to:
val outputRdd = dataframe.rdd
val count = outputRdd.count()
println("\nCount is: " + count + "\n")
println("\nOutput path is: " + reportPath + "\n")
However, on running the code I am now getting this error
Count is: 15
Output path is: C:\\codingdir\\test\\report
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.mapred.JobContextImpl.<init>(Lorg/apache/hadoop/mapred/JobConf;Lorg/apache/hadoop/mapreduce/JobID;)V from class
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1067)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:957)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1499)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1478)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1478)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1478)
at com.optus.dcoe.hawk.XmlParser$.delayedEndpoint$com$optus$dcoe$hawk$XmlParser$1(XmlParser.scala:120)
at com.optus.dcoe.hawk.XmlParser$delayedInit$body.apply(XmlParser.scala:16)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.optus.dcoe.hawk.XmlParser$.main(XmlParser.scala:16)
at com.optus.dcoe.hawk.XmlParser.main(XmlParser.scala)
I have tried changing the value of the reportPath variable to
and other values as suggested on
Write RDD as textfile using Apache Spark
How to save Spark RDD to local filesystem
How to access local files in Spark on Windows?
and other links, but I am still getting the same error.
I have found these articles about java.lang.IllegalAccessError, but I am not sure how to get around this error:
java.lang.IllegalAccessError: tried to access method
Can someone please help me in resolving this?
The HADOOP_HOME environment variable pertaining to winutils has been removed.
The winutils entry has been removed from the PATH variable.
I am using Java 8 on Windows 10 (all the users of the program would be on similar laptops).
Spark version is 2.4.0-cdh6.2.1
I finally found the issue.
It was caused by some unwanted MapReduce-related dependencies, which have now been removed, and I have moved on to another error.
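One way to spot this kind of dependency clash is to list every classpath location that provides the class named in the error. The sketch below uses only the JDK; `ClasspathProbe` is a hypothetical name, and the default class name is simply the one from the stack trace above. More than one location in the output would hint that two jars ship the same class, which is one way an IllegalAccessError like this can come about.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper; run it with the same classpath as the Spark program. */
final class ClasspathProbe {

    /** Converts a class name such as "a.b.C" to the resource path "a/b/C.class". */
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every location visible to the system class loader that provides the class. */
    static List<URL> locations(String className) {
        try {
            ClassLoader loader = ClassLoader.getSystemClassLoader();
            return new ArrayList<>(Collections.list(loader.getResources(resourceName(className))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default taken from the IllegalAccessError above; pass another class name as the first argument.
        String className = args.length > 0 ? args[0] : "org.apache.hadoop.mapred.JobContextImpl";
        List<URL> found = locations(className);
        System.out.println(className + " is provided by " + found.size() + " location(s):");
        for (URL url : found) {
            System.out.println("  " + url);
        }
    }
}
```

In a Maven build, the jars that show up here can then be traced back to the dependency that pulls them in and excluded in the pom file.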

HBaseTestingUtility failing on Windows 10 with UnsatisfiedLinkError

I'm trying to get the HBaseTestingUtility running on Windows 10.
I'm using hbase-client and hbase-testing-util with version 1.4.2.
When running:
HBaseTestingUtility hbaseUtility = new HBaseTestingUtility();
hbaseUtility.startMiniCluster(); //<- error thrown on this line
I get the below error:
at$Windows.access0(Native Method)
at org.apache.hadoop.fs.FileUtil.canWrite(
I have downloaded winutils, and have set the following user variables:
but this does not make a difference.
The official documentation for the HBaseTestingUtility says that Cygwin is needed on Windows, but I cannot install that due to the admin restrictions on my work machine. Is there any other solution?
After some digging, I found a solution: I added %HADOOP_HOME%/bin to PATH. Now I get another error, but I will raise a separate question for that.
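Whether the fix has taken effect in the JVM that runs the test can be checked with a small sketch like the one below. It assumes that the UnsatisfiedLinkError came from a native library in %HADOOP_HOME%/bin not being found through PATH, which the question does not state outright; `PathEntryCheck` is a hypothetical helper that uses only the JDK.

```java
import java.io.File;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper; not part of HBase or Hadoop. */
final class PathEntryCheck {

    /** Normalizes one PATH entry; returns null for entries the JVM cannot parse. */
    static Path normalize(String entry) {
        try {
            return Paths.get(entry).toAbsolutePath().normalize();
        } catch (InvalidPathException e) {
            return null;
        }
    }

    /** True if the PATH-style string lists the given directory as one of its entries. */
    static boolean containsDir(String pathVar, String separator, String dir) {
        Path wanted = normalize(dir);
        if (wanted == null) {
            return false;
        }
        for (String entry : pathVar.split(Pattern.quote(separator))) {
            if (!entry.isEmpty() && wanted.equals(normalize(entry))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        String path = System.getenv("PATH");
        if (home == null || path == null) {
            System.out.println("HADOOP_HOME or PATH is not set");
            return;
        }
        String bin = Paths.get(home, "bin").toString();
        System.out.println(bin + " on PATH: " + containsDir(path, File.pathSeparator, bin));
    }
}
```

An IDE that was started before the variables were changed usually keeps the old environment, so a `false` here after editing PATH often just means the IDE needs a restart.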

Unable to run basic example for hadoop

Hadoop version 2.9.0, Java - 1.8.0_162
When trying to run the example given here, under standalone operation, I get the following error:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar grep input output 'dfs[a-z.]+'
at org.apache.hadoop.examples.ExampleDriver.main(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.apache.hadoop.util.RunJar.main(
I am new to Hadoop and not sure how to fix this. I have set JAVA_HOME, and I am pretty sure that I am using compatible versions of Java and Hadoop.
Any help will be useful.
The error message means that while the runtime was able to find the class ProgramDriver, the method run() is not present.
The most likely reason for this is that you're running an old version of Hadoop that exposed a different interface in ProgramDriver. About a year ago this method was renamed from driver() to run().
The fix would be to make sure you're running a recent version of Hadoop.
For reference, please check the following questions, which ask about the same problem:
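To confirm which of the two method names the Hadoop jars on the classpath actually expose, a reflection check along these lines may help. It is a sketch only: `MethodProbe` is a hypothetical helper, and the fully qualified name `org.apache.hadoop.util.ProgramDriver` is an assumption, since the answer gives only the simple class name.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical diagnostic helper; run it with the Hadoop jars on the classpath. */
final class MethodProbe {

    /** True if the class exposes a public method with the given name (any signature). */
    static boolean hasPublicMethod(Class<?> type, String name) {
        return Arrays.stream(type.getMethods())
                .map(Method::getName)
                .anyMatch(name::equals);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Assumed package for ProgramDriver; pass another class name as the first argument.
        String className = args.length > 0 ? args[0] : "org.apache.hadoop.util.ProgramDriver";
        Class<?> type = Class.forName(className);
        for (String name : new String[] {"run", "driver"}) {
            System.out.println(className + "." + name + "(): " + hasPublicMethod(type, name));
        }
    }
}
```

If the examples jar and the Hadoop runtime come from different versions, the output would show `driver()` present and `run()` absent, or the other way round.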
Error while executing hadoop-mapreduce-examples-2.2.0.jar
Can the hadoop programm which write under the hadoop-2.2.0 run in hadoop-1.2.1?

Oracle Data Integrator (ODI - v11.1.1.3) "unable to load language: beanshell" Error

Following an install of Eclipse 3.7.2 on my Ubuntu 12.04 development machine, I have been unable to execute any ODI packages/interfaces/procedures. On execution (for both simulated and actual runs), an error is thrown (java trace below). I am not sure if it's anything to do with the Eclipse install, but it seems likely. Does anyone have an idea how to fix this?
Also, when launching ODI from the terminal using 'bash odi', the following error is displayed in the terminal:
2013-08-15 14:43:46.162 ERROR Error during RuntimeClassLoader initialization. ODI will start without RuntimeClassLoader
Error output:
oracle.odi.core.exception.OdiRuntimeException: Error during Code Interpretor creation
at com.sunopsis.dwg.codeinterpretor.SnpCodeInterpretor.getInstance(
at com.sunopsis.dwg.codeinterpretor.SnpGeneratorSQLCIT.<init>(
at com.sunopsis.graphical.dialog.SnpsDialogExecution.doPackageExecuter(
at oracle.odi.ui.action.SnpsPopupActionExecuteHandler.actionPerformed(
at oracle.odi.ui.SnpsActionControler.handleEvent(
at oracle.ide.controller.IdeAction.performAction(
at oracle.ide.controller.IdeAction.actionPerformedImpl(
at oracle.ide.controller.IdeAction.actionPerformed(
at javax.swing.AbstractButton.fireActionPerformed(
at javax.swing.AbstractButton$Handler.actionPerformed(
at javax.swing.DefaultButtonModel.fireActionPerformed(
at javax.swing.DefaultButtonModel.setPressed(
at javax.swing.AbstractButton.doClick(
at javax.swing.plaf.basic.BasicMenuItemUI.doClick(
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(
at java.awt.Component.processMouseEvent(
at javax.swing.JComponent.processMouseEvent(
at java.awt.Component.processEvent(
at java.awt.Container.processEvent(
at java.awt.Component.dispatchEventImpl(
at java.awt.Container.dispatchEventImpl(
at java.awt.Component.dispatchEvent(
at java.awt.LightweightDispatcher.retargetMouseEvent(
at java.awt.LightweightDispatcher.processMouseEvent(
at java.awt.LightweightDispatcher.dispatchEvent(
at java.awt.Container.dispatchEventImpl(
at java.awt.Window.dispatchEventImpl(
at java.awt.Component.dispatchEvent(
at java.awt.EventQueue.dispatchEventImpl(
at java.awt.EventQueue.access$400(
at java.awt.EventQueue$
at java.awt.EventQueue$
at Method)
at java.awt.EventQueue$
at java.awt.EventQueue$
at Method)
at java.awt.EventQueue.dispatchEvent(
at java.awt.EventDispatchThread.pumpOneEventForFilters(
at java.awt.EventDispatchThread.pumpEventsForFilter(
at java.awt.EventDispatchThread.pumpEventsForHierarchy(
at java.awt.EventDispatchThread.pumpEvents(
at java.awt.EventDispatchThread.pumpEvents(
Caused by: org.apache.bsf.BSFException: unable to load language: beanshell
at org.apache.bsf.BSFManager.loadScriptingEngine(
at com.sunopsis.dwg.codeinterpretor.SnpCodeInterpretor.loadEngine(
at com.sunopsis.dwg.codeinterpretor.SnpCodeInterpretor.<init>(
at com.sunopsis.dwg.codeinterpretor.SnpCodeInterpretor.getInstance(
... 45 more
After digging around for about a day on this issue, I brazenly tried running ODI as the root user on the off chance that this was a permissions issue. I started ODI from the command line (using 'bash odi') for greater verbosity, and it loaded without the error mentioned above. Something gave me the impression that this wasn't a permissions issue, but one related to the user settings.
To rectify the issue, I removed my user's odi settings folder (renaming it, for safety):
mv ~/.odi ~/.backup_odi
Then I started ODI from the terminal under my own user (i.e. not root) - there were no errors! None of my connections were available in the new settings folder though. This I fixed by closing ODI and entering the following:
cp ~/.backup_odi/oracledi/snps_login_work.xml ~/.odi/oracledi/
If anybody else encounters this issue, I hope you find this post quicker than it took me to fix it!
org.apache.bsf.BSFException: unable to load language: beanshell
The exception was thrown because bsh-2.0b4.jar was not on the classpath; it is a dependency of bsf.jar.
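Whether a jar is really missing can be tested by asking the JVM to locate a class it should contain. The sketch below is JDK-only; `ClassAvailability` is a hypothetical helper, `org.apache.bsf.BSFManager` is taken from the stack trace above, and `bsh.Interpreter` is assumed here to be a class shipped in the BeanShell jar. It only reflects the classpath it is run with, which may differ from the one ODI builds at startup.

```java
/** Hypothetical diagnostic helper; not part of ODI, BSF or BeanShell. */
final class ClassAvailability {

    /** True if the named class can be located by this class loader, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // bsh.Interpreter is an assumed BeanShell class name; BSFManager appears in the trace above.
        String[] candidates = {"org.apache.bsf.BSFManager", "bsh.Interpreter"};
        for (String name : candidates) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```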

Nutch in Windows: Failed to set permissions of path

I'm trying to use Solr with Nutch on a Windows machine and I'm getting the following error:
Exception in thread "main" Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700
From a lot of threads I learned that Hadoop, which seems to be used by Nutch, does some chmod magic that works on Unix machines but not on Windows.
This problem has existed for more than a year now. I found one thread where the code line is shown and a fix proposed. Am I really the only one who has this problem? Is everyone else creating a custom build in order to run Nutch on Windows? Or is there some option to disable the Hadoop stuff, or another solution? Maybe a crawler other than Nutch?
Here's the stack trace of what I'm doing:
admin@WIN-G1BPD00JH42 /cygdrive/c/solr/apache-nutch-1.6
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5 -solr http://localhost:8080/solr-4.1.0
cygpath: can't convert empty path
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 5
Injector: starting at 2013-03-03 17:43:15
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(
at org.apache.hadoop.fs.FileUtil.setPermission(
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(
at org.apache.hadoop.mapred.JobClient$
at org.apache.hadoop.mapred.JobClient$
at Method)
at Source)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(
at org.apache.hadoop.mapred.JobClient.submitJob(
at org.apache.hadoop.mapred.JobClient.runJob(
at org.apache.nutch.crawl.Injector.inject(
at org.apache.nutch.crawl.Crawl.main(
It took me a while to get this working, but here's the solution, which works on Nutch 1.7.
Download Hadoop Core 0.20.2 from the Maven repository.
Replace $NUTCH_HOME/lib/hadoop-core-1.2.0.jar with the downloaded file, renaming it to the same name.
That should be it.
This issue is caused by Hadoop, since it assumes you're running on Unix and abides by the file permission rules. The issue was actually resolved in 2011, but Nutch didn't update the Hadoop version it uses. The relevant fixes are here and here
We are using Nutch too, but running it on Windows is not supported. On Cygwin, our 1.4 version had problems similar to yours, also something related to MapReduce.
We solved it by using a VM (VirtualBox) with Ubuntu and a shared directory between Windows and Linux, so we can develop and build on Windows and run Nutch (crawling) on Linux.
I have Nutch running on Windows, no custom build. It's been a long time since I used it, though. One thing that took me a while to catch is that you need to run Cygwin as a Windows admin to get the necessary rights.
I suggest a different approach. Check this link out. It explains how to swallow the error on Windows, and does not require you to downgrade Hadoop or rebuild Nutch. I tested on Nutch 2.1, but it applies to other versions as well.
I also made a simple .bat for starting the crawler and indexer, but it is meant for Nutch 2.x and might not be applicable to Nutch 1.x.
For the sake of posterity, the approach entails:
Making a custom LocalFileSystem implementation:
import java.io.IOException;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class WinLocalFileSystem extends LocalFileSystem {
    public WinLocalFileSystem() {
        System.err.println("Patch for HADOOP-7682: "+
            "Instantiating workaround file system");
    }
    /**
     * Delegates to <code>super.mkdirs(Path)</code> and separately calls
     * <code>this.setPermission(Path,FsPermission)</code>
     */
    @Override
    public boolean mkdirs(Path path, FsPermission permission)
            throws IOException {
        boolean result=super.mkdirs(path);
        this.setPermission(path, permission);
        return result;
    }
    /**
     * Ignores IOException when attempting to set the permission
     */
    @Override
    public void setPermission(Path path, FsPermission permission)
            throws IOException {
        try {
            super.setPermission(path, permission);
        } catch (IOException e) {
            System.err.println("Patch for HADOOP-7682: "+
                "Ignoring IOException setting permission for path \""+path+
                "\": "+e.getMessage());
        }
    }
}
Compiling it and placing the JAR under ${HADOOP_HOME}/lib
And then registering it by modifying ${HADOOP_HOME}/conf/core-site.xml:
Enables patch for issue HADOOP-7682 on Windows
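The idea behind the workaround (create the directory, then treat a failed permission change as a warning instead of a fatal error) can also be tried out without Hadoop on the classpath. The sketch below is not the Hadoop patch itself. It is a JDK-only analog built on java.nio.file, and `LenientPermissions` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

/** JDK-only analog of the "swallow the permission error" idea; not Hadoop code. */
final class LenientPermissions {

    /** Tries to set rwx------ (0700) on the path; returns false instead of throwing when that fails. */
    static boolean trySetOwnerOnly(Path path) {
        try {
            Files.setPosixFilePermissions(path, PosixFilePermissions.fromString("rwx------"));
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Expected on file systems without POSIX permissions (for example NTFS): log and carry on.
            System.err.println("Ignoring failure to set permissions on " + path + ": " + e);
            return false;
        }
    }

    /** Creates the directory tree, then applies the permissions on a best-effort basis. */
    static Path mkdirsLenient(Path dir) {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        trySetOwnerOnly(dir);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("staging-demo");
        Path staging = mkdirsLenient(tmp.resolve("mapred").resolve(".staging"));
        System.out.println("Created " + staging);
    }
}
```

The trade-off is the same as in the patch: the directory ends up with whatever permissions the platform gives it, so this is reasonable for a local development crawl but not where the 0700 restriction actually matters.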
You have to change the project dependencies hadoop-core and hadoop-tools. I'm using version 0.20.2 and it works fine.
