Input path does not exist: file:/D:/pigsample_1749383998_1377684507424 - hadoop

I am facing a wiered issue.
I am running PIG 0.11 on windows7/64 bit machine with latest version of cygwin.
I am a weblog which I want to order it by userName to have all the user activities for the same user together to feed for next line of processing.
I am starting commandprompt -> cygwin.bat -> on the cygwin console go to D:/ -> pig and typing the following script on grunt shall (local mode).
(Note I've set PIG_HOME, PIG_CLASSPATH correctly).
Script is :
USERACTIVITIES = LOAD '/D:/path/of/logs/useractivities' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',') AS (datetimeUnProcessed:chararray, username:chararray, request:chararray);
USERACTIVITIES_ORDERED = ORDER USERACTIVITIES by username;
STORE USERACTIVITIES_ORDERED INTO '/D:/readyfornextinput/useractivities' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');
When I do illustrate USERACTIVITIES_ORDERED I see it going smooth.
But when I do store/dump I face wiered issue.
It fails by saying :
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/D:/pigsample_1749383998_1377684507424
When I tried to search this pigsample_number file I could find that in :
D:/tmp//mapred/local/localRunner
I am not sure how it is happening.
I am not sure if its windows/cygwin related issue or someone saw this on Linux also.
For reference, you can find the stacktrace attached here:
2013-08-28 15:38:28,863 [Thread-46] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0004
java.lang.RuntimeException:
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
path does not exist: file:/D:/pigsample_1749383998_1377684507424
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by:
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
path does not exist: file:/D:/pigsample_1288777582_1377684802262
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
at org.apache.pig.impl.io.ReadToEndLoader.(ReadToEndLoader.java:126)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
... 6 more
Any help on this will be useful.

Looks like this is reproducible only on cygwin environment.
I've documented the root cause and solution here

Related

unable to save rdd on local filesystem on windows 10

I have a scala/spark program that is used to validate xmls file in an input directory and then writes the report to another input parameter (local filesystem path to write report to).
As per the requirements from stakeholders this program is to run on local machines hence I am using spark in local mode.
Till now things were fine, i was using the code below to save my report to a file
dataframe.repartition(1)
.write
.option("header", "true")
.mode("overwrite")
.csv(reportPath)
However this required winutils to be installed/configured on the machines running my program.
Given we use cloudera updates very often, there was an overhead of changing winutils after evry update as we would be updating the jars to the latest version in our pom file. Hence, I have been asked to remove the dependency on winutils
On a quick google search and after coming across How to save Spark RDD to local filesystem
I decided to change the above pice of code to
val outputRdd = dataframe.rdd
val count = outputRdd.count()
println("\nCount is: " + count + "\n")
println("\nOutput path is: " + reportPath + "\n")
outputRdd.coalesce(1).saveAsTextFile(reportPath)
However, on running the code I am now getting this error
Count is: 15
Output path is: C:\\codingdir\\test\\report
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.mapred.JobContextImpl.<init>(Lorg/apache/hadoop/mapred/JobConf;Lorg/apache/hadoop/mapreduce/JobID;)V from class org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil
at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.createJobContext(SparkHadoopWriter.scala:178)
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:67)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1067)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1032)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:958)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:957)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1499)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1478)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1478)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1478)
at com.optus.dcoe.hawk.XmlParser$.delayedEndpoint$com$optus$dcoe$hawk$XmlParser$1(XmlParser.scala:120)
at com.optus.dcoe.hawk.XmlParser$delayedInit$body.apply(XmlParser.scala:16)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.optus.dcoe.hawk.XmlParser$.main(XmlParser.scala:16)
at com.optus.dcoe.hawk.XmlParser.main(XmlParser.scala)
I have tried changing the value of reportPath varible to
C:\codingdir\test\report
file://C:/codingdir/test/report
file://C:/codingdir/test/report
and other values as suggested on
Write RDD as textfile using Apache Spark
How to save Spark RDD to local filesystem
How to access local files in Spark on Windows?
and other links but I am still getting same error
I have found these articles about java.lang.IllegalAccessError but not sure how do i get around this error:
https://examples.javacodegeeks.com/java-basics/exceptions/java-lang-illegalaccesserror-how-to-resolve-illegal-access-error/
java.lang.IllegalAccessError: tried to access method
Can someone please help me in resolving this?
Env variable HADOOP_HOME pertaining to winutls has been removed.
winutils entry has been removed from PATH variable
I am using java 8 on windows 10 (all the users of the program would be on similar laptops)
Spark version is 2.4.0-cdh6.2.1
Finally found out the issue,
It was caused by some unwanted mapreduce related dependencies which have now been removed and I have moved to another error now

Getting error file not found for ProcessCenter_CaseManagerConfig.properties while running BPMGenerateUpgradeSchemaScripts.bat command

I am updating IBM BPM 8.6.0 to IBM Business Automation Workflow Version 18.0.0.2, after updating fix pack for IBAW when I am running below command I get an error.
BPMGenerateUpgradeSchemaScripts.bat -profileName Node1Profile -de ProcessCenter
Below is the error which is coming on running above command.
Unable to find the response file
C:\IBM\BPM\v8.6\profiles\Node1Profile\config\cells\PCCell1\ProcessCenter_CaseManagerConfig.properties
Unable to find the file C:\IBM\BPM\v8.6\profiles\Node1Profile\config\cells\PCCell1\ProcessCenter_CaseManagerConfig.properties, please run the command 'BPMConfig -update -profile deployment_manager_profile -de deployment_environment_name -caseConfigure' to collect the configuration information for the content data sources, please read the knowledge center for details.
java.io.FileNotFoundException: C:\IBM\BPM\v8.6\profiles\Node1Profile\config\cells\PCCell1\ProcessCenter_CaseManagerConfig.properties (The system cannot find the file specified.)
CWMCO6007E: The BPMGenerateUpgradeSchemaScripts command could not complete successfully. The following exception occurred :
Faild to initialize the CommonInfo. java.io.FileNotFoundException: C:\IBM\BPM\v8.6\profiles\Node1Profile\config\cells\PCCell1\ProcessCenter_CaseManagerConfig.properties (The system cannot find the file specified.)
The file command asked to run first in the above error is on 11 point in the upgrade guide, can some one please suggest whats wrong with this?

ElasticSearch + Twitter river class not found

I am trying to install twitter river plugin on elastic search. I have done it before on my mac without any issues, but this time I am trying to install on a Linux server and I keep getting this class not found exception. The exception comes when I use curl -XPUT command as shown in https://github.com/elasticsearch/elasticsearch-river-twitter/blob/master/README.md
Here's the exception I get :-
[2013-11-25 15:12:00,116][WARN ][river ] [Fasaud] failed to create river [twitter][my_twitter_river2]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [twitter]
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
at org.elasticsearch.river.RiversService.createRiver(RiversService.java:135)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:271)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:265)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.ClassNotFoundException: twitter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:73)
... 9 more
I have tried restarting, installing many many times and can't get it to work anyhow. I can also see that the jar files present in the plugin directories contain the class twitter but for some reason they're not available to elasticsearch execution.
This is a configuration and install issue.
Steps:
Check to see where elasticsearch is installed by running ps aux|grep elasticsearch (on linux) and that will tell which executable is running on that machine (look for a line that looks like this: (-Des.default.path.home=/usr/share/elasticsearch) or something similar.
If several elasticsearch instances running, kill them all and start the one you care about.
Ensure that only one instance of elasticsearch is running (if that is the intent) and, from step one above, go to the home path directory.
Ensure that the plugin is installed in this directory in elasticsearch/plugins dir.
Restart elasticsearch.
confirm your classpath contain a text file es-plugin.properties with content
plugin=your.package.YourRiverPlugin
missing of such a property file shows almost the same error.
for more info, please refer http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/

Oracle Data Integrator (ODI - v11.1.1.3) "unable to load language: beanshell" Error

Following an install of Eclipse 3.7.2 on my Ubuntu 12.04 development machine, I have been unable to execute any ODI packages/interfaces/procedures. On execution (for both simulated and actual runs), an error is thrown (java trace below). I am not sure if it's anything to do with the Eclipse install, but it seems likely. Does anyone have an idea how to fix this?
Also, when launching ODI from the terminal using 'bash odi', the following error is displayed in the terminal:
2013-08-15 14:43:46.162 ERROR Error during RuntimeClassLoader initialization. ODI will start without RuntimeClassLoader
Error output:
oracle.odi.core.exception.OdiRuntimeException: Error during Code Interpretor creation
at com.sunopsis.dwg.codeinterpretor.SnpCodeInterpretor.getInstance(SnpCodeInterpretor.java:209)
at com.sunopsis.dwg.codeinterpretor.SnpGeneratorSQLCIT.<init>(SnpGeneratorSQLCIT.java:300)
at com.sunopsis.graphical.dialog.SnpsDialogExecution.doPackageExecuter(SnpsDialogExecution.java:907)
at oracle.odi.ui.action.SnpsPopupActionExecuteHandler.actionPerformed(SnpsPopupActionExecuteHandler.java:68)
at oracle.odi.ui.SnpsActionControler.handleEvent(SnpsActionControler.java:75)
at oracle.ide.controller.IdeAction.performAction(IdeAction.java:529)
at oracle.ide.controller.IdeAction.actionPerformedImpl(IdeAction.java:884)
at oracle.ide.controller.IdeAction.actionPerformed(IdeAction.java:501)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
at javax.swing.AbstractButton.doClick(AbstractButton.java:357)
at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:809)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:850)
at java.awt.Component.processMouseEvent(Component.java:6297)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3275)
at java.awt.Component.processEvent(Component.java:6062)
at java.awt.Container.processEvent(Container.java:2039)
at java.awt.Component.dispatchEventImpl(Component.java:4660)
at java.awt.Container.dispatchEventImpl(Container.java:2097)
at java.awt.Component.dispatchEvent(Component.java:4488)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4575)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4236)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4166)
at java.awt.Container.dispatchEventImpl(Container.java:2083)
at java.awt.Window.dispatchEventImpl(Window.java:2489)
at java.awt.Component.dispatchEvent(Component.java:4488)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:674)
at java.awt.EventQueue.access$400(EventQueue.java:81)
at java.awt.EventQueue$2.run(EventQueue.java:633)
at java.awt.EventQueue$2.run(EventQueue.java:631)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
at java.awt.EventQueue$3.run(EventQueue.java:647)
at java.awt.EventQueue$3.run(EventQueue.java:645)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:644)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
Caused by: org.apache.bsf.BSFException: unable to load language: beanshell
at org.apache.bsf.BSFManager.loadScriptingEngine(BSFManager.java:718)
at com.sunopsis.dwg.codeinterpretor.SnpCodeInterpretor.loadEngine(SnpCodeInterpretor.java:85)
at com.sunopsis.dwg.codeinterpretor.SnpCodeInterpretor.<init>(SnpCodeInterpretor.java:75)
at com.sunopsis.dwg.codeinterpretor.SnpCodeInterpretor.getInstance(SnpCodeInterpretor.java:184)
... 45 more
After digging around for about a day on this issue, I brazenly tried running ODI as the root user on the off chance that this was a permissions issue. I started ODI from the command line (using 'bash odi') for greater verbosity, and it loaded without the error mentioned above. Something gave me the impression that this wasn't a permissions issue, but one related to the user settings.
To rectify the issue, I removed my user's odi settings folder (renaming it, for safety):
mv ~/.odi ~/.backup_odi
Then I started ODI from the terminal under my own user (i.e. not root) - there were no errors! None of my connections were available in the new settings folder though. This I fixed by closing ODI and entering the following:
cp ~/.backup_odi/oracledi/snps_login_work.xml ~/.odi/oracledi/
If anybody else encounters this issue, I hope you find this post quicker than it took me to fix it!
org.apache.bsf.BSFException: unable to load language: beanshell
The exception was thrown because bsh-2.Ob4.jar was not in the classpath and it is a dependent jar of bsf.jar

Configuring Teamcity's logging behaviour

I'm using Teamcity 5 for our CI environment. It's a great tool, but I've been struggling with one thing: the stdout_yyyyMMdd.log file in the \TeamCity\logs folder grows to ridiculous sizes. Is there a way to turn it off?
Places I've looked so far:
Jetbrains: Nothing on stdout;
Google for "tomcat stdout logs": the first few links don't really address the issue.
Edit:
At KIR's suggestion, I actually looked to see what's in stdout. It's the same exception message repeated over and over again:
[2010-12-01 08:57:21,268] WARN - jetbrains.buildServer.SERVER - java.io.FileNotFoundException: <...Path...>\.BuildServer\system\caches\search\_8p.prx (The system cannot find the file specified)
[2010-12-01 08:57:21,315] ERROR - erverSide.search.SearchService - SearchService.enqueueHistory
java.io.FileNotFoundException: <...Path...>\.BuildServer\system\caches\search\_8p.prx (The system cannot find the file specified)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(Unknown Source)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65)
at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:132)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599)
at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:314)
at jetbrains.buildServer.serverSide.search.SearchService.getIndexSearcher(SearchService.java:451)
at jetbrains.buildServer.serverSide.search.SearchService.enqueueHistory(SearchService.java:515)
at jetbrains.buildServer.serverSide.search.BackgroundIndexer.run(BackgroundIndexer.java:32)
at java.lang.Thread.run(Unknown Source)
Any idea what this file is?
If you're running TC on unix you could use logrotate: http://linuxcommand.org/man_pages/logrotate8.html (Obviously, this is a workaround but it should be effective.)
This guy has a windows equivalent that may do the trick too: http://www.datori.org/?p=7
Remove .BuildServer\system\caches\search directory and restart TeamCity. May be this would help.
The problem is caused by someone or something deleting the Lucene Index in Team city.
Everytime you hit a page after this it will log in stdout that it couldnt find the file.
If you clear out the whole folder which should be
%USERPROFILE%.BuildServer\system\caches\search\
See http://confluence.jetbrains.net/display/TCD5/TeamCity+Data+Directory for more information on where to find the folder.
And restart Teamcity it will recreate the index on start up and stop logging the error message.
Oh and search should start working again too.

Resources