I am on Windows without Cygwin and I have a unit test for Mahout. The test starts a Hadoop job, and during it I get the following exception:
Jul 9, 2013 5:21:23 AM org.apache.hadoop.util.NativeCodeLoader
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Jul 9, 2013 5:21:23 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Build Clusters Input: file:/tmp/mahout1-TestClusterDumper-3279087666375853056/testdata Out: file:/tmp/mahout1-TestClusterDumper-3279087666375853056/output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure#62c8769b t1: 8.0 t2: 4.0
Jul 9, 2013 5:21:24 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Input: file:/tmp/mahout1-TestClusterDumper-3279087666375853056/testdata Clusters In: file:/tmp/mahout1-TestClusterDumper-3279087666375853056/output/clusters-0-final Out: file:/tmp/mahout1-TestClusterDumper-3279087666375853056/output/kmeans Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
Jul 9, 2013 5:21:24 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: convergence: 0.0010 max Iterations: 10
java.io.IOException: Failed to set permissions of path: C:\Users\Administrator\Desktop\mahout\mahout\integration\target\mahout-TestClusterDumper-5458229048736903168\hadoop0.5515906057710666\mapred\staging\Administrator-585933322.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
...
Jul 9, 2013 5:21:24 AM org.apache.hadoop.security.UserGroupInformation doAs
SEVERE: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: C:\Users\Administrator\Desktop\mahout\mahout\integration\target\mahout-TestClusterDumper-5458229048736903168\hadoop0.5515906057710666\mapred\staging\Administrator-585933322.staging to 0700
I can't use cygwin!
Windows and Hadoop don't really get along; if I remember correctly, the problem lies more with how Hadoop tries to set file permissions through the JVM. This has been a known Hadoop issue in the releases after 0.22.0, although I'm not sure whether it has been fixed in the most recent versions.
There is (or was) a workaround, but when I ran into this issue myself I remember it being quite convoluted.
I suggest using a Linux (virtual) machine to run Mahout with Hadoop.
Oh! There's a Hadoop ticket (link) discussing the issue.
I also found this (link).
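If you do switch to a Linux (virtual) machine, re-running just this test there should be straightforward. A minimal sketch, assuming a Maven checkout of the Mahout sources (the integration module and the test class name are taken from the paths in your log):

# From the root of the Mahout source tree on the Linux machine
mvn -pl integration test -Dtest=TestClusterDumper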
I have a job which takes around 1-2 minutes to finish, but when I try to run it through the command line it just runs forever and never finishes. I don't seem to get any errors from this either, so the job does appear to start, and I know the job itself is correct since it works within Spoon. Any ideas?
C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration> Kitchen.bat
/file:C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration\job.kjb
/level:Minimal
DEBUG: Using PENTAHO_JAVA_HOME
DEBUG: _PENTAHO_JAVA_HOME=C:\Program Files\Java\jre1.8.0_231
DEBUG: _PENTAHO_JAVA=C:\Program Files\Java\jre1.8.0_231\bin\java.exe
C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration>"C:\Program
Files\Java\jre1.8.0_231\bin\java.exe" "-Xms1024m" "-Xmx2048m"
"-XX:MaxPermSize=256m" "-Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2"
"-Djava.library.path=libswt\win64" "-DKETTLE_HOME="
"-DKETTLE_REPOSITORY=" "-DKETTLE_USER=" "-DKETTLE_PASSWORD="
"-DKETTLE_PLUGIN_PACKAGES=" "-DKETTLE_LOG_SIZE_LIMIT="
"-DKETTLE_JNDI_ROOT=" -jar launcher\launcher.jar -lib ..\libswt\win64
-main org.pentaho.di.kitchen.Kitchen -initialDir "C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration"\
/file:C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration\job.kjb
/level:Minimal
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
13:58:07,867 INFO [KarafBoot] Checking to see if org.pentaho.clean.karaf.cache is enabled
13:58:12,006 INFO [KarafInstance]
* Karaf Instance Number: 2 at C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration.\system\karaf\caches\kitchen\data-1 *
* FastBin Provider Port:52902 *
* Karaf Port:8803 *
* OSGI Service Port:9052 *
*******************************************************************************
Dec 19, 2019 1:58:12 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
2019/12/19 13:58:12 - Kitchen - Logging is at level : Minimal
2019/12/19 13:58:12 - Kitchen - Start of run.
2019-12-19 13:58:15.902:INFO:oejs.Server:jetty-8.1.15.v20140411
2019-12-19 13:58:15.955:INFO:oejs.AbstractConnector:Started NIOSocketConnectorWrapper#0.0.0.0:9052
Dec 19, 2019 1:58:16 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-management (182) [org.apache.cxf.management.InstrumentationManager]
Dec 19, 2019 1:58:16 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-transports-http (183) [org.apache.cxf.transport.http.HTTPTransportFactory, org.apache.cxf.transport.http.HTTPWSDLExtensionLoader, org.apache.cxf.transport.http.policy.HTTPClientAssertionBuilder, org.apache.cxf.transport.http.policy.HTTPServerAssertionBuilder, org.apache.cxf.transport.http.policy.NoOpPolicyInterceptorProvider]
Dec 19, 2019 1:58:16 PM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess
INFO: New Caching Service registered
2019/12/19 13:58:17 - job - Start of job execution
Dec 19, 2019 1:58:18 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be /lineage
Dec 19, 2019 1:58:18 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be /i18n
Dec 19, 2019 1:58:19 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be /marketplace
Update
I tried deleting the kitchen cache from the Karaf cache and running again, but the job still never finished. I am now running the job at the Debug log level and getting the results below; the job doesn't get any further than this. The job works in Spoon, so it cannot be related to the job itself.
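For reference, the cache I cleared is the Karaf "kitchen" cache directory; the path below is pieced together from the KarafInstance line in the log above, so treat the exact command as a sketch:

rem Run with PDI/Spoon closed; Kitchen rebuilds the cache on the next start
rd /s /q "C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration\system\karaf\caches\kitchen"

The Debug-level run after clearing that cache: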
C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration>kitchen.bat
/file:C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration\Job.kjb
/level:Debug
DEBUG: Using PENTAHO_JAVA_HOME
DEBUG: _PENTAHO_JAVA_HOME=C:\Program Files\Java\jre1.8.0_231
DEBUG: _PENTAHO_JAVA=C:\Program Files\Java\jre1.8.0_231\bin\java.exe
C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration>"C:\Program
Files\Java\jre1.8.0_231\bin\java.exe" "-Xms1024m" "-Xmx2048m"
"-XX:MaxPermSize=256m" "-Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2"
"-Djava.library.path=libswt\win64" "-DKETTLE_HOME="
"-DKETTLE_REPOSITORY=" "-DKETTLE_USER=" "-DKETTLE_PASSWORD="
"-DKETTLE_PLUGIN_PACKAGES=" "-DKETTLE_LOG_SIZE_LIMIT="
"-DKETTLE_JNDI_ROOT=" -jar launcher\launcher.jar -lib ..\libswt\win64
-main org.pentaho.di.kitchen.Kitchen -initialDir "C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration"\
/file:C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration\Job.kjb
/level:Debug
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
MaxPermSize=256m; support was removed in 8.0
08:07:33,026 INFO [KarafBoot] Checking to see if
org.pentaho.clean.karaf.cache is enabled
08:07:37,211 INFO [KarafInstance]
* Karaf Instance Number: 1 at C:\Users\a\Downloads\pdi-ce-8.3.0.0- *
* 371\data-integration.\system\karaf\caches\kitchen\data-1 *
* FastBin Provider Port:52901 *
* Karaf Port:8802 *
* OSGI Service Port:9051 *
Dec 23, 2019 8:07:38 AM org.apache.karaf.main.Main$KarafLockCallback
lockAquired
INFO: Lock acquired. Setting startlevel to 100
2019/12/23 08:07:38 - Kitchen - Logging is at level : Debug
2019/12/23 08:07:38 - Kitchen - Start of run.
2019/12/23 08:07:38 - Kitchen - Allocate new job.
2019/12/23 08:07:38 - Kitchen - Parsing command line options.
2019-12-23 08:07:43.475:INFO:oejs.Server:jetty-8.1.15.v20140411
2019-12-23 08:07:43.538:INFO:oejs.AbstractConnector:Started
NIOSocketConnectorWrapper#0.0.0.0:9051
Dec 23, 2019 8:07:43 AM
org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
INFO: Adding the extensions from bundle
org.apache.cxf.cxf-rt-management (182)
[org.apache.cxf.management.InstrumentationManager]
Dec 23, 2019 8:07:43 AM
org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
INFO: Adding the extensions from bundle
org.apache.cxf.cxf-rt-transports-http (183)
[org.apache.cxf.transport.http.HTTPTransportFactory,
org.apache.cxf.transport.http.HTTPWSDLExtensionLoader,
org.apache.cxf.transport.http.policy.HTTPClientAssertionBuilder,
org.apache.cxf.transport.http.policy.HTTPServerAssertionBuilder,
org.apache.cxf.transport.http.policy.NoOpPolicyInterceptorProvider]
Dec 23, 2019 8:07:44 AM
org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1
onSuccess
INFO: New Caching Service registered
2019/12/23 08:07:45 - Job - Start of job execution
2019/12/23 08:07:45 - Job - exec(0, 0, START.0)
2019/12/23 08:07:45 - START - Starting job entry
2019/12/23 08:07:45 - Job - Job
Dec 23, 2019 8:07:46 AM org.apache.cxf.endpoint.ServerImpl
initDestination
INFO: Setting the server's publish address to be /lineage
Dec 23, 2019 8:07:47 AM org.apache.cxf.endpoint.ServerImpl
initDestination
INFO: Setting the server's publish address to be /i18n
Dec 23, 2019 8:07:48 AM org.apache.cxf.endpoint.ServerImpl
initDestination
INFO: Setting the server's publish address to be /marketplace
2019/12/23 08:07:55 - Job - Triggering heartbeat signal
for Job at every 10 seconds
Something deeper must have been corrupted, as I deleted all files, downloaded the latest version, and it worked.
To run a job from the command line, you have to run the command below (kitchen.sh runs .kjb jobs; .ktr transformations are run with pan.sh instead):
/path/to/kitchen.sh -file="/path/to/job.kjb" -level=Debug >> "log.txt"
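For a Windows install like the one in the question above, the equivalent would be (a sketch, reusing the paths from that question):

Kitchen.bat /file:"C:\Users\a\Downloads\pdi-ce-8.3.0.0-371\data-integration\Job.kjb" /level:Debug >> "log.txt"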
I have been facing an issue for the past few days.
If I use Hadoop 2.7.4, I am not able to start the NodeManagers on the slaves, and because of that I can't run any MapReduce jobs.
When I use Hadoop 2.8.2, everything starts fine (start-dfs.sh and start-yarn.sh) and I am able to see the content and node activity at http://:50070, but while running my programs that use data from HDFS, they keep displaying
17/11/15 12:51:46 WARN hdfs.DFSClient: zero
but the job runs fine. I don't know what causes this warning or what it affects, so I tried 2.7.3.
With 2.7.3 I don't get the above warning, but YARN does not start up completely: when I start it, it prints more lines on screen than usual before it starts. The main issue in this case is that I can't watch what is going on in Hadoop from the web URL, as it doesn't display anything on the web except the files present in HDFS (so I am only able to see the data from Utilities --> Browse files).
The error is similar to this:
starting yarn daemons
starting resourcemanager, logging to
/usr/local/hadoop/logs/yarn--resourcemanager-hadoop-master.out
Nov 20, 2017 8:01:28 AM
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
register
INFO: Registering
org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver
as a provider class
Nov 20, 2017 8:01:28 AM
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
register
INFO: Registering
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices as
a root resource class
Nov 20, 2017 8:01:28 AM
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
register
INFO: Registering
org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider
class
Nov 20, 2017 8:01:28 AM
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011
11:17 AM'
Nov 20, 2017 8:01:28 AM
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
getComponentProvider
INFO: Binding
org.apache.hadoop.yarn.server.resourcemanager.webapp.JAXBContextResolver
to GuiceManagedComponentProvider with the scope "Singleton"
hadoop-slave1: Warning: Permanently added 'hadoop-slave1,10.40.0.0'
(ECDSA) to the list of known hosts.
hadoop-slave1: starting nodemanager, logging to
/usr/local/hadoop/logs/yarn-root-nodemanager-hadoop-slave1.weave.local.out
Any idea how to set it up completely without any issue?
I'm new to Spark. When I install Spark on Cloudera, this error appears:
Error: Could not find or load main class org.apache.spark.deploy.history.HistoryServer
The last lines of the log file are:
/user/spark/applicationHistory
Thu Sep 10 23:43:50 CST 2015
Thu Sep 10 23:43:50 CST 2015: Detected CDH_VERSION of [5]
Thu Sep 10 23:43:50 CST 2015: Starting Spark History Server
Error: Could not find or load main class org.apache.spark.deploy.history.HistoryServer
My Cloudera Manager version is 5.2.5 and my CDH version is 5.0.0.
I have installed YARN, HDFS and ZooKeeper, and they work well so far.
And there is a Spark assembly jar (spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar) at:
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/assembly/lib/
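(For reference, one way to check whether the HistoryServer class is actually present in that assembly jar, a diagnostic sketch using the jar and path listed above:

jar tf /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/assembly/lib/spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar | grep HistoryServer
)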
Thanks!
I am trying to connect to a sqoop server on localhost:
sqoop:000> set server --host manager --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> show version -all
client version:
Sqoop 1.99.6 source revision 07244c3915975f26f03d9e1edf09ab7d06619bb8
Compiled by root on Wed Apr 29 10:40:43 CST 2015
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0000:An unknown error has occurred
sqoop:000>
Port 12000 is closed
$ netstat -na|grep 12000
Why is this happening?
The Hadoop libraries need to be configured in a file named catalina.properties inside the server/conf directory. In this file, you set the path to the Hadoop libraries in the common.loader property. The defaults are /usr/lib/hadoop and /usr/lib/hadoop/lib. If your Hadoop libraries live somewhere else, point this property at that directory instead.
sqoop2-tool verify can be used to verify the Sqoop server configuration. If it succeeds, you can start the server with sqoop2-server start.
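As a sketch (assuming a tarball install of Sqoop 1.99.6 under /path/to/sqoop; adjust the paths to your layout):

# Show the class-loader paths the embedded Tomcat picks up
grep '^common.loader' /path/to/sqoop/server/conf/catalina.properties
# If your Hadoop jars are not under /usr/lib/hadoop, append their directories
# (e.g. /opt/hadoop/*.jar,/opt/hadoop/lib/*.jar) to that common.loader line.

# Then verify the configuration and start the server
/path/to/sqoop/bin/sqoop2-tool verify
/path/to/sqoop/bin/sqoop2-server start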
Ref:
https://sqoop.apache.org/docs/1.99.6/Installation.html
I'm installing TeamCity on EC2, starting with the server and then moving on to the agents. I'm starting with the Amazon Linux AMI, running on a micro instance. Then I did:
sudo yum update
wget http://download.jetbrains.com/teamcity/TeamCity-7.1.1.tar.gz
tar -xvzf TeamCity-7.1.1.tar.gz
cd TeamCity
bin/teamcity-server.sh start
When I start it using bin/teamcity-server.sh start, things look fine at first: I can connect with a web browser, which shows the 'TeamCity is starting' page, and teamcity-server.log shows a bunch of activity, unzipping plugins and so on.
But then suddenly the server process just disappears. The port is no longer being listened on, ps shows no Java process running, and the browser can't connect.
There are no error messages in the Catalina or TeamCity logs. After much trial and error, though, I ran bin/teamcity-server.sh run (instead of start) to get console output, and got the following:
Using CATALINA_BASE: /home/ec2-user/TeamCity
Using CATALINA_HOME: /home/ec2-user/TeamCity
Using CATALINA_TMPDIR: /home/ec2-user/TeamCity/temp
Using JRE_HOME: /usr/lib/jvm/jre
Using CLASSPATH: /home/ec2-user/TeamCity/bin/bootstrap.jar:/home/ec2-user/TeamCity/bin/tomcat-juli.jar
Nov 1, 2012 7:22:25 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Nov 1, 2012 7:22:26 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8111"]
Nov 1, 2012 7:22:26 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 2742 ms
Nov 1, 2012 7:22:26 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Nov 1, 2012 7:22:26 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.23
Nov 1, 2012 7:22:26 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/ec2-user/TeamCity/webapps/ROOT
Log4J configuration file /home/ec2-user/TeamCity/bin/../conf/teamcity-server-log4j.xml will be monitored with interval 10 seconds.
Nov 1, 2012 7:22:30 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8111"]
Nov 1, 2012 7:22:30 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 3786 ms
=======================================================================
TeamCity 7.1.1 (build 24074) initialized, OS: Linux, JRE: 1.6.0_24-b24
TeamCity is running in professional mode
bin/teamcity-server.sh: line 18: 4231 Killed ./catalina.sh $1
I promise that I did not kill the process! I can find my way around in Linux well enough, but I'm not at all sure where to go next to find out why or what killed the process. Can anyone help?
After some further scanning of the .sh files to see how TeamCity was starting itself up, I noticed that it grabs a fair amount of memory for its Java process (either 512m or 750m, depending on which line you look at).
The EC2 micro instance only has 613 MB of RAM in total. Once I realized this, I tried the whole process again on a larger instance, and everything worked fine.
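If a bigger instance is not an option, one alternative might be to cap the server's memory before starting it. This is only a sketch (I have not tried it on the micro instance, and the heap values are guesses); if I remember correctly, teamcity-server.sh uses the TEAMCITY_SERVER_MEM_OPTS environment variable instead of its built-in defaults when it is set:

# Cap the JVM memory so it fits within the ~613 MB of a micro instance (values are guesses)
export TEAMCITY_SERVER_MEM_OPTS="-Xms256m -Xmx384m -XX:MaxPermSize=150m"
bin/teamcity-server.sh start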
I'm still curious if there's a better way I could've known what was causing catalina to die, so if anyone wants to answer with that information...