In Cloudera Manager 5.4.7, I tried to distribute the Spark package using parcels, but it got stuck at 0% while activating.
The error message stated "Parcel not distributed but have active state ACTIVATING", so the parcel seems to be in an inconsistent state.
After searching for the error message, I followed the instructions here to use the Cloudera Manager API to force deactivation, but that did not work either (it returned HTTP 405 Method Not Allowed).
The API calls I tried are as follows:
http://master:7180/api/v10/clusters/myCluster/parcels/products/SPARK/versions/0.9.0-1.cdh4.6.0.p0.98 (this worked fine and returned a JSON object)
http://master:7180/api/v10/clusters/myCluster/parcels/products/SPARK/versions/0.9.0-1.cdh4.6.0.p0.98/commands/deactivate (this returned HTTP code 405)
Thanks for your help.
I've found the solution, in case anyone else is having the same problem. See the solution here. Basically, you should issue the request with curl -X POST against the Cloudera Manager API.
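For reference, a minimal sketch of what that looks like; the host, cluster name, and parcel version are the ones from my question above, and admin:admin is a placeholder for your own Cloudera Manager credentials:
# GET works for reading the parcel's state
curl -u admin:admin "http://master:7180/api/v10/clusters/myCluster/parcels/products/SPARK/versions/0.9.0-1.cdh4.6.0.p0.98"
# The deactivate command endpoint must be called with POST, otherwise it returns 405
curl -u admin:admin -X POST "http://master:7180/api/v10/clusters/myCluster/parcels/products/SPARK/versions/0.9.0-1.cdh4.6.0.p0.98/commands/deactivate"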
I recently upgraded HDP from 2.6.5 to 3.1.0, which runs YARN 3.1.0, and I can no longer kill applications from the YARN ResourceManager UI, using either the old (:8088/cluster/apps) or new (:8088/ui2/index.html#/yarn-apps/apps) version. I can still kill them from the shell on RHEL 7 with yarn app -kill {app-id}.
These applications are submitted via Livy. Here is my workflow:
Open the ResourceManager UI, open the application, click Settings, and choose Kill Application. Notice that 'Logged in as:' is set to UNKNOWN_USER:
Confirm that I want to kill the Application:
I get the following error in the UI:
Opening the console in Chrome, I see a 401 (Unauthorized) error.
If I try this from the old UI, I can expand the error message, and it shows the following:
{"RemoteException":{"exception":"AuthorizationException","message":"Unable to obtain user name, user not authenticated","javaClassName":"org.apache.hadoop.security.authorize.AuthorizationException"}}
I've read lots of posts and verified or changed several settings to try to fix this, with no luck. Here are some of the settings I checked or changed as a result of my research:
hadoop.http.filter.initializers=org.apache.hadoop.security.HttpCrossOriginFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter
hbase.security.authentication=simple
hbase.security.authorization=false
yarn.nodemanager.webapp.cross-origin.enabled=true
yarn.resourcemanager.webapp.cross-origin.enabled=true
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled=false
yarn.resourcemanager.webapp.ui-actions.enabled=true
yarn.timeline-service.http-authentication.simple.anonymous.allowed=true
yarn.timeline-service.http-authentication.type=simple
yarn.webapp.api-service.enable=true
yarn.webapp.ui2.enable=true
ranger.add-yarn-authorization=false
Some of these seem way off base to me, like the HBase settings, since I don't think they have anything to do with what I'm seeing. However, they worked for some users in other situations, so I wanted to try them.
Digging through the documentation, it seems you need to be authenticated before you can call the API. However, that same language was in the documentation for 2.6.5, the version of YARN I was running before, where this worked.
Hopefully someone can point me to documentation that more clearly outlines what I can do to resolve the issue.
Thanks in advance.
Hey, I know this isn't a solution (I'm experiencing the same issue post-upgrade), but I found that adding "?user.name=" (followed by a user name) to the end of the old ResourceManager URL will log you in as that user on both pages. I've found the old RM page to be the only way to kill jobs, though.
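For example (host and user name are placeholders, and this assumes the cluster uses simple/pseudo authentication rather than Kerberos):
http://rm-host:8088/cluster/apps?user.name=someuser
The same query parameter should also work against the standard ResourceManager REST API if you would rather kill the job from the command line; this is an assumption based on the documented Cluster Application State API, not something I have verified on HDP 3.1:
# Ask the RM to transition the application to the KILLED state
curl -X PUT -H "Content-Type: application/json" \
  -d '{"state":"KILLED"}' \
  "http://rm-host:8088/ws/v1/cluster/apps/<application-id>/state?user.name=someuser"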
In a secured Hadoop cluster, I am trying to access the Flink AM page and logs from YARN and am seeing the following error:
User %remoteUser are not authorized to view application %appID
It seems the cause is a lack of support for YARN ACLs on the Flink side.
How the code works
The message comes from the hadoop/yarn/server AppBlock class, which uses the ApplicationACLsManager class. This class performs the checks and refers to application info that was set in RMAppManager:
this.applicationACLsManager.addApplication(applicationId,
    submissionContext.getAMContainerSpec().getApplicationACLs());
AMContainerSpec is a ContainerLaunchContext, which has a protobuf implementation and is submitted from the framework side.
On the Flink side, this object is created in the AbstractYarnClusterDescriptor class, which (like the other Flink classes) never calls setApplicationACLs.
Question
Is there a way to bypass this, or is the right solution to contribute the support to Flink? What is the state of this feature on the Flink side?
This sounds like a limitation in Flink that we should fix. Please open a JIRA issue. The community would be very happy if you could help implement it.
I set up Apache OpenWhisk locally following this guide: http://jamesthom.as/blog/2018/01/19/starting-openwhisk-in-sixty-seconds/. In general it seems to work correctly, but whenever I try to execute any commands related to the API, e.g.
wsk -i api list
it gives me an error:
Unable to obtain the API list: The requested resource does not exist. (code 153)
Any idea how to fix this?
Unfortunately, this is a temporary issue with the docker-compose setup, and work is in progress to fix it.
I am trying to use H2O Steam (running on localhost) to deploy a model. After importing the model from H2O Flow, clicking the "deploy model" option in the "models" section of the project, filling out the resulting dialog box, and clicking the "deploy" button, the following messages are displayed:
At first I thought it was because I needed to start the service builder myself, so I started it following the docs here, but I still got the same error. Any suggestions would be appreciated. Thanks :)
Just make sure the Jetty HTTP server is running locally by executing the following in your shell:
java -jar var/master/assets/jetty-runner.jar var/master/assets/ROOT.war
Looking here, it seems like I would need to "override" some kind of default restriction on accessing localhost:8080, which I assume is what Steam is trying to do in order to launch the service builder (I don't know much about networking-related stuff). I got around this by launching Steam with the command:
$ ./steam serve master --prediction-service-host=localhost --prediction-service-port-range=12345:22345
where the ports are an arbitrary range within (1025, 65535), which I got by word-searching a page of the Steam source code (line 182 as of the date of this posting).
Doing this lets me deploy the models through the Steam dialog without any error messages. Again, I don't know much about networking-related stuff, so if anyone has a better way to solve this problem (i.e. allowing access to localhost:8080), please post or comment. Thanks.
I'm currently trying to deploy Eremetic (version 0.28.0) on top of Marathon using the configuration provided as an example. I was actually able to deploy it once, but after trying to redeploy it, the framework now stays inactive.
Inspecting the logs, I noticed constant attempts to connect to some service, which apparently never succeed because of an authentication problem.
2017/08/14 12:30:45 Connected to [REDACTED_MESOS_MASTER_ADDRESS]
2017/08/14 12:30:45 Authentication failed: EOF
It looks like the service returning the error is ZooKeeper, and more precisely the error can be traced back to this line in the Go ZooKeeper library. ZooKeeper itself, however, seems to work: I have queried it directly with zkCli and run a small Spark job (where the Mesos master is given as a zk:// URL), and everything works.
Unfortunately I'm not able to diagnose the problem further. What could it be?
It turned out to be a configuration problem. The master URL was simply wrong, and this is how the error surfaced.
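For anyone hitting the same symptom, it is worth double-checking that the master URL is well formed; a typical Mesos master URL resolved through ZooKeeper looks like the line below (zk1/zk2/zk3 are placeholder hostnames and /mesos is the default znode; how you pass it depends on your Eremetic configuration):
zk://zk1:2181,zk2:2181,zk3:2181/mesos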