I have installed a 3-node DSX cluster on RHEL 7.4, and all notebooks and RStudio code work fine. However, model creation gives this error:
Load Data
Error: The provided kernel id was not found. Verify the input spark service credentials
All Kubernetes pods seem to be up and running. Any ideas on how to fix this?
If you are on the September release, I suggest stopping the kernels and restarting. There was a limit of 10 kernels in that release. You will see the active green button across notebooks/models with the option to stop.
I am trying to execute my Databricks notebook from ADF through a linked service configured with the instance pool type of connection. I have also added the wheel library under the Append libraries option in ADF, but I am unable to execute the notebook via ADF and get the error below.
Run result unavailable: job failed with error message Library installation failed for library due to user error for whl:
"dbfs:/FileStore/jars/xxxxxxxxxxxxxxxxxxxx/prophet-1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
. Error messages: Library installation attempted on the driver node of
cluster 1129-161441-xwjfzl6k and failed. Please refer to the following
error message to fix the library or contact Databricks support. Error
Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message:
org.apache.spark.SparkException: Process List(bash,
/local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh,
/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip, install,
--upgrade, --find-links=/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages,
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/prophet-1.1-cp38-cp38-manylinux_2_17_x86_6
... *WARNING: message truncated. Skipped 195 bytes of output
Kindly help us. Also, in the linked service there are three options under Select cluster:
1. New job cluster
2. Existing interactive cluster
3. Existing instance pool
From a production perspective, which is best? We do not have any job created in Databricks; the plan is for the notebook to be triggered from ADF and run successfully. Please advise.
Make sure you install the wheel onto the interactive cluster (option 2). This has nothing to do with Azure Databricks.
Installing local .whl files on Databricks cluster
See the above article for details.
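As a rough illustration of that step (this is a Databricks notebook cell, not a plain Python script; the folder under /FileStore/jars/ is a placeholder for whatever path your own upload created), installing the wheel from a cell attached to the interactive cluster could look like this:

    # Databricks notebook cell, run while attached to the existing interactive
    # cluster (option 2). The folder name below is a placeholder; use the DBFS
    # path of your actual upload. The wheel's cp38 tag must also be compatible
    # with the Python version running on the cluster.
    %pip install /dbfs/FileStore/jars/<upload-folder>/prophet-1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl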
Karthik, from the error it is complaining about the library. This is what I would have done:
Cross-check and make sure that ADF is pointing to the correct cluster.
If the cluster is correct, go to that cluster, open the notebook you are trying to reference from ADF, and try to execute it there (a quick check is sketched below this answer).
If the notebook works fine, stop the cluster, restart it, and run the notebook again.
My guess is that once the cluster goes idle and shuts down, when ADF starts the cluster again it can no longer find the library it needs.
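One quick way to confirm the library is actually present on the cluster the notebook runs on is to import it from a cell before wiring things back into ADF. A minimal check (assuming the package name is prophet, as in the wheel above) could look like this:

    # Quick sanity check, run in a notebook cell on the cluster ADF targets:
    # if the import fails, the wheel is not installed there (or was lost when
    # the cluster restarted).
    try:
        import prophet  # noqa: F401
        from importlib.metadata import version
        print("prophet is installed, version:", version("prophet"))
    except ImportError as exc:
        print("prophet is NOT available on this cluster:", exc)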
I have set up a Hadoop cluster with Hortonworks Data Platform 2.5. I'm using 1 master and 5 slave (worker) nodes.
Every few days one (or more) of my worker nodes gets a high load and seems to restart the whole CentOS operating system automatically. After the restart the Hadoop components don't run anymore and have to be restarted manually via the Ambari management UI.
Here is a screenshot of the "crashed" node (rebooted after the high load value ~4 hours ago):
Here is a screenshot of one of the other "healthy" worker nodes (all other workers have similar values):
The crashes alternate among the 5 worker nodes; the master node seems to run without problems.
What could cause this problem? Where are these high load values coming from?
This seems to be a Kernel problem, as the log file (e.g. /var/spool/abrt/vmcore-127.0.0.1-2017-06-26-12:27:34/backtrace) says something like
Version: 3.10.0-327.el7.x86_64
BUG: unable to handle kernel NULL pointer dereference at 00000000000001a0
After running sudo yum update I now have kernel version
[root@myhost ~]# uname -r
3.10.0-514.26.2.el7.x86_64
Since the operating system update, the problem hasn't occurred anymore. I will keep observing the issue and give feedback if necessary.
I am trying to configure Ambari on a single node. I am at the Confirm Hosts phase. It has been showing "Installing" for more than 30 minutes. I am running Ubuntu 14.04 64-bit on VirtualBox (RAM = 4 GB). Is this normal?
It shouldn't take 30 minutes to register a single host. If you click the Installing link under the status column it will drill down into the log of what registration is doing. This may provide more details on what's going wrong with the registration process.
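If you prefer the command line over the UI link, the agent-side registration log usually lives at /var/log/ambari-agent/ambari-agent.log (an assumption based on the default location; adjust if your install differs). A small sketch to dump its tail:

    from collections import deque

    # Assumption: default Ambari agent log location on the host being registered.
    LOG = "/var/log/ambari-agent/ambari-agent.log"

    with open(LOG, errors="replace") as f:
        # deque with maxlen keeps only the last 40 lines of the file.
        for line in deque(f, maxlen=40):
            print(line.rstrip())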
I set up a new Hadoop cluster with Hortonworks Data Platform 2.5. In the "old" cluster (installed with HDP 2.4) I was able to see the information about running Spark jobs via the History Server UI by clicking the link "Show incomplete applications":
Within the new installation this link opens the page, but it always says "No incomplete applications found!" (even when there is still an application running).
I just saw that the YARN ResourceManager UI shows two different kinds of links in the "Tracking UI" column, depending on the status of the Spark application:
application running: Application Master
this link opens http://master_url:8088/proxy/application_1480327991583_0010/
application finished: History
this link opens http://master_url:18080/history/application_1480327991583_0009/jobs/
Via the YARN RM link I can see the running Spark app info, but why can't I access it via the Spark History Server UI? Was something changed from HDP 2.4 to 2.5?
I solved it; it was a network problem: some of the cluster hosts (Spark slaves) couldn't reach each other due to an incorrect switch configuration. I found this out when I tried to ping each host from every other host.
Now that all hosts can ping each other, the problem is gone and I can see active and finished jobs in my Spark History Server UI again!
I hadn't noticed the problem earlier because the ambari-agent worked on each host and the ambari-server was also reachable from every cluster host. However, now that ALL hosts can reach each other, the problem is solved!
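For anyone debugging the same symptom, a small reachability check run on each node can surface this kind of switch problem quickly; the hostnames below are placeholders for your own master/workers:

    import subprocess

    # Placeholder hostnames: replace with your own cluster node names.
    HOSTS = ["master", "worker1", "worker2", "worker3", "worker4", "worker5"]

    for host in HOSTS:
        # One ICMP echo request with a 2-second timeout (Linux ping syntax).
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        status = "reachable" if result.returncode == 0 else "NOT reachable"
        print(f"{host}: {status}")

Run it on every node in turn; any "NOT reachable" entry points at the host pair whose link is broken.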
I installed the DataStax Community version on an EC2 server and it worked fine. After that I tried to add one more server; I see two nodes in the Nodes menu, but in the main dashboard I see the following error:
Error: Call to /Test_Cluster__No_AMI_Parameters/rc/dashboard_presets/ timed out.
One potential root cause I can see is the name of the cluster: I specified something else in cassandra.yaml, but it looks like OpsCenter is still using the original name. Any help would be greatly appreciated.
It was because the cluster name change wasn't made properly. I found it easier to change the cluster name before starting the Cassandra cluster. On top of this, only one instance of opscenterd needs to run in a single cluster. The datastax-agent needs to be running on all nodes in the cluster, but they all need to point to the same opscenterd (the change needs to be made in /var/lib/datastax-agent/conf/address.yaml).
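As a sketch of that last step (assumptions: PyYAML is available, the agent config sits at the default path, and the file uses the standard stomp_interface key that tells the agent which opscenterd to contact), pointing an agent at the single opscenterd host could be scripted like this on each node:

    import yaml  # assumption: PyYAML is installed on the node

    # Assumptions: default datastax-agent config path and the standard
    # stomp_interface key; the IP below is a placeholder for the host
    # running the single opscenterd instance.
    ADDRESS_YAML = "/var/lib/datastax-agent/conf/address.yaml"
    OPSCENTERD_IP = "10.0.0.10"

    with open(ADDRESS_YAML) as f:
        conf = yaml.safe_load(f) or {}

    conf["stomp_interface"] = OPSCENTERD_IP

    with open(ADDRESS_YAML, "w") as f:
        yaml.safe_dump(conf, f, default_flow_style=False)

    print("updated", ADDRESS_YAML, "->", conf["stomp_interface"])

Restart the datastax-agent on each node after the change so it reconnects to the right opscenterd.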