What is Spark's default username and password for a standalone installation? - oracle

I am trying to connect Spark to Oracle Analytics Cloud (OAC). I have a standalone Spark (3.1.2) installation with Hadoop (2.7) on my Windows VM. The connection requires a username, password, host, and port. Can you please provide the default username and password for a Spark standalone installation along with the port, or do I need to configure a Spark cluster to have this sort of information? Thanks.
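For reference, a standalone Spark master ships with no username/password authentication at all (spark.authenticate defaults to false), and its default ports are 7077 for the cluster endpoint and 8080 for the web UI. A minimal sketch of starting it on Windows, assuming you run from SPARK_HOME and that <vm-host> stands in for the VM's hostname or IP (the .sh launcher scripts are not available on Windows, so spark-class.cmd is used directly):
REM Start the standalone master on the default ports (7077 for spark://, 8080 for the web UI)
bin\spark-class.cmd org.apache.spark.deploy.master.Master --host <vm-host> --port 7077 --webui-port 8080
REM Start a worker that registers with that master
bin\spark-class.cmd org.apache.spark.deploy.worker.Worker spark://<vm-host>:7077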

Related

Permission when setup Hadoop Cluster in Pentaho

I'm trying to set up a new Hadoop cluster in Pentaho 9.3, but I get a permission error.
It asks for a username and password for HDFS, but I don't know how to create a user and password for HDFS.
Hadoop, by default, uses regular OS user accounts. On Linux, you'd create one with the useradd command.
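A minimal sketch of what that looks like, assuming the account should be called pentaho (use whatever user Pentaho will connect as):
sudo useradd pentaho
sudo passwd pentaho
# Give the new user a home directory in HDFS (run as the hdfs superuser)
sudo -u hdfs hdfs dfs -mkdir -p /user/pentaho
sudo -u hdfs hdfs dfs -chown pentaho:pentaho /user/pentaho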

Configure hadoop-client to connect to hadoop in other machine/server

On server A I have Hadoop and Python scripts for performing tasks on Hadoop.
On server B I have Hive/Hadoop.
Is it possible to configure a Hadoop client on server A so that it connects to the Hadoop installation on server B?
It's not clear what Python library you are using, but assuming PySpark, you can copy or configure the HADOOP_CONF_DIR on your client machine, and it can communicate with any external Hadoop system.
At the very least, you'll need to configure a core-site.xml to communicate with HDFS and a hive-site.xml to communicate with Hive.
If you are using the PyHive library, you just connect to user@hiveserver2:10000 (HiveServer2's default port).
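A rough sketch of the client-side setup described above, assuming server B is reachable as serverB (hypothetical hostname), the config files live in the usual /etc locations, and the Hadoop client packages are already installed on server A:
# Copy server B's client configs to server A
mkdir -p /etc/hadoop/conf
scp serverB:/etc/hadoop/conf/core-site.xml serverB:/etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/
scp serverB:/etc/hive/conf/hive-site.xml /etc/hadoop/conf/
# Point the client (and PySpark) at them
export HADOOP_CONF_DIR=/etc/hadoop/conf
hdfs dfs -ls /    # should now list server B's HDFS, not a local filesystem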

Installing Hive 2.1.0 Interactive Query (LLAP) on Kerberized HDP 2.6.2 environment

I had a lot of issues surrounding the installation/activation of Hive 2.1.0 on our HDP 2.6.2 cluster. But finally I got it working, so I wanted to share the steps involved with the community. I got these steps from different sources, which I will also mention below each step. My specifications:
Clustered HDP 2.6.2 (hortonworks) environment
Kerberos
Hive 1.2.1000 -> Hive 2.1.0
Step 1: Enable Hive Interactive Query
Follow the steps on the Hortonworks website. This includes enabling YARN pre-emption and some other YARN settings. After adjusting YARN you can enable Hive Interactive Query via Ambari. You also have to specify a default queue that is at least 20% of your total cluster capacity.
Source
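For reference, the YARN side of that step boils down to properties along these lines (the llap queue name and the value 20 are assumptions that mirror the text above; the exact values come from the Hortonworks guide):
# yarn-site.xml: enable pre-emption
yarn.resourcemanager.scheduler.monitor.enable=true
# capacity-scheduler.xml: give the interactive queue at least 20% of cluster capacity
yarn.scheduler.capacity.root.llap.capacity=20
# hiveserver2-interactive: run LLAP in that queue
hive.llap.daemon.queue.name=llap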
Step 2: Kerberos related settings
Make sure you add the following settings to the custom hiveserver2-interactive site in Ambari, where ${REALMNAME} is the name of your Kerberos realm.
hive.llap.zk.sm.keytab.file=/etc/security/keytabs/hive.llap.zk.sm.keytab
hive.llap.zk.sm.principal=hive/_HOST@${REALMNAME}
hive.llap.daemon.keytab.file=/etc/security/keytabs/hive.service.keytab
hive.llap.daemon.service.principal=hive/_HOST@${REALMNAME}
Now you have to put those two keytabs (basically the same keytabs) on every YARN node. This can be done manually or through Ambari (Kerberos service). Make sure those keytabs are owned by hive:hadoop (chown) and have mode 440 (owner and group read).
Note: you also need a hive user on all those nodes.
Source
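Spelled out as commands, the keytab distribution on each YARN node looks roughly like this (paths follow the properties above; the hadoop group is assumed from the chown target):
# Run on every YARN node
id hive || useradd -g hadoop hive    # the hive user must exist locally
chown hive:hadoop /etc/security/keytabs/hive.llap.zk.sm.keytab /etc/security/keytabs/hive.service.keytab
chmod 440 /etc/security/keytabs/hive.llap.zk.sm.keytab /etc/security/keytabs/hive.service.keytab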
Step 3: Zookeeper configuration
It could be that Hive is not recognized by ZooKeeper, which will give ACL errors when trying to start HiveServer2 Interactive. To cope with this issue I added the right hive ACL nodes through a ZooKeeper client host.
su -
# First, authenticate with the hive keytab
kinit hive/$(hostname) -kt /etc/security/keytabs/hive.service.keytab
# Second, connect to a zookeeper client on your cluster
/usr/hdp/current/zookeeper-server/bin/zkCli.sh -server ${ZOOKEEPER_CLIENT}
# Third, check the current status of the user-hive acl
getAcl /llap-sasl/user-hive
# Fourth, if these nodes are not there, create them as follows
create /llap-sasl/user-hive "" sasl:hive:cdrwa,world:anyone:r
create /llap-sasl/user-hive/llap0 "" sasl:hive:cdrwa,world:anyone:r
create /llap-sasl/user-hive/llap0/workers "" sasl:hive:cdrwa,world:anyone:r
# Fifth, change the llap-sasl node to add the user hive
setAcl /llap-sasl sasl:hive:cdrwa,world:anyone:r
Source 1, Source 2
Basically, this should work for Kerberized environments. If you get errors related to ACLs, go back to your ZooKeeper settings and check that everything is fine. If you have errors related to a missing hive user, check whether the hive user was added correctly to the nodes. If you have an error related to Kerberos (principal or keytabs), check that the keytabs are on the designated (YARN) nodes with the correct permissions.

phoenix hbase not connecting remotely

I have two Cloudera VMs, and on both I've configured Phoenix; it works fine as long as everything stays on localhost.
When I try to connect to HBase on one VM from Phoenix on the other VM, I use this command
$ ./sqlline.sh xxx.xx.xx.xx:2181
The connection is successful, but Phoenix still references the local HBase and not the remote HBase. Can anyone tell me where the problem is?
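For what it's worth, Phoenix's sqlline accepts the full quorum:port:znode connect string, so one thing to try is spelling out the remote ZooKeeper and HBase root znode explicitly, and making sure no local hbase-site.xml on the client's classpath overrides the quorum (the /hbase root znode below is an assumption based on Cloudera defaults):
# Connect string form: <zookeeper quorum>:<port>:<hbase root znode>
./sqlline.sh xxx.xx.xx.xx:2181:/hbase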

Hadoop HDFS connection in PowerCenter

I have installed Cloudera's Hadoop QuickStart VM and I am attempting to pass records from my local database to HDFS using a PowerCenter mapping.
I've set up the Hadoop_HDFS_Connection in PowerCenter Workflow Manager, but when I run the workflow I get the following error: "Unable to establish a connection with the specified HDFS host". It gives a java.net.ConnectException when trying to connect to the host name and port.
I think the error may be in the hostname notation. In Cloudera Manager on the VM, the host name is listed as 'localhost.localdomain', but I don't know how to translate this into the PowerCenter connection settings.
Anybody got this connection to work?
Many thanks.
Brian
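One way to pin down the host and port the connection should use, sketched under the assumption of a default CDH setup (NameNode RPC on port 8020; <vm-ip> is a placeholder for the VM's address):
# On the QuickStart VM: fs.defaultFS is the host/port that HDFS clients must use
grep -A1 '<name>fs.defaultFS</name>' /etc/hadoop/conf/core-site.xml
# From the PowerCenter machine: check that the NameNode port is reachable
# (use the VM's IP rather than 'localhost.localdomain', which only resolves inside the VM)
telnet <vm-ip> 8020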
