Set up Hadoop KMS - hadoop

I tried to set up the Hadoop KMS server and client.
Below is my kms-site.xml:
<property>
<name>hadoop.kms.key.provider.uri</name>
<value>jceks://file@/${user.home}/kms.keystore</value>
<description>
URI of the backing KeyProvider for the KMS.
</description>
</property>
<property>
<name>hadoop.security.keystore.java-keystore-provider.password-file</name>
<value>kms.keystore.password</value>
<description>
If using the JavaKeyStoreProvider, the file name for the keystore password.
</description>
</property>
In core-site.xml I added the following:
<property>
<name>dfs.encryption.key.provider.uri</name>
<value>kms://http@mydomain:16000/kms</value>
</property>
In hdfs-site.xml I added the following:
<property>
<name>dfs.encryption.key.provider.uri</name>
<value>kms://http@mydomain:16000/kms</value>
</property>
Then I restarted Hadoop and used ./kms.sh start to start the KMS.
But when I try to generate a key with the command below:
hadoop key create key_demo -size 256
I get the message below. Am I missing anything?
There are no valid (non-transient) providers configured.
No action has been taken. Use the -provider option to specify
a provider. If you want to use a transient provider then you
MUST use the -provider argument.

I am using Hadoop 3.3.1.
This is my kms-site.xml:
<property>
<name>hadoop.kms.key.provider.uri</name>
<value>jceks://file@/${user.home}/kms.keystore</value>
</property>
<property>
<name>hadoop.security.keystore.java-keystore-provider.password-file</name>
<value>kms.keystore.password</value>
</property>
This is my core-site.xml:
<property>
<name>hadoop.security.key.provider.path</name>
<value>kms://http@localhost:9600/kms</value>
<description>
The KeyProvider to use when interacting with encryption keys used
when reading and writing to an encryption zone.
</description>
</property>
Before adding those properties to my core-site.xml, I got the same message as yours. I think you are using Hadoop v2, since your KeyProvider port is still 16000; I am using v3. I also see that you are still using the JavaKeyStoreProvider, like the example in the Hadoop documentation (so am I). If you don't provide the "password file", which here is kms.keystore.password, the KMS will terminate immediately after starting up. So you need to place an empty file with that name on your classpath, which is under /hadoop_home/etc/.
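As a rough sketch of that check (assuming the KMS classpath is $HADOOP_HOME/etc/hadoop and using the port and password-file name from the configs above; adjust if your values differ):
# create the (empty) keystore password file on the KMS classpath
touch $HADOOP_HOME/etc/hadoop/kms.keystore.password
# restart the KMS, then create and list keys against the provider explicitly
hadoop key create key_demo -size 256 -provider kms://http@localhost:9600/kms
hadoop key list -provider kms://http@localhost:9600/kms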
I know I arrive quite late; hope it helps.

Related

ResourceManager got stuck in Accepted state

I am trying to integrate my Elasticsearch 2.2.0 with Hadoop HDFS. In my environment, I have 1 master node and 1 data node; Elasticsearch is installed on my master node.
But while integrating it with HDFS, my ResourceManager application jobs get stuck in the Accepted state.
Somehow I found a link suggesting changes to my yarn-site.xml settings:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2200</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
I have done this as well, but it is not giving me the expected result.
Configuration:
my core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.
</description>
</property>
my mapred-site.xml,
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description>
</property>
my hdfs-site.xml,
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
Please help me understand how I can get my RM jobs into the RUNNING state, so that I can use my Elasticsearch data on HDFS.
If the screenshot is correct, you have 0 NodeManagers, so the application can't start running. You need to start at least 1 NodeManager so that the ApplicationMaster, and later the tasks, can be started.
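As a rough sketch (assuming a Hadoop 2.x layout with the standard sbin scripts), starting a NodeManager on the data node and confirming YARN sees it would look like:
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
# the node should now appear with state RUNNING
yarn node -list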

The Job History web server can't connect to the Resource Manager in a HA setup

When using the YARN Resource Manager webapp, I have a problem viewing a page with job history, with URL like this:
http://node-0002.example.com:19888/jobhistory/job/job_1455545464819_0001
It shows a HTTP error with a Java exception like this:
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: node-0001.example.com,node-0002.example.com:8088 (configuration property 'yarn.resourcemanager.webapp.address')
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:213)
at org.apache.hadoop.yarn.conf.YarnConfiguration.getSocketAddr(YarnConfiguration.java:1782)
at org.apache.hadoop.yarn.webapp.util.WebAppUtils.getResolvedRMWebAppURLWithoutScheme(WebAppUtils.java:164)
at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.<init>(AppController.java:63)
The hostname node-0001.example.com,node-0002.example.com is obviously wrong.
I have a pretty straightforward HA-enabled configuration in my yarn-site.xml:
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>my-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>node-0001.example.com,node-0002.example.com</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.node-0001.example.com</name>
<value>node-0001.example.com</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.node-0002.example.com</name>
<value>node-0002.example.com</value>
</property>
The only unusual thing is naming the rm-ids in the same way as the hosts. I also tried defining properties like yarn.resourcemanager.address and yarn.resourcemanager.webapp.address, but that didn't help.
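For comparison, the ResourceManager HA documentation names the rm-ids symbolically rather than after the hosts; a sketch of that convention (rm1 and rm2 here are just placeholder IDs) would be:
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node-0001.example.com</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node-0002.example.com</value>
</property>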
Any idea where the comma-separated list instead of a hostname is coming from? I use Hadoop 2.7.1.

WSO2 BAM Cluster

I'm configuring a WSO2 BAM cluster with an external Cassandra and Hadoop cluster, following the indications in the CLUSTER420 documentation for WSO2 BAM 2.5.0, and in section 14 I found this:
Change the following properties in the /repository/conf/advanced/hive-site.xml file.
Change the below properties if you are using the incremental data processing and notification task features.
<property>
<name>hive.incremental.processing.intermediate.results.cassandra.hosts</name>
<value>localhost:9160</value>
</property>
<property>
<name>hive.incremental.processing.intermediate.results.cassandra.userName</name>
<value>admin</value>
</property>
<property>
<name>hive.incremental.processing.intermediate.results.cassandra.password</name>
<value>admin</value>
</property>
<!-- Credentials for WSO2BAM_UTILS_KS -->
<property>
<name>notification.task.receiver.username</name>
<value>admin</value>
</property>
<property>
<name>notification.task.receiver.password</name>
<value>admin</value>
</property>
My questions:
My Cassandra cluster has 3 nodes, so what do I need to put in the "hive.incremental.processing.intermediate.results.cassandra.hosts" property value?
For "cassandra.userName" and "cassandra.password", do I put admin/admin or my Cassandra cluster credentials?
For the WSO2BAM_UTILS_KS credentials, do I put admin/admin or something else?
Thanks.
List all hostnames as a comma-separated list.
Provide your Cassandra cluster credentials.
It is not necessary to put credentials for WSO2_BAM_UTIL_KS. Rather, set all credentials correctly in $BAM_HOME/repository/conf/datasources/master-datasource.xml.
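As an illustration of the first point, the hosts value would then look something like this (the three hostnames are placeholders for your own Cassandra nodes, keeping the 9160 Thrift port from the documentation example):
<property>
<name>hive.incremental.processing.intermediate.results.cassandra.hosts</name>
<value>cassandra1.example.com:9160,cassandra2.example.com:9160,cassandra3.example.com:9160</value>
</property>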

Tables created by Oozie Hive action cannot be found from the Hive client but can be found in HDFS

I'm trying to run a Hive script via an Oozie Hive action. I just created a Hive table 'test' in my script.q, and the Oozie job ran successfully; I can find the table created by the Oozie job under the HDFS path /user/hive/warehouse. But I could not find the 'test' table via the command "show tables" in the Hive client.
I think there is something wrong with my metastore config, but I just can't figure it out.
Can somebody help?
oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL
oozie job -oozie http://localhost:11000/oozie -config C:\Hadoop\oozie-3.2.0-incubating\oozie-win-distro\examples\apps\hive\job.properties -run
Job ID : 0000001-130910094106919-oozie-hado-W
Run Result
Here is my oozie-site.xml
<!--
Refer to the oozie-default.xml file for the complete list of
Oozie configuration properties and their default values.
-->
<property>
<name>oozie.service.ActionService.executor.ext.classes</name>
<value>
org.apache.oozie.action.email.EmailActionExecutor,
org.apache.oozie.action.hadoop.HiveActionExecutor,
org.apache.oozie.action.hadoop.ShellActionExecutor,
org.apache.oozie.action.hadoop.SqoopActionExecutor
</value>
</property>
<property>
<name>oozie.service.SchemaService.wf.ext.schemas</name>
<value>shell-action-0.1.xsd,email-action-0.1.xsd,hive-action-0.2.xsd,sqoop-action-0.2.xsd,ssh-action-0.1.xsd</value>
</property>
<property>
<name>oozie.system.id</name>
<value>oozie-${user.name}</value>
<description>
The Oozie system ID.
</description>
</property>
<property>
<name>oozie.systemmode</name>
<value>NORMAL</value>
<description>
System mode for Oozie at startup.
</description>
</property>
<property>
<name>oozie.service.AuthorizationService.security.enabled</name>
<value>false</value>
<description>
Specifies whether security (user name/admin role) is enabled or not.
If disabled any user can manage Oozie system and manage any job.
</description>
</property>
<property>
<name>oozie.service.PurgeService.older.than</name>
<value>30</value>
<description>
Jobs older than this value, in days, will be purged by the PurgeService.
</description>
</property>
<property>
<name>oozie.service.PurgeService.purge.interval</name>
<value>3600</value>
<description>
Interval at which the purge service will run, in seconds.
</description>
</property>
<property>
<name>oozie.service.CallableQueueService.queue.size</name>
<value>10000</value>
<description>Max callable queue size</description>
</property>
<property>
<name>oozie.service.CallableQueueService.threads</name>
<value>10</value>
<description>Number of threads used for executing callables</description>
</property>
<property>
<name>oozie.service.CallableQueueService.callable.concurrency</name>
<value>3</value>
<description>
Maximum concurrency for a given callable type.
Each command is a callable type (submit, start, run, signal, job, jobs, suspend,resume, etc).
Each action type is a callable type (Map-Reduce, Pig, SSH, FS, sub-workflow, etc).
All commands that use action executors (action-start, action-end, action-kill and action-check) use
the action type as the callable type.
</description>
</property>
<property>
<name>oozie.service.coord.normal.default.timeout</name>
<value>120</value>
<description>Default timeout for a coordinator action input check (in minutes) for normal job.
-1 means infinite timeout</description>
</property>
<property>
<name>oozie.db.schema.name</name>
<value>oozie</value>
<description>
Oozie DataBase Name
</description>
</property>
<property>
<name>oozie.service.JPAService.create.db.schema</name>
<value>true</value>
<description>
Creates Oozie DB.
If set to true, it creates the DB schema if it does not exist. If the DB schema exists is a NOP.
If set to false, it does not create the DB schema. If the DB schema does not exist it fails start up.
</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>
JDBC driver class.
</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:derby:${oozie.data.dir}/${oozie.db.schema.name}-db;create=true</value>
<description>
JDBC URL.
</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>sa</value>
<description>
DB user name.
</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>pwd</value>
<description>
DB user password.
IMPORTANT: if password is emtpy leave a 1 space string, the service trims the value,
if empty Configuration assumes it is NULL.
</description>
</property>
<property>
<name>oozie.service.JPAService.pool.max.active.conn</name>
<value>10</value>
<description>
Max number of connections.
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
<value>false</value>
<description>
Indicates if Oozie is configured to use Kerberos.
</description>
</property>
<property>
<name>local.realm</name>
<value>LOCALHOST</value>
<description>
Kerberos Realm used by Oozie and Hadoop. Using 'local.realm' to be aligned with Hadoop configuration
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.keytab.file</name>
<value>${user.home}/oozie.keytab</value>
<description>
Location of the Oozie user keytab file.
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.principal</name>
<value>${user.name}/localhost@${local.realm}</value>
<description>
Kerberos principal for Oozie service.
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value> </value>
<description>
Whitelisted job tracker for Oozie service.
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
<value> </value>
<description>
Whitelisted job tracker for Oozie service.
</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=hadoop-conf</value>
<description>
Comma separated AUTHORITY=HADOOP_CONF_DIR, where AUTHORITY is the HOST:PORT of
the Hadoop service (JobTracker, HDFS). The wildcard '*' configuration is
used when there is no exact match for an authority. The HADOOP_CONF_DIR contains
the relevant Hadoop *-site.xml files. If the path is relative is looked within
the Oozie configuration directory; though the path can be absolute (i.e. to point
to Hadoop client conf/ directories in the local filesystem.
</description>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>/user/${user.name}/share/lib</value>
<description>
System library path to use for workflow applications.
This path is added to workflow application if their job properties sets
the property 'oozie.use.system.libpath' to true.
</description>
</property>
<property>
<name>use.system.libpath.for.mapreduce.and.pig.jobs</name>
<value>false</value>
<description>
If set to true, submissions of MapReduce and Pig jobs will include
automatically the system library path, thus not requiring users to
specify where the Pig JAR files are. Instead, the ones from the system
library path are used.
</description>
</property>
<property>
<name>oozie.authentication.type</name>
<value>simple</value>
<description>
Defines authentication used for Oozie HTTP endpoint.
Supported values are: simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#
</description>
</property>
<property>
<name>oozie.authentication.token.validity</name>
<value>36000</value>
<description>
Indicates how long (in seconds) an authentication token is valid before it has
to be renewed.
</description>
</property>
<property>
<name>oozie.authentication.signature.secret</name>
<value>oozie</value>
<description>
The signature secret for signing the authentication tokens.
If not set a random secret is generated at startup time.
In order to authentiation to work correctly across multiple hosts
the secret must be the same across al the hosts.
</description>
</property>
<property>
<name>oozie.authentication.cookie.domain</name>
<value></value>
<description>
The domain to use for the HTTP cookie that stores the authentication token.
In order to authentiation to work correctly across multiple hosts
the domain must be correctly set.
</description>
</property>
<property>
<name>oozie.authentication.simple.anonymous.allowed</name>
<value>true</value>
<description>
Indicates if anonymous requests are allowed.
This setting is meaningful only when using 'simple' authentication.
</description>
</property>
<property>
<name>oozie.authentication.kerberos.principal</name>
<value>HTTP/localhost@${local.realm}</value>
<description>
Indicates the Kerberos principal to be used for HTTP endpoint.
The principal MUST start with 'HTTP/' as per Kerberos HTTP SPNEGO specification.
</description>
</property>
<property>
<name>oozie.authentication.kerberos.keytab</name>
<value>${oozie.service.HadoopAccessorService.keytab.file}</value>
<description>
Location of the keytab file with the credentials for the principal.
Referring to the same keytab file Oozie uses for its Kerberos credentials for Hadoop.
</description>
</property>
<property>
<name>oozie.authentication.kerberos.name.rules</name>
<value>DEFAULT</value>
<description>
The kerberos names rules is to resolve kerberos principal names, refer to Hadoop's
KerberosName for more details.
</description>
</property>
<!-- Proxyuser Configuration -->
<!--
<property>
<name>oozie.service.ProxyUserService.proxyuser.#USER#.hosts</name>
<value>*</value>
<description>
List of hosts the '#USER#' user is allowed to perform 'doAs'
operations.
The '#USER#' must be replaced with the username o the user who is
allowed to perform 'doAs' operations.
The value can be the '*' wildcard or a list of hostnames.
For multiple users copy this property and replace the user name
in the property name.
</description>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.#USER#.groups</name>
<value>*</value>
<description>
List of groups the '#USER#' user is allowed to impersonate users
from to perform 'doAs' operations.
The '#USER#' must be replaced with the username o the user who is
allowed to perform 'doAs' operations.
The value can be the '*' wildcard or a list of groups.
For multiple users copy this property and replace the user name
in the property name.
</description>
</property>
-->
Here is my hive-site.xml
[hive-site.xml]
Here is my script.q
create table test(id int);
Inside your Oozie Hive action you need to tell Oozie where your Hive metastore is.
That means you need to pass your hive-site.xml to the action.
You also need to configure an external metastore for Hive for this to work; the default embedded Derby configuration will not work for you.
So, in simple steps:
Create Hive settings with an external database, say MySQL.
Pass that hive-site.xml to the Oozie action (see the sketch below).
See here for details:
http://oozie.apache.org/docs/3.3.1/DG_HiveActionExtension.html
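A minimal sketch of such a workflow action (the action name and the ok/error transition targets are placeholders, and it assumes the hive-site.xml pointing at the external metastore sits next to workflow.xml in the application directory):
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<script>script.q</script>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>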
Thanks

Failed to get system directory - hadoop

Using a Hadoop multinode setup (1 master, 1 slave).
After starting start-mapred.sh on the master, I found the error below in the TaskTracker (TT) logs on the slave:
org.apache.hadoop.mapred.TaskTracker: Failed to get system directory
Can someone help me understand what can be done to avoid this error?
I am using
Hadoop 1.2.0
jetty-6.1.26
java version "1.6.0_23"
mapred-site.xml file
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
<description>
define mapred.map tasks to be number of slave hosts
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
<description>
define mapred.reduce tasks to be number of slave hosts
</description>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/workspace</value>
</property>
</configuration>
It seems that you just added hadoop.tmp.dir and started the job. You need to restart the Hadoop daemons after adding any property to the configuration files. You have specified in your comment that you added this property at a later stage. This means that all the data and metadata, along with other temporary files, are still in the /tmp directory. Copy all of that from there into your /home/hduser/workspace directory, restart Hadoop, and re-run the job.
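Roughly, a sketch of those steps (assuming the old files sit under the Hadoop 1.x default of /tmp/hadoop-hduser; check the actual directory name on your nodes first):
bin/stop-all.sh
# move the old data and metadata into the new hadoop.tmp.dir (source path is an assumption)
cp -r /tmp/hadoop-hduser/* /home/hduser/workspace/
bin/start-all.sh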
Do let me know the result. Thank you.
If it is your Windows PC and you are using Cygwin to run Hadoop, then the TaskTracker will not work.
