How to set master address and 19998 port in Alluxio 2.0 Java API? - alluxio

I want to know how to set the hostname and rpc_port of the master in the Alluxio 2.0 Java API.
When I use the code that works in Alluxio 1.8, I find that it doesn't work in Alluxio 2.0.
Here is my code; it doesn't work, and I don't know how to write the correct code for the Alluxio 2.0 Java API:

In Alluxio 2, you can create a configuration object and then pass it to FileSystem.Factory.create().
For example:
// Imports (Alluxio 2.x): alluxio.conf.InstancedConfiguration, alluxio.conf.PropertyKey,
// alluxio.util.ConfigurationUtils, alluxio.client.file.FileSystem
InstancedConfiguration conf = new InstancedConfiguration(ConfigurationUtils.defaults());
conf.set(PropertyKey.MASTER_HOSTNAME, alluxioMaster);
conf.set(PropertyKey.MASTER_RPC_PORT, ...); // e.g. the 19998 port from the question
return FileSystem.Factory.create(conf);
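As a quick check that the client talks to the configured master, here is a rough usage sketch (AlluxioURI, URIStatus, and listStatus are standard Alluxio 2.x client calls; error handling is omitted):
// assumes alluxio.AlluxioURI, alluxio.client.file.FileSystem, alluxio.client.file.URIStatus
FileSystem fs = FileSystem.Factory.create(conf);
for (URIStatus status : fs.listStatus(new AlluxioURI("/"))) {
    System.out.println(status.getPath());
}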

Related

How to set dynamic properties of a processor (NiFi) using a Java front end

How can I set the properties of an Apache NiFi processor from a JSP front end using Java?
It sounds pretty unnecessary since you have the NiFi UI. If you want to do so anyway, NiFi has a REST API, so you can configure and even deploy new DataFlows with it.
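For illustration, here is a minimal sketch of updating a processor property through that REST API using Java 11's HttpClient. The host, processor ID, revision version, property name, and value are all placeholders; a real client would first GET /nifi-api/processors/{id} to read the current revision and would handle authentication.
// Rough sketch, assuming NiFi is reachable at http://localhost:8080 without authentication.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class UpdateNifiProcessorProperty {
    public static void main(String[] args) throws Exception {
        String processorId = "replace-with-processor-id"; // hypothetical ID
        String body = "{"
                + "\"revision\": {\"version\": 1},"  // must match the processor's current revision
                + "\"component\": {"
                + "\"id\": \"" + processorId + "\","
                + "\"config\": {\"properties\": {\"Some Property\": \"new-value\"}}"
                + "}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/nifi-api/processors/" + processorId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}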

How to write from Apache Flink to Elasticsearch

I am trying to connect Flink to Elasticsearch, and when I run the Maven project I get this error:
Or, as another way to do it, I am using this example: https://github.com/keiraqz/KafkaFlinkElastic
The example you linked depends on various Flink modules with different versions, which is highly discouraged. Try setting them all to the same version and see if this fixes the issue.

How to create security filter for Spark UI in Spark on YARN

Environment:
AWS EMR, YARN cluster.
Description:
I am trying to use a Java filter to protect access to the Spark UI by using the property spark.ui.filters; the problem is that when Spark is running in YARN mode, that property is always overridden by Hadoop with the filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter:
spark.ui.filters: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
and these two parameters are passed automatically by Hadoop:
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS: ip-x-x-x-226.eu-west-1.compute.internal
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES: http://ip-x-x-x-226.eu-west-1.compute.internal:20888/proxy/application_xxxxxxxxxxxxx_xxxx
Any suggestion on how to add a Java security filter so that Hadoop does not override it, or maybe on how to configure the security from the Hadoop side?
Thanks.
This is solved by using the property hadoop.http.authentication.type to specify a custom Java handler class that contains the authentication logic. This class only has to implement the interface org.apache.hadoop.security.authentication.server.AuthenticationHandler.
See:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/HttpAuthentication.html
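For illustration, a minimal skeleton of such a handler follows (the class name, the 'custom' type string, and the request-parameter check are placeholders, not a real authentication scheme). Per the linked documentation, hadoop.http.authentication.type is then set to the handler's fully qualified class name.
import java.io.IOException;
import java.util.Properties;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.security.authentication.client.AuthenticationException;
import org.apache.hadoop.security.authentication.server.AuthenticationHandler;
import org.apache.hadoop.security.authentication.server.AuthenticationToken;

public class MyAuthenticationHandler implements AuthenticationHandler {
    public static final String TYPE = "custom"; // placeholder type name

    @Override
    public String getType() {
        return TYPE;
    }

    @Override
    public void init(Properties config) throws ServletException {
        // read any handler-specific settings here
    }

    @Override
    public void destroy() {
    }

    @Override
    public boolean managementOperation(AuthenticationToken token,
                                       HttpServletRequest request,
                                       HttpServletResponse response)
            throws IOException, AuthenticationException {
        return true; // no management operations handled
    }

    @Override
    public AuthenticationToken authenticate(HttpServletRequest request,
                                            HttpServletResponse response)
            throws IOException, AuthenticationException {
        // placeholder check: replace with your real authentication logic
        String user = request.getParameter("user");
        if (user == null) {
            response.setStatus(HttpServletResponse.SC_UNAUTHORIZED);
            return null;
        }
        return new AuthenticationToken(user, user, TYPE);
    }
}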

How to set JDBC datasource targets to all servers in the cluster through WLST?

I have a WebLogic domain in which JDBC datasources are targeted to only one of the managed servers in the cluster. I need them to target all servers in the cluster. I am able to target them to all servers through the WebLogic console, but I need to do the same through WLST. I tried set() to target different managed servers. But what if there is a new managed server in the future and I want the JDBC datasources to point to the new one too?
It is very simple: you can use a Jython jarray to pass the cluster name, and specify 'Cluster' as the target type in the last part of the ObjectName.
import jarray
from javax.management import ObjectName
clstrNam = raw_input('Cluster Name: ')
set('Targets', jarray.array([ObjectName('com.bea:Name=' + clstrNam + ',Type=Cluster')], ObjectName))
Please let me know if you still have challenges.
Reference link: Generic datasource using WLST Example
HTH

Where do I set the configuration mapreduce.job.jvm.numtasks?

I am reading in a book (Professional Hadoop Solutions) that JVM reuse can be enabled by specifying the job configuration mapreduce.job.jvm.numtasks. My question is: do we need to set this in the Driver class?
I tried looking for this configuration in the mapreduce.Job object, and I can't find it. Could this API have been moved elsewhere in the version of Hadoop I am using? Or am I not looking in the right place? I am using Hadoop version 1.0.3.
I also tried to look for the older property mapred.job.reuse.jvm.num.tasks, and I couldn't find it either.
Thanks!
Your source is referring to the newer Hadoop configuration API for Hadoop 2.x (YARN). With the shift to YARN, a lot of configuration names were revised. The changes are documented here on the official site for the related Hadoop release (in this case version 2.4.0, the version adopted by Amazon's Elastic MapReduce).
It explicitly mentions that the old configuration name mapred.job.reuse.jvm.num.tasks has been replaced by the new name mapreduce.job.jvm.numtasks.
Furthermore, the documentation for the MapReduce default configuration says this for mapreduce.job.jvm.numtasks:
How many tasks to run per jvm. If set to -1, there is no limit.
The default configuration for Hadoop 1.2.1 (whose configuration API is compatible with 1.0.3) can be found on GrepCode, for example.
Regarding your question of where to set this property: it can either be set
for the whole cluster in ${HADOOP_CONF_DIR}/mapred-site.xml,
or you can specify it in the configuration of your Job (or JobContext), as long as it is not declared final on your cluster:
job.getConfiguration().set("mapred.job.reuse.jvm.num.tasks","-1");
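In a driver class this might look roughly like the following sketch (the job name and the remaining job setup are placeholders):
// assumes org.apache.hadoop.conf.Configuration and org.apache.hadoop.mapreduce.Job (Hadoop 1.x API)
Configuration conf = new Configuration();
Job job = new Job(conf, "my-job"); // hypothetical job name
job.getConfiguration().set("mapred.job.reuse.jvm.num.tasks", "-1"); // unlimited JVM reuse
// ... set mapper, reducer, input and output paths here ...
System.exit(job.waitForCompletion(true) ? 0 : 1);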
You could define it in mapred-site.xml:
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>-1</value>
</property>
Use it when you have shorter tasks that run for a definite period of time.
