storm-twitter HdfsBolt/HiveBolt failure on Kerberized cluster - hadoop

I'm trying to set up a storm-twitter stream on a cluster, using HdfsBolt and HiveBolt to flush data to disk and to a Hive table, with https://github.com/pvillard31/storm-twitter as a reference. I followed all the instructions to pass the keytab/principal both inside the topology and in storm.yaml, as per https://github.com/apache/storm/blob/master/external/storm-hive/README.md, but I'm still getting errors on both bolts.
For the HdfsBolt I get:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
For the HiveBolt I get:
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
I have tried many different approaches from other posts to make Storm aware of the secure cluster, but it still seems to expect SIMPLE authentication.
My storm.yaml is as follows, where 'xyz' is the user running ZooKeeper/Nimbus/the supervisor/the topology in a Docker container:
storm.zookeeper.servers:
- "localhost"
nimbus.host: "127.0.0.1"
nimbus.seeds: ["localhost"]
ui.port: 5555
logviewer.port: 5566
hive.keytab.file : "/home/user/.kt/xyz.keytab"
hive.kerberos.principal : "hive/_HOST#HADOOP.DOMAIN.ORG"
hive.metastore.uris : "thrift://<f.q.d.n>:9083"
hdfs.keytab.file : "/home/user/.kt/xyz.keytab"
hdfs.kerberos.principal : "xyz#HADOOP.DOMAIN.ORG"
topology.auto-credentials : ["org.apache.storm.hive.security.AutoHive", "org.apache.storm.hdfs.security.AutoHDFS"]
hiveCredentialsConfigKeys : ["hivecluster"]
"hivecluster": {"hive.keytab.file": "/home/user/storm/hive.keytab", "hive.kerberos.principal": "hive/_HOST#HADOOP.DOMAIN.ORG", "hive.metastore.uris": "thrift://<f.q.d.n>:9083"}
hdfsCredentialsConfigKeys : ["hdfscluster"]
"hdfscluster": {"hdfs.keytab.file": "/home/user/.kt/xyz.keytab", "hdfs.kerberos.principal": "xyz#HADOOP.DOMAIN.ORG"}
I also included the keytab info inside the topology config:
Config config = new Config();
config.put(HdfsSecurityUtil.STORM_KEYTAB_FILE_KEY, "/home/user/.kt/xyz.keytab");
config.put(HdfsSecurityUtil.STORM_USER_NAME_KEY, "xyz#HADOOP.DOMAIN.ORG");
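For the Hive side, the keytab and principal can also be passed directly on the HiveOptions used to build the HiveBolt. The following is only a hedged sketch of that wiring (classes are from storm-hive; the database, table, and column names are placeholders, while the metastore URI and keytab path are the ones from the config above):
// Hedged sketch: passing the same keytab/principal on the HiveOptions that build the HiveBolt.
// DelimitedRecordHiveMapper, HiveOptions and HiveBolt come from the storm-hive module.
DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
        .withColumnFields(new Fields("tweet_id", "tweet_text")); // placeholder columns
HiveOptions hiveOptions = new HiveOptions("thrift://<f.q.d.n>:9083", "default", "tweets", mapper) // placeholder db/table
        .withKerberosKeytab("/home/user/.kt/xyz.keytab")
        .withKerberosPrincipal("hive/_HOST@HADOOP.DOMAIN.ORG");
HiveBolt hiveBolt = new HiveBolt(hiveOptions);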
I also added the cluster configuration XMLs inside the HdfsBolt's doPrepare:
public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException {
    this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
    this.hdfsConfig.addResource(new Path("/etc/hive/conf/hive-site.xml"));
    this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);
}
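A minimal, hedged sketch of forcing an explicit Kerberos login inside doPrepare is shown below; the explicit hadoop.security.authentication setting and the UserGroupInformation calls are assumptions rather than something the reference project does, and the principal is written with the standard '@' separator:
// Hedged sketch (uses org.apache.hadoop.security.UserGroupInformation):
// explicitly mark the client configuration as Kerberized and log in from the keytab
// before FileSystem.get() is called, so the RPC layer does not fall back to SIMPLE.
this.hdfsConfig.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(this.hdfsConfig);
UserGroupInformation.loginUserFromKeytab("xyz@HADOOP.DOMAIN.ORG", "/home/user/.kt/xyz.keytab");
this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);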
I built a shaded jar that includes everything except storm-core.
Any help would be appreciated.

Related

Unable to connect to Cosmos Table API from Databricks, throws error

I loaded the proper library at the cluster level:
com.microsoft.azure:azure-cosmosdb-spark_2.4.0_2.11:3.7.0
and gave the proper connection strings for the Cosmos Table API:
cosmosConfig = {
    "Endpoint": "https://cosmos-account-name.table.cosmos.azure.com:443/",
    "Masterkey": "PrimaryKey",
    "Database": "TablesDB",
    "Collection": "Deals_Metadata"
}
Then I started reading this using the Spark API:
cosmosdbConnection = spark.read.format("com.microsoft.azure.cosmosdb.spark").options(**cosmosConfig).load()
When I execute this, it throws the error below:
java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
I tried to reproduce the same in my environment and got the same error.
To resolve this error, check whether the com.azure.cosmos.spark jar is properly installed, and also follow the code below.
Endpoint = "https://xxxx.documents.azure.com:443/"
MasterKey = "cosmos_db_key"
DatabaseName = "<dbname>"
ContainerName = "container"
spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", Endpoint)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", MasterKey)
spark.sql("CREATE DATABASE IF NOT EXISTS cosmosCatalog.{};".format(DatabaseName))
spark.sql("CREATE TABLE IF NOT EXISTS cosmosCatalog.{}.{} using cosmos.oltp TBLPROPERTIES(partitionKeyPath = '/id', manualThroughput = '1100')".format(DatabaseName, ContainerName))
Reading the data into a Spark DataFrame:
Cfg1 = {
    "spark.cosmos.accountEndpoint": Endpoint,
    "spark.cosmos.accountKey": MasterKey,
    "spark.cosmos.database": DatabaseName,
    "spark.cosmos.container": ContainerName,
    "spark.cosmos.read.inferSchema.enabled": "false"
}
df = spark.read.format("cosmos.oltp").options(**Cfg1).load()
print(df.count())
Reference: Manage data with Azure Cosmos DB Spark 3 OLTP Connector for SQL API | Microsoft

Break in fabric8 kubernetes client mock server

We were using fabric8 kubernetes client 5.3.x for a watcher and it worked fine. Recently, when we moved to 5.11.2, we observed many changes and eventually the JUnit tests started failing.
We use io.fabric8.kubernetes.client.server.mock.KubernetesServer
Earlier we were using ContainerStatus.withNewReady, which now seems to be removed.
We then added the following annotation:
@Rule
public KubernetesServer myMockServer = new KubernetesServer(false, true);
After this, we are getting the following logs stating an unsupported label requirement, even though the application code is sending this label.
[2022-02-03T07:30:22.733Z] 07:30:17.812 [pool-1-thread-1] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager - Watching http://localhost:40033/api/v1/namespaces/test/pods?labelSelector=app.kubernetes.io%2Fname%20in%20%28apps%29&timeoutSeconds=0&allowWatchBookmarks=true&watch=true...
[2022-02-03T07:30:22.733Z] 07:30:17.814 [MockWebServer /127.0.0.1:49882] WARN io.fabric8.kubernetes.client.server.mock.KubernetesAttributesExtractor - Ignoring unsupported label requirement: app.kubernetes.io/name in (apps)
[2022-02-03T07:30:22.733Z] 07:30:17.815 [MockWebServer /127.0.0.1:49882] DEBUG io.fabric8.kubernetes.client.server.mock.KubernetesAttributesExtractor - fromPath /api/v1/namespaces/test/pods?labelSelector=app.kubernetes.io%2Fname%20in%20%28apps%29&timeoutSeconds=0&allowWatchBookmarks=true&watch=true : {attributes: {namespace={key:namespace, value:test}, version={key:version, value:v1}, plural={key:plural, value:pods}}}
[2022-02-03T07:30:22.733Z] 07:30:17.815 [OkHttp http://localhost:40033/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener - WebSocket successfully opened
[2022-02-03T07:30:22.733Z] 07:30:20.818 [OkHttp http://localhost:40033/...] DEBUG io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager - Scheduling reconnect task
Is there something more that should be done? The tests are in the ERROR state, not Failed.
Sample JUnit program
public void testAddNewPodWatchEvent()
{
    //given
    doReturn(myClientMock).when(myTestObj).getClient();
    doReturn(myWatcherSpy).when(myTestObj).createEventHandler();
    String PATH =
        "/api/v1/namespaces/test/pods?labelSelector=app.kubernetes.io%2Fname%20in%20%28apps%29&timeoutSeconds=0&watch=true";
    Map<String, String> mockLabelMap = new HashMap<>();
    mockLabelMap.put("foo", "testlabel");
    mockLabelMap.put("app.kubernetes.io/name", "apps");
    Pod accPod = createAppsPod(mockLabelMap, true);
    myMockServer.expect()
        .get()
        .withPath(PATH)
        .andUpgradeToWebSocket()
        .open()
        .waitFor(100)
        .andEmit(new WatchEvent(accPod, "ADDED"))
        .done()
        .once();
    //when
    myTestObj.activate(mockProps);
    sleepForWatchToBeInvoked();
    //then
    verify(myWatcherSpy, atLeastOnce()).eventReceived(Action.ADDED, accPod);
}
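One hedged observation on the warning in the logs: the mock server's attribute extractor appears to ignore set-based label selectors such as app.kubernetes.io/name in (apps), which would explain the "Ignoring unsupported label requirement" message. Since the server is created in CRUD mode (new KubernetesServer(false, true)), a sketch of an alternative is to skip the expect() stubbing and create the pod through the mock client so the running watch receives the event; names such as myMockServer, createAppsPod, and mockLabelMap are taken from the test above:
// Hedged sketch, CRUD mode only: create the pod in the mock API server instead of
// stubbing the websocket endpoint; the active watcher should then receive an ADDED event.
Pod accPod = createAppsPod(mockLabelMap, true);
myMockServer.getClient().pods().inNamespace("test").create(accPod);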

Kafka Connect Elasticsearch sink: Could not connect to Elasticsearch. General SSLEngine problem

I'm trying to deploy the Confluent Kafka Connect Elasticsearch sink. My Elastic stack is deployed on Kubernetes with HTTP-layer encryption and authentication, and I'm port-forwarding Elasticsearch from Kubernetes to localhost. The connector fails with:
Caused by: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration
is invalid and contains the following 3 error(s):
Could not connect to Elasticsearch. Error message: General SSLEngine problem
Could not authenticate the user. Check the 'connection.username' and 'connection.password'. Error
message: General SSLEngine problem
Could not authenticate the user. Check the 'connection.username' and 'connection.password'. Error
message: General SSLEngine problem
I'm sure that the username and password are right. The connector properties file looks like this:
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=pwp-alerts
key.ignore=true
connection.url=https://localhost:9200
type.name=kafka-connect
errors.tolerance = all
behavior.on.malformed.documents=warn
schema.ignore = true
connection.username ="elastic"
connection.password ="my_password"
Does anyone know what can cause the problem?
I guess the failure comes from an unsuccessful connection to your Elastic engine; it can happen for many reasons, for example a wrong port or the listener type (it may be an advertised listener instead of a simple consumer). I recommend using Logstash and adding a Kafka input to your Logstash configuration. You can simply set your Kafka consumer group, bootstrap servers, and many other properties in the input, and your Elastic index, port, and authorization in the output.
Your Logstash configuration file with a Kafka input may look like the one below:
input {
  kafka {
    group_id => "Your group consumer group id"
    topics => ["Your topic name"]
    bootstrap_servers => "Your consumer port, Default port is 9092"
    codec => json
  }
}
filter {
}
output {
  file {
    path => "Some path"
  }
  elasticsearch {
    hosts => ["localhost:9200"]
    document_type => "_doc"
    index => "Your index name"
    user => username
    password => password
  }
  stdout {
    codec => rubydebug
  }
}
You can remove the file block in the output if you don't want to additionally store your data alongside your Logstash pipeline.
Find out more about Logstash Kafka input properties here.
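Separately, since the reported error is a "General SSLEngine problem" rather than a plain authentication failure, the sink connector also needs to trust the Elasticsearch certificate when the endpoint is HTTPS. A hedged sketch of the TLS-related properties (property names as documented for the Confluent Elasticsearch Sink Connector; the truststore path and password are placeholders):
# Hedged sketch: point the connector at a truststore containing the Elasticsearch CA certificate.
elastic.security.protocol=SSL
elastic.https.ssl.truststore.location=/path/to/elastic-truststore.jks
elastic.https.ssl.truststore.password=<truststore password>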

Cannot connect to Zookeeper when running a mapreduce job

I am running a MapReduce job using an Accumulo table as input and storing the data in another Accumulo table. This is the run method:
public int run(String[] args) throws Exception {
    Opts opts = new Opts();
    opts.parseArgs(PivotTable.class.getName(), args);
    Configuration conf = getConf();
    conf.set("formula", opts.formula);
    Job job = Job.getInstance(conf);
    job.setJobName("Pivot Table Generation");
    job.setJarByClass(PivotTable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setMapperClass(PivotTableMapper.class);
    job.setCombinerClass(PivotTableCombiber.class);
    job.setReducerClass(PivotTableReducer.class);
    AccumuloInputFormat.setInputTableName(job, opts.dataTable);
    BatchWriterConfig bwConfig = new BatchWriterConfig();
    AccumuloOutputFormat.setBatchWriterOptions(job, bwConfig);
    AccumuloOutputFormat.setDefaultTableName(job, opts.pivotTable);
    AccumuloOutputFormat.setCreateTables(job, true);
    job.setInputFormatClass(AccumuloInputFormat.class);
    job.setOutputFormatClass(AccumuloOutputFormat.class);
    opts.setAccumuloConfigs(job);
    return job.waitForCompletion(true) ? 0 : 1;
}
The problem, though, is that when I run the job, I get an exception saying that it cannot connect to ZooKeeper.
Error: java.lang.RuntimeException: Failed to connect to zookeeper (zookeeper.1:22181) within 2x zookeeper timeout period 30000
at org.apache.accumulo.fate.zookeeper.ZooSession.connect(ZooSession.java:124)
at org.apache.accumulo.fate.zookeeper.ZooSession.getSession(ZooSession.java:164)
at org.apache.accumulo.fate.zookeeper.ZooReader.getSession(ZooReader.java:43)
at org.apache.accumulo.fate.zookeeper.ZooReader.getZooKeeper(ZooReader.java:47)
at org.apache.accumulo.fate.zookeeper.ZooCache.getZooKeeper(ZooCache.java:59)
at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:159)
at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)
at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)
at org.apache.accumulo.core.client.ZooKeeperInstance.getInstanceID(ZooKeeperInstance.java:169)
at org.apache.accumulo.core.client.ZooKeeperInstance.<init>(ZooKeeperInstance.java:159)
at org.apache.accumulo.core.client.ZooKeeperInstance.<init>(ZooKeeperInstance.java:140)
at org.apache.accumulo.core.client.mapreduce.RangeInputSplit.getInstance(RangeInputSplit.java:364)
at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat$AbstractRecordReader.initialize(AbstractInputFormat.java:495)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I checked whether ZooKeeper was up, and it was running. I also ran telnet to check that the port was open, and it was.
I am using $ACCUMULO_HOME/bin/tool.sh to run the job. Any help would be appreciated.
It was an issue with the hosts file on my Hadoop slaves. The hostname mappings were not correct.
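In other words, the mapper tasks could not resolve the ZooKeeper hostname from the error (zookeeper.1). A hedged example of the kind of /etc/hosts entry that needs to exist on every slave node (the IP address is a placeholder):
# /etc/hosts on each Hadoop slave (placeholder IP)
192.168.1.10   zookeeper.1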

Building a JMX client in a servlet installed on the Deployment Manager

I'm building a monitoring application as a servlet running on my WebSphere 7 ND deployment manager. The tool uses JMX to query the deployment manager for various data. Global security is enabled on the dmgr.
I'm having problems getting this to work, however. My first attempt was to use the WebSphere admin client code:
String sslProps = "file:" + base +"/properties/ssl.client.props";
System.setProperty("com.ibm.SSL.ConfigURL", sslProps);
String soapProps = "file:" + base +"/properties/soap.client.props";
System.setProperty("com.ibm.SOAP.ConfigURL", pp);
Properties connectProps = new Properties();
connectProps.setProperty(AdminClient.CONNECTOR_TYPE, AdminClient.CONNECTOR_TYPE_SOAP);
connectProps.setProperty(AdminClient.CONNECTOR_HOST, dmgrHost);
connectProps.setProperty(AdminClient.CONNECTOR_PORT, soapPort);
connectProps.setProperty(AdminClient.CONNECTOR_SECURITY_ENABLED, "true");
AdminClient adminClient = AdminClientFactory.createAdminClient(connectProps) ;
This results in the following exception:
Caused by: com.ibm.websphere.management.exception.ConnectorNotAvailableException: ADMC0016E: The system cannot create a SOAP connector to connect to host ssunlab10.apaceng.net at port 13903.
at com.ibm.ws.management.connector.soap.SOAPConnectorClient.getUrl(SOAPConnectorClient.java:1306)
at com.ibm.ws.management.connector.soap.SOAPConnectorClient.access$300(SOAPConnectorClient.java:128)
at com.ibm.ws.management.connector.soap.SOAPConnectorClient$4.run(SOAPConnectorClient.java:370)
at com.ibm.ws.security.util.AccessController.doPrivileged(AccessController.java:118)
at com.ibm.ws.management.connector.soap.SOAPConnectorClient.reconnect(SOAPConnectorClient.java:363)
... 22 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at java.net.Socket.connect(Socket.java:469)
at java.net.Socket.<init>(Socket.java:366)
at java.net.Socket.<init>(Socket.java:209)
at com.ibm.ws.management.connector.soap.SOAPConnectorClient.getUrl(SOAPConnectorClient.java:1286)
... 26 more
So I then tried to do it via RMI, adding the sas.client.properties to the environment and setting the connector type in the code to CONNECTOR_TYPE_RMI. This time, though, I got a NameNotFoundException out of CORBA:
Caused by: javax.naming.NameNotFoundException: Context: , name: JMXConnector: First component in name JMXConnector not found. [Root exception is org.omg.CosNaming.NamingContextPackage.NotFound: IDL:omg.org/CosNaming/NamingContext/NotFound:1.0]
To see if it was an IBM issue, I tried using the standard JMX connector as well with the same result (substitute AdminClient for JMXConnector in the above error)
JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/JMXConnector");
Hashtable h = new Hashtable();
String providerUrl = "corbaloc:iiop:" + dmgrHost + ":" + rmiPort + "/WsnAdminNameService";
h.put(Context.PROVIDER_URL, providerUrl);
// Specify the user ID and password for the server if security is enabled on server.
String[] credentials = new String[] { "***", "***" };
h.put("jmx.remote.credentials", credentials);
// Establish the JMX connection.
JMXConnector jmxc = JMXConnectorFactory.connect(url, h);
// Get the MBean server connection instance.
mbsc = jmxc.getMBeanServerConnection();
At this point, in desperation, I wrote a wsadmin script to run both the RMI and SOAP methods. To my amazement, this works fine. So my question is, why does the code not work in a servlet installed on the dmgr?
regards,
Trevor
For the SOAP error, the ConnectException looks like the wrong SOAP host/port was used for the dmgr. I would double-check the server logs for the SOAP port. For the RMI error (NameNotFoundException), it looks like you're trying to use JMXConnectorFactory, which isn't supported by WAS.
If your application is installed on the dmgr, it's probably easiest to just use AdminServiceFactory.getAdminService to get an in-process reference to the AdminService rather than trying to open a new connection to the same process:
http://publib.boulder.ibm.com/infocenter/wasinfo/fep/topic/com.ibm.websphere.javadoc.doc/web/apidocs/com/ibm/websphere/management/AdminServiceFactory.html
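A hedged sketch of that in-process approach (AdminServiceFactory and AdminService per the linked javadoc; the query is only an example, and exception handling is omitted):
// Running inside the dmgr JVM, so no connector, host/port, or SSL configuration is needed.
AdminService adminService = AdminServiceFactory.getAdminService();
// Example query: list the Server MBeans visible to the deployment manager.
Set serverMBeans = adminService.queryNames(new ObjectName("WebSphere:type=Server,*"), null);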
