NiFi 1.5.0
I'm trying to execute this Python script in an ExecuteScript processor:
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import xml.etree.ElementTree as ET

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        xmlRaw = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        tree = ET.fromstring(xmlRaw)
        root = tree.getroot()
        xmlFix = ET.tostring(root, encoding='utf8', method='xml')
        outputStream.write(bytearray(xmlFix))

flowFile = session.get()
if (flowFile != None):
    #callback = PyStreamCallback(flowFile)
    #session.read(flowFile, callback)
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()
And I get this error:
ExecuteScript[id=3c68eecc-0172-1000-ffff-ffff82be9cc3] Failed to process session due to javax.script.ScriptException: org.xml.sax.SAXException: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser in <script> at line number 21: org.apache.nifi.processor.exception.ProcessException: javax.script.ScriptException: org.xml.sax.SAXException: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser in <script> at line number 21
Is xml.etree.ElementTree not supported in Jython, or is this a NiFi configuration error?
(SAXParser2 deprecated in NiFi 1.5: https://community.cloudera.com/)
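The trace suggests Jython's xml.etree.ElementTree is looking for a Xerces SAX driver that isn't on the ExecuteScript classpath. One workaround — a minimal sketch, assuming the goal is just to re-serialize the flow file's XML — is to parse with the JDK's built-in JAXP classes instead, which Jython can call directly and which need no extra jars; keep the rest of the script the same:

from javax.xml.parsers import DocumentBuilderFactory
from javax.xml.transform import TransformerFactory
from javax.xml.transform.dom import DOMSource
from javax.xml.transform.stream import StreamResult

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # Parse with the JVM's bundled JAXP implementation (no external Xerces jar needed)
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        document = builder.parse(inputStream)
        # Serialize the DOM back into the outgoing flow file content
        transformer = TransformerFactory.newInstance().newTransformer()
        transformer.transform(DOMSource(document), StreamResult(outputStream))

Another option is to point the processor's Module Directory property at a folder containing a Xerces jar so ElementTree's SAX driver lookup can succeed, but that requires deploying the extra jar alongside NiFi.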
I am trying to configure Elasticsearch in Spring Boot, but bean instantiation is failing. NodeBuilder has been removed from the Elasticsearch API, so I am trying to use Settings.Builder instead, but it isn't helping.
Below is the code:
import java.io.File;
import java.io.IOException;
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

@Configuration
@EnableElasticsearchRepositories(basePackages = "com.demo.elastic.elasticdemo.repository")
public class ElasticConfiguration {

    @SuppressWarnings("resource")
    @Bean
    public ElasticsearchOperations elasticsearchTemplate() throws IOException {
        File tempDir = File.createTempFile("temp-elastic", Long.toString(System.nanoTime()));
        System.out.println("Temp directory: " + tempDir.getAbsolutePath());

        Settings.Builder settings = Settings.builder()
                // http settings
                .put("http.enable", "true")
                .put("http.cor.enable", "true")
                .put("http.cors.allow-origin", "https?:?/?/localhost(:[0-9]+)?/")
                .put("http.port", "9200")
                // transport settings
                .put("transport.tcp.port", "9300")
                .put("network.host", "localhost")
                // node settings
                .put("node.data", "true")
                .put("node.master", "true")
                // configuring index
                .put("index.number_of_shards", "1")
                .put("index.number_of_replicas", "2")
                .put("index.refresh_interval", "10s")
                .put("index.max_results_window", "100")
                // configuring paths
                .put("path.logs", new File(tempDir, "logs").getAbsolutePath())
                .put("path.data", new File(tempDir, "data").getAbsolutePath())
                .put("path.home", tempDir);
                //.build();

        TransportClient client = new PreBuiltTransportClient(Settings.EMPTY)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
        return new ElasticsearchTemplate(client);
    }
}
What am I doing wrong?
The error I am getting:
2018-09-27 19:23:35.825 ERROR 57876 --- [ main] o.s.boot.SpringApplication : Application run failed
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'loaders': Unsatisfied dependency expressed through field 'operations'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'elasticsearchTemplate' defined in class path resource [com/demo/elastic/elasticdemo/config/ElasticConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.elasticsearch.core.ElasticsearchOperations]: Factory method 'elasticsearchTemplate' threw exception; nested exception is java.lang.NoClassDefFoundError: org/elasticsearch/transport/Netty3Plugin
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:586) ~[spring-beans-5.0.9.RELEASE.jar:5.0.9.RELEASE]
at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:90) ~[spring-beans-5.0.9.RELEASE.jar:5.0.9.RELEASE]
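The root cause at the bottom of the trace is java.lang.NoClassDefFoundError: org/elasticsearch/transport/Netty3Plugin. PreBuiltTransportClient loads the Netty transport plugins when it is constructed, so the Elasticsearch Netty transport modules have to be on the classpath. A hedged sketch of the missing dependency, assuming a Maven build (the version shown is an assumption and should match the Elasticsearch client version your spring-data-elasticsearch release targets):

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <!-- assumption: align this with the ES client version already on your classpath -->
    <version>5.6.10</version>
</dependency>

The transport artifact pulls in the Netty transport client modules transitively; if you manage those jars yourself, add transport-netty3-client and transport-netty4-client explicitly at the same version.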
I am trying to set up a very basic example:
Push random data to an Output Port in NiFi.
Use a Spark streaming context to print the received data.
Versions (all on a single instance):
HDF - 3.1.1.0-35
HDP - 2.6.5.0-292
nifi-spark-receiver & site-to-site-client - 1.7.1
I have configured spark-defaults.conf as follows
spark.driver.extraClassPath /usr/hdf/3.1.1.0-35/nifi/work/META-INF/bundled-dependencies/nifi-client-dto-1.5.0.3.1.1.0-35.jar:/opt/spark-receiver/httpcore-nio-4.0-alpha6.jar:/opt/spark-receiver/nifi-site-to-site-client-1.5.0.3.1.2.0-7.jar:/opt/spark-receiver/nifi-spark-receiver-1.5.0.3.1.2.0-7.jar:/usr/hdf/3.1.1.0-35/nifi/lib/nifi-api-1.5.0.3.1.1.0-35.jar:/usr/hdf/3.1.1.0-35/nifi/lib/bootstrap/nifi-utils-1.5.0.3.1.1.0-35.jar:/usr/hdf/3.1.1.0-35/nifi/lib/nifi-framework-api-1.5.0.3.1.1.0-35.jar
I am running the following commands in spark-shell:
import org.apache.nifi._
import java.nio.charset._
import org.apache.nifi.spark._
import org.apache.nifi.remote.client._
import org.apache.spark._
import org.apache.nifi.events._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.nifi.remote._
import org.apache.nifi.remote.client._
import org.apache.nifi.remote.protocol._
import org.apache.spark.storage._
import org.apache.spark.streaming.receiver._
import java.io._
import org.apache.spark.serializer._
val conf = new SiteToSiteClient.Builder().url("http://10.140.0.2:9090/nifi").portName("toSpark").buildConfig()
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.receiverStream(new NiFiReceiver(conf, StorageLevel.MEMORY_ONLY))
val text = lines.map(dataPacket => new String(dataPacket.getContent, StandardCharsets.UTF_8))
text.print()
ssc.start()
After running this code, I get the following error:
Exception in thread "NiFi Receiver" java.lang.NoClassDefFoundError:
org/apache/http/nio/protocol/HttpAsyncResponseConsumer
at org.apache.nifi.remote.client.SiteInfoProvider.createSiteToSiteRestApiClient(SiteInfoProvider.java:104)
at org.apache.nifi.remote.client.SiteInfoProvider.refreshRemoteInfo(SiteInfoProvider.java:68)
at org.apache.nifi.remote.client.SiteInfoProvider.getPortIdentifier(SiteInfoProvider.java:220)
at org.apache.nifi.remote.client.SiteInfoProvider.getOutputPortIdentifier(SiteInfoProvider.java:204)
at org.apache.nifi.remote.client.socket.SocketClient.getPortIdentifier(SocketClient.java:79)
at org.apache.nifi.remote.client.socket.SocketClient.createTransaction(SocketClient.java:121)
at org.apache.nifi.spark.NiFiReceiver$ReceiveRunnable.run(NiFiReceiver.java:149)
at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException:
org.apache.http.nio.protocol.HttpAsyncResponseConsumer
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Please help.
I'm guessing your inclusion of httpcore-nio-4.0-alpha6.jar is the issue: that version does not contain the missing class, and it appears to be shadowing the version of httpcore-nio that nifi-spark-receiver pulls in transitively (currently 4.4.6).
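If that is the case, a hedged fix is to swap that entry for a 4.4.x build of httpcore-nio in spark-defaults.conf. The /opt/spark-receiver/httpcore-nio-4.4.6.jar path below is an assumption — adjust it to wherever you place the jar; every other entry stays as in the original line:

spark.driver.extraClassPath /usr/hdf/3.1.1.0-35/nifi/work/META-INF/bundled-dependencies/nifi-client-dto-1.5.0.3.1.1.0-35.jar:/opt/spark-receiver/httpcore-nio-4.4.6.jar:/opt/spark-receiver/nifi-site-to-site-client-1.5.0.3.1.2.0-7.jar:/opt/spark-receiver/nifi-spark-receiver-1.5.0.3.1.2.0-7.jar:/usr/hdf/3.1.1.0-35/nifi/lib/nifi-api-1.5.0.3.1.1.0-35.jar:/usr/hdf/3.1.1.0-35/nifi/lib/bootstrap/nifi-utils-1.5.0.3.1.1.0-35.jar:/usr/hdf/3.1.1.0-35/nifi/lib/nifi-framework-api-1.5.0.3.1.1.0-35.jar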
I installed ActiveMQ on my local machine and want to test it with Gatling. I created a simulation class to test JMS, but I am getting the error shown below.
Here is my code:
import io.gatling.core.Predef._
import io.gatling.jms.Predef._
import scala.concurrent.duration._
import javax.jms._
import org.apache.activemq.ActiveMQConnectionFactory

class JmsTest extends Simulation {

  val jmsConfig = jms
    .connectionFactoryName("connectionFactory")
    .url("tcp://localhost:61616")
    .credentials("admin", "admin")
    .contextFactory(classOf[org.apache.activemq.jndi.ActiveMQInitialContextFactory].getName)
    .listenerCount(1)

  val scn = scenario("JMS DSL test").repeat(1) {
    exec(jms("req reply testing")
      .reqreply
      .queue("MyQueue")
      .replyQueue("MyTopic")
      .textMessage("Hello this is Naveen")
    )
  }

  setUp(scn.inject(atOnceUsers(1)))
    .protocols(jmsConfig)
}
I created a jndi.properties file with the following content:
java.naming.factory.initial = org.apache.activemq.jndi.ActiveMQInitialContextFactory
java.naming.provider.url = vm://localhost
connectionFactoryNames = connectionFactory
queue.MyQueue = TestJms1
topic.MyTopic = TestJms1
The error which I am getting is:
Exception in thread "main" java.lang.NoSuchMethodError: io.gatling.jms.Predef$.jms()Lio/gatling/jms/protocol/JmsProtocolBuilderBase$;
at computerdatabase.JmsTest.<init>(JmsTest.scala:14)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at io.gatling.app.Gatling$.io$gatling$app$Gatling$$$anonfun$1(Gatling.scala:41)
at io.gatling.app.Gatling.run(Gatling.scala:92)
at io.gatling.app.Gatling.runIfNecessary(Gatling.scala:75)
at io.gatling.app.Gatling.start(Gatling.scala:65)
at io.gatling.app.Gatling$.start(Gatling.scala:57)
at io.gatling.app.Gatling$.fromArgs(Gatling.scala:49)
at io.gatling.app.Gatling$.main(Gatling.scala:43)
at io.gatling.app.Gatling.main(Gatling.scala)
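A NoSuchMethodError on io.gatling.jms.Predef.jms() usually indicates a binary mismatch: the gatling-jms jar on the classpath was built for a different Gatling release than the one compiling and running the simulation, so the method signature the simulation expects isn't there at runtime. A hedged sketch, assuming an sbt build (the 2.3.1 version is an assumption; the point is simply that every Gatling module must be on one and the same version):

// build.sbt -- keep all Gatling modules on a single version
val gatlingVersion = "2.3.1"  // assumption: use whichever single release you actually run

libraryDependencies ++= Seq(
  "io.gatling.highcharts" % "gatling-charts-highcharts" % gatlingVersion % "test",
  "io.gatling"            % "gatling-test-framework"    % gatlingVersion % "test"
)

If you run the standalone Gatling bundle instead of a build tool (the computerdatabase package suggests the bundle's user-files directory), check that no extra or older gatling-jms jar has been dropped into the bundle's lib folder.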
I am trying to run the connector example on my local machine but keep getting an UnknownHostException. How do I configure access to BigQuery using the Hadoop connector?
package com.mycompany.dataproc;

import com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration;
import com.google.cloud.hadoop.io.bigquery.GsonBigQueryInputFormat;
import com.google.gson.JsonObject;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

public class BigQueryAccessExample {

    JavaSparkContext jsc;

    public BigQueryAccessExample(JavaSparkContext jsc) {
    }

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("BigQuery Reader").setMaster("local[5]");
        conf.set("spark.serializer", org.apache.spark.serializer.KryoSerializer.class.getName());
        JavaSparkContext jsc = new JavaSparkContext(conf);

        String projectId = "mycompany-data";
        String fullyQualifiedInputTableId = "mylogs.display20151030";
        Configuration hadoopConfiguration = jsc.hadoopConfiguration();

        // Set the job-level projectId.
        hadoopConfiguration.set(BigQueryConfiguration.PROJECT_ID_KEY, projectId);
        // Use the systemBucket for temporary BigQuery export data used by the InputFormat.
        String bucket = "my-spark-test";
        hadoopConfiguration.set(BigQueryConfiguration.GCS_BUCKET_KEY, bucket);

        // Configure input and output for BigQuery access.
        BigQueryConfiguration.configureBigQueryInput(hadoopConfiguration, fullyQualifiedInputTableId);
        //BigQueryConfiguration.configureBigQueryOutput(conf, fullyQualifiedOutputTableId, outputTableSchema);

        JavaPairRDD<LongWritable, JsonObject> tableData = jsc.newAPIHadoopRDD(
                hadoopConfiguration, GsonBigQueryInputFormat.class, LongWritable.class, JsonObject.class);
        //tableData.count();

        JavaRDD<JsonObject> myRdd = tableData.map(new Function<Tuple2<LongWritable, JsonObject>, JsonObject>() {
            public JsonObject call(Tuple2<LongWritable, JsonObject> v1) throws Exception {
                System.out.println(String.format("idx: %s val: %s", v1._1(), v1._2().toString()));
                return v1._2();
            }
        });
        myRdd.take(10);
    }
}
But I get an UnknownHostException:
java.net.UnknownHostException: metadata
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
at com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:142)
at com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:189)
at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:71)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.createBigQueryCredential(BigQueryFactory.java:81)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.getBigQuery(BigQueryFactory.java:101)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.getBigQueryHelper(BigQueryFactory.java:89)
at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getBigQueryHelper(AbstractBigQueryInputFormat.java:363)
at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:102)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:115)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1277)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
at org.apache.spark.api.java.JavaRDDLike$class.take(JavaRDDLike.scala:494)
at org.apache.spark.api.java.AbstractJavaRDDLike.take(JavaRDDLike.scala:47)
at com.mycompany.dataproc.BigQueryAccessExample.main(BigQueryAccessExample.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
It appears that I need to set up access credentials or permissions, but I don't see any docs regarding that.
I downloaded credentials from https://console.developers.google.com/project//apiui/credential
and set up GOOGLE_APPLICATION_CREDENTIALS, but that didn't seem to work.
Any help?
The simplest way is to create a new service account and download the .p12 file (the Hadoop connectors do not currently support Application Default Credentials or JSON keyfiles):
String serviceAccount = "foo#bar.gserviceaccount.com";
String localKeyfile = "/path/to/local/keyfile.p12";
hadoopConfiguration.set("google.cloud.auth.service.account.enable", true);
hadoopConfiguration.set("google.cloud.auth.service.account.email", serviceAccount);
hadoopConfiguration.set("google.cloud.auth.service.account.keyfile", localKeyfile);
I am testing HBase, using a standalone setup without Hadoop. With version 0.90.6 the code worked fine, but after upgrading to the latest version, 0.94.0, it fails with the exception below when I try to put data into a table.
Exception
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: xxxx:36601,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1591)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:776)
at com.hhase.Hbase.main(Hbase.java:22)
I am using the code below:
package com.hhase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class Hbase {
    public static void main(String args[]) throws IOException {
        Configuration hConf = HBaseConfiguration.create();
        HTable table = new HTable(hConf, "myLittleHBaseTable");

        Put p = new Put(Bytes.toBytes("myLittleRow"));
        Put put = new Put(Bytes.toBytes("myLittleRow"));
        put.add(Bytes.toBytes("myLittleFamily"),
                Bytes.toBytes("someQualifier"), Bytes.toBytes("Some"));
        table.put(put);
    }
}
Libraries used:
commons-cli-1.2.jar hadoop-core-1.0.2.jar
commons-codec-1.4.jar hbase-0.94.0.jar
commons-collections-3.2.1.jar httpclient-4.1.2.jar
commons-configuration-1.6.jar httpcore-4.1.4.jar
commons-httpclient-3.1.jar log4j-1.2.16.jar
commons-io-2.1.jar protobuf-java-2.4.0a.jar
commons-lang-2.5.jar slf4j-api-1.5.8.jar
commons-logging-1.1.1.jar slf4j-log4j12-1.5.8.jar
commons-net-1.4.1.jar stax-api-1.0.1.jar
guava-r09.jar zookeeper-3.4.3.jar
I encountered the same error while inserting data into HBase.
In my case, it was due to an incorrect column family name.
Please refer to this conversation.
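For context, a DoNotRetryIOException from a put is commonly a NoSuchColumnFamilyException, i.e. the Put references a column family the table does not define — which matches the incorrect-family cause above. A minimal sketch for double-checking the family names, using the 0.94-era client API from the question:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class ListFamilies {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Print every column family defined on the table; the family used in the
        // Put ("myLittleFamily") must appear here, otherwise the put will fail.
        HTableDescriptor descriptor = admin.getTableDescriptor(Bytes.toBytes("myLittleHBaseTable"));
        for (HColumnDescriptor family : descriptor.getFamilies()) {
            System.out.println(family.getNameAsString());
        }
        admin.close();
    }
}

If "myLittleFamily" is not in that list, either change the Put to an existing family or recreate the table with that family added.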