I am facing an issue when I try to fetch some products from a Hive table and process/apply rules on them in Spark.
// function that returns products from the Hive table
def getProductsList(hiveContext: org.apache.spark.sql.hive.HiveContext): scala.collection.mutable.MutableList[Product] = {
  val products = scala.collection.mutable.MutableList[Product]()
  val results = hiveContext.sql("select item_id, value from details where type_id = 12")
  results.collect().foreach { row =>
    products += new Product(row(0).asInstanceOf[Long], row(1).asInstanceOf[String])
  }
  products
}
Calling the getProductsList function and applying Drools rules on the products:
val randomProducts = this.getProductsList(hiveContext)
val rdd = ssc.sparkContext.parallelize(randomProducts)
val evaluatedProducts = rdd.mapPartitions { incomingProducts =>
  print("Hello")
  rulesExecutor.evalRules(incomingProducts)
}
val productdf = hiveContext.applySchema(evaluatedProducts, classOf[Product])
As shown above, the rdd.mapPartitions iteration is not happening, and it throws the following error. But I am sure the RDD is not empty.
Exception in thread "main" java.lang.NullPointerException
at org.spark-project.guava.reflect.TypeToken.method(TypeToken.java:465)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:103)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:102)
at org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:47)
at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:995)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:488)
at org.apache.spark.sql.SQLContext.applySchema(SQLContext.scala:1028)
at com.cloudera.sprue.ValidateEan$.main(ValidateEan.scala:70)
at com.cloudera.sprue.ValidateEan.main(ValidateEan.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/05 07:44:48 INFO SparkContext: Invoking stop() from shutdown hook
Please help me resolve this issue.
As we need the final result as a DataFrame, let's work directly with the DataFrame returned by hiveContext.sql().
//defining schema
case class Product(id: Long, value: String)
//loading data from Hive table
val results: DataFrame = hiveContext.sql("select item_id, value from details where type_id = 12")
// convert each Row to a Product, then pass it to rulesExecutor.evalRules()
val evaluatedProducts = results.map(productRow => rulesExecutor.evalRules(Product(productRow.getLong(0), productRow.getString(1)))).toDF()
I'd assume rulesExecutor.evalRules() will accept the Product type. If not, we can go ahead with the Row type (without explicitly converting it in map()).
I tried to add a table source with an event time attribute according to the Flink docs. My code looks like:
class SISSourceTable
  extends StreamTableSource[Row]
  with DefinedRowtimeAttributes
  with FlinkCal
  with FlinkTypeTags {

  private[this] val profileProp = ConfigurationManager.loadBusinessProperty
  val topic: String = ...

  val schemas = Seq(
    (TsCol, SQLTimestamp),
    (DCol, StringTag),
    (CCol, StringTag),
    (RCol, StringTag)
  )

  override def getProducedDataType: DataType = DataTypes.ROW(extractFields(schemas): _*)

  override def getTableSchema: TableSchema =
    new TableSchema.Builder()
      .fields(extractFieldNames(schemas), extractFieldDataTypes(schemas))
      .build()

  override def getRowtimeAttributeDescriptors: util.List[RowtimeAttributeDescriptor] =
    Collections.singletonList(
      new RowtimeAttributeDescriptor(
        TsCol,
        new ExistingField(TsCol),
        new AscendingTimestamps
      )
    )

  override def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[Row] = {
    val windowTime: Int = profileProp.getProperty("xxx", "300").toInt
    val source = prepareSource(topic)
    val colsToCheck = List(RCol, CCol, DCol)

    execEnv
      .addSource(source)
      .map(new MapFunction[String, Map[String, String]]() {
        override def map(value: String): Map[String, String] = ...
      })
      .map(new MapFunction[Map[String, String], Row]() {
        override def map(value: Map[String, String]): Row = {
          Row.of(new Timestamp(value(TsCol).toLong * 1000), value(DCol), value(CCol), value(RCol))
        }
      })
      .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[Row](Time.seconds(windowTime)) {
        override def extractTimestamp(element: Row): Long = element.getField(0).asInstanceOf[Timestamp].getTime
      })
  }
}
The source I get in the getDataStream method is a Kafka string source, and there is a TsCol that I extract from each Kafka record. I want to use the TsCol as the event time. However, the TsCol is a 10-digit timestamp with a string data type, so I need to transform it into a 13-digit Long. When I tried to use the 13-digit Long as the rowtime, I got an exception saying the rowtime can only be extracted from a SQL_TIMESTAMP column. So in the end I transformed the ts column into a java.sql.Timestamp. When I registered the above source table and ran the Flink job, I got the following exception:
org.apache.flink.table.api.TableException: TableSource of type com.mob.mobeye.flink.table.source.StayInStoreSourceTable returned a DataStream of data type ROW<`t` TIMESTAMP(3), `mac` STRING, `c` STRING, `r` STRING> that does not match with the data type ROW<`t` TIMESTAMP(3), `mac` STRING, `c` STRING, `r` STRING> declared by the TableSource.getProducedDataType() method. Please validate the implementation of the TableSource.
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlanInternal(StreamExecTableSourceScan.scala:113)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlanInternal(StreamExecTableSourceScan.scala:55)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlan(StreamExecTableSourceScan.scala:55)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:86)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:46)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlan(StreamExecCalc.scala:46)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:84)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:44)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlan(StreamExecExchange.scala:44)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupWindowAggregate.translateToPlanInternal(StreamExecGroupWindowAggregate.scala:140)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupWindowAggregate.translateToPlanInternal(StreamExecGroupWindowAggregate.scala:55)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecGroupWindowAggregate.translateToPlan(StreamExecGroupWindowAggregate.scala:55)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:86)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:46)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlan(StreamExecCalc.scala:46)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLookupJoin.translateToPlanInternal(StreamExecLookupJoin.scala:97)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLookupJoin.translateToPlanInternal(StreamExecLookupJoin.scala:40)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLookupJoin.translateToPlan(StreamExecLookupJoin.scala:40)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:86)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:46)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlan(StreamExecCalc.scala:46)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLookupJoin.translateToPlanInternal(StreamExecLookupJoin.scala:97)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLookupJoin.translateToPlanInternal(StreamExecLookupJoin.scala:40)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLookupJoin.translateToPlan(StreamExecLookupJoin.scala:40)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:86)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:46)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlan(StreamExecCalc.scala:46)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToTransformation(StreamExecSink.scala:185)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:133)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:50)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan(ExecNode.scala:54)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode.translateToPlan$(ExecNode.scala:52)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlan(StreamExecSink.scala:50)
at org.apache.flink.table.planner.delegation.StreamPlanner.$anonfun$translateToPlan$1(StreamPlanner.scala:61)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
at scala.collection.Iterator.foreach(Iterator.scala:937)
at scala.collection.Iterator.foreach$(Iterator.scala:937)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
at scala.collection.IterableLike.foreach(IterableLike.scala:70)
at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike.map(TraversableLike.scala:233)
at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:60)
at org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:149)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:439)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.insertInto(TableEnvironmentImpl.java:327)
at org.apache.flink.table.api.internal.TableImpl.insertInto(TableImpl.java:411)
I'm confused as to why
ROW<t TIMESTAMP(3), mac STRING, c STRING, r STRING>
does not match the data type
ROW<t TIMESTAMP(3), mac STRING, c STRING, r STRING>
I got a similar error in another place, where replacing TIMESTAMP with Long made it work. But here I need column t to be extracted as the rowtime, so it has to be of type TIMESTAMP(3). I would greatly appreciate it if someone could help with this problem.
What Flink version are you using? If I am not mistaken, you are using a version < 1.9.2. Is that correct?
If so, the exception message is not very helpful, as it had a bug that was fixed in https://issues.apache.org/jira/browse/FLINK-15726. Before that fix, the same type was actually printed twice.
There are a couple of problems with your implementation. The type mismatch is most probably because the map operator produces a GenericTypeInformation in
.map(new MapFunction[Map[String, String], Row]() {
  override def map(value: Map[String, String]): Row = {
    Row.of(new Timestamp(value(TsCol).toLong * 1000), value(DCol), value(CCol), value(RCol))
  }
})
Try changing it to
.map(new MapFunction[Map[String, String], Row]() {
  override def map(value: Map[String, String]): Row = {
    Row.of(new Timestamp(value(TsCol).toLong * 1000), value(DCol), value(CCol), value(RCol))
  }
}).returns(Types.ROW(Types.SQL_TIMESTAMP, Types.STRING, Types.STRING, Types.STRING))
Secondly, you don't need to assign the timestamps and watermarks within the TableSource. They will be assigned automatically based on the information provided through DefinedRowtimeAttributes.
I have two entities, Vendor and Goods, with a one-to-many relation between them.
I am using MyBatis with annotations. The mappers:
GoodsMapper
public interface GoodsMapper {
    @Select("select * from goods where id=#{goodsId}")
    @Results({
        @Result(id = true, column = "id", property = "id"),
        @Result(column = "name", property = "name"),
        @Result(column = "vendor_id", property = "vendor",
            one = @One(select = "com.xxx.server.mapper.VendorMapper.getVendor"))
    })
    Goods getGoods(@Param("goodsId") String goodsId);
}
VendorMapper
public interface VendorMapper {
    @Select("select * from vendor where id=#{vendorId}")
    Vendor getVendor(@Param("vendorId") String vendorId);
}
(I have omitted the entity classes and other code.)
When I invoke goodsMapper.getGoods(goodsId), I catch the following exception:
Caused by: org.apache.ibatis.exceptions.PersistenceException:
### Error querying database. Cause: java.lang.IllegalArgumentException: Mapped Statements collection does not contain value for com.xxx.server.mapper.VendorMapper.getVendor
### The error may exist in com/xxx/server/mapper/GoodsMapper.java (best guess)
### The error may involve com.xxx.server.mapper.GoodsMapper.getGoods
### The error occurred while handling results
### SQL: select * from goods where id=?
### Cause: java.lang.IllegalArgumentException: Mapped Statements collection does not contain value for com.xxx.server.mapper.VendorMapper.getVendor
at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:150)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:433)
... 117 more
Caused by: java.lang.IllegalArgumentException: Mapped Statements collection does not contain value for com.xxx.server.mapper.VendorMapper.getVendor
at org.apache.ibatis.session.Configuration$StrictMap.get(Configuration.java:933)
at org.apache.ibatis.session.Configuration.getMappedStatement(Configuration.java:726)
at org.apache.ibatis.session.Configuration.getMappedStatement(Configuration.java:719)
at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.getNestedQueryMappingValue(DefaultResultSetHandler.java:740)
at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.getPropertyMappingValue(DefaultResultSetHandler.java:465)
at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.applyPropertyMappings(DefaultResultSetHandler.java:441)
I have checked the reference com.xxx.server.mapper.VendorMapper.getVendor used in the select of @One, and it is correct.
I appreciate any kind of help.
In my case this was caused by the referenced mapper not having been initialized by Spring yet.
The solution is to add the @DependsOn annotation to the "parent" mapper:
#DependsOn("VendorMapper")
public interface GoodsMapper{
...
}
#Repository("VendorMapper")
public interface VendorMapper {
...
}
I'm new to Calcite. The functionality it provides looks fabulous!
While doing some research, I'm trying to figure out how to run some basic SQL queries with the example Elasticsearch adapter.
In AbstractElasticsearchTable.getRowType, rows are mapped to a MAP type.
The issue is:
Query:
select * from zips where \"city\" = 'BROOKLYN'
returns:
city=BROOKLYN; longitude=-73.956985; latitude=40.646694; pop=111396; state=NY; id=11226
Query:
select \"pop\" from zips where \"city\" = 'BROOKLYN'
returns:
pop={pop=111396}
My goal is to sum up all the 'pop' values.
So when I construct a query like this:
select sum(\"pop\") from zips where \"city\" = 'BROOKLYN'
The error is:
Caused by: java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.Integer
at Baz$2.apply(Unknown Source)
at Baz$2.apply(Unknown Source)
at org.apache.calcite.linq4j.EnumerableDefaults.aggregate(EnumerableDefaults.java:117)
at org.apache.calcite.linq4j.DefaultEnumerable.aggregate(DefaultEnumerable.java:107)
at Baz.bind(Unknown Source)
at org.apache.calcite.jdbc.CalcitePrepare$CalciteSignature.enumerable(CalcitePrepare.java:356)
Can somebody point me in the right direction to figure out how to do aggregations with such a mapping, as in the example?
To execute this query, I added a test to ElasticSearchAdapterTest.java:
@Test
public void select() {
    CalciteAssert.that().with(newConnectionFactory())
        .query("select sum(\"pop\") from zips where \"city\" = 'BROOKLYN'").returns("");
}
Implementing the row type as a struct type resolved my issue. Here it is:
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    try {
        Map<String, String> mapping = getMapping();
        List<RelDataType> types = new ArrayList<>();
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            names.add(e.getKey());
            types.add(translateEsType(e.getValue(), typeFactory));
        }
        return typeFactory.createStructType(types, names);
    } catch (IOException e) {
        throw new RuntimeException(e.getMessage(), e);
    }
}
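The translateEsType helper is not shown here. A minimal sketch of such a helper, assuming the Elasticsearch mapping values are plain type names like "text", "keyword", "long" or "date" (adjust the cases to whatever your index actually contains), using org.apache.calcite.sql.type.SqlTypeName:
private static RelDataType translateEsType(String esType, RelDataTypeFactory typeFactory) {
    // Hypothetical mapping from Elasticsearch field types to Calcite SQL types;
    // extend the cases to match the types present in your index.
    switch (esType) {
        case "long":
            return typeFactory.createSqlType(SqlTypeName.BIGINT);
        case "integer":
        case "short":
        case "byte":
            return typeFactory.createSqlType(SqlTypeName.INTEGER);
        case "double":
        case "float":
            return typeFactory.createSqlType(SqlTypeName.DOUBLE);
        case "boolean":
            return typeFactory.createSqlType(SqlTypeName.BOOLEAN);
        case "date":
            return typeFactory.createSqlType(SqlTypeName.TIMESTAMP);
        case "text":
        case "keyword":
        default:
            return typeFactory.createSqlType(SqlTypeName.VARCHAR);
    }
}
With a struct row type like this, sum("pop") sees a numeric column instead of a nested map, which is what avoids the ClassCastException.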
I want to prepare a Java class that will read an index from Elasticsearch, perform aggregations using Spark, and then write the results back to Elasticsearch. The target schema (in the form of a StructType) is the same as the source one. My code is as follows:
SparkConf conf = new SparkConf().setAppName("Aggregation").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);
JavaPairRDD<String, Map<String, Object>> pairRDD = JavaEsSpark.esRDD(sc, "kpi_aggregator/record");
RDD rdd = JavaPairRDD.toRDD(pairRDD);
Dataset df = sqlContext.createDataFrame(rdd, customSchema);
df.registerTempTable("data");
Dataset kpi1 = sqlContext.sql("SELECT host, SUM(bytes_uplink), SUM(bytes_downlink) FROM data GROUP BY host");
JavaEsSparkSQL.saveToEs(kpi1, "kpi_aggregator_total/record");
I am using the latest versions of spark-core_2.11 and elasticsearch-spark-20_2.11. The previous code results in the following exception:
java.lang.ClassCastException: scala.Tuple2 cannot be cast to org.apache.spark.sql.Row
Any ideas what I am doing wrong?
You get this exception because sqlContext.createDataFrame(rdd, customSchema) expects an RDD of your bean class (CustomSchemaBean), but instead you pass it the result of JavaPairRDD.toRDD(pairRDD), which is RDD<Tuple2<String, Map<String, Object>>>. You have to map your JavaPairRDD<String, Map<String, Object>> to an RDD<CustomSchemaBean>:
SparkConf conf = new SparkConf().setAppName("Aggregation").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);
JavaRDD<CustomSchemaBean> rdd = JavaEsSpark.esRDD(sc, "kpi_aggregator/record")
.map(tuple2 -> {
/**transform Tuple2<String, Map<String, Object>> to CustomSchemaBean **/
return new CustomSchemaBean(????);
} );
Dataset df = sqlContext.createDataFrame(rdd, customSchema);
df.registerTempTable("data");
Dataset kpi1 = sqlContext.sql("SELECT host, SUM(bytes_uplink), SUM(bytes_downlink) FROM data GROUP BY host");
JavaEsSparkSQL.saveToEs(kpi1, "kpi_aggregator_total/record");
Notice that I used JavaRDD, not RDD; both methods are legal.
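For illustration only, a CustomSchemaBean covering the three columns referenced in the query above might look like the sketch below; the field names and types are assumptions taken from the SQL statement, not from your actual index. With a bean RDD you would typically call sqlContext.createDataFrame(rdd, CustomSchemaBean.class), and the query would then refer to the bean property names:
import java.io.Serializable;
import java.util.Map;

// Hypothetical bean for the columns used in the aggregation query.
public class CustomSchemaBean implements Serializable {
    private String host;
    private long bytesUplink;
    private long bytesDownlink;

    public CustomSchemaBean(Map<String, Object> source) {
        this.host = (String) source.get("host");
        this.bytesUplink = ((Number) source.get("bytes_uplink")).longValue();
        this.bytesDownlink = ((Number) source.get("bytes_downlink")).longValue();
    }

    // Getters are required so Spark can infer the DataFrame schema from the bean.
    public String getHost() { return host; }
    public long getBytesUplink() { return bytesUplink; }
    public long getBytesDownlink() { return bytesDownlink; }
}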
I am using Spring, and my class is annotated with @Transactional.
I am using SimpleJdbcInsert, but I am getting the following warning:
TableMetaDataProvider: - Unable to locate table meta data for
'data.data_insert' -- column names must be provided
I have three tables, related as follows: the primary key of table 1 is a foreign key in table 2, and the primary key of table 2 is a foreign key in table 3.
Here is the insert code for table 1:
java.sql.Timestamp timestamp = getCurrentJavaSqlTimestamp();
Map<String, Object> params = new HashMap<String, Object>();
params.put("notes", task.getNotes());
params.put("recording_time", timestamp);
params.put("end_user_id", 805);
SimpleJdbcInsert insertData = new SimpleJdbcInsert(dataSource)
        .withTableName("data.data_insert")
        .usingColumns("notes", "recording_time", "end_user_id")
        .usingGeneratedKeyColumns("data_id");
long dataId = insertData.executeAndReturnKey(params).longValue();
The error logs:
2015-09-29 14:10:27,133 WARN [http-8080-2] LegacyFlexJsonExceptionMessageConverter: - Generated Key Name(s) not specificed. Using the generated keys features requires specifying the name(s) of the generated column(s) for User ID: 805, Request ID: f8da3bb5-0613-4a74-9ca8-95a6ab4f1692, clientIP: 127.0.0.1 uri: /admin/dataInsert, Request Parameters:
org.springframework.dao.InvalidDataAccessApiUsageException: Generated Key Name(s) not specificed. Using the generated keys features requires specifying the name(s) of the generated column(s)
at org.springframework.jdbc.core.simple.AbstractJdbcInsert.prepareStatementForGeneratedKeys(AbstractJdbcInsert.java:530)
at org.springframework.jdbc.core.simple.AbstractJdbcInsert.access$0(AbstractJdbcInsert.java:528)
at org.springframework.jdbc.core.simple.AbstractJdbcInsert$1.createPreparedStatement(AbstractJdbcInsert.java:448)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:581)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:843)
at org.springframework.jdbc.core.simple.AbstractJdbcInsert.executeInsertAndReturnKeyHolderInternal(AbstractJdbcInsert.java:445)
at org.springframework.jdbc.core.simple.AbstractJdbcInsert.executeInsertAndReturnKeyInternal(AbstractJdbcInsert.java:426)
at org.springframework.jdbc.core.simple.AbstractJdbcInsert.doExecuteAndReturnKey(AbstractJdbcInsert.java:380)
at org.springframework.jdbc.core.simple.SimpleJdbcInsert.executeAndReturnKey(SimpleJdbcInsert.java:122)
at com.gridpoint.energy.datamodel.impl.PGDataFixBackUpManagerBean.backupDataInRange(PGDataFixBackUpManagerBean.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
The correct one:
java.sql.Timestamp timestamp = getCurrentJavaSqlTimestamp();
Map<String, Object> params = new HashMap<String, Object>();
params.put("notes", task.getNotes());
params.put("recording_time", timestamp);
params.put("end_user_id", 805);
SimpleJdbcInsert insertData = new SimpleJdbcInsert(dataSource)
        .withSchemaName("data")
        .withTableName("data_insert")
        .usingColumns("notes", "recording_time", "end_user_id")
        .usingGeneratedKeyColumns("data_id");
long dataId = insertData.executeAndReturnKey(params).longValue();
So I just needed to specify the schema name using withSchemaName.