calcite select count(intCol) from table when row type is _MAP (elasticsearch example)

I'm new to Calcite. The functionality it provides looks fabulous!
While doing some research, I'm trying to figure out how to run some basic SQL queries with the example Elasticsearch adapter.
AbstractElasticsearchTable.getRowType maps each row to a MAP.
The issue is:
Query:
select * from zips where "city" = 'BROOKLYN'
returns:
city=BROOKLYN; longitude=-73.956985; latitude=40.646694; pop=111396; state=NY; id=11226
Query:
select "pop" from zips where "city" = 'BROOKLYN'
returns:
pop={pop=111396}
My goal is to sum up all the 'pop' values.
So when I construct query like this:
select sum("pop") from zips where "city" = 'BROOKLYN'
The error is:
Caused by: java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.Integer
at Baz$2.apply(Unknown Source)
at Baz$2.apply(Unknown Source)
at org.apache.calcite.linq4j.EnumerableDefaults.aggregate(EnumerableDefaults.java:117)
at org.apache.calcite.linq4j.DefaultEnumerable.aggregate(DefaultEnumerable.java:107)
at Baz.bind(Unknown Source)
at org.apache.calcite.jdbc.CalcitePrepare$CalciteSignature.enumerable(CalcitePrepare.java:356)
Can somebody point me in the right direction on how to do aggregations with a mapping like the one in the example?
To execute this query I added a test to ElasticSearchAdapterTest.java.
@Test
public void select() {
CalciteAssert.that().with(newConnectionFactory())
.query("select sum(\"pop\") from zips where \"city\" = 'BROOKLYN'").returns("");
}

Implementing the row type as a struct type instead of a MAP resolved my issue. Here it is:
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
try {
Map<String, String> mapping = getMapping();
List<RelDataType> types = new ArrayList<RelDataType>();
List<String> names = new ArrayList<>();
for(Map.Entry<String, String> e : mapping.entrySet()) {
names.add(e.getKey());
types.add(translateEsType(e.getValue(), typeFactory));
}
return typeFactory.createStructType(types, names);
} catch(IOException e) {
throw new RuntimeException(e.getMessage(), e);
}
}
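The ClassCastException from the question can be reproduced without Calcite. The sketch below is an illustration only: it assumes, based on the pop={pop=111396} output, that each projected value arrives as a map rather than as a number. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapRowSketch {
    /** Mimics the failing aggregate: treats the projected value as an Integer. */
    public static boolean castToIntegerFails(Object projected) {
        try {
            Integer unused = (Integer) projected; // throws if projected is a map
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** Sums "pop" by reading the field out of each map before adding. */
    public static long sumPop(List<Map<String, Object>> projectedValues) {
        long total = 0;
        for (Map<String, Object> value : projectedValues) {
            total += ((Number) value.get("pop")).longValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("pop", 111396);
        System.out.println(castToIntegerFails(value));
        System.out.println(sumPop(List.of(value)));
    }
}
```

With a struct row type the column is typed as a number from the start, so no such unwrapping is needed.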

Related

JPA Criteria api - Total records for concrete query within pagination

I am writing a pagination function in my repository layer. The function receives Spring's Pageable object and a value as parameters, like this:
public Page<Foo> filterFoo(Pageable pageable, String value) {
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Foo> fooQuery = cb.createQuery(Foo.class);
Root<Foo> foo = fooQuery.from(Foo.class);
fooQuery.where(/* adding predicate for match value */);
List<Foo> result = entityManager.createQuery(fooQuery)
// Pageable page numbers are zero-based, so getOffset() already equals pageNumber * pageSize
.setFirstResult((int) pageable.getOffset())
.setMaxResults(pageable.getPageSize())
.getResultList();
return new PageImpl<>(result, pageable, xxxx);
}
The function returns Spring's PageImpl object filled with my result. PageImpl also needs the total count of objects that match the predicates. This count must, of course, ignore maxResults and firstResult. Is it possible to make another database call with my fooQuery to get the total number of records for that query without the limit? What is the best practice for using Pageable with the Criteria API in JPA? Thanks in advance.
Because the generated SQL uses aliases, you may need a separate query to get the total row count.
For example:
CriteriaQuery<Long> countQuery = cb.createQuery(Long.class);
countQuery.select(cb.count(countQuery.from(Foo.class)));
if (Objects.nonNull(filters)) {
countQuery.where(filters);
}
return new PageImpl<>(result, pageable, em.createQuery(countQuery).getSingleResult());
where filters is your "adding predicate for match value" expression.
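The total matters because it drives the page arithmetic. Here is a minimal sketch of that arithmetic, assuming zero-based page numbers (as in Spring Data's Pageable). PageMath is a hypothetical helper, not part of Spring.

```java
public class PageMath {
    /** Offset of the first row on a zero-based page. */
    public static long offset(int pageNumber, int pageSize) {
        return (long) pageNumber * pageSize;
    }

    /** Number of pages needed to show totalCount rows (rounds up). */
    public static long totalPages(long totalCount, int pageSize) {
        return (totalCount + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offset(2, 20));
        System.out.println(totalPages(41, 20));
    }
}
```

The count query has to use the same predicates as the page query; otherwise the page count and the page content disagree.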
Alternatively, you can use a tuple query with a custom SQL function to calculate the row count in the same select query.
Like this:
public class SqlFunctionsMetadataBuilderContributor implements MetadataBuilderContributor {
@Override
public void contribute(MetadataBuilder metadataBuilder) {
metadataBuilder.applySqlFunction(
"count_over",
new SQLFunctionTemplate(
StandardBasicTypes.LONG,
"(count(?1) over())"
)
);
}
}
and Criteria:
public Page<Foo> findAll(Specification<Foo> specification, Pageable pageable) {
CriteriaQuery<Tuple> cq = cb.createTupleQuery();
Root<Foo> fooRoot = cq.from(Foo.class);
cq.select(cb.tuple(fooRoot, cb.function("count_over", Long.class, fooRoot.get("id"))));
Predicate filters = specification.toPredicate(fooRoot, cq, cb);
if (Objects.nonNull(filters)) {
cq.where(filters);
}
TypedQuery<Tuple> query = em.createQuery(cq);
query.setFirstResult((int) pageable.getOffset());
query.setMaxResults(pageable.getPageSize());
List<Tuple> result = query.getResultList();
if (result.isEmpty()) {
return new PageImpl<>(List.of());
}
return new PageImpl<>(
result.stream().map(tuple -> (Foo) tuple.get(0)).collect(toUnmodifiableList()),
pageable,
(long) result.get(0).get(1)
);
}
See more about SQLFunction: https://vladmihalcea.com/hibernate-sql-function-jpql-criteria-api-query/ and Custom SQL for Order in JPA Criteria API

spark jdbc api can't use built-in function

I want to load the result of a subquery on an Impala table as a single Dataset.
The code looks like this:
String subQuery = "(select to_timestamp(unix_timestamp(now())) as ts from my_table) t";
Dataset<Row> ds = spark.read().jdbc(myImpalaUrl, subQuery, prop);
But the result is an error:
Caused by: java.sql.SQLDataException: [Cloudera][JDBC](10140) Error converting value to Timestamp.
The unix_timestamp function works, but to_timestamp fails. Why?
I found that the problem is in org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute():
sqlText = s"SELECT $columnList FROM ${options.table} $myWhereClause"
$columnList contains double quotes, like "col_name"; when I remove the quotes it works fine.
I solved this problem by registering a dialect, because the default dialect wraps column names in double quotes:
JdbcDialect ImpalaDialect = new JdbcDialect(){
@Override
public boolean canHandle(String url) {
return url.startsWith("jdbc:impala") || url.contains("impala");
}
@Override
public String quoteIdentifier(String colName) {
return colName;
}
};
JdbcDialects.registerDialect(ImpalaDialect);
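To see why the dialect helps, here is a sketch of how the column list ends up in the SQL text. It is a simplified stand-in for the sqlText line quoted above, not Spark's code: the buildSelect helper is hypothetical and the WHERE clause is omitted. It assumes the default dialect wraps each column name in double quotes, and that Impala reads a double-quoted name as a string literal rather than a column, which would explain the timestamp conversion error.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class QuoteSketch {
    /** Builds the SELECT text, applying the given quoting rule to every column name. */
    public static String buildSelect(List<String> columns, String table, UnaryOperator<String> quote) {
        String columnList = columns.stream().map(quote).collect(Collectors.joining(","));
        return "SELECT " + columnList + " FROM " + table;
    }

    public static void main(String[] args) {
        String table = "(select now() as ts from my_table) t";
        // Default-style quoting: every column name is wrapped in double quotes.
        System.out.println(buildSelect(List.of("ts"), table, c -> "\"" + c + "\""));
        // Quoting as in the custom dialect above: the name is passed through unchanged.
        System.out.println(buildSelect(List.of("ts"), table, c -> c));
    }
}
```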

Missing document in Elasticsearch when using BulkProcessor

I'm using a [java] kafka-producer to push data to kafka-topic x and a [java] high level consumer/bulkProcessor to read from topic x and index data to elasticsearch. The producer pushes 10 docs each time. When I start my java code for bulkProcessor for the first time after running producer, I see only 9 records being pushed to ES, all with "_version": 1. The 10th record is not in ES.
Yet the beforeBulk() and afterBulk() methods show the following output:
Going to execute new bulk composed of 10 actions
Executed bulk composed of 10 actions
From this moment onwards, if I delete the Elasticsearch index and run the producer again, I consistently see 10 records. I have no idea why this is happening. Any help is appreciated.
Note: ES version 2.2.0
Kafka: 0.9.0.0
EDIT [Added relevant code]
public Consumer(KafkaStream a_stream, int a_threadNumber, String esHost, String esCluster, int bulkSize, String topic) {
/*Create transport client*/
BulkProcessor bulkProcessor;
this.bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
public void beforeBulk(long executionId, BulkRequest request) {
System.out.format("Going to execute new bulk composed of %d actions\n", request.numberOfActions());
}
public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
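// Counts every item in the response, including failed ones; check response.hasFailures() to spot rejected requests.
System.out.format("Executed bulk composed of %d actions\n", response.getItems().length);
}
public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
System.out.format("Error executing bulk: %s%n", failure);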
}
}).setBulkActions(bulkSize)
.setBulkSize(new ByteSizeValue(200, ByteSizeUnit.MB))
.setFlushInterval(TimeValue.timeValueSeconds(1))
.build();
}
public void run() {
ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
while (it.hasNext()) {
byte[] x = it.next().message();
try {
bulkProcessor.add(new IndexRequest(index, type, id.toString()).source(modifyMsg(x).toString()));
}
catch (Exception e) {
logger.warn("bulkProcessor failed: " + m_threadNumber + e.getMessage());
}
}
logger.info("Shutting down Thread: " + m_threadNumber);
}
Docs going to ES are of the following form:
{"index":"temp1","type":"temp2","id":"0","event":"we're doomed"}
{"index":"temp1","type":"temp2","id":"1","event":"we're doomed"}
{"index":"temp1","type":"temp2","id":"2","event":"we're doomed"}
...
{"index":"temp1","type":"temp2","id":"9","event":"we're doomed"}
[EDIT]
If I add the following line in my run() method the problem is gone.
public void run() {
...
bulkProcessor.add(new IndexRequest("")); //Added this line
while (it.hasNext()) {
...
}
...
}
I feel like such a fool. In the line bulkProcessor.add(new IndexRequest(index, type, id.toString()).source(modifyMsg(x).toString())); the method modifyMsg() was initializing index, type and id, which were set to empty strings in the constructor. That's why my first index request failed: it had an invalid index name.
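The failure comes down to Java's evaluation order: the constructor arguments of the request are read before the argument to source() is evaluated. The minimal sketch below reproduces the effect without Elasticsearch, using hypothetical stand-ins: Request stands in for IndexRequest, and modifyMsg parses an invented index:id:body message format.

```java
import java.util.ArrayList;
import java.util.List;

public class EvaluationOrderSketch {
    // Start out empty, as in the question's constructor.
    private String index = "";
    private String id = "";

    /** Hypothetical stand-in for IndexRequest: records where the document would go. */
    static final class Request {
        private final String index;
        private final String id;
        private String source;
        Request(String index, String id) { this.index = index; this.id = id; }
        Request source(String source) { this.source = source; return this; }
        @Override public String toString() { return index + "/" + id + " <- " + source; }
    }

    /** Hypothetical stand-in for modifyMsg(): sets the fields as a side effect. */
    private String modifyMsg(String message) {
        String[] parts = message.split(":", 3);
        index = parts[0];
        id = parts[1];
        return parts[2];
    }

    /** Same shape as the question: index and id are read before modifyMsg() runs. */
    public List<String> buggy(List<String> messages) {
        List<String> requests = new ArrayList<>();
        for (String message : messages) {
            requests.add(new Request(index, id).source(modifyMsg(message)).toString());
        }
        return requests;
    }

    /** Parse first, then build the request from the freshly set fields. */
    public List<String> fixed(List<String> messages) {
        List<String> requests = new ArrayList<>();
        for (String message : messages) {
            String source = modifyMsg(message);
            requests.add(new Request(index, id).source(source).toString());
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> messages = List.of("temp1:0:a", "temp1:1:b");
        System.out.println(new EvaluationOrderSketch().buggy(messages));
        System.out.println(new EvaluationOrderSketch().fixed(messages));
    }
}
```

In the buggy variant of this sketch, the first request is built with an empty index name, which matches the invalid first request described above.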

Need DB Table name for multiple queries executed using spring JDBCTemplate

I am executing multiple queries concurrently and retrieving the results. The queries target multiple tables, so when a result set is retrieved it is difficult to tell which table it belongs to.
Can anyone explain how to identify the table name for each query's result set?
I tried the code below, but the table name is blank:
public static void getColumnNames(ResultSet rs) throws SQLException {
if (rs == null) {
return;
}
// get result set meta data
ResultSetMetaData rsMetaData = rs.getMetaData();
int numberOfColumns = rsMetaData.getColumnCount();
// get the column names; column indexes start from 1
for (int i = 1; i < numberOfColumns + 1; i++) {
String columnName = rsMetaData.getColumnName(i);
// Get the name of the column's table name
String tableName = rsMetaData.getTableName(i);
System.out.println("column name=" + columnName + " table=" + tableName + "");
}
}
I am calling this method like this:
jdbcTemplate.query(sql, new ResultSetExtractor<ResultSet>() {
@Override
public ResultSet extractData(ResultSet resultSet) throws SQLException,
DataAccessException {
getColumnNames(resultSet);
return resultSet;
}
});
Please advise, what is done wrong here? :(
You're not doing anything wrong here. The problem is caused by the method itself in connection with your DBMS or your JDBC driver, respectively.
See this doc please. 'table name or "" if not applicable' suggests that in your case the DBMS/driver does not provide the required information, causing the method to return an empty string.
I'm afraid you'll have to find another way to detect which query the result originated from.
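One such way, sketched below under assumptions, is to carry a label alongside each query so that the result never has to be matched back to a table afterwards. The runAll helper and the runner function are hypothetical; in real code the runner would wrap a call such as jdbcTemplate.queryForList(sql).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class LabeledQueries {
    /**
     * Runs every query concurrently and returns the results keyed by the
     * label (for example the table name) that the caller supplied.
     */
    public static <R> Map<String, R> runAll(Map<String, String> sqlByLabel, Function<String, R> runner) {
        Map<String, CompletableFuture<R>> futures = new LinkedHashMap<>();
        sqlByLabel.forEach((label, sql) ->
                futures.put(label, CompletableFuture.supplyAsync(() -> runner.apply(sql))));

        // Wait for each query and store its result under the label it was submitted with.
        Map<String, R> results = new LinkedHashMap<>();
        futures.forEach((label, future) -> results.put(label, future.join()));
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("orders", "select * from orders");
        queries.put("users", "select * from users");
        // The runner here only echoes the SQL; a real one would execute it.
        System.out.println(runAll(queries, sql -> "rows of: " + sql));
    }
}
```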

How to query relational data using subclasses? parse.com and Unity

I'm trying to query all elements of a subclass in Unity. Either I have found an SDK constraint or I am missing something here.
According to the documentation, querying subclasses is possible.
var query = new ParseQuery<Armor>()
.WhereLessThanOrEqualTo("rupees", ((Player)ParseUser.CurrentUser).Rupees);
query.FindAsync().ContinueWith(t =>
{
IEnumerable<Armor> result = t.Result;
});
However, I'm using a relation and cannot specify the subclass type.
Here is my code:
IEnumerator LoadMyDesigns(Action<RequestResult> result) {
ParseUser user = ParseUser.CurrentUser;
ParseRelation<Design> relation = user.GetRelation<Design>("designs");
Task<IEnumerable<Design>> task = relation.Query.FindAsync();
while (!task.IsCompleted) yield return new WaitForEndOfFrame();
if (task.IsFaulted) {
//error
foreach(var e in task.Exception.InnerExceptions) {
ParseException parseException = (ParseException) e;
Debug.LogError("Error message " + parseException.Message);
Debug.LogError("Error code: " + parseException.Code);
result(new RequestResult(true, parseException.Message));
}
}
else {
result(new RequestResult(true, new List<Design>(task.Result)));
}
}
And error:
ArgumentNullException: Must specify a ParseObject class name when creating a ParseQuery.
So the question is how do I specify query subclass type when using relations?
Thanks.
I've struggled with the same problem, and in my case I needed to provide the property name again in the GetRelationProperty call.
For example:
[ParseFieldName("designs")]
public ParseRelation<Design> Designs
{
get { return GetRelationProperty<Design>("Designs"); }
}
Try querying your designs Table.
Make a new query for class "Designs" where equal("owner", PFUser.currentUser())
This should return all of the designs for the current User.
