Hibernate Search: convert byte[] to List<LuceneWork> - JMS

As of Hibernate Search 3.1.1, when one wanted to send an indexed entity to a JMS queue for further processing, it was enough to apply a cast in the onMessage() method of the processing MDB to obtain the list of LuceneWork, e.g.:
List<LuceneWork> queue = (List<LuceneWork>) objectMessage.getObject();
But in version 4.2.0 this is no longer an option as objectMessage.getObject() returns a byte[].
How could I deserialize this byte[] into List<LuceneWork>?
I've inspected the message and saw that I have the value for JMSBackendQueueTask.INDEX_NAME_JMS_PROPERTY.

You could extend AbstractJMSHibernateSearchController and have it deal with these details, or have a look at its source which contains:
indexName = objectMessage.getStringProperty(JmsBackendQueueTask.INDEX_NAME_JMS_PROPERTY);
indexManager = factory.getAllIndexesManager().getIndexManager(indexName);
if (indexManager == null) {
    log.messageReceivedForUndefinedIndex(indexName);
    return;
}
queue = indexManager.getSerializer().toLuceneWorks((byte[]) objectMessage.getObject());
indexManager.performOperations(queue, null);
Compared to the older 3.x versions, there are two main design differences to keep in mind:
The Serializer service is pluggable so it needs to be looked up
Each index (identified by name) can have an independent backend
The serialization is now performed (by default) using Apache Avro as newer Lucene classes are not Serializable.
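Putting those pieces together, a minimal onMessage() sketch could look like the following. This is only an illustration: it assumes a field named factory that exposes the search factory's index managers, as in the controller source quoted above; the error handling is illustrative as well.
public void onMessage(Message message) {
    if (!(message instanceof ObjectMessage)) {
        return;
    }
    ObjectMessage objectMessage = (ObjectMessage) message;
    try {
        String indexName = objectMessage.getStringProperty(JmsBackendQueueTask.INDEX_NAME_JMS_PROPERTY);
        IndexManager indexManager = factory.getAllIndexesManager().getIndexManager(indexName);
        if (indexManager == null) {
            return; // message for an unknown index, nothing to do
        }
        // The pluggable serializer rebuilds the LuceneWork list from the byte[] payload
        List<LuceneWork> queue = indexManager.getSerializer().toLuceneWorks((byte[]) objectMessage.getObject());
        indexManager.performOperations(queue, null);
    } catch (JMSException e) {
        throw new RuntimeException("Unable to process JMS message", e);
    }
}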


Can I store sensitive data in a Vert.x context in a Quarkus application?

I am looking for a place to store some request-scoped attributes, such as a user id, from a Quarkus request filter. I later want to retrieve these attributes in a log handler and put them into the MDC logging context.
Is Vertx.currentContext() the right place to put such request attributes? Or can the properties I set on this context be read by other requests?
If this is not the right place to store such data, where would be the right place?
Yes ... and no :-D
Vertx.currentContext() can provide two types of objects:
the root context, shared between all the concurrent processing executed on this event loop (so do NOT share data there)
duplicated contexts, which are local to the processing and its continuation (you can share data in these)
In Quarkus 2.7.2, we have done a lot of work to improve our support of duplicated contexts. While before they were only used for HTTP, they are now also used for gRPC and @ConsumeEvent. Support for Kafka and AMQP is coming in Quarkus 2.8.
Also, in Quarkus 2.7.2, we introduced two new features that could be useful:
you cannot store data in a root context. We detect that for you and throw an UnsupportedOperationException. The reason is safety.
we introduced a new utility class (io.smallrye.common.vertx.ContextLocals) to access the context locals.
Here is a simple example:
AtomicInteger counter = new AtomicInteger();

public Uni<String> invoke() {
    Context context = Vertx.currentContext();
    ContextLocals.put("message", "hello");
    ContextLocals.put("id", counter.incrementAndGet());

    return invokeRemoteService()
            // Switch back to our duplicated context:
            .emitOn(runnable -> context.runOnContext(runnable))
            .map(res -> {
                // Can still access the context local data
                String msg = ContextLocals.<String>get("message").orElseThrow();
                Integer id = ContextLocals.<Integer>get("id").orElseThrow();
                return "%s - %s - %d".formatted(res, msg, id);
            });
}
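For the original use case (a user id stored by a request filter and later pushed into the MDC), a rough sketch could look like this, assuming the filter runs on the request's duplicated context. The class name, header name and the "userId" key are illustrative, not part of any Quarkus API.
import io.smallrye.common.vertx.ContextLocals;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.ext.Provider;

@Provider
public class UserIdFilter implements ContainerRequestFilter {

    @Override
    public void filter(ContainerRequestContext requestContext) {
        // Illustrative: extract the user id however your application authenticates requests
        String userId = requestContext.getHeaderString("X-User-Id");
        ContextLocals.put("userId", userId);
    }
}

// Later, e.g. from a log handler running as part of the same request processing:
// ContextLocals.<String>get("userId").ifPresent(id -> org.jboss.logging.MDC.put("userId", id));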

jtds.jdbc.JtdsConnection.createBlob java.lang.AbstractMethodError

I am using the jTDS JDBC driver (mvn:net.sourceforge.jtds/jtds/1.3.1-patch-20190523/jar) to save to a Microsoft SQL Server database.
This JDBC driver is actually provided by Talend. For various reasons, I have some Java JDBC code which saves blobs to the database. This works fine with an Oracle DB.
try (Connection con = DriverManager.getConnection(this.connectionString, this.user, this.pw)) {
    insert(con, ags, snr, productVersion, nummer, inhalt, this.migrationsBenutzer, format, daten);
}

public long insert(Connection con, String ags, String snr, int productVersion, int nummer, String inhalt, String benutzer, String format, byte[] daten) throws SQLException {
    long result = 0;
    String timestamp = now();
    try {
        Blob blob = con.createBlob();
        blob.setBytes(1, daten);
        PreparedStatement ps = con.prepareStatement(this.insert);
        ...
        ps.setBlob(9, blob);
        result = ps.executeUpdate();
However, when I call the method with a jTDS connection string:
jdbc:jtds:sqlserver://:1433;databaseName=
I get this exception:
Exception in thread "main" java.lang.AbstractMethodError
at net.sourceforge.jtds.jdbc.JtdsConnection.createBlob(JtdsConnection.java:2776)
at de.iteos.dao.BildspeichernDAO.insert(BildspeichernDAO.java:60)
What can I do? If feasible, I want to keep the method as generic as possible. Do I have to switch to another JDBC driver? I have read somewhere that this could be solved with a validation query. For this I would have to implement two different DataSources for Oracle and SQL Server, right?
The problem is that jTDS is - as far as I know - no longer maintained, and available versions are stuck on JDBC 3.0 (Java 1.4) support. The method Connection.createBlob you're trying to call was introduced in JDBC 4.0 (Java 6). Attempts to call this method on a driver that was compiled against the JDBC 3.0 API will result in AbstractMethodError as the method from JDBC 4.0 (or higher) is not implemented.
The simplest and best solution is probably to switch to the Microsoft SQL Server JDBC driver.
Alternatively, you need to switch to the JDBC 3.0 options for populating a blob, that is, using the PreparedStatement.setBinaryStream(int, InputStream, int) method (not to be confused with setBinaryStream(int, InputStream) or setBinaryStream(int, InputStream, long) introduced in JDBC 4.0!), or possibly setBytes(int, byte[]), or using driver specific methods for creating blobs.
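For illustration, the blob-handling part of the insert method above could avoid Connection.createBlob() entirely and stay on JDBC 3.0 methods. This is only a sketch, not tested against jTDS:
PreparedStatement ps = con.prepareStatement(this.insert);
...
// Option 1: pass the byte[] directly
ps.setBytes(9, daten);
// Option 2: stream it, using the JDBC 3.0 overload that takes an int length
// ps.setBinaryStream(9, new java.io.ByteArrayInputStream(daten), daten.length);
result = ps.executeUpdate();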
The validation query solution you mention doesn't apply here, that solves a problem with Connection.isValid that the data source implementation in some Tomcat versions calls by default, unless you configure a validation query. It has the same underlying cause (the isValid method was also introduced in JDBC 4.0).
I'm not sure what you mean by "I would have to implement two different DataSources for Oracle and SQL Server right?": you cannot use one and the same data source for two different databases, so you would need to configure two different instances anyway.

Protocol Buffers between two different languages

We are using Golang and .NET Core for our inter-communication microservices infrastructure.
All the data exchanged across the services is based on Protobuf messages that we have created.
Here is an example of one of our Protobuf definitions:
syntax = "proto3";
package Protos;
option csharp_namespace = "Protos";
option go_package="Protos";
message EventMessage {
string actionType = 1;
string payload = 2;
bool auditIsActive = 3;
}
The Golang side works well: the service generates the content as needed and sends it to the SQS queue. Once that happens, the .NET Core service receives the data and tries to deserialize it.
Here are the contents of the SQS message example:
{"#type":"type.googleapis.com/Protos.EventMessage","actionType":"PushPayload","payload":"<<INTERNAL>>"}
But we are getting an exception saying the wire type is invalid, as shown below:
Google.Protobuf.InvalidProtocolBufferException: Protocol message contained a tag with an invalid wire type.
at Google.Protobuf.UnknownFieldSet.MergeFieldFrom(CodedInputStream input)
at Google.Protobuf.UnknownFieldSet.MergeFieldFrom(CodedInputStream input)
at Google.Protobuf.UnknownFieldSet.MergeFieldFrom(UnknownFieldSet unknownFields, CodedInputStream input)
at Protos.EventMessage.MergeFrom(CodedInputStream input) in /Users/maordavidzon/projects/github_connector/GithubConnector/GithubConnector/obj/Debug/netcoreapp3.0/EventMessage.cs:line 232
at Google.Protobuf.MessageExtensions.MergeFrom(IMessage message, Byte[] data, Boolean discardUnknownFields, ExtensionRegistry registry)
at Google.Protobuf.MessageParser`1.ParseFrom(Byte[] data)
The Proto file is exactly the same in both of the services.
Is there any potential missing options or property that we need to add?
It looks like you're using the JSON format rather than the binary format. In that case, you want ParseJson(string json), not ParseFrom(byte[] data).
Note: the binary format is more efficient, if that matters to you. It also has better support across protobuf libraries / tools.
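The distinction matters because the two formats go through different entry points. Shown here with the Java bindings of the same EventMessage purely for illustration (the C# Google.Protobuf API exposes the equivalent ParseFrom and ParseJson methods on the message parser):
import com.google.protobuf.util.JsonFormat;

// Binary wire format: bytes produced by toByteArray() (or proto.Marshal in Go) are read back with parseFrom
byte[] binary = eventMessage.toByteArray();
EventMessage fromBinary = EventMessage.parseFrom(binary);

// JSON text format: produced and consumed by the JSON printer/parser, not by parseFrom(byte[])
String json = JsonFormat.printer().print(eventMessage);
EventMessage.Builder builder = EventMessage.newBuilder();
JsonFormat.parser().merge(json, builder);
EventMessage fromJson = builder.build();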
Basically there are two possible scenarios: either the proto files generated for .NET and Golang are not the same version, or your data has been corrupted while being transferred between the Golang and .NET applications.
Protobuf is a binary protocol; check whether you have an HTTP filter or anything else that could change the incoming or outgoing stream of bytes.

Read Nifi Counter value programmatically

I am developing a custom processor in which I want to read the value of NiFi counters. Is there a way to read a counter's value other than using the NiFi REST API "http://nifi-host:port/nifi-api/counters"?
No. Apache NiFi doesn't have any straightforward API available to read counter values programmatically. An easy approach would be to use the GetHTTP processor with the NiFi REST API URL that you mentioned: http(s)://nifi-host:port/nifi-api/counters.
Then use EvaluateJsonPath to parse out the counter value from the JSON response returned by GetHTTP.
Based on Andy's suggestion, I have used reflection to read the counters as follows:
private void printCounters(ProcessSession session) throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException, NoSuchMethodException, InvocationTargetException {
    Class standardProcessSession = session.getClass();
    Field fieldContext = standardProcessSession.getDeclaredField("context");
    fieldContext.setAccessible(true);
    Object processContext = fieldContext.get(session);
    Class processContextClass = processContext.getClass();
    Field fieldCounterRepo = processContextClass.getDeclaredField("counterRepo");
    fieldCounterRepo.setAccessible(true);
    Object counterRepo = fieldCounterRepo.get(processContext);
    Method declaredMethod = counterRepo.getClass().getDeclaredMethod("getCounters");
    ArrayList<Object> counters = (ArrayList<Object>) declaredMethod.invoke(counterRepo);
    for (Object obj : counters) {
        Method methodName = obj.getClass().getDeclaredMethod("getName");
        methodName.setAccessible(true);
        Method methodVal = obj.getClass().getDeclaredMethod("getValue");
        methodVal.setAccessible(true);
        System.out.println("Counter name: " + methodName.invoke(obj));
        System.out.println("Counter value: " + methodVal.invoke(obj));
    }
}
NOTE: NIFI Version is 1.5.0
While it is not as easy to read/write counter values as it is to modify flowfile attributes, Apache NiFi does have APIs for modifying counters. However, the intent of counters is to provide information to human users, not for processors to make decisions based on their values. Depending on what you are trying to accomplish, you might be more successful using local maps or DistributedMapCacheServer and DistributedMapCacheClientService. If the values are only relevant to this processor, you can just use an in-memory map to store and retrieve the values. If you need to communicate with other processors, use the cache (example here).
Pierre Villard has written a good tutorial about using counters, and you can use ProcessSession#adjustCounter(String counter, long delta, boolean immediate) to modify counter values. Because counters were not designed to allow programmatic access, there is no supported way to retrieve the CounterRepository instance from the RepositoryContext object. You may also want to read about Reporting Tasks; depending on your goal, this may be a better way to achieve it.
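For the writing side, a minimal onTrigger sketch using adjustCounter could look like this; the counter name and the REL_SUCCESS relationship are illustrative names for this example, not part of any specific processor.
@Override
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    // Adjust the named counter; 'true' makes the change visible immediately
    // rather than only when the session is committed.
    session.adjustCounter("my-processor.flowfiles-seen", 1, true);
    session.transfer(flowFile, REL_SUCCESS);
}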

Parquet-MR AvroParquetWriter - how to convert data to Parquet (with Specific Mapping)

I'm working on a tool for converting data from a homegrown format to Parquet and JSON (for use in different settings with Spark, Drill and MongoDB), using Avro with Specific Mapping as the stepping stone. I have to support conversion of new data on a regular basis and on client machines which is why I try to write my own standalone conversion tool with a (Avro|Parquet|JSON) switch instead of using Drill or Spark or other tools as converters as I probably would if this was a one time job. I'm basing the whole thing on Avro because this seems like the easiest way to get conversion to Parquet and JSON under one hood.
I used Specific Mapping to profit from static type checking, wrote an IDL, converted that to a schema.avsc, generated classes and set up a sample conversion with specific constructor, but now I'm stuck configuring the writers. All Avro-Parquet conversion examples I could find [0] use AvroParquetWriter with deprecated signatures (mostly: Path file, Schema schema) and Generic Mapping.
AvroParquetWriter has only one non-deprecated constructor, with this signature:
AvroParquetWriter(
Path file,
WriteSupport<T> writeSupport,
CompressionCodecName compressionCodecName,
int blockSize,
int pageSize,
boolean enableDictionary,
boolean enableValidation,
WriterVersion writerVersion,
Configuration conf
)
Most of the parameters are not hard to figure out but WriteSupport<T> writeSupport throws me off. I can't find any further documentation or an example.
Staring at the source of AvroParquetWriter, I see GenericData model pop up a few times, but only one line mentioning SpecificData: GenericData model = SpecificData.get();
So I have a few questions:
1) Does AvroParquetWriter not support Avro Specific Mapping? Or does it, by means of that SpecificData.get() method? The comment "Utilities for generated Java classes and interfaces." over SpecificData.class seems to suggest that, but how exactly should I proceed?
2) What's going on in the AvroParquetWriter constructor, is there an example or some documentation to be found somewhere?
3) More specifically: the signature of the WriteSupport method asks for 'Schema avroSchema' and 'GenericData model'. What does GenericData model refer to? Maybe I'm not seeing the forest for the trees here...
To give an example of what I'm aiming for, my central piece of Avro conversion code currently looks like this:
DatumWriter<MyData> avroDatumWriter = new SpecificDatumWriter<>(MyData.class);
DataFileWriter<MyData> dataFileWriter = new DataFileWriter<>(avroDatumWriter);
dataFileWriter.create(schema, avroOutput);
The Parquet equivalent currently looks like this:
AvroParquetWriter<SpecificRecord> parquetWriter = new AvroParquetWriter<>(parquetOutput, schema);
but this is not more than a beginning and is modeled after the examples I found, using the deprecated constructor, so will have to change anyway.
Thanks,
Thomas
[0] Hadoop - The definitive Guide, O'Reilly, https://gist.github.com/hammer/76996fb8426a0ada233e, http://www.programcreek.com/java-api-example/index.php?api=parquet.avro.AvroParquetWriter
Try AvroParquetWriter.builder:
MyData obj = ... // should be an Avro object
ParquetWriter<Object> pw = AvroParquetWriter.builder(file)
        .withSchema(obj.getSchema())
        .build();
pw.write(obj);
pw.close();
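To stay with the specific (generated) classes, the builder can also be given a data model. A slightly fuller sketch under that assumption, with MyData being the class generated from the IDL and the path and compression codec purely illustrative:
Path parquetOutput = new Path("/tmp/mydata.parquet"); // illustrative output location
MyData record = ...; // a generated specific record instance

ParquetWriter<MyData> writer = AvroParquetWriter.<MyData>builder(parquetOutput)
        .withSchema(MyData.getClassSchema())        // schema generated from the IDL
        .withDataModel(SpecificData.get())          // use the specific (generated) mapping
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .build();
writer.write(record);
writer.close();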
Thanks.
