Is ToXMLContentHandler thread-safe?

I am currently using Apache Tika to parse my documents. While parsing the documents I am using the AutoDetectParser, ToXMLContentHandler, and Metadata classes of Apache Tika in this way:
InputStream stream = new FileInputStream(file);
parser.parse(stream, handler, metadata);
String filecontentInXMLFormat = handler.toString();
I have created the AutoDetectParser, ToXMLContentHandler, and Metadata objects as Spring beans, so the same instances are reused when multiple documents are parsed through the same code flow.
So, for example, if a second document comes through the code flow, will the "handler" object still contain the data of the first document?
In other words, are the ToXMLContentHandler class (org.apache.tika.sax.ToXMLContentHandler) and the Metadata class (org.apache.tika.metadata.Metadata) thread-safe?
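For illustration, here is a minimal sketch (outside Spring; the file names are hypothetical) of the per-document pattern, creating a fresh handler and Metadata for each file so that no state carries over between parses:
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.ToXMLContentHandler;

public class TikaParseSketch {
    public static void main(String[] args) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();

        // Hypothetical input files; replace with your own.
        for (String file : new String[] {"first.pdf", "second.pdf"}) {
            // A fresh handler and Metadata per document, so each parse starts empty.
            ToXMLContentHandler handler = new ToXMLContentHandler();
            Metadata metadata = new Metadata();
            try (InputStream stream = new FileInputStream(file)) {
                parser.parse(stream, handler, metadata);
            }
            String fileContentInXmlFormat = handler.toString();
            System.out.println(fileContentInXmlFormat);
        }
    }
}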

Does OpenCSV support writing to CSV from a bean with a map?

I'm using opencsv to read/write CSV files using opencsv annotations.
My bean has not just primitive fields, but a Java HashMap as well.
Now what I want to do is map this bean:
public class MyBean {
    @CsvBindByName(column = "ID")
    private int id;

    @CsvBindByName(column = "PROPERTIES")
    private Map<String, String> sampleMap = new HashMap<String, String>();

    // getters and setters omitted
}
to this CSV layout:
ID, property1, property2...
1, value1, value2.....
I'd like to get this working in both read/write.
As I understand it, the default MappingStrategy doesn't work in this case. Creating a custom MappingStrategy doesn't make sense for a HashMap field either, because we don't know the complete field list until we iterate over the whole map.
Another way to get the column names would be to read one bean from the list of beans, access its HashMap, and build the header from its keys (the HashMap keys are fixed across beans in my case).
MappingStrategy is only concerned with class-level metadata, like fields:
public static Field[] getAllFields(Class<?> cls) {
    List<Field> allFieldsList = getAllFieldsList(cls);
    return allFieldsList.toArray(new Field[allFieldsList.size()]);
}
Getting access to the real data just to create the CSV header doesn't look like a natural way to do it.
Any advice on how to solve this?
Please point me to any other libraries out there that can read/write beans having a Map field.
Cheers!
Sadly, openCSV does not support this. A quick Google search showed me that SuperCSV comes close, but it puts everything in a map, whereas you want to control what goes in the map and what does not. There may be others out there, but most require a one-to-one mapping between the fields in the object and the columns in the CSV file.
This is something I have wanted to develop and contribute for years, because the company I currently work for needs it, but they do not want to pay me to develop it, and I have higher priorities for openCSV when free time is available.
A possible but limited workaround would be to create what I would call a Data Transfer Object: a custom object that holds all the values flattened out, plus a translator that converts the DTO back into the object you want with the map and the other fields. The problem with this solution is that it forces you to know in advance all the possible entries in the map.
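As a rough sketch of that DTO idea (assuming the map keys, here property1 and property2, are known in advance, and that MyBean exposes getters and setters for id and sampleMap; all names are hypothetical):
import java.util.HashMap;
import java.util.Map;

import com.opencsv.bean.CsvBindByName;

// Flat DTO that openCSV can bind with the standard annotations.
public class MyBeanCsvDto {
    @CsvBindByName(column = "ID")
    private int id;

    @CsvBindByName(column = "PROPERTY1")
    private String property1;

    @CsvBindByName(column = "PROPERTY2")
    private String property2;

    // Build the DTO from the original bean before writing.
    public static MyBeanCsvDto fromMyBean(MyBean bean) {
        MyBeanCsvDto dto = new MyBeanCsvDto();
        dto.id = bean.getId();
        dto.property1 = bean.getSampleMap().get("property1");
        dto.property2 = bean.getSampleMap().get("property2");
        return dto;
    }

    // Translate the DTO back into the original bean with the Map field after reading.
    public MyBean toMyBean() {
        MyBean bean = new MyBean();
        bean.setId(id);
        Map<String, String> map = new HashMap<>();
        map.put("property1", property1);
        map.put("property2", property2);
        bean.setSampleMap(map);
        return bean;
    }
}
The limitation mentioned above applies: the DTO fixes the set of map keys at compile time, so it only works when the keys are stable across all beans.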

How to access/read objects inside ItemReader/StaxEventItemReader correctly?

I am new to Spring Batch and I have an application where, for example, an XML file of flights is read and saved to a database. The whole process is already working, but due to a certain use case I also need to access the data inside the reader object (ItemReader), if that is possible.
Below is the reader method. It is not about this method in particular, but, as mentioned, about ItemReader in general.
@Bean
public StaxEventItemReader<Flight> flightReader() {
    StaxEventItemReader<Flight> reader = new StaxEventItemReader<Flight>();
    reader.setResource(ressource);
    reader.setFragmentRootElementName("flight");
    Map<String, Class<?>> aliases = new HashMap<String, Class<?>>();
    aliases.put("flight", Flight.class);
    XStreamMarshaller xStreamMarshaller = new XStreamMarshaller();
    xStreamMarshaller.setAliases(aliases);
    reader.setUnmarshaller(xStreamMarshaller);
    return reader;
}
How can I access the flight objects inside the reader (StaxEventItemReader) object?
I actually tried to use the read() method (Spring doc ItemReader), but I am always getting NullPointerExceptions.
If the read() method is the correct way, how can you access the flight objects inside ItemReader correctly?
If not, are there other ways?
There is more than one way to access the items. It really depends on what you want to do with them:
If you only want to have a look without manipulating the items, you can implement an ItemReadListener with its afterRead method and add the listener to your step, as shown in the sketch after this list.
The items are passed to the processor. So you can operate on them there.
You can extend the class StaxEventItemReader and override the read method to include additional logic.
If you prefer composition over inheritance, you can write a new reader that uses a StaxEventItemReader as a delegate.
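As a minimal sketch of the first option (the listener class name and the println are illustrative only, assuming the item type is Flight):
import org.springframework.batch.core.ItemReadListener;

public class FlightReadListener implements ItemReadListener<Flight> {

    @Override
    public void beforeRead() {
        // nothing to do before the read
    }

    @Override
    public void afterRead(Flight flight) {
        // Called with each item right after the reader returns it.
        System.out.println("Read flight: " + flight);
    }

    @Override
    public void onReadError(Exception ex) {
        // nothing to do on a read error
    }
}
The listener is then registered on the step, for example via the step builder's listener(...) method, so it sees every item the StaxEventItemReader produces without changing the reader itself.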

How to access Couchbase RAW document with spring-boot-couchbase CrudRepository

I am migrating an old application from the Couchbase Java client (2.2.5) to Spring Boot Couchbase. I would like to use Spring Data's CrudRepository.
The problem I am running into is that the document in Couchbase is in RAW format (see more info here: Couchbase non-JSON docs). I have set up a CRUD repository interface, and when connecting to other buckets with JSON-formatted data it works perfectly. When attempting to read the byte array data I get this error:
org.springframework.dao.DataRetrievalFailureException: Flags (0x802) indicate non-JSON document for id 9900fb3d-1edf-4428-b9e6-0ef6c3251c08, could not decode.; nested exception is com.couchbase.client.java.error.TranscodingException: Flags (0x802) indicate non-JSON document for id 9900fb3d-1edf-4428-b9e6-0ef6c3251c08, could not decode.
I have tried the following object types as the entity type in the repository:
public interface MyRepository extends CouchbasePagingAndSortingRepository<Object, String> {
}
Object
byte[]
The object with an id and byte[] fields (and setters & getters)
Objects from the java-client
CouchbaseDocument
RawJsonDocument
AbstractDocument
I've also attempted to write a custom Jackson mapper, but the error message stays consistent: it's trying to deserialize a non-JSON document.

Stanford CoreNLP: Can I retrieve parent Annotation (i.e., document) from a contained CoreMap (i.e. sentence)?

I'm just getting started with CoreNLP. From all the code samples I've seen (particularly the one on CoreNLP's main website: http://nlp.stanford.edu/software/corenlp.shtml#Usage), I've gathered that Annotation objects hold the annotated document, and CoreMap objects hold the sentences (if the "ssplit" annotator is enabled).
To keep my code lightweight, I'm only passing CoreMap to one of my functions. However, in one instance I need to retrieve the parent Annotation document object. Is there any backpointer using the CoreMap object, or will I have to pass in the Annotation object to my function as well?
The overall document is an Annotation. The Annotation contains a List<CoreMap> which holds the sentences; each sentence is a CoreMap. I don't know of any way to get the parent Annotation from a CoreMap, so I would just pass the Annotation object to your function as well.
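For illustration, a minimal sketch of that approach, assuming a pipeline with at least the tokenize and ssplit annotators; processSentence is a hypothetical stand-in for your own function:
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class SentenceDemo {

    // Hypothetical function that needs both the sentence and its parent document.
    static void processSentence(Annotation document, CoreMap sentence) {
        System.out.println(sentence.get(CoreAnnotations.TextAnnotation.class));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation("First sentence. Second sentence.");
        pipeline.annotate(document);

        // The document-level Annotation holds the list of sentence CoreMaps.
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            processSentence(document, sentence); // pass the parent Annotation explicitly
        }
    }
}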

Use JSON deserializer for Batch job execution context

I'm trying to get a list of job executions that have been stored in the Spring Batch tables in the database using:
List<JobExecution> jobExecutions = jobExplorer.getJobExecutions(jobInstance);
The above method call seems to invoke the ExecutionContextRowMapper.mapRow method in the JdbcExecutionContextDao class.
The ExecutionContextRowMapper uses the com.thoughtworks.xstream.XStream.fromXML method to deserialize the JSON string of the JobExecutionContext stored in the DB.
It looks like an incorrect (or default) XML deserializer is used for unmarshalling the JSON-serialized JobExecutionContext.
Is there any configuration to use a JSON deserializer in this scenario?
The serializer/deserializer for the ExecutionContext is configurable in 2.2.x. We use the ExecutionContextSerializer interface, providing two implementations: one using Java serialization and one using the XStream implementation you mention. To configure your own serializer, implement org.springframework.batch.core.repository.ExecutionContextSerializer and inject it into the JobRepositoryFactoryBean (so that the contexts are serialized/deserialized correctly) and the JobExplorerFactoryBean (to deserialize the previously saved contexts).
It is important to note that changing the serialization method will prevent Spring Batch from deserializing previously saved ExecutionContexts.
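As a rough sketch of that wiring (assuming a Spring Batch version that ships the JSON-based Jackson2ExecutionContextStringSerializer; otherwise, plug in your own ExecutionContextSerializer implementation):
import javax.sql.DataSource;

import org.springframework.batch.core.explore.support.JobExplorerFactoryBean;
import org.springframework.batch.core.repository.ExecutionContextSerializer;
import org.springframework.batch.core.repository.dao.Jackson2ExecutionContextStringSerializer;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class BatchSerializerConfig {

    // The same serializer must be used for writing (repository) and reading (explorer).
    @Bean
    public ExecutionContextSerializer executionContextSerializer() {
        return new Jackson2ExecutionContextStringSerializer();
    }

    @Bean
    public JobRepositoryFactoryBean jobRepository(DataSource dataSource,
                                                  PlatformTransactionManager transactionManager,
                                                  ExecutionContextSerializer serializer) {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(transactionManager);
        factory.setSerializer(serializer);
        return factory;
    }

    @Bean
    public JobExplorerFactoryBean jobExplorer(DataSource dataSource,
                                              ExecutionContextSerializer serializer) {
        JobExplorerFactoryBean factory = new JobExplorerFactoryBean();
        factory.setDataSource(dataSource);
        factory.setSerializer(serializer);
        return factory;
    }
}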
