Serve PostgreSQL large objects via HTTP - spring

I'm building an app to serve data from a PostgreSQL database via a REST API (with Spring MVC) and a PWA (with Vaadin).
The PostgreSQL database stores files up to 2GB using Large Objects (I'm not in control of that); the JDBC driver provides streamed access to their binary content via Blob#getBinaryStream, so data does not need to be read entirely into memory.
The only requirement is that the stream from the blob must be consumed in the same transaction, otherwise the JDBC driver will throw.
The problem is that even if I retrieve the stream in a transactional repository method, both Spring MVC and Vaadin's StreamResource will consume it outside the transaction, so the JDBC driver throws.
For example, given
public interface SomeRepository extends JpaRepository<SomeEntity, Long> {

    @Transactional(readOnly = true)
    default InputStream getStream() {
        try {
            return findById(1L).orElseThrow().getBlob().getBinaryStream();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}
this Spring MVC method will fail
@RestController
public class SomeController {

    private final SomeRepository repository;

    public SomeController(SomeRepository repository) {
        this.repository = repository;
    }

    @GetMapping
    public ResponseEntity<InputStreamResource> getStream() {
        var stream = repository.getStream();
        var resource = new InputStreamResource(stream);
        return new ResponseEntity<>(resource, HttpStatus.OK);
    }
}
and the same for this Vaadin StreamResource
public class SomeView extends VerticalLayout {

    public SomeView(SomeRepository repository) {
        var resource = new StreamResource("x", repository::getStream);
        var anchor = new Anchor(resource, "Download");
        add(anchor);
    }
}
with the same exception:
org.postgresql.util.PSQLException: ERROR: invalid large-object descriptor: 0
which means the transaction is already closed when the stream is read.
I see two possible solutions to this:
1. keep the transaction open during the download;
2. write the stream to disk during the transaction and then serve the file from disk during the download.
Solution 1 is an anti-pattern and a security risk: the transaction duration is left in the hands of the client, and either a slow reader or an attacker could block data access.
Solution 2 creates a huge delay between the client request and the server response, since the whole stream must first be read from the database and written to disk.
One idea might be to start reading from the disk while the file is still being written with data from the database, so that the transfer starts immediately while the transaction duration stays decoupled from the client download; but I don't know what side effects this might have.
How can I achieve the goal of serving PostgreSQL large objects in a secure and performant way?

We solved this problem in Spring Content by using threads + piped streams and a special InputStream wrapper, ClosingInputStream, that delays closing the connection/transaction until the consumer closes the input stream. Maybe something like this would help you too?
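For illustration, here is a minimal sketch of that piped-streams idea (not Spring Content's actual code): a worker thread copies the blob into the pipe inside a programmatic transaction, while the caller hands the connected PipedInputStream to Spring MVC or Vaadin and reads at its own pace. The BlobStreamer name is made up, and the sketch assumes the SomeRepository from the question, Spring's TransactionTemplate (executeWithoutResult needs Spring 5.2+), and Java 9+ for transferTo:
import java.io.*;
import java.util.concurrent.ExecutorService;
import org.springframework.transaction.support.TransactionTemplate;

public class BlobStreamer {

    private final SomeRepository repository;   // repository from the question
    private final TransactionTemplate tx;      // programmatic transaction control
    private final ExecutorService executor;    // runs the copying worker

    public BlobStreamer(SomeRepository repository, TransactionTemplate tx, ExecutorService executor) {
        this.repository = repository;
        this.tx = tx;
        this.executor = executor;
    }

    public InputStream stream(long id) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        executor.execute(() -> tx.executeWithoutResult(status -> {
            // the transaction stays open until the blob has been copied into the pipe
            try (out) {
                repository.findById(id).orElseThrow()
                        .getBlob().getBinaryStream().transferTo(out);
            } catch (Exception e) {
                status.setRollbackOnly();
            }
        }));
        return in; // hand this to InputStreamResource / StreamResource
    }
}
Note this still ties the transaction's lifetime to the consumer's read speed (the pipe's buffer applies back-pressure), which is exactly the trade-off the question describes.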
Just as an FYI: we have found Postgres's OIDs and the Large Object API to be extremely slow compared with similar databases.
Perhaps you might also be able to just retrofit Spring Content JPA to your solution and use its HTTP endpoints (and the solution I just outlined) instead of creating your own. Something like this:
pom.xml
<!-- Java API -->
<dependency>
    <groupId>com.github.paulcwarren</groupId>
    <artifactId>spring-content-jpa-boot-starter</artifactId>
    <version>0.4.0</version>
</dependency>
<!-- REST API -->
<dependency>
    <groupId>com.github.paulcwarren</groupId>
    <artifactId>spring-content-rest-boot-starter</artifactId>
    <version>0.4.0</version>
</dependency>
SomeEntity.java
@Entity
public class SomeEntity {

    @Id
    @GeneratedValue
    private long id;

    @ContentId
    private String contentId;

    @ContentLength
    private long contentLength = 0L;

    @MimeType
    private String mimeType = "text/plain";

    ...
}
SomeEntityContentStore.java
@StoreRestResource(path = "someEntityContent")
public interface SomeEntityContentStore extends ContentStore<SomeEntity, String> {
}
This is all you need to get REST endpoints that will allow you to associate content with your entity SomeEntity. There is a working example in our examples repo here.

One option is to decouple reading from the database and writing the response to the client, as you mentioned. The downside is the complexity of the solution: you would need to synchronize between the reader and the writer.
Another option is to first get the large object id in the main transaction and then read the data in chunks, each chunk in a separate transaction:
byte[] getBlobChunk(Connection connection, long lobId, long start, long chunkSize) throws SQLException, IOException {
    // PgBlob gives direct access to an existing large object by its OID
    Blob blob = new PgBlob((BaseConnection) connection, lobId);
    try (InputStream is = blob.getBinaryStream(start, chunkSize)) {
        return IOUtils.toByteArray(is);
    }
}
This solution is much simpler but has the overhead of establishing a new connection, which shouldn't be a big deal if you use connection pooling. A sketch of the download loop follows.
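To complete the picture, a rough sketch of the loop built on top of getBlobChunk, opening one short pooled connection/transaction per chunk; the dataSource parameter, the chunk size, and the 1-based offsets (per Blob#getBinaryStream) are assumptions of this sketch:
void writeBlob(DataSource dataSource, long lobId, long length, OutputStream out) throws SQLException, IOException {
    final long chunkSize = 64 * 1024;
    for (long pos = 1; pos <= length; pos += chunkSize) { // Blob offsets are 1-based
        long size = Math.min(chunkSize, length - pos + 1);
        try (Connection connection = dataSource.getConnection()) { // cheap with pooling
            connection.setAutoCommit(false);                       // large objects need a transaction
            out.write(getBlobChunk(connection, lobId, pos, size));
            connection.commit();
        }
    }
    out.flush();
}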

Related

Spring State Machine | Actions (Calling External API with data & pass data to another State)

I would like to use the Action<S,E> to call an external API. How can I add more data into this Action in order to invoke the external API? Another question: what if I want to send the response back (pass data to another state)?
What is the best way to add more data? I'm trying to find an alternative to using the context (which I know is possible, but very ugly with key-value pairs).
Calling an external API is the same as executing any other code: you can wire any executable code into your action. This includes autowiring a Service or Gateway and retrieving the data you need.
Regarding the second question, in my company we use the extended state (context) to expose data. Before we release the state machine, we take the data out of it and serialise it to a response object using an object mapper.
Here is a snippet for illustration
@Configuration
@RequiredArgsConstructor
public class YourAction implements Action<States, Events> {

    private final YourService service;

    @Override
    public void execute(final StateContext<States, Events> context) {
        // getting input data, for example from the message headers or the extended state
        final Long yourIdFromHeaders = context.getMessageHeaders().get(key, Long.class);
        final Long yourIdFromContext = context.getExtendedState().get(key, Long.class);
        // calling the service
        final var responseData = service.getData(yourIdFromContext);
        // storing the result in the extended state
        context.getExtendedState().getVariables().put("response", responseData);
    }
}
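And once the machine has finished processing, the stored variable can be pulled out of the extended state and serialised, roughly like this (ResponseDto is a made-up target type):
private ResponseDto toResponse(StateMachine<States, Events> stateMachine, ObjectMapper objectMapper) {
    // read the variable the Action stored in the extended state
    Object raw = stateMachine.getExtendedState().getVariables().get("response");
    // re-map the stored object onto the response type
    return objectMapper.convertValue(raw, ResponseDto.class);
}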

Access other service's API

In our Angular + Spring Boot application, we have 2 controllers (2 services are referenced internally). In the first controller, we send a file from the UI, read the content of the file, query an external application, retrieve a set of data, and return only a subset of that data as recommendations for the UI fields. Why do we return only a subset of the data received from the external application? Because we only need that subset for showing recommendations in the UI.
Once the rest of the fields are filled, we call another controller to generate a report. But for the generation of files, the second service requires the rest of the data from the external application, which was received by the first service. I understand that autowiring the first service in the second service will create a new instance of the first service and I will not get the first service instance which is used to query the external application. I would also like to avoid calling the external application again to retrieve the same data in the second service. My question is: how can I fetch the data received by the first service in the second service?
For example:
The first controller (ExternalApplicationController), which delegates loading/importing of data from files:
public class Department {
    private Metadata metadata; // contains data such as name, id, location, etc.
    private Collection<Employee> employees; // the list of employees working in the department
}
@RestController
@RequestMapping("/externalApp")
public class ExternalApplicationController {

    @Autowired
    private ExternalApplicationImportService importService;

    @PostMapping("/importDepartmentDataFromFiles")
    public Metadata importDepartmentDataFromFiles(@RequestParam("files") final MultipartFile[] files) {
        return this.importService.loadDepartmentDetails(FileUtils.getInstance().convertToFiles(files)).getMetadata();
    }
}
The first service (ExternalApplicationImportService), which delegates the request to the external application for loading department data:
@Service
public class ExternalApplicationImportService {

    private final ExternalApp app;

    public ExternalApplicationImportService(ExternalApp app) {
        this.app = app;
    }

    public Department loadDepartmentDetails(File file) {
        return app.loadDepartmentDetails(file);
    }
}
The Metadata from the ExternalApplicationController is used to populate UI fields, and after doing some operations (filling up some data), the user requests to generate a report (which contains details of the employees of that department):
@RestController
@RequestMapping("/reportGenerator")
public class ReportController {

    @Autowired
    private ReportGenerationService generationService;

    @PostMapping("/generateAnnualReports")
    public void generateAnnualReports() {
        generationService.generateAnnualReports();
    }
}

@Service
public class ReportGenerationService {

    public void generateAnnualReports() {
        // here I need access to the data loaded in the ExternalApplicationImportService
    }
}
So, I would like to access the data loaded in the ExternalApplicationImportService from the ReportGenerationService.
I also expect more services to be created in the future that might need access to the data loaded in the ExternalApplicationImportService.
How can this be designed and achieved?
I feel that I'm missing something about how to link these services together for a given user session.
Thanks,
Paul
You speak about a user session. Maybe you could inject your user's session directly into your controllers and "play" with it?
Just add HttpSession as a parameter of your controllers' methods and Spring will inject it for you. Then you just have to put your data into the session during the first WS call and recover it from the session at the second WS call.
@RestController
@RequestMapping("/reportGenerator")
public class ReportController {

    @PostMapping("/generateAnnualReports")
    public void generateAnnualReports(HttpSession session) {
        generationService.generateAnnualReports();
    }
}
Alternatively for the second call you could use:
@RestController
@RequestMapping("/reportGenerator")
public class ReportController {

    @PostMapping("/generateAnnualReports")
    public void generateAnnualReports(@SessionAttribute("<name of your session attribute>") Object yourData) {
        generationService.generateAnnualReports();
    }
}
You are starting from a wrong assumption:
I understand that autowiring the first service in the second service will create a new instance of the first service and I will not get the first service instance which is used to query the external application.
That is not correct: by default, Spring creates your bean as a singleton, i.e. a single bean definition maps to a single object instance per Spring IoC container.
As a consequence, every bean in which you inject ExternalApplicationImportService will receive the same instance.
To solve your problem, you only need a place where you can temporarily store the results of your external app calls.
You have several options for that:
As you are receiving the same bean, you can keep state in instance fields of ExternalApplicationImportService:
@Service
public class ExternalApplicationImportService {

    private final ExternalApp app;

    // Maintain state in an instance field
    private Department department;

    public ExternalApplicationImportService(ExternalApp app) {
        this.app = app;
    }

    public Department loadDepartmentDetails(File file) {
        if (department == null) {
            department = app.loadDepartmentDetails(file);
        }
        return department;
    }
}
Better, you can use some cache mechanism (the Spring built-in one is excellent) and return the cached result. You can choose the information that will be used as the key of the cached data, probably some attribute related to your user in this case.
@Service
public class ExternalApplicationImportService {

    private final ExternalApp app;

    public ExternalApplicationImportService(ExternalApp app) {
        this.app = app;
    }

    @Cacheable("department")
    public Department loadDepartmentDetails(File file) {
        // only invoked when there is no cached result for this file argument yet
        return app.loadDepartmentDetails(file);
    }
}
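Remember that the annotation only works once caching is enabled; a minimal configuration might look like this (the cache name matches the example above, and the concurrent-map manager is just the simplest built-in choice):
@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        // in-memory cache; swap for Redis, Caffeine, etc. in production
        return new ConcurrentMapCacheManager("department");
    }
}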
You can store the information returned from the external app in an intermediate information system like Redis, if available, or even in the application's underlying database.
As suggested by Mohicane, in the web tier you can use the HTTP session to store the attributes you need, directly as a result of the operations performed by your controllers, or even try using Spring session-scoped beans. For example:
@RestController
@RequestMapping("/externalApp")
public class ExternalApplicationController {

    @Autowired
    private ExternalApplicationImportService importService;

    @PostMapping("/importDepartmentDataFromFiles")
    public Metadata importDepartmentDataFromFiles(@RequestParam("files") final MultipartFile[] files, HttpSession session) {
        Department department = this.importService.loadDepartmentDetails(FileUtils.getInstance().convertToFiles(files));
        session.setAttribute("department", department);
        return department.getMetadata();
    }
}
And:
@RestController
@RequestMapping("/reportGenerator")
public class ReportController {

    @Autowired
    private ReportGenerationService generationService;

    @PostMapping("/generateAnnualReports")
    public void generateAnnualReports(HttpSession session) {
        Department department = (Department) session.getAttribute("department");
        // Probably you need to pass that information to your service
        // TODO Handle the case in which the information is not present in the session
        generationService.generateAnnualReports(department);
    }
}
In my opinion, the second of the proposed approaches is the best one, but all are valid mechanisms for sharing your data between the two operations.
My recommendation is to revisit your class design and build a proper relationship between the classes. I feel you need to introduce extra logic to manage your temporary data for the report generation.
@Mohicane suggested using the HTTP session in the answer above. It might be a possible solution, but it has an issue if your service needs to be distributed in the future (e.g. more than one running instance will serve your web app).
I strongly advise:
creating a separate service to manage the Metadata loading process, where you will have a load(key) method (sketched below);
determining by yourself what the key is going to be;
having both of your other services utilize it;
marking this load(key) method with the @Cacheable annotation;
configuring your cache implementation. As a simple one you can use in-memory; if it becomes a question of scaling your back-end app, you can easily switch it to Redis/DynamoDB or other data storage.
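A rough sketch of such a service (the MetadataService name, the String key, and the key-to-file mapping are all assumptions to be adapted):
@Service
public class MetadataService {

    private final ExternalApp app; // client for the external application

    public MetadataService(ExternalApp app) {
        this.app = app;
    }

    @Cacheable("metadata")
    public Department load(String key) {
        // only invoked on a cache miss for this key; both services call this method
        return app.loadDepartmentDetails(new File(key)); // assumption: the key is a file path
    }
}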
References:
Spring Caching
Spring Caching Guide

How to update image using image url

I have a method which takes a multipart image file. If I want to update the same image, then obviously I have to take the image URL as input, but I can't take the input as a URL since the method expects the file format.
My method:
        MediaType.APPLICATION_OCTET_STREAM_VALUE}, produces = MediaType.APPLICATION_JSON_VALUE)
public ResponseEntity<ApiResponse> updatePersonalDataForUser(
        @RequestHeader("accessToken") @NotEmpty(message = "accessToken is mandatory") String bearer,
        @RequestHeader("mappingId") @NotEmpty(message = "mappingId is mandatory") String mappingId,
        @RequestPart("personalInfoObj") String personalInfoObj,
        @RequestPart(value = "profileImage") MultipartFile profileImage)
        throws IOException {
    jobPostController.userRoleAuthorization(mappingId);
    userController.oAuthByRedisAccessToken(bearer, mappingId);
    ObjectMapper objectMapper = new ObjectMapper();
    PersonalInfoResponse personalInfoConv = objectMapper.readValue(personalInfoObj, PersonalInfoResponse.class);
    return userController.updatePersonalData(mappingId, personalInfoConv, profileImage, Contants.UserRoleName);
}
You should take a look at the Spring community project called Spring Content.
This project makes it easy to build contentful applications and services. It has the same programming model as Spring Data, meaning it can supply implementations of the file storage and the REST controllers on top of that storage, so you don't need to create these yourself. It is to content (or unstructured data) what Spring Data is to structured data.
This might look something like the following:-
pom.xml (for Spring Web MVC. Spring Boot also supported)
<!-- Spring Web MVC dependencies -->
...
<!-- Java API -->
<dependency>
    <groupId>com.github.paulcwarren</groupId>
    <artifactId>spring-content-fs</artifactId>
    <version>1.0.0.M5</version>
</dependency>
<!-- REST API -->
<dependency>
    <groupId>com.github.paulcwarren</groupId>
    <artifactId>spring-content-rest</artifactId>
    <version>1.0.0.M5</version>
</dependency>
StoreConfig.java
@Configuration
@EnableFilesystemStores
@Import(RestConfiguration.class)
public class EnableFilesystemStoresConfig {

    @Bean
    File filesystemRoot() {
        return new File("/path/to/your/uploaded/files");
    }

    @Bean
    FileSystemResourceLoader fileSystemResourceLoader() {
        return new FileSystemResourceLoader(filesystemRoot().getAbsolutePath());
    }
}
ImageStore.java
@StoreRestResource(path = "images")
public interface ImageStore extends Store<String> {
}
This is all you need to do to get REST endpoints that will allow you to store and retrieve files. As mentioned, how this actually works is very much like Spring Data. When your application starts, Spring Content will see the spring-content-fs dependency, know that you want to store content on your filesystem, and inject a filesystem implementation of the ImageStore interface into the application context. It will also see spring-content-rest and inject a controller (i.e. REST endpoints) that talks to the ImageStore interface. Therefore, you don't have to do any of this yourself.
So, for example:
curl -X POST /images/myimage.jpg -F "file=@/path/to/myimage.jpg"
will store the image on the filesystem at /path/to/your/uploaded/files/myimage.jpg
And:
curl /images/myimage.jpg
will fetch it again, and so on. These endpoints support full CRUD, and the GET & PUT endpoints also support video streaming (or byte-range requests).
You could also decide to store the contents elsewhere like in the database with your entities, or in S3 by swapping the spring-content-fs dependency for the appropriate Spring Content Storage module. Examples for every type of storage are here.
In addition, in case it is helpful, often content is associated with Spring Data Entities. So, it is also possible to have the ImageStore interface implement ContentStore, like this:
ImageStore.java
@StoreRestResource(path = "images")
public interface ImageStore extends ContentStore<PersonalInfo, String> {
}
And to add Spring Content-annotated fields to your Spring Data entities, like this:
PersonalInfo.java
@Entity
public class PersonalInfo {

    @Id
    @GeneratedValue
    private long id;

    ...other existing fields...

    @ContentId
    private String contentId;

    @ContentLength
    private long contentLength = 0L;

    @MimeType
    private String mimeType = "text/plain";

    ...
}
This approach changes the REST endpoints as the content is now addressable via the Spring Data URL. So:
POST /personalInfos/{personalInfoId} -F "image=@/some/path/to/myimage.jpg"
will upload myimage.jpg to /path/to/your/uploaded/files/myimage.jpg, as it did before, but it will also update the fields on the PersonalInfo entity with id personalInfoId.
GET /personalInfos/{personalInfoId}
will get it again.
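If you also need to read or write content programmatically rather than over REST, the store can be injected like any other Spring bean; a small sketch using the setContent/getContent methods of the ContentStore interface (the ImageService wrapper is made up):
@Service
public class ImageService {

    private final ImageStore store;

    public ImageService(ImageStore store) {
        this.store = store;
    }

    public void attachImage(PersonalInfo info, InputStream image) {
        // associates the content with the entity, filling @ContentId/@ContentLength
        store.setContent(info, image);
    }

    public InputStream readImage(PersonalInfo info) {
        // resolves the content via the entity's @ContentId
        return store.getContent(info);
    }
}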
HTH

Spring webflux and reading from database

Spring 5 introduces the reactive programming style for REST APIs with WebFlux. I'm fairly new to it myself and was wondering whether wrapping synchronous calls to a database into Flux or Mono makes sense performance-wise. If yes, is this the way to do it:
@RestController
public class HomeController {

    private MeasurementRepository repository;

    public HomeController(MeasurementRepository repository) {
        this.repository = repository;
    }

    @GetMapping(value = "/v1/measurements")
    public Flux<Measurement> getMeasurements() {
        return Flux.fromIterable(repository.findByFromDateGreaterThanEqual(new Date(1486980000L)));
    }
}
Is there something like an asynchronous CrudRepository? I couldn't find it.
One option would be to use alternative SQL clients that are fully non-blocking. Some examples include https://github.com/mauricio/postgresql-async or https://github.com/finagle/roc. Of course, none of these drivers is officially supported by database vendors yet. Also, the functionality is far less attractive compared with mature JDBC-based abstractions such as Hibernate or jOOQ.
An alternative idea comes from the Scala world: dispatch blocking calls to an isolated thread pool so that blocking and non-blocking calls are not mixed together. This allows us to control the overall number of threads and lets the CPU serve non-blocking tasks in the main execution context with some potential optimizations.
Assuming we have a JDBC-based implementation such as Spring Data JPA, which is indeed blocking, we can make its execution asynchronous and dispatch it on a dedicated thread pool.
@RestController
public class HomeController {

    private final MeasurementRepository repository;
    private final Scheduler scheduler;

    public HomeController(MeasurementRepository repository, @Qualifier("jdbcScheduler") Scheduler scheduler) {
        this.repository = repository;
        this.scheduler = scheduler;
    }

    @GetMapping(value = "/v1/measurements")
    public Flux<Measurement> getMeasurements() {
        // subscribeOn moves the blocking repository call onto the dedicated scheduler;
        // flatMapMany turns the resulting list into a Flux<Measurement>
        return Mono.fromCallable(() -> repository.findByFromDateGreaterThanEqual(new Date(1486980000L)))
                .subscribeOn(scheduler)
                .flatMapMany(Flux::fromIterable);
    }
}
Our Scheduler for JDBC should be configured with a dedicated thread pool whose size equals the number of connections:
@Configuration
public class SchedulerConfiguration {

    private final Integer connectionPoolSize;

    public SchedulerConfiguration(@Value("${spring.datasource.maximum-pool-size}") Integer connectionPoolSize) {
        this.connectionPoolSize = connectionPoolSize;
    }

    @Bean
    public Scheduler jdbcScheduler() {
        return Schedulers.fromExecutor(Executors.newFixedThreadPool(connectionPoolSize));
    }
}
However, there are difficulties with this approach. The main one is transaction management. In JDBC, transactions are only possible within a single java.sql.Connection. To make several operations in one transaction, they have to share a connection. If we want to make some calculations in between them, we have to keep the connection. This is not very efficient, as we keep a limited number of connections idle while doing calculations in between.
This idea of an asynchronous JDBC wrapper is not new and is already implemented in the Scala library Slick 3. Finally, non-blocking JDBC may come along on the Java roadmap: it was announced at JavaOne in September 2016, and it is possible that we will see it in Java 10.
Based on this blog, you should rewrite your snippet in the following way:
@GetMapping(value = "/v1/measurements")
public Flux<Measurement> getMeasurements() {
    return Flux.defer(() -> Flux.fromIterable(repository.findByFromDateGreaterThanEqual(new Date(1486980000L))))
            .subscribeOn(Schedulers.elastic());
}
Obtaining a Flux or a Mono doesn’t necessarily mean it will run in a dedicated Thread. Instead, most operators continue working in the Thread on which the previous operator executed. Unless specified, the topmost operator (the source) itself runs on the Thread in which the subscribe() call was made.
If you have blocking persistence APIs (JPA, JDBC) or networking APIs to use, Spring MVC is the best choice for common architectures at least. It is technically feasible with both Reactor and RxJava to perform blocking calls on a separate thread but you would not be making the most of a non-blocking web stack.
So... How do I wrap a synchronous, blocking call?
Use a Callable to defer execution, and use Schedulers.elastic() because it creates a dedicated thread to wait for the blocking resource without tying up some other resource.
Schedulers.immediate() : Current thread.
Schedulers.single() : A single, reusable thread.
Schedulers.newSingle() : A per-call dedicated thread.
Schedulers.elastic() : An elastic thread pool. It creates new worker pools as needed and reuses idle ones. This is a good choice for I/O blocking work, for instance.
Schedulers.parallel() : A fixed pool of workers that is tuned for parallel work.
example:
Mono.fromCallable(() -> blockingRepository.save(entity))
        .subscribeOn(Schedulers.elastic());
Spring Data supports reactive repository interfaces for MongoDB and Cassandra.
Spring data MongoDb Reactive Interface
Spring Data MongoDB provides reactive repository support with Project Reactor and RxJava 1 reactive types. The reactive API supports reactive type conversion between reactive types.
public interface ReactivePersonRepository extends ReactiveCrudRepository<Person, String> {

    Flux<Person> findByLastname(String lastname);

    @Query("{ 'firstname': ?0, 'lastname': ?1}")
    Mono<Person> findByFirstnameAndLastname(String firstname, String lastname);

    // Accept parameters inside a reactive type for deferred execution
    Flux<Person> findByLastname(Mono<String> lastname);

    Mono<Person> findByFirstnameAndLastname(Mono<String> firstname, String lastname);

    @InfiniteStream // Use a tailable cursor
    Flux<Person> findWithTailableCursorBy();
}

public interface RxJava1PersonRepository extends RxJava1CrudRepository<Person, String> {

    Observable<Person> findByLastname(String lastname);

    @Query("{ 'firstname': ?0, 'lastname': ?1}")
    Single<Person> findByFirstnameAndLastname(String firstname, String lastname);

    // Accept parameters inside a reactive type for deferred execution
    Observable<Person> findByLastname(Single<String> lastname);

    Single<Person> findByFirstnameAndLastname(Single<String> firstname, String lastname);

    @InfiniteStream // Use a tailable cursor
    Observable<Person> findWithTailableCursorBy();
}
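With a reactive repository like this, the controller simply returns the reactive type and no scheduler juggling is needed; a brief sketch assuming the ReactivePersonRepository above:
@RestController
public class PersonController {

    private final ReactivePersonRepository repository;

    public PersonController(ReactivePersonRepository repository) {
        this.repository = repository;
    }

    @GetMapping("/v1/persons/{lastname}")
    public Flux<Person> byLastname(@PathVariable String lastname) {
        // non-blocking end to end: driver, repository, and web layer
        return repository.findByLastname(lastname);
    }
}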

Spring Data + MongoDB GridFS access via Repository possible?

I recently discovered GridFS, which I'd like to use for file storage with metadata. I just wondered whether it's possible to use a MongoRepository to query GridFS. If so, can someone give me an example?
I'd also take a solution using Hibernate, if there is one.
The reason is: my metadata contains a lot of different fields, and it would be much easier to query a repository than to write some new Query(Criteria.where(...)) for each scenario. And I could hopefully also simply take a Java object and provide it via the REST API without the file itself.
EDIT: I'm using
Spring 4 Beta
Spring Data Mongo 1.3.1
Hibernate 4.3 Beta
There is a way to solve this:
@Document(collection = "fs.files")
public class MyGridFsFile {

    @Id
    private ObjectId id;
    public ObjectId getId() { return id; }

    private String filename;
    public String getFilename() { return filename; }

    private long length;
    public long getLength() { return length; }

    ...
}
You can write a normal Spring Data Mongo repository for that, as sketched below. Now you can at least query the fs.files collection using a Spring Data repo. But: you cannot access the file contents this way.
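For example, a sketch of such a repository (the derived query methods are assumptions matching the mapped fields above):
public interface MyGridFsFileRepository extends MongoRepository<MyGridFsFile, ObjectId> {

    // derived queries over fs.files fields work as with any other document
    MyGridFsFile findByFilename(String filename);

    List<MyGridFsFile> findByLengthGreaterThan(long length);
}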
For getting the file contents itself, you've got (at least) 2 options:
1. Use file = gridOperations.findOne(Query.query(Criteria.where("_id").is(id))); InputStream is = file.getInputStream();
2. Have a look at the source code of GridFSDBFile. There you can see how it internally queries the fs.chunks collection and fills the InputStream.
(Option 2 is really low level; Option 1 is a lot easier, and its code gets maintained by the MongoDB Java driver devs, so Option 1 would be my choice.)
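Fleshing out option 1 as a small helper (a sketch against the pre-3.x Spring Data MongoDB API, where GridFsTemplate#findOne returns a GridFSDBFile):
public InputStream getFileContents(GridFsOperations gridOperations, ObjectId id) {
    // queries fs.files by _id; the GridFSDBFile streams the fs.chunks data lazily
    GridFSDBFile file = gridOperations.findOne(Query.query(Criteria.where("_id").is(id)));
    return file.getInputStream();
}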
Updating GridFS entries:
GridFS is not designed to update file content!
Updating only the metadata field can be useful, though. The rest of the fields are kinda static.
You should be able to simply use your custom MyGridFsFileRepo's update method. I suggest creating a setter only for the metadata field.
Different metadata for different files:
I solved this using an abstract MyGridFsFile class with generic metadata, i.e.:
@Document(collection = "fs.files")
public abstract class AbstractMyGridFsFile<M extends AbstractMetadata> {

    ...

    private M metadata;

    public M getMetadata() { return metadata; }
    void setMetadata(M metadata) { this.metadata = metadata; }
}
And of course each impl has its own AbstractMetadata impl associated. What have I done? AbstractMetadata always has a field called type. This way I can find the right AbstractMyGridFsFile impl. I also have a generic abstract repository.
Btw: in the meantime I switched here from using a Spring repo to plain access via MongoTemplate, like:
protected List<A> findAll(Collection<ObjectId> ids) {
    List<A> files = mongoTemplate.find(Query.query(Criteria
            .where("_id").in(ids)
            .and("metadata.type").is(type) // this is hardcoded for each repo impl
    ), typeClass); // the corresponding impl of AbstractMyGridFsFile
    return files;
}
Hope this helps. I can write more, if you need more information about this. Just tell me.
You can create a GridFS object with the database from your MongoTemplate, and then interact with that:
MongoTemplate mongoTemplate = new MongoTemplate(new Mongo(), "GetTheTemplateFromSomewhere");
GridFS gridFS = new GridFS(mongoTemplate.getDb());
The GridFS object lets you create, delete, find, etc. For example:
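A quick sketch with the legacy driver's GridFS API (createFile, findOne, and remove are part of that old API; the file paths are placeholders):
void demo(GridFS gridFS) throws IOException {
    // store a file
    GridFSInputFile created = gridFS.createFile(new FileInputStream("/tmp/report.pdf"), "report.pdf");
    created.save();

    // find it again and stream its contents
    GridFSDBFile found = gridFS.findOne("report.pdf");
    found.writeTo(new FileOutputStream("/tmp/copy.pdf"));

    // delete it again
    gridFS.remove("report.pdf");
}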
