I would like to unify (merge) an item based on 2 differents sources to read (flat file and DB).
How I can perform that using Spring Batch ItemReader ?
public class User {
Long id;
String firstName;
String lastName;
String subscriptionCode;
}
...
stepBuilderFactory.get("createUserStep1")
.<User, User>chunk(1000)
.reader(flatFileReader) // Read line from file and create partial user object (only get id, subscriptionCode)
// Here - How to read DB using User.id read previously and extend User object with additionnal data
.processor(new SkipDuplicatedUserItemProcessor()) // processor skip duplicated user
.writer(itemWriter) // Write somewhere
.build();
This problem should be solved using a reader and a processor.
Use a FlatFileItemReader to read your incomplete object and a ItemProcessor<YourObject,YourObject> to complete your object with missing informations.
Related
I have a spring app, that pushes data in an s3 bucket.
public class Ebook implements Serializable {
#Column(name= "cover_path", unique = true, nullable = true)
private String coverPath;
private String coverDownloadUrl;
#Value("${aws.cloudfront.region}")
private String awsCloudFrontDns;
#PostLoad
public void init(){
// I want to access the property here
System.out.println("PostConstruct");
String coverDownloadUrl = "https://"+awsCloudFrontDns+"/"+coverPath;
}
When a data is pushed, let's say my cover here, I get the key 1/test-folder/mycover.jpg which is the important part of the future http URL of the data.
When I read the data from database, I enter inside #PostLoad method and I want construct the complete URL using the cloudfront value. This value changes frequently so we don't want to save hardly in the database.
How could I do to construct my full path just after reading the data in database?
The only way to do this is to use a service that update the data after using repository to read it? For readbyId it can be a good solution, but for reading list or using other jpa methods, this solutions won't work because I have each time to create a dedicated service for the update.
It doesn't look good for Entity to depend on property.
How about EntityListener.
#Component
public class EbookEntityListener {
#Value("${aws.cloudfront.region}")
private String awsCloudFrontDns;
#PostLoad
void postload(Ebook entity) { entity.updateDns(awsCloudFrontDns); }
}
I recommend trying this way :)
I am trying to Map a JSON response to a Java POJO which has a different field name from different API.
I need an efficient way to do this reducing boilerplate codes.
I have tried mapping the JSON property field in Java POJO.
However, the problem is I am fetching data from different sources.
Let's say I have below user class
Class User{
String name;
String contact;
}
The JSON I may receive from different sources can be
{"name": "ABC" , "contact": "123456"}
or
{"userName": "XYZ" , "mobileNo":"4354665"}
There may be more variations as we go on integrating more API's
Is there a way I can archive this?
above is just a simple example
there could be more complex JSON object I may need to read.
like List of User etc.
You can use the #JsonAlias() to give the variable more than one JSON key binding.
#JsonAlias is introduced in Jackson 2.9 release. #JsonAlias defines one or more alternative names for a property to be accepted during deserialization i.e. setting JSON data to Java object. But at the time of serialization i.e. while getting JSON from Java object, only actual logical property name is used and not alias. #JsonAlias is defined as follows.
#Entity
Class User{
#JsonProperty()
#JsonAlias({"name", "userName"})
String name;
#JsonProperty()
#JsonAlias({"contact", "mobileNo"})
String contact;
}
You could use the #JsonSetter annotation like :
public class User{
public String contact;
public String name;
#JsonSetter("name")
public void setName(String name) {
this.name = name;
}
#JsonSetter("userName")
public void setName(String name) {
this.name = name;
}
}
Instead of directly mapping to an entity class , you should have a DTO object or model in between to map the json response. Then, you can convert that into any entity you may choose.If you are fetching the data from different sources , it means you are calling different endpoints, why don't you create different DTO 's for that.In that way even if one of the endpoints introduce a change , it won't affect the rest of the endpoint calls.
Vice-versa you could have different DTO objects being returned from the two endpoints instead of returning the same Entity class as well, that way you can have control over which attributes should be there in the response.
To reduce the boiler plate code, you could use library such as MAP STRUCT to enable conversion between entity and DTO objects easily
Read here about the advantages of using a DTO .
Using spring-data-jpa and working on getting data out of table where there are about a dozen columns which are used in queries to find particular rows, and then a payload column of clob type which contains the actual data that is marshalled into java objects to be returned.
Entity object very roughly would be something like
#Entity
#Table(name = "Person")
public class Person {
#Column(name="PERSON_ID", length=45) #Id private String personId;
#Column(name="NAME", length=45) private String name;
#Column(name="ADDRESS", length=45) private String address;
#Column(name="PAYLOAD") #Lob private String payload;
//Bunch of other stuff
}
(Whether this approach is sensible or not is a topic for a different discussion)
The clob column causes performance to suffer on large queries ...
In an attempt to improve things a bit, I've created a separate entity object ... sans payload ...
#Entity
#Table(name = "Person")
public class NotQuiteAWholePerson {
#Column(name="PERSON_ID", length=45) #Id private String personId;
#Column(name="NAME", length=45) private String name;
#Column(name="ADDRESS", length=45) private String address;
//Bunch of other stuff
}
This gets me a page of NotQuiteAPerson ... I then query for the page of full person objects via the personIds.
The hope is that in not using the payload in the original query, which could filtering data over a good bit of the backing table, I only concern myself with the payload when I'm retrieving the current page of objects to be viewed ... a much smaller chunk.
So I'm at the point where I want to map the contents of the original returned Page of NotQuiteAWholePerson to my List of Person, while keeping all the Paging info intact, the map method however only takes a Converter which will iterate over the NotQuiteAWholePerson objects ... which doesn't quite fit what I'm trying to do.
Is there a sensible way to achieve this ?
Additional clarification for #itsallas as to why existing map() will not suffice..
PageImpl::map has
#Override
public <S> Page<S> map(Converter<? super T, ? extends S> converter) {
return new PageImpl<S>(getConvertedContent(converter), pageable, total);
}
Chunk::getConvertedContent has
protected <S> List<S> getConvertedContent(Converter<? super T, ? extends S> converter) {
Assert.notNull(converter, "Converter must not be null!");
List<S> result = new ArrayList<S>(content.size());
for (T element : this) {
result.add(converter.convert(element));
}
return result;
}
So the original List of contents is iterated through ... and a supplied convert method applied, to build a new list of contents to be inserted into the existing Pageable.
However I cannot convert a NotQuiteAWholePerson to a Person individually, as I cannot simply construct the payload... well I could, if I called out to the DB for each Person by Id in the convert... but calling out individually is not ideal from a performance perspective ...
After getting my Page of NotQuiteAWholePerson I am querying for the entire List of Person ... by Id ... in one call ... and now I am looking for a way to substitute the entire content list ... not interively, as the existing map() does, but in a simple replacement.
This particular use case would also assist where the payload, which is json, is more appropriately persisted in a NoSql datastore like Mongo ... as opposed to the sql datastore clob ...
Hope that clarifies it a bit better.
You can avoid the problem entirely with Spring Data JPA features.
The most sensible way would be to use Spring Data JPA projections, which have good extensive documentation.
For example, you would first need to ensure lazy fetching for your attribute, which you can achieve with an annotation on the attribute itself.
i.e. :
#Basic(fetch = FetchType.LAZY) #Column(name="PAYLOAD") #Lob private String payload;
or through Fetch/Load Graphs, which are neatly supported at repository-level.
You need to define this one way or another, because, as taken verbatim from the docs :
The query execution engine creates proxy instances of that interface at runtime for each element returned and forwards calls to the exposed methods to the target object.
You can then define a projection like so :
interface NotQuiteAWholePerson {
String getPersonId();
String getName();
String getAddress();
//Bunch of other stuff
}
And add a query method to your repository :
interface PersonRepository extends Repository<Person, String> {
Page<NotQuiteAWholePerson> findAll(Pageable pageable);
// or its dynamic equivalent
<T> Page<T> findAll(Pageable pageable, Class<T>);
}
Given the same pageable, a page of projections would refer back to the same entities in the same session.
If you cannot use projections for whatever reason (namely if you're using JPA < 2.1 or a version of Spring Data JPA before projections), you could define an explicit JPQL query with the columns and relationships you want, or keep the 2-entity setup. You could then map Persons and NotQuiteAWholePersons to a PersonDTO class, either manually or (preferably) using your object mapping framework of choice.
NB. : There are a variety of ways to use and setup lazy/eager relations. This covers more in detail.
I have a Spring Boot project with multiple databases of different years and these databases have same tables so the only difference is the year (..., DB2016, DB2017). In the controller of the application i need to return data that belong to "different" years. Moreover in future years other databases will be created (eg. in 2018 there's going to be a db named "DB2018"). So my problem is how to switch the connection among databases without creating a new datasource and a new repository every new year.
In an other question posted by me (Spring Boot - Same repository and same entity for different databases) the answer was to create different datasources and different repositories for every existing database, but in this case i want to return data from existing databases on the basis of the current year. More specifically:
SomeEntity.java
#Entity(name = "SOMETABLE")
public class SomeEntity implements Serializable {
#Id
#Column(name="ID", nullable=false)
private Integer id;
#Column(name="NAME")
private String name;
}
SomeRepository.java
public interface SomeRepository extends PagingAndSortingRepository<SomeEntity, Integer> {
#Query(nativeQuery= true, value = "SELECT * FROM SOMETABLE WHERE NAME = ?1")
List<SomeEntity> findByName(String name);
}
SomeController.java
#RequestMapping(value="/foo/{name}", method=RequestMethod.GET)
public ResponseEntity<List<SomeEntity>> findByName(#PathVariable("name") String name) {
List<SomeEntity> list = autowiredRepo.findByName(name);
return new ResponseEntity<List<SomeEntity>>(list,HttpStatus.OK);
}
application.properties
spring.datasource.url=jdbc:postgresql://localhost:5432/DB
spring.datasource.username=xxx
spring.datasource.password=xxx
So if the current year is 2017 i want something like this:
int currentyear = Calendar.getInstance().get(Calendar.YEAR);
int oldestDbYear = 2014;
List<SomeEntity> listToReturn = new LinkedList<SomeEntity>();
//the method getProperties is a custom method to get properties from a file
String url = getProperties("application.properties", "spring.datasource.url");
props.setProperty("user", getProperties("application.properties","spring.datasource.username"));
props.setProperty("password", getProperties("application.properties","spring.datasource.password"));
for (int i = currentYear, i>oldestDbYear, i--) {
//this is the connection that must be used by autowiredRepo Repository, but i don't know how to do this.
//So the repository uses different connection for every year.
Connection conn = getConnection(url+year,props);
List<SomeEntity> list_of_specific_year = autowiredRepo.findByName(name);
conn.close;
listToReturn.addAll(list_of_specific_year);
}
return listToReturn;
Hope everithing is clear
The thing that is probably most suitable to your needs here is Spring's AbstractRoutingDataSource. You do need to define multiple DataSources but you will only need a single repository. Multiple data sources is not an issue here as there is always a way to create the DataSource beans programatically at run time and register them with the application context.
How it works is you basically register a Map<Object, DataSource> inside your #Configuration class when creating your AbstractRoutingDataSource #Bean and in this case the lookup key would be the year.
Then you need create a class that implements AbstractRoutingDataSource and implement the determineCurrentLookupKey() method. Anytime a database call is made, this method is called in the current context to lookup which DataSource should be returned. In your case it sounds like you simply want to have the year as a #PathVariable in the URL and then as the implementation of determineCurrentLookupKey() grab that #PathVariable out of the URL (e.g in your controller you have mappings like #GetMapping("/{year}/foo/bar/baz")).
HttpServletRequest request = ((ServletRequestAttributes)RequestContextHolder
.getRequestAttributes()).getRequest();
HashMap templateVariables =
(HashMap)request.getAttribute(HandlerMapping.URI_TEMPLATE_VARIABLES_ATTRIBUTE);
return templateVariables.get("year");
I used this approach when writing a testing tool for a product where there were many instances running on multiple different servers and I wanted a unified programming model from my #Controllers but still wanted it to be hitting the right database for the server/deployment combination in the url. Worked like a charm.
The drawback if you are using Hibernate is that all connections will go through a single SessionFactory which will mean you can't take advantage of Hibernate's 2nd level caching as I understand it, but I guess that depends on your needs.
I'm trying to get the created datetime of the last created item in a mongodb repository.
I could obviously use a findAll(Sort sort) function, and get the first element, but this would not be very practical on a large database.
Mongo queries do not support an "orderBy" query method so this is also not a solution.
The order of creation is in chronological order of "created" so if I can get the last created document in the collection that would be good too.
So my question is:
What is the best way to retrieve the last created document in a mongodb repo using Spring data?
My current code:
#Data
#Document
public class Batch
{
#Id
String id;
LocalDateTime created;
//other stuff
}
public interface BatchRepository extends MongoRepository<Batch,String>
{
//this does not work
//Batch findOneOrderByCreatedDesc();
}
Try the following one, it should work well
public interface BatchRepository extends MongoRepository<Batch,String>
{
Batch findTopByOrderByCreatedDesc();
}
Notice that the method name slightly differs from your variant, this difference is important as spring parses the method name and builds a query based on the result of parsing.