Step hangs during the read() method - Spring

I have a job, and in one of its steps the reader needs to handle a list of roughly 6,000 objects. This is not a big volume of data, and it's not the first time I have worked with a volume of this size.
For some reason, the reader can hang for an hour or even more after each chunk inside the read() method. I don't know what is wrong and have no idea how to debug this step.
@Autowired
private ModelMapper modelMapper;

@Value("${chunk.size}") // 500
private int chunkSize;

@javax.annotation.Resource
private Environment environment;

private ItemReader<Device> delegate;

public void setDelegate(ItemReader<Device> delegate) {
    this.delegate = delegate;
}

@Override
public void beforeStep(StepExecution stepExecution) {
    List<AccountDeviceDto> outboundList = (List<AccountDeviceDto>) stepExecution.getJobExecution()
            .getExecutionContext().get("outboundThatMightSwappedList");
    List<DeviceDto> accountList = outboundList.stream().map(dto -> dto.getDevice()).collect(Collectors.toList());
    Type listType = new TypeToken<List<Device>>() {}.getType();
    List<Device> mapList = modelMapper.map(accountList, listType);
    setDelegate(new IteratorItemReader<Device>(mapList));
}

@Override
public ExitStatus afterStep(StepExecution stepExecution) {
    logger.info("** start afterStep **");
    return ExitStatus.COMPLETED;
}

@Override
public Device read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
    return delegate.read();
}

@Data
@Entity
@Table(name = "Device")
public class Device {

    @Id
    @Column(name = "MacAddress")
    private String macAddress;

    // ... other fields omitted ...

    @Column(name = "LastModifiedDate")
    private LocalDate lastModifiedDate;

    @OneToMany(mappedBy = "device")
    private List<AccountDevice> accountDeviceList;
}
Any idea how this can be debugged to figure out what the bottleneck is?
Thank you.

The first thing you need to do is find out which line of code it hangs at, and then troubleshoot from there.
First, find the process ID of your Spring Batch app with:
jps -v
Assuming the process ID is 1234, use jstack to print the stack traces of the Java threads in that process. You can then see which lines of code the relevant threads are hanging at:
jstack 1234
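If you cannot run jstack against the process (for example in a restricted container), a rough alternative is to take the dump from inside the JVM using the standard ThreadMXBean API. A minimal sketch (the ThreadDumper class name is just an example):
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {

    // Prints the stack trace of every live thread, similar in spirit to a jstack dump.
    public static void dumpAllThreads() {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        for (ThreadInfo threadInfo : threadMXBean.dumpAllThreads(true, true)) {
            System.out.print(threadInfo.toString());
        }
    }
}
You could call this from a scheduled task or an admin endpoint once the step appears to be stuck.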

You seem to be putting the list of items in the execution context in a previous step and then reading them from the execution context with a delegate IteratorItemReader.
The execution context is persisted between steps, and what's happening is that your big item list is taking time to be deserialized before the first item is even read. I'm pretty sure this is the cause of your issue, but you can debug/time the beforeStep method to confirm it.
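For example, here is a minimal sketch of timing that work, reusing the field and key names from the question (the log statements are just illustrative):
@Override
public void beforeStep(StepExecution stepExecution) {
    long start = System.currentTimeMillis();

    // Pull the (potentially large) list out of the job execution context
    List<AccountDeviceDto> outboundList = (List<AccountDeviceDto>) stepExecution
            .getJobExecution().getExecutionContext().get("outboundThatMightSwappedList");
    logger.info("Fetched {} items from the execution context in {} ms",
            outboundList.size(), System.currentTimeMillis() - start);

    // Map the DTOs and build the delegate reader, timing this part separately
    start = System.currentTimeMillis();
    List<DeviceDto> accountList = outboundList.stream().map(dto -> dto.getDevice()).collect(Collectors.toList());
    Type listType = new TypeToken<List<Device>>() {}.getType();
    List<Device> mapList = modelMapper.map(accountList, listType);
    setDelegate(new IteratorItemReader<Device>(mapList));
    logger.info("Mapped and prepared {} devices in {} ms", mapList.size(), System.currentTimeMillis() - start);
}
If most of the time is spent before the reader even gets going, consider storing only the IDs (or nothing at all) in the execution context and re-reading the data inside this step instead.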

Related

Using DelegatingSessionFactory with RemoteFileTemplate.execute(SessionCallback)

I'm trying to declare multiple SFTP sessions, wrap them in a DelegatingSessionFactory, then later use SftpRemoteFileTemplate.execute(...) during a cron job.
On the execute side the code is very simple; it is already used for a single session, but I want to expand it to multiple possible sessions.
Below is my single-session code. I just copied the methods for reference. At the end I'll show how I think the new methods should look.
public class XSession extends SftpSession {

    @Scheduled(cron = "${sftp.scan.x.schedule}")
    void scan() {
        List<FileHistoryEntity> fileList = template.execute(this::processFiles);
        ...
    }

    private List<FileHistoryEntity> processFiles(Session<ChannelSftp.LsEntry> session) {
        List.of(session.list(this.remoteDir)).forEach(file -> doWhatever());
        ...
    }
}
But now I have multiple sessions. So I declare the following class:
@Slf4j
@Configuration
@RequiredArgsConstructor
public class DelegateSftpSessionHandler {

    private final SessionFactory<ChannelSftp.LsEntry> session1;
    private final SessionFactory<ChannelSftp.LsEntry> session2;
    private final SessionFactory<ChannelSftp.LsEntry> session3;
    private final SessionFactory<ChannelSftp.LsEntry> session4;
    private final SessionFactory<ChannelSftp.LsEntry> session5;

    @RequiredArgsConstructor
    public enum DelegateSessionConfig {
        SESSION_1("IN_REALITY_A_RELEVANT_NAME_1"),
        SESSION_2("IN_REALITY_A_RELEVANT_NAME_2"),
        SESSION_3("IN_REALITY_A_RELEVANT_NAME_3"),
        SESSION_4("IN_REALITY_A_RELEVANT_NAME_4"),
        SESSION_5("IN_REALITY_A_RELEVANT_NAME_5");

        public final String threadKey;
    }

    @Bean
    @Primary
    public DelegatingSessionFactory<ChannelSftp.LsEntry> delegatingSessionFactory() {
        Map<Object, SessionFactory<ChannelSftp.LsEntry>> sessionMap = new HashMap<>();
        sessionMap.put(DelegateSessionConfig.SESSION_1.threadKey, session1);
        sessionMap.put(DelegateSessionConfig.SESSION_2.threadKey, session2);
        sessionMap.put(DelegateSessionConfig.SESSION_3.threadKey, session3);
        sessionMap.put(DelegateSessionConfig.SESSION_4.threadKey, session4);
        sessionMap.put(DelegateSessionConfig.SESSION_5.threadKey, session5);
        DefaultSessionFactoryLocator<ChannelSftp.LsEntry> sessionLocator = new DefaultSessionFactoryLocator<>(sessionMap);
        return new DelegatingSessionFactory<>(sessionLocator);
    }

    @Bean
    SftpRemoteFileTemplate ftpRemoteFileTemplate(DelegatingSessionFactory<ChannelSftp.LsEntry> dsf) {
        return new SftpRemoteFileTemplate(dsf);
    }
}
Thing is, I have no idea how any of this works, and the Spring SFTP/FTP documentation is by no means clear. The code is virtually undocumented, and I'm just guessing. I think I have to do the following:
public class XSession extends SftpSession {

    @Autowired
    DelegatingSessionFactory<ChannelSftp.LsEntry> delegatingSessionFactory;

    @Autowired
    SftpRemoteFileTemplate template;

    @Scheduled(cron = "${sftp.scan.x.schedule}") // x == SESSION_1
    @Async // for thread key
    void scan() {
        delegatingSessionFactory.setThreadKey(DelegateSessionConfig.SESSION_1.threadKey);
        // because the thread key changes the session globally? So I don't need to specify
        // which session this template is working with???
        List<FileHistoryEntity> fileList = template.execute(this::processFiles);
        ...
        delegatingSessionFactory.clearThreadKey();
    }

    private List<FileHistoryEntity> processFiles(Session<ChannelSftp.LsEntry> session) {
        List.of(session.list(this.remoteDir)).forEach(file -> doWhatever());
        ...
    }
}
I'm basing this on the following link: github spring integration test.
Honestly, I hardly understand what is happening, but it seems like setting the thread key changes the session globally.
My only other idea is to just... create the RemoteFileTemplate on demand:
public static SftpRemoteFileTemplate getTemplateFor(DelegatingSessionFactory<ChannelSftp.LsEntry> dsf,
                                                    DelegateSessionConfig session) {
    return new SftpRemoteFileTemplate(dsf.getFactoryLocator().getSessionFactory(session.threadKey));
}
It does not set it globally. That's how a ThreadLocal variable works: you set a value in some thread and only this thread can see it. If you use the same object concurrently, other threads don't see that value because it does not belong to their thread state.
I'm not sure what your concern is, but the pattern of extending SftpSession for custom logic is not right. You should consider using SftpRemoteFileTemplate.execute(SessionCallback<F, T> callback) instead; either way, the thread key must be set on the DelegatingSessionFactory beforehand, and in the same thread that is going to call that execute().
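Here is a minimal sketch of that recommendation, reusing the beans and enum from the question (the Session1Scanner class name and the placeholder body of processFiles are just examples):
@Component
public class Session1Scanner {

    @Autowired
    private DelegatingSessionFactory<ChannelSftp.LsEntry> delegatingSessionFactory;

    @Autowired
    private SftpRemoteFileTemplate template;

    @Scheduled(cron = "${sftp.scan.x.schedule}")
    public void scan() {
        // Bind session 1 to the current thread; only this thread sees the key.
        delegatingSessionFactory.setThreadKey(DelegateSessionConfig.SESSION_1.threadKey);
        try {
            // execute() runs in this same thread, so it resolves the factory bound above
            List<FileHistoryEntity> fileList = template.execute(session -> processFiles(session));
            // ... handle fileList ...
        } finally {
            // Always clear the key so a reused scheduler thread does not keep a stale binding
            delegatingSessionFactory.clearThreadKey();
        }
    }

    private List<FileHistoryEntity> processFiles(Session<ChannelSftp.LsEntry> session) {
        // same listing/processing logic as in the question
        return new ArrayList<>();
    }
}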

How to use StepListenerSupport

I am trying to stop a running job based on a timeout value. I am following a post found here, but I am not sure how to add this listener.
Here is the listener implementation
public class StopListener extends StepListenerSupport {

    public static final Logger LOG = LoggerFactory.getLogger(StopListener.class);
    private static final int TIMEOUT = 30;

    private StepExecution stepExecution;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public void afterChunk(ChunkContext context) {
        if (timeout(context)) {
            this.stepExecution.setTerminateOnly();
        }
    }

    private boolean timeout(ChunkContext chunkContext) {
        LOG.info("----- TIMEOUT -----");
        Date startTime = chunkContext.getStepContext().getStepExecution().getJobExecution().getStartTime();
        Date now = new Date();
        return Duration.between(startTime.toInstant(), now.toInstant()).toMinutes() > TIMEOUT;
    }
}
Here is my step
@Bean
public Step dataFilterStep() {
    return stepBuilderFactory.get("dataFilterStep")
            .<UserInfo, UserInfo>chunk(10)
            .reader(dataFilterItemReader())
            .processor(dataFilterItemProcessor())
            .writer(dataFilterWriter())
            .listener(new StopListener())
            .build();
}
But I am getting an error saying "The method listener(Object) is ambiguous for the type SimpleStepBuilder<UserInfo,UserInfo>". Any help would be really appreciated!
On one hand, StepListenerSupport is a polymorphic object: it implements seven listener interfaces. On the other hand, the step builder provides several overloaded .listener() methods that accept different types of listeners. That's why, when you pass your StopListener in .listener(new StopListener()), the type of listener is ambiguous.
What you can do is cast the listener to the type you want, something like:
.listener((ChunkListener) new StopListener())
However, following the principle of least power [1][2], I would recommend changing your StopListener to implement only the interface required for the functionality. In your case, you want to stop the job after a given timeout in afterChunk, so you can make your listener implement ChunkListener instead of extending StepListenerSupport, as sketched below.
[1]: The Rule of Least Power
[2]: The Principle of Least Power
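For illustration, here is a minimal sketch of the listener reduced to ChunkListener only, reusing the timeout logic from the question:
public class StopListener implements ChunkListener {

    private static final Logger LOG = LoggerFactory.getLogger(StopListener.class);
    private static final int TIMEOUT_MINUTES = 30;

    @Override
    public void beforeChunk(ChunkContext context) {
    }

    @Override
    public void afterChunk(ChunkContext context) {
        // The StepExecution is reachable from the ChunkContext, so beforeStep bookkeeping is not needed
        StepExecution stepExecution = context.getStepContext().getStepExecution();
        Date startTime = stepExecution.getJobExecution().getStartTime();
        long elapsedMinutes = Duration.between(startTime.toInstant(), Instant.now()).toMinutes();
        if (elapsedMinutes > TIMEOUT_MINUTES) {
            LOG.info("Timeout reached after {} minutes, requesting step termination", elapsedMinutes);
            stepExecution.setTerminateOnly();
        }
    }

    @Override
    public void afterChunkError(ChunkContext context) {
    }
}
Since this object now only matches the ChunkListener overload, .listener(new StopListener()) compiles without a cast.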

Why is an exception in Spring Batch AsyncItemProcessor caught by SkipListener's onSkipInWrite method?

I'm writing a Spring Boot application that starts up, gathers and converts millions of database entries into a new, streamlined JSON format, and then sends them all to a GCP PubSub topic. I'm attempting to use Spring Batch for this, but I'm running into trouble implementing fault tolerance for my process. The database is rife with data quality issues, and sometimes my conversions to JSON will fail. When failures occur, I don't want the job to quit immediately; I want it to continue processing as many records as it can and, before completion, report exactly which records failed so that I and my team can examine these problematic database entries.
To achieve this, I've attempted to use Spring Batch's SkipListener interface. But I'm also using an AsyncItemProcessor and an AsyncItemWriter in my process, and even though the exceptions occur during processing, the SkipListener's onSkipInWrite() method is catching them rather than the onSkipInProcess() method. And unfortunately, the onSkipInWrite() method doesn't have access to the original database entity, so I can't store its ID in my list of problematic DB entries.
Have I misconfigured something? Is there any other way to gain access to the objects from the reader that failed the processing step of an AsyncItemProcessor?
Here's what I've tried...
I have a singleton Spring Component where I store how many DB entries I've successfully processed along with up to 20 problematic database entries.
@Component
@Getter // lombok
public class ProcessStatus {

    private int processed;
    private int failureCount;
    private final List<UnexpectedFailure> unexpectedFailures = new ArrayList<>();

    public void incrementProgress() {
        processed++;
    }

    public void logUnexpectedFailure(UnexpectedFailure failure) {
        failureCount++;
        unexpectedFailures.add(failure);
    }

    @Getter
    @AllArgsConstructor
    public static class UnexpectedFailure {
        private Throwable error;
        private DBProjection dbData;
    }
}
I have a Spring Batch SkipListener that's supposed to catch failures and update my status component accordingly:
@AllArgsConstructor
public class ConversionSkipListener implements SkipListener<DBProjection, Future<JsonMessage>> {

    private ProcessStatus processStatus;

    @Override
    public void onSkipInRead(Throwable error) {}

    @Override
    public void onSkipInProcess(DBProjection dbData, Throwable error) {
        processStatus.logUnexpectedFailure(new ProcessStatus.UnexpectedFailure(error, dbData));
    }

    @Override
    public void onSkipInWrite(Future<JsonMessage> messageFuture, Throwable error) {
        // This is getting called instead!! Even though the exception happened during processing :(
        // But I have no access to the original DBProjection data here, and messageFuture.get() gives me null.
    }
}
And then I've configured my job like this:
@Configuration
public class ConversionBatchJobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private TaskExecutor processThreadPool;

    @Bean
    public SimpleCompletionPolicy processChunkSize(@Value("${commit.chunk.size:100}") Integer chunkSize) {
        return new SimpleCompletionPolicy(chunkSize);
    }

    @Bean
    @StepScope
    public ItemStreamReader<DbProjection> dbReader(
            MyDomainRepository myDomainRepository,
            @Value("#{jobParameters[pageSize]}") Integer pageSize,
            @Value("#{jobParameters[limit]}") Integer limit) {
        RepositoryItemReader<DbProjection> myDomainRepositoryReader = new RepositoryItemReader<>();
        myDomainRepositoryReader.setRepository(myDomainRepository);
        myDomainRepositoryReader.setMethodName("findActiveDbDomains"); // A native query
        myDomainRepositoryReader.setArguments(new ArrayList<Object>() {{
            add("ACTIVE");
        }});
        myDomainRepositoryReader.setSort(new HashMap<String, Sort.Direction>() {{
            put("update_date", Sort.Direction.ASC);
        }});
        myDomainRepositoryReader.setPageSize(pageSize);
        myDomainRepositoryReader.setMaxItemCount(limit);
        // myDomainRepositoryReader.setSaveState(false); <== haven't figured out what this does yet
        return myDomainRepositoryReader;
    }

    @Bean
    @StepScope
    public ItemProcessor<DbProjection, JsonMessage> dataConverter(DataRetrievalSerivice dataRetrievalService) {
        // Sometimes throws exceptions when DB data is exceptionally weird, bad, or missing
        return new DbProjectionToJsonMessageConverter(dataRetrievalService);
    }

    @Bean
    @StepScope
    public AsyncItemProcessor<DbProjection, JsonMessage> asyncDataConverter(
            ItemProcessor<DbProjection, JsonMessage> dataConverter) throws Exception {
        AsyncItemProcessor<DbProjection, JsonMessage> asyncDataConverter = new AsyncItemProcessor<>();
        asyncDataConverter.setDelegate(dataConverter);
        asyncDataConverter.setTaskExecutor(processThreadPool);
        asyncDataConverter.afterPropertiesSet();
        return asyncDataConverter;
    }

    @Bean
    @StepScope
    public ItemWriter<JsonMessage> jsonPublisher(GcpPubsubPublisherService publisherService) {
        return new JsonMessageWriter(publisherService);
    }

    @Bean
    @StepScope
    public AsyncItemWriter<JsonMessage> asyncJsonPublisher(ItemWriter<JsonMessage> jsonPublisher) throws Exception {
        AsyncItemWriter<JsonMessage> asyncJsonPublisher = new AsyncItemWriter<>();
        asyncJsonPublisher.setDelegate(jsonPublisher);
        asyncJsonPublisher.afterPropertiesSet();
        return asyncJsonPublisher;
    }

    @Bean
    public Step conversionProcess(SimpleCompletionPolicy processChunkSize,
                                  ItemStreamReader<DbProjection> dbReader,
                                  AsyncItemProcessor<DbProjection, JsonMessage> asyncDataConverter,
                                  AsyncItemWriter<JsonMessage> asyncJsonPublisher,
                                  ProcessStatus processStatus,
                                  @Value("${conversion.failure.limit:20}") int maximumFailures) {
        return stepBuilderFactory.get("conversionProcess")
                .<DbProjection, Future<JsonMessage>>chunk(processChunkSize)
                .reader(dbReader)
                .processor(asyncDataConverter)
                .writer(asyncJsonPublisher)
                .faultTolerant()
                .skipPolicy(new MyCustomConversionSkipPolicy(maximumFailures))
                // ^ for now this returns true for everything until 20 failures
                .listener(new ConversionSkipListener(processStatus))
                .build();
    }

    @Bean
    public Job conversionJob(Step conversionProcess) {
        return jobBuilderFactory.get("conversionJob")
                .start(conversionProcess)
                .build();
    }
}
This is because the future wrapped by the AsyncItemProcessor is only unwrapped in the AsyncItemWriter, so any exception that might occur at that time is seen as a write exception instead of a processing exception. That's why onSkipInWrite is called instead of onSkipInProcess.
This is actually a known limitation of this pattern which is documented in the Javadoc of the AsyncItemProcessor, here is an excerpt:
Because the Future is typically unwrapped in the ItemWriter,
there are lifecycle and stats limitations (since the framework doesn't know
what the result of the processor is).
While not an exhaustive list, things like StepExecution.filterCount will not
reflect the number of filtered items and
itemProcessListener.onProcessError(Object, Exception) will not be called.
The Javadoc states that the list is not exhaustive, and the side effect regarding the SkipListener that you are experiencing is one of these limitations.
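You can reproduce the mechanism with plain java.util.concurrent, outside of Spring Batch; a small sketch (the exception message is made up):
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureUnwrapDemo {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // The "processing" step: the task fails, but the exception is captured
        // inside the Future instead of propagating to the caller.
        Callable<String> conversion = () -> {
            throw new IllegalStateException("conversion failed");
        };
        Future<String> future = executor.submit(conversion);

        try {
            // The "writing" step: only here, when the Future is unwrapped,
            // does the failure surface; this is why the framework records it as a write error.
            future.get();
        } catch (ExecutionException e) {
            System.out.println("Seen only at get(): " + e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}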

Spring Transactional method not working properly (not saving to the DB)

I have spent day after day trying to find a solution to my problem with @Transactional methods. The logic is like this:
The controller receives a request and calls queueService, which puts it in a PriorityBlockingQueue; another thread then processes the data (find cards, update status, assign to the current game, return data).
Controller:
@RequestMapping("/queue")
public DeferredResult<List<Card>> queueRequest(@Params...) {
    queueService.put(result, size, terminal, time);
    result.onCompletion(() -> assignmentService.assignCards(result, game, room, cliente));
}
QueueService:
@Service
public class QueueService {

    private BlockingQueue<RequestQueue> queue = new PriorityBlockingQueue<>();

    @Autowired
    GameRepository gameRepository;

    @Autowired
    TerminalRepository terminalRepository;

    @Autowired
    RoomRepository roomRepository;

    private long requestId = 0;

    public void put(DeferredResult<List<Card>> result, int size, String client, LocalDateTime time_order) {
        requestId++;
        // omitted code (find entities: game, terminal, room)
        try {
            RequestQueue request = new RequestQueue(requestId, size, terminal, time_order, result);
            queue.put(request);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
CardService:
@Transactional
public class CardService {

    @Autowired
    EntityManager em;

    @Autowired
    CardRepository cardRepository;

    @Autowired
    AsignService asignacionService;

    public List<Card> processRequest(int size, BigDecimal value) {
        List<Card> card_query = em.createNativeQuery("{call cards_available(?,?,?)}", Card.class)
                .setParameter(1, false)
                .setParameter(2, value)
                .setParameter(3, size).getResultList();
        List<String> ids = new ArrayList<String>();
        card_query.forEach(card -> ids.add(card.getId_card()));
        String update_query = "UPDATE card SET available=true WHERE id_card IN :ids";
        em.createNativeQuery(update_query).setParameter("ids", ids).executeUpdate();
        return card_query;
    }
}
QueueExecutor (Consumer)
@Component
public class QueueExecute {

    @Autowired
    QueueService queueRequest;

    @Autowired
    AsignService asignService;

    @Autowired
    CardService cardService;

    @PostConstruct
    public void init() {
        new Thread(this::execute).start();
    }

    private void execute() {
        while (true) {
            try {
                RequestQueue request = queueRequest.take();
                if (request != null) {
                    List<Card> cards = cardService.processRequest(request.getSize(), new BigDecimal("1.0"));
                    request.getCards().setResult((ArrayList<Card>) cards);
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
AssignService:
@Transactional
public void assignCards(DeferredResult<List<Card>> cards, Game game, Room room, Terminal terminal) {
    game = em.merge(game);
    room = em.merge(room);
    terminal = em.merge(terminal);
    Order order = new Order();
    LocalDateTime datetime = LocalDateTime.now();
    BigDecimal total = new BigDecimal("0.0");
    order.setTime(datetime);
    order.setRoom(room);
    order.setGame(game);
    order.setId_terminal(terminal);
    for (Card card : (List<Card>) cards.getResult()) {
        card = em.merge(card);
        System.out.println("CARD STATUS " + card.getStatus());
        // --> This shows the OLD value of the Card (not updated)
        card.setOrder(order);
        order.getOrder().add(card);
    }
    game.setOrder(order);
    // gameRepository.save(game)
}
With this code, the new Card status is not saved to the DB, but Game, Terminal and Room are saved OK (more or less...). If I remove the assignService, CardService saves the new status to the DB correctly.
I have tried flushing manually, saving with the repository and so on, but the result is almost the same. Could anybody help me?
I think I found a solution (probably not the optimal one), but it's more related to the logic of my program.
One of the main problems was the update of the Card status property, because it was not reflected in the entity object. When the assignOrder method is called it receives the old Card value, because it's not possible to share that information across threads/transactions (as far as I know). This is normal within transactions: em.executeUpdate() only writes to the database, so if I want the updated entity I need to refresh it with em.refresh(entity), but that made performance go down.
In the end I changed the logic: first create the Orders (transactional) and then assign cards to the orders (transactional). This way it works correctly.
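For reference, this is roughly what the refresh-based variant of processRequest would look like (a sketch based on the CardService code above, not the final solution):
public List<Card> processRequest(int size, BigDecimal value) {
    List<Card> card_query = em.createNativeQuery("{call cards_available(?,?,?)}", Card.class)
            .setParameter(1, false)
            .setParameter(2, value)
            .setParameter(3, size).getResultList();

    List<String> ids = new ArrayList<>();
    card_query.forEach(card -> ids.add(card.getId_card()));

    // The bulk update goes straight to the database and bypasses the persistence context,
    // so the Card entities loaded above still hold the old 'available' value.
    em.createNativeQuery("UPDATE card SET available=true WHERE id_card IN :ids")
            .setParameter("ids", ids)
            .executeUpdate();

    // Re-read each row so the managed entities reflect the UPDATE;
    // this extra round trip is where the performance cost came from.
    card_query.forEach(em::refresh);

    return card_query;
}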

Mockito given().willReturn() returns sporadic result

I am testing simple logic using mockito-all 1.10.19 and spring-boot-starter-parent 2.0.4.RELEASE. I have a service that determines whether the uploaded file contains duplicate store codes. If it does, an IllegalArgumentException is thrown:
public class SomeService {
private final CutoffRepository cutoffRepository;
private final Parser<Cutoff> cutoffParser;
public void saveCutoff(MultipartFile file) throws IOException {
List<Cutoff> cutoffList = cutoffParser.parse(file.getInputStream());
boolean duplicateStoreFlag = cutoffList
.stream()
.collect(Collectors
.groupingBy(Cutoff::getStoreCode, Collectors.counting()))
.values()
.stream()
.anyMatch(quantity -> quantity > 1);
if (duplicateStoreFlag) {
throw new IllegalArgumentException("There are more than one line corresponding to the same store");
}
//Some saving logic is here
}
}
I mock cutoffParser.parse() so that it returns an ArrayList<Cutoff> with two elements in it:
@RunWith(SpringRunner.class)
@SpringBootTest
public class SomeServiceTest {

    @Mock
    private CutoffRepository cutoffRepository;

    @Mock
    private Parser<Cutoff> cutoffParser;

    @InjectMocks
    private SomeService someService;

    @Test(expected = IllegalArgumentException.class)
    public void saveCutoffCurruptedTest() throws Exception {
        Cutoff cutoff1 = new Cutoff();
        cutoff1.setStoreCode(1);
        Cutoff cutoff2 = new Cutoff();
        // corruption is here: the same storeCode
        cutoff2.setStoreCode(1);
        List<Cutoff> cutoffList = new ArrayList<>();
        cutoffList.add(cutoff1);
        cutoffList.add(cutoff2);
        MockMultipartFile mockMultipartFile = new MockMultipartFile("file.csv", "file".getBytes());
        // here is where I expect to mock the response with the list
        given(cutoffParser.parse(any())).willReturn(cutoffList);
        someService.saveCutoff(mockMultipartFile);
    }
}
But the behavior I encounter is sporadic. The test passes only some of the time. During debugging I sometimes get a list of size 2 and sometimes a list of size 0. What is the reason for such unpredictable behavior?
I am definitely missing something. Any help is highly appreciated.
P.S. The situation is the same when running from IntelliJ IDEA and from the Ubuntu terminal.
Supposedly, the reason is the one pointed out here: https://github.com/mockito/mockito/issues/1066. The combination of @InjectMocks and @Mock can cause the test to fail occasionally.
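If the Spring context is not actually needed in this test, one way to side-step that interaction is to let Mockito itself initialize the mocks. A sketch of the same test with MockitoJUnitRunner, assuming the classes from the question:
@RunWith(MockitoJUnitRunner.class)
public class SomeServiceTest {

    @Mock
    private CutoffRepository cutoffRepository;

    @Mock
    private Parser<Cutoff> cutoffParser;

    @InjectMocks
    private SomeService someService;

    @Test(expected = IllegalArgumentException.class)
    public void saveCutoffCorruptedTest() throws Exception {
        Cutoff cutoff1 = new Cutoff();
        cutoff1.setStoreCode(1);
        Cutoff cutoff2 = new Cutoff();
        cutoff2.setStoreCode(1); // duplicate store code on purpose

        // Mocks are created and injected by MockitoJUnitRunner, so the stubbing is always in place
        given(cutoffParser.parse(any())).willReturn(Arrays.asList(cutoff1, cutoff2));

        someService.saveCutoff(new MockMultipartFile("file.csv", "file".getBytes()));
    }
}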
