Performance issue with writing serializable objects to disk

I have an app that works with lists of data. The data is held in three classes, shown below. My problem is that reading and writing the data to disk takes too long for the app to be useful past about 1,000 entries. An example would be 1,000 flashcards: the file on disk is about 200K and takes about six seconds to load, an impressive 35K a second. I was hoping to support tens of thousands of entries, but clearly users' attention would time out during a minute-long wait. A surprisingly large part of the time is spent after the data has been read and loaded into a LinearLayout (inside a ScrollView), before the screen refreshes; this accounts for about three of the six seconds. I've been looking at alternatives such as Parcelable (can't write to disk), Kryo and others, but the benchmarks are not that impressive. If anyone can offer me advice or guidance I would really appreciate it. The code works fine, it's just really slow. Here are the data structures of the classes and the code I use to write and read.
Thanks,
Chris
public class iList extends Activity implements Serializable {
String listName;
int lastPosition;
long dateSaved;
String listPath;
List<iSet> listLayout = new ArrayList<iSet>();
List<String> operationsHistory = new ArrayList<String>();
List<iSet> setList = new ArrayList<iSet>();
}
public class iSet extends Activity implements Serializable {
private static final long serialVersionUID = 2L;
String setName;
List<Objecti> setObjectis = new ArrayList<Objecti>();
List<iList> setLists = new ArrayList<iList>();
boolean hasList = false;
int listCount = 0;
}
public class Objecti extends Activity implements Serializable {
private static final long serialVersionUID = 2L;
private String objectName;
private int objectType;
private String stringValue;
private Date dateValue;
private float floatValue;
private int integerValue;
private ImageView image;
}
public iList readList(String pathName) {
iList list = new iList();
ObjectInputStream ois = null;
try {
FileInputStream fis = new FileInputStream(pathName);
if (fis.available() > 0) {
ois = new ObjectInputStream(fis);
list = (iList) ois.readObject();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
if (ois != null) {
try {
ois.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return list;
}
public void save(String filePath) {
try {
FileOutputStream fout = new FileOutputStream(filePath);
ObjectOutputStream oos = new ObjectOutputStream(fout);
oos.writeObject(this);
oos.flush();
oos.close();
} catch (IOException e) {
e.printStackTrace();
}
}

Looking at the class structure, you may want to consider the following:
1. Reduce the number of data members being serialized. You can do this by improving the design.
2. Below are the association relations I could infer:
iSet = { *Objecti, *iList }
iList = { *iSet }
I could not make out why this kind of association is required. It looks like a cycle, which may result in duplicate data being serialized. With a better design you can serialize only the minimum data required and re-compute the rest after deserialization is done, which will shorten each I/O operation. A sketch of that idea follows.
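To illustrate (this is a hedged sketch, not the poster's actual classes; SlimList, its fields and rebuildDerivedState() are made up for the example), fields that can be recomputed are marked transient so serialization skips them, and they are rebuilt in readObject():

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SlimList implements Serializable {
    private static final long serialVersionUID = 1L;

    // Small state that really has to be written to disk
    String listName;
    List<String> operationsHistory = new ArrayList<String>();

    // Derived view state: excluded from serialization, rebuilt on load
    transient List<String> listLayout;

    // Called by ObjectInputStream after the non-transient fields are restored
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        rebuildDerivedState();
    }

    private void rebuildDerivedState() {
        // Hypothetical recomputation of the derived field
        listLayout = new ArrayList<String>(operationsHistory);
    }
}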

After much research, the answer I've come up with is that you can't make ObjectInputStream/ObjectOutputStream any faster without doing the work yourself. Because of the structure of my data, one iList will have many of one type of iSet, which has multiple objects with the data residing in the object. So I write all of my objects to a .txt file, and when I need to restore the object I read them from the .txt and put the list back together again. The results are extreme, ranging from 15 to 20 times faster, which more than meets my requirements. A rough sketch of the approach is below.
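A minimal sketch of that flat-file idea, assuming a pipe-delimited line format and purely hypothetical fields (a real app would write its own card fields and escape the delimiter):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FlatFileStore {

    // One line per card: name|type|stringValue (hypothetical fields)
    public static void save(List<String[]> cards, String path) throws IOException {
        BufferedWriter w = new BufferedWriter(new FileWriter(path));
        try {
            for (String[] card : cards) {
                w.write(card[0] + "|" + card[1] + "|" + card[2]);
                w.newLine();
            }
        } finally {
            w.close();
        }
    }

    public static List<String[]> load(String path) throws IOException {
        List<String[]> cards = new ArrayList<String[]>();
        BufferedReader r = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                cards.add(line.split("\\|"));
            }
        } finally {
            r.close();
        }
        return cards;
    }
}

The buffered streams and the simple line format avoid most of the per-object overhead of Java serialization, which is where the speedup comes from in this sketch.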

Related

Step hangs during the read() method

I have a job, and in one of the steps the reader needs to handle a list of roughly 6,000 objects. This is not a big volume of data, and it's not the first time I've worked with a volume of this size.
For some reason, the reader can hang for an hour or even more after each chunk in the read() function. I don't know what is wrong and have no idea how to debug this step.
@Autowired
private ModelMapper modelMapper;
@Value("${chunk.size}") // 500
private int chunkSize;
@javax.annotation.Resource
private Environment environment;
private ItemReader<Device> delegate;
public void setDelegate(ItemReader<Device> delegate) {
this.delegate = delegate;
}
@Override
public void beforeStep(StepExecution stepExecution) {
List<AccountDeviceDto> outboundList = (List<AccountDeviceDto>) stepExecution.getJobExecution().getExecutionContext().get("outboundThatMightSwappedList");
List<DeviceDto> accountList = outboundList.stream().map(dto -> dto.getDevice()).collect(Collectors.toList());
Type listType = new TypeToken<List<Device>>() {}.getType();
List<Device> mapList = modelMapper.map(accountList, listType);
setDelegate(new IteratorItemReader<Device>(mapList));
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
logger.info("** start afterStep **");
return ExitStatus.COMPLETED;
}
@Override
public Device read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
return delegate.read();
}
@Data
@Entity
@Table(name = "Device")
public class Device {
@Id
@Column(name = "MacAddress")
private String macAddress;
// ... other columns omitted ...
@Column(name = "LastModifiedDate")
private LocalDate lastModifiedDate;
@OneToMany(mappedBy = "device")
private List<AccountDevice> accountDeviceList;
}
Any idea how this can be debugged to figure out where the bottleneck is?
Thank you.
The first thing you need to do is find out which line of code it hangs at, and then troubleshoot from there.
First, find out the process ID of your Spring Batch app with:
jps -v
Assuming the process ID is 1234, then use jstack to print out the stack traces of the Java threads for this process. You can then see which lines of code the relevant threads are hanging at:
jstack 1234
You seem to be putting the list of items in the execution context in a previous step and then reading them from the execution context with a delegate IteratorItemReader.
The execution context is persisted between steps, and what's happening is that your big items list is taking time to be deserialized before even starting to read the first item. I'm pretty sure this is the cause of your issue, but you can debug/time the beforeStep method to confirm that.
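For instance, a minimal sketch of that timing, reusing the beforeStep from the question (only the log lines are new; logger is assumed to be the SLF4J logger already used in afterStep):

@Override
public void beforeStep(StepExecution stepExecution) {
    long start = System.currentTimeMillis();
    // Same code as in the question, just timed
    List<AccountDeviceDto> outboundList = (List<AccountDeviceDto>) stepExecution
            .getJobExecution().getExecutionContext().get("outboundThatMightSwappedList");
    List<DeviceDto> accountList = outboundList.stream()
            .map(dto -> dto.getDevice()).collect(Collectors.toList());
    Type listType = new TypeToken<List<Device>>() {}.getType();
    List<Device> mapList = modelMapper.map(accountList, listType);
    setDelegate(new IteratorItemReader<Device>(mapList));
    logger.info("beforeStep prepared {} devices in {} ms",
            mapList.size(), System.currentTimeMillis() - start);
}

If most of the step's wall-clock time shows up here, the execution-context size is the thing to attack.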

How can I see the current output of a running Storm topology?

Currently learning how to use Storm (version 2.1.0), I am a bit confused about a specific aspect of this data stream processing (DSP) engine: how is output data handled? Tutorials provide good explanations of system setup and running a first application. Unfortunately, I didn't find a page providing details on the results generated by a topology.
With DSP applications, there is no final output because the input is a continuously incoming stream of data (or maybe we can say there is a final output when the application is stopped). What I would like is to be able to see the current output (the actual output data generated at the current time) of a running topology.
I'm able to run WordCountTopology. I understand the output of this topology is generated by the following snippet of code:
public static class WordCount extends BaseBasicBolt {
Map<String, Integer> counts = new HashMap<String, Integer>();
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
String word = tuple.getString(0);
Integer count = counts.get(word);
if (count == null) {
count = 0;
}
count++;
counts.put(word, count);
collector.emit(new Values(word, count));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
My confusion is about the location of the <"word":string, "count":int> output. Is it only in memory, written to a database somewhere, or written to a file?
Going further with this question: what are the existing possibilities for storing in-progress output data? What is the "good way" of handling such data?
I hope my question is not too naive. And thanks to the StackOverflow community for always providing good help.
A few days have passed since I posted this question. I am back to share with you what I have tried. Although I cannot tell if it is the right way of doing it, the two following approaches answer my question.
Simple System.out.println()
The first thing I tried was a System.out.println("Hello World!") directly within the prepare() method of my BaseBasicBolt. This method is called only once, at the beginning of each bolt's thread execution.
public void prepare(Map topoConf, TopologyContext context) {
System.out.println("Hello World!");
}
The big challenge was to figure out where the log is written. By default, it is written within <storm installation folder>/logs/workers-artifacts/<topology name>/<worker-port>/worker.log where <worker-port> is the port of a requested worker/slot.
For instance, with conf.setNumWorkers(3), the topology requests access to 3 workers (3 slots). Therefore, the values of <worker-port> will be 6700, 6701 and 6702. Those values are the port numbers of the 3 slots (defined in storm.yaml under supervisor.slots.ports).
Note: you will have as many "Hello World!" lines as the parallelism of your BaseBasicBolt. For instance, when the split bolt is instantiated with builder.setBolt("split", new SplitSentence(), 8), there are 8 parallel threads, each one writing its own log.
Writing to a file
For research purposes I have to analyse large amounts of logs, and I need them in a specific format. The solution I found is to append the logs to a specific file managed by each bolt.
Hereafter is my own implementation of this file logging solution for the count bolt.
public static class WordCount extends BaseBasicBolt {
private String workerName;
private FileWriter fw;
private BufferedWriter bw;
private PrintWriter out;
private String logFile = "/var/log/storm/count.log";
private Map<String, Integer> counts = new HashMap<String, Integer>();
public void prepare(Map topoConf, TopologyContext context) {
this.workerName = this.toString();
try {
this.fw = new FileWriter(logFile, true);
this.bw = new BufferedWriter(fw);
this.out = new PrintWriter(bw);
} catch (Exception e) {
System.out.println(e);
}
}
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
String word = tuple.getString(0);
Integer count = counts.get(word);
if (count == null) {
count = 0;
}
count++;
counts.put(word, count);
collector.emit(new Values(word, count));
out.println(this.workerName + ": Hello World!");
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
In this code, my log file is located at /var/log/storm/count.log and calling out.println(text) appends the text at the end of this file. I am not sure whether this is thread-safe; all parallel threads writing to the same file at the same time might result in data loss (one way to avoid this is sketched after the note below).
Note: if your bolts are distributed across multiple machines, each machine is going to have its own log file. During my testing, I configured a simple cluster with one machine (running Nimbus + Supervisor + UI), therefore I had only one log file.
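One hedged way around the concurrent-write concern is to give each bolt instance its own file, for example by deriving the name from the task id in prepare() (the path pattern below is my own choice, not anything Storm prescribes):

public void prepare(Map topoConf, TopologyContext context) {
    // One file per bolt task, e.g. /var/log/storm/count-7.log, so parallel
    // instances never append to the same file
    String logFile = "/var/log/storm/count-" + context.getThisTaskId() + ".log";
    try {
        this.out = new PrintWriter(new BufferedWriter(new FileWriter(logFile, true)));
    } catch (Exception e) {
        System.out.println(e);
    }
}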
Conclusion
There are multiple ways to deal with output data and, more generally, with logging anything in Storm. I didn't find any official way of doing it, and the documentation is very light on this subject.
While some of us would be satisfied with a simple System.out.println(), others might need to push large quantities of data into specific files, or maybe into a specialized database engine. Anything you can do with Java is possible with Storm, because it's plain Java programming.
Any advice and additional comments to complete this answer will be gladly appreciated.

Spring Transactional method not working properly (not saving to DB)

I have spent day after day trying to find a solution for my problem with transactional methods. The logic is like this:
The controller receives the request and calls queueService, which puts it in a PriorityBlockingQueue; another thread processes the data (finds cards, updates status, assigns them to the current game, returns data).
Controller:
@RequestMapping("/queue")
public DeferredResult<List<Card>> queueRequest(@Params...){
queueService.put(result, size, terminal, time);
result.onCompletion(() -> assignmentService.assignCards(result, game, room, cliente));
}
QueueService:
@Service
public class QueueService {
private BlockingQueue<RequestQueue> queue = new PriorityBlockingQueue<>();
@Autowired
GameRepository gameRepository;
@Autowired
TerminalRepository terminalRepository;
@Autowired
RoomRepository roomRepository;
private long requestId = 0;
public void put(DeferredResult<List<Card>> result, int size, String client, LocalDateTime time_order){
requestId++;
// omitted code (find entities: game, terminal, room)
try {
RequestQueue request = new RequestQueue(requestId, size, terminal, time_order, result);
queue.put(request);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
CardService:
@Transactional
public class CardService {
@Autowired
EntityManager em;
@Autowired
CardRepository cardRepository;
@Autowired
AsignService asignacionService;
public List<Card> processRequest(int size, BigDecimal value)
{
List<Card> carton_query = em.createNativeQuery("{call cards_available(?,?,?)}",
Card.class)
.setParameter(1, false)
.setParameter(2, value)
.setParameter(3, size).getResultList();
List<String> ids = new ArrayList<String>();
carton_query.forEach(action -> ids.add(action.getId_card()));
String update_query = "UPDATE card SET available=true WHERE id_card IN :ids";
em.createNativeQuery(update_query).setParameter("ids", ids).executeUpdate();
return carton_query;
}
}
QueueExecutor (Consumer)
@Component
public class QueueExecute {
@Autowired
QueueService queueRequest;
@Autowired
AsignService asignService;
@Autowired
CardService cardService;
@PostConstruct
public void init(){
new Thread(this::execute).start();
}
private void execute(){
while (true){
try {
RequestQueue request = queueRequest.take();
if (request != null) {
List<Card> cards = cardService.processRequest(request.getSize(), new BigDecimal("1.0"));
request.getCards().setResult((ArrayList<Card>) cards);
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
AssignService:
@Transactional
public void assignCards(DeferredResult<List<Card>> cards, Game game, Room room, Terminal terminal)
{
game = em.merge(game);
room = em.merge(room);
terminal = em.merge(terminal);
Order order = new Order();
LocalDateTime datetime = LocalDateTime.now();
BigDecimal total = new BigDecimal("0.0");
order.setTime(datetime);
order.setRoom(room);
order.setGame(game);
order.setId_terminal(terminal);
for (Card card : (List<Card>) cards.getResult()) {
card = em.merge(card);
System.out.println("CARD STATUS " + card.getStatus());
// This shows the OLD value of the Card (not updated)
card.setOrder(order);
order.getOrder().add(card);
}
game.setOrder(order);
//gameRepository.save(game)
}
With this code, the new Card status is not saved to the DB, but Game, Terminal and Room are saved OK (more or less...). If I remove the assignService, CardService saves the new status to the DB correctly.
I have tried flushing manually, saving with the repository and so on, but the result is almost the same. Could anybody help me?
I think I found a solution (probably not the optimal one), but it's more related to the logic of my program.
One of the main problems was the update of the Card status property, because it was not reflected on the entity object. When the assignOrder method is called it receives the old Card value, because (as far as I know) it's not possible to share information between threads/transactions like this. This is normal within transactions, because em.executeUpdate() only commits to the database; if I want the updated entity I need to refresh it with em.refresh(entity), but that caused performance to drop.
In the end I changed the logic: first create the Orders (transactional) and then assign the cards to the orders (transactional). This way it works correctly.
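For reference, the em.refresh() variant mentioned above would look roughly like this inside the assignCards loop (a sketch; it fixes the stale value but issues an extra SELECT per card, which is the performance cost noted above):

for (Card card : (List<Card>) cards.getResult()) {
    card = em.merge(card);   // reattach the detached instance
    em.refresh(card);        // re-read the row, picking up the bulk UPDATE from CardService
    card.setOrder(order);
    order.getOrder().add(card);
}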

Long-running AEM EventListener working inconsistently - blacklisted?

As always, AEM has brought new challenges to my life. This time, I'm experiencing an issue where an EventListener that listens for ReplicationEvents works sometimes, usually just the first few times after the service is restarted. After that, it stops running entirely.
The first line of the listener is a log line, so if it were running, that would be clear. Here's a simplified example of the listener:
@Component(immediate = true, metatype = false)
@Service(value = EventHandler.class)
@Property(
name="event.topics", value = ReplicationEvent.EVENT_TOPIC
)
public class MyActivityReplicationListener implements EventHandler {
@Reference
private SlingRepository repository;
@Reference
private OnboardingInterface onboardingService;
@Reference
private QueryInterface queryInterface;
private Logger log = LoggerFactory.getLogger(this.getClass());
private Session session;
@Override
public void handleEvent(Event ev) {
log.info(String.format("Starting %s", this.getClass()));
// Business logic
log.info(String.format("Finished %s", this.getClass()));
}
}
Now before you panic that I haven't included the business logic, see my answer below. The main point of interest is that the business logic could take a few seconds.
While crawling through the second page of Google search results looking for an answer, I came across this article, a German article explaining that EventListeners that take more than 5 seconds to finish are sort of silently quarantined by AEM, with no output.
It just so happens that this task might take longer than 5 seconds, as it's working off data that was originally quite small, but has grown (and this is in line with other symptoms).
I put a change in that makes the listener much more like the one in that article - that is, it uses an EventConsumer to asynchronously process the ReplicationEvent using a pub/sub model. Here's a simplified version of the new model (for AEM 6.3):
@Component(immediate = true, property = {
EventConstants.EVENT_TOPIC + "=" + ReplicationEvent.EVENT_TOPIC,
JobConsumer.PROPERTY_TOPICS + "=" + AsyncReplicationListener.JOB_TOPIC
})
public class AsyncReplicationListener implements EventHandler, JobConsumer {
private static final String PROPERTY_EVENT = "event";
static final String JOB_TOPIC = ReplicationEvent.EVENT_TOPIC;
@Reference
private JobManager jobManager;
@Override
public JobConsumer.JobResult process (Job job) {
try {
ReplicationEvent event = (ReplicationEvent)job.getProperty(PROPERTY_EVENT);
// Slow business logic (>5 seconds)
} catch (Exception e) {
return JobResult.FAILED;
}
return JobResult.OK ;
}
@Override
public void handleEvent(Event event) {
final Map <String, Object> payload = new HashMap<>();
payload.put(PROPERTY_EVENT, ReplicationEvent.fromEvent(event));
final Job addJobResult = jobManager.addJob(JOB_TOPIC , payload);
}
}
You can see here that the EventListener passes off the ReplicationEvent wrapped up in a Job, which is then handled by the JobConsumer, which, according to this magic article, is not subject to the 5-second rule.
Here is some official documentation on this time limit. Once I had the "5 seconds" key, I was able to find a bit more information, here and here, which talk about the 5-second limit as well. The first article uses a similar method to the one above, and the second article shows a way to turn off these time limits.
The time limits can be disabled entirely (or increased) in the configMgr by setting the Timeout property to zero in the Apache Felix Event Admin Implementation configuration.
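As a sketch of what that configuration could look like as code rather than a ConfigMgr edit (the PID org.apache.felix.eventadmin.impl.EventAdmin and the property name below are my assumption of the standard Felix Event Admin ones; verify them against your AEM version before relying on this), a .config file under an /apps .../config folder would contain a single line setting the timeout to zero:

org.apache.felix.eventadmin.Timeout=I"0"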

Gson: How do I deserialize an inner JSON object to a map if the property name is not fixed?

My client retrieves JSON content as below:
{
"table": "tablename",
"update": 1495104575669,
"rows": [
{"column5": 11, "column6": "yyy"},
{"column3": 22, "column4": "zzz"}
]
}
In the rows array, the keys are not fixed. I want to retrieve the keys and values and save them into a Map using Gson 2.8.x.
How can I configure Gson to deserialize this?
Here is my idea:
public class Dataset {
private String table;
private long update;
private List<Rows> lists; // <-- a little confused here,
// or private List<HashMap<String, Object>> lists
// setters/getters
}
public class Rows {
private HashMap<String, Object> map;
....
}
Dataset k = gson.fromJson(jsonStr, Dataset.class);
log.info(k.getRows().size()); // <-- I got two null objects
Thanks.
Gson does not support such a thing out of the box. It would be nice if you could make the property name fixed; if not, you have a few options that would probably help you:
1. Just rename the Dataset.lists field to Dataset.rows, if the property name is fixed as rows.
2. If the possible name set is known in advance, tell Gson to pick up alternative names using @SerializedName.
3. If the possible name set is really unknown and may change in the future, you might want to make it fully dynamic using a custom TypeAdapter (streaming mode; requires less memory, but is harder to use) or a custom JsonDeserializer (object mode; requires more memory to store intermediate tree views, but is easy to use) registered with GsonBuilder.
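For option #1, a minimal sketch of the fixed mapping (keeping Map values as in the question) could be:

public final class Dataset {
    private String table;
    private long update;
    private List<Map<String, Object>> rows; // field name now matches the JSON property
    // getters/setters omitted
}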
For option #2, you can simply list the alternative names:
@SerializedName(value = "lists", alternate = "rows")
final List<Map<String, Object>> lists;
For option #3, bind a downstream List<Map<String, Object>> type adapter and try to detect the name dynamically. Note that I omit a deserialization strategy for the Rows class for simplicity (and I believe you might want to drop the Rows class in favor of a simple Map<String, Object>). Another note: declare the field as Map rather than a specific implementation; hash maps are unordered, but telling Gson you are dealing with a Map lets it pick an ordered map such as LinkedTreeMap (a Gson internal) or LinkedHashMap, which may matter for datasets.
// Type tokens are immutable and can be declared constants
private static final TypeToken<String> stringTypeToken = new TypeToken<String>() {
};
private static final TypeToken<Long> longTypeToken = new TypeToken<Long>() {
};
private static final TypeToken<List<Map<String, Object>>> stringToObjectMapListTypeToken = new TypeToken<List<Map<String, Object>>>() {
};
private static final Gson gson = new GsonBuilder()
.registerTypeAdapterFactory(new TypeAdapterFactory() {
@Override
public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
if ( typeToken.getRawType() != Dataset.class ) {
return null;
}
// If the actual type token represents the Dataset class, then pick the bunch of downstream type adapters
final TypeAdapter<String> stringTypeAdapter = gson.getDelegateAdapter(this, stringTypeToken);
final TypeAdapter<Long> primitiveLongTypeAdapter = gson.getDelegateAdapter(this, longTypeToken);
final TypeAdapter<List<Map<String, Object>>> stringToObjectMapListTypeAdapter = gson.getDelegateAdapter(this, stringToObjectMapListTypeToken);
// And compose the bunch into a single dataset type adapter
final TypeAdapter<Dataset> datasetTypeAdapter = new TypeAdapter<Dataset>() {
@Override
public void write(final JsonWriter out, final Dataset dataset) {
// Omitted for brevity
throw new UnsupportedOperationException();
}
@Override
public Dataset read(final JsonReader in)
throws IOException {
in.beginObject();
String table = null;
long update = 0;
List<Map<String, Object>> lists = null;
while ( in.hasNext() ) {
final String name = in.nextName();
switch ( name ) {
case "table":
table = stringTypeAdapter.read(in);
break;
case "update":
update = primitiveLongTypeAdapter.read(in);
break;
default:
lists = stringToObjectMapListTypeAdapter.read(in);
break;
}
}
in.endObject();
return new Dataset(table, update, lists);
}
}.nullSafe(); // Making the type adapter null-safe
@SuppressWarnings("unchecked")
final TypeAdapter<T> typeAdapter = (TypeAdapter<T>) datasetTypeAdapter;
return typeAdapter;
}
})
.create();
final Dataset dataset = gson.fromJson(jsonReader, Dataset.class);
System.out.println(dataset.lists);
The code above would then print:
[{column5=11.0, column6=yyy}, {column3=22.0, column4=zzz}]
