I have a bunch of threads, each creating an org.apache.qpid.client.AMQConnection and then a session.
public void run() {
    try {
        Connection connection = new AMQConnection(
                "amqp://*******:*****@clientid/test?brokerlist='tcp://********:****?sasl_mechs='ANONYMOUS''");
        connection.start();
        Session ssn = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        System.out.println(ssn.toString());
        ssn.close();
        connection.close();
    } catch (Exception e) { // run() cannot throw the checked JMS exceptions
        e.printStackTrace();
    }
}
On some runs, I get the same Session.hashCode() in two different threads like so:
org.apache.qpid.client.AMQSession_0_10@420e44
org.apache.qpid.client.AMQSession_0_10@d76237
org.apache.qpid.client.AMQSession_0_10@d76237
org.apache.qpid.client.AMQSession_0_10@7148e9
Now, I understand that hashCode() is not guaranteed to be unique, so how can I prove or disprove that createSession() returns the same Session object on two separate threads?
It turned out to be more of a Java object identity question than anything to do with Qpid or messaging.
Instead of printing hash codes, I inserted the Session objects themselves into a Vector<Session> and compared them with ==. It turns out they were all unique across all threads.
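For reference, a minimal sketch of such an identity check (the class and method names here are illustrative, not from my actual code). An IdentityHashMap compares keys with == rather than equals()/hashCode(), so two list entries collapse into one key only if they are the very same object:

import java.util.IdentityHashMap;
import java.util.List;
import java.util.Vector;
import javax.jms.Session;

public class SessionIdentityCheck {
    // Shared, thread-safe list; each worker thread adds its Session in run().
    static final List<Session> SESSIONS = new Vector<>();

    // Call after all worker threads have been joined.
    static void verifyDistinct() {
        IdentityHashMap<Session, Boolean> byIdentity = new IdentityHashMap<>();
        for (Session s : SESSIONS) {
            byIdentity.put(s, Boolean.TRUE);
        }
        System.out.println(SESSIONS.size() == byIdentity.size()
                ? "All Session objects are distinct instances"
                : "At least two threads share one Session instance");
    }
}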
I am trying to schedule my consumption process from a single-partition topic. I can start it using endpointlistenerregistry.start(), but I want to stop it after I have consumed all the messages currently in the partition, i.e. when I reach the last offset in the partition. Production into the topic happens only after consumption has finished and the consumer is closed. How can I get the assurance that I have read all the messages that were there when the scheduler started, and then stop my consumer? I am using @KafkaListener for the consumer.
Set the idleEventInterval container property and add an @EventListener method to listen for ListenerContainerIdleEvents.
Then stop the container.
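A minimal sketch of that wiring (the listener id, topic, and bean names are illustrative):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.event.EventListener;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.event.ListenerContainerIdleEvent;
import org.springframework.stereotype.Component;

@Component
public class IdleStoppingConsumer {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    // The container factory must publish idle events, e.g.
    // factory.getContainerProperties().setIdleEventInterval(5000L);
    @KafkaListener(id = "myListener", topics = "my-topic")
    public void consume(String message) {
        // processing logic
    }

    @EventListener
    public void onIdle(ListenerContainerIdleEvent event) {
        if ("myListener".equals(event.getListenerId())) {
            // No records arrived for idleEventInterval ms, so we have
            // caught up to the last offset; stop the container.
            registry.getListenerContainer("myListener").stop();
        }
    }
}

The container can be started again on the next schedule with registry.getListenerContainer("myListener").start().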
To read up to the last offset, you simply poll until you get empty records back.
You can invoke kafkaConsumer.pause() at the end of consumption; on the next scheduled run, invoke kafkaConsumer.resume() first.
Suspend fetching from the requested partitions. Future calls to poll(Duration) will not return any records from these partitions until they have been resumed using resume(Collection). Note that this method does not affect partition subscription. In particular, it does not cause a group rebalance when automatic assignment is used.
Something like this:
List<TopicPartition> topicPartitions = new ArrayList<>();

void scheduleProcess() {
    topicPartitions = ...; // assign the partition info for this consumer
    kafkaConsumer.resume(topicPartitions); // undo the pause() from the previous run
    while (true) {
        ConsumerRecords<String, Object> events = kafkaConsumer.poll(Duration.ofMillis(1000));
        if (!events.isEmpty()) {
            // processing logic
        } else {
            // an empty poll means we have caught up; pause until the next schedule
            kafkaConsumer.pause(topicPartitions);
            break;
        }
    }
}
In Spring Batch, how do I loop the reader, processor and writer N times?
My requirement is:
I have "N" number of customers/clients.
For each customer/client, I need to fetch the records from the database (Reader), then process (Processor) all records for that customer/client, and then write the records into a file (Writer).
How can I loop the Spring Batch job N times?
AFAIK I'm afraid there's no framework support for this scenario, at least not the way you want to solve it.
I'd suggest solving the problem differently:
Option 1
Read/Process/Write all records from all customers at once. You can only do this if they are all in the same DB. I would not recommend it otherwise, because you'll have to configure JTA/XA transactions and it's not worth the trouble.
Option 2
Run your job once for each client (the best option in my opinion). Save the necessary info for each client in different properties files (DB connection data, values to filter records by client, whatever other client-specific data you may need) and pass a param to the job telling it which client to use. This way you can control which client is processed, and when, using bash files and/or cron. If you use Spring Boot + Spring Batch you can store the client configuration in profiles (application-clientX.properties) and run the process like:
$> java -Dspring.profiles.active="clientX" \
-jar "yourBatch-1.0.0-SNAPSHOT.jar" \
-next
Bonus - Option 3
If none of the above fits your needs or you insist on solving the problem the way you presented it, then you can dynamically configure the job depending on parameters, creating one step for each client using Java config:
@Bean
public Job job() {
    JobBuilder jb = jobBuilders.get("job");
    for (Client c : clientsToProcess) {
        jb.flow(buildStepByClient(c));
    }
    return jb.build();
}
Again, I strongly advise you not to go this way: it's ugly, it's against the framework's philosophy, it's hard to maintain and debug, and you'll probably have to use JTA/XA here as well, ...
I hope I've been of some help!
Local Partitioning will solve your problem.
In your partitioner, you will put all of your client IDs in a map, as shown below (just pseudo code):
public class PartitionByClient implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> result = new HashMap<>();
        int partitionNumber = 1;
        for (String client : allClients) {
            ExecutionContext value = new ExecutionContext();
            value.putString("client", client);
            result.put("Client [" + client + "] : THREAD " + partitionNumber, value);
            partitionNumber++;
        }
        return result;
    }
}
This is just pseudo code. You should look at the detailed documentation of partitioning.
You will have to mark your reader, processor and writer with @StepScope (i.e. whichever part needs the value of your client). The reader will use this client in the WHERE clause of its SQL. You will use @Value("#{stepExecutionContext['client']}") String client in the reader definition to inject this value, as sketched below.
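A rough sketch of such a step-scoped reader (the bean wiring, table name, and ClientRecord type are my own assumptions):

@Bean
@StepScope
public JdbcCursorItemReader<ClientRecord> reader(
        @Value("#{stepExecutionContext['client']}") String client,
        DataSource dataSource) {
    JdbcCursorItemReader<ClientRecord> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(dataSource);
    // Each partition reads only the rows belonging to its own client.
    reader.setSql("SELECT * FROM records WHERE client_id = ?");
    reader.setPreparedStatementSetter(ps -> ps.setString(1, client));
    reader.setRowMapper(new BeanPropertyRowMapper<>(ClientRecord.class));
    return reader;
}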
Now the final piece: you will need a task executor, and a number of clients equal to concurrencyLimit will start in parallel, provided you set this task executor in your master partitioner step configuration, sketched below.
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor simpleTaskExecutor = new SimpleAsyncTaskExecutor();
    simpleTaskExecutor.setConcurrencyLimit(concurrencyLimit);
    return simpleTaskExecutor;
}
concurrencyLimit will be 1 if you wish to run only one client at a time.
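For completeness, a rough sketch of the master step wiring (bean names and the slave step are my own assumptions):

@Bean
public Step masterStep(StepBuilderFactory stepBuilderFactory,
                       Step slaveStep,
                       TaskExecutor taskExecutor) {
    return stepBuilderFactory.get("masterStep")
            // each ExecutionContext from PartitionByClient is handed
            // to its own execution of the slave step
            .partitioner("slaveStep", new PartitionByClient())
            .step(slaveStep)
            .taskExecutor(taskExecutor)
            .build();
}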
I want to read 10000 messages from WebSphere MQ in groups, in sequential order. I am using the code below to do this, but it is taking a long time to read all the messages. I even tried multi-threading, but sometimes two threads consume the same group and a race condition occurs. Below is the code snippet.
I am trying to use 3 threads to read 10000 messages from MQ sequentially, but two of my threads access the same group at the same time. How can I avoid this? What is the best way to read a large volume of messages sequentially? My requirement is to read 10000 messages sequentially. Please help.
MQConnectionFactory factory = new MQConnectionFactory();
factory.setQueueManager("QM_host");
MQQueue destination = new MQQueue("default");
Connection connection = factory.createConnection();
connection.start();
Session session = connection.createSession(true, Session.AUTO_ACKNOWLEDGE);
MessageConsumer lastMessageConsumer =
        session.createConsumer(destination, "JMS_IBM_Last_Msg_In_Group=TRUE");
TextMessage lastMessage = (TextMessage) lastMessageConsumer.receiveNoWait();
lastMessageConsumer.close();
if (lastMessage != null) {
    int groupSize = lastMessage.getIntProperty("JMSXGroupSeq");
    String groupId = lastMessage.getStringProperty("JMSXGroupID");
    boolean failed = false;
    for (int i = 1; (i < groupSize) && !failed; i++) {
        MessageConsumer consumer = session.createConsumer(destination,
                "JMSXGroupID='" + groupId + "' AND JMSXGroupSeq=" + i);
        TextMessage message = (TextMessage) consumer.receiveNoWait();
        if (message != null) {
            System.out.println(message.getText());
        } else {
            failed = true;
        }
        consumer.close();
    }
    if (failed) {
        session.rollback();
    } else {
        System.out.println(lastMessage.getText());
        session.commit();
    }
}
connection.close();
I think a better way would be to have a coordinator thread in your application, which would listen for the last messages of groups and, for each one, start a new thread to get the messages belonging to the group assigned to that thread. (This would take care of the race conditions.)
Within the threads getting the messages belonging to a group, you don't need a for loop to fetch each message separately; instead you should take any message belonging to the group, while maintaining a group counter and buffering out-of-order messages. This is safe as long as you commit your session only after receiving and processing all messages of the group. (This yields more performance, as each group is processed by a separate thread, and that thread accesses every message in MQ only once.)
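A rough sketch of one such group-worker thread (groupId and groupSize are assumed to be handed over by the coordinator, and process() is a hypothetical message handler):

// One selector for the whole group; no per-sequence-number consumer churn.
MessageConsumer consumer = session.createConsumer(destination,
        "JMSXGroupID='" + groupId + "'");
Map<Integer, TextMessage> outOfOrder = new HashMap<>();
int expected = 1;
while (expected <= groupSize) {
    TextMessage m = outOfOrder.remove(expected);
    if (m == null) {
        m = (TextMessage) consumer.receive(5000);
        if (m == null) {
            break; // incomplete group; roll back below so it can be retried
        }
        int seq = m.getIntProperty("JMSXGroupSeq");
        if (seq != expected) {
            outOfOrder.put(seq, m); // buffer messages that arrive out of order
            continue;
        }
    }
    process(m); // application-specific handling
    expected++;
}
consumer.close();
if (expected > groupSize) {
    session.commit();   // whole group received and processed
} else {
    session.rollback(); // return the group's messages to the queue
}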
Please see IBM's documentation on sequential retrieval of messages. In case the page moves or is changed, I'll quote the most relevant part. For sequential processing to be guaranteed, the following conditions must be met:
All the put requests were done from the same application.
All the put requests were either from the same unit of work, or all the put requests were made outside of a unit of work.
The messages all have the same priority.
The messages all have the same persistence.
For remote queuing, the configuration is such that there can only be one path from the application making the put request, through its queue manager, through intercommunication, to the destination queue manager and the target queue.
The messages are not put to a dead-letter queue (for example, if a queue is temporarily full).
The application getting the message does not deliberately change the order of retrieval, for example by specifying a particular MsgId or CorrelId or by using message priorities.
Only one application is doing get operations to retrieve the messages from the destination queue. If there is more than one application, these applications must be designed to get all the messages in each sequence put by a sending application.
Though the page does not state this explicitly, when they say "one application" what is meant is a single thread of that one application. If an application has concurrent threads, the order of processing is not guaranteed.
Furthermore, reading 10,000 messages in a single unit of work as suggested in another response is not recommended as a means to preserve message order! Only do that if the 10,000 messages must succeed or fail as an atomic unit, which has nothing to do with whether they were received in order. In the event that large numbers of messages must be processed in a single unit of work it is absolutely necessary to tune the size of the log files, and quite possibly a few other parameters. Preserving sequence order is torture enough for any threaded async messaging transport without also introducing massive transactions that run for very long periods of time.
You can do what you want with the MQ classes for Java (non-JMS); it may be possible with the MQ classes for JMS, but it would be really tricky.
First read this page from the MQ Knowledge Center.
I converted the pseudo code (from the web page above) to MQ classes for Java and changed it from a browse to a destructive get.
Also, I prefer to process each group of messages under a syncpoint (assuming reasonably sized groups).
First off, you are missing several flags for the 'options' field of the GMO (GetMessageOptions), and the MatchOptions field needs to be set to MQMO_MATCH_MSG_SEQ_NUMBER, so that all threads always grab the first message of a group first, i.e. do not grab the 2nd message of a group before the 1st, as you described above.
MQGetMessageOptions gmo = new MQGetMessageOptions();
MQMessage rcvMsg = new MQMessage();

/* Get the first message in a group, or a message not in a group */
gmo.options = CMQC.MQGMO_COMPLETE_MSG | CMQC.MQGMO_LOGICAL_ORDER | CMQC.MQGMO_ALL_MSGS_AVAILABLE | CMQC.MQGMO_WAIT | CMQC.MQGMO_SYNCPOINT;
gmo.matchOptions = CMQC.MQMO_MATCH_MSG_SEQ_NUMBER;
rcvMsg.messageSequenceNumber = 1;
inQ.get(rcvMsg, gmo);

/* Examine first or only message */
...

gmo.options = CMQC.MQGMO_COMPLETE_MSG | CMQC.MQGMO_LOGICAL_ORDER | CMQC.MQGMO_SYNCPOINT;
while ((rcvMsg.messageFlags & CMQC.MQMF_MSG_IN_GROUP) == CMQC.MQMF_MSG_IN_GROUP)
{
    rcvMsg.clearMessage();
    inQ.get(rcvMsg, gmo);
    /* Examine each remaining message in the group */
    ...
}
qMgr.commit();
Why isn't the exception thrown? Is LINQ's Any() not considering the new entries?
MyContext db = new MyContext();

foreach (string email in new[] { "asdf@gmail.com", "asdf@gmail.com" })
{
    Person person = new Person();
    person.Email = email;

    if (db.Persons.Any(p => p.Email.Equals(email)))
    {
        throw new Exception("Email already used!");
    }

    db.Persons.Add(person);
}
db.SaveChanges();
Shouldn't the exception be thrown on the second iteration?
The previous code is adapted for the question, but the real scenario is the following:
I receive an Excel file of persons and I iterate over it, adding every row as a person to db.Persons, checking that their emails aren't already used in the db. The problem occurs when there are repeated emails in the worksheet itself (two rows with the same email).
Yes - queries (by design) are only computed against the data source. If you want to query in-memory items you can also query the Local store:
if (db.Persons.Any(p => p.Email.Equals(email)) ||
    db.Persons.Local.Any(p => p.Email.Equals(email)))
However - since YOU are in control of what's added to the store, wouldn't it make sense to check for duplicates in your code instead of in EF? Or is this just a contrived example?
Also, throwing an exception for an already existing item seems like a poor design as well - exceptions can be expensive, and if the client does not know to catch them (and in this case compare the message of the exception) they can cause the entire program to terminate unexpectedly.
A call to db.Persons will always trigger a database query, but those new Persons are not yet persisted to the database.
I imagine if you look at the data in debug, you'll see that the new person isn't there on the second iteration. If you were to set MyContext db = new MyContext() again, it would be, but you wouldn't do that in a real situation.
What is the actual use case you need to solve? This example doesn't seem like it would happen in a real situation.
If you're comparing against the db, your code should work. If you need to prevent duplicates from being entered, it should happen elsewhere - on the client, or by checking the C# collection before you start writing it to the db.
GORM works fine out of the box as long as no batch contains more than 10,000 objects. Without optimisation you will face OutOfMemory problems.
The common solution is to flush() and clear() the session every n (e.g. n=500) objects:
Session session = sessionFactory.currentSession
Transaction tx = session.beginTransaction()
def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

Date yesterday = new Date() - 1
Criteria c = session.createCriteria(Foo.class)
c.add(Restrictions.lt('lastUpdated', yesterday))
ScrollableResults rawObjects = c.scroll(ScrollMode.FORWARD_ONLY)

int count = 0
int batchSize = 500
while (rawObjects.next()) {
    def rawObject = rawObjects.get(0)
    fooService.doSomething()
    if (++count % batchSize == 0) {
        // flush a batch of updates and release memory:
        try {
            session.flush()
        } catch (Exception e) {
            log.error(session)
            log.error(" error: " + e.message)
            throw e
        }
        session.clear()
        propertyInstanceMap.get().clear()
    }
}
session.flush()
session.clear()
tx.commit()
But there are some problems I can't solve:
If I use currentSession, then the controller fails because the session is empty.
If I use sessionFactory.openSession(), then the currentSession is still used inside FooService. Of course I can use the session.save(object) notation, but this means I have to modify fooService.doSomething() and duplicate code for the single operation (the common Grails notation, fooObject.save()) and the batch operation (the session.save(fooObject) notation).
If I use Foo.withSession{session->} or Foo.withNewSession{session->}, then the objects of the Foo class are cleared by session.clear() as expected. All the other objects are not cleared, which leads to a memory leak.
Of course I can use evict(object) to manually clear the session, but it is nearly impossible to get all relevant objects, due to auto-fetching of associations.
So I have no idea how to solve my problems without making FooService.doSomething() more complex. I'm looking for something like withSession{} for all domains, or a way to save the session at the beginning (Session tmp = currentSession) and later do something like sessionFactory.setCurrentSession(tmp). Neither exists!
Any idea is welcome!
I would recommend using a StatelessSession for this kind of batch processing. See this post: Using StatelessSession for Batch processing.
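A minimal Java sketch of that approach, reusing Foo and the date filter from the question (exact API details may vary by Hibernate version). A StatelessSession keeps no first-level cache, so there is nothing to flush() or clear():

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
try {
    ScrollableResults results = session.createQuery(
            "from Foo f where f.lastUpdated < :yesterday")
            .setParameter("yesterday", yesterday)
            .scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        Foo foo = (Foo) results.get(0);
        // process foo, then write any changes back explicitly;
        // a stateless session does no dirty checking and no cascading
        session.update(foo);
    }
    tx.commit();
} finally {
    session.close();
}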
A modified approach to what you are doing would be:
Loop over your entire collection (rawObjects) and save a list of the ids of all those objects.
Loop over the list of ids. At each iteration, look up just that single object by its id.
Then use the same periodic clearing of the session cache that you are using now, as in the sketch below.
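A rough sketch of that two-pass pattern, reusing the names from the question:

// Pass 1: collect only the ids, so the session holds almost nothing.
List<Long> ids = session.createQuery(
        "select f.id from Foo f where f.lastUpdated < :yesterday")
        .setParameter("yesterday", yesterday)
        .list();

// Pass 2: load and process one object at a time, clearing periodically
// exactly as in the original loop.
int count = 0;
int batchSize = 500;
for (Long id : ids) {
    Foo foo = (Foo) session.get(Foo.class, id);
    fooService.doSomething(); // unchanged service call from the question
    if (++count % batchSize == 0) {
        session.flush();
        session.clear();
    }
}
session.flush();
session.clear();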
By the way, someone else has suggested an approach similar to yours. But note that the code in this link is incorrect; the lines that clear the session should be inside the if statement, just like you have in your solution.