Implement cache in TIBCO BW
I need to implement cache/memory in TIBCO BW.
Contents of this cache should be available across BW projects.
What I am trying to do is, when I receive a message containing multiple shipment records - (Shipment and delivery No is unique combination)
I need to first check in cache if any of these records exist.
If yes - reject whole XML
If not, then push this data into cache/memory.
Once this is done then I need to call SOAP request reply to external system.
In Another project, when Acknowledgement is received from external system, I need to check the records in the message, find out those records in cache and delete them.
Is there any way to do this?
Challenge here is there is no unique key for a whole message. Each record with combination of shipment/delivery is unique.
Here is what I tried and challenge in it:
1) I thought of putting the data in a file and naming the file as the requestID/Key for each message.
Then in another project, check the file and delete it
But since we dont have key, I cannot do that.
2) Using shared variables:
I believe shared variables will not be available across bw projects. So, this option is out
3) third option is to use EMS queue, park the message temporarily there containing records.
Then search in this, and if the records match reject the request.
And, in acknowledgement (another project), search for the records in ems message and delete that particular message.
Any help on this would be appreciated.
thanks
Why don't you use database or file to store the records ?
Because when you stop or restart or a problem occured in the appnode, the cache will be erased and you will not be able to retreive the records not treated.
4th option is using tibco activity 'Java Global instance' for implementing cache in Java.
"The Java Global Instance shared configuration resource allows you to
specify a Java object that can be shared across all process instances
in a Java Virtual Machine (JVM). When the process engine is started,
an instance of the specified Java class is constructed."
You can google ton's of cache implementation in Java for example https://crunchify.com/how-to-create-a-simple-in-memory-cache-in-java-lightweight-cache/
Regarding your statement:
"Challenge here is there is no unique key for a whole message. Each record with combination of shipment/delivery is unique"
So, you do have unique key - combination of shipment/delivery is unique.
If record size is not too huge you can use the record as key "as is" or create unique hash for each record and use it as key if key size is an issue.
Related
I've already read official documentation and find no way.
My datas to es are from kafka which sometimes can be out of order. In the past, message from kafka is parsed and directly insert or update ES doc with specific ID. To avoid the older data override the newer data, I have to check whether the doc with specific ID is already exists and some properties of this doc are meet the conditions. Then I do the UPDATE action(or INSERT).
What I'm doing now is 'search before update'.
Before updating a doc, I search from ES with specific ID(included in kafka msg). Then check if this doc meets the conditions(for example, whether update_time is older?). Lastly I update the doc. And I set refresh to true to update index instantly.
What I'm worried about?
It seems Transactional.
If there is only one Thread executing synchronously, is it possible that When I process next message the doc updated in last message process is not refresh at ES?
If I have several Threads consuming kafka message, how to check before update? Can I use script to solve this problem?
If there is only one Thread executing synchronously, is it possible that When I process next message the doc updated in last message process is not refresh at ES?
That is a possibility since indexes are refreshed once in every second (by default), reducing this value is neither recommended nor guaranteed to give you the desired result since Elasticsearch is NOT designed for this.
If I have several Threads consuming kafka message, how to check before update? Can I use script to solve this problem?
You can use script if the number of fields being updated are very limited. Personally I have found script to be best suited for single field update and that too for corner use cases, it should not be used as a general practice. Any more than that and you are running into the same risk as that with stored procedures in the RDBMS world. It makes data management volatile overall and a system which is harder to maintain/extend in the longer run.
Your use case is best suited for optimistic locking support available from Elasticsearch out of the box. Take a look at Elasticsearch Versioning Support for full details.
You can very well use the inbuilt doc version if concurrency is the only problem that you need to solve. If, however, you need more than concurrency (out of order message delivery and respective ES updates) then you should use your application/domain specific field as the inbuilt version wouldn't work as-is.
You can very well use any of the app specific (numeric) field as a version field and use it for optimistic locking during document updates. If you use this approach, please pay special attention to all insert, update, delete operations for that index. Quoting AS-IS from versioning support - when using external versioning, make sure you always add the current version (and version_type) to any index, update or delete calls. If you forget, Elasticsearch will use it's internal system to process that request, which will cause the version to be incremented erroneously
I'll recommend you evaluate the inbuilt version first and use it if it fulfills your needs. It'll make the overall design much simpler. Consider the app specific version as the second option if the inbuilt version does not meet your requirements.
If there is only one Thread executing synchronously, is it possible that When I process next message the doc updated in last message process is not refresh at ES?
Ad 1. It is possible to save data in ElasticSearch and in a short while after receive stale result (before the index is updated)
If I have several Threads consuming kafka message, how to check before update? Can I use script to solve this problem?
Ad 2. If you process Kafka messages in several threads, it would be the best to use business data (eg. some business ids) as partition keys in Kafka to ensure data is processed in order. Remember to use Kafka to consume messages in many threads and don't consume messages by single consumer to fan out later to multiple threads.
It seems it would be best to ensure data is processed in order and then drop checking in Elasticsearch since it is not guaranteed to give valid results.
I'm currently new to Talend and I'm learning through videos and documentation, so I'm just not sure how to approach/implement this with best practices.
Goal
Integrate Magento and Quick Book using Talend.
My thoughts
Initially my first thought was I will setup direct DB connection for Magento and will take relevant data which I need and will process it and will send to QuickBook using REST API's(specifically bulk API's in batch)
But then again I thought it would be little hectic for me to query Magento database(multiple joins) so I've another option to use Magento's REST API.
But as I'm not much familiar with the tool I'm struggling little to find best suitable approach, so any help is appreciated.
What I've done till now?
I've saved my auth(for QB) and db(Magento) credentials data in file and using tFileInputDelimited and tContextLoad, I'm storing them in context variables so they can be accessible globally.
I've successfully configured database connection and dbinput but I've not used metadata for connection(should I use that and if Yes how can I pass dynamic values there?). I've used my context variables data in db connection settings.
I've taken relevant fields for now but if I want multiple fields simple query is not enough as Magento stores data in multiple tables for Customer etc but it's not big deal I know but I think it might increase my work.
For now that's what I've built and my next step is send the data to QB using REST while getting access_token and saving it to context variable and again storing the QB reference into Magento DB.
Also I've decided to use QB bulk API's but I'm not sure how I can process data in chunks in Talend(I tried to check multiple resources but no luck) i.e. if the Magento is returning 500 rows I want to process them in chunks of 30 as QB batch max limit is 30, so I will be sending it using REST to QB and as I said I also want to store back QB reference ID in magento(so I can update it later).
Also this all will be on local, then how can I do same in production? how I can maintain development and production environment?
Resources I'm referring
For REST and Auth best practices - https://community.talend.com/t5/How-Tos-and-Best-Practices/Using-OAuth-2-0-with-Talend-to-Access-Goo...
Nice example for batch processing here:
https://community.talend.com/t5/Design-and-Development/Batch-processing-in-talend-job/td-p/51952
Redirect your input to a tFileOutputDelimited.
Enter the output filename, tick the option "Split output in several files" from the "Advanced settings" and enter the value of 1000 into the field "Rows in each output file". This will create n files based on the filename with 1000 in each.
On the next subjob, use a tFileList to iterate over this file list to get records from each file.
I am using elastic package for golang. I want to use its BulkProcessor for sending huge amount of documents in background. As per shown in wiki, I could create a processor. But I don't want to create it every time I send the documents. I want to know if there is processor service exists in the connection and pass data to existing processor rather than creating new. How can I achieve it?
Register the bulk processor separately from sending the documents. The bulk processor only lives as long as your process, so to ensure you only create it once, create it when your process starts. Then elsewhere in your application, you can send documents whenever you need to.
Alternatively, if you must do it on-demand, you can use a sync.Once to ensure the creation only happens once.
I'm trying to initialize my data in my Azure Data Tables but I only want this to happen once on the server at startup (i.e. via the WebRole Role Entry OnStart routine). The problem is if I have multiple instances starting up at the same time then potentially either one of those instances can add records to the same table at the same time hence duplicating the data at runtime.
Is there there like an overarching routine for all instances? An application object in which I can shove a value into and check it in each of the instances to see if the tables have been created or not? A singleton of some sort that azure exposes?
Cheers
Rob
No, but you could use a Blob lease as a mutex. You could also use a table lock in SQL Azure, if you're using that.
You could also use a Queue, and drop a message in there and then just one role would pick up the message and process it.
You could create a new single instance role that does this job on role start.
To be really paranoid about this and address the event of failure in the middle of writing the data, you can do something even more complex.
A queue message is a great way to ensure transactional capabilities as long as the work you are doing can be idempotent.
Each instance adds a message to a queue.
Each instance polls the queue and on receiving a message
Reads the locking row from the table.
If the ‘create data state’ value is ‘unclaimed’
Attempts to update the row with a ‘in process’ value and a timeout expiration timestamp based on the amount of time needed to create the data.
if the update is successful, the instance owns the task of creating the data
So create the data
update the ‘create data state’ to ‘committed’
delete the message
else if the update is unsuccessful the instance does not own the task
so just delete the message.
Else if the ‘create data’ value is ‘in process’, check if the current time is past the expiration timestamp.
That would imply that the ‘in process’ failed
So try all over again to set the state to ‘in process’, delete the incomplete written rows
And try recreating the data, updating the state and deleting the message
Else if the ‘create data’ value is ‘committed’
Just delete the queue message, since the work has been done
I have a fairly simple domain model involving a list of Facility aggregate roots. Given that I'm using CQRS and an event-bus to handle events raised from the domain, how could you handle validation on sets? For example, say I have the following requirement:
Facility's must have a unique name.
Since I'm using an eventually consistent database on the query side, the data in it is not guaranteed to be accurate at the time the event processesor processes the event.
For example, a FacilityCreatedEvent is in the query database event processing queue waiting to be processed and written into the database. A new CreateFacilityCommand is sent to the domain to be processed. The domain services query the read database to see if there are any other Facility's registered already with that name, but returns false because the CreateNewFacilityEvent has not yet been processed and written to the store. The new CreateFacilityCommand will now succeed and throw up another FacilityCreatedEvent which would blow up when the event processor tries to write it into the database and finds that another Facility already exists with that name.
The solution I went with was to add a System aggregate root that could maintain a list of the current Facility names. When creating a new Facility, I use the System aggregate (only one System as a global object / singleton) as a factory for it. If the given facility name already exists, then it will throw a validation error.
This keeps the validation constraints within the domain and does not rely on the eventually consistent query store.
Three approaches are outlined in Eventual Consistency and Set Validation:
If the problem is rare or not important, deal with it administratively, possibly by sending a notification to an admin.
Dispatch a DuplicateFacilityNameDetected event, which could kick off an automated resolution process.
Maintain a Service that knows about used Facility names, maybe by listening to domain events and maintaining a persistent list of names. Before creating any new Facility, check with this service first.
Also see this related question: Uniqueness validation when using CQRS and Event sourcing
In this case, you may implement a simple CRUD style service that basically does an insert in a Sql table with a primary key constraint.
The insert will only happen once. When duplicate commands with the same value that should only exist one time hits the aggregate, the aggregate calls the service, the service fails the Insert operation due to a violation of the Primary Key constraint, throws an error, the whole process fails and no events are generated, no reporting in the query side, maybe a reporting of the failure in a table for eventual consistency checking where the user can query to know the status of the command processing. To check that, just query again and again the Command Status View Model with the Command Guid.
Obviously, when the command holds a value that does not exists in the table for primary key checking, the operation is a success.
The table of the primary key constraint should be only be used as a service, but, because you implemented Event sourcing, you can replay the events to rebuild the table of primary key constraint.
Because uniqueness check would be done before data writing, so the better method is to build a event-tracking service, which would send a notification when the process finished or terminated.