Add element with specified priority - data-structures

I need a data structure like a Priority Queue that allows me to insert an element with an explicit priority value
(e.g. queue.add("Element", 12); where 12 is the priority of the element).
If no such method exists, can you suggest the most efficient way to do this?
Thanks
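For illustration only (this is a sketch, not an existing library method): one common way to get this behaviour in Java is to wrap each element together with its priority and let a java.util.PriorityQueue order the pairs by that priority, for example:

import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch: a queue whose add() takes an explicit priority, backed by
// java.util.PriorityQueue ordering (element, priority) pairs.
public class ExplicitPriorityQueue<E> {

    private record Entry<T>(T element, int priority) { }

    // Smallest priority value is served first; reverse the comparator for the opposite order.
    private final PriorityQueue<Entry<E>> queue =
            new PriorityQueue<>(Comparator.comparingInt(Entry::priority));

    public void add(E element, int priority) {
        queue.add(new Entry<>(element, priority));
    }

    public E poll() {
        Entry<E> entry = queue.poll();
        return entry == null ? null : entry.element();
    }

    public static void main(String[] args) {
        ExplicitPriorityQueue<String> queue = new ExplicitPriorityQueue<>();
        queue.add("Element", 12);
        queue.add("Urgent", 1);
        System.out.println(queue.poll()); // prints "Urgent"
    }
}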

Related

IBM MQ copy every 5th message to another queue

I have a queue with a huge message throughput. I would like to create a new queue for lower environments. This new queue shouldn't be a 1-to-1 copy since that would cost too much. I would like to copy every nth (e.g. 5th) message to the copied queue. Can this be done?
There is a new feature called "streaming queues", introduced with MQ 9.2.3 / 9.3.0. It lets every message put to a specific queue be duplicated to another queue (the stream queue). To configure it you set two new attributes on your original target queue: STREAMQ( ) to name the stream queue and STRMQOS( ) to choose the quality of service (refer to the documentation).
However, to achieve your requirement ("every nth message"), the application that processes the messages of the stream queue would need to work with the data of only every nth message and discard the rest, if you really want to process only a subset of them.
I know this is not a perfect answer to your question, as this solution comes with redundant queuing of messages you don't want, but I am not aware of any other out-of-the-box solution.
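To illustrate that idea, a consumer of the stream queue could simply discard four out of every five messages it receives. The following is only a rough JMS 2.0 sketch; the queue name and the connection-factory setup are assumptions, not something prescribed by MQ:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;

public class EveryNthMessageConsumer {

    public static void main(String[] args) throws Exception {
        // Assumed: an IBM MQ JMS connection factory pointing at your queue manager.
        ConnectionFactory factory = createConnectionFactory();

        try (Connection connection = factory.createConnection()) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue streamQueue = session.createQueue("LOWER.ENV.STREAM.QUEUE"); // assumed name
            MessageConsumer consumer = session.createConsumer(streamQueue);
            connection.start();

            long counter = 0;
            while (true) {
                Message message = consumer.receive(); // blocks until a message arrives
                counter++;
                if (counter % 5 == 0) {
                    process(message);  // keep every 5th message
                }
                // All other messages are consumed (and therefore removed) but ignored.
            }
        }
    }

    private static ConnectionFactory createConnectionFactory() {
        // Left out: the com.ibm.mq.jms specific setup (host, port, channel, queue manager).
        throw new UnsupportedOperationException("configure an MQ JMS connection factory here");
    }

    private static void process(Message message) {
        System.out.println("Processing: " + message);
    }
}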

Where should rear point in queue data structure

Where should rear point in a queue:
1. The place where the new element WILL BE inserted.
2. The place where the last element of the queue resides.
According to my research, I have found both of the above given as answers.
I'd say go with the TailPointer pointing at the last element that was added, rather than at the empty slot where the next element would be inserted. I have a few reasons for that:
To get the last element you can read the value at TailPointer directly, which matches the name, instead of having to use TailPointer - 1.
If you have an array as the backing data store for your queue, it is natural to check tailPointer == dataStore.length - 1 to see whether you have reached the end of the array (since 0-based indexing is most common).
You would also wrap your data around to the initial indexes (the ones before the head pointer) as you dequeue elements.
If there is no data in the queue, you can simply set the TailPointer to -1.
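A minimal sketch of a fixed-capacity circular array queue built along these lines (tail holds the index of the last stored element and -1 marks the empty queue; the class and method names are illustrative):

// Fixed-capacity circular queue where `tail` points at the last element stored,
// and -1 means "empty", as argued above.
public class ArrayQueue<T> {
    private final Object[] data;
    private int head = 0;   // index of the first (oldest) element
    private int tail = -1;  // index of the last element; -1 means empty
    private int size = 0;

    public ArrayQueue(int capacity) {
        data = new Object[capacity];
    }

    public void enqueue(T value) {
        if (size == data.length) {
            throw new IllegalStateException("queue is full");
        }
        tail = (tail + 1) % data.length;  // advance, wrapping to index 0 if needed
        data[tail] = value;
        size++;
    }

    @SuppressWarnings("unchecked")
    public T dequeue() {
        if (size == 0) {
            throw new IllegalStateException("queue is empty");
        }
        T value = (T) data[head];
        data[head] = null;
        head = (head + 1) % data.length;
        size--;
        if (size == 0) {  // reset so an empty queue is always head = 0, tail = -1
            head = 0;
            tail = -1;
        }
        return value;
    }

    @SuppressWarnings("unchecked")
    public T peekLast() {
        if (size == 0) {
            throw new IllegalStateException("queue is empty");
        }
        return (T) data[tail];  // the last element is read directly from tail
    }
}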

Spring Integration: Preserving ordering in an Aggregator

Right now I have a use case where I have a stream of events coming in. There are a few splitters and then, finally, an aggregator downstream. As the stream is never ending, and given the number of splitters, we are unable to calculate the total number of messages we expect. So we are using a simple SpEL release strategy expression:
release-strategy-expression="size() == 10"
We are using a group-timeout and also have set send-partial-result-on-expiry=true.
Given this use case, am I right in concluding that there is no built-in way to preserve the original ordering of the stream of events?
I have tried using a SequenceSizeReleaseStrategy with releasePartialSequences set to true.
What I've observed is that this sends each message as a separate group, as it relies on the sequenceSize header, which defaults to zero.
Am I missing anything? Is there a way to preserve the ordering in the aggregator given this use case?
For that purpose there is the EIP resequencer: https://docs.spring.io/spring-integration/docs/5.3.0.M4/reference/html/message-routing.html#resequencer.
You place it just before the aggregator, and when that aggregator releases a group, the messages in the result list are in sequence order.
The resequencer can also release partial groups once the gaps in the sequence have been filled.
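A rough Java DSL sketch (Spring Integration 5.x) of that arrangement, with a resequencer placed right before the aggregator; the channel names are assumptions and the size-10 release condition just mirrors the question:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@Configuration
public class OrderedAggregationConfig {

    @Bean
    public IntegrationFlow orderedAggregationFlow() {
        return IntegrationFlows.from("splitterOutputChannel")   // assumed channel name
                // Re-orders messages by their sequenceNumber header before aggregation;
                // releasePartialSequences lets it emit whatever is already in order.
                .resequence(r -> r.releasePartialSequences(true))
                .aggregate(a -> a
                        .releaseStrategy(group -> group.size() == 10)
                        .groupTimeout(5000)
                        .sendPartialResultOnExpiry(true))
                .channel("aggregatedOutputChannel")              // assumed channel name
                .get();
    }
}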

BaseStatefulBolt (Storm Core) vs StateFactory (Storm Trident)

I am confused about using Storm. I am going to measure the status of a data source using its streamed data. The status will be calculated from a combination of several fields, and these fields can arrive at different time intervals. That's why I need to save the fields in order to measure the status of the data source.
Can I use BaseStatefulBolt? Or is Trident the only solution for this scenario?
What is the difference between them? Because there is a StateFactory inside Trident too.
Thank you.
I think the difference is that Trident is higher level than BaseStatefulBolt; it has options for counting such as groupBy, persistentAggregate, and aggregate.
I have used Trident for counting total views per user. If we only care about the current total count, I think we can use Trident with MemoryMapState.Factory() and a class that implements the action for counting or summing.
In your case, where you need to manage the status of some current fields, I think implementing BaseStatefulBolt is a good choice; it has KeyValueState for saving the current state.
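As a sketch of that choice (assuming Storm 2.x APIs), a stateful bolt can keep the latest value of each field in a KeyValueState and compute the status once the fields it needs have arrived. The tuple field names ("sourceId", "fieldName", "fieldValue") and the status rule are only illustrative assumptions:

import java.util.Map;

import org.apache.storm.state.KeyValueState;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseStatefulBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SourceStatusBolt extends BaseStatefulBolt<KeyValueState<String, String>> {

    private KeyValueState<String, String> state;
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void initState(KeyValueState<String, String> state) {
        // Called once per (re)start with the last checkpointed state.
        this.state = state;
    }

    @Override
    public void execute(Tuple tuple) {
        String sourceId = tuple.getStringByField("sourceId");
        String fieldName = tuple.getStringByField("fieldName");
        String fieldValue = tuple.getStringByField("fieldValue");

        // Remember the most recent value of this field for this source.
        state.put(sourceId + ":" + fieldName, fieldValue);

        // Illustrative status rule: emit once both expected fields have been seen.
        String temperature = state.get(sourceId + ":temperature");
        String heartbeat = state.get(sourceId + ":heartbeat");
        if (temperature != null && heartbeat != null) {
            collector.emit(tuple, new Values(sourceId, "OK"));
        }
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sourceId", "status"));
    }
}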

How to specify priority attributes for individual flowfiles?

I need to use the PriorityAttributePrioritizer in NiFi.
I have seen the prioritizers described in the reference below:
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#settings
If I receive 10 flowfiles then I need to set a unique priority value for every flowfile.
After that, the queue configuration must be set to the PriorityAttributePrioritizer,
so that the flowfiles are processed based on the priority value.
How can I set a priority value for individual flowfiles, and which prioritizer in NiFi will work for my case?
The PriorityAttributePrioritizer prioritizes flow files by looking for a flow file attribute named "priority" and sorting the flow files lexicographically based on the value of the priority.
You can set the priority attribute using an UpdateAttribute processor. For example, if you had three logical data feeds, and feed #1 was most important, feed #2 was second most important, and feed #3 was third, then you could use three UpdateAttribute processors to set the priority attribute to 1, 2, and 3, then use a funnel to converge them all.
You would set the PriorityAttributePrioritizer on the queue between the funnel and the next processor, and at this point any time a flow file with priority=1 hits the queue, it will always be processed before any flow files with priority=2 and priority=3.
Determining how to set the priority really depends on your data. It is usually based on something about the data, like a field from each flow file that is extracted to an attribute to tell it the priority, or just knowing that everything that comes from source #1 is higher priority than what comes from source #2. Setting randomly unique priorities doesn't really make sense because you don't even know what you are prioritizing on then.
If the files are named after the time they were generated (e.g. file_2017-03-03T010101.csv), have you considered using UpdateAttribute to parse the filename into a date, and that date into epoch time (which happens to be an increasing number), as a first-level index / prioritizer?
This way you could have:
GetFile (single thread) -- Connector with FIFO --> UpdateAttribute (adding Epoch from filename date) -- Connector with PriorityAttributePrioritizer --> rest of your flow
Assuming the file name is file_2017-03-03T010101.csv, the expression language would be something like:
${filename:toDate("'file_'yyyy-MM-dd'T'HHmmss'.csv'", "UTC"):toNumber()}
The PriorityAttributePrioritizer prioritizes flow files by looking for a flow file attribute named "priority". My file names had a date appended, so I added an ExecuteScript processor with a Groovy script to extract the date from the file name. The dates are then sorted and the flowfiles iterated over; based on the date sort order, the priority is incremented and added as the flowfile attribute 'priority'.
Example:
Fileone: priority 1
Filetwo: priority 2
NiFi flow:
GetFile -> ExecuteScript (Groovy: sort files, add priority attribute) -> change the queue prioritizer to PriorityAttributePrioritizer.
With the above configuration the priority 1 file will be processed first, and the remaining files will be processed in order.
