How do I define rules for the Wasabi Azure scaling block in my code?

I am thinking of using the Wasabi block for auto-scaling my Azure application. It looks like the rules have to be hard-coded in an XML file. This bothers me, because the rules I want for my application require a rather complex metric that I will have to compute inside my code.
Just as an insane example, suppose my application generates a stream of random numbers - zeroes and ones - and each instance computes the number of successive "ones" and the number of successive "zeroes". I want to scale up when any instance encounters ten or more successive "ones" and scale down when any instance encounters ten or more successive "zeroes".
I can detect such situations in my code no problem, but how do I make Wasabi react to them and scale the application?

To achieve this you need to implement a CustomOperand and an associated Custom DataCollector.
http://msdn.microsoft.com/en-us/library/hh680912(v=pandp.50).aspx
There is an example of this in the TailSpin sample application. I would start by looking at the ActiveSurveysDataPointsCollector class and working your way back up from there (the custom operand consumes an IDataPointCollector instance, and the operand is then referenced, like all other operands, from the rules XML).
You'll implement the method public IEnumerable Collect(DateTimeOffset collectionTime), and this is where you'll want to look at your stream of bits, or at some other flag that is set by the code producing your stream of bits. There is no way to signal into Wasabi in a synchronous fashion; the Collect method will always be the one that executes and retrieves that information from your app (or calculates it then and there).
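Wasabi itself is a .NET block, so the real collector would be written in C#. Purely to illustrate the pull-based shape described above - a sketch in Java with made-up names, not the Wasabi API - the bit-stream producer records the current run lengths in shared state, and the periodically invoked collector simply reads them:

import java.util.concurrent.atomic.AtomicInteger;

// Producer side: update the run lengths as each random bit is generated.
final class RunTracker {
    final AtomicInteger onesInARow = new AtomicInteger();
    final AtomicInteger zeroesInARow = new AtomicInteger();

    void onBit(int bit) {
        if (bit == 1) { onesInARow.incrementAndGet(); zeroesInARow.set(0); }
        else          { zeroesInARow.incrementAndGet(); onesInARow.set(0); }
    }
}

// Collector side: it is never pushed to; a scheduler calls collect() periodically,
// just as Wasabi calls your data collector's Collect(collectionTime).
final class RunLengthCollector {
    private final RunTracker tracker;
    RunLengthCollector(RunTracker tracker) { this.tracker = tracker; }

    double collect() {
        // The custom operand would then compare this value against 10 in the rules XML.
        return tracker.onesInARow.get();
    }
}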

Related

AnyLogic: Same random value for every simulation run?

I am building a discrete-event simulation which contains 6 processes modelled as delays. At model startup I want to initialize the delay times for every delay/process station.
I have written a Java class "Prozess2", and every "Prozess2" object contains 6 CustomDistributions. At object initialization of "Prozess2" I draw one random value from each of them. In the end, I aggregate the 6 random values into the delay time.
I therefore want different random values for each delay time at every startup. However, when I run the simulation over and over again, I always get the same aggregated delay time by Math.round(time()).
In the constructor of "Prozess2" I hand over a RandomNumberGenerator called rng, which lives on the main agent, and an instance of the main agent:
Where is my mistake?
In my opinion, it is best practice to explicitly supply your model with a random seed as one of its parameters and then use this seed to create a random generator for each and every process that has randomness. That way you can be assured that if you change one part of the model, either its logic or its input, the other parts will still have the same random stream as they had before.
See example below:
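(The original answer's code screenshot is not reproduced here. The following is a minimal sketch of the idea in plain Java, assuming a model parameter named seed and one java.util.Random generator per process station; the names are illustrative and the lines would live in your model's startup code.)

import java.util.Random;

long seed = 1234;                                       // model parameter; set it explicitly per run
Random masterRng  = new Random(seed);                   // one master generator for the whole model
Random processRng = new Random(masterRng.nextLong());   // a dedicated generator per process station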
And then you can use it
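(Continuing the illustrative fragment, using the processRng created above:)

double delayTime = processRng.nextDouble() * 10.0;   // e.g. sample this station's delay from its own generator
// ...or pass processRng into your distribution calls instead of using the default generator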
So in your case, pass the seed to the model and use it instead of getDefaultRandomGenerator().
You can also see this question for a similar problem to do with randomness
Why do two flowcharts set up exactly the same end with different results every time the simulation is run even when I use a fixed seed?
AnyLogic will generate the same random numbers if you manually run exactly the same model multiple times. The numbers will change if you add new blocks or delete some of the existing ones in your model. To my understanding, AnyLogic uses the data in your model (blocks, variables, parameters, etc.) to add noise to the random number generator. That being said, the correct way to get different outputs for different runs is to use a Parameters Variation experiment, as below:
Right-click on your model name in the AnyLogic window, select New, then Experiment. In the dialog that appears, select Parameters Variation.
In the Properties window, you have three options for Randomness. By selecting the first option, you will get different random numbers for different runs.

Several questions about Multi-Paxos

I have several questions about Multi-Paxos:
Will each instance have its own proposal number, accepted ballot, and accepted value? Or do all the instances share the same proposal number, so that after one instance is finished another one starts?
If all the instances share the same proposal number, consider the following situation: server A sends a proposal, and the acceptor returns an accepted instanceId which might be greater or less than the proposal's instanceId. What should the proposer do then? Use that instanceId and its value for the accept phase, then increase its own instanceId, wait for the next round, and re-propose with its own value? If so, when is the previously accepted value removed? Because if it is not removed, the acceptor will return this instanceId and value again, and it seems like a loop.
Multi-Paxos has a vague description, so two people may build two different systems based on it; in the context of one system the answer is "no," and in the context of another it's "yes."
Approach #1 - "No"
Mindset: Paxos is a two-phase protocol for building write-once registers. Multi-Paxos is a technique for creating a log on top of them.
One of the possible ways to build a log is:
Create an array of completely independent write-once registers and initialize the first one with an initial value.
To append a new record we should:
A) Guess an index (X) of a vacant register and try to write a dummy record there (if it's already used, then pick a register with a higher index and retry).
B) Start writing dummy records to every register with an index smaller than X until we find a register filled with a non-dummy record.
C) Calculate a new record based on it (e.g., a record may have an ordinal, and we can use it to calculate the ordinal of the new record; since some registers are filled with dummy records, the ordinals aren't equal to the indices) and write it to the X+1 register. In case of a conflict, we should restart the procedure from step A).
To read the log we should start writing dummy values from the first register, and on each conflict we should increment the index and retry, until a write succeeds, which indicates that the log's end has been reached.
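As a rough sketch of that read procedure in Java - WriteOnceRegister, ProposeResult, and DUMMY are hypothetical names, and propose() is assumed to run one full Paxos round and report whether our value won or which value was already decided:

import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

final class ProposeResult {
    final boolean won;       // true if our proposed value was decided for this register
    final String decided;    // the value that is now decided for this register
    ProposeResult(boolean won, String decided) { this.won = won; this.decided = decided; }
}

interface WriteOnceRegister {
    ProposeResult propose(String value);   // one Paxos round for this single register
}

final class LogReader {
    static final String DUMMY = "<dummy>";

    // Write dummies from index 0 upward; the first register where our dummy wins was
    // still vacant, so the log ends there. Dummies left behind by earlier appends are
    // skipped while collecting the real records.
    static List<String> readLog(IntFunction<WriteOnceRegister> registers) {
        List<String> log = new ArrayList<>();
        for (int i = 0; ; i++) {
            ProposeResult r = registers.apply(i).propose(DUMMY);
            if (r.won) return log;                            // vacant register: end of log
            if (!DUMMY.equals(r.decided)) log.add(r.decided); // real record: keep it
        }
    }
}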
Of course, there is a lot of overhead in this approach, so please treat it just as a top-level overview of what Multi-Paxos is.
The log is a powerful concept, and we can use it as a recipe for building distributed state machines - just think of each record as an update command. Unfortunately, in some cases there is also a lot of overhead. For example, if you want to build a key/value storage and you care only about the current value, then you don't need history and probably need to implement garbage collection to remove past versions from the log to optimize storage costs.
Approach #2 - "Yes"
Mindset: rewritable register as a heavily optimized version of Multi-Paxos.
If you start with the described approach applied to building key/value storage, and then iterate in order to get rid of the overhead, e.g., by doing garbage collection on the fly, then eventually you may come up with the idea of making the write-once register rewritable.
In that case, each instance uses the same ballot numbers just because all the instances are collapsed into one rewritable instance.
I described this approach in the How Paxos Works post and implemented it in the Gryadka project in 500 lines of JavaScript. The idea behind it was also independently checked with TLA+ by Greg Rogers and Tobias Schottdorf.

Forcing map() over a Java 8 Stream

I'm confused about this situation:
I have a Producer which produces an undetermined number of items from an underlying iterator, possibly a large number of them.
Each item must be mapped to a different interface (e.g., a wrapper, or a JavaBean built from a JSON structure).
So I'm thinking it would be good for Producer to return a stream: it's easy to write code that converts an Iterator to a Stream (using Spliterators and StreamSupport.stream()), then applies Stream.map() and returns the final stream.
The problem is that I have an invoker that does nothing with the resulting stream, e.g., a unit test, yet I still want the mapping code to be invoked for every item. At the moment I'm simply calling Stream.count() from the invoker to force that.
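Roughly the setup described above, as a self-contained sketch; the iterator contents and the mapping function are stand-ins for the real Producer code:

import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ForceStreamDemo {
    public static void main(String[] args) {
        // Stand-in for the producer's underlying iterator over raw items (e.g. JSON strings).
        Iterator<String> source = Arrays.asList("{\"id\":1}", "{\"id\":2}").iterator();

        // Wrap the iterator in a Stream and apply the per-item mapping.
        Stream<String> mapped = StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(source, Spliterator.ORDERED), false)
            .map(json -> json.toUpperCase());   // stand-in for the JSON-to-bean conversion

        // Force the whole pipeline without using the results. forEach always traverses;
        // count() may skip the traversal on newer JDKs when the stream size is known up front.
        mapped.forEach(item -> { });
    }
}

forEach with an empty body makes the intent (traverse purely for the mapping) explicit, whereas count() only happens to traverse here because the stream's size isn't known in advance.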
Questions are:
Am I doing it wrong? Should I use different interfaces? Note that I think implementing next()/hasNext() for Iterator is cumbersome, mainly because it forces you to create a new class (even if it can be anonymous) and to keep a pointer and check it. The same goes for collection views: returning a collection that is built eagerly, rather than a dynamic view over the underlying iterator, is out of the question (the input data set might be very large). The only alternative I like so far is a Java implementation of yield(). Nor do I want the stream to be consumed inside Producer (i.e., forEach()), since some other invoker might want it to perform some real operation.
Is there a better practice for forcing the stream to be processed?

Choosing Redis datatypes for advanced data manipulation in a simple torrent tracker service

I need your advice on Redis data types for my project. The project is a torrent tracker (Ruby, simple Sinatra-based) with a pure in-memory data store for current information about peers. I feel like this is what Redis is made for, but I'm stuck on choosing the proper data types. For now I'm leaning toward the following setup:
Use a list for seeders. What I actually need is more like a ring buffer, so I can get a sequential range of seeders (with a given size and start position) and save the new start position for next time.
Use a sorted set for leechers. The score for each leecher is downloaded/(downloaded+left), so I can also extract a range for any specific case.
All values in the set and list are string (bencoded) representations of peer data.
What the setup above still lacks:
The seeders' offset has to be stored somewhere, so data access needs synchronization.
There is no obvious way to find a specific seeder in a list. A set would help here, but then I wouldn't be able to extract a range of items at once.
(General problem) Set/list members need a TTL (in case a client shuts down without sending any data before then). A possible option is to make each peer an ordinary key (string or hash), give it a TTL, subscribe to its expiration, and then delete it from the corresponding list or set.
What could you suggest? Any practical advice?

How would you implement a Workflow system?

I need to implement a Workflow system.
For example, to export some data, I need to:
Use an XSLT processor to transform an XML file
Use the resulting transformation to convert into an arbitrary data structure
Use the resulting (file or data) and generate an archive
Move the archive into a given folder.
I started by creating two types of classes: a Workflow, which is responsible for adding new Step objects and running them, and the steps themselves.
Each Step implements a StepInterface.
My main concern is that every step (except the first) depends on the previous one, and I'm wondering what would be the best way to handle that.
I thought of looping over the steps and providing each one with the result of the previous step (if any), but I'm not really happy with it.
Another idea would have been to allow a "previous" Step to be set on a Step, like:
$s = new Step();
$s->setPreviousStep($previousStep);
But I lose the utility of a Workflow class.
Any ideas or advice?
By the way, I'm also concerned about the success or failure of the whole workflow: if any step fails I need to roll back or clean up the data produced by the previous steps.
I implemented a similar workflow engine last year (closed source though, so no code that I can share). Here are a few ideas based on that experience:
StepInterface - can do what you're doing right now - abstract a single step.
Additionally, provide a rollback capability, but I think a step should know when it fails and clean up before proceeding further. An abstract step can handle this for you (template method).
You might want to consider branching based on the StepResult - so you could do a StepMatcher that takes a stepResult object and a conditional - its sub-steps are executed only if the conditional returns true.
You could also do a StepException to handle exceptional flows if a step errors out. Ideally, this is something that you can define either at a workflow level (do this if any step fails) and/or at a step level.
I took the approach that a step returns a well-defined structure (StepResult) that's available to the next step. If there's bulky data (say a large file, etc.), then a URI/locator for the resource is passed in the StepResult.
Your workflow is going to need a context to work with - in the example you quote, this would be the name of the file, the location of the archive and so on - so think of a WorkflowContext.
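A minimal sketch of those shapes - in Java for brevity, although the question is PHP; all names are illustrative:

import java.util.HashMap;
import java.util.Map;

interface StepResult {
    boolean isSuccess();
    Object get(String key);   // small values; bulky data is passed as a URI/locator instead
}

interface Step {
    StepResult execute(WorkflowContext context, StepResult previous) throws StepException;
    void rollback(WorkflowContext context);   // undo whatever execute() changed
}

final class WorkflowContext {
    // Workflow-scoped data: input file name, archive location, target folder, ...
    private final Map<String, Object> values = new HashMap<>();
    Object get(String key)             { return values.get(key); }
    void put(String key, Object value) { values.put(key, value); }
}

class StepException extends Exception {
    StepException(String message, Throwable cause) { super(message, cause); }
}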
Additional thoughts
You might want to consider the following too - if this is something that you're planning to implement as a large scale service/server:
Steps could be in libraries that were dynamically loaded
Workflow definition in an XML/JSON file - again, dynamically reloaded when edited.
Remote invocation and callback - submit a job to a remote service with a callback API. When the remote service calls back, the workflow execution is picked up at the subsequent step in the flow.
Parallel execution where possible etc.
stateless design
Rolling back can be fit into this structure easily, as each Step will implement its own rollback() method, which the workflow can call (in reverse order preferably) if any of the steps fail.
As for the main question, it really depends on how sophisticated you want to get. On a basic level, you can define a StepResult interface, which is returned by each step and passed on to the next one. The obvious problem with this approach is that each step should "know" which implementation of StepResult to expect. For small systems this may be acceptable; for larger systems you'd probably need some kind of configurable mapping framework that can be told how to convert the result of the previous step into the input of the next one. So Workflow calls Step, Step returns StepResult, Workflow then calls StepResultConverter (which is your configurable mapping thingy), StepResultConverter returns a StepInput, Workflow then calls the next Step with the StepInput, and so on.
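Reusing the Step/StepResult/WorkflowContext/StepException shapes from the Java sketch under the previous answer, that chain might look roughly like this. It is illustrative only; for simplicity the converter here produces another StepResult rather than a separate StepInput type, and completed steps are rolled back in reverse order as suggested above:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

interface StepResultConverter {
    // The configurable mapping: adapt the previous step's result (null for the first step)
    // into whatever the next step expects as input.
    StepResult toInput(StepResult previousResult, WorkflowContext context);
}

final class Workflow {
    private final List<Step> steps;
    private final StepResultConverter converter;

    Workflow(List<Step> steps, StepResultConverter converter) {
        this.steps = steps;
        this.converter = converter;
    }

    void run(WorkflowContext context) throws StepException {
        Deque<Step> completed = new ArrayDeque<>();
        StepResult previous = null;
        try {
            for (Step step : steps) {
                StepResult input = converter.toInput(previous, context);
                previous = step.execute(context, input);
                completed.push(step);                    // remember what has run
            }
        } catch (StepException e) {
            while (!completed.isEmpty()) {
                completed.pop().rollback(context);       // clean up in reverse order
            }
            throw e;
        }
    }
}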
I've had great success implementing workflows using a finite state machine. It can be as simple or complicated as you like, with multiple workflows linking to each other. Generally an FSM can be implemented as a simple table, where the current state of a given object is tracked in a history table by keeping a journal of the transitions on the object and simply retrieving the last entry. So a transition would be of the form:
nextState = TransLookup(currState, Event, [Condition])
If you are implementing a front end you can use this transition information to construct a list of the events available to a given object in its current state.
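As a tiny illustration of that kind of transition table in Java (the optional [Condition] is omitted and all names are made up):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class TransitionTable {
    private final Map<String, String> table = new HashMap<>();   // (state, event) -> next state

    void add(String currState, String event, String nextState) {
        table.put(currState + "/" + event, nextState);
    }

    // nextState = TransLookup(currState, event); empty if the event isn't allowed in this state
    Optional<String> lookup(String currState, String event) {
        return Optional.ofNullable(table.get(currState + "/" + event));
    }

    // For a front end: the events available to an object in its current state.
    List<String> availableEvents(String currState) {
        List<String> events = new ArrayList<>();
        String prefix = currState + "/";
        for (String key : table.keySet()) {
            if (key.startsWith(prefix)) events.add(key.substring(prefix.length()));
        }
        return events;
    }
}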
