Will a Java 8 Stream.forEach( x -> {} ); do anything?

I am controlling the Consumer that gets passed to this forEach, so it may or may not be asked to perform an action.
list.parallelStream().forEach( x-> {} );
Since streams are lazy, the stream won't iterate, right? I expect that nothing will happen. Please tell me if I am wrong.

It will traverse the whole stream, submitting tasks to the fork-join pool, splitting the list into parts and passing every list element to this empty lambda. Currently it's impossible to check at runtime whether a lambda expression is empty, so it cannot be optimized out.
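The traversal can be observed with a small experiment. This is only a sketch: the class and method names are made up for illustration, and peek() is used purely to count the elements that flow through the pipeline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class EmptyForEachDemo {
    // Counts how many elements pass through the pipeline when the
    // terminal action is an empty lambda.
    static int visitedWithEmptyConsumer(List<Integer> list) {
        AtomicInteger visited = new AtomicInteger();
        list.parallelStream()
            .peek(x -> visited.incrementAndGet()) // side effect only to observe traversal
            .forEach(x -> {});                    // empty action, still a terminal operation
        return visited.get();
    }

    public static void main(String[] args) {
        // Every element is visited even though the consumer does nothing.
        System.out.println(visitedWithEmptyConsumer(Arrays.asList(1, 2, 3, 4, 5)));
    }
}
```

Laziness only applies to intermediate operations; forEach is terminal, so the pipeline runs regardless of what the consumer does.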
A similar problem appears with Collector. All collectors have a finisher operation, but in many cases it's an identity function like x -> x. In that case the code that uses the collector can sometimes be greatly optimized, but you cannot robustly detect whether the supplied lambda is the identity. To solve this, an additional collector characteristic called IDENTITY_FINISH was introduced instead. Were it possible to robustly detect whether the supplied lambda is the identity function, this characteristic would be unnecessary.
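As a rough illustration, the characteristic can be inspected directly. The helper names below are invented for this sketch; Collectors.toList() and Collectors.collectingAndThen() are the standard JDK collectors.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class IdentityFinishDemo {
    // True if the collector declares that its finisher is the identity function.
    static boolean hasIdentityFinish(Collector<?, ?, ?> collector) {
        return collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH);
    }

    // A collector with a real finisher: it wraps the accumulated list.
    static <T> Collector<T, ?, List<T>> unmodifiableListCollector() {
        return Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList);
    }

    public static void main(String[] args) {
        System.out.println(hasIdentityFinish(Collectors.toList()));         // true
        System.out.println(hasIdentityFinish(unmodifiableListCollector())); // false
    }
}
```

The stream implementation reads this flag instead of trying to analyze the finisher lambda.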
Also look at the JDK-8067971 discussion. It proposes creating static constants like Predicate.TRUE (always true) or Predicate.FALSE (always false) to optimize operations like Stream.filter. For example, if Predicate.TRUE is supplied, the filtering step can be removed, and if Predicate.FALSE is supplied, the stream can be replaced with an empty stream at that point. Again, were it possible to detect at runtime that the supplied predicate is always true, it would be unnecessary to create such constants.

Related

several questions about multi-paxos?

I have several questions about Multi-Paxos.
Will each instance have its own proposal number, accepted ballot, and accepted value? Or do all the instances share the same proposal number, so that after one finishes, another one starts?
If all the instances share the same proposal number, consider the following situation: server A sends a proposal, and the acceptor returns an accepted instance ID which might be greater or less than the proposal's instance ID. What will the proposer do then? Use that instance ID and its value for the accept phase, then increase its own instance ID, wait for the next round, and re-propose with its own value? If so, when is the previously accepted value removed? If it is not removed, the acceptor will return this instance ID and value again, which looks like a loop.
Multi-Paxos has a vague description, so two people may build two different systems based on it; in the context of one system the answer is "no," and in the context of the other it's "yes."
Approach #1 - "No"
Mindset: Paxos is a two-phase protocol for building write-once registers. Multi-Paxos is a technique for creating a log on top of them.
One of the possible ways to build a log is
Create an array of completely independent write-once registers and initialize the first one with an initial value.
On new record we should:
A) Guess an index (X) of a vacant register and try to write a dummy record here (if it's already used then pick a register with a higher index and retry).
B) Start writing dummy records to every register with smaller than X index until we find a register filled with a non-dummy record.
C) Calculate a new record based on it (e.g., a record may have an ordinal, and we can use it to calculate an ordinal of the new record; since some registers are filled with dummy records the ordinals aren't equal to index) and write it to the X+1 register. In case of a conflict, we should restart the procedure from step A).
To read the log we should start writing dummy values from the first record, and on each conflict we should increment the index and retry until a write succeeds, which indicates that the log's end has been reached.
Of course, there is a lot of overhead in this approach, so please treat it just as a high-level overview of what Multi-Paxos is.
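The append and read procedures above can be sketched in a single process. Everything here is an assumption made for illustration: the registers are local AtomicReference objects standing in for Paxos-backed write-once registers, step A guesses by scanning upward from index 0, step B scans downward from X, and the class is not meant to be used from several threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class WriteOnceLog {
    // A record is either a dummy filler or a payload carrying an ordinal.
    static final class Record {
        final boolean dummy;
        final int ordinal;
        final String payload;
        Record(boolean dummy, int ordinal, String payload) {
            this.dummy = dummy;
            this.ordinal = ordinal;
            this.payload = payload;
        }
    }

    static final Record DUMMY = new Record(true, -1, null);

    // Local stand-in for independent write-once registers
    // (a real system would run one Paxos instance per register).
    private final List<AtomicReference<Record>> registers = new ArrayList<>();

    WriteOnceLog(String initialPayload) {
        register(0).set(new Record(false, 0, initialPayload));
    }

    private AtomicReference<Record> register(int index) {
        while (registers.size() <= index) {
            registers.add(new AtomicReference<>());
        }
        return registers.get(index);
    }

    // A write succeeds only if the register is still vacant.
    private boolean tryWrite(int index, Record value) {
        return register(index).compareAndSet(null, value);
    }

    // Appends a payload and returns the ordinal assigned to it.
    int append(String payload) {
        while (true) {
            int x = 0;
            while (!tryWrite(x, DUMMY)) {
                x++;                                   // step A: find a vacant register
            }
            Record last = null;
            for (int i = x - 1; i >= 0 && last == null; i--) {
                tryWrite(i, DUMMY);                    // step B: fill gaps below X
                Record r = register(i).get();
                if (!r.dummy) {
                    last = r;                          // latest non-dummy record
                }
            }
            Record next = new Record(false, last.ordinal + 1, payload);
            if (tryWrite(x + 1, next)) {
                return next.ordinal;                   // step C: written to X+1
            }
            // conflict on X+1: restart from step A
        }
    }

    // Reads by writing dummies until a write succeeds (the log's end).
    List<String> read() {
        List<String> entries = new ArrayList<>();
        for (int i = 0; !tryWrite(i, DUMMY); i++) {
            Record r = register(i).get();
            if (!r.dummy) {
                entries.add(r.ordinal + ":" + r.payload);
            }
        }
        return entries;
    }
}
```

Appending "a" and then "b" to a log created with "init" leaves dummy records between the payloads, which is the overhead mentioned above; read() skips them and returns the entries in ordinal order.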
The log is a powerful concept, and we can use it as a recipe for building distributed state machines: just think of each record as an update command. Unfortunately, in some cases, there is also a lot of overhead. For example, if you want to build a key/value store and you care only about the current value, then you don't need the history and will probably need to implement garbage collection to remove past versions from the log and reduce storage costs.
Approach #2 - "Yes"
Mindset: rewritable register as a heavily optimized version of Multi-Paxos.
If you start with the approach described above, apply it to building a key/value store, and then iterate in order to get rid of the overhead (e.g., by doing garbage collection on the fly), you may eventually come up with an idea for turning the write-once register into a rewritable one.
In that case, each instance uses the same ballot numbers just because all the instances are collapsed into one rewritable instance.
I described this approach in the How Paxos Works post and implemented it in the Gryadka project with 500-lines of JavaScript. Also, the idea behind it was independently checked with TLA+ by Greg Rogers and Tobias Schottdorf.

Forcing map() over Java 8 Stream ()

I'm confused on this situation:
I have a Producer which produces an undetermined number of items from an underlying iterator, possibly a large number of them.
Each item must be mapped to a different interface (e.g., a wrapper, or a JavaBean built from a JSON structure).
So I'm thinking that it would be good for the Producer to return a stream: it's easier to write code that converts the Iterator to a Stream (using Spliterators and StreamSupport.stream()), then applies Stream.map() and returns the final stream.
The problem is I have an invoker that does nothing with the resulting stream, eg, a unit test, yet I still want the mapping code to be invoked for every item. At the moment I'm simply calling Stream.count() from the invoker to force that.
Questions are:
Am I doing it wrong? Should I use different interfaces? Note that I think implementing next()/hasNext() for an Iterator is cumbersome, mainly because it forces you to create a new class (even if it can be anonymous), keep a pointer, and check it. The same goes for collection views: returning a collection that is fully created, rather than a dynamic view over the underlying iterator, is out of the question (the input data set might be very large). The only alternative I like so far is a Java implementation of yield(). Nor do I want the stream to be consumed inside the Producer (i.e., with forEach()), since some other invoker might want it to perform some real operation.
Is there a better best practice to force the stream processing?
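One possible way to force the mapping from the invoker's side is an explicit terminal operation with a no-op consumer. The sketch below makes several assumptions: produce() is a hypothetical stand-in for the Producer described above, and the counter exists only to show that the mapper ran. Note also that since Java 9 the documentation of Stream.count() allows the implementation to skip the pipeline when the count can be computed directly from the source, so count() is not a reliable way to force side effects in general.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ForceMappingDemo {
    // Hypothetical producer: wraps an iterator in a lazy stream and maps each item.
    static <T, R> Stream<R> produce(Iterator<T> source, Function<T, R> mapper) {
        Spliterator<T> split = Spliterators.spliteratorUnknownSize(source, Spliterator.ORDERED);
        return StreamSupport.stream(split, false).map(mapper);
    }

    // Returns how many times the mapper was invoked after forcing the stream.
    static int mappedCount(Iterator<String> source) {
        AtomicInteger calls = new AtomicInteger();
        Function<String, Integer> mapper = s -> {
            calls.incrementAndGet(); // observe that the mapping code really ran
            return s.length();
        };
        Stream<Integer> lengths = produce(source, mapper);
        lengths.forEach(x -> {});    // terminal operation with a no-op consumer
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(mappedCount(Arrays.asList("a", "bb", "ccc").iterator()));
    }
}
```

This keeps the Producer lazy: callers that need real processing attach their own terminal operation, while a unit test can drain the stream with the no-op forEach.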

Questions within questions for tin can api?

Does Tin Can API support questions within questions?
If so, what would be the specification for passing data to an LRS?
I was thinking of adding IDs to each sub-question.
This would be much easier to answer if you could provide an example, but the flexibility of the Tin Can API is such that you can literally capture anything (which is also part of the complexity) with more or less grace.
Some immediate options come to mind:
Use a single interaction activity statement (likely with type choice) and use the formatting allowed to have multi-value responses (i.e. golf[,]tetris).
Use multiple statements where there is a combined statement (necessary if there is an overall result) such that there is a single main activity and each sub-question has its own statement where the sub-question has its own activity and the main activity would be stored in the context.contextActivities.parent list. When there is a combined statement in this case I would include a reference to the combined statement in the sub-question statements' context.statement property such that you can tie them all together.
Use result, context, and activity definition extensions to capture anything. This should be a last resort option, it usually makes setting things up simple but adds significant complexity on the reporting side. Though tempting because of the simplicity, unless you are trying to capture a specific type of data point (like geo-location data, math equations, etc.) usually you should try to avoid the use of extensions.
Which of the above makes the most sense is probably determined by what sort of response is being given, and by whether the questions are nested such that there is an overall result plus sub-results, or whether there are just overall results.

What does "emit" mean in general computer science terms?

I just stumbled on what appears to be a generally-known compsci keyword, "emit". But I can't find any clear definition of it in general computer science terms, nor a specific definition of an "emit()" function or keyword in any specific programming language.
I found it here, reading up on MapReduce:
https://en.wikipedia.org/wiki/MapReduce
The context of my additional searches shows it has something to do with signaling and/or events. But it seems like it is just assumed that the reader will know what "emit" is and does. For example, this article on MapReduce patterns:
https://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
There's no mention of what "emit" is actually doing, there are only calls to it. It must be different from other forms of returning data, though, such as "return" or simply "printf" or the equivalent, else the calls to "emit" would be calls to "return".
Further searching, I found a bunch of times that some pseudocode form of "emit" appears in the context of MapReduce. And in Node.js. And in Qt. But that's about it.
Context: I'm a (mostly) self-taught web programmer and system administrator. I'm sure this question is covered in compsci 101 (or 201?) but I didn't take that course.
In the context of web and network programming:
When we call a function the function may return a value.
When we call a function and the function is supposed to send its results to another function, we do not use return anymore. Instead we use emit. We expect the function to emit the results to another function as a consequence of our call.
A function can return results and emit events.
I've only ever seen emit() used when building a simple compiler in academia.
Upon analyzing the grammar of a program, you tokenize the contents of it and emit (push out) assembly instructions. (The compiler program that was written actually even contained an internal function called emit to mirror that theoretical/logical aspect of it.)
Once the grammar analysis is complete, the assembler will take the assembly instructions and generate the binary code (aka machine code).
So, I don't think there is a general CS definition for emit; however, I do know it is used in the pseudocode (and sometimes, actual code) for writing compiler programs. And that is undergraduate level computer science education in the US.
I can think of three contexts in which it's used:
Map/Reduce functions, where some input value causes 0 or more output values to go into the Reduce function
Tokenizers, where a stream of text is processed, and at various intervals, tokens are emitted
Messaging systems
I think the common thread is the "zero or more". A return provides exactly one value back from a function, whereas an "emit" is a function call that could take place zero times or several times.
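The "zero or more" idea can be sketched by passing the emit operation in as a callback. The names EmitDemo, mapLine and collect are invented for this illustration and do not come from any MapReduce framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class EmitDemo {
    // A "map" step: for one input line it may call emit zero, one, or many times,
    // which a single return value could not express directly.
    static void mapLine(String line, Consumer<String> emit) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word);
            }
        }
    }

    // Gathers everything emitted for a list of input lines.
    static List<String> collect(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            mapLine(line, out::add); // here "emit" simply appends to a list
        }
        return out;
    }

    public static void main(String[] args) {
        // The empty line emits nothing; "a b" emits twice.
        System.out.println(collect(Arrays.asList("a b", "", "c")));
    }
}
```

The caller decides where emitted values go (a list, a queue, the reduce stage), which is what distinguishes emit from return in this sketch.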
In the context of the MapReduce programming model, it is said that an operation of a map nature takes an input value and emits a result, which is nothing more than a transformation of the input.

How do Erlang actors differ from OOP objects?

Suppose I have an Erlang actor defined like this:
counter(Num) ->
    receive
        {From, increment} ->
            From ! {self(), new_value, Num + 1},
            counter(Num + 1)
    end.
And similarly, I have a Ruby class defined like this:
class Counter
  def initialize(num)
    @num = num
  end

  def increment
    @num += 1
  end
end
The Erlang code is written in a functional style, using tail recursion to maintain state. However, what is the meaningful impact of this difference? To my naive eyes, the interfaces to these two things seem much the same: You send a message, the state gets updated, and you get back a representation of the new state.
Functional programming is so often described as being a totally different paradigm than OOP. But the Erlang actor seems to do exactly what objects are supposed to do: Maintain state, encapsulate, and provide a message-based interface.
In other words, when I am passing messages between Erlang actors, how is it different than when I'm passing messages between Ruby objects?
I suspect there are bigger consequences to the functional/OOP dichotomy than I'm seeing. Can anyone point them out?
Let's put aside the fact that the Erlang actor will be scheduled by the VM and thus may run concurrently with other code. I realize that this is a major difference between the Erlang and Ruby versions, but that's not what I'm getting at. Concurrency is possible in other languages, including Ruby. And while Erlang's concurrency may perform very differently (sometimes better), I'm not really asking about the performance differences.
Rather, I'm more interested in the functional-vs-OOP side of the question.
In other words, when I am passing messages between Erlang actors, how is it different than when I'm passing messages between Ruby objects?
The difference is that in traditional languages like Ruby there is no message passing, only a method call that is executed in the same thread, and this may lead to synchronization problems if you have a multithreaded application. All threads have access to each other's memory.
In Erlang all actors are independent, and the only way to change the state of another actor is to send it a message. No process has access to the internal state of any other process.
IMHO this is not the best example for FP vs OOP. Differences usually manifest in accessing/iterating and chaining methods/functions on objects. Also, probably, understanding what is "current state" works better in FP.
Here, you are putting two very different technologies against each other. One happens to be functional, the other object-oriented.
The first difference I can spot right away is memory isolation. Messages are serialized in Erlang, so it is easier to avoid race conditions.
The second is the memory management details. In Erlang, message handling is divided underneath between the sender and the receiver. There are two sets of locks on the process structure held by the Erlang VM. Therefore, while the sender sends a message, it acquires a lock that does not block the main process operations (which are guarded by the MAIN lock). To sum up, this gives Erlang a more soft real-time nature, versus totally random behaviour on the Ruby side.
Looking from the outside, actors resemble objects. They encapsulate state and communicate with the rest of the world via messages to manipulate that state.
To see how FP works, you must look inside an actor and see how it changes state. Your example, where the state is an integer, is too simple. I don't have the time to provide a full example, but I'll sketch the code. Normally, an actor loop looks like the following:
loop(State) ->
    Message = receive
        ...
    end,
    NewState = f(State, Message),
    loop(NewState).
The most important difference from OOP is that there is no variable mutation: NewState is derived from State and may share most of its data, but the State variable itself never changes.
This is a nice property, since we never corrupt the current state. Function f will usually perform a series of transformations to turn State into NewState, and only if/when it completely succeeds do we replace the old state with the new one by calling loop(NewState).
So the important benefit is consistency of our state.
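The same shape can be imitated outside Erlang. The following is only a sketch with invented names (State, step, run), and it makes one simplifying assumption: a failed transition is caught and the old state is kept, whereas an Erlang process would typically crash and be restarted by a supervisor.

```java
import java.util.Arrays;
import java.util.List;

class ImmutableLoopDemo {
    // Immutable state: a transition always builds a new instance.
    static final class State {
        final int count;
        State(int count) {
            this.count = count;
        }
    }

    // Plays the role of f(State, Message): returns a new State, never mutates the old one.
    static State step(State state, String message) {
        switch (message) {
            case "increment":
                return new State(state.count + 1);
            case "reset":
                return new State(0);
            default:
                throw new IllegalArgumentException("unknown message: " + message);
        }
    }

    // Plays the role of loop(State): the state is replaced only after step succeeds.
    static State run(State initial, List<String> mailbox) {
        State state = initial;
        for (String message : mailbox) {
            try {
                state = step(state, message);
            } catch (IllegalArgumentException e) {
                // Assumption of this sketch: a failed transition leaves the old state in place.
            }
        }
        return state;
    }

    public static void main(String[] args) {
        State end = run(new State(0), Arrays.asList("increment", "bogus", "increment"));
        System.out.println(end.count);
    }
}
```

Because step never touches its argument, a half-finished transition cannot leave the current state in an inconsistent form.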
The second benefit I found is cleaner code, but it takes some time to get used to. Generally, since you cannot modify a variable, you will have to divide your code into many very small functions. This is actually nice, because your code will be well factored.
Finally, since you cannot modify a variable, it is easier to reason about the code. With mutable objects you can never be sure whether some part of your object will be modified, and it gets progressively worse if using global variables. You should not encounter such problems when doing FP.
To try it out, you should try to manipulate some more complex data in a functional way by using pure erlang structures (not actors, ets, mnesia or proc dict). Alternatively, you might try it in ruby with this
Erlang includes the message passing approach of Alan Kay's OOP (Smalltalk) and the functional programming from Lisp.
What you describe in your example is the message-passing approach of OOP. Erlang processes sending messages are a concept similar to Alan Kay's objects sending messages. By the way, you can also find this concept implemented in Scratch, where objects running in parallel send messages to each other.
The functional programming part is how you code the processes. For instance, variables in Erlang cannot be modified. Once they have been set, you can only read them. You also have a list data structure which works pretty much like Lisp lists, and you have funs, which are inspired by Lisp's lambda.
Message passing on one side and functional programming on the other are two quite separate things in Erlang. When coding real-life Erlang applications, you spend 98% of your time doing functional programming and 2% thinking about message passing, which is mainly used for scalability and concurrency. To put it another way, when you come to tackle a complex programming problem, you will probably use the FP side of Erlang to implement the details of the algorithm, and use message passing for scalability, reliability, etc.
What do you think of this:
thing(0) ->
    exit(this_is_the_end);
thing(Val) when is_integer(Val) ->
    NewVal = receive
        {From, F, Arg} ->
            NV = F(Val, Arg),
            From ! {self(), new_value, NV},
            NV;
        _ ->
            Val div 2
    after 10000 ->
        max(Val - 1, 0)
    end,
    thing(NewVal).
When you spawn the process, it lives on its own, decreasing its value until it reaches 0 and exits, sending an exit signal with the reason this_is_the_end to any process linked to it, unless you take care of executing something like:
ThingPid ! {self(),fun(X,_) -> X+1 end,[]}.
% which will increment the counter
or
ThingPid ! {self(),fun(X,X) -> 0; (X,_) -> X end,10}.
% which will do nothing, unless the internal value = 10 and in this case will go directly to 0 and exit
In this case you can see that the "object" lives its own life in parallel with the rest of the application, that it can interact with the outside almost without any code, and that the outside can ask it to do things you didn't know about when you wrote and compiled the code.
This is silly code, but the same principles are used to implement things like mnesia transactions, the behaviours... IMHO the concept is really different, and you have to try to think differently if you want to use it correctly. I am pretty sure that it is possible to write "OOP-like" code in Erlang, but it will be extremely difficult to avoid concurrency :o), and in the end there is no advantage. Have a look at the OTP principles, which give some guidance about application architecture in Erlang (supervision trees, pools of "1 single client servers", linked processes, monitored processes, and of course pattern matching, single assignment, messages, node clusters ...).
