What is the practical use for pack/unpack and gather/scatter?

In parallel programming, what is the practical use of pack/unpack and gather/scatter? I searched for practical examples but couldn't find any.
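For context, these names come from message-passing libraries such as MPI: MPI_Scatter hands each process a slice of an array, MPI_Gather collects the per-process results back at the root, and MPI_Pack/MPI_Unpack serialize non-contiguous data into a contiguous buffer for transmission. The practical use is the standard data-parallel pattern: scatter the data, compute on each piece in parallel, gather the answers. Below is a minimal Haskell sketch of that pattern on a single machine (the scatter, gather and parallelMap helpers are illustrative names, not a library API):

    import Control.DeepSeq (NFData)
    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- scatter: split the input into n roughly equal chunks, one per worker
    scatter :: Int -> [a] -> [[a]]
    scatter n xs = go xs
      where
        size = (length xs + n - 1) `div` n
        go [] = []
        go ys = let (chunk, rest) = splitAt size ys in chunk : go rest

    -- gather: collect the per-worker partial results back into one list
    gather :: [[a]] -> [a]
    gather = concat

    -- the scatter/compute/gather pattern: chunks are processed in parallel
    parallelMap :: NFData b => Int -> (a -> b) -> [a] -> [b]
    parallelMap n f = gather . parMap rdeepseq (map f) . scatter n

    main :: IO ()
    main = print (parallelMap 4 (* 2) [1 .. 20 :: Int])

In MPI the same shape appears across processes rather than threads, and pack/unpack earns its keep when the data being scattered is not already one contiguous block (e.g. a column of a row-major matrix, or a struct with mixed field types).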

Related

Examples where compiler-optimized functional code performs better than imperative code

One of the promises of side-effect free, referentially transparent functional programming is that such code can be extensively optimized.
To quote Wikipedia:
Immutability of data can, in many cases, lead to execution efficiency, by allowing the compiler to make assumptions that are unsafe in an imperative language, thus increasing opportunities for inline expansion.
I'd like to see examples where a functional language compiler outperforms an imperative one by producing better-optimized code.
Edit: I tried to give a specific scenario, but apparently it wasn't a good idea. So I'll try to explain it in a different way.
Programmers translate ideas (algorithms) into languages that machines can understand. At the same time, one of the most important aspects of the translation is that humans can also understand the resulting code. Unfortunately, in many cases there is a trade-off: concise, readable code suffers from slow performance and needs to be manually optimized. This is error-prone and time-consuming, and it makes the code less readable (up to totally unreadable).
The foundations of functional languages, such as immutability and referential transparency, allow compilers to perform extensive optimizations, which could replace manual optimization of code and free programmers from this trade-off. I'm looking for examples of ideas (algorithms) and their implementations, such that:
the (functional) implementation is close to the original idea and is easy to understand,
it is extensively optimized by the compiler of the language, and
it is hard (or impossible) to write similarly efficient code in an imperative language without manual optimizations that reduce its conciseness and readability.
I apologize if it is a bit vague, but I hope the idea is clear. I don't want to give unnecessary restrictions on the answers. I'm open to suggestions if someone knows how to express it better.
My interest isn't just theoretical. I'd like to use such examples (among other things) to motivate students to get interested in functional programming.
At first, I wasn't satisfied by a few examples suggested in the comments. On second thought I take my objections back; those are good examples. Please feel free to expand them into full answers so that people can comment and vote on them.
(One class of such examples will most likely be parallelized code, which can take advantage of multiple CPU cores. Often in functional languages this can be done easily without sacrificing code simplicity (like in Haskell by adding par or pseq in appropriate places). I'd be interested in such examples too, but also in other, non-parallel ones.)
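(For reference, the par/pseq idiom mentioned above looks like the following minimal sketch, using the classic parallel Fibonacci toy example:

    import Control.Parallel (par, pseq)

    -- spark the first recursive call in parallel, evaluate the
    -- second one locally, then combine the results
    fib :: Int -> Integer
    fib n
      | n < 2     = fromIntegral n
      | otherwise = x `par` (y `pseq` (x + y))
      where
        x = fib (n - 1)
        y = fib (n - 2)

    main :: IO ()
    main = print (fib 30)

Compiled with ghc -threaded and run with +RTS -N, the sparks are spread over the available cores; a realistic version would add a granularity cutoff so that tiny subproblems are not sparked at all.)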
There are cases where the same algorithm will optimize better in a pure context. Specifically, stream fusion allows an algorithm that consists of a sequence of loops of widely varying form (maps, filters, folds, unfolds) to be composed into a single loop.
The equivalent optimization in a conventional imperative setting, with mutable data in loops, would require a full effect analysis, which no compiler does.
So at least for the class of algorithms that are implemented as pipelines of ana- and catamorphisms on sequences, you can guarantee optimization results that are not possible in an imperative setting.
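As a concrete sketch of such a pipeline, using the real API of the vector package (where stream fusion is implemented): the following composition of enumFromTo, filter, map and sum should fuse into a single loop with no intermediate vectors allocated.

    import qualified Data.Vector.Unboxed as V

    -- map/filter/fold pipeline: fused into one loop by the vector library
    sumSquaredEvens :: Int -> Int
    sumSquaredEvens n = V.sum (V.map (^ 2) (V.filter even (V.enumFromTo 1 n)))

    main :: IO ()
    main = print (sumSquaredEvens 1000000)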
A very recent paper, Haskell beats C using generalised stream fusion by Geoff Mainland, Simon Peyton Jones, Simon Marlow, and Roman Leshchinskiy (submitted to ICFP 2013), describes such an example. Abstract (the interesting part is the final sentence):
Stream fusion [6] is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the framework at all.

In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of the GHC compiler and vector library. Benchmarks show that high-level Haskell code written using our compiler and libraries can produce code that is faster than both compiler- and hand-vectorized C.
This is just a note, not an answer: GCC has a pure function attribute suggesting it can take account of purity; the obvious reasons are remarked on in the GCC manual.
I would think that 'static single assignment' imposes a form of purity -- see the links at http://lambda-the-ultimate.org/node/2860 or the Wikipedia article.
make and various build systems perform better for large projects by assuming that the various build steps are referentially transparent; as such, they only need to rerun steps whose inputs have changed.
For small to medium-sized changes, this can be a lot faster than building from scratch.

Writing portable scheme code. Is anything "standard" beyond R5RS itself?

I'm learning Scheme and until now have been using Guile. I'm really just learning as a way to teach myself a functional programming language, but I'd like to publish an open source project of some sort to reinforce the study. Not sure what yet... I'm a web developer, so probably something webby.
It's becoming apparent that publishing Scheme code isn't very easy to do, with all these different implementations and no real standards beyond the core of the language itself (R5RS). For example, I'm almost certainly going to need to do basic IO on disk and over a TCP socket, along with string manipulation such as scanning/regex, which seems not to be covered by R5RS, unless I'm missing it in the document. It seems like Scheme is more of a "concept" than a practical language... is this a fair assessment? Perhaps I should look to something like Haskell if I want to learn a functional programming language that lends itself more to use in open source projects?
In reality, how much pain do the differing Scheme implementations pose when you want to publish an open source project? I don't really fancy having to maintain 5 different functions for basic things like string manipulation under the various mainstream implementations (Chicken, Guile, MIT, DrRacket). How many people actually write Scheme for cross-implementation compatibility, as opposed to being tightly coupled with the library functions that only exist in their own Scheme?
I have read http://www.ccs.neu.edu/home/dorai/scmxlate/scheme-boston/talk.html, which doesn't fill me with confidence ;)
EDIT | Let's re-define "standard" as "common".
I believe that in Scheme, portability is a fool's errand, since Scheme implementations are more different than they are similar, and there is no single implementation that other implementations try to emulate (unlike Python and Ruby, for example).
Thus, portability in Scheme is analogous to using software rendering for writing games "because it's in the common subset between OpenGL and DirectX". In other words, it's a lowest common denominator—it can be done, but you lose access to many features that the implementation offers.
For this reason, while SRFIs generally have a portable reference implementation (where practical), some of them are accompanied by notes that a quality Scheme implementation should tailor the library to use implementation-specific features in order to function optimally.
A prime example is case-lambda (SRFI 16); it can be implemented portably, and the reference implementation demonstrates it, but it's definitely less optimal compared to a built-in case-lambda, since you're having to implement function dispatch in "user" code.
Another example is stream-constant from SRFI 41. The reference implementation uses an O(n) simulation of circular lists for portability, but any decent implementation should adapt that function to use real circular lists so that it's O(1).†
The list goes on. Many useful things in Scheme are not portable—SRFIs help make more features portable, but there's no way that SRFIs can cover everything. If you want to get useful work done efficiently, chances are pretty good you will have to use non-portable features. The best you can do, I think, is to write a façade to encapsulate those features that aren't already covered by SRFIs.
† There is actually now a way to implement stream-constant in an O(1) fashion without using circular lists at all. Portable and fast for the win!
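For intuition, the sharing trick behind an O(1) constant stream is the same one Haskell's lazy lists rely on (an analogy in another language, not the SRFI reference code): a stream whose tail points back at the stream itself occupies constant space no matter how far you read into it.

    -- an infinite constant stream built from a single shared cell
    ones :: [Int]
    ones = 1 : ones

    -- cycling several elements works the same way: the tail of ys
    -- points back at its own front, forming a circular structure
    cycled :: [Int]
    cycled = let ys = [1, 2, 3] ++ ys in ys

    main :: IO ()
    main = print (take 7 cycled)  -- prints [1,2,3,1,2,3,1]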
Difficult question.
Most people decide to be pragmatic. If portability between implementations is important, they write the bulk of the program in standard Scheme and isolate the non-standard parts in (smallish) libraries. There have been various approaches to how exactly to do this. One recent effort is SnowFort.
http://snow.iro.umontreal.ca/
An older effort is SLIB.
http://people.csail.mit.edu/jaffer/SLIB
If you look for - or ask about - libraries for regular expressions and lexers/parsers, you'll quickly find some.
Since the philosophy of R5RS is to include only those language features that all implementors agree on, the standard is small - but also very stable.
However, for "real world" programming R5RS might not be the best fit.
Therefore R6RS (and R7RS?) include more "real world" libraries.
That said, if you only need portability because it seems to be the Right Thing, then reconsider carefully whether you really want to put the effort in.
I would simply write my program in the implementation I know best, then port it afterwards if necessary. This often turns out to be easier than expected.
I write a blog that uses Scheme as its implementation language. Because I don't want to alienate users of any particular implementation of Scheme, I write in a restricted dialect of Scheme that is based on R5RS plus syntax-case macros plus my Standard Prelude. I don't find that overly restrictive for the kind of algorithmic programs that I write, but your needs may be different. If you look at the various exercises on the blog, you will see that I wrote my own regular-expression matcher, that I've done a fair amount of string manipulation, and that I've snatched files from the internet by shelling out to wget (I use Chez Scheme -- users have to provide their own non-portable shell mechanism if they use anything else); I've even done some limited graphics work by writing ANSI terminal sequences.
I'll disagree just a little bit with Jens. Instead of porting afterwards, I find it easier to build in portability from the beginning. I didn't use to think that way, but my experience over the last three years shows that it works.
It's worth pointing out that modern Scheme implementations are themselves fairly portable; you can often port whole programs to new environments simply by bringing the appropriate Scheme along. That doesn't help library programmers much, though, and that's where R7RS-small, the latest Scheme definition, comes in. It's not widely implemented yet, but it provides a larger common core than R5RS.

Erlang Matrix Library

I'm looking for a robust library to handle matrices in Erlang. Nothing fancy, just efficient handling of multiplication and basic operations. I could do that with lists etc., but I'm sure my implementation wouldn't be very efficient!
The presentation High Performance Technical Computing in Erlang talks about some Erlang bindings to BLAS etc. Hope this is helpful.

In VB6, should I prefer sqr() or ()^0.5?

I am doing some numerical analysis work in VB6, and the question arises of which of
sqr(x)
or
x^0.5
I should use.
Is there any difference in the method used to evaluate these two expressions, and if so, which of them should I prefer?
VB6 does not document the method it uses to evaluate sqr() or x^0.5. Empirically, sqr() is much faster, which could mean that a dedicated root-finding algorithm is used. The use of a specialized algorithm could mean that sqr() also has better numerical stability, but I have no information regarding this.
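The same trade-off shows up in other languages, which supports the empirical finding. As an analogy in GHC Haskell (not VB6 behaviour): sqrt on Double compiles down to a dedicated square-root primitive, while x ** 0.5 goes through the general power routine, so the dedicated form is typically both faster and at least as accurate.

    main :: IO ()
    main = do
      let x = 2.0 :: Double
      print (sqrt x)    -- dedicated square-root primitive
      print (x ** 0.5)  -- general power routine; usually slower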

pseudo code - formal rules

Has anyone set out a proposal for a formal pseudocode standard?
Or is it better for pseudocode to remain a 'rough' standard that relies on the reader's understanding?
It is better to be a rough standard; the intent of pseudocode is to be human-readable, not machine-readable, and the goal of actually writing pseudocode is to convey a higher-level description of an algorithm while being unconcerned (typically) with the minutiae of the implementation. My opinion is that for it to qualify as pseudocode there has to be some ambiguity, and your goal should be a clear conveyance of your algorithmic intentions. Stick to common control structures, declarations and concepts that are paradigmatic to your target audience or language and you'll get the point across. If you start getting too formal, you're getting too close to writing actual code.
NOT AN ANSWER.
IMHO, forcing a standard (a pseudocode syntax, if you will) will cause people to be less clear about what they want to say.
Browse around, try to gather some knowledge about used conventions, and do your best to be clear.
Although this is by no means a formal proposal, Python is considered by some to be Executable Pseudocode.
In my opinion it depends on the people who are using your programs. In algorithms books, the pseudocode is very formal and close to mathematics, but each algorithm is also described in a few paragraphs of prose, so that style suits scientific settings.
If I develop in other environments, I would prefer a less formal style, because that is easier for most people to understand. I prefer spoken words over formalism; if you want formalism, you can read the code instead.
