Interoperability (interop)

I am having a bit of trouble understanding what "interoperability" and "interoperability concerns" mean in terms of Software Engineering. Can someone please give me an example of what an interoperability concern might be?
Thanks in advance

Interoperability means intercommunication between code written in different programming languages. For example, when you call a WINAPI function (which is in C) from your C# code, you are using interop.
Interop concerns are therefore concerns about interoperability. For example, when writing a C++ DLL, you might want to keep your interface types PODs (plain old data) to enable later interoperability with C and .NET.
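As an illustration, here is a minimal sketch of what such an interop-friendly C++ DLL interface might look like; the type and function names are made up for the example. Only POD types and plain C linkage cross the boundary, so the same header can be consumed from C, and the functions can be bound from C# with [DllImport].

```cpp
// interop_api.h -- hypothetical interface of a C++ DLL intended to be
// callable from C and from .NET via P/Invoke.
#ifdef __cplusplus
extern "C" {
#endif

// A POD struct: no constructors, no virtual functions, predictable layout,
// so C and the .NET marshaller can map it directly.
typedef struct Point3D {
    double x;
    double y;
    double z;
} Point3D;

// Exported functions take and return PODs (or pointers to them), never
// std::string, std::vector, or other C++-only types.
__declspec(dllexport) double distance3d(Point3D a, Point3D b);
__declspec(dllexport) int    transform_points(Point3D* points, int count);

#ifdef __cplusplus
}
#endif
```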

From wiki:
According to ISO/IEC 2382-01, Information Technology Vocabulary, Fundamental Terms, interoperability is defined as follows: "The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units".
Example: The interoperability of .doc documents between different word processors.

See interoperability in Wikipedia: "the ability of two or more systems or components to exchange information and to use the information that has been exchanged."

Related

Apache Beam and ETL processes

Given following processes:
manually transforming huge .csv files via rules (using MS Excel or Excel-like software) and sharing them via FTP
scripts (usually written in Perl or Python) which basically transform data, preparing it for other processes.
APIs batch-reading from files or other origin sources and updating their corresponding data models.
Spring Boot deployments used (or abused), in part, to regularly collect and aggregate data from files or other sources.
And given these problems / areas of improvement:
Standardization: I'd like (as far as it makes sense) to propose a unified, powerful tool that natively deals with these types of (kind of big) data transformation workflows.
Raising the abstraction level of the processes (related to the point above): many of the "tasks/jobs" I mentioned above are seen by the teams using them in a very technical, low-level, task-like way. I believe that a higher-level view of these processes/flows, highlighting their business meaning, would help these processes document themselves better and would also help establish a ubiquitous language that different stakeholders can refer to and think about unambiguously.
IO bottlenecks and resource utilization (technical): some of those processes fail more often than would be desirable (or take a very long time to finish) due to memory or network bottlenecks. Though it is clear that hardware has limits, resource utilization doesn't seem to have been a priority in many of these data transformation scripts.
Do the Dataflow model, and specifically the Apache Beam implementation paired with either Flink or Google Cloud Dataflow as a backend runner, offer a proven solution to these "mundane" topics? The material on the internet mainly focuses on the unified streaming/batch model and typically covers more advanced features like streaming, event windowing, watermarks, late events, etc., which do look very elegant and promising indeed, but I have some concerns regarding tool maturity and long-term community support.
It's hard to give a concrete answer to such a broad question, but I would say that, yes, Beam/Dataflow is a tool that can handle this kind of thing. Even though the documentation focuses on "advanced" features like windowing and streaming, lots of people are using it for more "mundane" ETL. For questions about tool maturity and community, you could consider sources like Forrester reports, which often speak of Dataflow.
You may also want to consider pairing it with other technologies like Airflow/Composer.

Distributed array in MPI for parallel numerics

In many distributed computing applications, you maintain a distributed array of objects. Each process manages a set of objects that it may read and write exclusively, and furthermore a set of objects that it may only read (the content of which is authored by, and frequently received from, other processes).
This is very basic and is likely to have been done a zillion times by now, for example with MPI. Hence I suppose there is something like an open source extension for MPI which provides the basic capabilities of a distributed array for computing.
Ideally, it would be written in C(++) and mimic the official MPI standard interface style. Does anybody know of anything like that? Thank you.
From what I gather from your question, you're looking for a mechanism for allowing a global view (read-only) of the problem space, but each process has ownership (read-write) of a segment of the data.
MPI is simply an API specification for inter-process communication for parallel applications and any implementation of it will work at a level lower than what you are looking for.
It is quite common in HPC applications to perform data decomposition in the way you mentioned, with MPI used to synchronise shared data between processes. However, each application has different sharing patterns and requirements (some may wish to exchange only halo regions with neighbouring nodes, perhaps using non-blocking calls to overlap communication with computation) so as to improve performance by making use of knowledge of the problem domain.
The thing is, using MPI to sync data across processes is simple, but implementing a layer above it to handle general-purpose distributed array synchronisation that is easy to use yet flexible enough to handle different use cases can be rather tricky.
Apologies for taking so long to get to the point, but to answer your question, AFAIK there isn't an extension to MPI or a library that can efficiently handle all use cases while still being easier to use than simply using MPI. However, it is possible to work above the level of MPI while maintaining distributed data. For example:
Use the PGAS model to work with your data. You can then use libraries such as Global Arrays (interfaces for C, C++, Fortran, Python) or languages that support PGAS such as UPC or Co-Array Fortran (soon to be included in the Fortran standard). There are also languages designed specifically for this form of parallelism, e.g. Fortress, Chapel, X10.
Roll your own. For example, I've worked on a library that uses MPI to do all the dirty work but hides the complexity by providing custom data types for the application domain, and exposing APIs such as:
X_Create(MODE, t_X) : instantiate the array, called by all processes with the MODE indicating if the current process will require READ-WRITE or READ-ONLY access
X_Sync_start(t_X) : non-blocking call to initiate synchronisation in the background.
X_Sync_complete(t_X) : called when the data is required; blocks if synchronisation has not completed.
... and other calls to delete data as well as perform domain specific tasks that may require MPI calls.
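To make the "roll your own" approach a little more concrete, here is a minimal sketch (not the library described above; the function name and the 1-D decomposition are assumptions for illustration) of a block-distributed array where each process owns its interior cells read-write, keeps read-only ghost copies of its neighbours' boundary values, and synchronises them with non-blocking MPI calls:

```cpp
// Hypothetical sketch: a 1-D block-distributed array with one ghost cell
// on each side, synchronised with non-blocking MPI calls.
// Assumes MPI_Init has already been called and a.size() >= 4.
#include <mpi.h>
#include <vector>

void exchange_ghosts(std::vector<double>& a, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    // a[0] and a[n-1] are read-only ghost cells; everything else is owned.
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    int n = static_cast<int>(a.size());

    MPI_Request reqs[4];
    // Receive the neighbours' boundary values into our ghost cells...
    MPI_Irecv(&a[0],     1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(&a[n - 1], 1, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    // ...and send our own boundary values to the neighbours.
    MPI_Isend(&a[1],     1, MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(&a[n - 2], 1, MPI_DOUBLE, right, 0, comm, &reqs[3]);

    // Computation on interior cells could be overlapped here.
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
}
```

A real library would wrap this behind an interface like the X_Create / X_Sync_* calls above and support arbitrary decompositions, which is where most of the complexity lives.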
To be honest, in most cases it is often simpler to stick with basic MPI or OpenMP, or if one exists, using a parallel solver written for the application domain. This of course depends on your requirements.
For dense arrays, see Global Arrays and Elemental (Google will find them for you).
For sparse arrays, see PETSc.
I know this is a really short answer, but there is too much documentation of these elsewhere to bother repeating it.

How to design programs/software to take advantage of multiprocessors

To take advantage of multiprocessors:
1. Do you need to select any specific programming language?
2. Are there any design patterns?
3. Can you schedule each thread on a different available processor?
I am trying to understand good practices to write excellent programs which take full advantage of the available processors.
Writing proper parallel code is hard.
I don't know of any textbooks, but I've found Herb Sutter's series on Effective Concurrency to be pretty good.
Each language has varying support for multithreading.
Yes, you can read about them in operating systems textbooks.
It depends on your programming language and your tools. Often, it is easier to leave this decision to the operating system.
If you are just starting out with multithreading, you may want to look at some books: https://stackoverflow.com/search?q=multithreading+book.
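As a small concrete illustration of the language-support and scheduling points above, here is a minimal C++11 sketch (the work being split is just a toy summation) that spawns one worker per available hardware thread and leaves the mapping of threads to processors to the operating system:

```cpp
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    // Ask the runtime how many hardware threads are available.
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;  // fallback if the value is not known

    std::vector<long long> partial(n, 0);
    std::vector<std::thread> workers;

    // Split a simple summation across n threads; the OS schedules each
    // std::thread onto whichever processor it sees fit.
    const long long total_items = 100000000;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([i, n, total_items, &partial] {
            long long sum = 0;
            for (long long v = i; v < total_items; v += n) sum += v;
            partial[i] = sum;  // each thread writes only its own slot
        });
    }
    for (auto& t : workers) t.join();

    std::cout << std::accumulate(partial.begin(), partial.end(), 0LL) << "\n";
}
```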
Unless the problem you have to solve specifically asks for parallel computing, I would choose the programming language best suited to solving the real problem.
The allocation of threads to processors is usually best left to the operating system. You can influence that allocation by using different thread priorities.
If you use a language with a runtime environment (Java, .NET), then you have the additional layer of threads within the runtime environment vs. native threads.
To fully use the potential of multiple processors, the problem you have must be one that lends itself to multiprocessing. There is no real use in a heavily multithreaded data entry form.
hth
Mario

FPGA Place & Route

For programming FPGAs, is it possible to write my own place & route routines? [The point is not that mine would be better; the point is whether I have the freedom to do so.] Or does the place & route stage output undocumented bitfiles, essentially forcing me to use proprietary tools?
Thanks!
There's been some discussion of this on comp.arch.fpga in the past. The conclusion is generally that unless you want to attract intense legal action from the FPGA companies, you probably don't want to do something like this. Bitfile formats are closely guarded secrets of the FPGA companies, and you would likely have to understand the file format in order to do what you want to do. That implies that you would need to reverse engineer the format, and that (if you made your tool public in any way) would get you a lawsuit in short order.
I will add that there probably are intermediate files and that you likely wouldn't read or write the bitfile itself to do what you want to do, but those intermediate files tend to be undocumented as well. Read the EULA for your FPGA synthesis tool (ISE from Xilinx, for example) - any kind of reverse engineering is strictly forbidden. It seems that the only way we'll ever have open source alternatives in this space is for an open source FPGA architecture to emerge.
I agree with annccodeal, but to amplify a little bit, on Xilinx, there may be a few ways to do this. The XDL file format allows (or used to allow) explicit placement and routing. In addition, it should be possible to script the FPGA Editor to implement custom routing.
As regards placement, there is a rich infrastructure to constrain technology mapping of logic to primitives and to control placement of those primitives. For example LUT_MAP constraints can control technology mapping and LOC and RLOC constraints can determine placement. In practice, these allow the experienced designer great control over how a design is implemented without requiring them to duplicate man-centuries of software development to generate a bitstream directly.
You may also find interesting the current state-of-the-art FPGA CAD research software, such as VPR. In my opinion these are challenged to keep up with the vendors' own tools, which must cope with modern heterogeneous FPGAs with splittable 6-LUTs, DSP blocks, etc.
Happy hacking.

Java or C for image processing

I am looking into learning a programming language (taking a course) for use in image analysis and processing, and possibly bioinformatics too. Which language should I go for: C or Java? Other languages are not an option for me. Also, please explain why either of the languages is a better option for my application.
You have to balance raw processing power and developer time. Java is getting pretty fast too, and if you finish a couple of days early, you have more time to process the data.
It all depends on volume.
More importantly, I suggest you look for the libraries and frameworks which already exist, see which fits closest to what needs to be done, and choose whatever language the library was written in, be it C, Java, or Fortran.
For Java I found BioJava.org as a starting point.
Java isn't TOO bad for image processing. If you manage your source objects appropriately, you'll have a chance at getting reasonable performance out of it. Some of the things I like with Java that relate to imaging:
Java Advanced Imaging
2D Graphics utilities (take a look at BufferedImages)
ImageJ, etc
Get it to work with JAMA
Ask someone in the field you're working in (i.e., bioinformatics).
For solar images, the majority of the work is done in IDL, Fortran, Matlab, Python, C or Perl (PDL). (Roughly in that order ... IDL is definitely first, as the majority of the instrument calibration software is written in IDL)
Because of this, there's a lot of toolkits already written in those languages for our field. Frequently, with large reference data sets, the PI releases some software package as an example of how to interpret / interact with the data format. I can only assume that Bioinformatics would be similar.
If you end up going a different route than the rest of the field, you're going to have a much harder time working with other scientists as you can't share code as easily.
Note -- There are a number of visualization tools that have been released in our field that were written in Java, but they assume that the images have already been prepped by some other process.
The most popular computer vision (image processing, image analysis) library is OpenCV which is written in C++, but can also be used with Python, and Java (official OpenCV4Android and non-official JavaCV).
There are bioinformatics applications that are basically image processing, so OpenCV will take care of those. But there are also some which are not; they are, for example, based on machine learning, so if you need something other than image/video processing you will need another bioinformatics-oriented library. OpenCV also has a machine learning module, but it is more focused on computer vision.
About the languages, C vs Java: most has been said in the other answers. I should add that these libraries are now C++ based and not plain C. If your applications have real-time processing needs, C++ will probably be better for that; if not, Java will be more than enough, as it is friendlier.
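For a feel of what OpenCV code looks like on the C++ side, here is a minimal sketch of a typical pre-processing chain (the file names are just placeholders):

```cpp
#include <opencv2/opencv.hpp>

int main() {
    // Load an image from disk (placeholder file name).
    cv::Mat img = cv::imread("cells.png");
    if (img.empty()) return 1;

    // A typical pre-processing chain: grayscale, blur, Otsu threshold.
    cv::Mat gray, blurred, mask;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0);
    cv::threshold(blurred, mask, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    cv::imwrite("cells_mask.png", mask);
    return 0;
}
```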
Ideally, you would use something like Java or (even better) Python for "high-level" stuff, and compile in C the routines that require a lot of processing power (for instance using Cython, etc).
Some scientific libraries exist for Python (SciPy and NumPy), and they are a good start, although it isn't yet straightforward to combine Python and C (you need to tweak things a bit).
Just my two pence worth: Java doesn't allow the use of pointers, as opposed to C/C++ or C#. So if you are going to manipulate pixels directly, i.e. write your own image processing functions, they will be much slower than the equivalent in C++. On the other hand, C++ is a total nightmare of a language compared to Java; it will take you at least twice as long to write the equivalent bit of code in C++. So with all the productivity gain you can probably afford to buy a computer that makes up for the difference in runtime ;-)
I know other languages aren't an option for you, but personally I can highly recommend C# for image processing or computer vision: it allows pointers, and hence IP functions in C# are only half as slow as in C++ (an acceptable trade-off, I think), and it has excellent integration with native C++ and a good wrapper library for OpenCV.
Disclaimer: I work for TunaCode.
If you have to make a choice between different languages to get started in image processing, I would recommend starting with C++. You get raw pointer access, which is a must if you want to operate on individual pixels.
Next, what kind of imaging are you interested in? Just-for-fun image filters, or some heavy stuff like motion estimation, tracking and detection? For the latter I would recommend you take a look at CUVILib, since sooner or later you will need performance in your imaging functionality, and that's what CUVI provides. You can use it standalone if it serves your purposes, or you can plug it in with other libraries like Intel IPP, ITK, OpenCV, etc.
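To illustrate the raw-pointer point made in the answers above, here is a minimal sketch of a per-pixel operation written directly against the pixel buffer; it assumes a hypothetical 8-bit grayscale image stored in a flat array:

```cpp
#include <cstdint>
#include <vector>

// Invert an 8-bit grayscale image in place by walking the raw pixel buffer
// with a pointer -- the kind of loop that is awkward in Java but natural in
// C/C++ (and possible in C# with unsafe code).
void invert(std::vector<std::uint8_t>& pixels, int width, int height) {
    std::uint8_t* p = pixels.data();
    std::uint8_t* end = p + static_cast<std::size_t>(width) * height;
    for (; p != end; ++p) {
        *p = static_cast<std::uint8_t>(255 - *p);
    }
}
```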

Resources