Could anyone please explain the differences between monolithic kernels, microkernels, and exokernels?
There are many differences between these kernel types. They differ in how they implement kernel services such as memory management, process management, etc.
A monolithic kernel implements all kernel services itself, so it is larger, whereas an exokernel implements almost nothing in the kernel, so it is much lighter; a microkernel sits between the two.
With an exokernel, everything is implemented outside the kernel, so the application developer has to decide what to do with the allocated resources; with a monolithic kernel there is no such burden on the application developer.
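To make the contrast concrete, here is a tiny C++ sketch (purely conceptual, not real kernel code; the file-server process is simulated by an ordinary function call) of how the same file read is served: in a monolithic kernel it is one system call handled entirely inside the kernel, while in a microkernel it becomes an IPC message to a user-space file server.

    #include <cstdio>
    #include <cstring>
    #include <iostream>
    #include <string>

    // Monolithic kernel: one system call; the VFS, filesystem and block drivers
    // all live inside the kernel, so read(2) returns the data directly.
    std::string read_file_monolithic(const std::string& path) {
        return "contents of " + path;              // stand-in for the work the kernel does
    }

    // Microkernel: the kernel only delivers messages; a user-space file server
    // does the actual work, so a "read" becomes an IPC request/reply pair.
    struct FsRequest { char op[8]; char path[64]; };
    struct FsReply   { char data[128]; };

    FsReply file_server_handle(const FsRequest& req) {          // runs in a separate server process
        FsReply reply{};
        std::snprintf(reply.data, sizeof reply.data, "contents of %s", req.path);
        return reply;
    }

    std::string read_file_microkernel(const std::string& path) {
        FsRequest req{};                                         // build the IPC message
        std::strncpy(req.op, "read", sizeof req.op - 1);
        std::strncpy(req.path, path.c_str(), sizeof req.path - 1);
        FsReply reply = file_server_handle(req);                 // the kernel would merely deliver this
        return reply.data;
    }

    int main() {
        std::cout << read_file_monolithic("/etc/hostname") << "\n";
        std::cout << read_file_microkernel("/etc/hostname") << "\n";
    }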
To learn about more differences, the following link may be useful:
https://gettech1.wordpress.com/2014/04/24/difference-between-monolithic-microkernel-and-exokernel/
I am testing a custom FPGA NIC, and I need to send management information (such as header info for matching) and traffic data to it using a traffic generator from user space.
The driver built for the FPGA is a modified version of IXGBE with DMA support for management, and it also supports DPDK for kernel bypass to achieve high throughput.
I am trying to understand how the various pieces of software (driver, user-space application, etc.) should be stacked/connected to each other so that I can read from and write to PCIe on the NIC using a set of scripts from user space.
I have also been looking at this project
https://github.com/CospanDesign/python-pci
which is useful; however, it is based on the Xilinx XDMA.
I would appreciate any help or pointers on this.
Sorry, the question is too broad. For such a broad question there is a generic answer: have a look at inter-process communication (IPC):
https://en.wikipedia.org/wiki/Inter-process_communication
There are a variety of methods, such as Unix sockets, shared memory, and netlink, for communicating between user-space processes, as well as a variety of methods for communicating between user space and kernel space.
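For example, here is a minimal C++ sketch of the first option: a Unix domain socket pair between two user-space processes. The "register write" command string is purely hypothetical and only stands in for whatever management messages your scripts would send.

    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int sv[2];                                         // one end per process
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
            perror("socketpair");
            return 1;
        }

        pid_t pid = fork();
        if (pid == 0) {                                    // child: plays the role of the control script
            close(sv[0]);
            const char msg[] = "write reg 0x10 0xdeadbeef";   // hypothetical management command
            write(sv[1], msg, sizeof msg);
            close(sv[1]);
            return 0;
        }

        close(sv[1]);                                      // parent: plays the role of the process talking to the driver/DPDK
        char buf[128] = {0};
        if (read(sv[0], buf, sizeof buf - 1) > 0)
            printf("received command: %s\n", buf);
        close(sv[0]);
        waitpid(pid, nullptr, 0);
        return 0;
    }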
Just pick the one that suits you best and try something. If it fails, come back to SO and ask ;)
The reference material simply states that JDK7 is required for Spring XD.
What are the minimum requirements (RAM, CPU, Disk) for hosts meant to run Spring XD Admin?
What are the minimum requirements (RAM, CPU, Disk) for hosts meant to run Spring XD Containers?
The answer in both cases is: it depends on what you need to use them for. Spring XD appears to be designed for high-throughput computing (HTC), so unlike traditional high-performance computing, adding GPUs or coprocessors would probably not be particularly beneficial here. If you just want to try it out and happen to have several servers lying around, then as long as you have something powerful enough to run an OS that supports Java, you could probably at least make it work. If you are in the initial stages of testing Spring XD to see whether it will integrate with your existing infrastructure, this will let you try it out. If you have passed that stage of testing, are confident that Spring XD will work, and would like to purchase hardware to optimize its performance, feel free to continue reading.
I have not used Spring XD before, but based on the documentation I have been reading and some experience with HTC, there are a few considerations for setting up systems to run it. If you take a look at the diagram from the docs and read a little about the services, it seems like the Admin, ZooKeeper, Analytics Repo, and Batch Job DB could be hosted on virtual machines (VMs) under the hypervisor of your choice.
Running several of the subsystems required for the distributed model on VMs would give you the ability to scale resources as necessary; e.g. to begin with, a single hypervisor system may be sufficient to run everything, but as traffic/use grows it may be desirable to separate the VMs onto multiple hypervisors and give some of the VMs additional resources.
The containers seem to behave like many other virtualization or containerization schemes for HTC, where more powerful systems (e.g. lots of RAM, SSD storage) let users run more containers on a single physical box.
To adequately assess the needs of a new system running any application, it is important to understand what the limiting factor on the problem is: is it memory-bound, I/O-bound, or CPU-bound? For large-scale parallel applications there are a variety of tools for profiling code and determining where bottlenecks occur. TAU is a common profiling utility in HPC, and there are several proprietary offerings available as well.
Once the limitations of the program are clear, speccing out a system with hardware that reduces/minimizes the issue is a lot easier, and normally less expensive. Hopefully this information is helpful.
Additions based on comments:
It seems like it would run with 128k of memory if you have an OS that will boot and run Java and any other requirements. If there is a backend storage setup somewhere, like a standalone DB server that can be used for the databases as described in the DB Config section of the guide, it seems like only a small amount of storage would be necessary.
Depending on how you deploy the images for the Admin OS, that may not even be necessary, as you could use KIWI to create and deploy a custom OS image of your choosing, with configuration files and other customizations embedded in the image. This image could be loaded over the network via PXE or in one of the other output formats KIWI supports, such as VMs, bootable USB, and more.
The exact configuration of the systems running Spring XD will depend on the end goals, available infrastructure and a number of other things. It seems like the Spring XD Admin node could be run on most infrastructure servers. Factors such as reliability, stability and desired performance must also be considered when choosing hardware.
Q: Will Spring XD Admin run on a system with RaspberryPi like specs?
A: Based on the documentation, yes.
Q: Will it run with good performance or reliably on such a system?
A: Probably not if being used for extended periods of time or for large amounts of traffic.
I am trying to understand how wireless works in Linux. I started with the wpa_supplicant and hostapd applications, with the help of their documentation and source code, and understood the flow and basic functionality of:
wpa_supplicant, nl80211 (driver interface)
libnl library (socket communication between user space and kernel using the netlink protocol)
cfg80211 (kernel interface used for communicating with the driver from user space with the help of the nl80211 implementation in user space), mac80211 (software media access control layer)
driver (loadable driver, e.g. ath6kl, the Atheros driver).
I understood the above software flow, and in my exploration I learned that, to give developers freedom, the MAC layer is implemented in software (the popular implementation being mac80211).
Is this true in all cases? If so, what are the pros and cons of SoftMAC and HardMAC? Does the cfg80211 interface in the kernel communicate directly with the driver? Who communicates with mac80211, and how?
Thanks in advance.
The term 'SoftMAC' refers to a wireless network interface controller (WNIC) which does not implement the MAC layer in hardware; rather, it expects the drivers to implement the MAC layer.
'HardMAC' (also called 'FullMAC') describes a WNIC which implements the MAC layer in hardware.
The advantages of SoftMAC are:
Potentially lower hardware costs
Possibility to upgrade to newer standards by updating the driver only
Possibility to correct faults in the MAC implementation by updating the driver only
An additional advantage (in the Linux kernel at least) is that many different drivers for different types of WNIC can all share the same MAC implementation, provided by the kernel itself.
Despite the advantages, not all WNICs use SoftMAC. The main advantage of HardMAC is that, since the MAC functions are implemented in hardware, they contribute less CPU load.
mac80211 is the framework within the Linux kernel for implementing SoftMAC drivers. It implements the cfg80211 callbacks which would otherwise have to be implemented by the driver itself, and also implements the MAC layer functions. As such it goes between cfg80211 and the SoftMAC drivers.
HardMAC drivers have to implement the cfg80211 interfaces fully themselves.
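To illustrate the user-space-to-kernel part of that path concretely, here is a rough libnl-based C++ sketch (assumptions: the libnl-3/libnl-genl-3 development headers are installed; error handling and attribute parsing are omitted). It sends an NL80211_CMD_GET_INTERFACE dump request over the nl80211 generic netlink family; cfg80211 receives it in the kernel and, for a SoftMAC device, answers with the help of mac80211 and the driver.

    // build: g++ nl80211_dump.cpp $(pkg-config --cflags --libs libnl-genl-3.0)
    #include <netlink/netlink.h>
    #include <netlink/genl/genl.h>
    #include <netlink/genl/ctrl.h>
    #include <linux/nl80211.h>
    #include <cstdio>

    int main() {
        struct nl_sock* sock = nl_socket_alloc();              // netlink socket via libnl
        if (!sock || genl_connect(sock) < 0) {
            std::fprintf(stderr, "netlink setup failed\n");
            return 1;
        }

        int family = genl_ctrl_resolve(sock, "nl80211");       // family registered by cfg80211
        if (family < 0) {
            std::fprintf(stderr, "nl80211 family not found\n");
            return 1;
        }

        struct nl_msg* msg = nlmsg_alloc();
        genlmsg_put(msg, 0, 0, family, 0, NLM_F_DUMP,
                    NL80211_CMD_GET_INTERFACE, 0);             // "list all wireless interfaces"
        nl_send_auto(sock, msg);                               // request goes to cfg80211 in the kernel
        nl_recvmsgs_default(sock);                             // a real tool would install a callback to parse the replies

        nlmsg_free(msg);
        nl_socket_free(sock);
        return 0;
    }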
Also, to add: compared to SoftMAC, HardMAC drivers provide better power save and quicker connection/disconnection recovery, because the MLME is implemented in hardware. Power save is better because the HW/FW does not need to wake up the host on disconnection and can still reconnect and recover on its own.
The following tries to get an idea of which technologies would be suitable for a specific (as outlined below) distributed/RPC problem. If something is not clear, I am very happy to add more details, but please request these in a comment and not in an "answer". Thanks.
First I will describe the current situation, and then what we want to achieve and the actual question. Although this is a rather long post in order to give some context, the question itself is rather short (see the end).
The Application Split challenge
Application description:
The app allows the user to configure a number of hardware devices (*)
and then communicate with these to control and collect measurement
channels of a physical experiment.
(*) Hardware devices include temperature sensors, pressure sensors,
motors, ... Communication ranges from serial port communication,
TCP/UDP communication to interfacing with the drivers of 3rd party
plugin cards.
Control involves sending commands to the various hardware devices
to configure them according to the protocols they support.
Measuring involves getting the data from (some of) these devices.
We are hard pressed to keep the whole thing running as customers
demand more and more channels at higher sample rates and we have to
keep up with writing the data+timestamps we get from all devices to
disk, display a subset of the data and still keep the system
responding properly.
Current situation:
[ DisplayAndControl.exe ]
|| /\
|| DLL Interface ||
|| || Window Messages (SendMessage, PostMessage)
|| ||
\/ ||
[ ChannelManager.dll ]
ChannelManager.dll (Native C++ DLL on Windows)
Manages n data channels (physical measurement variables)
Each channel holds a shifting arbitrary number of samples with
high-precision timestamps
Allows to group channels and write their ongoing updates or
historical values ("measurement") to disk
Calculations with channels (arithmetic, integration, mean
values, etc.)
Interfaces with (realtime) hardware devices to get the timestamps
and values of channels
Get value+timestamp from hardware and save in internal
ring buffer for channel
DisplayAndControl.exe (Native C++ MFC App on Windows)
Control the functions of ChannelManager.dll (configure channels
and HW devices)
Live display current values/timestamps/changes of all channels
Graph values of (groups of) channels in diagrams
Print diagrams and tables of channel values
Summary of current situation:
The application as it is at the moment is already somewhat modular
in that the (main) executable does the display+interaction and the
(one of several) DLL does the data management (saving of live data
to disk, communication with devices, etc.)
From a performance POV, communication between the display module and
the data management module is optimally performant at the moment.
New situation:
[ DisplayAndControl.exe ]
|| /\
|| ? RPC/Messaging ||
|| || ? RPC/Messaging
|| ||
\/ ||
[ ChannelManager.exe (same PC or another) ]
Summary of the envisioned new situation:
For usability, performance and safety reasons, we wish to split up
this Windows app into two separate applications, so that the
performance (and safety) sensitive ChannelManager module can run as
a separate process possibly on a separate Windows PC.
Additionally, since we're already going to split this, we will
allow for multiple DisplayAndControl.exe apps connected to one
single ChannelManager.exe.
One QUESTION now is what technology we should use to facilitate the
communication between the now two (or, rather, 1 : small_n) applications.
Performance is important, because a lot of data travels between the
two applications and latency should be kept to a minimum. It "only"
needs to work on Windows, but it should be usable from native C++
only, which makes all purely .NET-based technologies unattractive.
(Note: Porting parts of DisplayAndControl.exe to .NET/WPF is
planned, but ChannelManager.exe should stay pure native, as we
don't want any .NET stuff running inside this process.)
Regarding latency: It is important that we achieve some level of
soft-realtime in the sense that small latency is acceptable, but
large and especially varying latency is not acceptable for usability
and safety reasons. Therefore any protocol that would help in
getting some sort of (soft) realtime behavior would be preferred.
RPC technologies we've looked at:
WCF (or .NET remoting) - Is dotnet only, therefore not
attractive. Performance figures are also not very good.
(D)COM - COM is great for Windows RPC communication, but it
breaks down once you have to have inter-PC comm because it is
horrible to get the security settings working in a corporate IT
network.
CORBA - We have had good experience with CORBA communications in
the past. The communication is easy to get working; there's not
much infrastructure overhead; it works well from C++; writing
a .NET wrapper is pretty trivial. The problem with CORBA is that
it's somewhat complicated to use correctly in C++ (people will spend
a lot of time chasing memory leaks, especially inexperienced C++
devs). It will also be a learning curve for every developer and
every new developer, as no one expects people to "know" CORBA
nowadays. Also, it might not perform as well as we'd like it to and
as far as I know there's no readily available realtime support.
Thrift - still looks half-baked to use in our scenario.
ICE (from ZeroC) - I would prefer ICE over CORBA any time; after all,
it promises to be a "better CORBA", and I think it does deliver on
that. However, their licensing policy is very suboptimal, as they do not sell development licenses but only license per
installation. (Well, that's what they told us the last time we asked, at the end of 2009.) Their licensing policy also suggests that any third party possibly interested in interfacing with our modules would first have to negotiate a license contract with ZeroC, too.
Open MPI - Message Passing interface seems to be targeted at
scenarios with lots of clients "heavily" distributed. Doesn't seem
to fit our problem.
Writing our own communication layer using TCP/UDP - Oh my. I'd
rather not :-)
Google Protocol Buffers - Is not an RPC technology.
Distributed Shared Memory - Well. This got thrown in by a few
devs, and I for one am not sure whether there is a working
implementation, nor whether it fits our problem.
So again the QUESTION - what "RPC"-like technology would you prefer
in this situation and why?
I can elaborate on Johnny's answer. CORBA provides a robust infrastructure with services that go far beyond simple RPC. As your distributed application grows, you can use CORBA features to manage the mapping between interface and implementation, to provide secure connections, etc. As an RPC, CORBA provides the means for easy synchronous or asynchronous invocations.
The learning curve isn't that steep either. While some of the terms are a little arcane, the concepts such as managed (counted) references should be familiar to today's C++ programmers. And when the C++0x mapping is available, it will be even easier. Training is available to help make this transition even easier.
You mentioned not knowing about realtime support. In fact, CORBA for C++ has rich RT support. There is an RT CORBA specification and several C++ ORBs that implement it. TAO, which is open source and commercially supported, has extensive RT support, including the RT_ORB, the RT_POA, and a TAO-specific RT Event Service. With these tools you are able to designate priority levels for threads in the ORB and have separate communication channels for different priority levels.
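To give a feel for what the client side could look like with the classic IDL-to-C++ mapping, here is a rough sketch; the ChannelFeed interface, the DoubleSeq typedef, the push_samples() operation and the generated header name are all hypothetical, invented only for illustration.

    // Hypothetical IDL:
    //   typedef sequence<double> DoubleSeq;
    //   interface ChannelFeed {
    //     void push_samples(in long channel_id, in DoubleSeq values);
    //   };
    // The IDL compiler (e.g. TAO's tao_idl) would generate the C++ stubs used below.
    #include "ChannelFeedC.h"   // assumed name of the generated client-stub header

    int main(int argc, char* argv[]) {
        CORBA::ORB_var orb = CORBA::ORB_init(argc, argv);

        // Obtain the ChannelManager's object reference, here from a stringified IOR on the command line.
        CORBA::Object_var obj = orb->string_to_object(argv[1]);
        ChannelFeed_var feed = ChannelFeed::_narrow(obj.in());

        DoubleSeq values;
        values.length(2);
        values[0] = 21.5;   // e.g. two temperature samples
        values[1] = 21.7;
        feed->push_samples(42, values);   // synchronous remote invocation on the ChannelManager process

        orb->destroy();
        return 0;
    }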
I'd suggest taking a look at Thrift. While it looks half-baked, I believe it's only the documentation that's lacking - the implementation is quite solid.
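For comparison, a Thrift-based C++ client would look roughly like the sketch below; the channelmanager.thrift service definition and the generated ChannelManagerClient class are hypothetical, and older Thrift releases use boost::shared_ptr instead of std::shared_ptr.

    // Hypothetical Thrift IDL (channelmanager.thrift):
    //   service ChannelManager {
    //     void pushSamples(1: i32 channelId, 2: list<double> values)
    //   }
    // 'thrift --gen cpp channelmanager.thrift' would generate ChannelManager.h and the client class.
    #include <memory>
    #include <vector>
    #include <thrift/protocol/TBinaryProtocol.h>
    #include <thrift/transport/TSocket.h>
    #include <thrift/transport/TBufferTransports.h>
    #include "ChannelManager.h"   // assumed name of the generated header

    using namespace apache::thrift::protocol;
    using namespace apache::thrift::transport;

    int main() {
        auto socket    = std::make_shared<TSocket>("channelmanager-host", 9090);
        auto transport = std::make_shared<TBufferedTransport>(socket);
        auto protocol  = std::make_shared<TBinaryProtocol>(transport);
        ChannelManagerClient client(protocol);

        transport->open();
        client.pushSamples(42, std::vector<double>{21.5, 21.7});   // synchronous RPC to the ChannelManager process
        transport->close();
        return 0;
    }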
CORBA should perform well, and there are people with experience. We realize that the IDL-to-C++ mapping is hard to use; there is an RFP from the OMG asking for a new IDL-to-C++0x mapping, which should make it much easier to use.
I am currently looking at some JProfiler traces from our WebSphere-based application, and am noticing that a significant amount of CPU time is being spent in com.ibm.io.async.AsyncLibrary.getCompletionData2.
I am guessing, but I am wondering whether this is PMI-related (and we do have this enabled).
My knowledge of PMI is limited, as this is managed by another team.
Is it expected that PMI can have this sort of impact?
(If so) Is the only option to turn it off completely? Or are there some types of data capture that have a particularly high overhead?
PMI has multiple levels that can be instrumented. The basic one should have minimal impact.
The particular class that you are referring to should not be related to PMI. I am only guessing here, as these classes are not exposed publicly and are used internally by the WAS runtime.
What version of WAS are you on? There were some known issues on WAS in this space.
For example, refer to this link:
http://www-01.ibm.com/support/docview.wss?uid=swg1PK41617 (PK41617: PERFORMANCE OF THE AIO LIBRARY SUFFERS UNDER CERTAIN TYPES OF TRAFFIC FLOW)
Please check if this is applicable to your environment.
Also try to skim through this link:
http://www-01.ibm.com/support/docview.wss?uid=swg21366862 (Disabling AIO (Asynchronous Input/Output) in WebSphere Application Server)
You can probably give this a try, if this is not a production environment, and see if disabling AIO eliminates the spikes for you. Even if it works well, I would go through the formal IBM support process to find out the exact reasons before doing the same in a production environment.
HTH
Manglu