Hardware Acceleration for non-SSL based signing and encryption - windows

I am working on a project that does a large amount of hashing, signing, and both asymmetric and symmetric encryption. Since these steps have a significant effect on our performance and available load, I was wondering if there is a hardware based solution to offloading the work.
I have done some surfing to find out, and the only items I can find are dedicated to SSL based communications. I need a more generic solution that will allow me to speed up signing and encryption regardless of where it occurs.
Is it possible to adapt these SSL based solutions (maybe it's just marketing and it would be easy to re-use elsewhere)? Is there a good generic co-processor that can help out?
I need this on a Windows Server 2008 based box, but I would be interested in solutions on any platform.

If the algorithms you're working on are standard encryption algorithms like 3DES and AES, there is definitely hardware available. Hifn is the most well known, but Broadcom also has a line of chips from their BlueSteel acquisition a number of years ago. nCipher also has a line of encryption products, though when last I looked at them (years ago) they were much more focussed on their secure key management hardware than the acceleration of block algorithms.
Even cards designed for SSL may be useful to you, though you'll need access to the low-level details. The biggest win for SSL hardware is an exponentiator and a wide multiplier unit, both of which are generally accessible independently in the hardware I know of. If you're using asymmetric encryption algorithms, these two units would likely be useful to you as well.
You should also check whether a more efficient software implementation is available. For example, Dan Bernstein and Peter Schwabe published a paper in September 2008 regarding optimization of AES for modern CPUs. The software implementation has been placed in the public domain (i.e. disavow all copyright, use it however you like).
Finally, future AMD (and probably Intel) CPUs will include SSE5, which adds instructions specifically useful for AES. If you can hold out until then, your next server upgrade may provide all the hardware support you need.
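Before paying for hardware, it may help to measure what the software baseline actually delivers on the target machine. The following is a minimal sketch, assuming Python's standard `hashlib` is an acceptable stand-in for the hashing workload; the function name, payload size, and round count are illustrative choices, not taken from the question.

```python
import hashlib
import time


def hash_throughput_mb_s(algorithm: str, payload_size: int = 1 << 20, rounds: int = 20) -> float:
    """Return a rough throughput figure in MiB/s for one hashlib algorithm.

    This is a hypothetical helper for a quick baseline, not a rigorous benchmark.
    """
    payload = b"\x00" * payload_size
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small inputs.
    return (payload_size * rounds) / (1024 * 1024) / max(elapsed, 1e-9)


if __name__ == "__main__":
    for name in ("sha1", "sha256", "sha512"):
        print(f"{name}: {hash_throughput_mb_s(name):.1f} MiB/s")
```

If the measured figure is already well above what the application needs, an accelerator is unlikely to pay off for hashing; the asymmetric operations would need a separate measurement.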

I'm not sure how helpful this will be, but I have seen a few papers on using graphics hardware to accelerate encryption.
Here's a quick Google search.
Good Luck.

Several companies make cryptography-specific hardware. For example, I recently coded support in an application for an nCipher hardware device which processed AES on the card (and supported many other encryption algorithms). They are not cheap, but they do support a variety of algorithms and modes of operation.

The most popular hardware crypto engine is VIA Padlock, included in C3, C7 and later processors. These are low-performance, low-power; but (supposedly) easily outperform a Core2 on crypto algorithms.
Linux kernel 2.6.16 and later includes support for RNG, MD5, SHA1/256, SSL, GPG and other standard things. I'm not sure about ssh.
You mention non-SSL, so you might not benefit from existing code, but Via's site has the documentation needed to use it from userspace.

On Windows you want a device whose API supports MSCAPI, CAPI NG (CNG) or PKCS#11. The first and last are both very common; MSCAPI, however, does not support hardware AES/3DES.
nCipher (now Thales) makes several boxes and PCI/PCIe cards that support the above (and support OpenSSL) and that also support other platforms including Linux and Solaris. SafeNet makes similar hardware with similar platform support.
If I were starting out I would pick PKCS#11; you then get a good choice of languages to write in, including C and Java.
If you want to write in C#/.NET, you can use MSCAPI from .NET or you can P/Invoke into the PKCS#11 DLL for your hardware.


How to specify the physical CoreIDs used for "CLOSE" when specifying OMP_PROC_BIND?

We are trying to optimize HPC applications using OpenMP on a new hardware platform. These applications need precise placement/pinning of their cores or performance falls in half. Currently, we provide the user a custom GOMP_CPU_AFFINITY map for each platform, but this is cumbersome, because it's different on each hardware version, and even platforms with different firmware versions sometimes change their CoreID physical mappings - all things impossible for the user to detect on the fly.
It would be a great help if HPC applications could simply set GOMP_PROC_BIND to "close" and OpenMP would do the right thing for the given platform - but to make this possible, the hardware vendor would need to define what "close" means for each machine. We'd like to do this, but we can't tell how/where OpenMP gets CoreID lists to use for things like close, spread, etc. (For various external requirements, the CoreID spatial pattern on this machine would appear utterly random to a software writer.)
Any advice as to where/how OpenMP defines the CoreID lists for OMP_PROC_BIND so we could configure them? We are comfortable with the idea that we might need a custom version of OpenMP (with altered source code) for this platform if needed.
Thanks, everyone. :)
Jeff
Expanding on what @VictorEijkhout said...
You seem to have confused an environment variable that I can't find anywhere with Google (GOMP_PROC_BIND) with the standard OpenMP environment variable (OMP_PROC_BIND). If GOMP_PROC_BIND exists, the name suggests that it is a GNU feature. Note too that one of the two Google hits for GOMP_PROC_BIND says "Code that reads the setting is buggy. Setting is invalid and ignored at runtime." So, if that is what you are setting, it is unsurprising that it has no effect!
I will therefore answer for the more general case of OMP_PROC_BIND.
The binding of OpenMP threads to logical CPUs clearly has to be done at runtime, since, beyond its ISA, the compiler has no knowledge of the hardware on which the compiled code will run. Therefore you need to be looking at the runtime library code.
I have not looked at GNU's libgomp, but, where it can, LLVM's libomp uses the hwloc library to explore the machine hardware. Since hwloc also includes other useful tools for machine exploration (such as lstopo) it is likely that your effort is best invested in ensuring good hwloc support on your machine, at which point there will be no need to delve inside the OpenMP runtime.
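One option that may avoid patching the runtime at all: the standard OMP_PLACES environment variable accepts an explicit, ordered list of places, and "close" binding is defined relative to the order of that list. A vendor could therefore ship a small script that emits the list in physical-adjacency order. The sketch below assumes the runtime honors OMP_PLACES and interprets the numbers as OS logical CPU numbers (which is typical but implementation defined); the helper name and the example core order are hypothetical.

```python
def places_from_core_order(core_ids):
    """Format an explicit OMP_PLACES value with one single-CPU place per entry.

    core_ids must already be sorted so that physically adjacent cores are
    neighbours in the list; how that order is discovered is platform specific
    and not shown here.
    """
    seen = set()
    for cid in core_ids:
        if cid < 0:
            raise ValueError(f"negative CPU id: {cid}")
        if cid in seen:
            raise ValueError(f"duplicate CPU id: {cid}")
        seen.add(cid)
    return ",".join("{%d}" % cid for cid in core_ids)


if __name__ == "__main__":
    # Hypothetical physical ordering for a machine whose numbering looks random.
    order = [12, 3, 7, 0]
    print("export OMP_PLACES='%s'" % places_from_core_order(order))
    print("export OMP_PROC_BIND=close")
```

With these two variables exported, the application only needs to know about OMP_PROC_BIND=close, and the platform-specific knowledge lives in the generated place list.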

MiniFilter Driver - The right implementation and the Microsoft signature

I am working on malware analysis. I use a mini-filter driver to intercept file system access. Then I apply algorithms to detect malicious activity.
My questions:
I know that the driver will need a signature from Microsoft for a public release.
https://learn.microsoft.com/fr-fr/windows-hardware/drivers/dashboard/attestation-signing-a-kernel-driver-for-public-release
Is it permitted to implement the algorithms (AI-based) in kernel space, or must I implement them in userspace? What is recommended with regard to Microsoft's requirements, the right architecture, and security?
If you can implement a Windows kernel driver, you can do whatever you want in it. Not only algorithms: we have even ported OpenSSL, SQLite and other open source projects (in C and C++, of course) to our Windows kernel drivers. It is not mission impossible; you just need to know how to do it, what the limitations are, and how to work around them.
The idea of driver signing from MS is to prevent rogue driver developers from running malware in the kernel. This was the biggest problem on 32-bit Windows for a very long time, since in the kernel you can not only implement anything but also abuse anything, including kernel variables, file system data and the registry, and you can even hook any code you want (if system protection is not running). However, such certificates are not perfect either. Years ago attackers stole certificates from companies (Realtek, if I recall correctly) and used them to sign their malware drivers.

What platforms do Google Protocol Buffers support?

Google state that:
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data
I searched for an explicit list of platforms and/or operating systems officially supported by Protocol Buffers but I couldn't find it. Ironically the closest thing I found was the following information in the Wikipedia page:
Operating system: Any
Platforms: Cross-platform
Is it safe to say that Protocol Buffers support any platform/OS?
Operating system is going to be any mainstream OS. If you're running something esoteric, you might get the same problems that you get with anything else.
Platform is similar; Google offers support for a range of platforms, and a much wider list is provided by community-owned projects. A list is here: https://github.com/google/protobuf/blob/master/docs/third_party.md
Ultimately, the wire specification is documented and doesn't depend on OS or platform, so in the worst case, if you're using a custom language on a custom OS, you could still implement your own decoder as long as that language has some mechanism to handle arbitrary binary data or can interop with one of the other prebuilt libraries.
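To give a sense of how little platform support a hand-written decoder needs, here is a minimal sketch of the base-128 varint encoding that the documented wire format uses for integer fields. It covers only unsigned varints and a single varint field (wire type 0); the function names are my own and this is not the official library's API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0):
    """Decode one varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode a field key (field number plus wire type 0) followed by the value."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


if __name__ == "__main__":
    print(encode_varint(300).hex())            # ac02
    print(encode_varint_field(1, 150).hex())   # 089601
```

Nothing here depends on the operating system or CPU: only byte arrays, shifts and masks are required, which is why the format ports to almost any environment.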

Possible to use OpenCL on multi-computers?

As far as I know, the answer is no. OpenCL is designed for multi-core systems.
But is there any way to use OpenCL across multiple computers (each of which is a multi-core system)? If not, are any additional tools or frameworks required?
I have read some articles about distributed computing, cluster computing, and grid computing, but I can't find a satisfactory answer.
Any ideas will be appreciated
Thank you :)
There are two frameworks for this purpose: VirtualCL and CLara. Both packages let you work transparently with remote machines as local devices. Unfortunately, VirtualCL is only available as pre-compiled binaries without sources and CLara is not actively developed anymore.
SnuCL uses MPI and OpenCL to transparently use the cluster through the OpenCL API. It also adds a few OpenCL extensions to effectively deal with the memory objects.
It is open source. See http://aces.snu.ac.kr/Center_for_Manycore_Programming/SnuCL.html
and http://tbex.twbbs.org/~tbex/pad/SunCL.pdf
There is one more solution not mentioned above: dOpenCL.
"dOpenCL (distributed OpenCL) is a novel, uniform approach to programming distributed heterogeneous systems with accelerators. It transparently integrates the nodes of a distributed system into a single OpenCL platform. Thus, dOpenCL allows the user to run unmodified existing OpenCL applications in a heterogeneous distributed environment. Besides, it extends the OpenCL programming model to deal with individual nodes of the distributed system."
I have used VirtualCL to form a GPU cluster with three AMD GPUs as compute nodes and my Ubuntu Intel desktop as the broker node. I was able to start both the broker and the compute nodes.
In addition to the various options already mentioned by other posters, here are two more open source projects that you may be interested in:
ocland (in beta stage): offers a server application and an ICD implementation that the clients can use to take advantage of local and remote devices that support OpenCL in a transparent fashion. The license is GPLv3.
COPRTHR SDK by Brown Deer Technology (currently version 1.6): this SDK offers an open source (GPLv3) OpenCL implementation for x86_64, ARM, Epiphany and Intel MIC, and includes a "Compute Layer Remote Procedure Call" implementation. This consists of a client-side OpenCL implementation that supports RPC (libclrpc) and a server application (clrpcd). The website doesn't mention much about it, but the documentation contains a section about this CLRPC implementation.

What modbus library should I use for modbus protocol for GCC

We are building a product which requires Modbus communication (both RS-485 and TCP/IP). The code has to run on an embedded device running Linux. We have the following criteria for selecting the library that we will use.
It has to be open source, since we are open source geeks.
We will give this product to our users without knowing what their applications will be, hence it has to be a complete implementation of the Modbus protocol.
Wide user base: we believe that the more users the code has, the more stable it is.
I came across two such libraries:
http://www.freemodbus.org
and
libmodbus
Are there any more Modbus libraries? Please suggest some, with pros and cons.
I'd suggest libmodbus, it works well and is cross platform.
http://www.libmodbus.org
I am just starting to explore these options as well. My priority is on ease of use which has led me to RModBus since it was the only one that I was able to get immediate results with. However, there is also a Python library, Pymodbus, that appears to be quite complete in implementation.
I'm sorry, I just figured out that GCC is a compiler; my answer is way off topic.
Again, I was looking for a scripting language that my noob self could be more comfortable in. It really came down to a question of language rather than the library itself. Oh, I am only using the TCP/IP stack at this time, which somewhat simplifies it as well.
