Saxon XSLT transform delivered as an Amazon AWS Lambda function

Would it be technically possible to build a general purpose XSLT transform service (using the Saxon XSLT engine) delivered as an Amazon AWS Lambda function? How would you go about implementing it? Would there be a way to avoid initialising the Java VM each time the lambda function was called?
This is more of a brain-storming question. I am unlikely to attempt to implement it.
How would licensing work? There is no way for the developer to know how many machines Saxon XSLT is installed on. Presumably that is something that has to be negotiated with the vendor?

I can't see any intrinsic reason why it shouldn't work, but I have no idea about the implementation details.
Since Amazon support Java as the implementation language, one assumes they have a mechanism to avoid JVM initialization costs.
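For what it's worth, the usual way to amortize start-up cost on Lambda is to do the expensive initialization once, outside the per-request work. Lambda reuses a warm execution environment (and, for Java, the running JVM) across invocations until it is recycled, so only cold starts pay the full price. The sketch below shows that pattern in Go rather than Java. `compileStylesheet`, `compiledStylesheet` and `handle` are hypothetical stand-ins for Saxon compiling a stylesheet, and the loop in `main` only simulates several invocations landing on one warm environment; it does not use the real Lambda runtime.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// compiledStylesheet is a hypothetical stand-in for a prepared transformer
// (in Java this would be the compiled Saxon stylesheet).
type compiledStylesheet struct {
	name string
}

// compileCount records how often the expensive step actually ran.
var compileCount int

// compileStylesheet is a hypothetical placeholder for costly start-up work
// (JVM warm-up, stylesheet compilation and so on).
func compileStylesheet(name string) *compiledStylesheet {
	compileCount++
	return &compiledStylesheet{name: name}
}

// Transform applies the "stylesheet"; here it just wraps the input.
func (s *compiledStylesheet) Transform(input string) string {
	return fmt.Sprintf("<%s>%s</%s>", s.name, strings.TrimSpace(input), s.name)
}

var (
	once       sync.Once
	stylesheet *compiledStylesheet
)

// handle is what would run per invocation. sync.Once makes compilation
// happen only on the first (cold) call; later warm invocations in the same
// execution environment reuse the compiled result.
func handle(input string) string {
	once.Do(func() {
		stylesheet = compileStylesheet("doc")
	})
	return stylesheet.Transform(input)
}

func main() {
	// Simulate three invocations hitting one warm environment.
	for _, in := range []string{"a", " b ", "c"} {
		fmt.Println(handle(in))
	}
	fmt.Println("compilations:", compileCount)
}
```

In a real Go function you could equally do the setup in `main` before calling `lambda.Start`; `sync.Once` just makes the initialize-on-first-use behaviour explicit.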
There's a distinction between having a Lambda that supports one particular defined transformation, and having one that executes an arbitrary user-defined stylesheet. I'm not sure that providing a service to execute untrusted code is ever a particularly good idea, even if it's heavily sandboxed in terms of resource access.
As regards licensing, our general approach in Saxonica is that we try to ensure that licensing doesn't get in the way of doing something that makes technical sense. If there's value in doing it, we'll find a way of sharing the value that works for all parties.
If this is about executing one predefined stylesheet, as a spin-off from the Saxon-JS development we already have mechanisms that allow a developer to acquire a license that can be redistributed with the compiled stylesheet, meaning essentially that if you acquire the right kind of development license, the run-time is free.


Fuchsia: how to use a built-in capability in a component

I'm trying to learn and use Fuchsia for fun, and a pretty basic concept is keeping me from progressing.
I thought that, as a learning experience, I could write a simple HTTP client that prints the content of some random URL to the log. Really nothing fancy.
As I understand it, using the network (in my case I'd like to use fuchsia.net.http.Loader) is a capability, which has to be granted to a running component. Makes sense; that's pretty much the core of the OS.
I also understand that the initiating component, the one that runs my component, needs to grant this capability to my component. That's fair.
What I don't understand (and I'd very much appreciate any additional information, pretty please!) is how I can grant this to my component.
Specifically all demos and examples I saw had a custom client & server under a realm, which talked to each other. That's a good practice, but it doesn't bring in any capability that's built in.
What am I missing? Thanks in advance!
Thanks for your interest in Fuchsia! First of all, if you haven't already gone through Fuchsia Fundamentals I would strongly suggest that as a starting point for many of the foundational concepts.
As for all the demos and examples you saw using a custom client and server under a realm, rather than a capability that's built in: this is primarily because there isn't necessarily a concept of any set of components or capabilities being "built in" to the system. The capabilities available to components in the system are entirely dependent on the rest of the components in a particular product build and how they are organized (this is called the component topology).
The answer has a few sharp edges to it at the moment, as Fuchsia is a rapidly evolving open source project. Hopefully some of the details below will help you move forward.
Determine the capability routes
So you'll have to do a bit of work to figure out where the capability you need is provided and routed. In fact, one of the components exercises shows you how to do this for the fuchsia.net.http.Loader capability. Knowing where a capability is offered/used allows you to determine where your component would need to be instantiated to obtain the necessary capability.
You might also find some of the content in the Connect components developer guide useful in accessing the capability.
Run the component
Knowing where a capability is routed allows you to determine how to run your component. The most straightforward way of instantiating a component in the topology is to do so dynamically using ffx component. However, this requires a collection somewhere on the system with the capabilities you need. The ffx-laboratory realm where most examples are run has a very limited set of capabilities that does not include fuchsia.net.http.Loader.
You'll likely need to add your component statically to the topology using a core realm shard so that the necessary routes can be declared explicitly between the components that offer fuchsia.net.http.Loader and your component. With the component included statically in your product build, you can execute it using ffx component commands.
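To make the routing idea concrete, here is a toy model in Go of how a component's `use` of a capability is satisfied by walking the parent's `offer`s until you reach the component that actually provides it. This is not Fuchsia's API or CML: the `Component`/`Offer` types, the `network` provider and the `my_client` name are hypothetical, and real routing has more cases (for example `expose` declarations and collections). It only shows why a component placed under core with an explicit offer can reach fuchsia.net.http.Loader, while one launched in a laboratory-style realm that never offers it cannot.

```go
package main

import (
	"fmt"
	"strings"
)

// Offer: the owning component offers Capability to child To, sourced
// From "self", "parent", or "#sibling" (another child that provides it).
type Offer struct {
	Capability, From, To string
}

// Component is a toy node in the component topology.
type Component struct {
	Name     string
	Parent   *Component
	Children map[string]*Component
	Provides map[string]bool
	Offers   []Offer
}

// AddChild links child under c.
func (c *Component) AddChild(child *Component) {
	if c.Children == nil {
		c.Children = map[string]*Component{}
	}
	child.Parent = c
	c.Children[child.Name] = child
}

// Resolve follows offers upward from c and returns the name of the
// component that ultimately provides the capability.
func Resolve(c *Component, capability string) (string, error) {
	p := c.Parent
	if p == nil {
		return "", fmt.Errorf("%s has no parent to offer %q", c.Name, capability)
	}
	for _, o := range p.Offers {
		if o.Capability != capability || o.To != c.Name {
			continue
		}
		switch {
		case o.From == "self" && p.Provides[capability]:
			return p.Name, nil
		case o.From == "parent":
			return Resolve(p, capability)
		case strings.HasPrefix(o.From, "#"):
			if sib := p.Children[o.From[1:]]; sib != nil && sib.Provides[capability] {
				return sib.Name, nil
			}
		}
		return "", fmt.Errorf("broken route for %q at %s", capability, p.Name)
	}
	return "", fmt.Errorf("%q is not offered to %s by %s", capability, c.Name, p.Name)
}

func main() {
	const loader = "fuchsia.net.http.Loader"
	root := &Component{Name: "root"}
	core := &Component{Name: "core"}
	root.AddChild(core)

	// Hypothetical provider and client added statically under core.
	network := &Component{Name: "network", Provides: map[string]bool{loader: true}}
	client := &Component{Name: "my_client"}
	core.AddChild(network)
	core.AddChild(client)
	core.Offers = append(core.Offers, Offer{loader, "#network", "my_client"})

	// A laboratory-style realm that offers nothing relevant.
	lab := &Component{Name: "ffx-laboratory"}
	core.AddChild(lab)
	example := &Component{Name: "example"}
	lab.AddChild(example)

	for _, c := range []*Component{client, example} {
		if who, err := Resolve(c, loader); err != nil {
			fmt.Println(c.Name, "->", err)
		} else {
			fmt.Println(c.Name, "->", who)
		}
	}
}
```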
For more details on component execution, check out the Run components developer guide as well.
Run a CLI binary
Since this is a learning exercise, another option is to build your code as a binary that runs within the context of a component that already has the capabilities you need, rather than creating and running an entirely new component. This approach is commonly used for CLI tools. The ffx component explore command, with its --tools argument, lets you run your code as a binary inside an existing component that has the HTTP capability you are looking for, without the need to work through all the capability routing pieces described above.
For more details on ffx component explore, see Explore components.

Supporting multiple versions of Kubernetes APIs in Go program

Kubernetes has a rapidly evolving API and I am trying to find best practices, recommendations, or really any kind of guidance about how to write Go software that gracefully handles supporting its evolving API and supports multiple versions simultaneously. I am sure I am not the first person to attempt this, but so far I have not found any guidance about Kubernetes specifically, and what I have read about polymorphism in Go has not inspired a great solution yet.
Kubernetes is written in Go and provides Go packages like k8s.io/api/extensions/v1beta1 and k8s.io/api/networking/v1beta1. Kubernetes resources, for example Ingress, are first released in one API group (extensions) and as they become more mature, get moved to another API group (networking) and can also change versions (e.g. go from v1beta1 to plain v1). Kubernetes also provides k8s.io/client-go for interacting with a Kubernetes cluster.
I am an experienced object-oriented (and other types of) programmer, but fairly new to Go and completely new to the Kubernetes packages. What I want to accomplish is a program architecture that allows me to write code once and have it work on any version of the Kubernetes resource, at least as long as the resource contains all the features I care about. In a typical object-oriented environment, I would create a base Ingress class and have all these various versions derive from it, and package up operations so that I could just work on Ingress everywhere. My sense is that Go intends for people to take a different approach, and in any case there are complications because of the client/server aspect.
Client/server and APIs
My Go program is a client of the Kubernetes server. Different versions of the server support different versions of the Kubernetes API, and therefore different versions of the Ingress resource. So my first problem is that I have to do something like this to get a list of all the Ingresses:
ingressesExt, err := il.kubeClient.ExtensionsV1beta1().Ingresses(namespace).List(metav1.ListOptions{})
ingressesNet, err := il.kubeClient.NetworkingV1beta1().Ingresses(namespace).List(metav1.ListOptions{})
I have to gracefully handle errors about the API not being supported. Because the return types are different, AFAIK there is no unified interface where I can just make one call and get the results in a single list. It seems like this is the sort of thing someone should have solved and provided a solution for, but so far I have not found anything.
Type conversion
I also have to find some way to merge ingressesExt and ingressesNet into a single usable list, with an eye toward maintainability/extensibility now that Ingress has graduated to NetworkingV1.
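If you do query several versions and merge the results, expect overlap: the API server generally serves the same stored Ingress objects under every group/version it supports, so one object can come back from both extensions/v1beta1 and networking.k8s.io/v1beta1. A minimal sketch of merging into one list keyed on namespace and name (the object's UID would also work), keeping the copy from the most preferred version. The `Ingress` type and the preference ranking are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Ingress is a hypothetical version-neutral view of an Ingress.
type Ingress struct {
	Namespace, Name string
	Source          string // API version it was read from
}

// preference ranks API versions; lower is preferred (hypothetical ordering).
var preference = map[string]int{
	"networking.k8s.io/v1":      0,
	"networking.k8s.io/v1beta1": 1,
	"extensions/v1beta1":        2,
}

// rank returns a version's preference; unknown versions rank last.
func rank(version string) int {
	if r, ok := preference[version]; ok {
		return r
	}
	return len(preference)
}

// mergeIngresses combines lists read from different API versions, keeping
// one entry per namespace/name (the most preferred version's copy), sorted
// by namespace then name for stable output.
func mergeIngresses(lists ...[]Ingress) []Ingress {
	byKey := map[string]Ingress{}
	for _, list := range lists {
		for _, ing := range list {
			key := ing.Namespace + "/" + ing.Name
			if cur, ok := byKey[key]; !ok || rank(ing.Source) < rank(cur.Source) {
				byKey[key] = ing
			}
		}
	}
	out := make([]Ingress, 0, len(byKey))
	for _, ing := range byKey {
		out = append(out, ing)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Namespace != out[j].Namespace {
			return out[i].Namespace < out[j].Namespace
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	ext := []Ingress{{"default", "web", "extensions/v1beta1"}, {"default", "api", "extensions/v1beta1"}}
	netV1beta1 := []Ingress{{"default", "web", "networking.k8s.io/v1beta1"}}
	for _, ing := range mergeIngresses(ext, netV1beta1) {
		fmt.Println(ing.Namespace, ing.Name, ing.Source)
	}
}
```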
Kubernetes utilities
I see that Kubernetes provides a lot of auto-generated code and utilities, but I have not found a lot of documentation about how to use them. For example, Ingress has functions like
DeepCopy
Marshal
XXX_DiscardUnknown
XXX_Merge
XXX_Unmarshal
Maybe I can use these to do the type conversion? Combine marshal, unmarshal, discard, and merge somehow to take the data from one version and import it into another?
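The `Marshal` and `XXX_*` methods look like protobuf-generated plumbing (the API types also have a protobuf encoding) rather than a public conversion API. A simpler trick for versions with the same shape is to round-trip through JSON: marshal one version and unmarshal into the other. It is lossy wherever the schema changed; for example, networking.k8s.io/v1 replaced the backend's `serviceName`/`servicePort` with a nested `service` object, so those fields are silently dropped. The structs below are hypothetical, trimmed-down imitations of the backend types, not the k8s.io/api definitions (in the real types the port is an int-or-string, not an int).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extensionsBackend imitates the older flat backend shape (hypothetical).
type extensionsBackend struct {
	ServiceName string `json:"serviceName,omitempty"`
	ServicePort int    `json:"servicePort,omitempty"`
}

// networkingV1beta1Backend has the same JSON shape as extensionsBackend.
type networkingV1beta1Backend struct {
	ServiceName string `json:"serviceName,omitempty"`
	ServicePort int    `json:"servicePort,omitempty"`
}

// The v1-style nested shape (hypothetical, trimmed).
type serviceBackendPort struct {
	Number int `json:"number,omitempty"`
}

type serviceBackend struct {
	Name string             `json:"name"`
	Port serviceBackendPort `json:"port"`
}

type networkingV1Backend struct {
	Service *serviceBackend `json:"service,omitempty"`
}

// convertJSON copies src into dst by round-tripping through JSON. Fields
// whose JSON names match carry over; anything else is dropped silently.
func convertJSON(src, dst any) error {
	data, err := json.Marshal(src)
	if err != nil {
		return err
	}
	return json.Unmarshal(data, dst)
}

func main() {
	old := extensionsBackend{ServiceName: "web", ServicePort: 80}

	var same networkingV1beta1Backend
	if err := convertJSON(old, &same); err != nil {
		panic(err)
	}
	fmt.Printf("v1beta1 -> v1beta1: %+v\n", same)

	var newer networkingV1Backend
	if err := convertJSON(old, &newer); err != nil {
		panic(err)
	}
	fmt.Printf("v1beta1 -> v1: service=%v\n", newer.Service)
}
```

For fields that changed shape you still need an explicit, hand-written mapping.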
Questions
Hopefully you see the issue and understand what I am trying to achieve.
Are there packages from Kubernetes or other open source authors that make some progress in unifying the APIs like I need?
Are any of the Kubernetes auto-generated functions meant for general use (as opposed to internal use) and helpful to my challenge? I have not found documentation for any but DeepCopy.
What is the "Go way" of abstracting out the differences between the various versions of the Ingress object such that I can write the rest of the code to work on any version? Keep in mind that I may need to make another API call for further processing, in which case I would need to know the concrete type of the object and select the right API call. It is not obvious to me that client-go provides any support for such auto-selection of API calls.
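In Go the usual substitute for a base class is a small interface that each version-specific wrapper satisfies (embedding can share common code), plus a type switch where you genuinely need the concrete type, for example to pick the matching client call for a follow-up update. A sketch under the assumption that you define the wrappers yourself; `ingressV1beta1`, `ingressV1`, `meta` and `update` are hypothetical, not client-go types, and `update` only returns the name of the call it would make instead of making it.

```go
package main

import "fmt"

// Ingress is the version-neutral behaviour the rest of the program uses.
type Ingress interface {
	Key() string
	Hosts() []string
}

// meta holds what every version shares. In real code each wrapper would
// more likely embed the corresponding k8s.io/api type and implement the
// methods per version, since the field layouts differ.
type meta struct {
	namespace, name string
	hosts           []string
}

func (m meta) Key() string     { return m.namespace + "/" + m.name }
func (m meta) Hosts() []string { return m.hosts }

// Embedding meta promotes Key and Hosts, so both types satisfy Ingress.
type ingressV1beta1 struct{ meta }
type ingressV1 struct{ meta }

// update dispatches on the concrete type so each object goes back through
// the API call matching its version (here it returns the call's name).
func update(ing Ingress) (string, error) {
	switch v := ing.(type) {
	case ingressV1:
		return "NetworkingV1().Ingresses(" + v.namespace + ").Update", nil
	case ingressV1beta1:
		return "NetworkingV1beta1().Ingresses(" + v.namespace + ").Update", nil
	default:
		return "", fmt.Errorf("unsupported Ingress type %T", ing)
	}
}

func main() {
	items := []Ingress{
		ingressV1{meta{"default", "web", []string{"example.com"}}},
		ingressV1beta1{meta{"legacy", "api", []string{"api.example.com"}}},
	}
	for _, ing := range items {
		call, err := update(ing)
		fmt.Println(ing.Key(), ing.Hosts(), call, err)
	}
}
```

Code that only reads uses the `Ingress` interface; only the few places that write back need the type switch.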

Schema deployment management for Athena

In order to apply devops principles to data (ugh, dataops!), things like continuous deployment need to be considered.
That is why tools like dbDeploy exist. However, dbDeploy seems to have been orphaned and is no longer maintained. I've used it again and again in the past, but I don't see much support for it these days, and I'm not sure why.
So I'm wondering what people use to manage and version their schemas. In particular, I'm looking for something that will work with Athena (it has a JDBC driver, so in theory any JDBC-compliant tool should work).
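The core of what dbDeploy-style tools do is small: numbered change scripts are applied in version order, and a changelog records which ones have already run, so each environment only receives the pending ones. A minimal sketch of that bookkeeping in Go; the `Executor` interface, the in-memory `applied` map and the S3 location are hypothetical stand-ins for an Athena/JDBC connection, a changelog table and your data. Because, as far as I know, Athena does not wrap DDL in multi-statement transactions, the sketch records each script only after it succeeds, so a failed run can simply be retried.

```go
package main

import (
	"fmt"
	"sort"
)

// Migration is one numbered change script (dbDeploy calls these deltas).
type Migration struct {
	Version int
	SQL     string
}

// Executor is a hypothetical stand-in for something that runs one statement
// against Athena (or any JDBC target).
type Executor interface {
	Exec(stmt string) error
}

// Apply runs every migration not yet recorded in applied, in version order,
// recording each one only after it succeeds. It returns the versions it ran.
func Apply(db Executor, applied map[int]bool, migrations []Migration) ([]int, error) {
	sorted := append([]Migration(nil), migrations...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Version < sorted[j].Version })
	var ran []int
	for _, m := range sorted {
		if applied[m.Version] {
			continue
		}
		if err := db.Exec(m.SQL); err != nil {
			return ran, fmt.Errorf("migration %d: %w", m.Version, err)
		}
		applied[m.Version] = true
		ran = append(ran, m.Version)
	}
	return ran, nil
}

// logExecutor records statements instead of sending them to Athena.
type logExecutor struct{ stmts []string }

func (l *logExecutor) Exec(stmt string) error {
	l.stmts = append(l.stmts, stmt)
	return nil
}

func main() {
	migrations := []Migration{
		{2, "ALTER TABLE events ADD COLUMNS (country string)"},
		{1, "CREATE EXTERNAL TABLE events (id string) LOCATION 's3://example-bucket/events/'"},
	}
	applied := map[int]bool{} // would be loaded from a changelog table
	db := &logExecutor{}

	ran, err := Apply(db, applied, migrations) // first run applies 1, then 2
	fmt.Println(ran, err)
	ran, err = Apply(db, applied, migrations) // second run finds nothing pending
	fmt.Println(ran, err)
}
```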
I know one answer may be to switch mindset, and use the AWS Glue crawlers instead. But do people actually do that? Or are the crawlers more for POC/Quick start situations? I'm pretty sure you'll always want to override decisions the crawler makes, so how can that be handled?

Simplest C++ library that supports distributed messaging - Observer Pattern

I need to do something relatively simple, and I don't really want to install message-oriented middleware (MOM) like RabbitMQ.
There are several programs that "register" with a central "service" server over TCP. The server's only function is to call back all the registered clients once every one of them has said "DONE", so it is a kind of "join" (more precisely, a barrier) for distributed client processes.
The clients can finish at completely different times; when the last one says "DONE", the central server messages them all with "ALL-COMPLETE". Until then, the clients block, waiting to be called back asynchronously.
So this is a kind of distributed, asynchronous Observer Pattern. The server has to keep track of where the clients are somehow; it is OK for each client to pass its IP address to the server, etc. It could be built with things like Boost.Signals, Boost.Asio, Boost.Dataflow, etc., but I don't want to reinvent the wheel if something simple already exists. I got very close with ZeroMQ, but none of its patterns support this use case very well, AFAIK.
Is there a very simple system that does this? Notice that the server can be written in any language. I just need C++ bindings for the clients.
After much searching, I used this library: https://github.com/actor-framework
It turns out that doing this with this framework is relatively straightforward. The only real "impediment" to using it is that the library seems to have gone through an API transition recently, and the documentation .pdf file has not completely caught up with the source. No biggie, since the example programs and the source (.hpp) files get you over this hump, but they do need to bring the docs in sync with the source. In addition, IMO they need to provide more interesting examples of how to use C++ actors for extreme performance. For my case it is not needed, but the shared-nothing nature of actors is one of the reasons people use them instead of shared-memory communication between threads.
Also, if you are not used to state-of-the-art C++11 programs, the syntax the library enforces (get used to lambdas!) can be a bit of a mind-twister at first. The only other caveat was the trivial matter of remembering all the clients that registered with the server.
STRONGLY RECOMMENDED.

open source gossip-based membership protocol?

I am looking for a library which I can plug into a distributed application which implements any gossip-based membership protocol.
Such a library would allow me to send/receive membership lists, merge received membership lists, etc. Even better would be if the library implemented a protocol with O(log n) performance guarantees.
Does anyone know of any open source library like this? It doesn't need to meet all of the aforementioned requirements; even something partially implemented would be helpful.
Take a look at this on Google Code:
http://code.google.com/p/gossip-protocol-java/
I happened to stumble upon it yesterday while looking for a Java-based gossip implementation. It's more a reference implementation for someone to build upon, but it gives the general idea, and after reading through the code you'll definitely be able to build your own or branch what's there to add any features you need.
HTH
Have you looked at Apache Zookeeper? I'm not sure if it's what you're looking for.
ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols.
C# bindings are also available.
