probably there are a lot of people who will smile reading this question...
Here's my problem.
I have a Spring 3 web application acting both as a client and server. It gets some XML data from a client "C", it processes them, and it sends them to a server "S".
The input XML fron C must be validated against a schema (e.g. "c.xsd") while the output XML to S must be validated against a different one (e.g. "s.xsd").
I'm using jaxb2 for marshalling and unmarshalling.
In the documentation I read that it is possible to set the "schema" attribute for the [un]/marshaller.
Therefore, I need to have an a.xsd for the validation when I get an input and a b.xsd when I produce an output... the question is the following:
when I switch the validation schema from c,xsd to s.xsd (producing an output after processing a request from C), do I change the status of the server? In other words, If I am receiving a second request form a client C2 when I'm processing the first request from C, will I attempt to validate C2 input against s.xsd? Will the application automatically put the C2 request on a different thread? If not, how can I configure spring to do so?
Thanks alot!
I'll take a stab at it:
The input XML fron C must be validated
against a schema (e.g. "c.xsd")
You can do this by setting a schema (c.xsd) on the Unmarshaller.
while the output XML to S must be
validated against a different one
(e.g. "s.xsd").
You can do this by setting a schema (s.xsd) on the Marshaller.
when I switch the validation schema
from c,xsd to s.xsd (producing an
output after processing a request from
C), do I change the status of the
server?
No, because the Unmarshaller is always using c.xsd and the Marshaller is always using s.xsd.
Since Marshaller and Unmarshaller are not thread safe you must be sure not to share them among threads.
Related
I want to write some Rest(ful) application with Spring Boot and Spring Data JPA.
Let's assume that for business reasons I have a database with the following tables:
customer(id number, first_name text, last_name text, type text);
customer_type(type text, description text);
where:
id is generated by the database at inserion time
type column in customer table is a foreign key to type column in customer_type table and it is immutable from a microservice point of view, just a lookup table.
Assuming I want to create APIs for CRUD operations on a customer but want to minimize api calls when just reading, I suppose I need the following operations:
GET /customer/{id}
POST /customer
PUT /customer/{id}
DELETE /customer/{id}
How the body should be structured?
For GET operation the response should be
{
"id":123,
"firstName":"John",
"lastName":"Doe",
"customerType":{
"type":"P",
"description":"Premium Customer"
}
}
But for POST I imagine I need to avoid sending the id and send just the customer type since the description is immutable and the client needs the description only for visualizing the information on screen, but this leads to different request body from the one returned in the GET operation.
For the PUT operation is the same but also should the id field be sent? How to handle the case where the id in the API path is different from the id in the request body if sent?
DELETE should not be a problem since it just deletes the row in customer table.
Thank you
How the body should be structured?
Let's make a step back first and let us discuss quickly what you basically try when following a REST architecture and why and how REST installs those mechanisms.
REST is an architectural style that helps in decoupling clients from servers by introducing indirection mechanisms which may seem odd at first but in the end allow you to achieve the required level of decoupling which allows clients to introduce changes which clients will naturally adept to. Such indirection mechanisms include attaching URIs to link-relation names, using form-based representation formats to tell a client how to create requests, content-type negotiation to return representations supported and understood by others and so forth. If you don't need such properties, i.e. as client and servers always go hand in hand in regards to changes and communicate on predefined messages, REST is probably not the best style to follow. If you though have a server that is contacted by various clients not under your control or a client that has to contact various servers, also not under your direct control, this is where REST truly starts to shine if all parties adhere to these concepts.
One of RESTs premise is that a server will teach clients everything they need to know in order to construct requests. If you look at the Web, where HTML is basically used everywhere, you might see that HTML defines HTML forms which basically allow a server to explain to a client what properties of a resource the server expects as input. On top of that the form also tells you client which HTTP operation to use, which target URI to send the request to and which media-type to represent the state in. In HTML this is usually implicitly given as application/x-www-form-urlencoded which chains properties together i.e. like this:
firstName=Roman&lastName=Vottner&role=Dev
or the like. This is in essence what HATEOAS or hypertext as the engine of application state is all about. You use in-build controls of the media-type exchanged to allow your client to progress its task instead of having to consult external documentation to lookup the "API" of some services. I.e. a form could state that an input only allows numeric values, that a sub-portion of the form represents a date/time picker widget which a client could render to a user accordingly, or an element represents a slider with a given range of admissible values and the like.
How the actual representation format you have to send to the server has to look like depends on the instructed media-type. I.e. HAL forms uses application/json by default and also specifies that application/x-www-form-urlencoded needs to be supported. Other media-types have explicitly negotiated between client and server. Ion states that application/json or application/ion+json have to be negotiated via the Content-Type request header.
In plain application/json the url-encoded payload from above could simply be expressed as:
{
"firstName": "Roman",
"lastName": "Vottner",
"role": "Dev"
}
and this is OK as the server basically instructed you to send this data in that format.
There are further media-types available that are worth a closer look whether they could fit your need or not. I.e. Hydra has a bit of a different take on this matter by connecting Linked Data to REST and its affordances called operations and allows to describe resources and its properties through LD classes. So the presence of an affordance for a certain resource tells you what you can do with that resource, like i.e. updating its state, and therefore also which class it belongs to and therefore which properties it has.
This just should illustrate how a negotiated media type finally decides how the actual representation needs to look like that has to be sent to the server.
In regards of whether to put in resource identifiers in the payload or not it depends. Usually resources are identified by the URI/IRI and this, as a whole, is the identifier of the resource. In your application though you will reference related domain objects through their ID which does not necessarily need to be, and probably also should not be, part of the IRI itself. I.e. let's assume we retrieve a resource that represents an order. That order contains the users name and address, the various items that got ordered including some meta data describing those items and what not. It usually makes sense in such a case to add the orderId which you use in your application even though the URI may contain that information already. Users of that API are usually not interested in those URIs but the actual content and might also never see those URIs if they are hidden behind automated processes or user interfaces. If a user now wants to print out that order s/he has all the information needed to file complaints later on via phone i.e. In other cases, i.e. if you design a resource to be an all-purpose clipboard like, copy&paste location, an ID does not make any sense unless you grant the user to explicitly reference one of that states directly.
The reason why IDs should not be part of the URI itself stems from the fact that a URI shouldn't change if the actual resource does not change. I.e. we have a customer who went through a merge a couple of years ago. They used to expose all their products via own URIs that exposed the productId as part of the URI. During the merger the tried to combine the various different data models to reduce the number of systems they had to operate while serving each of their customers with the same data as before as the underlying products didn't change. As they tried to stay "backwards" compatible for the purpose of supporting legacy systems of their customers, they quickly noticed that exposing those productIds as part of the URI was causing them some troubles. If they had used a mapping table of i.e. exposed UUIDs to internal productIds (again an introduction of indirection) earlier they could have reduced their whole data model and thus complexity by a lot while being able to change the mapping from internal prodcutId to UUID on the fly while allowing their clients to lookup the product information.
Long story short, as hopefully can be seen the structure of a representation depends on the exchanged media type. There are loads of different media-types available. Use the ones that allow you to describe resources to clients, such as HAL/HAL forms, Ion, Hydra, .... In regards to URIs, don't overengineer URIs. They are, as a whole, just a pointer to a resource and clients are usually interested in the content, not the URI! As such, make use of indirection-features like link-relation names, content-type negotiation and so forth to help remove the direct coupling of clients to services but instead rely more on the document type exchanged. The media-type here becomes basically the contract of the message. Through mappings on the client and server side resources of various representations can be "translated" to an object which you can use in your application.
As you've tagged your question with spring-boot and spring-data-jpa, you might want to look into spring-hateoas. It supports HAL out of the box, HAL forms can be used via affordances though the media-type needs to be enabled explicitly for it otherwise you might miss out on the form-template in the responses. Hydra support in spring-hateoas seems to be added through hydra-java which implements the Spring HATEOAS SPI. While Amazon provides implementation for Ion for various programming languages, including Java, it does not yet support Spring HATEOAS or Spring in general. Here a custom SPI implementation may be necessary.
For PUT operations you need to send the id of the entity that you want to update.
If you want to generate the same response as you would get in GET, then you need to write a DTO and map details accordingly.
Currently my team and me are working in a project where we develop a distributed Kafka processing application. And we want to know how to conveniently pass Kafka Headers through these components.
What does this mean? Please have a look at the following sample image:
Our Component A receives events and generates based on the content a unique Header with metadata information about the event. After processing the result is passed into Topic A with the generated Header.
Component B and Component C receive these events, start processing and write their results into Topic B and Topic C. These components don't use the generated Headers from Component A.
But Component D needs it. So Component B and Component C must receive the header and pass it through.
Our system in the project is a little bit bigger than this example and that's why we questioned ourselves: What is the best way to pass Kafka Headers through these components. Is there an automatic way?
We considered following approaches:
Don't use Kafka Headers at all and pass the metadata information though the body.
Usage of interceptors (if that is even possible)
FYI we use Spring Boot.
Thanks in advance,
Michael
In IBM MQ, I have a requirement where I can get many types of xml from the queue. The xml messages will be conformed to already specified xsd (there are say, 5 xsd - which means I can get 5 different xml). When I get the message from queue, I would like to know the type of xml (if its xsd1 or xsd2 or so on)
The reason why I would want to know is, I am using a JaxB interface with SAX implementation, for which I need to give the java object corresponding to the xml as parameter. So I have to know which xsd the input and is and assign the parameter correspondingly.
The options I have is to set a property in the header to the message, but the party who is dropping the message into MQ is not ready.
What other options do I have? Can I get the file name (of xml) from the mq and find the xsd based on the name of the file? Or do I have to do I sax parsing and identify the root tag and derive the xsd type? Any other better option anybody has in mind?
Think of MQ like the Post Office. When you get a letter, the post office doesn't mess with anything on the inside (the payload) and if it changes the outside, it only changes routing information. If you want to sort incoming mail to different recipients, whoever is sending it has to put the data against which the sort criteria operate on the outside of the envelope. If that doesn't work, you must open the envelope and look for the recipient name, department, or whatever on the papers inside.
Your MQ message is that envelope. The sort criteria can be different queue names, a property of the message, a property of the message header, or something in the payload. But unless the sender explicitly sets the destination queue name based on the selection criteria, or sets the message or header property, your only option is to inspect the payload and figure it out.
If you have to inspect the payload, this is a perfect scenario for IBM Integration Broker. But you can also write an application to perform this function. Very often this is performed by a Dispatch app which gets the message, figures out where it goes, then puts it onto another queue and COMMITs the GET and PUT operations. But if the dispatch app must parse the XML to determine the correct queue, the message has to be parsed twice - once by the dispatcher, once by the receiving app.
I think you can do:
Does the incoming message has the file name at the beginning of the message body? In that case, after receiving the message your application can read first few bytes to get the file name. Based on the file name, application can use appropriate Xsd and pass the entire message body.
Our server A notifies 3rd party server B with an XML-formatted message, sent as HTTP POST request. It's us who specify the message format and other aspects of interaction.
We can specify that the XML is sent as
a) raw data (just the XML)
b) single POST parameter having some specific name (say, xml=XML)
The question is which way is better for the 3rd party in general, if we don't know the platform and language they are using.
I thought I had seen some problems in certain languages to easily parse the nameless raw data, though I don't remember any specific case. While my colleague insists that the parameter name is redundant, and it's really better to send the raw data without any name.
If you don't need send extra information in other post parameters the xml parameter name is redundant and innecesary as your teammate said, if the 3rd party waits only for a XML data only send the raw data in the POST body with the correct mime type and encoding and and do not complicate.
The process for Getting raw data is easy in most application server containers, so you dont care about that, most of them uses a Reader to get received data and manipulate it.
I have a client-server application where the server transmits serialized objects in protobuf format to a client, and I would like to retire a required field. Unfortunately I can't change both client and server at the same time to use the new .proto definition.
If I change a required field to be optional, but only for code that serializes messages (i.e. deserializing code has not been rebuilt and still thinks it's a required field), can I continue to publish messages that can be deserialized as long as I populate a value for the now-optional field?
(It appears to be fine to do so, at least for a few trivial cases I experimented with (only using Java), but I'm interested if it's a generally sensible approach, and whether there are any edge cases etc I should worry about).
Motivation: My goal is to retire a required field in a client-server application where the server publishes messages that are deserialized by the client. My intended approach is:
Change required field to optional on the trunk.
If it's necessary to deploy new server code (for unrelated features/fixes), ensure that the optional field continues to be populated in the message.
Gradually deploy updated code for all clients (this will take time as it requires involvement of other teams with their own release schedules)
Confirm that all clients have been updated.
Begin to retire (i.e. not populate) the optional field.
According to the encoding format documentation, whether a field is required or not is not encoded in serialized byte stream itself. That is, optional or required makes no difference in the encoded serialized message.
I've confirmed this in practice, using the Java generated code, by writing serialized messages to disk and comparing the output - using a message containing all of the supported primitive types as well as fields representing other types.
As long as the field is set, using the parseFrom(byte[]) method to deserialize will still work, because the byte[] will be the same.
However, one would wonder why you would change the field from required to optional until you are ready to allow it to be optional? Basically you are just making it "optional" in the .proto file, but you are enforcing that it is required by always populating it. Just a thought.