Advantages of using timeSeries over container resource - oneM2M

The timeSeries resource represents a container for data instances, and the timeSeriesInstance resource represents a single data instance within it.
The main difference from container and contentInstance is that timeSeries keeps the time information together with the data and makes it possible to detect missing data.
Is there any other advantage that can be achieved by using the timeSeries and timeSeriesInstance resources instead of the container and contentInstance resources?
Does it also help in reducing data redundancy? For example, if one of my application instances sends data every 30 seconds, then 24*120 contentInstance resources will be created in a day.
If timeSeries and timeSeriesInstance resources are used instead, will the same number of timeSeriesInstance resources (i.e. 24*120) be created in a day for the above case?
Also, is there any specific purpose for keeping the contentInfo attribute in timeSeries instead of timeSeriesInstance (like the contentInfo attribute in the contentInstance resource)?

There are a couple of differences between the <container> and <timeSeries> resource types.
A <container> resource may contain an arbitrary number of <contentInstance> resources as well as <flexContainer> and (sub) <container> resources as child resources. The advantage of this is that a <container> can be further structured to represent more complex data types.
This is also the reason why the contentInfo attribute cannot be part of the <container> resource: the type of the content may be mixed, or the <container> resource may not have direct <contentInstance> resources at all.
A <timeSeries> resource can only have <timeSeriesInstance> resources as child resources (apart from <subscription>, <oldest>, <latest>, etc.). It is assumed that all the child <timeSeriesInstance> resources are of the same type, which is why the contentInfo attribute is located in the <timeSeries> resource.
<timeSeriesInstance> resources may also have a sequenceNr attribute which allows the CSE to check for missing or out-of-sequence data. See, for example, the missingDataDetect attribute in the <timeSeries> resource.
For your application (sending and storing data every 30 seconds): it depends on the requirements. Is it important that measurements are transmitted continuously, or that you can tell when data is missing? Then use <timeSeries> and <timeSeriesInstance>. If your application only sends data when the measurement changes and it is only important to retrieve the latest value, then use <container> and <contentInstance>.

Here are two use cases where <timeSeries> seems better to me than using a <container>.
The first use case involves the dataGenerationTime attribute. This allows a sensor to record the exact time a value was captured, whereas with a <contentInstance> you only have the creation time (you could put the capture time into the content attribute, but that requires additional processing to extract it from the content). If you use the creationTime attribute of the <contentInstance>, there will be variations in the time based on when the CSE receives the primitive. When using the <timeSeriesInstance>, those variations go away because the CREATE request includes the dataGenerationTime attribute. That makes the data more accurate.
The second use case involves the missingDataDetect attribute. In short, using this, along with the expected periodicInterval you can implement a "heartbeat" type functionality for your sensor. If the sensor does not send a measurement indicating that the door is closed/open every 30 seconds, a notification can be sent indicating that the sensor is malfunctioning or tampered with.
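For illustration, here is a rough sketch of what a <timeSeriesInstance> CREATE request could look like over the HTTP binding, written with Java's built-in HttpClient. The CSE address, the originator, the resource type code ty=30 and the short attribute names (m2m:tsi with dgt, con, snr for dataGenerationTime, content and sequenceNr) are assumptions on my part and should be verified against the oneM2M specifications (TS-0004/TS-0009) and your CSE's configuration:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TsiCreateSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical CSE base URL and target <timeSeries> resource
        String target = "http://cse.example.com:8080/cse-in/mySensor/temperatureSeries";

        // dgt = dataGenerationTime, con = content, snr = sequenceNr (short names assumed)
        String body = "{ \"m2m:tsi\": { \"dgt\": \"20240101T120000\", \"con\": \"21.5\", \"snr\": 42 } }";

        HttpRequest request = HttpRequest.newBuilder(URI.create(target))
                // ty=30 marks the request as a <timeSeriesInstance> CREATE (type code assumed)
                .header("Content-Type", "application/json;ty=30")
                .header("X-M2M-Origin", "CmyApplication")   // originator AE-ID (assumed value)
                .header("X-M2M-RI", "req-0001")             // request identifier; some CSEs also require X-M2M-RVI
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
The parent <timeSeries> resource would carry contentInfo, periodicInterval and missingDataDetect, so the CSE knows the expected cadence and can raise the missing-data notification described above.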


Spring Boot REST(ful) API and lookup values

I want to write a REST(ful) application with Spring Boot and Spring Data JPA.
Let's assume that for business reasons I have a database with the following tables:
customer(id number, first_name text, last_name text, type text);
customer_type(type text, description text);
where:
id is generated by the database at insertion time
the type column in the customer table is a foreign key to the type column in the customer_type table; it is immutable from the microservice's point of view, just a lookup table.
Assuming I want to create APIs for CRUD operations on a customer but want to minimize API calls when just reading, I suppose I need the following operations:
GET /customer/{id}
POST /customer
PUT /customer/{id}
DELETE /customer/{id}
How should the body be structured?
For the GET operation the response should be
{
"id":123,
"firstName":"John",
"lastName":"Doe",
"customerType":{
"type":"P",
"description":"Premium Customer"
}
}
But for POST I imagine I should avoid sending the id and send just the customer type, since the description is immutable and the client only needs the description to display the information on screen. However, this leads to a request body that differs from the one returned by the GET operation.
The same applies to the PUT operation, but should the id field also be sent? And how do I handle the case where the id in the API path differs from the id in the request body, if one is sent?
DELETE should not be a problem since it just deletes the row in the customer table.
Thank you
How should the body be structured?
Let's take a step back first and quickly discuss what you are fundamentally trying to achieve when following a REST architecture, and why and how REST provides the mechanisms for it.
REST is an architectural style that helps decouple clients from servers by introducing indirection mechanisms. These may seem odd at first, but in the end they give you the level of decoupling that lets the server introduce changes which clients will naturally adapt to. Such indirection mechanisms include attaching URIs to link-relation names, using form-based representation formats to tell a client how to create requests, content-type negotiation to return representations supported and understood by the other side, and so forth. If you don't need such properties, i.e. if client and server always go hand in hand with regard to changes and communicate via predefined messages, REST is probably not the best style to follow. If, however, you have a server that is contacted by various clients not under your control, or a client that has to contact various servers also not under your direct control, this is where REST truly starts to shine, provided all parties adhere to these concepts.
One of REST's premises is that a server will teach clients everything they need to know in order to construct requests. If you look at the Web, where HTML is used basically everywhere, you will see that HTML defines forms which allow a server to explain to a client which properties of a resource it expects as input. On top of that, the form also tells your client which HTTP operation to use, which target URI to send the request to and which media type to represent the state in. In HTML this is usually implicitly given as application/x-www-form-urlencoded, which chains properties together like this:
firstName=Roman&lastName=Vottner&role=Dev
or the like. This is in essence what HATEOAS, or hypertext as the engine of application state, is all about. You use the built-in controls of the exchanged media type to allow your client to progress through its task instead of having to consult external documentation to look up the "API" of some service. E.g. a form could state that an input only allows numeric values, that a sub-portion of the form represents a date/time picker widget which a client could render to a user accordingly, or that an element represents a slider with a given range of admissible values, and the like.
What the actual representation format you have to send to the server looks like depends on the negotiated media type. E.g. HAL Forms uses application/json by default and also specifies that application/x-www-form-urlencoded needs to be supported. Other media types have to be explicitly negotiated between client and server. Ion states that application/json or application/ion+json have to be negotiated via the Content-Type request header.
In plain application/json the url-encoded payload from above could simply be expressed as:
{
"firstName": "Roman",
"lastName": "Vottner",
"role": "Dev"
}
and this is OK as the server basically instructed you to send this data in that format.
There are further media types available that are worth a closer look to see whether they fit your needs or not. E.g. Hydra has a somewhat different take on this matter: it connects Linked Data to REST and to its affordances, called operations, and it allows resources and their properties to be described through Linked Data classes. The presence of an affordance on a certain resource therefore tells you what you can do with that resource, such as updating its state, and thus also which class it belongs to and which properties it has.
This should just illustrate how the negotiated media type ultimately decides what the actual representation sent to the server has to look like.
As for whether to put resource identifiers in the payload or not: it depends. Usually resources are identified by their URI/IRI, and this, as a whole, is the identifier of the resource. In your application, though, you will reference related domain objects through their ID, which does not necessarily need to be, and probably also should not be, part of the IRI itself. E.g. let's assume we retrieve a resource that represents an order. That order contains the user's name and address, the various items that were ordered, including some metadata describing those items, and so on. It usually makes sense in such a case to add the orderId that you use in your application even though the URI may already contain that information. Users of that API are usually not interested in those URIs but in the actual content, and might never see those URIs if they are hidden behind automated processes or user interfaces. If a user now wants to print out that order, he or she has all the information needed to file a complaint later on, e.g. via phone. In other cases, e.g. if you design a resource to be an all-purpose, clipboard-like copy&paste location, an ID does not make any sense unless you allow the user to explicitly reference one of those states directly.
The reason why IDs should not be part of the URI itself stems from the fact that a URI shouldn't change if the actual resource does not change. E.g. we have a customer who went through a merger a couple of years ago. They used to expose all their products via their own URIs, which exposed the productId as part of the URI. During the merger they tried to combine the various data models to reduce the number of systems they had to operate, while serving each of their customers with the same data as before, as the underlying products didn't change. As they tried to stay "backwards" compatible in order to support their customers' legacy systems, they quickly noticed that exposing those productIds as part of the URI was causing them trouble. If they had earlier used a mapping table of, e.g., exposed UUIDs to internal productIds (again an introduction of indirection), they could have reduced their whole data model, and thus its complexity, by a lot, while being able to change the mapping from internal productId to UUID on the fly and still allowing their clients to look up the product information.
Long story short, as can hopefully be seen, the structure of a representation depends on the exchanged media type. There are loads of different media types available. Use the ones that allow you to describe resources to clients, such as HAL/HAL Forms, Ion, Hydra, .... As for URIs, don't overengineer them. They are, as a whole, just a pointer to a resource, and clients are usually interested in the content, not the URI! As such, make use of indirection features like link-relation names, content-type negotiation and so forth to remove the direct coupling of clients to services and instead rely more on the document type exchanged. The media type effectively becomes the contract of the message. Through mappings on the client and server side, resources in various representations can be "translated" into an object which you can use in your application.
As you've tagged your question with spring-boot and spring-data-jpa, you might want to look into spring-hateoas. It supports HAL out of the box; HAL Forms can be used via affordances, though the media type needs to be enabled explicitly, otherwise you might miss out on the form template in the responses. Hydra support in spring-hateoas seems to be added through hydra-java, which implements the Spring HATEOAS SPI. While Amazon provides implementations of Ion for various programming languages, including Java, it does not yet support Spring HATEOAS or Spring in general; here a custom SPI implementation may be necessary.
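Since the question is tagged spring-boot, a minimal Spring HATEOAS sketch could look roughly like the following (Customer, CustomerRepository and CustomerController are hypothetical names); instead of a bare DTO it returns an EntityModel carrying hypermedia links:
import static org.springframework.hateoas.server.mvc.WebMvcLinkBuilder.*;

import org.springframework.hateoas.EntityModel;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/customers")
public class CustomerController {

    private final CustomerRepository repository;  // hypothetical Spring Data repository

    public CustomerController(CustomerRepository repository) {
        this.repository = repository;
    }

    @GetMapping("/{id}")
    public EntityModel<Customer> getCustomer(@PathVariable Long id) {
        Customer customer = repository.findById(id).orElseThrow();

        // Wrap the domain object and attach hypermedia controls: a self link
        // plus a link back to the collection resource.
        return EntityModel.of(customer,
                linkTo(methodOn(CustomerController.class).getCustomer(id)).withSelfRel(),
                linkTo(CustomerController.class).withRel("customers"));
    }
}
With the HAL media type enabled, the links end up in the _links section of the response, which is where a client picks them up by link-relation name instead of hard-coding URIs.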
For PUT operations you need to send the id of the entity that you want to update.
If you want to generate the same response as you would get in GET, then you need to write a DTO and map details accordingly.
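To make that concrete, a minimal sketch of separate request and response DTOs might look like the following (all class and field names are hypothetical, as is the CustomerService). The POST/PUT body carries only the lookup code, the GET response carries the resolved lookup value, and the PUT handler treats the path id as authoritative, so path and body can never disagree:
import org.springframework.web.bind.annotation.*;

// Request body for POST/PUT: no id, only the lookup code.
record CustomerRequest(String firstName, String lastName, String customerType) {}

// Nested lookup value as returned by GET.
record CustomerTypeDto(String type, String description) {}

// Response body for GET (and, if you like, for POST/PUT responses as well).
record CustomerResponse(Long id, String firstName, String lastName, CustomerTypeDto customerType) {}

@RestController
class CustomerApi {

    private final CustomerService customerService;  // hypothetical service layer

    CustomerApi(CustomerService customerService) {
        this.customerService = customerService;
    }

    @PutMapping("/customer/{id}")
    CustomerResponse update(@PathVariable Long id, @RequestBody CustomerRequest body) {
        // The id from the path wins; the request body deliberately has no id field.
        return customerService.update(id, body);
    }
}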

Apache Flink relating/caching data options

This is a very broad question. I'm new to Flink and looking into the possibility of using it as a replacement for a current analytics engine.
The scenario is that data is collected from various pieces of equipment and received as a JSON-encoded string with the format {"location.attribute": value, "TimeStamp": value}.
For example, a unitary traceability code is received for a location, after which various process parameters are received in a real-time stream. The analysis is to be run over the process parameters; however, the output needs to include a relation to the traceability code. For example: {"location.alarm": value, "location.traceability": value, "TimeStamp": value}
What method does Flink use for caching values, in this case the current traceability code, whilst running analysis over other parameters received at a later time?
I'm mainly just looking for an area to research, as so far I've been unable to find any examples of this kind of scenario. Perhaps it's not the kind of process that Flink can handle.
A natural way to do this sort of thing with Flink would be to key the stream by the location, and then use keyed state in a ProcessFunction (or RichFlatMapFunction) to store the partial results until ready to emit the output.
With a keyed stream, you are guaranteed that every event with the same key will be processed by the same instance. You can then use keyed state, which is effectively a sharded key/value store, to store per-key information.
The Apache Flink training includes some explanatory material on keyed streams and working with keyed state, as well as an exercise or two that explore how to use these mechanisms to do roughly what you need.
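As a rough sketch of that approach (the Event and EnrichedEvent types and their fields are made up here, and the JSON parsing is left out), a KeyedProcessFunction keyed by location could cache the latest traceability code in ValueState and attach it to every later parameter event:
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class EnrichWithTraceability
        extends KeyedProcessFunction<String, Event, EnrichedEvent> {

    private transient ValueState<String> traceabilityCode;

    @Override
    public void open(Configuration parameters) {
        traceabilityCode = getRuntimeContext().getState(
                new ValueStateDescriptor<>("traceabilityCode", String.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<EnrichedEvent> out) throws Exception {
        if ("traceability".equals(event.attribute)) {
            // Remember the current code for this location; it stays in keyed state
            // until the next code arrives (or until you clear/expire it).
            traceabilityCode.update(event.value);
        } else {
            // Any later parameter for the same location is emitted together with
            // the cached code (null if no code has been seen yet).
            out.collect(new EnrichedEvent(event, traceabilityCode.value()));
        }
    }
}
The stream would be keyed first, e.g. events.keyBy(e -> e.location).process(new EnrichWithTraceability()).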
Alternatively, you could do this with the Table or SQL API, and implement this as a join of the stream with itself.

NiFi processor to route flows based on a changeable list of regexes

I am trying to use NiFi to act as a router for syslog, based on a list of regexes matching the syslog.body (NB: as this is just a proof of concept I can change any part if needed).
The thought process is that, via a separate system (for now, vi and a text file 😃), an admin can define a list of criteria (regex format for each seems sensible) which, if matched, would result in syslog messages being sent to a specific separate system (for example, all critical audit data matched by the regex list is sent to the audit system, and all other data goes to the standard log store).
I know that this can be done with route-on-content processors, but the properties are configured before the processor starts, and an admin would have to stop the processor every time they need to make an edit.
I would like to load the list of regexes in periodically (automatically) and have the processor properties updated.
I don't mind if this is all done natively in NiFi (that is preferable, for elegance and to save writing an external app) or via a REST API call driven by a Python script or something (or can NiFi send REST calls to itself?!).
I appreciate that a processor property cannot be updated while running, so the processor would have to be stopped to be updated, but that's fine as the queue will buffer for the brief period. Maybe a check to see if the file has changed could avoid stopping it for no reason, rather than updating periodically regardless; I can solve that problem later.
Thanks
Chris
I think the easiest solution would be to use ScanContent, a processor which specifies a dictionary file on disk which contains a list of search terms and monitors the file for changes, reloading in that event. The processor then applies the search terms to the content of incoming flowfiles and allows you to route them based on matches. While this processor doesn't support regular expressions as dictionary terms, you could make a slight modification to the code or use this as a baseline for a custom processor with those changes.
If that doesn't work for you, there are a number of LookupService implementations which show how CSV, XML, property files, etc. can be monitored and read by the controller framework to provide an updated mapping of key/value pairs. These can also serve as a foundation for building a more complicated scan/match flow using the loaded terms/patterns.
Finally, if you have to rely on direct processor property updating, you can script this with the NiFi API calls to stop, update, and restart the processors so it can be done in near-real-time. To determine these APIs, visit the API documentation or execute the desired tasks via the UI in your browser and use the Developer Tools to capture the HTTP requests being made.
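As a very rough Java sketch of that last option (the exact entity JSON and endpoint paths should be verified against the NiFi REST API documentation for your version, e.g. by watching the calls the UI makes in your browser's Developer Tools), the sequence is: fetch the processor to learn its current revision, stop it, update the property holding the regex list, and start it again:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NifiProcessorUpdater {

    private static final String BASE = "http://localhost:8080/nifi-api";  // assumed NiFi address
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        String processorId = "your-processor-uuid";  // hypothetical processor id

        // 1. Fetch the processor entity; the JSON contains the current "revision",
        //    which must be echoed back with every update request.
        String entity = send(HttpRequest.newBuilder(URI.create(BASE + "/processors/" + processorId)).GET());
        // (Parse 'entity' with a JSON library to extract the revision and current properties.)

        // 2. Stop the processor via the run-status endpoint (body shape assumed; the revision
        //    version is hard-coded here only to keep the sketch short).
        String stopBody = "{ \"revision\": { \"version\": 1 }, \"state\": \"STOPPED\" }";
        send(HttpRequest.newBuilder(URI.create(BASE + "/processors/" + processorId + "/run-status"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(stopBody)));

        // 3. PUT the updated processor entity (with the new regex property) to /processors/{id},
        //    then 4. start it again with state "RUNNING" via the same run-status endpoint,
        //    each time echoing the latest revision returned by the previous response.
    }

    private static String send(HttpRequest.Builder builder) throws Exception {
        HttpResponse<String> response = HTTP.send(builder.build(), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}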

Limiting and Sorting with Parse?

I'm trying to learn how to use Parse and, while it's very simple, it's also... not? Perhaps I'm just missing something, but it seems like Parse requires a lot of client-side code, and even multiple requests for a single operation. For example, in my application each user has a small photo gallery. The images are stored on Parse and obtained from Parse when needed.
I want to make sure that a user cannot store more than 15 images in their gallery at a time, and I also want these images to be ordered by an index.
Currently it seems like the only viable option is to perform the following steps on the client:
Execute a query/request to get the number of pictures stored.
If the count is less than 15, then execute a request to upload the picture.
Once the picture is uploaded, execute a request that stores an object linking the user that uploaded the PFFile.
This is a total of 3 (or 6?) requests just to upload a file, depending on whether a "response" is also counted as a request by Parse. This also does not provide any way to order the pictures in the gallery. Would I have to create a custom field called "index" and set it to the number of photos returned by the first query + 1?
It's worse than you think: to create the picture you must create a file, save it, then save a reference to the file in an object and save that, too.
But it's also better than you think: this sort of network usage is expected in a connected app, and some of it can be mitigated with additional logic on the server ("cloud code" in Parse parlance).
First, in your app, consider a simple data model where _User has an array of images (represented, say, by a "UserImage" custom class). If you keep this relationship as an array of pointers on the user, then a user's images can be fetched eagerly, when the app starts, so you'll know the image count as a fact along with the user. The UserImage object will have a file reference in it, so you can optionally fetch the image data and otherwise just hold the lighter metadata with the current user.
Ordering is a more ephemeral idea. One doesn't order objects as they are saved, but rather as they are retrieved. Queries can be ordered according to any attribute, and, even more to the point, since you're retrieving all 15 images you should consider ordering them for presentation to be a function of the UI, not the data.
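For example, with the Parse Android SDK in Java (used here purely for illustration; the "UserImage" class and its "user" and "index" fields are just the hypothetical model from above), retrieving the gallery already ordered and capped at 15 is a single query:
import com.parse.ParseException;
import com.parse.ParseObject;
import com.parse.ParseQuery;
import com.parse.ParseUser;

import java.util.List;

public class GalleryLoader {

    // Fetch the current user's images, newest first, never more than 15.
    public static List<ParseObject> loadGallery() throws ParseException {
        ParseQuery<ParseObject> query = ParseQuery.getQuery("UserImage");
        query.whereEqualTo("user", ParseUser.getCurrentUser());
        query.orderByDescending("createdAt");  // or orderByAscending("index") if you maintain one
        query.setLimit(15);
        return query.find();  // blocking variant; use findInBackground(...) on the UI thread
    }
}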
Finally, Parse limits your app not by transaction count, but by transaction rate, with a free limit generous enough to serve plenty of users.

REST - complex applications

I'm struggling to apply RESTful principles to a new web application I'm working on. In particular, I'm struggling with the idea that, to be RESTful, each HTTP request should carry enough information by itself for its recipient to process it, in complete harmony with the stateless nature of HTTP.
The application allows users to search for medications. The search accepts filters as input, for example: return discontinued medicines, include complementary therapy, etc. In total there are around 30 filters that can be applied.
Additionally, patient details can be entered, including the patient's age, gender, current medications, etc.
To be RESTful, should all this information be included with every request? This seems to place a huge overhead on the network. Also, wouldn't the restrictions on URL length, at least for GET, make this unfeasible?
The "Filter As Resource" is a perfect tact for this.
You can PUT the filter definition to the filter resource, and it can return the filter ID.
PUT is idempotent, so even if the filter is already there, you just need to detect that you've seen the filter before, so you can return the proper ID for the filter.
Then, you can add a filter parameter to your other requests, and they can grab the filter to use for the queries.
GET /medications?filter=1234&page=4&pagesize=20
I would run the raw filters through some sort of canonicalization process, just to have a normalized set, so that, e.g. filter "firstname=Bob lastname=Eubanks" is identical to "lastname=Eubanks firstname=Bob". That's just me though.
The only real concern is that, as time goes on, you may need to obsolete some filters. You can simply error out the request should someone make a request with a missing or obsolete filter.
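Sketched with Spring MVC purely for illustration (the FilterStore and its findOrCreate method are hypothetical), the filter resource could look roughly like this; resubmitting an equivalent filter is harmless because it simply returns the ID that was minted the first time:
import java.util.Map;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/filters")
class FilterResource {

    private final FilterStore store;  // hypothetical: canonicalizes, hashes and persists filter definitions

    FilterResource(FilterStore store) {
        this.store = store;
    }

    // Submit a filter definition; if an equivalent filter is already stored, its existing ID is returned.
    @PutMapping
    Map<String, String> submit(@RequestBody Map<String, String> filterDefinition) {
        String id = store.findOrCreate(filterDefinition);
        return Map.of("filterId", id);
    }
}
A search request then only needs to reference the stored filter by ID, as in the GET example above.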
Edit answering question...
Let's start with the fundamentals.
Simply put, you want to specify a filter for use in queries, but these filters are (potentially) involved and complicated. If it were simply /medications/1234, this wouldn't be a problem.
Effectively, you always need to send the filter to the query. The question is how to represent that filter.
The fundamental issue with things like sessions in REST systems is that they're typically managed "out of band". When you, say, go and create a medication, you PUT or POST to the medications resource, and you get a reference back to that medication.
With a session, you would (typically) get back a cookie, or perhaps some other token to represent that session. If your PUT to the medications resource created a session also, then, in truth, your request created two resources: a medication, and a session.
Unfortunately, when you use something like a cookie, and you require that cookie for your request, the resource name is no longer the true representation of the resource. Now it's the resource name (the URL), and the cookie.
So, if I do a GET on the resource named /medications/search, and the cookie represents a session, and that session happens to have a filter in it, you can see how in effect, that resource name, /medications/search, isn't really useful at all. I don't have all of the information I need to make effective use, because of the side effect of the cookie and the session and the filter therein.
Now, you could perhaps rewrite the name: /medications/search?session=ABC123, effectively embedding the cookie in the resource name.
But now you run into the typical contract of sessions, notably that they're short-lived. So that named resource is less useful in the long term; not useless, just less useful. Right now, this query gives me interesting data. Tomorrow? Probably not. I'll get some nasty error about the session being gone.
The other problem is that sessions are typically not managed as a resource. For example, they're usually a side effect, rather than explicitly managed via GET/PUT/DELETE. Sessions are also the "garbage heap" of web app state. In this case, we're just kind of hoping that the session is properly populated with what is needed for this request. We don't actually know. Again, it's a side effect.
Now, let's turn it on its head a little bit. Let's use /medications/search?filter=ABC123.
Obviously, casually, this looks identical. We just changed the name from 'session' to 'filter'. But, as discussed, Filters, in this case, ARE a "first class resource". They need to be created, managed, etc. the same as a medication, a JPEG, or any other resource in your system. This is the key distinction.
Certainly, you could treat "sessions" as a first class resource, creating them, putting stuff in them directly, etc. But you can see how, at least from a clarity point of view, a "first class" session isn't really a good abstraction for this case. Using a session is like going to the cleaners and handing over your entire purse or briefcase: "Yea, the ticket is in there somewhere, dig out what you want, give me my clothes", especially compared to something explicit like a filter.
So, you can see how, at 30,000 feet, there's not a lot of difference in the case between a filter and a session. But when you zoom in, they're quite different.
With the filter resource, you can choose to make them persistent forever and ever. You can expire them, you can do whatever you want. Sessions tend to have preconceived semantics: short-lived, tied to the duration of the connection, etc. Filters can have any semantics you want; they're completely separate from what comes with a session.
If I were doing this, how would I work with filters?
I would assume that I really don't care about the content of a filter. Specifically, I doubt I would ever query for "all filters that search by first name". At this juncture, it seems like uninteresting information, so I won't design around it.
Next, I would normalize the filters, like I mentioned above. Make sure that equivalent filters truly are equivalent. You can do this by sorting the expressions, ensuring fieldnames are all uppercase, or whatever.
Then, I would store the filter as an XML or JSON document, whichever is more comfortable/appropriate for the application. I would give each filter a unique key (naturally), but I would also store a hash for the actual document with the filter.
I would do this to be able to quickly find if the filter is already stored. Since I'm normalizing it, I "know" that the XML (say) for logically equivalent filters would be identical. So, when someone goes to PUT, or insert a new filter, I would do a check on the hash to see if it has been stored before. I may well get back more than one (hashes can collide, of course), so I'll need to check the actual XML payloads to see whether they match.
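A minimal sketch of that normalize-then-hash step, representing a filter simply as field/value pairs for illustration (Java 17 for HexFormat):
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;

public class FilterCanonicalizer {

    // Normalize a filter so that logically equivalent filters produce identical documents:
    // field names upper-cased, values trimmed, entries sorted by field name.
    public static String canonicalize(Map<String, String> filter) {
        TreeMap<String, String> sorted = new TreeMap<>();
        filter.forEach((field, value) -> sorted.put(field.toUpperCase(), value.trim()));
        StringBuilder doc = new StringBuilder();
        sorted.forEach((field, value) -> doc.append(field).append('=').append(value).append('\n'));
        return doc.toString();
    }

    // Hash of the canonical document, used as the lookup key for already-stored filters.
    public static String hash(String canonicalDocument) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonicalDocument.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }

    public static void main(String[] args) throws Exception {
        // "firstname=Bob lastname=Eubanks" and "lastname=Eubanks firstname=Bob" hash identically.
        String h1 = hash(canonicalize(Map.of("firstname", "Bob", "lastname", "Eubanks")));
        String h2 = hash(canonicalize(Map.of("lastname", "Eubanks", "firstname", "Bob")));
        System.out.println(h1.equals(h2));  // true
    }
}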
If the filters match, I return a reference to the existing filter. If not, I'd create a new one and return that.
I also would not allow a filter UPDATE/POST. Since I'm handing out references to these filters, I would make them immutable so the references remain valid. If I wanted a filter by "role", say the "get all expired medications" filter, then I would create a "named filter" resource that associates a name with a filter instance, so that the actual filter data can change but the name remains the same.
Mind, also, that during creation, you're in a race condition (two requests trying to make the same filter), so you would have to account for that. If your system has a high filter volume, this could be a potential bottleneck.
Hope this clarifies the issue for you.
To be RESTful, should all this information be included with every request?
No. If it looks like your server is sending (or receiving) too much information, chances are that there are one or more resources you haven't yet identified.
The first and most important step in designing a RESTful system is to identify and name your resources. How would you do that for your system?
From your description, here's one possible set of resources:
User - a user of the system (maybe a doctor or patient (?) - Role might need to be exposed as a resource here)
Medication - the stuff in the bottle, but it also might represent the kind of bottle (quantity and contents), or it might represent a particular bottle - depending on if you're a pharmacy or just a help desk.
Disease - the condition for which a Patient might want to take a Medication.
Patient - a person who might take a Medication
Recommendation - a Medication that might be beneficial to a Patient based on a Disease they suffer from.
Then you could look for relationships among the resources:
User has and belongs to many Roles
Medication has and belongs to many Diseases
Disease has many Recommendations.
Patient has and belongs to many Medications and Diseases (poor chap)
Patient has many Recommendations
Recommendation has one Patient and has one Disease
The specifics are probably not right for your particular problem, but the idea is simple: create a network of relationships among your resources.
At this point it might be helpful to think about URI structure, although keep in mind that REST APIs must be hypertext-driven:
# view all Recommendations for the patient
GET http://server.com/patients/{patient}/recommendations
# view all Recommendations for a Medication
GET http://server.com/medications/{medication}/recommendations
# add a new Recommendation for a Patient
PUT http://server.com/patients/{patient}/recommendations
Because this is REST, you'll spend most of your time defining the media types used to transfer representations of your resources between client and server.
By exposing more resources, you can cut down on the amount of data that needs to be transferred during each request. Also notice there are no query parameters in the URIs. The server can be as stateful as it needs to be to keep track of it all, and each request can be fully self-contained.
REST is for APIs, not (typical) applications. Don't try to wedge a fundamentally stateful interaction into a stateless model just because you read about it on Wikipedia.
To be RESTful, should all this information be included with every request? This seems to place a huge overhead on the network. Also, wouldn't the restrictions on URL length, at least for GET, make this unfeasible?
The size of parameters is usually insignificant compared to the size of resources the server sends. If you're using such large parameters that they are a network burden, place them on the server once and then use them as resources.
There are no significant restrictions on URL length -- if your server has such a limit, upgrade it. It's probably years old and chock-full of security vulnerabilities anyway.
No, all of that does not have to be included in every request.
Each resource (medication, patient history, etc.) should have a canonical URI that uniquely identifies it. In some applications (e.g. Rails-based ones) this will be something like "/patients/1234" or "/drugs/5678", but the URL format is unimportant.
A client that has previously obtained the URI for a resource (such as from a search, or from a link embedded in another resource) can retrieve it using this URI.
Are you working on a RESTful API that other apps will use to search your data? Or are you building an end-user-focused web application where users will log in and perform these searches?
If your users are logging in, then you're already stateful, as you'll have some type of session cookie to maintain the logged-in state. I would go ahead and create a session object that contains all the search filters. If a user hasn't set any filters, then this object will be empty.
Here's a great blog post about using GET vs POST. It mentions a URL length limit set by Internet Explorer of 2,048 characters, so you want to use POST for long requests.
http://carsonified.com/blog/dev/the-definitive-guide-to-get-vs-post/
