Elasticsearch request validation

I'm trying to figure out if I can validate elasticsearch requests against a pre-defined mapping. I've googled around and searched StackOverflow, but haven't been able to find anything that speaks to what I'm trying to do apart from this question from a year ago that went unanswered.
Is there any tool out there that fits this need, or that would at least convert ES mappings to some other easily validated format like JSON Schema? Extra points if that tool is accessible from Python, but any language would work.
Specifics:
I'm looking at the openFDA API, which has, e.g., this endpoint for animal and veterinary data. openFDA provides this mapping (YAML download here) of the valid fields. I'd like to be able to, e.g., make sure that a provided animal.age field is an object rather than some other type, since a query that doesn't obey the defined mapping returns a rather unhelpful message stating that no records were found.
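
Lacking a ready-made tool, one workable approach is to load the YAML mapping and walk it yourself before sending the query. Below is a minimal Python sketch, assuming the openFDA YAML follows the usual Elasticsearch mapping layout (nested "properties" dicts whose leaves carry a "type"); the filename is hypothetical.

import yaml

def field_type(mapping, dotted_path):
    """Return the mapped type for e.g. 'animal.age', or None if the path is absent."""
    node = mapping
    for part in dotted_path.split("."):
        props = node.get("properties", {})
        if part not in props:
            return None
        node = props[part]
    # Object fields often carry "properties" instead of an explicit "type".
    return node.get("type", "object" if "properties" in node else None)

with open("animalandveterinarydrugevent.yaml") as f:  # hypothetical filename
    mapping = yaml.safe_load(f)

# Reject a malformed query up front instead of letting the API answer
# with an unhelpful "no records found".
assert field_type(mapping, "animal.age") == "object"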

Related

Why does Elasticsearch use PUT instead of POST for creating an index?

As far as I know, POST is usually used for changing the state of the server, and PUT usually for updating information. If I am creating a new index, should it not be POST instead of PUT? PUT does make sense when creating a document, as it changes the state of the data.
Your statement
As far as I know, POST is usually used for changing the state of the server, and PUT usually for updating the information.
does conform to the conventional HTTP vs CRUD semantics:
HTTP method | CRUD equivalent | Description
POST        | Create          | Let the target resource process the representation enclosed in the request.
PUT         | Update          | Set the target resource's state to the state defined by the representation enclosed in the request.
However, the PUT spec also stipulates that:
The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload.
As such, PUT can be (and is) used in Elasticsearch both to create an index and to update its settings and mappings.
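
To make that dual role concrete, here is a minimal sketch using Python's requests library against a hypothetical local cluster; the index and field names are made up, but index creation and the _settings endpoint are standard Elasticsearch APIs.

import requests

BASE = "http://localhost:9200"  # hypothetical local cluster

# PUT creates the index: the resource's state is "created ... with the
# state defined by the representation", as the spec quoted above puts it.
requests.put(f"{BASE}/my-index", json={
    "settings": {"number_of_shards": 1},
    "mappings": {"properties": {"species": {"type": "keyword"}}},
})

# The same verb later replaces part of that state: updating the index's
# dynamic settings is also a PUT.
requests.put(f"{BASE}/my-index/_settings", json={
    "index": {"number_of_replicas": 2},
})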
Also, keep in mind that it's rarely just a matter of strict adherence to the semantics. One of the creators of ES put it this way:
It's all about REST semantics.
And our understanding of the semantics at the time when we made the APIs. And backwards compatibility constraints. And whatever "feels" natural to the person who implemented the API.
Where it makes a lot of sense Elasticsearch maps the HTTP verbs to useful things. But when it doesn't make a ton of sense we just go with whatever verb feels good rather than trying to be super strict about REST. Also, we don't do linked data, instead relying on you to build links from context. I'm told that is particularly non-REST. But it is what we do.

Search methods in FHIR

I'm working on extracting patient info from a FHIR server; however, I've come across two search methods that seem somewhat different. What is the difference between this search method:
Bundle bundle = client.search()
    .forResource(DiagnosticReport.class)
    // ... (further search criteria elided in the original) ...
and
GET [base]/DiagnosticReport?result.code-value-quantity=http://loinc.org|2823-3$gt5.4|http://unitsofmeasure.org|mmol/L
It's confusing, as there doesn't seem to be much written about these two search methods. Can I achieve the same level of filtering with the first method as with the URL method?
The first is how you perform a search using the Java reference implementation. The second shows the actual HTTP query that hits the server (and also specifies some additional search criteria). Behind the scenes, the Java code in the first example makes an HTTP call that looks similar to the second example. The primary documentation in the FHIR specification deals with the HTTP call; the reference implementations work differently depending on their language and are documented outside the FHIR specification, on a per-implementation basis.
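
To see the correspondence, here is a rough Python sketch of the raw HTTP request such a client call builds; the base URL is hypothetical, and the parameter value is taken verbatim from the second example.

import requests

BASE = "https://example.org/fhir"  # hypothetical FHIR endpoint

# The fluent client call ultimately issues a GET like the second example:
# DiagnosticReport.result is chained into Observation's
# code-value-quantity composite parameter.
resp = requests.get(
    f"{BASE}/DiagnosticReport",
    params={
        "result.code-value-quantity":
            "http://loinc.org|2823-3$gt5.4|http://unitsofmeasure.org|mmol/L",
    },
    headers={"Accept": "application/fhir+json"},
)
bundle = resp.json()  # the server responds with a FHIR Bundle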

With the Simple API in Stanford CoreNLP, is there a way to get multi-token entity mentions?

This question is very similar to my earlier question; however, due to the way SO works, I think it is better to ask a new question than to continue a thread.
CoreNLP has the Simple API which allows for quicker access to various components of the NLP pipeline. The way to get named entities appears to be:
1. Form a document annotation from the text.
2. Get the sentences from the document object.
3. Use nerTags() on each sentence object to get the token-by-token NER labeling.
Via other mechanisms, as discussed in the question linked above, one can retrieve full multi-token entity mentions such as George Washington, an entity mention composed of two tokens. Is there a way, using the Simple API, to get these multi-token entity mentions?
Yes, though it gives you less information than the full API, returning only the String spans of the mention. See Sentence#mentions(String) and Sentence#mentions().
If you want to get more information about a mention, you'll have to either use the regular API, or re-implement the logic in these functions. You can also try mucking around in the raw Proto, which will certainly have all the information you could possibly want, but in a less-than-pleasant proto interface. The proto definition is here.

Documenting fields in Django Rest Framework

We're providing a public API that we use internally but also provide to our SaaS users as a feature. I have a Model, a ModelSerializer and a ModelViewSet. Everything is functional but it's blurting out the Model help_text for the description in the API documentation.
While this works for some fields, we would like to be a lot more explicit for API users, providing examples, not just explanations of guidance.
I realise I can redefine each field in a Serializer (with the same name) and just add a new help_text argument, but this is pretty tedious work.
Can I provide (eg) a dictionary of field names and their text?
If not, how can I intercede in the documentation process to make something like that work?
Also, related, is there a way to provide a full example for each Viewset endpoint? Eg showing what is submitted and returned, like a lot of popular APIs do (Stripe as an example). Or am I asking too much from DRF's documentation generation? Should I handle this all externally?
To override help_text values coming from the models, you'll need to use your own schema generator subclass and override get_path_fields. There you'd be able to prioritize a mapping on the viewset (as you envision) over the model fields' help_text values.
On adjusting the example generation: you could define a JSON language which just deals with raw JSON, and illustrate the request side of things fairly easily. However, illustrating responses is difficult without getting deep into the plumbing, as the default generated schema does not contain response-structure data.
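
A minimal sketch of the first suggestion, assuming DRF's coreapi-based SchemaGenerator (the get_path_fields hook mentioned above; names vary across DRF versions). The help_text_overrides attribute on the viewset is a made-up convention for this example.

import coreapi
import coreschema
from rest_framework.schemas import SchemaGenerator

class DocumentedSchemaGenerator(SchemaGenerator):
    """Prefer a per-viewset help-text mapping over the models' help_text."""

    def get_path_fields(self, path, method, view):
        fields = super().get_path_fields(path, method, view)
        # `help_text_overrides` is a hypothetical dict declared on each
        # viewset, e.g. {"age": "Age in whole years, e.g. 4"}.
        overrides = getattr(view, "help_text_overrides", {})
        return [
            coreapi.Field(
                name=field.name,
                required=field.required,
                location=field.location,
                schema=coreschema.String(description=overrides[field.name]),
            )
            if field.name in overrides
            else field
            for field in fields
        ]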

Validate request input in Phoenix (Elixir)

I'm struggling to find something in the documentation that seems like it should be there...
In Phoenix I see validation at the point of creating an Ecto changeset, but I'm not seeing much before that, for validating the actual user input.
I'm not really a fan of exposing my data models across API boundaries, and I would rather just have structs representing the requests and responses, as they are likely very different shapes to my actual data models.
I'd like a way of converting user input to a struct and using some kind of validation framework to determine if the input is valid before I even think about hitting a database.
I've found https://github.com/CargoSense/vex and have gone down the route of converting the input to a struct, and using their validation, but there are a few things that worry me about this approach, namely:
I hear that there are issues with atoms in Elixir, and since structs are basically atom-keyed maps, am I going to run into the atom-exhaustion issue by converting user input to these?
I also have some structs that contain nested structs. I'm currently checking the default value provided and, if it's a struct, doing some magic based on the answers in "In Elixir how do you initialize a struct with a map variable" to automatically convert a nested map to my nested struct. But again, I'm not sure this is sensible.
The validations I'm defining in one DSL will be very similar to those in my Ecto models, and I would rather use one mechanism for both.
Basically, how would you go about validating user input correctly in a Phoenix app? Am I on the right lines, or way off?
