How to parse JSON in Haskell - haskell-stack

I need to parse JSON coming from the UI into a data structure.
The data structure is a combination of other data structures:
data Collection = Collection { t1 :: T1, t2 :: T2 }
newtype T1 = T1 { unt1 :: String }
data T2 = T2 { id :: Integer, rank :: String }
The data I am getting is in the format:
{
  "t1": {
    "_unt1": "at1Value"
  },
  "t2": {
    "id": 1,
    "rank": "Officer"
  }
}
I basically need to create a Collection value from this JSON. How should I do this in the simplest way?
I tried the Aeson library, made Collection an instance of FromJSON, and then tried something like
decode jsonData :: Maybe Collection
But that doesn't work. I did try looking into the parsec library as well, but I am not sure whether that will be useful here.
I am pretty new to Haskell, so maybe I am missing something here. What would be the best way to implement this, taking into consideration that the actual data structure might be much more complicated than the example I gave, going several layers deep?

Import Text.JSON.Generic and decode the JSON.
import Text.JSON.Generic
................
-- Collection (and the types it contains) must derive (Data, Typeable) for this to work
decodeJSON jsonString :: Collection
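If you would rather stay with aeson, as attempted in the question, here is a minimal sketch of the Generic-deriving approach. The field name _unt1 is chosen to match the JSON key, and the main function and inline test string are illustration only, not part of the original question:
{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson (FromJSON, decode)
import GHC.Generics (Generic)
import qualified Data.ByteString.Lazy.Char8 as BL

-- With generically derived instances the record field names double as JSON keys,
-- which is why the T1 field is called _unt1 here to match the payload.
newtype T1 = T1 { _unt1 :: String } deriving (Show, Generic)
data T2 = T2 { id :: Integer, rank :: String } deriving (Show, Generic)
data Collection = Collection { t1 :: T1, t2 :: T2 } deriving (Show, Generic)

instance FromJSON T1
instance FromJSON T2
instance FromJSON Collection

main :: IO ()
main = do
  let jsonData = BL.pack
        "{\"t1\":{\"_unt1\":\"at1Value\"},\"t2\":{\"id\":1,\"rank\":\"Officer\"}}"
  print (decode jsonData :: Maybe Collection)
decode returns Nothing when the JSON does not match the types, and for deeper nesting you keep deriving Generic and FromJSON for each layer in the same way.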

Related

How to query in GraphQL with no fixed input object contents?

I want a query that returns a string, for which I have to provide an input object; however, the input data can have multiple optional keys and can grow quite large as well. So I just wanted something like a generic Object data type when defining the schema.
# example of supposed schema
input SampleInput {
  id: ID
  data: Object # such a thing does not exist; I wanted something like this
}
type Query {
  myquery(input: SampleInput!): String
}
Here the data input can be quite large, so I do not want to define a type for it. Is there a way around this?

Filter by Time in GraphQL (using FaunaDB service)

My GraphQL schema looks like this:
type Todo {
  name: String!
  created_at: Time
}
type Query {
  allTodos: [Todo!]!
  todosByCreatedAtFlag(created_at: Time!): [Todo!]!
}
This query works.
query {
  todosByCreatedAtFlag(created_at: "2017-02-08T16:10:33Z") {
    data {
      _id
      name
      created_at
    }
  }
}
Could anyone point out how I can create a greater-than (or less-than) Time query in GraphQL (using FaunaDB)?
GraphQL range queries are not supported (yet.. they're coming!)
FaunaDB does not provide range queries for its GraphQL API out of the box; we are working on these features.
.. but there is a workaround.
That doesn't mean, though, that it can't do range queries: range queries are supported in FQL, and you can always 'escape' from GraphQL to FQL to implement more advanced queries by writing a User Defined Function (UDF).
.. using resolvers
By using the @resolver directive in your schema you can implement GraphQL queries yourself by writing a User Defined Function in FQL. There are some basic examples in the documentation, but I imagine you might need some help, so I'll write you a simple example.
I added your schema and added two documents:
First thing is that our schema will be extended with the resolver:
type Todo {
  name: String!
  created_at: Time
}
type Query {
  allTodos: [Todo!]!
  todosByCreatedAtFlag(created_at: Time!): [Todo!]!
  todosByCreatedRange(before: Time, after: Time): [Todo!]! @resolver
}
All this does is add a function for us to implement:
If we call this via GraphQL, it gives us exactly the Abort message we saw in the screenshot before, since the function has not been implemented yet. But we can see that the GraphQL statement actually calls the function.
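For illustration, such a call could look like the following (the timestamps are just placeholder values; the field selection follows the [Todo!]! return type declared above):
query {
  todosByCreatedRange(before: "2017-01-01T00:00:00Z", after: "2017-03-01T00:00:00Z") {
    name
    created_at
  }
}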
.. UDF implementation
The first thing we will do is add the parameter, which is just a matter of writing a name as the first argument of the Lambda. The Lambda also takes an array of names in case you need to pass multiple parameters (which I do in the resolver that I defined in the schema):
We'll add an index to support our query. The values of an index are what you range over (and they also determine return values and sorting). We'll add created_at so we can range over it, and also add ref, since we'll need it in the return values to get the actual document behind the index match.
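In the shell, creating such an index could look roughly like this (a sketch: the index name matches the one used in the function below, and the collection name Todo is assumed from the GraphQL schema):
CreateIndex({
  name: "todosByCreatedAtRange",
  source: Collection("Todo"),
  values: [
    { field: ["data", "created_at"] },
    { field: ["ref"] }
  ]
})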
We could then start off by just writing a simple function (that won't work yet)
Query(
  Lambda(
    ["before", "after"],
    Paginate(
      Range(Match(Index("todosByCreatedAtRange")), Var("before"), Var("after"))
    )
  )
)
and could test this by calling the function manually via the shell.
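Assuming the UDF got the default name todosByCreatedRange, such a manual shell call could look like this (timestamps are placeholders):
Call(
  Function("todosByCreatedRange"),
  [Time("2017-01-01T00:00:00Z"), Time("2017-03-01T00:00:00Z")]
)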
This indeed returns the two objects (range is inclusive).
Of course, there is one problem with this: it does not return the data in the structure that GraphQL expects, so we'll get these strange errors:
We can do two things now: either define a type in our schema that fits this shape, or adapt the data the function returns. We'll do the latter and adapt our result to the expected [Todo!]! to show you how.
Step one: map over the result. The only thing we introduce here is the Map and the Lambda. We do not do anything special yet; as an example, we just return the reference instead of both the created_at value and the reference.
Query(
  Lambda(
    ["before", "after"],
    Map(
      Paginate(
        Range(
          Match(Index("todosByCreatedAtRange")),
          Var("before"),
          Var("after")
        )
      ),
      Lambda(["created_at", "ref"], Var("ref"))
    )
  )
)
Calling it indeed shows that the function now only returns references.
Let's get the actual documents. I know that FQL is verbose (and with good reasons, although it should become less verbose in the future), so I started adding comments to clarify things:
Query(
  Lambda(
    ["before", "after"],
    Map(
      // This is just the query to get your range
      Paginate(
        Range(
          Match(Index("todosByCreatedAtRange")),
          Var("before"),
          Var("after")
        )
      ),
      // This is a function that will be executed on each result (with the help of Map)
      Lambda(["created_at", "ref"],
        // We'll use Let to structure our queries (allowing us to use variables)
        Let({
          todo: Get(Var("ref"))
        },
        // And then we return something
        Var("todo")))
    )
  )
)
Our function now returns data.. woohoo!
We still need to make sure this data conforms to what GraphQL expects. From the schema we can see that it expects a [Todo!]! (see the docs tab), and a Todo looks like this (see the schema tab):
type Todo {
  _id: ID!
  _ts: Long!
  name: String!
  created_at: Time
}
As you can also see from that docs tab, 'non-resolver' queries are automatically changed to return TodoPages. The function we wrote so far actually returns pages.
Option 1, change the schema and turn it into a paginated resolver.
We can fix this by adding the paginated: true option to the resolver. You will have to take into account the extra parameters that will be added to the resolver, as explained here. I haven't tried that myself, so I'm not 100% certain how that would work. The advantage of a paginated resolver is that you can immediately take advantage of sane pagination in the GraphQL endpoint.
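In the schema that would presumably be a one-line change along these lines (not verified, as noted above):
todosByCreatedRange(before: Time, after: Time): [Todo!]! @resolver(paginated: true)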
Option 2, turn it into a non-paginated result.
A paginated result is a result that looks as follows:
{
  data: [ document1, document2, ... ],
  before: ...,
  after: ...
}
The declared result type doesn't accept a page but an array, so I'll change the function and retrieve the data field:
And we have our result.
The complete query looks as follows:
Query(
  Lambda(
    ["before", "after"],
    Select(
      ["data"],
      Map(
        Paginate(
          Range(
            Match(Index("todosByCreatedAtRange")),
            Var("before"),
            Var("after")
          )
        ),
        Lambda(
          ["created_at", "ref"],
          Let({ todo: Get(Var("ref")) }, Var("todo"))
        )
      )
    )
  )
)
Disclaimers
Once you go custom, pagination also becomes your responsibility (e.g. passing an extra parameter). You can't fetch relations out of the box anymore, as you would normally do by just requesting the relations in the GraphQL body.
Some words on the benefits of UDFs and the hybrid of GraphQL/FQL
Before you shy away from FQL (and yes, we do have to add range queries and are working on that), here is some explanation on the UDF approach in general and why it makes sense to think about it anyway.
You will at a certain moment encounter things in GraphQL that are just impossible (complex conditional transactions, e.g. update a document and update this other document only if some condition that results from the previous update is true). Users of other GraphQL implementations typically solve this by writing a serverless function when they have to implement advanced logic or transactions.
FaunaDB's answer to this is its User Defined Functions (UDFs). A UDF is not a serverless function; it's a FaunaDB function implemented in FQL, which might seem cumbersome at first, but it's important to realize that it gives you the same benefits (multi-region, strong consistency, scalability, free tier, pay-as-you-go) that FaunaDB provides.

How to Merge Maps in Pig

I am new to Pig, so bear with me. I have two data sources that have the same schema: a map of attributes. I know that the two records will share a single identifiable overlapping attribute. For example:
Record A:
{"Name":{"First":"Foo", "Last":"Bar"}, "FavoriteFoods":{["Oranges", "Pizza"]}}
Record B:
{"Name":{"First":"Foo", "Last":"Bar"}, "FavoriteFoods":{["Buffalo Wings"]}}
I want to merge the records on Name such that:
Merged:
{"Name":{"First":"Foo", "Last":"Bar"}, "FavoriteFoods":{["Oranges", "Pizza", "Buffalo Wings"]}}
UNION, UNION ONSCHEMA, and JOIN don't operate in this way. Is there a method available to do this within Pig, or will it have to happen within a UDF?
Something like:
A = LOAD 'fileA.json' USING JsonLoader AS infoMap:map[];
B = LOAD 'fileB.json' USING JsonLoader AS infoMap:map[];
merged = MERGE_ON infoMap#Name, A, B;
Pig by itself is very dumb when it comes to even slightly complex data translation. I feel you will need two kinds of UDFs to achieve your task. The first UDF will need to accept a map and create a unique string representation of it. It could be a hashed string representation of the map (let's call it getHashFromMap()). This string will be used to join the two relations. The second UDF would accept two maps and return a merged map (let's call it mergeMaps()). Your script will then look as follows:
A = LOAD 'fileA.json' USING JsonLoader AS infoMapA:map[];
B = LOAD 'fileB.json' USING JsonLoader AS infoMapB:map[];
A2 = FOREACH A GENERATE *, getHashFromMap(infoMapA#'Name') AS joinKey;
B2 = FOREACH B GENERATE *, getHashFromMap(infoMapB#'Name') AS joinKey;
AB = JOIN A2 BY joinKey, B2 BY joinKey;
merged = FOREACH AB GENERATE *, mergeMaps(infoMapA, infoMapB) AS mergedMap;
Here I assume that the attribute you want to merge on is a map. If that can vary, your first UDF will need to become more generic. Its main purpose would be to get a unique string representation of the attribute so that the datasets can be joined on it.
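As a rough illustration, a Java EvalFunc along these lines could serve as the getHashFromMap() helper (the class name and the sorted-string approach are assumptions; mergeMaps() would follow the same EvalFunc pattern but return a Map, with whatever collision handling you need, e.g. concatenating the FavoriteFoods bags):
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Turns a Pig map into a deterministic string so two relations can be joined on it.
public class GetHashFromMap extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> map = (Map<String, Object>) input.get(0);
        // TreeMap sorts the keys, so equal maps always render to the same string.
        return new TreeMap<String, Object>(map).toString();
    }
}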

Serialize to JSON dynamic structure

All the examples of working with JSON describe how to serialize simple or user-defined types (like a struct) to JSON.
But I have a different case: a) I don't know the fields of my type/object beforehand, and b) every object will have different fields and types.
Here is my case in pseudocode:
while `select * from item` do
while `select fieldname, fieldvalue from fields where fields.itemid = item.id` do
...
For each entity in my database I get field names and field values. In the result I need to get something like this:
{
  "item.field1": value,
  ...
  "item.fieldN": value,
  "custom_fields": {
    "fields.field1": value,
    ...
    "fields.fieldK": value
  }
}
What is the best way to do this in Go? Are there any useful libraries or functions in the standard library?
Update: The source of the data is the database. In the end I need the JSON as a string to POST it to an external web service. So the program just reads data from the database and makes POST requests to a REST service.
What exactly is your target type supposed to be? It can't be a struct since you do not know the fields beforehand.
The only fitting type to me seems to be a map of type map[string]interface{}: with it any nested structure can be achieved:
a := map[string]interface{}{
    "item.field1": "val1",
    "item.field2": "val2",
    "item.fieldN": "valN",
    "custom_fields": map[string]interface{}{
        "fields.field1": "cval1",
        "fields.field2": "cval2",
    },
}
b, err := json.Marshal(a)
See playground sample here.
Filling this structure from a database, as you hinted at, should probably be custom code (not using the json package until the final Marshal).
Note: custom_fields can also be of other types depending on what type the value column is in the database. If the value column is a string use map[string]string.
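To tie this back to the question's pseudocode, here is a rough sketch of filling such a map from database rows and marshalling it. The table and column names item, fields, fieldname, fieldvalue and itemid come from the question; the id and name columns, the Postgres driver and the connection string are assumptions:
package main

import (
    "database/sql"
    "encoding/json"
    "fmt"
    "log"

    _ "github.com/lib/pq" // driver choice is an assumption; any database/sql driver works
)

func main() {
    db, err := sql.Open("postgres", "postgres://user:pass@localhost/mydb?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    rows, err := db.Query(`SELECT id, name FROM item`)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
        var id int64
        var name string
        if err := rows.Scan(&id, &name); err != nil {
            log.Fatal(err)
        }

        // Dynamic attributes for this item go into a nested map.
        custom := map[string]interface{}{}
        fieldRows, err := db.Query(`SELECT fieldname, fieldvalue FROM fields WHERE itemid = $1`, id)
        if err != nil {
            log.Fatal(err)
        }
        for fieldRows.Next() {
            var fname, fvalue string
            if err := fieldRows.Scan(&fname, &fvalue); err != nil {
                log.Fatal(err)
            }
            custom[fname] = fvalue
        }
        fieldRows.Close()

        doc := map[string]interface{}{
            "item.id":       id,
            "item.name":     name,
            "custom_fields": custom,
        }
        b, err := json.Marshal(doc)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(b)) // the real program would POST this string to the REST service
    }
}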

Getting the objects with similar secondary index in Riak?

Is there a way to get all the objects in key/value format that are under one secondary index value? I know we can get the list of keys for one secondary index (bucket/{{bucketName}}/index/{{index_name}}/{{index_val}}), but my requirement is such that I need to get all the objects as well. I don't want to perform a separate query for each key to get the object details if there is a way around it.
I am completely new to Riak and I am totally a front-end guy, so please bear with me if something I ask is of novice level.
In Riak, it's sometimes the case that the better way is to do separate lookups for each key. Coming from other databases this seems strange, and likely inefficient; however, you may find that an index query plus a bunch of single-object gets is faster than a map/reduce over all the objects in a single go.
Try both of these approaches and see which turns out fastest for your dataset. Variables that affect this are: the size of the data being queried, the size of each document, the power of your cluster, the load the cluster is under, etc.
Python code demonstrating the index and separate gets (if the data you're getting is large, this method can be made memory-efficient on the client, as you don't need to store all the objects in memory):
query = riak_client.index("bucket_name", 'myindex', 1)
query.map("""
    function(v, kd, args) {
        return [v.key];
    }""")
results = query.run()
bucket = riak_client.bucket("bucket_name")
for key in results:
    obj = bucket.get(key)
    # .. do something with the object
Python code demonstrating a map/reduce for all objects (returns a list of {key:document} objects):
query = riak_client.index("bucket_name", 'myindex', 1)
query.map("""
    function(v, kd, args) {
        var obj = Riak.mapValuesJson(v)[0];
        return [{
            'key': v.key,
            'data': obj,
        }];
    }""")
results = query.run()
