Fast structure identification - data-structures

I'm looking for a smart, efficient way to tell one data structure from another.
Assuming the data structures are stored as JSON trees in an array, the problem can be summarized as follows:
[
    // Type 1
    {
        "name" : "value",
        "key" : "other_value"
    },
    // Type 2
    {
        "name" : "value",
        "key" : "other_value",
        "another_key" : "another_value",
        "data" : [
            { "a" : 1 },
            { "a" : 2 },
            // ...
        ]
    },
    // ...
]
Possible solutions I've come up with include:
Hand-made, hard-coded rules such as if 'another_key' in data, but these would be heavy to maintain as the number of data entries and the number of different types grow.
Structure fingerprinting: hashing the whole tree except for the leaves and using the hash to identify a unique data structure. This would be a good solution if the number of entries in "data" : [ ... ] were fixed.
Schemas (JSON Schema, XML Schema, ...), but I feel this is a little "over-engineered" for the problem in question.
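One way the fingerprinting idea could be made insensitive to the number of entries in "data" is to collapse every array to the set of distinct shapes of its elements before hashing. The following is only a sketch under that assumption. The names shape and fingerprint are made up for illustration, all leaves are treated alike regardless of their type, and an empty array gets a different fingerprint from a populated one.

```python
import hashlib
import json


def shape(node):
    """Describe the structure of a parsed JSON value, ignoring leaf values."""
    if isinstance(node, dict):
        # Keep the key names, recurse into the values.
        return {key: shape(node[key]) for key in sorted(node)}
    if isinstance(node, list):
        # Collapse the array to its distinct element shapes,
        # so that the number of entries does not affect the result.
        distinct = {json.dumps(shape(item), sort_keys=True) for item in node}
        return sorted(distinct)
    # Strings, numbers, booleans and null are all plain leaves here.
    return "leaf"


def fingerprint(node):
    """Hash the canonical form of the structure (hypothetical helper)."""
    canonical = json.dumps(shape(node), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


type_1 = {"name": "value", "key": "other_value"}
type_2_short = {
    "name": "value",
    "key": "other_value",
    "another_key": "another_value",
    "data": [{"a": 1}],
}
type_2_long = {
    "name": "value",
    "key": "other_value",
    "another_key": "another_value",
    "data": [{"a": 1}, {"a": 2}, {"a": 3}],
}
```

With this, type_2_short and type_2_long get the same fingerprint while type_1 gets a different one, so the fingerprint could serve as a dictionary key for grouping records by structure.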
Are you aware of any better method to solve this and similar problems?
Edit: more examples
class Book {
    public int id;
    public string title;
}
class BookWithAuthor : Book {
    public string author;
}
class BookWithMultipleAuthors : BookWithAuthor {
    public string[] moreAuthors; // array structure
}
class BookWithSequel : BookWithAuthor {
    public Book sequel; // list structure
}
class BookWithSequelsAndSpinoffs : BookWithAuthor {
    public BookWithAuthor[] sequels;
    public Book[] spinoffs;
    // tree structure
}
Imagine having a (broken) serialization of a Book[] books with no information about what type each book is, and needing to recover the original format (or something equivalent).
I've used C# for the sake of simplicity, but this is a general question.
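To make the recovery problem concrete, here is a rough sketch of one possible approach: match each record's field names against a table of known signatures and pick the most specific type whose fields are all present. It assumes the broken serialization still carries the field names as keys. The SIGNATURES table and the classify function are hypothetical names introduced only for this illustration.

```python
# Hypothetical signature table: the field names each type is expected to carry.
SIGNATURES = {
    "Book": {"id", "title"},
    "BookWithAuthor": {"id", "title", "author"},
    "BookWithMultipleAuthors": {"id", "title", "author", "moreAuthors"},
    "BookWithSequel": {"id", "title", "author", "sequel"},
    "BookWithSequelsAndSpinoffs": {"id", "title", "author", "sequels", "spinoffs"},
}


def classify(record):
    """Return the most specific type whose fields all appear in the record."""
    keys = set(record)
    matches = [name for name, fields in SIGNATURES.items() if fields <= keys]
    # "Most specific" here means the signature with the most fields.
    return max(matches, key=lambda name: len(SIGNATURES[name]), default=None)
```

A record that matches two signatures of the same size (for example one carrying both moreAuthors and sequel) is ambiguous under this scheme, and the sketch simply returns whichever comes first in the table.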

Related

How does one organize more than a few mutations in GraphQL .Net GraphType First?

In GraphQL .Net most of the example code has one top level mutations graph object that has many actual mutations defined within it.
Here's an example from the GraphQL .NET mutations page:
public class StarWarsSchema : Schema
{
    public StarWarsSchema(IServiceProvider provider)
        : base(provider)
    {
        Query = provider.Resolve<StarWarsQuery>();
        Mutation = provider.Resolve<StarWarsMutation>();
    }
}
public class StarWarsMutation : ObjectGraphType
{
    public StarWarsMutation(StarWarsData data)
    {
        Field<HumanType>(
            "createHuman",
            arguments: new QueryArguments(
                new QueryArgument<NonNullGraphType<HumanInputType>> {Name = "human"}
            ),
            resolve: context =>
            {
                var human = context.GetArgument<Human>("human");
                return data.AddHuman(human);
            });
    }
}
That seems fine when you have 1-5 mutations, but over time a larger project could conceivably end up with dozens of them. Putting them all together in one big class works, but it lacks organization. I tried putting a child mutation ObjectGraphType into a field on the parent mutation, but I had trouble calling the sub-mutation. Perhaps I had it configured wrong.
Surely there is someone out there with more than a dozen mutations who has organized them beyond putting them all in a single top-level mutations object.
How does one organize more than a few mutations in GraphQL .Net GraphType First?
https://graphql-dotnet.github.io/docs/getting-started/query-organization
You can "group" queries or mutations together by adding a top level field. The "trick" is to return an empty object in the resolver.
public class StarWarsSchema : Schema
{
    public StarWarsSchema(IServiceProvider provider)
        : base(provider)
    {
        Query = provider.Resolve<StarWarsQuery>();
        Mutation = provider.Resolve<StarWarsMutation>();
    }
}
public class StarWarsMutation : ObjectGraphType
{
    public StarWarsMutation(StarWarsData data)
    {
        Field<CharacterMutation>(
            "characters",
            resolve: context => new { });
    }
}
public class CharacterMutation : ObjectGraphType
{
    public CharacterMutation(StarWarsData data)
    {
        Field<HumanType>(
            "createHuman",
            arguments: new QueryArguments(
                new QueryArgument<NonNullGraphType<HumanInputType>> {Name = "human"}
            ),
            resolve: context =>
            {
                var human = context.GetArgument<Human>("human");
                return data.AddHuman(human);
            });
    }
}
This organization is reflected in how the mutations (or queries) are called: createHuman is now reached through the characters field. If you simply want them to appear externally as a flat list (which would otherwise mean one giant file), you can instead split the class across as many files as you want using partial classes.

Field with @Field not getting translated to the correct value

I have a document with one of the fields' names overridden by @Field:
public class User {
    @Id
    private String id;
    private String username;
    @Field("profiles")
    private List<BusinessProfile> businessProfiles;
    ...
}
And an aggregation with a match operation as follows:
match(where("businessProfiles.services").elemMatch(Criteria.where("category").is(serviceCategory)))
However, in the query that this ultimately generates, businessProfiles is not translated to profiles. Here is the query I got from the log files:
Executing aggregation: [ { "$match" : { "businessProfiles.services" : { "$elemMatch" : { "category" : "Cloud_Initiation"}}}} ...]
This behavior seems very odd. Is this supposed to work this way? Thanks.
Field mapping is only applied to a TypedAggregation, which supplies the source type the mapping is resolved against.
TypedAggregation<User> agg = newAggregation(User.class,
    match(where("businessProfiles.services")...
I created DATAMONGO-2310 to improve the documentation in that area.

ElasticSearch / NEST 6 - Serialization of enums as strings in terms query

I've been trying to update to ES6 and NEST 6 and running into issues with NEST serializing of search requests - specifically serializing Terms queries where the underlying C# type is an enum.
I've got a Status enum mapped in my index as a Keyword, and it is correctly stored in its string representation by using NEST.JsonNetSerializer and setting the contract JSON converter, as per "Elasticsearch / NEST 6 - storing enums as string".
The issue comes when trying to search based on this Status enum. When I try to use a Terms query to specify multiple values, these values are being serialized as integers in the request and causing the search to find no results due to the type mismatch.
Interestingly the enum is serialized correctly as a string in a Term query, so I'm theorizing that the StringEnumConverter is being ignored in a scenario where it's having to serialize a collection of enums rather than a single enum.
Let's show it a little more clearly in code. Here's the enum and the (simplified) model used to define the index:
public enum CampaignStatus
{
    Active = 0,
    Sold = 1,
    Withdrawn = 2
}
public class SalesCampaignSearchModel
{
    [Keyword]
    public Guid Id { get; set; }
    [Keyword(DocValues = true)]
    public CampaignStatus CampaignStatus { get; set; }
}
Here's a snippet of constructing the settings for the ElasticClient:
var pool = new SingleNodeConnectionPool(new Uri(nodeUri));
var connectionSettings = new ConnectionSettings(pool, (builtin, serializerSettings) =>
        new JsonNetSerializer(builtin,
            serializerSettings,
            contractJsonConverters: new JsonConverter[]{new StringEnumConverter()}
        )
    )
    .EnableHttpCompression();
Here's the Term query that correctly returns results:
var singleTermFilterQuery = new SearchDescriptor<SalesCampaignSearchModel>()
    .Query(x => x.Term(y => y.Field(z => z.CampaignStatus).Value(CampaignStatus.Active)));
Generating the request:
{
    "query": {
        "term": {
            "campaignStatus": {
                "value": "Active"
            }
        }
    }
}
Here's the Terms query that does not return results:
var termsFilterQuery = new SearchDescriptor<SalesCampaignSearchModel>()
    .Query(x => x.Terms(y => y.Field(z => z.CampaignStatus).Terms(CampaignStatus.Active, CampaignStatus.Sold)));
Generating the request:
{
    "query": {
        "terms": {
            "campaignStatus": [
                0,
                1
            ]
        }
    }
}
So far I've had a pretty good poke around the options offered by the JsonNetSerializer and tried a bunch of the available attributes: NEST.StringEnumAttribute, [JsonConverter(typeof(StringEnumConverter))] instead of the global converter on the client, an explicit filter object with ItemConverterType set on the collection of CampaignStatuses, and so on. The only thing that has had any success is a very brute-force .ToString() every time I need to query on an enum.
These are toy examples from a reasonably extensive codebase that I'm trying to migrate to NEST 6, so I want to specify global configuration somewhere rather than have multiple developer teams be mindful of this kind of eccentricity.
So yeah... I've been looking at this for a couple of days now. Good chances there's something silly I've missed. Otherwise I'm wondering if I need to be providing some JsonConverter with a contract that would match to an arbitrary collection of enums, and whether NEST and their tweaked Json.NET serializer should just be doing that kind of recursive resolution out of the box already.
Any help would be greatly appreciated, as I'm going a bit crazy with this one.

Spring Mongo mapping variable data

I'm using Spring Data MongoDB for my project. I work with a mongo database containing a lot of data, and I want to map this data within my Java application.
The problem I have is that some older data has a different structure.
For example, sport_name is an array now, while in some old records it is a String:
sport_name: "Soccer" // Old data
sport_name: [ // Most recent entries
    {
        "lang" : "en",
        "val" : "Soccer"
    },
    {
        "lang" : "de",
        "val" : "Fussball"
    }
]
Here is what I have so far:
@Document(collection = "matches")
public class MatchMongo {
    @Id
    private String id;
    private ??? sport_name; // Best way?!
}
What's the best way to handle something like this?
If the old data can be treated as "en" language text, then a separate structure can be used to store localized text:
class LocalName {
    private String language;
    private String text;
    // getters/setters
}
The mapped object will then store a collection of localized values:
public class MatchMongo {
    // it can also be a map (language -> text),
    // in this case we don't need additional structure
    private List<LocalName> names;
}
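This approach can be combined with a custom converter that treats a plain string as the "en" locale value:
import com.mongodb.DBObject;
import org.springframework.core.convert.converter.Converter;

public class MatchReadConverter implements Converter<DBObject, MatchMongo> {
    @Override
    public MatchMongo convert(DBObject source) {
        MatchMongo match = new MatchMongo();
        Object sportName = source.get("sport_name");
        // check what kind of data is located under "sport_name":
        // if it is an old plain string, store it as the "en" language text;
        // if it is an array, simply convert the values
        return match;
    }
}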
Custom conversions are described in detail in the Spring Data MongoDB reference documentation.
Alternatively, you could write a utility class that fetches all the documents where sport_name is not an array and updates sport_name to the array form. Whether this is practical depends on the amount of data you have.
To find those documents you can use the query {"sport_name":{$type:2}}, where 2 stands for String.
For more details on $type, see: http://docs.mongodb.org/manual/reference/operator/query/type/

Mongodb query for embedded doc

I'm having trouble writing this query: I want to find all the docs that contain the id "5418a26ce4b0e4a40ea1d548" in the individualUsers field. It would be especially useful if you know how to do this with a Spring Data MongoDB query.
db.collection.find({individualUser:{"5418a26ce4b0e4a40ea1d548"}})
Example of one doc:
{ "_id" : ObjectId("5418c3b9e4b03feec4345602"), "creatorId" : "5418a214e4b0e4a40ea1d546", "individualUsers" : { "5418a26ce4b0e4a40ea1d548" : null, "5418a278e4b0e4a40ea1d54a" : null } }
Update #001
Entity code
@Document
class Idea {
    @Id
    String id;
    String creatorId;
    Map<String,String> individualUsers;
    /*getter and setter omitted*/
}
Interface
public interface IdeaRepository extends MongoRepository<Idea,String> {
}
Update #002
So when spring-mongodb saves a HashMap to JSON, it looks like this:
"individualUsers" : { "5418a26ce4b0e4a40ea1d548" : null, "5418a278e4b0e4a40ea1d54a" : null }
In a Java program I can easily get the data using the key, but in a MongoDB query I can't query by the key?
So the question is: can I query inside "individualUsers": {} by key?
