How to structure Firebase data for multiple views

I'm new to Firebase and I'm building my first app on it so thought I'd ask if my current plans for the app's data structure make sense.
I've read the Firebase blog posts and several answers on SO, which have helped me understand the concept of "optimise for the way the data will be read". However, my data will be read in a few different ways and it feels like I may be overcomplicating things.
Background
The app is like a directory for businesses in multiple towns (schemes) to promote their upcoming events and offers. I think of the data hierarchy like this:
Scheme: A town (the app has multiple schemes)
Category: A group of businesses around a theme (e.g. shoe shops)
Business: An administrative organisation (handles billing etc). Each business can have multiple locations (shops in different towns).
Location: A shop in a town.
Event: Each location can promote events. An event can be promoted at multiple locations but not necessarily all of a business's locations.
Offer: Similar to an event but a different type of object.
Viewing the data
The app user can view the offer & event data in 5 ways:
specific to a business (e.g. Joe's shoes' offers)
for a scheme (e.g. all offers in a Smalltown)
for the whole app (e.g. all offers anywhere)
in a category in a scheme (e.g. all shoe offers in Smalltown)
in a category in the whole app (e.g. all shoe offers anywhere)
In addition, I need to make sure that an administrator from each business can see/edit all of their business's data via a CMS I'm also building.
My approach
This is the data structure I'm thinking of using:
root {
schemes{
scheme1{
name: "smalltown",
logo: "base64 data",
bgcolor: "#FF0000"
},
scheme2{...}
},
businesses{
business1{
name: "Joe's Shoes",
logo: "base64 data",
locations: {
location1: true,
location3: true,
location15: true
},
address_hq: {
street: "45 Acacia Avenue",
town: "Bigtown",
postcode: "BT1 1JS"
},
contact_hq: {
name: "Joe Simpson",
position: "Owner",
email: "joe@joesshoes.com",
tel: "07123 456789"
},
subscription: {
plan: "Standard",
date_start: "10/10/2015",
date_renewal: "10/10/2016"
},
owner: "james1"
},
business2{...}
},
locations{
location1{
name: "Joe's Shoes",
logo: "base64 data",
scheme: "scheme1",
events: {
event1: true,
event27: true
},
offers: {
offer1: true,
offer6: true
},
business: "business1",
owner: "james1"
},
location2{...}
},
events{
event1{
schemes: {
scheme1: true,
scheme4: true
},
locations{
location1: true,
location21: true
},
categories: {
shoes: true,
footwear: true,
fashion: true
},
business: "business1",
date: "5/5/2016",
title: "The History of Shoes",
description: "A fascinating talk about the way shoes have...",
image: "base64 data",
venue: {
street: "Great Hotel",
town: "Bigtown",
postcode: "BT1 1JS"
},
price: "£10"
},
event2{...}
},
offers{
offer1{
schemes: {
scheme1: true,
scheme4: true
},
locations{
location1: true,
location21: true
},
categories: {
shoes: true,
footwear: true,
fashion: true
},
business: "business1",
date_start: "5/5/2016",
date_end: "5/5/2016",
title: "All children's shoes Half Price",
description: "Get 50% off all children's shoes - just in time for the summer",
image: "base64 data",
},
offer2{...}
}
}
Here's a graphic of a similar data structure in case it's easier to read:
[Image: graphic of a similar data structure]
My question is whether I need to denormalise the data further (repeat more data in more places) or is there a better way to think about this altogether?
It feels like I'm getting potential complications from having to keep data in sync without the ability to simply read from a single place (e.g. I'll need to use queries and indexes (?) to combine location and event data for scheme-wide event listings).
Any advice on making this data structure more efficient would be great.
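To make the sync concern concrete, here is a minimal sketch of one way the five views could each get their own index while staying in step. It builds a single flat map of paths for one offer, which could then be handed to a multi-path update so all copies are written together. The index names (`business_offers`, `scheme_offers`, `category_offers`, `scheme_category_offers`) and the helper `buildOfferFanOut` are illustrative assumptions, not part of the structure above.

```typescript
// Hypothetical shape: only the fields needed to build index entries.
interface Offer {
  business: string;
  schemes: Record<string, true>;
  categories: Record<string, true>;
  title: string;
}

// Build one flat map of path -> value for a single offer.
// Every index path name below is an assumption made for illustration.
function buildOfferFanOut(offerId: string, offer: Offer): Record<string, unknown> {
  const updates: Record<string, unknown> = {};

  // Canonical copy; listing "offers" also serves the whole-app view.
  updates[`offers/${offerId}`] = offer;

  // View: offers for one business.
  updates[`business_offers/${offer.business}/${offerId}`] = true;

  for (const scheme of Object.keys(offer.schemes)) {
    // View: offers for one scheme.
    updates[`scheme_offers/${scheme}/${offerId}`] = true;
    for (const category of Object.keys(offer.categories)) {
      // View: offers for one category within one scheme.
      updates[`scheme_category_offers/${scheme}/${category}/${offerId}`] = true;
    }
  }

  for (const category of Object.keys(offer.categories)) {
    // View: offers for one category across the whole app.
    updates[`category_offers/${category}/${offerId}`] = true;
  }

  return updates;
}

const exampleOffer: Offer = {
  business: "business1",
  schemes: { scheme1: true, scheme4: true },
  categories: { shoes: true },
  title: "All children's shoes Half Price",
};
console.log(Object.keys(buildOfferFanOut("offer1", exampleOffer)));
```

The trade-off is the one described above: reads become single-path lookups, while every write has to touch all of the index paths at once.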

Related

Highstock dataGrouping not working with live data

I am currently working on a project for my company, where I need to plot Highstock charts that show energy data for our main buildings.
Since it is live data, new data points arrive over a WebSocket every few seconds. However, the graph should show only one data point per hour. I wanted to solve this with Highstock's dataGrouping, but it does not really work. It groups the points, yes, but it still draws the "transition", the graph line, between them, which makes the whole graph completely unreadable.
In another version of the project, the graph shows only the latest data point of each group (as specified by the "approximation" option in the chart options), but it does not start a new group after the chosen interval has elapsed.
I've been stuck on this problem for about 3 days now and have not found a fully working solution yet.
Unfortunately, due to company policy and because of the hooks and components involved, which are only used inside the company, I'm not able to give you a jsfiddle or similar, even though I'd really love to. What I can do is give you the config; maybe you can find something wrong there?
const options = {
plotOptions: {
series: {
dataGrouping: {
anchor: 'end',
approximation: function (groupData: unknown[]) {
return groupData[groupData.length - 1];
},
enabled: true,
forced: true,
units: [['second', [15]]],
},
marker: {
enabled: false,
radius: 2.5,
},
pointInterval: minutesToMilliseconds(30),
pointStart: currentWeekTraversed?.[0]?.[0],
},
},
}
This would be the plotOptions.
If you need any more information, let me know. I'll then see what I can send you and how.
Thank you for helping. ^^
Here is an example of how dataGrouping works with live data.
Try to recreate your case on top of it, or use another demo from the official Highcharts React wrapper page.
rangeSelector: {
allButtonsEnabled: true,
buttons: [{
type: 'minute',
count: 15,
text: '15S',
preserveDataGrouping: true,
dataGrouping: {
forced: true,
units: [
['second', [15]]
]
}
}, {
type: 'hour',
count: 1,
text: '1M',
preserveDataGrouping: true,
dataGrouping: {
forced: true,
units: [
['minute', [1]]
]
}
}]
},
Demo: https://jsfiddle.net/BlackLabel/sr3oLkvu/
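For reference, the grouping behaviour the question is after (one point per interval, taking the last value of each group and stamping it at the end of the interval) can be illustrated in plain TypeScript. This is only a sketch of the idea, not Highcharts internals; `groupLast` and the `Point` type are hypothetical names, and the input is assumed to be sorted by time.

```typescript
type Point = [number, number]; // [timestamp in ms, value]

// Collapse points into fixed-width intervals, keeping only the newest value
// of each interval and placing it at the interval's end.
// Assumes `points` is sorted by ascending timestamp.
function groupLast(points: Point[], intervalMs: number): Point[] {
  const grouped: Point[] = [];
  for (const [time, value] of points) {
    const groupEnd = (Math.floor(time / intervalMs) + 1) * intervalMs;
    const last = grouped[grouped.length - 1];
    if (last !== undefined && last[0] === groupEnd) {
      last[1] = value; // same interval: overwrite with the newer value
    } else {
      grouped.push([groupEnd, value]); // a new interval starts a new group
    }
  }
  return grouped;
}

const HOUR_MS = 60 * 60 * 1000;
const live: Point[] = [[0, 1], [1000, 2], [HOUR_MS, 3], [HOUR_MS + 1000, 4]];
console.log(groupLast(live, HOUR_MS));
```

Note that the config in the question groups by 15 seconds (`units: [['second', [15]]]`), so an hourly result would presumably need an hourly unit instead; that reading of the question is an assumption.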

Is there a way to make the fields of a GraphQLObjectType dynamic/non-required in order to receive dynamic key-value pairs?

I am trying GraphQL for the first time. I have an express-graphql server connected to MySQL for hypothetical juice shops, where an owner can add, remove, or rename the serve types.
For example:
Shop A has serves like "Cute Small", "The Regular" and "Extravaganza",
whereas shop B has serves like "Xsmall", "small", "medium", "large" and "Xlarge".
As the GraphQL fields are mandatory, I am unable to think of a solution for this particular scenario.
In short, I would love to know if there is a way to write a GraphQLObjectType where the fields can be arbitrary or not listed in advance.
Snippet of a menu type, where the fields are very specific:
const { GraphQLObjectType, GraphQLString, GraphQLFloat } = require("graphql");
var typeDef = new GraphQLObjectType({
name: "Menu",
fields: {
name: { type: GraphQLString },
small_serve: { type: GraphQLFloat },
regular_serve: { type: GraphQLFloat },
medium_serve: { type: GraphQLFloat },
large_serve: { type: GraphQLFloat },
},
});
GraphiQL
{
menus{
name,
small_serve,
regular_serve,
medium_serve,
large_serve
}
}
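One common way to sidestep fixed field names, sketched here under assumptions, is to expose the serves as a list of name/price pairs, so the schema only needs a single list field (for instance a hypothetical `serves` field holding `Serve` objects) regardless of what each shop calls its sizes. The helper `toServes` and the `Serve` shape below are illustrative only and show just the data reshaping, not the GraphQL type definitions.

```typescript
// Hypothetical fixed-shape entry that a list field could return.
interface Serve {
  name: string;
  price: number;
}

// Turn a record whose keys differ per shop into a list of name/price pairs.
// Entries without a price are skipped.
function toServes(row: Record<string, number | null>): Serve[] {
  const serves: Serve[] = [];
  for (const name of Object.keys(row)) {
    const price = row[name];
    if (price !== null && price !== undefined) {
      serves.push({ name, price });
    }
  }
  return serves;
}

console.log(toServes({ "Cute Small": 2.5, "The Regular": 4, Extravaganza: 6 }));
console.log(toServes({ Xsmall: 1.5, small: 2, medium: 3, large: 4, Xlarge: 5 }));
```

With this shape a client would query the list and its `name` and `price` subfields instead of naming each size.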

GraphQL/Netlify CMS - don't error if query field is undefined (doesn't exist)

I'm using Gatsby with Netlify CMS and have some optional fields in a file collection. The problem with this is that I'm unable to retrieve these fields using GraphQL, as the field doesn't exist if it was left blank.
For example, let's say I have the following collection file:
label: "Primary Color",
name: "primary",
file: "data/palette.yaml",
widget: "object",
fields: [
{
label: "Light",
name: "light",
required: false,
widget: "string"
},
{
label: "Main",
name: "main",
required: false,
widget: "string"
},
{
label: "Dark",
name: "dark",
required: false,
widget: "string"
},
{
label: "Contrast Text",
name: "contrastText",
required: false,
widget: "string"
}
]
All fields are optional. So let's say the user only enters in a value for main. This then saves the data as:
primary:
main: '#ff0000'
light, dark and contrastText are not saved at all - they are simply left out entirely.
When I query the data in GraphQL, I obviously need to check for ALL fields since I have no idea which optional fields were filled in by the user and which were left blank. This means my query should be something like:
query MyQuery {
paletteYaml {
primary {
light
main
dark
contrastText
}
}
}
Using the above example where the user only filled in the main field, the above query will throw an error as light, dark and contrastText fields do not exist.
I am using a file collection type (as opposed to a folder collection type) for this, so I can't set a default value. It wouldn't matter if I could set a default value anyway, since GraphQL and YAML do not accept undefined as a value; they can only accept null or an empty string ("") as the best alternative.
Even if I manually save the YAML file with all field values set to null or "", this wouldn't work either, as it would cause additional issues because I am deep merging the query result with another JavaScript object.
I simply need to have GraphQL return undefined for each blank (missing) field instead of throwing an error, or not return the blank/missing fields at all.
This seems like a common issue (handling optional fields in Netlify CMS) but there is nothing in the documentation about it. How do people handle this issue?
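For the deep-merge part of the problem, one possible workaround is to drop null values from the query result before merging, so that blank fields behave as if they were never returned. The sketch below assumes the missing fields come back as null rather than raising an error, which would require them to be declared in the schema (for example via Gatsby's schema customization, not shown here). `stripNulls` is a hypothetical helper name.

```typescript
// Recursively remove object properties whose value is null or undefined.
// Arrays are traversed, but their elements are kept in place.
function stripNulls(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => stripNulls(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      const child = source[key];
      if (child !== null && child !== undefined) {
        result[key] = stripNulls(child);
      }
    }
    return result;
  }
  return value;
}

const queryResult = {
  primary: { light: null, main: "#ff0000", dark: null, contrastText: null },
};
console.log(JSON.stringify(stripNulls(queryResult)));
```

The cleaned object can then be deep merged without null overwriting values from the other object.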

Linking 3 types of document for a view

I'm struggling with linked documents when creating a view.
A salesperson has multiple clients, and each client has multiple purchases.
I need a view containing the salesperson IDs for each client purchase.
In a relational database I would join:
purchase.clientid -> client._id
client.salespersonid -> salesperson._id
Given:
{ _id: "1", type: "purchase", clientid: "2", items: [] }
{ _id: "2", type: "client", salespersonid: "3", name: "Chris the client" }
{ _id: "3", type: "salesperson", name: "Simon the salesperson" }
I've tried reading a lot of stuff, but nothing has clicked. How would I do this in a view?
{
_id: 'purchase-client-2-<unique-purchase-id>',
salespersonId: 'sales-3'
}
{
_id: 'sales-3',
name: 'Simon the salesperson'
}
{
_id: 'client-2',
name: 'Chris the client'
}
With the documents above, you could query for all documents whose ID starts with 'purchase-client-2-' to get an array of purchase documents. Each purchase document then tells you who the salesperson was. Depending on the number of sales staff, you may already have everything you need right there, assuming your map of sales ID to name is already in memory.
If not, you could do a further lookup (and potentially cache the result). If neither the in-memory map nor the extra lookup works for you, you could also duplicate the salesperson's name in the purchase document. After all, NoSQL DBs don't follow the same rules as relational DBs, and it's OK to duplicate now and again. You just have to think about how you keep the duplicates in sync later.
If you can use and abuse the ID field and get away without views, then you may be better off. Views bring their own set of problems. Good luck!
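The "all documents starting with a prefix" lookup is usually expressed as a key range, with the end key being the prefix followed by a very high Unicode character. Below is a minimal sketch of that range, applied to an in-memory array so it can run on its own. `prefixRange` and `selectByIdPrefix` are hypothetical helpers, and plain JavaScript string comparison is used here, which is not identical to the database's own collation.

```typescript
interface Doc {
  _id: string;
  [key: string]: unknown;
}

// Build a start/end key pair covering every ID that begins with `prefix`.
// "\ufff0" is a high code point commonly used as the upper bound.
function prefixRange(prefix: string): { startkey: string; endkey: string } {
  return { startkey: prefix, endkey: prefix + "\ufff0" };
}

// In-memory stand-in for a range query over document IDs.
function selectByIdPrefix(docs: Doc[], prefix: string): Doc[] {
  const { startkey, endkey } = prefixRange(prefix);
  return docs.filter((doc) => doc._id >= startkey && doc._id <= endkey);
}

const docs: Doc[] = [
  { _id: "client-2", name: "Chris the client" },
  { _id: "purchase-client-2-a1", salespersonId: "sales-3" },
  { _id: "purchase-client-2-b2", salespersonId: "sales-3" },
  { _id: "purchase-client-20-c3", salespersonId: "sales-9" },
  { _id: "sales-3", name: "Simon the salesperson" },
];
console.log(selectByIdPrefix(docs, "purchase-client-2-").map((d) => d._id));
```

Including the trailing dash in the prefix keeps client 20's purchases out of client 2's results.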

Can someone advise on an HBase schema for clickstream data?

I would like to create a clickstream application using HBase. In SQL this would be a pretty simple task, but in HBase I don't have the first clue. Can someone advise me on a schema design and the keys to use in HBase?
I have provided a rough data model and several questions that I would like to answer from the data.
Questions I would like to ask of the data:
What events led to a conversion?
What was the last page, and how many pages were viewed?
On which pages does a customer drop off?
What products does a male customer between 20 and 30 like to buy?
Is a customer who has bought product x also likely to buy product y?
What is the conversion amount from the first page?
{
PageViews: [
{
date: "19700101 00:00",
domain: "http://foobar.com",
path: "pageOne.html",
timeOnPage: "10",
pageViewNumber: 1,
events: [
{ name: "slideClicked", value: 0, time: "00:00"},
{ name: "conversion", value: 100, time: "00:05"}
],
pageData: {
category: "home",
pageTitle: "Home Page"
}
},
{
date: "19700101 00:01",
domain: "http://foobar.com",
path: "pageTwo.html",
timeOnPage: "20",
pageViewNumber: 2,
events: [
{ name: "addToCart", value: 50.00, time: "00:02"}
],
pageData: {
category: "product",
pageTitle: "Mans Shirt",
itemValue: 50.00
}
},
{
date: "19700101 00:03",
domain: "http://foobar.com",
path: "pageThree.html",
timeOnPage: "30",
pageViewNumber: 3,
events: [],
pageData: {
category: "basket",
pageTitle: "Checkout"
}
}
],
Customer: {
IPAddress: "127.0.0.1",
Browser: "Chrome",
FirstName: "John",
LastName: "Doe",
Email: "john.doe@email.com",
isMobile: 1,
returning: 1,
age: 25,
sex: "Male"
}
}
Well, your data is mainly in a one-to-many relationship: one customer and an array of page view entities. Since all your queries are customer-centric, it makes sense to store each customer as a row in HBase and have the customer ID (maybe the email in your case) as part of the row key.
If you decide to store one row per customer, the details of each page view would be stored as nested entities. The video link on HBase design will help you understand that. So for your example above, you get one row and three nested entities.
Another approach would be a denormalized form, which lets HBase perform good lookups. Here each row would be a page view, and the customer data gets appended to every row. So for your example above, you end up with three rows. Data would be duplicated. Again, the video covers that too (compression and so on).
You have more nested levels inside each page view: events and pageData. So it will only get worse with respect to denormalization. As everything in HBase is a key-value pair, it is difficult to query and match these nested levels. Hope this helps you get started.
Good video link here
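As a rough illustration of the denormalized variant (one row per page view, with the customer ID leading the row key), the key could be composed as below. The delimiter, the zero-padding widths, and the helper names `zeroPad` and `pageViewRowKey` are all assumptions made for this sketch, not something the answer prescribes.

```typescript
// Left-pad a non-negative integer with zeros so keys sort correctly as strings.
function zeroPad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) {
    text = "0" + text;
  }
  return text;
}

// Hypothetical layout: <customerId>|<epoch millis, 13 digits>|<pageViewNumber, 4 digits>
// Rows for one customer sort next to each other, in chronological order.
function pageViewRowKey(customerId: string, epochMs: number, pageViewNumber: number): string {
  return [customerId, zeroPad(epochMs, 13), zeroPad(pageViewNumber, 4)].join("|");
}

console.log(pageViewRowKey("john.doe@email.com", 0, 1));
console.log(pageViewRowKey("john.doe@email.com", 60000, 2));
console.log(pageViewRowKey("john.doe@email.com", 180000, 3));
```

With a layout like this, all page views of one customer can be read with a scan over the `customerId|` prefix, which fits the customer-centric questions; questions that cut across customers would still need a different key or a separate table.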
