Best practices for filtering data - d3.js

The way I see it, when building dynamic charts with filters, each time the user requests filtered data I can
Execute a new MySQL query, and use MySQL to do the filtering.
SELECT date,
SUM(IF( `column` = `condition`, 1, 0)) as count
...
Execute a new MySQL query, and use the server-side language (PHP in my case) to filter.
function getData(condition) {
$resultSet = mysqli_query($link, "SELECT date, column ... ");
$count = 0;
while ($row = mysqli_fetch_assoc($result_set)) {
if ($row['column'] == 'condition') {
$count++;
}
}
}
Initially execute a single MySQL query, pass all the data to the client, and use Javascript & d3 to do the filtering.
I expect the answer is not black-and-white. For instance, if some filter is barely requested, it may not make sense to make the other 95% of users wait for the relevant data, and thus the filter would necessitate a new data call. But I'm really thinking about edge cases - situations where filters are used regularly, but idiosyncratically. At times like this, is it better to put filtering logic in front-end, back-end, or within my database queries?

Generally, if the filtering can be done on the front end, it should be done there. The advantages are:
Doesn't matter if your server goes down
Saves you bandwidth costs
Saves the user waiting for round trip time
The disadvantages are that it may be slower and more complicated than it would be on the backend. However, dependent on data volume, there's a lot of cases (like your examples) where Javascript is plenty good enough. d3 even has a built-in filter function:
//remove anything that isn't cake
d3.selectAll('whatever')
.filter(function(d){return d.type != 'cake'})
.remove()
If you need more complex filtering, such as basic aggregates, you can use Crossfilter (also from Mike Bostock) or the excellent d3+crossfilter wrapper dc.js.

Related

Laravel Search Caching

I'm new to Laravel (and coding in general), and I have a little Pizza Ordering system that stores the orders placed by clients to a local Pizzaria.
Inside the "New Order" form, when you start typing down the name of a pizza (four cheeses, chicken, yada yada), the program returns a query search that is run every 2 keydowns with products with a similar name.
Here's the search query, a pretty simple and basic one:
$pesquisa = json_decode($request->getContent(), true);
$produtos = Produto::select('nome', 'valor')->where('nome', 'LIKE', '%'.$pesquisa.'%')->get();
return response()->json($produtos);
Here's the "problem" I'm having: The current database has about 50 items, and it takes about ~500ms to get a return. This in my local machine, the problem gets a little bigger when it's actually hosted in a server, where it can spike from ~500ms to ~2s, depending on user connection.
In my study, I've heard about caching, and that it can shorten or remove the need for queries (which was already implemented in the "show all orders placed" list, and REALLY minimized the speed of loading), but I don't know if caching can be done with user-inputted search?
First question: How would one go about saving those pizza names to a cache, while still sorting through them based on user input?
Second question: Is caching like this the "best" way to speed up user-inputted search? Is there something else I should be doing first? (I've heard that the 'LIKE' query search is the slowest there is... should I research and try another type?)
All explanations, tips and tricks are greatly appreciated! Thank you!
For Caching you can use this
$pesquisa = json_decode($request->getContent(), true);
return Cache::remember($pesquisa, $seconds, function ()use($pesquisa) {
$produtos = Produto::select('nome', 'valor')->where('nome', 'LIKE', '%'.$pesquisa.'%')->get();
return response()->json($produtos);
});
rememberForever can also be used instead of remember

How to keep derived state up to date with Apollo/GraphQL?

My situation is this: I have multiple components in my view that ultimately depend on the same data, but in some cases the view state is derived from the data. How do I make sure my whole view stays in sync when the underlying data changes? I'll illustrate with an example using everyone's favorite Star Wars API.
First, I show a list of all the films, with a query like this:
# ALL_FILMS
query {
allFilms {
id
title
releaseDate
}
}
Next, I want a separate component in the UI to highlight the most recent film. There's no query for that, so I'll implement it with a client-side resolver. The query would be:
# MOST_RECENT_FILM
query {
mostRecentFilm #client {
id
title
}
}
And the resolver:
function mostRecentFilmResolver(parent, variables, context) {
return context.client.query({ query: ALL_FILMS }).then(result => {
// Omitting the implementation here since it's not relevant
return deriveMostRecentFilm(result.data);
})
}
Now, where it gets interesting is when SWAPI gets around to adding The Last Jedi and The Rise of Skywalker to its film list. We can suppose I'm polling on the list so that it gets periodically refetched. That's great, now my list UI is up to date. But my "most recent film" UI isn't aware that anything has changed — it's still stuck in 2015 showing The Force Awakens, even though the user can clearly see there are newer films.
Maybe I'm spoiled; I come from the world of MobX where stuff like this Just Works™. But this doesn't feel like an uncommon problem. Is there a best practice in the realm of Apollo/GraphQL for keeping things in sync? Am I approaching this problem in entirely the wrong way?
A few ideas I've had:
My "most recent film" query could also poll periodically. But you don't want to poll too often; after all, Star Wars films only come out every other year or so. (Thanks, Disney!) And depending on how the polling intervals overlap there will still be a big window where things are out of sync.
Instead putting the deriveMostRecentFilm logic in a resolver, just put it in the component and share the ALL_FILMS query between components. That would work, but that's basically answering "How do I get this to work in Apollo?" with "Don't use Apollo."
Some complicated system of keeping track of the dependencies between queries and chaining refreshes based on that. (I'm not keen to invent this if I can avoid it!)
In Apollo observables are (in components) over queried values (cached data 'slots') but your mostRecentFilm is not an observable, is not based on cached values (they are cached) but on one time fired query result (updated on demand).
You're only missing an 'updating connection', f.e. like this:
# ALL_FILMS
query {
allFilms {
id
title
releaseDate
isMostRecentFilm #client
}
}
Use isMostRecentFilm local resolver to update mostRecentFilm value in cache.
Any query (useQuery) related to mostRecentFilm #client will be updated automatically. All without additional queries, polling etc. - Just Works? (not tested, it should work) ;)

d3.queue is not giving an output

This is the page I am trying to make it dynamic by enabling cross-filtering.
So the thing is they are having multiple API.
For the top first two: TOTAL CASES & DAILY CASES
They are using this API and the third one in the top is based on this API.
The bottom three AGE, GENDER, and NATIONALITY are from this API.
In all the API one thing is common that is a date but there are some API in which some data are missing for few dates like there is a gap( Not available for some of the dates).
So I thought of combining all the JSON API in terms of dates and then allow cross filter because I believe I can enable cross-filtering between them. Correct me If I am wrong.
Like If I click on gender female since it gives info about total cases where the patient was female so only confirmed cases from the Total cases will change not the recovered, deaths as data is not available. SO I guess I should combine the top 3 charts together and gender, age and nationality charts, together. Then Dc js would be able to handle nicely filtering between each segments (cases related to landmark, cases related to person info).
Line 123:
var log = console.log;
var q = queue()
.defer(d3.json, "https://api.covid19india.org/data.json")
.defer(d3.json, "https://api.rootnet.in/covid19-in/unofficial/covid19india.org/statewise/history");
q.await(function(error, data1, data2) {
log("==========>");
log("data1:", error,data1);
log("data2:", data2);
});
This is not working because I can't see console.log() output.
https://blockbuilder.org/ninjakx/8c48ab6481311aa0452046d66c4d8701
So my questions are:
1) Why d3.queue is not working?
2) Suggestion whether combining all the datas together and allowing a filltering is a good idea or not as there is limited data. Should I go for cross filtering between the same api charts. So in this case I will have 2 segments (cases related to landmark, cases related to person info)..
Using DC js I want to make it more interactive and display more info.
d3.queue is obsolete
The answer to your first question is cut-and-dried: you don't need d3.queue, and it was deprecated and removed in d3#5.
As of d3#5, D3's data loading APIs use ES6 Promises instead of asynchronous callbacks, so you can use Promise.all([...]) instead of d3.queue. Apparently no way to make the new API emit errors when called in the old way, so it just fails silently. :-/
The new way to write your code is
Promise.all([
d3.json("https://api.covid19india.org/data.json"),
d3.json("https://api.rootnet.in/covid19-in/unofficial/covid19india.org/statewise/history")
]).then(([data1,data2]) => {
log("==========>");
log("data1:", data1);
log("data2:", data2);
})
.catch(error => log('error', error))
I find this much easier to read and understand. A nice side effect is that if you neglect to do error handling (like most people), you'll automatically get a clear message in the log.
Working fork of your block.
Combining multiple data sets
Your second question is pretty open-ended, maybe it would be better to bring that to the dc.js users group?
In general, it's difficult to cross-filter more than one data set. You would have more than one chart group that redraws together, and you'd have to manually add handlers on some chart to initiate, clear filters, and redraw the other chart group.
I haven't seen too many dashboards that do this. You'd have to make it clear to users what is going on.

How to retrieve a customised list of records according to Repository pattern?

I wanna move the business logic out of controller actions. I read a lot about repository pattern in laravel with tons of examples.
However they're usually pretty straightforward - we have a class that uses some repository to fetch a list of all possible records, the data is returned to the controller and passed to the view.
Now what if our list isn't all the possible records? What if it depends on many things. For example:
we display the list as "pages" so we might need X records for Y-th page
we might need to filter the list or even apply multiple filters (status, author, date from - to etc)
the user can change the sorting of the data (for example by clicking the table column titles)
we might need some data from other data sources (joined tables) or it might even be used for sorting (so lazy loading won't work)
Should I write a special method with all these cases in mind? Something like that:
public function getForDisplay(
$with = array(),
$filters = array(),
$count = 20,
$page = 0,
$orderBy = 'date',
$orderDir = 'DESC'
)
{
//all the code goes here
return $result;
}
And then call it like this from my controller:
$orders = $this->orders->getForDisplay(
array('customer', 'address', 'seller'),
Input::get('filters', array()),
20,
Input::get('page', 0),
Input::get('sort', 'date'),
Input::get('direction', 'DESC')
);
This looks wrong already and we didn't even get to the repositories yet.
What are the best/correct practices for solving situations like this? I'm pretty sure there has to be a way to achieve the desired results without adding all the possible combinations as a method arguments.
Use the repository pattern just for business model updates and you'll end up with very specific query methods (the Domain usually doesn't need many queries and they are pretty straightforward). For UI/reporting querying purposes, you can use a simple DAO/Service/ORM/QUery Handler , that will take some input and returns the desired data (at least part of the view model).
Since you're already using an ORM, you can use it directly. Note that you can use the ORM for domain updates also, but inside a repository's implementation i.e the app only sees the repository interface. We care about separation at the business layer, for UI querying you can skip the unneeded abstraction.
Btw, because we're talking about design, everything is subjective and thus, there's no single best/optimum way of doing things.

DOM-based data storage, best choices/performances?

This question is about load data embedded into DOM structure.
I am using jQuery2, but the question is valid for any other framework or single Javascript code.
There are two scenarios:
When the data is load once (with the page), no "refresh data" is need.
When data is refreshed by some event.
The average performance can be changed with either one or other
Suppose a typical case of scenario-2, where a page fragment must be reloaded, with new HTML and new data. So, the $('#myDiv').load('newHtmlFragment') will be used any way... And, for jQuery programmer, using AJAX, there are two ways to load an "DOM-based data":
by HTML: expressing all data into the "newHtmlFragment" HTML. Suppose many paragraphs, each like <p id="someId" data-X="someContent">...more content</p>. There are some "verbose overhead" for each data-X1="contentX1" data-X2="contentX2" ..., and is not elegant for webservice script if it is not an XHTML-oriented one (I am using PHP, my data is an array, and I preffer to use json_encode).
by jQuery evaluation: using the same $('#myDiv').load('newHtmlFragment') only for <p id="someId">...more content</p>, with no data-X. A second AJAX load an jQuery script like $('#someId').data(...) and evaluate it. So this is an overhead, for node-selection and data-inclusion, but with big item-data, each data can be enconded by JSON.
by pure JSON: similar to "by jQuery", but the second AJAX load an JSON object like var ALLDATA={'someId1':{...}, 'someId2':{...}, ...}. So this is an overhead for a static function that executes something like $('#myDiv p').each(function(){... foreach ... $(this).data('x',ALLDATA[id]['x']);}) retrive selection, but with big data, all data can be enconded by JSON.
The question: what the best choice? It depends on scenarios or another context parameter? There are signfificative performance tradeoffs?
PS: a complete answer needs to address the issue of performance... If no significative performance differences, the best choice relies on "best programming style" and software engineering considerations.
More context, if you need as reference to answer. My practical problem is at scenario-1, and I am using the second choice, "by jQuery script", executing:
$('#someId1').data({'bbox':[201733.2,7559711.5,202469.4,7560794.9],'2011-07':[3,6],'2011-08':[2,3],'2011-10':[4,4],'2012-01':[1,2],'2012-02':[12,12],'2012-03':[3,6],'2012-04':[6,12],'2012-05':[3,4],'2012-06':[2,4],'2012-07':[3,5],'2012-08':[10,11],'2012-09':[7,10],'2012-10':[1,2],'2012-12':[2,2],'2013-01':[6,10],'2013-02':[19,26],'2013-03':[2,4],'2013-04':[5,8],'2013-05':[4,5],'2013-06':[4,4]});
$('#someId2').data({'bbox':[197131.7,7564525.9,198932.0,7565798.1],'2011-07':[39,51],'2011-08':[2,3],'2011-09':[4,5],'2011-10':[13,14],'2011-11':[40,42],'2011-12':[21,25],'2012-01':[10,11],'2012-02':[26,31],'2012-03':[27,35],'2012-04':[8,10],'2012-05':[24,36],'2012-06':[4,7],'2012-07':[25,30],'2012-08':[9,11],'2012-09':[42,52],'2012-10':[4,7],'2012-11':[17,22],'2012-12':[7,8],'2013-01':[21,25],'2013-02':[5,8],'2013-03':[8,11],'2013-04':[28,40],'2013-05':[55,63],'2013-06':[1,1]});
$('#...').data(...); ... more 50 similar lines...
This question can be discussed from different aspects. Two that I can think of right now would be software engineering and end-user experience. A comprehensive solution covering first can also cover the later but as usually it's not possible to come up with such a solution (due to its cost) these two hardly overlap.
Software engineering point of view
In this POV it is strongly suggested that different parts of the system to be as isolated as possible. This means you better postpone the marriage of data and view as late as you can. It helps you to divide your developers into two separate groups; those who know server-side programming and have no idea how HTML (or any other interface layer technology) works and those who are solely experienced on HTML and Javascript. Just this division alone is a blessing for management and it helps greatly in big projects where teamwork is essential. It also helps the maintenance and expansion of the system, all the things software engineering aims at.
User experience point of view
As good as the previous solution sounds, it comes with (solvable) drawbacks. As you mentioned in your question, if we are to load view and data separately, it elevates the number of requests we have to send to server to retrieve them. It imposes two problems, first the overhead that comes with each request and second the delay user has to wait for each request to be responded. The second is more obvious so let's start with that. With all the advances to the Internet and bandwidths, yet our users' exceeding expectations enforce us to consider this delay. One way to reduce the number of requests would be your first choice: data within HTML fragments. Multiple number of requests also has an overhead problem as well. This can be explained by HTTP protocol's handshake (both on client-side and server-side) and by the fact that each request will lead to loading the session on server which in a large scale could be pretty considerable. So again your first option could be the answer to this problem.
Tie breaker
Having both sides of the story said, what then? The ultimate solution is a combination of both where data and view are married on client but they are downloaded at the same time. To be honest I don't know of such a library. But the principle is simple, you need a mechanism to package data and empty HTML fragments within the same response and combine them into what user will see on client. This solution is costy (to implement) but it is sort of the cost that once paid you can benefit from it for a life time.
It depends on how you are going to use the data that is stored. Consider this example:
Example
You have 300 items in a database (say server access logs of the last 300 days). Now you want to display 300 <div> tags, each tag representing one database-item.
Now there are 2 options to do this:
<div data-stat1="stat1" data-stat2="stat2" data-stat3="stat3">(...)</div>
<!-- again, this is repeated 300 times -->
<script>
// Example on how to show "stat1" value in all <div>s
function showStat1() {
for(var i=1; i<=300; i+= 1) {
var theID = '#id-' + i;
jQuery(theID).text(jQuery(theID).data('stat1'));
}
}
</script>
OR
<div id="id-1">(...)</div>
<!-- repeat this 300 times, for all DB items -->
<script>
data = { // JSON data which is displayed in the <div> tags
'1': ['stat1', 'stat2', 'stat3'],
// same for all 300 items
}
// Example on how to show "stat1" value in all <div>s
function showStat1() {
for(var i=1; i<=300; i+= 1) {
var theID = '#id-' + i;
jQuery(theID).text(data[i][0]);
}
}
</script>
Which scenario is better?
Case 1:
The data is directly coded into the DOM elements, which makes this one an easy to implement solution. You can see in my examples that the first code block takes a lot less code to generate and the code is less complex.
This solution is very powerful, when you want to store data that is directly related to an DOM element (as you do not have to create a logical connection between JSON and DOM): It is very simple to access data for the correct element.
However, your main concern is performance - this is NOT the way to go. Because every time you access data, you have to first select the correct DOM element and attribute via javascript, which takes considerable amount of time. So the simplicity costs you a lot of performance overhead when you want to read/write data in the element.
Case 2:
This scenario very cleanly separates the DISPLAY from the DATA storage. It has major advantages compared to the first case.
A) Data should not be mingled with the display elements - imagine you want to switch from <div> to <table> and suddenly you have to rewrite all javascript to correctly select the correct table cells
B) Data is directly accessible without traversing the DOM tree. Imagine you want to calculate an average sum of all values. In first case you need to loop all DOM elements and read value from them. In the second you simply loop a normal array
C) The data can be loaded or refreshed via ajax. You only transfer the JSON object but not all the HTML stuff that is required for displaying the data
D) Performance is way better, mostly because of the above mentioned points. Javascript is a lot better in handling simple arrays or (not too complex) JSON objects than it is in filtering data out of the DOM tree.
In general this solution is more than twice as fast as the first case. You will also discover - even though it is not obvious - that the second case is also easier to code and maintain! The code simply is easier to read and understand, as you clearly separate the data from the UI elements...
Performance comparison
You can compare the two solutions on this JSPerf scenario:
http://jsperf.com/performance-on-data-storage
Taking it further
For implementation of the second case I generally use this approach:
I generate the HTML code which will serve as the UI. It often happens that I have to generate HTML via javascript, but now I assume that the DOM tree will not change after the page is loaded
At the end of the HTML I include a tag with a JSON object for initial page display
Via jQuery onReady event I then parse the JSON object and update the UI elements as required (e.g. populating the table with data)
When data is loaded dynamically I simply transfer a new JSON object with the new data and use exact the same javascript function as in step 3 to display the new results.
Example of the HTML file:
<div>ID: <span class="the-id">-</span></div>
<div>Date: <span class="the-date">-</span></div>
<div>Total page hits: <span class="the-value">-</span></div>
<button type="button" class="refresh">Load new data</button>
<script src="my-function.js"></script>
<script>
function initial_data() {
return {"id":"1", "date":"2013-07-30", "hits":"1583"};
}
</script>
The "my-function.js" file:
jQuery(function initialize_ui() {
// Initialize the global variables we will use in the app
window.my_data = initial_data();
window.app = {};
app.the_id = jQuery('.the-id');
app.the_date = jQuery('.the-date');
app.the_value = jQuery('.the-value');
app.btn_refresh = jQuery('.refresh');
// Add event handler: When user clicks refresh button then load new data
app.btn_refresh.click(refresh_data);
// Display the initial data
render_data();
});
function render_data() {
app.the_id.text(my_data.id);
app.the_date.text(my_data.date);
app.the_value.text(my_data.hits);
}
function refresh_data() {
// For example fetch and display the values for date 2013-07-29
jQuery.post(url, {"date":"2013-07-29"}, function(text) {
my_data = jQuery.parseJSON(text);
render_data();
});
}
I did not test this code. Also there is crucial error handling and other optimization missing. But it helps to illustrate the concepts that I try to describe

Resources