Is it possible in GraphQL or Hasura to group the results by month or year? I'm currently getting the result list back as a flat array, sorted by the date attribute of the model. However, I'd like to get back 12 subarrays corresponding to each month of the year.
From the docs: this is not natively supported.
Derived data or data transformations call for views. Using the PostgreSQL EXTRACT function you can expose a separate month field derived from the date ... but the result is still a flat array.
With some deeper customization you can probably achieve the desired result ... but GraphQL's tree and array structures are meant for embedding related data, not for shaping a view.
How many records are you processing? Hundreds? Client-side conversion (easily done on the Apollo Client data at the React component/container [view] level) may be good enough, especially with the extracted month field.
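As a rough illustration of the client-side conversion, here is a minimal Python sketch (the same logic carries over to JavaScript in a React component). It assumes each record is a dict with an ISO-formatted "date" string; the field names and the helper group_by_month are hypothetical, not part of Hasura or Apollo.

```python
from datetime import date


def group_by_month(records, date_key="date"):
    """Return a list of 12 lists; index 0 holds January's records."""
    buckets = [[] for _ in range(12)]
    for record in records:
        # Assumption: the date arrives as an ISO string such as "2019-03-02".
        month = date.fromisoformat(record[date_key]).month
        buckets[month - 1].append(record)
    return buckets


# Hypothetical flat result list, already sorted by date.
rows = [
    {"id": 1, "date": "2019-01-15"},
    {"id": 2, "date": "2019-03-02"},
    {"id": 3, "date": "2019-03-28"},
]
by_month = group_by_month(rows)
```

Because the input is already sorted by date, each monthly subarray keeps that order. If the view exposes an extracted month field, the fromisoformat call can be replaced by reading that field directly.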
PS. You can get results grouped into arrays if you 'glue' many queries together at the top level (copies of the same query, each filtered to one month) ... but that is probably not a recommended solution.
I’m working on a web service for an API that provides a feed of posts. Right now the posts are organized chronologically, and I paginate with opaque before and after tokens which are essentially timestamps. However, we want to move from a chronological feed to an algorithmic one. While I can calculate the post scores and send the first page of data, I’m not sure how to paginate relative to that. I suppose snapshot it and bundle up like 200 sorted post IDs and serialize them into an HMAC blob for the tokens, but this is a nontrivial overhead for each request. Is there a better way to handle this kind of pagination?
If you can store the post score in the database, you can index it and access it quickly. The top pages will be fast anyway. If you need to paginate by rating to a large depth, the standard approach with order by rating desc limit 50 offset 10000 will be slow. Instead, pick a second-order field, e.g. timestamp, to break ties: if several posts have the same rating, which one should be on top? Add this field to the sort index and query the DB relative to the last row of the previous page, like where rating < ... or (rating = ... and timestamp < ...) order by rating desc, timestamp desc.
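A minimal sketch of this kind of pagination, using Python's built-in sqlite3 module so it runs anywhere. The table posts, its columns, the index name, and the helper fetch_page are all assumptions made for illustration; the cursor is simply the (rating, created_at) pair of the last row on the previous page.

```python
import sqlite3


def fetch_page(conn, page_size, cursor=None):
    """Return (rows, next_cursor), ordered by rating DESC, created_at DESC."""
    if cursor is None:
        sql = ("SELECT id, rating, created_at FROM posts "
               "ORDER BY rating DESC, created_at DESC LIMIT ?")
        params = (page_size,)
    else:
        rating, created_at = cursor
        # Continue strictly after the last row seen, instead of using OFFSET.
        sql = ("SELECT id, rating, created_at FROM posts "
               "WHERE rating < ? OR (rating = ? AND created_at < ?) "
               "ORDER BY rating DESC, created_at DESC LIMIT ?")
        params = (rating, rating, created_at, page_size)
    rows = conn.execute(sql, params).fetchall()
    next_cursor = (rows[-1][1], rows[-1][2]) if rows else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, rating INTEGER, created_at INTEGER)"
)
# Composite index matching the sort order, as suggested above.
conn.execute("CREATE INDEX posts_rank ON posts (rating DESC, created_at DESC)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, 50, 100), (2, 50, 200), (3, 40, 150), (4, 30, 120), (5, 30, 110)],
)

page1, cur = fetch_page(conn, 2)
page2, cur = fetch_page(conn, 2, cur)
page3, cur = fetch_page(conn, 2, cur)
```

This sketch assumes the (rating, created_at) pair is unique per post; if it is not, adding the post id as a final tie-breaker keeps pages from skipping or repeating rows. It also assumes ratings are stable between requests, which is the harder part of the original question.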
If you recalculate the rating often, I recommend storing it in a separate table like (post_id, rating). Query this table for the post_ids and join posts on it; that should be faster than walking through the whole posts table.
Using Parse for the backend of my app. If I want to isolate data from a single column within a class (a la Excel spreadsheet column, to get a numerical sum of the data), is there a way to export a single column as numbers?
Parse doesn't have any aggregating queries like SQL, so your only option is to either keep totals updated using afterSave handlers (very common pattern), or get all the rows and aggregate the data yourself (queries are limited to 100 rows by default with 1000 as the max, so this option has issues).
To see an example of the afterSave pattern, look at this documentation:
https://parse.com/docs/cloud_code_guide#functions-aftersave
I've been doing a lot of reading lately on Cassandra, and specifically how to structure rows to take advantage of indexing/sorting, but there is one thing I am still unclear on: how many "index" items (or filters, if you will) should you include in a column family (CF) row?
Specifically: I am building an app and will be using Cassandra to archive log data, which I will use for analytics.
Example types of analytic searches will include (by date range):
total visits to specific site section
total visits by Country
traffic source
I plan to store the whole log object in JSON format, but to avoid having to go through each item to get basic data, or to create multiple CF just to get basic data, I am curious to know if it's a good idea to include these above "filters" as columns (compound column segment)?
Example:
Row Key          | timeUUID:data | timeUUID:country | timeUUID:source |
=======================================================================
timeUUID:section | JSON Object   | USA              | example.com     |
So as you can see from the structure, the row key would be a compound key of timeUUID (say per day) plus the site section I want to get stats for. This lets me query a date range quite easily.
Next, my dilemma, the columns. Compound column name with timeUUID lets me sort & do a time based slice, but does the concept make sense?
Is this type of structure acceptable by the current "best practice", or would it be frowned upon? Would it be advisable to create a separate "index" CF for each metric I want to query on? (even when it's as simple as this?)
I would rather get this right the first time instead of having to restructure the data and refactor my application code later.
I think the idea behind this is OK. It's a pretty common way of doing timeslicing (assuming I've understood your schema anyway - a create table snippet would be great). Some minor tweaks ...
You don't need a timeUUID as your row key. Given that you suggest partitioning by individual days (which are inherently unique) you don't need a UUID aspect. A timestamp is probably fine, or even simpler a varchar in the format YYYYMMDD (or whatever arrangement you prefer).
You will probably also want to swap your row key composition around to section:time. The reason for this is that if you need to specify an IN clause (i.e. to grab multiple days) you can only do it on the last part of the key. This means you can do WHERE section = 'foo' and time IN (....). I imagine that's a more common use case - but the decision is obviously yours.
If your common case is querying the most recent data don't forget to cluster your timeUUID columns in descending order. This keeps the hot columns at the head.
Double storing content is fine (i.e. once for the JSON payload, and denormalised again for data you need to query). Storage is cheap.
I don't think you need indexes, but it depends on the queries you intend to run. If your queries are simple then you may want to store counters by (date:parameter) instead of values and just increment them as data comes in.
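To make the counter idea concrete, here is a small Python sketch. A collections.Counter stands in for a Cassandra counter column family, and the key layout day:parameter:value, the event fields, and the helper record_visit are all assumptions for illustration rather than a prescribed schema.

```python
from collections import Counter
from datetime import datetime, timezone

# Stand-in for a counter column family; in Cassandra each key would be
# a counter column that is incremented as log entries arrive.
counters = Counter()


def record_visit(event):
    """Increment one counter per (day, parameter, value) for a log event."""
    day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y%m%d")
    for parameter in ("section", "country", "source"):
        counters[f"{day}:{parameter}:{event[parameter]}"] += 1


# Two hypothetical log events on the same UTC day.
record_visit({"ts": 0, "section": "news", "country": "USA", "source": "example.com"})
record_visit({"ts": 3600, "section": "news", "country": "CAN", "source": "example.com"})
```

A date-range total is then a sum over the daily counters for one parameter value, with no need to read or parse the stored JSON objects.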
To improve my skills with Hector and Cassandra, I'm trying different methods to query data out of Cassandra.
Currently I'm trying to make a simple message system. I would like to get the posted messages in chronological order with the last posted message first.
In plain SQL it is possible to use 'order by'. I know it is possible if you use the OrderPreservingPartitioner, but this partitioner is deprecated and less efficient than the RandomPartitioner. I thought of creating an index on a secondary column with a timestamp as its value, but I can't figure out how to obtain the data. I'm sure that I have to use at least two queries.
My column Family looks like this:
create column family messages
  with comparator = UTF8Type
  and key_validation_class = LongType
  and compression_options =
    {sstable_compression:SnappyCompressor, chunk_length_kb:64}
  and column_metadata = [
    {column_name: message, validation_class: UTF8Type}
    {column_name: index, validation_class: DateType, index_type: KEYS}
  ];
I'm not sure if I should use DateType or long for the index column, but I think that's not important for this question.
So how can I get the data sorted? If possible, I'd like to know how it's done both with the CQL syntax and without.
Thanks in advance.
I don't think there's a completely simple way to do this when using RandomPartitioner.
The columns within each row are stored in sorted order automatically, so you could store each message as a column, keyed on timestamp.
Pretty soon, of course, your row would grow large. So you would need to divide up the messages into rows (by day, hour or minute, etc) and your client would need to work out which rows (time periods) to access.
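The client-side part of this, working out which rows (time periods) to read, can be sketched in Python as follows. Hourly buckets and the YYYYMMDDHH key format are assumptions chosen for the example, and bucket_key and bucket_keys_between are hypothetical helpers, not Hector or Cassandra APIs.

```python
from datetime import datetime, timedelta, timezone


def bucket_key(ts):
    """Row key for the hour containing ts, e.g. '2012050201'."""
    return ts.strftime("%Y%m%d%H")


def bucket_keys_between(start, end):
    """Hourly row keys covering [start, end], newest bucket first."""
    current = end.replace(minute=0, second=0, microsecond=0)
    first = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current >= first:
        keys.append(bucket_key(current))
        current -= timedelta(hours=1)
    return keys


start = datetime(2012, 5, 1, 22, 30, tzinfo=timezone.utc)
end = datetime(2012, 5, 2, 1, 5, tzinfo=timezone.utc)
keys = bucket_keys_between(start, end)
```

The client would then read each of these rows in turn, slicing the timestamp-named columns in reverse order within each row to get the newest messages first.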
See also Cassandra time series data
and http://rubyscale.com/2011/basic-time-series-with-cassandra/
and https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/
and http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/
I have a very large set of data on which I'm doing a great deal of post-query manipulation (sorting, filtering, etc.). I would like to do all this manipulation on an array of ActiveRecord objects that contains only the information necessary for the sorting, filtering, and paging, and then add the data necessary for display at the end.
For example, let's say I have a database with two tables: baseball_players and player_infos. The baseball_players table contains all of the interesting stuff (stats, team, name, birthday, etc.). The player_infos table contains player_id, player_rank, and player_position. I have 15000 players, and I want to find numbers 100-150 of the best catchers of all time. I retrieve an array of all player_infos, filter to only catchers, sort by player_rank, and then retrieve records 100-150.
What is the best way to merge the resulting player_info records with their corresponding baseball_player records? Hash.merge would work perfectly, but I don't want to convert these objects to Hashes. Does ActiveRecord support something similar?
Note that I have a restriction where I cannot simply query the data using SQL - I have to manually sort and filter an object containing all 15000 player_info records.
I believe you are looking for ActiveRecord::Base#update.