Strapi: sorting items of X, Y, Z collections by date

I have a few collections in Strapi, such as articles, posts, videos, etc. (referred to here as X, Y, Z). I need the items of the X, Y, Z collections sorted together by date, and I am not sure how to do it without fetching too much extra data.
For example, if I need only 10 sorted items for a page, I fetch the 10 most recent items from X, another 10 from Y, and the same for Z, then merge them. But in the worst case all of the most recent items come from the X request alone, so the other requests are just wasted, and doing pagination on top of that merge is yet another problem.
I also thought of using lifecycle events: whenever an item is created in X, Y, or Z, automatically add it to a K collection as a relation (and delete the relation when the item is deleted), so I can simply fetch K already sorted. But I am not sure how to structure K, and because K changes every time an item is created or deleted in X, Y, or Z, one broken piece of logic or missed edge case could leave K inconsistent, and I would have to loop through everything to rebuild it.
Which approach is a better fit, or is there anything I have overlooked?
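For reference, a minimal sketch of the lifecycle-hook idea, assuming Strapi v4; the api::feed-item.feed-item collection ("K") and its sourceType/sourceId/publishedDate fields are hypothetical names chosen for the example, not anything Strapi provides:

// src/api/article/content-types/article/lifecycles.ts (one such file per X/Y/Z type)
export default {
  async afterCreate(event: any) {
    const { result } = event;
    // Mirror the newly created item into the aggregate "K" collection.
    await strapi.entityService.create('api::feed-item.feed-item', {
      data: {
        sourceType: 'article',           // which of X/Y/Z this entry came from
        sourceId: result.id,
        publishedDate: result.createdAt, // the field K is sorted and paginated by
      },
    });
  },

  async afterDelete(event: any) {
    const { result } = event;
    // Keep K consistent: drop the mirrored entry when the source item is deleted.
    await strapi.db.query('api::feed-item.feed-item').deleteMany({
      where: { sourceType: 'article', sourceId: result.id },
    });
  },
};

The feed itself is then a single query against K, sorted by publishedDate descending with normal pagination; a periodic rebuild job over X, Y, Z can act as a safety net against the sync edge cases mentioned above.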

Related

Persistent partitioning of the data

I am looking for an approach or algorithm that can satisfy the following requirements:
Partition the elements into a defined number X of partitions. The number of partitions might be redefined manually over time if needed.
Each partition should not have more than Y elements.
Elements have a "category Id" and an "element Id". Ideally, all elements with the same category Id should be within the same partition; they should overflow to as few partitions as possible only if a given category has more than Y elements. The number of categories is orders of magnitude larger than the number of partitions.
If an element has previously been assigned to a given partition, it should continue being assigned to the same partition.
Account for changes in the data: existing elements might be removed and new elements can be added within each category.
So far my naive approach is to:
sort the categories descending by their number of elements
keep a variable with a count-of-elements for a given partition
assign the rows from the first category to the first partition and increase the count-of-elements
if count-of-elements > Y: assign the overflowing elements to the next partition, but only if the category itself has more than Y elements; otherwise assign all elements of that category to the next partition
continue till all elements are assigned to partitions
In order to persist the assignments, store all (element Id, partition Id) pairs in the database
On subsequent re-assignments:
remove from the database any elements that were deleted
assign existing elements to the partitions based on (element Id, partition Id)
for any new elements follow the above algorithm
My main worry is that after a few such runs we will end up with categories spread all across the partitions, as the initial partitions fill up. Perhaps adding a buffer (of 20% or so) to Y might help. Also, if one of the categories sees a sudden increase in its number of elements, the partitions will need rebalancing.
Are there any existing algorithms that might help here?
This is NP-hard (knapsack) layered on NP-hard (finding the optimal way to split too-large categories) layered on the currently unknowable (future data changes). Obviously the best that you can do is a heuristic.
Sort the categories by descending size. Using a heap/priority queue for the partitions, put each category into the least full available partition. If the category won't fit, then split it as evenly as you can across the smallest possible number of partitions. My guess (experiment!) is that trying to leave partitions at the same fill is best.
On reassignment, delete the deleted elements first. Then group new elements by category. Sort the categories by how many preferred locations they have ascending, and then by descending size. Now move the categories with 0 preferred locations to the end.
For each category, if possible split its new elements across the preferred partitions, leaving them equally full. If this is not possible, put them into the emptiest possible partition. If that is not possible, then split them to put them across the fewest possible partitions.
It is, of course, possible to come up with data sets that eventually turn this into a mess. But it makes a pretty good, good-faith effort to come out well.
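As a rough illustration, a minimal sketch of the initial greedy assignment; the type names, parameter names, and the choice to re-sort instead of using a real heap are all simplifications for the example, and total capacity is assumed to be sufficient:

interface Category { id: string; size: number; }
interface Partition { id: number; used: number; }

function assignCategories(
  categories: Category[],
  partitionCount: number,
  capacityY: number,
): Map<string, number[]> {
  const result = new Map<string, number[]>(); // category id -> partitions it spans
  const partitions: Partition[] = Array.from(
    { length: partitionCount },
    (_, i) => ({ id: i, used: 0 }),
  );

  // Largest categories first.
  const sorted = [...categories].sort((a, b) => b.size - a.size);

  for (const cat of sorted) {
    partitions.sort((a, b) => a.used - b.used); // least full partition first
    const target = partitions[0];

    if (target.used + cat.size <= capacityY) {
      // The whole category fits into the emptiest partition.
      target.used += cat.size;
      result.set(cat.id, [target.id]);
      continue;
    }

    // Category does not fit whole: spill it over the emptiest partitions.
    // (The answer suggests splitting it evenly across the minimal number of
    // partitions; this sketch just fills them greedily.)
    let remaining = cat.size;
    const span: number[] = [];
    for (const p of partitions) {
      if (remaining <= 0) break;
      const take = Math.min(Math.max(0, capacityY - p.used), remaining);
      if (take > 0) {
        p.used += take;
        remaining -= take;
        span.push(p.id);
      }
    }
    result.set(cat.id, span);
  }
  return result;
}

On reassignment, the previously stored (element Id, partition Id) pairs would be replayed first to restore the used counts, and only the new elements would go through the loop above.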

Parse multiple array similarity query

I am working on an algorithm that will compare 2 objects, object 1 and object 2. Each object has attributes which are 5 different arrays, array A, B, C, D, and E.
In order for the two objects to be a match, at least one item from Object 1's array A must be in Object 2's array A, AND at least one item from Object 1's B must be in Object 2's B, and so on through array E. The more matches there are in each of the arrays A-E, the higher the score the match will produce.
Am I going to have to pull Object 1 and Object 2 and then do an O(n^2) search on each array to determine which items exist in both? I would then derive a score from the number of matches in each array, add them up, and the total would be the overall score.
I feel like there has to be a better option for this, especially on Parse.com.
Maybe I am going about this problem all wrong; can someone PLEASE help me with it? I would provide some code, but I have not started writing it yet because I cannot wrap my head around the best way to design it. The two object classes are already in place in the database, though.
Thanks!
As I said, I may be thinking of this problem in the wrong way. If I am unclear about anything that I am trying to do, let me know and I will update accordingly.
Simplest solution:
Copy all the elements of one array from Object 1 into a hash table (unordered set), then iterate over the corresponding array in the second object and look each element up in the table. The time complexity is thus O(N).
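A minimal sketch of that idea, assuming the arrays hold comparable primitive values such as strings (names are illustrative):

type ArrayKey = 'A' | 'B' | 'C' | 'D' | 'E';
type ObjectArrays = Record<ArrayKey, string[]>;

// Counts how many elements of arr2 also appear in arr1, in O(N).
function countMatches(arr1: string[], arr2: string[]): number {
  const seen = new Set(arr1);
  return arr2.filter((x) => seen.has(x)).length;
}

// Returns null if any of A-E has no common element (not a match);
// otherwise returns the total number of matches as the score.
function scoreMatch(obj1: ObjectArrays, obj2: ObjectArrays): number | null {
  const keys: ArrayKey[] = ['A', 'B', 'C', 'D', 'E'];
  let score = 0;
  for (const k of keys) {
    const matches = countMatches(obj1[k], obj2[k]);
    if (matches === 0) return null; // every array must share at least one item
    score += matches;
  }
  return score;
}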
Smart solution:
Keep the elements of every object not as plain ("naive") arrays but as arrays structured as hash tables (e.g. using double hashing). That way all of the arrays in Objects 1 and 2 are already pre-indexed, and all you need to do is iterate over the array that contains fewer elements and match its elements against the longer, pre-indexed array.

Minimum sets to cover all sub arrays

I am explaining this question with a little modification so that it becomes easier for me to explain.
There are n employees, and I need to arrange an outing for them on a day of the month on which all (or the maximum number of) employees are available.
Each employee is asked to fill out an online survey stating their availability, e.g. 1-31 or 15-17, etc. Some might not be available for even a single day.
There is no restriction on the number of trips I can arrange to cover all employees (ignoring those who aren't available at all during the month), but I want to find the minimum set of dates that covers all employees. So in the worst case I will have to arrange 31 trips.
Question: what is the best data structure to use, and what is the best algorithm to run on it? What is the best possible way to solve this problem?
By best I of course mean time- and space-efficient, but I am also open to other ways of solving it.
My idea is to maintain an array of 31 ints initialized to 0. Run over each employee and, based on their available dates, increment the corresponding array indexes. At the end, find the maximum value; it represents the date on which the most employees are available. Then apply the same logic to the remaining employees. The problem is removing the covered employees: I would have to run over the whole list of employees again to know which ones can be removed, and build a new list of remaining employees to apply the previous logic to. Running over the list twice per round just to remove employees does not seem ideal to me. Any ideas?
As a first step, you should exclude employees with no available dates.
Then your problem becomes a variant of the Set Cover Problem.
Your universe U is the set of all employees, and the collection of sets S corresponds to the days: for each day i, employee j is in set S[i] iff that employee is available on day i.
That problem is NP-hard. So, unless you settle for an approximate solution, you may in the worst case have to check every one of the 2^31 subsets of days, probably with some pruning.
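As an illustration of what that exhaustive check looks like, a minimal sketch that tries subsets of days in increasing size until one covers every employee (names are illustrative; exponential in the worst case, as noted):

function minimumDatesExact(availability: Map<number, Set<number>>): number[] {
  const employees = [...availability.entries()].filter(([, d]) => d.size > 0);
  if (employees.length === 0) return [];
  const days = Array.from({ length: 31 }, (_, i) => i + 1);

  const covers = (subset: number[]): boolean =>
    employees.every(([, empDays]) => subset.some((d) => empDays.has(d)));

  // Enumerate all subsets of `days` with exactly k members.
  function* subsetsOfSize(k: number, start = 0, acc: number[] = []): Generator<number[]> {
    if (acc.length === k) { yield [...acc]; return; }
    for (let i = start; i < days.length; i++) {
      acc.push(days[i]);
      yield* subsetsOfSize(k, i + 1, acc);
      acc.pop();
    }
  }

  for (let k = 1; k <= days.length; k++) {
    for (const subset of subsetsOfSize(k)) {
      if (covers(subset)) return subset; // smallest covering set of dates
    }
  }
  return days; // unreachable: every employee with a free day can be covered
}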
Create an array indexed 1 to 31 (each index representing a date of the month). For each date, build a doubly linked list containing the emp_id of every employee available on that day (you can build these lists already sorted by emp_id, and keep track of the size of each list and of the array index with the maximum number of employees).
The largest list must be in the solution (take it as the first date).
Now compare each remaining list with the selected list and remove from it the employees who are already covered by the selected list.
Now repeat the same procedure to find the second date, and so on.
The whole procedure runs in O(n^2) (because 31 is a constant).
The space is O(n).
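A minimal sketch of this greedy procedure, using sets instead of hand-rolled linked lists (names are illustrative):

// availability maps each employee id to the set of days (1-31) they can attend.
function pickOutingDays(availability: Map<number, Set<number>>): number[] {
  // Invert: for each day, the set of employees available on that day.
  const byDay = new Map<number, Set<number>>();
  for (const [emp, days] of availability) {
    if (days.size === 0) continue; // exclude employees with no available dates
    for (const d of days) {
      if (!byDay.has(d)) byDay.set(d, new Set());
      byDay.get(d)!.add(emp);
    }
  }

  const uncovered = new Set(
    [...availability].filter(([, days]) => days.size > 0).map(([emp]) => emp),
  );
  const chosen: number[] = [];

  while (uncovered.size > 0) {
    // Pick the day covering the most still-uncovered employees (the "largest list").
    let bestDay = -1;
    let bestCount = 0;
    for (const [day, emps] of byDay) {
      let count = 0;
      for (const e of emps) if (uncovered.has(e)) count++;
      if (count > bestCount) { bestCount = count; bestDay = day; }
    }
    if (bestDay === -1) break; // nothing left can be covered
    chosen.push(bestDay);
    for (const e of byDay.get(bestDay)!) uncovered.delete(e);
  }
  return chosen;
}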

Continuous monitoring and updating of popularity of items

Suppose we have 1000 items and a place to show any ten of these items at a time, to the visiting user. We can capture click rate and items which are shown together.
How can we optimally get the most popular items (say 10) out of these?
How can we continually update popularity and show the optimal 10 items?
Edit: I'm looking for the different approaches instead of implementations.
If you really want to squeeze this down, there is a dumb/simple approach for your case (show top 1%).
This optimization can happen because on average, only 1 out of 100 popularity changes will knock out one of the top 1%. (Assumes a random distribution of updates. Of course with a more typical power-law distribution, this could happen much more frequently.)
Sort the entire initial collection,
Store only the top 10 in any sorted data structure (e.g. BST)
Store the popularity score of #10 (e.g. minVisiblePopularity)
Then, with each subsequent popularity change in the collection, compare the new value with minVisiblePopularity.
If the new popularity is above minVisiblePopularity, update the top-10 structure and minVisiblePopularity accordingly.
(Do the same if the old popularity was above the threshold but the new popularity is below it - e.g. a former top-10 item getting knocked out.)
This adds a minimal storage requirement of an extremely small binary search tree (10 items) and a primitive variable. The tree then only requires updating when a popularity change knocks out one of the previous top-10.
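A minimal sketch of that bookkeeping (the class and method names are made up for the example):

class TopTracker {
  // Sorted ascending by popularity; index 0 is the least popular tracked item.
  private top: { id: string; popularity: number }[] = [];

  constructor(private readonly k: number = 10) {}

  // The popularity of item #k, i.e. minVisiblePopularity.
  private get minVisiblePopularity(): number {
    return this.top.length < this.k ? -Infinity : this.top[0].popularity;
  }

  onPopularityChange(id: string, popularity: number): void {
    const idx = this.top.findIndex((e) => e.id === id);
    if (idx >= 0) {
      // Already tracked: just update its score.
      // Caveat: if it drops below an untracked item, only a rescan of the full
      // collection can promote the correct replacement.
      this.top[idx].popularity = popularity;
    } else if (popularity > this.minVisiblePopularity) {
      // New entrant knocks out the current #k item (if the tracker is full).
      this.top.push({ id, popularity });
    } else {
      return; // below the visible threshold: nothing to do
    }
    this.top.sort((a, b) => a.popularity - b.popularity);
    if (this.top.length > this.k) this.top.shift();
  }

  topItems(): string[] {
    return [...this.top].reverse().map((e) => e.id); // most popular first
  }
}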
Self-implemented:
Maintain a structure ordered by popularity, plus a hash table mapping each item to its corresponding node in the popularity tree.
The last 10 entries are then the most popular items, and accessing them is O(M), where M is the number of items to show.
To maintain the ordered structure:
It can be maintained with a self-balancing binary tree at O(log N) cost per update, where N is the total number of elements.
http://www.sitepoint.com/data-structures-2/
As a practical option:
A database can be used to store the items, with a B-tree index on the popularity column; the DBMS will have the required optimizations here.
https://en.wikipedia.org/wiki/Database_index

Grid data structure

Usually ‘expandable’ grids are represented as a list of lists (a list of rows, where each row holds a list of cells), and those lists are some kind of linked list.
Manipulating (removing, inserting) rows in this data structure is easy and inexpensive; it's just a matter of re-linking the neighbouring nodes. But when it comes to columns, removing a column for instance becomes a very long operation: I need to loop over all the rows to remove the cell at that index from each one. Clearly this isn't good behaviour, at least for my case.
I'm not talking about databases here; a good analogy I've found is a text file in a text editor. As far as I know, text editors mostly split the text into lines (rows), so removing a line is easy. I want removing a column to be as inexpensive and efficient as removing a row.
Finally, what I actually need is a multi-dimensional grid, but I assume any simple 2D grid approach would generalize to more dimensions. Am I right?
You could have a two dimensional "linked matrix" (I forget the proper terminology):
...  Col 3  ...  Col 4  ...
       |           |
...  --X--  ...  --Y--  ...
       |           |
...   ...   ...   ...   ...
Each cell has four neighbours, as shown. Additionally you need row and column headers that might indicate the row/column position, as well as pointing to the first cell in each row or column. These are most easily represented as special cells without an up neighbour (for column headers).
Inserting a new column between 3 and 4 means iterating down the cells X in col 3 and inserting a new right neighbour Z for each. Each new cell Z links leftward to X and rightward to Y. You also need to add a new column header and link the new cells vertically. Then the positions of all the columns from the old column 4 onward are renumbered (the old col 4 becomes col 5).
...  Col 3   Col 4   Col 5  ...
       |       |       |
...  --X-------Z-------Y--  ...
       |       |       |
...   ...     ...     ...   ...
The cost of inserting a column is O(n) for inserting and linking the new cells (n being the number of rows), plus O(m) for renumbering the column headers to the right (m being the number of columns). Deletion is a similar process.
Because each cell is just four links, the same algorithms are used for row insertion/deletion.
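For concreteness, a minimal sketch of such a cell and of the column-insertion walk described above (type and function names are illustrative):

interface Cell<T> {
  value: T | null;
  up: Cell<T> | null;
  down: Cell<T> | null;
  left: Cell<T> | null;
  right: Cell<T> | null;
}

// Inserts a new, empty column immediately to the right of the column whose
// header cell is `header`; walking `down` from a header visits its column.
function insertColumnAfter<T>(header: Cell<T>): Cell<T> {
  const newHeader: Cell<T> = {
    value: null, up: null, down: null,
    left: header, right: header.right,
  };
  if (header.right) header.right.left = newHeader;
  header.right = newHeader;

  let x = header.down;         // first data cell X of the existing column
  let above: Cell<T> = newHeader;
  while (x) {
    const z: Cell<T> = {       // new cell Z goes between X and its neighbour Y
      value: null, up: above, down: null,
      left: x, right: x.right,
    };
    if (x.right) x.right.left = z;
    x.right = z;
    above.down = z;
    above = z;
    x = x.down;
  }
  return newHeader;
}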
Keep your existing data structure as is. In addition, give each column a unique id when it is created. When you delete the column, just add its id to a hash table of all deleted column ids. Every time you walk a row, check each element's column id (which needs to be stored along with all other data for an element) against the hash table and splice it out of the row if it has been deleted.
The hash table and ids are unnecessary if you have a per-column data structure that each grid element can point to. Then you just need a deleted bit in that data structure.
By the way, Edmund's scheme would be fine for you as well. Even though it takes O(n) time to delete a row or column of length n, you can presumably amortize that cost against the cost of creating those n elements, making the delete O(1) amortized time.
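A minimal sketch of that lazy row walk, assuming doubly linked rows (names are illustrative; if the first cell of a row is spliced out, the row's own head pointer would also need updating, which is omitted here):

interface RowCell<T> {
  columnId: number; // unique id assigned when the column was created
  value: T;
  prev: RowCell<T> | null;
  next: RowCell<T> | null;
}

// Walks one row, splicing out cells whose column id is in deletedColumns and
// yielding the surviving values.
function* walkRow<T>(
  head: RowCell<T> | null,
  deletedColumns: Set<number>,
): Generator<T> {
  let cell = head;
  while (cell) {
    const next = cell.next;
    if (deletedColumns.has(cell.columnId)) {
      if (cell.prev) cell.prev.next = cell.next;
      if (cell.next) cell.next.prev = cell.prev;
    } else {
      yield cell.value;
    }
    cell = next;
  }
}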
I know that "Linked-Lists" are usually appreciated from a theoretical point of view, but in practice they are generally inefficient.
I would suggest moving toward Random Access Containers to get some speed. The most simple would be an array, but a double-ended queue or an indexed skip list / B* tree could be better, depending on the data size we are talking about.
Conceptually, it doesn't change much (yet), however you get the ability to move to a given index in O(1) (array, deque) / O(log N) (skip list / B* tree) operations, rather than O(N) with a simple linked-list.
And then it's time for magic.
Keith has already laid out the basic idea: rather than actually deleting the column, you just mark it as deleted and then skip over it when you walk the structure. However, a hash table still forces a linear walk to get to the Nth live column. Using a Fenwick tree yields an efficient way to compute the real index, and you can then jump directly there.
Note that a key benefit of marking a column as deleted, rather than removing it, is the obvious possibility of an undo operation.
Also note that you might want to build a compacting function, to eliminate the deleted columns from time to time, and not let them accumulate.
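A minimal sketch of the Fenwick-tree bookkeeping, assuming the columns live in a random-access container and are only marked as deleted (names are illustrative):

class ColumnIndex {
  private bit: number[]; // 1-based Fenwick tree over per-column "alive" flags

  constructor(private size: number) {
    this.bit = new Array(size + 1).fill(0);
    for (let i = 0; i < size; i++) this.add(i, 1); // every column starts alive
  }

  private add(physical: number, delta: number): void {
    for (let i = physical + 1; i <= this.size; i += i & -i) this.bit[i] += delta;
  }

  markDeleted(physical: number): void {
    this.add(physical, -1);
  }

  // Physical index of the k-th (0-based) surviving column: the smallest
  // position whose prefix count of alive columns reaches k + 1.
  physicalIndex(k: number): number {
    let pos = 0;
    let remaining = k + 1;
    for (let step = 1 << Math.floor(Math.log2(this.size)); step > 0; step >>= 1) {
      if (pos + step <= this.size && this.bit[pos + step] < remaining) {
        pos += step;
        remaining -= this.bit[pos];
      }
    }
    return pos; // 0-based physical index
  }
}

A compacting pass can then rebuild both the container and the tree from scratch whenever the fraction of deleted columns grows too large.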

Resources