How to Sort Data in a Collection in Laravel

I have the following relations
The Transaction table is the parent table for ETransaction and ATransaction; both belong to the transactions table.
$transactionA= Transaction::with('tAccount')->has('tAccount')->get();
$transactionE= Transaction::with('tExchange')->has('tExchange')->get();
$collection = collect([$transactionE,$transactionA]);
$sorted = $collection->sortBy('created_at') does not work for me.

The main problem you encounter isn't necessarily about sorting, but about building up the initial collection.
Right now, you make two collections, $transactionA and $transactionE. You then initialize a third collection, $collection, that contains both $transactionA and $transactionE.
You can think of this as transactionA being a box of records, and transactionE being another box of records. What you want is one big box of records, but the current code puts both boxes in another, bigger box. When sorting the bigger box, all you sort is the order in which the smaller boxes end up in the bigger box.
What you presumably want instead is to merge the contents of both boxes and sort the combined set. You can do so by merging the two collections:
$collection = $transactionA->merge($transactionE);
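Once merged, the original sortBy call works on the combined records. A minimal sketch using the variable names from the question (values() simply re-indexes the sorted result):
// One flat collection containing both result sets, sorted by created_at
$collection = $transactionA->merge($transactionE);
$sorted = $collection->sortBy('created_at')->values();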

I'm not sure why you even need separate queries. Stratadox's answer shows how to query, merge and sort, but you can do that in a single query using Eloquent:
$collection = Transaction::with(["tAccount", "tExchange"])
->has("tAccount")
->orHas("tExchange")
->orderBy("created_at")
->get();
In a single query, this will look for all Transaction records that have either a tAccount or a tExchange record associated, sort them by the created_at timestamp, and return them in a single call. Pushing the logic to the Collection class can be inefficient, so let the database handle it when possible.

Related

Eloquent ORM get latest items query

I am trying to get all the latest items, sorted by id descending (i.e. get all items that were just added, with a limit and offset).
So I did this:
$products = Product::all()
->slice($request->get('offset'))
->take($request->get('limit'))
->sortByDesc('id')
->toBase();
However, it seems when I have more than that limit, I don't have the right order. It gets me, say, 10 products, but not sorted correctly. Any idea how to do this with Eloquent ORM?
You probably intend to have the database handle the offset, limit, and ordering instead of pulling all the possible records, taking only what you want, and then sorting them. If you were going to do it your way, you would need to sort before you skip and take, by the way.
Using the database to do the filtering and ordering:
$products = Product::skip($request->input('offset'))
->take($request->input('limit'))
->orderBy('id', 'desc')
->get();
I think the issue is that you're using ::all() first, which returns all Product instances in a Collection, and then using collection methods. Since these methods act in the order they are called, you're slicing and taking before sorting, so you're sorting only the slice you already took, not the full set. Use proper Builder syntax to handle this correctly and more efficiently:
$products = Product::offset($request->input("offset"))
->limit($request->input("limit"))
->orderBy("id", "DESC")
->get();
Because this is a Builder instance, the query will be compiled and executed according to your database's grammar, in a single query. There's nothing wrong with using Collection logic; you'd simply have to use the correct order of methods (sortByDesc() first, then slice(), then take()), but this is incredibly inefficient as you have to load every Product in your database.
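For reference, a minimal sketch of the Collection-only approach with the methods in a working order (still inefficient, since every product is loaded first):
// Sort the full collection first, then slice and take; values() re-indexes the keys
$products = Product::all()
    ->sortByDesc('id')
    ->slice($request->input('offset'))
    ->take($request->input('limit'))
    ->values();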

Sorting Issue After Table Render in Laravel DataTables as a Service Implementation

I have implemented Laravel DataTables as a service.
The first two columns are actual IDs and names, so I am able to sort them asc/desc after the table renders.
But the next few columns render after performing a few calculations, i.e. these values are not fetched directly from any column; rather they are computed.
I am unable to sort these columns where calculations were performed, and I get this error. I know it is looking for that particular column, for example outstanding_amount, which I don't have in the DB; rather it is an amount calculated from two or more columns in other tables.
Any Suggestions on how to overcome this issue?
It looks like you're trying to sort by values that aren't columns but calculated values.
So the main issue here is to give Eloquent/MySQL the data it needs to do the sorting.
// You might need to do some joins first
->addSelect(DB::raw('your_calc as outstanding_amount'))
->orderBy('outstanding_amount') // asc can be omitted as this is the default
// Alternative: you don't need the calculated value in the select list to sort by it
// Don't forget any joins you might need
->orderByRaw('your_calc_for_outstanding_amount ASC')
For SQL aggregate functions it works as follows:
->addSelect(DB::raw('COUNT(products.id) as product_count'))
->orderByRaw('COUNT(products.id) DESC');
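Putting it together, a hedged sketch of what a sortable computed column could look like, assuming a hypothetical customers/invoices schema with total and paid columns (the model, table, and column names are made up for illustration):
use Illuminate\Support\Facades\DB;

// outstanding_amount is computed from joined invoice columns, then used for ordering
$query = Customer::query()
    ->leftJoin('invoices', 'invoices.customer_id', '=', 'customers.id')
    ->select('customers.*')
    ->addSelect(DB::raw('SUM(invoices.total - invoices.paid) AS outstanding_amount'))
    ->groupBy('customers.id')
    ->orderBy('outstanding_amount', 'desc');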

Random exhaustive (non-repeating) selection from a large pool of entries

Suppose I have a large (300-500k) collection of text documents stored in the relational database. Each document can belong to one or more (up to six) categories. I need users to be able to randomly select documents in a specific category so that a single entity is never repeated, much like how StumbleUpon works.
I don't really see a way to implement this using slow NOT IN queries with a large number of users and documents, so I figured I might need to implement some custom data structure for this purpose. Perhaps there is already a paper describing an algorithm that can be adapted to my needs?
Currently I'm considering the following approach:
Read all the entries from the database
Create a linked-list-based index for each category from the IDs of the documents belonging to that category, and shuffle it.
Create a Bloom filter containing all of the entries viewed by a particular user.
Traverse the index using the iterator, using the Bloom filter to skip items the user has already viewed.
If you track via a table which entries the user has seen, try this. I'm going to use MySQL because that's the quickest example I can think of, but the gist should be clear.
On a link being 'used'...
insert into viewed (userid, url_id) values ("jj", 123)
On looking for a link...
select p.url_id
from pages p left join viewed v on v.url_id = p.url_id
where v.url_id is null
order by rand()
limit 1
This causes the database to do a one-for-one join, and you're limiting your query to return only one entry that the user has not seen yet.
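If you happen to be doing this from Laravel, as in the earlier questions on this page, a rough query-builder sketch of the same anti-join (table and column names as in the SQL above):
// One random page the user has not viewed yet
$urlId = DB::table('pages as p')
    ->leftJoin('viewed as v', 'v.url_id', '=', 'p.url_id')
    ->whereNull('v.url_id')
    ->inRandomOrder()
    ->value('p.url_id');
// In practice you would also constrain the viewed join to the current user's rows.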
Just a suggestion.
Edit: It is possible to make this one operation, but there's no guarantee that the URL will be passed successfully to the user.
It depends on how the users get their random entries.
Option 1:
A user pages through some entities and stops after a couple of them. For example, the user sees the current random entity, moves on to the next one, reads it, repeats this a couple of times, and that's it.
The next time this user (or another) requests an entity from this category, the record of already viewed entities is cleared, and you may return an entity that was viewed before.
In that case I would recommend keeping a (hash) set of the IDs of already viewed entities; every time the user asks for a random entity, choose one randomly from the DB and check that it is not already in the set.
Because the set is so small and your data is so big, the chance of drawing an already viewed ID is so small that this takes O(1) time most of the time.
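A minimal PHP sketch of that optimistic loop (fetchRandomDocumentId() and loadViewedIdsForUser() are hypothetical placeholders):
// $viewed: small set of document ids this user has already seen, keyed by id
$viewed = loadViewedIdsForUser($userId);      // hypothetical helper

do {
    // Hypothetical helper: returns one random document id for the category from the DB
    $candidateId = fetchRandomDocumentId($categoryId);
} while (isset($viewed[$candidateId]));       // a collision is rare, so this almost never loops

$viewed[$candidateId] = true;                 // remember it for next time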
Option 2:
A user pages through the entities, and the viewed entities are saved across all users and across every visit to your page.
In that case you will probably exhaust all the entities in each category, and saving all the viewed entities plus checking whether an entity has been viewed will take some time.
In that case I would fetch all the IDs for this topic, shuffle them, and store them in a linked list. When you want a random, not yet viewed entity, just take the head of the list and delete it (O(1)).
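A sketch of that idea in PHP, using a shuffled array in place of a linked list (taking from the end keeps removal O(1)):
// $ids: all document ids for the category, fetched once (placeholder variable)
shuffle($ids);                 // randomize the order up front

// On each request, serve the next unseen document; persist $ids between requests
$nextId = array_pop($ids);     // O(1) removal from the shuffled pool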
I assume that for any given <user, category> pair, the number of documents viewed is pretty small relative to the total number of documents available in that category.
So can you just store indexed triples <user, category, document> indicating which documents have been viewed, and then just take an optimistic approach with respect to randomly selected documents? In the vast majority of cases, the randomly selected document will be unread by the user. And you can check quickly because the triples are indexed.
I would opt for a pseudorandom approach:
1.) Determine the number of elements in the category to be viewed (SELECT COUNT(*) WHERE ...).
2.) Pick a random number in the range 1 ... count.
3.) Select a single document (SELECT * FROM ... WHERE [same as when counting] ORDER BY [generate stable order]). Depending on the SQL dialect in use, there are different clauses for retrieving only the part of the result set you want (MySQL's LIMIT clause, SQL Server's TOP clause, etc.).
If the number of documents is large, the chance of serving the same user the same document twice is negligibly small. Using the scheme described above you don't have to store any state information at all.
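A sketch of those three steps with Laravel's query builder, purely for illustration (the documents table and category_id column are assumptions):
use Illuminate\Support\Facades\DB;

// 1) Count the matching documents (assumes at least one match)
$count = DB::table('documents')->where('category_id', $categoryId)->count();

// 2) Pick a random zero-based offset within the result set
$offset = random_int(0, $count - 1);

// 3) Fetch exactly one document at that offset, using a stable order
$document = DB::table('documents')
    ->where('category_id', $categoryId)
    ->orderBy('id')
    ->offset($offset)
    ->limit(1)
    ->first();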
You may want to consider a NoSQL solution like Apache Cassandra. These seem ideally suited to your needs. There are many ways to design the algorithm you need in an environment where you can easily add new columns to a table (column family) on the fly, with excellent support for a very sparsely populated table.
Edit: one of many possible solutions below:
Create a CF (column family, i.e. table) for each category (creating these on the fly is quite easy).
Add a row to each category CF for each document belonging to that category.
Whenever a user hits a document, you add a column named after that user to the document's row and set it to true. Obviously this table will be huge, with millions of columns, and probably quite sparsely populated, but that's no problem; reading it is still constant time.
Now finding a new document for a user in a category is simply a matter of selecting any row where that user's column is null.
You should get constant-time writes and reads, amazing scalability, etc., if you can accept Cassandra's "eventually consistent" model (i.e. it is not mission critical that a user never gets a duplicate document).
I've solved a similar problem in the past by indexing the relational database into a document-oriented form using Apache Lucene. This was before the recent rise of NoSQL servers and is basically the same thing, but it's still a valid alternative approach.
You would create a Lucene Document for each of your texts with a textId (relational database id) field and multi-valued categoryId and userId fields. Populate the categoryId field appropriately. When a user reads a text, add their id to the userId field. A simple query will return the set of documents with a given categoryId and without a given userId; pick one randomly and display it.
Store a user's past X selections in a cookie or something.
Return the last selections to the server with the user's new criteria.
Randomly choose one of the texts satisfying the criteria until it is not a member of the last X selections of the user.
Return this choice of text and update the list of last X selections.
I would experiment to find the best value of X but I have in mind something like an X of say 16?

How do I sort, group a query properly that returns a tuple of an orm object and a custom column?

I am looking for a way to have a query that returns a tuple first sorted by a column, then grouped by another (in that order). Simply .sort_by().group_by() didn't appear to work. Now I tried the following, which made the return value go wrong (I just got the orm object, not the initial tuple), but read for yourself in detail:
Base scenario:
There is a query which queries for test orm objects linked from the test3 table through foreign keys.
This query also returns a column named linked that either contains true or false. It is originally ungrouped.
my_query = session.query(test_orm_object)
... lots of stuff like joining various things ...
add_column(..condition that either puts 'true' or 'false' into the column..)
So the original return value is a tuple (the orm object, and additionally the true/false column).
Now this query should be grouped for the test orm objects (so the test.id column), but before that, sorted by the linked column so entries with true are preferred during the grouping.
Assuming the current unsorted, ungrouped query is stored in my_query, my approach to achieve this was this:
# Get a sorted subquery
tmpquery = my_query.order_by(desc('linked')).subquery()
# Read the column out of the sub query
my_query = session.query(tmpquery).add_columns(getattr(tmpquery.c,'linked').label('linked'))
my_query = my_query.group_by(getattr(tmpquery.c, 'id')) # Group objects
The resulting SQL query when running this is as follows (it looks fine to me, by the way: the inner subquery 'anon_1' is itself properly sorted, then fetched, and its id as well as the 'linked' column are extracted (amongst a few other columns SQLAlchemy apparently wants), and the result is properly grouped):
SELECT anon_1.id AS anon_1_id, anon_1.name AS anon_1_name, anon_1.fk_test3 AS anon_1_fk_test3, anon_1.linked AS anon_1_linked, anon_1.linked AS linked
FROM (
SELECT test.id AS id, test.name AS name, test.fk_test3 AS fk_test3, CASE WHEN (anon_2.id = 87799534) THEN 'true' ELSE 'false' END AS linked
FROM test LEFT OUTER JOIN (SELECT test3.id AS id, test3.fk_testvalue AS fk_testvalue
FROM test3)
AS anon_2 ON anon_2.fk_testvalue = test.id ORDER BY linked DESC
)
AS anon_1 GROUP BY anon_1.id
I tested it in phpMyAdmin, where it gave me, as expected, the id column (for the ORM object id), then the additional columns SQLAlchemy seems to want there, and the linked column. So far, so good.
Now my expected return values would be, as they were from the original unsorted, ungrouped query:
A tuple: 'test' orm object (anon_1.id column), 'true'/'false' value (linked column)
The actual return value of the new sorted/grouped query is, however (the original query DOES indeed return a tuple before the code above is applied):
'test' orm object only
Why is that so and how can I fix it?
Excuse me if that approach turns out to be somewhat flawed.
What I actually want is, have the original query simply sorted, then grouped without touching the return values. As you can see above, my attempt was to 'restore' the additional return value again, but that didn't work. What should I do instead, if this approach is fundamentally wrong?
Explanation for the subquery use:
The point of the whole subquery is to force SQLAlchemy to execute this query separately as a first step.
I want to order the results first, and then group the ordered results. That seems to be hard to do properly in one step (when trying manually with SQL I had issues combining order and group by in one step as I wanted).
Therefore I don't simply order, group, but I order first, then subquery it to enforce that the order step is actually completed first, and then I group it.
Judging from manual phpMyAdmin tests with the generated SQL, this seems to work fine. The actual problem is that the original query (which is now wrapped as the subquery you were confused about) had an added column, and now, by wrapping it up as a subquery, that column is gone from the overall result. My attempt to re-add it to the outer wrapping failed.
It would be much better if you provided examples. I don't know if these columns are in separate tables or what not. Just looking at your first paragraph, I would do something like this:
a = session.query(Table1, Table2.column).\
    join(Table2, Table1.foreign_key == Table2.id).\
    filter(...).group_by(Table2.id).order_by(Table1.property.desc()).all()
I don't know exactly what you're trying to do since I need to look at your actual model, but it should look something like this with maybe the tables/objs flipped around or more filters.

Ultragrid: how to best add a set of sub rows programmatically?

I have an Infragistics Ultragrid that is being used to display a list of attributes. Sometimes the attribute is an array so I am adding a sub row for each element so the user can optionally expand the row showing the array attribute and see all the element values.
So for each element I use:
var addedRow = mGrid.DisplayLayout.Bands[1].AddNew();
which, if I have 300 elements, gets called 300 times and takes around 9 seconds (I have profiled the application and this call is taking 98% of the elapsed time).
Is there a way to add these sub rows more efficiently?
I know I'm late with an answer, but hopefully someone can use it anyway. Whenever I need to set rows and sub rows for an UltraGrid, I simply set the data source by using LINQ and anonymous types to generate the proper collection.
Say you have a list of persons (Id, Name) and a list of cars (Id, CarName, and OwnerId (personId)).
Now you would like to show a grid view of all persons, with an expandable sub row listing which cars they own. Simply do the following:
List<Person> persons = GetAllPersons();
List<Car> cars = GetAllCars();
grid.DataSource = persons.Select(x => new {x.Id, x.Name, Cars = cars.Where(z => z.OwnerId == x.Id).ToList()}).ToList();
Note the anonymous type I create: this will generate a list of objects having an Id, a Name, and a collection of cars. Also note that I call the ToList method twice in the last line; this is necessary in order to get the UltraGrid to bind properly.
Note furthermore that if you need to edit the grid, the above method might not be sufficient, as the UltraGrid needs an underlying data source for modifying, and I don't believe this approach will cope with that. But on the internet you'll find extensions that can copy a LINQ collection into a DataTable; do that and you should also be able to edit the grid.
I have often used the above method and it performs extremely well, even for huge collections.
Hope this helps somebody
You might want to use ultraGrid1.BeginUpdate() and ultraGrid1.EndUpdate(true) to stop the screen from repainting; that made a huge performance difference for my app.
Also, in my case I was populating more than 10,000 rows, so I used an UltraDataSource.
