sum of costs from another table by current ID - laravel

I have 4 tables:
"Cars" table, where every car has an ID.
"Operations" table that holds the operations have been done on a car.
| ID | CarID | Operation | User | JournalID |
| --- | ----- | --------- | ---- | --------- |
"Transactions" table that records the costs of the operations and other daily expenses, where every operation has 2 transactions, one is > 0 and the other is < 0 (for example: +100 and -100):
| ID | Account | JournalID | Amount | Date |
| --- | ------- | --------- | ------ | ---- |
"Journal" table that records the daily finance:
| ID | Amount | Date |
| --- | ------ | ---- |
What I want is knowing the sum of operations costs amount of a specific car, I was looping through all operations of that car and then looping for every journal row to sum, which lead to a bad result of course.
What can I do in that case to get the result as fast as possible?
NOTE: ALL THE NAMES OF THE COLUMNS ARE IN LOWER CASE

You need to join three tables based on the foreign keys like this and with some select and raw query you can get solution to your problem.
DB::table('cars')->join('operations','cars.id','operations.car_id')
->join('journal','journal.id','operations.journal_id')
->select(DB::raw('SUM(amount) as total_cost'),'cars.*')
->groupBy('cars.id')
->get();

I ended up solving it with #Segar's answer modified:
$result = DB::table('cars')
->join('operations','cars.id','operations.car_id')
->join('journal','journal.id','operations.journal')
->join('transactions','transactions.journal','operations.journal')->where('transactions.type', 0)
->select(DB::raw('SUM(amount) as total_cost'),'cars.*')
->groupBy('cars.id')
->get();
print_r($result);
Thanks.

Related

See number of live/dead tuples in MonetDB

I'm trying to get some precise row counts for all tables, given that some have deleted rows. I have been using sys.storage.count. But this seems to count the deleted ones also.
I assume using sys.storage would be simpler and faster than looping through count(*) queries, though both strategies may be fine in practice.
Maybe there is some column that counts modifications so I could just subtract the two counts?
If all you need to know is the number of actual rows in a table, I'd recommend just using a count(*) query. It's very fast. Even if you have N tables, it's easy to do a count(*) for each table.
sys.storage gives you information from the raw storage. With that, you can get pretty low-level information, but it has some edges. sys.storage.count returns the count in the storage, hence, indeed, it includes the delete rows since they are not actually deleted. As of Jul2021 version of MonetDB, deleted rows are automatically overwritten by new inserts (i.e. auto-vacuuming). So, to get the actual row count, you need to look up the 'deletes' from sys.deltas('<schema>', '<table>'). For instance:
sql>create table tbl (id int, city string);
operation successful
sql>insert into tbl values (1, 'London'), (2, 'Paris'), (3, 'Barcelona');
3 affected rows
sql>select * from tbl;
+------+-----------+
| id | city |
+======+===========+
| 1 | London |
| 2 | Paris |
| 3 | Barcelona |
+------+-----------+
3 tuples
sql>select schema, table, column, count from sys.storage where table='tbl';
+--------+-------+--------+-------+
| schema | table | column | count |
+========+=======+========+=======+
| sys | tbl | city | 3 |
| sys | tbl | id | 3 |
+--------+-------+--------+-------+
2 tuples
sql>select id, deletes from sys.deltas ('sys', 'tbl');
+-------+---------+
| id | deletes |
+=======+=========+
| 15569 | 0 |
| 15570 | 0 |
+-------+---------+
2 tuples
After we delete one row, the actual row count is sys.storage.count - sys.deltas ('sys', 'tbl').deletes:
sql>delete from tbl where id = 2;
1 affected row
sql>select * from tbl;
+------+-----------+
| id | city |
+======+===========+
| 1 | London |
| 3 | Barcelona |
+------+-----------+
2 tuples
sql>select schema, table, column, count from sys.storage where table='tbl';
+--------+-------+--------+-------+
| schema | table | column | count |
+========+=======+========+=======+
| sys | tbl | city | 3 |
| sys | tbl | id | 3 |
+--------+-------+--------+-------+
2 tuples
sql>select id, deletes from sys.deltas ('sys', 'tbl');
+-------+---------+
| id | deletes |
+=======+=========+
| 15569 | 1 |
| 15570 | 1 |
+-------+---------+
2 tuples
After we insert a new row, the deleted row is overwritten:
sql>insert into tbl values (4, 'Praag');
1 affected row
sql>select * from tbl;
+------+-----------+
| id | city |
+======+===========+
| 1 | London |
| 4 | Praag |
| 3 | Barcelona |
+------+-----------+
3 tuples
sql>select schema, table, column, count from sys.storage where table='tbl';
+--------+-------+--------+-------+
| schema | table | column | count |
+========+=======+========+=======+
| sys | tbl | city | 3 |
| sys | tbl | id | 3 |
+--------+-------+--------+-------+
2 tuples
sql>select id, deletes from sys.deltas ('sys', 'tbl');
+-------+---------+
| id | deletes |
+=======+=========+
| 15569 | 0 |
| 15570 | 0 |
+-------+---------+
2 tuples
So, the formula to compute the actual row count (sys.storage.count - sys.deltas ('sys', 'tbl').deletes) is generally applicable. sys.deltas() keeps stats for every column of a table, but the count and deletes are table wide, so you only need to check one column.

In SCD Type2 how to find latest record

I have table i have run the job in scdtype 2 load the data below
no | name | loc |
-----------------
1 | abc | hyd |
-----------------
2 | def | bang |
-----------------
3 | ghi | chennai |
then i have run the second run load the data given below
no | name | loc |
-----------------
1 | abc | hyd |
-----------------
2 | def | bang |
-----------------
3 | ghi | chennai |
--------------------
1 | abc | bang |
here no dates,flags,and run ids
how to find second updated record in this situtation
Thanks
I don't think you'll be able to distinguish between the updated record and the original record.
A Dimension table using Type 2 SCD requires additional columns that describes the period in which the record is valid (or current), exactly for this reason.
The solution is to ensure your dimension table has these columns (Typically ValidFrom and ValidTo dates or date/times, and sometimes an IsCurrent flag for good measure). Your ETL process would then populate these columns as part of making the Type 2 updates.

How to arrange multi partitions in hive?

say i have a order table, which contains multi time column(spend_time,expire_time,withdraw_time),
usually,i will query the table with the above column independently,so how do i create the partitions?
order_no | spend_time | expire_time | withdraw_time | spend_amount
A001 | 2017/5/1 | 2017/6/1 | 2017/6/2 | 100
A002 | 2017/4/1 | 2017/4/19 | 2017/4/25 | 500
A003 | 2017/3/1 | 2017/3/19 | 2017/3/25 | 1000
Usually the business situation is to calculate total spend_amount between certain spend_time or expire_time or withdraw_time, or the combination of the 3.
But with 3 time dimensions cross combination(each has about 1000 partitions) can be a lot of partitions(1000*1000*1000),is that ok and efficient?
my solution is that i create 3 tables with 3 different columns.Is this a efficient way to solve this problem?

Automatically generating documentation about the structure of the database

There is a database that contains several views and tables.
I need create a report (documentation of database) with a list of all the fields in these tables indicating the type and, if possible, an indication of the minimum/maximum values and values from first row. For example:
.------------.--------.--------.--------------.--------------.--------------.
| Table name | Column | Type | MinValue | MaxValue | FirstRow |
:------------+--------+--------+--------------+--------------+--------------:
| Table1 | day | date | ‘2010-09-17’ | ‘2016-12-10’ | ‘2016-12-10’ |
:------------+--------+--------+--------------+--------------+--------------:
| Table1 | price | double | 1030.8 | 29485.7 | 6023.8 |
:------------+--------+--------+--------------+--------------+--------------:
| … | | | | | |
:------------+--------+--------+--------------+--------------+--------------:
| TableN | day | date | ‘2014-06-20’ | ‘2016-11-28’ | ‘2016-11-16’ |
:------------+--------+--------+--------------+--------------+--------------:
| TableN | owner | string | NULL | NULL | ‘Joe’ |
'------------'--------'--------'--------------'--------------'--------------'
I think the execution of many queries
SELECT MAX(column_name) as max_value, MIN(column_name) as min_value
FROM table_name
Will be ineffective on the huge tables that are stored in Hadoop.
After reading documentation found an article about "Statistics in Hive"
It seems I must use request like this:
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;
But this command ended with error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
Do I understand correctly that this request add information to the description of the table and not display the result? Will this request work with view?
Please suggest how to effectively and automatically create documentation for the database in HIVE?

LISTAGG function with two columns

I have one table like this (report)
--------------------------------------------------
| user_id | Department | Position | Record_id |
--------------------------------------------------
| 1 | Science | Professor | 1001 |
| 1 | Maths | | 1002 |
| 1 | History | Teacher | 1003 |
| 2 | Science | Professor | 1004 |
| 2 | Chemistry | Assistant | 1005 |
--------------------------------------------------
I'd like to have the following result
---------------------------------------------------------
| user_id | Department+Position |
---------------------------------------------------------
| 1 | Science,Professor;Maths, ; History,Teacher |
| 2 | Science, Professor; Chemistry, Assistant |
---------------------------------------------------------
That means I need to preserve the empty space as ' ' as you can see in the result table.
Now I know how to use LISTAGG function but only for one column. However, I can't exactly figure out how can I do for two columns at the sametime. Here is my query:
SELECT user_id, LISTAGG(department, ';') WITHIN GROUP (ORDER BY record_id)
FROM report
Thanks in advance :-)
It just requires judicious use of concatenation within the aggregation:
select user_id
, listagg(department || ',' || coalesce(position, ' '), '; ')
within group ( order by record_id )
from report
group by user_id
i.e. aggregate the concatentation of department with a comma and position and replace position with a space if it is NULL.

Resources