Power Query - Repeat rows for each month

I'm a beginner in Power Query and I have some data that I want to repeat for each end of month to show a running total.
I can get the running total every time there's a transaction, but how do I show a running total by month even if there are no transactions in that month?

You can merge the original table with a date table. The running total works the same way if you use amounts of 0 on dates where there was no data (find and replace the nulls with 0), so you should be golden from there.
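A minimal sketch of that approach in M, assuming a Transactions query with MonthEnd and Amount columns and a MonthEnds date table listing every month-end date (all of these names are illustrative, not from the original question):

let
    // Illustrative sources - point these at your own transactions and date tables
    Source = Excel.CurrentWorkbook(){[Name = "Transactions"]}[Content],
    Dates  = Excel.CurrentWorkbook(){[Name = "MonthEnds"]}[Content],

    // Left-join the date table to the transactions so every month-end appears,
    // even when there were no transactions that month
    Merged   = Table.NestedJoin(Dates, {"MonthEnd"}, Source, {"MonthEnd"}, "Txn", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Txn", {"Amount"}),

    // Months without transactions come through as null - replace them with 0
    Zeroed = Table.ReplaceValue(Expanded, null, 0, Replacer.ReplaceValue, {"Amount"}),

    // Running total: sum every amount up to and including the current row
    Sorted  = Table.Sort(Zeroed, {{"MonthEnd", Order.Ascending}}),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 1, 1),
    Running = Table.AddColumn(Indexed, "Running Total",
        each List.Sum(List.FirstN(Indexed[Amount], [Index])), type number)
in
    Running

If the transactions are dated by day rather than by month-end, add a step that rolls each transaction date up to Date.EndOfMonth before the merge.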

Related

Truncating a table with many subpartitions takes too long

We have a job that loads some tables every night from our source db to target db, and many of them are partitioned by range or list. Before loading a table we truncate it first, and for some reason this process is taking too long for particular tables.
For instance, TABLE A has 62 million rows and has been partitioned by list (column BRANCH_CODE). The number of partitions is 213. Truncating this table took 20 seconds.
TABLE B has 17 million rows and has been range partitioned by the DAY column with a monthly interval; every partition has 213 subpartitions by list (column BRANCH_CODE). So in this case the number of partitions is 60 and the number of subpartitions is 12,780. Truncating this table took 15 minutes.
Is the reason for the long truncate process too many partitions? Or have we missed some table specs, or should we set specific storage parameters for the table?
Manually gathering fixed object and data dictionary statistics may improve the performance of metadata queries needed to support truncating 12,780 objects:
begin
dbms_stats.gather_fixed_objects_stats;
dbms_stats.gather_dictionary_stats;
end;
/
The above command may take many minutes to complete, but you generally only need to run it once after a significant change to the number of objects in your system. Adding 12,780 subpartitions can cause weird issues like this. (While you're investigating these issues, you might also want to check the space overhead associated with so many subpartitions. It's easy to waste many gigabytes of space when creating so many partitions.)
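If you want to check that overhead, a rough sketch of the kind of query to run (assuming access to DBA_SEGMENTS; TABLE_B stands in for the heavily subpartitioned table from the question):

-- How many segments the subpartitioned table owns and how much space they use
select segment_type,
       count(*)                    as segments,
       round(sum(bytes)/1024/1024) as size_mb
from   dba_segments
where  segment_name = 'TABLE_B'
group  by segment_type;

Each populated subpartition gets its own segment with at least one extent, which is where both the space overhead and the extra metadata work during TRUNCATE come from.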

Best data structure to find price difference between two time frames

I am working on a project where my task is to find the % price change given 2 time frames for 100+ stocks in an efficient way.
The time frames are predefined and can only be 5 mins, 10 mins, 30 mins, 1 hour, 4 hours, 12 hours, 24 hours.
Given a time frame, I need a way to efficiently figure out the % price change of all the stocks that I am tracking.
As per the current implementation, I am getting price data for those stocks every second and dumping the data to a price table.
I have another cron job which updates the % change of the stock based on the values in the price table every few seconds.
The solution is kind of in a working state but it's not efficient. Is there any data structure/algorithm that I can use to find the % change efficiently?
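Since prices arrive once per second and the windows are fixed, one possible alternative to rescanning the price table is a bounded in-memory buffer per stock, so each % change becomes a lookup at a fixed offset. This is only a sketch; the window list and the one-tick-per-second rate come from the question, everything else (names, structure) is made up:

from collections import deque

# Window lengths in seconds; prices are assumed to arrive once per second per stock
WINDOWS = {"5m": 300, "10m": 600, "30m": 1800, "1h": 3600,
           "4h": 14400, "12h": 43200, "24h": 86400}

# One bounded buffer per stock; maxlen covers the largest window (24 hours)
buffers = {}

def on_tick(symbol, price):
    """Append the latest per-second price for a stock."""
    buffers.setdefault(symbol, deque(maxlen=WINDOWS["24h"] + 1)).append(price)

def pct_change(symbol, window):
    """% change over a predefined window, or None if there is not enough history yet."""
    buf = buffers.get(symbol)
    offset = WINDOWS[window]
    if buf is None or len(buf) <= offset:
        return None
    old, latest = buf[-offset - 1], buf[-1]
    return (latest - old) / old * 100

The cron job then only reads the two ends of each buffer instead of scanning the price table; the table itself can still be kept purely for persistence.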

Group By Date in RethinkDB is slow

I am trying to group by date like the following to get a total count:
r.db('analytic').table('events')
  .group([
    r.row('created_at').inTimezone("+08:00").year(),
    r.row('created_at').inTimezone("+08:00").month(),
    r.row('created_at').inTimezone("+08:00").day()
  ])
  .count()
However, it is slow: it took over 2 seconds for 17,656 records.
Is there any way to get the data faster when grouping by date?
If you want to group and count all the records, it's going to have to read every record, so the speed will be determined mostly by your hardware rather than the specific query. If you only want one particular range of dates, you could get that much faster with an indexed between query.
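A sketch of that second case, assuming a secondary index on created_at has already been created once with indexCreate (the dates below are just placeholders):

// One-time index creation:
// r.db('analytic').table('events').indexCreate('created_at')

// Count a single day's events from the index instead of reading the whole table
r.db('analytic').table('events')
  .between(r.time(2016, 1, 1, "+08:00"), r.time(2016, 1, 2, "+08:00"),
           {index: 'created_at'})
  .count()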

HBase group by timestamp and count

I would like to scan the entire HBase table and get the count of the number of records added on a particular day, on a daily basis.
Since we do not have multiple versions of the columns, I can use the timestamp of the latest version (of which there will always be exactly one).
One approach is to use MapReduce: the map scans all the rows and emits the timestamp (the actual date) and 1 as the key and value. Then in the reducer I would count based on the timestamp value. The approach is similar to a group count based on timestamp.
Is there a better way of doing this? Once implemented, this job would be run on a daily basis to verify the counts against other modules (Hive table row count and Solr document count). I use this as the starting point to identify any errors during the flow at different integration points in the application.
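A rough sketch of the MapReduce approach described in the question, using the standard TableMapper plus the stock LongSumReducer (the table name and output path are placeholders):

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;

public class DailyRowCount {

    // Map: emit (day the row was written, 1) for every row in the table
    static class DayMapper extends TableMapper<Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            // Only one version per column, so the first cell's timestamp is the insert time
            long ts = value.rawCells()[0].getTimestamp();
            String day = new SimpleDateFormat("yyyy-MM-dd").format(new Date(ts));
            context.write(new Text(day), ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "daily-row-count");
        job.setJarByClass(DailyRowCount.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // bigger scanner batches for a full-table scan
        scan.setCacheBlocks(false);  // don't pollute the block cache

        TableMapReduceUtil.initTableMapperJob("my_table", scan,
                DayMapper.class, Text.class, LongWritable.class, job);

        // LongSumReducer adds up the 1s per day; it also works as a combiner
        job.setCombinerClass(LongSumReducer.class);
        job.setReducerClass(LongSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

For a pure verification count, the built-in org.apache.hadoop.hbase.mapreduce.RowCounter job is also worth knowing about, though it counts the whole table rather than grouping by day.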

After table partitioning, select query performance gets slow

I am using PostgreSQL 9.1 and I have a table consisting of 36 columns and almost 10 crore 50 lakh (105 million) records with a datetime stamp. On this table we have one composite primary key (DEVICE ID text and DT_DATETIME timestamp without time zone).
Now, to improve query performance, we have partitioned the table day-wise based on the DT_DATETIME field. After partitioning I see that query data retrieval takes more time than on the unpartitioned table. I have turned on the parameter called constraint_exclusion in the config file.
Is there any solution for this?
Let me explain a little further.
I have 45 days of GPS data in a table of size 40 GB. Every second we insert at least 27 new records (2.5 million records a day). To keep the table at a steady 45 days we delete the 45th day's data every night. This poses a problem for vacuuming this table because of locking. If we have a partitioned table we can simply drop the 45th day's child table.
So by partitioning we wanted to increase query performance as well as solve the locking problem. We have tried pg_repack, but twice the system load factor increased to 21 and we had to reboot the server.
Ours is a 24x7 system, so there is no downtime.
Try using PgBouncer for connection management and memory management, or increase the RAM in your server.
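Beyond pooling, it is worth confirming that constraint exclusion is actually pruning the daily child tables. A quick sketch (the parent/child table names are made up; the column is the DT_DATETIME from the question):

-- constraint_exclusion should be 'partition' (the default) or 'on'
SHOW constraint_exclusion;
SET constraint_exclusion = partition;

-- Each daily child table needs a CHECK constraint the planner can use, e.g.:
-- ALTER TABLE gps_data_2014_01_01
--   ADD CONSTRAINT gps_data_2014_01_01_chk
--   CHECK (dt_datetime >= DATE '2014-01-01' AND dt_datetime < DATE '2014-01-02');

-- With literal date bounds, EXPLAIN should list only the matching child tables;
-- if every child table appears in the plan, exclusion is not kicking in
EXPLAIN SELECT *
FROM   gps_data
WHERE  dt_datetime >= '2014-01-01' AND dt_datetime < '2014-01-02';

Also note that in 9.1 child tables do not inherit the parent's primary-key index, so each child needs its own index on the primary-key columns; without it, queries that used to hit the primary key will fall back to sequential scans on every child.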
