I have a SQL Server table Energy_CC with two columns: time [int] (epoch time) and E_CC14 [float]. Every 30 minutes the cumulative energy total (kWh) is appended to the table - something like this:
time       | E_CC14
1670990400 | 5469.00223
1670992200 | 5469.02791
1670994000 | 5469.056295
1670995800 | 5469.082706
1670997600 | 5469.10558
1670999400 | 5469.128534
I tried this SQL statement:
SELECT
MONTH(DATEADD(SS, i.time, '1970-01-01')),
i.E_CC14 AS mwh,
iprev.E_CC14 AS prevmwh,
(i.E_CC14 - iprev.E_CC14) AS diff
FROM
Energy_CC i
LEFT OUTER JOIN
Energy_CC iprev ON MONTH(iprev.time) = MONTH(i.time) - MONTH(DATEADD(month, -1, i.time))
AND DAY(iprev.time) = 1
WHERE
DAY(i.time) = 1
GROUP BY
MONTH(i.time), i.E_CC14, iprev.E_CC14;
I expect the monthly result to look like this:
time        | E_CC14
DECEMBER-22 | 10223
Any help would be greatly appreciated.
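For reference, a minimal sketch of one way to get the monthly figure, assuming E_CC14 is a cumulative meter reading that only ever increases and SQL Server 2012+ (for FORMAT); everything except Energy_CC, time and E_CC14 is made up:
WITH readings AS (
    SELECT DATEADD(SECOND, [time], '1970-01-01') AS ts,
           E_CC14
    FROM Energy_CC
)
SELECT UPPER(FORMAT(MIN(ts), 'MMMM-yy')) AS [month],
       MAX(E_CC14) - MIN(E_CC14)         AS kwh
FROM readings
GROUP BY YEAR(ts), MONTH(ts)
ORDER BY YEAR(ts), MONTH(ts);
-- Note: MAX - MIN ignores the consumption between the last reading of one month
-- and the first reading of the next; a LAG over the monthly readings would capture it.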
Related
My company has a number of shops across various locations. The shops raise requests for items to be delivered to them so they can sell them. We want to understand how much time the company takes to deliver an item, in minutes. However, we don't want to count time towards the elapsed time while the shop is closed, i.e.
let's consider the shop opening and closing times are
now the elapsed time
When I subtract the complaint time from the resolution time I get the raw elapsed time in minutes, but I need the required elapsed time in minutes: in the first case, out of 2090 minutes, the minutes during which the shop was closed should be deducted. I need to write an Oracle query to calculate the required elapsed time in minutes (shown in green).
Please help with what query we can write.
One formula to get the net time is as follows:
For every day involved, add up the daily opening time. For your first example these are two days, 2021-01-11 and 2021-01-12, with 13 opening hours each (09:00 - 22:00). That makes 26 hours.
If the complaint arrives after the store opens on the first day, subtract the difference: 10:12 - 09:00 = 1:12 = 72 minutes.
If the request is resolved before the store closes on the last day, subtract the difference: 22:00 - 21:02 = 0:58 = 58 minutes.
Oracle doesn't have a TIME datatype, so I assume you are storing the opening and closing times in Oracle's datetime type, DATE, and we must ignore the date part. You are probably using DATE for complain_time and resolution_time, too.
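For concreteness, a minimal sketch of the table shapes the query below assumes (the table and column names are taken from the query; only the time-of-day part of opening_time and closing_time matters):
create table shops (
  shop          number primary key,
  opening_time  date,   -- only the time-of-day part is used
  closing_time  date    -- only the time-of-day part is used
);
create table requests (
  request          number primary key,
  shop             number references shops (shop),
  complain_time    date,
  resolution_time  date
);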
In the query below I convert the time parts to minutes right away (casting the DATE values to TIMESTAMP so that EXTRACT can pull out hours and minutes), so the later calculations get a tad more readable.
with s as
(
select
shop,
extract(hour from cast(opening_time as timestamp)) * 60 + extract(minute from cast(opening_time as timestamp)) as opening_minute,
extract(hour from cast(closing_time as timestamp)) * 60 + extract(minute from cast(closing_time as timestamp)) as closing_minute
from shops
)
, r as
(
select
request, shop, complain_time, resolution_time,
trunc(complain_time) as complain_day,
trunc(resolution_time) as resolution_day,
extract(hour from cast(complain_time as timestamp)) * 60 + extract(minute from cast(complain_time as timestamp)) as complain_minute,
extract(hour from cast(resolution_time as timestamp)) * 60 + extract(minute from cast(resolution_time as timestamp)) as resolution_minute
from requests
)
select
r.request, r.shop, r.complain_time, r.resolution_time,
(r.resolution_day - r.complain_day + 1) * (s.closing_minute - s.opening_minute)
- case when r.complain_minute > s.opening_minute then r.complain_minute - s.opening_minute else 0 end
- case when r.resolution_minute < s.closing_minute then s.closing_minute - r.resolution_minute else 0 end
as net_duration_in_minutes
from r
join s on s.shop = r.shop
order by r.request;
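For the worked example above (complaint at 10:12 on 2021-01-11, resolution at 21:02 on 2021-01-12, opening hours 09:00 - 22:00) this yields 2 * 780 - 72 - 58 = 1430 minutes, i.e. the 2090 raw minutes minus the 660 minutes the shop was closed overnight.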
I'm testing Spark performance with a table that has very many columns.
What I did is very simple.
Prepare a csv file which has many columns and only 2 data records.
E.g., the csv file looks like this:
col000001,col000002,,,,,,,col100000
dtA000001,dtA000002,,,,,,,,dtA100000
dtB000001,dtB000002,,,,,,,,dtB100000
dfdata100000 = sqlContext.read.csv('../datasets/100000c.csv', header='true')
dfdata100000.registerTempTable("tbl100000")
result = sqlContext.sql("select col000001,col100000 from tbl100000")
Then get 1 row with show(1):
%%time
result.show(1)
File sizes are as follows (very small).
The file name shows the number of columns:
$ du -m *c.csv
3 100000c.csv
1 10000c.csv
1 1000c.csv
1 100c.csv
1 20479c.csv
2 40000c.csv
2 60000c.csv
3 80000c.csv
Results are as follows.
As you can see, the execution time increases exponentially.
Example result:
+---------+---------+
|col000001|col100000|
+---------+---------+
|dtA000001|dtA100000|
+---------+---------+
only showing top 1 row
CPU times: user 218 ms, sys: 509 ms, total: 727 ms
Wall time: 53min 22s
Question 1: Is this an acceptable result? Why does the execution time increase exponentially?
Question 2: Is there any other method to do this faster?
I have a service that can be started or stopped. Each operation generates a record with a timestamp and the operation type, so I end up with a series of timestamped operation records. Now I want to calculate the up-time of the service during a day. The idea is simple: for each pair of start/stop records, compute the timespan and sum up. But I don't know how to implement it with Hive, if it is possible at all. It's OK if I create tables to store intermediate results for this. This is the main blocking issue, and there are some other minor issues as well. For example, some start/stop pairs may span more than a single day. Any idea how to deal with this minor issue would be appreciated too.
Sample Data:
Timestamp Operation
... ...
2017-09-03 23:59:00 Start
2017-09-04 00:01:00 Stop
2017-09-04 06:50:00 Start
2017-09-04 07:00:00 Stop
2017-09-05 08:00:00 Start
... ...
The service up-time for 2017-09-04 should then be 1 + 10 = 11 mins. Note that the first time interval spans across 09-03 and 09-04, and only the part that falls within 09-04 is counted.
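-- The query below assumes the events sit in a table t with columns `Timestamp` and `operation`,
-- as in the sample data. It pairs every Start with the following event via lead(), uses
-- posexplode to split intervals that cross midnight into one piece per calendar day,
-- and then sums the minutes per day.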
select to_date(from_ts) as dt
,sum (to_unix_timestamp(to_ts) - to_unix_timestamp(from_ts)) / 60 as up_time_minutes
from (select case when pe.i = 0 then from_ts else cast(date_add(to_date(from_ts),i) as timestamp) end as from_ts
,case when pe.i = datediff(to_ts,from_ts) then to_ts else cast(date_add(to_date(from_ts),i+1) as timestamp) end as to_ts
from (select `operation`
,`Timestamp` as from_ts
,lead(`Timestamp`) over (order by `Timestamp`) as to_ts
from t
) t
lateral view posexplode(split(space(datediff(to_ts,from_ts)),' ')) pe as i,x
where `operation` = 'Start'
and to_ts is not null
) t
group by to_date(from_ts)
;
+------------+-----------------+
| dt | up_time_minutes |
+------------+-----------------+
| 2017-09-03 | 1.0 |
| 2017-09-04 | 11.0 |
+------------+-----------------+
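Note how the overnight interval (23:59 to 00:01) is split at midnight: the minute before midnight lands on 2017-09-03, and the minute after it plus the 10-minute interval give the 11 minutes for 2017-09-04.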
Game_ID | BeginTime | EndTime
1 | 1235000140| 1235002457
2 | 1235000377| 1235003300
3 | 1235000414| 1235056128
1 | 1235000414| 1235056128
2 | 1235000377| 1235003300
Here I would like to get the milliseconds between the two epoch time fields, BeginTime and EndTime, and then calculate the average time for each game.
games = load 'games.txt' using PigStorage('|') as (gameid: int, begin_time: long, end_time:long);
dump games;
(1,1235000140,1235002457)
(2,1235000377,1235003300)
(3,1235000414,1235056128)
(1,1235000414,1235056128)
(2,1235000377,1235003300)
Step 1: Calculate the time difference
difference = foreach games generate gameid, end_time - begin_time as time_lapse;
dump difference;
(1,2317)
(2,2923)
(3,55714)
(1,55714)
(2,2923)
Step 2: Group the data on Game_ID
game_group = group difference by gameid;
dump game_group;
(1,{(1,55714),(1,2317)})
(2,{(2,2923),(2,2923)})
(3,{(3,55714)})
Step 3: Then the Average
average = foreach game_group generate group, AVG(difference.time_lapse);
dump average;
(1,29015.5)
(2,2923.0)
(3,55714.0)
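For comparison, the same aggregation could be written as a single SQL query, assuming an equivalent table games(gameid, begin_time, end_time):
select gameid, AVG(end_time - begin_time) as avg_time_lapse
from games
group by gameid;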
I am trying to improve the performance of PostgreSQL (version 9.2.3) inserts for a simple table with 1 bigint, 1 varchar, 1 float and 2 timestamps.
A simple replication of my JDBC program is attached. Here are the important points I want to mention:
I am running this program on the same system which hosts the PostgreSQL DB. (64 GB RAM and 8 CPUs.)
I am using INSERT statements AND I DO NOT want to use the COPY statement. I have read and understand that COPY performs better, but I am tuning the INSERT performance here.
I am using PreparedStatement.addBatch() and executeBatch() to insert in batches of thousands.
The performance of the insert scales well when I increase the batch size but flattens out at around a batch size of 8000. What I notice is that the postgresql thread on the system is CPU saturated as observed by the "top" command. The CPU usage of the postgres thread steadily increases and tops out at 95% when the batch size reaches 8K. The other interesting thing I notice is that it is using only up to 200MB of RAM per thread.
In comparison, an Oracle DB scales much better and the same number of inserts with comparable batch sizes finishes 3 to 4 times faster. I logged on to the Oracle DB machine (a Sun Solaris machine) and noticed that the CPU utilization peaks at a much bigger batch size and also that each Oracle thread is using 6 to 8 GB of memory.
Given that I have memory to spare is there a way to increase the memory usage for a postgres thread for better performance?
Here are my current postgresql settings:
temp_buffers = 256MB
bgwriter_delay = 100ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 4
maintenance_work_mem = 2GB
shared_buffers = 8GB
vacuum_cost_limit = 800
work_mem = 2GB
max_connections = 100
checkpoint_completion_target = 0.9
checkpoint_segments = 32
checkpoint_timeout =10min
checkpoint_warning =1min
wal_buffers = 32MB
wal_level = archive
cpu_tuple_cost = 0.03
effective_cache_size = 48GB
random_page_cost = 2
autovacuum = on
autovacuum_vacuum_cost_delay = 10ms
autovacuum_max_workers = 6
autovacuum_naptime = 5
autovacuum_vacuum_threshold = 100
autovacuum_analyze_threshold = 100
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_limit = -1
Here are the measurements:
Time to insert 2 million rows in PostgreSQL.
batch size - Execute batch time (sec)
1K - 73
2K - 64
4K - 60
8K - 59
10K - 59
20K - 59
40K - 59
Time to insert 4 million rows in Oracle.
batch size - Execute batch time (sec)
1K - 14
2K - 12
4K - 10
8K - 8.9
10K - 8.4
As you can see, Oracle inserts 4 million rows much faster than PostgreSQL inserts 2 million.
Here is the snippet of the program I am using for insertion.
stmt.executeUpdate("CREATE TABLE "
+ tableName
+ " (P_PARTKEY bigint not null, "
+ " P_NAME varchar(55) not null, "
+ " P_RETAILPRICE float not null, "
+ " P_TIMESTAMP Timestamp not null, "
+ " P_TS2 Timestamp not null)");
PreparedStatement pstmt = conn.prepareStatement("INSERT INTO " + tableName + " VALUES (?, ?, ?, ?, ? )");
for (int i = start; i <= end; i++) {
pstmt.setInt(1, i);
pstmt.setString(2, "Magic Maker " + i);
pstmt.setFloat(3, i);
pstmt.setTimestamp(4, new Timestamp(1273017600000L));
pstmt.setTimestamp(5, new Timestamp(1273017600000L));
pstmt.addBatch();
if (i % batchSize == 0) {
pstmt.executeBatch();
}
}
// flush any remaining rows that did not fill a complete batch
pstmt.executeBatch();
autovacuum_analyze_scale_factor = 0.002
autovacuum_vacuum_scale_factor = 0.001
You might need to change the above parameters.
autovacuum_analyze_scale_factor specifies the fraction of the table size added to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.1 (10% of the table size); here it is lowered to 0.002 to make autovacuum more aggressive.
autovacuum_vacuum_scale_factor specifies the fraction of the table size added to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM. The default is 0.2 (20% of the table size); here it is lowered to 0.001 for the same reason.
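If you only want the more aggressive behaviour for the table receiving the bulk inserts rather than for the whole cluster, the same factors can also be set as per-table storage parameters; a minimal sketch (the table name is hypothetical):
ALTER TABLE my_bulk_insert_table SET (
    autovacuum_vacuum_scale_factor  = 0.001,
    autovacuum_analyze_scale_factor = 0.002
);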