Why does my dbt snapshot result in so many dbt_from_dates from the same day? - snapshot

Can someone help me understand the behavior of my snapshot please?
I created a snapshot and scheduled it to run once daily at 10 AM UTC. It takes less than 4 minutes to run (along with a few other snapshots).
However, when I query
select distinct dbt_from_date from mysnapshot where to_date(dbt_from_date) = '2021-10-07' (see screenshot 1),
the result is 993 rows spanning 8 hours of the day. I would expect the result to be only one row, with the time when the snapshot ran; at worst, the span should not be more than the 4 minutes it took to build the snapshot.
This is the code of my snapshot
{% snapshot XYZ_snapshot %}
{{
    config(
        target_database='analytics',
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at',
        invalidate_hard_deletes=True
    )
}}
select * from {{ source('XYZ', 'ABC') }}
{% endsnapshot %}
Screenshot 1 (image not included)

The dbt snapshots documentation states that "For the timestamp strategy, the configured updated_at column is used to populate the dbt_valid_from, dbt_valid_to and dbt_updated_at columns." This means your dbt_valid_from field is populated from the updated_at column in your source data, not from the time the snapshot ran. The 993 distinct values simply reflect the distinct updated_at timestamps of the rows that changed that day, which can span far more than the 4-minute snapshot runtime.
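To make the behavior concrete: with the timestamp strategy, each snapshotted row inherits its source row's updated_at as dbt_valid_from. Here is a minimal Python sketch of that assignment (this is an illustration with made-up data, not dbt's internal code):

```python
from datetime import datetime

# Made-up source rows whose updated_at values span the morning of 2021-10-07.
source_rows = [
    {"id": 1, "updated_at": datetime(2021, 10, 7, 2, 15)},
    {"id": 2, "updated_at": datetime(2021, 10, 7, 6, 40)},
    {"id": 3, "updated_at": datetime(2021, 10, 7, 9, 55)},
]

# Timestamp strategy: dbt_valid_from is copied from updated_at,
# NOT set to the snapshot run time.
snapshot = [
    {"id": r["id"], "dbt_valid_from": r["updated_at"], "dbt_valid_to": None}
    for r in source_rows
]

distinct_from_dates = {row["dbt_valid_from"] for row in snapshot}
print(len(distinct_from_dates))  # 3 distinct values, spanning several hours
```

So if 993 source rows were updated at different times during that day, the snapshot will show 993 distinct dbt_valid_from values even though the snapshot itself ran in 4 minutes.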

Related

AWS Quicksight aggregate data

I have a dataset like this:

Order
id   expected date
1    11-04-2022
2    10-04-2022
2    14-04-2022

Order Event
Id   Order Id   Order status   Date
1    1          created        01-04-2022
2    1          completed      12-04-2022
3    2          created        01-04-2022
4    2          in progress    07-04-2022
5    2          completed      10-04-2022
6    3          created        10-04-2022
and I need to create a graph that shows, for all orders with completed status, the difference between the expected date and the actual order date.
How can I achieve that?
First, you have to join the two tables into one, because QuickSight can only work with multiple data files if they are merged. You can apply an inner join on the order ID.
Then you can calculate the difference between the expected date and the order date, and add an if-statement to filter out the orders that are not completed yet. You do this by adding a calculated field to your dataset with the following code:
ifelse(
{Order_status}="completed",
dateDiff({expected_date},{Date},"DD"),
0
)
You can also modify this field. Here I wrote "DD" for the date difference in days; you can also select hours, etc. If the order is not completed, I selected 0 as a default value. To find out more about the functions used in this calculated field, visit these AWS Docs links:
If-Else Command
Date-Diff Command
Now that the calculated field is created, you can plot it together with the order ID.
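Outside QuickSight, the same join-then-compute logic can be sketched in plain Python, using the example tables from the question (this is an illustration of the logic, not QuickSight code):

```python
from datetime import date

# Order table: order id -> expected date (dd-mm-yyyy parsed into date objects)
orders = {1: date(2022, 4, 11), 2: date(2022, 4, 10)}
# Order Event table rows
events = [
    {"order_id": 1, "status": "completed", "date": date(2022, 4, 12)},
    {"order_id": 2, "status": "completed", "date": date(2022, 4, 10)},
    {"order_id": 3, "status": "created",   "date": date(2022, 4, 10)},
]

# Inner join on order id, then the ifelse/dateDiff equivalent:
# difference in days for completed orders, 0 otherwise.
result = {}
for e in events:
    if e["order_id"] in orders:  # inner join drops order 3 (no expected date)
        expected = orders[e["order_id"]]
        delay = (e["date"] - expected).days if e["status"] == "completed" else 0
        result[e["order_id"]] = delay

print(result)  # {1: 1, 2: 0}
```

Order 1 completed one day after its expected date, order 2 on time, and order 3 is excluded by the inner join.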
BR mylosf

Grouping then getting the sum of data in dynamic dated columns

I am currently working on a resource tracker for my company, and I have each individual's capacity figure by week (weeks are in the columns and each person's information is in a row). I need to sum all the time in a specific month for each job role so that I can report on it.
I considered grouping the dates by selecting 4 weeks, but because my fields are dynamic and some months have 5 weeks, that would not report the month's figures accurately.
Unfortunately, I can't pivot the information because the dates are in the columns rather than the rows.
I have yet to find any formula/code that can get that information.
In the picture, I have added the information that I would like to sum dynamically. The red outlines the month and the green outlines the job role information.
So I would like to sum all the information under "July", and then do the same for the other months, so I can give my stakeholders a monthly figure of how many days of capacity there is for each person/job role in that month.
=ARRAYFORMULA(QUERY({INDIRECT("Sheet1!B3:B"&COUNTA(Sheet1!B3:B)+2),
MMULT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(Sheet1!E2:Z),
"where month(Col1)+1=7 and year(Col1)=2019", 0)),
"where Col1 is not null offset 1", 0), ROW(INDIRECT("A1:A"&COUNTA(
FILTER(Sheet1!A2:2, MONTH(Sheet1!A2:2)=7, YEAR(Sheet1!A2:2)=2019))))^0)},
"select Col1, sum(Col2) group by Col1 label sum(Col2)''", 0))

Cognos 11 Crosstab - need a value that doesn't have a reference to the column values

Crosstab report works 99%.
About 20 rows, all but one are ok.
5 columns - Company Division.
The rows are things like cost, revenue, revenue 2, etc.
All the rows that work have three attributes I'm using to select them:
Fiscal Year
Period
Solution.
The problem is a table that lists a YTD rate for each period. This table is not division-specific; it's company-wide.
All the tables are linked to the accounting period table that has fiscal year and period. So the overall query limits data to fiscal year (?pFiscalYear?) and period <= ?pPeriod?, based on prompt page results.
The source table has this:
FY_CD  PD_NO  ACT_CURR_RT  ACT_YTD_RT
2018   1      0.36121715   0.36121715
2018   2      0.32471476   0.34255512
2018   3      0.25240906   0.31210183
2018   4      0.33154745   0.31925874
Note the YTD rate is not an average of any of the other numbers.
When I select the ACT_YTD_RT, as a row, I want the ACT_YTD_RT that matches the selected period.
What I get is the average if I set the aggregation to average, or the lowest value if I set it to other aggregations. So sometimes it looks right (if I run for period 1, 2, or 3, as the rate kept falling), and sometimes it's wrong (period 4 returns .3121 instead of .3192).
I've tried a number of different methods and can generate garbage data (totals, min, max, average) and crossjoins but can't figure out how to get the value I'm looking for.
I want YTD_RT where fiscal year =?pFiscal? and period = ?pPeriod?.
I tried a straight if then clause:
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
but I get an error like this:
'ACT_YTD_RT' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (SQLSTATE=42000, SQLERRORCODE=8120)
If I create another query that generates the right response and try to include it, I get a crossjoin error that the query I'm referencing is trying to crossjoin several other items in the crosstab query.
A union doesn't work (different number of columns).
Not sure how a join would work since the division doesn't exist in the rate table.
I maybe could create a view in the database that did a crossjoin of the division table and the rate table, add that to the framework and then I wouldn't have a crossjoin since the solution would be in the rate "table" (really view), but that seems wrong somehow.
If I could just write a freaking parameterized query direct to the database I'd be done. But in Cognos 11 crosstabs I can't find a place for a SQL query object. And that shouldn't be necessary.
I've spent hours and hours chasing this in circles.
Anybody have any ideas?
Thanks
Paul
So the earlier problem was that this:
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
Generated an error like this:
'ACT_YTD_RT' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (SQLSTATE=42000, SQLERRORCODE=8120)
To fix the above, I had to add a cross join of the division table and the rate table as a view in the database, then add that view to the framework, and then build the data item this way:
total (
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
)
And now the "total" provides the missing group by. And the crossjoin in the database provides the division information so the crosstab is happy.
I still think there should have been an easier way to do this, but I have a functioning hammer at the moment.
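The pattern the self-answer lands on (cross-join so every division carries a rate row, then a conditional total that collapses back to the single matching rate) can be sketched in miniature. The division names below are hypothetical; the rates come from the table in the question:

```python
# Rate table from the question: (fiscal_year, period) -> ACT_YTD_RT
rates = {
    (2018, 1): 0.36121715,
    (2018, 2): 0.34255512,
    (2018, 3): 0.31210183,
    (2018, 4): 0.31925874,
}
divisions = ["East", "West", "Central"]  # hypothetical division names

# The database view: every division paired with every rate row.
cross = [
    {"division": d, "fy": fy, "pd": pd, "act_ytd_rt": rt}
    for d in divisions
    for (fy, pd), rt in rates.items()
]

# total( if fy = ?pFiscalYear? and pd = ?pPeriod? then ACT_YTD_RT ):
# within one division's rows, only the matching period contributes,
# so the "total" returns the single rate rather than an average.
p_fiscal, p_period = 2018, 4
for d in divisions:
    total = sum(r["act_ytd_rt"] for r in cross
                if r["division"] == d and r["fy"] == p_fiscal and r["pd"] == p_period)
    print(d, total)  # each division gets 0.31925874, not an average
```

The conditional sum is what supplies the missing GROUP BY context in the Cognos query, and the cross join is what gives each crosstab column a row to aggregate.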

Daily unique count, weekly unique count in the same Timelion chart

I want to visualize the unique count for a field aggregated daily and weekly per day in the same sheet. But timelion aggregation affects the entire sheet instead of just a single chart.
The expression I am using to get the daily unique count is
.es(metric='cardinality:userId').bars().title('Unique users over time')
If I change the bucket range on the right to 1d, I get the correct chart. How do I create the weekly aggregation?
You can specify the interval used by a Timelion expression by passing interval=1d or interval=1w to the .es() function. For details, please see the docs here.
In your case this should work with the following expression:
.es(metric='cardinality:userId',interval=1w).bars().title('Unique users per week')
Be aware of the note in the docs stating that this option is discouraged in favor of the interval picker, but this is probably a use case where it is okay to use it.
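To see what the two intervals actually count, here is a toy Python sketch of daily versus weekly distinct-user (cardinality) bucketing over made-up events (the user IDs and dates are hypothetical):

```python
from datetime import date, timedelta
from collections import defaultdict

# Toy event stream: (event date, user id).
events = [
    (date(2021, 10, 4), "a"), (date(2021, 10, 4), "b"),
    (date(2021, 10, 5), "a"), (date(2021, 10, 6), "c"),
    (date(2021, 10, 11), "a"),
]

daily = defaultdict(set)
weekly = defaultdict(set)
for d, user in events:
    daily[d].add(user)
    week_start = d - timedelta(days=d.weekday())  # Monday of that week
    weekly[week_start].add(user)

# Cardinality per bucket: the same user counts once per day bucket,
# but also only once per week bucket, so the two series differ.
daily_counts = {d: len(u) for d, u in daily.items()}
weekly_counts = {w: len(u) for w, u in weekly.items()}
print(daily_counts)
print(weekly_counts)
```

User "a" appears on three different days, so it contributes to three daily buckets but only two weekly buckets, which is why the 1d and 1w charts are not simple multiples of each other.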

Kibana subtracting the values of 2 indices

I have 2 indices in Kibana 4:
The 1st index bases its time on an event's Date Created.
The 2nd index bases its time on an event's Date Closed.
Both are date values, and I want to create a query that returns the total count of docs with Date Created today minus the total count of docs with Date Closed today.
If this is not possible, is it possible if I have both fields in one index?
Yes, you need to have both date values within the same index so that you can do the subtraction using a scripted field in Kibana. Your script could be as simple as:
doc['date_created'].value - doc['date_closed'].value
(make sure to use your exact field names)
You can then use this scripted field in a Date Histogram to show the total count of docs within the retrieved date range.
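For reference, the original ask (docs created today minus docs closed today) reduces to two filtered counts and a subtraction; a toy Python sketch of that arithmetic (the documents and field names below are made up to mirror the question):

```python
from datetime import date

today = date(2021, 10, 7)
# Toy documents carrying both date fields in one index.
docs = [
    {"date_created": date(2021, 10, 7), "date_closed": None},
    {"date_created": date(2021, 10, 7), "date_closed": date(2021, 10, 7)},
    {"date_created": date(2021, 10, 6), "date_closed": date(2021, 10, 7)},
]

created_today = sum(1 for d in docs if d["date_created"] == today)
closed_today = sum(1 for d in docs if d["date_closed"] == today)
net_new_open = created_today - closed_today
print(net_new_open)  # 2 created - 2 closed = 0
```

Note this is a different quantity from the per-document scripted field above, which computes each doc's open duration rather than a net count.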
Hope this helps!
