I was looking for a good way to do time-series projection in BigQuery and found an existing post that nicely calculates correlation and slope. However, it doesn't help me extend the timeline to a range of my choice.
Can anyone suggest a complete solution in which I can extend the timeline (X) as needed and get projections of Y with a single query?
Any help will be highly appreciated.
The basic idea is to left join the model parameters to a generated table of dates and compute Y for each date:
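WITH stats AS (
  -- Dummy model parameters standing in for the output of the statistics query
  SELECT * FROM UNNEST([
    STRUCT('a' AS model, 0.3 AS slope, 11 AS intercept),
    STRUCT('b', 0.2, 7)
  ])
)
SELECT
  date,
  model,
  slope,
  intercept,
  UNIX_DATE(date) AS X,                     -- days since 1970-01-01
  slope * UNIX_DATE(date) + intercept AS Y  -- projected value
FROM
  -- One row per day; change the two bounds to extend the timeline
  UNNEST(GENERATE_DATE_ARRAY(DATE('2018-05-01'), DATE('2018-07-01'))) AS date
LEFT JOIN stats ON TRUE  -- pair every date with every model
ORDER BY date ASC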
I did not repeat the statistics part since it has already been answered; instead, I created a dummy table with two models to stand in for it. The model could also be a bucket, of course; in that case you'd left join on the bucket as a key.
I'm also assuming you fitted the model with X expressed as a Unix date (days since 1970-01-01); if not, you'll need to adjust X accordingly.
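To check the arithmetic outside BigQuery, here is a minimal Python sketch of the same projection. It assumes the two dummy models above and the Unix-date convention for X, and it pairs every generated date with every model, as the ON TRUE join does. The function names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical model parameters, mirroring the dummy "stats" table above.
STATS = [
    {"model": "a", "slope": 0.3, "intercept": 11},
    {"model": "b", "slope": 0.2, "intercept": 7},
]

EPOCH = date(1970, 1, 1)


def date_range(start, end):
    """Inclusive daily range, like GENERATE_DATE_ARRAY with its default 1-day step."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)


def project(stats, start, end):
    """Return (date, model, X, Y) rows for every date/model pair."""
    rows = []
    for d in date_range(start, end):
        x = (d - EPOCH).days  # same value as UNIX_DATE(d)
        for m in stats:       # every date paired with every model (join ON TRUE)
            rows.append((d, m["model"], x, m["slope"] * x + m["intercept"]))
    return rows
```

Extending the timeline is then only a matter of moving the `end` date, exactly as with the bounds of GENERATE_DATE_ARRAY.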
I have a BigQuery table with point records across a whole country, and I need to assign a "censal zone" (census zone) to each of them; the zone polygons are stored in another table. I've been trying to do so with a query like this one:
SELECT id_point, code_censal_zone
FROM `points_table`
JOIN `zones_table`
ON ST_CONTAINS(zone_polygon, point_geo)
The first table is quite large, so the query performs very inefficiently because it compares every possible (point, censal zone) pair. However, both tables have a column identifying the municipality they belong to. So the question is: can I rewrite my query so that ST_CONTAINS is evaluated only for (point, censal zone) pairs that belong to the same municipality, instead of comparing each point with every censal zone in the country? Can I do this without having to read points_table multiple times?
SELECT p.id_point, z.code_censal_zone
FROM `points_table` AS p
JOIN `zones_table` AS z
ON p.municipality = z.municipality
AND ST_CONTAINS(z.zone_geo, p.point_geo)
I'm quite new to BigQuery, so I don't really know whether a query like this would actually do what I'm expecting, as I couldn't find anything about it in the documentation.
Thanks!
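The idea behind adding the municipality condition is that the equality key lets each point be tested only against zones sharing that key. A minimal Python sketch of that idea follows; it is not BigQuery's execution plan. The ray-casting test is a simple stand-in for ST_CONTAINS on planar (x, y) coordinates and does not handle boundary points exactly; all names and data are hypothetical.

```python
from collections import defaultdict


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices. Boundary handling is not exact."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_zones(points, zones):
    """points: (id_point, municipality, x, y); zones: (code, municipality, polygon).
    Only zones in the point's own municipality are tested."""
    zones_by_muni = defaultdict(list)
    for code, muni, poly in zones:
        zones_by_muni[muni].append((code, poly))
    result = []
    for pid, muni, x, y in points:
        for code, poly in zones_by_muni.get(muni, []):
            if point_in_polygon(x, y, poly):
                result.append((pid, code))
    return result
```

In SQL terms, the equality in the ON clause plays the role of `zones_by_muni`: each point is compared only with the zones of its own municipality, and points_table is still read once.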
I don't know if I've even worded the question correctly, but I'm trying to create a measure that depends on what is shown in the pivot table (using Power Pivot). In the image I posted, "DealMonth" is an expression in the Power Query table itself that subtracts the employee's start date from the month in which a deal was closed; it shows how long it took that salesperson to close the deal. "TenureMonths" is also an expression in the Power Query table; it calculates the person's tenure. The values in this screenshot come from a total headcount measure I created. What I'm trying to do is create a separate measure that reflects when "TenureMonths" is less than "DealMonth": if TenureMonths is 5, then for every DealMonth after 5 the value would be 0. Is this possible?
I should add the following information.
"DealMonth" - Comes from the FactData table
"TenureMonths" - Comes from the DimSalesStart table
These two tables are joined by name. I feel like I'm so close because I can see what I want. The second image below is a copy/paste of the pivot table result, edited to show what I'd like to see. Basically, IF(TenureMonths >= DealMonth, 1, 0). The trouble seems to be that, since the two columns are in different tables, I can't make it work. The rows in the fact table are transactions, while the rows in the dim table are just the people with their start and end dates.
This is possible with a pattern like IF([measure1] < [measure2], BLANK(), [measure1]); however, without seeing more of the data it is hard to guide you specifically.
First you need to create two separate measures, one for TenureMonths and one for DealMonth. Depending on the data, this can be done with an aggregation function such as SUM, MIN or MAX (which one depends on whether there can be more than one value in a cell).
Then reference those two measures in the formula pattern above, and that should give you what you want.
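To make the two-measure pattern concrete, here is a minimal Python model of evaluating one pivot cell. The table contents, the names, and the choice of MAX as the aggregator for TenureMonths are hypothetical; in Power Pivot the filter context of the cell would do the filtering that the `if n == name` conditions do here.

```python
# Hypothetical rows: the fact table holds transactions, the dim table one row per person.
FACT = [("Ann", 1), ("Ann", 3), ("Ann", 7), ("Bob", 2), ("Bob", 4)]  # (name, DealMonth)
DIM = [("Ann", 5), ("Bob", 12)]                                     # (name, TenureMonths)


def headcount(name, deal_month):
    """Stand-in for the existing headcount measure in one pivot cell."""
    return sum(1 for n, m in FACT if n == name and m == deal_month)


def masked_cell(name, deal_month):
    """IF([Tenure] < [DealMonth], BLANK(), [Headcount]) for one cell.
    MAX turns the TenureMonths column into a single value, like a measure would."""
    tenure = max(t for n, t in DIM if n == name)
    if tenure < deal_month:
        return None  # plays the role of BLANK()
    return headcount(name, deal_month)
```

For example, Ann (tenure 5) gets a value for DealMonth 3 but a blank for DealMonth 7.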
I figured out a solution. I added a dimension table for DealMonth itself and joined to my fact table. That allowed me to do the formulas that I needed.
I used COUNT([CUST_ID]) as a measure to create [Total No of Customer]. When I created a new measure, [Average Profit per customer], with the formula [Total Profit] / [Total No of Customer], I got an error about mixing aggregate and non-aggregate values.
DB level:
Cust ID   Profit
123       100
234       500
345       350
567       505
You may be looking for the AVG aggregate function:
Select cust_id, avg(profit)
From your_table
Group by cust_id;
Cheers!!
In your database table, you appear to have one data row per customer. Customer ID is serving as a unique primary key. The level of detail (or granularity) of the database table is the customer.
Given that, the simplest solution to your question is to display AVG([Profit]) without having [Cust ID] in the view (i.e., not on any shelf).
If the assumptions above are not correct, you may need other methods, depending on how you define your question. I suggest making sure you understand what COUNT() actually does compared to COUNTD(): COUNT() counts every non-null value, while COUNTD() counts distinct values, which is not always what people assume. Level of detail (LOD) calculations may also prove useful. All of these are described in the online help.
Put the whole calculation directly in a single calculated field:
SUM([Profit])/COUNT([CUST_ID])
Because the numerator and the denominator are now both aggregates, this avoids the aggregate/non-aggregate error.
If you want to show the average profit at the level of a key like [CUST_ID], you can use an LOD expression:
{FIXED [CUST_ID] : AVG([Profit])}
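As a worked example, here is a minimal Python sketch of SUM([Profit]) / COUNT([CUST_ID]) on the four rows from the DB table above, plus a hypothetical fifth row that repeats customer 123 to show where COUNT and COUNTD diverge. It assumes no null IDs, so counting list entries matches COUNT.

```python
# (Cust ID, Profit) rows from the DB table above.
ROWS = [(123, 100), (234, 500), (345, 350), (567, 505)]


def avg_profit_per_customer(rows, distinct=False):
    """SUM([Profit]) divided by COUNT([CUST_ID]), or by COUNTD([CUST_ID]) if distinct."""
    total_profit = sum(p for _, p in rows)         # SUM([Profit])
    ids = [c for c, _ in rows]
    n = len(set(ids)) if distinct else len(ids)    # COUNTD vs COUNT
    return total_profit / n
```

With one row per customer, both versions give 1455 / 4 = 363.75. Adding a second row for customer 123 with profit 50 gives 1505 / 5 = 301.0 with COUNT but 1505 / 4 = 376.25 with COUNTD.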
My crosstab report works 99%.
It has about 20 rows, and all but one are OK.
There are 5 columns, one per company division.
The rows are things like cost, revenue, revenue 2, etc.
All the rows that work have three attributes I'm using to select them:
Fiscal Year
Period
Solution.
The problem is a table that lists a YTD rate for each period. This table is not division-specific; it's company-wide.
All the tables are linked to the accounting period table, which has fiscal year and period. So the overall query limits the data to fiscal year ?pFiscalYear? and period <= ?pPeriod?, based on the prompt page selections.
The source table has this:
FY_CD PD_NO ACT_CURR_RT ACT_YTD_RT
2018 1 0.36121715 0.36121715
2018 2 0.32471476 0.34255512
2018 3 0.25240906 0.31210183
2018 4 0.33154745 0.31925874
Note the YTD rate is not an average of any of the other numbers.
When I add ACT_YTD_RT as a row, I want the ACT_YTD_RT value that matches the selected period.
What I get is the average if I set the aggregation to average, or the lowest value if I set it to other aggregations. So sometimes it looks right (if I run for periods 1, 2 or 3, because the rate kept falling), and sometimes it's wrong (period 4 returns .3121 instead of .3192).
I've tried a number of different methods and can generate garbage data (totals, min, max, average) and crossjoins but can't figure out how to get the value I'm looking for.
I want ACT_YTD_RT where fiscal year = ?pFiscalYear? and period = ?pPeriod?.
I tried a straight if then clause:
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
but I get an error like this:
'ACT_YTD_RT' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (SQLSTATE=42000, SQLERRORCODE=8120)
If I create another query that generates the right response and try to include it, I get a crossjoin error that the query I'm referencing is trying to crossjoin several other items in the crosstab query.
A union doesn't work (different number of columns).
Not sure how a join would work since the division doesn't exist in the rate table.
I could perhaps create a view in the database that cross joins the division table and the rate table, and add that to the framework. Then I wouldn't have a cross join in the report, since the division would be in the rate "table" (really a view), but that seems wrong somehow.
If I could just write a freaking parameterized query directly against the database, I'd be done. But in Cognos 11 crosstabs I can't find a place for a SQL query object. And that shouldn't be necessary.
I've spent hours and hours chasing this in circles.
Anybody have any ideas?
Thanks
Paul
So the earlier problem was that this:
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
Generated an error like this:
'ACT_YTD_RT' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (SQLSTATE=42000, SQLERRORCODE=8120)
To fix the above, I had to add a cross join of the division table and the rate table as a view in the database. Then add that to the framework. Then build the data item this way:
total (
if (sourcetable.fiscalYear = ?pFiscalYear?) and (sourcetable.Period = ?pPeriod?) then (ACT_YTD_RT)
)
And now the "total" provides the missing group by. And the crossjoin in the database provides the division information so the crosstab is happy.
I still think there should have been an easier way to do this, but I have a functioning hammer at the moment.
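A minimal Python model of why the workaround returns the right value, using the rate rows from the question. The division names are hypothetical, and this models the logic only, not Cognos itself. A plain minimum over the in-scope periods reproduces the wrong .3121 for period 4, while totaling a conditional value over the cross-joined rows leaves exactly one matching row per division.

```python
# Rows from the source table above: (FY_CD, PD_NO, ACT_CURR_RT, ACT_YTD_RT)
RATES = [
    (2018, 1, 0.36121715, 0.36121715),
    (2018, 2, 0.32471476, 0.34255512),
    (2018, 3, 0.25240906, 0.31210183),
    (2018, 4, 0.33154745, 0.31925874),
]

# Hypothetical division names; the real list comes from the division table.
DIVISIONS = ["D1", "D2", "D3", "D4", "D5"]


def rows_in_scope(p_fy, p_pd):
    """The report filter: fiscal year = ?pFiscalYear? and period <= ?pPeriod?."""
    return [r for r in RATES if r[0] == p_fy and r[1] <= p_pd]


def min_ytd(p_fy, p_pd):
    """What a plain minimum aggregation over the in-scope rows returns."""
    return min(r[3] for r in rows_in_scope(p_fy, p_pd))


def total_if_selected(p_fy, p_pd):
    """total(if (year = p_fy and period = p_pd) then ACT_YTD_RT), per division.
    The cross join gives every division its own copy of the company-wide rates;
    rows failing the condition add nothing, so each total is the single match."""
    cross = [(d, r) for d in DIVISIONS for r in rows_in_scope(p_fy, p_pd)]
    totals = {d: 0.0 for d in DIVISIONS}
    for d, r in cross:
        if r[0] == p_fy and r[1] == p_pd:
            totals[d] += r[3]
    return totals
```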
I have a Telerik report with a graph. The graph's x-axis is a series of dates. Our client would like those dates in order from oldest to most recent. They also want the dates formatted to not include the time portion of the date. I've tried for the past day to get this to work and can't figure it out. Can someone explain how to do this?
I started out with a graph based on this query:
SELECT AnalysisNumber
, convert(varchar, DateSampled, 01) as DateSampled
, ViscosityAt100C
FROM tblSample AS a
ORDER BY a.DateSampled ASC
The results look correct, with the dates in order from oldest to most recent:
but the graph that is produced does not show the dates in order:
I can't begin to include all the settings for the graph, but here is what I think is the relevant part. Let me know if there's something else I can show you.
Notice that the sorting is by DateSampled, which is now, of course, text rather than a date.
If I remove that sorting (to try to preserve the original sorting from the SQL query), the graph no longer works:
So I tried to use a date instead of text. The query is now this:
SELECT AnalysisNumber
, DateSampled
, ViscosityAt100C
FROM tblSample AS a
ORDER BY a.DateSampled ASC
...the output looks the same:
and the graph looks like this:
The dates are sorted the way I want, but all the dates have a time element that I don't want because it's irrelevant and it takes up too much space.
I tried changing the type in SQL:
Cast(DateSampled as Date) as DateSampled
but it still showed the time in the graph.
I tried formatting it using the properties for the x-axis:
but it did not change the formats of the date. In fact, changing to any of the formats in that property did not change anything.
Lastly I tried to include both a string and date in my query:
SELECT AnalysisNumber
, convert(date, DateSampled) as DateSampledText
, DateSampled
, ViscosityAt100C
FROM tblSample AS a
ORDER BY a.DateSampled ASC
and using the DateSampledText to group by and the DateSampled to sort by:
it just ruins my graph again:
I tried adding the text version to sorting and other variations, but never got the graph back to where it was showing data.
Sorting and formatting a graph doesn't sound like it should be difficult. This was supposed to be one of the final changes before going into production, and I've already spent so much time on it. Can someone tell me how to make this work? Thank you!
I believe you need to change the scale of your graph. I think by default it is Category Scale, but when using dates, you would need to change it to DateTime scale.
In your graph properties, where you set the Format of the X-Axis, there should be a property called Scale. Try setting it to DateTime.
Keith
The reason you can't format the dates is because the graph is treating them as strings.
You need to change the x axis to be of type DateTime Scale instead of Category Scale. Category scale is the default and is more appropriate for when you are graphing the number of Apples, Oranges, and Pears, for example.
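The ordering problem can be reproduced outside Telerik with a minimal Python sketch using hypothetical timestamps and mm/dd/yy labels (the format produced by CONVERT style 1). Sorting the formatted text puts January 2019 before November 2018, while sorting by the underlying datetime and formatting only the label keeps chronological order and drops the time. This only mirrors the idea behind a DateTime scale; it is not Telerik's API.

```python
from datetime import datetime

# Hypothetical sample timestamps standing in for tblSample.DateSampled.
SAMPLES = [
    datetime(2018, 12, 1, 9, 30),
    datetime(2019, 1, 15, 14, 5),
    datetime(2018, 11, 20, 8, 0),
]


def text_order(values):
    """Sort the formatted text labels themselves, as when sorting on a text column."""
    return sorted(v.strftime("%m/%d/%y") for v in values)


def datetime_order(values):
    """Sort by the real datetime, then format only the date part for the label."""
    return [v.strftime("%m/%d/%y") for v in sorted(values)]
```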
In the standalone report designer the setting is under Presentation Category > Coordinate Systems > cartesiancoordinatesystem1 > X Axis > Scale
In addition to changing the scale type, because the scale expression is now not just a string, you also need to set the X value on your line series.
This setting is under Presentation Category > Series > lineseries1 > X
For some unknown reason, the setting should not be "=Fields.DateSampledText" but "DateSampledText". The documentation is irritatingly bare of details like this.