I am very new to programming and have what I believe is a basic question I can't find the answer to.
I am trying to use this function; the statement is below:
SELECT
  *
FROM
  ML.DETECT_ANOMALIES(MODEL `mydataset.my_arima_plus_model`,
                      STRUCT(0.9 AS anomaly_prob_threshold),
                      TABLE `mydataset.my_time_series_data_table`)
It is from this link:
https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-detect-anomalies
My question is: how can I create the table for TABLE mydataset.my_time_series_data_table, and what should I put in the columns for headers and data?
mydataset.my_time_series_data_table is the table containing the data in which you want to detect anomalies. In the example that you linked, the timestamp column is date and the time series data column is temperature, as mentioned on that page:

.. and TIME_SERIES_TIMESTAMP_COL, and the TIME_SERIES_DATA_COL
column names are date and temperature ...

So, to answer

what should I put in the columns for headers and data?

your table should have at least two columns: date and temperature.
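As a minimal sketch, the table could be created and populated like this (the column names match the documentation example; the sample values are made up):

CREATE OR REPLACE TABLE mydataset.my_time_series_data_table (
  date TIMESTAMP,
  temperature FLOAT64
);

INSERT INTO mydataset.my_time_series_data_table (date, temperature)
VALUES
  (TIMESTAMP '2023-01-01', 61.5),
  (TIMESTAMP '2023-01-02', 63.2),
  (TIMESTAMP '2023-01-03', 59.8);

When you train the ARIMA_PLUS model, date goes in the TIME_SERIES_TIMESTAMP_COL option and temperature in the TIME_SERIES_DATA_COL option, and ML.DETECT_ANOMALIES then expects those same two columns in the table you pass it.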
I have a table ("Issues") which I am creating in PowerBI from a JIRA data connector, so this changes each time I refresh it. I have three columns I am using
Form Name
Effort
Status
I created a second table and have summarized the Form Names and obtained the Total Effort:
SUMMARIZE(Issues,Issues[Form Name],"Total Effort",SUM(Issues[Effort (Days)]))
But I also want to add in a column for
Total Effort for each form name where the Status field is "Done"
My issue is that I don't know how to compare both tables / form names since these might change each time I refresh the table.
I need to write a conditional, something like:

For each form name, print the total effort, and the total effort where the status is "Done".
I have tried SUMX, CALCULATE, SUM, FILTER but cannot get these to work - can someone help, please?
If all you need is to add a column to your summarized table that sums "Effort" only when the Status is set to 'Done', then this is the right place to use CALCULATE.
Table =
SUMMARIZE(
Issues,
Issues[Form Name],
"Total Effort", SUM(Issues[Effort]),
"Total Effort (Done)", CALCULATE(SUM(Issues[Effort]), Issues[Status] = "Done")
)
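Note that the boolean filter argument Issues[Status] = "Done" is shorthand for FILTER(ALL(Issues[Status]), Issues[Status] = "Done"); CALCULATE replaces any existing filter on [Status] with that condition, while the SUMMARIZE grouping on [Form Name] still applies.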
Here is a quick capture of the mock data I used to test this. The matrix is just the mock data with [Form Name] on the rows and [Status] on the columns. The last table shows the 'summarized' data calculated by the DAX above. You can compare it to the values in the matrix and see that they tie out.
I don't know if I even worded the question correctly, but I'm trying to create a measure that depends on what is showing in the pivot table (using Power Pivot).

In the image I posted, "DealMonth" is an expression in the Power Query table itself that simply takes the start date of the employee and subtracts it from the month a deal was closed in. That shows how long it took for that salesperson to close the deal. "TenureMonths" is also an expression in the Power Query table that calculates the tenure of the person. The values populating this screenshot come from a total headcount measure I created.

What I'm trying to do is create a separate measure that will show when "TenureMonths" is less than "DealMonth". So if TenureMonths is 5, then after a DealMonth of 5 the value would be 0. Is this possible?
Screenshot
I should add the following information.
"DealMonth" - Comes from the FactData table
"TenureMonths" - Comes from the DimSalesStart table
These two tables are joined by name. I feel like I'm so close because I can see what I want. The second image below is a copy/paste of the pivot table result, but with my edits to show what I'd want to see. Basically, IF(TenureMonths >= DealMonth, 1, 0). The trouble seems to be that since they're in two different tables, I can't make it work. The rows in the fact table are transactions, while the rows in the dim table are just the people with their start and end dates.
Desired Result
This is possible with something like IF([measure1] < [measure2], BLANK(), [measure1]), but without seeing more of the data it is hard to guide you specifically.

First you need to create two separate measures, one for TenureMonths and one for DealMonth. Depending on the data, this can be done with an aggregator formula such as SUM, MIN, or MAX (it depends on whether there will be more than one value).

Then reference those two measures in the formula pattern I mentioned above, and that should give you what you want.
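A rough sketch in DAX, using the table names from the question and assuming MAX is the right aggregator ([Headcount] stands in for your existing total headcount measure):

DealMonth Measure := MAX ( FactData[DealMonth] )

TenureMonths Measure := MAX ( DimSalesStart[TenureMonths] )

Headcount (Tenured) :=
IF ( [TenureMonths Measure] >= [DealMonth Measure], [Headcount], 0 )

Because the comparison happens through measures, the IF is evaluated in the filter context of each pivot table cell, which is what lets values from the two different tables be compared.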
I figured out a solution. I added a dimension table for DealMonth itself and joined it to my fact table. That allowed me to write the formulas I needed.
I have Table 1. It is filled with dates an inspection is going on: Plumbing or Garden inspections, for example.
Table 2 links to these appointments and has additional columns with details such as a Person assigned to the inspection, and what property the inspection is at. I need these two tables to be separate as described, and they are linked by a simple ID column.
Is it possible at all to add a column to Table 2 called 'Last Date of Plumbing Inspection'? The idea is that for any given Property in Table 2, there can be multiple inspection entries in Table 1. This column should look in Table 1, find the matching ID, find the latest inspection date out of all the Plumbing-related inspections, and then set the column value to that.
The problem I am having with this is it seems like calculated columns can ONLY implement logic using the columns of the table the calculated column was created in. In Table 2, I can't create a calculated column that interacts with Table 1 at all. I could create a look up column, but I can't combine calculated columns with look up columns. Is there a way to build this latest inspection date column without too much complexity?
Actually, you can create a rollup field and use a MAX aggregate function over the related table to achieve your requirement.
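As a sketch, the rollup field definition could look like this in Dataverse / Dynamics 365 (exact labels vary by version; table and column names are taken from the question):

Field type: Rollup (Date and Time)
Source table: Table 2
Related table: Table 1, via the existing ID relationship
Filter: Inspection Type equals Plumbing
Aggregation: MAX of Inspection Date

One caveat: rollup fields are recalculated asynchronously on a schedule rather than instantly, so the value can lag recent changes.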
The project I'm working on has monthly data for gas prices in California. The data is taken from a website and loaded into a table. I've done this part; the data is current through March 2016. We are now in April, which does not have any data yet, so the next step is to take March's data and place it into April.
Here is what my table looks like right now:
My question is: How do I add a new row with first column data of 201604 and use March's price?
Let me know if I need to add more information.
INSERT INTO GAS_PRICES (YYYYMM, GAS_PRICE) VALUES (201604, 2.68);

COMMIT;
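If you would rather not hardcode the price, a variant that copies March's value directly (a sketch, assuming YYYYMM is stored as a number):

INSERT INTO GAS_PRICES (YYYYMM, GAS_PRICE)
SELECT 201604, GAS_PRICE
FROM GAS_PRICES
WHERE YYYYMM = 201603;

COMMIT;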
I can't help thinking that your table structure is going to hurt later.

You don't appear to have a primary key, which helps with integrity and performance.
YYYYMM could be a key, but it's not clear whether you are storing it as a number or a string.
The use of YYYYMM as a column name might prove troublesome, as YYYY and MM are part of Oracle's date format syntax.
Your naming convention of a GAS_PRICES table and a GAS_PRICE column could cause confusion due to their similarity.
In Cassandra, a row can be very long and store units of time relevant data. For example, one row could look like the following:
RowKey: "weather"
name=2013-01-02:temperature, value=90,
name=2013-01-02:humidity, value=23,
name=2013-01-02:rain, value=false,
name=2013-01-03:temperature, value=91,
name=2013-01-03:humidity, value=24,
name=2013-01-03:rain, value=false,
name=2013-01-04:temperature, value=90,
name=2013-01-04:humidity, value=23,
name=2013-01-04:rain, value=false
9 columns of 3 days' weather info.
Time is part of the primary key in this row, so the row is ordered by time.
My question is: is there any way for me to do a query like "what is the last/first day's humidity value in this row"? I know I could use an ORDER BY statement in CQL, but since this row is already sorted by time, there should be some way to just get the first/last one directly instead of doing another sort. Or does Cassandra already optimize ORDER BY under the hood?
Another way I could think of is to store another column in this row called "last_time_stamp" that is updated every time new data is inserted. But that would require one more update every time I insert new weather data.
Thanks for any suggestion!:)
Without seeing more of your actual table, I suggest using a timestamp (or timeuuid if there is a possibility of collisions) as the second component of a compound primary key. With that, you can get the last "row" by selecting with ORDER BY t DESC LIMIT 1.

You could also change the clustering order in your schema so it is ordered naturally for "last N" queries.

Please see examples and the linked resource in this answer.
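As a minimal sketch of that approach in CQL (table and column names are illustrative):

CREATE TABLE weather (
    station text,
    t timestamp,
    temperature int,
    humidity int,
    rain boolean,
    PRIMARY KEY (station, t)
) WITH CLUSTERING ORDER BY (t DESC);

-- Last day's humidity: rows are stored newest-first, so no sort is needed.
SELECT humidity FROM weather WHERE station = 'weather' LIMIT 1;

-- First day's humidity: reverse the natural clustering order.
SELECT humidity FROM weather WHERE station = 'weather' ORDER BY t ASC LIMIT 1;

With the clustering order declared DESC, the "last N" query is just a sequential read from the front of the partition, and the reversed query still avoids a sort because Cassandra reads the partition in reverse order.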