Databricks truncate delta table restart identity 1 - azure-databricks

We have created a SQL notebook in Databricks and are trying to develop a one-time script.
We have to truncate and load the data every time, and the generated sequence ID should always start at 1. When we truncate and reload the data, the ID sequence continues from the last inserted value instead of restarting at 1.
How do we restart the value at 1 when inserting data after the TRUNCATE command?
The actual table was created as follows:
CREATE TABLE IF NOT EXISTS schema.tablename(
SerialNo BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1) COMMENT "SerialNo - auto generated sequence id",
adlsSourceMountPath STRING COMMENT "location",
adlsTargetMountPath STRING COMMENT "location",
tableSchemaName STRING COMMENT "tableSchemaName - adls target delta table schema name",
DescriptionDetails STRING COMMENT "DescriptionDetails",
CreateDate DATE COMMENT "CreateDate - generate current date",
Active BOOLEAN COMMENT "Active - check active status of connections"
)USING DELTA
LOCATION "location/path"
COMMENT "descri"
We tried the command below, but it throws an error.
TRUNCATE TABLE TABLENAME RESTART IDENTITY;
Please suggest.

There is no such thing as "restart identity" in the TRUNCATE command; you can check the documentation for it.
If you just have a normal Delta Lake table, you can either restore it to version 0 or use "create or replace table", as described in the following answers.
If you're using Delta Live Tables (judging by the tags), you may need to perform a refresh, which should reset the identity column as well.
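As a rough sketch of the "create or replace table" route, the snippet below only builds the statement text in Python. The helper name build_reset_statement and the abbreviated column list are assumptions made for illustration, and the LOCATION and COMMENT clauses from the question are left out. In a Databricks notebook the resulting string would typically be passed to spark.sql, which is not done here so the example runs anywhere.

```python
def build_reset_statement(table_name: str, column_ddl: str) -> str:
    """Return a CREATE OR REPLACE TABLE statement for a Delta table.

    Replacing the table definition (instead of truncating the table) is
    the approach the answer suggests for restarting the identity column.
    """
    return (
        f"CREATE OR REPLACE TABLE {table_name} (\n"
        f"{column_ddl}\n"
        ") USING DELTA"
    )


# Column list abbreviated from the question's table definition (assumption).
columns = (
    "  SerialNo BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),\n"
    "  adlsSourceMountPath STRING,\n"
    "  Active BOOLEAN"
)
statement = build_reset_statement("schema.tablename", columns)
print(statement)
# In a Databricks notebook one would then run: spark.sql(statement)
```

Keeping the column definitions in one place means the same text can be reused each time the table is rebuilt before a load.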

Related

Snowflake on update

MySQL has an "on update" feature e.g.
CREATE TABLE t1 (
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
dt DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
I need similar behavior in Snowflake, where I can update a column, say "lastupdated", every time there is an update on the row.
Is this possible in Snowflake?
Check out the "Snowflake Stream" option. You can create a stream on top of your table, and the stream will have a couple of columns that give you exactly what you're looking for.
Unfortunately, it's not a very well explored feature.
In other database implementations this is achieved through triggers.
Snowflake doesn't support triggers.
I wonder if you could create a stored procedure in Snowflake to accomplish what you are trying to do.
If you are trying to update a row with a timestamp, you could also just update the field in your copy or replace statement.
Similarly done this way: https://community.snowflake.com/s/question/0D50Z00006uSiEKSA0/syntax-for-adding-a-column-with-currenttimestamp-default-constraint
Example 1:
UPDATE <target_table>
SET Lastupdate = current_timestamp()
[ FROM <additional_tables> ]
[ WHERE <condition> ]
Example 2:
create or replace table x(i int, t timestamp default current_timestamp());
insert into x(i) values(1);
borrowed from this link
As noted above, triggers are not supported, so you'll have to do this explicitly in SQL. Note that your process should also handle data in batches of some kind; if you try to do anything a single record at a time in Snowflake, at least for any real volume, you're going to have a bad time.
That's a pretty nice feature request. I've been using MS SQL Server for years... any "updated" columns were either done in the code or, as already indicated, using triggers.
I checked the snowflake docs and found this reference, which only applies to INSERTs and CTAS:
DEFAULT ... or AUTOINCREMENT ...
Specifies whether a default value is automatically inserted in the column if a value is not explicitly specified via an **INSERT or CREATE TABLE AS SELECT** statement:
https://docs.snowflake.net/manuals/sql-reference/sql/create-table.html
you can do something like this:
CREATE or REPLACE TABLE t1 (
ts TIMESTAMP_LTZ(9) as CURRENT_TIMESTAMP,
dt DATE as CURRENT_DATE,
NAME VARCHAR(200)
);
insert into t1 (NAME) VALUES ('Jerry Smith');
insert into t1 (NAME) VALUES ('Gazorpazorp Smith');
select * from t1;
Note that this just means your values change every time you select from the table.
You can use a combination of streams, external or internal stages, and eventing to record the DML (CRUD) changes. This combination is actually very elegant because your simulated triggers can trigger external events.
1) Create a stream
create stream supplierStream on table SupplierTable before(statement => '<yourGUID statementID>');
2) Configure your Event Grid topic if using Azure. Let's say your topic name is "SupplierTopic".
MS event grid
3) Create your notification integration
CREATE NOTIFICATION INTEGRATION supplierIntegration
ENABLED = true
TYPE = QUEUE
NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
4) Create your stage
create or replace stage supplierStage
url='azure://your account container ID'
storage_integration = SupplierIntegration;
5) Consume the Event Grid event in a server or serverless system.
Have you tried MERGE combined with UPDATE?
https://docs.snowflake.com/en/sql-reference/sql/merge.html

Informatica Transaction Control Transformation

I have to develop an Informatica process that loads data from a flat file into the target (simple truncate & load), but the catch is this:
If the number of rejected rows is greater than 100, the process should stop, i.e. the session should fail & the data in the target must be rolled back to what it was originally before load.
I think the TC Transformation might be useful here , but am not sure of how to use this. It would be great if I could get some help on this.
Thanks !
You can't use truncate in such a scenario; it's irreversible. Try loading the data into a temporary table first (with the Truncate table option enabled). Create a second session that will execute a set of SQL commands like
truncate table YourTable;
insert into YourTable select * from YourTempTable;
Link the two with a condition like $yourTempTableSession.TgtFailedRows<=100.
To meet the second requirement (i.e. to fail the workflow), add a Control task and set it to Abort top level workflow. Add a link from the temp table session load with a condition like $yourTempTableSession.TgtFailedRows>100.
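The staging-then-swap idea above can be sketched outside Informatica. The snippet below is a minimal illustration using Python's built-in sqlite3 module as a stand-in database; the table names target and staging, the two-column layout, and the function load_with_reject_limit are all assumptions made for the example, not part of any Informatica API. Rows that violate a constraint are counted as rejected, and the target is only replaced when the count stays within the limit.

```python
import sqlite3


def load_with_reject_limit(conn, rows, reject_limit=100):
    """Stage rows, then replace the target only if rejects stay within the limit.

    Returns the number of rejected rows. Raises RuntimeError and leaves the
    target untouched when the limit is exceeded.
    """
    conn.execute("DELETE FROM staging")
    rejected = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO staging (id, name) VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            rejected += 1  # the row violated a constraint, so count it as rejected
    if rejected > reject_limit:
        conn.rollback()  # discard the staged rows; the target was never touched
        raise RuntimeError(f"{rejected} rows rejected, target left unchanged")
    # Swap step: runs in the same transaction, so it is all-or-nothing.
    conn.execute("DELETE FROM target")
    conn.execute("INSERT INTO target SELECT * FROM staging")
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO target VALUES (1, 'original')")
conn.commit()

# One row violates NOT NULL and is rejected; the limit is not exceeded.
rejected = load_with_reject_limit(conn, [(1, "a"), (2, "b"), (3, None)])
print(rejected)
```

Because the target is only modified after the reject count is known, there is nothing to roll back in the target when the load fails.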

SQLite: how to enable counting number of rows modified from trigger

Is there any way to enable counting of the rows that a trigger modified in SQLite?
I know it is disabled (https://www.sqlite.org/c3ref/changes.html) and I understand why, but can I enable it somehow?
CREATE TABLE Users_data (
Id INTEGER PRIMARY KEY AUTOINCREMENT,
Deleted BOOLEAN DEFAULT (0),
Name STRING
);
CREATE VIEW Users AS
SELECT Id, Name
FROM Users_data
WHERE Deleted = 0;
CREATE TRIGGER UsersDelete2UsersData
INSTEAD OF DELETE
ON Users
FOR EACH ROW
BEGIN
UPDATE Users_data SET Deleted = 1 WHERE Id = OLD.Id;
END;
-- etc for insert & update
Then delete from Users where Name like 'foo' /* doesn't even need 'Id = 1' */; works fine, but the number of modified rows is, as the documentation says, always zero.
(I can't modify my DAL to automatically add "where Deleted = 0", so the backup plan is to have a Users_deleted table and an 'on delete' trigger on the Users table without any view, but then I have to keep track of FKs (for example, what to do when someone deletes from an FK table) and so on...)
Edit: The returned number is used for database concurrency checks.
Edit2: To be clearer: as I said, I cannot modify my DAL (Entity Framework 6), so the preferred answer should work with the following pseudocode: int affectedRow = query("delete from Users where Name like 'foo';").Execute();
It's all about SQLite "trigger on view" behavior.
Use sqlite3_total_changes() instead:
This function returns the total number of rows inserted, modified or deleted by all INSERT, UPDATE or DELETE statements completed since the database connection was opened, including those executed as part of trigger programs.
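A minimal sketch of that suggestion, using Python's built-in sqlite3 module, whose Connection.total_changes attribute exposes sqlite3_total_changes(). The schema is the one from the question; the two sample rows and the variable names are assumptions for illustration. The counter is read before and after the DELETE on the view, and the difference is taken as the number of rows changed by the trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users_data (
    Id INTEGER PRIMARY KEY AUTOINCREMENT,
    Deleted BOOLEAN DEFAULT (0),
    Name STRING
);
CREATE VIEW Users AS
    SELECT Id, Name FROM Users_data WHERE Deleted = 0;
CREATE TRIGGER UsersDelete2UsersData
INSTEAD OF DELETE ON Users
FOR EACH ROW
BEGIN
    UPDATE Users_data SET Deleted = 1 WHERE Id = OLD.Id;
END;
INSERT INTO Users_data (Name) VALUES ('foo'), ('bar');
""")

before = conn.total_changes
conn.execute("DELETE FROM Users WHERE Name LIKE 'foo'")
# Difference in the connection-wide counter, which includes the trigger's UPDATE.
affected = conn.total_changes - before
conn.commit()
print(affected)
```

This only helps when the calling code can read the counter itself; it does not change what a data access layer reports as the affected-row count.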
It's impossible in sqlite3 (as of 2015).
Basically, I was looking for an INSTEAD OF trigger on a view (as in the question) that can return a value, which is not supported in SQLite.
By the way, PostgreSQL (and I believe some other full database servers) can do it.

Operation must be an updateable query - VB Script, Paradox Table

I'm using a Win XP box with BDE Administrator and Access 2007 installed. I'm able to open and perform select queries on existing Paradox tables without problem but have some very strange behavior when attempting INSERT/UPDATE. I can even create a new Paradox table and it has the same behavior. Here is sample code:
' create new table
conObj.Execute "CREATE TABLE test (id INT, comment VARCHAR(30))"
' first insert works fine
conObj.Execute "INSERT INTO test VALUES (1, 'something')"
' second insert fails for unknown reason
conObj.Execute "INSERT INTO test VALUES (2, 'something else')"
I've tried using Jet 4.0, MS Access Paradox driver, and native Paradox driver connection strings but all yield the same result. On the second insert statement it throws an error:
Operation must be an updateable query
I've read numerous posts in forums and pages on help sites that tell me this error is caused by a file permissions issue. The account running this script is part of the Administrator group and I've changed file permissions to allow the Everyone group Full Control of the db file but this changes nothing.
This page put out by Microsoft Support did not fix the problem: http://support.microsoft.com/kb/175168
Additionally, I can create a new table but any time I try to create a PRIMARY KEY or UNIQUE field I get an error message that says:
"Index_[random characters] is not a valid name."
try
"CREATE TABLE test (id INT, comment VARCHAR(30), primary key(id))"
I don't know much about Paradox databases but this has indeed been a learning experience. Even though I have a table file called table.db that's not enough to store more than a single row of data. I also need several other files to insert or update a paradox database:
table.DB
table.PX
table.VAL
table.XG0
table.XG1
table.YG0
table.YG1
I was nosing around in another program that generates paradox databases and found when I copied a blank database from it along with these other files it generated I was able to insert and update without problems. I have no idea what these files are for or why they need to be present to insert or update but having them present fixed my issue.

DB2 duplicate key error when inserting, BUT working after select count(*)

I have a - for me unknown - issue and I don't know what's the logic/cause behind it. When I try to insert a record in a table I get a DB2 error saying:
[SQL0803] Duplicate key value specified: A unique index or unique constraint *N in *N
exists over one or more columns of table TABLEXXX in SCHEMAYYY. The operation cannot
be performed because one or more values would have produced a duplicate key in
the unique index or constraint.
Which is a quite clear message to me. But actually there would be no duplicate key if I inserted my new record seeing what records are already in there. When I do a SELECT COUNT(*) from SCHEMAYYY.TABLEXXX and then try to insert the record it works flawlessly.
How can it be that when performing the SELECT COUNT(*) I can suddenly insert the records? Is there some sort of index associated with it which might give issues because it is out of sync? I didn't design the data model, so I don't have deep knowledge of the system yet.
The original DB2 SQL is:
-- Generate SQL
-- Version: V6R1M0 080215
-- Generated on: 19/12/12 10:28:39
-- Relational Database: S656C89D
-- Standards Option: DB2 for i
CREATE TABLE TZVDB.PRODUCTCOSTS (
ID INTEGER GENERATED BY DEFAULT AS IDENTITY (
START WITH 1 INCREMENT BY 1
MINVALUE 1 MAXVALUE 2147483647
NO CYCLE NO ORDER
CACHE 20 )
,
PRODUCT_ID INTEGER DEFAULT NULL ,
STARTPRICE DECIMAL(7, 2) DEFAULT NULL ,
FROMDATE TIMESTAMP DEFAULT NULL ,
TILLDATE TIMESTAMP DEFAULT NULL ,
CONSTRAINT TZVDB.PRODUCTCOSTS_PK PRIMARY KEY( ID ) ) ;
ALTER TABLE TZVDB.PRODUCTCOSTS
ADD CONSTRAINT TZVDB.PRODCSTS_PRDCT_FK
FOREIGN KEY( PRODUCT_ID )
REFERENCES TZVDB.PRODUCT ( ID )
ON DELETE RESTRICT
ON UPDATE NO ACTION;
I'd like to see the statements, but since this question is a year old, I won't hold my breath.
I'm thinking the problem may be the
GENERATED BY DEFAULT
And instead of passing NULL for the identity column, you're accidentally passing zero or some other duplicate value the first time around.
Either always pass NULL, pass a non-duplicate value or switch to GENERATED ALWAYS
Look at preceding messages in the joblog for specifics as to what caused this. I don't understand how the INSERT can suddenly work after the COUNT(*). Please let us know what you find.
Since it shows *N (i.e. n/a) as the name of the index or constraint, this suggests to me that it is not a standard DB2 object, and therefore may be a "logical file" [LF] defined with DDS rather than SQL, with a key structure different from the one you were doing your COUNT(*) on.
Your shop may have better tools to view keys on dependent files, but the method below will work anywhere.
If your table might not be the actual "physical file", check this using Display File Description, DSPFD TZVDB.PRODUCTCOSTS, in a 5250 ("green screen") session.
Use the Display Database Relations command, DSPDBR TZVDB.PRODUCTCOSTS, to find what files are defined over your table. You can then DSPFD on each of these files to see the definition of the index key. Also check there that each of these indexes is maintained *IMMED, rather than *REBUILD or *DELAY. (A wild longshot guess as to a remotely possible cause of your strange anomaly.)
You will find the DB2 for i message finder here in the IBM i 7.1 Information Center or other releases
Is it a paging issue? We seem to get -0803 on inserts occasionally when a row is being held for update and it locks a page that probably contains the index needed for the insert. This is only a guess, but that is what appears to be happening.
I know it is an old topic, but this is what Google showed me in first place.
I had the same issue yesterday, causing me a lot of headache. I did the same as above, checked the table definitions, keys, existing items...
Then I found out the problem was with my INSERT statement. It was trying to insert two identical records at once, but as the constraint prevented the commit, I could not find anything in the database.
Advice: review your INSERT statement carefully! :)
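One way to do that review before the statement ever reaches the database is to check the batch itself for repeated key values. The helper below is a small sketch of that idea in Python; the function name find_duplicate_keys and the sample batch are hypothetical, and only the column names are taken from the PRODUCTCOSTS table above.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once within the batch."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical batch; column names follow the PRODUCTCOSTS table above.
batch = [
    {"ID": 1, "PRODUCT_ID": 10, "STARTPRICE": "9.99"},
    {"ID": 2, "PRODUCT_ID": 11, "STARTPRICE": "4.50"},
    {"ID": 1, "PRODUCT_ID": 10, "STARTPRICE": "9.99"},  # same key as the first row
]
duplicates = find_duplicate_keys(batch, ["ID"])
print(duplicates)
```

A non-empty result means the batch would violate the unique key on its own, even though no conflicting row exists in the table yet.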

Resources