Truncate Oracle table using Spark

My first question here!
I'm learning Spark, and so far it is awesome. I'm currently writing some DataFrames to Oracle using DF.write.mode("append").jdbc.
Now I need to truncate the table, since I don't want to append. If I use "overwrite" mode, Spark will drop the table and create a new one, but then I'll have to re-GRANT access to its users. Not good.
Can I do something like a TRUNCATE in Oracle using Spark SQL? Open to suggestions! Thanks for your time.

There is an option that makes Spark truncate the target Oracle table instead of dropping it. The syntax is shown in https://github.com/apache/spark/pull/14086:
spark.range(10).write.mode("overwrite").option("truncate", true).jdbc(url, "table_with_index", prop)
Depending on the versions of Spark, Oracle and the JDBC driver, there are other parameters you can use, such as cascadeTruncate to make the truncate cascade; see https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
In my experience, this works on only some database engines and depends heavily on the JDBC driver you use, because not all of them support it.
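For PySpark users, the same write could look roughly like the sketch below. The helper name overwrite_with_truncate and its parameters are made up for illustration; only mode, option and jdbc are DataFrameWriter methods. Whether the table is really truncated rather than dropped still depends on the Spark version and JDBC dialect.

```python
def overwrite_with_truncate(df, url, table, properties):
    """Overwrite an existing JDBC table, asking Spark to TRUNCATE it
    instead of dropping and recreating it.

    df         -- a Spark DataFrame (anything exposing the DataFrameWriter API)
    url        -- JDBC URL of the target database
    table      -- name of an existing target table
    properties -- dict of JDBC connection properties (user, password, driver, ...)

    Hypothetical helper for illustration only.
    """
    (
        df.write
        .mode("overwrite")           # replace the current contents ...
        .option("truncate", "true")  # ... but keep the table, and so its grants
        .jdbc(url, table, properties=properties)
    )
```

It would be called as overwrite_with_truncate(spark.range(10), url, "table_with_index", prop), reusing the names from the Scala line above.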
Hope this helps

Related

Spark JDBC direct path inserts

When writing data from Hive to Oracle using (Py)Spark JDBC connector, I run into problems with the buffer cache on Oracle.
So my question is whether there is a way to bypass the Oracle buffer cache by using direct path inserts (as suggested here https://renenyffenegger.ch/notes/development/databases/Oracle/architecture/instance/SGA/database-buffer-cache/index).
I was wondering if I can just use the sessionInitStatement option as described in the docs. https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
Something along the lines of option("sessionInitStatement", """BEGIN execute immediate 'alter session set "<DIRECT PATH INSERT PARAMETER?>"=true'; END;""").
Another approach I'm wondering about is using Spark SQL to insert into Oracle, as described in the answer to "what's SparkSQL SQL query to write into JDBC table?", and then specifying the /*+ APPEND */ hint for a direct path insert.
Does anyone have experience with this problem?

Oracle Table Queried or Modified Date

I’ve been tasked with doing some housekeeping on an Oracle schema I have access to. In a nutshell, I’d like to drop any tables that have not been ‘used’ in the last 3 months (tables that haven’t been queried or had data manipulated in the last 3 months). I have read/write access to the schema but I’m not a DBA; I run relatively basic DML/DDL queries in Oracle.
I’m trying to figure out if there’s a way for me to identify old/redundant tables. Here’s what I’ve tried (mostly unsuccessfully):
USER_TABLES was my first port of call, but the LAST_ANALYZED date in this view doesn’t seem to be the last modified/queried date I’m looking for.
Googling has brought the DBA_HIST views to my attention. I’ve tried querying some of these (e.g. DBA_HIST_SYSSTAT), but I’m confronted with ORA-00942: table or view does not exist.
I’ve also tried querying V$SESSION_WAIT, V$ACTIVE_SESSION_HISTORY and V$SEGMENT_STATISTICS, but I get the same ORA-00942 error.
I’d be grateful for any advice about whether the options above actually offer the sort of information I need about tables, and if so what I can do to work around the errors I’m getting. Alternatively, are there any other options that I could explore?

Probably the easiest thing to do, to be 100% sure, is to enable auditing on the Oracle tables that you're interested in (possibly all of them). Once enabled, Oracle has an audit table (dba_audit_trail) that you can query to find out whether the table(s) have been accessed. You can enable auditing by issuing: AUDIT <operation> ON <schema>.<table> BY SESSION;
I chose "by session" so that you only get a single record per session, no matter how many times the session performs the operation (to minimize the records in the audit table).
Example:
audit select on bob.inventory by session;
Then you can query the dba_audit_trail after some time passes to see if any records show up for that table.
You can disable auditing by issuing the "noaudit" command.
Hope that helps.
-Jim

What is the PostgreSQL equivalent of dbms_stats.gather_table_stats in Oracle?

I have a question about Postgres. I have used dbms_stats.gather_table_stats for performance optimization in Oracle. I would like to switch our database from Oracle to Postgres, so I want to get the same feature on Postgres. I searched the internet for a Postgres equivalent of Oracle's dbms_stats.gather_table_stats. All I found was EXPLAIN, VACUUM and the like, which I think already exist in Oracle under the same names, but I can't find a proper counterpart for dbms_stats.gather_table_stats. I am spending a lot of time on this, so any advice would be appreciated.

The GATHER_TABLE_STATS procedure of the DBMS_STATS package collects statistics for the specified table in Oracle.
In Postgres, we use ANALYZE for the same purpose.
ANALYZE collects statistics about the contents of tables in the database, and stores the results in the pg_statistic system catalog. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.

How to convert CONNECT BY in Greenplum

Can anyone suggest how to convert an Oracle CONNECT BY query to Greenplum? Greenplum doesn't support recursive queries, so we cannot use WITH RECURSIVE. Is there an alternative way to rewrite the query below?
SELECT child_id, Parnet_id, LEVEL , SYS_CONNECT_BY_PATH (child_id,'/') as HIERARCHY
FROM pathnode
START WITH Parnet_id = child_id
CONNECT BY NOCYCLE PRIOR child_id = Parnet_id;

There are ways to do this but it will be a one-off per query. You will need to create a function that loops through your pathnode table and "return next" to return each row. You can search on this site to find examples of doing this with PostgreSQL 8.2.
Work is happening to rebase Greenplum to PostgreSQL 8.3, 8.4, and so on. Those later PostgreSQL versions support "with recursive" which is the ANSI SQL way to write your SQL but Greenplum doesn't support it yet. When it does get supported by Greenplum, I don't think it will perform all that well. The query will force looping and individual row lookups. This works great in an OLTP database but not so well for an MPP database.
I suggest you transform your data in Oracle with a VIEW and then just dump the view to a file to load into Greenplum. A self-referencing, N-level table design will never be a good idea in an MPP database.
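To make the looping approach concrete, here is a rough sketch of the traversal such a function would have to perform, written in plain Python rather than as a database function. The name connect_by_paths and the tuple layout are assumptions for illustration, and the cycle check only approximates Oracle's NOCYCLE behaviour.

```python
def connect_by_paths(rows):
    """Emulate the CONNECT BY query from the question in plain Python.

    rows -- iterable of (child_id, parent_id) tuples from the pathnode table
    Returns (child_id, parent_id, level, hierarchy) tuples in depth-first order.
    Hypothetical helper for illustration only.
    """
    rows = list(rows)

    # Index children by parent so each step is a dictionary lookup.
    children = {}
    for child_id, parent_id in rows:
        children.setdefault(parent_id, []).append(child_id)

    result = []
    # START WITH Parnet_id = child_id: roots are the self-referencing rows.
    roots = [(c, p) for c, p in rows if c == p]
    for root_id, root_parent in roots:
        # Each stack entry: (id, parent, level, path so far, ids on that path).
        stack = [(root_id, root_parent, 1, f"/{root_id}", frozenset([root_id]))]
        while stack:
            node_id, parent_id, level, path, seen = stack.pop()
            result.append((node_id, parent_id, level, path))
            # Push in reverse so children are visited in their original order.
            for next_id in reversed(children.get(node_id, [])):
                if next_id in seen:  # NOCYCLE: never revisit an id on this path
                    continue
                stack.append(
                    (next_id, node_id, level + 1, f"{path}/{next_id}", seen | {next_id})
                )
    return result
```

For the rows (1, 1), (2, 1), (3, 2), (4, 1) this yields the paths /1, /1/2, /1/2/3 and /1/4 at levels 1, 2, 3 and 2. Each step is an individual lookup, which is exactly the row-at-a-time access pattern that suits OLTP better than MPP.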

Oracle to flat file

I need to create a flat file and push information into it from an Oracle database using JSP.
I need some sample code. Any help will be appreciated.

If you're looking for an easy way to write different SQL statements to a file, use this procedure: http://www.oracle-developer.net/content/utilities/data_dump.sql
Also you might want to look into DBMS_XSLPROCESSOR.CLOB2FILE.

I think you need to look into Oracle external tables. These are flat files that appear as tables in the Oracle database. You would simply insert data into it using SQL (as per any other database table). Google "Oracle External Tables" for more information.
