I am completely new to Python and pandas. I want to load some tables and SQL queries from Oracle and Teradata into pandas DataFrames and analyse them.
I know we have to create connection strings for Oracle and Teradata in pandas. Can you please suggest them, and also add sample code to read both a table and a SQL query with each?
Thanks in advance
I don't have an Oracle server, so I'll take Teradata as the example. This is not the only way to do it, just one approach:
Make sure you have installed the Teradata ODBC driver; refer to the Teradata official website for the steps. I assume you are on Windows (since it is easy to use SQL Assistant to run queries against Teradata, and that tool is Windows-only). You can verify the driver in the ODBC Data Source Administrator.
Install pyodbc with the command pip install pyodbc (see the pyodbc official website for details).
The connection string is: db_conn_str = "DRIVER=Teradata;DBCNAME={url};UID={username};PWD={pwd}"
Get a connection object: conn = pyodbc.connect(db_conn_str)
Read the result of a SQL query into a DataFrame: df = pd.read_sql(sql="select * from tb", con=conn)
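Putting these steps together, here is a minimal end-to-end sketch; the host, credentials and table names are placeholders:

import pandas as pd
import pyodbc  # pip install pyodbc

# Placeholders -- substitute your own Teradata host and credentials
db_conn_str = "DRIVER=Teradata;DBCNAME=tdhost.example.com;UID=myuser;PWD=mypassword"
conn = pyodbc.connect(db_conn_str)

# Read a whole table ...
df_table = pd.read_sql(sql="select * from my_db.my_table", con=conn)

# ... or the result of an arbitrary SQL query
df_query = pd.read_sql(sql="select col1, count(*) as cnt from my_db.my_table group by col1", con=conn)

conn.close()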
It is similar for Oracle: you need the driver and the matching ODBC connection string format. I know Teradata also ships a Python module that supports connections, but I prefer ODBC as it is more general-purpose.
Here is an Oracle example:
import pandas as pd
import cx_Oracle # pip install cx_Oracle
from sqlalchemy import create_engine

engine = create_engine('oracle+cx_oracle://scott:tiger@host:1521/?service_name=hr')
df = pd.read_sql('select * from table_name', engine)
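pandas can also pass bind parameters through to the driver, which avoids interpolating values into the SQL string. A minimal sketch using a plain cx_Oracle connection (the credentials, table and column names are made up):

import pandas as pd
import cx_Oracle

conn = cx_Oracle.connect('scott', 'tiger', 'host:1521/hr')  # placeholder credentials
df = pd.read_sql('select * from employees where department_id = :dept', con=conn, params={'dept': 10})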
One way to query an Oracle DB is with a function like this one:
import pandas as pd
import cx_Oracle
def query(sql: str) -> pd.DataFrame:
    # username, password and database are assumed to be defined elsewhere
    try:
        with cx_Oracle.connect(username, password, database, encoding='UTF-8') as connection:
            dataframe = pd.read_sql(sql, con=connection)
            return dataframe
    except cx_Oracle.Error as error:
        print(error)
    finally:
        print("Fetch end")
Here, sql is the query you want to run. Since it's a string, it can contain line breaks (useful when reading the query from a .sql file), e.g.:
"SELECT * FROM TABLE\nWHERE <condition>\nGROUP BY <COL_NAME>"
or anything else you need; it could also be an f-string if you are using variables.
The function returns a pandas DataFrame with the results of the SQL string, and it keeps the column names from the query.
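For example, assuming the connection variables are defined, a hypothetical call (the placeholder credentials, owner value and f-string query are made up for illustration):

username, password, database = 'scott', 'tiger', 'host:1521/orclpdb1'  # placeholders
owner = 'HR'
df = query(f"SELECT table_name FROM all_tables WHERE owner = '{owner}'")
print(df.columns.tolist())  # column names from the query are preserved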
I have a problem with the Delta Lake docs. I know that I can query a Delta table with Presto, Hive, Spark SQL and other tools, but the Delta documentation says "You can load a Delta table as a DataFrame by specifying a table name or a path", which isn't clear to me. How can I run a SQL query like that?
To read data from tables in Delta Lake, it is possible to use the Java API, or Python without Apache Spark; see the details at:
https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html
Here is how to use it with pandas:
pip3 install deltalake
python3
from deltalake import DeltaTable
table_path = "/opt/data/delta/my-table" # path to the Delta table (local dir or object store URI)
# now using Pandas
df = DeltaTable(table_path).to_pandas()
df
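Once the table is in pandas you don't run SQL as such, but equivalent filtering can be expressed with DataFrame operations; a sketch with hypothetical column names:

# SQL equivalent: select id, amount from my-table where amount > 100
subset = df.loc[df["amount"] > 100, ["id", "amount"]]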
Alternatively, with Spark you can use the spark.sql() function and query the table by path:
spark.sql("select * from delta.`hdfs://192.168.2.131:9000/Delta_Table/test001`").show()
I'm trying to export an R data frame to an Oracle database.
I read this post: how to export data frame (R) into Oracle table. In short,
dbWriteTable(jdbcConnection, "TABLE_NAME", data.frame.name, rownames=FALSE, overwrite = TRUE, append = FALSE)
However, I do not know what 'jdbcConnection' is or how to declare it.
BTW, I am able to connect to Oracle from RStudio using the RODBC package.
The accepted answer in that link cites the RJDBC package, which connects to the SQL database using the Java JDBC driver, in this case, the driver for Oracle. If you poke around the documentation, you will find some boilerplate code for how to do this:
drv <- JDBC("oracle.jdbc.driver.OracleDriver", "/path/to/ojdbc6.jar", " ")
conn <- dbConnect(drv, "jdbc:oracle:thin:@localhost:1521:orclt")
dbWriteTable(conn, "TABLE_NAME", data.frame.name, rownames=FALSE, overwrite = TRUE, append = FALSE)
Note that to make the above work, you will need locally the ojdbc6.jar JAR file for the Oracle JDBC driver. You may download this from the Oracle site directly if you don't already have it. The second parameter being used above in the call to dbConnect is the JDBC url for your Oracle instance. Refer to any number of posts on Stack Overflow to learn how to form the appropriate URL for your Oracle instance.
Here's another example based on this doc:
# Load RJDBC library
library(RJDBC)
# Create connection driver and open connection
jdbcDriver <- JDBC(driverClass="oracle.jdbc.OracleDriver", classPath="lib/ojdbc6.jar")
jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@//database.hostname.com:port/service_name_or_sid", "username", "password")
# Write to table
dbWriteTable(jdbcConnection,"TABLE_NAME",data.frame.name, rownames=FALSE, overwrite = TRUE, append = FALSE)
With a MySQL or MSSQL connection, I can get the column names like this:
>>> cursor.execute('select * from table')
>>> [item[0] for item in cursor.description]
[u'provider', u'title', u'date', u'apple_id', u'country', u'genre', u'sales_in_usd']
How would I get the column names from an Oracle cursor?
The code you have above works just fine with cx_Oracle (the driver that enables access to Oracle databases) since it follows the Python Database API!
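A minimal sketch with cx_Oracle (the connection details and table name are placeholders):

import cx_Oracle  # pip install cx_Oracle

connection = cx_Oracle.connect('scott', 'tiger', 'host:1521/orclpdb1')
cursor = connection.cursor()
cursor.execute('select * from my_table')

# Per the Python DB API, cursor.description is a sequence of 7-item tuples;
# the first item of each tuple is the column name
column_names = [item[0] for item in cursor.description]
print(column_names)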
I'm attempting to reverse engineer an existing Oracle schema into some declarative SQLAlchemy models. My problem is that when I use MetaData.reflect, it doesn't find the tables in my schema, just a Global Temp Table. However, I can still query against the other tables.
I'm using SQLAlchemy 0.7.8, CentOS 6.2 x86_64, python 2.6, cx_Oracle 5.1.2 and Oracle 11.2.0.2 Express Edition. Here's a quick sample of what I'm talking about:
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('oracle+cx_oracle://user:pass@localhost/xe')
>>> md = sqlalchemy.MetaData(bind=engine)
>>> md.reflect()
>>> md.tables
immutabledict({u'my_gtt': Table(u'my_gtt', MetaData(bind=Engine(oracle+cx_oracle://user:pass@localhost/xe)), Column(u'id', NUMBER(precision=15, scale=0, asdecimal=False), table=<my_gtt>), Column(u'parent_id', NUMBER(precision=15, scale=0, asdecimal=False), table=<my_gtt>), Column(u'query_id', NUMBER(precision=15, scale=0, asdecimal=False), table=<my_gtt>), schema=None)})
>>> len(engine.execute('select * from my_regular_table').fetchall())
4
Thanks to some quick help from @zzzeek I discovered (by using the echo='debug' argument to create_engine) that my problem was caused by the tables being owned by an old user, even though the current user could access them from the default schema without requiring any explicit synonyms.
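In that situation, passing the owning schema explicitly to reflect() picks the tables up; a sketch where 'old_user' stands in for the actual owner:

md = sqlalchemy.MetaData(bind=engine)
md.reflect(schema='old_user')
print(md.tables.keys())  # tables are keyed as 'old_user.<table_name>'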
Getting ready to clean up some old tables which are no longer in use, but I would like to be able to archive the contents before removing them from the database.
Is it possible to export the contents of a table to a file? Ideally, one file per table.
You can use Oracle's export tool: exp
Edit:
exp name/pwd@dbname file=filename.dmp tables=tablename rows=y indexes=n triggers=n grants=n
You can easily do it using Python and the cx_Oracle module; the script will extract the data to disk in CSV format.
Here's how you connect to Oracle using Python/cx_Oracle:
import cx_Oracle

constr = 'scott/tiger@localhost:1521/ORCL12'
con = cx_Oracle.connect(constr)
cur = con.cursor()
After fetching the data, you can loop through the rows and save them in CSV format:
for i, chunk in enumerate(chunks(cur)):
    # join each row's columns with the delimiter, one row per line
    f_out.write('\n'.join(column_delimiter.join(str(col) for col in row) for row in chunk))
    f_out.write('\n')
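For completeness, here is a self-contained sketch of the whole approach; the chunks helper, delimiter, file name and table name are my assumptions, not part of the original snippet:

import cx_Oracle

def chunks(cursor, size=10000):
    # Yield batches of rows so large tables are not held in memory at once
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield rows

constr = 'scott/tiger@localhost:1521/ORCL12'  # placeholder connect string
con = cx_Oracle.connect(constr)
cur = con.cursor()
cur.execute('select * from my_table')

column_delimiter = ','
with open('my_table.csv', 'w') as f_out:
    # Header line from the cursor metadata
    f_out.write(column_delimiter.join(d[0] for d in cur.description) + '\n')
    for chunk in chunks(cur):
        f_out.write('\n'.join(column_delimiter.join(str(col) for col in row) for row in chunk))
        f_out.write('\n')

cur.close()
con.close()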
I used this approach when I wrote TableHunter-For-Oracle