I'm attempting to reverse engineer an existing Oracle schema into some declarative SQLAlchemy models. My problem is that when I use MetaData.reflect, it doesn't find the tables in my schema, just a Global Temp Table. However, I can still query against the other tables.
I'm using SQLAlchemy 0.7.8, CentOS 6.2 x86_64, python 2.6, cx_Oracle 5.1.2 and Oracle 11.2.0.2 Express Edition. Here's a quick sample of what I'm talking about:
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('oracle+cx_oracle://user:pass@localhost/xe')
>>> md = sqlalchemy.MetaData(bind=engine)
>>> md.reflect()
>>> md.tables
immutabledict({u'my_gtt': Table(u'my_gtt', MetaData(bind=Engine(oracle+cx_oracle://user:pass@localhost/xe)), Column(u'id', NUMBER(precision=15, scale=0, asdecimal=False), table=<my_gtt>), Column(u'parent_id', NUMBER(precision=15, scale=0, asdecimal=False), table=<my_gtt>), Column(u'query_id', NUMBER(precision=15, scale=0, asdecimal=False), table=<my_gtt>), schema=None)})
>>> len(engine.execute('select * from my_regular_table').fetchall())
4
Thanks to some quick help from #zzzeek I discovered (by using the echo='debug' argument to create_engine) that my problem was caused by the tables being owned by an old user, even though the current user could access them from the default schema without requiring any explicit synonyms.
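For anyone who hits the same thing: since the tables are owned by another user, reflection needs to be pointed at that owner's schema explicitly. A minimal sketch of the workaround, where 'old_owner' is a placeholder for whichever user actually owns the tables:
import sqlalchemy
# echo='debug' prints the reflection queries, which is what exposed the ownership issue
engine = sqlalchemy.create_engine('oracle+cx_oracle://user:pass@localhost/xe', echo='debug')
md = sqlalchemy.MetaData(bind=engine)
# reflect the tables owned by the other user by naming their schema explicitly
md.reflect(schema='old_owner')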
Related
I have a problem with the Delta Lake docs. I know that I can query a Delta table with Presto, Hive, Spark SQL and other tools, but the Delta documentation says "You can load a Delta table as a DataFrame by specifying a table name or a path",
and that isn't clear to me. How can I run a SQL query like that?
To read data from Delta Lake tables it is possible to use the Java API or Python without Apache Spark. See details at:
https://databricks.com/blog/2020/12/22/natively-query-your-delta-lake-with-scala-java-and-python.html
Here is how to use it with pandas:
pip3 install deltalake
python3
from deltalake import DeltaTable
table_path = "/opt/data/delta/my-table" # path to your Delta table (local path or object store)
# now using Pandas
df = DeltaTable(table_path).to_pandas()
df
Use the spark.sql() function
spark.sql("select * from delta.`hdfs://192.168.2.131:9000/Delta_Table/test001`").show()
To get the column names over a MySQL or MSSQL connection I'm able to do the following:
>>> cursor.execute('select * from table')
>>> [item[0] for item in cursor.description]
[u'provider', u'title', u'date', u'apple_id', u'country', u'genre', u'sales_in_usd']
How would I get the column names from an Oracle cursor?
The code you have above works just fine with cx_Oracle (the driver that enables access to Oracle databases) since it follows the Python Database API!
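For completeness, a minimal cx_Oracle sketch (the connection details and table name are placeholders):
import cx_Oracle
connection = cx_Oracle.connect('user', 'pass', 'localhost/xe')  # placeholder credentials/DSN
cursor = connection.cursor()
cursor.execute('select * from my_regular_table')
# cursor.description is a sequence of 7-item tuples; item[0] is the column name
print([item[0] for item in cursor.description])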
I am completely new to Python and pandas. I want to load some tables and SQL queries from Oracle and Teradata into pandas DataFrames and analyse them.
I know we have to create connection strings for Oracle and Teradata in pandas. Can you please suggest them and also add sample code to read both a table and a SQL query?
Thanks in advance
I don't have an Oracle server, so I'll take Teradata as an example.
This is not the only way to do that, just one approach:
Make sure you have installed the Teradata ODBC driver. Please refer to the official Teradata website for the steps; I assume you use Windows (since it is easy to use SQL Assistant to run queries against Teradata, and that is only available on Windows). You can check the driver in the ODBC Data Source Administrator.
Install pyodbc with the command pip install pyodbc. Here is the official website.
The connection string is db_conn_str = "DRIVER=Teradata;DBCNAME={url};UID={username};PWD={pwd}"
Get a connection object conn = pyodbc.connect(db_conn_str)
Read data from a SQL query to a DataFrame df = pd.read_sql(sql="select * from tb", con=conn)
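Putting the steps together, a minimal sketch (the host, credentials and table name are placeholders you need to replace):
import pandas as pd
import pyodbc
url = 'my_teradata_host'    # placeholder host
username = 'my_user'        # placeholder user
pwd = 'my_password'         # placeholder password
db_conn_str = "DRIVER=Teradata;DBCNAME={url};UID={username};PWD={pwd}".format(
    url=url, username=username, pwd=pwd)
conn = pyodbc.connect(db_conn_str)
# read the result of any SQL query (or a whole table) into a DataFrame
df = pd.read_sql(sql="select * from tb", con=conn)
conn.close()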
It is similar for Oracle: you need the driver and the matching ODBC connection string format. I know there is a Python module from Teradata which supports the connection too, but I prefer ODBC as it is more general purpose.
Here is an Oracle example:
import pandas as pd
import cx_Oracle  # pip install cx_Oracle
from sqlalchemy import create_engine
engine = create_engine('oracle+cx_oracle://scott:tiger@host:1521/?service_name=hr')
df = pd.read_sql('select * from table_name', engine)
One way to query an Oracle DB is with a function like this one:
import pandas as pd
import cx_Oracle
def query(sql: str) -> pd.DataFrame:
    try:
        with cx_Oracle.connect(username, password, database, encoding='UTF-8') as connection:
            dataframe = pd.read_sql(sql, con=connection)
            return dataframe
    except cx_Oracle.Error as error:
        print(error)
    finally:
        print("Fetch end")
Here, sql corresponds to the query you want to run. Since it's a string, it also supports line breaks in case you are reading the query from a .sql file,
e.g.:
"SELECT * FROM TABLE\nWHERE <condition>\nGROUP BY <COL_NAME>"
or anything you need. It could also be an f-string in case you are using variables.
This function returns a pandas DataFrame with the results of the SQL string you pass in.
It also keeps the column names on the DataFrame.
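For example, a usage sketch (the table and column names below are made up, and username, password and database must already be defined as in the function above):
min_sales = 100  # made-up variable just to show f-string use
sql = f"""
    SELECT REGION, COUNT(*) AS ORDER_COUNT
    FROM MY_ORDERS
    WHERE SALES_IN_USD >= {min_sales}
    GROUP BY REGION
"""
df = query(sql)
print(df.columns)  # the column names from the query are kept on the DataFrame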
I saw some guidance here, but I can't seem to figure out how to specify the column types when I insert data from pandas into Oracle using the code below. One column is a date, for example, but upon importing, it is converted to a string.
Also, if I want the column names to be slightly different in my Oracle database, do I need to first rename the columns via pandas then send them to Oracle via to_sql?
import pandas as pd
from sqlalchemy import create_engine
import cx_Oracle as cx
pwd=input('Enter Password for server:')
engine = create_engine('oracle+cx_oracle://schema:'+pwd+'@server:1521/service_name')
df=pd.read_csv(r'path\data.csv',encoding='latin-1',index_col=0)
name='table1'
df.to_sql(name,engine,if_exists='append')
Please read the SQL data types section of the pandas documentation as well as the documentation for the to_sql method.
You can specify the data type using the dtype parameter like this:
from sqlalchemy.types import String, Date, DateTime
df.to_sql(table_name, engine, if_exists='append', dtype={'mydatecol': DateTime})
As to the names of the columns, it is easiest to rename columns in the dataframe before calling to_sql:
df2 = df.rename(columns={'oldname': 'newname', ...})
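Putting both together for the question above (the column names here are made up; adjust them to your CSV):
from sqlalchemy.types import Date, VARCHAR
# rename the pandas columns to the names you want in Oracle first...
df2 = df.rename(columns={'order_date': 'ORDER_DT', 'customer': 'CUSTOMER_NAME'})
# ...then tell to_sql which SQL types to use for specific columns
df2.to_sql('table1', engine, if_exists='append',
           dtype={'ORDER_DT': Date, 'CUSTOMER_NAME': VARCHAR(100)})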
I am working on a migration project from Oracle to Teradata.
The tables have been migrated using DataStage jobs.
How do I migrate Oracle views to Teradata?
Copying the scripts directly does not work because of SQL syntax differences between the two databases.
Please help.
The DECODE() Oracle function is available as part of the Oracle UDF Library on the Teradata Developer Exchange Downloads section. Otherwise, you are using the DECODE function in your example in the same manner in which the ANSI COALESCE() function behaves:
COALESCE(t.satisfaction, 'Not Evaluated')
It should be noted that the data types of the COALESCE() function must be implicitly compatible or you will receive an error. Therefore, t.satisfaction would need to be at least CHAR(13) or VARCHAR(13) in order for the COALESCE() to evaluate. If it is not, you can explicitly cast the operand(s).
COALESCE(CAST(t.satisfaction AS VARCHAR(13)), 'Not Evaluated')
If your use of DECODE() includes more evaluations than what is in your example, I would suggest implementing the UDF or replacing it with a standard CASE expression. That being said, with Teradata 14 (or 14.1) you will find that many of the Oracle functions that have been missing from Teradata are made available as standard functions to help ease the migration path from Oracle to Teradata.
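For instance, a DECODE() with several branches can be rewritten as a CASE expression that behaves the same on both databases (the values below are made up):
-- Oracle
DECODE(t.satisfaction, 1, 'Low', 2, 'Medium', 3, 'High', 'Not Evaluated')
-- Teradata / ANSI equivalent
CASE t.satisfaction
    WHEN 1 THEN 'Low'
    WHEN 2 THEN 'Medium'
    WHEN 3 THEN 'High'
    ELSE 'Not Evaluated'
END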