I need to extract data from an Oracle database.
How can I find out which schemas are defined in the database?
When I do not specify any schema in MetaData(), I find no tables.
Thanks for your help.
The default Oracle schema matches the username that was used for the Oracle connection.
If you don't see any tables, it means the tables were created in another schema.
Looks like you have two questions here:
1) about Oracle schemas - how to find schemas and tables in Oracle
2) about SQLAlchemy reflection - how to specify an Oracle schema for a table
You can find the answer to the first question in many places, e.g. here: https://stackoverflow.com/a/2247758/1296661
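If you would rather stay inside SQLAlchemy for the first question as well, the Inspector API can list the schemas and the tables they contain. A minimal sketch, assuming an Oracle URL like the one below (the connection details and the 'HR' schema name are placeholders):
from sqlalchemy import create_engine, inspect

engine = create_engine('oracle://<user_name>:<password>@<hostname>:1521/<instance name>')
inspector = inspect(engine)

print(inspector.get_schema_names())             # all schemas visible to this connection
print(inspector.get_table_names(schema='HR'))   # tables in one specific schema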
Answering the second question:
The Table class constructor has a schema argument to specify the table's schema if it is different from the default user's schema. See more here:
http://docs.sqlalchemy.org/en/rel_0_7/core/schema.html#sqlalchemy.schema.Table
Here is Python code to answer the second question. You will need to set the DB connection and table name values to match your case:
from sqlalchemy import Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
engine = create_engine('oracle://<user_name>:<password>@<hostname>:1521/<instance name>', echo=True)
Base = declarative_base()
reflected_table = Table('<Table name>',
                        Base.metadata,
                        autoload=True,
                        autoload_with=engine,
                        schema='<Schema name other than user_name>')
print([c.name for c in reflected_table.columns])
p = engine.execute("SELECT OWNER, count(*) table_count FROM ALL_OBJECTS WHERE OBJECT_TYPE = 'TABLE' GROUP BY OWNER")
for r in p:
    print(r)
Good luck with using SQLAlchemy and the reflection feature - it is a lot of fun. You get your Python program working with an existing database almost without defining any schema information in your program.
I'm using this feature with an Oracle DB in production - the only things I had to define were the relations between tables, by explicitly setting foreign and primary keys.
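In case it helps, here is a minimal sketch of what those explicit overrides can look like: SQLAlchemy lets you declare individual columns (and their keys) yourself while the rest of the table is still reflected. All table, column and schema names below are made up:
from sqlalchemy import Table, Column, Integer, ForeignKey, MetaData, create_engine

engine = create_engine('oracle://<user_name>:<password>@<hostname>:1521/<instance name>')
metadata = MetaData()

# Explicitly listed columns override the reflected ones, so primary and
# foreign keys can be declared even if the database does not define them.
parent = Table('parent_table', metadata,
               Column('id', Integer, primary_key=True),
               autoload=True, autoload_with=engine,
               schema='OTHER_SCHEMA')
child = Table('child_table', metadata,
              Column('parent_id', Integer, ForeignKey('OTHER_SCHEMA.parent_table.id')),
              autoload=True, autoload_with=engine,
              schema='OTHER_SCHEMA')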
I found two other ways to set an alternate Oracle schema:
Using MetaData and passing it into the creation of the SQLAlchemy instance or engine.
Or using __table_args__ when defining your models and passing in {'schema': 'other_schema'}.
http://docs.sqlalchemy.org/en/latest/core/metadata.html#sqlalchemy.schema.MetaData.params.schema
How to specify PostgreSQL schema in SQLAlchemy column/foreign key mixin?
But neither of these helps with foreign key references. For those, you still have to manually prefix the table name with the schema name (other_schema.).
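For the __table_args__ route, a minimal declarative sketch (model, schema and column names are made up) that also shows the foreign-key prefix just mentioned:
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Group(Base):
    __tablename__ = 'group'
    __table_args__ = {'schema': 'other_schema'}
    id = Column(Integer, primary_key=True)

class Membership(Base):
    __tablename__ = 'membership'
    __table_args__ = {'schema': 'other_schema'}
    id = Column(Integer, primary_key=True)
    # Foreign keys still need the explicit schema prefix:
    group_id = Column(Integer, ForeignKey('other_schema.group.id'))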
The snippet below is in the context of a Flask app with Flask-SQLAlchemy in use.
from sqlalchemy import schema
from flask_sqlalchemy import SQLAlchemy
import os
os.environ['TNS_ADMIN'] = os.path.join( 'oracle_tools', 'network', 'admin')
oracle_connection_string = 'oracle+cx_oracle://{username}:{password}@{tnsname}'.format(
    username='user',
    password='pass',
    tnsname='TNS_SPACE',
)
oracle_schema = 'user_schema_1'
foreign_key_prefix = oracle_schema + '.'
app.config['SQLALCHEMY_DATABASE_URI'] = oracle_connection_string
oracle_db_metadata = schema.MetaData(schema=oracle_schema)
db = SQLAlchemy(app, metadata=oracle_db_metadata)
user_groups_table = db.Table(db_table_prefix + 'user_groups', db.Model.metadata,
    db.Column('...', db.Integer, db.ForeignKey(foreign_key_prefix + 'group.id')),
    # schema=oracle_schema  # Not necessary because of the use of MetaData above
)
class User(db.Model):
    __tablename__ = 'user'
    # __table_args__ = {'schema': oracle_schema }  # Not necessary because of the use of MetaData above
    user_name = db.Column('USER_NAME', db.String(250))
And... two other related links:
https://github.com/mitsuhiko/flask-sqlalchemy/issues/172
How to specify PostgreSQL schema in SQLAlchemy column/foreign key mixin?
Hope that helps.
Related
This is my current repo structure. I'm looking for a solution that works with both Postgres and OracleDB and preferably does not involve changing my DB schema to accommodate the ORM. Whether Postgres or Oracle is used is defined in the spring.datasource.url in the application.properties file.
data class NewsCover(
    @Id val tenantId: TenantId,
    val openOnStart: Boolean,
    val cycleDelay: Int,
    @MappedCollection(idColumn = "tenant_id", keyColumn = "tenant_id")
    val sections: Set<NewsCoverSection>,
)
data class NewsCoverSection(
    @Id val id: NewsCoverSectionId,
    val title: String,
    val pinnedOnly: Boolean,
    val position: Int,
    val tenantId: TenantId,
    ... some other fields ...
)
interface NewsCoverRepo : CrudRepository<NewsCover, TenantId> { ... }
This works just fine with PostgreSQL, but creates errors when used with Oracle:
SELECT "NEWS_COVER_SECTION"."ID" AS "ID", "NEWS_COVER_SECTION"."TITLE" AS "TITLE", "NEWS_COVER_SECTION"."POSITION" AS "POSITION", "NEWS_COVER_SECTION"."TENANT_ID" AS "TENANT_ID", "NEWS_COVER_SECTION"."PINNED_ONLY" AS "PINNED_ONLY"
FROM "NEWS_COVER_SECTION"
WHERE "NEWS_COVER_SECTION"."tenant_id" = ?
See the quoted idColumn/keyColumn names in the @MappedCollection. They are lower case. That is fine for Postgres, but does not work with Oracle. Changing tenant_id to TENANT_ID fixes the problem for Oracle, but breaks Postgres.
What I tried:
A NamingStrategy override for Oracle, but I can't seem to override those quoted identifiers.
Conditional column names in @MappedCollection, but @MappedCollection only accepts compile-time constants and does not support SpEL, so I can't differentiate based on the spring.datasource.url property.
Any ideas how I can get it to query for "news_cover_section"."tenant_id" when the DB is Postgres and "NEWS_COVER_SECTION"."TENANT_ID" when the DB is Oracle?
As you found out, you can disable the behaviour of quoting all names by setting the forceQuote property of the JdbcMappingContext to false.
Alternatively you can create the schema in a consistent way on both databases by quoting the names in your schema creation script.
The first option allows you not to fiddle with the database schema.
But it makes the application depend on avoiding database keywords, like for example ORDER or USER.
The second option is arguably the conceptually cleaner one, because it actually uses the same schema (as far as names are concerned) for both databases, which in itself is certainly valuable. But it comes at the cost of quoting names, because Postgres doesn't adhere to the behaviour prescribed by the SQL standard of treating unquoted names as uppercase.
Note: There is now an issue for supporting SpEL expressions for table and column names.
I have an Oracle database that contains multiple users/schemas, and I would like to generate Slick schemas automatically for a specific user. This is what I've tried so far:
import scala.concurrent.ExecutionContext.Implicits.global
val profileInstance: JdbcProfile =
  Class.forName("slick.jdbc.OracleProfile$")
    .getField("MODULE$")
    .get(null).asInstanceOf[JdbcProfile]
val db = profileInstance.api.Database
  .forURL("jdbc:oracle:thin:@//myhost:myport/servicename", "user", "pass")
val modelAction = OracleProfile.createModel(Some(OracleProfile.defaultTables))
val model = Await.result(db.run(modelAction), Duration.Inf)
model.tables.foreach(println)
This doesn't print anything. I guess I have to provide the current schema to use, but I don't know how to do this.
On the other hand, I am able to list all the schemas of the database, using the following code :
val resultSet = db.createSession().metaData.getSchemas.getStatement.getResultSet
while (resultSet.next()) {
  println(resultSet.getString(1))
}
How can I specify which schema I want to use with Slick?
I've found out how to do it. Instead of using OracleProfile.defaultTables, I manually defined the tables and views I needed, like this:
val modelAction = OracleProfile.createModel(
  Some(MTable.getTables(None, Some("MYSCHEMA"), None, Some(Seq("TABLE", "VIEW"))))
)
I have a dataset in BigQuery. This dataset contains multiple tables.
I am doing the following steps programmatically using the BigQuery API:
1) Querying the tables in the dataset - since my response is too large, I am enabling the allowLargeResults parameter and diverting my response to a destination table.
2) I am then exporting the data from the destination table to a GCS bucket.
Requirements:
Suppose my process fails at Step 2, I would like to re-run this step.
But before I re-run, I would like to check/verify that the specific destination table named 'xyz' already exists in the dataset.
If it exists, I would like to re-run step 2.
If it does not exist, I would like to do foo.
How can I do this?
Thanks in advance.
Alex F's solution works on v0.27, but will not work on later versions. In order to migrate to v0.28+, the solution below will work.
from google.cloud import bigquery
project_nm = 'gc_project_nm'
dataset_nm = 'ds_nm'
table_nm = 'tbl_nm'
client = bigquery.Client(project_nm)
dataset = client.dataset(dataset_nm)
table_ref = dataset.table(table_nm)
def if_tbl_exists(client, table_ref):
    from google.cloud.exceptions import NotFound
    try:
        client.get_table(table_ref)
        return True
    except NotFound:
        return False

if_tbl_exists(client, table_ref)
Here is a Python snippet that will tell whether a table exists (deleting it in the process--careful!):
def doesTableExist(project_id, dataset_id, table_id):
    bq.tables().delete(
        projectId=project_id,
        datasetId=dataset_id,
        tableId=table_id).execute()
    return False
Alternately, if you'd prefer not deleting the table in the process, you could try:
def doesTableExist(project_id, dataset_id, table_id):
    try:
        bq.tables().get(
            projectId=project_id,
            datasetId=dataset_id,
            tableId=table_id).execute()
        return True
    except HttpError as err:
        if err.resp.status != 404:
            raise
        return False
If you want to know where bq came from, you can call build_bq_client from here: http://code.google.com/p/bigquery-e2e/source/browse/samples/ch12/auth.py
In general, if you're using this to test whether you should run a job that will modify the table, it can be a good idea to just do the job anyway, and use WRITE_TRUNCATE as a write disposition.
Another approach can be to create a predictable job id, and retry the job with that id. If the job already exists, the job already ran (you might want to double check to make sure the job didn't fail, however).
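Here is a rough sketch of both ideas with the google-cloud-bigquery client (the query, table and job names are placeholders, and the Conflict handling assumes the API reports a duplicate job id as HTTP 409):
from google.cloud import bigquery
from google.api_core.exceptions import Conflict

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()
job_config.destination = client.dataset('ds_nm').table('xyz')
# Overwrite whatever is already in the destination table instead of checking first.
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

try:
    # A deterministic job id makes the step idempotent: re-running it with
    # the same id will not start a second copy of the job.
    job = client.query('SELECT ...', job_config=job_config,
                       job_id='step2-export-2020-01-01')
    job.result()
except Conflict:
    # The job id already exists, so this step already ran; fetch the job
    # and double check that it didn't fail.
    job = client.get_job('step2-export-2020-01-01')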
Enjoy:
def doesTableExist(bigquery, project_id, dataset_id, table_id):
    try:
        bigquery.tables().get(
            projectId=project_id,
            datasetId=dataset_id,
            tableId=table_id).execute()
        return True
    except Exception as err:
        if err.resp.status != 404:
            raise
        return False
The only change compared to the snippet above is in the exception handling.
You can now use exists() to check whether a dataset exists, and the same for a table:
BigQuery exists() documentation
Recently BigQuery introduced so-called scripting statements that can be quite a game changer for some flows.
Check them out here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
Now, for example, to check whether a table exists you can use something like this:
sql = """
BEGIN
  IF EXISTS(SELECT 1 FROM `YOUR_PROJECT.YOUR_DATASET.YOUR_TABLE`) THEN
    SELECT 'table_found';
  END IF;
EXCEPTION WHEN ERROR THEN
  # you can print your own message like above or return the error message
  # however google says not to rely on the error message structure as it may change
  SELECT @@error.message;
END;
"""
With my_bigquery being an instance of the class google.cloud.bigquery.Client (already authenticated and associated with a project):
my_bigquery.dataset(dataset_name).table(table_name).exists() # returns boolean
It does an API call to test for the existence of the table via a GET request.
Source: https://googlecloudplatform.github.io/google-cloud-python/0.24.0/bigquery-table.html#google.cloud.bigquery.table.Table.exists
It works for me using version 0.27 of the Google BigQuery Python module.
Inline SQL Alternative
tarheel's answer is probably the most correct at this point in time,
but I was considering the comment from Ivan above that "404 could also mean the resource is not there for a bunch of reasons", so here is a solution that should always successfully run a metadata query and return a result.
It's not the fastest, because it always has to run the query; BigQuery has overhead for small queries.
A trick I've seen previously is to query information_schema for a (table) object, and union that with a fake query that ensures a record is always returned even if the object doesn't exist. There's also a LIMIT 1 and an ordering to ensure the single record returned represents the table, if it does exist. See the SQL in the code below.
In spite of doc claims that BigQuery standard SQL is ISO compliant, it doesn't support information_schema, but it does have __TABLES_SUMMARY__.
dataset is required because you can't query __TABLES_SUMMARY__ without specifying a dataset.
dataset is not a parameter in the SQL because you can't parameterize object names without SQL injection issues (apart from with the magical _TABLE_SUFFIX, see https://cloud.google.com/bigquery/docs/querying-wildcard-tables ).
#!/usr/bin/env python
"""
Inline SQL way to check a table exists in BigQuery
e.g.
print(table_exists(dataset_name='<dataset_goes_here>', table_name='<real_table_name>'))
True
print(table_exists(dataset_name='<dataset_goes_here>', table_name='imaginary_table_name'))
False
"""
from __future__ import print_function
from google.cloud import bigquery


def table_exists(dataset_name, table_name):
    client = bigquery.Client()
    query = """
        SELECT table_exists FROM
        (
          SELECT true as table_exists, 1 as ordering
          FROM __TABLES_SUMMARY__ WHERE table_id = @table_name
          UNION ALL
          SELECT false as table_exists, 2 as ordering
        ) ORDER by ordering LIMIT 1"""
    query_params = [bigquery.ScalarQueryParameter('table_name', 'STRING', table_name)]
    job_config = bigquery.QueryJobConfig()
    job_config.query_parameters = query_params
    if dataset_name is not None:
        dataset_ref = client.dataset(dataset_name)
        job_config.default_dataset = dataset_ref
    query_job = client.query(
        query,
        job_config=job_config
    )
    results = query_job.result()
    for row in results:
        # There is only one row because of LIMIT 1 in the SQL
        return row.table_exists
I have run SqlMetal.exe against my database.
SqlMetal.exe /server:server /database:dbname /code:mapping.cs
I have included this in my solution, so I can now create an object for each of the database tables. Great. I now wish to use LINQ to query my database. Can I presume that none of the connection handling is done by the output of SqlMetal.exe? If this is correct, in what ways can I use LINQ to query my database?
Does the generated code include a Data Context (a class which inherits from System.Data.Linq.DataContext)? If so, then that's probably what you're looking for. Something like this:
var db = new SomeDataContext();
// You can also specify a connection string manually in the above constructor if you want
var records = db.SomeTable.Where(st => st.id == someValue);
// and so on...
I have been looking at the SQLAlchemy recipes on their wiki, but don't know which one is best for implementing what I am trying to do.
Every row in my tables has a user_id associated with it. Right now, for every query, I query by the id of the user that's currently logged in, then query by the criteria I am interested in. My concern is that the developers might forget to add this filter to the query (a huge security risk). Therefore, I would like to set a global filter based on the current user's admin rights to filter what the logged-in user can see.
Appreciate your help. Thanks.
Below is a simplified, redefined query constructor that filters all model queries (including relations). You can pass it as the query_cls parameter to sessionmaker. The user ID parameter doesn't need to be global, since the session is constructed once it's already available.
from sqlalchemy.orm import Query
from sqlalchemy.orm.util import _class_to_mapper  # internal helper used by QueryPublic below

class HackedQuery(Query):

    def get(self, ident):
        # Use default implementation when there is no condition
        if not self._criterion:
            return Query.get(self, ident)
        # Copied from Query implementation with some changes.
        if hasattr(ident, '__composite_values__'):
            ident = ident.__composite_values__()
        mapper = self._only_mapper_zero(
            "get() can only be used against a single mapped class.")
        key = mapper.identity_key_from_primary_key(ident)
        if ident is None:
            if key is not None:
                ident = key[1]
        else:
            from sqlalchemy import util
            ident = util.to_list(ident)
        if ident is not None:
            columns = list(mapper.primary_key)
            if len(columns) != len(ident):
                raise TypeError("Number of values doesn't match number "
                                "of columns in primary key")
            params = {}
            for column, value in zip(columns, ident):
                params[column.key] = value
            return self.filter_by(**params).first()

def QueryPublic(entities, session=None):
    # It's not directly related to the problem, but is useful too.
    query = HackedQuery(entities, session).with_polymorphic('*')
    # Version for several entities needs thorough testing, so we
    # don't use it yet.
    assert len(entities) == 1, entities
    cls = _class_to_mapper(entities[0]).class_
    public_condition = getattr(cls, 'public_condition', None)
    if public_condition is not None:
        query = query.filter(public_condition)
    return query
It works for single-model queries only, and there is a lot of work to make it suitable for other cases. I'd like to see an elaborated version, since it's MUST-HAVE functionality for most web applications. It uses a fixed condition stored in each model class, so you have to modify it to your needs.
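For illustration, wiring it up might look roughly like this; the model, column and condition are made up, and the only requirements taken from the answer are that each class exposes a public_condition attribute and that the constructor is passed as query_cls:
from sqlalchemy import Column, Integer, Boolean, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Document(Base):
    __tablename__ = 'document'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer)
    is_public = Column(Boolean)

# Fixed condition that QueryPublic picks up for every query on Document.
Document.public_condition = (Document.is_public == True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine, query_cls=QueryPublic)
session = Session()
docs = session.query(Document).all()  # public_condition is applied automatically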
Here is a very naive implementation that assumes there is an attribute/property self.current_user storing the logged-in user.
class YourBaseRequestHandler(object):

    @property
    def current_user(self):
        """The current user logged in."""
        pass

    def query(self, session, entities):
        """Use this method instead of :method:`Session.query()
        <sqlalchemy.orm.session.Session.query>`.
        """
        return session.query(entities).filter_by(user_id=self.current_user.id)
I wrote an SQLAlchemy extension that I think does what you are describing: https://github.com/mwhite/multialchemy
It does this by proxying changes to the Query._from_obj and QueryContext._froms properties, which is where the tables to select from ultimately get set.