Creating a table from a schema using the Ruby Sequel gem

I can dump the current Database#schema, but I'd like to create a new table (in a different database) with that schema, and I don't see how this can be done.
DB = Sequel.connect('sqlite:///database_from.db')
schema = DB.schema :table
# schema => [[:id, {:auto_increment=>true, :allow_null=>false, :default=>nil, :primary_key=>true, :db_type=>"smallint(5) unsigned", :type=>:integer, :ruby_default=>nil}], [:field, {:allow_null=>true, :default=>nil, :primary_key=>false, :db_type=>"smallint(5) unsigned", :type=>:integer, :ruby_default=>nil}]]
DB2 = Sequel.connect('sqlite:///database_to.db')
DB2.create_table('table name', schema) #< allowing this would be cool!

One way to do it is via migrations.
Dump the schema from the source database:
sequel -d mysql://root@localhost/database1 > db/001_test.rb
Edit the file so it only includes the table you need.
Run the migration against the new database:
sequel -m db/ mysql://root@localhost/database2
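If you'd rather stay in Ruby instead of going through a migration file, here is a rough, untested sketch of the same idea. create_table doesn't accept a schema dump directly, so this maps each column's :type, :allow_null and :primary_key entries by hand and ignores everything else (exact db_type, defaults, indexes):
require 'sequel'

DB  = Sequel.connect('sqlite:///database_from.db')
DB2 = Sequel.connect('sqlite:///database_to.db')

schema = DB.schema(:table)

# Rebuild the table in the target database from the dumped schema.
# Only :type, :allow_null and :primary_key are carried over here;
# defaults, exact db_type and indexes would need extra handling.
DB2.create_table(:table) do
  schema.each do |name, opts|
    if opts[:primary_key]
      primary_key name
    else
      column name, opts[:type], :null => opts[:allow_null]
    end
  end
end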

Related

How to merge orc files for external tables?

I am trying to merge multiple small ORC files. I came across the ALTER TABLE ... CONCATENATE command, but that only works for managed tables.
Hive gave me the following error when I tried to run it:
FAILED: SemanticException
org.apache.hadoop.hive.ql.parse.SemanticException: Concatenate/Merge
can only be performed on managed tables
Following are the table parameters:
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE true
EXTERNAL TRUE
numFiles 535
numRows 27051810
orc.compress SNAPPY
rawDataSize 20192634094
totalSize 304928695
transient_lastDdlTime 1512126635
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Since your table is an external table, there are two ways:
Either change it to a managed table (ALTER TABLE <table> SET TBLPROPERTIES('EXTERNAL'='FALSE')) and run ALTER TABLE ... CONCATENATE, then convert it back to external by setting the property to 'TRUE' again (see the statements sketched below).
Or create a managed table using CTAS and insert the data, then run the merge query and import the data back into the external table.
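For the first approach, the sequence of statements would look roughly like this (the table name is a placeholder, and a partitioned table would need a PARTITION clause on the CONCATENATE step):
-- Temporarily mark the external table as managed (placeholder table name)
ALTER TABLE mydb.mytable SET TBLPROPERTIES('EXTERNAL'='FALSE');

-- Merge the small ORC files
ALTER TABLE mydb.mytable CONCATENATE;

-- Flip it back to external
ALTER TABLE mydb.mytable SET TBLPROPERTIES('EXTERNAL'='TRUE');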
From my previous answer to this question, here is a small script in Python using PyORC to concatenate the small ORC files together. It doesn't use Hive at all, so you can only use it if you have direct access to the files and are able to run a Python script on them, which might not always be the case in managed hosts.
import pyorc
import argparse


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-o', '--output', type=argparse.FileType(mode='wb'))
    parser.add_argument('files', type=argparse.FileType(mode='rb'), nargs='+')
    args = parser.parse_args()

    # Use the first file's schema as the reference schema for the output file.
    schema = str(pyorc.Reader(args.files[0]).schema)

    with pyorc.Writer(args.output, schema) as writer:
        for i, f in enumerate(args.files):
            reader = pyorc.Reader(f)
            # Refuse to merge files whose schemas don't match the first one.
            if str(reader.schema) != schema:
                raise RuntimeError(
                    "Inconsistent ORC schemas.\n"
                    "\tFirst file schema: {}\n"
                    "\tFile #{} schema: {}"
                    .format(schema, i, str(reader.schema))
                )
            # Copy every row into the combined output file.
            for line in reader:
                writer.write(line)


if __name__ == '__main__':
    main()

How to insert records into a table using Sequel

This is my code to insert a few lines in my simple "series" table:
db = Sequel.postgres(config['dbname'], :user => config['user'], :password => config['password'],
                     :host => config['host'], :port => config['port'], :max_connections => 10)
db.create_table? 'series' do
  primary_key "series_id", :autoincrement => true
  String "series_name"
end
seriesDS = db['series']
seriesDS.insert('series_name' => 'test_value')
At seriesDS.insert I get a
Sequel::DatabaseError - PG::SyntaxError: ERREUR: erreur de syntaxe sur ou près de « series » (i.e. a syntax error at or near « series »)
I haven't managed to get the full SQL query printed to STDOUT for analysis. It's strange, because I added this:
logger = Logger.new STDOUT
logger.level = Logger::DEBUG
db.loggers << logger
It appears to be generating the wrong SQL, but I have no clue to the error's source.
I'm using:
Ruby 2.2.5
Sequel 4.4.1
Postgresql 9.6
The program is launched using ruby -E utf8.
Sequel uses Ruby symbols to represent SQL identifiers. At the very least, you need to use seriesDS = db[:series].
Anywhere else you mean an SQL identifier, you should probably switch from strings to symbols as well.
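For instance, a rewrite of the snippet above using symbols throughout (same table and column names as in your code; an untested sketch):
db.create_table? :series do
  primary_key :series_id          # auto-incrementing integer primary key by default
  String :series_name
end

seriesDS = db[:series]            # symbol, so Sequel quotes it as an identifier
seriesDS.insert(:series_name => 'test_value')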

Spark SQL (1.5.1): connect to Oracle and write to Avro

I am using Spark SQL to connect to an Oracle database and fetch the data as DataFrames. I would like to write the retrieved data to an Avro file. While writing to Avro I am seeing multiple issues; could you help?
Here is the code -
val df = sqlContext.read.format("jdbc")
  .options(Map(
    "driver" -> "oracle.jdbc.driver.OracleDriver",
    "url" -> "jdbc:oracle:thin:user/password@host/service",
    "numPartitions" -> "1",
    "dbtable" -> "(Select * from schema.table WHERE STAGE_NUM <= 39 and guid = 'I284ba1f9cdba11dea82ab9f4ee295c21')"))
  .load()
df.write.format("com.databricks.spark.avro").save("Outputfile")
These are the dependencies in my project:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.5.1</version>
</dependency>
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-avro_2.10</artifactId>
  <version>2.0.1</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.7</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version>
</dependency>
Here is the exception information -
java.lang.RuntimeException: com.databricks.spark.avro.DefaultSource does not allow create table as select
If I use - df.write.avro("headnotes"), I get the following exception.
java.lang.IllegalAccessError: tried to access class org.apache.avro.SchemaBuilder$FieldDefault from class com.databricks.spark.avro.SchemaConverters$$anonfun$convertStructToAvro$1

Clojure: using ragtime with sqlite3

I'd like to use ragtime to manage migrations on an SQLite database. Following the instructions here, I've tried the following in the REPL:
(require '[ragtime.jdbc :as jdbc]
         '[ragtime.repl :as repl])

(def config
  {:datastore  (jdbc/sql-database {:connection-uri "jdbc:sqlite:resources/db.sqlite3"})
   :migrations (jdbc/load-resources "migrations")})

(repl/migrate config)
All I get is the following error:
ClassCastException clojure.lang.PersistentVector cannot be cast to clojure.lang.Named clojure.core/name (core.clj:1546)
The database file exists in resources/db.sqlite3. I've tried tracing the exception (I can add the stack trace if needed), but it seems to happen deep in clojure.java.jdbc.
As I'm new to the JVM and JDBC, I'm also not sure whether I'm specifying the :connection-uri correctly; I've tried several variants but can't seem to make it work.
Any help would be much appreciated!
EDIT: stack trace:
java.lang.ClassCastException: clojure.lang.PersistentVector cannot be cast to clojure.lang.Named
at clojure.core$name.invokeStatic (core.clj:1546)
clojure.core$name.invoke (core.clj:1540)
clojure.java.jdbc$as_sql_name.invokeStatic (jdbc.clj:67)
clojure.java.jdbc$as_sql_name.invoke (jdbc.clj:56)
clojure.java.jdbc$create_table_ddl$spec_to_string__2511.invoke (jdbc.clj:1052)
clojure.core$map$fn__4785.invoke (core.clj:2646)
clojure.lang.LazySeq.sval (LazySeq.java:40)
clojure.lang.LazySeq.seq (LazySeq.java:49)
clojure.lang.LazySeq.first (LazySeq.java:71)
clojure.lang.RT.first (RT.java:667)
clojure.core$first__4339.invokeStatic (core.clj:55)
clojure.string$join.invokeStatic (string.clj:180)
clojure.string$join.invoke (string.clj:180)
clojure.java.jdbc$create_table_ddl.invokeStatic (jdbc.clj:1056)
clojure.java.jdbc$create_table_ddl.doInvoke (jdbc.clj:1041)
clojure.lang.RestFn.invoke (RestFn.java:423)
ragtime.jdbc$migrations_table_ddl.invokeStatic (jdbc.clj:16)
ragtime.jdbc$migrations_table_ddl.invoke (jdbc.clj:15)
ragtime.jdbc$ensure_migrations_table_exists.invokeStatic (jdbc.clj:22)
ragtime.jdbc$ensure_migrations_table_exists.invoke (jdbc.clj:20)
ragtime.jdbc.SqlDatabase.applied_migration_ids (jdbc.clj:42)
ragtime.core$migrate_all.invokeStatic (core.clj:43)
ragtime.core$migrate_all.invoke (core.clj:32)
ragtime.repl$migrate.invokeStatic (repl.clj:49)
ragtime.repl$migrate.invoke (repl.clj:34)
thulium.core$eval8407.invokeStatic (form-init2686611279014890656.clj:1)
(the rest is REPL and compiler calls)
And the two migration files, resources/migrations/001-initial.up.sql:
CREATE TABLE tests (
id INTEGER PRIMARY KEY AUTOINCREMENT
);
and resources/migrations/001-initial.down.sql:
DROP TABLE tests;
Give it a go with these versions:
[org.clojure/java.jdbc "0.6.1"]
[org.xerial/sqlite-jdbc "3.8.7"]
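If you're managing dependencies with Leiningen, pinning them would look roughly like the sketch below. The project name/version and the ragtime and Clojure versions shown are placeholders (keep whatever you already have); only the java.jdbc and sqlite-jdbc versions come from the answer above:
(defproject myapp "0.1.0-SNAPSHOT"                 ; placeholder project name/version
  :dependencies [[org.clojure/clojure "1.8.0"]     ; assumption: any recent Clojure
                 [ragtime "0.6.3"]                 ; assumption: the ragtime version you already use
                 [org.clojure/java.jdbc "0.6.1"]
                 [org.xerial/sqlite-jdbc "3.8.7"]])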

Errors when creating table using cassandra-jdbc-1.2.1 jar

I am having some difficulty creating a column family (table) in Cassandra via the cassandra-jdbc driver.
The CQL command works correctly in cqlsh, but not when using cassandra-jdbc. I suspect this has something to do with the way I have defined my connection string. Any help would be greatly appreciated.
Let me try and explain what I have done.
I have created a keyspace using cqlsh with the following command
CREATE KEYSPACE authdb WITH
REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
This is as per the documentation at: http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_KEYSPACE#cql-create-keyspace
I am able to create a table (column family) in cqlsh using
CREATE TABLE authdb.users(
user_name varchar PRIMARY KEY,
password varchar,
gender varchar,
session_token varchar,
birth_year bigint
);
This works correctly.
My problems start when I try to create the table using cassandra-jdbc-1.2.1.jar
The code I use is:
public static void createColumnFamily() {
    try {
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/authdb?version=3.0.0");
        String qry = "CREATE TABLE authdb.users(" +
                     "user_name varchar PRIMARY KEY," +
                     "password varchar," +
                     "gender varchar," +
                     "session_token varchar," +
                     "birth_year bigint" +
                     ")";
        Statement smt = con.createStatement();
        smt.executeUpdate(qry);
        con.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
When using cassandra-jdbc-1.2.1.jar I get the following error:
main DEBUG jdbc.CassandraDriver - Final Properties to Connection: {cqlVersion=3.0.0, portNumber=9160, databaseName=authdb, serverName=localhost}
main DEBUG jdbc.CassandraConnection - Connected to localhost:9160 in Cluster 'authdb' using Keyspace 'Test Cluster' and CQL version '3.0.0'
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Ljava/nio/ByteBuffer;Lorg/apache/cassandra/thrift/Compression;Lorg/apache/cassandra/thrift/ConsistencyLevel;)Lorg/apache/cassandra/thrift/CqlResult;
at org.apache.cassandra.cql.jdbc.CassandraConnection.execute(CassandraConnection.java:447)
Note: the cluster and keyspace are not correct (they appear swapped in the log above)
When using cassandra-jdbc-1.1.2.jar I get the following error:
main DEBUG jdbc.CassandraDriver - Final Properties to Connection: {cqlVersion=3.0.0, portNumber=9160, databaseName=authdb, serverName=localhost}
main INFO jdbc.CassandraConnection - Connected to localhost:9160 using Keyspace authdb and CQL version 3.0.0
java.sql.SQLSyntaxErrorException: Cannot execute/prepare CQL2 statement since the CQL has been set to CQL3(This might mean your client hasn't been upgraded correctly to use the new CQL3 methods introduced in Cassandra 1.2+).
Note: in this instance the cluster and keyspace appear to be correct.
The error when using the 1.2.1 jar is because you have an old version of the cassandra-thrift jar. You need to keep that in sync with the cassandra-jdbc version. The cassandra-thrift jar is in the lib directory of the binary download.
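If you build with Maven, for example, that means declaring the two artifacts with matching versions; a sketch is below (the group/artifact ids and the exact 1.2.x cassandra-thrift version to pair with cassandra-jdbc 1.2.1 are assumptions, so check the driver's own POM):
<!-- cassandra-jdbc and cassandra-thrift kept in step; exact versions are assumptions -->
<dependency>
  <groupId>org.apache-extras.cassandra-jdbc</groupId>
  <artifactId>cassandra-jdbc</artifactId>
  <version>1.2.1</version>
</dependency>
<dependency>
  <groupId>org.apache.cassandra</groupId>
  <artifactId>cassandra-thrift</artifactId>
  <version>1.2.1</version>
</dependency>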
