How to insert a CLOB into an Oracle table with Slick 3 and Oracle 12?

We have a table with a CLOB column (to store JSON data).
As we understand from the docs, Slick supports LOB types (http://slick.lightbend.com/doc/3.1.1/schemas.html).
We are able to query the table successfully, including the CLOB column.
We are not able to insert a row with a CLOB. We are converting a String to java.sql.Clob with:
private java.sql.Clob stringToClob(String source) {
    try {
        return new javax.sql.rowset.serial.SerialClob(source.toCharArray());
    } catch (Exception e) {
        log.error("Could not convert string to a CLOB", e);
        return null;
    }
}
but the insert then fails with the following exception from Slick:
java.lang.ClassCastException: javax.sql.rowset.serial.SerialClob cannot be cast to oracle.sql.CLOB
Is this possible?

We finally found a workaround, as follows:
According to the column definition in Slick
def column[C](n: String, options: ColumnOption[C]*)(implicit tt: TypedType[C]): Rep[C]
you can specify how the column is translated between the driver and your code. The out-of-the-box translations are fine if they work for you, but for Oracle the translation for the CLOB type doesn't seem to work properly.
What we did was to define the column as a String while letting Slick handle the translation with our custom code. The column definition is the following:
def myClobColumn = column[String]( "CLOBCOLUMN" )( new StringJdbcType )
StringJdbcType is our custom code that handles the translation between the String to be inserted (up to 65535 bytes) and an Oracle CLOB.
The code for StringJdbcType is as follows:
import java.sql.{PreparedStatement, ResultSet}
import scala.io.Codec

class StringJdbcType extends driver.DriverJdbcType[String] {
  def sqlType = java.sql.Types.VARCHAR
  // Here's the solution: build the CLOB from the statement's own connection
  def setValue(v: String, p: PreparedStatement, idx: Int) = {
    val conn = p.getConnection
    val clob = conn.createClob()
    clob.setString(1, v)
    p.setClob(idx, clob)
  }
  // Read the CLOB back by column index as an ASCII stream
  def getValue(r: ResultSet, idx: Int) =
    scala.io.Source.fromInputStream(r.getAsciiStream(idx))(Codec.ISO8859).getLines().mkString
  def updateValue(v: String, r: ResultSet, idx: Int) = r.updateString(idx, v)
  override def hasLiteralForm = false
}
The setValue function was our salvation, because it let us build an Oracle CLOB from the already instantiated PreparedStatement and the String coming from our domain. In our implementation we only had to do the plumbing and dirty work for the Oracle CLOB.
In sum, the extension point offered by Slick in driver.DriverJdbcType[A] was what we actually used to make this work.
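For completeness, here is a minimal sketch of how such a column can be wired into a table and used in an insert. It is an illustration only: the table and field names (MyTable, MYTABLE, ID, payload) and the db handle are hypothetical, and the api._ import of the same driver is assumed.

// Minimal usage sketch (names are hypothetical). Assumes `import driver.api._`
// for the same driver against which StringJdbcType was defined, and a
// Database instance called `db`.
class MyTable(tag: Tag) extends Table[(Long, String)](tag, "MYTABLE") {
  def id = column[Long]("ID", O.PrimaryKey)
  // Pass the custom JdbcType explicitly so it is used for this column only
  def payload = column[String]("CLOBCOLUMN")(new StringJdbcType)
  def * = (id, payload)
}
val myTable = TableQuery[MyTable]

// setValue builds the Oracle CLOB from the PreparedStatement's own connection
db.run(myTable += (1L, """{"key": "value"}"""))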

Here are some improvements to the solution: closing resources and inspecting the stream before reading it.
class BigStringJdbcType extends profile.DriverJdbcType[String] {
  def sqlType: Int = java.sql.Types.VARCHAR

  def setValue(v: String, p: PreparedStatement, idx: Int): Unit = {
    val connection = p.getConnection
    val clob = connection.createClob()
    try {
      clob.setString(1, v)
      p.setClob(idx, clob)
    } finally {
      clob.free()
    }
  }

  def getValue(r: ResultSet, idx: Int): String = {
    val asciiStream = r.getAsciiStream(idx)
    try {
      // streamReady is true when the stream has data available to read
      val (streamReady, encoding) = getInputStreamStatus(asciiStream)
      if (streamReady) {
        convertInputStreamToString(asciiStream, encoding)
      } else ""
    } finally {
      asciiStream.close()
    }
  }

  def updateValue(v: String, r: ResultSet, idx: Int): Unit =
    r.updateString(idx, v)

  override def hasLiteralForm: Boolean = false
}
Some utilities to complement the solution:
import java.io.{InputStream, InputStreamReader}

def getInputStreamStatus(stream: InputStream): (Boolean, String) = {
  val reader = new InputStreamReader(stream)
  // reader.ready() is true when there is data available to read.
  // Do not close the reader here: closing it would also close the
  // underlying stream, which still has to be consumed afterwards.
  val streamReady = reader.ready()
  val encoding = reader.getEncoding
  streamReady -> encoding
}

def convertInputStreamToString(stream: InputStream, encoding: String): String =
  scala.io.Source.fromInputStream(stream)(encoding).getLines().mkString

Related

Kotlin MVVM, How to get the latest value from Entity in ViewModel?

I have created an app where I try to insert a record with the latest order number increased by one.
The main function is triggered from the Activity; however, the whole process is in my ViewModel.
Issue 1: after I insert a new record, the order-by number is not updated.
Issue 2: when I insert the first record, the order-by number is null; for that reason I check for null and set the value to 0.
My goal here is to get the latest order_by number from the Entity in my ViewModel, increase it by 1, and add that new number to my new record using fun addTestData(..).
Entity:
@Entity(tableName = "word_table")
data class Word(
    @ColumnInfo(name = "id") val id: Int,
    @ColumnInfo(name = "word") val word: String,
    @ColumnInfo(name = "order_by") val orderBy: Int
)
Dao:
@Query("SELECT order_by FROM word_table ORDER BY order_by DESC LIMIT 1")
suspend fun getHighestOrderId(): Int
Repository:
@Suppress("RedundantSuspendModifier")
@WorkerThread
suspend fun getHighestOrderId(): Int {
    return wordDao.getHighestOrderId()
}
ViewModel:
private var _highestOrderId = MutableLiveData<Int>()
val highestOrderId: LiveData<Int> = _highestOrderId

fun getHighestOrderId() = viewModelScope.launch {
    val highestOrderId = repository.getHighestOrderId()
    _highestOrderId.postValue(highestOrderId)
}

fun addTestData(text: String) {
    for (i in 0..1500) {
        getHighestOrderId()
        var highestNo = 0
        val highestOrderId = highestOrderId.value
        if (highestOrderId == null) {
            highestNo = 0
        } else {
            highestNo = highestOrderId
        }
        val addNumber = highestNo + 1
        val word2 = Word(0, text + "_" + addNumber, addNumber)
        insertWord(word2)
    }
}
Activity:
wordViewModel.addTestData(text)

Get value from sorted data in Kotlin

My task is to return an ArrayList of type transactionsList.
First I have to parse the date from a string and then sort by it (ascending).
I know how to do that, but my sortedWith chain gives back Unit, not a list.
val cmp = compareBy<transactionsList> {
    LocalDate.parse(it.date, DateTimeFormatter.ofPattern("dd.MM.yyyy."))
}
val sortedList: List<transactionsList> = ArrayList()
acountTransactionList
    .sortedWith(cmp)
    .forEach(::println)
return acountTransactionList
I cannot store the data from that sort because it gives me Unit.
The following works as intended (the issue in your code is that forEach() returns Unit, not the sorted collection):
import java.time.LocalDate
import java.time.format.DateTimeFormatter

fun main() {
    val acountTransactionList: ArrayList<transactionsList> = arrayListOf(
        transactionsList("10.10.2010."),
        transactionsList("10.10.2000."),
        transactionsList("10.09.2010."),
        transactionsList("10.11.2010."),
        transactionsList("11.11.2010."),
        transactionsList("10.10.2001.")
    )
    val cmp = compareBy<transactionsList> {
        LocalDate.parse(it.date, DateTimeFormatter.ofPattern("dd.MM.yyyy."))
    }
    val sortedList: List<transactionsList> = acountTransactionList.sortedWith(cmp)
    println(sortedList)
}

data class transactionsList(val date: String)

How to create a sorted merged list from two different ArrayLists of objects based on a common value field in Kotlin?

I have two ArrayLists of different Data classes as given below:
class Record {
    var id: Long = 0
    var RecordId: Int = 0
    var Record: String? = null
    var title: String? = null
    var description: String? = null
    var longDate: Long = 0
}

class Type {
    var id: Long = 0
    var typeId: Int = 0
    var subTypeId: Int = 0
    var typeString: String? = null
    var longDate: Long = 0
}

var recordsList: ArrayList<Record>
var typesList: ArrayList<Type>
Now, I want a merged list of these two, sorted by the field common to both objects, i.e. longDate. I have tried .associate, sortedBy, sortedWith(compareBy<>), etc., but could not achieve the desired result.
One point to note: when merging the two lists, it is possible that one of them is empty.
This will produce a List<Any> with all items sorted by longDate:
(recordsList + typesList)
    .sortedBy {
        when (it) {
            is Record -> it.longDate
            is Type -> it.longDate
            else -> error("")
        }
    }
Or you might consider creating an interface that has val longDate: Long that both of these classes implement. Then you wouldn't need the when expression, and your List would be of the type of the interface.
Something like this should work, but I personally think that it is quite the code smell. There is no guarantee that Record.longDate is truly the same type as Type.longDate (we know that it is, since we create the model, but the compiler would never know).
val result = (recordsList + typesList).sortedBy {
    when (it) {
        is Record -> it.longDate
        is Type -> it.longDate
        else -> error("incompatible list element $it")
    }
}
And it would work something like this (I've removed some parameters from the models as they don't really matter here):
fun main() {
    val recordsList = listOf(Record().apply { longDate = 5 }, Record().apply { longDate = 3 })
    val typesList = listOf(Type().apply { longDate = 3 }, Type().apply { longDate = 2 })
    val result = (recordsList + typesList).sortedBy {
        when (it) {
            is Record -> it.longDate
            is Type -> it.longDate
            else -> error("incompatible list element $it")
        }
    }
    result.forEach {
        println(it.toString())
    }
}

class Record {
    var longDate: Long = 0
    override fun toString(): String {
        return "Record(longDate=$longDate)"
    }
}

class Type {
    var longDate: Long = 0
    override fun toString(): String {
        return "Type(longDate=$longDate)"
    }
}
This will output:
Type(longDate=2)
Record(longDate=3)
Type(longDate=3)
Record(longDate=5)
Doing it in a more generic way, where you create a fun that states which property to use from each object type, would most likely require reflection, which I'd avoid at all costs.
So I would definitely consider whether one object can inherit from the other, or create an interface, or anything else.
I'll end with two questions: why no constructors? Why ArrayList and not List?

Replacing for loops for searching a list in Kotlin

I am trying to make my code as clean as possible using Kotlin's built-in functions. I have written part of it using for loops, but I want to know the most suitable built-in functions for this application.
I have two array lists, accounts and cards.
My goal is to search for a specific card by its card number in the array list named cards.
Then I have to validate the pin. If the pin is correct, I take that gift card's customerId and search for the account in the array list named accounts. Then I have to update the balance of the account.
These are the classes I have used:
class Account {
    constructor()

    var id: String = generateAccountNumber()
    var name: String? = null
        set(name) = if (name != null) field = name.toUpperCase() else { field = "Unknown User"; println("invalid details\nAccount is not Created") }
    var balance: Double = 0.0
        set(balance) = if (balance >= 0) field = balance else { field = 0.0 }

    constructor(id: String = generateAccountNumber(), name: String?, balance: Double) {
        this.id = id
        this.balance = balance
        this.name = name
    }
}

class GiftCard {
    constructor()

    var cardNumber: String = generateCardNumber()
    var pin: String? = null
        set(pin) = if (pin != null) field = pin else { field = "Unknown User"; println("Please set the pin\nCard is not Created") }
    var customerId: String = ""
        set(customerId) = if (customerId != "") field = customerId else { field = "" }
    var cardBalance: Double = 0.0
        set(cardBalance) = if (cardBalance > 0) field = cardBalance else { field = 0.0; println("Card is created with zero balance\nPlease deposit") }
    var status = Status.ACTIVE

    constructor(cardNumber: String = generateCardNumber(),
                pin: String,
                customerId: String,
                cardBalance: Double = 0.0,
                status: Status = Status.ACTIVE) {
        this.cardNumber = cardNumber
        this.pin = pin
        this.customerId = customerId
        this.cardBalance = cardBalance
        this.status = status
    }
}
This is the part of the code that has to be changed:
override fun closeCard(cardNumber: String, pin: String): Pair<Boolean, Boolean> {
    for (giftcard in giftcards) {
        if (giftcard.cardNumber == cardNumber) {
            if (giftcard.pin == pin) {
                giftcard.status = Status.CLOSED
                for (account in accounts)
                    account.balance = account.balance + giftcard.cardBalance
                giftcard.cardBalance = 0.0
                return Pair(true, true)
            }
            // invalid pin
            return Pair(true, false)
        }
    }
    // card is not present
    return Pair(false, false)
}
Both classes are not very idiomatic. The primary constructor of a Kotlin class is implicit and does not need to be defined; however, you explicitly define a constructor and thus add another one that is empty.
// good
class C

// bad
class C {
    constructor()
}
Going further, Kotlin has named arguments and default values, so make use of them.
class Account(
    val id: String = generateAccountNumber(),
    val name: String = "Unknown User",
    val balance: Double = 0.0
)
Double is a very bad choice for basically anything due to its shortcomings; see for instance https://www.floating-point-gui.de/. Choosing Int or Long, heck even BigDecimal, would be better. It also seems that you don't want the balance to ever go below zero; in that case consider UInt and ULong.
Last but not least is the mutability of your classes. This can make sense, but it also might be dangerous. It is up to you to decide based on your needs and requirements.
enum class Status {
    CLOSED
}

@ExperimentalUnsignedTypes
class Account(private var _balance: UInt) {
    val balance get() = _balance
    operator fun plusAssign(other: UInt) {
        _balance += other
    }
}

@ExperimentalUnsignedTypes
class GiftCard(
    val number: String,
    val pin: String,
    private var _status: Status,
    private var _balance: UInt
) {
    val status get() = _status
    val balance get() = _balance
    fun close() {
        _status = Status.CLOSED
        _balance = 0u
    }
}

@ExperimentalUnsignedTypes
class Main(val accounts: List<Account>, val giftCards: List<GiftCard>) {
    fun closeCard(cardNumber: String, pin: String) =
        giftCards.find { it.number == cardNumber }?.let {
            (it.pin == pin).andAlso {
                accounts.forEach { a -> a += it.balance }
                it.close()
            }
        }
}

inline fun Boolean.andAlso(action: () -> Unit): Boolean {
    if (this) action()
    return this
}
We change the return type from Pair<Boolean, Boolean> to a more idiomatic Boolean?, where null means that we did not find anything (literally the true meaning of null), false means that the PIN did not match, and true means that the gift card was closed. We are not creating a pair anymore and thus avoid the additional object allocation.
Boolean.andAlso() is a handy extension function that I generally keep around; it is like Any.also() from Kotlin's standard library but only executes the action if the Boolean is actually true.
There are probably a million different ways to do this, but here's one that at least has some language features I feel are worth sharing:
fun closeCard(cardNumber: String, pin: String): Pair<Boolean, Boolean> {
    val giftCard = giftcards.find { it.cardNumber == cardNumber }
        ?: return Pair(false, false)
    return if (giftCard.pin == pin) {
        giftCard.status = Status.CLOSED
        accounts.forEach {
            it.balance += giftCard.cardBalance
        }
        Pair(true, true)
    } else
        Pair(true, false)
}
The first thing to notice is the Elvis operator, ?:, which evaluates the right side of the expression if the left side is null. In this case, if find returns null, which is equivalent to not finding a card number that matches the desired one, we immediately return Pair(false, false). This is the last step in your code.
From there on it's pretty straightforward. If the pins match, you loop through the accounts list with forEach and close the card. If the pins don't match, we go straight to the else branch. In Kotlin, if can be used as an expression, so we can simply put the return statement before the if and let it return the result of the last expression on each branch.
PS: I won't say this is more efficient than your way. It's just one way that uses built-in functions, find and forEach, like you asked, as well as other language features.
PPS: I would highly recommend trying to find another way to update the lists without mutating the objects. I don't know your use cases, but this doesn't feel very thread-safe. I didn't post a solution for this because it's outside the scope of this question.

SPARK SQL - update MySql table using DataFrames and JDBC

I'm trying to insert and update some data in MySQL using Spark SQL DataFrames and a JDBC connection.
I've succeeded in inserting new data using SaveMode.Append. Is there a way to update data already existing in a MySQL table from Spark SQL?
My code to insert is:
myDataFrame.write.mode(SaveMode.Append).jdbc(JDBCurl,mySqlTable,connectionProperties)
If I change to SaveMode.Overwrite it deletes the full table and creates a new one. I'm looking for something like the "ON DUPLICATE KEY UPDATE" clause available in MySQL.
It is not possible. As of now (Spark 1.6.0 / 2.2.0-SNAPSHOT) the Spark DataFrameWriter supports only four writing modes:
SaveMode.Overwrite: overwrite the existing data.
SaveMode.Append: append the data.
SaveMode.Ignore: ignore the operation (i.e. no-op).
SaveMode.ErrorIfExists: default option, throw an exception at runtime.
You can insert manually, for example using mapPartitions (since an UPSERT operation should be idempotent and as such is easy to implement), write to a temporary table and execute the upsert manually (a sketch of this approach follows below), or use triggers.
In general, achieving upsert behavior for batch operations while keeping decent performance is far from trivial. You have to remember that in the general case there will be multiple concurrent transactions in place (one per partition), so you have to ensure that there will be no write conflicts (typically by using application-specific partitioning) or provide appropriate recovery procedures. In practice it may be better to batch writes to a temporary table and resolve the upsert part directly in the database.
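As an illustration of the temporary-table route, here is a minimal sketch, not a definitive implementation. It assumes a MySQL target table target with a primary key id and a column value, plus a staging table staging_table; these names are hypothetical, while JDBCurl and connectionProperties are reused from the question.

import java.sql.DriverManager
import org.apache.spark.sql.SaveMode

// 1. Stage the batch: Overwrite recreates/truncates the staging table.
myDataFrame.write
  .mode(SaveMode.Overwrite)
  .jdbc(JDBCurl, "staging_table", connectionProperties)

// 2. Upsert from staging into the real table in a single statement,
//    letting the database resolve conflicts on the primary key.
val conn = DriverManager.getConnection(JDBCurl, connectionProperties)
try {
  conn.createStatement().executeUpdate(
    """INSERT INTO target (id, value)
      |SELECT id, value FROM staging_table
      |ON DUPLICATE KEY UPDATE value = VALUES(value)""".stripMargin)
} finally {
  conn.close()
}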
It's a pity that there is no SaveMode.Upsert mode in Spark for such a common case as upserting.
zero323 is right in general, but I think it should be possible (with compromises in performance) to offer such a replace feature.
I also wanted to provide some Java code for this case.
Of course it is not as performant as the built-in one from Spark, but it should be a good basis for your requirements. Just modify it towards your needs:
// needs java.sql.*, java.util.Arrays, java.util.Iterator, org.apache.spark.sql.Row
myDF.repartition(20); //one connection per partition, see below
myDF.foreachPartition((Iterator<Row> t) -> {
    Connection conn = DriverManager.getConnection(
            Constants.DB_JDBC_CONN,
            Constants.DB_JDBC_USER,
            Constants.DB_JDBC_PASS);
    conn.setAutoCommit(true);
    Statement statement = conn.createStatement();

    final int batchSize = 100000;
    int i = 0;
    while (t.hasNext()) {
        Row row = t.next();
        try {
            // better than REPLACE INTO, less cycles
            statement.addBatch("INSERT INTO mytable VALUES ("
                    + "'" + row.getAs("_id") + "',"
                    + "'" + row.getStruct(1).get(0)
                    + "') ON DUPLICATE KEY UPDATE _id='" + row.getAs("_id") + "';");
            //conn.commit();
            if (++i % batchSize == 0) {
                statement.executeBatch();
            }
        } catch (SQLIntegrityConstraintViolationException e) {
            //should not occur, nevertheless
            //conn.commit();
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            //conn.commit();
            statement.executeBatch();
        }
    }
    int[] ret = statement.executeBatch();
    System.out.println("Ret val: " + Arrays.toString(ret));
    System.out.println("Update count: " + statement.getUpdateCount());
    //conn.commit();
    statement.close();
    conn.close();
});
Another option is to overwrite org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.scala, changing its INSERT INTO statement to REPLACE INTO:
import java.sql.{Connection, Driver, DriverManager, PreparedStatement, ResultSet, SQLException}
import scala.collection.JavaConverters._
import scala.util.control.NonFatal
import com.typesafe.scalalogging.Logger
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.jdbc.{DriverRegistry, DriverWrapper, JDBCOptions}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row}
/**
 * Util functions for JDBC tables.
 */
object UpdateJdbcUtils {

  val logger = Logger(this.getClass)

  /**
   * Returns a factory for creating connections to the given JDBC URL.
   *
   * @param options - JDBC options that contains url, table and other information.
   */
  def createConnectionFactory(options: JDBCOptions): () => Connection = {
    val driverClass: String = options.driverClass
    () => {
      DriverRegistry.register(driverClass)
      val driver: Driver = DriverManager.getDrivers.asScala.collectFirst {
        case d: DriverWrapper if d.wrapped.getClass.getCanonicalName == driverClass => d
        case d if d.getClass.getCanonicalName == driverClass => d
      }.getOrElse {
        throw new IllegalStateException(
          s"Did not find registered driver with class $driverClass")
      }
      driver.connect(options.url, options.asConnectionProperties)
    }
  }

  /**
   * Returns a PreparedStatement that inserts a row into table via conn.
   */
  def insertStatement(conn: Connection, table: String, rddSchema: StructType, dialect: JdbcDialect)
      : PreparedStatement = {
    val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",")
    val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
    val sql = s"REPLACE INTO $table ($columns) VALUES ($placeholders)"
    conn.prepareStatement(sql)
  }

  /**
   * Retrieve standard jdbc types.
   *
   * @param dt The datatype (e.g. [[org.apache.spark.sql.types.StringType]])
   * @return The default JdbcType for this DataType
   */
  def getCommonJDBCType(dt: DataType): Option[JdbcType] = {
    dt match {
      case IntegerType => Option(JdbcType("INTEGER", java.sql.Types.INTEGER))
      case LongType => Option(JdbcType("BIGINT", java.sql.Types.BIGINT))
      case DoubleType => Option(JdbcType("DOUBLE PRECISION", java.sql.Types.DOUBLE))
      case FloatType => Option(JdbcType("REAL", java.sql.Types.FLOAT))
      case ShortType => Option(JdbcType("INTEGER", java.sql.Types.SMALLINT))
      case ByteType => Option(JdbcType("BYTE", java.sql.Types.TINYINT))
      case BooleanType => Option(JdbcType("BIT(1)", java.sql.Types.BIT))
      case StringType => Option(JdbcType("TEXT", java.sql.Types.CLOB))
      case BinaryType => Option(JdbcType("BLOB", java.sql.Types.BLOB))
      case TimestampType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))
      case DateType => Option(JdbcType("DATE", java.sql.Types.DATE))
      case t: DecimalType => Option(
        JdbcType(s"DECIMAL(${t.precision},${t.scale})", java.sql.Types.DECIMAL))
      case _ => None
    }
  }

  private def getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType = {
    dialect.getJDBCType(dt).orElse(getCommonJDBCType(dt)).getOrElse(
      throw new IllegalArgumentException(s"Can't get JDBC type for ${dt.simpleString}"))
  }

  // A `JDBCValueGetter` is responsible for getting a value from `ResultSet` into a field
  // for `MutableRow`. The last argument `Int` means the index for the value to be set in
  // the row and also used for the value in `ResultSet`.
  private type JDBCValueGetter = (ResultSet, InternalRow, Int) => Unit

  // A `JDBCValueSetter` is responsible for setting a value from `Row` into a field for
  // `PreparedStatement`. The last argument `Int` means the index for the value to be set
  // in the SQL statement and also used for the value in `Row`.
  private type JDBCValueSetter = (PreparedStatement, Row, Int) => Unit

  /**
   * Saves a partition of a DataFrame to the JDBC database. This is done in
   * a single database transaction (unless isolation level is "NONE")
   * in order to avoid repeatedly inserting data as much as possible.
   *
   * It is still theoretically possible for rows in a DataFrame to be
   * inserted into the database more than once if a stage somehow fails after
   * the commit occurs but before the stage can return successfully.
   *
   * This is not a closure inside saveTable() because apparently cosmetic
   * implementation changes elsewhere might easily render such a closure
   * non-Serializable. Instead, we explicitly close over all variables that
   * are used.
   */
  def savePartition(
      getConnection: () => Connection,
      table: String,
      iterator: Iterator[Row],
      rddSchema: StructType,
      nullTypes: Array[Int],
      batchSize: Int,
      dialect: JdbcDialect,
      isolationLevel: Int): Iterator[Byte] = {
    val conn = getConnection()
    var committed = false
    var finalIsolationLevel = Connection.TRANSACTION_NONE
    if (isolationLevel != Connection.TRANSACTION_NONE) {
      try {
        val metadata = conn.getMetaData
        if (metadata.supportsTransactions()) {
          // Update to at least use the default isolation, if any transaction level
          // has been chosen and transactions are supported
          val defaultIsolation = metadata.getDefaultTransactionIsolation
          finalIsolationLevel = defaultIsolation
          if (metadata.supportsTransactionIsolationLevel(isolationLevel)) {
            // Finally update to actually requested level if possible
            finalIsolationLevel = isolationLevel
          } else {
            logger.warn(s"Requested isolation level $isolationLevel is not supported; " +
              s"falling back to default isolation level $defaultIsolation")
          }
        } else {
          logger.warn(s"Requested isolation level $isolationLevel, but transactions are unsupported")
        }
      } catch {
        case NonFatal(e) => logger.warn("Exception while detecting transaction support", e)
      }
    }
    val supportsTransactions = finalIsolationLevel != Connection.TRANSACTION_NONE
    try {
      if (supportsTransactions) {
        conn.setAutoCommit(false) // Everything in the same db transaction.
        conn.setTransactionIsolation(finalIsolationLevel)
      }
      val stmt = insertStatement(conn, table, rddSchema, dialect)
      val setters: Array[JDBCValueSetter] = rddSchema.fields.map(_.dataType)
        .map(makeSetter(conn, dialect, _))
      val numFields = rddSchema.fields.length
      try {
        var rowCount = 0
        while (iterator.hasNext) {
          val row = iterator.next()
          var i = 0
          while (i < numFields) {
            if (row.isNullAt(i)) {
              stmt.setNull(i + 1, nullTypes(i))
            } else {
              setters(i).apply(stmt, row, i)
            }
            i = i + 1
          }
          stmt.addBatch()
          rowCount += 1
          if (rowCount % batchSize == 0) {
            stmt.executeBatch()
            rowCount = 0
          }
        }
        if (rowCount > 0) {
          stmt.executeBatch()
        }
      } finally {
        stmt.close()
      }
      if (supportsTransactions) {
        conn.commit()
      }
      committed = true
      Iterator.empty
    } catch {
      case e: SQLException =>
        val cause = e.getNextException
        if (cause != null && e.getCause != cause) {
          if (e.getCause == null) {
            e.initCause(cause)
          } else {
            e.addSuppressed(cause)
          }
        }
        throw e
    } finally {
      if (!committed) {
        // The stage must fail. We got here through an exception path, so
        // let the exception through unless rollback() or close() want to
        // tell the user about another problem.
        if (supportsTransactions) {
          conn.rollback()
        }
        conn.close()
      } else {
        // The stage must succeed. We cannot propagate any exception close() might throw.
        try {
          conn.close()
        } catch {
          case e: Exception => logger.warn("Transaction succeeded, but closing failed", e)
        }
      }
    }
  }

  /**
   * Saves the RDD to the database in a single transaction.
   */
  def saveTable(
      df: DataFrame,
      url: String,
      table: String,
      options: JDBCOptions) {
    val dialect = JdbcDialects.get(url)
    val nullTypes: Array[Int] = df.schema.fields.map { field =>
      getJdbcType(field.dataType, dialect).jdbcNullType
    }
    val rddSchema = df.schema
    val getConnection: () => Connection = createConnectionFactory(options)
    val batchSize = options.batchSize
    val isolationLevel = options.isolationLevel
    df.foreachPartition(iterator => savePartition(
      getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel)
    )
  }

  private def makeSetter(
      conn: Connection,
      dialect: JdbcDialect,
      dataType: DataType): JDBCValueSetter = dataType match {
    case IntegerType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setInt(pos + 1, row.getInt(pos))
    case LongType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setLong(pos + 1, row.getLong(pos))
    case DoubleType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setDouble(pos + 1, row.getDouble(pos))
    case FloatType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setFloat(pos + 1, row.getFloat(pos))
    case ShortType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setInt(pos + 1, row.getShort(pos))
    case ByteType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setInt(pos + 1, row.getByte(pos))
    case BooleanType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setBoolean(pos + 1, row.getBoolean(pos))
    case StringType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setString(pos + 1, row.getString(pos))
    case BinaryType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setBytes(pos + 1, row.getAs[Array[Byte]](pos))
    case TimestampType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setTimestamp(pos + 1, row.getAs[java.sql.Timestamp](pos))
    case DateType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setDate(pos + 1, row.getAs[java.sql.Date](pos))
    case t: DecimalType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setBigDecimal(pos + 1, row.getDecimal(pos))
    case ArrayType(et, _) =>
      // remove type length parameters from end of type name
      val typeName = getJdbcType(et, dialect).databaseTypeDefinition
        .toLowerCase.split("\\(")(0)
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        val array = conn.createArrayOf(
          typeName,
          row.getSeq[AnyRef](pos).toArray)
        stmt.setArray(pos + 1, array)
    case _ =>
      (_: PreparedStatement, _: Row, pos: Int) =>
        throw new IllegalArgumentException(
          s"Can't translate non-null value for field $pos")
  }
}
usage:
val url = s"jdbc:mysql://$host/$database?useUnicode=true&characterEncoding=UTF-8"

val parameters: Map[String, String] = Map(
  "url" -> url,
  "dbtable" -> table,
  "driver" -> "com.mysql.jdbc.Driver",
  "numPartitions" -> numPartitions.toString,
  "user" -> user,
  "password" -> password
)
val options = new JDBCOptions(parameters)

for (d <- data) {
  UpdateJdbcUtils.saveTable(d, url, table, options)
}
PS: pay attention to deadlocks; don't update data frequently this way, and use it only for re-runs in case of emergency. I think that's why Spark doesn't support this officially.
If your table is small, you can read the existing SQL data, perform the upsert in a Spark DataFrame, and then overwrite the existing SQL table.
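A rough sketch of that idea, assuming both DataFrames have the same schema, a hypothetical key column id, and a SparkSession named spark. Reading and overwriting the same table is fragile, which is why the merged result is materialized before the overwrite:

import org.apache.spark.sql.SaveMode

// Read the current contents of the target table.
val existing = spark.read.jdbc(JDBCurl, mySqlTable, connectionProperties)

// Keep the old rows whose key is absent from the new batch, then add the new rows.
val merged = newData.union(existing.join(newData.select("id"), Seq("id"), "left_anti"))

// Materialize before overwriting: the overwrite truncates the table we just read.
merged.cache()
merged.count()

merged.write.mode(SaveMode.Overwrite).jdbc(JDBCurl, mySqlTable, connectionProperties)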
zero323's answer is right; I just wanted to add that you could use the JayDeBeApi package to work around this:
https://pypi.python.org/pypi/JayDeBeApi/
to update data in your MySQL table. It might be low-hanging fruit since you already have the MySQL JDBC driver installed.
The JayDeBeApi module allows you to connect from Python code to
databases using Java JDBC. It provides a Python DB-API v2.0 to that
database.
We use the Anaconda distribution of Python, where the JayDeBeApi package comes standard.
See the examples at the link above.
In PySpark I was not able to do that, so I decided to use pyodbc:
import pyodbc

url = "jdbc:sqlserver://xxx:1433;databaseName=xxx;user=xxx;password=xxx"
df.write.jdbc(url=url, table="__TableInsert", mode='overwrite')

cnxn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};Server=xxx;Database=xxx;Uid=xxx;Pwd=xxx;', autocommit=False)
try:
    crsr = cnxn.cursor()
    # DO UPSERTS OR WHATEVER YOU WANT
    crsr.execute("DELETE FROM Table")
    crsr.execute("INSERT INTO Table (Field) SELECT Field FROM __TableInsert")
    cnxn.commit()
except:
    cnxn.rollback()
cnxn.close()
