Spark SQL - update a MySQL table using DataFrames and JDBC
I'm trying to insert and update some data in MySQL using Spark SQL DataFrames and a JDBC connection.
I've succeeded in inserting new data using SaveMode.Append. Is there a way to update data that already exists in the MySQL table from Spark SQL?
My code to insert is:
myDataFrame.write.mode(SaveMode.Append).jdbc(JDBCurl,mySqlTable,connectionProperties)
If I change to SaveMode.Overwrite it deletes the full table and creates a new one. I'm looking for something like the "ON DUPLICATE KEY UPDATE" available in MySQL.
It is not possible. As of now (Spark 1.6.0 / 2.2.0 SNAPSHOT) Spark DataFrameWriter supports only four writing modes (a minimal usage sketch follows the list):
SaveMode.Overwrite: overwrite the existing data.
SaveMode.Append: append the data.
SaveMode.Ignore: ignore the operation (i.e. no-op).
SaveMode.ErrorIfExists: default option, throw an exception at runtime.
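For reference, a minimal sketch of how these modes are selected on the writer, reusing the JDBCurl, mySqlTable and connectionProperties values from the question (none of these calls performs an upsert):
import org.apache.spark.sql.SaveMode

// Each call picks exactly one of the four supported modes.
myDataFrame.write.mode(SaveMode.Append).jdbc(JDBCurl, mySqlTable, connectionProperties)
myDataFrame.write.mode(SaveMode.Overwrite).jdbc(JDBCurl, mySqlTable, connectionProperties)
myDataFrame.write.mode(SaveMode.Ignore).jdbc(JDBCurl, mySqlTable, connectionProperties)
myDataFrame.write.mode(SaveMode.ErrorIfExists).jdbc(JDBCurl, mySqlTable, connectionProperties)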
You can insert manually, for example using mapPartitions (since you want an upsert, the operation should be idempotent and as such easy to implement), write to a temporary table and execute the upsert manually, or use triggers.
In general, achieving upsert behavior for batch operations while keeping decent performance is far from trivial. You have to remember that in the general case there will be multiple concurrent transactions in place (one per partition), so you have to ensure that there will be no write conflicts (typically by using application-specific partitioning) or provide appropriate recovery procedures. In practice it may be better to perform batch writes to a temporary table and resolve the upsert part directly in the database.
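A minimal sketch of that temporary-table variant, assuming a hypothetical staging table mytable_staging with a key column id and a value column value (the table and column names are made up for illustration; JDBCurl and connectionProperties come from the question):
import java.sql.DriverManager
import org.apache.spark.sql.SaveMode

// 1. Bulk-load the new data into a staging table; a plain overwrite is all Spark has to do.
myDataFrame.write.mode(SaveMode.Overwrite).jdbc(JDBCurl, "mytable_staging", connectionProperties)

// 2. Resolve the upsert inside MySQL with a single set-based statement.
val conn = DriverManager.getConnection(JDBCurl, connectionProperties)
try {
  conn.createStatement().executeUpdate(
    """INSERT INTO mytable (id, value)
      |SELECT id, value FROM mytable_staging
      |ON DUPLICATE KEY UPDATE value = VALUES(value)""".stripMargin)
} finally {
  conn.close()
}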
It's a pity that there is no SaveMode.Upsert mode in Spark for such a common case as upserting.
zero323 is right in general, but I think it should be possible (with a performance trade-off) to offer such a replace feature.
I also wanted to provide some Java code for this case.
Of course it is not as performant as the built-in writer from Spark, but it should be a good basis for your requirements. Just modify it to your needs:
myDF = myDF.repartition(20); // one connection per partition, see below

myDF.foreachPartition((Iterator<Row> t) -> {
    Connection conn = DriverManager.getConnection(
            Constants.DB_JDBC_CONN,
            Constants.DB_JDBC_USER,
            Constants.DB_JDBC_PASS);
    conn.setAutoCommit(true);
    Statement statement = conn.createStatement();

    final int batchSize = 100000;
    int i = 0;
    while (t.hasNext()) {
        Row row = t.next();
        try {
            // better than REPLACE INTO, fewer cycles
            statement.addBatch("INSERT INTO mytable VALUES ("
                    + "'" + row.getAs("_id") + "', "
                    + "'" + row.getStruct(1).get(0) + "'"
                    + ") ON DUPLICATE KEY UPDATE _id='" + row.getAs("_id") + "';");
            if (++i % batchSize == 0) {
                statement.executeBatch();
            }
        } catch (SQLIntegrityConstraintViolationException e) {
            // should not occur, nevertheless
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
    // flush whatever is left in the batch
    int[] ret = statement.executeBatch();
    System.out.println("Ret val: " + Arrays.toString(ret));
    System.out.println("Update count: " + statement.getUpdateCount());
    statement.close();
    conn.close();
});
Another option is to overwrite org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.scala, changing "insert into" to "replace into":
import java.sql.{Connection, Driver, DriverManager, PreparedStatement, ResultSet, SQLException}
import scala.collection.JavaConverters._
import scala.util.control.NonFatal
import com.typesafe.scalalogging.Logger
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.jdbc.{DriverRegistry, DriverWrapper, JDBCOptions}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row}
/**
* Util functions for JDBC tables.
*/
object UpdateJdbcUtils {
val logger = Logger(this.getClass)
/**
* Returns a factory for creating connections to the given JDBC URL.
*
* @param options - JDBC options that contains url, table and other information.
*/
def createConnectionFactory(options: JDBCOptions): () => Connection = {
val driverClass: String = options.driverClass
() => {
DriverRegistry.register(driverClass)
val driver: Driver = DriverManager.getDrivers.asScala.collectFirst {
case d: DriverWrapper if d.wrapped.getClass.getCanonicalName == driverClass => d
case d if d.getClass.getCanonicalName == driverClass => d
}.getOrElse {
throw new IllegalStateException(
s"Did not find registered driver with class $driverClass")
}
driver.connect(options.url, options.asConnectionProperties)
}
}
/**
* Returns a PreparedStatement that inserts a row into table via conn.
*/
def insertStatement(conn: Connection, table: String, rddSchema: StructType, dialect: JdbcDialect)
: PreparedStatement = {
val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",")
val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
val sql = s"REPLACE INTO $table ($columns) VALUES ($placeholders)"
conn.prepareStatement(sql)
}
/**
* Retrieve standard jdbc types.
*
* @param dt The datatype (e.g. [[org.apache.spark.sql.types.StringType]])
* @return The default JdbcType for this DataType
*/
def getCommonJDBCType(dt: DataType): Option[JdbcType] = {
dt match {
case IntegerType => Option(JdbcType("INTEGER", java.sql.Types.INTEGER))
case LongType => Option(JdbcType("BIGINT", java.sql.Types.BIGINT))
case DoubleType => Option(JdbcType("DOUBLE PRECISION", java.sql.Types.DOUBLE))
case FloatType => Option(JdbcType("REAL", java.sql.Types.FLOAT))
case ShortType => Option(JdbcType("INTEGER", java.sql.Types.SMALLINT))
case ByteType => Option(JdbcType("BYTE", java.sql.Types.TINYINT))
case BooleanType => Option(JdbcType("BIT(1)", java.sql.Types.BIT))
case StringType => Option(JdbcType("TEXT", java.sql.Types.CLOB))
case BinaryType => Option(JdbcType("BLOB", java.sql.Types.BLOB))
case TimestampType => Option(JdbcType("TIMESTAMP", java.sql.Types.TIMESTAMP))
case DateType => Option(JdbcType("DATE", java.sql.Types.DATE))
case t: DecimalType => Option(
JdbcType(s"DECIMAL(${t.precision},${t.scale})", java.sql.Types.DECIMAL))
case _ => None
}
}
private def getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType = {
dialect.getJDBCType(dt).orElse(getCommonJDBCType(dt)).getOrElse(
throw new IllegalArgumentException(s"Can't get JDBC type for ${dt.simpleString}"))
}
// A `JDBCValueGetter` is responsible for getting a value from `ResultSet` into a field
// for `MutableRow`. The last argument `Int` means the index for the value to be set in
// the row and also used for the value in `ResultSet`.
private type JDBCValueGetter = (ResultSet, InternalRow, Int) => Unit
// A `JDBCValueSetter` is responsible for setting a value from `Row` into a field for
// `PreparedStatement`. The last argument `Int` means the index for the value to be set
// in the SQL statement and also used for the value in `Row`.
private type JDBCValueSetter = (PreparedStatement, Row, Int) => Unit
/**
* Saves a partition of a DataFrame to the JDBC database. This is done in
* a single database transaction (unless isolation level is "NONE")
* in order to avoid repeatedly inserting data as much as possible.
*
* It is still theoretically possible for rows in a DataFrame to be
* inserted into the database more than once if a stage somehow fails after
* the commit occurs but before the stage can return successfully.
*
* This is not a closure inside saveTable() because apparently cosmetic
* implementation changes elsewhere might easily render such a closure
* non-Serializable. Instead, we explicitly close over all variables that
* are used.
*/
def savePartition(
getConnection: () => Connection,
table: String,
iterator: Iterator[Row],
rddSchema: StructType,
nullTypes: Array[Int],
batchSize: Int,
dialect: JdbcDialect,
isolationLevel: Int): Iterator[Byte] = {
val conn = getConnection()
var committed = false
var finalIsolationLevel = Connection.TRANSACTION_NONE
if (isolationLevel != Connection.TRANSACTION_NONE) {
try {
val metadata = conn.getMetaData
if (metadata.supportsTransactions()) {
// Update to at least use the default isolation, if any transaction level
// has been chosen and transactions are supported
val defaultIsolation = metadata.getDefaultTransactionIsolation
finalIsolationLevel = defaultIsolation
if (metadata.supportsTransactionIsolationLevel(isolationLevel)) {
// Finally update to actually requested level if possible
finalIsolationLevel = isolationLevel
} else {
logger.warn(s"Requested isolation level $isolationLevel is not supported; " +
s"falling back to default isolation level $defaultIsolation")
}
} else {
logger.warn(s"Requested isolation level $isolationLevel, but transactions are unsupported")
}
} catch {
case NonFatal(e) => logger.warn("Exception while detecting transaction support", e)
}
}
val supportsTransactions = finalIsolationLevel != Connection.TRANSACTION_NONE
try {
if (supportsTransactions) {
conn.setAutoCommit(false) // Everything in the same db transaction.
conn.setTransactionIsolation(finalIsolationLevel)
}
val stmt = insertStatement(conn, table, rddSchema, dialect)
val setters: Array[JDBCValueSetter] = rddSchema.fields.map(_.dataType)
.map(makeSetter(conn, dialect, _))
val numFields = rddSchema.fields.length
try {
var rowCount = 0
while (iterator.hasNext) {
val row = iterator.next()
var i = 0
while (i < numFields) {
if (row.isNullAt(i)) {
stmt.setNull(i + 1, nullTypes(i))
} else {
setters(i).apply(stmt, row, i)
}
i = i + 1
}
stmt.addBatch()
rowCount += 1
if (rowCount % batchSize == 0) {
stmt.executeBatch()
rowCount = 0
}
}
if (rowCount > 0) {
stmt.executeBatch()
}
} finally {
stmt.close()
}
if (supportsTransactions) {
conn.commit()
}
committed = true
Iterator.empty
} catch {
case e: SQLException =>
val cause = e.getNextException
if (cause != null && e.getCause != cause) {
if (e.getCause == null) {
e.initCause(cause)
} else {
e.addSuppressed(cause)
}
}
throw e
} finally {
if (!committed) {
// The stage must fail. We got here through an exception path, so
// let the exception through unless rollback() or close() want to
// tell the user about another problem.
if (supportsTransactions) {
conn.rollback()
}
conn.close()
} else {
// The stage must succeed. We cannot propagate any exception close() might throw.
try {
conn.close()
} catch {
case e: Exception => logger.warn("Transaction succeeded, but closing failed", e)
}
}
}
}
/**
* Saves the RDD to the database in a single transaction.
*/
def saveTable(
df: DataFrame,
url: String,
table: String,
options: JDBCOptions) {
val dialect = JdbcDialects.get(url)
val nullTypes: Array[Int] = df.schema.fields.map { field =>
getJdbcType(field.dataType, dialect).jdbcNullType
}
val rddSchema = df.schema
val getConnection: () => Connection = createConnectionFactory(options)
val batchSize = options.batchSize
val isolationLevel = options.isolationLevel
df.foreachPartition(iterator => savePartition(
getConnection, table, iterator, rddSchema, nullTypes, batchSize, dialect, isolationLevel)
)
}
private def makeSetter(
conn: Connection,
dialect: JdbcDialect,
dataType: DataType): JDBCValueSetter = dataType match {
case IntegerType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setInt(pos + 1, row.getInt(pos))
case LongType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setLong(pos + 1, row.getLong(pos))
case DoubleType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setDouble(pos + 1, row.getDouble(pos))
case FloatType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setFloat(pos + 1, row.getFloat(pos))
case ShortType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setInt(pos + 1, row.getShort(pos))
case ByteType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setInt(pos + 1, row.getByte(pos))
case BooleanType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setBoolean(pos + 1, row.getBoolean(pos))
case StringType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setString(pos + 1, row.getString(pos))
case BinaryType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setBytes(pos + 1, row.getAs[Array[Byte]](pos))
case TimestampType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setTimestamp(pos + 1, row.getAs[java.sql.Timestamp](pos))
case DateType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setDate(pos + 1, row.getAs[java.sql.Date](pos))
case t: DecimalType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setBigDecimal(pos + 1, row.getDecimal(pos))
case ArrayType(et, _) =>
// remove type length parameters from end of type name
val typeName = getJdbcType(et, dialect).databaseTypeDefinition
.toLowerCase.split("\\(")(0)
(stmt: PreparedStatement, row: Row, pos: Int) =>
val array = conn.createArrayOf(
typeName,
row.getSeq[AnyRef](pos).toArray)
stmt.setArray(pos + 1, array)
case _ =>
(_: PreparedStatement, _: Row, pos: Int) =>
throw new IllegalArgumentException(
s"Can't translate non-null value for field $pos")
}
}
usage:
val url = s"jdbc:mysql://$host/$database?useUnicode=true&characterEncoding=UTF-8"
val parameters: Map[String, String] = Map(
"url" -> url,
"dbtable" -> table,
"driver" -> "com.mysql.jdbc.Driver",
"numPartitions" -> numPartitions.toString,
"user" -> user,
"password" -> password
)
val options = new JDBCOptions(parameters)
for (d <- data) {
UpdateJdbcUtils.saveTable(d, url, table, options)
}
PS: pay attention to deadlocks; don't update data too frequently, and only use this for re-runs in case of emergency. I think that's why Spark doesn't support this officially.
If your table is small, you can read the existing SQL data, do the upsert in a Spark DataFrame, and then overwrite the existing SQL table.
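A minimal sketch of that read-merge-overwrite idea; updatesDF, the key column id and the spark session name are assumptions for illustration, while JDBCurl, mySqlTable and connectionProperties are the values from the question:
import org.apache.spark.sql.SaveMode

// Read the current contents of the (small) table.
val existing = spark.read.jdbc(JDBCurl, mySqlTable, connectionProperties)

// Updated rows win: keep every row from updatesDF plus the existing rows whose key is untouched.
val merged = updatesDF.union(
  existing.join(updatesDF.select("id"), Seq("id"), "left_anti"))

// Materialize before writing, since Overwrite recreates the very table we just read from.
merged.cache()
merged.count()
merged.write.mode(SaveMode.Overwrite).jdbc(JDBCurl, mySqlTable, connectionProperties)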
zero323's answer is right, I just wanted to add that you could use the JayDeBeApi package to work around this:
https://pypi.python.org/pypi/JayDeBeApi/
to update data in your MySQL table. It might be low-hanging fruit since you already have the MySQL JDBC driver installed.
The JayDeBeApi module allows you to connect from Python code to
databases using Java JDBC. It provides a Python DB-API v2.0 to that
database.
We use the Anaconda distribution of Python, and the JayDeBeApi Python package comes standard.
See examples in that link above.
In PySpark I was not able to do that, so I decided to use ODBC.
url = "jdbc:sqlserver://xxx:1433;databaseName=xxx;user=xxx;password=xxx"
df.write.jdbc(url=url, table="__TableInsert", mode='overwrite')
cnxn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};Server=xxx;Database=xxx;Uid=xxx;Pwd=xxx;', autocommit=False)
try:
    crsr = cnxn.cursor()
    # DO UPSERTS OR WHATEVER YOU WANT
    crsr.execute("DELETE FROM Table")
    crsr.execute("INSERT INTO Table (Field) SELECT Field FROM __TableInsert")
    cnxn.commit()
except:
    cnxn.rollback()
cnxn.close()
Related
How to bind Oracle params in Scala?
I have written code that executes any query in Scala; it works perfectly unless the query has to use the same parameter twice. The database version is 12 and the Oracle jar is ojdbc6. I wrote this code in order to execute a query:

def executeQuery(locale: String, query: String, input: Map[String, String], output: List[String]): Vector[Map[String, Any]] = {
  var connection: Connection = null;
  val properties = ConnectionLoader.getConnectionProperties(locale);
  try {
    connection = getDBConnection(properties);
    val statement = connection prepareCall (query)
    if (null != input)
      for ((k, v) <- input) {
        statement.setObject(k, v)
      }
    for (k <- output) {
      statement.registerOutParameter(k, OracleTypes.INTEGER)
    }
    val resultSet = statement.executeQuery();
    realize(resultSet);
  } catch {
    case e => throw e;
  } finally {
    if (null != connection) connection.close();
  }
}

My query is

SELECT COUNT (1) FROM ORDERS WHERE ORDER_ID = :P_ORDER_ID AND STATUS_ID = 4

This query works fine, but I'm getting an error when executing

SELECT COUNT (1) FROM ORDERS WHERE ORDER_ID = :P_ORDER_ID AND STATUS_ID = 4 and :P_ORDER_ID=9

Regardless of this illogical query, I'm getting this error:

Execution exception[[SQLException: Missing IN or OUT parameter at index:: 2]]

I have googled everything but got no result. Please advise.
How to insert a Clob into an Oracle table with Slick 3 and Oracle 12?
We have a table with a CLOB column (to save JSON data). As we understand (from the docs), Slick supports LOB types (http://slick.lightbend.com/doc/3.1.1/schemas.html). We are able to query the table successfully, including the CLOB column. We are not able to insert a record with a Clob. We are converting a String to java.sql.Clob with:

private java.sql.Clob stringToClob(String source) {
    try {
        return new javax.sql.rowset.serial.SerialClob(source.toCharArray());
    } catch (Exception e) {
        log.error("Could not convert string to a CLOB", e);
        return null;
    }
}

but in the end the exception from Slick is the following:

java.lang.ClassCastException: javax.sql.rowset.serial.SerialClob cannot be cast to oracle.sql.CLOB

Is this possible?
We finally found a workaround as follows. According to the column definition in Slick

def column[C](n: String, options: ColumnOption[C]*)(implicit tt: TypedType[C]): Rep[C]

you can specify how the column is going to be translated between the driver and your code. If you want to use the out-of-the-box translations, fine, but for Oracle the translation for the CLOB type doesn't seem to work properly. What we did was to define the column as a String but let Slick handle the translation with our custom code. The column definition is the following:

def myClobColumn = column[String]( "CLOBCOLUMN" )( new StringJdbcType )

StringJdbcType is our custom code to solve the translation between our String to be inserted (up to 65535 bytes) and an Oracle CLOB. The code for StringJdbcType is as follows:

class StringJdbcType extends driver.DriverJdbcType[String] {
  def sqlType = java.sql.Types.VARCHAR

  // Here's the solution
  def setValue( v: String, p: PreparedStatement, idx: Int ) = {
    val conn = p.getConnection
    val clob = conn.createClob()
    clob.setString( 1, v )
    p.setClob( idx, clob )
  }

  def getValue( r: ResultSet, idx: Int ) =
    scala.io.Source.fromInputStream( r.getAsciiStream( "DSPOLIZARIESGO" ) )( Codec.ISO8859 ).getLines().mkString

  def updateValue( v: String, r: ResultSet, idx: Int ) = r.updateString( idx, v )

  override def hasLiteralForm = false
}

The setValue function was our salvation, because we could build an Oracle CLOB with the already instantiated PreparedStatement and the String coming from our domain. In our implementation we only had to do the plumbing and dirty work for the Oracle CLOB. In sum, the extension point offered by Slick in driver.DriverJdbcType[A] was what we actually used to make the thing work.
These are some improvements related to the solution: close resources and stream inspection.

class BigStringJdbcType extends profile.DriverJdbcType[String] {

  def sqlType: Int = java.sql.Types.VARCHAR

  def setValue(v: String, p: PreparedStatement, idx: Int): Unit = {
    val connection = p.getConnection
    val clob = connection.createClob()
    try {
      clob.setString(1, v)
      p.setClob(idx, clob)
    } finally {
      clob.free()
    }
  }

  def getValue(r: ResultSet, idx: Int): String = {
    val asciiStream = r.getAsciiStream(idx)
    try {
      val (bufferEmpty, encoding) = getInputStreamStatus(asciiStream)
      if (bufferEmpty) {
        convertInputStreamToString(asciiStream, encoding)
      } else ""
    } finally {
      asciiStream.close()
    }
  }

  def updateValue(v: String, r: ResultSet, idx: Int): Unit =
    r.updateString(idx, v)

  override def hasLiteralForm: Boolean = false
}

Some utilities to complement the solution:

def getInputStreamStatus(stream: InputStream): (Boolean, String) = {
  val reader = new InputStreamReader(stream)
  try {
    val bufferEmpty = reader.ready()
    val encoding = reader.getEncoding
    bufferEmpty -> encoding
  } finally {
    reader.close()
  }
}

def convertInputStreamToString(
  stream: InputStream,
  encoding: String
): String = {
  scala.io.Source.fromInputStream(stream)(encoding).getLines().mkString
}
How to connect to HBase with the JDBC driver of Apache Drill programmatically
I tried to use the JDBC driver of Apache Drill programmatically. Here's the code:

import java.sql.DriverManager

object SearchHbaseWithHbase {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.drill.jdbc.Driver")
    val zkIp = "192.168.3.2:2181"
    val connection = DriverManager.getConnection(s"jdbc:drill:zk=${zkIp};schema:hbase")
    connection.setSchema("hbase")
    println(connection.getSchema)
    val st = connection.createStatement()
    val rs = st.executeQuery("SELECT * FROM Label")
    while (rs.next()){
      println(rs.getString(1))
    }
  }
}

I have set the database schema with type hbase, like:

connection.setSchema("hbase")

But it fails with the error:

Exception in thread "main" java.sql.SQLException: VALIDATION ERROR: From line 1, column 15 to line 1, column 19: Table 'Label' not found
SQL Query null

The Label table definitely exists in my HBase. I can find my data when I use sqlline, like:

sqlline -u jdbc:drill:zk....
use hbase;
select * from Label;
I have solved this problem. I had confused the Drill schema and the JDBC driver schema. The correct code should look like:

object SearchHbaseWithHbase {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.drill.jdbc.Driver")
    val zkIp = "192.168.3.2:2181"

    val p = new java.util.Properties
    p.setProperty("schema", "hbase")

    // val connectionInfo = new ConnectionInfo
    val url = s"jdbc:drill:zk=${zkIp}"
    val connection = DriverManager.getConnection(url, p)
    // connection.setSchema("hbase")
    // println(connection.getSchema)

    val st = connection.createStatement()
    val rs = st.executeQuery("SELECT * FROM Label")
    while (rs.next()){
      println(rs.getString(1))
    }
  }
}
Scala: exception handling in anonymous function
If I pass an anonymous function as an argument, like e.g. in this code sample:

val someMap = someData.map(line => (line.split("\\|")(0), // key
  line.split("\\|")(1) + "|" + // value as string concat
  line.split("\\|")(4) + "|" +
  line.split("\\|")(9)))

I could catch, e.g., an ArrayIndexOutOfBoundsException like this:

try {
  val someMap = someData.map(line => (line.split("\\|")(0), // key
    line.split("\\|")(1) + "|" + // value as string concat
    line.split("\\|")(4) + "|" +
    line.split("\\|")(9)))
} catch {
  case e1: ArrayIndexOutOfBoundsException => println("exception in line " )
}

The problem with this is that I do not have access to the inner function's scope. In this case I would like to print the line (from the anonymous function) which caused the exception. How can I do this? Is there some way of catching an exception within an anonymous function? Is there a way to access the scope of an anonymous function from the outside for debugging purposes?

Edit: I'm using Scala 2.9.3
You could use Either:

val result = someData.map { line =>
  try {
    val values = (line.split("\\|")(0), // key
      line.split("\\|")(1) + "|" + // value as string concat
      line.split("\\|")(4) + "|" +
      line.split("\\|")(9))
    Right(values)
  } catch {
    case e1: ArrayIndexOutOfBoundsException => Left(s"exception in line $line")
  }
}

result.foreach {
  case (Right(values)) => println(values)
  case (Left(msg)) => println(msg)
}

But if you are importing data from a text file, I would try to do it without exceptions (because it's not really exceptional to get invalid data in that case):

val result = someData.map { line =>
  val fields = line.split("\\|")
  if (fields.length < 9) {
    Left(s"Error in line $line")
  } else {
    val values = (fields(0), Seq(fields(1), fields(4), fields(9)))
    Right(values)
  }
}

result.foreach {
  case (Right((key, values))) => println(s"$key -> ${values.mkString("|")}")
  case (Left(msg)) => println(msg)
}
Perhaps this will give you some ideas:

try {
  val someMap = someData.map { line =>
    try {
      (line.split("\\|")(0), // key
        line.split("\\|")(1) + "|" + // value as string concat
        line.split("\\|")(4) + "|" +
        line.split("\\|")(9))
    } catch {
      case inner: ArrayIndexOutOfBoundsException => {
        println("exception in " + line)
        throw inner
      }
    }
  }
} catch {
  case outer: ArrayIndexOutOfBoundsException => ...
}
The other answers give nice functional solutions using Either etc. If you were using Scala 2.10, you could also use Try as

val lines = List("abc", "ef");
println(lines.map(line => Try(line(3))));

to get List[Try[Char]], where you can examine each element to see if it succeeded or failed. (I haven't tried to compile this.)

If for any reason you prefer exceptions, you need to catch the exception inside the mapping function and rethrow it with information about the line. For example:

// Your own exception class holding a line that failed:
case class LineException(line: String, nested: Exception)
  extends Exception(nested);

// Computes something on a line and throws a proper `LineException`
// if the processing fails:
def lineWorker[A](worker: String => A)(line: String): A =
  try {
    worker(line)
  } catch {
    case (e: Exception) => throw LineException(line, e);
  }

def getNth(lines: List[String], i: Int): List[Char] =
  lines.map(lineWorker(_.apply(i)));

val lines = List("abc", "ef");
println(getNth(lines, 1));
println(getNth(lines, 2));

You can also express it using Catch from scala.util.control.Exception:

case class LineException(line: String, nested: Throwable)
  extends Exception(nested); // we need Throwable here ^^

import scala.util.control.Exception._

// Returns a `Catch` that wraps any exception in a proper `LineException`.
def lineExceptionCatch[T](line: String): Catch[T] =
  handling[T](classOf[Exception]).by(e => throw LineException(line, e));

def lineWorker[A](worker: String => A)(line: String): A =
  lineExceptionCatch[A](line)(worker(line))

// ...
First, your outer try/catch is useless. If your List (or other structure) is empty, the map function won't do anything => no ArrayIndexOutOfBoundsException will be thrown. As for the inner loop, I would suggest another solution with Scalaz Either:

import scalaz._
import EitherT._
import Id.Id

val someMap = someData.map { line =>
  fromTryCatch[Id, (String, String)] {
    (line.split("\\|")(0), // key
      line.split("\\|")(1) + "|" + // value as string concat
      line.split("\\|")(4) + "|" +
      line.split("\\|")(9))
  }
}

and then chain your operations on List[EitherT[...]].
The method 'OrderBy' must be called before the method 'Skip' Exception
I was trying to implement the jQgrid using MvcjQgrid and I got this exception:

System.NotSupportedException was unhandled by user code
Message=The method 'Skip' is only supported for sorted input in LINQ to Entities. The method 'OrderBy' must be called before the method 'Skip'.

Though OrderBy is used before the Skip method, why is it generating the exception? How can it be solved? I encountered the exception in the controller:

public ActionResult GridDataBasic(GridSettings gridSettings)
{
    var jobdescription = sm.GetJobDescription(gridSettings);
    var totalJobDescription = sm.CountJobDescription(gridSettings);
    var jsonData = new
    {
        total = totalJobDescription / gridSettings.PageSize + 1,
        page = gridSettings.PageIndex,
        records = totalJobDescription,
        rows = (
            from j in jobdescription
            select new
            {
                id = j.JobDescriptionID,
                cell = new[]
                {
                    j.JobDescriptionID.ToString(),
                    j.JobTitle,
                    j.JobType.JobTypeName,
                    j.JobPriority.JobPriorityName,
                    j.JobType.Rate.ToString(),
                    j.CreationDate.ToShortDateString(),
                    j.JobDeadline.ToShortDateString(),
                }
            }).ToArray()
    };
    return Json(jsonData, JsonRequestBehavior.AllowGet);
}

GetJobDescription and CountJobDescription methods:

public int CountJobDescription(GridSettings gridSettings)
{
    var jobdescription = _dataContext.JobDescriptions.AsQueryable();
    if (gridSettings.IsSearch)
    {
        jobdescription = gridSettings.Where.rules.Aggregate(jobdescription, FilterJobDescription);
    }
    return jobdescription.Count();
}

public IQueryable<JobDescription> GetJobDescription(GridSettings gridSettings)
{
    var jobdescription = orderJobDescription(_dataContext.JobDescriptions.AsQueryable(), gridSettings.SortColumn, gridSettings.SortOrder);
    if (gridSettings.IsSearch)
    {
        jobdescription = gridSettings.Where.rules.Aggregate(jobdescription, FilterJobDescription);
    }
    return jobdescription.Skip((gridSettings.PageIndex - 1) * gridSettings.PageSize).Take(gridSettings.PageSize);
}

And finally FilterJobDescription and orderJobDescription:

private static IQueryable<JobDescription> FilterJobDescription(IQueryable<JobDescription> jobdescriptions, Rule rule)
{
    if (rule.field == "JobDescriptionID")
    {
        int result;
        if (!int.TryParse(rule.data, out result))
            return jobdescriptions;
        return jobdescriptions.Where(j => j.JobDescriptionID == Convert.ToInt32(rule.data));
    }
    // Similar statements
    return jobdescriptions;
}

private IQueryable<JobDescription> orderJobDescription(IQueryable<JobDescription> jobdescriptions, string sortColumn, string sortOrder)
{
    if (sortColumn == "JobDescriptionID")
        return (sortOrder == "desc") ? jobdescriptions.OrderByDescending(j => j.JobDescriptionID) : jobdescriptions.OrderBy(j => j.JobDescriptionID);
    return jobdescriptions;
}
The exception means that you always need sorted input if you apply Skip, also in the case that the user doesn't click on a column to sort by. I could imagine that no sort column is specified when you open the grid view for the first time, before the user can even click on a column header. To catch this case I would suggest defining some default sorting that you want when no other sorting criterion is given, for example:

switch (sortColumn)
{
    case "JobDescriptionID":
        return (sortOrder == "desc") ? jobdescriptions.OrderByDescending(j => j.JobDescriptionID) : jobdescriptions.OrderBy(j => j.JobDescriptionID);
    case "JobDescriptionTitle":
        return (sortOrder == "desc") ? jobdescriptions.OrderByDescending(j => j.JobDescriptionTitle) : jobdescriptions.OrderBy(j => j.JobDescriptionTitle);
    // etc.
    default:
        return jobdescriptions.OrderBy(j => j.JobDescriptionID);
}

Edit

About your follow-up problems according to your comment: You cannot use ToString() in a LINQ to Entities query. And the next problem would be that you cannot create a string array in the query. I would suggest loading the data from the DB with their native types and then converting to strings (and to the string array) in memory afterwards:

rows = (from j in jobdescription
        select new
        {
            JobDescriptionID = j.JobDescriptionID,
            JobTitle = j.JobTitle,
            JobTypeName = j.JobType.JobTypeName,
            JobPriorityName = j.JobPriority.JobPriorityName,
            Rate = j.JobType.Rate,
            CreationDate = j.CreationDate,
            JobDeadline = j.JobDeadline
        })
        .AsEnumerable() // DB query runs here, the rest is in memory
        .Select(a => new
        {
            id = a.JobDescriptionID,
            cell = new[]
            {
                a.JobDescriptionID.ToString(),
                a.JobTitle,
                a.JobTypeName,
                a.JobPriorityName,
                a.Rate.ToString(),
                a.CreationDate.ToShortDateString(),
                a.JobDeadline.ToShortDateString()
            }
        })
        .ToArray()
I had the same type of problem after sorting using some code from Adam Anderson that accepted a generic sort string in OrderBy. After getting this exception, I did lots of research and found a very clever fix:

var query = SelectOrders(companyNo, sortExpression);
return Queryable.Skip(query, iStartRow).Take(iPageSize).ToList();

Hope that helps!

SP