HBase shell "OutOfOrderScannerNextException" error on scanner & count calls - hadoop

Whether I run a scan or a count command, this error pops up, and the error message doesn't make sense to me.
What does it mean, and how do I solve it?
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
request=scanner_id: 788 number_of_rows: 100 close_scanner: false
next_call_seq: 0
Commands:
count 'table', 5000
scan 'table', {COLUMN => ['cf:cq'], FILTER => "ValueFilter( =, 'binaryprefix:somevalue')"}
EDIT:
I have added the following settings in hbase-site.xml
<property>
<name>hbase.rpc.timeout</name>
<value>1200000</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>100</value>
</property>
This had no effect.
EDIT2: Added sleep
Result[] results = scanner.next(100);
for (int i = 0; i < results.length; i++) {
    result = results[i];
    try {
        ...
        count++;
        ...
        Thread.sleep(10); // ADDED SLEEP
    } catch (Throwable exception) {
        System.out.println(exception.getMessage());
        System.out.println("sleeping");
    }
}
New Error after Edit2:
org.apache.hadoop.hbase.client.ScannerTimeoutException: 101761ms passed since the last invocation, timeout is currently set to 60000
...
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 31, already closed?
...
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException): org.apache.hadoop.hbase.UnknownScannerException: Name: 31, already closed?
...
FINALLY BLOCK: 9900
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hbase.client.ScannerTimeoutException: 101766ms passed since the last invocation, timeout is currently set to 60000
...
Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException: 101766ms passed since the last invocation, timeout is currently set to 60000
...
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 31, already closed?
...
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.UnknownScannerException): org.apache.hadoop.hbase.UnknownScannerException: Name: 31, already closed?
...
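Reading the second trace: 101761 ms passed between two consecutive scanner calls while the scanner timeout is 60000 ms, so the server-side scanner had presumably already expired when the client came back for the next batch. With scanner.next(100), that suggests roughly one second of client-side work per row. The following is a rough Java sketch, not HBase API, of picking a batch size so that one batch is processed well within the timeout. The class name, the method, and the 50% safety margin are assumptions made for illustration.

```java
public class ScannerBatchSizing {

    /**
     * Largest number of rows per batch such that processing the whole batch
     * stays within a fraction (safetyFactor) of the scanner timeout.
     * Hypothetical helper for illustration; not part of the HBase client.
     */
    public static int maxRowsPerBatch(long scannerTimeoutMs, double perRowMs, double safetyFactor) {
        if (scannerTimeoutMs <= 0 || perRowMs <= 0 || safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException(
                    "timeout and per-row time must be positive, safetyFactor must be in (0, 1]");
        }
        long rows = (long) Math.floor(scannerTimeoutMs * safetyFactor / perRowMs);
        // Always fetch at least one row, otherwise the scan cannot make progress.
        return (int) Math.max(1, Math.min(rows, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long timeoutMs = 60_000;            // "timeout is currently set to 60000"
        double perRowMs = 101_761 / 100.0;  // 101761 ms observed for a batch of 100 rows
        System.out.println(maxRowsPerBatch(timeoutMs, perRowMs, 0.5));
    }
}
```

The resulting value would be used for scanner.next(n) and for hbase.client.scanner.caching. Note that the trace still reports 60000 ms after hbase.rpc.timeout was raised, which suggests the scanner timeout is governed by a separate setting (hbase.client.scanner.timeout.period in recent HBase versions, as far as I know).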

EDIT: I was able to solve this issue by using the client version shipped with the downloaded HBase distribution instead of the 0.99 client from Maven.
The server version is 0.98.6.1.
The distribution contains the matching client jars in its ./lib folder.
Don't forget to add the ZooKeeper library as well.
OLD:
Previously I did two things. First, I switched to the new table connection API (0.99):
Configuration conf = HBaseConfiguration.create();
TableName name = TableName.valueOf("TABLENAME");
Connection conn = ConnectionFactory.createConnection(conf);
Table table = conn.getTable(name);
Then, when the error occurs, I recreate the connection:
scanner.close();
conn.close();
conf.clear();
conf = HBaseConfiguration.create();
conn = ConnectionFactory.createConnection(conf);
table = conn.getTable(name);
scanner = table.getScanner(scan);
This works, but it becomes very slow after the first error; scanning through all the rows takes a long time.

This sometimes occurs after huge deletes. You need to merge the empty regions and rebalance your regions.

This can also be caused by a failing disk. In my case the disk was not broken badly enough for Ambari, HDFS, or our monitoring services to notice, but it was broken enough that one region could not be served.
After stopping the regionserver using that disk, the scan worked.
I found the regionserver by running hbase shell in debug mode:
hbase shell -d
Then some regionservers appeared in the output and one of them stood out.
Then I ran dmesg on the host to find the failing disk.

Related

java.sql.SQLException: Could not commit with auto-commit set on

I have a few insert and update operations in my application. Everything runs fine on Tomcat, but when I deploy to Oracle WebLogic I get the following exception:
java.sql.SQLException: Could not commit with auto-commit set on
In my executeUpdate method, I call DatabaseConnection.setAutoCommit with false at the beginning:
dbConnection.setAutoCommit(false, id);
After the PreparedStatement's executeUpdate, if the returned integer is > 0, I set auto-commit back to true, something like this:
dbConnection.setIsolationLevel(2, id);
count = preparedStatement.executeUpdate();
if (count > 0)
    dbConnection.setAutoCommit(true, id);
After all the operations, in the finally block, we check whether the DatabaseConnection is null and then close it, something like this:
if (dbConnection != null)
{
    dbConnection.close(tranid);
    dbConnection = null;
}
The close method is written roughly as follows, wrapped in a try/catch, and the message in that catch block is what gets printed:
if (connection != null)
{
    connection.commit();
    connection.close();
    connection = null;
}
Can someone please help me with this? A proper commit has to happen in real time. When I tried setting auto-commit to false in the part below, it worked without the SQLException:
if (count > 0)
    dbConnection.setAutoCommit(false, id);
My worry is that this is not the solution I'm looking for, as it causes problems in real time.
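The Javadoc for java.sql.Connection.commit() says the method should be used only when auto-commit mode has been disabled, and some drivers and connection pools enforce this by throwing an SQLException. Since the code above switches auto-commit back on before close() calls commit(), one possible way around the error, sketched below under that assumption, is to commit explicitly only when auto-commit is off. The class and method names are hypothetical, and the stub connection exists only so the sketch runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SafeClose {

    /** Commits only when auto-commit is off, then always closes the connection. */
    public static void commitAndClose(Connection connection) throws SQLException {
        if (connection == null) {
            return;
        }
        try {
            // With auto-commit on, each statement has already been committed,
            // and an explicit commit() may be rejected by the driver.
            if (!connection.getAutoCommit()) {
                connection.commit();
            }
        } finally {
            connection.close();
        }
    }

    /** Stub connection that records method calls; for demonstration only. */
    public static Connection stub(boolean autoCommit, List<String> calls) {
        return (Connection) Proxy.newProxyInstance(
                SafeClose.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    switch (method.getName()) {
                        case "getAutoCommit":
                            return autoCommit;
                        case "commit":
                            if (autoCommit) {
                                throw new SQLException("Could not commit with auto-commit set on");
                            }
                            return null;
                        default:
                            return null; // close() and everything else: no-op
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        commitAndClose(stub(true, calls));   // auto-commit on: commit() is skipped
        System.out.println(calls);
        calls.clear();
        commitAndClose(stub(false, calls));  // auto-commit off: commit() is called
        System.out.println(calls);
    }
}
```

If the update and the commit have to form one transaction, another option is to leave auto-commit off until after the explicit commit, rather than switching it back on right after executeUpdate.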

WebSphere connection pool issue, i.e. DSRA9110E

LOGGER.debug("Connection Status Disb isClosed = " + conn.isClosed());
// returns true.
crsDisbDetailstmp = DataAccess.getData("select 1 cnt from dual", conn, new String[] {});
crsDisbDetailstmp.first();
LOGGER.debug("crsDisbDetailstmp"+ crsDisbDetailstmp.getString("cnt"));
DataAccess.executeProc("PRC_MCLR_TRNPRCDTL", new String[]{strOrgId, strAccountid,strFixedRate ,strModifiers }, conn);
An exception occurred while executing the last statement, i.e. the procedure call:
Exception=com.ibm.websphere.ce.cm.ObjectClosedException: DSRA9110E: Connection is closed.
I searched a lot on Google, and it says this exception occurs because the connection is closed. I also checked conn.isClosed(), which returns true.
But if the connection is closed, how am I able to run select queries?
Please help me figure this out; I have only worked with JBoss before, and this is my first time on WebSphere.

Dataproc conflict in hadoop temporary tables

I have a flow that executes Spark jobs on Dataproc clusters in parallel for different zones. For each zone it creates a cluster, executes the Spark job, and deletes the cluster after the job finishes.
The Spark job uses the org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset method, passing the BigQuery Configuration, to save data to a BigQuery table. The job saves data to more than one table, so it calls saveAsNewAPIHadoopDataset more than once per job.
The problem is that sometimes I get an error caused by a conflict in the temporary Hadoop BigQuery dataset that the connector creates internally to run the jobs:
Exception in thread "main" com.google.api.client.googleapis.json.GoogleJsonResponseException: 409 Conflict
{
"code" : 409,
"errors" : [ {
"domain" : "global",
"message" : "Already Exists: Dataset <my-gcp-project>:<MY-DATASET>_hadoop_temporary_job_201802250620_0013",
"reason" : "duplicate"
} ],
"message" : "Already Exists: Dataset <my-gcp-project>:<MY-DATASET>_hadoop_temporary_job_201802250620_0013"
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1056)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputCommitter.setupJob(BigQueryOutputCommitter.java:107)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1150)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1078)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1078)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1078)
at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:819)
...
The timestamp 201802250620_0013 in the exception above has the suffix _0013, and I'm unsure whether it represents time.
My guess is that sometimes the jobs run at the same time and try to create a dataset with the same timestamp in its name, either in a parallel job or within the same job on another saveAsNewAPIHadoopDataset call.
How can we avoid this error without adding a delay to the job execution?
The dependency that I'm using is:
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>bigquery-connector</artifactId>
<version>0.10.2-hadoop2</version>
<scope>provided</scope>
</dependency>
The Dataproc image version is 1.1
Edit 1:
I tried using IndirectBigQueryOutputFormat, but now I get an error saying that the GCS output path already exists, even though I pass a different path on each saveAsNewAPIHadoopDataset call.
Here is my code:
SparkConf sc = new SparkConf().setAppName("MyApp");
try (JavaSparkContext jsc = new JavaSparkContext(sc)) {
    JavaPairRDD<String, String> filesJson = jsc.wholeTextFiles(jsonFolder, parts);
    JavaPairRDD<String, String> jsons = filesJson.flatMapToPair(new FileSplitter()).repartition(parts);
    JavaPairRDD<Object, JsonObject> objsJson = jsons.flatMapToPair(new JsonParser()).filter(t -> t._2() != null).cache();
    objsJson
        .filter(new FilterType(MSG_TYPE1))
        .saveAsNewAPIHadoopDataset(createConf("my-project:MY_DATASET.MY_TABLE1", "gs://my-bucket/tmp1"));
    objsJson
        .filter(new FilterType(MSG_TYPE2))
        .saveAsNewAPIHadoopDataset(createConf("my-project:MY_DATASET.MY_TABLE2", "gs://my-bucket/tmp2"));
    objsJson
        .filter(new FilterType(MSG_TYPE3))
        .saveAsNewAPIHadoopDataset(createConf("my-project:MY_DATASET.MY_TABLE3", "gs://my-bucket/tmp3"));
    // here goes another ingestion process. same code as above but different params, parsers, etc.
}
Configuration createConf(String table, String outGCS) {
    Configuration conf = new Configuration();
    BigQueryOutputConfiguration.configure(conf, table, null, outGCS, BigQueryFileFormat.NEWLINE_DELIMITED_JSON, TextOutputFormat.class);
    conf.set("mapreduce.job.outputformat.class", IndirectBigQueryOutputFormat.class.getName());
    return conf;
}
I believe what may be happening is that each mapper tries to create its own dataset. This is rather inefficient (and burns your daily quota proportional to the number of mappers).
An alternative is to use IndirectBigQueryOutputFormat as the output class. From the documentation:
IndirectBigQueryOutputFormat works by first buffering all the data into a Cloud Storage temporary table, and then, on commitJob, copies all data from Cloud Storage into BigQuery in one operation. Its use is recommended for large jobs since it only requires one BigQuery "load" job per Hadoop/Spark job, as compared to BigQueryOutputFormat, which performs one BigQuery job for each Hadoop/Spark task.
See the example here: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example
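Regarding the "output path already exists" error from Edit 1: one possible cause is a path left over from an earlier run, since fixed paths such as gs://my-bucket/tmp1 are reused on every run. This is an assumption, not something the question confirms. A minimal sketch of generating a fresh temporary path per save call follows; the helper name and the path layout are hypothetical, and only the bucket and table names come from the question.

```java
import java.util.UUID;

public class TempGcsPaths {

    /**
     * Builds a temporary output path that is unique per call by appending a
     * random UUID, so neither parallel jobs nor repeated runs share a path.
     */
    public static String uniqueTempPath(String bucket, String table) {
        // Keep only characters that are unproblematic in object names.
        String safeTable = table.replaceAll("[^A-Za-z0-9_-]", "_");
        return "gs://" + bucket + "/tmp/" + safeTable + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Two calls for the same table yield two different paths.
        System.out.println(uniqueTempPath("my-bucket", "my-project:MY_DATASET.MY_TABLE1"));
        System.out.println(uniqueTempPath("my-bucket", "my-project:MY_DATASET.MY_TABLE1"));
    }
}
```

The result would be passed as the second argument of createConf(...) from the question.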

Timeout and out of memory errors reading large table using jdbc drivers

I am attempting to read a large table into a spark dataframe from an Oracle database using spark's native read.jdbc in scala. I have tested this with small and medium sized tables (up to 11M rows) and it works just fine. However, when attempting to bring in a larger table (~70M rows) I keep getting errors.
Sample code to show how I am reading this in:
val df = sparkSession.read.jdbc(
  url = jdbcUrl,
  table = "( SELECT * FROM keyspace.table WHERE EXTRACT(year FROM date_column) BETWEEN 2012 AND 2016)",
  columnName = "id_column", // numeric column, 40% NULL
  lowerBound = 1L,
  upperBound = 100000L,
  numPartitions = 60, // same as number of cores
  connectionProperties = connectionProperties) // this contains login & password
I am attempting to parallelise the operation, as I am using a cluster with 60 cores and 6 x 32GB RAM dedicated to this app. However, I still keep getting errors relating to timeouts and out of memory issues, such as:
17/08/16 14:01:18 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
....
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
...
17/08/16 14:17:14 ERROR RetryingBlockFetcher: Failed to fetch block rdd_2_89, and will not retry (0 retries)
org.apache.spark.network.client.ChunkFetchFailureException: Failure while fetching StreamChunkId{streamId=398908024000, chunkIndex=0}: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$4.apply(DiskStore.scala:125)
...
17/08/16 14:17:14 WARN BlockManager: Failed to fetch block after 1 fetch failures. Most recent failure cause:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
There should be more than enough RAM across the cluster for a table of this size (I've read in local tables 10x bigger), so I have a feeling that for some reason the read may not be happening in parallel. Looking at the timeline in the Spark UI, I can see that one executor hangs and is 'computing' for very long periods of time. The partitioning column has a lot of NULL values in it (about 40%), but it is the only numeric column (the others are dates and strings). Could this make a difference? Is there another way to parallelise a JDBC read?
the partitioning column has a lot of NULL values in it (about 40%), but it is the only numeric column (others are dates and strings) - could this make a difference?
It makes a huge difference. All rows where the partitioning column is NULL go to a single partition. In the partitioning code below, that is the partition without a lower bound, i.e. the first one:
val whereClause =
  if (uBound == null) {
    lBound
  } else if (lBound == null) {
    s"$uBound or $column is null"
  } else {
    s"$lBound AND $uBound"
  }
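To see the effect on the read in the question, the same logic can be replayed outside Spark. The sketch below is a plain Java re-implementation of the snippet above for illustration, not Spark's actual class. The stride computation (upperBound / numPartitions - lowerBound / numPartitions) is my recollection of Spark 2.x and should be treated as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class JdbcPartitionClauses {

    /** Returns one WHERE clause per partition, mirroring the snippet above. */
    public static List<String> whereClauses(String column, long lowerBound, long upperBound, int numPartitions) {
        if (numPartitions < 2) {
            throw new IllegalArgumentException("this sketch covers numPartitions >= 2 only");
        }
        // Assumed stride formula (integer division), see the lead-in.
        long stride = upperBound / numPartitions - lowerBound / numPartitions;
        long currentValue = lowerBound;
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            String lBound = (i != 0) ? column + " >= " + currentValue : null;
            currentValue += stride;
            String uBound = (i != numPartitions - 1) ? column + " < " + currentValue : null;
            if (uBound == null) {
                clauses.add(lBound); // last partition: no upper bound
            } else if (lBound == null) {
                clauses.add(uBound + " or " + column + " is null"); // first partition: also takes NULLs
            } else {
                clauses.add(lBound + " AND " + uBound);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Values from the question: bounds 1..100000, 60 partitions.
        List<String> clauses = whereClauses("id_column", 1L, 100000L, 60);
        System.out.println(clauses.get(0));
        System.out.println(clauses.get(1));
        System.out.println(clauses.get(59));
    }
}
```

With about 40% NULLs, the first clause selects roughly 40% of the table on top of its own range, which would be consistent with the single long-running executor seen in the Spark UI. Every id above the last lower bound also lands in the final partition, so bounds that do not match the real range of id_column would skew the read further.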
Is there another way to parallelise a jdbc read?
You can use predicates on columns other than numeric ones. For example, you could use the ROWID pseudocolumn of the table and a series of predicates based on its prefix.
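As a sketch of the predicate-based approach: Spark's DataFrameReader has a jdbc overload that takes an array of predicates, one per partition. The Java snippet below only builds such an array. Instead of the ROWID prefixes mentioned above, it uses one predicate per year and month of date_column from the question (2012 to 2016), which assumes the rows are spread reasonably evenly over time. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class YearMonthPredicates {

    /** One predicate per (year, month) pair; the predicates do not overlap. */
    public static String[] build(String dateColumn, int firstYear, int lastYear) {
        List<String> predicates = new ArrayList<>();
        for (int year = firstYear; year <= lastYear; year++) {
            for (int month = 1; month <= 12; month++) {
                predicates.add("EXTRACT(year FROM " + dateColumn + ") = " + year
                        + " AND EXTRACT(month FROM " + dateColumn + ") = " + month);
            }
        }
        return predicates.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] predicates = build("date_column", 2012, 2016);
        System.out.println(predicates.length + " predicates, first: " + predicates[0]);
    }
}
```

The array would then be passed as sparkSession.read.jdbc(jdbcUrl, table, predicates, connectionProperties), which in this case yields 60 partitions, one per core.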

Why does h2 ignore slf4j messages on the first connection when LOG is set?

See the sample code and output below (with SLF4J/Logback on stdout). I can't find any bug reports on this. I'm using H2 version 1.3.176 (last stable) in in-memory mode. It doesn't seem to matter which value is set for LOG (0, 1 or 2); it just has to be set.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class H2TraceTest {
    public static void main(String[] args) throws SQLException {
        System.out.println("Query connection 1");
        Connection myConn = DriverManager.getConnection("jdbc:h2:mem:tracetest;TRACE_LEVEL_FILE=4;LOG=2");
        myConn.createStatement().execute("SELECT 1");
        System.out.println("Query connection 2");
        DriverManager.getConnection("jdbc:h2:mem:tracetest").createStatement().execute("SELECT 1");
        System.out.println("Query connection 1 again");
        myConn.createStatement().execute("SELECT 1");
        System.out.println("End");
    }
}
Output:
Query connection 1
Query connection 2
16:17:02.955 INFO h2database - jdbc[3]
/**/Connection conn2 = DriverManager.getConnection("jdbc:h2:mem:tracetest", "", "");
16:17:02.958 DEBUG h2database - jdbc[3]
/**/Statement stat2 = conn2.createStatement();
16:17:02.959 DEBUG h2database - jdbc[3]
/**/stat2.execute("SELECT 1");
16:17:02.959 INFO h2database - jdbc[3]
/*SQL #:1*/SELECT 1;
Query connection 1 again
End
I know that the H2 documentation says TRACE_LEVEL_FILE affects all connections, but that's not (fully) correct:
Every connection keeps a lazy reference to the logging system. If you change the logging system with the special marker TRACE_LEVEL_FILE=4, that reference isn't updated for connections that have already logged something, only for those that do their first logging after the change.
So if you use the connection string "jdbc:h2:mem:tracetest;TRACE_LEVEL_FILE=4", everything works as expected, because your session writes no log message before the logging system is changed. Unfortunately, the LOG=2 in jdbc:h2:mem:tracetest;TRACE_LEVEL_FILE=4;LOG=2 is evaluated first, because both parameters are written into and read from an unordered Map. And because LOG=2 generates a log statement, the reference to the log adapter (=4) is never applied to the current session, only to the next one.
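The ordering effect can be illustrated without H2. The sketch below is not H2's parser. It merely splits the settings part of the URL into a hash-based map and into an insertion-ordered map, assuming (as described above) that the settings end up in an unordered map. The class and method names are made up for the illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlSettingsOrder {

    /** Splits "name;KEY=VALUE;KEY=VALUE" and stores the settings in the given map. */
    public static Map<String, String> parse(String url, Map<String, String> target) {
        String[] parts = url.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the database name
            String[] keyValue = parts[i].split("=", 2);
            target.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
        }
        return target;
    }

    public static void main(String[] args) {
        String url = "jdbc:h2:mem:tracetest;TRACE_LEVEL_FILE=4;LOG=2";
        // Iteration order of a HashMap depends on the keys' hash codes,
        // not on the order in which the settings appear in the URL.
        System.out.println("HashMap:       " + parse(url, new HashMap<>()).keySet());
        // A LinkedHashMap keeps the order from the URL.
        System.out.println("LinkedHashMap: " + parse(url, new LinkedHashMap<>()).keySet());
    }
}
```

Whichever key the hash-based map happens to return first is the one that would be applied first, regardless of how the URL was written.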
What you can do:
Use only "jdbc:h2:mem:tracetest;TRACE_LEVEL_FILE=4" - LOG=2 is the default anyway. If you need another log mode, you can use connection.createStatement().executeUpdate("SET LOG 1")
Add some default parameters to the connection string until the TRACE_LEVEL_FILE parameter is the first parameter in the map (not really reliable, as the order may depend on the VM)
Discard the first connection at once
File a bug report and wait for the fix (or fix it yourself), as I think this is a bug
I know this is an old question, but here is a reliable way to do it (i.e. you can ensure that TRACE_LEVEL_FILE is set to 4 first):
String url = "jdbc:h2:mem:tracetest;INIT=SET TRACE_LEVEL_FILE=4\\;SET DB_CLOSE_DELAY=-1/* for example, i.e. do other stuff */";
