Race condition warning in OpenTSDB (Hadoop)

I am getting a race condition warning while running multiple imports simultaneously into OpenTSDB. The following is one of the log sequences showing the race condition.
2013-08-21 14:34:24,745 INFO [main] UniqueId: Creating an ID for
kind='tagv' name='25447'
2013-08-21 14:34:24,747 INFO [main] UniqueId: Got ID=307 for
kind='tagv' name='25447'
2013-08-21 14:34:24,752 WARN [main] UniqueId: Race condition: tried
to assign ID 307 to tagv:25447, but CAS failed on
PutRequest(table="tsdb-uid", key="25447", family="id",
qualifiers=["tagv"], values=["\x00\x013"],
timestamp=9223372036854775807, lockid=-1, durable=true,
bufferable=true, attempt=0, region=null), which indicates this UID
must have been allocated concurrently by another TSD. So ID 307 was
leaked.
I have the following questions:
Since it is only a warning, does that mean the record is actually written and not skipped?
At the end it says 'ID 307 was leaked', so is some other ID assigned to the record?
How can I verify that the record has been written to HBase's 'tsdb-uid' table? (I tried a few HBase shell commands, but in vain.)

This just means that a UID was allocated for nothing, but otherwise everything is fine. If you are worried about the state of your tsdb-uid table, you could run the tsdb uid fsck command, and it will probably report that some UIDs are allocated but unused.
If you see the message only occasionally, you can ignore it. If you see it a lot, then the only undesirable consequence is that you're burning through the UID space faster than you should be, so you may run out of UIDs sooner (there are 16,777,215 possible UIDs for each of metric names, tag names, and tag values).
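To answer the third question directly: in the tsdb-uid table the forward mapping (name to UID) is stored under the string name as row key in the id family, and the reverse mapping (UID to name) under the UID bytes in the name family; you can see the forward half in the PutRequest above. A hedged sketch of the HBase shell lookups (307 = 0x000133, so the UID bytes are "\x00\x01\x33"):
hbase shell
# forward mapping: string name -> UID
get 'tsdb-uid', '25447', 'id:tagv'
# reverse mapping: UID bytes -> string name
get 'tsdb-uid', "\x00\x01\x33", 'name:tagv'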

Related

Near Explorer block query starts from 9820221

BlockExplorer - https://explorer.mainnet.near.org/blocks/2RPJGA17MQ9GAtwSVuVbasuosgkWqDgXHKWLuX4VyYv4
I am able to query starting from block 9820221.
Can anyone help me understand why this is the case, and whether there are other explorers where I can query the block details?
mainnet started from block height 9820210 (see mainnet genesis config), so there are no blocks before that one. The first 3 blocks are missing due to validators being offline or something like that, so the first produced block is 9820214, and you can query it: https://explorer.mainnet.near.org/blocks/CFAAJTVsw5y4GmMKNmuTNybxFJtapKcrarsTh5TPUyQf
Blocks before 9820210 were produced by the mainnet that ran before July 22nd, 2020, but for some reason NEAR needed to restart the network from genesis, and thus we dumped the state as of block 9820210 and called it a new genesis, and that was the start. You have no access to the history before that moment; you can only inspect the state as of genesis, where certain accounts exist with certain balances, contract code, and state.
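If you would rather query block details without the explorer UI, the public JSON-RPC endpoint exposes the same data. A minimal sketch with curl, assuming the standard rpc.mainnet.near.org endpoint and the block method:
curl -s https://rpc.mainnet.near.org \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc": "2.0", "id": "1", "method": "block", "params": {"block_id": 9820214}}'
Requesting any height below the first produced block should return an error rather than block data, consistent with the explorer behaviour described above.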

In Substrate what does code: 1012 "Transaction is temporarily banned" mean?

The full text of the message is:
{code: 1012, message: "Transaction is temporarily banned"}
This would indicate that the transaction is held somewhere in the Substrate runtime mempool, or something of that nature, but it is not entirely clear what causes can trigger this, or what the eventual outcome might be.
For example,
1) Is it that too many transactions have been sent from a given account, IP address, or other? Has some threshold been reached?
2) Is the transaction actually invalid, or not?
3) The use of the word "temporary" suggests a delay in processing, not an outright rejection of the transaction. Does this therefore suggest that the transaction is valid, but delayed? If so, for how long?
The comments in the Substrate source, in core/rpc/src/author/errors.rs and core/transaction-pool/graph/src/errors.rs, are no clearer about what the outcome is.
In front of the mempool there is a transaction blacklist, which can trigger this error. Specifically, this error means that a transaction with the same hash was either:
Part of a recently mined block, or
Detected as invalid during block production and removed from the pool.
Additionally, this error can occur when:
The transaction reaches its longevity, i.e. it is not mined within TransactionValidation::longevity blocks after being imported into the pool.
By default longevity is set to u64::max, so this should not normally be the problem.
In any case, -ltxpool=log should reveal more details about this error.
A transaction is only temporarily banned because it will be removed from the blacklist when either:
30 minutes pass
There are more than 4,000 transactions on the blacklist
Check out core/transaction-pool/graph/src/rotator.rs.
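For intuition only, here is a minimal sketch of that rotator behaviour in Rust. It is not the actual Substrate implementation (see rotator.rs for that); the types and eviction logic are simplified:
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Sketch: a transaction hash is "temporarily banned" (error 1012)
// while it is still inside the ban window.
struct PoolRotator {
    ban_duration: Duration,             // 30 minutes in the real rotator
    banned: HashMap<[u8; 32], Instant>, // tx hash -> time it was banned
}

impl PoolRotator {
    fn ban(&mut self, hash: [u8; 32]) {
        // (the real rotator also bounds the list at 4,000 entries)
        self.banned.insert(hash, Instant::now());
    }

    fn is_banned(&mut self, hash: &[u8; 32]) -> bool {
        if let Some(&banned_at) = self.banned.get(hash) {
            if banned_at.elapsed() < self.ban_duration {
                return true; // still banned -> would surface as error 1012
            }
            self.banned.remove(hash); // window elapsed; forget the hash
        }
        false
    }
}

fn main() {
    let mut rotator = PoolRotator {
        ban_duration: Duration::from_secs(30 * 60),
        banned: HashMap::new(),
    };
    rotator.ban([0u8; 32]);
    assert!(rotator.is_banned(&[0u8; 32]));
}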

Hibernate - Getting exception: maximum number of processes (550) exceeded

I am using Hibernate 3 along with Spring. My Hibernate configuration is as follows:
hibernate.dialect=org.hibernate.dialect.Oracle8iDialect
hibernate.connection.release_mode=on_close
But after starting the application, even if only one user accesses it, I am getting this exception:
ORA-00020: maximum number of processes (550) exceeded
This is the stack trace:
Caused by: java.sql.SQLException: ORA-00020: maximum number of processes (550) exceeded
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:331)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:288)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:743)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:216)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:799)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1038)
at oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:839)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1133)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3285)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3329)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:76)
at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)
at org.hibernate.loader.Loader.getResultSet(Loader.java:1953)
at org.hibernate.loader.Loader.doQuery(Loader.java:802)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
at org.hibernate.loader.Loader.loadEntity(Loader.java:2037)
I have set the connection pool timeout to 5000. I also tried to find the cause and learned that the release mode may affect the mechanism for closing DB resources, but I couldn't find an exact solution.
Please help.
Thanks in advance.
This is a database error, not an application error, so you need to go to the database to solve it. 550 processes is a lot more than it sounds, so either someone has gone insane or you have a lot of inactive processes running.
The best way to find out is to query the v$session view (or gv$session if you're using RAC) and look at the STATUS column.
Take careful note of where all these sessions are coming from; the OSUSER, TERMINAL and PROGRAM columns will probably be the most useful. It might even be worth saving this information into a temporary table, as proof and a record for afterwards. Then, after checking that you're not going to break anything, and with your DBAs if you have any, kill all the inactive sessions simultaneously or one at a time.
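A sketch of the kind of query that helps here (plain Oracle SQL; adjust the column list for your version):
SELECT osuser, terminal, program, status, COUNT(*) AS session_count
FROM   v$session
GROUP  BY osuser, terminal, program, status
ORDER  BY session_count DESC;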
That'll remove the error, but if it's occurred once it can occur again, so you need to solve the underlying problem. Either:
You've got a lot of people using the database.
There is an application or program somewhere that is not closing its sessions after it's finished.
Someone is connecting in the middle of a loop.
Whichever reason it is, you need to track it down and correct it. I'd start with the program or terminal from v$session that has the highest number of sessions.
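Once a runaway session has been identified, and assuming your DBA agrees it is safe, it can be killed using its SID and SERIAL# from v$session. The two numbers below are placeholders:
SELECT sid, serial# FROM v$session WHERE status = 'INACTIVE';
ALTER SYSTEM KILL SESSION '123,4567';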

Hadoop streaming jobs fail to report?

All jobs were running successfully using hadoop-streaming, but all of a sudden I started to see errors on one of the worker machines:
Hadoop job_201110302152_0002 failures on master
Attempt Task Machine State Error Logs
attempt_201110302152_0002_m_000037_0 task_201110302152_0002_m_000037 worker2 FAILED
Task attempt_201110302152_0002_m_000037_0 failed to report status for 622 seconds. Killing!
-------
Task attempt_201110302152_0002_m_000037_0 failed to report status for 601 seconds. Killing!
Questions:
- Why is this happening?
- How can I handle such issues?
Thank you
The description for mapred.task.timeout, which defaults to 600s, says: "The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string."
Increasing the value of mapred.task.timeout might solve the problem, but you need to figure out whether more than 600s is actually required for the map task to complete processing its input data, or whether there is a bug in the code which needs to be debugged.
According to the Hadoop best practices, on average a map task should take a minute or so to process an InputSplit.
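If the task genuinely needs more time, the timeout can be raised per job from the streaming command line. A sketch, where the jar path, input/output paths and scripts are placeholders and the value is in milliseconds:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
  -D mapred.task.timeout=1200000 \
  -input /user/me/input \
  -output /user/me/output \
  -mapper mapper.py \
  -reducer reducer.py
Alternatively, a long-running streaming task can keep itself alive by periodically writing reporter:status:<message> lines to stderr, which updates the task's status string and resets the timeout counter.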

Why are my delayed_job jobs re-running even though I tell them not to?

I have this in my initializer:
Delayed::Job.const_set( "MAX_ATTEMPTS", 1 )
However, my jobs are still re-running after failure, seemingly completely ignoring this setting.
What might be going on?
more info
Here's what I'm observing: jobs with a populated "last error" field and an "attempts" number of more than 1 (10+).
I've discovered I was reading the old/wrong wiki. The correct way to set this is
Delayed::Worker.max_attempts = 1
Check your DBMS table "delayed_jobs" for records (jobs) that still exist after the job "fails". The job will be re-run if the record is still there. If the "attempts" column is non-zero, then you know that your constant setting isn't working right.
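A quick way to run that check from a Rails console, assuming the standard ActiveRecord backend for delayed_job:
# Jobs that have failed at least once: non-zero attempts, recorded error
Delayed::Job.where("attempts > 0").map { |j| [j.id, j.attempts, j.last_error] }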
Another guess is that the job's "failure", for some reason, is not being caught by DelayedJob. In that case, the "attempts" would still be at 0.
Debug by examining the delayed_job/lib/delayed/job.rb file, especially the self.workoff method, when one of your jobs "fails".
Added for @John: I don't use MAX_ATTEMPTS. To debug, look in the gem to see where it is used. It sounds like the problem is that the job is being handled in the normal way rather than limiting attempts to 1. Use the debugger or a logging statement to ensure that your MAX_ATTEMPTS setting is getting through.
Remember that the DelayedJob jobs runner is not a full Rails program, so it could be that your initializer setting is not being run. Look into the script you're using to run the jobs runner.
