Postgres Backup Restoration Issue - postgres-9.6

My objective is simple: take a backup and restore it on another machine that has no relation to the running cluster.
My steps:
1. Remotely run pg_basebackup onto the new machine.
2. rm -fr ../../main/
3. mv backup/main/ ../../main/
4. Start the postgres service.
No errors occurred during the backup.
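For reference, the backup command was similar to the following (a sketch; host and user are placeholders):

pg_basebackup -h <primary_host> -U <replication_user> -D backup/main -X stream -P

Note that on 9.6 pg_basebackup does not copy any WAL unless -X stream or -X fetch is given, so a backup taken without it contains no checkpoint record to recover from.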
But on starting the service I get this error:
2018-12-13 10:05:12.437 IST [834] LOG: database system was shut down in recovery at 2018-12-12 23:01:58 IST
2018-12-13 10:05:12.437 IST [834] LOG: invalid primary checkpoint record
2018-12-13 10:05:12.437 IST [834] LOG: invalid secondary checkpoint record
2018-12-13 10:05:12.437 IST [834] PANIC: could not locate a valid checkpoint record
2018-12-13 10:05:12.556 IST [833] LOG: startup process (PID 834) was terminated by signal 6: Aborted
2018-12-13 10:05:12.556 IST [833] LOG: aborting startup due to startup process failure
2018-12-13 10:05:12.557 IST [833] LOG: database system is shut down

Based on the answer to a very similar question (How to mount a pg_basebackup on a standalone server to retrieve accidentally deleted data), and on the fact that that answer helped me get this working glitch-free, the steps are:
1. Do the base backup, or copy/untar a previously made one, to the right location: /var/lib/postgresql/9.5/main
2. Remove the file backup_label.
3. Run /usr/lib/postgresql/9.5/bin/pg_resetxlog -f /var/lib/postgresql/9.5/main
4. Start the postgres service.
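Put together, the whole sequence might look like this (a minimal sketch assuming a Debian-style layout and a base backup sitting in backup/main; adjust the paths and the 9.5/9.6 version to your install):

# stop the server before touching the data directory
sudo systemctl stop postgresql
# swap the data directory for the base backup
sudo rm -rf /var/lib/postgresql/9.5/main
sudo mv backup/main /var/lib/postgresql/9.5/main
sudo chown -R postgres:postgres /var/lib/postgresql/9.5/main
# drop the backup label and force a fresh WAL so the server stops
# looking for a checkpoint record that was never copied
sudo rm /var/lib/postgresql/9.5/main/backup_label
sudo -u postgres /usr/lib/postgresql/9.5/bin/pg_resetxlog -f /var/lib/postgresql/9.5/main
sudo systemctl start postgresql

Keep in mind that pg_resetxlog -f throws away WAL, so only do this on a copy you intend to treat as a fresh standalone instance.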
(Replying to this old question because it is the first one I found when searching for the solution to the same problem.)

Related

Issue installing openwhisk with incubator-openwhisk-devtools

I have a blocking issue installing OpenWhisk with Docker.
I typed make quick-start right after a git pull of the project incubator-openwhisk-devtools. My OS is Fedora 29, the Docker version is 18.09.0, the docker-compose version is 1.22.0, and the JDK is Oracle JDK 8.
I get the following error:
[...]
adding the function to whisk ...
ok: created action hello
invoking the function ...
error: Unable to invoke action 'hello': The server is currently unavailable (because it is overloaded or down for maintenance). (code ciOZDS8VySDyVuETF14n8QqB9wifUboT)
[...]
[ERROR] [#tid_sid_unknown] [Invoker] failed to ping the controller: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for health-0: 30069 ms has passed since batch creation plus linger time
[ERROR] [#tid_sid_unknown] [KafkaProducerConnector] sending message on topic 'health' failed: Expiring 1 record(s) for health-0: 30009 ms has passed since batch creation plus linger time
Please note that controller-local-logs.log is never created.
If I touch controller-local-logs.log in the right directory, the log file is still empty after I run make quick-start again.
http://localhost:8888/ping gives me the right answer: pong.
http://localhost:9222 is not reachable.
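In case it helps, the state of the containers can be inspected like this (the container names are a guess based on the docker-compose project; yours may differ):

# list all containers and their status
docker ps --all --format '{{.Names}}\t{{.Status}}'
# tail the two services involved in the failing health ping
docker logs --tail 50 openwhisk_controller_1
docker logs --tail 50 openwhisk_kafka_1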
Where am I going wrong?
Thank you in advance.

Hive issue using yarn

I am running Hive SQL on YARN and it throws an error on a join condition. I am able to create external as well as internal tables, but creating a table fails when I use the command:
create table <table_name> AS SELECT name FROM student;
When I run the same query through the Hive CLI it works fine, but with the Spring job it throws this error:
2016-03-28 04:26:50,692 [Thread-17] WARN org.apache.hadoop.hive.shims.HadoopShimsSecure - Can't fetch tasklog: TaskLogServlet is not supported in MR2 mode.
Task with the most failures(4):
-----
Task ID:
task_1458863269455_90083_m_000638
-----
Diagnostic Messages for this Task:
AttemptID:attempt_1458863269455_90083_m_000638_3 Timed out after 1 secs
2016-03-28 04:26:50,842 [main] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1458863269455_90083
2016-03-28 04:26:50,849 [main] ERROR com.mapr.fs.MapRFileSystem - Failed to delete path maprfs:/home/pro/amit/warehouse/scratdir/hive_2016-03-28_04-24-32_038_8553676376881087939-1/_task_tmp.-mr-10003, error: No such file or directory (2)
2016-03-28 04:26:50,852 [main] ERROR org.apache.hadoop.hive.ql.Driver - FAILED: Execution Error, return code 2 from
As per my findings, I think there is some issue with the scratch directory (scratdir).
Kindly suggest if anyone has faced the same issue.
This issue occurs if the recursive directory does not exist; Hive does not automatically create directories recursively.
Please check the existence of the directories from the root down to the child/table level, as sketched below.
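For example, you could pre-create the full scratch path up front and verify it (the path is taken from the log above; plain hadoop fs syntax):

# create the missing directory tree in one go, then list it to confirm
hadoop fs -mkdir -p maprfs:/home/pro/amit/warehouse/scratdir
hadoop fs -ls maprfs:/home/pro/amit/warehouse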
I faced a similar issue while running the Hive query below:
select * from <db_name>.<internal_tbl_name> where <field_name_of_double_type> in (<list_of_double_values>) order by <list_of_order_fields> limit 10;
I ran an explain on the above statement, and below was the result:
fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2d00001590]: it still exists.
2017-05-08 04:26:37,969 WARN [41289638-cd53-4d4b-88c9-3359e9ec99e2 main] fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2700001591]: it still exists.
Time taken: 0.886 seconds, Fetched: 24 row(s)
And I checked the logs through:
yarn logs -applicationId application_1458863269455_90083
The error happened after a MapR upgrade by the admin team. It is probably due to some upgrade or installation issue with the Tez configuration (as suggested by the log line below), or perhaps the Hive query syntactically does not support the Tez optimization. I say this because another Hive query on an external table runs fine in my case. I have to check a bit deeper, though.
Though I am not sure, the error line in the logs that looks most relevant is the following:
2017-05-08 00:01:47,873 [ERROR] [main] |web.WebUIService|: Tez UI History URL is not set
Solution:
It is probably happening due to some open files or applications that are holding resources. Please check https://unix.stackexchange.com/questions/11238/how-to-get-over-device-or-resource-busy
1. Run explain <your_Hive_statement>.
2. In the resulting execution plan, you may come across the filenames/dirs that the Hive execution engine fails to delete, e.g.:
2017-05-08 04:26:37,969 WARN [41289638-cd53-4d4b-88c9-3359e9ec99e2 main] fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2d00001590]: it still exists.
3. Go to the path given in step 2, e.g. /hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/
4. In the path from step 3, ls -a or lsof +D /path will show the open process IDs blocking the files from deletion.
5. If you run ps -ef | grep <pid>, you get:
hive_username <pid> 19463 1 05:19 pts/8 00:00:35 /opt/mapr/tools/jdk1.7.0_51/jre/bin/java -Xmx256m -Dhiveserver2.auth=PAM -Dhiveserver2.authentication.pam.services=login -Dmapr_sec_enabled=true -Dhadoop.login=maprsasl -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 -Dhadoop.id.str=hive_username -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dlog4j.configurationFile=hive-log4j2.properties -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/opt/mapr/hive/hive-2.1/bin/../conf/parquet-logging.properties -Dhadoop.security.logger=INFO,NullAppender -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.saslprovider=com.mapr.security.maprsasl.MaprSaslProvider -Djavax.net.ssl.trustStore=/opt/mapr/conf/ssl_truststore org.apache.hadoop.util.RunJar /opt/mapr/hive/hive-2.1//lib/hive-cli-2.1.1-mapr-1703.jar org.apache.hadoop.hive.cli.CliDriver
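From there, a minimal cleanup sketch (the path and pid come from the steps above; double-check what the process is before killing anything):

# list the processes holding the scratch directory open
lsof +D /hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/
# stop the offending holder, then re-run the Hive statement
kill <pid>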
CONCLUSION:
The HiveCliDriver entry clearly shows that running Hive on Spark (or managed) tables through the Hive CLI is no longer supported from Hive 2.0 onwards and is going to be deprecated going forward. You have to use HiveContext in Spark to run Hive queries. You can, however, still run queries on Hive external tables through the Hive CLI.

TeamCity has failed to start

On starting TeamCity from its URL, I get the following error. It was working fine until last week, but when I opened it today it showed the error.
I logged in as administrator to view the logs. Below is the stack trace:
jetbrains.buildServer.processes.ProcessTreeTerminatorException: Process has not exited within 30 seconds
at jetbrains.buildServer.processes.ProcessTreeTerminatorImplBase.readCommandOutput(ProcessTreeTerminatorImplBase.java:159)
at jetbrains.buildServer.processes.ProcessTreeTerminatorImplBase.buildProcessTree(ProcessTreeTerminatorImplBase.java:315)
at jetbrains.buildServer.processes.ProcessTreeTerminatorImplBase.getProcessVisitor(ProcessTreeTerminatorImplBase.java:82)
at jetbrains.buildServer.processes.ProcessTreeTerminatorImplBase.getCurrentPid(ProcessTreeTerminatorImplBase.java:28)
at jetbrains.buildServer.processes.ProcessTreeTerminator$1.getCurrentPid(ProcessTreeTerminator.java:144)
at jetbrains.buildServer.processes.ProcessTreeTerminator.getCurrentPid(ProcessTreeTerminator.java:101)
at jetbrains.buildServer.maintenance.StartupProcessor.createPidFile(StartupProcessor.java:476)
at jetbrains.buildServer.maintenance.StartupProcessor.doInitialStage(StartupProcessor.java:326)
at jetbrains.buildServer.maintenance.StartupProcessor.processConcreteStage(StartupProcessor.java:83)
at jetbrains.buildServer.maintenance.StartupProcessor.processConcreteStageSafe(StartupProcessor.java:503)
at jetbrains.buildServer.maintenance.StartupProcessor.processTeamCityLifecycle(StartupProcessor.java:558)
at jetbrains.buildServer.maintenance.StartupProcessor.access$000(StartupProcessor.java:92)
at jetbrains.buildServer.maintenance.StartupProcessor$1.run(StartupProcessor.java:2)
at java.lang.Thread.run(Thread.java:745)
Some more information:
Current Startup State
Startup status
Current step: TeamCity server startup error
Next step: not defined yet
Data Directory
Data Directory path is not specified/detected yet
Database
Not connected to the database yet.
Versions
Software version: 722
Data directory version: unknown
Database version: unknown
Logs
Logs path: C:\TeamCity\logs
What should be done to resolve this error?
As this is an EAP build, you should report this issue to JetBrains with the complete information and logs here.
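Before filing, one quick local check: the stack trace dies inside buildProcessTree while reading command output, so you can verify that process enumeration itself is not hanging on the machine (which command TeamCity shells out to is internal; wmic below is only an assumption for a Windows host):

rem if this takes anywhere near 30 seconds, TeamCity's process-tree
rem scan will hit the same timeout
wmic process get ProcessId,ParentProcessId,CommandLine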

Socket error - Connection reset by peer

I have a simple housekeeping session that deletes data older than one month from Teradata tables. This is done using a SQL transformation. The mapping has logic to handle 3 types of files (daily, monthly first week, monthly last week); the logic for all 3 types is the same: delete data older than 1 month. A Router is used to handle this.
The session executes properly for a low volume of data (the daily file), but when the table has a higher volume of data in production, it gives the error below:
TRANSF_1_1_1> pmsql_50065 [ERROR] ODL error:
FnName: Close -- [Teradata][Unix system error] Socket error - Connection reset by peer FnName: Execute Direct -- [Teradata][Unix system error] 104 Socket error - Connection reset by peer ODBC call to SQLError failed. .
TRANSF_1_1_1> CMN_1761 Timestamp Event: [Fri Aug 08 09:45:01 2014]
We have tried re-executing the session but with no luck. The weird part is that the delete command is being executed in the database, as we can see data deleted as required, yet the session still fails.
Please let me know what the issue is here, and also the solution. Thank you.

TFS Corrupted Database Project Odyssey

I have been upgrading to another Visual Studio version, 2013 (Update 3), on another dev machine.
I then tried to create a test project in an existing collection. It crashed. I tried it three times, then deleted the corrupted projects.
After that I thought, well, I should upgrade to TFS 2013 (Update 3) too. So I tried to upgrade my existing collections. It failed for the collection with the corrupted project.
So I thought it would be easy to just restore the database. But that is not so easy. It tells me that I need to restore the configuration DB too, and in order to do so it says I need to rename the configuration DB. But then I cannot start the management tool to restore?! It freezes.
What would you suggest? I have a backup but I cannot restore it so far. And I do not understand why it tells me that I need to restore the configuration backup too; I always thought that collections were independent.
Here are some additional screenshots:
Upgrade progress problem:
Complete screenshot:
[2014-08-07 23:30:13Z][Error] TF400744: An error occurred while executing the following script: SetRecoveryModelToSimple.sql. Failed batch starts on the line 1. Statement line: 1. Script line: 1. Error: 5069 ALTER DATABASE statement failed.
As suggested, I have run the Best Practice Analyzer.
The upgrade log is actually large; I am posting just the last lines:
"[Info #23:29:51.189]
[Info #23:29:51.189] +-+-+-+-+-| ResultsSqmData |+-+-+-+-+-
[Info #23:29:51.189] Feature: ApplicationTier (1)
[Info #23:29:51.190] Feature: ApplicationTier; previousFailure: False
[Info #23:29:51.192] Error count: 0
[Info #23:29:51.192] Warning count: 0
[Info #23:29:51.192] Overall Result: TotalSuccess (1)
[Info #23:29:51.192] WebSiteData: 9
[Info #23:29:51.192] SqlData: 8
[Info #23:29:51.193] RSData: 0
[Info #23:29:51.193] WSSData: 0
[Info #23:29:51.193] Wizard: UpgradeWizard (4)
[Info #23:29:51.193] TfsConfigData: 8194
[Info #23:29:51.197] serviceLevel: Dev12.M68
[Info #23:29:51.197] Fatal Error Location: 0
[Info #23:29:51.197] Activity = ApplicationTierUpgrade (4)
[Info #23:29:53.053] ResultSqmData.UpdateIssues
[Info #23:29:53.068] no issues
[Error #06:53:08.370] TF400744: An error occurred while executing the following script: SetRecoveryModelToSimple.sql. Failed batch starts on the line 1. Statement line: 1. Script line: 1. Error: 5069 ALTER DATABASE statement failed.
[Info #06:53:08.385] To configure the new features for a team project, follow the steps in http://go.microsoft.com/fwlink/?LinkID=229859
"
When I try to detach it, this occurs:
TF401219: The team project collection 'XXX' cannot be detached because its version ID is different than the ID for the configuration database. The collection has the following version: Dev12.M62. The Team Foundation Server is at the following version: Dev12.M68.
When I try to restore a backup, this occurs:
TF400990: Database Tfs_Configuration exists on SQL instance NUBO-XXX\SqlExpress. Please drop or rename the existing database before the restore operation
First of all, keep calm.
I would try to complete the upgrade before trying other options. From what you show, it seems you have an issue at the SQL level; it could be a permissions problem, so check both the TFS service account and your own user.
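To see which database the SetRecoveryModelToSimple.sql script trips over, a quick check from the SQL side might look like this (sqlcmd and sys.databases are standard; the instance name is taken from the TF400990 message above):

rem error 5069 accompanies a failed ALTER DATABASE; a database that is not
rem ONLINE, or an account without ALTER permission on it, produces exactly that
sqlcmd -S NUBO-XXX\SqlExpress -Q "SELECT name, state_desc, recovery_model_desc FROM sys.databases WHERE name LIKE 'Tfs_%'"

If every Tfs_* database shows ONLINE, the next suspect is the permissions of the account running the upgrade wizard.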
If you want to roll back, and you used the integrated backup, you have to restore all the databases in practice.
