Open Source Greenplum: GPFDIST error 'Segmentation fault' when selecting from external table - greenplum

I'm trying to set up an Open Source Greenplum instance and have been hitting the same GPFDIST issue for days! Simply put, I do a full installation from scratch on CentOS 7.6 (I can provide further setup details if needed), installing OS GPDB version 5.18 with GPORCA disabled. The full configure command is:
./configure --prefix=/usr/local/gpdb --with-perl --with-python --with-libxml --with-gssapi --with-includes=/usr/local/gpdb/include --with-libs=/usr/local/gpdb/lib --disable-orca
This configures successfully, and the subsequent make and make install commands also complete without issue. The initialisation of the Greenplum database itself also succeeds, and I can then go into a database and create tables, insert data and run queries as normal.
But if I try to select from an external table, such as the following:
create external table test_external_table
(testing smallint
)
location ('gpfdist://mdw:8080/test_data.csv')
format 'csv' (header delimiter '|')
;
with GPFDIST run as follows:
gpfdist -d /home/gpadmin/test/ -p 8080 -l /home/gpadmin/greenplum/logs/gpfdist_log 2>&1 &
then I get two errors; one from the external table, and one from GPFDIST. These are as follows:
External Table Returns:
ERROR: connection with gpfdist failed for gpfdist://mdw:8080/test_data.csv. effective url: http://127.0.0.1:8080/test_data.csv. error code = 104 (Connection reset by peer) (seg0 slice1 127.0.0.1:6000 pid=27962)
GPFDIST Returns:
[1]+ Segmentation fault gpfdist -d /home/gpadmin/test -p 8080 -l /home/gpadmin/greenplum/logs/gpfdist_log 2>&1
I have removed everything that isn't on the OS GPDB GitHub installation guide (for a 'bare-bones' setup), so I don't think that is causing the issue. I have tried everything to do with the hostname and network firewall, and it's all perfect as far as I can see.
I have also downloaded the same version of GPDB (5.18) from Pivotal and installed that version on the same instance simultaneously, and GPFDIST works perfectly fine.
I have also tried OS GPDB 5.17, 6 beta and 7 beta, and I get the same issue for all of them.
Any ideas at all on what might be causing this is VERY much appreciated, as I'm slowly going insane trying to figure this out now.
Thank you very much in advance for any help.
-- Edit --
Okay.. Having nearly chewed my own arm off in sheer frustration at trying to install debuginfo stuff on CentOS 7, I've finally generated a core dump with gdb. I then run:
gdb -c core_dump.<pid>
and get the following output:
Core was generated by `gpfdist -d /home/gpadmin/test -p 8080 -l /home/gpadmin/greenplum/logs/gpfdist_log'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f4f2c07bdff in ?? ()
But I have absolutely no idea what that means... Totally honest, I'm a little over my head with this now and really am stuck on how to proceed.

The connection reset by peer only indicates that the other end of the socket had dropped (...in this case, gpfdist because it crashed out).
Set up your gpfdist and try a wget against a hosted file, adding:
--header='X-GP-PROTO:0'
You will need to add the header to avoid having the request rejected.
Are you able to retrieve a file there? Or does that also crash out?
If that crashes out, it's nothing to do with the database - and you will likely need a core dump to determine what the segfault is about (r/w permissions, memory, ...).
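The check suggested above can be sketched roughly as follows. It reuses the host, port and file name from the question; the function names are made up for illustration, and the gpfdist binary path in the closing comment is an assumption derived from the --prefix passed to configure.

```shell
# Build the URL that gpfdist serves for a given host, port and file.
gpfdist_url() {
  printf 'http://%s:%s/%s\n' "$1" "$2" "$3"
}

# Fetch the file straight from gpfdist, bypassing the database.
# The X-GP-PROTO header is added so that gpfdist does not reject the request.
probe_gpfdist() {
  wget --header='X-GP-PROTO:0' -O - "$(gpfdist_url "$1" "$2" "$3")"
}

# Usage (with gpfdist already running):
#   probe_gpfdist mdw 8080 test_data.csv
# If gpfdist segfaults here as well, load the core dump together with the
# binary so gdb has a chance to resolve symbols instead of printing "??"
# (binary path assumed from --prefix=/usr/local/gpdb):
#   gdb /usr/local/gpdb/bin/gpfdist core_dump.<pid>
#   (gdb) bt
```

If the wget alone makes gpfdist crash, the database side can be ruled out and the backtrace becomes the next thing to look at.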

I've finally managed to solve this issue. Should anyone come looking with a similar problem, make sure you are installing Libevent version 1.4[.15], and nothing above that.
I had 2.2.0 installed, and whilst Greenplum sees this as fine, it actually doesn't work for it. Unfortunately, I did have to do an entire system installation from scratch to seemingly get it to work, as just installing Libevent 1.4 on the old system with Greenplum already compiled did not work for me.
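A quick way to check for this before compiling might look like the sketch below. The helper name is hypothetical, and the rpm query assumes libevent was installed from an RPM package; a libevent built from source would not show up there.

```shell
# Succeeds only for libevent 1.4.x version strings, the series reported to work.
libevent_is_1_4() {
  case "$1" in
    1.4|1.4.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on an RPM-based system such as CentOS:
#   ver=$(rpm -q --queryformat '%{VERSION}' libevent)
#   libevent_is_1_4 "$ver" || echo "libevent $ver may crash gpfdist" >&2
# To see which libevent an already built gpfdist is linked against:
#   ldd "$(command -v gpfdist)" | grep libevent
```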

Related

Does Neo4j spatial server plugin 3.0.2 work with Neo4j 3.0.3 community?

I've been struggling to install Neo4j spatial for quite some time now using several methods I've found on the Web.
I am using neo4j-community-3.0.3, but I don't remember the link I got it from. It was a pre-compiled version from an ftp site (I believe it was an /archives folder somewhere on neo4j's website but I can't find it in google for the life of me). If someone has a link for downloading precompiled versions of neo4j that would be greatly appreciated. The neo4j other-releases webpage only provides recent versions: https://neo4j.com/download/other-releases/
I tried compiling my own version of neo4j from github but to be honest it is very confusing, as the directory tree is extremely dense. It seems like both community and enterprise versions are included in the same repo, without READMEs, so I don't even know where to begin.
As far as the plugin goes, I have tried both the precompiled version and compiling my own. For the precompiled one, I followed the instructions on the git page to a T.
https://github.com/neo4j-contrib/spatial#using-the-neo4j-spatial-server-plugin
I downloaded the jar file, and copied it over to $NEO4J_HOME/plugins/
Then I restarted the neo4j server. Finally I make the rest call to see if the plugin has been loaded, but I do not see it.
$ http :7474/db/data/ -a neo4j
http: password for neo4j@localhost:7474:
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Length: 795
Content-Type: application/json; charset=UTF-8
Date: Fri, 01 Jul 2016 19:49:44 GMT
Server: Jetty(9.2.9.v20150224)
{
"batch": "http://localhost:7474/db/data/batch",
"constraints": "http://localhost:7474/db/data/schema/constraint",
"cypher": "http://localhost:7474/db/data/cypher",
"extensions": {},
"extensions_info": "http://localhost:7474/db/data/ext",
"indexes": "http://localhost:7474/db/data/schema/index",
"neo4j_version": "3.0.3",
"node": "http://localhost:7474/db/data/node",
"node_index": "http://localhost:7474/db/data/index/node",
"node_labels": "http://localhost:7474/db/data/labels",
"relationship": "http://localhost:7474/db/data/relationship",
"relationship_index": "http://localhost:7474/db/data/index/relationship",
"relationship_types": "http://localhost:7474/db/data/relationship/types",
"transaction": "http://localhost:7474/db/data/transaction"
}
The compiled version gave me the same result, only it takes longer to achieve. I cloned the git repo for version 3.0.2, and run the following:
git clone git://github.com/neo4j/spatial.git spatial
cd spatial
mvn clean package -Dmaven.test.skip=true install
Note: This mvn command actually failed for me at one point, but after some googling I found that this command worked
mvn clean compile package assembly:single -Dmaven.test.skip=true install
Finally I run
cp target/neo4j-spatial-0.17-neo4j-3.0.2-server-plugin.jar $NEO4J_HOME/plugins
$NEO4J_HOME/bin/neo4j restart
And voila, the exact same results as before (no plugin listing).
I have never had so much trouble installing something. I really do not want to go back to versions 2.* because I want to take advantage of the new bolt driver with python, and get the latest and greatest performance. Please, any help is greatly appreciated. (Even just finding an archive of direct links to precompiled versions of neo4j would help me).
Okay, so I figured out several issues which were probably the cause of my confusion.
Issue 1:
If you start the server as root (sudo), you must stop the server as root!
Issue 2:
Make sure you do not have another version simultaneously running (with default port 7474).
I believe a combination of these 2 issues was the real culprit behind my problem. It would be great if Neo4j checked on startup whether that port is already in use.
Also, the output is very confusing when you attempt to stop the service as someone other than the user who started it. Neo4j shows the following:
$ sudo bin/neo4j start
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Started neo4j (pid 25418). By default, it is available at http://localhost:7474/
There may be a short delay until the server is ready.
See /opt/neo4j/neo4j-community-3.0.2/logs/neo4j.log for current status.
$ bin/neo4j stop
Neo4j not running
rm: remove write-protected regular file ‘/opt/neo4j/neo4j-community-3.0.2/run/neo4j.pid’? ^C
That last line caught my attention and then after running
$ ps aux | grep neo
I found that Neo4j WAS actually running.
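The same check can be done against the pid file directly. This is only a sketch with a made-up function name; it tries kill -0 first and then falls back to /proc and ps, because kill -0 fails for a process owned by another user (such as a server started with sudo).

```shell
# Succeeds if the PID stored in the given pid file belongs to a live process.
# kill -0 only works on processes you are allowed to signal, so fall back to
# /proc (Linux) and ps to catch a server started by another user such as root.
pidfile_process_alive() {
  [ -r "$1" ] || return 1
  pid=$(head -n 1 "$1" | tr -d '[:space:]')
  [ -n "$pid" ] || return 1
  kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ] || ps -p "$pid" >/dev/null 2>&1
}

# Usage (pid file path taken from the output above):
#   pidfile_process_alive /opt/neo4j/neo4j-community-3.0.2/run/neo4j.pid \
#     && echo "Neo4j is still running"
```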
As far as the download link goes, credit to William in the comments above.
He pointed me to http://dist.neo4j.org/neo4j-community-3.0.2-unix.tar.gz,
I suppose one can just change the version number in the url if they want other ones.
So figuring this out, I found that the 3.0.2 spatial plugin indeed does show up in the response from http://localhost:7474/db/data/ for neo4j version 3.0.3. However, I am going to stick to using neo4j version 3.0.2 just to be safe for now.
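To check the response without reading the JSON by eye, something like the following sketch could be used. The function name is hypothetical, and it relies on the empty map being printed as {} on one line, as in the output shown in the question.

```shell
# Reads the JSON returned by /db/data/ on stdin and succeeds if the
# "extensions" map is empty, i.e. no server plugin was loaded.
extensions_empty() {
  grep -Eq '"extensions"[[:space:]]*:[[:space:]]*\{[[:space:]]*\}'
}

# Usage (curl prompts for the neo4j user's password):
#   curl -s -u neo4j http://localhost:7474/db/data/ | extensions_empty \
#     && echo "no plugins loaded"
```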

Ambari remove dead host

I have a host configured into Ambari which no longer exists. Ambari still thinks it's there. When I try to delete it through the UI I get:
400 status code received on DELETE method for API:
/api/v1/clusters/handy091015/hosts/r-hadoopeco-celeryworker-07ac46a4.hbinternal.com/host_components/ZOOKEEPER_CLIENT
Error message: Bad Request
When I try to delete it via the api, with the command below, I get the same host information as with a GET:
curl -H "X-Requested-By: ambari" -DELETE http://admin:admin@ambari.handy-internal.com//api/v1/clusters/handy091015/hosts/r-hadoopeco-celeryworker-07ac46a4.hbinternal.com
I have tried the instructions here to no avail:
https://cwiki.apache.org/confluence/display/AMBARI/Using+APIs+to+delete+a+service+or+all+host+components+on+a+host
My question is: how do I get Ambari to no longer know about/try to do things with this host.
I am not able to reproduce your behaviour with Ambari 2.1.2 and HDP 2.3 stack.
Limitation:
Note that removing a host is supported only for hosts with no master components; if any are present, deletion is not possible.
Options:
Try an ambari-server restart; it sometimes has intermittent issues.
If this is an option, I recommend doing an ambari-server reset and installing from scratch. If you don't have much set up yet, it will probably save you time.
If not, you may want to post the ambari-server.log file as well. This may help to debug the core issue.
Another option: just ignore that host, as it will not do much harm. You can move it to maintenance mode, which will ease cluster operation.
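One thing worth checking in the command from the question: curl reads -DELETE as the -D (dump headers) option with the file name ELETE, so the request is still a GET, which would explain getting the same output as a GET. A minimal sketch of an explicit DELETE follows. The function names are made up, the admin:admin credentials are the ones from the question, and host components may still need to be removed first, as described on the linked wiki page.

```shell
# Build the REST URL for a host resource in an Ambari cluster.
# $1 = base URL of the Ambari server, $2 = cluster name, $3 = host name
ambari_host_url() {
  printf '%s/api/v1/clusters/%s/hosts/%s\n' "${1%/}" "$2" "$3"
}

# Send an explicit DELETE request; -X sets the HTTP method and -u passes
# the credentials. The X-Requested-By header is the one used in the question.
ambari_delete_host() {
  curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE \
    "$(ambari_host_url "$1" "$2" "$3")"
}

# Usage:
#   ambari_delete_host http://ambari.handy-internal.com handy091015 \
#     r-hadoopeco-celeryworker-07ac46a4.hbinternal.com
```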

Yosemite - MAMP - Can't connect to local MySQL server through socket '/Applications/MAMP/tmp/mysql/mysql.sock' (2)

I have a problem when I try to connect to MAMP's MySQL:
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/Applications/MAMP/tmp/mysql/mysql.sock' (2)
I have been researching this but have not found a solution. I tried to link mysql.sock like this:
sudo ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock
But, the file /tmp/mysql.sock doesn't exist.
Do you have any ideas? This problem has been blocking me for 2 days; I have searched all that time but found nothing that works for me.
Thanks in advance,
I had the same problem after upgrading MySQL on MAMP from version 5.5 to 5.6.
After a long search I found this solution: https://drupal.stackexchange.com/questions/32402/drush-and-mysql-database-with-mamp-connection-problem
In my case there was no socket file at that location /Applications/MAMP/tmp/mysql/mysql.sock.
The easy solution is to create a symlink:
cd /tmp
ln -s /Applications/MAMP/tmp/mysql/mysql.sock ./mysql.sock
The effect of which is to route all calls for /tmp/mysql.sock to the appropriate MAMP specific path.
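The (2) at the end of the error is errno 2, "No such file or directory", so it is worth confirming that the MAMP socket actually exists before linking to it. A small sketch, with a hypothetical function name:

```shell
# Create (or replace) a symlink at $2 pointing to the MySQL socket at $1,
# but only if $1 really is a socket, i.e. the MySQL server is running.
link_mysql_socket() {
  if [ ! -S "$1" ]; then
    echo "no socket at $1; is MySQL running?" >&2
    return 1
  fi
  ln -sf "$1" "$2"
}

# Usage:
#   link_mysql_socket /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock
# Alternatively, point the client at MAMP's socket directly:
#   mysql -u root -p --socket=/Applications/MAMP/tmp/mysql/mysql.sock
```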
Simply remove the two files ib_logfile0 and ib_logfile1 from /Applications/MAMP/db/mysql56.
In some cases /Applications/MAMP/tmp/mysql/mysql.sock.lock may cause the problem. Remove the .lock file and it will work (OS X).
I had the same problem. I solved it following these steps:
I stopped and restarted MySQL via System Preferences -> MySQL (see also https://stackoverflow.com/a/26523977/204807)
I entered sudo mysql_upgrade in a terminal window, and pressed enter
After the update process I was able to connect with my MySQL.
Rename .sock file of mysql and restart your MySQL server.
/Applications/MAMP/tmp/mysql/mysql.sock to /Applications/MAMP/tmp/mysql/mysql_old.sock
Also check to make sure MySql is running. You can get this error if you try to use MySql from the command line when MAMP/MySql is not running.
In case this helps anyone, I got past this roadblock by:
Stopping all other versions of MySQL (I forgot that I had Oracle MySQL starting on system launch and I was attempting to connect via that installation.) and removing them anyway to let MAMP handle it.
Use sudo launchctl unload -F /Library/LaunchDaemons/com.oracle.oss.mysql.mysqld.plist
Ensure MAMP is running.
Running /Applications/MAMP/Library/bin/mysql.
1) mysql.server stop
2) /Applications/MAMP/bin/startMysql.sh &&
now on MAMP go to
Tools => Upgrade MySQL databases

Postgres DB not starting on Mac OSX: ERROR says: connections on Unix domain socket [closed]

I've installed PostgreSQL, ran a bunch of Rails apps on my local Mac OS X Mountain Lion machine, and created databases etc. Today, after a while, when I launched pgAdminIII and tried to start a database server I got this error:
A quick google showed this post. More browsing suggested that there might be a postmaster.pid file lying around that is the root cause of this, and that things would be fine if I deleted it.
However, before I go deleting stuff on my computer I wanted to make sure I'm debugging this in a systematic way that will not result in more problems.
Somewhere I read that before deleting that file, I need to run this command:
ps auxw | grep post
If I get no results, then it's OK to delete the file. Otherwise not. Well, I got this result from that command:
AM 476 0.0 0.0 2423356 184 s000 R+ 9:28pm 0:00.00 grep post
So now of course I'm thoroughly confused.
So what should I do?
Here is part of my postgres server error log:
FATAL: lock file "postmaster.pid" already exists
HINT: Is another postmaster (PID 171) running in data directory "/usr/local/var/postgres"?
PostgreSQL is still not running; I still get the same error and nothing has changed. I'm too chicken to delete things without checking on SO.
Could some of you experts please guide a noob?
Thanks
I had the same problem today on Mac Sierra. In Mac Sierra you can find postmaster.pid inside /Users/<user_name>/Library/Application Support/Postgres/var-9.6. Delete postmaster.pid and problem will be fixed.
This can happen if the database did not shut down correctly. To fix it simply delete the postmaster.pid file. The location differs based on your OS:
MacOS:
rm /Users/<user_name>/Library/Application\ Support/Postgres/var-9.6/postmaster.pid
or using Postgres.app:
rm /Users/<user>/Library/Application\ Support/Postgres/var-10/postmaster.pid
Linux:
rm /usr/local/var/postgres/postmaster.pid
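Before deleting the file, it may be worth confirming that the PID recorded on its first line is really gone. The sketch below uses a made-up function name and the data directory from the question's log.

```shell
# Succeeds if the postmaster.pid file exists and the PID on its first line
# no longer belongs to a running process, i.e. the file is stale.
# The /proc and ps fallbacks cover processes owned by another user.
postmaster_pid_is_stale() {
  [ -r "$1" ] || return 1
  pid=$(head -n 1 "$1" | tr -d '[:space:]')
  [ -n "$pid" ] || return 1
  if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ] || ps -p "$pid" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Usage:
#   f=/usr/local/var/postgres/postmaster.pid
#   postmaster_pid_is_stale "$f" && rm "$f"
```

A PID can be reused by an unrelated process after a reboot, so a "not stale" result is worth double-checking with ps before concluding that a postmaster is still running.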
I have the database working now.
Here are the steps I took:
I rebooted my computer
I opened the terminal and ran cd /
Then I did ls -la
Ensured that I could get to Macintosh HD/usr/local/var/postgres
Then did ls -la
Here I saw the postmaster.pid file
I ran the command cp postmaster.pid ~/Desktop, which copied the file to my desktop. I like to do this when I am deleting files; if something goes wrong I can put it back.
Then I ran this command to remove the file from the postgres directory: rm -r postmaster.pid
I went to my pgadmin3 gui and fired it up. and Voila it worked :)
Thanks to @Craig Ringer for his help
I'm using Postgres.app, and the below worked for me:
I entered the commands below into my terminal, after locating my own Postgres folder (that is, replacing "justin" with my user name).
$ declare -x PGDATA="/Users/justin/Library/Application Support/Postgres/var-9.4"
$ pg_ctl restart -m immediate
As Justin explains in his post, the output after this was:
waiting for server to shut down........................ failed
pg_ctl: server does not shut down
After entering the command again:
$ pg_ctl restart -m immediate
It worked and I got this message:
pg_ctl: old server process (PID: 373) seems to be gone
starting server anyway
server starting
LOG: database system was interrupted; last known up at 2015-07-28 18:15:26 PDT
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/4F0F7A8
LOG: redo is not required
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
Source

Installing Membase from source

I am trying to build and install membase from source tarball. The steps I followed are:
Un-archive the tar membase-server_src-1.7.1.1.tar.gz
Issue make (from within the untarred folder)
Once done, I enter into directory install/bin and invoke the script membase-server.
This starts up the server with a message:
The maximum number of open files for the membase user is set too low.
It must be at least 10240. Normally this can be increased by adding
the following lines to /etc/security/limits.conf:
I tried updating limits.conf as suggested, but no luck: the same message keeps popping up and the server continues booting.
Given that the server had started, I tried accessing memcached on port 11211, but got a connection refused message. I then found (via netstat) that memcached is listening on 11210 and tried telnetting to port 11210; unfortunately the connection is closed as soon as I issue the following commands:
stats
set myvar 0 0 5
Note: I am not getting any output from the commands above {Yes: stats did not show anything but still I issued set.}
Could somebody help me build and install membase from source? Also why is memcached listening to 11210 instead of 11211?
It would be great if somebody could also give me a step-by-step guide which I can follow to build from source from Git repository (I have not used autoconf earlier).
P.S.: I have tried installing from binaries (Debian package) on the same machines and I am able to install and telnet successfully. Hence I am not sure why the build from source is not working.
You can increase the number of file descriptors on your machine by using the ulimit command. Try doing (you might need to use sudo as well):
ulimit -n 10240
I personally have this set in my .bashrc so that whenever I start my terminal it is always set for me.
Also, memcached listens on port 11210 by default for Membase. This is done because Moxi, the memcached proxy server, listens on port 11211. I'm also pretty sure that the memcached version used for Membase only speaks the binary protocol, so you won't be able to telnet to 11210 and have commands work correctly. Telnetting to 11211 (moxi) should work though.
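Since ulimit only affects the current shell and the processes started from it, the limit has to be raised in the same shell that launches the server. A small sketch of checking this first (the function name is hypothetical; membase-server is the script from install/bin mentioned in the question):

```shell
# Succeeds if the soft limit on open files in this shell is at least $1.
nofile_at_least() {
  current=$(ulimit -n)
  [ "$current" = "unlimited" ] || [ "$current" -ge "$1" ]
}

# Usage: raise the limit, then start the server from the same shell so
# that it inherits the new value.
#   ulimit -n 10240
#   nofile_at_least 10240 && ./membase-server
```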

Resources