Best practice to query the performance state of a MongoDB cluster from Ruby

I'm interested in the best practice for querying the performance state of a MongoDB cluster (on MongoHQ) from a Ruby script.
I would like to build a Ruby script that checks whether the cluster is idle (or nearly idle) and, if so, starts doing some work (lots of queries and updates) on it.

Instead of writing this yourself, I suggest having a look at MMS. MongoHQ supports it for their dedicated database plans; see https://mms.10gen.com/docs/faq for more information.
If you really want to do this yourself, you need to call the serverStatus command.
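A rough sketch of how that could look with the mongo Ruby driver; the exact command API differs between driver versions, and the connection URI, sampling window and idle threshold below are just assumptions:

require 'mongo'

# Connect to the MongoHQ deployment (the URI below is just a placeholder).
client = Mongo::Client.new('mongodb://user:pass@example.mongohq.com:27017/mydb')

# Sum the opcounters reported by serverStatus.
def total_ops(client)
  status = client.database.command(serverStatus: 1).first
  status['opcounters'].values_at('query', 'insert', 'update', 'delete').sum
end

# Sample twice and treat a small delta as "near idle" (the threshold is arbitrary).
before = total_ops(client)
sleep 10
delta = total_ops(client) - before

if delta < 50
  puts 'Cluster looks idle - kick off the heavy work'
else
  puts "Cluster is busy (#{delta} ops in the last 10 seconds) - try again later"
end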

Dru:
Also, there are some additional tools that MongoHQ can make available to you. Please drop the team a note and they can give you some things to try out. But yes, as was stated above, MMS is a good solution as well.
Jason
MongoHQ

Related

Schema deployment management for Athena

In order to apply DevOps principles to data (ugh, DataOps!), things like continuous deployment need to be considered.
That's why tools like dbDeploy exist. However, dbDeploy seems to have been orphaned and is no longer maintained. In the past I've used this tool again and again, but I don't see much support for it any more, and I'm not sure why.
So I'm wondering what people actually use to manage and version their schemas. In particular, I'm looking for something that will work with Athena (which has a JDBC driver, so in theory any JDBC-compliant tool could do).
I know one answer may be to switch mindset and use the AWS Glue crawlers instead. But do people actually do that, or are the crawlers more for POC/quick-start situations? I'm pretty sure you'll always want to override some of the decisions a crawler makes, so how can that be handled?

Neo4j and Ruby: How can I speed up unit testing?

I'm starting to write unit tests along the lines of https://github.com/neo4jrb/neo4j/wiki/How-To-Test
One of the approaches there is really slow (10 seconds per test) and the other doesn't delete labels (and probably other things).
Can anyone suggest a better approach? I noticed that in the core Neo4j material, the Java documentation describes methods that create and tear down temporary databases, but I don't see a way to access those from the (very nice) Ruby and Rails Neo4j gems. Perhaps via the low-level REST API? It's hard to figure out which API calls are available.
You could probably surround your tests in transactions, which is a typical approach for testing with ActiveRecord in Ruby. That should be more performant, and it also helps keep the database clean.
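Whichever approach you pick, a before-each hook that wipes the database keeps examples independent. A minimal sketch, assuming a Neo4j 2.x server with the transactional Cypher endpoint at its default location (the RSpec hook and the hard-coded URL are just illustrative):

require 'net/http'
require 'json'
require 'uri'

# Delete every node and relationship via Neo4j's transactional Cypher endpoint.
def wipe_neo4j(base_url = 'http://localhost:7474')
  uri = URI("#{base_url}/db/data/transaction/commit")
  payload = { statements: [{ statement: 'MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r' }] }

  http = Net::HTTP.new(uri.host, uri.port)
  request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
  request.body = payload.to_json
  http.request(request)
end

RSpec.configure do |config|
  # Start every example against an empty database.
  config.before(:each) { wipe_neo4j }
end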
But you're right, the impermanent database is the tool provided in Neo4j for temporary testing databases. I think that's only available if you're running JRuby, though. I did run across this:
https://groups.google.com/forum/#!topic/neo4j/7xeEPWEiqD0
Which links to a project which lets you start up a Neo4j server in "in memory" mode (using the impermanent database):
https://github.com/jexp/neo4j-in-memory-server
That's showing examples for Neo4j 2.0.0, so I don't know if it would work for later versions, but it might be worth a shot for your testing database.
EDIT: Another thing that I just thought of is to use the vcr gem:
https://github.com/vcr/vcr
It basically records all of the requests made to your server and then plays them back. This works great for API endpoints where results are idempotent, but if you use it for a database like Neo4j you should make sure that your tests clear the database before every test run so that it always starts fresh.
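For reference, a minimal VCR setup looks roughly like this (the cassette directory, the webmock hook and the example names are just placeholders):

require 'vcr'

VCR.configure do |c|
  c.cassette_library_dir = 'spec/cassettes' # where recorded HTTP interactions live
  c.hook_into :webmock                      # intercept HTTP calls made by the Neo4j REST client
end

RSpec.describe 'a Neo4j-backed query' do
  it 'replays previously recorded HTTP traffic' do
    VCR.use_cassette('neo4j_query') do
      # run the code that talks to the Neo4j REST API here
    end
  end
end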

PostgreSQL from NodeJS application

I am exploring how best to access a PostgreSQL/PostGIS DB from NodeJS. All I need is simple SQL SELECT queries. Nothing more complex than:
SELECT *
FROM portal.catalog AS cat
WHERE ST_Intersects(ST_GeogFromText('SRID=4326;POLYGON((20 50, 19 50, 19 49, 20 50))'), cat.gpoly)
LIMIT 5000;
This will be on a Windows 7 or Windows Server 2008 machine running PostgreSQL 9.2/PostGIS 2.0. The traffic will be pretty light (only a few requests per minute).
Some preliminary research I have done has come up with the following potential directions, but I was interested in hearing from others what is working for them (as an easy implementation).
https://github.com/brianc/node-postgres (but I am having trouble building it due to firewall issues). Supposedly the "pure" solution, https://github.com/brianc/node-postgres-pure, is better, but I am having issues there as well.
http://www.infoq.com/articles/the_edge_of_net_and_node (and then I guess I would write my own ADO.NET adapter for PostgreSQL)
I have also seen references to ODBC for NodeJS (unclear if this is the way to go).
Is there something like Microsoft's SQL Server driver for NodeJS, but for PostgreSQL? http://blogs.msdn.com/b/sqlphp/archive/2012/06/08/introducing-the-microsoft-driver-for-node-js-for-sql-server.aspx
There was also a full-blown ORM by EntitySpaces (which went bankrupt), now a defunct open-source project: https://github.com/EntitySpaces/entityspaces.js
I've used node-postgres in the past, but recently opted for any-db, which has support for PostgreSQL.
Both have worked well, although I prefer any-db, particularly with respect to pooling and transactions. I believe any-db deserves more recognition.
Any-db is layered on top of BrianC's node-postgres.
But I just got the https://github.com/brianc/node-postgres-pure working, and it is a pleasure.
EntitySpaces is the way to go, not defunct at all.
http://download.cnet.com/EntitySpaces-Studio/3000-10250_4-10590953.html?tag=mncol;1
I got BrianC's PostgresPure system working (it must have been a dependent module malfunctioning, since I did not do anything special).
It works just great.
See: https://github.com/brianc/node-postgres-pure

What is a good choice for full-text indexing when developing an OS X application?

Hi,
I'm implementing an IMAP client as a Mac OS X application using MacRuby.
For the sake of offline availability, I want to allow full-text indexing and attribute-based indexing of all messages. Attributes include common e-mail fields like from:, to:, etc.
This would allow for advanced results sprinkled with faceting, analytic calculations and such.
Now I'm unsure about the choices and good practices when it comes to integrating such a search feature. I have a strong web development background, so my intuitive move would be to set up a Solr server and start feeding it with data. This might just work in theory, as I could write an agent that manages the Solr instance for my application in the background. But to me, this approach seems like an infrastructure hassle.
On the other side, I've read about people using the FTS3 functionality of SQLite. This approach is easily accessible from Core Data. I haven't used SQLite's FTS3, but I don't think it is as powerful as Solr can be.
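For comparison, this is roughly what the SQLite FTS3 route looks like from Ruby via the sqlite3 gem (the schema, the file name and the availability of the FTS3 module in the SQLite build are assumptions):

require 'sqlite3'

db = SQLite3::Database.new('mail_index.db')

# One full-text virtual table holding the message body plus a few attribute columns.
db.execute <<-SQL
  CREATE VIRTUAL TABLE IF NOT EXISTS messages
  USING fts3(subject, body, from_addr, to_addr)
SQL

db.execute(
  'INSERT INTO messages (subject, body, from_addr, to_addr) VALUES (?, ?, ?, ?)',
  ['Meeting notes', 'Minutes from the offsite...', 'alice@example.com', 'bob@example.com']
)

# MATCH does the full-text part; ordinary WHERE clauses cover the attribute filters.
db.execute("SELECT subject, from_addr FROM messages WHERE messages MATCH 'offsite'") do |subject, from|
  puts "#{subject} (#{from})"
end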
What is your weapon of choice for a use case like mine?
I'm mainly interested in solutions that are actually in use by Objective-C/Cocoa/MacRuby developers.
If you're going to develop the app with Ruby, give picky a try. It is very simple to use.
There is an Objective-C Lucene port:
http://svn.gna.org/viewcvs/etoile/trunk/Etoile/Frameworks/LuceneKit/
I have not used it, but in your situation I'd at least check it out. In my experience, SQL-based full-text search can't compete with Lucene, but I haven't tried SQLite for this.
EDIT: I just noticed the Ruby tag -- this started out as a port of Lucene:
https://github.com/dbalmain/ferret

Employee Web Usage From Proxy Logs

I need to find/create an application that will create employee web usage reports from HTTP proxy logs. Does anyone know of a good product that will do this?
#Joe Liversedge - Good point. I don't have to worry about this, however, as I am the only person in my company with the know-how to pull off an SSH tunnel.
Here's a scenario: what's to stop two employees, let's call them 'Eric' and 'Tim', from running their own little SSH tunnel back home to prevent 'the Man', in this case you, from narc'ing out their use of the Internet? Now you have a useless report.
If you're serious about getting real data, you'll want something close to the pipes.
But agreed, Splunk would work pretty well, as would an over-long and unmaintainable Perl script or a series of awks, sorts, uniq -c's, etc.
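As a rough idea of the script route, a minimal Ruby sketch that tallies requests per user from a Squid-style access.log might look like this (the log format and field positions are assumptions; adjust them for your proxy):

# Tally requests per authenticated user from a Squid-style access.log.
counts = Hash.new(0)

File.foreach('access.log') do |line|
  fields = line.split
  next if fields.size < 8
  user = fields[7] # Squid's native format puts the username in the 8th field
  counts[user] += 1 unless user == '-'
end

counts.sort_by { |_, n| -n }.each do |user, n|
  puts format('%-20s %6d', user, n)
end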
I don't use it myself, but I hear Splunk is a really good log aggregation and reporting tool. It comes recommended by my sysadmin buddies; I just haven't had a need for it, since we use IndexTools for our usage stats.
Or Sensage, which has a more formalized data model (more like a normal relational database), at the expense of requiring a little more thought and setup cost to start consuming logs.
I don't think they offer a free low-volume version like Splunk does, though.
