Employee Web Usage From Proxy Logs - reporting

I need to find/create an application that will create employee web usage reports from HTTP proxy logs. Does anyone know of a good product that will do this?
#Joe Liversedge - Good point. I don't have to worry about this, however, as I am the only person in my company with the know-how to pull off an SSH tunnel.

Here's a scenario: what's to stop two employees, let's call them 'Eric' and 'Tim', from running their own little SSH tunnel back home to prevent 'the Man', in this case you, from narc'ing out their use of the Internet? Now you have a useless report.
If you're serious about getting real data, you'll want something close to the pipes.
But agreed, Splunk would work pretty well, as would an over-long and unmaintainable Perl script or a series of awks, sorts, uniq -c's, etc.
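If you do go the quick-and-dirty script route, the whole thing is roughly this much logic. Below is a minimal Python sketch, assuming Squid's native access.log layout (client address in field 3, bytes in field 5, URL in field 7); the log path and field positions are assumptions you'd adjust for your own proxy.

```python
# Per-employee summary from a proxy access log.
# Assumes Squid's native format: time elapsed client code/status bytes method URL ...
# Adjust the field indexes (and the log path) for your proxy's actual format.
from collections import defaultdict
from urllib.parse import urlsplit

requests = defaultdict(int)
traffic = defaultdict(int)
sites = defaultdict(set)

with open("access.log") as log:
    for line in log:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed lines
        client, size, url = fields[2], fields[4], fields[6]
        requests[client] += 1
        traffic[client] += int(size) if size.isdigit() else 0
        sites[client].add(urlsplit(url).netloc or url)  # CONNECT lines carry a bare host:port

for client in sorted(traffic, key=traffic.get, reverse=True):
    print(f"{client}\t{requests[client]} requests\t"
          f"{traffic[client] / 1e6:.1f} MB\t{len(sites[client])} distinct hosts")
```

Which is, of course, exactly the kind of script that grows unmaintainable over time, and it says nothing about traffic that never touches the proxy.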

I don't use it, but I hear Splunk is a really good log aggregation and reporting tool. It comes recommended by my sysadmin buddies; I just haven't had a need for it, since we use IndexTools for our usage stats.

Or Sensage, which has a more formalized data model (more like a normal relational database), at the expense of requiring a little more thought and setup cost before you can start consuming logs.
I don't think they offer a free low-volume version like Splunk does, though.

Related

Monitoring solution for EC2 based deployment

We have some 20 or so servers in EC2, most are dynamically spawned (scaling groups).
We're looking for a solution to monitor the uptime of our application.
As an added bonus, this solution could also extend to actually monitoring the servers involved, so it's easy to go back in time and see what happened just before a period of downtime.
We're looking for a hosted solution ideally, and it should be easy to scale with it (it needs to somehow dynamically deal with servers being added/removed with no interaction from us).
Anyways, hoping for some recommendations from you guys.
A bit of background ...
We're currently using a custom Nagios setup; it's been reduced to basically doing a simple HTTP check now that the servers have become fully dynamic. We've already been using PagerDuty to deliver the pages. It does OK, but for the maintenance cost we could just as well be using an HTTP check from Server Density or Pingdom.
I've looked briefly at Server Density, and it does look promising; I especially like their install mechanism of just dumping their files into your AMI and letting it take care of the rest.
I'd like to know what options there are, though, before diving deeper into any particular solution.
We use a combination of Server Density for monitoring and PagerDuty for alerting. The two work quite well together.
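For what it's worth, the "simple http check" the Nagios setup has been reduced to is only about this much logic; what a hosted product like Server Density or Pingdom really adds is the scheduling, history and alert routing (e.g. into PagerDuty) around it. A hedged Python sketch; the endpoint URL and thresholds are placeholders.

```python
# Bare-bones HTTP uptime probe - roughly the check the custom Nagios setup
# was reduced to. The endpoint is a placeholder; a hosted monitor adds
# scheduling, history and alert escalation on top of this.
import time
import urllib.request

ENDPOINT = "https://app.example.com/healthcheck"  # placeholder URL

def check(url, timeout=10):
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 300
    except Exception:
        ok = False
    return ok, time.time() - start

if __name__ == "__main__":
    up, elapsed = check(ENDPOINT)
    print(f"{'UP' if up else 'DOWN'} in {elapsed:.2f}s")
```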

Best practice to query the performance state of the mongo

I'm interested in the best practice for querying the performance state of a MongoDB cluster (on MongoHQ) using a Ruby script.
I would like to build a Ruby script that checks whether the cluster is idle (or nearly idle) and, if so, starts doing some work (lots of queries and updates) on it.
I suggest instead of writing this yourself to have a look at MMS. MongoHQ supports this for their dedicated database plans. See https://mms.10gen.com/docs/faq for information.
If you really want to do this yourself, you need to call the serverStatus command.
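To make the do-it-yourself route concrete, serverStatus is a single command against the admin database. The sketch below uses PyMongo rather than Ruby, but the Ruby driver can issue the same command; the connection URI and the "near idle" thresholds are placeholders you'd tune yourself.

```python
# Rough "is the cluster near idle?" probe via the serverStatus command.
# Shown with PyMongo; the Ruby driver can run the same command. The URI and
# the idle thresholds below are arbitrary placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://user:pass@your-mongohq-host:27017/yourdb")  # placeholder
status = client.admin.command("serverStatus")

queued = status["globalLock"]["currentQueue"]["total"]
active = status["globalLock"]["activeClients"]["total"]

if queued == 0 and active < 5:  # "near idle" by our own arbitrary definition
    print("Looks idle - kick off the batch of queries and updates")
else:
    print(f"Busy: {active} active clients, {queued} queued operations")
```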
Dru:
Also, there are some additional tools that MongoHQ can make available to you. Please drop the team a note and they can give you some things to try out. But yes, as was stated above, MMS is a good solution as well.
Jason
MongoHQ

website - redundancy and failure

After researching various hosts, I still get the feeling that it is somewhat impossible to get a host that would never go down.
Maybe these hosts employ redundancy, maybe they do not. In either case, how would one display a friendly message to the user along the lines of "BRB"? What if your host goes down completely for an hour? You would need a way to tell users you'll be back. How do you accomplish that?
I doubt any ISP or hosting provider would do that for you. To achieve it you need very expensive and complicated infrastructure - redundant fail-safe routers and backbones in addition to servers, of course - and you need multiples of everything. Concepts like Simple Failover require DNS updates, which normally take minutes to hours to propagate, so they're not a 100% solution either. See Joel's article on the subject for a related discussion.
If the host is down and you're on a single server, then you are definitely down. This is a limitation of shared hosting... there's not much you can do about it. You can ask your host if you are hosted on multiple servers for redundancy... if so, then you wouldn't have to worry about it.
If you host your own server, then you could get your hands on Simple Failover and maybe have a cheap virtual dedicated server that goes UP when your primary goes down.
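To make the "goes UP when your primary goes down" idea concrete, here is a hedged sketch of the DNS-switch loop. It assumes your DNS lives somewhere with an API - Route 53 via boto3 is used purely as an example, and Simple Failover packages this same loop as a product - and, as noted above, even a low TTL only softens the propagation delay.

```python
# Sketch of the DNS-failover idea: if the primary stops answering,
# repoint the A record at a standby "BRB" box. Route 53 (via boto3) is an
# assumption here - products like Simple Failover do this for you.
import boto3
import urllib.request

ZONE_ID = "ZXXXXXXXXXXXXX"     # placeholder hosted zone
RECORD = "www.example.com."    # placeholder record name
PRIMARY_IP = "203.0.113.10"    # placeholder primary server
STANDBY_IP = "203.0.113.20"    # placeholder standby with the "we'll be back" page

def primary_alive():
    try:
        urllib.request.urlopen(f"http://{PRIMARY_IP}/", timeout=5)
        return True
    except Exception:
        return False

def point_record_at(ip):
    boto3.client("route53").change_resource_record_sets(
        HostedZoneId=ZONE_ID,
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": RECORD, "Type": "A", "TTL": 60,  # low TTL, still not instant
                "ResourceRecords": [{"Value": ip}],
            },
        }]},
    )

if __name__ == "__main__":
    point_record_at(PRIMARY_IP if primary_alive() else STANDBY_IP)
```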
OK, every host will have downtime at some point. Your best bet would be to go with someone whose customer service is good enough to help get your box back up quickly. 99% of the time when your box goes down it's your fault (if you have access to the OS/Apache etc.).
The people at Rackspace are awesome for hosting + customer service. The Rackspace cloud is great, allowing you to create and take down servers instantly. (Slicehost is good for persistent boxes charged by the month, and is also owned by Rackspace.)
As for a way to communicate with your users, I would use Twitter, Tumblr, or a hosted blog service. That way, if your box goes down, you can get your message out via services that are most likely on a different host/network.

How to implement a secure distributed social network?

I'm interested in how you would approach implementing a BitTorrent-like social network. It might have a central server, but it must be able to run in a peer-to-peer manner, without communicating with it:
If a whole region's network is disconnected from the internet, it should be able to pass updates from users inside the region to each other
However, if some computer gets the posts from the central server, it should be able to pass them around.
There is some reasonable level of identification; some computers might be disseminating incomplete/incorrect posts or performing DoS attacks. It should be possible to mark some information as coming from more trusted computers and some from less trusted ones.
It should theoretically be able to use any computer as a server, while dynamically optimizing the network so that typically only fast computers with ample internet connectivity act as seeders.
The network should be able to scale to hundreds of millions of users; however, each particular person is interested in less than a thousand feeds.
It should include some Tor-like privacy features.
Purely theoretical question, though inspired by recent events :) I do hope somebody implements it.
Interesting question. With the use of already existing Tor, P2P and darknet features, and by using some public/private key infrastructure, you could possibly come up with some great things. It would be nice to see something like this in action. However, I see a major problem - not people using it for file sharing, but people flooding the network with useless information. I would therefore suggest a Twitter-like approach where you can ban and subscribe to certain people, and start with a very reduced set of functions at the beginning.
Incidentally, we programmers could make a good start toward that goal by NOT saving and analyzing too much information about our users, and by using safe ways of storing and accessing user-related data!
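To make the public/private-key part slightly more concrete: if every user (or relay) signs what it publishes, a post can be verified no matter which peer delivered it, which is the hook for the "more trusted / less trusted" grading the question asks for. A minimal sketch using Ed25519 from the Python cryptography package; key distribution and the actual trust weighting are the hard parts and are not shown.

```python
# Minimal "signed post" sketch: posts carry a signature so any peer can
# verify the origin regardless of which relay delivered them. Uses Ed25519
# from the 'cryptography' package; key exchange / trust weighting not shown.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Author side: generate a keypair once, publish the public key.
author_key = Ed25519PrivateKey.generate()
author_pub = author_key.public_key()

post = b"status update: still online behind the partition"
signature = author_key.sign(post)

# Receiving peer: verify before relaying or displaying.
try:
    author_pub.verify(signature, post)
    print("post verified - safe to relay")
except InvalidSignature:
    print("bad signature - drop it and distrust the sender")
```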
Interesting - the Rendezvous protocol does something similar to this (it finds "buddies" on the local network).
BitTorrent is a means of transferring static information; it's not intended to let everyone become a producer of new content. Also, BitTorrent requires the producer to act as a dedicated server until all of the clients are able to grab the information.
Diaspora claims to be one such thing.

Anything wrong with moving CLI validation/logic server-side?

I have a client/server application. One of the clients is a CLI. The CLI performs some basic validation, then makes SOAP requests to a server. The response is interpreted and the relevant information presented to the user. Every command involves a request to a web service.
Every time services are modified server-side, a new CLI needs to be released.
What I'm wondering is if there would be anything wrong with making my CLI incredibly thin. All it would do is send the command string to the server where it would be validated, interpreted and a response string returned.
(Even TAB completion could be done with the server's cooperation.)
I feel in my case this would simplify development and reduce maintenance work.
Are there pitfalls I am overlooking?
UPDATE
Scalability issues are not a high priority.
I think this is really just a matter of taste. The validation has to happen somewhere; you're just trading off complexity in your client for the same amount of complexity in your software. That's not necessarily a bad thing for your architecture; you're really just providing an additional service that gives callers an alternate means of accessing your existing services. The only pitfall I'd look out for is code duplication; if you find that your CLI validation is doing the same things as some of your services (parsing numbers, for example), then refactor to avoid the duplication.
In general you'd be okay, but client-side validation is a good way to reduce your server's workload if bad requests can be rejected early.
What I'm wondering is if there would be anything wrong with making my CLI incredibly thin.
...
I feel in my case this would simplify development and reduce maintenance work.
People have been doing this for years using telnet/SSH for remoting a CLI that runs on the server. If all the intelligence must be on the server anyway, there might be no reason to have your CLI be a distributed client with intelligence. Just have it be a terminal session - if you can get away with using SSH, that's what I'd do - then the client piece is done once (or possibly just an off-the-shelf bit of software) and all the maintenance and upgrades happen on the server (welcome to 1978).
Of course this only really applies if there really is no requirement for the client to be intelligent (which sounds like the case in your situation).
Using name/value pairs in a request string is actually pretty prevalent. However, at that point, why bother with SOAP at all? Why not just move to a RESTful architecture?
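To illustrate just how thin the client can get, here is a hedged sketch: the entire CLI is "POST the raw command line to the server, print whatever comes back". The /cli endpoint is made up for the example; whether the server speaks SOAP or REST behind it is its own business, and all validation, interpretation and even tab-completion answers would live server-side.

```python
# The "incredibly thin" CLI in its entirety: forward the raw command line,
# print the server's reply. The /cli endpoint is a hypothetical example; all
# validation and interpretation live on the server.
import sys
import urllib.request

SERVER = "https://internal.example.com/cli"  # hypothetical endpoint

def main():
    command = " ".join(sys.argv[1:])
    req = urllib.request.Request(
        SERVER,
        data=command.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        sys.stdout.write(resp.read().decode("utf-8"))

if __name__ == "__main__":
    main()
```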
