How to show the parallelism of requests in a nice graphical view? - ruby

I currently have the following setup:
An object graph of all requests read from an application server log file.
Each line is represented as a RequestPart with the following information: start time, stop time, tier, and the application part that was executed.
I would like to draw a graph that shows the following:
Different colors for the tier the request part is in.
Requests that run in parallel should visibly overlap.
The relation between start and stop times should be shown (approximately, not exactly).
My first idea was to fill the rows of an Excel sheet with the requests and color each cell according to the time, the tier, and so on. But then I found out that Excel only allows 2^8 columns per sheet (with Excel < 2010), so that is not an option.
I'm a Ruby boy, so I checked RMagick and Gruff, but I don't like that in the end I only have an image, so no further analysis is possible. Does anyone have an idea what to do? (Well, last resort: install Excel 2010, but my company will not like that.)

Check out the open source Timeline widget.
Added
Tips for using it:
Send your data as JSON; it parses faster on the client than XML.
Suggest that your clients use Firefox, Safari, or (fastest) Google Chrome.
Even faster date parsing: send JavaScript date literals for the client to evaluate. Of course, at that point you are no longer sending valid JSON, but it is the fastest way to send the data.
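As a rough illustration of preparing the data for such a timeline, here is a minimal Python sketch. It assigns overlapping request parts to separate lanes (so parallel work is drawn side by side) and serializes them as JSON. The RequestPart fields follow the question; the tier names, the colors, and the shape of the JSON ("events", "lane", and so on) are assumptions made for this example, not the format any particular timeline widget requires.

```python
import json
from dataclasses import dataclass


@dataclass
class RequestPart:
    start: float  # e.g. seconds since the start of the log
    stop: float
    tier: str
    action: str


# Hypothetical tier names and colors, chosen only for this example.
TIER_COLORS = {"web": "#1f77b4", "app": "#2ca02c", "db": "#d62728"}


def assign_lanes(parts):
    """Greedy interval partitioning: parts that overlap in time get different lanes."""
    lane_ends = []  # lane_ends[i] is the stop time of the last part placed in lane i
    placed = []
    for part in sorted(parts, key=lambda p: p.start):
        for lane, end in enumerate(lane_ends):
            if end <= part.start:  # lane is free again, reuse it
                lane_ends[lane] = part.stop
                break
        else:  # every existing lane is still busy, open a new one
            lane = len(lane_ends)
            lane_ends.append(part.stop)
        placed.append((lane, part))
    return placed


def to_events_json(parts):
    """Serialize the parts as JSON (assumed event shape, adapt to your viewer)."""
    events = [
        {
            "start": part.start,
            "end": part.stop,
            "title": part.action,
            "color": TIER_COLORS.get(part.tier, "#7f7f7f"),
            "lane": lane,
        }
        for lane, part in assign_lanes(parts)
    ]
    return json.dumps({"events": events})
```

The number of lanes in use at any moment equals the number of request parts running in parallel, which is exactly the overlap the question wants to see.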

Related

Python structure for working with variable log messages in a large file

I am trying to get data out of debug log messages created by a certain piece of open source software. It has many lines describing what it is doing during each stage. It does not have a specific structure: some data spans multiple lines with different indents and no separator, so it does not import nicely into a pandas data frame, which would usually be my go-to.
Is there a good way to structure a Python script that parses this data, can be reused in the future for the same purpose, and can be extended to extract different data? I have to do a bunch of different steps to extract the data. The other complication is that the file is much too big to store in memory (10^6 lines), so I need to iterate through the lines.
Could anyone give me some tips on how to do this? Is it best to do each step and save the result to a new file? My other idea is to create a data object and store relevant line numbers as attributes in lists, which are generated by different methods. Each subsequent method then only loads the lines from that list.
Or alternatively, maybe I am using the wrong tool entirely and need to learn awk or regex commands to do it? I already know Python, so I have a preference for it. I am not necessarily looking for a specific answer; some tips and pointers would also be very useful!
(--details--) On a FreeRADIUS server, I am trying to trace the difference between the log messages for requests, accepts, and rejects of a MAC address, to see if I can find out why it is sometimes accepted and other times rejected, seemingly at random.
There were a lot of plugins running on the server before I got to dealing with it, so the debug output is a massive wall of text that labels each request with a number. I can split it into requests by that number, find the requests that mention the MAC, and split those requests into different files. Then I want to filter out all the boilerplate info that comes with each message and get to the things that differ between them. (--details--)
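One way to structure this in plain Python is a small pipeline of generator functions that stream the file line by line. The sketch below makes two passes: the first collects the numbers of the requests that mention the MAC, the second keeps only the lines of those requests. It assumes each relevant line starts with the request number in parentheses, such as "(12) some text"; that prefix format, the function names, and the file name in the usage comment are assumptions for illustration. Lines without such a prefix are skipped in this sketch.

```python
import re
from collections import defaultdict

# Assumed line format: "(<request number>) <text>".
REQUEST_PREFIX = re.compile(r"^\((\d+)\)\s?(.*)$")


def iter_request_lines(lines):
    """Yield (request_id, text) for every line that carries a request number."""
    for line in lines:
        match = REQUEST_PREFIX.match(line.rstrip("\n"))
        if match:
            yield int(match.group(1)), match.group(2)


def find_request_ids(lines, needle):
    """First pass: ids of all requests with at least one line containing `needle`."""
    return {rid for rid, text in iter_request_lines(lines) if needle in text}


def collect_requests(lines, wanted_ids):
    """Second pass: keep only the lines of the wanted requests, grouped by id."""
    grouped = defaultdict(list)
    for rid, text in iter_request_lines(lines):
        if rid in wanted_ids:
            grouped[rid].append(text)
    return dict(grouped)


# Usage sketch (hypothetical file name and MAC); a file object is an iterable
# of lines, so the whole log is never held in memory:
#
#   with open("radius-debug.log") as log:
#       ids = find_request_ids(log, "aa-bb-cc-dd-ee-ff")
#   with open("radius-debug.log") as log:
#       requests = collect_requests(log, ids)
```

Further steps, such as stripping boilerplate or comparing an accepted request with a rejected one, can then be written as additional small functions that work on the much smaller `requests` dictionary.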

Dynamically loaded Markers: DDOS prevention

My app shows a map where locations (or markers) are loaded dynamically via an Ajax (and database) request after every change of the map bounds.
I'm convinced that this solution is not scalable: at the moment, the Europe area shows a total of 10 markers.
If the database grows and I display, for instance, 1000 locations, that means 1000 rows would be returned to the user.
This is not a JS / UI issue, since I use the MarkerCluster plugin and avoid redrawing the markers of locations that are already loaded.
I made some tweaks:
- Delay the Ajax request by using the Google Maps idle event
- Increase the minimal zoom level, so the entire world can't be displayed.
But this is not enough.
There are lots of ways to approach this but I will just put here the two I think are most appropriate from your question.
The first is to really control, from your web app, what information is requested and when. You could write this all yourself in JavaScript and implement caching techniques, etc. There are a number of libraries out there that do most of this work for you, though.
I would recommend one of the following:
OpenGeo SDK
OpenLayers
GeoExt
Leaflet
All of these have ways of controlling local caching, when to get the data, and what data is fetched from the server. Most of them can also be extended to add any functionality that is missing. The top two, as far as I know, also support Google Maps (as well as a number of other map providers).
If you need to add even more control over your data locally you could even look at implementing something like PouchDB. I think this is more suited to mobile applications or instances where the network connection is either really slow or intermittent.
This sort of solution should easily handle thousands to tens of thousands of features with hundreds of users.
If you are really going to scale up to hundreds of thousands or millions of features with hundreds to thousands of users, then I would suggest adding a tile server to the solution above. The tile server sits between your web application and your database. Most of them have lots of caching settings and optimizations for dealing with large datasets and pushing them out to a client. Because they push out tiles rather than features, the data output remains reasonably constant even as the number of features grows. The OpenGeo SDK and OpenLayers libraries I mentioned above work really well with any of the following tile servers:
GeoServer
Mapserver
MapGuide
Quantum GIS Server
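To illustrate why tile-based requests stay bounded and cache well, here is a small Python sketch that maps a viewport to tile coordinates. It assumes the common XYZ ("slippy map") tiling scheme on Web Mercator; whether a given tile server uses exactly this scheme is an assumption, and the function names are made up for the example. Viewports that cross the antimeridian are not handled.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing a point, XYZ scheme (assumed)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the map edge still fall into a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bounds(west, south, east, north, zoom):
    """List the (zoom, x, y) keys of all tiles covering a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]
```

Every user looking at the same area at the same zoom level asks for the same (zoom, x, y) keys, so responses can be cached per tile no matter how the exact map bounds differ between users.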
If you are reluctant to do any coding, there are some products that work out of the box for enterprise environments. They are all expensive, and from your question I think they are probably not what you are looking for.

Need Recommendation on impression tracking

I'm doing research for work on tracking impressions of a small web app embedded in (authorized) third-party websites. I need to analyze the impressions in close to real time.
I know there are at least two ways:
1) Use an image, and parse the server log for reporting.
2) Have JS send an Ajax request, and save the request in a DB (MySQL, Mongo, or another NoSQL store).
So, which way is faster and can handle tons of traffic?
I suspect that the server log is slower because it has to append to a file, but I'm not sure whether it really is slower.
So, what are the pros and cons of each approach? Thanks. :)
P.S. I can't use Google Analytics because there is a limit on Data Export..and also other limitations. :-)
Both options are valid. The image with server logs is simple and works as long as the visitor loads images. It is faster in most cases, since there is no extra processing.
If using JavaScript, I would do what the web analytics companies do: create an image call with JS, and at the other end have either an image file with server logs or a script that writes the data to a DB and returns a 1x1 pixel transparent GIF.
If all you need is impressions, I would go with the simpler solution, less to go wrong.
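For reference, the script variant of the tracking pixel could look roughly like the following Python WSGI sketch. It records a few request fields through a callback and answers with a 1x1 transparent GIF. The `record` callback, the recorded field names, and the port in the usage comment are assumptions for this example; in practice the callback might append to a log file or write to a database.

```python
import base64
import time

# A 1x1 transparent GIF, decoded once at import time.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def make_pixel_app(record):
    """Build a WSGI app that calls `record(hit)` and returns the pixel."""

    def app(environ, start_response):
        record({
            "time": time.time(),
            "query": environ.get("QUERY_STRING", ""),
            "referer": environ.get("HTTP_REFERER", ""),
            "user_agent": environ.get("HTTP_USER_AGENT", ""),
        })
        start_response("200 OK", [
            ("Content-Type", "image/gif"),
            ("Content-Length", str(len(PIXEL_GIF))),
            ("Cache-Control", "no-store"),  # ask browsers not to cache the pixel
        ])
        return [PIXEL_GIF]

    return app


# Usage sketch with the standard library server (fine for local testing only):
#
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, make_pixel_app(print)).serve_forever()
```

The embedding page then only needs an image whose URL points at this endpoint, with any extra data passed in the query string.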

IE performance for rendering huge html data

I have a few Perl CGIs which query almost the whole table, return more than 5000 rows, and send that data to browsers. The size of the generated HTML is around 1MB.
Earlier I was using tables (which should be the ideal approach).
Unfortunately most users use IE, and it does not display the data until it receives the closing table tag. Can we do something about that?
To push output as soon as it is generated, I tried another approach using printf and <pre>. This reduced the response size by 200KB, and it appears to display faster.
Again, IE (and no other browser) eats up CPU and hangs for a couple of seconds... :-(
Can we do something about that too?
FYI I am using IE8.
Perhaps it would be wise from a UX perspective to use some sort of pagination method. Having a single page with thousands upon thousands of rows sounds entirely unfriendly for the end user. Something like a simple means of pagination ("Skip to page =dropdown=") would certainly solve your problem, as well as decrease load times and increase usability.
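As a minimal sketch of the idea (in Python for illustration, since the logic is the same in a Perl CGI), the server picks one page of rows per request instead of sending all of them. The function names and the page size of 100 are assumptions for the example.

```python
import math


def paginate(rows, page, per_page=100):
    """Return (rows_on_page, total_pages); out-of-range pages are clamped."""
    total_pages = max(1, math.ceil(len(rows) / per_page))
    page = min(max(page, 1), total_pages)
    start = (page - 1) * per_page
    return rows[start:start + per_page], total_pages


def limit_offset(page, per_page=100):
    """Values for a SQL 'LIMIT ? OFFSET ?' clause, so the database does the slicing."""
    return per_page, (max(page, 1) - 1) * per_page
```

With 5000 rows and 100 rows per page, the browser receives roughly a fiftieth of the HTML on each request, which also sidesteps the rendering stall.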
There are also several solutions which are pre-built and would likely integrate rather easily. One which comes to mind almost immediately is Sencha's Paging Grid:
http://dev.sencha.com/deploy/dev/examples/grid/paging.html
http://www.sencha.com/products/js/
It's pretty nifty and you'll likely get some kudos for using a hip new technology. There are other options, too:
Ingrid: http://www.reconstrukt.com/ingrid/src/example3.html
YUI Data Grid: http://developer.yahoo.com/yui/datatable/#data
MyTableGrid: http://pabloaravena.info/mytablegrid/
Hope this helps!
More people use IE?
http://www.w3schools.com/browsers/browsers_stats.asp
Anyway, why do you still use tables?
We all know they are easy to maintain, but they are also easy to get confused by and, in your case, slow...
Output the data as divs instead. They display the data as it arrives and do not need to wait for a closing table tag, since div and span carry the fewest default properties.

Text difference patch

I am trying to write a piece of code which allows the user to type text into a textbox, which then gets saved on the server. When the user types some more text into the textbox, I want only the difference to be sent to the server.
Is there a diff algorithm for JS which I can use to send only information about the difference? Essentially, it should be able to tell the difference between two versions of the text.
It could also be language agnostic and I can port it.
Thank you for your time.
UPDATE
In simple words: I have a text area which saves the text in the box every X seconds. To save bandwidth, I only want it to send the difference from the last saved revision (which I can, say, keep in a variable; initially it will be empty). The JS has to check the difference between the last revision and the current state of the textbox, and generate a change list to send to the server.
UPDATE 2
Something like www.etherpad.com
Google DiffMatchPatch has a JavaScript implementation. I've used it with much success.
http://code.google.com/p/google-diff-match-patch/
The Python difflib module does this and more. It's very flexible but might be challenging to port to JavaScript.
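To give an idea of what a change list built with difflib could look like, here is a minimal sketch. It turns the opcodes of `difflib.SequenceMatcher` into (start, end, replacement) tuples relative to the old text and shows how a server could apply them. The tuple format and the function names are assumptions for this example, not a standard patch format.

```python
import difflib


def make_changes(old, new):
    """Return a list of (start, end, replacement) edits that turn old into new."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' and 'insert' all fit this shape
            changes.append((i1, i2, new[j1:j2]))
    return changes


def apply_changes(old, changes):
    """Apply the edits back to front so earlier offsets stay valid."""
    result = old
    for start, end, replacement in reversed(changes):
        result = result[:start] + replacement + result[end:]
    return result
```

Only the changed spans travel over the wire; the server needs the previous revision to reconstruct the new one.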
Regarding your update, I'm first wondering why you need to worry about bandwidth. Unless your users are typing a lot of text into an edit box (which has its own usability issues) then there just aren't that many bytes to send. Send the whole text box each time you autosave. Users can't type fast enough to really notice the use of bandwidth.
Or, you could meet halfway. Every time you autosave, check to see whether the user has only added new text to the end compared to the last time. If so, send an "append" type update with just the new text. If the user has gone back and edited anything else, then send a "replace" type update where you send the whole text. This takes care of the common append-only case without severely complicating your implementation.
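The halfway approach is small enough to sketch in a few lines. The version below is in Python for illustration (the client side would be the same logic in JavaScript); the message shape with "type" and "text" keys is an assumption for the example.

```python
def build_update(last_saved, current):
    """Decide what to send on autosave; None means nothing has changed."""
    if current == last_saved:
        return None
    if current.startswith(last_saved):
        # Common case: the user only typed more text at the end.
        return {"type": "append", "text": current[len(last_saved):]}
    # The user edited earlier text, so fall back to sending everything.
    return {"type": "replace", "text": current}


def apply_update(stored, update):
    """Server side: produce the new stored text from an update message."""
    if update is None:
        return stored
    if update["type"] == "append":
        return stored + update["text"]
    return update["text"]
```

After each successful save, the client sets its last saved revision to the current text, so the next comparison starts from what the server already has.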
Instead of calculating a diff between two texts, which is difficult, you could record the keystrokes and the caret position in the textbox while people are editing. If you send these over every now and then (and clear the buffer), the server can play back the exact same sequence.
This code-smells of premature optimization. Perhaps you should implement your solution first and then see about optimizing your transfer rates using diffs. How much text are you looking at? Because the request and response packets are going to be more or less the same size with only a few bytes difference for your message, so the savings could be very minimal.
At the very least, complete your solution without optimization and profile your network traffic using tools like Firebug and then test to see how much worse the performance is with what you would consider to be the maximum text block that could be sent.
Finally, you could always use the TypeWatch jQuery plugin to listen for change events in the textbox. You can set a delay so that once the user finishes typing and the delay elapses, the callback function is triggered. This means that the text will only be sent when the user types something, and only when they have finished typing. This will be significantly more efficient than sending the text to the server on a fixed timer.
It depends on how far you are ready to go.
You might want to look at the binary delta (svndiff) format, which is used by svn in particular: http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff

Resources