Machine Learning with Google Data - ruby

A couple years ago I saw a fantastic presentation on machine learning based on using Google as the data source. The idea was to leverage Google and Ruby to get more people involved in the concepts of machine learning since massive amounts of data are now readily accessible. For the life of me I have not been able to find this presentation. I realize that this wouldn't normally be a very good format to ask this question, however the content was so valuable and well presented that I felt we would all be enriched by having another pointer to this information.
Although I realize this is somewhat vague, can anyone refer us to this original video presentation?
If not, could you share some useful links that would get one started down this road of machine learning leveraging massive data sources that now exist and are generally available?

As was noted in the comments, the link is: Intuition & Data-Driven Machine Learning
He particularly piqued my interest with this quote: "... in certain cases, you are simply better off working on getting more data, than spending your time on improving the algorithm..."
Excellent presentation and presenter (Ilya Grigorik)! Highly recommended for anyone wanting to start down the path of machine learning.

Related

Any phalcon vs chicagoboss benchmarks?

These frameworks are the future of a fast internet, but I can't find any benchmark or feature comparison of them on Google. What framework in which situation would be better, for example for building a high-load online shop? For building a Stack Overflow clone?
Could someone also explain the basic differences in memory management and request handling, please?
Though the official documentation links to TechEmpower, ChicagoBoss is not mentioned anywhere. Looking closely at ChicagoBoss, it seems to be targeted mostly at Erlang developers, and Erlang is not the most popular language out there. I'm fanatical about Phalcon, but I feel that ChicagoBoss would be faster and more resource efficient out of the box. But… writing your entire app in binary code right away would be even better in that sense.
In less than two years Phalcon achieved greater popularity and reputation than ChicagoBoss did in five. There is significantly more information and support out there for Phalcon, given that all standard PHP rules and information apply to it as well. Phalcon's next big release is under active development and looks very promising.
What framework in which situation would be better for example for building highload online shop? For building stackoverflow clone?
I'm certain that neither Amazon nor SO uses either of them; both rely on a lot of caching and infrastructure optimisation to get where they are, which is a job for a different type of framework.
Phalcon is a great lightweight tool for building unique projects with a focus on high performance. It behaves very nicely with PhpStorm, and development and debugging are a pleasure most of the time. But be warned, it will cause some headaches (there are a few bugs and some information is hard to come by). It isn't the best choice for enterprise software: you will spend a lot of time figuring out how things work and how to fix some of them.

Learning Oracle and GeoSpatial Systems

Lately, I have become more engrossed in learning Oracle and geospatial systems. I feel that mapping systems, combined with solid data structures, are two technologies that are carving out their niche in today's market.
If you are starting to learn about these technologies, where would you recommend starting off? If I understand correctly, the best way to learn them would be through actual work (or hobby), but I can't seem to find good places to get the resources to do so.
I would appreciate any advice, tips, resources and information everyone could provide to jump-start my learning and understanding of these technologies.
Thanks.
Update:
I saw a nice PDF about this, but for a hobbyist wanting to learn it, are there free tools to start off with?
http://download.oracle.com/otndocs/products/mapviewer/pdf/mv11g_spatialvis_inobiee.pdf
You appear to be interested in OLAP/BI combined with GIS/mapping.
See information on Spatial OLAP (aka SOLAP) at http://www.spatialbi.org/ , as well as this list of tools at http://spatialolap.scg.ulaval.ca/DevApproaches.asp
Also, see GeoKettle at http://www.spatialytics.org/

Refactoring Practice/Workbook

I saw the Refactoring Workbook while I was browsing Amazon the other day. I haven't actually gotten to read it yet, but it presents an interesting idea. The most enticing part of a "workbook" is that we can finally have everyday practice for dealing with tough problems in a systematic way.
Onto the question. Does such a resource exist online or in other books? I know someone is going to suggest Open Source, but some of those projects require understanding of a huge context. I'm looking for something I can pick up, read a few pages, and refactor. Consistently.
As a side note, if such a resource doesn't exist online - it'd be a gold mine of an idea.
Industrial Logic has e-learning resources that I think are somewhat like this.
They aren't free, and I haven't seen enough of them to vouch for the quality, but I know some of the people involved in creating these materials and would expect they're good.

Best practices for online help

We're currently developing a fairly complex web portal. To improve the user experience, we want to provide a context-sensitive online help system that can aid the user in understanding certain aspects of the site.
In our case, the site has a variety of widgets that display all kinds of tabular data, graphs, etc. For instance, one such widget may display the VIX, and the help system would offer a brief description of what the VIX is.
Now, I've looked around on the internet and found some interesting articles such as the Design Checklists for Online Help, but most of what I found seems fairly outdated. What I'm specifically interested in are design issues such as these:
whether (or when) to use popups, divs, or link to external pages
how comprehensive should the help entry be? how much is the average user willing to read?
what's a good way to provide access to the help system? cluttering the UI with question-mark icons is certainly not optimal
should the help entry be loaded on demand with AJAX (kinda sucks, you want the info right away) or preload it (causing tons of unnecessary traffic)
other dos and don'ts
The answers to some of these questions may seem obvious, but when it comes to usability I've found that the intuitive answer isn't always the best. Secondly, I'm a software developer and as such I tend to look at things from an engineer's point of view. And I think we all know that this is, more often than not, a pretty poor angle from which to approach the design of a user interface. This is why I would very much like to get some feedback from people more experienced in this field.
See here:
https://ux.stackexchange.com/questions/1351/best-practices-for-online-help

What do you do when you're suddenly thrown onto a large project? [closed]

I recently started a career in software development after graduating a couple of years ago in CS. The current project I'm on is a large ongoing project that has its origins in the 90s, with a mix of C, C++, and Java. There are multiple platforms (UNIX, WIN, etc.) being supported, older technologies in use like CVS, and some dated documentation in some areas.
My software development skills stem almost entirely from university, as I've had little real-world experience. I felt like I had a decent foundation in CS, but I cannot help but feel slightly overwhelmed by it all. I'm excited to be part of something so huge, but at the same time I feel like it's a lot of information to absorb.
My coworkers have been great people and answer a lot of questions I. My employer hired me knowing that I am entry level.
I've tried poking around the source code and examining how everything gets built but it's on a scale I've never seen before.
How do more experienced people situate themselves when joining a large ongoing project? What are some common tasks you do when getting yourself up to speed?
Good question. I haven't had your exact experience, but in cases like this I like to think, "how do you eat a whale?" The answer is (predictably) "one bite at a time." Reasonable people won't expect you to grasp the whole thing immediately, but they will want to see progress. Perhaps there are some small areas of the larger project that are not too complex, without too many dependencies. Work toward understanding one of those and you're one 'bite' (and/or 'byte') closer to expertise on the whole project.
Once familiar with all the existing documentation, I would try to get the big picture. Literally.
generate a TreeMap of the source code
I would use GrandPerspective on Mac or WinDirStat on Windows. It will give you some insight into the structure of the project's files (sometimes it gives some hints about the code structure). With this in hand, you can ask your colleagues about some of the clusters: what they do and how they relate to each other.
learn how to build the project
It is important to keep the project compiling at all times if you are about to make any changes. Having tests executed at build time is always a good thing, so ask about that as well. Even better if there is some kind of continuous integration server in place. If there is, look at its configuration and figure out how the build is done. If there is no CI server, but you already know how to build the project, set one up on your local machine and show it to your colleagues. They should fall in love with it.
browse the source code with Structure101 or similar tool
This is especially useful for Java projects. The tool does a great job. It will give you more details about the code structure, and sometimes about the system architecture. The experience can be hard at times: you may learn from this tool that the code is basically a Big Ball of Mud ;)
look for tests, and explore them
If you are lucky there may be some JUnit or CppUnit tests. It is always good to try to understand what those tests are doing. They may be a good starting point for exploring the code further.
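If you can't install a treemap tool on the machine where the source lives, a rough text version of the same overview is easy to script. This is only a sketch of the idea, not what GrandPerspective or WinDirStat actually do: it adds up file sizes under each top-level directory so you can see where the bulk of the project sits. The function name `size_by_top_level_dir` is made up for this example.

```python
import os
from collections import defaultdict


def size_by_top_level_dir(root):
    """Return {top-level subdirectory: total bytes of the files beneath it}.

    Files that sit directly in `root` are grouped under ".".
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                totals[top] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Broken symlink or unreadable file: skip it.
                continue
    return dict(totals)


def print_report(totals):
    """Print the directories from largest to smallest."""
    for name, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size:>14,d}  {name}")
```

Calling `print_report(size_by_top_level_dir("path/to/checkout"))` gives a list you can take to a colleague and ask about the largest entries first.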
My coworkers have been great people and answer a lot of questions I. My employer hired me knowing that I am entry level.
You have little to worry about: your employer knows what you are capable of and your co-workers seem eager to help you out. To be honest, most developers love explaining things to others...
From what I've seen, it truly takes 6+ years to become fully knowledgeable in a language, so don't expect to become a guru within a year... and even these so-called gurus end up learning something new about their language every day.
Learning a new (large) system will always take time... These systems were usually not built in 2 weeks but over many years, so don't expect to understand it fully yet. You'll eventually discover what each part does, piece by piece.
I know how you feel, because I felt like that once...
"I took a speed reading course and read 'War and Peace' in twenty minutes. It involves Russia." (Woody Allen)
I agree with what the others said before me. You need some tools that give you an overview of the code. I personally used inFusion (http://www.intooitus.com/inFusion) because it also gives other interesting data besides structure.
The method that has worked best for me is to grab a copy from source control, with the intention of throwing this version away...
Then try and refactor the code. It is even better if you can refactor the code that you know you will be working on at a later stage.
The reason this is effective is because:
Refactoring gives you a goal to aim towards. Whereas "playing" and "breaking" the code is great, it is unfocused.
To refactor code you really have to understand the code.
Refactoring leaves code that has fewer concepts to retain in memory. If you don't understand a large codebase, it's not because you are a graduate; it's because nobody can retain more than 7 (give or take a few) concepts at a time.
If you follow correct refactoring guidelines it means you will be writing tests. However, make sure that you will actually be working on the modules that you are testing, as writing tests can be very time consuming (although very rewarding).
Do invest in buying this book at some point:
http://www.amazon.co.uk/Refactoring-Improving-Design-Existing-Technology/dp/0201485672
But these links should get you started:
Signs that your code needs refactoring and what refactoring to use (From Refactoring - Martin Fowler)
http://industriallogic.com/papers/smellstorefactorings.pdf
A taxonomy of code smells:
http://www.soberit.hut.fi/mmantyla/BadCodeSmellsTaxonomy.htm
Good luck!!!
I agree with the first comment, but I also think that you have to learn and see the big picture in some way. You have to at least trace the main flow through the code.
I was in the exact same situation several years ago when I joined a software project with 50+ ClearCase version control VOBs, 5 million lines of code, and some of it dating back to the 1980s.
The first thing I did was look through every source-controlled directory and make a quick summary of my best guess about what the software in that folder did and what language the code was in. You can make a pretty good guess by looking at filenames and any comments or documents in those folders.
I then looked at the build scripts to see if they were readable enough to get an idea of dependencies between different parts of the code.
Finally - and I believe this was the most valuable - throw an IDE like Eclipse or NetBeans on top of the code and start reading through pieces of it. Having the ability to jump to the definition of any functions or classes using the IDE allows you to move around a massive software baseline with relative ease.
Overall, have some confidence - it is unlikely that anyone else on the project knows all of the code, so you don't need to either. Use what other people said to get a good idea of the overall project and interfaces and requirements (if they exist) and poke through the code to get an idea of the most commonly used classes and methods.
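The directory-by-directory summary can be given a head start with a small script. The sketch below is not what I ran back then; it is just one way to get a first guess at the language used in each folder by counting file extensions. The extension-to-language mapping and the function name `languages_by_dir` are assumptions for this example, so adjust them to whatever the codebase actually contains. Figuring out what each folder does still means reading filenames, comments and documents by hand.

```python
import os
from collections import Counter

# Assumed mapping from file extension to language; extend as needed.
EXT_TO_LANG = {
    ".c": "C",
    ".h": "C/C++ header",
    ".cpp": "C++",
    ".cc": "C++",
    ".hpp": "C++",
    ".java": "Java",
}

# Version-control metadata directories that are not worth descending into.
SKIP_DIRS = {"CVS", ".git", ".svn"}


def languages_by_dir(root):
    """Return {relative directory: Counter of languages found directly in it}."""
    summary = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        counts = Counter()
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            lang = EXT_TO_LANG.get(ext)
            if lang is not None:
                counts[lang] += 1
        if counts:
            summary[os.path.relpath(dirpath, root)] = counts
    return summary
```

Printing `counts.most_common()` for each directory in the result gives a one-line starting point per folder that you can then annotate with your own notes.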

Resources