New to Titan DB, help installing Titan DB - Hadoop

I am new to Titan DB and I have been reading the documentation on this website: http://s3.thinkaurelius.com/docs/titan/0.5.4/. I really could not find much documentation on installing Titan DB. Can I install it on my Windows 7 machine, or do I need to install it on a virtual machine that runs Linux?
Is this the only download I need to get started: Titan 0.5.4 with Hadoop 2, from https://github.com/thinkaurelius/titan/wiki/Downloads?
Do I also need to install Hadoop separately, or does the download above include it?

The Titan distribution you mentioned generally has what you need to get started. I'm not sure what you mean by "installing Titan", because how you set it up depends heavily on what you plan to do with it. If you are just playing around in the Titan Gremlin Console, simply unpack the Titan distribution you mentioned and start it with the included gremlin.sh file (or, on Windows, gremlin.bat). If you plan to use a storage backend like HBase or Cassandra, you need to have those up and running first; check their documentation to understand how that works. The same goes for indexing backends like Elasticsearch. In short, your question is a bit too general to answer with real specifics.
I would recommend that you spend some more time reviewing the documentation you referenced to make sure you understand the Titan architecture and approach. If you skipped Getting Started or didn't fully understand it, I would re-read it and try out the examples. If that's a bit too much, you might even consider stepping back to pure Gremlin and trying the Getting Started docs there.

Related

Install Hadoop, Pig and Hive on a laptop

I want to install Hadoop, Pig and Hive on my laptop. I don't know how to install and configure Hadoop, Pig and Hive, or what software is required to do it.
Please let me know the exact steps required to install/configure Hadoop, Pig and Hive on a laptop.
Also, can I use the Windows OS, and can I install Hadoop on Windows?
For beginners, I would recommend sticking to a good prepackaged Hadoop distribution/sandbox. Even if you want to learn how to set up a Hadoop cluster before using the tools it provides (e.g. Hive), starting from a common distribution is a lot easier, at least in the beginning.
Prepackaged sandboxes for Hadoop run on Linux. But most likely you will not need to do much in Linux to start using Hadoop if you start from these sandboxes. Personally, I think the time you will save by avoiding support and documentation issues with Windows ports will more than compensate for any added effort required to jump into Linux, and you will at least enter the domain of Linux, which is itself a tremendously important tool.
For prepackaged solutions, you may aim at the Cloudera QuickStart VM or the MapR quickstart VM, as these are among the most widely used distributions. By using sandboxes you skip the installation process (which can be hectic if you don't know what you want, especially if you aren't familiar with Linux) and jump right into using the tools. Thanks to the good documentation available from large vendors such as Cloudera and MapR, you will also face fewer issues in accessing the tools you want to learn.
Follow the vendor specific setup guidelines (also listed on the download pages as getting started guides) for further details on setting up the sandbox.
Once you have the sandbox set up, there are a lot of different ways to access Hive and Pig. You can use a command line interface for Hive (called Beeline). If you are familiar with JDBC, you can access Hive through that as well. Installing Apache Thrift enables much wider access options, but you can also save that for later. One possible route from Python is sketched below.
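This is a minimal sketch using the PyHive library, which is my own suggestion rather than something from the guides above; the host, port and username are guesses that depend on your sandbox, and HiveServer2 must be running:

    from pyhive import hive  # third-party package: pip install pyhive

    # Connect to HiveServer2; host/port/username are sandbox-dependent guesses.
    conn = hive.connect(host='localhost', port=10000, username='cloudera')
    cursor = conn.cursor()
    cursor.execute('SHOW TABLES')  # any HiveQL statement works here
    for row in cursor.fetchall():
        print(row)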
I would not recommend learning Pig unless you have a very specific use for it. If you are familiar with Java (or Scala, or even Python, among other options), try writing some MapReduce-style jobs to learn more about how Hadoop works; one such exercise is sketched after this paragraph. Open the Ambari (or Cloudera Manager, etc.) interface which comes preconfigured with these sandboxes and look at the tools and services that come prepackaged with the sandbox. These are the most common ones and can serve as a useful list for starters. Start learning about them (but skip Pig if you can, even if it is preinstalled ;)
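For example, here is a hedged word-count sketch for Hadoop Streaming in Python; the file names and the word-count task are illustrative choices of mine, not something prescribed by the sandboxes:

    #!/usr/bin/env python
    # mapper.py: Hadoop Streaming feeds input lines on stdin
    import sys

    for line in sys.stdin:
        for word in line.split():
            print('%s\t1' % word)  # emit a (word, 1) pair per word

    #!/usr/bin/env python
    # reducer.py: receives mapper output sorted by key, sums counts per word
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip('\n').split('\t', 1)
        if word != current:
            if current is not None:
                print('%s\t%d' % (current, count))
            current, count = word, 0
        count += int(value)
    if current is not None:
        print('%s\t%d' % (current, count))

You would submit the pair with the hadoop-streaming jar that ships with the sandbox, passing mapper.py and reducer.py as the mapper and reducer.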
Once you are familiar with your sandbox, I would suggest going for Apache NiFi, which has an easier learning curve and gives you a lot of flexibility. You will most likely have to set up a new sandbox for that, but it may also serve as a good revision exercise. Integrate it with your Hadoop sandbox, implement some decent use cases, and you will have some good experience to show.

Suggestions for building a small Hadoop cluster for learning purposes

I have a test for my Big Data class where I have to do some sort of big data analytics with 'smaller' datasets. I actually have my stuff figured out: I installed Hadoop 2.8.1 and Spark 2.2.0 (I use PySpark to build a program) in standalone mode on my Ubuntu 16.04, from source. I'm good to go to do my thing on my own.
The thing is, some of my friends are struggling to configure all of this, and I thought to myself, "why don't I make my own little cluster with my classmates?" So I'm looking for suggestions.
My laptop has 12 GB of RAM and an Intel Core i5.
If I understand correctly, your friends have trouble setting up Spark in standalone mode (meaning no cluster at all, just local computation). I don't think setting up a cluster they can work with takes away from the complexity they will face. Or are they trying to set up a cluster? Because Spark's standalone mode really doesn't need much configuration, as the sketch below illustrates.
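A minimal PySpark sketch that runs entirely on one machine, assuming only that Spark 2.x and its Python bindings are installed as in the question:

    # Runs locally with no cluster manager involved at all.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master('local[*]')   # use all local cores
             .appName('local-smoke-test')
             .getOrCreate())

    df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'letter'])
    print(df.count())  # prints 2 if the setup works
    spark.stop()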
Another approach is to use a preconfigured VM that everyone can use individually, either prepared by yourself or one of the sandboxes from providers such as Cloudera and Hortonworks.

The NoSQL database most similar to RethinkDB

I need to do a project in RethinkDB. However, it does not work on Windows. Is there a database that is considered most similar, so that I can learn the ropes and then start my project in a Unix environment? I have heard this could be MongoDB or CouchDB, but I would like some opinions.
Thanks
There are many different ways to tackle this problem.
Installing RethinkDB on Windows
First, you can, in fact, get RethinkDB working on Windows; you just have to use a virtual machine. Here is a tutorial on how to install RethinkDB on Windows. That being said, it's not as simple to get it running on Windows as it is in UNIX environments.
Similar Databases
Which NoSQL database is most similar to RethinkDB depends on which features of RethinkDB you are looking for.
If you're looking for a NoSQL JSON document store with a similar API to use in your server, MongoDB is probably the closest thing to RethinkDB; a short side-by-side sketch follows. That being said, there are some pretty big differences between MongoDB and RethinkDB. I would take an objective look at the differences as well as a more opinionated look at the differences.
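To give a feel for the API similarity, here is a rough Python sketch; the database and 'players' collection/table names are hypothetical, both drivers are installed separately, and the RethinkDB table is assumed to already exist:

    # MongoDB, via the official pymongo driver:
    from pymongo import MongoClient

    mongo = MongoClient('localhost', 27017)
    mongo.test.players.insert_one({'name': 'alice', 'score': 10})

    # Roughly the same operation in RethinkDB, via its official Python driver
    # (the classic `import rethinkdb as r` API of older driver versions):
    import rethinkdb as r

    conn = r.connect(host='localhost', port=28015, db='test')
    r.table('players').insert({'name': 'alice', 'score': 10}).run(conn)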
If you're looking for something that provides realtime features similar to RethinkDB's, you might take a look at Firebase. Firebase is not a NoSQL database, obviously, but it provides some of the realtime features of RethinkDB.
CouchDB might also be a good option, since it's also an open-source NoSQL database and it has support for change notifications, which are a way to pull changes from a particular table (different from a changefeed's push model); the contrast is sketched below.
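To make the push-versus-pull distinction concrete, here is a hedged sketch reusing the hypothetical 'players' table from above; CouchDB's _changes feed is queried over plain HTTP:

    # RethinkDB pushes: this cursor blocks and yields each change as it happens.
    import rethinkdb as r

    conn = r.connect(host='localhost', port=28015, db='test')
    for change in r.table('players').changes().run(conn):
        print(change['old_val'], '->', change['new_val'])

    # CouchDB pulls: the client asks what changed since a known sequence number.
    import requests

    resp = requests.get('http://localhost:5984/players/_changes',
                        params={'since': 0})
    print(resp.json()['results'])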
Native RethinkDB for Windows
If you're interested in keeping track of native Windows support for RethinkDB, you should take a look at this GitHub issue.

Code Combat not interacting locally

I wanted to install Code Combat locally to also be able to understand it better. I have followed the steps (for Mac OS X; I have Yosemite) described at: https://github.com/codecombat/codecombat/wiki/Developer-environment
Everything worked: all the scripts are running without problems, mongo is up and running, and the game starts, but then the game itself cannot proceed.
I haven't restored the mongo dump, which is 2 GB and which I can't download easily with my current internet connection, but it seems to be optional.
Looking at the console, I see a couple of 404s that I can't explain; see below. If somebody could help me get the game running locally, I would be very grateful.
GET /db/thang.type/529ffbf1cf1818f2be000001/version 404
GET /db/level/dungeons-of-kithgard/session 404
There are also 404s for the mp3 files, which I am fine not having.
Thanks in advance,
Matthieu
PS: I would have liked to add more tags, but as this concerns many languages and doesn't have a specific tag, I didn't know which ones to add.
OK, it is in fact mandatory to have the mongo dump, as it holds the base elements for the levels. But 2 GB is too much, and a way will be sought to export only the required data; 200 MB should be enough.
UPDATE: With the following ticket implemented, the size is reduced to less than 100 MB: https://github.com/codecombat/codecombat/issues/1988

Good resource for what to look for on Munin graphs?

I have installed Munin to provide some insight into server performance for a VPS running some small Rails and Sinatra applications. Is there a good resource for reading up on what to look for in the graphs Munin provides? Or a good resource with more details on specific measures (fork rate, swap in/out): what they are telling me, and which signals need to be looked into?
Mainly I am trying to learn which measures I should pay attention to on the server side as I work with some small Ruby applications for fun.
When you install Munin, you get a bunch of default plugins which graph system metrics. To begin with, it would be a good idea to keep an eye on load average, CPU usage, memory, and swap in/out.
If you're not sure exactly how Munin calculates a specific metric, you can try reading the source code of the plugin script. Usually system metrics are obtained from the /proc filesystem. On a Debian/Ubuntu box, Munin plugin scripts are installed (via symlinks) under /etc/munin/plugins. You can install your own custom plugins by simply dropping them somewhere and symlinking to them from /etc/munin/plugins; a minimal example follows.
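A custom plugin is just an executable that prints a graph description when called with the 'config' argument and metric values otherwise. Here is a minimal Python sketch; the choice of metric (1-minute load average from /proc/loadavg) is mine, purely for illustration:

    #!/usr/bin/env python
    # Minimal custom Munin plugin: 1-minute load average from /proc/loadavg.
    # Symlink it from /etc/munin/plugins and make it executable.
    import sys

    if len(sys.argv) > 1 and sys.argv[1] == 'config':
        # Munin runs the plugin once with 'config' to learn about the graph.
        print('graph_title Load average (1 min)')
        print('graph_vlabel load')
        print('load1.label load1')
    else:
        # On a normal run, emit '<field>.value <number>' lines.
        with open('/proc/loadavg') as f:
            print('load1.value %s' % f.read().split()[0])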
