Can I use CLUTO and Carrot2 tools to cluster tweets into groups?

Can I use the CLUTO and Carrot2 tools to cluster tweets into groups?
And as a last question: are Carrot2 and CLUTO language-independent NLP tools?

You can use Carrot2 to cluster any natural language texts, including tweets. Carrot2 is fairly language-independent and comes with support for the major languages out of the box.
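For a concrete sense of what such clustering does, here is a minimal, library-free Python sketch of the underlying idea (bag-of-words similarity plus greedy grouping). Carrot2's actual algorithms, such as Lingo, are considerably more sophisticated; the 0.25 similarity threshold here is an arbitrary choice for illustration:

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens, dropping @mentions and URLs."""
    text = re.sub(r"(https?://\S+|@\w+)", " ", text.lower())
    return set(re.findall(r"[a-z']+", text))

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_tweets(tweets, threshold=0.25):
    """Greedy single-pass clustering: each tweet joins the first cluster
    whose token union is similar enough, else it starts a new cluster."""
    clusters = []  # list of (token_set, [tweet, ...])
    for tweet in tweets:
        toks = tokenize(tweet)
        for cluster_toks, members in clusters:
            if jaccard(toks, cluster_toks) >= threshold:
                members.append(tweet)
                cluster_toks |= toks  # grow the cluster's vocabulary
                break
        else:
            clusters.append((toks, [tweet]))
    return [members for _, members in clusters]

tweets = [
    "new phone battery life is amazing",
    "battery life on the new phone is amazing",
    "traffic downtown is terrible today",
]
groups = cluster_tweets(tweets)
```

The two battery tweets share most of their vocabulary and end up in one group; the traffic tweet forms its own. Real tweet clustering would also need stemming, stop-word handling, and per-language tokenization, which is exactly what Carrot2 provides out of the box.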

Related

Local indexing of rich text files

I am trying to create a local index for my notes, which consist mainly of markdown files, text files, and code in Python, JavaScript, and Dart.
I came across Solr and Elasticsearch.
But the main differences between them are focused on online use and distributedness.
Which would be a better choice if I need good integration with JavaScript through Electron?
Keep in mind that the files are on local storage; the focus is not on distributedness but on integration with a JavaScript frontend and efficiency on a local system.
Elasticsearch is more popular among newer developers due to its ease of use. But if you are already used to working with Solr, stay with it because there is no specific advantage of migrating to Elasticsearch.
I believe for your use case either of them would work.
However, if you need it to handle analytical queries in addition to searching text, Elasticsearch is the better choice.
In terms of popularity, a larger community, and documentation, I would say Elasticsearch is the winner; you can look at Google Trends.
You can use Solr along with Apache Tika.
Apache Tika helps extract the content/text of different file formats.
With these, you can index both the metadata and the content of the files into Apache Solr.
You also get an admin tool for analyzing the index and its fields, to determine whether you are achieving the desired result.
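As a sketch of how that combination is typically driven: Solr exposes Tika through its ExtractingRequestHandler at /update/extract. The snippet below only builds the request URL; the core name "notes" and the extra field names are hypothetical, and actually sending the request requires a running Solr instance:

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr/notes"  # assumed local core name

def extract_request(path, doc_id):
    """Build the URL for posting a rich-text file to /update/extract,
    where Solr hands the file to Tika for content/metadata extraction."""
    params = {
        "literal.id": doc_id,      # stored document id
        "literal.path_s": path,    # keep the local path as a field
        "uprefix": "ignored_",     # prefix for unmapped Tika metadata fields
        "commit": "true",
    }
    return SOLR_BASE + "/update/extract?" + urlencode(params)

url = extract_request("notes/todo.md", "todo-1")
```

In a real client you would POST the file bytes to this URL (for example with the `requests` library or Solr's `bin/post` tool) and then query the extracted text like any other field.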

Full text search in Neo4j vs Elasticsearch

Both Neo4j 4.0 and Elasticsearch offer full-text search backed by an Apache Lucene inverted index.
So how is Elasticsearch better than Neo4j's full-text search?
Consider that we are dealing with a knowledge graph as the data storage model, developed in Neo4j.
Apart from that, why should we use Elasticsearch alongside Neo4j 4.0? What does Elasticsearch offer that Neo4j 4.0 does not?
So how is Elasticsearch better than Neo4j's full-text search?
"Better" is largely dependent on your use case. But the tools (Neo4j and ElasticSearch) were built for drastically different purposes.
Neo4j is best when used as a graph-traversal engine, returning data from edge (relationship) based queries. It might have similar capabilities, but it just wasn't meant to be used as a search engine.
Want things like "fuzzy" matching and relevance ranking? Neo4j is not going to do any of that. Also, ElasticSearch is a true out-of-the-box distributed datastore. Neo4j can't distribute without an enterprise license.
Basically, it comes down to business requirements. If a datastore mainly needs to execute graph traversals and serve some simple search-like requests, Neo4j might be enough on its own. Need a full-featured search engine to serve that same data? ElasticSearch is better suited to handle that.
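To make the contrast concrete, here is a sketch of the two query styles: an Elasticsearch match query with fuzzy matching and built-in relevance ranking, and a Neo4j 4.x full-text search, which goes through a procedure call on a previously created full-text index. The index and field names ("products", "name", "productNames") are hypothetical:

```python
# Elasticsearch query DSL: fuzziness and scoring come for free.
es_query = {
    "query": {
        "match": {
            "name": {"query": "elasticsarch", "fuzziness": "AUTO"}
        }
    }
}

# Neo4j 4.x Cypher: full-text search is a procedure call against a
# named index; "~" is Lucene's fuzzy-match operator.
cypher = (
    'CALL db.index.fulltext.queryNodes("productNames", "elasticsarch~") '
    "YIELD node, score RETURN node.name, score"
)
```

Neo4j's procedure does return a Lucene score, but everything around it (analyzers per field, aggregations, highlighting, distributed scaling) is where Elasticsearch pulls ahead as a dedicated search engine.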

What does "ElasticSearch, unlike Solr, was designed from the ground up to be a distributed index" mean?

In a talk, I heard that ElasticSearch
Unlike Solr, was designed from the ground up to be a distributed index
I was wondering what it means that ElasticSearch was designed from the ground up to be a distributed index.
What was Solr designed to be? How is that different from a distributed index?
The first versions of Solr did not support clustering - they didn't even support more than one core inside each instance of Solr. Multicore support was introduced later; then SolrCloud (the clustering support) and collections were introduced with Solr 4.
You did have manual clustering support (i.e. what's known as sharding) and replication support (first through external programs such as rsync, then built-in through http replication) before SolrCloud was introduced, but SolrCloud was the first version that supported it without explicit handling from your own code.
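The kind of "explicit handling from your own code" that pre-SolrCloud sharding required can be sketched as a hash-based router that the application itself had to maintain (the shard URLs are hypothetical):

```python
import hashlib

# Before SolrCloud, application code decided which Solr instance
# received each document (and had to query all shards at search time).
SHARDS = [
    "http://solr1:8983/solr/core1",
    "http://solr2:8983/solr/core1",
]

def shard_for(doc_id):
    """Deterministically map a document id to one of the shard URLs
    by hashing the id, so the same document always lands on the same shard."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

SolrCloud (and Elasticsearch from its first release) moved exactly this routing, plus replication and failover, inside the cluster itself, which is what "designed from the ground up to be a distributed index" refers to.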

How can I integrate Hadoop with Mahout?

How can I integrate Hadoop with Mahout?
I want to perform data analytics and need machine learning libraries.
I would start by reviewing the Mahout site and its tutorials; there are lots of useful links: http://mahout.apache.org
There are a number of books out there that will take you from first principles to producing data analytics; this is probably a good place to start (http://shop.oreilly.com/product/0636920033400.do) if you know Python.

Any NLP API or Utility for Hadoop?

I am working on large-scale text-based analysis. More precisely, I am doing sentiment analysis on Twitter data for particular products.
I am using Flume to pull Twitter data into HDFS.
Is there any NLP API or utility I can apply to these tweets to get correct and meaningful sentiment out of them?
I am looking for an NLP API or utility that I can use in a Hadoop system.
Two possible solutions are:
Integrating NLTK with Hadoop. Some resources: http://strataconf.com/stratany2013/public/schedule/detail/30806, http://www.datacommunitydc.org/blog/2013/05/nltk-hadoop, https://danrosanova.files.wordpress.com/2014/04/practical-natural-language-processing-with-hadoop.pdf
Using Apache Mahout: http://www.slideshare.net/Hadoop_Summit/stella-june27-1150amroom210av2
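As an illustration of the first approach, a Hadoop Streaming mapper can be plain Python reading tweets from stdin. The tiny sentiment lexicon below is purely illustrative; a real pipeline would use NLTK's analyzers (or Mahout's classifiers) instead:

```python
import sys

# Toy lexicon - illustrative only, not a real sentiment resource.
POSITIVE = {"good", "great", "love", "excellent", "amazing"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "broken"}

def score(tweet):
    """Return lexicon hits: positive word count minus negative word count."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def mapper(lines, out=sys.stdout):
    """Hadoop Streaming mapper contract: one input line per tweet,
    emit tab-separated (label, 1) pairs for a reducer to sum."""
    for line in lines:
        s = score(line)
        label = "pos" if s > 0 else "neg" if s < 0 else "neutral"
        out.write(f"{label}\t1\n")

if __name__ == "__main__":
    mapper(sys.stdin)
```

Submitted with `hadoop jar hadoop-streaming.jar -mapper mapper.py ...`, this lets the NLP step scale across the cluster while the heavy lifting per tweet stays in ordinary Python.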
