How to implement faster search on a website using Apache Solr? - performance

I want to use Apache Solr on my website in order to make search faster.
I need Java code to index data from a MySQL database so that I can perform faster searches.
Can anybody please tell me how to implement this?

You can start by looking at the Solr DataImportHandler.
It lets you index data from a database into Solr.
You will still need to tune Solr for performance, though; how depends on how much data you have and what kind.
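A minimal DataImportHandler setup has two parts: registering the handler in solrconfig.xml and describing the database in a data-config.xml. This is only a sketch; the database URL, credentials, table, and field names below are illustrative and must match your own schema.

```xml
<!-- solrconfig.xml: register the DataImportHandler and point it at the
     import configuration file. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

```xml
<!-- data-config.xml: hypothetical "products" table; adjust the JDBC URL,
     credentials, query, and column-to-field mapping to your schema. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="product"
            query="SELECT id, name, description FROM products">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```

A full import can then be triggered by requesting /dataimport?command=full-import on the core.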

If you specifically want to use Java code to add data to the index, you should use the SolrJ client. For the specific case of adding data, focus on the Adding Data to Solr section. However, as @Jayendra pointed out, you can use means other than Java, like the DataImportHandler, to load data into Solr. Also, please refer to the Integrating Solr page for a list of additional Solr client/language bindings.
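For reference, indexing a document with SolrJ looks roughly like the sketch below. The core URL and field names are assumptions; in a real application you would read rows from MySQL via JDBC and add them in batches rather than one at a time.

```java
// Minimal SolrJ indexing sketch; requires the solr-solrj dependency and a
// running Solr core at the given URL (both are assumptions here).
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class Indexer {
    public static void main(String[] args) throws Exception {
        SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("name", "Example product");

        solr.add(doc);
        solr.commit();   // make the document visible to searches
        solr.close();
    }
}
```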

Related

Is there an application client for Elasticsearch 6.4.3 (similar to DBeaver)?

I tried to view my node's data from an application client (like DBeaver), but I didn't find any information about that. Has someone found a way to connect DBeaver to this version, or to view the data with a similar application?
I believe what you are looking for is a GUI for Elasticsearch.
The industry typically refers to the Elasticsearch stack as the ELK stack, and I believe what you are looking for is the K part of it, which is Kibana.
I'm not sure if you are asking about SQL support, but if you are thinking of making use of it, you can check the Elasticsearch SQL feature.
Another widely used client application for Elasticsearch is Grafana. Others are available too (Splunk, Graylog, Loggly), but I believe Kibana and Grafana are the best bet.
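If you do go the SQL route, you can query the endpoint directly over HTTP without any extra client. A sketch, assuming a 6.x cluster with X-Pack on localhost and a hypothetical index named logs (in 6.x the path is _xpack/sql; it became _sql in 7.x):

```shell
# Run a SQL query against Elasticsearch 6.x (X-Pack SQL, available
# from 6.3). Index name "logs" is an assumption.
curl -X POST "localhost:9200/_xpack/sql?format=txt" \
     -H 'Content-Type: application/json' \
     -d '{"query": "SELECT host, COUNT(*) FROM logs GROUP BY host"}'
```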
Hope this helps!
Actually no. I am using Elasticsearch as a database in different deployments, and I don't want to maintain a Kibana instance (I prefer to see all the data in a tool like DBeaver).

How to integrate AEM with ElasticSearch?

I have been through all the sites currently available on AEM & Elasticsearch, but could not find anything specifically about integrating the two.
Requirement: to create site-search functionality for publish which will bring up all the results related to a particular keyword. Currently we are using the default AEM site-search functionality, which is very slow, so we want to migrate to ES. Very little documentation is available on integrating the two, so we are struggling with it. Mainly we have to do this in Java.
That's because your question is very vague. You have not specified what it is you are trying to achieve. Do you want the search results on the AEM publish side to be served by Elasticsearch, or do you want all your content (even on AEM author) to be indexed? There are multiple patterns, hence it is not possible to give a general answer. There are multiple ways you can integrate:
1) Write custom replication agents in AEM to push content to ES.
2) Create a workflow that is triggered by launchers whenever a node is added/modified. I would suggest you refrain from this and consider option 1 instead, as it will spawn too many workflow instances and impact overall performance.
3) Write crawlers that crawl your AEM publish instance and index the content in ES.
4) Write code that runs in ES (a "river" in ES terminology) to fetch content from AEM and index it.
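Whichever option you choose, the indexing call itself boils down to an HTTP PUT of a JSON document into an Elasticsearch index. A minimal sketch using only the JDK is below; the host, index name, document fields, and the naive JSON building are all assumptions. A real replication agent or crawler would extract the title, path, and text from the page and use a proper JSON library.

```java
// Sketch: push one page into Elasticsearch with a plain HTTP PUT.
// Host, index name ("pages"), and field names are illustrative; the
// string-based JSON building here does no escaping and is for sketch
// purposes only.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class EsPusher {

    // Build the JSON document to index from the extracted page data.
    static String buildPayload(String title, String path, String text) {
        return String.format("{\"title\":\"%s\",\"path\":\"%s\",\"text\":\"%s\"}",
                title, path, text);
    }

    // PUT the document into the "pages" index under the given id.
    static int indexPage(String esUrl, String id, String json) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(esUrl + "/pages/_doc/" + id).openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();   // 200/201 on success
    }
}
```

Usage would be something like `EsPusher.indexPage("http://localhost:9200", "content-site-en", EsPusher.buildPayload("Home", "/content/site/en", "Welcome"))`, called from whichever hook (replication agent, workflow step, or crawler) you settled on.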
Here is a complete implementation of Apache Solr, Elasticsearch, and Apache Lucene with AEM 6.5: https://github.com/tadijam64/search-engines-comparison
There is a detailed explanation of how each search engine works and how it is integrated with AEM, explained step by step in six write-ups here.
It's an old repo, but it may help you with the integration:
https://github.com/viveksachdeva/elasticsearch-cq
I know this is an old question, but I had the same problem and came up with a new implementation, which you can find on GitHub:
https://github.com/deveth0/elasticsearch-aem
Usage is quite easy: you include several bundles and then configure which Elasticsearch instance to use.
Upon page activation, AEM triggers a replication agent that pushes the data to Elasticsearch.
For more detailed information, have a look at my blog.

Elasticsearch Sync with Hibernate Transaction

I have an application running on Spring 2.5, Hibernate 3.1, and the Compass search engine. Compass is synchronized with all DB operations, so I can get data from the Compass cache quickly. Now I would like to replace Compass with Elasticsearch. I'm new to Elasticsearch, and I believe the author of Compass went on to develop Elasticsearch, so a similar synchronization mechanism should be possible there too. Can anyone please suggest a way to do this?
I don't know Compass, but Elasticsearch is a search server on top of Lucene.
One common approach is the JDBC importer, which updates the search index based on a database query. This can be triggered manually or driven by a timestamp field in the database:
https://github.com/jprante/elasticsearch-jdbc
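An importer definition is a JSON document along these lines; the database URL, credentials, table, and index names are illustrative, so check the project's README for the exact options your version supports. For incremental sync, the SQL can select only rows whose timestamp column is newer than the last run.

```json
{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:mysql://localhost:3306/mydb",
    "user": "es",
    "password": "secret",
    "sql": "SELECT id AS _id, name, updated_at FROM products",
    "index": "products"
  }
}
```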
Hibernate Search provides an integration similar to what Compass Search used to do.
Older versions of Hibernate Search only provided embedded Lucene, but the latest version offers the option of using Elasticsearch instead.
This is in pretty good shape already and under active development, so it's a great time to try it out and let us know what you feel is missing.
The catch is that it requires Hibernate ORM version 5.0.0 or later: please upgrade Hibernate (you will benefit in many other ways too, not least much higher performance).
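With Hibernate Search, synchronization happens automatically on every ORM flush; you only annotate the entities you want indexed and point the backend at Elasticsearch. A sketch (Hibernate Search 5.6-era configuration; the entity and its fields are hypothetical):

```java
// Entity indexed automatically by Hibernate Search whenever it is
// persisted or updated through Hibernate ORM (sketch; fields are
// illustrative).
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
public class Book {
    @Id
    private Long id;

    @Field
    private String title;
}
```

```properties
# hibernate.properties -- route the index to an Elasticsearch server
# instead of embedded Lucene (Hibernate Search 5.6+; host is an assumption)
hibernate.search.default.indexmanager = elasticsearch
hibernate.search.default.elasticsearch.host = http://localhost:9200
```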

export data from elasticsearch to neo4j

I'm looking for the best method to export data from Elasticsearch.
Is there something better than running a query with from/size until all the data is exported?
Specifically, I want to copy parts of it to Neo4j, if there is any plugin for that.
I'm not aware of any plugin that can do what you want, but you can write one. I recommend using Jest, because the default Elasticsearch Java client uses a different Lucene version than Neo4j, and those versions are incompatible.
The second option is to export the data from Elasticsearch to CSV and then use LOAD CSV in Neo4j. This approach is good enough if the import is a one-time operation.
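The Neo4j side of the CSV route is a single Cypher statement; the file name, label, and property names below are illustrative, and the file must sit in Neo4j's import directory:

```cypher
// Import an Elasticsearch CSV export into Neo4j (sketch; names are
// assumptions). One node is created per CSV row.
LOAD CSV WITH HEADERS FROM 'file:///es-export.csv' AS row
CREATE (:Document {id: row.id, title: row.title});
```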

crawler + elasticsearch integration

I wasn't able to find out how to crawl a website and index the data into Elasticsearch. I managed to do that with the combination Nutch + Solr, and since Nutch should be able to export data directly to Elasticsearch from version 1.8 (source), I tried to use Nutch again. Nevertheless, I didn't succeed. After trying to invoke
$ bin/nutch elasticindex
I get:
Error: Could not find or load main class elasticindex
I don't insist on using Nutch. I just need the simplest way to crawl websites and index them into Elasticsearch. The problem is that I wasn't able to find any step-by-step tutorial, and I'm quite new to these technologies.
So the question is: what would be the simplest way to integrate a crawler with Elasticsearch? If possible, I would be grateful for a step-by-step solution.
Did you have a look at the River Web plugin? https://github.com/codelibs/elasticsearch-river-web
It provides a good How-To section, covering creation of the required indexes, scheduling (based on Quartz), authentication (Basic and NTLM are supported), metadata extraction, and more.
Might be worth having a look at the elasticsearch river plugins overview as well: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html#river
Since river plugins have been deprecated, it may be worth having a look at the ManifoldCF or Norconex collectors instead.
You can evaluate indexing Common Crawl metadata into Elasticsearch using Hadoop:
when working with large volumes of data, Hadoop provides the power to parallelize the data ingestion.
Here is an example that uses Cascading to index directly into Elasticsearch:
http://blogs.aws.amazon.com/bigdata/post/TxC0CXZ3RPPK7O/Indexing-Common-Crawl-Metadata-on-Amazon-EMR-Using-Cascading-and-Elasticsearch
The process involves a Hadoop cluster (EMR in this example) running a Cascading application that indexes the JSON metadata directly into Elasticsearch.
The Cascading source code is also available, so you can see how the data ingestion into Elasticsearch is handled.
