Does Elasticsearch have a query manager?

Currently I am using Sense when I want to query Elasticsearch.
It is pretty complicated to build queries, manage them, and display the results in a table.
(I usually copy the results into a JSON-to-CSV tool, which often helps a lot.)
Is there a tool/way to easily build and manage queries (with IntelliSense, maybe) over Elasticsearch?

For displaying data, you can use Elasticsearch Browser.
https://github.com/OlegKunitsyn/elasticsearch-browser/wiki
Nested data doesn't come out right, but it's nice for one-level data.


Filter results in memory to search across multiple indexes in Elasticsearch

I have two indexes, and they both have one common field (basically a relationship).
Since Elasticsearch does not provide filtering across multiple indexes, should we store the results in memory in a variable and filter them in Node.js (which basically means that my application itself is now working as a database server)?
We were previously using MongoDB, which is also a NoSQL DB, but there we were able to manage this through aggregation queries; Elasticsearch does not seem to provide that.
So even if we use both databases combined, we have to store their results somewhere to filter the data further, because we give users an advanced search feature that filters data across multiple collections.
So should we store results in memory to filter the data further? We currently offer advanced search over 100 million records to customers, but without the advanced text search that Elasticsearch provides; now we are planning to offer Elasticsearch's text search to customers.
What approach do you suggest for making MongoDB and Elasticsearch work together? We are using Node.js to serve the data.
Or which of these options should we choose (see the mapping sketch after the links below)?
Denormalizing: Flatten your data
Application-side joins: Run multiple queries on normalized data
Nested objects: Store arrays of objects
Parent-child relationships: Store multiple documents through joins
https://blog.mimacom.com/parent-child-elasticsearch/
https://spoon-elastic.com/all-elastic-search-post/simple-elastic-usage/denormalize-index-elasticsearch/
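For concreteness, here is a rough sketch of what the last two options look like as index mappings. All index and field names are invented for illustration, and the calls assume the v8 `@elastic/elasticsearch` Node.js client, since the question mentions Node.js:

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Nested objects: arrays of sub-documents, queried with `nested` queries.
await client.indices.create({
  index: 'orders', // hypothetical index
  mappings: {
    properties: {
      customer: { type: 'keyword' },
      items: {
        type: 'nested', // each array element is indexed as its own hidden document
        properties: { sku: { type: 'keyword' }, qty: { type: 'integer' } },
      },
    },
  },
});

// Parent-child: a `join` field relating documents within ONE index.
await client.indices.create({
  index: 'qa', // hypothetical index
  mappings: {
    properties: {
      relation: { type: 'join', relations: { question: 'answer' } },
    },
  },
});
```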
Storing things client-side in memory is not the solution.
First of all, the simplest way to solve this problem is to make one combined index. It's very trivial to do: just insert all the documents from index-2 into index-1, prefixing every field coming from index-2 with something like "idx2" so that you don't overwrite any similarly named fields. You can use an ingest pipeline to do this, or just do it client-side. You will only ever do this once.
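As a rough sketch of that one-time merge (again assuming the v8 `@elastic/elasticsearch` Node.js client; `index-1`, `index-2`, and the `idx2_` prefix are the names used above):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// One-time merge: copy every document from index-2 into index-1,
// prefixing each field with "idx2_" so nothing in index-1 gets overwritten.
await client.reindex({
  source: { index: 'index-2' },
  dest: { index: 'index-1' },
  script: {
    lang: 'painless',
    source: `
      Map prefixed = new HashMap();
      for (def entry : ctx._source.entrySet()) {
        prefixed.put('idx2_' + entry.getKey(), entry.getValue());
      }
      ctx._source.clear();
      ctx._source.putAll(prefixed);
    `,
  },
  wait_for_completion: true,
});
```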
After that you can perform aggregations on the single index, since you have all the data in one index.
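For example, a minimal aggregation over the merged index might look like this (the `idx2_category` field is hypothetical):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Count documents per category across the merged data; no hits needed.
const result = await client.search({
  index: 'index-1',
  size: 0, // we only want the aggregation buckets
  aggs: {
    by_category: { terms: { field: 'idx2_category.keyword' } },
  },
});
console.log(result.aggregations);
```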
If you are using something other than ES as your primary data store, you need to reconfigure the indexing operation to redirect everything that previously went into index-2 into index-1 as well (with the prefixed fields).
100 million records is trivial for something like Elasticsearch. Doing any kind of "join" client-side is NOT RECOMMENDED, as this defeats the entire point of using ES.
If you need any further help executing this, feel free to contact me. I have 11 years of experience with ES, and I have seen people struggle with "joins" 99% of the time. :)
The first thing to do when coming from MySQL/Postgres or even MongoDB is to restructure the indices to suit your data-querying needs. Never try to work with multiple indices; ES is not built for that.
HTH.

What is Elasticsearch?

I just want to know what exactly Elasticsearch is.
It is said to help with searching data, but when I watch some webinars it feels like I have to replicate my data into a kind of Elastic datastore... which does not sound very optimized to me. That way, every modification made on one side has to be propagated to the other, and the data returned by Elasticsearch may not be in the right format.
Can Elasticsearch directly search in my database?
It would be used with a Neo4j graph database. Has somebody already done something like that? Does it simply replace the Cypher queries?
Thanks for any advice helping me understand what Elasticsearch can really bring to our project.
Elasticsearch is a database; however, it's not a relational database like you may be used to. It is a NoSQL database.
You insert JSON documents into an index. You query that index to find documents that match a particular criterion.
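A minimal sketch of that round trip, assuming the v8 `@elastic/elasticsearch` Node.js client (the `articles` index and its fields are invented for illustration):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Insert a JSON document into an index...
await client.index({
  index: 'articles',
  document: { title: 'Intro to search', body: 'Full-text search basics' },
});

// ...then query the index for documents matching a criterion.
const res = await client.search({
  index: 'articles',
  query: { match: { body: 'full-text search' } },
});
console.log(res.hits.hits);
```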
It is also sharded and distributed across nodes, which gives it resilience and scalability, and - if you set it up right - performance.
This means it's really good at 'search engine'-style database queries, but because it's not relational, it cannot easily do the equivalent of a SQL JOIN operation.
One example use case is Logstash and Kibana - together known as the ELK stack - where system event logs (syslog, httpd logs, that kind of thing) are processed by Logstash to parse metadata - like log source, referrer, URL, session ID, etc. - and then inserted into Elasticsearch.
As each event is a self-contained piece of information, this is what Elasticsearch does particularly well.
You can then use Kibana as a visualisation engine to display your logs, but also to perform analysis - most-hit pages, geographic distribution of requests, incoming referrers, time-based distribution of requests, etc.
But it also collates these logs, so if you run a really large, geographically distributed website with multiple webserver nodes - or maybe you just have a lot of servers in your computer room and want to summarise the system logs - you can feed the whole lot into Elasticsearch.
Its design is such that it's good at handling near-real-time data insertion and analysis. It also works quite well for 'forum-style' data models, as essentially all you're doing is querying a list of posts with a particular forum name and finding replies to a particular parent node - but they're standalone 'documents'.
So yes, you probably could use it to search an existing database, but you'll have to think about your data model - you can't just translate a conventional relational model; you have to flatten it. Denormalisation is something of a sin in RDBMS terms, but it's actually quite good for search engines, because it lets you execute queries in parallel more efficiently.
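To illustrate the flattening: a relational posts-join-forums model might become one self-contained document per post (the shape below is invented for illustration):

```typescript
// Relational: posts JOIN forums ON posts.forum_id = forums.id
// Denormalized for a search engine: each post carries copies of the
// fields it would otherwise join to, so no query ever needs a join.
const denormalizedPost = {
  post_id: 42,
  title: 'How do shards work?',
  body: 'I have a question about routing...',
  forum_id: 7,
  forum_name: 'elasticsearch', // copied from the forums table
  author_name: 'alice',        // copied from the users table
};
```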
There are some ways to combine both approaches. Have a look at this blog post:
http://graphaware.com/neo4j/2015/09/30/recommendations-with-neo4j-and-graph-aided-search.html
Databases cannot be optimized for all use cases, but luckily there are many databases available so we can choose the best one for each task.
Elasticsearch is optimized for:
Filtering of documents (exact match)
Search ranking of documents (relevance of search terms)
Aggregation of results (sums, distinct counts, percentiles, ...)
Neo4j is optimized for:
Graph traversal (naturally)
High performance when operated on a "local" graph neighborhood (context)
Actually, both databases use the same underlying library, Lucene, to "index" data to be searched later.
ES is an open-source, distributed, RESTful, JSON-based search engine. It is easy to use, scalable, and flexible. Its indexing capability enables fast retrieval for search queries.

Comparison of Handling Logs and PDFs in Solr & Elasticsearch and Data Visualization in Banana & Kibana

How do Elasticsearch and Solr compare in respect to the following:
Indexing logs.
Indexing events.
Indexing PDF documents.
Ease of creating and distributing visualizations. Kibana vs Banana.
Support and documentation for developers.
Any help is appreciated.
EDIT
More specifically, I am trying to figure out how exactly a PDF document or an event can be indexed at all. I have worked a little with Elasticsearch, and since I am a fan of JSON, I found it quite useful when I tried to index structured data.
For example, logs are mostly structured and thus, I guess, easier to index and search. Now what if I want to index the whole log file itself?
Follow up
Is Kibana the only visualization tool available for Elasticsearch?
Is Banana the only visualization tool available for Solr?
Here is an answer to try to address just the Elasticsearch aspect of the post.
Take a look at https://github.com/elastic/elasticsearch-mapper-attachments for handling PDFs (note that this mapper-attachments plugin has since been deprecated in favour of the ingest attachment processor).
For events/logs, you would need to transform them into structured data to index in Elasticsearch. You can include a field for the source (the log file the data came from, and other information like that) - that way, all the data in the whole log file is indexed. You can then take advantage of ES aggregations to group results by log file, calculate statistics, etc.
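A hedged sketch of that pattern with the v8 `@elastic/elasticsearch` Node.js client (index and field names are invented):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Each log line becomes a structured document carrying its source file.
await client.index({
  index: 'logs',
  document: {
    '@timestamp': '2024-01-01T12:00:00Z',
    source_file: '/var/log/httpd/access.log',
    level: 'INFO',
    message: 'GET /index.html 200',
  },
});

// Group results per log file, e.g. counting events by level within each file.
const res = await client.search({
  index: 'logs',
  size: 0,
  aggs: {
    per_file: {
      terms: { field: 'source_file.keyword' },
      aggs: { by_level: { terms: { field: 'level.keyword' } } },
    },
  },
});
console.log(res.aggregations);
```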
The ELK stack is definitely worth a look.
I don't know if Kibana is the only visualization tool, but it is probably the most popular and likely to offer more than the alternatives.

Is Elasticsearch suitable as a final storage solution?

I'm currently learning Elasticsearch, and I have noticed that a lot of operations for modifying indices require reindexing all documents. Adding a field to all documents, for example, from my understanding means retrieving each document, performing the desired operation, deleting the original document from the index, and reindexing it. This seems somewhat dangerous, and a backup of the original index seems preferable before attempting it (obviously).
This made me wonder whether Elasticsearch is actually suitable as a final storage solution at all, or whether I should keep the raw documents that make up an index stored separately so that I can recreate the index from scratch if necessary. Or is a regular backup of the index safe enough?
You are talking about two issues here:
Deleting old documents and reindexing on schema change: You don't always have to delete old documents when you add new fields. There are various options for changing the schema. Have a look at this blog post, which explains changing the schema without any downtime:
http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
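The pattern in that post boils down to reindexing into a new index behind an alias. A rough sketch with the v8 `@elastic/elasticsearch` Node.js client (all names are placeholders):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// 1. Create a new index with the new mapping; the application only ever
//    talks to the alias, never to the concrete index.
await client.indices.create({
  index: 'items_v2',
  mappings: {
    properties: { title: { type: 'text' }, created_at: { type: 'date' } },
  },
});

// 2. Copy the existing documents across.
await client.reindex({
  source: { index: 'items_v1' },
  dest: { index: 'items_v2' },
  wait_for_completion: true,
});

// 3. Atomically repoint the alias, so readers never see a gap.
await client.indices.updateAliases({
  actions: [
    { remove: { index: 'items_v1', alias: 'items' } },
    { add: { index: 'items_v2', alias: 'items' } },
  ],
});
```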
Also, look at the Update API, which gives you the ability to add/remove fields.
The Update API allows you to update a document based on a provided script. The operation gets the document (collocated with the shard) from the index, runs the script (with optional script language and parameters), and indexes back the result (it also allows you to delete, or ignore the operation). It uses versioning to make sure no updates have happened during the "get" and "reindex".
Note that this operation still means a full reindex of the document; it just removes some network round trips and reduces the chance of version conflicts between the get and the index. The _source field needs to be enabled for this feature to work.
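For instance (a sketch assuming the v8 `@elastic/elasticsearch` Node.js client; the index, id, and field are invented):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Add a field to one document via the Update API; under the hood the whole
// document is still re-indexed, as the quoted note above says.
await client.update({
  index: 'items',
  id: '1',
  script: {
    lang: 'painless',
    source: 'ctx._source.archived = params.flag',
    params: { flag: true },
  },
});

// Or add the field to every document without client-side round trips:
await client.updateByQuery({
  index: 'items',
  script: { lang: 'painless', source: 'ctx._source.archived = false' },
});
```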
Using Elasticsearch as a final storage solution: It depends on how you intend to use Elasticsearch as storage. Do you need an RDBMS, a key-value store, a column-based datastore, or a document store like MongoDB? Elasticsearch is definitely well suited when you need a distributed document store (JSON, HTML, XML, etc.) with Lucene-based advanced search capabilities. Have a look at the various use cases for ES, especially the usage at The Guardian: http://www.elasticsearch.org/case-study/guardian/
I'm pretty sure that search engines shouldn't be viewed as a storage solution, because of the nature of these applications. I've never heard of the practice of backing up a search engine's index.
The usual schema when you use ElasticSearch, Solr, or whatever search engine you have:
You have some kind of datasource (it could be a database, a legacy mainframe, Excel sheets, some REST service with data, or whatever).
You have a search engine that indexes this datasource to add search capability to your system. When the datasource changes, you can reindex it, or index only the changed part with the help of incremental indexation.
If something happens to the search engine's index, you can easily reindex all your data.

How to search for multiple strings in a very large database

I want to search for multiple strings in a very large database. These strings are parts of different attributes of a database table. I have tried string search using LIKE in a SQL query, but it takes a lot of time to return results. I am using an Oracle database.
Should I use database indexing? I found that Lucene can be used for this.
I also got some suggestions about using big-data concepts. Which approach should I use?
The easiest ways are:
1.) adding an index to the columns you would like to search through
2.) using Oracle Text, as #lalitKumarB wrote
The most powerful way is:
3.) using a separate search engine (Solr, Elasticsearch).
But you will probably have to change your application in order to explicitly use the search index when searching through the data.
I was in the same situation some years ago, trying to search text in a big database. After a while I found out that a database-based search will never reach the performance of a dedicated search engine. And you will have many more search features working out of the box if you use Solr (for example), like spelling correction, "more like this", ...
One option is to keep the data in Oracle, search in Solr, and return the ID of the document, in order to load from Oracle only the one row that is referenced by the ID.
The second option is to keep Oracle as the base data pool for your search engine and search in Solr (or Elasticsearch), returning the whole document/row from Solr, not only the ID. That way you don't need to load the data from the database any more.
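A rough sketch of the first option in Node.js, assuming the v8 `@elastic/elasticsearch` client and the `oracledb` driver (every index, table, column, and credential here is a placeholder):

```typescript
import { Client } from '@elastic/elasticsearch';
import oracledb from 'oracledb';

const es = new Client({ node: 'http://localhost:9200' });

// Full-text search in the search engine, then load only the matching
// rows from Oracle by primary key.
async function search(term: string) {
  const res = await es.search({
    index: 'items',
    _source: false, // we only need the IDs
    size: 20,
    query: { multi_match: { query: term, fields: ['name', 'description'] } },
  });
  const ids = res.hits.hits.map((h) => h._id);
  if (ids.length === 0) return [];

  const conn = await oracledb.getConnection({
    user: 'app',
    password: 'secret',
    connectString: 'dbhost/XEPDB1',
  });
  try {
    // One bind placeholder per ID keeps the SQL safe from injection.
    const placeholders = ids.map((_, i) => `:${i + 1}`).join(',');
    const result = await conn.execute(
      `SELECT * FROM items WHERE id IN (${placeholders})`,
      ids,
    );
    return result.rows;
  } finally {
    await conn.close();
  }
}
```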
The best option depends on your needs.
You have the choice between Elasticsearch, Solr, or Lucene.
