What's the new Watson Discovery service? - watson-discovery

I just went to Bluemix and saw that there is a new experimental service called Discovery. Apparently, it can ingest PDFs, Word Documents, and HTML pages among other file types.
What's the difference between that service and Document Conversion (DC)? Previously, I converted my documents using DC and then indexed them in Retrieve and Rank. Is Discovery the merge of Retrieve and Rank and Document Conversion?

The IBM Watson™ Discovery Service uses data analysis combined with cognitive intuition to take your unstructured data and enrich it so you can query it for the information you need. The service enables you to ingest and index content so that you can subsequently use that information to answer queries.
The service is experimental now but the idea is that you will be able to do something similar to what you currently do with Document Conversion and Retrieve and Rank. One of the main benefits is that ingestion and indexing are now managed by the service.
For detailed information, see the documentation.
Note: I work for IBM Watson
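Once content is ingested and indexed, querying Discovery is a plain REST call. Below is a minimal sketch that only builds the query URL for the (experimental) v1 API; the environment ID, collection ID, and version date are placeholders you would replace with your own values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DiscoveryQueryUrl {
    // Build a query URL for the v1 Discovery REST API.
    // environmentId, collectionId, and the version date are placeholders.
    static String buildUrl(String environmentId, String collectionId, String query) {
        return "https://gateway.watsonplatform.net/discovery/api/v1/environments/"
                + environmentId + "/collections/" + collectionId + "/query"
                + "?version=2016-12-01"
                + "&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("YOUR_ENVIRONMENT_ID", "YOUR_COLLECTION_ID", "IBM Watson"));
    }
}
```

You would send a GET to this URL with your service credentials; the response is JSON containing the matching, enriched documents.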

Related

NEAR API to fetch all the transactions of an address to date or at a particular block height

I found an API method, "view_account", with which I can fetch the balance of an address at a particular block height. Is there an API, or some other way, where I pass an address and it provides all the transactions supporting that balance?
NEAR Protocol nodes do not store the data that way, and it would be unrealistic to do so for all the use-cases out there, so the solution is to index the network block by block. There is an official Indexer Framework based on the official NEAR node implementation (nearcore), or you can use JSON RPC and build it yourself. There is an official Indexer for Explorer, which stores the data in a PostgreSQL database (see the schema in the README), and we have a public shared read-only replica available for everyone to play with.
Flux implemented an indexer for their needs (also based on Indexer Framework): flux-capacitor
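If you go the JSON RPC route, the entry point is the `query` RPC method with `request_type: "view_account"`, which accepts a `block_id` to pin the balance to a specific height. The sketch below only builds the request body (the account ID and block height are placeholder values); you would POST it to an RPC node, and note that historical heights generally require an archival node.

```java
public class NearViewAccount {
    // Build the JSON-RPC payload for a view_account query pinned to a block height.
    static String buildPayload(String accountId, long blockHeight) {
        return "{"
            + "\"jsonrpc\":\"2.0\",\"id\":\"dontcare\",\"method\":\"query\","
            + "\"params\":{\"request_type\":\"view_account\","
            + "\"block_id\":" + blockHeight + ","
            + "\"account_id\":\"" + accountId + "\"}"
            + "}";
    }

    public static void main(String[] args) {
        // POST this body (Content-Type: application/json) to an archival RPC node.
        System.out.println(buildPayload("example.near", 50000000L));
    }
}
```

This gives you the balance at one height; reconstructing *all* supporting transactions is exactly what the Indexer Framework exists for.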

Should I use Java or Logstash to index db content in Elasticsearch?

I'm building a simple web application that will list/search retail items for sale.
The design is like this:
MySQL database -> Elastic Search deployment -> Spring Boot REST service -> Web UI (JSP/Bootstrap or Angular)
I am planning to write Java client code to read the database and post records to Elasticsearch for indexing.
Googling around, it looks like Logstash is used for this sort of thing. I'm not familiar with Logstash, but I am very familiar with Java.
QUESTION: Is a Java client considered a "deprecated" or "legacy" way to submit data to Elasticsearch for indexing?
Given that I'm very familiar with Java, should I use Java or Logstash?
Adding to Chris's answer: Logstash will add complexity and another piece of infrastructure to maintain in your stack, and Logstash is known for getting stuck and is not as resilient as Elasticsearch itself.
You are already using Java for your application code, and Elasticsearch now officially has a Java client, the Java High Level REST Client (JHLRC), which is very popular and provides an exhaustive list of APIs for indexing/searching and building a modern search system.
IMHO you should use the JHLRC, which will save you from the pain points of Logstash:
- you don't have to learn another tool
- simpler infrastructure
- simpler deployment
- last but not least, a simpler and easier-to-maintain codebase.
Logstash is a good tool for migrating data from many sources to Elasticsearch. It is itself built in Java.
You can use Logstash. It also has options to mutate or filter the data. It's a ready-to-use tool that will save you a lot of development time and effort.
But if you require a lot of customisation and need a lot of control over your data before pushing it to Elasticsearch, then you can build your own application for the same.
Coming back to your question: Java is not deprecated for indexing data into Elasticsearch. It is still a preferred option.
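For a concrete feel of what the Java route involves: the JHLRC ultimately wraps Elasticsearch's REST API, so the sketch below uses only the JDK's built-in `HttpClient` types to build (but not send) the underlying index call, with no extra jars required. The host, index name, and document are illustrative assumptions, not part of any fixed API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class EsIndexRequestSketch {
    // Build (but don't send) a PUT request that indexes one retail item document.
    static HttpRequest buildIndexRequest(String baseUrl, String index, String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String doc = "{\"name\":\"blue jacket\",\"price\":49.99}";
        HttpRequest req = buildIndexRequest("http://localhost:9200", "items", "1", doc);
        // To actually send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(req.method() + " " + req.uri());
    }
}
```

The JHLRC offers the same operation through typed request/response objects plus bulk helpers, which is why the answers above recommend it over hand-rolled HTTP once your indexing logic grows.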

How to build parent-child relationship search in Elasticsearch using Liferay APIs?

We have a custom entity in Liferay called 'Publication'. It is indexed in Elasticsearch and contains a field named 'journalArticleId'.
Based on our search requirements, if a user searches for any keyword in a journal article, we have to return the publication document that contains the 'journalArticleId' of the respective journal article.
I found a solution for implementing this using the Java API, but I'm looking for a Liferay API to solve this.
Elastic Search Parent-Child Data Search Java API
Thanks in advance for any response.
For the extension of existing indexers, you should try to implement an indexer post-processor hook instead of actually overriding them with an ext plugin.
Link for 6.2:
https://dev.liferay.com/de/develop/tutorials/-/knowledge_base/6-2/extending-the-indexer-post-processor-using-a-hook
Link for 7 (aka DXP):
https://dev.liferay.com/de/develop/reference/-/knowledge_base/7-0/indexer-post-processor
You should be able to find documentation for overriding an indexer. It sounds like you could just extend the existing Journal Indexer: Just add the additional Publication data to the full text index for the existing Journal article and it will be found automatically.
Edit (after your comment): Without looking it up, I assume that Liferay's encapsulation of the API does not really cater for parent-child relationships (but I might be wrong; it might be in, or easy to add). However, Liferay also allows you to exchange Elasticsearch for SOLR (and potentially others), so its API naturally doesn't use all of the features of the underlying search engines. You should, though, always be able to make the extra calls yourself - probably not in the indexer but closer to the ES adapter.
The solution might be: Prepare the content in the Indexer and separate it into parent and child later, in the Elasticsearch adapter.
Elasticsearch provides features for parent-child mapping, and a solution for the above situation can be implemented using the Java API:
Elastic Search Parent-Child Data Search Java API
We have contacted the Liferay support team and they responded that the Liferay-elasticsearch adapter doesn't support this feature yet.
version : liferay-dxp-digital-enterprise-7.0-sp3
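For reference, the Elasticsearch-level feature that the Liferay adapter doesn't expose is the `join` field (Elasticsearch 6.x and later). A minimal mapping might look like the following; the index, field, and relation names are illustrative, not anything mandated by Liferay or Elasticsearch:

```json
PUT /publications
{
  "mappings": {
    "properties": {
      "journalArticleId": { "type": "keyword" },
      "publication_relation": {
        "type": "join",
        "relations": { "journalArticle": "publication" }
      }
    }
  }
}

PUT /publications/_doc/2?routing=1
{
  "journalArticleId": "1",
  "publication_relation": { "name": "publication", "parent": "1" }
}
```

Note the `routing` parameter: child documents must live on the same shard as their parent, which is one reason a search-engine-agnostic API like Liferay's is unlikely to wrap this transparently.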

Big data implementation on cloud

Could someone please let me know what is meant by 'Big Data implementation over Cloud'?
I have been using Amazon S3 to store data and querying it using Hive, which I read is one kind of cloud implementation. I would like to know what exactly this means and all the possible ways to implement it.
Thanks,
Sree
The following are the levels of service that a cloud provider can offer for a Big Data analytics solution:
Data platform infrastructure service, such as Hadoop as a Service, that provides pre-installed and managed infrastructures. With this level of service, you are responsible for loading, governing, and managing the data and analytics for the analytics solution.
Data management service, such as a Data Lake Service, that provides data management, catalog services, analytics development, security, and information governance services on top of one or more data platforms. With this level of service, you are responsible for defining the policies for how data is managed and for connecting data sources to the cloud solution. The data owners have direct control of how their data is loaded, secured, and used. Consumers of data are able to use the catalog to locate the data they want, request access, and make use of the data through self-service interfaces.
Insight and Data Service, such as a Customer Analytics Service, that gives you the responsibility for connecting data sources to the cloud solution. The cloud solution then provides APIs to access combinations of your data and additional data sources, both proprietary to the solution and public open data, along with analytical insight generated from this data.
For more information regarding this, read the detailed article published by IBM here: http://www.ibm.com/developerworks/cloud/library/cl-ibm-leads-building-big-data-analytics-solutions-cloud-trs/index.html
Also take a look at the services provided by Qubole, which greatly simplifies, speeds and scales big data analytics workloads against data stored on AWS, Google, or Azure clouds - https://www.qubole.com/features.
Storing and processing big volumes of data requires scalability plus availability, and cloud computing delivers both through hardware virtualization. For that reason, big data and cloud computing are two compatible concepts: the cloud enables big data to be available, scalable, and fault tolerant.
The implementation does not stop there: many companies now offer Big Data as a Service (BDaaS), such as Stratoscale, Cloudera and, of course, Azure and others.

Elasticsearch: security concerns

We are using elasticsearch as back-end for our in-house logging and monitoring system. We have multiple sites pouring in data to one ES cluster but in different index. e.g. abc-us has data from US site, abc-india has it from India site.
Now the concern is that we need some security checks before pushing data into the cluster:
- data coming into an index comes from the right IP address
- incoming JSON requests only insert new data, never delete/update
- on the read side, certain IPs should not be able to read data from other indices
Kindly let me know if it's possible to achieve this using Elasticsearch.
The elasticsearch-jetty plugin brings the full power of Jetty to elasticsearch and adds several new features. With this plugin, elasticsearch can handle SSL connections, support basic authentication, and log all or some incoming requests in plain text or JSON format.
The idea is to add a Jetty wrapper to ElasticSearch, as a plugin.
What remains is only to restrict certain URLs and some methods (e.g. DELETE) to some users.
You can find elasticsearch-jetty on GitHub, with a detailed specification of its usage, configuration, and, of course, limitations.
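Whichever front-end enforces it (the Jetty plugin or a reverse proxy), the three checks above boil down to a policy like the following sketch. This is purely illustrative: the IP addresses, index names, and method/path rules are assumptions about your setup, not part of the plugin's API.

```java
import java.util.Map;
import java.util.Set;

public class EsRequestPolicy {
    // Per-index allowlists of client IPs permitted to read that index (illustrative values).
    static final Map<String, Set<String>> READ_ALLOWLIST = Map.of(
            "abc-us", Set.of("203.0.113.10"),
            "abc-india", Set.of("198.51.100.20"));

    // Only allow inserts of new documents; block DELETE and update requests.
    static boolean isWriteAllowed(String method, String path) {
        return method.equals("POST") && path.endsWith("/_doc");
    }

    // A client may only read from the index its IP is registered for.
    static boolean isReadAllowed(String clientIp, String index) {
        return READ_ALLOWLIST.getOrDefault(index, Set.of()).contains(clientIp);
    }

    public static void main(String[] args) {
        System.out.println(isWriteAllowed("DELETE", "/abc-us/_doc/1")); // blocked
        System.out.println(isReadAllowed("203.0.113.10", "abc-us"));    // allowed
    }
}
```

In practice you would express these same rules as the plugin's (or proxy's) URL and method restrictions rather than in application code, but the decision table is the same.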
