How to add a custom table to Elasticsearch in Liferay 7 - elasticsearch

I have added a custom table with data in Liferay. Now I want to search that data in Elasticsearch. What would be the approach to get the data?
Suppose I add a custom table by adding an entity in service.xml; I then want to search this data through the Elasticsearch URL, for example http://localhost:9200.
<entity local-service="true" name="Student" remote-service="true"
uuid="true">
<column name="studentId" primary="true" type="long" />
<column name="name" type="String" />
<column name="sollNumber" type="int" />
<column name="entryDate" type="Date" />
<order by="asc">
<order-column name="name" />
</order>
<finder name="Name" return-type="Collection">
<finder-column name="name" />
</finder>
</entity>
I have added the data with a MySQL script and want to get it through the Elasticsearch URL.

You could make your entity an Asset. Liferay uses Indexers to maintain the external index data (and you'd have to write one). Alternatively, if you don't care about your data being found in Liferay and shown in Liferay's search results, you could also index to Elasticsearch manually whenever your data changes (e.g. on every add..., update... or delete... method call).
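The manual route could be sketched as follows. This is only an illustration in Python, not Liferay code: the index name student, the sample row, and the /_doc/ URL layout are assumptions (older Elasticsearch versions expect a mapping type name in place of _doc), and the request is built but not sent.

```python
import json
from urllib import request


def build_index_request(base_url, index, student):
    """Build (but do not send) an HTTP request that indexes one Student row."""
    # Assumed URL layout: <base>/<index>/_doc/<id>.
    # Older Elasticsearch versions use a mapping type name instead of _doc.
    url = f"{base_url}/{index}/_doc/{student['studentId']}"
    body = json.dumps(student).encode("utf-8")
    return request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical row mirroring the columns of the Student entity.
req = build_index_request(
    "http://localhost:9200",
    "student",
    {"studentId": 1, "name": "Alice", "sollNumber": 7, "entryDate": "2017-01-01"},
)
# request.urlopen(req) would send it; call this from every add/update hook.
```

A delete hook would issue a DELETE request against the same URL. Data inserted directly with a SQL script bypasses such hooks, so it would need a one-off reindex.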

Related

How to unmerge records which are merged from D365 OOB feature?

How can I unmerge records that were merged with the D365 OOB feature? Is there a custom code solution?
Unfortunately, un-merging cannot be achieved with any OOB option and, AFAIK, no community solutions are available.
As this is a rare requirement, we can build a console app for it.
We know that after merging, the master record gets a copy of the subordinate record's field values (based on the choices made in the Merge screen), and the subordinate record is deactivated (still holding its original field values). All the related records are remapped to the master record. So we have to reverse engineer all of this for un-merging.
From my research, it looks like the deactivated subordinate record holds a link to the master record in its masterid field. We have to pull all these records using the query below, iterate through each record, and then achieve what we want.
<fetch>
<entity name="contact" >
<attribute name="fullname" />
<attribute name="merged" />
<attribute name="masterid" />
<filter>
<condition attribute="merged" operator="eq" value="1" />
</filter>
</entity>
</fetch>
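Once the query above has been run, the iteration step might start by grouping the subordinate records under their master. The sketch below is in Python and assumes the results have already been loaded into plain dictionaries; the record shape and the sample values are hypothetical, and how the records are fetched is left out.

```python
from collections import defaultdict


def group_by_master(records):
    """Map each master record id to the names of the records merged into it."""
    groups = defaultdict(list)
    for rec in records:
        # Only deactivated subordinates carry merged=True and a masterid link.
        if rec.get("merged") and rec.get("masterid"):
            groups[rec["masterid"]].append(rec["fullname"])
    return dict(groups)


# Hypothetical rows shaped like the attributes requested in the fetch query.
sample = [
    {"fullname": "Jon Doe", "merged": True, "masterid": "m-1"},
    {"fullname": "John Doe Jr", "merged": True, "masterid": "m-1"},
    {"fullname": "Ann Lee", "merged": False, "masterid": None},
]
merged_groups = group_by_master(sample)
```

Each group then tells you which subordinates would have to be reactivated and which master record their related records were remapped to.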

Display fields as defined in data config file in result query in SOLR

I am still new to SOLR and I've managed to install it and index 1000 documents from the database. When I submit a query, the results are returned correctly, but the fields are not displayed in the order defined in the data config file.
Example of data config file:
<field column="id" name="event_id" />
<field column="event_desc_current" name="event_desc" />
<field column="event_cost" name="event_cost" />
<field column="event_sponsors" name="event_sponsors" />
...
Example of results returned:
<result name="response" numFound="7" start="0">
<doc>
<str name="event_desc">Church Fund Raising</str>
<arr name="event_sponsors">
<str/>
</arr>
<str name="event_id">2</str>
<int name="event_cost">428</int>
...
<long name="_version_">1472652516366745600</long></doc>
How can I output the fields in the order defined in the data config file, like this:
event_id
event_desc
event_cost
event_sponsors
...
Typically, the order of the fields should not matter, as you would deserialize the response in a client, and the logic for displaying the search results lives in the client.
However, if you do want to dictate the order of fields, you could use the fl parameter in your Solr query to get results in the order you prefer.
The fl parameter also lets you choose which fields to include in the search results.
Personally, I would recommend not worrying about the order of fields and having a client that can consume them in any order. The reason is that if you add a new field in the middle of your schema, you could potentially break the client's logic!
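As a rough sketch of both ideas in Python: build_query puts the desired fields into fl, and reorder imposes the display order on the client side regardless of what the server returns. The function names and the select? path prefix are illustrative assumptions; the field names come from the question.

```python
from urllib.parse import urlencode

# Display order taken from the data config file in the question.
FIELD_ORDER = ["event_id", "event_desc", "event_cost", "event_sponsors"]


def build_query(q, fields):
    """Build a Solr query string that restricts the returned fields via fl."""
    return "select?" + urlencode({"q": q, "fl": ",".join(fields), "wt": "json"})


def reorder(doc, fields):
    """Return a copy of doc with known fields first, in the given order."""
    ordered = {name: doc[name] for name in fields if name in doc}
    # Fields the client does not know about are appended, not dropped.
    for name, value in doc.items():
        ordered.setdefault(name, value)
    return ordered


doc = {
    "event_desc": "Church Fund Raising",
    "event_sponsors": [""],
    "event_id": "2",
    "event_cost": 428,
}
ordered_doc = reorder(doc, FIELD_ORDER)
```

Reordering in the client keeps working when a new field is added to the schema, which is the robustness argument made above.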

Index the Raw HTML content using solr/lucene

I have some HTML pages that I scraped off the web from the same site at different points in time, and the raw data looks like this:
timestamp, htmlcontent(500KB)
..
I have written a parser to parse out a few interesting fields from the HTML, and I am trying to build a search engine based on the fields that I parsed out. NOT JUST BASED ON THE RAW TEXT OF THE HTML BUT THE RAW COMPLETE HTML CONTENT.
Now my data looks like:
timestamp, htmlcontent, parsedfield1, parsedfield2
I want the user to search for timestamp, parsedfield1 or parsedfield2, and my search engine to return the raw HTML matching the user's query and render it in the browser... so it feels like a search engine time machine :)
In this case, I am wondering how I should design the index: which fields should I store and which not? I am following the book "Lucene in Action" and wondering whether anyone can help me approach this problem.
Based on my understanding of the index, there are a few attributes in schema.xml: indexed or not? stored or not? I assume that whatever you want to include in the query result should be stored. In that case, I have to store the column which contains the raw HTML.
Since that column is so big (one record is usually hundreds of KB), even with only hundreds of rows you can easily get a dataset of almost 1GB, which won't work in Solr. I am trying to index those columns using Lucene, and it runs into a heap size problem.
Here is another idea:
Maybe I should store parsedfield1, parsedfield2 and a pointer, where the pointer column is the absolute path of the raw HTML file. Of course, in this case, I need to store each HTML page in a separate file, locally or on HDFS. So when the user searches for parsedfield1, the index returns the absolute path and I go and retrieve those files.
I think I have described the problem as clearly as I can, and I am wondering whether anyone can spend a minute giving me some directional guidance.
Much appreciated!
Some Guidelines
1. You need your data in XML, CSV or JSON format. I will give you an example in XML.
e.g. your data in XML format:
<add>
<doc>
<field name="id">01</field>
<field name="timestamp">somevalue</field>
<field name="parsedfield1">your data 1</field>
<field name="parsedfield2">Java data </field>
<field name="htmlcontent">link to that html file</field>
</doc>
</add>
2. You need to modify schema.xml:
-- each document should have one unique id
-- as per your need, store only the path for htmlcontent
-- index the other fields for searching only
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="timestamp" type="text_general" indexed="true" stored="false" />
<field name="parsedfield1" type="text_general" indexed="true" stored="false"/>
<field name="parsedfield2" type="text_general" indexed="true" stored="false" />
<field name="htmlcontent" type="text_general" indexed="true" stored="true" />
3. You can use post.jar to post all XML files to Solr, or you can use the SolrJ APIs if you need to do it programmatically.
Fields to be stored or not
Fields on which you only want to search need not be stored unless you want to display them in the results.
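Step 1 above could be automated with a small script. The following is a minimal Python sketch that turns parsed rows into the add/doc XML shown earlier, with htmlcontent holding only a file path; the sample row, the path, and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_xml(rows):
    """Serialize rows (dicts of field name -> value) as a Solr <add> document."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes special characters
    return ET.tostring(add, encoding="unicode")


# Hypothetical row: htmlcontent is a pointer to the file, not the HTML itself.
rows = [
    {
        "id": "01",
        "timestamp": "2014-01-01T00:00:00Z",
        "parsedfield1": "your data 1",
        "parsedfield2": "Java data",
        "htmlcontent": "/data/html/01.html",
    }
]
xml_payload = build_add_xml(rows)
```

The resulting string can be written to a file and posted with post.jar. At query time the client reads the stored path and loads the raw HTML from disk or HDFS itself.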

Searchandising with Enterprise Magento and Solr

I've done a lot of searching on Google and cannot find the answer to this question.
We have Enterprise Magento and are using SOLR. We would like to select a group of products and put them at the top of the search results when they are returned by SOLR. We would use this for clearance products, etc.
My idea was to put the items into a special non-display category on the backend and to somehow configure SOLR to put a greater weighting on these products in the search results. But I cannot see how to do this. I can only see how to weight product attributes.
Does anyone have any suggestions?
Giving fields different weights is very simple in Solr -- simply add the following to your default requestHandler in solrconfig.xml
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
...
<str name="qf">Weighted_Category^3 Lesser_Category^0.5</str>
</lst>
</requestHandler>
This weights Weighted_Category by a factor of 3, should there be a match.
Magento-solr has a built-in function to do exactly this.
magento/lib/apache/solr/config/elevate.xml:
<elevate>
<query text="foo bar">
<doc id="1" />
<doc id="2" />
<doc id="3" />
</query>
<query text="ipod">
<doc id="MA147LL/A" /> <!-- put the actual ipod at the top -->
<doc id="IW-02" exclude="true" /> <!-- exclude this cable -->
</query>
</elevate>
This file is read by Solr's QueryElevationComponent.
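If the list of clearance products changes often, an elevate.xml like the one above could be generated rather than edited by hand. This is a hedged sketch in Python: the rules dictionary and its product ids are hypothetical, and since elevation is tied to the query text, each search phrase needs its own entry.

```python
import xml.etree.ElementTree as ET


def build_elevate(rules):
    """Build elevate.xml content from {query text: (promoted ids, excluded ids)}."""
    root = ET.Element("elevate")
    for text, (promote, exclude) in rules.items():
        query = ET.SubElement(root, "query", text=text)
        for doc_id in promote:
            # Promoted documents appear at the top, in the order listed.
            ET.SubElement(query, "doc", id=doc_id)
        for doc_id in exclude:
            ET.SubElement(query, "doc", id=doc_id, exclude="true")
    return ET.tostring(root, encoding="unicode")


# Hypothetical rule: promote two clearance items, hide one accessory.
rules = {"ipod": (["MA147LL/A", "CLEARANCE-01"], ["IW-02"])}
elevate_xml = build_elevate(rules)
```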

SolR - Search for room availability and sort by result

I'm trying to implement a kind of hotel/hostel search using SolR and PHP. For every available room I store a new document in my index containing relevant information about the accommodation and multivalued attributes containing an availableFrom and an availableTill date. Running a query against SolR to get all rooms within a certain timespan shouldn't be that hard, but my brain screws up when it comes to sorting...
My goal is to show not only the available accommodations, but all of them matching a general filter query on the destination (country/city/district), and to sort these results so that all available rooms come first in the list.
So for a search for rooms in Munich from 1st December '12 till 5th December, I would like to get results like these:
Room A (available)
Room B (available)
Room C (not completely available in the given period => nice to have)
Room D (not available at all)
Currently I'm running SolR 3.6 but could switch to the new 4.0 if necessary.
Does any Solr guru out there have some suggestions for me?
Any help appreciated :-)
-edit-
I think Samuele pushed me in the right direction. So the question is now how to create a function query to be able to sort by availability. Maybe there is a better way to store my documents, i.e. by changing my schema.xml?
Here is a little excerpt from it:
<field name="recordId" type="string" indexed="true" stored="true" />
<field name="language" type="int" indexed="true" stored="true" />
<field name="name" type="string" indexed="true" stored="false" />
<field name="maxPersons" type="int" indexed="true" stored="false" />
<field name="avgPrice" type="tdouble" indexed="true" stored="false" />
<field name="city" type="freetext" indexed="true" stored="false" />
<field name="district" type="freetext" indexed="true" stored="false" />
<field name="country" type="freetext" indexed="true" stored="false" />
<field name="availableFrom" type="date" indexed="true" stored="true" multiValued="true" />
<field name="availableTill" type="date" indexed="true" stored="true" multiValued="true" />
Cheers - Sven
Well, you have to boost your query based on the field "rooms" (or availability, it depends on you) and give different scores based on the value.
Quick example:
Let's give an available room a boost of 20, a partially available room a boost of 10, and an unavailable room a boost of 1 (just to be sure).
Your query (URL-wise; I don't know the PHP interface to Solr) would need something like
<query>&bq=rooms:avail^20.0&bq=rooms:part-avail^10.0...
Suggestions: if you're using the dismax query handler, the boost is additive. This means you'll have to add a bigger boost than that (2000 instead of 20, for example), since it adds the boosting value to the query score.
Also, you should check this link from the solr wiki, which is better than any explanation.
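To make the URL above concrete, here is a small Python sketch that assembles repeated bq parameters. The rooms field and its avail / part-avail values are the hypothetical ones from this answer (they are not in the posted schema), and the select? prefix and function name are assumptions.

```python
from urllib.parse import urlencode


def build_boosted_query(q, boosts):
    """Build a dismax query string with one bq parameter per boost rule."""
    params = [("q", q), ("defType", "dismax")]
    # Each (term, weight) pair becomes bq=<term>^<weight>, URL-encoded.
    params += [("bq", f"{term}^{weight}") for term, weight in boosts]
    return "select?" + urlencode(params)


url = build_boosted_query(
    "Munich",
    [("rooms:avail", 20.0), ("rooms:part-avail", 10.0)],
)
```

Because the boosts are added to the relevance score, a strong text match can still outrank an available room unless the weights are large enough.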
Well, I did some research and testing on the whole thing here... The current and possibly best solution for my problem is to perform multiple queries against SolR. As suggested by Samuele, I query SolR for all accommodations matching the given criteria and timespan in two steps.
1: Get all matching rooms that are available (this includes partially available rooms)
2: Get all unavailable rooms
The second query is obviously only performed when we need to show more results because of the pagination.
After that, all results from step 1 are postprocessed to figure out whether they are available for the whole requested timespan.
A further "improvement" would be to introduce a new field in the schema: availableDay. For each bookable day there would be an entry for that date. This would split up the first query into two separate ones. This is then only a matter of additional filters for SolR.
Thanks again for pointing me in the right direction!
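The postprocessing step could look roughly like this. It is a Python sketch rather than the PHP actually used, and it assumes that each room's availableFrom and availableTill values pair up by position into inclusive date windows. The room data is invented to mirror the example in the question.

```python
from datetime import date, timedelta


def availability(windows, start, end):
    """Classify a room as 'full', 'partial' or 'none' for the stay start..end."""
    days = (end - start).days + 1
    requested = {start + timedelta(days=i) for i in range(days)}
    covered = set()
    for frm, till in windows:
        # Collect every requested day that falls inside this window.
        covered |= {d for d in requested if frm <= d <= till}
    if covered == requested:
        return "full"
    return "partial" if covered else "none"


RANK = {"full": 0, "partial": 1, "none": 2}

# Hypothetical rooms: lists of (availableFrom, availableTill) pairs.
rooms = {
    "Room D": [],
    "Room C": [(date(2012, 12, 1), date(2012, 12, 3))],
    "Room A": [(date(2012, 11, 20), date(2012, 12, 10))],
    "Room B": [(date(2012, 12, 1), date(2012, 12, 2)),
               (date(2012, 12, 3), date(2012, 12, 5))],
}
start, end = date(2012, 12, 1), date(2012, 12, 5)
ranked = sorted(
    rooms, key=lambda name: (RANK[availability(rooms[name], start, end)], name)
)
```

Checking day by day is essentially the availableDay idea applied on the client: adjacent windows such as Room B's are treated as one continuous stay.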
