Solr ClobTransformer - Oracle

I have been stuck on ClobTransformer in Solr for the past 3 days. I want to convert an Oracle CLOB field to a text field in Solr. I am using multiple cores, and I started my config and schema files from scratch.
This is my config file:
<lib dir="../../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
These are the columns in my schema file for a core:
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="mandp" type="text_en_splitting" indexed="true" stored="true" multiValued="false" />
This is my data-config.xml for the core:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="oracle.jdbc.driver.OracleDriver"
url="jdbc:oracle:thin:@***"
user="***"
password="****"/>
<document>
<entity name="wiki" transformer="ClobTransformer"
query="Select t.id as id, t.mandp From table1 t">
<field column="mandp" name="mandp" clob="true" />
</entity>
</document>
</dataConfig>
When I start Solr, I can see in the console that the dataimporthandler*.jar files have loaded successfully. When I run my dataimport from http://localhost:8983/solr/wiki/dataimport?command=full-import&clean=false, I don't see any errors in the console, nor do I see anything related to the transformer or the CLOB. Even if I type nonsense into the transformer attribute (transformer="bla bla bla"), no error appears in the console, which could mean that either my transformer attribute is completely ignored or full logging is turned off.
When I query Solr, I see oracle.sql.CLOB@375c929a in the mandp field. Of course, nothing happens if I use the HTMLStripTransformer class either. I want to use both on this field.
Any ideas are appreciated!

It looks like the ClobTransformer is not being fired. I would personally alias the mandp column explicitly inside the query, like this:
Select t.id as id, t.mandp as mandp From table1 t

Please add transformer="ClobTransformer, RegexTransformer" to the entity in your data-config.xml file.
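
A minimal sketch of how the entity might look with both transformers chained, under one assumption that is not confirmed in the question: that the Oracle JDBC driver reports the column names in upper case (Oracle upper-cases unquoted identifiers), so the transformers only find the value when the column attribute is written as MANDP. The clob attribute is read by ClobTransformer and stripHTML by HTMLStripTransformer; transformers run in the order listed, so the CLOB is turned into a string before the HTML is stripped.

```xml
<entity name="wiki"
        transformer="ClobTransformer,HTMLStripTransformer"
        query="select t.id as id, t.mandp as mandp from table1 t">
  <!-- Assumption: upper-case column names, matching what the Oracle driver returns -->
  <field column="ID" name="id" />
  <!-- clob="true" triggers ClobTransformer; stripHTML="true" triggers HTMLStripTransformer -->
  <field column="MANDP" name="mandp" clob="true" stripHTML="true" />
</entity>
```

If the field still shows oracle.sql.CLOB@... after a full re-import, the column name is probably still not matching what the driver returns.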


Solr not sorting copyField correctly

I'm trying to get Solr to sort by Title, but I'm having no luck.
In my Schema I have the "title" field as text_general for searching, and then a "title_sort" field as a string for sorting. I've created a copyField that should be taking the "title" text_general field and putting it into the "title_sort" field as a string.
<fields>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="title_sort" type="string" indexed="true" stored="false" />
</fields>
<copyField source="title" dest="title_sort" />
When I run the sort query "title_sort desc", this is what I get back:
title: Don’t Mind If I Do
title: Men Don't Run Marathons
title: Danny
Can a copyField not convert a text_general field into a string?
Solved with the help of @MatsLindh.
I'd been reloading the schema in the CoreAdmin panel, thinking that would make the changes. I had to reindex after adding the copyField instruction, as Solr will not go through all the documents and update secondary fields.

Solr indexing Model class

I am new to Solr. I am trying to add data to Solr. I have a class Education with properties and getter/setter methods, and I am trying to add a List field of this type to Solr. What should the field type be in schema.xml?
<field name="education" type="string" indexed="true" stored="true" multiValued="true" />

Can Solr join tables in-memory?

There is a table of n products, and a table of features of these products. Each product has many features. Given a Solr DataImportHandler configuration:
<document name="products">
<entity name="item" query="select id, name from item">
<field column="ID" name="id" />
<field column="NAME" name="name" />
<entity name="feature"
query="select feature_name, description from feature where item_id='${item.ID}'">
<field name="feature_name" column="feature_name" />
<field name="description" column="description" />
</entity>
</entity>
</document>
Solr will run n + 1 queries to fetch this data. 1 for the main query, n for the queries to fetch the features. This is inefficient for large numbers of items. Is it possible to configure Solr such that it will run these queries separately and join them in-memory instead? All rows from both tables will be fetched.
This can be done using CachedSqlEntityProcessor:
<document name="products">
<entity name="item" query="select id, name from item">
<field column="ID" name="id" />
<field column="NAME" name="name" />
<entity name="feature"
query="select item_id, feature_name, description from feature"
cacheKey="item_id"
cacheLookup="item.ID"
processor="CachedSqlEntityProcessor">
<field name="feature_name" column="feature_name" />
<field name="description" column="description" />
</entity>
</entity>
</document>
Since Solr's index is 'flat', feature_name and description are not connected in any way; each product will have multi-valued fields for each of these.
I am not sure if Solr can do this, but the database can. Assuming that you are using MySQL, use JOIN and GROUP_CONCAT to convert this into a single query. The query should look something like this:
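SELECT item.id, item.name, GROUP_CONCAT(feature.description) AS descriptions FROM item INNER JOIN feature ON (feature.item_id = item.id) GROUP BY item.id, item.name
Note that DESC is a reserved word in MySQL, so the alias here is descriptions rather than desc. Don't forget to use the RegexTransformer (its splitBy attribute) on descriptions to separate out the multiple values.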
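
As a rough sketch of that approach in data-config.xml, assuming MySQL and the table layout from the question: the alias descriptions is a name chosen here for illustration (DESC is reserved in MySQL), and the comma in splitBy matches GROUP_CONCAT's default separator.

```xml
<document name="products">
  <!-- Single query: MySQL joins and aggregates, so Solr issues one statement instead of n + 1 -->
  <entity name="item"
          transformer="RegexTransformer"
          query="SELECT item.id, item.name,
                        GROUP_CONCAT(feature.description) AS descriptions
                 FROM item INNER JOIN feature ON feature.item_id = item.id
                 GROUP BY item.id, item.name">
    <field column="id" name="id" />
    <field column="name" name="name" />
    <!-- splitBy is a regular expression; each piece becomes one value of the multi-valued field -->
    <field column="descriptions" name="description" splitBy="," />
  </entity>
</document>
```

Two caveats with this sketch: a description that itself contains a comma would be split in the wrong place, so a different SEPARATOR may be needed, and MySQL truncates GROUP_CONCAT results at group_concat_max_len (1024 bytes by default).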

Accessing ancestor values in xpath with Solr DataImportHandler

If my xml is structured like so:
<fruit>
<apple appleId="apple_1">
<core coreId="core_1">
<seed>1</seed>
<seed>2</seed>
</core>
</apple>
<apple appleId="apple_2">
<core coreId="core_1">
<seed>1</seed>
</core>
</apple>
</fruit>
and I want the seeds to be the documents in my solr schema, how can I access the appleId and coreId?
Here's the pertinent entity definition from my data-config.xml:
<entity name="apples"
processor="XPathEntityProcessor"
stream="true"
forEach="/fruit/apple/core/seed"
url="fruit.xml"
transformer="script:create_id"
>
<field column="seed_s" xpath="/fruit/apple/core/seed" />
<field column="apple_id_s" xpath="/fruit/apple/@appleId" />
</entity>
script:create_id creates a unique id for each seed.
In this example, apple_id_s is coming back as null.
I found the problem. I need to use commonField="true" and make sure to loop through each apple and core. I also need to set pk="seed_s", which triggers Solr to store the document.
Here's my new entity definition:
<entity name="apples"
processor="XPathEntityProcessor"
stream="true"
pk="seed_s"
forEach="/fruit/apple/core/seed | /fruit/apple | /fruit/apple/core"
url="fruit.xml"
transformer="script:create_id"
>
<field column="seed_s" xpath="/fruit/apple/core/seed" />
<field column="apple_id_s" xpath="/fruit/apple/@appleId" commonField="true"/>
<field column="core_id_s" xpath="/fruit/apple/core/@coreId" commonField="true"/>
</entity>

SolrNet faceting asp.net mvc 3

I'm trying to implement faceting in a product catalogue app built with Solr, SolrNet and ASP.NET MVC 3. So far I have managed to list all the product results, but not the facets. I can print the facets as shown below.
<ul>
@foreach (var facet in Model.Products.FacetFields["brand"])
{
<li>@facet.Key (@facet.Value)</li>
}
</ul>
I have two issues with the above code:
1) If the search results don't contain facets for brand, it throws this error:
The given key was not present in the dictionary.
System.Collections.Generic.KeyNotFoundException:
The given key was not present in the
dictionary. at
System.Collections.Generic.Dictionary
2) I need to show facet keys and values as links, so that clicking a facet lists the products for that facet.
Here is the schema.xml. Please help me if you know the answers to the above questions.
<field name="product_id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true"/>
<field name="merchant" type="string" indexed="true" stored="true"/>
<field name="merchant_id" type="string" indexed="true" stored="true"/>
<field name="brand" type="string" indexed="true" stored="true"/>
<field name="brand_id" type="string" indexed="true" stored="true"/>
<field name="categories" type="string" multiValued="true" indexed="true" stored="true"/>
1) If the search results don't contain facets for brand, it throws this error: The given key was not present in the dictionary.
If you're not doing a facet field query on that field, just don't ask for it in the results.
2) I need to show facet keys and values as links, so that clicking a facet lists the products for that facet.
Basically you need to convert the clicked facet value into a filter query.
There are many ways to implement this, depending on your specific application needs. See the SolrNet sample app for one way to do it, and use its source code as guidance.
