Solr DataImportHandler Cache Support for Multiple Values

I'm trying to use a cache for some entities in my DataImportHandler configuration. When caching is enabled, I only get the first value of my multivalued field. My configuration looks like this:
<entity name="product" query="SELECT product_id FROM Product WHERE 1">
<entity name="strength" query="SELECT *
FROM Strength WHERE product_id = '${product.product_id}'">
<entity name="form" query="SELECT CONCAT(parent_route,'|',form_name) AS form_name, LOWER(CONCAT_WS('\n',form_name,parent_route)) AS form_name_s,
CAST(form_id AS CHAR(10)) AS form_id_string FROM Form WHERE form_id = '${strength.form_id}'"
transformer="RegexTransformer"
cacheImpl="SortedMapBackedCache" cacheLookup="strength.form_id" cacheKey="form_id_string">
<field column="form_name" name="form_name" />
<field column="form_name_s" splitBy="\n" />
</entity>
</entity>
</entity>
There should be two rows returned for the entity "form", but only the first one is visible when the cache is enabled. Can Solr not cache multiple rows, or am I doing something wrong? My Solr version is 4.1.

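The problem is fixed by removing the WHERE clause from the cached query. The cached query then loads the whole Form table once, and cacheKey/cacheLookup match each strength row to its forms. I'm not sure the following configuration is ideal, but as I understand it, the aim of the cache is to reduce the number of queries.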
<entity name="product" query="SELECT product_id FROM Product WHERE 1">
<entity name="strength" query="SELECT *
FROM Strength WHERE product_id = '${product.product_id}'">
<entity name="form" query="SELECT CONCAT(parent_route,'|',form_name) AS form_name, LOWER(CONCAT_WS('\n',form_name,parent_route)) AS form_name_s,
CAST(form_id AS CHAR(10)) AS form_id_string FROM Form"
transformer="RegexTransformer"
cacheImpl="SortedMapBackedCache" cacheLookup="strength.form_id" cacheKey="form_id_string">
<field column="form_name" name="form_name" />
<field column="form_name_s" splitBy="\n" />
</entity>
</entity>
</entity>

Related

Fetch XML returns 0 records where there is no related field

I'm using FetchXML to retrieve some field values for a few given IDs. The problem is that if I request a related field, and that field does not have a value, no records are returned.
For example, with the following FetchXML, the accounts for the given IDs do exist, but because they do not have a parent account, no records are returned.
<fetch mapping="logical">
<entity name="account">
<attribute name="name" />
<attribute name="ownerid" />
<link-entity name="account" to="parentaccountid" alias="parentaccountid">
<attribute name="name" />
</link-entity>
<filter>
<condition attribute="accountid" operator="in">
<value>9c8539fd-f7b1-e811-a973-000d3af4a510</value>
<value>be76ea1b-f8b1-e811-a973-000d3af4a510</value>
<value>1e76ea1b-f8b1-e811-a973-000d3af4a510</value>
<value>50843103-f8b1-e811-a973-000d3af4a510</value>
<value>b983ea1b-f8b1-e811-a973-000d3af4a510</value>
</condition>
</filter>
</entity>
</fetch>
Is there something I need to add to the link-entity to indicate that if it is null to still return the rest of the values?
Make the join an outer join if that field will be null for some records. The default link type is inner, which drops accounts that have no parent account:
<link-entity name="account" to="parentaccountid" alias="parentaccountid" link-type="outer">
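For completeness, a sketch of the full query from the question with only the link type changed (untested; the account IDs are the ones from the question). With an outer join, the matching accounts are returned even when they have no parent account, and the aliased parent name is simply empty for those rows.

```xml
<fetch mapping="logical">
  <entity name="account">
    <attribute name="name" />
    <attribute name="ownerid" />
    <!-- outer join: keep accounts whose parentaccountid is empty -->
    <link-entity name="account" to="parentaccountid" alias="parentaccountid" link-type="outer">
      <attribute name="name" />
    </link-entity>
    <filter>
      <condition attribute="accountid" operator="in">
        <value>9c8539fd-f7b1-e811-a973-000d3af4a510</value>
        <value>be76ea1b-f8b1-e811-a973-000d3af4a510</value>
        <value>1e76ea1b-f8b1-e811-a973-000d3af4a510</value>
        <value>50843103-f8b1-e811-a973-000d3af4a510</value>
        <value>b983ea1b-f8b1-e811-a973-000d3af4a510</value>
      </condition>
    </filter>
  </entity>
</fetch>
```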

fetchXml from CRM - is it possible to fetch just the "last" record for a linked entity?

I tried using top="1" and a descending order in the link-entity element, but I still get back multiple joined records.
<fetch version="1.0" mapping="logical" >
<entity name="x" >
<attribute name="xid" />
<link-entity alias="d" top="1" name="t" from="xid" to="xid" link-type="outer">
...
<order attribute="xdate" descending="true" />
</link-entity>
</entity>
</fetch>
You can't limit the number of linked records returned. Instead, you could make link-entity d the root entity and entity x the link-entity, so that you can use top as intended (as an attribute of the fetch element).
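A sketch of that inversion, reusing the entity names t and x and the attribute names xid and xdate from the question. The filter value is a placeholder GUID, not a real record, and the query is untested. Because top sits on the fetch element, it limits the whole result to the single newest t record, so the filter on xid is what turns it into "the last t for this particular x".

```xml
<fetch version="1.0" mapping="logical" top="1">
  <!-- t (formerly link-entity "d") is now the root entity -->
  <entity name="t">
    <attribute name="xdate" />
    <!-- newest t record first, so top="1" keeps the latest one -->
    <order attribute="xdate" descending="true" />
    <!-- x is now the linked entity: x.xid = t.xid -->
    <link-entity name="x" from="xid" to="xid" alias="x">
      <attribute name="xid" />
    </link-entity>
    <filter>
      <!-- placeholder GUID: restrict to the x record you care about -->
      <condition attribute="xid" operator="eq" value="00000000-0000-0000-0000-000000000000" />
    </filter>
  </entity>
</fetch>
```

Since top applies to the whole result, this returns one record per query; getting the latest t for several x records this way means running it once per x.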

Cassandra Solr integration with DataImportHandler using the CQL driver: getting "Frame size larger than max length (16384000)!"

I am trying to use DataImportHandler to integrate Cassandra and Solr using org.apache.cassandra.cql.jdbc.CassandraDriver.
I am able to fetch 20000 rows, but when it tries to fetch all rows it fails with "Caused by: org.apache.thrift.transport.TTransportException: Frame size (16402604) larger than max length (16384000)!"
My data-config file:
<dataConfig>
<dataSource autoCommit="true" driver="org.apache.cassandra.cql.jdbc.CassandraDriver" url="jdbc:cassandra://127.0.0.1/test_new" />
<document name="products">
<entity name="testproducts" query="select * from products LIMIT 20015">
<field name="id" column="product_id"/>
<field name="productId" column="product_id"/>
<field name="productPrice" column="sale_price" />
<field name="productSource" column="source"/>
<field name="productMrpPrice" column="mrp_price"/>
<entity name="productrating" query="select * from product_reviews where product_id='${testproducts.product_id}'">
<field name="productRating" column="rating" />
<field name="productReview" column="review" />
<field name="customerId" column="customer_id" />
<field name="customerName" column="customer_name" />
</entity>
</entity>
</document>
</dataConfig>
How can I increase the maximum frame size in the CQL JDBC driver?
How can I import all rows using the CQL JDBC driver?

Can Solr join tables in-memory?

There is a table of n products and a table of features of these products; each product has many features. Given this Solr DataImportHandler configuration:
<document name="products">
<entity name="item" query="select id, name from item">
<field column="ID" name="id" />
<field column="NAME" name="name" />
<entity name="feature"
query="select feature_name, description from feature where item_id='${item.ID}'">
<field name="feature_name" column="feature_name" />
<field name="description" column="description" />
</entity>
</entity>
</document>
Solr will run n + 1 queries to fetch this data: one for the main query and n to fetch the features. This is inefficient for large numbers of items. Is it possible to configure Solr to run these queries separately and join the results in memory instead? All rows from both tables will be fetched anyway.
This can be done using CachedSqlEntityProcessor:
<document name="products">
<entity name="item" query="select id, name from item">
<field column="ID" name="id" />
<field column="NAME" name="name" />
<entity name="feature"
query="select item_id, feature_name, description from feature"
cacheKey="item_id"
cacheLookup="item.ID"
processor="CachedSqlEntityProcessor">
<field name="feature_name" column="feature_name" />
<field name="description" column="description" />
</entity>
</entity>
</document>
Since Solr's index is flat, feature_name and description are not connected in any way; each product will have a multivalued field for each of them.
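In newer DataImportHandler versions (for example the Solr 4.1 setup in the first question above), the same cached join can, as far as I know, also be declared on the plain SqlEntityProcessor with the cacheImpl attribute, which is the style the first question uses. A sketch of the feature entity in that form, nested inside item as before and untested. The query deliberately has no WHERE clause, so the feature table is read once and matched to each item through cacheKey/cacheLookup.

```xml
<entity name="feature"
        processor="SqlEntityProcessor"
        query="select item_id, feature_name, description from feature"
        cacheImpl="SortedMapBackedCache"
        cacheKey="item_id"
        cacheLookup="item.ID">
  <!-- no ${item.ID} in the query: the whole table is cached once -->
  <field name="feature_name" column="feature_name" />
  <field name="description" column="description" />
</entity>
```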
I am not sure whether Solr can do this, but the database can. Assuming you are using MySQL, use JOIN and GROUP_CONCAT to turn this into a single query. The query should look something like this (DESC is a reserved word in MySQL, so the alias needs a different name):
SELECT item.id, item.name, GROUP_CONCAT(feature.description) AS descriptions FROM item INNER JOIN feature ON (feature.item_id = item.id) GROUP BY item.id
Don't forget to use the RegexTransformer on descriptions to separate out the multiple values; GROUP_CONCAT separates them with a comma by default.
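A sketch of how the single-query approach could look in data-config.xml, assuming MySQL and the item/feature tables from the question (untested). It uses GROUP_CONCAT with an explicit '|' separator, in case descriptions contain commas, and the RegexTransformer's splitBy together with sourceColName to turn the concatenated string back into a multivalued description field. The alias descriptions is a name chosen for this example.

```xml
<document name="products">
  <!-- one query instead of n + 1; RegexTransformer does the split -->
  <entity name="item"
          transformer="RegexTransformer"
          query="SELECT item.id, item.name,
                        GROUP_CONCAT(feature.description SEPARATOR '|') AS descriptions
                 FROM item
                 INNER JOIN feature ON feature.item_id = item.id
                 GROUP BY item.id, item.name">
    <field column="id" name="id" />
    <field column="name" name="name" />
    <!-- splitBy is a Java regex, so the pipe has to be escaped -->
    <field column="description" sourceColName="descriptions" splitBy="\|" />
  </entity>
</document>
```

Two caveats: MySQL truncates GROUP_CONCAT results to group_concat_max_len (1024 bytes by default), so long feature lists may need that setting raised; and the INNER JOIN drops items that have no features, which a LEFT JOIN would keep.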

Accessing ancestor values in xpath with Solr DataImportHandler

If my XML is structured like this:
<fruit>
<apple appleId="apple_1">
<core coreId="core_1">
<seed>1</seed>
<seed>2</seed>
</core>
</apple>
<apple appleId="apple_2">
<core coreId="core_1">
<seed>1</seed>
</core>
</apple>
</fruit>
and I want the seeds to be the documents in my Solr index, how can I access appleId and coreId?
Here's the pertinent entity definition from my data-config.xml:
<entity name="apples"
processor="XPathEntityProcessor"
stream="true"
forEach="/fruit/apple/core/seed"
url="fruit.xml"
transformer="script:create_id"
>
<field column="seed_s" xpath="/fruit/apple/core/seed" />
<field column="apple_id_s" xpath="/fruit/apple/@appleId" />
</entity>
script:create_id creates a unique id for each seed.
In this example, apple_id_s is coming back as null.
I found the problem. I need to use commonField="true" and make sure to loop through each apple and core. I also need to set pk="seed_s", which triggers Solr to store the document.
Here's my new entity definition:
<entity name="apples"
processor="XPathEntityProcessor"
stream="true"
pk="seed_s"
forEach="/fruit/apple/core/seed | /fruit/apple | /fruit/apple/core"
url="fruit.xml"
transformer="script:create_id"
>
<field column="seed_s" xpath="/fruit/apple/core/seed" />
<field column="apple_id_s" xpath="/fruit/apple/@appleId" commonField="true"/>
<field column="core_id_s" xpath="/fruit/apple/core/@coreId" commonField="true"/>
</entity>
