DataStax Driver for Cassandra and Solr integration - JDBC

I'm trying to integrate Solr 6.6.0 and Cassandra 3.10. I tried the Cassandra JDBC driver and learned that it does not support the latest versions. Then I came across the DataStax driver, so right now I'm using DataStax cassandra-driver-core-3.1.4 as my Java driver JAR.
solrconfig.xml
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">sample-data-config.xml</str>
</lst>
</requestHandler>
I added the above lines.
sample-data-config.xml
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.datastax.driver.core.Connection" url="jdbc:cassandra://127.0.0.1:9042/demo" autoCommit="true"/>
<document name="content">
<entity name="test" query="SELECT id,org,name,dep,place,sal from tutor" autoCommit="true">
<field column="id" name="id" />
<field column="org" name="org" />
<field column="name" name="name" />
<field column="dep" name="dep" />
<field column="place" name="place" />
<field column="sal" name="sal" />
</entity>
</document>
</dataConfig>
Here, tutor is my table and demo is my keyspace.
managed-schema
<field name="org" type="string" indexed="true" stored="true" required="true" />
<field name="dep" type="string" indexed="true" stored="true" required="true" />
<field name="place" type="string" indexed="true" stored="true" required="true" />
<field name="sal" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" required="true"/>
I added the above lines to managed-schema.
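Because the import fails without any error message, one class of silent misconfiguration can be ruled out by cross-checking the two configs above: every DIH `column` should appear in the entity's SELECT list, and every target `name` should exist in the schema. The sketch below is a hypothetical helper (`check_dih` and `select_columns` are not part of Solr); its SELECT parsing only handles simple `SELECT ... FROM` queries, and it does not look at the `dataSource` line, so it cannot tell whether the configured driver class is usable by JdbcDataSource. The posted managed-schema lines do not include `id`; Solr's default configsets already define it, so the usage passes it via `extra_fields` as an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Copied from sample-data-config.xml in the question (dataSource element omitted).
DIH_CONFIG = """
<dataConfig>
  <document name="content">
    <entity name="test" query="SELECT id,org,name,dep,place,sal from tutor">
      <field column="id" name="id" />
      <field column="org" name="org" />
      <field column="name" name="name" />
      <field column="dep" name="dep" />
      <field column="place" name="place" />
      <field column="sal" name="sal" />
    </entity>
  </document>
</dataConfig>
"""

# Field lines from managed-schema in the question, wrapped in a dummy <fields> root.
SCHEMA_FIELDS = """
<fields>
  <field name="org" type="string" indexed="true" stored="true" required="true" />
  <field name="dep" type="string" indexed="true" stored="true" required="true" />
  <field name="place" type="string" indexed="true" stored="true" required="true" />
  <field name="sal" type="string" indexed="true" stored="true" required="true" />
  <field name="name" type="string" indexed="true" stored="true" required="true"/>
</fields>
"""


def select_columns(query):
    """Return the column names between SELECT and FROM (simple queries only)."""
    match = re.search(r"select\s+(.*?)\s+from\s", query, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("not a simple SELECT ... FROM query: " + query)
    return [col.strip() for col in match.group(1).split(",")]


def check_dih(config_xml, schema_xml, extra_fields=()):
    """Return human-readable problems; an empty list means no mismatch was found."""
    known = {f.get("name") for f in ET.fromstring(schema_xml.strip()).iter("field")}
    known |= set(extra_fields)
    problems = []
    for entity in ET.fromstring(config_xml.strip()).iter("entity"):
        selected = set(select_columns(entity.get("query")))
        for field in entity.findall("field"):
            column, name = field.get("column"), field.get("name")
            if column not in selected:
                problems.append(f"{entity.get('name')}: column {column!r} is not in the SELECT list")
            if name not in known:
                problems.append(f"{entity.get('name')}: Solr field {name!r} is not in the schema")
    return problems


print(check_dih(DIH_CONFIG, SCHEMA_FIELDS))                       # flags 'id' (not in the posted lines)
print(check_dih(DIH_CONFIG, SCHEMA_FIELDS, extra_fields={"id"}))  # assumes id comes from the default schema
```

For the posted configs the mappings line up, which points the investigation back at the `dataSource` definition rather than at the field mapping.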
JAR files added to the lib folder of my core:
cassandra-driver-core-3.1.4.jar
cassandra-driver-extras-3.1.4.jar
cassandra-driver-mapping-3.1.4.jar
Now, when I run the Solr DIH import, the core does not show any requests, nor does it show any error messages. I'm completely out of ideas.
Can these JAR files be used to connect Cassandra and Solr? If so, where am I going wrong? Otherwise, if an integrated solution is required, are we limited to DSE Search?
Thanks in advance.

Related

Solr - Requesting the last results with a sort parameter is slow

I indexed 2M+ documents, and when I request the last results ordered by name, the response is very slow (18 s).
Is there a way to optimize the sorting?
My request:
http://localhost:8983/solr/items/select?&q=*:*&start=2274000&rows=100&sort=name_sort asc&fl=id,name
Below are the field definitions in the schema.xml file:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- docValues are enabled by default for long type so we don't need to index the version field -->
<field name="_version_" type="plong" indexed="false" stored="false"/>
<field name="name" type="text_general" indexed="true" stored="true" multiValued="false" />
<field name="name_sort" type="string" indexed="true" stored="false" multiValued="false" />

Solr dataimport jdbc multiple columns into one field

I am trying to implement a Solr search for a project. Everything has been fine so far; a first simple version worked. Now I am trying to import from a Postgres database where multiple columns should end up in the same field. My config:
<entity name="address" query="SELECT objectid, ags2, ags3, ags5, ags8, ags11, ags20, ags22, pt, stn, hnr_min, hnr_max, plz, ort, ortz, ot1, ot2 FROM variablen2018.ags22_tmp_solr LIMIT 10000;">
<field column="objectid" name="id" />
<field column="plz" name="plz" />
<field column="ort" name="ort" />
<field column="ortz" name="ort" />
<field column="ot1" name="ort" />
<field column="ot2" name="ort" />
<field column="ort" name="ort_res" />
<field column="stn" name="stn" />
<field column="stn" name="stn_res" />
<field column="ags2" name="ags2" />
<field column="ags3" name="ags3" />
<field column="ags5" name="ags5" />
<field column="ags8" name="ags8" />
<field column="ags11" name="ags11" />
<field column="ags20" name="ags20" />
<field column="ags22" name="ags22" />
<field column="pt" name="coord" />
<field column="hnr_min" name="hnr_min" />
<field column="hnr_max" name="hnr_max" />
</entity>
As you can see, four columns from the DB (ort, ortz, ot1, ot2) go into one field (ort). Most of the time only one of these columns is populated, in which case the document is indexed normally. But when several of them actually have values, indexing the document fails. The field is defined this way:
<field name="ort" type="text_de" uninvertible="true" indexed="true" required="true" stored="true"/>
DataImportHandler maps the query's result view to the schema view, so I don't think you can map multiple columns to one field. Instead, you can assign each column to its own Solr field and then copy them into a combined field in your schema.
For example:
<field name="ort" type="string" />
<field name="ortz" type="string" />
<field name="ot1" type="string" />
<field name="ot2" type="string" />
<field name="ortCombined" type="string" multiValued="true"/>
<copyField source="ort" dest="ortCombined" />
<copyField source="ortz" dest="ortCombined" />
<copyField source="ot1" dest="ortCombined" />
<copyField source="ot2" dest="ortCombined" />
Hope this helps!
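The difference between the question's mapping and the copyField approach can be illustrated with a toy model in plain Python (not Solr's implementation; `index_row` and the sample row are hypothetical). The single-value check stands in for Solr rejecting more than one value for a field that is not multiValued, which is the most likely reason the question's documents fail only when several columns are populated. Treating NULL columns as absent is an assumption of the sketch.

```python
# Toy model in plain Python; it is not Solr's implementation.

def index_row(row, mapping, copy_fields=(), multivalued=frozenset()):
    """Build the field -> values map for one DB row.

    mapping      -- (column, solr_field) pairs, like DIH <field column=... name=...>
    copy_fields  -- (source, dest) pairs, like <copyField source=... dest=...>
    multivalued  -- names of fields declared multiValued="true"
    """
    doc = {}
    for column, field in mapping:
        value = row.get(column)
        if value is not None:  # assumption: NULL columns are simply left out
            doc.setdefault(field, []).append(value)
    for source, dest in copy_fields:
        for value in list(doc.get(source, [])):
            doc.setdefault(dest, []).append(value)
    for field, values in doc.items():
        if len(values) > 1 and field not in multivalued:
            raise ValueError(f"multiple values for non-multiValued field {field!r}")
    return doc


COLUMNS = ("ort", "ortz", "ot1", "ot2")
QUESTION_MAPPING = [(c, "ort") for c in COLUMNS]            # all four columns -> ort
ANSWER_MAPPING = [(c, c) for c in COLUMNS]                   # one Solr field per column
ANSWER_COPY_FIELDS = [(c, "ortCombined") for c in COLUMNS]   # the <copyField> lines

row = {"ort": "Musterstadt", "ortz": None, "ot1": "Nord", "ot2": None}  # hypothetical row

print(index_row(row, ANSWER_MAPPING, ANSWER_COPY_FIELDS, {"ortCombined"}))
try:
    index_row(row, QUESTION_MAPPING)
except ValueError as err:
    print("question's mapping:", err)
```

With only one populated column, the question's mapping yields a single value and passes, which matches the behaviour described in the question.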
You can do it this way: concatenate all values into a single value in the SELECT:
SELECT ..., ort||','||ortz||','||ot1||','||ot2 AS ort_all FROM variablen2018.ags22_tmp_solr
and then split it back into individual values when indexing into Solr (this is done with RegexTransformer/splitBy):
<entity name="address" transformer="RegexTransformer" query="...">
...
<field column="ort_all" name="ort" splitBy=","/>
</entity>
Things to watch out for:
handle possible NULLs (check concat_ws, etc.)
handle possible commas inside ort values (use another separator, replace the commas, etc.)
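Both caveats can be seen in a small simulation, assuming PostgreSQL semantics: `a || b` is NULL as soon as one operand is NULL, while `concat_ws` skips NULL arguments. `split_by` only approximates DIH's RegexTransformer `splitBy`, which takes a regular expression (so a `|` separator must be escaped); dropping empty pieces is a choice of this sketch. All function names and sample values are hypothetical.

```python
import re


def pg_pipe_concat(values, sep=","):
    """Mimic PostgreSQL a || sep || b: a single NULL operand makes the result NULL."""
    if any(v is None for v in values):
        return None
    return sep.join(values)


def pg_concat_ws(values, sep=","):
    """Mimic PostgreSQL concat_ws(sep, ...): NULL arguments are skipped."""
    return sep.join(v for v in values if v is not None)


def split_by(value, pattern=","):
    """Approximate DIH RegexTransformer splitBy: split on a regex, one Solr value per piece.
    Empty pieces are dropped here (a choice of this sketch)."""
    if value is None:
        return []
    return [piece for piece in re.split(pattern, value) if piece]


row = ["Musterstadt", None, "Nord", None]    # hypothetical ort, ortz, ot1, ot2
print(pg_pipe_concat(row))                   # None: the whole ort_all value is lost
print(split_by(pg_concat_ws(row)))           # ['Musterstadt', 'Nord']

tricky = ["Halle (Saale), Stadt", "Neustadt"]
print(split_by(pg_concat_ws(tricky)))              # 3 pieces: the comma inside a value splits it
print(split_by(pg_concat_ws(tricky, "|"), r"\|"))  # ['Halle (Saale), Stadt', 'Neustadt']
```

Note that `splitBy` produces several values for one field, so the target field presumably has to be multiValued as well.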

Query Cassandra data with Apache Solr 6.5.0

I am new to Solr. I am using Solr 6.5.0 and I want to integrate it with Cassandra (details: [cqlsh 4.1.1 | Cassandra 2.0.17 | CQL spec 3.1.1 | Thrift protocol 19.39.0]).
The core name is test_core. I changed solrconfig.xml and managed-schema for test_core so that it can accept data from Cassandra, but after I restart Solr to apply the changes to test_core, I get the error "SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url".
Below are the changes I made:
solrconfig.xml
<lib dir="/home/retailteg/solr-6.5.0" regex="solr-dataimporthandler-6.5.0.jar" />
<lib dir="/home/retailteg/solr-6.5.0" regex="cassandra-all-1.2.5.jar" />
<lib dir="/home/retailteg/solr-6.5.0" regex="cassandra-thrift-1.2.5.jar" />
...................................................
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">managed-schema</str>
</lst>
</requestHandler>
managed-schema
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="email" type="string" indexed="true" stored="true" required="true"/>
<field name="name" type="string" indexed="true" stored="true" required="true"/>
<field name="password" type="string" indexed="true" stored="true" required="true"/>
My Cassandra table looks like this:
id(String), email(String), name(String), password(String)

DataStax Enterprise 5.0.3 Solr High CPU and High I/O Usage On One Node While Others Idle

One of the Solr nodes in my Solr datacenter causes high CPU and high I/O. Somehow, it also affects my Solr queries to the other Solr nodes. When I stop the node that causes the high CPU, my queries respond in normal time.
Thanks to jvmtop (https://github.com/patric-r/jvmtop) I was able to get the data below. I also see the node indexing something, but the other nodes in the Solr datacenter have no load: their load is 0-2, while the failing node has a load of 20.
TID   NAME                          STATE     CPU     TOTALCPU
1273  http-serveripaddress-8983-5   RUNNABLE  51.16%  1.87%
3821  http-serveripaddress-8983-14  RUNNABLE  50.41%  0.95%
1259  http-serveripaddress-8983-2   RUNNABLE  48.68%  2.49%
1295  http-serveripaddress-8983-7   RUNNABLE  48.10%  1.87%
3825  http-serveripaddress-8983-18  RUNNABLE  14.15%  0.75%
1308  http-serveripaddress-8983-9   RUNNABLE  14.13%  4.44%
3486  http-serveripaddress-8983-11  RUNNABLE  13.38%  1.04%
1258  http-serveripaddress-8983-1   RUNNABLE  12.52%  2.03%
1264  http-serveripaddress-8983-4   RUNNABLE  12.07%  1.68%
1296  http-serveripaddress-8983-8   RUNNABLE  12.04%  3.75%
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="org.apache.solr.schema.TrieDoubleField" name="TrieDoubleField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
<fieldType class="org.apache.solr.schema.TrieDateField" name="TrieDateField"/>
<fieldType class="org.apache.solr.schema.TrieLongField" name="TrieLongField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="st" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="twd" stored="true" type="TrieDoubleField"/>
<field indexed="true" multiValued="false" name="ctr" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="us" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="tsb" stored="true" type="TrieDoubleField"/>
<field indexed="true" multiValued="false" name="btrg" stored="true" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="cty" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="hc" stored="true" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="isp" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="cnt" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="scid" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="cip" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="sid" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="pd" stored="true" type="TrieDateField"/>
<field indexed="true" multiValued="false" name="uid" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="lfn" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="devg" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="str" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="tcc" stored="true" type="TrieLongField"/>
<field indexed="true" multiValued="false" name="strg" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="dev" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="lfs" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" name="cid" stored="true" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="btr" stored="true" type="TrieIntField"/>
</fields>
<uniqueKey>(cid,pd,scid,lfn,lfs,uid,sid,cip,strg,str,st,btrg,btr,hc,us,devg,dev,cnt,ctr,cty,isp)</uniqueKey>
</schema>
I would guess that your partitioning is creating hotspots in your data. A common example is bucketing by day or hour when loading time-series data: the net effect is that only one node at a time is used during each bucket period.
The other thing to look at is the value of max_solr_concurrency_per_core. The default can be too high; I'd normally recommend dropping it to 2 and then gradually increasing it until the server maxes out. What are your server hardware specs like, in terms of memory, CPUs and disks?
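The bucketing effect described in the answer can be seen in a toy model. It is a sketch under loud assumptions: a hypothetical 4-node datacenter and an MD5-based `owner` function instead of Cassandra's Murmur3 tokens, vnodes and replication. The only property it keeps is that equal partition keys always land on the same node, which is enough to show why a pure day bucket sends every write of that day to one node while a compound key spreads them.

```python
import hashlib
from collections import Counter
from datetime import datetime, timedelta

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical 4-node datacenter


def owner(partition_key, nodes=NODES):
    """Toy placement: hash the partition key and pick one node.
    Real Cassandra/DSE uses Murmur3 tokens, vnodes and replicas; the only property
    kept here is that equal partition keys always land on the same node."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def load_per_node(events, partition_key_fn):
    """Count how many writes each node receives for a given partition-key scheme."""
    return Counter(owner(partition_key_fn(e)) for e in events)


def day_bucket(event):
    """Partition key = day only: every write of the day shares one key."""
    return event["ts"].strftime("%Y-%m-%d")


def source_and_day(event):
    """Partition key = source + day: one key per source per day."""
    return f"{event['source']}:{event['ts']:%Y-%m-%d}"


# One hour of hypothetical time-series events from 50 sources, one event per second.
start = datetime(2017, 1, 1, 10, 0, 0)
events = [{"source": f"s{i % 50}", "ts": start + timedelta(seconds=i)} for i in range(3600)]

print(load_per_node(events, day_bucket))      # all 3600 writes on a single node
print(load_per_node(events, source_and_day))  # writes spread over the nodes
```

If the schema above is loaded with a time-bucketed partition key, this would be consistent with one busy node while the others idle, but the question does not show the table's primary key, so that remains a guess.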

How to configure Doctrine entities to handle compressed blobs in the database?

I'm using Doctrine 2 as the ORM for the database and I'm having an issue with compressed blobs.
I'm storing text in a compressed blob column in the database. How can I specify this in the entity mapping XML config? I'm currently using type="blob" for this column, but it doesn't return a string. I could use type="text", but that returns garbage because the data isn't decompressed.
Can I specify somewhere in my entity config that this text needs decompressing on retrieval and compressing on persisting?
Here's my entity configuration:
<doctrine-mapping xmlns="http://doctrine-project.org/schemas/orm/doctrine-mapping"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://doctrine-project.org/schemas/orm/doctrine-mapping
http://doctrine-project.org/schemas/orm/doctrine-mapping.xsd">
<entity name="AccountNote" table="tblAccountNote">
<id name="intAccountNoteId" type="integer">
<generator strategy="AUTO" />
</id>
<field name="intAccountId" type="integer" nullable="false" unique="no" />
<field name="bolHiddenNote" type="boolean" nullable="false" unique="no" />
<field name="binNote" type="blob" nullable="false" unique="no" />
<field name="strHash" type="string" length="32" nullable="true" unique="no" />
<field name="dtmCreated" type="datetime" nullable="false" unique="no" />
<field name="stmTimestamp" type="datetime" nullable="false" unique="no" />
<many-to-one field="objAccount" target-entity="Account" inversed-by="objNotes">
<join-column name="intAccountId" referenced-column-name="intAccountId" />
</many-to-one>
</entity>
</doctrine-mapping>
In the end, we decided to handle the compression in the getter and setter and removed it from the schema.
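A minimal sketch of that getter/setter pattern, written in Python for consistency with the other examples here (the real entity would be PHP). `AccountNote` and `_bin_note` are hypothetical stand-ins for the entity and its `binNote` blob column: the stored attribute only ever holds compressed bytes, and the accessor pair does the (de)compression so the mapping itself needs no knowledge of it. It assumes the zlib format; PHP's gzcompress produces that format, but the question does not say how the existing data was compressed. Doctrine DBAL's blob type hydrates to a PHP stream resource, which is consistent with type="blob" "not returning a string"; in PHP the getter would presumably read it with stream_get_contents before calling gzuncompress.

```python
import zlib


class AccountNote:
    """Hypothetical Python analogue of the AccountNote entity.

    _bin_note plays the role of the mapped binNote blob column: it only ever holds
    compressed bytes. The note property is the getter/setter pair that does the
    (de)compression, so the mapping itself stays a plain blob."""

    def __init__(self, text=""):
        self.note = text  # goes through the setter below

    @property
    def note(self):
        # Getter: decompress on retrieval.
        return zlib.decompress(self._bin_note).decode("utf-8")

    @note.setter
    def note(self, text):
        # Setter: compress before the value is persisted.
        self._bin_note = zlib.compress(text.encode("utf-8"))


text = "Customer called about invoice 42. " * 20
note = AccountNote(text)
print(len(text.encode("utf-8")), "->", len(note._bin_note), "bytes")
print(note.note == text)
```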
