Using Carrot2 API with Java ComponentInitializationException: Could not instantiate component class - carrot2

I'm trying to write a prototype for a project that uses Carrot2 from Java as a metasearch engine over several sources, such as Bing and Google.
I have a Maven project with the dependency:
I'm trying to run the following:
/* A controller to manage the processing pipeline. */
Controller controller = ControllerFactory.createSimple();
/* Input data for clustering, the query and number of results in this case. */
Map<String, Object> attributes = new HashMap<String, Object>();
attributes.put(AttributeNames.QUERY, "sugar");
attributes.put(AttributeNames.RESULTS, 100);
/* Perform processing */
ProcessingResult result = controller.process(attributes,
Bing3DocumentSource.class, LingoClusteringAlgorithm.class);
/* Documents fetched from the document source, clusters created by Carrot2. */
List<Document> documents = result.getDocuments();
List<Cluster> clusters = result.getClusters();
What I get is:
Exception in thread "main" org.carrot2.core.ComponentInitializationException: Could not instantiate component class:
at org.carrot2.core.SimpleProcessingComponentManager.prepare(
at org.carrot2.core.Controller.process(
at org.carrot2.core.Controller.process(
at com.jbaysolutions.metasearch.Test.main(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at com.intellij.rt.execution.application.AppMain.main(
Caused by: java.lang.InstantiationException:
at java.lang.Class.newInstance(
at org.carrot2.core.SimpleProcessingComponentManager.prepare(
... 8 more
Am I using the API correctly? I've gone over the Carrot2 documentation, but it says very little about how to use the API, and the examples don't seem to work.
I could really use some help here.

Answer from Dawid Weiss on the carrot2 mailing list:
You're trying to instantiate an abstract class. Won't fly unless
you're Chuck Norris.
Why not look at the examples distributed with the project? There is
an example that uses Bing there.
All the examples are here, packaged and ready:
If you're planning to use Bing make sure you use your own appkey,
please (and thanks).
The part in question is:
ProcessingResult result = controller.process(attributes,
        Bing3DocumentSource.class, LingoClusteringAlgorithm.class);
That should instead read:
ProcessingResult result = controller.process(attributes,
        Bing3WebDocumentSource.class, LingoClusteringAlgorithm.class);
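To illustrate why passing an abstract class fails, here is a minimal sketch in plain Java. It does not use Carrot2 at all: BaseSource, WebSource and AbstractInstantiationDemo are hypothetical names, and the reflective call only approximates what a component manager might do when it is handed a class object.

```java
// Hypothetical stand-ins for an abstract base source and a concrete subclass.
abstract class BaseSource {
    abstract String name();
}

class WebSource extends BaseSource {
    @Override
    String name() {
        return "web";
    }
}

class AbstractInstantiationDemo {
    /** Tries to create an instance reflectively from a class object. */
    static String tryInstantiate(Class<?> type) {
        try {
            Object instance = type.getDeclaredConstructor().newInstance();
            return "created " + instance.getClass().getSimpleName();
        } catch (InstantiationException e) {
            // Thrown when the class is abstract and therefore cannot be instantiated.
            return "InstantiationException";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(BaseSource.class)); // abstract class
        System.out.println(tryInstantiate(WebSource.class));  // concrete subclass
    }
}
```

The abstract class yields an InstantiationException, which matches the "Caused by" line in the stack trace above, while the concrete subclass is created normally.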


JMeter ConcurrencyThreadGroup object creation throws java.lang.ExceptionInInitializerError error

I am trying to upgrade my JMeter DSL implementation to the latest JMeter version (5.4.3), but creating a ConcurrencyThreadGroup object throws an exception (shown below).
I am using the following versions:
JMeter 5.4.3
jmeter-plugins-standard 1.4.0
Method implementation:
public ConcurrencyThreadGroup getConcurrencyThreadGroup(String name, String targetConcurrency,
        String rampUpTime, String rampUpStepCount, String timeUnit, String holdTargetTime,
        boolean setEnabled
) {
    ConcurrencyThreadGroup concurrencyThreadGroup = new ConcurrencyThreadGroup();
    concurrencyThreadGroup.setProperty("TestElement.test_class", ConcurrencyThreadGroup.class.getName());
    concurrencyThreadGroup.setProperty("TestElement.gui_class", ConcurrencyThreadGroupGui.class.getName());
    return concurrencyThreadGroup;
}
I observe the following exception when I try to execute it:
at org.apache.jmeter.reporters.ResultCollector.<init>(
at org.apache.jmeter.reporters.ResultCollector.<init>(
at com.blazemeter.jmeter.reporters.FlushingResultCollector.<init>(
at com.blazemeter.jmeter.threads.AbstractDynamicThreadGroupModel.<init>(
at com.blazemeter.jmeter.threads.AbstractDynamicThreadGroup.<init>(
at com.blazemeter.jmeter.threads.concurrency.ConcurrencyThreadGroup.<init>(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.testng.internal.MethodInvocationHelper.invokeMethod(
at org.testng.internal.TestInvoker.invokeMethod(
at org.testng.internal.TestInvoker.invokeTestMethod(
at org.testng.internal.MethodRunner.runInSequence(
at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(
at org.testng.internal.TestInvoker.invokeTestMethods(
at org.testng.internal.TestMethodWorker.invokeTestMethods(
at java.util.ArrayList.forEach(
at org.testng.TestRunner.privateRun(
at org.testng.SuiteRunner.runTest(
at org.testng.SuiteRunner.runSequentially(
at org.testng.SuiteRunner.privateRun(
at org.testng.SuiteRunnerWorker.runSuite(
at org.testng.TestNG.runSuitesSequentially(
at org.testng.TestNG.runSuitesLocally(
at org.testng.TestNG.runSuites(
at com.intellij.rt.testng.RemoteTestNGStarter.main(
Caused by: java.lang.NullPointerException
at org.apache.jmeter.samplers.SampleSaveConfiguration.<clinit>(
... 38 more
I would appreciate any clue or solution for this issue.
You should show your full code and the full stack trace, as the partial versions unfortunately don't tell the whole story.
Most probably you didn't load the JMeter properties, which are responsible for the results file configuration, so I expect you need to call the following function somewhere near the beginning of your code:
More information:
JMeter API
Five Ways To Launch a JMeter Test without Using the JMeter GUI
jmeter-from-code example project
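The failure mode described above (a NullPointerException inside a static initializer surfacing as ExceptionInInitializerError) can be reproduced in plain Java. The sketch below is only an analogy and does not use JMeter: AppProperties, SaveConfig and StaticInitDemo are hypothetical names, and the assumption is that a class reads shared properties in its static initializer before anything has loaded them.

```java
import java.util.Properties;

// Hypothetical stand-in for a framework-wide properties holder.
class AppProperties {
    static Properties props; // stays null until load() is called

    static void load() {
        props = new Properties();
        props.setProperty("save.format", "csv");
    }
}

// Hypothetical class whose static initializer reads the shared properties.
class SaveConfig {
    // Runs once, when the class is first used; throws NullPointerException if props is null.
    static final String FORMAT = AppProperties.props.getProperty("save.format");
}

class StaticInitDemo {
    // A class is initialized at most once, so the outcome of the first use is cached.
    private static String firstUse;

    /** Touches SaveConfig without loading the properties first and reports the outcome. */
    static synchronized String useWithoutLoading() {
        if (firstUse == null) {
            try {
                firstUse = "format=" + SaveConfig.FORMAT;
            } catch (ExceptionInInitializerError e) {
                firstUse = "ExceptionInInitializerError caused by "
                        + e.getCause().getClass().getSimpleName();
            }
        }
        return firstUse;
    }

    public static void main(String[] args) {
        System.out.println(useWithoutLoading());
    }
}
```

In this sketch, calling AppProperties.load() before SaveConfig is first touched would avoid the error, which is the same ordering requirement the answer points at: load the configuration before constructing anything that depends on it.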

Build a Custom Tokenizer for elasticsearch

I'm building a custom tokenizer in response to this: Performance of doc_values field vs analysed field
None of this API appears to be documented (?), so I'm going off code samples from other plugins/tokenizers, but when I restart Elasticsearch after deploying my tokenizer I constantly get this error in the logs:
[2017-09-20 08:45:37,412][WARN ][indices.cluster ] [Samuel Silke] [[storm-crawler-2017-09-11][3]] marking and sending shard failed due to [failed to create index]
[storm-crawler-2017-09-11] IndexCreationException[failed to create index]; nested: CreationException[Guice creation errors:
1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
at _unknown_
1 error];
at org.elasticsearch.indices.IndicesService.createIndex(
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(
at org.elasticsearch.cluster.service.InternalClusterService$
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.elasticsearch.common.inject.CreationException: Guice creation errors:
1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
at _unknown_
1 error
at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(
at org.elasticsearch.indices.IndicesService.createIndex(
... 9 more
My tokenizer is built for v2.3.4, and the TokenizerFactory looks like this:
public class UrlTokenizerFactory extends AbstractTokenizerFactory {
    public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings.getSettings(), name, settings);
    }
    public Tokenizer create() {
        return new UrlTokenizer();
    }
}
I genuinely don't know what I'm doing wrong. Have I deployed it incorrectly? It appears to be using my classes according to the logs...
I've only deployed it to one of my es nodes (4-node cluster). The /_cat/plugins?v endpoint gives this:
name component version type url
Samuel Silke urltokenizer j
As there's little or no documentation on this process, I've got this far by copying constructs as created in plugins by other people.
The error I'm seeing doesn't make sense. My TokenizerFactory looks just like everyone else's for this version of elastic. What am I doing wrong or, possibly, not doing that I should be to make this work?
It turns out I was missing an Environment parameter in the constructor. It should have been this:
public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, Environment env, @Assisted String name, @Assisted Settings settings){
I found a similar one here in the end:
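As a side note on the error message itself, the rule it quotes (exactly one annotated constructor, or a non-private zero-argument constructor) can be sketched with plain reflection. This is not Guice or Elasticsearch code: the Inject annotation below is a hypothetical local marker, and AnnotatedFactory, PlainFactory, DefaultFactory and ConstructorRuleDemo are made-up names used only to illustrate the rule as worded in the log.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

// Hypothetical local marker; it only mimics the role of an injection annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.CONSTRUCTOR)
@interface Inject {}

class AnnotatedFactory {
    @Inject
    AnnotatedFactory(String name) {}
}

class PlainFactory {
    PlainFactory(String name) {} // no annotation and no zero-argument constructor
}

class DefaultFactory {
    DefaultFactory() {}
}

class ConstructorRuleDemo {
    /** Returns the constructor a container could use, or null if the rule is not met. */
    static Constructor<?> findInjectable(Class<?> type) {
        Constructor<?> annotated = null;
        int annotatedCount = 0;
        for (Constructor<?> candidate : type.getDeclaredConstructors()) {
            if (candidate.isAnnotationPresent(Inject.class)) {
                annotated = candidate;
                annotatedCount++;
            }
        }
        if (annotatedCount == 1) {
            return annotated; // one (and only one) annotated constructor
        }
        if (annotatedCount > 1) {
            return null; // ambiguous: more than one annotated constructor
        }
        try {
            Constructor<?> noArg = type.getDeclaredConstructor();
            return Modifier.isPrivate(noArg.getModifiers()) ? null : noArg;
        } catch (NoSuchMethodException e) {
            return null; // neither an annotated nor a zero-argument constructor
        }
    }

    public static void main(String[] args) {
        System.out.println(findInjectable(AnnotatedFactory.class) != null);
        System.out.println(findInjectable(PlainFactory.class) != null);
        System.out.println(findInjectable(DefaultFactory.class) != null);
    }
}
```

A real container applies further checks on the constructor's parameters, so a message like the one above can also appear when the parameter list does not match what the framework expects to supply.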

Save Spark Dataframe into Elasticsearch - Can’t handle type exception

I have designed a simple job to read data from MySQL and save it in Elasticsearch with Spark.
Here is the code:
JavaSparkContext sc = new JavaSparkContext(
new SparkConf().setAppName("MySQLtoEs")
.set("", "true")
.set("es.nodes", "")
.set("", "id")
.set("spark.serializer", KryoSerializer.class.getName()));
SQLContext sqlContext = new SQLContext(sc);
// Data source options
Map<String, String> options = new HashMap<>();
options.put("driver", MYSQL_DRIVER);
options.put("url", MYSQL_CONNECTION_URL);
options.put("dbtable", "OFFERS");
options.put("partitionColumn", "id");
options.put("lowerBound", "10001");
options.put("upperBound", "499999");
options.put("numPartitions", "10");
// Load MySQL query result as DataFrame
System.out.println("Loading DataFrame");
DataFrame jdbcDF = sqlContext.load("jdbc", options);
DataFrame df = jdbcDF.select("id", "title", "description",
        "merchantId", "price", "keywords", "brandId", "categoryId");
System.out.println("df.count : " + df.count());
EsSparkSQL.saveToEs(df, "offers/product");
You can see the code is very straightforward. It reads the data into a DataFrame, selects some columns and then performs a count as a basic action on the Dataframe. Everything works fine up to this point.
Then it tries to save the data into Elasticsearch, but it fails because it cannot handle some type. You can see the error log here.
I'm not sure about why it can't handle that type. Does anyone know why this is occurring?
I'm using Apache Spark 1.5.0, Elasticsearch 1.4.4 and elasticsearch-hadoop 2.1.1.
I have updated the gist link with a sample dataset along with the source code.
I have also tried to use the elasticsearch-hadoop dev builds, as mentioned by @costin on the mailing list.
The answer for this one was tricky, but thanks to samklr, I have managed to figure out what the problem was.
The solution isn't straightforward, however, and might involve some “unnecessary” transformations.
First let's talk about Serialization.
There are two aspects of serialization to consider in Spark: serialization of data and serialization of functions. In this case, it's about data serialization and thus de-serialization.
From Spark’s perspective, the only thing required is setting up serialization - Spark relies by default on Java serialization which is convenient but fairly inefficient. This is the reason why Hadoop itself introduced its own serialization mechanism and its own types - namely Writables. As such, InputFormat and OutputFormats are required to return Writables which, out of the box, Spark does not understand.
With the elasticsearch-spark connector one must enable a different serialization (Kryo) which handles the conversion automatically and also does this quite efficiently.
Kryo also does not require a class to implement a particular interface in order to be serialized, which means POJOs can be used in RDDs without any further work beyond enabling Kryo serialization.
That said, @samklr pointed out to me that Kryo needs to register classes before using them.
This is because Kryo writes a reference to the class of the object being serialized (one reference is written for every object written), which is just an integer identifier if the class has been registered but is the full classname otherwise. Spark registers Scala classes and many other framework classes (like Avro Generic or Thrift classes) on your behalf.
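The size difference described above can be pictured with a toy model. The sketch below is not Kryo and does not reproduce its wire format: ToyClassRegistry is a hypothetical class, and the fixed four-byte ID is an assumption chosen only to contrast a numeric ID with a full class name.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy model of a serializer's class registry (hypothetical, not Kryo's implementation).
class ToyClassRegistry {
    private final Map<Class<?>, Integer> ids = new HashMap<>();

    /** Assigns the next free integer ID to a class, if it has none yet. */
    void register(Class<?> type) {
        ids.putIfAbsent(type, ids.size());
    }

    /** Bytes this toy format spends on the class reference written for one object. */
    int classReferenceSize(Class<?> type) {
        if (ids.containsKey(type)) {
            return Integer.BYTES; // registered: a fixed-size integer ID
        }
        // Unregistered: the fully qualified class name is written instead.
        return type.getName().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        ToyClassRegistry registry = new ToyClassRegistry();
        System.out.println("unregistered: " + registry.classReferenceSize(StringBuilder.class));
        registry.register(StringBuilder.class);
        System.out.println("registered:   " + registry.classReferenceSize(StringBuilder.class));
    }
}
```

Because the reference is written once per object, the per-object saving adds up quickly for large RDDs, which is the motivation for registering application classes.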
Registering classes with Kryo is straightforward. Create an implementation of KryoRegistrator and override the registerClasses() method:
public class MyKryoRegistrator implements KryoRegistrator, Serializable {
    public void registerClasses(Kryo kryo) {
        // Product POJO associated to a product Row from the DataFrame
        kryo.register(Product.class);
    }
}
Finally, in your driver program, set the spark.kryo.registrator property to the fully qualified classname of your KryoRegistrator implementation:
conf.set("spark.kryo.registrator", "MyKryoRegistrator")
Secondly, even though the Kryo serializer is set and the class registered, changes made in Spark 1.5 mean that, for some reason, Elasticsearch couldn't de-serialize the DataFrame, because the connector can't infer the DataFrame's SchemaType.
So I had to convert the DataFrame to a JavaRDD:
JavaRDD<Product> products = df.javaRDD().map(new Function<Row, Product>() {
    public Product call(Row row) throws Exception {
        long id = row.getLong(0);
        String title = row.getString(1);
        String description = row.getString(2);
        int merchantId = row.getInt(3);
        double price = row.getDecimal(4).doubleValue();
        String keywords = row.getString(5);
        long brandId = row.getLong(6);
        int categoryId = row.getInt(7);
        return new Product(id, title, description, merchantId, price, keywords, brandId, categoryId);
    }
});
Now the data is ready to be written into Elasticsearch:
JavaEsSpark.saveToEs(products, "test/test");
Elasticsearch's Apache Spark support documentation.
Hadoop Definitive Guide, Chapter 19. Spark, ed. 4 – Tom White.
User samklr.

Is paging broken with spring data solr when using group fields?

I currently use the Spring Data Solr library and implement its repository interfaces. I'm trying to add functionality to one of my custom queries that uses a Solr template with a SimpleQuery. It currently uses paging, which appears to be working well. However, I want to use a group field so that sibling products are only counted once, at their first occurrence. I have set the group field on the query and it works well, but it still seems to use the un-grouped number of documents when constructing the page attributes.
Is there a known workaround for this?
The query syntax provides the following parameter for this purpose, but it would seem that Spring Data Solr isn't taking advantage of it: &group.ngroups=true should return the number of groups in the result and thus give correct page numbering.
Any other info would be appreciated.
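To make the effect on paging concrete, here is a minimal sketch in plain Java (not Spring Data Solr code). GroupedPaging is a hypothetical helper, and the counts of 95 matching documents and 40 groups are made-up numbers chosen only to show how the page count changes depending on which total is used.

```java
// Hypothetical helper showing how the total used for paging changes the page count.
class GroupedPaging {
    /** Number of pages needed to show totalElements items, pageSize per page. */
    static int totalPages(long totalElements, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (int) ((totalElements + pageSize - 1) / pageSize); // ceiling division
    }

    public static void main(String[] args) {
        long matchedDocuments = 95; // assumed un-grouped hit count
        long groupCount = 40;       // assumed number of groups (what group.ngroups would report)
        int pageSize = 10;

        System.out.println("pages from un-grouped count: " + totalPages(matchedDocuments, pageSize));
        System.out.println("pages from group count:      " + totalPages(groupCount, pageSize));
    }
}
```

With these assumed numbers the un-grouped count yields 10 pages while only 4 pages of groups exist, so the later pages would be empty.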
There are actually two ways to add this parameter.
Queries are converted to the Solr format using QueryParsers, so it would be possible to register a modified one.
QueryParser modifiedParser = new DefaultQueryParser() {
    protected void appendGroupByFields(SolrQuery solrQuery, List<Field> fields) {
        super.appendGroupByFields(solrQuery, fields);
        solrQuery.set(GroupParams.GROUP_TOTAL_COUNT, true);
    }
};
solrTemplate.registerQueryParser(Query.class, modifiedParser);
Using a SolrCallback would be a less intrusive option:
final Query query = //...whatever query you have.
List<DomainType> result = solrTemplate.execute(new SolrCallback<List<DomainType>>() {
    public List<DomainType> doInSolr(SolrServer solrServer) throws SolrServerException, IOException {
        SolrQuery solrQuery = new QueryParsers().getForClass(query.getClass()).constructSolrQuery(query);
        //add missing params
        solrQuery.set(GroupParams.GROUP_TOTAL_COUNT, true);
        return solrTemplate.convertQueryResponseToBeans(solrServer.query(solrQuery), DomainType.class);
    }
});
Please feel free to open an issue.

No such property: org.codehaus.grails.INCLUDED_JS_LIBRARIES

The requirement
I'm trying to run my JavaScript tests in a custom test phase based on the functional test phase. Basically it needs to:
Startup embedded Tomcat
Open a controller
Check the result of the executed tests
What I've done
First, I created my custom test phase, based on this post. So my _Events.groovy looks like
includeTargets << new File("${basedir}/scripts/_RunJavaScriptUiTests.groovy")
eventConfigureTomcat = { tomcat ->
    tomcat.connector.setAttribute("compression", "on")
    tomcat.connector.setAttribute("compressableMimeType", "text/html,text/xml,text/plain,application/javascript")
    tomcat.connector.port = serverPort
}
eventAllTestsStart = {
    phasesToRun << "uijs"
}
uijsTests = ["uijs"]
uijsTestPhasePreparation = {
    functionalTestPhasePreparation()
}
uijsTestPhaseCleanUp = {
    functionalTestPhaseCleanUp()
}
eventTestPhaseEnd = { phase ->
    if (phase == "uijs") {
        runJavaScriptUiTests()
    }
}
Next, I decided to use PhantomJS to open my page and analyze the executed tests. So I used this in the RunJavaScriptUiTests.groovy script
target(runJavaScriptUiTests: "Running Siesta tests") {
    event("StatusUpdate", ["Siesta test phase..."])
    //this is the script that evaluates the result of the tests
    File script = new File("web-app/js/siesta/siesta-phantomjs-runner.js")
    String home = System.getenv("PHANTOMJS_HOME")
    if (!home) {
        throw new RuntimeException("PHANTOMJS_HOME must be set.")
    }
    String executable = "${home}bin${File.separator}phantomjs"
    String port = System.getProperty("server.port", "8080")
    String url = "http://localhost:$port/insoft-ext-ui/siesta" //url of my tests
    println "Running Phantomjs ${executable} ${script.absolutePath} "
    try {
        ant.exec(executable: executable, outputproperty: "cmdOut", failonerror: 'true', errorproperty: "cmdErr") {
            arg(value: script.absolutePath)
            arg(value: url)
        }
    } catch (e) {
        println "ERROR: $e"
        throw e
    }
    try {
        String output = "${}"
        println output
    } catch (e) {
        event("StatusError", ["Exception $e"])
    }
}
I can see that the functionalTestPhasePreparation runs, because this starts up my application correctly. I can also see that the phantomjs command is correct, when it prints:
Running: /desenv/phantomjs-1.9.2/bin/phantomjs /desenv/java/projetos/insoft-ext-ui/web-app/js/siesta/siesta-phantomjs-runner.js http://localhost:8080/insoft-ext-ui/siesta
But this gives me the groovy.lang.MissingPropertyException
groovy.lang.MissingPropertyException: No such property: org.codehaus.grails.INCLUDED_JS_LIBRARIES for class:
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(
at org.codehaus.groovy.grails.web.filters.JavascriptLibraryFilters$_closure1_closure2_closure3.doCall(JavascriptLibraryFilters.groovy:27)
at org.codehaus.groovy.grails.web.filters.JavascriptLibraryFilters$_closure1_closure2_closure3.doCall(JavascriptLibraryFilters.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
For full Stacktrace see here.
The interesting thing is that if I just run
grails test run-app
phantomjs /desenv/java/projetos/insoft-ext-ui/web-app/js/siesta/siesta-phantomjs-runner.js http://localhost:8080/insoft-ext-ui/siesta
The script works and I don't get any exception.
The question
Why is MissingPropertyException thrown? I looked at JavascriptLibraryFilters and didn't find a reason for it.
About the Tomcat
I'm using the embedded Tomcat that comes with Grails, but enabling compression in the _Events.groovy:
eventConfigureTomcat = { tomcat ->
    tomcat.connector.setAttribute("compression", "on")
    tomcat.connector.setAttribute("compressableMimeType", "text/html,text/xml,text/plain,application/javascript")
    tomcat.connector.port = serverPort
}
I do not have a direct solution, but I can help you research this.
The source of your problem is apparently, which is applied in your Tomcat environment, which explains why your code works standalone.
Other issues which refer to this same Spring class exist on Stack Overflow. Most of them are problems regarding incorrect multi-part request processing. This would lead me to believe PhantomJS is making multi-part calls without the appropriate casting or interfaces for your environment. I suspect a change to either your Tomcat or Grails configuration may be required.
Here are several of the SO questions to which I refer:
SO: Uploading file throws No signature of method exception (in getFile() method)
SO: uploading a file in grails
SO: Communication between Signed Applet and server side Controller
SO: java.lang.IllegalStateException: Standard argument type [org.springframework.web.multipart.MultipartHttpServletRequest]
Here is a potentially relevant bug on Grails / CXF:
Spring Security bug, referring to a CXF bug, which says "To enable MTOM on CXF you have to disable Grails' multipart handling by setting the option grails.web.disable.multipart=true in Config.groovy"
Please provide any details regarding your Tomcat / Grails settings and/or confirm that you have investigated these potential issue paths so that we may discount them.
Hopefully this answer points you or others in the right direction for a proper solution.
