Tweaking Lingo parameters with Carrot2 (with PHP)

I'm trying to tweak my call to the Carrot2 REST API:
$client = new Client();
try {
    $params = [
        'multipart' => [
            ['name' => 'dcs.c2stream', 'contents' => $xml],
            ['name' => 'dcs.algorithm', 'contents' => 'lingo'],
            ['name' => 'dcs.output.format', 'contents' => 'JSON'],
            ['name' => 'dcs.clusters.only', 'contents' => 'true'],
            ['name' => 'MultilingualClustering.defaultLanguage', 'contents' => 'FRENCH'],
            ['name' => 'preprocessing.labelFilters.minLengthLabelFilter.minLength', 'contents' => 5],
            ['name' => 'preprocessing.documentAssigner.minClusterSize', 'contents' => 4]
        ],
        'debug' => false
    ];
    $response = $client->request('POST', 'http://devbox:8080/dcs/rest', $params);
} catch (\Exception $e) {
    // Error handling was not shown in the original snippet.
}
The Lingo parameters 'preprocessing.labelFilters.minLengthLabelFilter.minLength' and 'preprocessing.documentAssigner.minClusterSize' have no effect on the result.
I found them in the documentation of the Lingo algorithm.
Thanks for your help!

With the right Docker image (docker pull touane/carrot2), everything works fine:
$c2Payload = [
    'algorithm' => 'Lingo',
    'language' => 'French',
    'parameters' => [
        'preprocessing' => [
            'documentAssigner' => [
                'minClusterSize' => 4
            ],
            'labelFilters' => [
                'minLengthLabelFilter' => [
                    'minLength' => 8
                ],
                'completeLabelFilter' => [
                    'labelOverrideThreshold' => 0.35
                ]
            ]
        ],
        'scoreWeight' => 1, // Sort by score
        'clusterBuilder' => [
            'phraseLabelBoost' => 2.5
        ],
        'dictionaries' => [
            'wordFilters' => [
                ['exact' => $this->getParameter('carrot2')['stop_words']]
            ]
        ],
        'matrixBuilder' => [
            'termWeighting' => [
                '#type' => 'LinearTfIdfTermWeighting'
            ],
            'boostFields' => ['title']
        ]
    ],
    'documents' => []
];
$client = new Client();
$params = [
    'body' => json_encode($c2Payload),
    'debug' => false
];
$response = $client->request('POST', $this->getParameter('carrot2')['url'], $params);
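The 'documents' array is left empty in the payload above. Here is a minimal sketch of filling it before encoding, assuming each document is a flat object of text fields. The $rows data, the helper name buildCarrot2Documents and the 'snippet' field name are hypothetical; 'title' is chosen only because it matches the boostFields setting above.

```php
<?php
// Hypothetical rows to cluster; in a real application these would come
// from your own data source.
$rows = [
    ['title' => 'Premier article', 'summary' => 'Texte du premier article'],
    ['title' => 'Second article', 'summary' => 'Texte du second article'],
];

/**
 * Map application rows to flat "field => text" objects for 'documents'.
 * The field names 'title' and 'snippet' are assumptions.
 */
function buildCarrot2Documents(array $rows): array
{
    $documents = [];
    foreach ($rows as $row) {
        $documents[] = [
            'title' => (string) ($row['title'] ?? ''),
            'snippet' => (string) ($row['summary'] ?? ''),
        ];
    }
    return $documents;
}

// Shortened payload: only one of the parameters from the answer is repeated.
$c2Payload = [
    'algorithm' => 'Lingo',
    'language' => 'French',
    'parameters' => [
        'preprocessing' => [
            'documentAssigner' => ['minClusterSize' => 4],
        ],
    ],
    'documents' => buildCarrot2Documents($rows),
];

// JSON string to pass as the 'body' option of the Guzzle request.
$json = json_encode($c2Payload, JSON_UNESCAPED_UNICODE);
```

The resulting $json would take the place of json_encode($c2Payload) in the request shown above.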

Related

Elasticsearch mapping with nested objects

This is my mapping. product_base is a nested object, and manufacturer is a nested object that is a child of product_base. When I create a document with more than one product_base entry, the entries show up as product_base.0, product_base.1, and so on. However, I can't filter on them normally. Is my mapping wrong?
When I use a filter such as the one below, it returns null, because in my Elasticsearch document product_base appears as product_base.0, product_base.1, ...
'path' => 'product_base.lang',
'query' => [
    'bool' => [
        'must' => [
            ['match' => ['product_base.lang.id_product' => 1]],
            ['match' => ['product_base.lang.id_lang' => 2]],
        ],
    ],
],
$params = [
    'index' => 'product',
    'body' => [
        'mappings' => [
            '_source' => [
                'enabled' => true,
            ],
            'properties' => [
                'id_product' => ['type' => 'integer'],
                'id_product_base' => ['type' => 'integer'],
                'lang' => [
                    'type' => 'nested',
                    'properties' => [
                        'id_product' => ['type' => 'integer'],
                        'id_lang' => ['type' => 'byte'],
                        'name' => ['type' => 'text']
                    ],
                ],
                'product_base' => [
                    'type' => 'nested',
                    'properties' => [
                        'id_product_base' => ['type' => 'integer'],
                        'id_manufacturer' => ['type' => 'integer'],
                        'id_product_zone_group' => ['type' => 'text'],
                        'status' => ['type' => 'byte'],
                        'manufacturer' => [
                            'type' => 'nested',
                            'properties' => [
                                'id_manufacturer' => ['type' => 'integer'],
                                'name' => ['type' => 'text'],
                                'status' => ['type' => 'byte'],
                            ],
                        ],
                    ]
                ]
            ]
        ]
    ]
];
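One thing worth checking, offered as a sketch rather than a confirmed fix: the path of a nested query has to name a field that the mapping declares as nested. In the mapping above, lang is declared at the top level, not inside product_base, so a path of product_base.lang would not match any nested field. The helper name buildNestedQuery and the manufacturer id used below are hypothetical; the other values are taken from the question.

```php
<?php
/**
 * Build a nested query whose 'path' is a field mapped as type 'nested'.
 * $matches maps full field names (prefixed with the path) to values.
 */
function buildNestedQuery(string $path, array $matches): array
{
    $must = [];
    foreach ($matches as $field => $value) {
        $must[] = ['match' => [$field => $value]];
    }
    return [
        'nested' => [
            'path' => $path,
            'query' => ['bool' => ['must' => $must]],
        ],
    ];
}

// 'lang' is nested at the top level of the mapping.
$langQuery = buildNestedQuery('lang', [
    'lang.id_product' => 1,
    'lang.id_lang' => 2,
]);

// 'manufacturer' is nested inside the nested 'product_base', so the
// nested queries are wrapped level by level.
$manufacturerQuery = [
    'nested' => [
        'path' => 'product_base',
        'query' => buildNestedQuery('product_base.manufacturer', [
            'product_base.manufacturer.id_manufacturer' => 3, // example value
        ]),
    ],
];

// Parameters in the shape expected by the PHP client's search() method.
$searchParams = [
    'index' => 'product',
    'body' => ['query' => $langQuery],
];
```

Passing $searchParams to $client->search() would run the first query; swapping in $manufacturerQuery would filter on the manufacturer instead.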

Proper formatting of a JSON and multipart Guzzle post request

I have two different Guzzle post requests that I am trying to merge (solely because they basically do a united job and should be performed together).
Initially I have my donation data:
'donation' => [
'web_id' => $donation->web_id,
'amount' => $donation->amount,
'type' => $donation->type,
'date' => $donation->date->format('Y-m-d'),
'collection_id' => NULL,
'status_id' => $donation->status_id,
],
Then I have the files that go with it: two different PDFs that are enabled or disabled per donor, and sometimes a donor has both. I think the multipart would look something like the code below, but I'm not sure.
foreach ($uploadDocs as $doc) {
'multipart' => [
[
'name' => 'donation_id',
'contents' => $donation->web_id,
],
[
'name' => 'type_id',
'contents' => $doc->type_id,
],
[
'name' => 'file',
'contents' => fopen($doc->path, 'r'),
'headers' => ['Content-Type' => 'application/pdf'],
],
],
}
I've usually only handled one file at a time, so I'm not sure how to merge the first block of code with the second into a single Guzzle post request.
You can try this:
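$donationData = [
    'web_id' => $donation->web_id,
    'amount' => $donation->amount,
    'type' => $donation->type,
    'date' => $donation->date->format('Y-m-d'),
    'collection_id' => NULL,
    'status_id' => $donation->status_id,
];
// Guzzle expects 'multipart' to be a flat list of parts and does not allow
// it to be combined with 'body', so the donation data is sent as one more
// part. The part name 'donation' and the JSON encoding are assumptions;
// use whatever your endpoint expects.
$multipart = [
    [
        'name' => 'donation',
        'contents' => json_encode($donationData),
    ],
];
foreach ($uploadDocs as $doc) {
    $multipart[] = [
        'name' => 'donation_id',
        'contents' => $donation->web_id,
    ];
    $multipart[] = [
        'name' => 'type_id',
        'contents' => $doc->type_id,
    ];
    $multipart[] = [
        'name' => 'file',
        'contents' => fopen($doc->path, 'r'),
        'headers' => ['Content-Type' => 'application/pdf'],
    ];
}
Then perform your request:
$r = $client->request('POST', 'http://example.com', [
    'multipart' => $multipart,
]);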

Laravel array validation - work out depth for array

I am trying to validate recursive data which could have any number of levels, e.g.:
[
    'name' => 'test',
    'children' => [
        [
            'name' => 'test2'
        ],
        [
            'name' => 'test3',
            'children' => [
                [
                    'name' => 'test4'
                ]
            ]
        ],
        [
            'name' => 'test5',
            'children' => [
                [
                    'name' => 'test6',
                    'children' => [
                        [
                            'name' => 'test7'
                        ]
                    ]
                ]
            ]
        ]
    ]
]
In this example I would require the following rules to ensure that a name is specified at each level:
$rules = [
    'name' => ['required'],
    'children' => ['array'],
    'children.*.name' => ['required'],
    'children.*.children' => ['array'],
    'children.*.children.*.name' => ['required'],
    'children.*.children.*.children' => ['array'],
    'children.*.children.*.children.*.name' => ['required'],
];
How could I dynamically generate the validation rules based on the data coming in?
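One possible approach, sketched here under the assumption that every 'children' value is a list of child arrays: measure how deep the incoming data goes, then emit one 'name' and one 'children' rule per level. The function names maxChildrenDepth and buildRecursiveRules are hypothetical, not part of Laravel.

```php
<?php
/**
 * Return how many levels of 'children' exist below the given node.
 * A missing or non-array 'children' value counts as depth 0.
 */
function maxChildrenDepth(array $node): int
{
    $children = $node['children'] ?? [];
    if (!is_array($children)) {
        return 0;
    }
    $depth = 0;
    foreach ($children as $child) {
        if (is_array($child)) {
            $depth = max($depth, 1 + maxChildrenDepth($child));
        }
    }
    return $depth;
}

/**
 * Generate 'name' and 'children' rules for every level found in $data.
 */
function buildRecursiveRules(array $data): array
{
    $rules = [];
    $prefix = '';
    $depth = maxChildrenDepth($data);
    for ($level = 0; $level <= $depth; $level++) {
        $rules[$prefix . 'name'] = ['required'];
        $rules[$prefix . 'children'] = ['array'];
        $prefix .= 'children.*.';
    }
    return $rules;
}

// Shortened version of the example data: three levels of children.
$data = [
    'name' => 'test',
    'children' => [
        ['name' => 'test2'],
        ['name' => 'test5', 'children' => [
            ['name' => 'test6', 'children' => [
                ['name' => 'test7'],
            ]],
        ]],
    ],
];

$rules = buildRecursiveRules($data);
```

The generated array contains the seven rules listed above plus one extra 'array' rule for 'children' at the deepest level, which is harmless when that key is absent. It can then be passed to the validator together with the same data.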

Elasticsearch: how to create a geo_point index with the ES-PHP client

I am trying to create an index that contains a geo_point field using the Elasticsearch PHP client: https://www.elastic.co/guide/en/elasticsearch/client/php-api/5.0/index.html
My code is as follows:
$params = [
'index' => 'sweden_codes',
'type' => 'sweden_c',
'body' => [
'mappings' => [
'location' => [
'properties' => [
'pin' => [
'properties' => [
'location' => [
'type' => 'geo_point'
]
]
]
]
],
'text' => $code->City,
'pin' => [
'location' => [
'lat' => $code->Latitude,
'lon' => $code->Longitude
]
]
]
]
];
$client = ClientBuilder::create()
->setSSLVerification(false)
->setHosts(["elasticsearch:9200"])->build();
The problem is that when I go into Kibana, it says: "No Compatible Fields: The "sweden_codes" index pattern does not contain any of the following field types: geo_point".
Can anyone please take a look and let me know what's wrong with my mapping and index creation?
Here is the mapping code that works for me:
$params = [
'index' => 'sweden_postal_codes',
'body' => [
'mappings' => [
'codes' => [
'properties' => [
'location' => [
'type' => 'geo_point'
],
'city' => [
'type' => 'string'
]
]
]
]
]
];
$client = ClientBuilder::create()
->setSSLVerification(false)
->setHosts(["elasticsearch:9200"])->build();
// Adding the index into the ES Cluster
$response = $client->indices()->create($params);
Here is the document indexing code that worked for me:
$params = [
'index' => 'sweden_postal_codes',
'type' => 'codes',
'id' => 2,
'body' => [
'location' => [
'lat' => 30.5268956,
'lon' => 79.2289643
],
'city' => 'Stockholm'
]
];
$client = ClientBuilder::create()
->setSSLVerification(false)
->setHosts(["elasticsearch:9200"])->build();
// Indexing the document into the ES cluster
$response = $client->index($params);

Multi match and highlighting in elasticsearch

When I match a single field in the query, highlighting works fine in Elasticsearch.
For example, when I use:
$params = [
'index' => 'my_index',
'type' => 'articles',
'body' => [
'from' => '0',
'size' => '10',
'query' => [
'bool' => [
'must' => [
'match' => [ 'content' => 'what I want to search' ]
]
]
],
'highlight' => [
'pre_tags' => ['<mark>'],
'post_tags' => ['</mark>'],
'fields' => [
'content' => [ 'fragment_size' => 150, 'number_of_fragments' => 3 ]
]
],
]
];
everything works. But when I try to match multiple fields, the search still works correctly while the highlighting disappears:
'match' => [ 'content' => 'what I want to search' ],
'match' => [ 'type' => 1 ]
Do you know how to get working highlighting when I want to search on two different fields with two different queries?
Try this:
$params = [
    'index' => 'my_index',
    'type' => 'articles',
    'body' => [
        'from' => '0',
        'size' => '10',
        'query' => [
            'bool' => [
                'must' => [
                    'match' => [ 'content' => 'what I want to search' ]
                ],
                // The filter belongs inside 'bool' and needs a query type;
                // 'term' is used here for the exact match on 'type'.
                'filter' => [
                    'term' => [ 'type' => 1 ]
                ]
            ]
        ],
        'highlight' => [
            'pre_tags' => ['<mark>'],
            'post_tags' => ['</mark>'],
            'fields' => [
                'content' => [ 'fragment_size' => 150, 'number_of_fragments' => 3 ]
            ]
        ],
    ]
];
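A likely reason the highlighting disappeared in the question's second attempt, shown as a small standalone sketch: a PHP array cannot hold the same key twice, so the second 'match' entry silently replaces the first and the query on 'content' never reaches Elasticsearch. The variable names $must and $mustList are only for illustration.

```php
<?php
// Two entries with the same key: PHP keeps only the last one.
$must = [
    'match' => ['content' => 'what I want to search'],
    'match' => ['type' => 1],
];

// Giving each clause its own list element keeps both.
$mustList = [
    ['match' => ['content' => 'what I want to search']],
    ['match' => ['type' => 1]],
];

echo json_encode($must), PHP_EOL;
// {"match":{"type":1}}
echo json_encode($mustList), PHP_EOL;
// [{"match":{"content":"what I want to search"}},{"match":{"type":1}}]
```

Using the list form under 'must', or moving the exact-match clause to 'filter', keeps the 'content' query in place so there is something to highlight.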

Resources