folder structure for scraping in Laravel, using Goutte - laravel

I am a bit confused about the folder structure for my scraping code. I am using console commands, not a controller, so I am writing the whole scraping code in the handle function. Am I supposed to do that? If not, what is the best approach?
UPDATED
If I understand the answer below correctly, it should now look like this.
calling services
class siteControl extends Command
{
protected $signature = 'bot:scrape {website_id}';
protected $description = 'target a portal site and scrape';
public function __construct()
{
parent::__construct();
}
public function handle()
{
$website_id = $this->argument("website_id");
if ($website_id == 1) {
$portal = "App\Services\Site1";
}
$crawler = new $portal;
$crawler->run();
}
}
in handle method
class Site1 extends Utility
{
public function __construct()
{
parent::__construct();
}
public function run()
{
echo "method runs";
}
}
abstract:
use Goutte\Client;
abstract class Utility implements SiteInterfaces
{
protected $client;
public function __construct()
{
$this->client = new Client();
}
}
interfaces:
namespace App\Services;
interface SiteInterfaces
{
public function run();
}
And finally, should I write the whole scraping code inside the run() method? Please correct me if I am wrong about this. I am looking for the best solution.

A best practice would be to call a separate service from your command handle() method. That way you could reuse that same service in a controller for instance.
The technical version:
Your application is given a specific thing to do (a command if you will). This command comes from outside of your application, which can be anything from a web controller to an API controller or a CLI application. In terms of hexagonal architecture this is called a port.
Once the application receives such a command it should not care which port it came from. By handling all similar commands in a single spot (a command handler) it does not have to worry about the origins of the command.
So to give you a short overview:
[Web request] [CLI command] <-- these are ports
\ /
\ /
\ /
[Command] <--- this is a method call to your service
|
|
|
[Command handler] <--- this is the service doing the actual work
Updated my answer
Based on the code you provided I implemented what I mentioned above like so:
app/Console/Commands/BotScrapeCommand.php
This is the CLI command I mentioned above. All this class has to do is:
1. Gather input arguments; (website_id) in this case
2. Wrap those arguments in a command
3. Fire off the command using the command handler
namespace App\Console\Commands;
use App\Command\ScrapePortalSiteCommand;
use App\CommandHandler\ScrapePortalSiteCommandHandler;
use Illuminate\Console\Command;
class BotScrapeCommand extends Command
{
protected $signature = 'bot:scrape {website_id}';
protected $description = 'target a portal site and scrape';
public function handle(ScrapePortalSiteCommandHandler $handler)
{
// Console arguments arrive as strings, so cast before building the command
$portalSiteId = (int) $this->argument("website_id");
$command = new ScrapePortalSiteCommand($portalSiteId);
$handler->handle($command);
}
}
app/Command/ScrapePortalSiteCommand.php
This is the Command I mentioned above. Its job is to wrap all input arguments in a class, which can be used by a command handler.
namespace App\Command;
class ScrapePortalSiteCommand
{
/**
* @var int
*/
private $portalSiteId;
public function __construct(int $portalSiteId)
{
$this->portalSiteId = $portalSiteId;
}
public function getPortalSiteId(): int
{
return $this->portalSiteId;
}
}
app/CommandHandler/ScrapePortalSiteCommandHandler.php
The command handler should implement logic based on its command. In this case that's figuring out which crawler to pick, then fire that one off.
namespace App\CommandHandler;
use App\Command\ScrapePortalSiteCommand;
use App\Crawler\PortalSite1Crawler;
use App\Crawler\PortalSiteCrawlerInterface;
use InvalidArgumentException;
class ScrapePortalSiteCommandHandler
{
public function handle(ScrapePortalSiteCommand $command): void
{
$crawler = $this->getCrawlerForPortalSite($command->getPortalSiteId());
$crawler->crawl();
}
private function getCrawlerForPortalSite(int $portalSiteId): PortalSiteCrawlerInterface {
switch ($portalSiteId) {
case 1:
return new PortalSite1Crawler();
default:
throw new InvalidArgumentException(
sprintf('No crawler configured for portal site with id "%s"', $portalSiteId)
);
}
}
}
app/Crawler/PortalSiteCrawlerInterface.php
This interface is there to make sure all crawlers can be called in similar fashion. Additionally it makes for nice type hinting.
namespace App\Crawler;
interface PortalSiteCrawlerInterface
{
public function crawl(): void;
}
app/Crawler/PortalSite1Crawler.php
This is where the implementation of the actual scraping goes.
namespace App\Crawler;
class PortalSite1Crawler implements PortalSiteCrawlerInterface
{
public function crawl(): void
{
// Crawl your site here
}
}
Another update
As you had some additional questions I've updated my answer once more.
:void
The use of : void in a method declaration means the method will not return anything. In the same way, public function getPortalSiteId(): int means this method will always return an integer. Return type declarations were added in PHP 7.0 (void in PHP 7.1) and are not specific to Laravel. More information on return type declarations can be found in the PHP documentation.
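As a small, framework-free illustration of return type declarations (the PortalSite class and the brokenId() function are made up for this sketch and are not part of the code above):

```php
<?php

declare(strict_types=1);

// Hypothetical class, used only to illustrate return type declarations.
final class PortalSite
{
    /** @var int */
    private $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }

    // ": int" promises that an integer is always returned.
    public function getId(): int
    {
        return $this->id;
    }

    // ": void" promises that nothing is returned.
    public function crawl(): void
    {
        // do the work here, return nothing
    }
}

// Hypothetical function whose body breaks its own ": int" promise.
function brokenId(): int
{
    return 'not a number'; // throws a TypeError when the function is called
}

$site = new PortalSite(1);
$id = $site->getId(); // an int, as declared
```

Calling brokenId() throws a TypeError at runtime, which is the point of the declaration: the mistake surfaces where the value is returned, not somewhere further down the line.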
Commands and handlers
The use of commands and command handlers is a best practice which is part of the command bus pattern. This pattern describes a universal way of dealing with user input (a command). This post offers a nice explanation on commands and handlers. Additionally, this blog post describes in more detail what a command bus is, how it's used and what the advantages are. Please note that in the code I've provided the bus implementation itself is skipped. In my opinion you do not need it per se, but in some cases it does add value.
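For completeness, here is a rough idea of what the skipped bus could look like. This is only a minimal sketch, not a real library: SimpleCommandBus, its register() and dispatch() methods, and the handledIds property are names invented for illustration, and the command and handler are trimmed-down copies of the ones above.

```php
<?php

// Trimmed-down copy of the command from the answer.
final class ScrapePortalSiteCommand
{
    private $portalSiteId;

    public function __construct(int $portalSiteId)
    {
        $this->portalSiteId = $portalSiteId;
    }

    public function getPortalSiteId(): int
    {
        return $this->portalSiteId;
    }
}

// Trimmed-down handler; it only records what it was asked to do.
final class ScrapePortalSiteCommandHandler
{
    public $handledIds = []; // hypothetical, makes the sketch observable

    public function handle(ScrapePortalSiteCommand $command): void
    {
        $this->handledIds[] = $command->getPortalSiteId();
    }
}

// Hypothetical bus: maps a command class name to the handler responsible for it.
final class SimpleCommandBus
{
    private $handlers = [];

    public function register(string $commandClass, $handler): void
    {
        $this->handlers[$commandClass] = $handler;
    }

    public function dispatch($command): void
    {
        $commandClass = get_class($command);
        if (!isset($this->handlers[$commandClass])) {
            throw new InvalidArgumentException(
                sprintf('No handler registered for "%s"', $commandClass)
            );
        }
        $this->handlers[$commandClass]->handle($command);
    }
}

$handler = new ScrapePortalSiteCommandHandler();
$bus = new SimpleCommandBus();
$bus->register(ScrapePortalSiteCommand::class, $handler);

// Any port (CLI command, web controller, ...) only needs the bus and a command.
$bus->dispatch(new ScrapePortalSiteCommand(1));
```

With a bus in place, the ports no longer need to know which handler class exists; they only build a command and dispatch it.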

Related

Laravel implement assertSentFrom specific address Mailable Testing

I'm trying to get to grips with mocking and test cases. I want to test that a Mailable TestMail is sent from company@company.com. The documentation provides hasTo, hasCc, and hasBcc, but it doesn't look like it offers something like hasFrom. Are there any solutions to this?
https://laravel.com/docs/9.x/mocking#mail-fake
public function testEmailAlwaysFrom()
{
Mail::fake();
Mail::to('foo@bar.com')->send(new TestMail);
Mail::assertSent(TestMail::class, function ($mail) {
return assertEquals('company@company.com', $mail->getFrom());
// return $mail->hasTo($user->email) &&
// $mail->hasCc('...') &&
// $mail->hasBcc('...');
});
}
MailFake doesn't provide a hasFrom method in the class, so the check returns false.
Note that the workaround below doesn't work when the sender comes only from the MAIL_FROM_ADDRESS environment variable; ->from() has to be called within build().
A couple of GitHub issues suggest the workaround below:
https://github.com/laravel/framework/issues/20056
https://github.com/laravel/framework/issues/20059
public function testEmailAlwaysFrom()
{
Mail::fake();
Mail::to('foo@bar.com')
->send(new TestMail);
Mail::assertSent(TestMail::class, function ($mail) {
$mail->build(); // <-- workaround
return $mail->hasTo('foo@bar.com') &&
$mail->hasFrom('company@company.com');
});
}

Laravel 8 Testing - using RefreshDatabase but table not found

I'm trying to refactor a class and am having problems testing it. When I place certain code in the __construct method, the test throws a table not found error, although the code works outside of tests.
I know this means that the table has not yet been created in the testing environment. Although I'm using RefreshDatabase within the test, it appears that the database is not ready at the point where the class I'm testing initialises and attempts to access it. So I'm either doing something in the constructor that I shouldn't, or I'm missing something in my test structure.
Here's the basics of the class I'm trying to test:
class PlayerRounds
{
use EclecticPresenter;
private RoundRepository $roundRepository;
private CourseRepository $courseRepository;
private $courseHoles;
public function __construct(RoundRepository $roundRepository, CourseRepository $courseRepository)
{
$this->roundRepository = $roundRepository;
$this->courseRepository = $courseRepository;
$this->init();
}
private function init()
{
$this->courseHoles = $this->courseRepository->all();
}
/**
* Generates eclectic rounds for each league the player is in
* @param Player $player
*/
public function getAllEclecticRounds(Player $player)
{
$allEclecticRounds = collect();
$leagues = $player->league()->where('league_type', 'eclectic')->get();
$leagues->each(function ($league) use ($player, $allEclecticRounds) {
$newRound = $this->getPlayerEclecticRound($player, $league);
$allEclecticRounds->put($league->id, $newRound);
});
return $allEclecticRounds;
}
The test fails at the init() method. When the fetch $this->courseHoles = $this->courseRepository->all(); runs within the constructor, the test fails with a table not found error. It works if I place this piece of code within each method that needs it, but then I call it often rather than once.
Here's my test:
class PlayerRoundsTest extends TestCase
{
use RefreshDatabase;
use EclecticTestHelper;
use WithFaker;
private $playerRounds;
protected function setUp(): void
{
parent::setUp();
$this->seed(CourseTableSeeder::class);
$this->playerRounds = app()->make(PlayerRounds::class);
}
/**
* @test
* @covers PlayerRounds::getAllEclecticRounds
* @description:
*/
public function it_returns_an_eclectic_round_for_all_leagues()
{
$player = Player::factory()->has(League::factory()->count(3))->create();
foreach ($player->league()->get() as $league) {
for ($x = 0; $x <= 3; $x++) {
$this->createScores($league, $player);
}
}
$result = $this->playerRounds->getAllEclecticRounds($player);
$this->assertCount(3, $result);
$result->each(function($collection) {
$this->assertCount(1, $collection);
});
Are there any ideas on how I can initialise the class correctly, set the test up properly, and ensure the database is ready for the test? I assumed using RefreshDatabase was the correct approach and that I had things in the correct order.
Thank you
Update
If I change the constructor to this:
public function __construct(RoundRepository $roundRepository, CourseRepository $courseRepository)
{
$this->roundRepository = $roundRepository;
$this->courseRepository = $courseRepository;
}
and then move the database call back into the method used in the test, like this:
public function getPlayerEclecticRound(Player $player, League $league, $maxDate = null)
{
// FIXME: Initiate at start in constructor but fails in tests
$this->courseHoles = $this->courseRepository->all();
//rest of code removed for brevity
}
This then passes the test.
This class needs the data in $this->courseHoles for a number of methods to work, so I'm aiming to load it once at initialization rather than every time I access the method as it does now, but I can't get that to work in a testing environment.
Note: I'm using a MySQL database on the server but an SQLite in-memory database in testing.
Update
OK, after a bit of playing around, I found that the error is caused by the loading of a custom artisan command I created. That command class depends on another class, which in turn calls the class I'm testing.
I removed the command from Kernel.php as follows:
protected $commands = [
Inspire::class,
FixtureReminder::class,
SmsFixtureReminder::class,
UpdateMigrationTable::class,
// EclecticUpdate::class,
// MatchplayUpdate::class,
CleanTemporaryFiles::class,
AuthPermissionCommand::class
];
So - am I right in assuming for test purposes this is going to be impossible to isolate without changing this each time? This all relates to dependencies and the order in which CreateApplication works but I don't know enough to work around this.
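A possible approach, as a minimal sketch: load the course holes lazily on first use, so the constructor never touches the database but the query still runs only once per instance. FakeCourseRepository, LazyPlayerRounds and holeCount() are hypothetical stand-ins written in plain PHP, not the classes from the question.

```php
<?php

// Hypothetical stand-in for CourseRepository that counts how often it is queried.
final class FakeCourseRepository
{
    public $queries = 0;

    public function all(): array
    {
        $this->queries++;
        return ['hole 1', 'hole 2'];
    }
}

// Hypothetical, reduced version of PlayerRounds.
final class LazyPlayerRounds
{
    private $courseRepository;
    private $courseHoles; // stays null until it is first needed

    public function __construct(FakeCourseRepository $courseRepository)
    {
        // No data access here, so constructing the object is always safe.
        $this->courseRepository = $courseRepository;
    }

    // Fetches the holes on the first call and reuses them afterwards.
    private function courseHoles(): array
    {
        if ($this->courseHoles === null) {
            $this->courseHoles = $this->courseRepository->all();
        }
        return $this->courseHoles;
    }

    public function holeCount(): int
    {
        return count($this->courseHoles());
    }
}

$repository = new FakeCourseRepository();
$rounds = new LazyPlayerRounds($repository);
$queriesAfterConstruct = $repository->queries; // still 0, nothing loaded yet

$rounds->holeCount();
$rounds->holeCount();
$queriesAfterTwoCalls = $repository->queries; // 1, the second call reused the data
```

Under this assumption the artisan command could stay registered in Kernel.php, because resolving the class while the application boots would no longer run a query.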

Throwing a \Doctrine\DBAL\Driver\DriverException in a unit test mock

While writing unit tests, I ran into this problem. A piece of code that I want to test catches a \Doctrine\DBAL\Exception\RetryableException. The first constructor in the class chain is that of DriverException, which is built like this:
/**
* @param string $message The exception message.
* @param \Doctrine\DBAL\Driver\DriverException $driverException The DBAL driver exception to chain.
*/
public function __construct($message, \Doctrine\DBAL\Driver\DriverException $driverException)
{
$exception = null;
if ($driverException instanceof Exception) {
$exception = $driverException;
}
parent::__construct($message, 0, $exception);
$this->driverException = $driverException;
}
I feel like I am facing a chicken-and-egg problem here. How can I instantiate a class that takes an instance of itself as a mandatory argument in the first place?
Note: I won't mark this self-answer as a solution; it is more of a workaround.
Instead of throwing the right exception in my unit test mock, I created a simpler one that extends Exception but still implements the original RetryableException interface, as it's the interface that is caught in the code I am testing. While this is not what I wanted to do, it does the job in my specific case.
Here is how I have an actual instance of DriverException in my unit tests, using an anonymous class instead of a mock:
<?php
declare(strict_types=1);
use Doctrine\DBAL\Driver\Exception as TheDriverException;
use Doctrine\DBAL\Exception\DriverException;
use PHPUnit\Framework\TestCase;
final class MyTest extends TestCase
{
// ... the rest of the test case
private function getDriverExceptionWithCode(int $code): DriverException
{
$theDriverException = new class($code) extends \Exception implements TheDriverException {
public function __construct(int $code)
{
parent::__construct('oh no, you broke it :(', $code);
}
public function getSQLState(): ?string
{
return null;
}
};
return new DriverException($theDriverException, null);
}
}
In my case I needed to unit test a situation where the code is catching a DriverException with a specific code, but you can extend the code as you wish, or make it simpler. The only thing you need is to implement getSQLState, after all.
Hope this helps whoever stumbles on this question from their favorite search engine.

Unable to render HTML from Markdown

I am going through an online course on Laravel. This course uses the league/commonmark package to convert Markdown to HTML.
Whenever the package is used in the app, I get:
Unable to find corresponding renderer for block type League\CommonMark\Block\Element\Document
The app uses the following presenter to do the conversion.
class PagePresenter extends AbstractPresenter
{
protected $markdown;
public function __construct($object, CommonMarkConverter $markdown)
{
$this->markdown = $markdown;
parent::__construct($object);
}
public function contentHtml()
{
return $this->markdown->convertToHtml($this->content);
}
}
Can anyone point me in the right direction?
That happens because the IoC is resolving the dependencies for CommonMarkConverter, specifically Environment which is instantiated with all null properties.
You can probably resolve this by using a Laravel specific integration: https://github.com/GrahamCampbell/Laravel-Markdown
Or you can bind an instance to the service container this way:
In the register method of your AppServiceProvider, add this:
$this->app->singleton('Markdown', function ($app) {
// Obtain a pre-configured Environment with all the CommonMark parsers/renderers ready-to-go
$environment = \League\CommonMark\Environment::createCommonMarkEnvironment();
// Define your configuration:
$config = ['html_input' => 'escape'];
// Create the converter
return new \League\CommonMark\CommonMarkConverter($config, $environment);
});
Now remove CommonMarkConverter from your Presenter constructor and use app('Markdown') instead:
class PagePresenter extends AbstractPresenter {
protected $markdown;
public function __construct($object)
{
$this->markdown = app('Markdown');
parent::__construct($object);
}
public function contentHtml()
{
return $this->markdown->convertToHtml($this->content);
}
}
With the Laravel-Markdown package, you just add a line to the aliases array in the config/app.php file:
'Markdown' => GrahamCampbell\Markdown\Facades\Markdown::class,

Laravel 4 Container Internal Workings

I've been studying the Laravel 4 container to learn more about the internals of Laravel and to improve my own skills in writing better code.
However, I'm failing to understand 3 similar pieces of code.
I'll use the smallest snippet to keep this question clean.
Similar questions can be found in the links below. Although people have replied with correct answers, I'm not satisfied with simply 'knowing how to use it, but not knowing how it all works inside'. So I really hope someone can explain all this.
Question 1
Question 2
<?php
namespace Illuminate\Container;
use Closure, ArrayAccess, ReflectionParameter;
class BindingResolutionException extends \Exception {}
class Container implements ArrayAccess {
/**
* Wrap a Closure such that it is shared.
*
* @param Closure $closure
* @return Closure
*/
public function share(Closure $closure)
{
return function($container) use ($closure)
{
// We'll simply declare a static variable within the Closures and if
// it has not been set we'll execute the given Closure to resolve
// the value and return it back to the consumers of the method.
static $object;
if (is_null($object))
{
$object = $closure($container);
}
return $object;
};
}
}
How does the share method know that the $container variable in that function is in fact an instance of Illuminate\Container? It isn't defined within the scope of that function.
Neither is it defined in the following example use case (which wouldn't help anyway):
class AuthServiceProvider extends ServiceProvider{
/**
* Register the service provider.
*
* @return void
*/
public function register()
{
$this->app['auth'] = $this->app->share(function($app)
{
// Once the authentication service has actually been requested by the developer
// we will set a variable in the application indicating such. This helps us
// know that we need to set any queued cookies in the after event later.
$app['auth.loaded'] = true;
return new AuthManager($app);
});
}
}
I'd expect a different implementation, so here it is:
class MyContainer{
public function share(Closure $closure)
{
$container = $this;
return function() use ($closure, $container)
{
static $object;
if(is_null($object))
{
$object = $closure($container);
}
return $object;
};
}
}
$closure = function($container)
{
var_dump($container);
};
$container = new MyContainer();
call_user_func($container->share($closure));
//dumps an instance of MyContainer -> which is the wanted behaviour
$container = new Illuminate\Container\Container();
call_user_func($container->share($closure));
//Throws a warning AND a notice
//Warning: Missing argument 1 for Illuminate\Container\Container::Illuminate\Container\{closure}() in /Users/thomas/Sites/Troll/vendor/illuminate/container/Illuminate/Container/Container.php on line 128
//NOTICE: Notice: Undefined variable: container in /Users/thomas/Sites/Troll/vendor/illuminate/container/Illuminate/Container/Container.php on line 137
//and even worse the output of the var_dump is NULL
I have the same problem understanding the extend and the bind methods, which both pass a non-existent parameter as a closure argument in the same way. I cannot grasp how it is resolved to the container instance itself.
The return value of Container::share() is a function that takes one argument: the container itself. In order to call it externally, you'd have to do this:
$closure = function ($container) {
var_dump($container);
};
$container = new Illuminate\Container\Container();
call_user_func($container->share($closure), $container);
The reason for this is due to how service definitions work. The intended use of share is to wrap around a service definition.
Like this:
$container = new Illuminate\Container\Container();
$container['foo'] = $container->share(function ($container) { return new Foo(); });
When you access a service, like this:
var_dump($container['foo']);
It checks if the value is callable, and if it is, it will try to call it as a function. If you leave off the share, you will get a new Foo instance every time. The share memoizes the instance and returns the same one every time.
To re-iterate, the $container argument in the function returned from share is there because that's how service creation works. The service definition ("factory" function that you "set" on the container) is just a function that takes a container and returns the instance of the service it is creating.
Since offsetGet() expects the definition to take a $container argument, that's what share returns.
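To make the mechanism concrete, here is a heavily simplified sketch. MiniContainer is a made-up class, not the real Illuminate\Container\Container (which routes array access through bind and make), but it shows where the $container argument comes from: the container passes itself in when a definition is accessed.

```php
<?php

// Simplified, hypothetical container; NOT the real Laravel implementation.
final class MiniContainer implements ArrayAccess
{
    private $definitions = [];

    // Same idea as Container::share(): wrap a factory so it only runs once.
    public function share(Closure $closure): Closure
    {
        return function ($container) use ($closure) {
            static $object;
            if ($object === null) {
                $object = $closure($container);
            }
            return $object;
        };
    }

    public function offsetExists($key): bool
    {
        return isset($this->definitions[$key]);
    }

    // The attribute keeps PHP 8.1+ quiet; PHP 7 reads the line as a comment.
    #[\ReturnTypeWillChange]
    public function offsetGet($key)
    {
        $definition = $this->definitions[$key];

        // This is where $container comes from: the container passes itself in.
        return $definition instanceof Closure ? $definition($this) : $definition;
    }

    public function offsetSet($key, $value): void
    {
        $this->definitions[$key] = $value;
    }

    public function offsetUnset($key): void
    {
        unset($this->definitions[$key]);
    }
}

$container = new MiniContainer();

// Plain definition: the factory runs on every access.
$container['plain'] = function ($c) {
    return new stdClass();
};

// Shared definition: the factory runs once and the result is memoized.
$container['shared'] = $container->share(function ($c) {
    return new stdClass();
});

$plainIsFresh = $container['plain'] !== $container['plain'];
$sharedIsSame = $container['shared'] === $container['shared'];
```

Both $plainIsFresh and $sharedIsSame end up true: the plain definition builds a new object on each access, while the shared one keeps returning the first instance.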
