Why won't Scrapy load any of my pipelines? (Windows)

I'm using Scrapy for some basic web scraping, and the scraping part works fine. But when I try to get output through a feed export, e.g. -o output.csv, nothing is written: it creates an empty file and that's all.
After a period of confusion I couldn't make it work, so I decided to write a pipeline with a custom export method. The problem now is that, although the application runs fine, it doesn't load the pipelines. Not a single one of them runs, and there is no error.
This is my settings.py, where I set the option to load them:
ITEM_PIPELINES = {
    'fbcrawl.pipelines.CsvExporterPipeline': 300
}
And this is my CsvExporterPipeline class inside pipelines.py:
class CsvExporterPipeline(object):
    def process_item(self, item, spider):
        print('\n' * 2)
        print(item)
        print('\n' * 2)
        return item
None of these three prints ever runs.
How can I get my pipelines loaded and working?
UPDATE: I forgot to mention which code I'm trying to run. The spider is from this project:
https://github.com/rugantio/fbcrawl

Are you sure BOT_NAME in settings.py is set to fbcrawl?
And what is the code of your spider?
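Since the goal was a custom CSV export, here is a minimal sketch of what such a pipeline could look like once it is being loaded. It is not taken from fbcrawl: the default file name output.csv and the assumption that items are dicts (or dict-like) are mine. It only relies on the standard pipeline hooks open_spider, process_item and close_spider. It is also worth checking the startup log, where Scrapy prints an "Enabled item pipelines" line listing what it actually loaded.

```python
import csv


class CsvExporterPipeline:
    """Write every scraped item to a CSV file (illustrative sketch)."""

    def __init__(self, path="output.csv"):
        # "output.csv" is a hypothetical default; change it as needed.
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "w", newline="", encoding="utf-8")

    def process_item(self, item, spider):
        row = dict(item)  # assumes the item is a dict or dict-like
        if self.writer is None:
            # The first item's keys become the header row.
            self.writer = csv.DictWriter(
                self.file, fieldnames=list(row), extrasaction="ignore"
            )
            self.writer.writeheader()
        self.writer.writerow(row)
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider finishes.
        if self.file is not None:
            self.file.close()
```

The ITEM_PIPELINES entry from the question would stay the same; the class name and module path just have to match the dotted path used there.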

Related

Displaying JSON output from an API call in Ruby using VScode

For context, I'm someone with zero experience in Ruby - I just asked my Senior Dev to copy-paste me some of his Ruby code so I could try to work with some APIs that he ended up putting off because he was too busy.
So I'm using an API wrapper called zoho_hub, used as a wrapper for Zoho APIs (https://github.com/rikas/zoho_hub/blob/master/README.md).
My IDE is VSCode.
I execute the entire length of the code, and I'm faced with this:
[Done] exited with code=0 in 1.26 seconds
The API is supposed to return a paginated list of records, but I don't see anything outputted in VSCode, despite the fact that no error is being reflected. The last 2 lines of my code are:
ZohoHub.connection.get 'Leads'
p "testing"
I use the dummy string "testing" to make sure that it's being executed up till the very end, and it does get printed.
This has been baffling me for hours now. Is my response actually being output somewhere, and I just can't see it?
Ruby does not print anything unless you tell it to. For debugging there is a pretty printing method available called pp, which is decent for trying to print structured data.
In this case, if you want to output the records that your get method returns, you would do:
pp ZohoHub.connection.get 'Leads'
To get the next page you can look at the source code, and you will see the get request has an additional Hash parameter.
def get(path, params = {})
Then you have to read the Zoho API documentation for get, and you will see that the page is requested using the page param.
Therefore we can finally piece it together:
pp ZohoHub.connection.get('Leads', page: NNN)
Where NNN is the number of the page you want to request.

nightwatchjs, run same test on multiple pages

I have written some tests for my homepage, but the tests are very generic, such as footer and header checks.
My test structure is like:
const footerCheck = function (browser) {
    browser.url("example.com");
    browser.verify.elementPresent(".footer-top", "Footer-top is present.");
    browser.verify.elementPresent(".footer-middle", "Legal notice bar is present");
    browser.verify.elementPresent(".footer-bottom", "Copyright bar is present");
};
module.exports = {
    "Footer Check": footerCheck
};
Let's say I have 100 pages. I would like to run the footerCheck function on all hundred pages.
The URLs look like example.com/page1, example.com/page2, example.com/page3...
Since all the tests are valid for the other pages too, I would like to loop over all pages with the same test cases, but I can't get my head around how.
How can this be done? Any help would be appreciated.
Thanks
In my personal experience, the best way to do BDD is to add Cucumber, which uses Gherkin syntax. It is clearer and helps a lot to reduce redundant code if you know how to use it well. There is a Nightwatch npm plugin that adds Cucumber; once you have added it, create your .feature file like the following:
Feature: Check elements are present
  Scenario Outline:
    Given the user enters on a <page>
    Then .footer-top, .footer-middle and .footer-bottom class should be enabled
    Examples:
      | page           |
      | page.com/page1 |
      | page.com/page2 |
      | page.com/page3 |
Then write your step definitions (where you declare what each step does). Cucumber will automatically run every step for each URL provided in the examples. Note the <page> placeholder, which is replaced with each example value; the first row of the table is the placeholder's name.
Take a look at the examples.

before filter issue in padrino

I'm trying to create a chain of before filters in Padrino that looks like this:
before do
  set_current_user
  track_order_ip
  !current_user and pass
  ## don't allow the remaining filters to run if there is no current user
  customer_inactivity!
  skip_enforce!
  ## the theory: a user who is not enforced should not be allowed to execute the enforce! before filter
  enforce!
end
All the filters execute in a chain, but if current_user is not present I want to drop out of the filter chain (i.e. pass), which is what the (!current_user and pass) code is for.
But doing this in Padrino causes the app to redirect to the same route multiple times and then break with the following error:
ArgumentError at /myaccount/users/authenticate
uncaught throw :pass
at
!current_customer and pass
What I find weird and can't understand is why this doesn't work in Padrino (which I know uses Sinatra internally), because I wrote a similar proof-of-concept application in Sinatra (can be found over here) and that works out of the box without any issue.
Lastly, here is the Padrino code.
Can anyone give me some pointers as to what I'm doing wrong in Padrino that works correctly in the proof-of-concept Sinatra app?
Thanks

blobstore images get_serving_url

I am new to the Google App Engine and I am trying to use the Blobstore to store images that I want to display later on.
The image storage works fine. Now I want to dynamically change some images in my HTML code, so I need a way of getting the images out of the blobstore and passing them on. I am using Python. I found the get_serving_url function, which seemed to be the perfect fit. Sadly, it causes an error that I seem to be unable to fix.
My basic code looks like this:
blob_key = "yu343mQ7kT4344N434ewQ=="
if blob_key:
    blob_info = blobstore.get(blob_key)
    if blob_info:
        img = images.Image(blob_key=blob_key)
        url = images.get_serving_url(blob_key)
        ...
Every time the function gets called, I get the following error in my log console:
  File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\remote_api\remote_api_stub.py", line 234, in _MakeRealSyncCall
    raise pickle.loads(response_pb.exception())
AttributeError: 'ImagesNotImplementedServiceStub' object has no attribute 'THREADSAFE'
I have no idea how to fix it or if I am doing something terribly wrong.
I am very grateful for your support! Thank you in advance!
Have a nice day
You probably need an instance of BlobKey, so if you are getting blob_info successfully, try:
img = images.Image(blob_key=blob_info.key())
url = images.get_serving_url(blob_info.key())

Render a view's output later via a delayed_job

If I render HTML, the browser receives it, which works great. However, how can I get a route's response (the HTML) from inside a module or class?
I need this because I'm sending documents to DocRaptor, and rather than storing the markup/HTML in a db column I would like to store record IDs and create the markup when the job executes.
A possible solution is to use Ruby's HTTP library, HTTParty, wget or similar to request the route and use response.body. Before doing so I thought I'd ask around.
Thanks!
-- Update --
Here's something like what I ended up going with.
Quick tip: in case anyone does this and needs their helper methods, you need to extend the ActionView instance with ApplicationHelper:
av = ActionView::Base.new()
av.view_paths = ActionController::Base.view_paths
av.extend ApplicationHelper #or any other helpers your template may need
body = av.render(:template => "orders/receipt.html.erb",:locals => {:order => order})
Link:
http://www.rigelgroupllc.com/blog/2011/09/22/render-rails3-views-outside-of-your-controllers/
Check this question out; it contains the code you probably want in an answer:
Rails 3 > Rendering views in rake task

Resources