PHPcrawler - tmp file - phpcrawl

I downloaded the latest version of phpcrawler, and I can access a test website of my own.
I only have an image and some text on this site, I run the crawler and I receive the text minus the image because I did the proper $crawler->addNonFollowMatch("/.(jpg|gif|png)$/ i");
I cannot get it to save the tmp file It does not save the unique tmp file in the folder I run the crawler from, I have tried to save a named file no luck.
I did run into many depreciated errors on different lines in all the php files, for example: #fopen, the # cause problems in different area's. I use PHP and can also do Regex.
David.

I answered my own question, since I see that PHPCrawler questions really do not get answered; I saw a question from last year not answered. I will answer it also, though it might be too late to do any good. This is the answer.
I added in a modified phpcrawler I adjusted for my needs:
$fp = fopen('c:/test/poopoo.txt','w');
fwrite($fp,($page_data['source']));
fclose($fp);
You put it before flushing the file and create your instance of class.
I found out using PHP Simple HTML DOM Parser from this project works well. If you need more control use RegExp, but that does have a steep learning curve.

Related

How to use d3's zoomable-sunburst

I'm trying to use zoomable suburst to display some data. I've generated the json file and am able to use the site to display my data. Now I'm trying to do this on my local machine, but am not sure of the correct way to go about this.
I think there are a couple of ways of going about this. One would be to just dump the js code into a js file and import it into an html file. I've seen some implementations on github give me the ability to do this, but they are not as clean as the one i've found on observablehq. And I'm unable to get the one on observablehq to work locally doing a copy/paste.
I also see an option on observablehq where i can download the code. I did that and the readme that came with it says that i need to run it on a server (ex. python -m http.server), but when i run the server from the folder containing the downloaded code, i keep getting a bunch of
code 404, message File not found
Now I'm a bit confused. I'd like to know the "right" way to go about using zoomable sunburst to show my data, and if it's at all possible to run this on my local.
Any suggestions/advice would be great. Thanks.
I'm super late to the party, but here is what was the problem for me :
I was running python -m http.server in the wrong directory, i.e the directory that didn't have index.html file inside. on I ran it in the directory that had the index.html file it worked perfectly.
hope this helps someone!

Malicious code hidden in image

I've come across a dodgy file upload on our server. It is an image and the MIME TYPE checks out, though on the server it was also uploaded with the extension .asp and .cer.
On the surface its a photo some weird chinese symbols and the letters asp, though I am sure it is hiding malicious code. I did a google search by image and it came out in a few possibly unsecure directories in some other sites.
This is out of my league to even verify. Out of interest I opened the file in notepad and it has the clear string "Google" which only makes me believe more that it is malicious.
All I need to know is
1- is it malicious?
2-did it run and what did it do?
3- how do i protect against it?
I cant give the link to the actual file on my server since Its been removed, but I can zip and mail it to anyone who wants to take a look.
If anyone has some advise on where to start I would appreciate it.
Heres a link to the same image, which came up on my google search though this one most likely has different code injected
http://www.bakjuweel.be/ShowImage.aspx?img=/upload/fotogalerijen/13/3.asp;.jpg&w=135&h=111
UPDATE
After alot more research I have found that it had a modified header to inject code. I run it through virustotal.com and my suspicious were confirmed. https://www.virustotal.com/en/file/3eac6e45d5923632089b538ca86d576c9994bd25be7940165ec997484d7c6715/analysis/
What it does or whether it executed is still unknown
OK, the file was malicious it contained encoded php, all of which im not sure of there were far too many encoded layers. It created a backdoor that fetched and executed remote code. This file was not detected by any of our antivirus software, what gave it away way was <% eval(. was the only part not encoded.A hacker took advantage of a vunerability in an old version of FCKeditor to add and execute it. I am still looking for a way to prevent it in the future.

Can't save when running "jekyll serve" on Windows 8.1, Notepad++

I am trying to use jekyll locally to build my website. It is all set up, and I can build and serve and see results at localhost:4000. There are no errors.
The problem is that when I run "Jekyll serve" I can't save files. The save option is greyed out and "ctrl + s" wont work.
I can open and edit the files, can do "Save As" and do other things - basically anything except saving.
I can save files when I am not serving them.
From what I understand, Jekyll is intended to be used to allow saving while serving so we can see our changes as we go. The auto-regenerate function (now a default with serve) supports that use.
I suspect the problem relates to some sort of permissions-type rule stopping me from editing files that are in use.
But because I am self-taught newbie and am not a developer/programmer, I don't know if it is something to do with how I have set up jekyll, notepad++, permissions or something else entirely.
Here is my environment:
Windows 8.1 64-bit
Ruby v2.1.5p
Jekyll v2.5.1
wdm v0.1.0
RubyDevKit
Notepad++ (in admin mode)
Here is what I have tried:
Scaled back the listen gem from v2.10.0 to v2.7.11 (the earlier was listed as safe/tested on a jekyll on windows website)
Scaled back Jekyll from v2.5.3 to v2.5.1 (the earlier was listed as safe/tested on a jekyll on windows website)
Opened Notepad++ in admin mode instead of normal mode.
Tried executing jekyll serve --watch (in case watch enabled saving)
I have not tried re-installing ruby v2.1.3 (listed as safe/tested on a Jekyll on windows website) because Jekyll is otherwise working I don't want to try a re-install except as a last resort - as a newbie I found it a pain to install it on Windows in the first place.
Can anyone help me with this (probably simple) issue?
I thank you for any assistance in advance.
Okay. So I feel really stupid.
But instead of pretending this never happened, I had better post this answer in case anyone else has a blonde 'moment' (read: an entire day) like I did:
Firstly, you can't edit the _config.yml files while serving. You can edit the other files - html, markdown, etc - but not the config file.
Secondly, in Notepad++ you need to make an actual change to a document before the saving option will appear.
I was using the _config file as my 'test' document for regeneration. While I did open up other files to check when I first thought I had an issue, I THINK I may not have made any changes to them - so the option to save them was never activated. After that, I only looked at the config file after making changes.
So, I THINK I may have been able to save while serving all along.
However, if I am wrong and it wasn't my own stupidity (which I strongly doubt), the steps I took which fixed it were:
Those steps outlined in my question; and
A reinstall of Notepad++ (as kindly recommended by 'nerver nerver' who has since removed his/her comment after I said that did not work).
SORRY ... and excuse me while I go and crawl away and hide in shame ...
If the files you were editing at that time was only _config.yml then the expected behavior is that the saved changes are not reflected when the Jekyll server is running/watching.
This is because the server is started after reading the configuration settings in _config.yml, and then changes that happen to that special file after that are not monitored by Jekyll (this is current as of May 2015, in case this gets changed in the future). Currently this is by design. see this SO question as well
What that means is, you have been saving the file when Jekyll is running just fine, the changes just do not get updated. A way to check this is to make some changes, close the file, then open it again (if you want to be extra sure, open in another editor) and see if reflected changes show up.
Changes made to other files in Jekyll when the server is running will be reflected. For example, if I edit a typo in a blog post, edit CSS files or change some formatting, and save in any text editor, Jekyll will regenerate the file from scratch and you should be able to see the changes by refreshing the localhost:4000 page (or whereever your server is running at).
I'm not sure about running Jekyll on Windows, but on a Linux terminal, Jekyll actually notified the number of files that have changed (with a timestamp) and that it regenerated X number of files. Something like
<timestamp> 3 files have changed. Regenerated 3 files in 0.0536 sec..
Lastly, this is probably not your issue, but I thought I might add this here for future reference, do not edit the files inside the _site folder, as they are always deleted and regenerated whenever the server is started again. Editing those files by hand might save and display changes, but the changes will be lost (because they are, statically generated every time by Jekyll)
TL;DR You most probably have been saving your files! The changes in _config.yml are just not reflected once the server is running, and has to be restarted for the new configuration parameters to take effect.

Magento site still not updating after changes to files

I have been trying to get my Magento site to take some changes but it is still not refreshing the changes. I have disabled caching and flushed all of them on every single occasion I have also cleared my browser cache and it still does not take changes. I have gone as far to delete several files from the server that the theme relies on but it still functions like nothing was ever removed! What could be my issue?
You keep editing those files. I do not think those files are the files you think they are.
You question is pretty short on details, but my first guess if your system is running with the compiler enabled, which means it's loading its class files from
includes/src
Googling around to learn about the compiler would be a good idea.
I'd try adding the following to the end of your index.php file
echo '<--';
print_r(get_included_files());
echo '-->';
This will list every file PHP used during the request. Compare the full paths with the ones you're editing, and I bet you'll find a discrepancy.

Prompt to reload an externally modified file in Textmate 2?

Im finding myself using TextMate 2 more and more for development these days
One thing that is bugging me is that it does not seem to reload a file when it is changed externally.
This is a big problem since I use terminal to switch git branches a lot, and it often results in accidentally saving an older version over the new branch
Sadly the TM team seem to have disabled the Issue tracker on github, and documentation just seems scattered far and wide over the web in tiny scraps.
Any ideas?
Right now it reloads the file silently whenever the file on Disk is changed. There is currently no option for a prompt.
Regarding the closed issue tracker: As an alternative, you can always send a mail to the Textmate users list if you find a problem.
I just updated to TM 2. In the past, I have enjoyed using TM to view development.log as I am introducing new or modified code. TM 1 would ask me if I want to revert to what's on disk and of course I would respond with Yes. Then I could see any additions to the log file. Occasionally I would empty the log file with Cmd+A, Delete, Cmd+S. Macromates, please make TM 2 work like TM 1 reloading content.
I also could not find a solution to this problem which is still there, however it is possible to file an issue using this contact page: http://macromates.com/support
Additionally there are hidden setting which cannot be set from the GUI. Unfortunately they don't include and setting for automatic load from disk: https://github.com/textmate/textmate/wiki/Hidden-Settings

Resources