How to extract a string from the following HTML page using PHP - preg-match

I'm stuck: I need to extract a specific piece of data from a web page.
Specifically, I need to extract /title/tt0118615/ from
Anaconda"
using preg_match() or any other method. That fragment comes from the page fetched by the PHP code below:
<?php
$url = "http://www.imdb.com/find?s=tt&q=Anaconda";
$raw = file_get_contents($url);
echo preg_match ("/^(href=\"\/title\/tt)\"$/", $raw, $data);
echo "data: $data[1]";
?>
I know my pattern is wrong, which is why I'm posting my question here.
Thanks in advance.

I think this pattern will work in your case:
preg_match("/a href=\"([^\"]*)\"/", $raw, $data);
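$data will be an array containing the match; $data[1] holds the captured href value. Note that this matches the first a href on the page, so narrow the pattern (for example to values starting with /title/tt) if other links come first.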

$url = "http://www.imdb.com/find?s=tt&q=Anaconda";
$raw = file_get_contents($url);
preg_match_all('%b\.gif\?link=(/title/.*?)\'%i', $raw, $imdbcode, PREG_PATTERN_ORDER);
$imdbcode = $imdbcode[1][0];
echo $imdbcode; # prints /title/tt0118615/
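As a minimal offline sketch of the same idea, the snippet below runs a narrower pattern against a hard-coded HTML fragment instead of the live page. The sample markup in $sampleHtml and the helper name extractTitlePath() are assumptions for illustration; the real IMDb markup may differ.

```php
<?php
// Hypothetical helper: returns the first /title/tt<digits>/ path found in
// an href attribute, or null when there is no such link.
function extractTitlePath(string $html): ?string
{
    // "~" is used as the delimiter so the slashes need no escaping.
    if (preg_match('~href="(/title/tt\d+/)"~', $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Assumed sample markup, not copied from the real page.
$sampleHtml = '<td><a href="/help/">Help</a> <a href="/title/tt0118615/">Anaconda</a></td>';

echo extractTitlePath($sampleHtml), "\n"; // prints /title/tt0118615/
```

Anchoring the capture group on /title/tt skips unrelated links that appear earlier in the page.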

Related

Get Laravel 4 URL parameters.

Is there an easier way to get URL parameters, similar to how they are appended for pagination?
Link
Instead of writing @if(Input::has('param1')) Input::get('param1') @endif
Thanks
You can always fall back to a default value if the parameter is missing:
{{ Input::get('param1', 'default value') }}
I couldn't find a built-in solution after researching (I'm sure there is one), so I wrote a quick loop. It's a little ad hoc, but it works:
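$url = '?';
foreach (Input::all() as $input => $value) {
// urlencode() keeps characters such as "&" or spaces from breaking the query string
$url .= urlencode($input) . '=' . urlencode($value) . '&';
}
$url = rtrim($url, '&'); // drop the trailing "&"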
Link
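PHP's built-in http_build_query() does the same job as the loop and URL-encodes keys and values. A minimal sketch, assuming the array returned by Input::all() looks like the hypothetical $params below:

```php
<?php
// Hypothetical stand-in for the array returned by Input::all().
$params = ['param1' => 'a b', 'page' => 2];

// http_build_query() URL-encodes each key and value and joins the pairs with "&".
$url = '?' . http_build_query($params);

echo $url, "\n"; // prints ?param1=a+b&page=2
```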

Codeigniter %B4 how to validate/filter

I am having an issue where if a post variable has %B4 in it, it will be urldecoded into a char that cannot be saved into the database without an error. (Even if I access via $_POST).
What is the best way to validate a field so that these characters cannot be saved?
I think that maybe preg_match can help you.
<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, substr($subject,3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
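%B4 decodes to the single byte 0xB4, which is not valid UTF-8 on its own. If the error comes from the database expecting UTF-8 (an assumption; the question does not say which charset is in use), one way to use preg_match() for validation is the u modifier, which makes the match fail on malformed UTF-8. The helper name isValidUtf8() is hypothetical.

```php
<?php
// Hypothetical helper: true when $value is well-formed UTF-8.
function isValidUtf8(string $value): bool
{
    // With the u modifier, preg_match() returns false (not 0) when the
    // subject is not valid UTF-8; the empty pattern matches any valid string.
    return preg_match('//u', $value) === 1;
}

var_dump(isValidUtf8(urldecode('%B4')));    // bool(false): lone 0xB4 byte
var_dump(isValidUtf8(urldecode('%C2%B4'))); // bool(true): UTF-8 encoding of U+00B4
```

A field that fails this check can be rejected before it reaches the database.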

Search specific text with DOM XPath

I have been trying to crawl a website's pages and search for specific text using Simple HTML DOM and XPath. I collect all the links from the site, then crawl each link and search the text of every page. The text I want to find is inside an HTML span tag.
But no output is shown.
What is going wrong?
Here is my code:
<?php
include_once("simple_html_dom.php");
set_time_limit(0);
$path='http://www.barringtonsports.com';
$html = file_get_contents($path);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for($i = 0; $i < $hrefs->length; $i++ ){
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
$nurl = $path.$url;
$html1 = file_get_contents($nurl);
$dom1 = new DOMDocument();
@$dom1->loadHTML($html1);
$xpath1 = new DOMXPath($dom1);
$name = $xpath1->evaluate("//span[contains(.,'Asics Gel Netburner 15 Netball Shoes')]");
if($name)
echo"text found";
}
?>
I just want to check whether the text "Asics Gel Netburner 15 Netball Shoes" exists on any page of the website www.barringtonsports.com.
You're querying a lot of web-pages interactively. It takes more time than your server is allowed to use for generating pages.
You can run this script from the command line to avoid timeouts, or configure PHP and the web server to give the script more time (you can ask how to do this on https://serverfault.com/).
Well, first off, you are mixing Simple HTML DOM and DOMDocument. Just use one or the other. Since this question is tagged simple-html-dom, start with this from the command line:
<?php
require_once("./simple_html_dom.php"); # see the manual at simplehtmldom.sourceforge.net
$path="http://www.barringtonsports.com";
$html = file_get_html($path);
foreach ($html->find('a') as $anchor) {
$url = $anchor->href;
echo "Found link to " . $url . "\n";
# now see if the link is relative, absolute, or even on another site...
$checkhtml = file_get_html($url);
# now you can parse that link for stuff too.
}
?>
But really, that website has a search form, why not just send it a query instead and read the results?
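Separately from the timeout issue, the if($name) check in the question's DOMDocument version cannot tell a match from a miss: DOMXPath::evaluate() returns a DOMNodeList object for a node-set expression, and an object is truthy in PHP even when the list is empty. A minimal sketch of testing the list length instead, using an inline HTML string (assumed sample markup) rather than the live site; the helper name pageContainsText() is hypothetical:

```php
<?php
// Hypothetical helper: true when any <span> in $html contains $needle.
function pageContainsText(string $html, string $needle): bool
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumes $needle contains no double quote, since it is embedded in the expression.
    $nodes = $xpath->query('//span[contains(., "' . $needle . '")]');

    // query() returns a DOMNodeList (or false on a bad expression); check its length.
    return $nodes !== false && $nodes->length > 0;
}

$sample = '<html><body><span>Asics Gel Netburner 15 Netball Shoes</span></body></html>';
var_dump(pageContainsText($sample, 'Asics Gel Netburner 15 Netball Shoes')); // bool(true)
var_dump(pageContainsText($sample, 'Something else'));                       // bool(false)
```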

Codeigniter - htmlspecialchars() on input not working

I'm using htmlspecialchars() on the input field for the user's last name to prevent XSS, but it's not working.
Let's say $user_data->user_last_name; is my user last name, so I did:
htmlspecialchars( $user_data->user_last_name, ENT_QUOTES, 'UTF-8' );
When I try to save user last name as 'Lastname<script>alert("xss")</script>', I get JS alert with 'xss' message.
Any clue maybe?
Try this, may work:
$string = htmlentities($user_data->user_last_name, ENT_QUOTES, 'ISO-8859-15');
While retrieving the input you should use:
$value = $this->input->post('input_name', true);
Here, passing true as the second argument runs the input value through CodeIgniter's XSS filter.
It works, but the output is interpreted by your browser as HTML.
Put this line above your code to see the real output:
<?php
header('Content-Type: text/plain');
?>
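One thing worth checking, since the question shows the call on a line by itself: htmlspecialchars() does not modify its argument. It returns the escaped string, so the return value is what has to be printed. A minimal sketch, with a hard-coded value standing in for $user_data->user_last_name:

```php
<?php
// Hard-coded stand-in for $user_data->user_last_name.
$lastName = 'Lastname<script>alert("xss")</script>';

// The original string is left untouched; the escaped copy is returned.
$safe = htmlspecialchars($lastName, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// prints Lastname&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Echoing $lastName instead of $safe would still run the script in the browser.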

Rewrite rules in the .htaccess file

The request is simple, however, I cannot find a way to implement it. I have links like:
http://mysite.com/index.php?lang=EN
http://mysite.com/index.php?route=add&lang=EN
http://mysite.com/index.php?route=view&lang=EN
and so on. What I want is to create 301 redirects so that EN is changed to GB. For example, if a customer opens http://mysite.com/index.php?route=add&lang=EN, he should be redirected to http://mysite.com/index.php?route=add&lang=GB.
I have searched for this for days and have failed to find a working solution. Please help.
Does it have to be done in .htaccess? Here's a relatively simple way of doing it in PHP:
<?php
// Redirect only when the lang parameter is present and equals "EN".
if (isset($_GET['lang']) && "EN" == $_GET['lang']) {
$params = $_GET;
$params['lang'] = "GB";
header("HTTP/1.1 301 Moved Permanently");
// http_build_query() URL-encodes each key and value and joins them with "&".
header("Location: http://www.mysite.com/index.php?" . http_build_query($params));
exit; // stop so nothing else is sent after the redirect headers
}
Bottom line is that it may be easier to fix this problem on a level where you can isolate each query parameter and look at just the lang parameter and determine whether to do a redirect.
With regular expressions (as you would need to use in .htaccess) it's harder to isolate just the lang part. You would also need one line per language you want to redirect and maintain the list.
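To illustrate the point about isolating the lang parameter, here is a minimal sketch that keeps the language mapping in one array, so adding another redirect means adding one entry. The helper name redirectQueryFor() and the $langMap variable are hypothetical; only the EN to GB mapping comes from the question.

```php
<?php
// Hypothetical helper: returns the query string to redirect to, or null
// when the lang parameter needs no change.
function redirectQueryFor(array $query, array $langMap): ?string
{
    $lang = $query['lang'] ?? null;
    if (!is_string($lang) || !isset($langMap[$lang])) {
        return null;
    }
    $query['lang'] = $langMap[$lang];
    // http_build_query() URL-encodes the pairs and joins them with "&".
    return http_build_query($query);
}

// Assumed mapping; only EN => GB comes from the question.
$langMap = ['EN' => 'GB'];

echo redirectQueryFor(['route' => 'add', 'lang' => 'EN'], $langMap), "\n"; // prints route=add&lang=GB
var_dump(redirectQueryFor(['route' => 'add', 'lang' => 'GB'], $langMap));  // NULL: no redirect needed
```

In a real script, a non-null result would be appended to the Location header in place of the hard-coded EN check.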
