company_name = 'google'
browser.get('https://m.tianyancha.com/search?key=&checkFrom=searchBox')
ele = browser.find_element_by_xpath("//input[@id='live-search']")
ele.clear()
ele.send_keys(company_name, Keys.ENTER)
name = browser.find_element_by_xpath(
    "//div[@class='new-border-bottom pt5 pb5 ml15 mr15'][1]//a[@class='query_name in-block']/span/em")
if name.text:
    if name.text == company_name:
        check = '1'
    else:
        check = '0'
else:
    check = '0'
The error is:
NoSuchElementException: Message: no such element: Unable to locate
element: {"method":"xpath","selector":"//div[@class='new-border-bottom
pt5 pb5 ml15 mr15'][1]//a[@class='query_name in-block']/span/em"}
Your relative XPath is wrong:
name = browser.find_element_by_xpath(
    "//div[@class='new-border-bottom pt5 pb5 ml15 mr15'][1]//a[@class='query_name in-block']/span/em")
Note that // means "descendant-or-self": it matches at any depth below the context you start from, so chaining it twice is legal but will silently match nothing if those class strings are not exactly what the rendered page contains.
Check the XPath you use for name against the markup the mobile site actually serves.
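As a minimal sketch (reusing browser, company_name and the XPath from the question, and assuming those class names really match what the mobile site renders), you can wait for the result and treat a timeout as "no match" instead of letting find_element_by_xpath raise:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# XPath copied from the question; adjust the class names if the rendered page differs.
NAME_XPATH = ("//div[@class='new-border-bottom pt5 pb5 ml15 mr15'][1]"
              "//a[@class='query_name in-block']/span/em")

try:
    # Wait up to 10 seconds for the search result to appear.
    name = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.XPATH, NAME_XPATH))
    )
    check = '1' if name.text == company_name else '0'
except TimeoutException:
    # Nothing matched the XPath in time, so treat it as "not found".
    check = '0'
This avoids the NoSuchElementException entirely: either the element shows up within the timeout, or check falls back to '0'.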
I have scraped data; below is my code. Now I want to add a column of symbols to the respective company data. Please guide me on how the symbol can be added to the respective firm's data.
The code is below:
from time import sleep
import pandas as pd
import os
import numpy as np
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
browser = webdriver.Chrome(ChromeDriverManager().install())
symbols =['FATIMA',
'SSGC',
'FCCL',
'ISL',
'KEL',
'NCL',
'DGKC',
'SNGP',
'NML',
'ENGRO',
'HUMNL',
'CHCC',
'ATRL',
'HUBC',
'ASTL',
'PIBTL',
'OGDC',
'EFERT',
'FFC',
'NCPL',
'KTML',
'PSO',
'LUCK',
'SEARL',
'KOHC',
'ABOT',
'AICL',
'HASCOL',
'PTC',
'KAPCO',
'PIOC',
'POL',
'SHEL',
'GHGL',
'HCAR',
'DCR',
'BWCL',
'MTL',
'GLAXO',
'PKGS',
'SHFA',
'MARI',
'ICI',
'ACPL',
'PSMC',
'SPWL',
'THALL',
'BNWM',
'EFUG',
'GADT',
'AABS']
company = 1
for ThisSymbol in symbols:
    # Get first symbol from the above python list
    company = 2
    # In the URL, make symbol as variable
    url = 'http://www.scstrade.com/stockscreening/SS_CompanySnapShotYF.aspx?symbol=' + ThisSymbol
    browser.get(url)
    sleep(2)
    # The below command will get all the contents from the url
    html = browser.execute_script("return document.documentElement.outerHTML")
    # So we will supply the contents to beautiful soup and we tell to consider this text as a html, with the following command
    soup = BeautifulSoup(html, "html.parser")
    for rn in range(0, 9):
        plist = []
        r = soup.find_all('tr')[rn]
        # Condition: if first row, then th, otherwise td
        if (rn == 0):
            celltag = 'th'
        else:
            celltag = 'td'
        # Now use the celltag instead of using fixed td or th
        col = r.find_all(celltag)
        print()
        if col[i] == 0:
            print("")
        else:
            for i in range(0, 4):
                cell = col[i].text
                clean = cell.replace('\xa0 ', '')
                clean = clean.replace(' ', '')
                plist.append(clean)
        # If first row, create df, otherwise add to it
        if (rn == 0):
            df = pd.DataFrame(plist)
        else:
            df2 = pd.DataFrame(plist)
            colname = 'y' + str(2019 - rn)
            df[colname] = df2
    if (company == 1):
        dft = df.T
        # Get header Column
        head = dft.iloc[0]
        # Exclude first row from the data
        dft = dft[1:]
        dft.columns = head
        dft = dft.reset_index()
        # Assign Headers
        dft = dft.drop(['index'], axis='columns')
    else:
        dft2 = df.T
        # Get header Column
        head = dft2.iloc[0]
        # Exclude first row from the data
        dft2 = dft2[1:]
        dft2.columns = head
        dft2 = dft2.reset_index()
        # Assign Headers
        dft2 = dft2.drop(['index'], axis='columns')
        dft['Symbol'] = ThisSymbol
        dft = dft.append(dft2, sort=['Year', 'Symbol'])
    company = company + 1

dft
My output looks like this; I want a Symbol column for each respective firm's data.
For the Symbol, I have added
dft['Symbol'] = ThisSymbol
but it adds only the first company's symbol from the list to all companies' data.
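One way to approach this, shown here only as a minimal self-contained sketch with made-up frame contents rather than the scraped data: stamp each company's own frame with its symbol inside the loop, then concatenate everything once at the end.
import pandas as pd

symbols = ['FATIMA', 'SSGC']   # shortened list, just for illustration
all_frames = []

for ThisSymbol in symbols:
    # Stand-in for the transposed per-company frame (dft) built in the question.
    dft = pd.DataFrame({'Year': [2019, 2018], 'Sales': [100, 90]})
    dft['Symbol'] = ThisSymbol      # stamp this company's own rows
    all_frames.append(dft)

# One concat at the end; every row keeps the symbol it was stamped with.
result = pd.concat(all_frames, ignore_index=True)
print(result)
The key point is that the Symbol assignment happens per company, on the frame that belongs to that company, rather than once on the combined frame.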
There is an XML document:
<mgns1:Champ_supplementaire>
<mgns1:CODE_CS>3</mgns1:CODE_CS>
<mgns1:VALEUR_CS />
</mgns1:Champ_supplementaire>
When trying to get the values with:
NodeList nodeliste_cs3 = (NodeList) xpath.evaluate( "//mgns1:Champ_supplementaire[mgns1:CODE_CS=3]/mgns1:VALEUR_CS",doc, XPathConstants.NODESET);
...
Node node_cs3 = nodeliste_cs3.item(i);
list.add(node_cs3.getTextContent() + ";");
I get a NullPointerException! So how do I deal with a node that has no text?
You can explicitly add a predicate to specify that you want to select the node only if it contains text:
...mgns1:VALEUR_CS[normalize-space()]
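The predicate is plain XPath, so it behaves the same from any host language. Purely as an illustration, here is a small Python/lxml sketch; the mgns1 namespace URI and file name below are placeholders, not values from the original document.
from lxml import etree

# Placeholder URI: substitute whatever the source document binds to the mgns1 prefix.
NS = {"mgns1": "http://example.com/mgns1"}

doc = etree.parse("champs.xml")  # hypothetical file name
# [normalize-space()] keeps only VALEUR_CS elements whose text is non-empty
# after trimming whitespace, so empty nodes are never returned.
nodes = doc.xpath(
    "//mgns1:Champ_supplementaire[mgns1:CODE_CS=3]"
    "/mgns1:VALEUR_CS[normalize-space()]",
    namespaces=NS,
)
values = [node.text.strip() + ";" for node in nodes]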
I have a DataTable and want to select some records with LINQ in this format:
var result2 = from row in dt.AsEnumerable()
              where row.Field<string>("Media").Equals(MediaTp, StringComparison.CurrentCultureIgnoreCase)
                    && (String.Compare(row.Field<string>("StrDate"), dtStart.Year.ToString() +
                            (dtStart.Month < 10 ? '0' + dtStart.Month.ToString() : dtStart.Month.ToString()) +
                            (dtStart.Day < 10 ? '0' + dtStart.Day.ToString() : dtStart.Day.ToString())) >= 0
                        && String.Compare(row.Field<string>("StrDate"), dtEnd.Year.ToString() +
                            (dtEnd.Month < 10 ? '0' + dtEnd.Month.ToString() : dtEnd.Month.ToString()) +
                            (dtEnd.Day < 10 ? '0' + dtEnd.Day.ToString() : dtEnd.Day.ToString())) <= 0)
              group row by new { Year = row.Field<int>("Year"), Month = row.Field<int>("Month"), Day = row.Field<int>("Day") } into grp
              orderby grp.Key.Year, grp.Key.Month, grp.Key.Day
              select new
              {
                  CurrentDate = grp.Key.Year + "/" + grp.Key.Month + "/" + grp.Key.Day,
                  DayOffset = (new DateTime(grp.Key.Year, grp.Key.Month, grp.Key.Day)).Subtract(dtStart).Days,
                  Count = grp.Sum(r => r.Field<int>("Count"))
              };
Then I try to iterate over the result with the following code:
foreach (var row in result2)
{
//... row.DayOffset.ToString() + ....
}
This issue occurs:
Object reference not set to an instance of an object.
I think it happens when there's no record matching the above criteria.
I tried to get an enumerator instead and use MoveNext() to check whether any data is there:
var enumerator2 = result2.GetEnumerator();
if (enumerator2.MoveNext()) { //-- }
but I still get the same error.
What's the problem?
I guess in one or more rows Media is null.
You then call Equals on null, which results in a NullReferenceException.
You could add a null check:
var result2 = from row in dt.AsEnumerable()
              where row.Field<string>("Media") != null
                    && row.Field<string>("Media").Equals(MediaTp, StringComparison.CurrentCultureIgnoreCase)
              ...
or use a surrogate value like:
var result2 = from row in dt.AsEnumerable()
              let media = row.Field<string>("Media") ?? String.Empty
              where media.Equals(MediaTp, StringComparison.CurrentCultureIgnoreCase)
              ...
(note that the last approach is slightly different)
My goal is to crawl the image URL and the image alt tag using Scrapy. I tried many combinations but still didn't achieve it.
Here is what I tried:
def parse_item(self, response):
    sel = Selector(response)
    item = imageItem()
    item['crawl_time'] = time.asctime(time.localtime(time.time()))
    item['crawl_date'] = time.asctime(time.localtime(time.strftime("%Y%m%d")))
    item['url'] = response.url
    for img in hxs.select('//img'):
        item['title'] = node.xpath("@alt").extract()
        item['iurl'] = node.xpath("@src").extract()
    if response.meta['depth'] == 1:
        exit
    return item
Some issues there:
You already have a sel selector, but you use hxs in the loop.
In the loop, you are using node instead of img.
It makes more sense for each loop iteration to yield one image item.
This is my tested and working code:
def parse_item(self, response):
    sel = Selector(response)
    images = sel.xpath('//img')
    for img in images:
        item = imageItem()
        item['url'] = response.url
        title = img.xpath('./@alt').extract() or ''
        item_title = title[0] if title else ''
        item['title'] = item_title
        iurl = img.xpath('./@src').extract() or ''
        item_iurl = iurl[0] if iurl else ''
        item['iurl'] = item_iurl
        yield item
Here is the code with which I achieved the result, but the depth is still 1:
class MySpider(CrawlSpider):
    name = 'imageaggr'
    start_urls = ['http://www.dmoz.org/','http://timesofindia.indiatimes.com/','http://www.nytimes.com','http://www.washingtonpost.com/','http://www.jpost.com','http://www.rediff.com/']
    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(SgmlLinkExtractor(allow=('', ), deny=('defghi\.txt')), callback='parse_item'),
        # Extract links matching 'item.php' and parse them with the spider's method parse_item
        # Rule(SgmlLinkExtractor(allow=('\.cms','\.html' )), deny=('parse_item\.html'))),
        # Rule(SgmlLinkExtractor(allow=('news', )), callback='parse_item'),
    )

    def parse_item(self, response):
        sel = Selector(response)
        images = sel.xpath('//img')
        image_count = len(images)
        count = 0
        while count < image_count:
            item = imageItem()
            item['url'] = response.url
            title = sel.xpath('//img/@alt').extract()[count] or ''
            if title == '':
                break
            item['title'] = title
            iurl = sel.xpath('//img/@src').extract()[count] or ''
            item['iurl'] = iurl
            item['crawl_time'] = time.asctime(time.localtime(time.time()))
            crawl_date = time.strftime("%Y%m%d")
            item['crawl_date'] = crawl_date
            count = count + 1
        return item
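If the goal is to go deeper than the first level of links, note that a CrawlSpider Rule defaults to follow=False as soon as a callback is given. A rough sketch using the current Scrapy API (LinkExtractor instead of the deprecated SgmlLinkExtractor; the spider name and depth limit here are arbitrary) looks like this:
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class ImageSpiderSketch(CrawlSpider):
    name = 'imageaggr_sketch'               # hypothetical name for this sketch
    start_urls = ['http://www.dmoz.org/']

    # follow=True tells CrawlSpider to keep extracting links from pages
    # that were already handed to parse_item.
    rules = (
        Rule(LinkExtractor(deny=(r'defghi\.txt',)), callback='parse_item', follow=True),
    )

    # Optional cap on crawl depth; 0 (the default) means unlimited.
    custom_settings = {'DEPTH_LIMIT': 3}

    def parse_item(self, response):
        for img in response.xpath('//img'):
            yield {
                'url': response.url,
                'title': img.xpath('./@alt').get(),
                'iurl': img.xpath('./@src').get(),
            }
With follow=True the spider keeps descending until DEPTH_LIMIT (or the site itself) stops it.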
I want to use LINQ to XML and I am trying to figure out a way to get the BuyRateForeign value for currency CAD; basically, I need to get the value 5,602895.
<ExchRates>
  <ExchRate>
    <Bank>Bank</Bank>
    <CurrencyBase>USD</CurrencyBase>
    <Date>24.05.2013.</Date>
    <Currency Code="036">
      <Name>AUD</Name>
      <Unit>1</Unit>
      <BuyRateCache>5,569450</BuyRateCache>
      <BuyRateForeign>5,625707</BuyRateForeign>
      <MeanRate>5,711378</MeanRate>
      <SellRateForeign>5,797049</SellRateForeign>
      <SellRateCache>5,855019</SellRateCache>
    </Currency>
    <Currency Code="124">
      <Name>CAD</Name>
      <Unit>1</Unit>
      <BuyRateCache>5,546866</BuyRateCache>
      <BuyRateForeign>5,602895</BuyRateForeign>
      <MeanRate>5,688218</MeanRate>
      <SellRateForeign>5,773541</SellRateForeign>
      <SellRateCache>5,831276</SellRateCache>
    </Currency>
  </ExchRate>
</ExchRates>
var xDoc = XDocument.Load("path/to/your.xml");
var cadForeignBuyRate =
    xDoc.Root.Elements("ExchRate").First(e => e.Element("Bank").Value == "Bank")
        .Elements("Currency").First(e => e.Element("Name").Value == "CAD")
        .Element("BuyRateForeign").Value;
var xDoc = XDocument.Load("path/to/your.xml");
var BuyRateForeign = from nm in xDoc.Descendants("Currency")
                     where (string)nm.Element("Name") == "CAD"
                     select (string)nm.Element("BuyRateForeign");
Using a lambda expression:
var stringRate = xDoc.Descendants("Currency")
                     .Where(p => (string)p.Element("Name") == "CAD")
                     .Select(p => (string)p.Element("BuyRateForeign"))
                     .SingleOrDefault();