I want to first mention that I have seen similar questions on Stack Overflow and have tried the recommended fixes, but every attempt reproduces the same problem.
I am trying to extract data from https://www.marketwatch.com/investing/stock/aapl/financials to do some financial analysis, but the dump to my CSV file is always empty.
I've tried to identify the issue in the Scrapy shell, and it seems that my "in values" check never evaluates to true. I am not sure why, because the initial response.xpath does print the table values.
The code is below. I appreciate any help, thank you all!
values = ["Sales/Revenue", "Cost of Goods Sold (COGS) incl. D&A", "Depreciation & Amortization Expense", "Gross Income", "SG&A Expense", "Research & Development", "EBIT after Unusual Expense", "Pretax Income", "Income Tax", "Net Income", "EBITDA"]
for row in response.xpath('//table[@class="crDataTable"]/tbody/tr[not(contains(@class,"thead"))]'):
    test = row.xpath('/td[1]//text()').extract()
    for i in values:
        if i in test:
            item['rowTitle'] = row.xpath('/td[1]//text()').extract()
            item['year1'] = row.xpath('/td[2]//text()').extract()
            item['year2'] = row.xpath('/td[3]//text()').extract()
            item['year3'] = row.xpath('/td[4]//text()').extract()
            item['year4'] = row.xpath('/td[5]//text()').extract()
            item['present'] = row.xpath('/td[6]//text()').extract()
            yield item
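A possible explanation, not verified against the live page: the paths used inside the loop start with `/`, which makes them absolute, so they are evaluated from the document root rather than from the current row. A relative path such as `./td[1]//text()` is the usual way to query within a row. Cell text also often carries surrounding whitespace, so an exact `in` test against raw strings can fail even when the row is found. The sketch below illustrates both points with the standard library's xml.etree.ElementTree instead of Scrapy. The markup, the numbers, and the `extract_rows` helper are made up for illustration and are not MarketWatch's real page.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the real page; only the structure matters here.
HTML = """
<table class="crDataTable">
  <tbody>
    <tr class="thead"><td>Header</td><td>Year</td></tr>
    <tr><td> Sales/Revenue </td><td>100</td></tr>
    <tr><td>Other Row</td><td>1</td></tr>
    <tr><td>Net Income</td><td>25</td></tr>
  </tbody>
</table>
"""

wanted = {"Sales/Revenue", "Net Income"}


def extract_rows(markup, wanted_titles):
    """Return one dict per row whose first cell matches a wanted title."""
    table = ET.fromstring(markup)
    results = []
    for row in table.findall("./tbody/tr"):
        if row.get("class") == "thead":
            continue
        # "./td" is relative to the current row, not to the document root.
        # strip() removes the whitespace that would break an exact match.
        cells = ["".join(td.itertext()).strip() for td in row.findall("./td")]
        if cells and cells[0] in wanted_titles:
            results.append({"rowTitle": cells[0], "values": cells[1:]})
    return results


rows = extract_rows(HTML, wanted)
for r in rows:
    print(r)
```

In the Scrapy spider, the corresponding change would be to write `row.xpath('./td[1]//text()')` (and likewise for the other cells) and to strip the extracted strings before comparing them with `values`.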
I have the following block of code that iterates through the fields of each table and adds the fields of the current table respectively in order to create a number of tableboxes.
'iterate through every table
For i=1 To arrTCount
    'the arrFF array holds the names of the fields of each table
    arrFF = Split(arrFields(i), ", ")
    arrFFCount = UBound(arrFF)
    'create a tablebox
    Set TB = ActiveDocument.Sheets("Main").CreateTableBox
    'iterate through the fields of the array
    For j=0 to (arrFFCount - 1)
        'add the field to the tablebox
        TB.AddField arrFF(j)
        'Msgbox(arrFF(j))
    Next
    Set tboxprop = TB.GetProperties
    tboxprop.Layout.Frame.ObjectId = "TB" + CStr(i)
    TB.SetProperties tboxprop
Next
The above code creates the tableboxes, but with one field less every time (the last one is missing). If I change the For loop from For j=0 To (arrFFCount - 1) to For j=0 To (arrFFCount) it creates empty tableboxes and seems to execute forever. Regarding this change, I tested the field names with the Msgbox(arrFF(j)) command and it shows me the correct field names as I want them to be in the tableboxes in the UI of QlikView.
Does anybody have an idea of what seems to be the problem here? Can I do this in a different way?
To clarify the situation and what I have tested so far: I have 11 tables to make tableboxes of, and I have tried with just one of them and with several of them. The result I am seeing with the code is on the left of the following image, and what I am expecting to see is on the right. Please note that the number of fields varies for each table; the image shows just one of them as an example.
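A likely reason for the missing last field: in VBScript, UBound returns the highest valid index of the array, not the number of elements, and the upper bound of a For loop is inclusive. Looping to arrFFCount - 1 therefore stops one element short. This does not by itself explain the empty tableboxes seen with the inclusive bound. The Python sketch below models only the index arithmetic; `ubound` is a hypothetical helper that mimics UBound, and the field names are made up.

```python
def ubound(seq):
    """Hypothetical stand-in for VBScript's UBound: highest valid index."""
    return len(seq) - 1


fields = "OrderID, Customer, Amount".split(", ")
last_index = ubound(fields)

# Mirrors `For j = 0 To (arrFFCount - 1)`: the VBScript bound is inclusive,
# so this visits indices 0 .. last_index - 1 and skips the last field.
short_loop = [fields[j] for j in range(0, (last_index - 1) + 1)]

# Mirrors `For j = 0 To arrFFCount`: visits every index 0 .. last_index.
full_loop = [fields[j] for j in range(0, last_index + 1)]

print(short_loop)
print(full_loop)
```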
I imported my dataset with SFrame:
products = graphlab.SFrame('amazon_baby.gl')
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
I would like to do sentiment analysis on a set of words shown below:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
Then I would like to create a new column in the products matrix for each of the selected words, where each entry is the number of times that word occurs in the review. I created a function for the word "awesome":
def awesome_count(word_count):
    if 'awesome' in word_count:
        return word_count['awesome']
    else:
        return 0
products['awesome'] = products['word_count'].apply(awesome_count)
So far so good, but I would need to write a separate function by hand for each of the selected words (great_count, and so on). How can I avoid this manual effort and write cleaner code?
I think the SFrame.unpack command should do the trick. In fact, the limit parameter will accept your list of selected words and keep only these results, so that part is greatly simplified.
I don't know precisely what's in your reviews data, so I made a toy example:
# Create the data and convert to bag-of-words.
import graphlab
products = graphlab.SFrame({'review':['this book is awesome',
                                      'I hate this book']})
products['word_count'] = \
    graphlab.text_analytics.count_words(products['review'])
# Unpack the bag-of-words into separate columns.
selected_words = ['awesome', 'hate']
products2 = products.unpack('word_count', limit=selected_words)
# Fill in zeros for the missing values.
for word in selected_words:
    col_name = 'word_count.{}'.format(word)
    products2[col_name] = products2[col_name].fillna(value=0)
I also can't help but point out that GraphLab Create does have its own sentiment analysis toolkit, which could be worth checking out.
I actually found an easier way to do this:
def wordCount_select(wc,selectedWord):
    if selectedWord in wc:
        return wc[selectedWord]
    else:
        return 0
for word in selected_words:
    products[word] = products['word_count'].apply(lambda wc: wordCount_select(wc, word))
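The same idea can be tried without GraphLab. This is a minimal sketch, assuming each bag-of-words entry behaves like an ordinary Python dict: `dict.get(word, 0)` returns the count when the word is present and 0 otherwise, which replaces the if/else. The `reviews` list and the `columns` dict are made-up stand-ins for the SFrame columns.

```python
from collections import Counter

selected_words = ['awesome', 'great', 'hate']

# Made-up reviews standing in for products['review'].
reviews = ['this book is awesome awesome', 'I hate this book']

# Plain-Python stand-in for count_words: one word -> count mapping per review.
word_counts = [Counter(review.lower().split()) for review in reviews]

# One "column" per selected word; get() supplies 0 for absent words.
columns = {
    word: [wc.get(word, 0) for wc in word_counts]
    for word in selected_words
}

print(columns)
```

If the entries of products['word_count'] do behave like dicts, the loop body above could probably be shortened the same way, with `wc.get(word, 0)` inside the lambda.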
I'm new to SQR. I need help writing a variable and using it in a conditional statement. My pseudocode is:
declare $head_print
let $head_print = (select * from PS_DTR_RPT_ACCT
    where RPT_TREE_NODE = 'REAL_ESTATE_EXP'
    or TREE_NODE_NUM between 4600000 and 4699999)
if(head_print contain REAL_ESTATE_EXP or Account between 4600000 and 4699999)
    then head_print = "REAL ESTATE";
else head_print = "Capital ESTATE";
It's not quite clear what you want so I'm making an assumption.
It seems that if a certain value is in table PS_DTR_RPT_ACCT, you want it to say "REAL ESTATE"; otherwise it should say "CAPITAL ESTATE".
With SQR, you have to put your SQL in a begin-select block, and the rules are very strict: field names must start in column 1, and the code underneath must NOT be in column 1. In the following routine I've tried to turn your pseudocode into real SQR. I could not test it, so you may get errors, and I don't know your field names since the query just says "select *".
Begin-Report
   do GetData
End-Report
Begin-Procedure GetData
! Initialize value - if no data found, query this later and set it to the "ELSE"
Let $Head_print = ''
Begin-Select
Head_Print
   ! Override the value from the table but only if you need to
   Let $Head_Print = 'REAL ESTATE'
from PS_DTR_RPT_ACCT
Where RPT_TREE_NODE = 'REAL_ESTATE_EXP'
or TREE_NODE_NUM between 4600000 and 4699999
End-Select
! If $Head_print is blank, then no value was found with the sql - do the ELSE part
If $Head_Print = ''
   Let $Head_Print = 'Capital Estate'
End-If
End-Procedure
SQR is quite a nice finite language to learn - syntax somewhat strict, but simple as Basic with SQL. I do recommend reading the reference manual - it's downloadable from Oracle.
Feel free to ask any other questions about SQR - I get alerts if you do - sorry it took this long to answer
I'm using LotusScript to clean and export values from a form to a csv file. In the form there are multiple date fields with names like enddate_1, enddate_2, enddate_3, etc.
These date fields are Data Type: Text when empty, but Data Type: Time/Date when filled.
To get the values as string in the csv without errors, I did the following (working):
If Isdate(doc.enddate_1) Then
    enddate_1 = Format(doc.enddate_1,"dd-mm-yyyy")
Else
    enddate_1 = doc.enddate_1(0)
End If
But writing such a code block for each date field didn't feel right.
I tried the following, but it isn't working.
For i% = 1 To 9
    If Isdate(doc.enddate_i%) Then
        enddate_i% = Format(doc.enddate_i%,"dd-mm-yyyy")
    Else
        enddate_i% = doc.enddate_i%(0)
    End If
Next
Any suggestions how to iterate numbered fields with a for loop or otherwise?
To iterate numbered fields, build the item name as a string and read the value with GetItemValue:
valueArray = notesDocument.GetItemValue( itemName$ )
However, did you know that you can also export documents in CSV format from the Notes menu?
File\Export
There is also a formula:
@Command([FileExport]; "Comma Separated Value"; "c:\document.csv")
This combines Dmytro's solution and Richard Schwartz's clarification with my block of code into a working solution. I tried to post it as an edit to Dmytro's answer, but the edit was rejected.
My problem was not only iterating over the numbered fields, but also storing the values in a way that makes them easy to retrieve later. I found this out today while implementing Dmytro's solution combined with Richard Schwartz's clarification. I used a List to solve it completely.
The working solution for me now is:
Dim enddate$ List
For i% = 1 To 9
    itemName$ = "enddate_" + CStr(i%)
    If Isdate(doc.GetItemValue(itemName$)) Then
        enddate$(i%) = Format(doc.GetItemValue(itemName$),"dd-mm-yyyy")
    Else
        enddate$(i%) = doc.GetItemValue(itemName$)(0)
    End If
Next
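For readers more used to Python, the pattern above (build the field name from the loop counter, look the value up by name, store the result under the counter) can be sketched as follows. The `doc` dict, its values, and `format_enddate` are made-up stand-ins and not Notes API calls.

```python
from datetime import date

# Made-up document: filled fields hold dates, empty ones hold text.
doc = {
    "enddate_1": date(2015, 3, 9),
    "enddate_2": "",
    "enddate_3": date(2016, 12, 31),
}


def format_enddate(value):
    """Format dates as dd-mm-yyyy; pass any other value through as text."""
    if isinstance(value, date):
        return value.strftime("%d-%m-%Y")
    return str(value)


enddates = {}
for i in range(1, 10):
    item_name = "enddate_" + str(i)
    # Fields missing from the document are treated as empty text in this sketch.
    enddates[i] = format_enddate(doc.get(item_name, ""))

print(enddates[1], enddates[3])
```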
[The description is a bit fudged to obfuscate my real work for confidentiality reasons]
I'm working on a QTP test for a web page where there are multiple HTML tables of items. Items that are available have a clickable item#, while those that aren't active have an item# as plain text.
So if I have a set of ChildObjects like this:
//This is the set of table rows that contain item numbers, active or not.
objItemRows = Browser("browserX").Page("pageY").ChildObjects("class:=ItemRow")
What is the simplest way in QTP land to select only the clickable link-ized item #s?
UPDATE: The point here isn't to select the rows themselves, it's to select only the rows that have items in them (as opposed to header/footer rows in each table). If I understand this correctly, I could then use objItemRows.Count to count how many items (available and unavailable) there are. Could I then use something like
desItemLink = Description.Create
desItemLink("micclass").value = "Link"
objItemLinks = objItemRows.ChildObjects(desItemLink)
To get the links within only the item rows?
Hope that clarifies things, and thanks for the help.
I think I have this figured out.
Set desItemLink = description.create
desItemLink("micclass").value = "Link"
desItemLink("text").RegularExpression = True
'True, Regex isn't really required in this example, but I just wanted to show it could be used this way
'This next part depends on the format of the item numbers, in my case, it's [0-9]0000[0-9]00[0-9]
For x = 0 to 9
    For y = 0 to 9
        For z = 0 to 9
            strItemLink = x & "0000" & y & "00" & z
            desItemLink("text").value = strItemLink
            Set objItemLink = Browser("browser").Page("page").Link(desItemLink)
            If objItemLink.Exist(0) Then
                'Do stuff
            End If
        Next
    Next
Next
Thanks for your help anyways, but the code above will iterate through links with names in a given incrementing format.
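As a rough illustration of the enumeration only (it does not touch QTP or a browser), the three nested loops can be modelled in Python with itertools.product. The `item_numbers` list and `ITEM_PATTERN` are names invented for this sketch; the pattern is the one quoted in the comment above.

```python
import itertools
import re

# Format quoted in the answer: [0-9]0000[0-9]00[0-9]
ITEM_PATTERN = re.compile(r"[0-9]0000[0-9]00[0-9]")

# Same order as the nested For x / For y / For z loops.
item_numbers = [
    f"{x}0000{y}00{z}"
    for x, y, z in itertools.product(range(10), repeat=3)
]

# Every generated candidate fits the stated format.
all_match = all(ITEM_PATTERN.fullmatch(n) for n in item_numbers)

print(len(item_numbers), all_match)
```

This makes the cost visible: 1000 candidate link texts are checked one by one, so Exist(0) with a zero timeout matters for keeping the loop fast.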