I'm writing a function that's supposed to work with lists of lists. Take this example:
def mode(year):
    monthAmount = len(year)
    for month in year:
        (month)Index = len(month)   # pseudocode: not valid Python
What I want: if year is [January, February, March], the result should be something like JanuaryIndex = *, FebruaryIndex = *, MarchIndex = *, and so on, for any number of months. Is there an easy way to do this? Thanks.
I am not entirely sure what you are looking for here.
To get the index of each item in a sequence you are looping over, together with the item itself, use the enumerate() function:
for index, month in enumerate(year):
    print(index, month)
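A minimal, self-contained sketch of enumerate(), assuming year is a plain list of month-name strings (the sample list below is made up). The optional start argument makes the count begin at 1 instead of 0:

```python
year = ['January', 'February', 'March']  # sample data (assumption)

# enumerate() yields (index, item) pairs; start=1 numbers the months from 1
for index, month in enumerate(year, start=1):
    print(index, month)
```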
You really do not want to dynamically set global variables. Use a dictionary instead:
monthindices = {}
for month in year:
    monthindices[month] = len(month)
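Wrapped in a function, the dictionary approach could look like this. month_indices is a hypothetical name, and the value stored per month is len(month) only because that is what the question computes; indices['January'] then plays the role the question wanted JanuaryIndex to play:

```python
def month_indices(year):
    """Map each month name to len(month) instead of creating JanuaryIndex, etc."""
    return {month: len(month) for month in year}

indices = month_indices(['January', 'February', 'March'])
print(indices['January'])  # 7
```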
You can create global variables dynamically through the globals() mapping, but doing so is generally a bad idea. If you insist, you'd do it like this:
gl = globals()
for month in year:
    gl['{}Index'.format(month)] = len(month)
I imported my dataset with SFrame:
products = graphlab.SFrame('amazon_baby.gl')
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
I would like to do sentiment analysis on a set of words shown below:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
Then I would like to create a new column in products for each of the selected words, where each entry is the number of times that word occurs. So I wrote a function for the word "awesome":
def awesome_count(word_count):
    # word_count is one row's dict of {word: count}
    if 'awesome' in word_count:
        return word_count['awesome']
    else:
        return 0

products['awesome'] = products['word_count'].apply(awesome_count)
So far so good, but I would need to write a separate function like this for each selected word (great_count, etc.). How can I avoid this manual effort and write cleaner code?
I think the SFrame.unpack command should do the trick. In fact, the limit parameter will accept your list of selected words and keep only these results, so that part is greatly simplified.
I don't know precisely what's in your reviews data, so I made a toy example:
# Create the data and convert to bag-of-words.
import graphlab
products = graphlab.SFrame({'review': ['this book is awesome',
                                       'I hate this book']})
products['word_count'] = \
    graphlab.text_analytics.count_words(products['review'])

# Unpack the bag-of-words into separate columns.
selected_words = ['awesome', 'hate']
products2 = products.unpack('word_count', limit=selected_words)

# Fill in zeros for the missing values.
for word in selected_words:
    col_name = 'word_count.{}'.format(word)
    products2[col_name] = products2[col_name].fillna(value=0)
I also can't help but point out that GraphLab Create does have its own sentiment analysis toolkit, which could be worth checking out.
I actually found an easier way to do this:
def wordCount_select(wc, selectedWord):
    if selectedWord in wc:
        return wc[selectedWord]
    else:
        return 0

for word in selected_words:
    # word=word binds the current word now, not when the lambda is later run
    products[word] = products['word_count'].apply(lambda wc, word=word: wordCount_select(wc, word))
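The same idea can be tried without GraphLab. This is a plain-Python sketch, not SFrame code: the list of dicts is hand-written sample data standing in for the word_count column. It shows that dict.get(word, 0) replaces the if/else, and why the lambda should bind word explicitly: a closure created in a loop looks up word when it is called, not when it is created. Whether that actually bites with SArray.apply depends on when GraphLab evaluates the function, so binding the word up front is the safe choice either way.

```python
# Hand-written sample data standing in for the 'word_count' column (assumption)
word_counts = [
    {'this': 1, 'book': 1, 'is': 1, 'awesome': 1},
    {'hate': 1, 'this': 1, 'book': 1},
]
selected_words = ['awesome', 'hate', 'great']

# One "column" per selected word; dict.get returns 0 when the word is absent
columns = {word: [wc.get(word, 0) for wc in word_counts] for word in selected_words}
print(columns)

# Late binding: every lambda looks up `word` when called, so all see 'great'
late = [lambda wc: wc.get(word, 0) for word in selected_words]
# A default argument captures the current word at creation time
bound = [lambda wc, word=word: wc.get(word, 0) for word in selected_words]

print([f(word_counts[0]) for f in late])   # [0, 0, 0]
print([f(word_counts[0]) for f in bound])  # [1, 0, 0]
```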
I am working with an API that requires me to pass in numbers as strings. I need to increment the counter on each call.
I am using the following code:
days = days.to_i
days += 1
days = days.to_s
This works, but seems kind of sloppy. Is there a cleaner way to do this in Ruby?
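Yes, there is. Since days is already a string, you can call String#next (or its alias, succ), which increments a non-negative integer string directly ("9".next returns "10"):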
days = days.next
or
days = days.succ
Or, you can use the bang (!) methods, which modify the string in place:
days.next!
or
days.succ!
I have the following problem:
I need to write a begin date and an end date into a matrix, where the columns are the quarters of the year (1-4) and the rows are the years.
For example:
Matrix:
        Q1    Q2    Q3    Q4
2010
2011
Now the date 01.01.2010 should be put in the first element and the date 09.20.2011 in the sixth element.
Thanks in advance.
You first have to consider that SAS does not actually have date/time/datetime variables. It just uses numeric variables formatted as date/time/datetime. The actual stored value is:
days since 1/1/1960 for dates
seconds since midnight (00:00) for times
seconds since 1/1/1960 00:00 for datetimes
SAS does not even distinguish between integer and float numeric types. So a date value can contain a fractional part.
What you do or can do with a SAS numeric variable is completely up to you, and mostly depends on the format you apply. You could mistakenly format a variable containing a date value with a datetime format... or even with a currency format... SAS won't notice or complain.
You also have to consider that the SAS DATA step does not actually have matrices or arrays. ARRAY statements only simulate them, as a way to read and write dataset variables by index.
That said, SAS does provide a whole lot of formats and informats that allow you to implement date and time manipulation.
Assuming you are coding within a DATA step, and that the "dates" are numeric dataset variables, the PUT function can extract the date parts you need to calculate the row and column of the matrix element to write to, like so:
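DATA table;
    SET dates;  /* hypothetical input dataset holding numeric beg_date and end_date */
    ARRAY dm{2,4} dm_r1c1-dm_r1c4 dm_r2c1-dm_r2c4;
    /* PUT returns text; subtracting a number converts it, so 2010 -> row 1 */
    beg_row = PUT(beg_date, YEAR4.)-2009;
    end_row = PUT(end_date, YEAR4.)-2009;
    /* INPUT turns the quarter text '1'-'4' into a number for the subscript */
    beg_col = INPUT(PUT(beg_date, QTR1.), 1.);
    end_col = INPUT(PUT(end_date, QTR1.), 1.);
    /* YEAR(beg_date) and QTR(beg_date) would return these numbers directly */
    dm{beg_row,beg_col} = beg_date;
    dm{end_row,end_col} = end_date;
RUN;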
... or if you are using a one-dimensional array:
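DATA table;
    SET dates;  /* hypothetical input dataset holding numeric beg_date and end_date */
    ARRAY da{8} da_1-da_8;
    /* 2010 Q1 -> 1, ..., 2011 Q4 -> 8; the PUT text is converted in the arithmetic */
    beg_index = 4 * (PUT(beg_date, YEAR4.)-2010) + PUT(beg_date, QTR1.);
    end_index = 4 * (PUT(end_date, YEAR4.)-2010) + PUT(end_date, QTR1.);
    da{beg_index} = beg_date;
    da{end_index} = end_date;
RUN;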
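To check the index arithmetic outside SAS, here is a small Python sketch (not SAS code) of the same mapping. BASE_YEAR = 2010 and the function names are assumptions for illustration; quarter() computes the 1-4 value that the QTR1. format writes.

```python
from datetime import date

BASE_YEAR = 2010  # first row of the matrix (assumption matching the SAS code)

def quarter(d):
    """Quarter 1-4 of a date, i.e. the value the QTR1. format writes."""
    return (d.month - 1) // 3 + 1

def matrix_cell(d):
    """(row, column) in the 2x4 array: row 1 is BASE_YEAR, column is the quarter."""
    return d.year - (BASE_YEAR - 1), quarter(d)

def flat_index(d):
    """1-based position in the 8-element array: 4 slots per year, then the quarter."""
    return 4 * (d.year - BASE_YEAR) + quarter(d)

for d in (date(2010, 1, 1), date(2011, 9, 20)):
    print(d, matrix_cell(d), flat_index(d))
```

Reading 09.20.2011 as month.day.year (20 September 2011, third quarter), this layout puts it in row 2, column 3 of the matrix, which is element 7 of the flattened array.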
I am attempting to generate a dummy variable for each year from 1996 to 2012 (inclusive), such that the 1996 dummy equals 1 if the year is 1996 and 0 otherwise, using the foreach command in Stata to save time (at least for future projects). Currently the dummy for 1996 is produced, but no others are generated.
I think that it has to do with how I am defining j, but I cannot quite figure out the formatting to achieve the results that I want. I have looked online and in the Stata help files and cannot find anything on this specific topic.
Here is what I have thus far:
local var year
local j = 1996
foreach j of var year {
    gen d`j' = 1 if year==`j'
    local ++j
}
I will continue to try to figure this out on my own, but if anyone has a suggestion I would greatly appreciate it.
Let us look at this line by line.
local var year
You defined a local macro var with content "year". This is legal but you never refer to that local macro in this code, so the definition is pointless.
local j = 1996
You defined a local macro j with content "1996". This is legal.
foreach j of var year {
You open a loop and define the loop index to be j. That means that within the loop any reference to local macro j will be interpreted in terms of the list of arguments you provide. (The previous definition of j is irrelevant within the loop, and so has no effect in the rest of your code.)
... of var year
You specify that the loop is over a variable list here. Note that the keyword var here is short for varlist and has absolutely nothing to do with the local macro name var you just defined. The variable list consists of the single variable name year.
gen d`j' = 1 if year==`j'
This statement will be interpreted, the one and only time the loop is executed, as
gen dyear = 1 if year==year
as references to the local macro j are replaced with its contents, the variable name year. year==year is true in every observation, so the result is a new variable dyear that is 1 in every observation. That is not the indicator (dummy) variable you want: look carefully at your dataset and you will see it is not a dummy for year being 1996.
local ++j
You are trying to increment the local macro j by 1. But the loop has just set local macro j to contain the string "year", which is a variable name. You can't add 1 to a string, so the error message will be type mismatch. You don't report that error, which is a surprise. The point is a little subtle: in the previous command, the context of generate allows the reference to year to be interpreted as an instruction to calculate with the variable year, which is naturally numeric. But local commands are all about string manipulation, whose results may or may not have a numeric interpretation, and your command is equivalent, first of all, to instructing Stata to evaluate
"year" + 1
which triggers a type mismatch error.
Turning away from your code, consider the loop
forval y = 1996/2012 {
    gen d`y' = 1 if year == `y'
}
This is closer to what you want, but it makes another bug in your code clearer. It would create variables d1996 to d2012, but each would be 1 in the year specified and missing otherwise, which is not what you want.
You could fix that by adding a further line in the loop
replace d`y' = 0 if year != `y'
but a much cleaner way to do it is the single line
gen d`y' = year == `y'
The expression
year == `y'
is evaluated as 1 when true and 0 when false, which is what you want.
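The logic of gen d`y' = year == `y' can be mimicked in plain Python (an illustration only, not Stata code; the list of years is made-up sample data): the comparison yields True or False, and int() turns that into 1 or 0, giving one indicator per year from 1996 to 2012.

```python
years = [1996, 1997, 1996, 2012]  # made-up sample observations

# One indicator per year 1996..2012 (range stops before 2013)
dummies = {f"d{y}": [int(year == y) for year in years] for y in range(1996, 2013)}

print(dummies["d1996"])  # [1, 0, 1, 0]
print(len(dummies))      # 17
```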
All this is standard technique, documented in the [U] (User's Guide) and [P] (Programming Reference) manuals.
As @Roberto Ferrer pointed out, however, experienced Stata users would not create dummies this way, because tabulate offers an option that does it without a loop.
A tutorial that brings together comments on local macros, foreach and forvalues loops is within http://www.stata-journal.com/sjpdf.html?articlenum=pr0005
Typing
search foreach
within Stata would have pointed you to that tutorial, among other material you could read.
Looping is not necessary. Try the tabulate command with the gen() option. See help tabulate oneway.
See also help xi and help factor variables.
You are trying to loop through the distinct values of year but the syntax is not correct. You are actually looping through a list of variables with only one element: year. The command levelsof gives you the distinct values, but like I said, looping is not necessary.
This might help.
/*assuming the data is from 1970-2012*/
/*assuming your year variable name is fyear*/
forvalues x = 1970/2012 {
    gen fyear`x' = 0
    replace fyear`x' = 1 if fyear == `x'
}
However, I agree with Roberto Ferrer that a loop may not be necessary.
I'm using LotusScript to clean and export values from a form to a csv file. In the form there are multiple date fields with names like enddate_1, enddate_2, enddate_3, etc.
These date fields are Data Type: Text when empty, but Data Type: Time/Date when filled.
To get the values as strings in the csv without errors, I did the following (working):
If Isdate(doc.enddate_1) Then
    enddate_1 = Format(doc.enddate_1,"dd-mm-yyyy")
Else
    enddate_1 = doc.enddate_1(0)
End If
But writing such a code block for each date field didn't feel right.
I tried the following, but it isn't working:
For i% = 1 To 9
    If Isdate(doc.enddate_i%) Then
        enddate_i% = Format(doc.enddate_i%,"dd-mm-yyyy")
    Else
        enddate_i% = doc.enddate_i%(0)
    End If
Next
Any suggestions how to iterate numbered fields with a for loop or otherwise?
To iterate over numbered fields with a For loop, build each item name as a string and read it with GetItemValue:
valueArray = notesDocument.GetItemValue( itemName$ )
However, did you know that you can also export documents in CSV format from the Notes menu?
File\Export
There is also a formula command:
@Command([FileExport]; "Comma Separated Value"; "c:\document.csv")
I combined Dmytro's solution and Richard Schwartz's clarification with my own block of code into a working solution. I tried adding it as an edit to Dmytro's answer, but the edit was rejected.
My problem was not only iterating over the numbered fields, but also storing the values iteratively so I could easily retrieve them later. I found this out today while implementing Dmytro's solution together with Richard Schwartz's clarification, and I used a List to solve it completely.
The working solution for me now is:
Dim enddate$ List                          ' string values keyed by field number
Dim values As Variant
For i% = 1 To 9
    itemName$ = "enddate_" + CStr(i%)      ' enddate_1 ... enddate_9
    values = doc.GetItemValue(itemName$)   ' GetItemValue returns an array
    If Isdate(values(0)) Then
        enddate$(i%) = Format(values(0), "dd-mm-yyyy")
    Else
        enddate$(i%) = values(0)
    End If
Next