Stata year display different than stored format - panel

I am working with Stata and have a panel dataset with years ranging from 1990 to 2015. When browsing the data, the years are displayed as 1990, 1991 and so on. However, for instance, when trying to drop a year, it only works the following way
drop if year==11
which results in dropping the year 2000. When plotting data, the ticks are also displayed as 1,2,3,4...,25, 26, instead of the actual years.
How can I convert back years into their actual values?

It sounds as if you or someone else read in the data with year as a string variable and then used encode to generate a numeric variable. That is the wrong approach here, as you have found out: encode maps the distinct strings to the integers 1 and up, which is not what you want for a string that already contains numbers. destring is the right tool for that situation. To repair the existing variable, use decode followed by destring or, if the original string variable is still present in the dataset, simply destring that.
Note that you should check your data carefully. Why was year imported in this way in the first place? Often this happens when data come from a spreadsheet and people don't check carefully enough for metadata (e.g. header information).
clear
input str4 original
"1990"
"1991"
"1992"
end
encode original, gen(year)
* solution 1
decode year, gen(year2)
destring year2, replace
* solution 2 (better)
destring original, replace
list
     +-------------------------+
     | original   year   year2 |
     |-------------------------|
  1. |     1990   1990    1990 |
  2. |     1991   1991    1991 |
  3. |     1992   1992    1992 |
     +-------------------------+
Also, in Stata, "format" has nothing to do with what is stored, only with what is displayed. See help format. The term is, naturally, overloaded in computing.
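To see why drop if year==11 removed the year 2000, here is a rough illustration in Python rather than Stata. It mimics what encode does: the distinct string values are numbered 1, 2, 3, ... in sorted order, and the original strings survive only as value labels. All variable names are made up for the sketch.

```python
# Mimic what encode does to a string year variable (illustrative only).
years_as_strings = [str(y) for y in range(1990, 2016)]  # "1990" ... "2015"

# encode assigns integer codes 1, 2, 3, ... to the sorted distinct strings
labels = sorted(set(years_as_strings))
code_of = {label: i + 1 for i, label in enumerate(labels)}

# The stored values are the codes; the strings are kept only as value labels
label_of = {code: label for label, code in code_of.items()}

print(code_of["1990"])  # 1
print(code_of["2000"])  # 11
print(label_of[26])     # 2015

# decode followed by destring amounts to looking up the label and converting it
restored = [int(label_of[code_of[s]]) for s in years_as_strings]
print(restored[:3])     # [1990, 1991, 1992]
```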

Month slicer and filter not working properly on rolling data Power BI

Hi all.
I have created the measure below so that it always reflects three months of figures when the month slicer is used.
3R =
CALCULATE(COUNT('Order'[Order/ not ordered]),DATESINPERIOD('Date'[Date],LASTDATE('Date'[Date]),-3,MONTH))
However, when I try to add the salesman as a filter with the formula below and click the month slicer, it shows only the chosen month's figures, not three months.
3R John =
CALCULATE(COUNT('Order'[Order/ notordered])
,DATESINPERIOD('Date'[Date],LASTDATE('Date'[Date]),-3,MONTH),FILTER('Order','Order'[Salesman]="John"))
I have shared a sample at the link below. It contains two tables and one matrix.
The matrix is named "working" and one of the tables is named "not working properly". When no value is selected on the slicer, all visuals show the same data. However, when I click the month slicer, the unnamed main table changes, which is correct. The matrix is also correct, but the table I am trying to build is not working.
What I am trying to achieve is three-month rolling data by customer and salesman. For example, when I click 1 on the month slicer, the table should give the January 2020, December 2019 and November 2019 figures.
https://drive.google.com/file/d/1LoqSiKhHMFn_OioI2RnXOzjcIL9dPRjS/view?usp=sharing
Below is the solution that worked for me.
3R John =CALCULATE(COUNT('Order'[Order/notordered]),DATESINPERIOD('Date'[Date],LASTDATE('Date'[Date]),-3,MONTH),'Order'[Salesman]="John")
Just remove the FILTER('Order', wrapper and pass the condition directly; it works now.

Converting dates from different centuries

I have a staging table which contains dates as strings in the format 'mm/dd/yy'. I have an Oracle 11g procedure that converts the string to a date before loading it into the main table. I'm using to_date('03/20/34','mm/dd/rr') for the conversion, which gives the wrong output 03/20/2034, whereas the correct date is 03/20/1934. Please help me get the correct output, given that my table contains dates from both centuries.
"I'm using to_date('03/20/34','mm/dd/rr') to convert into date format which is giving wrong output as 03/20/2034 whereas the correct date is 03/20/1934. "
RR was a hack Oracle introduced in the last Millennium as part of the fight to resolve the Y2K bug. The standard date mask YY defaults the century to the current century. But in 1999 it was more likely that 01/01/00 meant 01/01/2000 rather than 01/01/1900. So the RR hack was to derive the century using fixed windows that split at 50: with the current year in 2000-2049, values 00-49 are given century 20 and values 50-99 are given century 19. Clearly some of the time this guess would be wrong, but the data corruption introduced was of a lower level than defaulting all dates to century 19.
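As a sketch of the rule just described, here is the RR century choice written out in Python (this is not Oracle code, and the function name rr_year is made up for illustration). It follows Oracle's documented behaviour, in which the result also depends on which half of the century the current year falls in, so the 00-49 to 20xx mapping holds while the current year is 2000-2049.

```python
def rr_year(two_digit_year: int, current_year: int) -> int:
    """Return the four-digit year the RR mask would pick (illustrative sketch)."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year")
    current_century = current_year // 100 * 100
    current_in_low_half = current_year % 100 < 50
    given_in_low_half = two_digit_year < 50

    if given_in_low_half == current_in_low_half:
        century = current_century          # same half: same century
    elif given_in_low_half:
        century = current_century + 100    # e.g. 00 seen in 1999 -> 2000
    else:
        century = current_century - 100    # e.g. 99 seen in 2017 -> 1999
    return century + two_digit_year


print(rr_year(34, 2017))  # 2034
print(rr_year(99, 2017))  # 1999
print(rr_year(0, 1999))   # 2000
```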
The key point is, the windows are fixed. It was intended to be a temporary solution, because there wasn't time to switch all the legacy systems to use four-digit years before 2000 arrived. But the vision was always that all systems would be fixed in the long term, even if only through retirement or replacement. Certainly nobody expected that new systems would be built supporting two-digit years.
It is now 2017 and there is no excuse for systems to still be using two-digit years. Back in the old days storage was expensive, and shaving two digits from a date was a valuable space saving. Now it is just sloppiness.
Which obviously doesn't help you solve your problem. The short answer is there is no way to change the pivot used by RR. The best solution would be to enforce stricter validation on the data input aspect of your system, and insist on four-digit years. Whether that's feasible depends on your office politics. The other solution is to write your own conversion function:
create or replace function my_to_date (p_str varchar2) return date as
begin
    -- p_str is expected as 'mm/dd/yy': characters 1-6 are 'mm/dd/', the rest is the year
    if to_number(substr(p_str, 7)) <= 35 then
        return to_date(substr(p_str, 1, 6)||'19'||substr(p_str, 7), 'mm/dd/yyyy');
    else
        return to_date(substr(p_str, 1, 6)||'20'||substr(p_str, 7), 'mm/dd/yyyy');
    end if;
end;
/
Obviously you'll need to define the actual rules for deciding whether to use 19 or 20.
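The same idea outside the database, as a minimal Python sketch. The function name is hypothetical and the default pivot of 35 is only the placeholder used in the example above, not a recommendation: years at or below the pivot go to the 1900s and the rest to the 2000s. The input is assumed to be in 'mm/dd/yy' form.

```python
from datetime import date, datetime


def parse_with_pivot(text: str, pivot: int = 35) -> date:
    """Parse 'mm/dd/yy', choosing the century from a caller-supplied pivot."""
    month_day, two_digit_year = text[:6], text[6:]
    century = "19" if int(two_digit_year) <= pivot else "20"
    return datetime.strptime(month_day + century + two_digit_year, "%m/%d/%Y").date()


print(parse_with_pivot("03/20/34"))  # 1934-03-20
print(parse_with_pivot("03/20/36"))  # 2036-03-20
```

Making the pivot a parameter keeps the business rule visible at the call site instead of burying it in the conversion.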
I also encountered an issue like this, when inserting date values from the late 90s. The format in the script I was given read DD-MON-YY, so the database read that as 20YY, instead of 19YY.
My very inelegant solution was to open the raw data file and simply add a "19" before the YY year values.

How do you parse an HTML table representing time?

I am attempting to parse this HTML table representing a year's worth of temperature data, provided by an Australian government website.
This table is set up in an unusual way: the columns are months, and the rows are days of the month (so the first row's cells are JAN 1, FEB 1, MAR 1). Each cell contains a number if there's data recorded for that day, an empty cell if no data was recorded, or a cell class notDay if the day does not exist (eg Feb 31st).
My intent is to build a database full of this data in the format
DATE RAINFALL MAX TEMP
2015-02-07 35 31
2015-02-07 40 17
My question is: what would be the simplest or most efficient (in terms of programmer efficiency) way to parse the table to get the data into a usable format?
I'm personally using Ruby with the Nokogiri library, but general non-language-specific algorithm/approach advice is welcome if it makes for a better discussion. I'm not looking for someone to write the code and solve the problem for me, but for advice about the approach to take.
I wonder if you can:
Take all the cells in calendar order. The rows are days and the columns are months, so if you have an array of row arrays, use Array#transpose to get one array per month and then Array#flatten.
Discard any notDay cells with Array#reject.
Iterate over all the relevant dates using a date range, pairing each date with the next cell:
(Date.new(2014,1,1) .. Date.new(2014,12,31)).each {...}
And go from there...?
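A minimal sketch of that approach, in Python rather than Ruby and without the HTML parsing step. It assumes the table has already been read into a list of 31 rows of 12 cells, with a hypothetical NOT_DAY marker for notDay cells and None for empty ones. Because the rows are days and the columns are months, the grid is transposed before flattening so the cells come out in calendar order.

```python
import calendar
from datetime import date, timedelta

NOT_DAY = object()  # stands in for a cell with class notDay


def cells_to_dates(grid, year):
    """Pair each real day of `year` with its cell value (None = no reading)."""
    by_month = zip(*grid)  # transpose: one tuple of 31 cells per month
    flat = [cell for month in by_month for cell in month]
    values = [cell for cell in flat if cell is not NOT_DAY]
    days_in_year = 366 if calendar.isleap(year) else 365
    if len(values) != days_in_year:
        raise ValueError("cell count does not match the number of days")
    first = date(year, 1, 1)
    return {first + timedelta(days=i): value for i, value in enumerate(values)}


def fake_grid(year):
    """Build a 31 x 12 test grid where each real cell holds 'month-day'."""
    grid = []
    for day in range(1, 32):
        row = []
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            row.append(f"{month}-{day}" if day <= last_day else NOT_DAY)
        grid.append(row)
    return grid


readings = cells_to_dates(fake_grid(2014), 2014)
print(readings[date(2014, 2, 7)])  # 2-7
```

The length check is a cheap guard: if a notDay cell is missed or a row is short, the dates would silently shift, so it is better to fail early.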

Duration format in google spreadsheet

I'm trying to apply a duration format to some cells in a Google spreadsheet. I would like to display an integer in the format: X days X hours X minutes.
I've tried formats like d:h:mm, but I ran into a problem when applying the format: it always shows one day less. When I write 1 in the cell, it is converted to 31:0:00. When I write 2, the cell changes to 1:00:00.
That is because the duration format is actually a date/time format (for comparing dates).
If you simply enter a number (1), Google will interpret that as midnight on reference day number 1 (times are stored as fractions of whole days).
Day number 1 in Google Sheets is 31/12/1899, i.e. the 31st day of the month. That is why your result shows days=31.
To achieve what you want, you would effectively have to add 1 to your values, so that 1 (+1) becomes day number 2, i.e. 01/01/1900, which displays as 1 day. You could then use a custom format for display, but this won't work when you have more than 31 days.
I think the best way is to simply concatenate the data you have with the relevant labels, like so (where A1 is a cell containing your data: 1, 2, 1.5, etc.):
=int(A1)&" days "&int(MOD(A1,1)*24)&" hours " & mod(MOD(A1,1)*24,1)*60 & " minutes"
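The arithmetic behind that formula, as a small Python sketch (the function name is made up). It treats the cell value as a number of days and, unlike the formula, rounds to whole minutes first so that the minutes part does not come out fractional.

```python
def split_duration(days_value: float) -> str:
    """Format a number of days as 'X days X hours X minutes'."""
    total_minutes = round(days_value * 24 * 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days} days {hours} hours {minutes} minutes"


print(split_duration(1))     # 1 days 0 hours 0 minutes
print(split_duration(1.5))   # 1 days 12 hours 0 minutes
print(split_duration(2.25))  # 2 days 6 hours 0 minutes
```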

SSRS Value By Date

OK, I've seen similar questions on here, but nothing exactly the same. I am creating reports based on a cube that reads data from a DW. A lot of the reports tend to be along the lines of Value by Something by Week or Value by Something by Month. Everything seems OK, but the week and month columns don't order correctly. Week 10 goes before Week 9, February comes before January, etc. I'm very frustrated because I can't get these things to work correctly.
To add to this, at some point my customer needs to be able to write their own reports against the cube using Report Builder 3.0, so I am reluctant to rely on manually editing the query. Surely there is some obvious way to do this. In my DimDate I have a weekname that is a varchar, a week that is a date, etc. Same for month.
I'm missing something obvious here.
Thanks!
The sort order you are seeing makes sense: varchars are strings, and under an ASCII-style sort "Week 10" comes before "Week 9" and "February" comes before "January".
There are multiple ways to get an ascending sort with strings as column headers (assuming ASCII-style sorting on the string field):
Ensure week numbers are two digits in length, e.g. "Week 9" would become "Week 09". This will ensure that the week columns are sorted in ascending order (or descending order, whichever is the case).
Add a month number in front of the month name e.g. "01 January", "02 February" -> You will still need two digit month numbers otherwise you will get the same issue you had with week numbers.
Use formatted dates as opposed to strings, as dates will be sorted properly.
Alternatively, if the issue is being caused in the dimension within the cube you can ensure any order by clauses are on keys, and not name fields.
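A quick Python illustration of the sorting behaviour described above. The labels are sample data only, and the cube and report themselves are not involved.

```python
weeks = ["Week 9", "Week 10", "Week 1"]
print(sorted(weeks))  # ['Week 1', 'Week 10', 'Week 9']

# Zero-padding the number makes the string order match the numeric order
padded = [f"Week {int(w.split()[1]):02d}" for w in weeks]
print(sorted(padded))  # ['Week 01', 'Week 09', 'Week 10']

months = ["January", "February", "March"]
print(sorted(months))  # ['February', 'January', 'March']

# Sorting on a numeric key instead of the name gives calendar order
month_number = {"January": 1, "February": 2, "March": 3}
print(sorted(months, key=month_number.get))  # ['January', 'February', 'March']
```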
