I have built two queries using two different Universes. One Universe gives me data for the year 2015 and the other gives data for 2016.
Our fiscal calendar for 2015 runs from October 2014 to September 2015, so I have the complete data for the year 2015 in one Universe.
For the year 2016 I get the data from another Universe, which is refreshed every day.
I am doing an analysis where I need to compare the 2015 and 2016 data together.
Experts, please let me know if there is a way to append the 2015 and 2016 data using queries created from two different Universes.
The data for 2015 remains constant; the data for 2016 should be appended as soon as it is refreshed. Please advise if there is a way to append this data.
Thanks,
Ganesh
You can merge these two queries, similar to a SQL union, even though they're based on different data sources.
At the report level, highlight and right-click the dimensions that are the same (date --> date, id --> id) and merge them individually.
Measures cannot be merged, but you can create an overall variable to add them together. Right-click the Variables folder in the side panel, create a new variable, and enter a formula for the measure, like so:
=[Domestic].[Revenue] + [International].[Revenue]
Then, just drag and drop your data objects (merged dimensions and variable measures) onto your report.
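One caveat: if the combined measure comes up empty wherever only one of the queries has data for a merged value, a guarded variant may help. This is a sketch using WebI's If and IsNull functions, with the same query names as above:
=If(IsNull([Domestic].[Revenue]);0;[Domestic].[Revenue]) + If(IsNull([International].[Revenue]);0;[International].[Revenue])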
This assumes you're using Web Intelligence (WebI).
In Excel Power Query I have a linked Excel file where only the last column is updated by the team. For example, if the cut-off day is Jan 13th, the column is titled Jan 13 and its rows are updated for that week. When the update is done the following week, the column header changes to Jan 20. How can I keep the old data for Jan 13th in Power Query so that I only get the new data in a new column when refreshed?
I have tried, but I am stuck.
Power Query doesn't store any data; it's essentially a transformation script that uses whatever is in the linked source when refreshed. So if the data for the 13th is gone at the source, it will be gone in Power Query too. If the column for the 20th is added next to the column for the 13th, then you can create dynamic logic that keeps the additional columns.
You could preserve the historical data using VBA (storing it in a table that is not linked), but in Excel and M it is not possible. Power BI has an incremental-refresh option that could potentially be used here, but it is not available in Excel.
I want to filter a column that spans 2014-2019 down to 2017-2018 in Visual Studio with SSIS.
I have tried different things but none seem to work.
Derived Column date in your example is likely what you're looking for.
The Week column is of date type DT_DBDATE. Your string "2017-01-01" should be promoted to a date type, so the boolean check will identify whether the lower bound is being met.
You'd either need to create a second derived column to check against the upper bound or, as @vhoang indicates, change the logic to just extract the year from the date column:
YEAR([Week]) >= 2017 && YEAR([Week]) < 2019
Now you have a column that flags each row as meeting the criteria or not (year is 2017 or 2018).
You will then need to do something with that. The SSIS something is called a Conditional Split. I would add a new output path called OutOfConsideration, and its condition would be the inverse of the Derived Column date column above (which is true when the year meets our criteria):
![Derived Column date]
Now connect your destination, or additional processing steps, to the Conditional Split's default output path. If you need to do processing on the invalid data, that would be the OutOfConsideration path.
Finally, to get the best performance out of SSIS, only bring into it the rows you need. If the source data is in a system that supports filtering, filter the data there. It is easy to click click click design SSIS packages, but it is better long term for you to write custom queries that bring only the required columns and rows into the data flow. Less work all around, lower maintenance cost, etc.
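As a sketch of that source-side filtering (the table name and second column here are assumptions, not from your package):
SELECT [Week], [SomeOtherColumn]   -- only the columns the data flow actually needs
FROM dbo.SourceTable               -- hypothetical source table
WHERE [Week] >= '2017-01-01'
  AND [Week] < '2019-01-01';       -- same 2017-2018 range, filtered at the source
Filtering on the raw date range rather than YEAR([Week]) also lets the source database use an index on the date column.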
Have a Google Sheet that I'm trying to build. It uses IFTTT to pull articles people read into individual spreadsheets, and then into an aggregate spreadsheet.
In terms of specs, it needs to:
Pull in data from multiple sheets.
The first column in each source sheet is a date column. Some are formulas (to remove extraneous data from another date column), and some are hard-coded. This may differ sheet to sheet, but is constant per sheet.
Once imported into the aggregate sheet, I need to sort by date.
Problem
I'm a query/importrange newbie, and I'm currently stuck on the sorting by date.
Current URL
https://docs.google.com/spreadsheets/d/1GLGYvApJgRheg7rgzoB8rFyTUgkRpZ2O8eKVE4bZyo4/edit?usp=sharing
When I order by Col1, I can't honestly tell how it is sorting; the end result is:
March 7, 2017
February 15, 2007
February 28, 2017
March 7, 2017
March 8, 2017
November 9, 2010
If you inspect the cells, the first March 7, 2017 is situated where the formula resides, which does not seem to move no matter how I sort. If you look at the sort order without that cell, it seems to be sorting alphabetically.
So it comes down to two main questions:
-What am I doing wrong that makes the order by exclude the first row?
Edit: This is now fixed
-How do I get it to recognize that the contents of the sorting column are dates?
Thanks ahead of time -
J.
Your formula seems to have a few problems.
importrange should take a key, not a URL, but it seems to work anyway...
Pulled sheets have no header, so the 3rd parameter of query should be -1 or omitted, not 1.
If Col1 is a valid date, <> '' should not work; it should be is not null.
But it turns out that your pulled sheets' dates are not in yyyy-mm-dd format, so they weren't recognized as dates by query.
Thus, a more valid formula would be:
=query({importrange("...", "Sheet1!A:E");importrange("...", "Sheet1!A:E")},
"select * where Col1 is not null order by Col1 asc",
-1)
And you should format the dates (column A) on your pulled sheets as yyyy-mm-dd. Check my working sample aggregator and pulled sheets one and two.
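If you can't reformat the source columns, a hedged alternative is a helper column on each pulled sheet that coerces the text into real dates. This sketch assumes your locale can parse strings like "March 7, 2017", and it passes through values that are already dates:
=ARRAYFORMULA(IF(A2:A = "",, IFERROR(DATEVALUE(A2:A), A2:A)))
Then point the importrange at the helper column instead of column A.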
I have a huge CSV file (over 57,000 rows and 50 columns) that I need to analyze.
Edit: Hi guys, thanks for your answers and comments, but I am still really confused about how to do this in Ruby, and I have no idea how to use MySQL. I will try to be more specific:
The CSV files:
CSV on Storm Data Details for 2015
CSV on Storm Data Details for 2000
The questions:
Prior to question start, for all answers, exclude all rows that have a County/Parish, Zone, or Marine name that begins with the letters A, B, or C.
Find the month in 2015 where the State of Washington had the largest number of storm events. How many days of storm-free weather occurred in that month?
How many storms impacting trees happened between 8PM EST and 8AM EST in 2000?
In which year (2000 or 2015) did storms have a higher monetary impact within the boundaries of the 13 original colonies?
The problems:
1) I was able to use filters in Excel to determine that the most "Thunderstorm Wind" events in Washington happened in July (6 entries), and there were 27 days of storm-free weather. However, when I tried to check my work in Spotfire, I got completely different results. (7 entries in May, and 28 days of storm-free weather in May. Excel only found two Thunderstorm Wind events in May.) Do you know what could be causing this discrepancy?
2) There are two columns where damage to trees might be mentioned: Event_Narrative and Episode_Narrative. Would it be possible to search both columns for "tree" and filter the spreadsheet down to only those results? Multiple-column filtering is apparently impossible in Excel. I would also need to find a way to omit the word "street" in the results (because it contains the word "tree").
The method I came up with for the time range is to filter to only EST and AST results, then filter Begin_Time to 2000 to 2359 and 0 to 759 and repeat those ranges to filter End_Time. This appears to work.
3) I was able to filter the states to Delaware, Pennsylvania, New Jersey, Georgia, Connecticut, Massachusetts, Maryland, South Carolina, New Hampshire, Virginia, New York, North Carolina, and Rhode Island. It seems like a simple task to add all the values in Columns Y and Z (Damage_Property, Damage_Crops) and compare between the two years, but the values are written in the form "32.79K" and I cannot figure out how to make the adding equation work in that format or convert the values into integers.
Also, the question is asking for the original territory of the colonies, which is not the same as the territory those states now occupy. Do you know of a way to resolve this issue? Even if I had the time to look up each city listed, there does not seem to be a database of cities in the original 13 colonies online, and even if there was, the names of the cities may now be different.
I am learning Ruby and some people have suggested that I try to use the Ruby CSV library to put the data into an array. I have looked at some tutorials that sort of describe how to do that, but I still don't understand how I would filter the data down to only what I need.
Can anyone help?
Thank you!
I downloaded the data so I could play with it. You can get the record count pretty easily in Ruby. I just did it in irb:
require 'csv'

# Read every row (including the header row) into an array of arrays.
details = []
CSV.foreach("StormEvents_details-ftp_v1.0_d2015_c20160818.csv") do |row|
  details << row
end

# The last columns hold the event/episode narratives; either can be nil,
# and \btree\b matches "tree" as a whole word, so "street" won't match.
results = details.select do |field|
  [field[-2], field[-3]].any? { |el| el && el[/\btree\b/i] } && field[8] == "CALIFORNIA"
end
results.count
=> 125
I just used array indices. You could zip things together and make hashes for better readability.
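For the "32.79K" damage values in question 3, a small helper can convert them to numbers. This is a minimal sketch; it assumes the only suffixes are K, M, and B, and that Damage_Property and Damage_Crops are columns Y and Z (zero-based indices 24 and 25) as stated in the question:

# Convert strings like "32.79K" to floats: K = thousand, M = million, B = billion.
def parse_damage(value)
  return 0.0 if value.nil? || value.strip.empty?
  multipliers = { "K" => 1_000, "M" => 1_000_000, "B" => 1_000_000_000 }
  suffix = value[-1].upcase
  multipliers.key?(suffix) ? value[0..-2].to_f * multipliers[suffix] : value.to_f
end

parse_damage("32.79K")
=> 32790.0

# Total monetary impact across rows you've already filtered (drop the header row first):
total = details.drop(1).sum { |field| parse_damage(field[24]) + parse_damage(field[25]) }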
Wanted to post this as a comment but I don't have enough rep. Anyways:
I have converted CSV/XLS files to JSON in the past with the help of some Node.js packages and uploaded them to my Couchbase database. Within Couchbase I can query with N1QL (really just SQL), which will allow you to achieve your goal of filtering on multiple criteria. Like spickermann said, a database will solve your problem.
Edit:
MySQL also supports importing a CSV file into a MySQL table. That will be easier than going CSV to JSON to Couchbase.
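A minimal sketch of that import (the table name is hypothetical, and you'd need to CREATE TABLE with the file's ~50 columns first):
LOAD DATA LOCAL INFILE 'StormEvents_details-ftp_v1.0_d2015_c20160818.csv'
INTO TABLE storm_details
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;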
Csv-to-json
https://github.com/cparker15/csv-to-json/blob/master/README.md
I have an RDLC matrix report.
Generated in Visual Studio 2013.
In Asp.net MVC5.
I am creating the report for a database based on a range of dates returned in a recordset and bound at runtime to the report. Depending on the dates returned in the dataset, the columns in the report are dynamically created (matrix report).
My date range will always be a month's duration, so my report is very wide, and space is important because users want the entire month's date range to fit on one page.
This is how my report currently looks (sorry, I had to mask out data). As you can see, I can only fit up to 23 October before it moves onto another page. I need it all to fit on one page.
This is how it looks in the designer so you can see where the groupings are.
So in order to make it all fit on one page, I was hoping to move the employee row group column above the data rows (so it appears only once per instance of that employee) so it doesn't take up space in the row it currently occupies. This should allow me to gain a little space.
I have tried moving this column manually and re-creating the grouping as a group header, but then all employees appear in a list instead of at the head of each section. I have looked at similar questions such as this one, but it's not really what I need.
Any help would be appreciated.