In my database I have a timestamp field like this:
2016/03/23 14:00
2016/03/23 14:01
2016/03/23 14:03
And sometimes a value is missing (here, 2016/03/23 14:02 is missing).
What I want to do to fix this problem:
Store the first and the last value in two parameters (I know how to do this).
Build a data set containing all the values, using JavaScript and those two parameters; in this case:
2016/03/23 14:00
2016/03/23 14:01
2016/03/23 14:02
2016/03/23 14:03
Then I will be able to create a joint data set from those two data sets to fix my problem.
Is this possible in BIRT without using a POJO data source? I have never used that kind of data source and it seems a bit complicated for what I need to do...
Thank you and have a good day
What you are looking for is a "Scripted Data Source". If you create a Data Set based on a Scripted Data Source, you can write the fetch method yourself and generate the data rows from your input parameters.
Please search for a tutorial about scripted data sources on your own; it is a wide field.
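To give a rough idea (a minimal sketch only: the output column name ts and the report parameter names FirstTimestamp and LastTimestamp are assumptions, not names BIRT gives you), the open event of the scripted Data Set could read the two parameters and the fetch event could emit one row per minute:

// open event: remember where to start and where to stop (as milliseconds)
currentMillis = params["FirstTimestamp"].value.getTime();
lastMillis = params["LastTimestamp"].value.getTime();

// fetch event: emit one row per minute until the last value is reached
if (currentMillis > lastMillis) {
    return false;                                // no more rows
}
row["ts"] = new java.util.Date(currentMillis);   // "ts" is defined as an output column of the Data Set
currentMillis += 60 * 1000;                      // step forward one minute
return true;

You can then use a joint data set to combine this generated Data Set with your SQL Data Set, as you planned.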
I have the following table called "status" in the source:
and the following is the target table requirement:
My issue is that I cannot work out how to write the job to map it against a time dimension column. It's easy in Excel to simply divide the cell, but I am not able to do it with the ETL tool.
I am sure someone must have faced and resolved a similar requirement.
Please help.
Caveat: I have not worked with MySQL or SAP BODS; I am using SQL Server tools as a platform to explain the solution. I am not including any code, just the high-level steps I recommend.
I believe you should be able to tackle this problem by:
Build a date table (or date dimension, in DWH language). This table should have the dates and the associated financial years. Refer to the following location for info:
https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/
Select the data into a staging table and increment a counter column for each of the source columns (req in count, offer issued, etc.). The date dimension should have the months associated with each of the dates, so you should be able to map the counts.
You should then be able to pivot the data into your destination table.
Hope that helps.
Cheers
Nithin
I have an internal webpage that makes data from a 3rd-party Excel export file searchable and readable. The webpage allows uploading multiple Excel files, whose data gets read and stored in a MySQL database.
We want to update the application to keep a history of the uploaded data (it is data with monthly values) so we can easily search, filter and generate graphs from it.
So I am using Laravel 5.4 with maatwebsite\excel to import and parse the Excel file.
The Excel file always consists of the following columns (Dummy File)
| Item group | item # | item name | Item Currency | <month> <year> |
After Item Currency there are always 36 columns covering the past 3 years of data counted back from the current month, so a column would be named like dec 2017.
Now in Laravel I have created a model for the item, named Item, and a model for the monthly values, named ItemMonthly.
Now I am able to read the file and create columns dynamically in the database but I feel like this is very ugly and not efficient at all:
(Gist) Code for Models and Excel Function
Biggest problem
Because I need to read all the monthly data, and since I need it in month order, I can't really rename all the columns as far as I know. I need to be able to get all the columns to render in a Highcharts graph and in a DataTable, and some items don't have the same monthly data (some only go up to 2015, for example).
Needed advice
I've read a couple of solutions here, some of them saying that instead of creating columns in MySQL I should just store the monthly data as a JSON object in a single column.
Some answers simply advise switching from MySQL to MongoDB.
I am kind of at a loss to find the best approach for this, and am sincerely wondering if MySQL is the right way to go. The solutions I have been trying so far all seem to involve really hacky ways of doing this.
If there is more info needed please let me know. I don't want to write an immense wall of text but I also want to provide the correct amount of information.
Many thanks!
I have a huge CSV file (over 57,000 rows and 50 columns) that I need to analyze.
Edit: Hi guys, thanks for your answers and comments, but I am still really confused about how to do this in Ruby, and I have no idea how to use MySQL. I will try to be more specific:
The CSV files:
CSV on Storm Data Details for 2015
CSV on Storm Data Details for 2000
The questions:
Prior to question start, for all answers, exclude all rows that have a County/Parish, Zone, or Marine name that begins with the letters A, B, or C.
Find the month in 2015 where the State of Washington had the largest number of storm events. How many days of storm-free weather occurred in that month?
How many storms impacting trees happened between 8PM EST and 8AM EST in 2000?
In which year (2000 or 2015) did storms have a higher monetary impact within the boundaries of the 13 original colonies?
The problems:
1) I was able to use filters in Excel to determine that the most "Thunderstorm Wind" events in Washington happened in July (6 entries), and there were 27 days of storm-free weather. However, when I tried to check my work in Spotfire, I got completely different results. (7 entries in May, and 28 days of storm-free weather in May. Excel only found two Thunderstorm Wind events in May.) Do you know what could be causing this discrepancy?
2) There are two columns where damage to trees might be mentioned: Event_Narrative and Episode_Narrative. Would it be possible to search both columns for "tree" and filter the spreadsheet down to only those results? Multiple-column filtering is apparently impossible in Excel. I would also need to find a way to omit the word "street" in the results (because it contains the word "tree").
The method I came up with for the time range is to filter to only EST and AST results, then filter Begin_Time to 2000 to 2359 and 0 to 759 and repeat those ranges to filter End_Time. This appears to work.
3) I was able to filter the states to Delaware, Pennsylvania, New Jersey, Georgia, Connecticut, Massachusetts, Maryland, South Carolina, New Hampshire, Virginia, New York, North Carolina, and Rhode Island. It seems like a simple task to add all the values in Columns Y and Z (Damage_Property, Damage_Crops) and compare between the two years, but the values are written in the form "32.79K" and I cannot figure out how to make the adding equation work in that format or convert the values into integers.
Also, the question is asking for the original territory of the colonies, which is not the same as the territory those states now occupy. Do you know of a way to resolve this issue? Even if I had the time to look up each city listed, there does not seem to be a database of cities in the original 13 colonies online, and even if there was, the names of the cities may now be different.
I am learning Ruby and some people have suggested that I try to use the Ruby CSV library to put the data into an array. I have looked at some tutorials that sort of describe how to do that, but I still don't understand how I would filter the data down to only what I need.
Can anyone help?
Thank you!
I downloaded the data so I could play with it. You can get the record count pretty easily in Ruby. I just did it in irb:
require 'csv'
details = []
# read every row of the 2015 details file into an array of arrays
CSV.foreach("StormEvents_details-ftp_v1.0_d2015_c20160818.csv") do |row|
  details << row
end
# keep rows whose event/episode narratives (the last columns) mention "tree"
# as a whole word (so "street" does not match) and whose STATE column (index 8)
# is CALIFORNIA; to_s guards against empty narrative cells, which come back as nil
results = details.select do |field|
  [field[-2], field[-3]].any? { |el| el.to_s[/\btree\b/i] } && field[8] == "CALIFORNIA"
end
results.count
=> 125
I just used array indices. You could zip things together and make hashes for better readability.
Wanted to post this as a comment but I don't have enough rep. Anyways:
I have converted CSV/XLS files to JSON in the past with the help of some Node.js packages and uploaded them to my Couchbase database. Within Couchbase I can query with N1QL (essentially just SQL), which will allow you to achieve your goal of filtering on multiple criteria. Like spickermann said, a database will solve your problem.
Edit:
MySQL also supports importing a CSV file into a MySQL table. That will be easier than going CSV to JSON to Couchbase.
Csv-to-json
https://github.com/cparker15/csv-to-json/blob/master/README.md
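If you would rather not depend on a package, here is a minimal Node.js sketch using only core modules (it splits on commas naively, so it assumes no quoted fields containing commas; the output file name is just a placeholder):

const fs = require('fs');
const readline = require('readline');

// Read a CSV file line by line and turn each data row into an object
// keyed by the header row, then return all rows as one array.
async function csvToJson(path) {
  const rl = readline.createInterface({ input: fs.createReadStream(path) });
  let headers = null;
  const rows = [];
  for await (const line of rl) {
    const cells = line.split(',');                    // naive split, no quoted commas
    if (!headers) { headers = cells; continue; }      // first line is the header
    const obj = {};
    headers.forEach((h, i) => { obj[h] = cells[i]; });
    rows.push(obj);
  }
  return rows;
}

csvToJson('StormEvents_details-ftp_v1.0_d2015_c20160818.csv')
  .then(rows => fs.writeFileSync('storm_events.json', JSON.stringify(rows)));

From there you can import the JSON into Couchbase, or load the CSV straight into MySQL as mentioned above.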
I am attempting to parse this HTML table representing a year's worth of temperature data, provided by an Australian government website.
This table is set up in an unusual way: the columns are months and the rows are days of the month (so the first row's cells are JAN 1, FEB 1, MAR 1). Each cell contains a number if data was recorded for that day, is empty if no data was recorded, or has the cell class notDay if the day does not exist (e.g. Feb 31st).
My intent is to build a database full of this data in the format
DATE RAINFALL MAX TEMP
2015-02-07 35 31
2015-02-07 40 17
My question is: what would be the simplest or most efficient (in terms of programmer efficiency) way to parse the table to get the data into a usable format?
I'm personally using Ruby with the Nokogiri library, but general non-language-specific algorithm/approach advice is welcome if it makes for a better discussion. I'm not looking for someone to write the code and solve the problem for me, but for advice about the approach to take.
I wonder if you can:
Take all the cells in the order they appear:
Use Array#flatten if you've got an array-of-array situation.
Discard any notDay cells with Array#reject
Iterate over all the relevant dates using a date range:
(Date.new(2014,1,1) .. Date.new(2014,12,31)).each {...}
And go from there...?
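Since general advice was welcome, here is a rough browser-side JavaScript sketch of a variant of that idea: instead of zipping flattened cells against a date range, it derives the date from each cell's row and column position. The selector, the assumption that every cell is a plain td, and the fixed year are all guesses about the page's markup, so treat it only as an outline:

// Walk the table body row by row (days) and cell by cell (months),
// skipping notDay cells and empty cells, and build {date, value} records.
const records = [];
document.querySelectorAll('table tbody tr').forEach((tr, dayIndex) => {
  tr.querySelectorAll('td').forEach((td, monthIndex) => {
    if (td.classList.contains('notDay')) return;   // e.g. Feb 31st
    const text = td.textContent.trim();
    if (text === '') return;                       // no data recorded that day
    records.push({
      date: new Date(Date.UTC(2015, monthIndex, dayIndex + 1)),  // year assumed
      value: parseFloat(text)
    });
  });
});

The same row/column bookkeeping translates directly to Nokogiri if you prefer to stay in Ruby.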
I'm doing an ETL process with Pentaho (Spoon/Kettle) where I'd like to read an XML file and store element values in a DB.
This works just fine with the "Get data from XML" component... but the XML file is quite big, several gigabytes, and therefore reading the file takes too long.
Pentaho Wiki says:
The existing Get Data from XML step is easier to use but uses DOM parsers that need in-memory processing, and even the purging of parts of the file is not sufficient when these parts are very big.
The XML Input Stream (StAX) step uses a completely different approach to solve use cases with very big and complex data structures and the need for very fast data loads...
Therefore I'm now trying to do the same with StAX, but it just doesn't seem to work out as planned. I'm testing this with an XML file which only has one element group. The file is read and then mapped/inserted into the table... but now I get multiple rows in the table where all the values are "undefined", and some rows where I have the right values. In total I have 92 rows in the table, even though there should only be one row.
The flow goes like this:
1) read with StAX
2) Modified Java Script Value
3) Output to DB
At step 2) I'm doing the following:
var id;
// keep the value only when this row is character data at the path we want
if ( xml_data_type_description.equals("CHARACTERS") &&
     xml_path.equals("/labels/label/id") ) {
    id = xml_data_value;
}
...
I'm using positional-staz.zip from http://forums.pentaho.com/showthread.php?83480-XPath-in-Get-data-from-XML-tool&p=261230#post261230 as an example.
How do I use StAX to read an XML file and store the element values in the DB?
I've been trying to find examples but haven't found much. The example above uses a "Filter Rows" component before inserting the rows. I don't quite understand why it's being used; can't I just map the values I need? It might be that this problem occurs because I don't use, or don't know how to use, the Filter Rows component.
Cheers!
I posted a possible StAX-based solution on the forum listed above, but I'll post the gist of it here since it is awaiting moderator approval.
Using the StAX parser, you can select just those elements that you care about, namely those with a data type of CHARACTERS. For the forum example, you basically need to denormalize the rows in sets of 4 (EXPR, EXCH, DATE, ASK). To do this, you add the row number to the stream (using an Add Sequence step), then use a Calculator step to compute a "bucket number" = INT((rownum-1)/4), so rows 1-4 fall into bucket 0, rows 5-8 into bucket 1, and so on. That gives you a grouping field for a Row Denormaliser step.
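If you prefer, the same bucket number can be computed in a Modified Java Script Value step instead of the Calculator (a small sketch; rownum is assumed to be the field produced by the Add Sequence step, and bucket would be added as a new output field of that step):

// group every 4 consecutive CHARACTERS rows (EXPR, EXCH, DATE, ASK) together
var bucket = Math.floor((rownum - 1) / 4);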
When the post is approved, you'll see a link to a transformation that uses StAX and the method I describe above.
Is this what you're looking for? If not please let me know where I misunderstood and maybe I can help.