We need a CSV viewer that can handle 10-15 million rows in a Windows environment, where each column has some filtering capability (regex or plain text search is fine).
I strongly suggest using a database instead, and running queries (eg, with Access). With proper SQL queries you should be able to filter on the columns you need to see, without handling such huge files all at once. You may need to have someone write a script to input each row of the csv file (and future csv file changes) into the database.
I don't want to be the end user of that app. Store the data in SQL. Surely you can define criteria to query on before generating a .csv file. Give the user an online interface with the column headers and filters to apply. Then generate a query based on the selected filters, providing the user only with the lines they need.
This will save many people time, headaches, and sore eyes.
We had this same issue and used a 'report builder' to build the criteria for the reports prior to actually generating the downloadable csv/Excel file.
As others have suggested, I would also choose an SQL database. It's already optimized to run queries over large data sets. There are a couple of embedded databases, such as SQLite or Firebird (embedded).
http://www.sqlite.org/
http://www.firebirdsql.org/manual/ufb-cs-embedded.html
You can easily import a CSV file into an SQL database with just a few lines of code and then build an SQL query, instead of writing your own solution to filter large tabular data.
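To make that concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name `people`, the sample columns, and the `load_csv` helper are made up for illustration. SQLite has no built-in REGEXP implementation, so the sketch registers one backed by Python's `re` module.

```python
import csv
import io
import re
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and insert the remaining rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE {} ({})".format(table, cols))
    # executemany pulls rows from the reader one at a time,
    # so the whole file is never held in memory at once
    conn.executemany("INSERT INTO {} VALUES ({})".format(table, marks), reader)
    conn.commit()


def regexp(pattern, value):
    """Backs the SQL expression `value REGEXP pattern`."""
    if value is None:
        return 0
    return 1 if re.search(pattern, str(value)) else 0


conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.create_function("REGEXP", 2, regexp)

# Hypothetical sample data standing in for the large CSV file
sample = io.StringIO("id,name,city\n1,Alice,Oslo\n2,Bob,Berlin\n3,Carol,Bergen\n")
load_csv(conn, "people", sample)

# Per-column regex filter, expressed as an ordinary query
rows = conn.execute(
    "SELECT name FROM people WHERE city REGEXP ? ORDER BY name", ("^Ber",)
).fetchall()
print(rows)
```

For 10-15 million rows you would point `sqlite3.connect` at a file on disk and probably add indexes on the columns you filter most often; a regex filter still scans the table, but a plain equality or prefix filter can use the index.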
I am looking for a way to extract power queries metadata from power query editor to spreadsheet or word for documentation purposes to understand the transformations or formulas applied in each query present in power query editor.
I have read different comments in other sites including renaming .XLSX to .ZIP and inside xl\connections.xml there's a Microsoft.Mashup.OleDb.1 data connection with some metadata but I am not successful in extracting the queries metadata. I am looking for any automated process to extract power queries transformation data into spreadsheet outside of power query. Any suggestions or ideas will be great help for me.
You can access the code underlying any Power Query in Excel through the Queries object that is part of the workbook. It's in "Formula" property of the Query object. You can also get the name of the Query with the "Name" property. It just gives you the code as plain text, so it would be up to you to apply any context to that.
Dim i As Long
For i = 1 To ThisWorkbook.Queries.Count
    Debug.Print ThisWorkbook.Queries(i).Name     ' name of the query
    Debug.Print ThisWorkbook.Queries(i).Formula  ' its M code as plain text
Next i
Note this only works in Excel 2016 or later. Older versions of Excel where PQ is installed as an add-in can't access PQ through VBA. I'm also unaware of any method to extract information on the dependencies between Queries within a workbook (though with consistent naming conventions you could pretty easily build this yourself I figure).
I have a financial system with all its business logic located in the database, and I have to code an automated workflow for batch processing of transactions, which consists of the steps listed below:
1. A user or an external system inserts some data into a table.
2. Before further processing, a snapshot of this data has to be made in the form of a CSV file with a digital signature. The CSV snapshot itself and its signature have to be saved in the same input table. The program updates successfully signed rows to make them available for the further steps of the code.
3. ...further steps of code
The obvious trouble is step 2: I don't know how to assign the results of a query, as a BLOB that represents a CSV file, to a variable. It seems like basic stuff, but I couldn't find it. The CSV format was chosen by the users because it is human-readable. The signing itself can be done with a request to an external system, so that's not an issue.
Restrictions:
there is no application server that could process the data, so I have to do it in PL/SQL
there is no way to save a local file, everything must be done on the fly
I know that normally one would do all the work on the application layer or with some local files, but unfortunately this is not the case.
Any help would be highly appreciated, thanks in advance
I agree with @william-robertson. You just need to build a comma-delimited string (presumably a header row plus the data rows) and write it to a CLOB. I recommend an "insert" trigger. There are lots of SQL tricks that make this easier. How that CSV string is used will need to be owned by the part of the application that reads it in and has to do something with it.
I understand you stated that you need to create a CSV, but see whether you could use XML instead. Then you could use DBMS_XMLGEN to generate the snapshot directly from the query into a database column.
I do not accept the idea that a CSV is human-readable (try reading one as straight text sometime). What is valid is that Excel displays it in human-readable form. But it should also be able to display the XML in human-readable form. Further, if needed, the data in it can be loaded directly back into the original columns.
Just an alternative idea.
I'm importing data from a text file using Bulk Insert in the script component in SSIS package.
Package Ran successfully and data imported into SQL
Now how do I validate the completeness of the data?
1. I can get the row count from source and destination and compare,
but my manager wants to know how we can verify that all the data has come across as it is, without any issues.
If we were comparing two tables, we could probably join them on all fields and see whether anything is missing.
I'm not sure how to compare a text file and a SQL table.
One way would be to write code that reads the file line by line, queries the database for that record, and compares each and every field. We have millions of records, so this is not going to be a simple task.
Is there any other way to validate all of the data?
Any suggestions would be much appreciated
Thanks
Ned
Well, you could take the same file and do a Lookup against the SQL table, and if any of the columns don't match, send that row to a Row Count.
Here's a generic example of how you can do this.
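Outside of SSIS, the same file-versus-table check can be sketched in Python. This is only an illustration, not the Lookup approach itself: it hashes every record on both sides and compares the two multisets of digests, so it catches changed, missing, and duplicated rows without one query per record. SQLite stands in for SQL Server here, and the table `target` and its columns are hypothetical. In a real run you would have to format numeric and date columns in the query exactly as they appear in the file, otherwise identical values will hash differently.

```python
import csv
import hashlib
import io
import sqlite3
from collections import Counter


def row_digest(fields):
    """Hash one record; the unit separator keeps ('ab', 'c') distinct from ('a', 'bc')."""
    joined = "\x1f".join("" if f is None else str(f) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare(file_rows, table_rows):
    """Return (digests only in the file, digests only in the table) as multisets."""
    in_file = Counter(row_digest(r) for r in file_rows)
    in_table = Counter(row_digest(r) for r in table_rows)
    return in_file - in_table, in_table - in_file


# Hypothetical source file
source = io.StringIO("1,Ann,10.50\n2,Bob,7.25\n3,Cy,0.00\n")

# Hypothetical destination table; row 2 was deliberately loaded wrong
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id TEXT, name TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO target VALUES (?, ?, ?)",
    [("1", "Ann", "10.50"), ("2", "Bob", "7.52"), ("3", "Cy", "0.00")],
)

missing, unexpected = compare(
    csv.reader(source),
    conn.execute("SELECT id, name, amount FROM target"),
)
print(sum(missing.values()), "file rows not found in the table")
print(sum(unexpected.values()), "table rows not found in the file")
```

Both counts being zero means every record in the file has an identical counterpart in the table and vice versa. Memory use is one digest per distinct row, which is usually manageable for a few million records.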
How can I retrieve data (using sql) from Excel to a table in Oracle database. I am using dbsaint.
Instead of DBSAINT, which developer tool should I use for this purpose?
The easiest way to do this is to export the data from Excel into a CSV file. Then use an external table to insert the data into your database table.
Exporting the CSV file can be as simple as "Save as ...". But watch out if your data contains commas. In that case you will need to ensure that the fields are delimited safely and/or that the separator is some other character which doesn't appear in your data: a set of characters like |~| (pipe tilde pipe) would work.
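If the file is produced by a script rather than by "Save as ...", both options look roughly like this in Python. This is just a sketch with made-up sample rows: option 1 keeps the comma but quotes every field, and option 2 joins the fields with the |~| separator by hand, because Python's csv module only accepts single-character delimiters.

```python
import csv
import io

# Hypothetical rows; the second one contains a comma and double quotes
rows = [["id", "name", "note"], ["1", "Smith, John", 'said "hi"']]

# Option 1: keep the comma, but enclose every field in double quotes
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
quoted = buf.getvalue()
print(quoted)

# Reading it back recovers the original fields, embedded comma included
assert list(csv.reader(io.StringIO(quoted))) == rows

# Option 2: a multi-character separator that does not occur in the data
SEP = "|~|"
lines = [SEP.join(row) for row in rows]
print("\n".join(lines))
assert [line.split(SEP) for line in lines] == rows
```

Whichever option you pick, the loader on the database side has to be told about the same enclosure character or separator.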
External tables were introduced in Oracle 9i. They are just like normal heap tables except their data is held in external OS files rather than inside the database. They are created using DDL statements and we can run SELECTs against them (they are read only).
Some additional DB infrastructure is required - the CSV files need to reside in an OS directory which is defined as an Oracle dictionary object. However, if this is a task you're going to be doing on a regular basis then the effort is very worthwhile.
I don't know much about DbSaint; it's some kind of database IDE like TOAD or SQL Developer but focused at the cheap'n'cheerful end of the market. It probably doesn't support this exact activity, especially exporting to CSV from Excel.
Often when I am working on a project, I find myself looking at the database schema and having to export the data to work with the new schema.
Many times the data stored in the old database was fairly crude. What I mean by that is that it's stored with lots of unfiltered characters. I find myself writing custom PHP scripts to filter this information and create a nice clean UTF-8 CSV file that I then reimport into my new database.
I'd like to know if there are better ways to handle this?
I would suggest using an ETL tool, or at least following ETL practices when moving data. Considering that you are already cleaning, you may follow the whole ECCD path -- extract, clean, conform, deliver. If you do your own cleaning, consider saving intermediate CSV files for debugging and audit purposes.
1. Extract (as is, junk included) to file_1
2. Clean file_1 --> file_2
3. Conform file_2 --> file_3
4. Deliver file_3 --> DB tables
If you archive files 1-3 and document versions of your scripts, you will be able to backtrack in case of a bug.
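The four steps above could be sketched like this in Python. Treat it as an outline rather than a finished tool: the cleaning rule (drop control characters, trim whitespace), the conforming rule (lower-case the email), and the `contacts` table are invented for the example, and SQLite stands in for the target database.

```python
import csv
import sqlite3
import tempfile
import unicodedata
from pathlib import Path


def extract(raw_rows, out_path):
    """Step 1: dump the source rows as-is, junk included."""
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(raw_rows)


def clean(in_path, out_path):
    """Step 2: drop control characters and surrounding whitespace."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([
                "".join(ch for ch in field
                        if unicodedata.category(ch) != "Cc").strip()
                for field in row
            ])


def conform(in_path, out_path):
    """Step 3: map values onto the target schema's conventions."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for name, email in csv.reader(src):
            writer.writerow([name, email.lower()])


def deliver(in_path, conn):
    """Step 4: load the conformed file into the target table."""
    with open(in_path, newline="", encoding="utf-8") as src:
        conn.executemany("INSERT INTO contacts VALUES (?, ?)", csv.reader(src))
    conn.commit()


# Hypothetical crude source data
raw = [["  Ann\x07 ", "ANN@Example.com"], ["Bob\t", " bob@example.com "]]

work = Path(tempfile.mkdtemp())  # archive this directory for audits
f1, f2, f3 = (work / "file_{}.csv".format(n) for n in (1, 2, 3))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT)")

extract(raw, f1)
clean(f1, f2)
conform(f2, f3)
deliver(f3, conn)

loaded = conn.execute("SELECT name, email FROM contacts ORDER BY name").fetchall()
print(loaded)
```

Because each stage reads one file and writes the next, you can rerun a single stage after fixing a bug and diff its output against the archived version.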
ETL tools -- like Microsoft SSIS, Oracle Data Integrator, Pentaho Data Integrator -- connect to various data sources and offer plenty of transformation and profiling tasks.
There is no single answer to this one, but I once needed to quickly migrate a database and ended up using sqlautocode, a tool that autogenerates a (Python ORM) model from an existing database. The model uses the great SQLAlchemy ORM library. It even generates some sample code to get started ... (see below)
Amazingly, it worked out of the box. You do not get a full migration, but you do get an easy way to programmatically access all your tables (in Python).
I didn't do it on that project, but you could of course autogenerate your ORM layer for the target DB as well, then write a script that transfers the right rows over into the desired structure.
Once you get your DB content into Python, you will be able to deal with u'unicode', even if it takes a few attempts, depending on the actual crudeness ...
Example code:
# some example usage
# (`metadata` and `customers` are assumed to be defined earlier in the generated model file)
if __name__ == '__main__':
    db = create_engine(u'mysql://username:password@localhost/dbname')
    metadata.bind = db
    # fetch the first 10 rows from the customers table
    s = customers.select().limit(10)
    rs = s.execute()
    for row in rs:
        print(row)
You can consider Logstash.
logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching)
Logstash treats every single event/log as a pipeline: input | filter | output.
Logstash has many input plugins to accept different sources/formats, and you can use filters to parse your source data and then send it to whichever outputs/formats you need.