Datameer - Add Columns to Joined table - hadoop

I have joined some data from HDFS with some data from an Oracle DW, and the join works fine, but I can't add any new columns to the resulting sheet. To add columns for calculated fields and so on, I have to duplicate the sheet and do it there, which doesn't seem very efficient.
Am I doing something wrong here, or is it not possible to add columns to a join result sheet?

... but I can't seem to add any new columns to this sheet.
Right. It is not possible to add columns to a JoinedSheet. A JoinedSheet is a new data set containing columns from two or more sheets, joined on the key column you defined.
... or can you not add columns to a join result sheet?
You need to reference the joined data as the input for a new worksheet, using Duplicate Worksheet.

Another approach is to use the Datameer REST API: retrieve the workbook's content in JSON format, add the columns (manually or with a simple script), and then update the workbook with the modified JSON.
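A minimal sketch of the "edit the JSON" step, assuming the workbook JSON has already been fetched through the REST API. The structure used here (a top-level "sheets" list whose entries have "name" and "columns" keys, each column having "name" and "formula") is hypothetical, not Datameer's documented schema; inspect the JSON your instance actually returns and adapt the key names. Fetching and uploading are deliberately left out.

```python
import json


def add_column(workbook_json: str, sheet_name: str, column_name: str, formula: str) -> str:
    """Return workbook JSON with one calculated column appended to a sheet.

    HYPOTHETICAL schema: {"sheets": [{"name": ..., "columns": [{"name": ..., "formula": ...}]}]}.
    Adjust the key names to match the JSON your Datameer instance returns.
    """
    workbook = json.loads(workbook_json)
    for sheet in workbook["sheets"]:
        if sheet["name"] == sheet_name:
            # Append the new calculated column after the existing ones.
            sheet["columns"].append({"name": column_name, "formula": formula})
            return json.dumps(workbook, indent=2)
    raise KeyError(f"sheet not found: {sheet_name}")


if __name__ == "__main__":
    # Made-up workbook; the formula string is a placeholder, not real Datameer syntax.
    original = json.dumps(
        {"sheets": [{"name": "JoinedSheet", "columns": [{"name": "id", "formula": None}]}]}
    )
    print(add_column(original, "JoinedSheet", "total", "<your formula>"))
```

The function works on JSON text in and JSON text out, so it can sit between whatever GET and update calls your Datameer version provides.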

Related

Extract tables from pdf using anchor using Document Understanding in UiPath?

I am trying to extract tables from PDF files using UiPath's Document Understanding. I have to create a template and then use it for other, similar invoices. The problem is that the number of items in the table varies: some PDFs have a table with 4 items, while others have a table with only 1 item. If I create the template from a PDF whose table has 4 items, it works for that file, but when the same template is used on a file whose table has 1 item, the table data is not extracted properly. Is there any solution for this?
The solution should be able to extract tables from similar invoices with a varying number of items in the table. The format and layout of the invoices and tables are the same; the only thing that varies is the number of items in each table.
Thanks for your time and help!

Combining multiple sheets with different columns using Power Query

I have a workbook with multiple sheets that need to be combined, i.e. stacked, into one table. While they share many column names, they do not all have the same columns, and the column order differs. Because of this I cannot use the built-in merge functionality, since it relies on column order. Table.Combine would solve the problem, but I cannot figure out how to write a statement that uses the "each" mechanism to do it.
For each worksheet in x workbook
Table.Combine(prior sheet, next sheet)
return all sheets stacked.
Would someone please help?
If you load your workbook with Excel.Workbook, you can keep only the rows whose Kind is "Sheet" (rather than "Table" or "DefinedName") and ignore the sheet names.
let
    // Load every object in the workbook, then keep only the worksheets
    Source = Excel.Workbook(File.Contents("C:\Path\To\File\FileName.xlsx"), null, true),
    #"Filter Sheets" = Table.SelectRows(Source, each [Kind] = "Sheet"),
    // Use the first row of each sheet as that sheet's column headers
    #"Promote Headers" = Table.TransformColumns(#"Filter Sheets", {{"Data", each Table.PromoteHeaders(_, [PromoteAllScalars=true])}}),
    // Stack all sheets; Table.Combine matches columns by name, not by position
    #"Combine Sheets" = Table.Combine(#"Promote Headers"[Data])
in
    #"Combine Sheets"
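Table.Combine aligns columns by name, and a column that is missing from one table comes through as null for that table's rows. The Python sketch below only illustrates that stacking rule on plain lists of dicts; it is not Power Query, the sample sheets are made up, and ordering columns by first appearance is a choice made here for the illustration.

```python
from typing import Any


def combine_by_name(tables: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Stack tables (lists of row dicts), aligning columns by name.

    Columns are ordered by first appearance; a column absent from a table
    is filled with None for that table's rows.
    """
    columns: list[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [{name: row.get(name) for name in columns} for table in tables for row in table]


sheet1 = [{"Name": "A", "Qty": 1}]
sheet2 = [{"Qty": 2, "Region": "East", "Name": "B"}]
for row in combine_by_name([sheet1, sheet2]):
    print(row)
# {'Name': 'A', 'Qty': 1, 'Region': None}
# {'Name': 'B', 'Qty': 2, 'Region': 'East'}
```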
Alternatively:
1. Load each table into Power Query as a separate query.
2. Fix up the column names as needed in each individual query.
3. Save each query as a connection.
4. In one of the queries (or in a separate query), use the Append command to append all the fixed-up queries, which now have the same column names.
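The rename-then-append steps amount to mapping each source's column names onto one agreed set and then stacking the rows. A Python sketch of that idea, with hypothetical column names and mappings; in Power Query itself the renaming would be done per query (for example with Table.RenameColumns) and the stacking with Append.

```python
from typing import Any

Row = dict[str, Any]


def rename_columns(table: list[Row], mapping: dict[str, str]) -> list[Row]:
    """Return a copy of the table with columns renamed per mapping (others kept)."""
    return [{mapping.get(name, name): value for name, value in row.items()} for row in table]


def append(tables: list[list[Row]], columns: list[str]) -> list[Row]:
    """Stack tables, checking that every row has exactly the agreed column names."""
    stacked: list[Row] = []
    for table in tables:
        for row in table:
            if set(row) != set(columns):
                raise ValueError(f"unexpected columns: {sorted(row)}")
            # Emit the columns in the agreed order.
            stacked.append({name: row[name] for name in columns})
    return stacked


# Hypothetical sheets whose headers differ in spelling and order.
north = [{"Cust": "A", "Amount": 10}]
south = [{"amount": 5, "Customer": "B"}]

fixed_north = rename_columns(north, {"Cust": "Customer"})
fixed_south = rename_columns(south, {"amount": "Amount"})
print(append([fixed_north, fixed_south], ["Customer", "Amount"]))
# [{'Customer': 'A', 'Amount': 10}, {'Customer': 'B', 'Amount': 5}]
```

Checking the column set before appending mirrors why the fix-up step matters: appending tables whose names still differ would silently produce extra, half-empty columns.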

Merged DataTable Doesn't Copy to New Form

I have two DataTables (dt1 and dt4) and I want to merge them into another one (dtFinal). How can I do this in VB.NET?
I tried the Merge method on the DataTable, but the problem is that it gets the columns but not the values from the table.
dt1 is a DataTable filled manually.
dt4 is a DataTable loaded from an Excel file.
I need to merge them into the new form and then sum the two qty columns
to get the result shown in the attached image.
Code used to merge the two tables:
Dim dtFinal As New DataTable
dtFinal = Frm_DiffLive.dt4.Copy
dtFinal.Merge(Frm_DiffLive.dt)
dtFinal.Merge(Frm_DiffLive.dt4)
Me.dgvFinal.DataSource = dtFinal
1. It loads all the columns from both DataTables, but only the contents of the manually filled DataTable appear, not those of the one populated from the Excel file. Do I need to save it to memory first, or something like that?
2. How can I make one column of the new dtFinal hold the sum of two columns for each row?
I tried many variations of the merge code, but since the columns appear correctly, I think it is not a merge problem.
The best solution was to save the two tables to the database first,
then select them again and do the whole required process.
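A minimal sketch of the save-then-reselect approach, using an in-memory SQLite database from Python purely for illustration (the answer does not say which database was used). The table and column names (manual, excel, item, qty) and the assumption that rows are matched on an item key are hypothetical. The SELECT joins the two tables on that key and also computes the per-row sum asked about in question 2.

```python
import sqlite3


def merged_totals(manual_rows, excel_rows):
    """Store both tables in SQLite, then select them back joined on item with a per-row sum."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE manual (item TEXT PRIMARY KEY, qty INTEGER)")
        conn.execute("CREATE TABLE excel (item TEXT PRIMARY KEY, qty INTEGER)")
        conn.executemany("INSERT INTO manual VALUES (?, ?)", manual_rows)
        conn.executemany("INSERT INTO excel VALUES (?, ?)", excel_rows)
        # LEFT JOIN keeps every manual row; a missing Excel quantity counts as 0.
        return conn.execute(
            """
            SELECT m.item, m.qty, COALESCE(e.qty, 0), m.qty + COALESCE(e.qty, 0)
            FROM manual AS m
            LEFT JOIN excel AS e ON e.item = m.item
            ORDER BY m.item
            """
        ).fetchall()
    finally:
        conn.close()


print(merged_totals([("A", 3), ("B", 1)], [("A", 2)]))
# [('A', 3, 2, 5), ('B', 1, 0, 1)]
```

Items that exist only in the Excel table are dropped by this LEFT JOIN; swap the join direction or add a UNION if those rows are needed too.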

datatables + adding a filter per column

How do I get the search/filter to work per column?
I am building this in small steps and constantly adding to my data table, which is fairly dynamic at this stage: it builds a DataTable from the data fed into it. I have now added a footer to act as a per-column search/filter, but this is where I am stuck: I cannot get the filter part to work. Advice greatly appreciated.
Here is the sample DataTables code I am working on: http://live.datatables.net/qociwesi/2/edit
It has a dTableControl object that builds my table.
To build the table I call loadDataFromSocket, which does the following:
//then I have this function for loading my data and creating my tables
//file is an array of objects
//formatFunc is a function that formats the data in the data table, and is stored in options for passing to the dTableControl for formatting the datatable - not using this in this example
//ch gets the keys from file[0] which will be the channel headers
//then I add the headers
//then I add the footers
//then I create the table
//then I build the rows using the correct values from file
//then I draw, which draws all the rows that were built
//now the tricky part: applying the search to each column
So I have got this far, but the per-column search is not working. How do I get the search/filter to work per column?
Note: this is a very basic working example that I have been working from: http://jsfiddle.net/HattrickNZ/t12w3a65/
You should use t1.oTable to access the DataTables API; see the updated example for a demonstration.
Please compare your code with the jsFiddle in your question, notice its simplicity, and consider rewriting your code.
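Per-column filtering means each footer input narrows the rows using only its own column, and all active filters must match at the same time. The Python sketch below illustrates that rule with a case-insensitive substring match on made-up rows; it is a language-neutral illustration, not the DataTables implementation, whose own column search (reached through the table's API object) has its own matching rules.

```python
from typing import Any


def filter_rows(rows: list[dict[str, Any]], searches: dict[str, str]) -> list[dict[str, Any]]:
    """Keep rows where every non-empty column search term appears in that column.

    Matching is a case-insensitive substring test on the cell's string form.
    """
    active = {col: term.lower() for col, term in searches.items() if term}
    return [
        row for row in rows
        if all(term in str(row.get(col, "")).lower() for col, term in active.items())
    ]


rows = [
    {"channel": "ch1", "value": 10},
    {"channel": "ch2", "value": 20},
    {"channel": "CH1", "value": 30},
]
print(filter_rows(rows, {"channel": "ch1", "value": ""}))
# [{'channel': 'ch1', 'value': 10}, {'channel': 'CH1', 'value': 30}]
print(filter_rows(rows, {"channel": "ch1", "value": "3"}))
# [{'channel': 'CH1', 'value': 30}]
```

An empty footer input is treated as "no filter" for that column, which is usually what a blank search box should mean.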

Hive: How to have a derived column that stores the sentiment value from a sentiment analysis API

Here's the scenario:
Say you have a Hive Table that stores twitter data.
Say it has 5 columns. One column being the Text Data.
Now, how do you add a 6th column that stores the sentiment value from a sentiment analysis of the tweet text? I plan to use a sentiment analysis API such as Sentiment140 or viralheat.
I would appreciate any tips on how to implement the "derived" column in Hive.
Thanks.
Unfortunately, while Hive lets you add a new column to your table (using ALTER TABLE foo ADD COLUMNS (bar binary)), the new column will be NULL for the existing rows and cannot be populated in place. The only way to add data to it is to clear the table's rows and load data from a new file that contains the new column's data.
To answer your question: you can't, in Hive. To do what you propose, you would need a file with 6 columns, the 6th already containing the sentiment analysis data. This file could then be loaded into HDFS and queried using Hive.
EDIT: I just tried an example where I exported the table as a .csv after adding the new column (as above) and opened it in Microsoft Excel, where I was able to apply functions to the table values. After adding the functions, I saved and uploaded the .csv and rebuilt the table from it. I'm not sure this helps you specifically (sentiment analysis is unlikely to be doable in Excel), but it may be useful to anyone else who just wants computed columns in Hive.
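A minimal sketch of preparing such a 6-column file outside Hive: read the 5-column export, append a sentiment value to each row, and write the result for loading back into HDFS. score_sentiment is a hypothetical stand-in, not a Sentiment140 or viralheat client; the real API call would go there. Assuming the tweet text is the fifth column, and that the table is defined as comma-delimited, are also illustrative choices.

```python
import csv
import io
from typing import Callable, TextIO


def score_sentiment(text: str) -> str:
    """HYPOTHETICAL placeholder: replace with a call to your sentiment API."""
    lowered = text.lower()
    if "love" in lowered:
        return "positive"
    if "hate" in lowered:
        return "negative"
    return "neutral"


def add_sentiment_column(src: TextIO, dst: TextIO, text_index: int = 4,
                         scorer: Callable[[str], str] = score_sentiment) -> None:
    """Copy CSV rows from src to dst, appending the sentiment of column text_index."""
    # "\n" line endings, so no stray "\r" ends up in the last Hive column.
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row + [scorer(row[text_index])])


src = io.StringIO("1,alice,2013-01-01,en,I love this\n2,bob,2013-01-02,en,meh\n")
dst = io.StringIO()
add_sentiment_column(src, dst)
print(dst.getvalue(), end="")
# 1,alice,2013-01-01,en,I love this,positive
# 2,bob,2013-01-02,en,meh,neutral
```

The csv module handles quoting, so tweet text containing commas survives the round trip; the Hive table's field delimiter must match whatever the file uses.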
References:
https://cwiki.apache.org/Hive/gettingstarted.html#GettingStarted-DDLOperations
http://comments.gmane.org/gmane.comp.java.hadoop.hive.user/6665
You can do this in two steps without a separate table:
1. Alter the original table to add the required column.
2. Run an INSERT OVERWRITE TABLE ... SELECT of all the original columns plus your computed column, reading from the original table and writing back into it.
Caveat: this has not been tested on a clustered installation.
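A sketch that generates the two statements from Python, so the column list does not have to be typed by hand. The table name (tweets), its columns, and the UDF sentiment_udf are hypothetical; such a UDF would have to be registered in Hive beforehand (for example with CREATE TEMPORARY FUNCTION). The sketch also assumes a non-partitioned table, since a partitioned one needs a PARTITION clause on the INSERT.

```python
def derived_column_statements(table: str, columns: list[str], new_column: str,
                              new_type: str, expression: str) -> list[str]:
    """Build HiveQL that adds a column and fills it by rewriting the table in place."""
    alter = f"ALTER TABLE {table} ADD COLUMNS ({new_column} {new_type})"
    # List the original columns explicitly, then the computed value for the new one.
    select_list = ", ".join(columns + [expression])
    overwrite = f"INSERT OVERWRITE TABLE {table} SELECT {select_list} FROM {table}"
    return [alter, overwrite]


for stmt in derived_column_statements(
    "tweets", ["id", "user_name", "created_at", "lang", "text"],
    "sentiment", "STRING", "sentiment_udf(text)",
):
    print(stmt + ";")
# ALTER TABLE tweets ADD COLUMNS (sentiment STRING);
# INSERT OVERWRITE TABLE tweets SELECT id, user_name, created_at, lang, text, sentiment_udf(text) FROM tweets;
```

Listing the original columns explicitly, rather than using SELECT *, keeps the freshly added (still NULL) column out of the select list so the row width matches the new schema.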
