OrientDB ETL for document DB - etl

I am able to import data into an OrientDB document database as plain documents. Can anyone help me, or share an ETL configuration file, to embed a list inside a document?

Related

NIFI - Processing HDFS Data

I am new to the tool. I have a requirement to read data from HDFS, filter some entries, transform some fields, and write the output to the local Unix file system. Could you let me know which components I can use?
I am using ListHDFS, FetchHDFS, and UpdateAttribute, but after that I am a little stuck on which component to use to convert the HDFS 'lzo' data and how to transform it.
Could you please help me with it?
Thanks,
Kumar

How to load a huge CSV file into OrientDB

I want to load a huge CSV file into my OrientDB database. There is a checklist of requirements the database should meet:
1- There will be a single CSV file with millions of records and more than 20 columns.
2- From this CSV I have to create multiple classes, and each class will have different properties (is this possible with OrientDB?).
3- I have to create indexes too.
Please help with this. How should I create the ETL config file for it?
Thanks in advance.
Please let me know if any input is required from my side.

how to save a text file to hive using table of context as schema

I have many project reports in text format (Word and PDF). These files contain data that I want to extract, such as references, keywords, and names mentioned.
I want to process these files with Apache Spark and save the result to Hive,
using the power of DataFrames (with the table of context as the schema). Is that possible?
Could you share any ideas about how to process these files?
As far as I understand, you will need to parse the files using Tika and manually create custom schemas as described here.
Let me know if this helps. Cheers.
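To make the "custom schema" idea a bit more concrete, here is a minimal sketch in plain Python. It assumes the text has already been extracted from the Word/PDF file (for example by Tika) and only shows the step of mapping free text onto a fixed set of fields. The ReportRecord fields, the "Keywords:" line, and the "[n]" reference format are all assumptions for illustration, not something taken from your files.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReportRecord:
    """Hypothetical fixed schema for one report; field names are illustrative."""
    title: str
    keywords: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)


def parse_report(text: str) -> ReportRecord:
    """Map already-extracted plain text onto the fixed schema above."""
    # Drop empty lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is the report title.
    title = lines[0] if lines else ""
    keywords: List[str] = []
    references: List[str] = []
    for ln in lines:
        # Assumption: keywords appear on a line like "Keywords: a, b, c".
        m = re.match(r"(?i)keywords\s*:\s*(.+)", ln)
        if m:
            keywords = [k.strip() for k in m.group(1).split(",") if k.strip()]
        # Assumption: references are lines starting with "[1] ", "[2] ", ...
        elif re.match(r"\[\d+\]\s+", ln):
            references.append(re.sub(r"^\[\d+\]\s+", "", ln))
    return ReportRecord(title, keywords, references)
```

Each record then has the same columns regardless of how the source document was laid out, which is what a DataFrame or a Hive table needs. The rows could be handed to Spark afterwards, but that part is not shown here.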

While creating table how to identify the data types in hive

I am learning to use Hadoop for performing Big Data related operations.
I need to perform some queries on a collection of data sets split across 8 csv files. Each file has multiple sheets, and the query concerns only one of the sheets (sheet name: Table4).
The dataset can be downloaded here: http://www.census.gov/hhes/www/hlthins/data/utilization/tables.html
A sample data snapshot is attached for quick reference.
I have already converted the above xls file to csv.
I am not sure how to group the data while creating the table in Hive.
It would be really helpful if you could guide me here.
Note: I am a novice with Hadoop and Big Data, so if anyone could guide me with how to proceed further I'd be very grateful.
If you need information on the queries or anything else let me know.
Thanks!

How to load multiple excel files into different tables based on xls metadata using SSIS?

I have multiple Excel files with two types of metadata. Now I have to push the data into two different tables, based on the metadata of the Excel files, using SSIS.
There are many, many different ways to do this. You'd need to share a lot more information on how your data is structured to really give a great answer, but here's the general strategy I'd suggest.
In the control flow tab, have a separate data flow for each Excel file. The data flows will all work the same, with the exception of having a different Excel source in each data flow, so it will be enough to get the first version working and then copy and paste for the other files.
In the data flow, use a conditional split transformation to read the metadata coming from Excel and send the row to the correct table.
If you really want to be fancy, however, you could create a child package that includes all your data flow logic. Using the Execute Package Task you can pass the Excel file name to the child package for each Excel file you need to import. This way you consolidate your logic in one package and can still import from multiple Excel files in parallel.
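SSIS packages are built in the designer rather than written as code, but the routing logic of the conditional split can be sketched in a few lines of Python to show what it does. This is only an illustration: the file_type column, the type_a/type_b values, and the table names are hypothetical, since the question does not say what the metadata looks like.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical mapping from a metadata value to a destination table name.
ROUTES: Dict[str, str] = {"type_a": "table_a", "type_b": "table_b"}


def conditional_split(rows: Iterable[dict], key: str = "file_type") -> Dict[str, List[dict]]:
    """Group rows by destination table, based on one metadata column."""
    out: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        # Rows whose metadata matches no route go to a default output,
        # similar to the default output of a conditional split.
        dest = ROUTES.get(row.get(key), "unmatched")
        out[dest].append(row)
    return dict(out)


rows = [
    {"file_type": "type_a", "value": 1},
    {"file_type": "type_b", "value": 2},
    {"file_type": "other", "value": 3},
]
split = conditional_split(rows)
```

Here split holds one list of rows per destination, which corresponds to the separate outputs you would wire to each table in the data flow.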
