I want to filter Tableau data on the basis of folders. For example, I have 3 folders and want the data of only one folder. Is this possible in Tableau? So far I have not been able to locate an API or a way to do this in GraphQL. Any lead would be highly appreciated.
I have a requirement where daily I receive different types of files, such as Excel, CSV, Avro, JSON, etc.
I need to fetch the list of file names, like
tablea.xls
tablea.csv etc.
I need to convert all the files from their different formats to CSV.
This needs to be done using ADF.
Thanks,
Use the Get Metadata activity to list files and the Copy activity to convert the format. Copy can change formats but cannot do much in the way of transformation. Specify the format you want in the Sink section of the Copy configuration. Try some things out, work through some tutorials, and come back if you hit specific errors.
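A minimal pipeline sketch along those lines might look like the following. This is only a sketch: the dataset names (SourceFolderDataset, SourceFileDataset, CsvSinkDataset) are placeholders, and in practice you would normally wrap the Copy in a ForEach over the Get Metadata childItems output so each file is converted in turn.

```json
{
  "name": "ConvertToCsvPipeline",
  "properties": {
    "activities": [
      {
        "name": "ListFiles",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "SourceFolderDataset", "type": "DatasetReference" },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "CopyToCsv",
        "type": "Copy",
        "dependsOn": [ { "activity": "ListFiles", "dependencyConditions": [ "Succeeded" ] } ],
        "inputs": [ { "referenceName": "SourceFileDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "CsvSinkDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "ExcelSource" },
          "sink": { "type": "DelimitedTextSink" }
        }
      }
    ]
  }
}
```

The source type would change per input format (for example DelimitedTextSource for CSV or JsonSource for JSON), while the sink stays a delimited-text (CSV) dataset.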
I am new to large-scale data analytics and archiving, so I thought I would ask this question to see if I am looking at things the right way.
Current requirement:
I have a large number of static files in the filesystem: CSV, EML, TXT, JSON.
I need to warehouse this data for archiving / legal reasons.
I need to provide a unified search facility (the MAIN functionality).
Future requirement:
I need to enrich the data files with additional metadata.
I need to do analytics on the data.
I might need to ingest data from other sources, e.g. from APIs.
I would like to come up with a relatively simple solution with the possibility that I can expand it later with additional parts without having to rewrite bits. Ideally I would like to keep each part as a simple service.
As search is currently the KEY requirement and I am experienced with Elasticsearch, I thought I would use ES for distributed search.
I have the following questions:
Should I copy the files from static storage to Hadoop?
Is there any virtue in keeping the data in HBase instead of in individual files?
Is there a way that, once a file is added to Hadoop, I can trigger an event to index the file into Elasticsearch?
Is there perhaps a simpler way to monitor hundreds of folders for new files and push them to Elasticsearch? (A small sketch of this idea appears below.)
I am sure I am overcomplicating this, as I am new to this field. Hence I would appreciate some ideas / directions I should explore to do something simple but future-proof.
Thanks for looking!
Regards,
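On the last question above (monitoring folders and pushing new files into Elasticsearch), one lightweight option is a small standalone service built on Java's WatchService plus the Elasticsearch high-level REST client. The sketch below is only an illustration under assumptions: the index name "archive", the watched path, and the document shape are placeholders, only one folder is registered (you would register each folder you care about), and error handling and bulk indexing are left out.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.nio.file.*;
import java.util.Map;

public class FolderIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical watched folder; register one WatchKey per folder you need to monitor.
        Path watched = Paths.get("/data/incoming");

        try (RestHighLevelClient es = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));
             WatchService watcher = FileSystems.getDefault().newWatchService()) {

            watched.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

            while (true) {
                WatchKey key = watcher.take();                  // blocks until something new arrives
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path created = watched.resolve((Path) event.context());
                    // Index the raw file content plus a little metadata into a hypothetical "archive" index.
                    Map<String, Object> doc = Map.of(
                            "path", created.toString(),
                            "content", Files.readString(created));
                    es.index(new IndexRequest("archive").source(doc), RequestOptions.DEFAULT);
                }
                key.reset();                                    // re-arm the key for further events
            }
        }
    }
}
```

Whether the files also need to land in HDFS/HBase first is a separate decision; a watcher like this could just as easily push to both.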
I am new to Pig scripting. I am unable to fetch the data which I have stored into HBase through a Pig script using HBaseStorage(). But when I try to fetch the data using Phoenix, I am able to see the data.
Can anyone help me out?
The problem is solved, guys. The timestamp always changes for a column in HBase; once I used the correct stored time when searching, I was able to fetch the records.
Previously I was using the initial row-creation timestamp for searching.
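For anyone hitting the same issue, the key point is that HBase lookups can be bounded by cell timestamps, and the cell timestamp is the write time, not the row's original creation time. A minimal sketch with the HBase Java client, assuming a hypothetical table 'my_table', row key 'row1', and a known write timestamp:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampedGet {
    public static void main(String[] args) throws Exception {
        long writeTs = Long.parseLong(args[0]);   // the timestamp the cells were actually written with

        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("my_table"))) {

            Get get = new Get(Bytes.toBytes("row1"));
            // Bound the lookup to the window containing the real write timestamp,
            // not the row's initial creation time.
            get.setTimeRange(writeTs, writeTs + 1);

            Result result = table.get(get);
            System.out.println(result);
        }
    }
}
```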
I have live streaming tweets which I need to store in HDFS. Currently I can access the live tweets and extract information from them. My requirement is that I need to append all the tweets into a single sequence file in HDFS. I have thought of two ways to resolve this: either I store each tweet as a small file in HDFS and periodically bundle them into a single sequence file (a sketch of this appears below), or at run time I read the sequence file and append the new content to it.
Please let me know which approach I should go for. Kindly also suggest if there is any better solution for handling these types of use cases.
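For the first approach (periodically bundling the small per-tweet files into one sequence file), a rough sketch with the Hadoop Java API could look like the following. The paths /tweets/incoming and /tweets/bundle.seq are made-up examples, and cleaning up the small files after bundling is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class TweetBundler {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path incoming = new Path("/tweets/incoming");   // small one-tweet files land here
        Path bundle = new Path("/tweets/bundle.seq");   // the consolidated sequence file

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(bundle),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class))) {

            for (FileStatus status : fs.listStatus(incoming)) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                try (InputStream in = fs.open(status.getPath())) {
                    IOUtils.copyBytes(in, content, 4096, false);
                }
                // key = original file name, value = the raw tweet text/JSON
                writer.append(new Text(status.getPath().getName()),
                              new Text(content.toByteArray()));
            }
        }
    }
}
```

If the bundle has to grow across runs rather than being rewritten each time, newer Hadoop releases also offer the SequenceFile.Writer.appendIfExists option, but whether that is available depends on the cluster version.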
I recommend using Flume.
You can see how Tweets are streamed into HDFS in this example:
https://github.com/cloudera/cdh-twitter-example
How to work on a specific part of a CSV file uploaded into HDFS?
I'm new to Hadoop and I have a question: if I export a relational database into CSV files and then upload them into HDFS, how do I work on a specific part (table) of the file using MapReduce?
Thanks in advance.
I assume that the RDBMS tables are exported to individual CSV files, one per table, and stored in HDFS. I presume that you are referring to column(s) within the table(s) when you mention 'specific part (table)'. If so, place the individual CSV files into separate file paths, say /user/userName/dbName/tables/table1.csv.
Now you can configure the job for the input path and the field occurrences. You may consider using the default input format (TextInputFormat) so that your mapper gets one line at a time as input. Based on the configuration/properties, you can read the specific fields and process the data.
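A minimal mapper along those lines might look like the following; the column positions (0 and 2) and the output types are only placeholders for whichever fields you actually need:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// With the default TextInputFormat, the mapper receives one CSV line per call:
// key = byte offset within the file, value = the line itself.
public class ColumnExtractMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");

        // Hypothetical selection: emit column 0 as the key and column 2 as the value.
        if (fields.length > 2) {
            context.write(new Text(fields[0]), new Text(fields[2]));
        }
    }
}
```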
Cascading allows you to get started very quickly with MapReduce. It is a framework that lets you set up Taps to access sources (your CSV file) and process them inside a pipeline, for example to add column A to column B and place the sum into column C by selecting them as Fields.
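A rough Cascading sketch of that A + B = C example, assuming Cascading 2.x on Hadoop with made-up input/output paths (the exact connector and scheme classes vary by version):

```java
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.expression.ExpressionFunction;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class SumColumns {
    public static void main(String[] args) {
        // Source and sink taps over delimited text files (paths are examples only).
        Tap source = new Hfs(new TextDelimited(new Fields("A", "B"), ","),
                "/user/userName/dbName/tables/table1.csv");
        Tap sink = new Hfs(new TextDelimited(new Fields("A", "B", "C"), ","),
                "/user/userName/output/table1_sum", SinkMode.REPLACE);

        // Pipeline: for each tuple, compute C = A + B and keep all fields.
        Pipe pipe = new Pipe("sum");
        pipe = new Each(pipe, new Fields("A", "B"),
                new ExpressionFunction(new Fields("C"), "A + B", Integer.TYPE),
                Fields.ALL);

        FlowDef flowDef = FlowDef.flowDef()
                .addSource(pipe, source)
                .addTailSink(pipe, sink);

        new HadoopFlowConnector().connect(flowDef).complete();
    }
}
```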
Use the BigTable model, which means converting your database into one big (denormalized) table.