How to create a federated dataset from a text file? - tensorflow-datasets

In the TensorFlow Federated tutorial on federated learning for text generation, they use a preprocessed Shakespeare dataset.
https://www.tensorflow.org/federated/tutorials/federated_learning_for_text_generation
I want to create a federated dataset from scratch using my own text file.
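One way to build this yourself is a minimal sketch like the one below: it splits a single local text file into synthetic "clients" and wraps them as a federated dataset. It assumes a plain UTF-8 file and that tff.simulation.datasets.TestClientData is available in your TFF version; the file name, number of clients, and chunking scheme are all placeholders.

```python
# Sketch: turn one text file into per-client datasets in the same shape as the
# tutorial's Shakespeare data (an OrderedDict with a 'snippets' string field).
# TestClientData is assumed to exist in tff.simulation.datasets.
import collections
import tensorflow as tf
import tensorflow_federated as tff

PATH = 'my_text_file.txt'   # your own corpus
NUM_CLIENTS = 10

with open(PATH, encoding='utf-8') as f:
    text = f.read()

# Assign consecutive chunks of the corpus to synthetic clients.
chunk_len = len(text) // NUM_CLIENTS
client_snippets = {
    f'client_{i}': text[i * chunk_len:(i + 1) * chunk_len]
    for i in range(NUM_CLIENTS)
}

# Each client's data is something tf.data.Dataset.from_tensor_slices accepts.
tensor_slices = {
    cid: collections.OrderedDict(snippets=tf.constant([snippet]))
    for cid, snippet in client_snippets.items()
}

federated_data = tff.simulation.datasets.TestClientData(tensor_slices)

# Per-client datasets, analogous to the tutorial's preprocessed Shakespeare data.
example_dataset = federated_data.create_tf_dataset_for_client('client_0')
```

From here the per-client datasets can go through the same to_ids/batching preprocessing the tutorial applies to its snippets.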

Related

Looking for the optimum options for Lightweight Charts to visualize data in CSV files or a database

I am a Python developer and new to JavaScript and Lightweight Charts.
I am noticing that all Lightweight Charts code samples use a JavaScript array to initialize chart data. My candle bar data reside in a database that I can export to one or more CSV files.
What are the practical options for Lightweight Charts to visualize data in CSV files or a database?
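One practical option is to pre-convert the CSV export into the JSON array a candlestick series consumes and load that from the page. A minimal Python sketch, assuming the export has date/open/high/low/close columns (the file and column names are guesses about your data):

```python
# Sketch: convert a CSV of candles into the array of
# {time, open, high, low, close} objects that a Lightweight Charts
# candlestick series can consume via setData().
import csv
import json

candles = []
with open('candles.csv', newline='') as f:
    for row in csv.DictReader(f):
        candles.append({
            'time': row['date'],        # e.g. '2021-04-01' or a UNIX timestamp
            'open': float(row['open']),
            'high': float(row['high']),
            'low': float(row['low']),
            'close': float(row['close']),
        })

with open('candles.json', 'w') as f:
    json.dump(candles, f)
# The JSON file can then be fetched by the page and passed to the series
# on the JavaScript side.
```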

Converting selected data into a PDF using an RDF file

I am currently trying to convert a simple table into a PDF file using an existing .rdf file.
My first approach was to look for a new program that can do so because I want to replace the current 'Oracle Reports' program.
Is there any other program that would support converting SQL data into a PDF using an .rdf file?
I tried writing a Python 3 script to do just that, but I didn't know where to start.
Oracle APEX 21.2 (the latest version at the time of writing) has a package named APEX_DATA_EXPORT that can take a SELECT statement and export it into various formats, one of them being PDF. The example in the documentation shows how to generate a PDF from a simple query. After calling apex_data_export.export, you can use the BLOB that the function returns and do whatever you need with the PDF.
There are not many options for styling and formatting the table, but Oracle does plan to add more PDF printing capabilities in the future.

Elastic architecture: full-text searching of 1 million files' content

Summary
I am trying to design an Elasticsearch index (or indices) that will provide a solid foundation for indexing 1,000,000+ files and full-text searching of their contents. New files will be added continuously after the initial digitization process.
Use Case
Various file types (PDF, Outlook email, mp3, txt, JPEGs of handwritten notes, etc.) need to be searchable by their contents and metadata.
Users want to manually tag relationships between documents, e.g. Document A -> contains information about -> Document B.
Users want to be able to see related/similar texts.
Users want Named Entity Recognition on the text contents.
The physical files are already stored on an external computer, waiting to be processed.
Implementation
File Content extraction pipeline using Apache Tika
NER using Spacy
Upload File Contents + NER Tags to Elastic
Eventually we would run our own search models to gain better search insights + data science.
How do I best store my extracted contents to fit the users' needs and have a scalable foundation? Is it better to run our trained Named Entity Recognition at initial indexing time, or after the extracted text has been uploaded to Elastic?
Or does it make more sense to use one of the existing solutions below, to avoid reinventing the wheel?
https://github.com/dadoonet/fscrawler
https://github.com/deepset-ai/haystack
https://github.com/RD17/ambar
Instead of reinventing the wheel, I'd recommend using an existing solution such as Jina; there's a working example of PDF search using Jina. You can also search across different modalities (text, image, PDF, etc.) with it.
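If you do end up building the pipeline yourself rather than adopting one of those projects, a minimal sketch of the tag-and-index step described in the question could look like the following; the index name, mapping, and spaCy model are illustrative assumptions, and the Tika extraction step is reduced to a placeholder string:

```python
# Sketch of the "extract -> NER -> upload" step: store the full text in a
# searchable text field and the recognized entities in a keyword field so
# they can be filtered and aggregated. Names here are illustrative choices.
import spacy
from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')
nlp = spacy.load('en_core_web_sm')

if not es.indices.exists(index='documents'):
    es.indices.create(
        index='documents',
        mappings={
            'properties': {
                'filename': {'type': 'keyword'},
                'content':  {'type': 'text'},
                'entities': {'type': 'keyword'},   # e.g. "PERSON:Alice"
            }
        },
    )

def index_document(filename, content):
    """Tag the extracted content with spaCy and index it alongside the full text."""
    doc = nlp(content)
    entities = [f'{ent.label_}:{ent.text}' for ent in doc.ents]
    es.index(index='documents',
             document={'filename': filename,
                       'content': content,
                       'entities': entities})

# 'content' would normally come from the Apache Tika extraction step.
index_document('letter_001.pdf', 'Dear Alice, the meeting in Paris is confirmed.')
```

Keeping the entities in their own keyword field (rather than re-running NER at query time) is what lets you answer the "before or after upload" question with "before": tag once at ingest, then search and aggregate cheaply.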

How to convert my Excel data into JSON with the appropriate formatting?

I am trying to use my own data set for the Gapminder motion chart reproduced by Mike Bostock at https://bost.ocks.org/mike/nations/
He uses a JSON data file from https://bost.ocks.org/mike/nations/nations.json
I have data on food trends in an Excel file, and I'm wondering what the best approach is to convert the Excel file into the appropriate JSON format?
How did Mike originally do this? I presume that he had an Excel file originally?
It depends on the structure of the data in your csv, but I use online tools like this one: http://www.convertcsv.com/csv-to-json.htm
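If you'd rather script the conversion than use an online tool, a short pandas sketch can read the Excel file directly and dump JSON records; the file name, sheet, and orientation are assumptions about your data, and reshaping into the nested structure of nations.json would be a further step:

```python
# Sketch: read the Excel sheet and write it out as flat JSON records.
# Reshaping the records into the nested name/region/income/... structure of
# nations.json depends on your columns and is not shown here.
import pandas as pd

df = pd.read_excel('food_trends.xlsx', sheet_name=0)   # needs openpyxl installed
df.to_json('food_trends.json', orient='records', indent=2)
```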

Using OrientDB ETL JSON transforms

Does anyone have examples of OrientDB ETL transformers with multiple transforms, or something that can create class identifiers on the fly? For example, when creating organization entities, the id could be a hash of the organization name. Essentially, this is for the case where the JSON we are importing does not exactly match the schema we want in the destination.
What about using block code in your ETL configuration file? You can use it in the begin phase, so you could transform the id column in your .csv input file. It is not an ideal solution, I agree.
See the Block documentation.
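The same idea can also be handled outside the ETL tool as a preprocessing step; a small sketch that derives a deterministic id from the organization name before the import runs (the column and file names, and the choice of hash, are illustrative):

```python
# Sketch: add a stable identifier column derived from the organization name,
# so the ETL import can use it as the class identifier. Column and file names
# are assumptions about your input.
import csv
import hashlib

with open('organizations.csv', newline='') as src, \
     open('organizations_with_ids.csv', 'w', newline='') as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=['org_id'] + reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Deterministic hash of the name, truncated for readability.
        org_id = hashlib.sha1(row['organization_name'].encode('utf-8')).hexdigest()[:12]
        writer.writerow({'org_id': org_id, **row})
```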
