How does the Terracotta Server Array store data?

I am using the kit "TERRACOTTA SERVER 4.X AND OLDER" downloaded from "http://www.terracotta.org/open-source/". I want to know how the data is stored in the TSA: in a file, or something else? I am aware that the data is stored somewhere in the server-data directory, which of course is specified in the data element of the tc-config file. Any insight on this would be very helpful.

How to download a CSV from an HTTPS URL to a file using Pentaho Data Integration - Spoon (Kettle)?

When I google this question, it seems to have been asked, and partially (and poorly) answered, a number of times, mostly for older versions.
Question: How can I download a CSV to a local file, with the below constraints? I'm designing in Spoon.
URL: Will always be the same: https://example.com/data/my.csv. The website prepares the CSV and provides it back to the web client as a file download after about 4-5 seconds. In a browser this means it is downloaded as a .csv file, not displayed.
Authentication: The website does not require authentication for access. The data isn't sensitive.
Local file path: The downloaded CSV will overwrite the existing CSV, e.g. d:\data\my.csv. That is, I can set this on a timer and have it download the newest CSV every hour or so.
Proxy: It is quite likely I will need to traverse a network proxy, e.g. badproxy.mynetwork.internal:8080, and that proxy requires a username and password. It would be far better if I could set this password in a single location so that anything created in the future can reference it. I'm not really sure how to approach this either.
The rest of my process focuses on addressing the content of the csv, and already works fine.
The processes I've found on Google use the HTTP Client step, though it's not particularly straightforward how that translates into a file being saved locally to a known location.
Thanks for any pointers.
PDI v9.0.0.0-423
The HTTP Client step needs to be triggered. Use a Row Generator step generating e.g. one empty row and link it with a hop to the HTTP Client step.
For your solution, try this:
Data Grid --> HTTP Client --> CSV File Input --> Text File Output (with a .csv extension)
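If a plain Java route is acceptable (for example via PDI's User Defined Java Class step, or a small helper run outside Spoon), a minimal sketch of the underlying download-through-an-authenticated-proxy logic could look like the following. The proxy host, credentials and file paths are placeholders taken from the question, not working values.

import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CsvDownloader {
    public static void main(String[] args) throws Exception {
        // Proxy details from the question (placeholders).
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("badproxy.mynetwork.internal", 8080));

        // Proxy credentials kept in one place; in PDI these could come from kettle.properties variables.
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication("proxyUser", "proxyPassword".toCharArray());
            }
        });

        URL url = new URL("https://example.com/data/my.csv");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setConnectTimeout(15000);
        conn.setReadTimeout(30000); // the site takes ~4-5 seconds to prepare the file

        // Overwrite the existing local CSV on every run.
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get("d:/data/my.csv"), StandardCopyOption.REPLACE_EXISTING);
        } finally {
            conn.disconnect();
        }
    }
}

Note that on Java 8u111 and later, Basic authentication for HTTPS tunnelling through a proxy is disabled by default, so the jdk.http.auth.tunneling.disabledSchemes system property may need to be cleared for the proxy credentials to be accepted.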

Access sqlite db file with jpa

I have a huge SQLite file containing my database. I need to know whether it is possible to connect to this database as an embedded one with JPA, and how.
I'm developing an app that packs this database inside its own jar so that when I use it on another system I don't have to import a copy of my database back and forth.
The technologies I'd like to use are Angular and Spring, since those are the ones I know best. If there are some technologies that better suit this purpose, I'd like some suggestions.
Thanks :)
I hope I understood your question correctly. I made a small project for you so you can have a look at it: spring-jpa-sqlite-sample. It may guide you a bit, though I don't claim correctness or completeness.
The path to the SQLite file can easily be changed by setting the correct URL in the persistence.properties file:
driverClassName=org.sqlite.JDBC
# relative paths may be used here
url=jdbc:sqlite:src/main/resources/chinook.db
hibernate.dialect=dev.mutiny.semo.config.SQLiteDataTypesConfig
hibernate.hbm2ddl.auto=none
hibernate.show_sql=true
You can also use environment variables from your system, which Spring can read, so that you can reference the correct directory of a file. This can be found here: Read system environment var (SO).
Last but not least: beware of using huge SQLite files. Consider transferring the data first into a 'real' database, i.e. any client/server RDBMS you know (Oracle, MariaDB, MSSQL, depending on your scenario/taste).
Have a closer look at the documentation: When to use SQLite (and when not to!)
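For completeness, here is a rough sketch of what the JPA side could look like on top of that configuration. The entity and table names are hypothetical, not taken from the linked sample project:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

import org.springframework.data.jpa.repository.JpaRepository;

// Hypothetical entity mapped onto a table inside the SQLite file.
@Entity
@Table(name = "customer")
public class Customer {

    @Id
    private Long id;

    private String name;

    protected Customer() {
        // no-args constructor required by JPA
    }

    public Long getId() { return id; }

    public String getName() { return name; }
}

// Standard Spring Data JPA repository; Spring generates the implementation at runtime.
interface CustomerRepository extends JpaRepository<Customer, Long> {
}

On newer Spring Boot versions the imports would be jakarta.persistence instead of javax.persistence; apart from that, the repository is used exactly as with any other JPA-backed database.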

How to plug a process for identifying sensitive information into an ETL pipeline?

Hope you are doing well!
We have already developed an ETL pipeline using Apache NiFi, which gets triggered only when the client uploads a source data file from the portal. After that, the data in the source file goes through various layers, gets transformed, and is stored back into the warehouse (i.e. Hive).
Goal: to identify sensitive information and mask it so that the end user won't see the actual data.
Identifying sensitive data & masking strategy: we will make use of open-source tools to achieve this goal, as follows.
Data Steward Studio: this tool allows me to identify sensitive information and tag it properly.
Apache Atlas: once the data steward user has confirmed the tag, that tag will be pushed into Apache Atlas.
Apache Ranger: finally, we can define a tag-based masking policy using Apache Ranger which will allow or deny access to specific users.
For more details on the above solution, please visit this link:
https://www.youtube.com/watch?v=RzEfLwJaLsc
Problem: in order to feed the data to the DSS tool, it has to be loaded into a Hive table first. That is fine. But we cannot stop the existing ETL flow in between and then start the identification process for sensitive information. The above solution requires a manual process which I want to get rid of and automate; that is, it should be plugged in somewhere within the NiFi pipeline. But so far, as per my understanding, DSS does not allow us to do something like that.
Manual process:
Create an asset collection.
Accept/reject suggested tags within DSS.
If we cannot plug the identification process into the pipeline, then the client's sensitive data will be exposed and visible to everyone on the team. I want something where we can de-identify sensitive data before it actually gets loaded into HDFS or Hive tables.
Please share your response on this problem if anyone has already worked in this particular area.
  
I did not test it, but here are my thoughts on this challenge.
1. Set up the system such that the data is NOT visible to everyone (or anyone) by default.
2. Load the data into Hive.
3. Let the profilers run and accept their suggestions.
4. Open up the data to those who should have access (except for the things found by the profiler).
There are still some implementation details to work out (e.g. how to automate steps 3 and 4, and whether you can solve this with tags alone or whether the data needs to sit in a staging area first). But I hope this steers you in a good direction.
One idea might be to use NiFi's EncryptContent processor (https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.EncryptContent/). Then the values loaded into Hive will be encrypted in the first place and will not be visible to the stewards. Once the tagging has been done, then in the subsequent part of the pipeline (where I'm assuming you're using NiFi as well) you can decrypt the content as required.
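To make that encrypt-now, decrypt-later idea concrete outside of NiFi, here is a minimal standalone Java sketch of symmetric field-level encryption. The key handling is deliberately simplified; in practice the key would come from a key store or NiFi's sensitive properties rather than being generated on the fly.

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class FieldCrypto {
    private static final int IV_LENGTH = 12;        // bytes, recommended IV size for GCM
    private static final int TAG_LENGTH_BITS = 128; // authentication tag length

    // Encrypt one field value; the Base64 result is what would land in Hive.
    public static String encrypt(String plaintext, SecretKey key) throws Exception {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return Base64.getEncoder().encodeToString(out);
    }

    // Decrypt later in the pipeline, once tagging/masking policies are in place.
    public static String decrypt(String encoded, SecretKey key) throws Exception {
        byte[] in = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, in, 0, IV_LENGTH));
        byte[] plaintext = cipher.doFinal(in, IV_LENGTH, in.length - IV_LENGTH);
        return new String(plaintext, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        String stored = encrypt("4111-1111-1111-1111", key);
        System.out.println(stored);               // opaque value, safe to load
        System.out.println(decrypt(stored, key)); // original value, recovered downstream
    }
}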

Move records from Development to Production in CloudKit

I did something silly and made hundreds of records in the Development environment in CloudKit. A previous thread mentioned that the records could be downloaded into a file and re-uploaded to the Production environment. Is there any other way I could do this, and if not, how would I go about downloading the records and storing them in a file?
Thanks in advance!
There is no option to do it in one run. You need an app that is connected to the development environment to read your records. Then, if you want to write to the production environment, you can only do that by re-signing your app. So indeed you first need to download all the data, store it somewhere, and then write it back to your production database.
Since CKRecord conforms to the NSCoding protocol, you can write the results of your query directly to a file using:
NSKeyedArchiver.archiveRootObject(records, toFile: filePath)
Then, if you want to read that file, you can use:
let records = NSKeyedUnarchiver.unarchiveObjectWithFile(filePath) as? [CKRecord]

How to delete a file from the server after downloading it? Or how can I store the file on the client machine directly from an output stream?

I am using a Liferay custom portlet, and in it I am using JasperReports. My problem is: how can I download the PDF report directly to the client machine?
Right now I am storing the file on the server first, then providing a URL to the user for downloading the PDF. But how can I store the file directly on the client machine if I have the PDF file's output stream?
Or, if I can somehow know when the user has clicked the download link and the file has finished downloading, how can I then delete the downloaded file from the server? If anyone can guide me...
I'm not sure what you're asking for is possible, though I would be interested in seeing someone correct that statement.
Servers really shouldn't be directly storing files on a client machine, as that violates the intent of the client-server relationship. A client has to make a request for the file, and then the client can save that file (e.g. like an FTP download). Servers just don't manipulate client machines as they see fit.
As far as knowing when a file has been downloaded, there isn't anything a portlet can do to detect that. You can use a ResourceRequest and the serveResource method to serve a file, but nothing in the portlet API will inform your portlet that the download is complete or that it wasn't interrupted by something.
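As a rough sketch of that serveResource approach, the following streams the JasperReports PDF straight to the client instead of writing it to the server's disk first, so there is no temporary file to clean up afterwards. buildReport is a hypothetical placeholder for however the report is currently filled:

import java.io.IOException;
import java.io.OutputStream;

import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.ResourceRequest;
import javax.portlet.ResourceResponse;

import net.sf.jasperreports.engine.JRException;
import net.sf.jasperreports.engine.JasperExportManager;
import net.sf.jasperreports.engine.JasperPrint;

public class ReportPortlet extends GenericPortlet {

    @Override
    public void serveResource(ResourceRequest request, ResourceResponse response)
            throws PortletException, IOException {
        response.setContentType("application/pdf");
        response.addProperty("Content-Disposition", "attachment; filename=report.pdf");

        try (OutputStream out = response.getPortletOutputStream()) {
            // Export straight to the response stream: nothing is written to the
            // server's disk, so there is nothing to delete afterwards.
            JasperPrint print = buildReport(request);
            JasperExportManager.exportReportToPdfStream(print, out);
            out.flush();
        } catch (JRException e) {
            throw new PortletException("Could not export report", e);
        }
    }

    // Placeholder for whatever already produces the filled JasperPrint in your portlet.
    private JasperPrint buildReport(ResourceRequest request) throws JRException {
        // ... fill the report from your data source here ...
        return null;
    }
}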
As an alternative you might try simply having a cron job that will clean out old files. In this case, make sure to inform users how long they have to successfully download the file.
