Download or back up a generated file from PCF automatically - spring-boot

We have a microservices app running on PCF.
Some of the microservices generate log files in their log folders.
Is there a way to automatically download these log files and save them to a shared folder or remote storage (Google Drive and the like)?
Your suggestions and advice are highly appreciated.
Thank you.

In a perfect world, you would not write anything you need to keep to the local filesystem. It's OK to write cached files or artifacts you can simply recreate, but you shouldn't put anything important there.
https://docs.cloudfoundry.org/devguide/deploy-apps/prepare-to-deploy.html#filesystem
The local file system exposed to your app is ephemeral and it's not safe to store important things there even for a short period of time. You could certainly try to set up a process that runs periodically and sends log files out of your container to somewhere else. However, when your app crashes you're going to lose log messages, probably the important ones that say why your app crashed, because your sync process isn't going to have time to run before the container is cleaned up.
What you want to do instead is to configure your applications to write their logs to STDOUT or STDERR.
https://docs.cloudfoundry.org/devguide/deploy-apps/streaming-logs.html#writing
Anything written to STDOUT/STDERR is automatically captured by the platform and sent out the log stream for your app. You can then send your log stream to a variety of durable locations.
https://docs.cloudfoundry.org/devguide/services/log-management.html
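For example, one common setup is a user-provided service that forwards the stream to an external syslog endpoint. A rough sketch (the host below is hypothetical; see the doc above for the details and supported schemes):
cf create-user-provided-service my-log-drain -l syslog-tls://logs.example.com:6514
cf bind-service my-app my-log-drain
cf restage my-app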
Most applications can easily be configured to write to STDOUT/STDERR. You've tagged spring-boot on this post, so I assume your apps are running Spring Boot. By default, Spring Boot should log to STDOUT/STDERR so there shouldn't be anything you need to do.
What might be happening, though, is that your app developers have specifically configured the app to send logs to a file. Look in the src/main/resources/application.properties or application.yml file of your application for the properties logging.file.path or logging.file.name. If present, comment them out or remove them. That should make your logs go to STDOUT/STDERR.
https://docs.spring.io/spring-boot/docs/current/reference/html/spring-boot-features.html#boot-features-logging-file-output
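For illustration, the file-logging configuration you're looking for would look something like this (the values here are made up; yours will differ):
# src/main/resources/application.properties
# Remove or comment out lines like these to fall back to console (STDOUT) logging:
# logging.file.name=logs/application.log
# logging.file.path=/home/vcap/logs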

Related

How to handle a long-running processing request

I have a Spring Boot REST API deployed on AWS Elastic Beanstalk and I am trying to upload pictures through it.
This is what I did: upload a zip file through a file input from the browser, get the zip file on the server, go through all the files, and upload each one to AWS S3.
It works fine, but I ran into a problem: when I try to upload lots of pictures, I get an HTTP error (504 Gateway Timeout). I found out this is because the server takes too much time to respond, and I am trying to figure out how to set a higher timeout for the requests (I haven't found a way yet).
But in the meantime I am asking myself whether that is even the best solution.
Wouldn't it be better to end the request directly after receiving the zip file, do the uploads to S3, and after that notify the user that the uploads are done? Is there even a way to do that? Is there a good practice for this (an operation that takes a lot of time to process)?
I know how to do the process asynchronously, but I would really like to know how to notify the user after it completes.
Wouldn't it be better to end the request directly after receiving the zip file, do the uploads to S3, and after that notify the user that the uploads are done?
Yes, asynchronous processing of the uploaded images in the zip file would be better.
Is there even a way to do that? Is there a good practice for this (an operation that takes a lot of time to process)?
Yes, there is a better way. To keep everything within EB, you could look at an Elastic Beanstalk worker environment. The worker environment is ideal for processing your images.
In this solution, your web-based environment would store the uploaded images in S3 and submit their names, along with other identifying information, to an SQS queue. The queue is the entry point for the worker environment.
Your workers would process the images from the queue independently of the web environment. In the meantime, the web environment would have to check for the results and notify your users once the images have been processed.
EB also supports linking different environments, so you could establish a link between the web and worker environments for easier integration.
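As a rough sketch of the hand-off between the two environments (the bucket name and queue URL below are hypothetical; this uses the AWS SDK for Java v2):
// Sketch only: how the web tier might hand an uploaded image to the worker tier.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class ImageSubmitter {
    private final S3Client s3 = S3Client.create();
    private final SqsClient sqs = SqsClient.create();

    public void submit(String fileName, byte[] imageBytes) {
        // 1. Store the raw image in S3 so the worker can fetch it later.
        s3.putObject(PutObjectRequest.builder()
                        .bucket("my-upload-bucket")
                        .key("incoming/" + fileName)
                        .build(),
                RequestBody.fromBytes(imageBytes));

        // 2. Point the worker at it by dropping the object key on the queue
        //    that feeds the worker environment.
        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/image-jobs")
                .messageBody("incoming/" + fileName)
                .build());
    }
}
On the worker side, Elastic Beanstalk's daemon delivers each queue message to your worker app as an HTTP POST, so the worker just needs an endpoint that reads the key, downloads the object from S3, and records the result somewhere the web tier can poll.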

How to download 300k log lines from my application?

I am running a job on my Heroku app that generates about 300k log lines within 5 minutes. I need to extract all of them into a file. How can I do this?
The Heroku UI only shows logs in real time, from the moment it was opened, and only keeps 10k lines.
I attached a LogDNA add-on as a drain, but their export is also limited to 10k lines. To even have the option to export, I need to apply a search filter (I typed 2020 because all the lines start with a date, but still...). I can scroll through all the logs to see them, but as I scroll up the bottom gets truncated, so I can't even copy-paste them myself.
I then attached Sumo Logic as a drain, which is better because the export limit is 100k. However, I still need to filter the logs into 30s to 60s intervals and download them separately. It also exports to a CSV file in reverse order (newest first, which is not what I want), so I still have to work on the file after it's downloaded.
Is there no option to get actual raw log files in full?
Is there no option to get actual raw log files in full?
There are no actual raw log files.
Heroku's architecture requires that logging be distributed. By default, its Logplex service aggregates log output from all services into a single stream and makes it available via heroku logs. However,
Logplex is designed for collating and routing log messages, not for storage. It retains the most recent 1,500 lines of your consolidated logs, which expire after 1 week.
For longer persistence you need something else. In addition to commercial logging services like those you mentioned, you have several options:
Log to a database instead of files. Something like Apache Cassandra might be a good fit.
Send your logs to a logging server via Syslog (my preference):
Syslog drains allow you to forward your Heroku logs to an external Syslog server for long-term archiving.
Send your logs to a custom logging process via HTTPS.
Log drains also support messaging via HTTPS. This makes it easy to write your own log-processing logic and run it on a web service (such as another Heroku app).
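For example, with the Heroku CLI, attaching a syslog drain is a one-liner (the endpoint below is hypothetical):
heroku drains:add syslog+tls://logs.example.com:6514 --app your-app
heroku drains --app your-app    # lists the drains currently attached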
Speaking solely from the Sumo Logic point of view, since that’s the only one I’m familiar with here, you could do this with its Search Job API: https://help.sumologic.com/APIs/Search-Job-API/About-the-Search-Job-API
The Search Job API lets you kick off a search, poll it for status, and then when complete, page through the results (up to 1M records, I believe) and do whatever you want with them, such as dumping them into a CSV file.
But this is only available to trial and Enterprise accounts.
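Roughly, the flow looks like this with curl (the API host varies by deployment, e.g. api.us2.sumologic.com; credentials, query, and time range below are placeholders, and the API expects you to keep the session cookie between calls):
# 1. Create the search job; the response contains a job id
curl -s -c cookies.txt -u ACCESS_ID:ACCESS_KEY -H "Content-Type: application/json" \
  -d '{"query":"2020","from":"2020-06-01T00:00:00","to":"2020-06-01T01:00:00","timeZone":"UTC"}' \
  https://api.sumologic.com/api/v1/search/jobs
# 2. Poll until the job state is DONE GATHERING RESULTS
curl -s -b cookies.txt -u ACCESS_ID:ACCESS_KEY https://api.sumologic.com/api/v1/search/jobs/JOB_ID
# 3. Page through the raw messages in chunks
curl -s -b cookies.txt -u ACCESS_ID:ACCESS_KEY "https://api.sumologic.com/api/v1/search/jobs/JOB_ID/messages?offset=0&limit=10000"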
I just looked at Heroku's docs, and it does not look like they have a native way to retrieve more than 1,500 lines; you do have to forward those logs via syslog to a separate server or service.
I think your best solution is going to depend, however, on your use-case, such as why specifically you need these logs in a CSV.

Registering a Service to receive file system events with Delphi

I want to write a service with Delphi that processes file system events. Basically, when files and directories are created or changed, I want to push them to the cloud. I was hoping to use TSHChangeNotify (see TSHChangeNotify on StackOverflow), but it doesn't appear to work in a service; I was able to do exactly what I needed with it in a regular GUI app, but it dies in a service. I've yet to determine where it dies, but I wonder if I'm going down the best path.
So, can anyone shed light on these two (related) questions:
1. Can TSHChangeNotify be used in a service (FYI I am using the Aldyn SvCom framework, which is how I get the component onto a form in the first place!) and if so, what are the tricky bits?
2. Regardless of 1, what is the best way to write a service that gets notified of file system events?
Basically, I am trying to copy files and directories that are created or modified into the cloud, transparently to the user.
Thanks!

Heroku - letting users download files from tmp

Let me start by saying I understand that Heroku's dynos are temporary and unreliable. I only need them to persist for at most 5 minutes, and from what I've read that generally won't be an issue.
I am making a tool that gathers files from websites and zips them up for download. My tool does everything and creates the zip - I'm just stuck at the last part: providing the user with a way to download the file. I've tried direct links to the file location and HTTP GET requests, and Heroku didn't like either. I really don't want to have to set up AWS just to host a file that only needs to persist for a couple of minutes. Is there another way to download files stored on /tmp?
As far as I know, you have absolutely no guarantee that a request goes to the same dyno as the previous request.
The best way to do this would probably be to either host the file somewhere else, like S3, or to send it immediately in the same request.
If you're generating the file in a background worker, then it most definitely won't work. Every process runs on a separate dyno.
See How Heroku Works for more information on their backend.
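If you go with sending the file in the same request, here is a minimal sketch, assuming for illustration a Spring Boot app (the question doesn't say what framework is in use); buildZip() stands in for whatever code already writes the zip to /tmp:
// Sketch: return the freshly generated zip in the same request that built it,
// so it never has to survive on /tmp beyond this request.
import org.springframework.core.io.FileSystemResource;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.File;

@RestController
public class DownloadController {

    @GetMapping("/download")
    public ResponseEntity<FileSystemResource> download() {
        File zip = buildZip();  // your existing gather-and-zip logic
        return ResponseEntity.ok()
                .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"bundle.zip\"")
                .contentType(MediaType.APPLICATION_OCTET_STREAM)
                .body(new FileSystemResource(zip));
    }

    private File buildZip() {
        // placeholder for the tool's existing logic that writes the zip to /tmp
        return new File("/tmp/bundle.zip");
    }
}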

What is the best way to remotely reset the server cache in a web farm?

Each of our production web servers maintains its own cache for separate web sites (ASP.NET Web Applications). Currently to clear a cache we log into the server and "touch" the web.config file.
Does anyone have an example of a safe/secure way to remotely reset the cache for a specific web application? Ideally we'd be able to say "clear the cache for app X running on all servers" but also "clear the cache for app X running on server Y".
Edits/Clarifications:
I should probably clarify that doing this via the application itself isn't really an option (i.e. some sort of log in to the application, surf to a specific page or handler that would clear the cache). In order to do something like this we'd need to disable/bypass logging and stats tracking code, or mess up our stats.
Yes, the cache expires regularly. What I'd like to do though is setup something so I can expire a specific cache on demand, usually after we change something in the database (we're using SQL 2000). We can do this now but only by logging in to the servers themselves.
For each application, you could write a little cache-dump.aspx script to kill the cache/application data. Copy it to all your applications and write a hub script to manage the calling.
For security, you could add all sorts of authentication-lookups or IP-checking.
Here's how I do the actual app-dumping:
Context.Application.Lock()        ' block other requests while the cache is cleared
Context.Session.Abandon()         ' drop the current session
Context.Application.RemoveAll()   ' remove every Application-level entry
Context.Application.UnLock()
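The hub script itself can be as small as a loop that hits the dump page on each server; the host names and shared key below are hypothetical, and you'd run it from whichever box manages your deployments:
for SERVER in web01 web02 web03; do
  curl "http://$SERVER/appX/cache-dump.aspx?key=SHARED_SECRET"
done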
Found a DevX article about a touch utility that looks useful.
I'm going to try combining that with either a table in the database (add a record, and the touch utility finds it and updates the appropriate web.config file) or a web service (make a call, and the touch utility gets invoked to update the appropriate web.config file).
This may not be "elegant", but you could set up a scheduled task that executes a batch script. The script would essentially "touch" the web.config (or some other file that causes a re-compile) for you.
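A minimal sketch of that batch script, using the classic copy trick to bump the file's timestamp (the path is hypothetical):
rem "touch" web.config so ASP.NET recycles the app and drops its cache
copy /b "D:\sites\appX\web.config" +,,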
Otherwise, is your application cache not set to expire after N minutes?
