How to zip 'n' oldest files/dirs in a dir in bash

I have a dir logs that contains 100 dirs. I need to zip (e.g. logs_.zip) the oldest 75 dirs in the logs dir. At any given time the logs dir should contain only the latest 25 dirs plus zip files of the oldest 75.
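Logrotate (quoted below) is one option. A direct bash sketch of what is being asked (not from the thread; it assumes GNU find/sort/head, directory names without newlines, "oldest" meaning oldest modification time, and a made-up zip naming scheme) could look like this:

#!/usr/bin/env bash
# Sketch: keep the 25 newest sub-directories of logs/, zip and remove the rest.
cd logs || exit 1
total=$(find . -mindepth 1 -maxdepth 1 -type d | wc -l)
to_zip=$(( total - 25 ))
(( to_zip > 0 )) || exit 0
# List sub-directories oldest-first by mtime and take only the excess ones.
find . -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\n' \
  | sort -n | head -n "$to_zip" | cut -d' ' -f2- \
  | while IFS= read -r dir; do
        zip -rq "logs_$(basename "$dir").zip" "$dir" && rm -rf "$dir"
    done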

From the logrotate manual (LOGROTATE(8)):

NAME
logrotate - rotates, compresses, and mails system logs

SYNOPSIS
logrotate [-dv] [-f|--force] [-s|--state file] config_file ..

DESCRIPTION
logrotate is designed to ease administration of systems that generate large numbers of log files. It allows automatic rotation, compression, removal, and mailing of log files. Each log file may be handled daily, weekly, monthly, or when it grows too large.

Normally, logrotate is run as a daily cron job. It will not modify a log more than once in one day unless the criterion for that log is based on the log's size and logrotate is being run more than once each day, or unless the -f or --force option is used.

Any number of config files may be given on the command line. Later config files may override the options given in earlier files, so the order in which the logrotate config files are listed is important. Normally, a single config file which includes any other config files which are needed should be used. See below for more information on how to use the include directive to accomplish this. If a directory is given on the command line, every file in that directory is used as a config file.

If no command line arguments are given, logrotate will print version and copyright information, along with a short usage summary. If any errors occur while rotating logs, logrotate will exit with non-zero status.
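For reference, logrotate is driven by a small config file; a hypothetical minimal setup plus a dry run might look like the following. Note that logrotate rotates individual log files by age/size/count, so it only partly fits the "keep the newest 25 directories" requirement above.

# Hypothetical minimal logrotate config, then a dry run to preview its actions.
cat > /etc/logrotate.d/myapp <<'EOF'
/var/log/myapp/*.log {
    daily
    rotate 25
    compress
    missingok
    notifempty
}
EOF
logrotate -d /etc/logrotate.d/myapp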

Related

Can "rsync --append" replace files that are larger at destination?

I have an rsync job that moves log files from a web server to an archive. The server rotates its own logs, so I might see a structure like this:
/logs
error.log
error.log.20200420
error.log.20200419
error.log.20200418
I use rsync to sync these log files every few minutes:
rsync --append --size-only /foo/logs/* /mnt/logs/
This command syncs everything with minimal processing, and that's important: calculating checksums or rewriting an entire file every time a few lines are added is a no-go. But it skips a file whenever the copy at the destination is already the same size or larger, instead of replacing it:
man rsync:
--append [...] If a file needs to be transferred and its size on the receiver is the same or longer than the size on the sender, the file is skipped.
Is there a way to tell rsync to replace files instead in this case? Using --append is important for me and works well for other log files that use unique filenames. Maybe there's a better tool for this?
The service is a packaged application that I can't really edit or configure unfortunately, so changing the file structure or paths isn't an option for me.
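No answer is recorded here, but one possible workaround (a sketch only, assuming GNU stat on Linux and that occasionally re-copying a rotated file in full is acceptable) is a second pass that replaces any file whose source has shrunk below the archived copy:

#!/usr/bin/env bash
# Pass 1: cheap append-only sync for files that only grow.
rsync --append --size-only /foo/logs/* /mnt/logs/
# Pass 2 (hypothetical workaround): re-copy any file whose source is now
# smaller than the destination copy, i.e. a log that was rotated or truncated.
for src in /foo/logs/*; do
    [[ -f "$src" ]] || continue
    dst="/mnt/logs/$(basename "$src")"
    if [[ -f "$dst" ]] && (( $(stat -c %s "$src") < $(stat -c %s "$dst") )); then
        rsync --whole-file "$src" "$dst"
    fi
done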

Counting the number of files in a directory with NiFi

Using Apache NiFi, I am passing files to a directory. I want to count the number of files in this directory, wait until all of the files I need are present, and then run the ExecuteStreamCommand processor to process the data in that directory. (Right now, ExecuteStreamCommand doesn't wait long enough for all of the files to arrive before processing begins, so I want to add this wait.)
I just want to know how to count the number of files in a directory to start. I am using ListFile to retrieve the file names, but I am not sure how to count them in NiFi.
Thanks
If you are using ExecuteStreamCommand to run a shell command on the files, you could easily add something like ls -1 | wc -l to the same or an additional ExecuteStreamCommand processor to count the number of files in the directory.
We usually caution against this approach, however, because there are edge cases where you can have a file present in the directory which isn't "complete" if some external process is writing it. The model usually recommended is to write files in with a temporary filename like .file1, .file2, and rename each upon successful completion to file1, file2, etc. The ListFile processor supports numerous settings to avoid detecting these files until they are ready for processing.
We also usually recommend setting some boolean flag through the external process rather than waiting for an explicit count unless that value will never change.
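If the count is taken with a shell command from ExecuteStreamCommand, a minimal sketch that also ignores the dot-prefixed temporary files described above (the directory path is hypothetical) could be:

#!/usr/bin/env bash
# Count only completed files; skip dot-prefixed files still being written.
dir=/data/incoming
find "$dir" -maxdepth 1 -type f ! -name '.*' | wc -l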

How can I push static log4j log files which I copy from other systems manually into my LogStash ELK server?

I am manually copying files from 3 servers on a daily basis.
For security reasons I cannot set up an automatic forwarder. So I have 3 directories, srvapp1, srvapp2 and dbserver, and I manually copy the files into these folders.
How can I push these files into Logstash? Is there a tool for handling files from log4j of the form app.log app.log.1 ...
Once I get this working and can prove the power and validity of Elasticsearch/Logstash/Kibana, I may be able to convince management to officially use this on production/development boxes.
Thanks!
The solution to your problem is a bit more complicated(ish). A forwarder is ideal, but it is sometimes not possible. Here is what I did (which works perfectly fine).
You need a secure connection? Use rsync + ssh!
Set up your rsync rules, e.g. rsync your log folder every minute with an append flag. You will have to test your rsync strategy, but it shouldn't be too hard.
So in your crontab you can do something like:
* * * * * rsync -avzi --stats -e 'ssh ...' --append-verify FROM_FOLDER TO_FOLDER
This safely appends new data to your log files from A to B, so your copying overhead is also minimal.
Configure Logstash with a file input that looks at the target folder of your rsync command. Logstash will automatically pick up all files from there and notice all changes. Everything works fine.
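A minimal sketch of such a file input (all paths are hypothetical, standing in for the TO_FOLDER used by the rsync cron job) might be:

# Hypothetical: write a minimal Logstash file-input config for the rsync target.
cat > /etc/logstash/conf.d/srvapp1.conf <<'EOF'
input {
  file {
    path => "/mnt/logs/srvapp1/*.log"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb_srvapp1"
  }
}
EOF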
Standardise your log-names. Logstash will reparse your logfile if your application renames it. The way I solved that is to configure my log-appender to simply use unique names. Instead of having my-log-file.log and rename it at midnight to my-log-file.log.XXXX.XX.XX, I simply use the date pattern in every file from the start. That way, Logstash doesn't get confused about reparsing.
The above solution is fully automatic (so no need to manually copy anything anymore). I hope that helps,
Artur

How to determine when a file was created

I have a .jar file that is compiled on a server and is later copied down to a local machine. Doing ls -l on the local machine just gives me the time it was copied down onto the local machine, which could be much later than when it was created on the server. Is there a way to find that time on the command line?
UNIX-like systems do not record file creation time.
Each directory entry has 3 timestamps, all of which can be shown by running the stat command or by providing options to ls -l:
Last modification time (ls -l)
Last access time (ls -lu)
Last status (inode) change time (ls -lc)
For example, if you create a file, wait a few minutes, then update it, read it, and do a chmod to change its permissions, there will be no record in the file system of the time you created it.
If you're careful about how you copy the file to the local machine (for example, using scp -p rather than just scp), you might be able to avoid updating the modification time. I presume that a .jar file probably won't be modified after it's first created, so the modification time might be good enough.
Or, as Etan Reisner suggests in a comment, there might be useful information in the .jar file itself (which is basically a zip file). I don't know enough about .jar files to comment further on that.
wget and curl have options that allow you to preserve the file's modified time stamp. This is close enough to what I was looking for.
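A quick illustration of the commands mentioned above (file and host names are placeholders):

# The three timestamps the filesystem actually keeps:
stat myfile.jar        # mtime, atime and ctime in one report
ls -l  myfile.jar      # last modification time
ls -lu myfile.jar      # last access time
ls -lc myfile.jar      # last status (inode) change time
# Preserve the modification time when copying from the build server:
scp -p user@buildhost:/path/to/myfile.jar .
# Entry timestamps inside the jar (a zip archive) often reflect build time:
unzip -l myfile.jar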

Incremental deploy from a shell script

I have a project, where I'm forced to use ftp as a means of deploying the files to the live server.
I'm developing on Linux, so I hacked together a bash script that makes a backup of the FTP server's contents, deletes all the files on the FTP, and uploads all the fresh files from the Mercurial repository (while taking care of user-uploaded files and folders, making post-deploy changes, etc.).
It's working well, but the project is starting to get big enough to make the deployment process too long.
I'd like to modify the script to look up which files have changed, and only deploy the modified files. (the backup is fine atm as it is)
I'm using Mercurial as a VCS, so my idea is to somehow request the changed files between two revisions from it, iterate over them, upload each modified file, and delete each removed file.
I can use hg log -vr rev1:rev2, and from the output, I can carve out the changed files with grep/sed/etc.
Two problems:
I have heard the horror stories that parsing the output of ls leads to insanity, so my guess is that the same applies here: if I try to parse the output of hg log, the filenames will undergo word splitting and all kinds of transformations.
hg log also doesn't tell me whether a file was modified, added, or deleted. Differentiating between modified and deleted files is the least I need.
So, what would be the correct way to do this? I'm using yafc as an ftp client, in case it's needed, but willing to switch.
You could use a custom style that does the parsing for you.
hg log --rev rev1:rev2 --style mystyle
Then pipe it to sort -u to get a unique list of files. The file "mystyle" would look like this:
changeset = '{file_mods}{file_adds}\n'
file_mod = '{file_mod}\n'
file_add = '{file_add}\n'
The file_mods and file_adds templates are the files modified or added; there are similar file_dels and file_del templates for deleted files.
Alternatively, you could use hg status -ma --rev rev1-1:rev2, which adds an M or an A before modified/added files. You need to pass a different revision range, one less than rev1, as it is the status since that "baseline". Deleted files are similar: you need the -r (--removed) flag, and an R is added before each removed file.
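Either way, the file list is easiest to consume without word-splitting surprises if it is NUL-terminated; a sketch of the hg status route (revision names are placeholders, and the upload/delete commands are left as stubs) could be:

#!/usr/bin/env bash
REV1=previous-deploy   # last deployed revision (placeholder)
REV2=tip               # revision being deployed (placeholder)
# Modified and added files between the two revisions, NUL-separated.
hg status -m -a -n -0 --rev "$REV1" --rev "$REV2" | while IFS= read -r -d '' f; do
    echo "upload: $f"      # replace with the yafc/ftp upload command
done
# Files removed between the two revisions.
hg status -r -n -0 --rev "$REV1" --rev "$REV2" | while IFS= read -r -d '' f; do
    echo "delete: $f"      # replace with the ftp delete command
done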
