How to stop auto-generation of log files by cron? - bash

I have a bot on the Marathi-language Wikipedia. The bot runs from Wikimedia's Toolforge server. I have set up a cron job which generates two files: one .err and one .out.
Following is the content of my cron file:
0 9 * * * jsub -release buster -N KiranBOT1 kiranbot1.sh
The job runs daily. I was away for around a week, and the .err file grew to around 500 megabytes. So I deleted it, and after one run, the newly generated file was around 8 megabytes.
It doesn't generate multiple .err files; it keeps appending details to the same file, which increases the file size tremendously. I have built the bot in such a way that even if I stop editing Wikipedia and stop checking on it, it would keep doing its task. In such a scenario, the .err file would just keep growing. I don't want that to happen.
I can live without these .err and .out files. Is there a way to stop generating them?
Thanks a lot in advance,
-usernamekiran.
Edit:
The file names are KiranBOT1.err and KiranBOT1.out. I apologise, I should have mentioned this in my original question.

Original throwaway comment now posted as an answer.
The day you stop generating the logs is the day that you need them...
My suggestion would be to create a dated daily pair of files (KiranBOT1_YYYYMMDD.err and KiranBOT1_YYYYMMDD.out), and use logrotate to remove them after (say) one week.
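A minimal sketch of that idea, assuming kiranbot1.sh can be (or can wrap) a small script like the one below; the log directory and /path/to/the-actual-bot-command are placeholders, and it assumes most of the output comes from the bot itself rather than from jsub:
#!/bin/bash
# run the bot with dated logs, then prune logs older than a week
logdir="$HOME/logs"
mkdir -p "$logdir"
stamp=$(date +%Y%m%d)
# send the bot's stdout/stderr to dated files instead of one ever-growing pair
/path/to/the-actual-bot-command \
    > "$logdir/KiranBOT1_${stamp}.out" \
    2> "$logdir/KiranBOT1_${stamp}.err"
# keep roughly one week of logs
find "$logdir" -name 'KiranBOT1_*' -mtime +7 -delete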

Check the full names of the files (KiranBOT1.err and KiranBOT1.out, per the edit) and do something like
ln -s /dev/null KiranBOT1.err
ln -s /dev/null KiranBOT1.out
echo "Warning: KiranBOT1.err and KiranBOT1.out are symlinks to /dev/null" > README

Related

Shell script to verify data packages

I need to make a shell script to check my algorithms against loads of data (test packages saved in .in files; every package contains a folder with the .in file and another one with the .out file holding the expected result).
Sometimes there are about 1000 files in one package, so there's no point in doing it manually. I need some kind of loop which opens each .in file, redirects it to the input of my C++ program, and also redirects the program's output (saving the result to .out files). But the point is I can't pick up this language as quickly as I need to.
And I would like this script to compare the results of my algorithm to the .out files from the packages:
for f in ExternalIn/*.in; do
    # part of code which opens a process with my algorithm and compares its .out file to the .out file from the package
Skipping checks for missing files, whitespace-safety, etc., you probably need something like:
for f in ExternalIn/*.in; do
# diff the result of my_cpp_app eating file.in with file.out
# and store the comparison result in file.diff
diff ${f/.in/.out} <(my_cpp_app <$f 2>/dev/null) > ${f/.in/.diff}
done
Although I would probably do it with a find / xargs pipeline, which is not only safer but also allows parallel execution.
Or even write a Makefile for this and use make, which after all is a tool for exactly this kind of work.
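For illustration, a rough sketch of the find/xargs variant (my_cpp_app is the same hypothetical program as above; GNU find and xargs are assumed for -print0/-0 and -P):
find ExternalIn -name '*.in' -print0 |
    xargs -0 -P4 -n1 bash -c '
        f=$1
        # write each comparison next to its input, as file.diff
        diff "${f%.in}.out" <(my_cpp_app < "$f" 2>/dev/null) > "${f%.in}.diff"
    ' _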

Merge lines in bash

I would like to write a script that restores a file, but preserves any changes that may have been made after the backup file was created.
In more detail: at some moment I create a backup of a file (file_orig). I then make some changes to the original file (file_my_changes). After that, the original file can be changed again (file_additional_changes), but after the restore I want to end up with the backup file plus the additional changes (file_orig + file_additional_changes). In general, backing out only my changes.
I am talking about the grub.cfg file, so the expected changes will be adding or removing parts of a line.
Is it possible to do this with a bash script?
I have 2 ideas:
Add some comments above the lines I am going to change, and then before the restore, if a line differs from the one in the backed-up file, read the comment, which will tell me exactly what to remove from the line;
If there is a way to display only the part of a line that differs between file_orig and file_additional_changes, then replace that line with the line from file_orig plus the part that differs. But I am not sure if this is possible at all.
Example"
line1: This is line1
line2: This is another line1
Is it possible to display only "another"?
Of course any other ideas are welcome!
Thank you!
Unclear, but perhaps if you're using a bash script you could run a diff on the two edited files and the last one, and save that output somewhere you want to keep it? That would mean you have a copy of the changes.
Or just use git like everybody else.
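A hedged sketch of the git route, assuming a repository can live in the directory holding grub.cfg (the commit reference at the end is a placeholder):
git init
git add grub.cfg
git commit -m "baseline"
# ... make my changes ...
git commit -am "my changes"
# ... additional changes are made and committed later ...
# back out only "my changes" while keeping everything committed after them:
git revert <hash-of-the-my-changes-commit>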
One possibility would be to use the POSIX commands patch and diff.
Create the backup:
cp operational-file operational-file.001
Edit the operational file.
Create a patch from the differences:
diff -u operational-file.001 operational-file > operational-file.patch001
Copy the operational file again:
cp operational-file operational-file.002
Edit the operational file again.
Create a new patch:
diff -u operational-file.002 operational-file > operational-file.patch002
If you need to recover but skip the changes from patch 001, then:
cp operational-file.001 operational-file
patch -i operational-file.patch002 operational-file
This would apply just the second set of changes to the original file, as long as there's no overlap.
Consider using a version control system to keep records of the file changes. Consider using date/time stamps instead of version numbers on the file names.
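Applied to the grub.cfg case from the question, the same sequence might look like this (file names and editor are illustrative):
cp grub.cfg grub.cfg.001                             # backup before my changes
vi grub.cfg                                          # make my changes
diff -u grub.cfg.001 grub.cfg > grub.cfg.patch001    # record my changes
cp grub.cfg grub.cfg.002                             # snapshot before further edits
vi grub.cfg                                          # additional changes happen
diff -u grub.cfg.002 grub.cfg > grub.cfg.patch002    # record the additional changes
# restore the pre-my-changes copy and re-apply only the later changes:
cp grub.cfg.001 grub.cfg
patch -i grub.cfg.patch002 grub.cfg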

Bash get all specific files in specific directory

I have a script that takes as an argument a path to a file upon which it performs certain operations. These files are stored in directories whose paths follow the pattern storage/year/month/day_id/files (so for 2016 July 22 it would be storage/2016/Jul/22_1/files for the first set of files, .../Jul/22_2/files for the second one, etc.). The problem is each directory stores files with two extensions (say file.doc and file.txt) and I want to perform operations only on .txt files. I've tested earlier something like
for file in "/home/gonczor/temp/"*/*".txt"; do
echo "$file"
done
And it worked perfectly, given that the names in the directories don't change. When I move one step further and add these 22_1, 22_2, 23_1 directories, something strange happens.
This is my script (simplified):
for file in "$FILE_PATH/""$YEAR/""$MONTH/""$DAY"*/*".txt"; do
my_program ${report}
done
And instead of finding .../2016/Jul/22_1/file.txt it finds /2016/Jul/22*/*.txt
How can I make it work? The solution I've tried to make up is from here
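One hedged guess at what is happening: when a glob matches nothing, bash leaves the pattern in place as literal text, which is exactly the /2016/Jul/22*/*.txt shown above, so either the constructed path is wrong or nullglob is wanted; the loop body should also use the loop variable. A sketch, assuming my_program is meant to receive each matching .txt file:
shopt -s nullglob    # an unmatched glob expands to nothing instead of itself
for file in "$FILE_PATH/$YEAR/$MONTH/$DAY"_*/*.txt; do
    my_program "$file"
done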

how to read / copy a tmp file which exists for a very short time

My nginx creates tmp files for requests which are bigger than 16kb. I am trying to read these tmp files, but they only exist for a really short period of time (1ms?). Is there a Unix command / program which can help me read these files before they are gone?
The nginx warning message looks like
a client request body is buffered to a temporary file /var/lib/nginx/body/0000001851
EDIT
I am not in a position to alter the nginx source code, nor am I able to edit the source code of the request origin. I just want to take a look at these files for debugging purposes, as I can't imagine what kind of request would bloat up to 16k.
In general you'll probably want to get nginx's assistance for this, or, if that's not possible and it's really important, change the source code as Leo suggests.
There is one cringe-inducing, wtf-provoking trick which I am mentioning as a curiosity. You can set the append-only mode on the directory. If your filesystem supports it you can say:
chattr +a mydir
Your process will be able to create stuff inside but not remove it. Then at your leisure you can use inotifywait to monitor the directory for changes. I don't know of any clean way to remove the files, though.
Well you could try parsing the output with something like:
stdbuf -oL nginx 2>&1 |
grep -F --line-buffered \
"a client request body is buffered to a temporary file" | {
while read -r -a line; do
    # the temporary-file path is the last field of the log line
    cp "${line[${#line[@]}-1]}" /dest/path
done
}
Although you might find that this is too slow and the file is gone before you can copy it.
A better solution might be to use inotify. inotifywait, as mentioned by cnicutar, would work. You could try the following:
while true
do
file=$(inotifywait -e create --format %f -r /var/lib/nginx/body/)
cp "/var/lib/nginx/body/$file" "/dest/path/$file"
done
If you don't get what you are looking for (eg if the files are copied before all the data is written), you could experiment with different events instead of create, maybe close, close_write or modify.

Finding and Removing Unused Files Through Command Line

My website's file structure has gotten very messy over the years from uploading random files to test different things out. I have a list of all my files, such as this:
file1.html
another.html
otherstuff.php
cool.jpg
whatsthisdo.js
hmmmm.js
Is there any way I can input my list of files via the command line, search the contents of all the other files on my website, and output a list of the files that aren't mentioned anywhere in those other files?
For example, if cool.jpg and hmmmm.js weren't mentioned in any of my other files then it could output them in a list like this:
cool.jpg
hmmmm.js
And then any of those other files mentioned above aren't listed because they are mentioned somewhere in another file. Note: I don't want it to just automatically delete the unused files, I'll do that manually.
Also, of course I have multiple folders so it will need to search recursively from my current location and output all the unused (unreferenced) files.
I'm thinking the command line would be the fastest/easiest way, unless someone knows of another. Thanks in advance for any help you guys can give!
Yep! This is pretty easy to do with grep. In this case, you would run a command like:
$ for orphan in $(cat orphans.txt); do
    echo "Checking for presence of ${orphan} in the current directory..."
    grep -rl "$orphan" .
done
And orphans.txt would look like your list of files above, one file per line. You can add -i to the grep above if you want to match case-insensitively. And you would want to run that command in /var/www or wherever your distribution keeps its webroot. If you see a "Checking for..." line with no matches listed below it, you haven't got any other files referencing that name.
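If the output should be just the unreferenced names, as in the question, rather than the matches themselves, a small variant sketch (GNU grep assumed for --exclude; run it from the webroot with the same orphans.txt list):
while read -r f; do
    # print the name only when no other file mentions it
    grep -rq --exclude=orphans.txt -- "$f" . || echo "$f"
done < orphans.txt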
