I am trying to tail dynamically created files in bash using the command
tail -f /data/logs*.log
But it's not tailing files created at runtime.
For example, if there are already two files, logs1.log and logs2.log, and logs3.log is created later at runtime, logs3.log is not tailed.
What is the way to tail such dynamically created files?
This does not work because the bash wildcard * is expanded only once. It produces the list of files that exist at that moment, and that list is never updated afterwards. So your whole line tail -f /data/logs*.log is replaced by something like tail -f /data/logs1.log /data/logs2.log, and that command is then executed. In general, think of wildcard expansion as preprocessing that happens before the command is executed.
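You can see the one-time expansion for yourself by putting echo in front of the command (using the example files from the question):
# The shell expands the glob before tail ever runs; echo makes that visible.
echo tail -f /data/logs*.log
# -> tail -f /data/logs1.log /data/logs2.log    (only the files that exist right now)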
What you want needs a bit more effort. Your command already works for the files that already exist, which is good so far, but you need more. So you must send your tail command into the background by adding a &. Try it:
tail -f /data/logs*.log &
sleep 2s
echo something more
But instead of writing "something more" you want to listen for new files and also tail -f them. How to do this you can find here: https://unix.stackexchange.com/questions/24952/script-to-monitor-folder-for-new-files
Over time you will accumulate more and more processes. Assuming that you normally have only a few new files, this won't be a problem. But if you have hundreds or thousands of new files, you will have to spend more effort on your solution.
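As a rough sketch (assuming inotify-tools is installed and the logs live directly in /data, as in the question), the two pieces could be combined like this, starting one extra tail per newly created log file:
#!/bin/bash
# Sketch only: tail the existing logs, then tail each new log file as it appears.
tail -f /data/logs*.log &

inotifywait -m -q -e create --format '%f' /data |
while read -r name; do
    case $name in
        logs*.log) tail -f "/data/$name" & ;;
    esac
done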
You can try something like this, but it has some issues (see below):
#!/bin/bash
pid=
dir="$1"
handle_sigint() {
    kill $pid 2>/dev/null
    exit
}
trap handle_sigint SIGINT SIGTERM

while true; do
    tail -n1 -f "$dir"/*.log &
    pid=$!
    inotifywait -q -e create "$dir"
    kill $pid 2>/dev/null
done
Run it by giving the wanted directory as the first parameter.
Sadly, even if you remove the -n1 argument in the tail command, you may miss some log lines, notably if the new files have many lines written directly on creation.
So I'm trying to get a simple bash script to continuously read a directory and update a list of files to play through a command. However, I'm having some trouble thinking out the logic. What I need to do is put the current items in the directory into the list, run each item in the directory through a program, and when a new item comes in, just append it to the list. I'm attempting to use inotifywait but can't seem to work out the proper logic. I may need it to run in the background, because the program that processes these files runs before inotifywait is read again, at which point it will not pick up any files added in the meantime, since it only checks when it runs. Here's the code, so hopefully it makes more sense.
#!/bin/bash
#Initial check to see if files are converted.
if [ ! -d "/home/pi/rpitx/converted" ]; then
echo "Converted directory does not exist, cannot play!"
exit 1
fi
CYAN='\e[36m'
NC='\e[39m'
LGREEN='\e[92m'
#iterate through directory first and act upon each item
for f in $FILES
do
    echo -e "${CYAN}Now playing ${f##*/}...${NC}"
    #Figure out a way to always watch directory even when it is playing
    inotifywait -m /home/pi/rpitx/converted -e create -e moved_to |
    while read path action file; do
        echo -e "${LGREEN}New file found: ${CYAN}${file}${NC}"
        FILES+=($file)
    done
    # take action on each file. $f stores the current file name
    sudo ./rpitx -m RF -i "${f}" -f 101100
done
exit 0
So, for example, if rpitx is currently playing something and a file is converted in the meantime, it won't pick up the latest file and add it to the list; and it never gets that far anyway, since the inotifywait loop never stops reading. Is there a way to get inotifywait to run in the background of this script somehow? Thanks.
This is actually quite a difficult problem to get 100% perfect, but it is possible to get pretty close.
It is easy to get all the files in a directory, and it is easy to use inotifywait to get iteratively informed of new files being placed into the directory. The issue is getting the two to be consistent. If inotifywait isn't started until all the files have been processed (or even just listed), then you might miss new files created between the listing and the invocation of inotifywait. If, on the other hand, you start inotifywait first, then a file created between the invocation of inotifywait and the extraction of the current file list will be listed twice.
Since it is easier to filter duplicates than notice orphans, the recommended approach is the second one.
As a first approximation, we could ignore the duplicate problem on the assumption that the window of vulnerability is pretty short, so duplicates are unlikely to happen. This simplifies the code, but it's not that difficult to track and eliminate duplicates: we could, for example, store each filename as a key in an associative array, ignoring the file if the key already exists.
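As a hypothetical sketch of that filter (maybe_handle is a made-up wrapper around the handle function defined further down):
declare -A seen
maybe_handle() {
    local action=$1 file=$2
    if [[ -n ${seen[$file]} ]]; then
        return                # already processed this filename; skip the duplicate
    fi
    seen[$file]=1
    handle "$action" "$file"
}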
We need three processes: one to execute inotifywait; one to produce the list of initial files; and one to handle each file as it is identified. So the basic structure of the code will be:
list_new_files |
{ list_existing_files; pass_through; } |
while read -r action file; do
    handle "$action" "$file"
done
Note that the second process first produces the existing files, and then calls pass_through, which reads from standard input and writes to standard output, thus passing through the files being discovered by list_new_files. Since pipes have a finite capacity, it is possible that the execution of list_existing_files will block a few times (if there are lots of existing files and handling them takes a long time), so when pass_through finally gets executed, it could have quite a bit of queued-up input to pass through. That doesn't matter, unless the first pipe also fills up, which will happen if a large number of new files are created. And that still won't matter as long as inotifywait doesn't lose notifications while it is blocked on a write. (This may actually be a problem, since the manpage for inotifywait on my system includes in the "BUGS" section the note, "It is assumed the inotify event queue will never overflow." We could fix the problem by inserting another process which carefully buffers inotifywait's output, but that shouldn't be necessary unless you intend to flood the directory with lots of files.)
Now, let's examine each of the functions in turn.
list_new_files could be just the call to inotifywait from your original script:
inotifywait -m /home/pi/rpitx/converted -e create -e moved_to
Listing existing files is also easy. Here's one simple solution:
printf "%s\n" /home/pi/rpitx/converted/*
However, that will print out the full file path, which is different from the output of inotifywait. To make them the same, we cd into the directory before doing the listing. Since we might not actually want to change the working directory, we use a subshell by putting the commands inside parentheses:
( cd /home/pi/rpitx/converted; printf "%s\n" *; )
The printf just prints its arguments each on a separate line. Since glob-expansions are not word-split or recursively glob-expanded, this is safe against whitespace or metacharacters in filenames, except newline characters. Filenames with newline characters are pretty rare; for now, I'll ignore the issue but I'll indicate how to handle it at the end.
Even with the change indicated above, the output from these two commands is not compatible: the first one outputs three things on each line (directory, action, filename), and the second one just one thing (the filename). In the listing below, you'll see how we modify the format to printf and introduce a format for inotifywait in order to make the outputs fully compatible, with the "action" for existing files set to EXISTING.
pass_through could, in theory, just be cat, and that's how I've coded it below. However, it is important that it operate in line-buffered mode; otherwise, nothing will happen until "enough" files have been written by list_existing_files. On my system, cat in this configuration works perfectly; if that doesn't work for you or you don't want to count on it, you could write it explicitly as a while read loop:
pass_through() {
while read -r line; do echo "$line"; done
}
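If you would rather not rely on cat's buffering behaviour, one alternative (assuming GNU coreutils, which provides stdbuf) is to force line buffering explicitly:
pass_through() {
    # stdbuf -oL makes cat's output line-buffered, so each filename is
    # forwarded downstream as soon as it has been read.
    stdbuf -oL cat
}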
Finally, handle is essentially the code from the original post, but modified a bit to take the new format into account, and to do the right thing with action EXISTING.
# Colours. Note the use of `$'...'` to actually store the code,
# thereby avoiding the need to later reinterpret backslash sequences
CYAN=$'\e[36m'
NC=$'\e[39m'
LGREEN=$'\e[92m'
converted=/home/pi/rpitx/converted
list_new_files() {
    inotifywait -m "$converted" -e create -e moved_to --format "%e %f"
}

# Note the use of ( ) around the body instead of { }.
# This runs the body in a subshell, which makes the `cd` local to the function.
list_existing_files() (
    cd "$converted"
    printf "EXISTING %s\n" *
)

# Invoked as `handle action filename`
handle() {
    case "$1" in
        EXISTING)
            echo "${CYAN}Now playing ${2}...${NC}"
            ;;
        *)
            echo "${LGREEN}New file found: ${CYAN}${2}${NC}"
            ;;
    esac
    sudo ./rpitx -m RF -i "$converted/$2" -f 101100
}
# Put everything together
list_new_files |
{ list_existing_files; cat; } |
while read -r action file; do handle "$action" "$file"; done
What if we thought a filename might have a newline character in it? There are two "safe" characters which could be used to delimit the filenames, in the sense that they cannot appear inside a filename. One is /, which can obviously appear in a path, but cannot appear in a simple filename, which is what we're working with here. The other one is the NUL character, which cannot appear inside a filename at all, but can sometimes be a bit annoying to deal with.
Normally, faced with this problem, we would use a NUL, but that depends on the various utilities we're using allowing the separation of data with NUL instead of newline. That's not the case for inotifywait, which always outputs a newline after a notification line. So in this case it seems simpler to use a /. First we modify the formats:
inotifywait -m "$converted" -e create -e moved_to --format "%e %f/"
printf "%s/\n" *
Now, when we're reading the lines, we need to read until we find a line ending with / (and remember to remove it). read doesn't allow two-character line terminators, so we need to accumulate the lines ourselves:
while read -r action file; do
    # If file doesn't end with a slash, we need to read another line
    while [[ $file != */ ]] && read -r line; do
        file+=$'\n'"$line"
    done
    # Remember to remove the trailing slash
    handle "$action" "${file%/}"
done
I'm having a hard time understanding some sort of anomaly with grep's return value.
As noted in the grep man page, the return value is zero in case of a match and non-zero in case of no match, an error, etc.
In this code: (bash)
inotifywait -m ./logdir -e create -e moved_to |
while read path action file; do
    if grep -a -q "String to match" "$path/$file"; then
        # do something
    fi
done
It returns non-zero when matched.
In this code: (bash)
search_file()
{
    if grep -a -q "String to match" "$1"; then
        # do something
    fi
}
inotifywait -m ./logdir -e create -e moved_to |
while read path action file; do
    search_file "$path/$file"
done
It returns zero when matched.
Can someone explain to me what is going on?
EDIT:
Let me be clear once more: if I run the first code on a file that contains the string, the if statement runs. If I run the second code on the same file, the if statement fails and does not run.
I second @John1024's conjecture, which he wrote as a comment.
The "anomaly" is likely due to a slight timing difference between the two versions of your script. In the case of a create event the file is initially empty, so grep starts scanning a partially written file. Calling grep through a function introduces a small delay, which increases the chance that the searched-for data has appeared in the file by the time grep opens it.
The solution to this race condition depends on a couple of assumptions/requirements:
Can you assume that pre-existing files in the watched directory will not be modified?
Do you want to identify every new matching file as soon as possible, or can you afford to delay processing it until it is closed?
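If waiting until the file is closed is acceptable, one possible approach (a sketch, not a drop-in replacement for your script) is to trigger on close_write instead of create, so grep only ever sees complete files:
inotifywait -m -q -e close_write --format '%w%f' ./logdir |
while read -r path; do
    if grep -a -q "String to match" "$path"; then
        : # do something
    fi
done
Note that this will not report files that already existed and are never written to again.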
Good day,
I am writing a relatively simple Bash script that performs an svn up, captures the console output, then does some post-processing on the text.
For example:
#!/bin/bash
# A script to alter SVN logs a bit
# Update and get output
echo "Waiting for update command to complete..."
TEST_TEXT=$(svn up --set-depth infinity)
echo "Done"
# Count number of lines in output and report it
NUM_LINES=$(echo "$TEST_TEXT" | grep -c '.*')
echo "Number of lines in output log: $NUM_LINES"
# Print out only lines containing Makefile
echo "$TEST_TEXT" | grep Makefile
This works as expected (ie: as commented in the code above), but I am concerned about what would happen if I ran this on a very large repository. Is there a limit on the maximum buffer size BASH can use to hold the output of a console command?
I have looked for similar questions, but nothing quite like what I'm searching for. I've read up on how certain scripts need to use xargs when dealing with large intermediate buffers, and I'm wondering if something similar applies here with respect to capturing console output.
eg:
# Might fail if we have a LOT of results ("argument list too long")
rm $(find . -iname '*.cpp')
# Shouldn't fail, regardless of number of results
find . -iname '*.cpp' | xargs rm
Thank you.
Using
var=$(hexdump /dev/urandom | tee out)
bash didn't complain; I killed it when the variable was a bit over 1 GB and about 23.5 million lines. You don't need to worry as long as your output fits in your system's memory.
I see no reason not to use a temporary file here.
tmp_file=$(mktemp XXXXX)
svn up --set-depth=infinity > "$tmp_file"
echo "Done"
# Count number of lines in output and report it
NUM_LINES=$(wc -l < "$tmp_file")
echo "Number of lines in output log: $NUM_LINES"
# Print out only lines containing Makefile
grep Makefile "$tmp_file"
rm "$tmp_file"
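If you are worried about the temporary file being left behind when the script is interrupted, a small variant of the same idea (just a sketch) uses a trap for cleanup:
tmp_file=$(mktemp) || exit 1
trap 'rm -f "$tmp_file"' EXIT    # remove the file however the script exits

svn up --set-depth=infinity > "$tmp_file"
echo "Done"
NUM_LINES=$(wc -l < "$tmp_file")
echo "Number of lines in output log: $NUM_LINES"
grep Makefile "$tmp_file"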
I've been handed a project that consists of several dozen (probably over 100, I haven't counted) bash scripts. Most of the scripts make at least one call to another one of the scripts. I'd like to get the equivalent of a call graph where the nodes are the scripts instead of functions.
Is there any existing software to do this?
If not, does anybody have clever ideas for how to do this?
Best plan I could come up with was to enumerate the scripts and check whether the basenames are unique (they span multiple directories). If there are duplicate basenames, then cry, because the script paths are usually held in variables, so you may not be able to disambiguate. If they are unique, then grep for the names in the scripts and use those results to build up a graph. Use some tool (suggestions?) to visualize the graph.
Suggestions?
Wrap the shell itself with your own implementation: log who called your wrapper, then exec the original shell.
Yes, you have to actually run the scripts in order to identify which scripts are really used. Otherwise you would need a tool with the same knowledge as the shell itself to handle all the variable expansion, PATH lookups, etc. -- I have never heard of such a tool.
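A very rough sketch of such a wrapper (all paths are placeholders; it assumes the real shell has been moved to /bin/bash.real and this script installed in its place on a disposable test system, since shebang lines bypass a PATH-based wrapper):
#!/bin/bash.real
# Hypothetical wrapper: log the caller and the requested script, then hand
# control to the real shell so behaviour is otherwise unchanged.
logfile=/tmp/script_calls.log
caller=$(ps -o args= -p "$PPID")    # command line of the calling process
printf '%s\t%s\t%s\n' "$(date '+%F %T')" "$caller" "$*" >> "$logfile"
exec /bin/bash.real "$@"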
In order to visualize the call graph, use GraphViz's dot format.
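For instance, a hand-written dot file with made-up script names could be rendered like this:
# Write a tiny call graph in dot format and render it with GraphViz.
cat > calls.dot <<'EOF'
digraph calls {
    "deploy.sh" -> "build.sh";
    "deploy.sh" -> "upload.sh";
    "build.sh"  -> "common.sh";
}
EOF
dot -Tpng calls.dot -o calls.png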
Here's how I wound up doing it (disclaimer: a lot of this is hack-ish, so you may want to clean it up if you're going to use it long-term)...
Assumptions:
- Current directory contains all scripts/binaries in question.
- Files for building the graph go in subdir call_graph.
Created the script call_graph/make_tgf.sh:
#!/bin/bash
# Run from dir with scripts and subdir call_graph
# Parameters:
# $1 = sources (default is call_graph/sources.txt)
# $2 = targets (default is call_graph/targets.txt)
SOURCES=$1
if [ "$SOURCES" == "" ]; then SOURCES=call_graph/sources.txt; fi
TARGETS=$2
if [ "$TARGETS" == "" ]; then TARGETS=call_graph/targets.txt; fi
if [ ! -d call_graph ]; then echo "Run from parent dir of call_graph" >&2; exit 1; fi
(
# cat call_graph/targets.txt
for file in `cat $SOURCES`
do
    for target in `grep -v -E '^ *#' $file | grep -o -F -w -f $TARGETS | grep -v -w $file | sort | uniq`
    do echo $file $target
    done
done
)
Then, I ran the following (I wound up doing the scripts-only version):
cat /dev/null | tee call_graph/sources.txt > call_graph/targets.txt
for file in *
do
    if [ -d "$file" ]; then continue; fi
    echo $file >> call_graph/targets.txt
    if file $file | grep text >/dev/null; then echo $file >> call_graph/sources.txt; fi
done
# For scripts only:
bash call_graph/make_tgf.sh call_graph/sources.txt call_graph/sources.txt > call_graph/scripts.tgf
# For scripts + binaries (binaries will be leaf nodes):
bash call_graph/make_tgf.sh > call_graph/scripts_and_bin.tgf
I then opened the resulting tgf file in yEd, and had yEd do the layout (Layout -> Hierarchical). I saved as graphml to separate the manually-editable file from the automatically-generated one.
I found that there were certain nodes that were not helpful to have in the graph, such as utility scripts/binaries that were called all over the place. So, I removed these from the sources/targets files and regenerated as necessary until I liked the node set.
Hope this helps somebody...
Insert a line at the beginning of each shell script, after the #! line, which logs a timestamp, the full pathname of the script, and the argument list.
Over time, you can mine this log to identify likely candidates, i.e. two lines logged very close together have a high probability of the first script calling the second.
This also allows you to focus on the scripts which are still actually in use.
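The inserted line could look something like this (the log path and format are just examples):
# Added right after the shebang: timestamp, full script path, and argument list.
echo "$(date '+%F %T') $(readlink -f "$0") $*" >> /tmp/script-call.log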
You could use an ed script
1a
log blah blah blah
.
wq
and run it like so:
find / -perm +x -exec ed {} \; <edscript
Make sure you test the find command with -print instead of the -exec clause first. And / is probably not the path that you want to use. If you have to include bin directories then you will probably need to switch to grep in order to identify the pathnames to include; then, when you have a file full of the right names, use xargs instead of find to run the script.
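Since ed itself reads the edit script from standard input, it is safest to redirect edscript separately for each file; a sketch, assuming the chosen pathnames are listed one per line in a hypothetical scripts.txt:
while IFS= read -r f; do
    ed -s "$f" < edscript    # -s suppresses ed's byte-count output
done < scripts.txt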
I work in a research group and we use the PBS queuing system. I'm no PBS master, but I wanted to script a check for whether a job is running. To do this I first grab the details of all the jobs by using the result of a qstat call as the argument to qstat -f, and then I search that detailed listing for the submitted file path. The current kludge stands as follows:
dump=`qstat -f `qstat``
if grep -q \
"/${compounds[$i]}/D0_${j}_z_$((k*30))/scripts/jobscript_minim" \
<<<$dump; then
echo "Minimize is running!"
fi
Suggestions for improvement?
Also, I've been told that $() is cleaner than ``. But when I try:
dump="$(qstat -f "$(qstat)")"
...my program fails. Why is this? Am I misunderstanding how to nest shell calls with $()? Or is it something to do with how I'm passing the list of queue jobs from qstat to qstat -f? Should I be using awk or something to grab the jobs from the qstat output and then somehow pass them as arguments to qstat -f?
Also, should I be using recursive grep? Some people tell me it's "saner" but I'm not sure what that means. Is it more portable? Is it faster? Does it need fewer trips to the therapist?
What is the reason you should use it?
Alright... managed to come up with a clean solution...
search_dir="${compounds[${i}]}/D0_${j}_z_$[30*k]"
if [ ! -z "$(qstat -f $(qstat | grep -F jmick | awk '{print $1}')|\
grep -F "$search_dir"|head -n 1)" ]
then
...since the directory I'm searching for is kind of long, I assign it to a variable. I run the inner command substitution to get only the jobs with my user name, then run the outer command substitution to print full details on those jobs, and then grep through those details for my directory. In case it finds it early, I included a head to try to short-circuit the command.
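A possible further simplification (a sketch built on the same pipeline): grep -q exits as soon as it sees a match, so the head and the [ ! -z ... ] test can be dropped and the pipeline's exit status used directly:
search_dir="${compounds[${i}]}/D0_${j}_z_$((30*k))"
# The command substitution is left unquoted on purpose so that each job id
# becomes its own argument to qstat -f.
if qstat -f $(qstat | grep -F jmick | awk '{print $1}') |
    grep -q -F "$search_dir"
then
    echo "Minimize is running!"
fi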
The question of what's the point of recursive grep, though, still stands.
A recursive grep will search multiple files in all the subdirectories. Without using recursion it will search a file or files only in the current (or specified) directory. I can't see how one would be any "saner" than the other. They each have their particular applications.
By the way, you should really split your questions into specific issues rather than posting them together - even if they have something in common. This site works better when you do it that way.
Try without the quotes:
dump=$(qstat -f $(qstat))
dump=`qstat -f `qstat`` is equivalent to dump=$(qstat -f )qstat$() which is equivalent to dump="$(qstat -f)qstat".
qstat -f "$(qstat)" calls qstat with two arguments: the option -f, and the output from qstat lumped together as a single word. dump="$(qstat -f "$(qstat)")" sets dump to the output of the outer qstat command.
qstat -f $(qstat) calls qstat with the option -f plus zero or more further arguments, depending on the output from qstat: first the output of qstat is split into separate words at each whitespace sequence, then each word that looks like a glob pattern (i.e. contains *, ? or [) and matches at least one file is replaced by the list of matching file names. All of these words and file names become individual arguments to the outer qstat.
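A tiny illustration of that difference, with made-up data standing in for the qstat output:
out=$'job1.server\njob2.server'
set -- $out         # unquoted: split on whitespace -> 2 separate arguments
echo "$#"           # prints 2
set -- "$out"       # quoted: kept as a single word containing a newline -> 1 argument
echo "$#"           # prints 1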