I'm trying to determine where to cut off a log in order to shrink its size.
The log was started in 2010 and has been appended to by scripts that run daily since then. I'm grepping the log to pull out the lines that have dates in them, and then I want to grab the last 4 characters of those lines, since that is the year. Then I can determine on what line the year 2018 first appears, for example, and truncate the file above that.
I'm trying to use tail -c 4 to grab the last 4 characters of each line, but I keep getting a "cannot open input" error from tail.
Code:
#!/bin/bash
date=$(grep ' EST ' input.log)
IFS=$'\n'
for line in $date
do
printf "%s\n" "$line" > output.tmp
chmod 777 output.tmp
echo $(tail -c 4 output.tmp)
done
When I run this code with just "tail output.tmp", with no options, it works as expected and outputs the full line that is currently being iterated.
But when I try to use tail -c 4, that's when I get the "tail: cannot open input" error.
I have checked the man page for tail and the -c option is available, so what am I doing wrong? Or is there a better way to approach this besides using tail? (I do not have the grep -o option available on my system.)
You don't need a temp file:
#!/bin/bash
date=$(grep ' EST ' input.log)
IFS=$'\n'
for line in $date
do
echo "${line: -4}"
done
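To carry that through to the original goal, here is a sketch (it assumes the year really is the last four characters of every matching line; input.trimmed.log is just a placeholder name): grep -n gives you the line number of each match, and tail -n + keeps the file from that line on.
start=$(grep -n ' EST ' input.log | grep '2018$' | head -n 1 | cut -d: -f1)
tail -n "+$start" input.log > input.trimmed.log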
Related
I use tail -f to show the contents of a logfile.
What I want is when the logfile content changes, instead of appending the new lines to my screen, only the newly added lines should be shown on my screen.
So as if a clearscreen was made every time before printing the new lines.
I tried to find a solution by web search but couldn't find anything useful.
edit:
In my case it happens that several lines will be added at once (it is a php error logfile). So I am looking for a solution where more than the single last line can be shown on screen.
The watch command in combination with the tail command shows the last line of a log file, refreshing at an interval of 2 seconds by default. It doesn't refresh whenever a new line is appended to the log file, but since you can specify the interval it might help for your use case.
watch -t tail -1 <path_to_logfile>
If you need a faster interval, like every 0.5 seconds, you can specify it with the -n option, i.e.:
watch -t -n 0.5 tail -1 <path_to_logfile>
Try
$ watch 'tac FILE | grep -m1 -C2 PATTERN | tac'
where
PATTERN is any keyword (or regexp) to identify errors you seek in the log,
tac prints the lines in reverse,
-m is a max count of matching lines to grep,
-C is any number of lines of context (before and after the match) to show (optional).
That would be similar to
$ tail -f FILE | grep -C2 PATTERN
if you didn't mind just appending occurrences to the output in real-time.
But if you don't know any generic PATTERN to look for at all,
you'd have to just follow all the updates as the logfile grows:
$ tail -n0 -f FILE
Or even, create a copy of the logfile and then do a diff:
Copy: cp file.log{,.old}
Refresh the webpage with your .php code (or whatever, to trigger the error)
Run: diff file.log{,.old}
(or, if you prefer sort to diff: $ sort file.log{,.old} | uniq -u)
The curly braces are shorthand for both filenames (see Brace Expansion in $ man bash)
If you must avoid any temp copies, store the line count in memory:
z=$(grep -c ^ file.log)
Refresh the webpage to trigger an error
tail -n +$z file.log
The latter approach can be built upon, to create a custom scripting solution more suitable for your needs (check timestamps, clear screen, filter specific errors, etc). For example, to only show the lines that belong to the last error message in the log file updated in real-time:
$ clear; z=$(grep -c ^ FILE); while true; do d=$(date -r FILE); sleep 1; b=$(date -r FILE); if [ "$d" != "$b" ]; then clear; tail -n +$z FILE; z=$(grep -c ^ FILE); fi; done
where
FILE is, obviously, your log file name;
grep -c ^ FILE counts all lines in the file (almost, but not entirely, the same as cat FILE|wc -l, which only counts newline characters and so misses a final line with no trailing newline);
sleep 1 sets the pause/delay between checks of the file timestamp to 1 second, but you could change it to a floating-point number (the smaller the interval, the higher the CPU usage).
To simplify repeated invocations in future, you could save this compound command in a Bash script that takes the target logfile name as an argument, define a shell function, create an alias in your shell, or just reverse-search your bash history with CTRL+R. Hope it helps!
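For example, a minimal wrapper script built from that compound command (my sketch; I'm calling it follow-new.sh, which is just a made-up name) could take the logfile name as its first argument:
#!/bin/bash
# follow-new.sh LOGFILE -- clear the screen and show only the lines appended since the last change
file=${1:?usage: follow-new.sh LOGFILE}
clear
z=$(grep -c ^ "$file")
while true; do
    d=$(date -r "$file")
    sleep 1
    b=$(date -r "$file")
    if [ "$d" != "$b" ]; then
        clear
        tail -n +"$z" "$file"
        z=$(grep -c ^ "$file")
    fi
done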
I keep text files with definitions in a folder. I like to convert them to spoken word so I can listen to them. I already do this manually by running a few commands to insert some pre-processing codes into the text files and then convert the text to spoken word like so:
sed 's/\..*$/[[slnc 2000]]/' input.txt inserts a control code after first period
sed 's/$/[[slnc 2000]]/' input.txt inserts a control code at the end of each line
cat input.txt | say -v Alex -o input.aiff
Instead of having to retype these each time, I would like to create a Bash script that pipes the output of these commands to the final product. I want to call the script with the script name, followed by an input file argument for the text file. I want to preserve the original text file so that if I open it again, none of the control codes are actually inserted, as the only purpose of the control codes is to insert pauses in the audio file.
I've tried writing
#!/bin/bash
FILE=$1
sed 's/$/ [[slnc 2000]]/' FILE -o FILE
But I get hung up immediately as it says sed: -o: No such file or directory. Can anyone help out?
If you just want to use foo.txt to generate foo.aiff with control characters, you can do:
#!/bin/sh
for file; do
test "${file%.txt}" = "${file}" && continue
sed -e 's/\..*$/[[slnc 2000]]/' "$file" |
sed -e 's/$/[[slnc 2000]]/' |
say -v Alex -o "${file%.txt}".aiff
done
Call the script with your .txt files as arguments (eg, ./myscript *.txt) and it will generate the .aiff files. Be warned, if say overwrites files, then this will as well. You don't really need two sed invocations, and the sed that you're calling can be cleaned up, but I don't want to distract from the core issue here, so I'm leaving that as you have it.
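For reference, if you do later want to fold the two sed calls into one: sed accepts several -e expressions, applied in order to each line, so the pipeline above could become a single invocation with the same behaviour:
sed -e 's/\..*$/[[slnc 2000]]/' -e 's/$/[[slnc 2000]]/' "$file" |
say -v Alex -o "${file%.txt}".aiff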
This will:
a} Make a list of your text files to process in the current directory, with find.
b} Apply your sed commands to each text file in the list, but only for the current use, allowing you to preserve them intact.
c} Call "say" with the edited files.
I don't have say, so I can't test that or the control codes; but as long as you have Ed, the loop works. I've used it many times. I learned it as a result of exposure to FORTH, which is a language that still permits unterminated loops. I used to have problems with remembering to invoke next at the end of the script in order to start it, but I got over that by defining my words (functions) first, in FORTH style, and then always placing my single-use commands at the end.
#!/bin/sh
next() {
[ -s stack ] && main
end
}
main() {
line=$(ed -s stack < edprint+.txt)
infile=$(cat "${line}" | sed 's/\..*$/[[slnc 2000]]/' | sed 's/$/[[slnc 2000]]/')
say "${infile}" -v Alex -o input.aiff
ed -s stack < edpop+.txt
next
}
end() {
rm -v ./stack
rm -v ./edprint+.txt
rm -v ./edpop+.txt
exit 0
}
find *.txt -type f > stack
cat >> edprint+.txt << EOF
1
q
EOF
cat >> edpop+.txt << EOF
1d
wq
EOF
next
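One caveat worth flagging (an assumption on my part, since I also can't test say): every pass writes to the same input.aiff, so with several .txt files on the stack only the last conversion survives. A per-file output name would avoid that, e.g.:
say "${infile}" -v Alex -o "${line%.txt}.aiff"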
I am looking for a bash snippet for limiting the amount of console output from a shell command that could potentially become too verbose.
The purpose of this is for usage in build/CI environments where you want to limit the amount of console output in order to prevent overloading the CI server (or a client tailing the output).
Full requirements:
display only up to 100 lines from the top (head) of the command output
display only up to 100 lines from the bottom (tail) of the command output
archive both stdout and stderr in full into a command.log.gz file
console output must be displayed relatively in realtime; a solution that outputs the result only at the end is not acceptable, as we need to be able to see execution progress.
Current findings
unbuffer could be used to force the stdout/stderr to be unbuffered
|& tee can be used to send output to both archiver and tail/head
|& gzip --stdout >command.log.gz could archive the console output
head -n100 and tail -n100 can be used to limit the console output, but they introduce at least some problems, like undesired results if the number of output lines is under 200.
From what I understand, you need to limit the output online (while it's being generated).
Here is a function that I can think of that would be useful for you.
limit_output() {
FullLogFile="./output.log" # Log file to keep the input content
typeset -i MAX=15 # number or lines from head, from tail
typeset -i LINES=0 # number of lines displayed
# tee will save the copy of the input into a log file
tee "$FullLogFile" | {
# The pipe will cause this part to be executed in a subshell
# The braces keep the while loop and the if below in the same subshell, so LINES keeps its value
while read -r Line; do
if [[ $LINES -lt $MAX ]]; then
LINES=LINES+1
echo "$Line" # Display first few lines on screen
elif [[ $LINES -lt $(($MAX*2)) ]]; then
LINES=LINES+1 # Count the lines for a little longer
echo -n "." # Reduce line output to single dot
else
echo -n "." # Reduce line output to single dot
fi
done
echo "" # Finish with the dots
# Tail the last few lines, not already shown by the head part and not more than MAX
if [[ $LINES -gt $MAX ]]; then
tail -n $(($LINES-$MAX)) "$FullLogFile"
fi
}
}
Use it in a script, load it into the current shell, or put it in .bash_profile to be loaded on user login.
Usage examples: cat /var/log/messages | limit_output or ./configure | limit_output
The function will read the standard input, save it to a log file, display the first MAX lines, then reduce each following line to a single dot (.) on screen, and finally display the last MAX lines (or fewer if the output was shorter than MAX*2).
Here is my current incomplete solution which, for convenience, demonstrates processing a 10-line output; it will (hopefully) limit the output to the first two lines and the last two lines.
#!/bin/bash
seq 10 | tee >(gzip --stdout >output.log.gz) | tail -n2
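One way the missing head half could be bolted onto that pipeline is to let a small awk program emulate head and tail in a single pass (a sketch, untested in a CI setting; the first N lines are still printed as soon as they arrive, and the last N are held in a rolling buffer until the end):
seq 10 | tee >(gzip --stdout >output.log.gz) | awk -v N=2 '
    NR <= N { print; next }        # pass the first N lines through immediately
    { buf[NR % N] = $0 }           # rolling buffer of the last N lines
    END { for (i = NR - N + 1; i <= NR; i++) if (i > N) print buf[i % N] }'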
One way I use to achieve this is:
./configure | tee output.log | head -n 5; tail -n 2 output.log
What this does is:
Write the complete output to a file called output.log using tee
Only print the first 5 lines using head -n
In the end print the last two lines from the written output.log using tail -n
I have the following output in a text file:
106 pages in list
.bookmarks
20130516 - Daily Meeting Minutes
20130517 - Daily Meeting Minutes
20130520 - Daily Meeting Minutes
20130521 - Daily Meeting Minutes
I'm looking to remove the first 2 lines from my output. The particular shell script that I execute always produces those first 2 lines.
This is how I generated and read the file:
#Lists
PGLIST="$STAGE/pglist.lst";
RUNSCRIPT="$STAGE/runPagesToMove.sh";
#Get List of pages
$ATL_BASE/confluence.sh $CMD_PGLIST $CMD_SPACE "$1" > "$PGLIST";
# BUILD executeable script
echo "#!/bin/bash" >> $RUNSCRIPT 2>&1
IFS=''
while read line
do
echo "$ATL_BASE/conflunce.sh $CMD_MVPAGE $CMD_SPACE "$1" --title \"$line\" --newSpace \"$2\" --parent \"$3\"" >> $RUNSCRIPT 2>&1
done < $PGLIST
How do I remove those top 2 lines?
You can achieve this with tail:
tail -n +3 "$PGLIST"
-n, --lines=K
output the last K lines, instead of the last 10; or use -n +K
to output starting with the Kth
The classic answer would use sed to delete lines 1 and 2:
sed 1,2d "$PGLIST"
awk way:
awk 'NR>2' "$PGLIST"
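Any of these can also be applied where the list is generated, so the header never reaches $PGLIST in the first place; for example, a sketch reusing the asker's own generation line:
$ATL_BASE/confluence.sh $CMD_PGLIST $CMD_SPACE "$1" | tail -n +3 > "$PGLIST";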
I have to fetch one specific line out of a big file (1,500,000 lines), multiple times in a loop over multiple files, and I was asking myself what would be the best option (in terms of performance).
There are many ways to do this; I mainly use these two:
cat ${file} | head -1
or
cat ${file} | sed -n '1p'
I could not find an answer to this: do they both fetch only the first line, or does one of the two (or both) first read the whole file and then fetch line 1?
Drop the useless use of cat and do:
$ sed -n '1{p;q}' file
This will quit the sed script after the line has been printed.
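The same pattern generalizes to any line number; with a shell variable n holding it (my naming), the following prints just that one line and stops reading right after it:
$ sed -n "${n}{p;q}" file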
Benchmarking script:
#!/bin/bash
TIMEFORMAT='%3R'
n=25
heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q}' file" 'read line < file && echo $line')
# files up to a hundred million lines (if you're on a slow machine, decrease!!)
for (( j=1; j<=100000000; j=j*10 ))
do
echo "Lines in file: $j"
# create file containing j lines
seq 1 $j > file
# initial read of file
cat file > /dev/null
for comm in {0..3}
do
avg=0
echo
echo ${heading[$comm]}
for (( i=1; i<=$n; i++ ))
do
case $comm in
0)
t=$( { time head -1 file > /dev/null; } 2>&1);;
1)
t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
2)
t=$( { time sed -n '1{p;q}' file > /dev/null; } 2>&1);;
3)
t=$( { time read line < file && echo $line > /dev/null; } 2>&1);;
esac
avg=$avg+$t
done
echo "scale=3;($avg)/$n" | bc
done
done
Just save as benchmark.sh and run bash benchmark.sh.
Results:
head -1 file
.001
sed -n 1p file
.048
sed -n '1{p;q}' file
.002
read line < file && echo $line
0
Results from a file with 1,000,000 lines.
So the times for sed -n 1p will grow linearly with the length of the file, but the timings for the other variations will be constant (and negligible) as they all quit after reading the first line.
Note: timings are different from original post due to being on a faster Linux box.
If you are really just getting the very first line and reading hundreds of files, then consider shell builtins instead of external commands; use read, which is a shell builtin in bash and ksh. This eliminates the overhead of process creation with awk, sed, head, etc.
The other issue is doing timed performance analysis on I/O. The first time you open and then read a file, the file data is probably not cached in memory. However, if you try a second command on the same file, the data as well as the inode have been cached, so the timed results may be faster, pretty much regardless of the command you use. Plus, inodes can stay cached practically forever. They do on Solaris, for example. Or anyway, several days.
For example, linux caches everything and the kitchen sink, which is a good performance attribute. But it makes benchmarking problematic if you are not aware of the issue.
All of this caching effect "interference" is both OS and hardware dependent.
So - pick one file, read it with a command. Now it is cached. Run the same test command several dozen times, this is sampling the effect of the command and child process creation, not your I/O hardware.
This is sed vs read for 10 iterations of getting the first line of the same file, after reading the file once:
sed: sed '1{p;q}' uopgenl20121216.lis
real 0m0.917s
user 0m0.258s
sys 0m0.492s
read: read foo < uopgenl20121216.lis ; export foo; echo "$foo"
real 0m0.017s
user 0m0.000s
sys 0m0.015s
This is clearly contrived, but does show the difference between builtin performance vs using a command.
If you want to print only 1 line (say the 20th one) from a large file you could also do:
head -20 filename | tail -1
I did a "basic" test with bash and it seems to perform better than the sed -n '1{p;q} solution above.
The test takes a large file and prints a line from somewhere in the middle (at line 10000000), repeating 100 times and selecting the next line each time. So it selects lines 10000000, 10000001, 10000002, ... and so on up to 10000099.
$wc -l english
36374448 english
$time for i in {0..99}; do j=$((i+10000000)); sed -n $j'{p;q}' english >/dev/null; done;
real 1m27.207s
user 1m20.712s
sys 0m6.284s
vs.
$time for i in {0..99}; do j=$((i+10000000)); head -$j english | tail -1 >/dev/null; done;
real 1m3.796s
user 0m59.356s
sys 0m32.376s
For printing a line out of multiple files
$wc -l english*
36374448 english
17797377 english.1024MB
3461885 english.200MB
57633710 total
$time for i in english*; do sed -n '10000000{p;q}' $i >/dev/null; done;
real 0m2.059s
user 0m1.904s
sys 0m0.144s
$time for i in english*; do head -10000000 $i | tail -1 >/dev/null; done;
real 0m1.535s
user 0m1.420s
sys 0m0.788s
How about avoiding pipes?
Both sed and head support the filename as an argument. In this way you avoid passing through cat. I didn't measure it, but head should be faster on larger files as it stops the computation after N lines (whereas sed goes through all of them, even if it doesn't print them - unless you specify the quit option as suggested above).
Examples:
sed -n '1{p;q}' /path/to/file
head -n 1 /path/to/file
Again, I didn't test the efficiency.
I have done extensive testing, and found that, if you want every line of a file:
while IFS=$'\n' read LINE; do
echo "$LINE"
done < your_input.txt
is much, much faster than any other (Bash-based) method out there. All the other methods (like sed) read the file each time, at least up to the matching line. If the file is 4 lines long, you will get: 1 -> 1,2 -> 1,2,3 -> 1,2,3,4 = 10 reads, whereas the while loop just maintains a position cursor (based on IFS), so it only does 4 reads in total.
On a file with ~15k lines, the difference is phenomenal: ~25-28 seconds (sed-based, extracting a specific line each time) versus ~0-1 second (while...read-based, reading through the file once).
The above example also shows how to set IFS in a better way to newline (with thanks to Peter from the comments below), and this will hopefully fix some of the other issues seen when using while ... read ... in Bash at times.
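If you want that builtin-only speed but for one specific line rather than all of them, the same loop can be cut short as soon as the target line is reached. A sketch (nth_line is just a name I made up):
nth_line() {
    # Print line $2 of file $1 using only bash builtins, stopping as soon as it is found
    local n=$1 file=$2 i=0 line
    while IFS= read -r line; do
        i=$((i+1))
        if [ "$i" -eq "$n" ]; then
            printf '%s\n' "$line"
            return 0
        fi
    done < "$file"
    return 1   # fewer than n lines in the file
}
# usage: nth_line 1 your_input.txt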
For the sake of completeness you can also use the basic linux command cut:
cut -d $'\n' -f <linenumber> <filename>