I have the following output in a text file:
106 pages in list
.bookmarks
20130516 - Daily Meeting Minutes
20130517 - Daily Meeting Minutes
20130520 - Daily Meeting Minutes
20130521 - Daily Meeting Minutes
I'm looking to remove the first 2 lines from my output. This particular shell script that I use to execute, always has those first 2 lines.
This is how I generated and read the file:
#Lists
PGLIST="$STAGE/pglist.lst";
RUNSCRIPT="$STAGE/runPagesToMove.sh";
#Get List of pages
$ATL_BASE/confluence.sh $CMD_PGLIST $CMD_SPACE "$1" > "$PGLIST";
# BUILD executeable script
echo "#!/bin/bash" >> $RUNSCRIPT 2>&1
IFS=''
while read line
do
echo "$ATL_BASE/conflunce.sh $CMD_MVPAGE $CMD_SPACE "$1" --title \"$line\" --newSpace \"$2\" --parent \"$3\"" >> $RUNSCRIPT 2>&1
done < $PGLIST
How do I remove those top 2 lines?
You can achieve this with tail:
tail -n +3 "$PGLIST"
-n, --lines=K
output the last K lines, instead of the last 10; or use -n +K
to output starting with the Kth
The classic answer would use sed to delete lines 1 and 2:
sed 1,2d "$PGLIST"
awk way:
awk 'NR>2' "$PGLIST"
Related
I worked very little with scripts and I don't know..
I need to create a script (in Ubuntu) that copies only files where a certain user modified more than 20 lines at a given time.
I know that to copy a file elsewhere I use this code:
$ ls dir1/
dir2/
$ cp -r dir1/ dir1.copy
$ ls dir1.copy
dir2/
And to count lines: wc -l file1
But how could I check if a user has modified more than 20 lines in a file (eg a simple txt, for example today)?
Thank you in advance !
In the first place, if by "modifying lines in a file" you mean "adding lines to a file", then you can do something about it. If you are literally talking about modifying lines in files, there is nothing you can do to track that activity without setting up some version control first.
So, assuming we are talking about files in which your users will be adding lines, a workaround for that may consist of setting up some scheduled tasks to check the line numbers of those files "at a given time" (as you said) and compare that value to a previous result, and then if there are more than 20 additional lines than from the last value, copy the files elsewhere.
First things first, counting the lines of your files is something you have already mentioned and it is right: I will propose using wc -l too.
Once here you will need two things: one place (tipically a file) to periodically save the number of lines of your files at a given time and one trigger that would start copying the files in case there have been more than 20 lines added.
So for example, in this case you can set up a cron job, like this (i.e. to run every hour):
0 */1 * * * cat ${FILE} |wc -l > /tmp/${FILE}_counter
That one will check the number of lines of a given file and send the output to a temporary file that we will be using soon. In case you have multiple files you can easily script that and make a loop, like this:
#!/bin/bash
for FILE in file1 file2 file3; do
cat ${FILE} |wc -l > /tmp/${FILE}_counter
done
Don't forget to add the path to the script in the cron job if you do that this way. After that, you will have something like this in your /tmp directory:
/tmp/file1_counter
/tmp/file2_counter
/tmp/file3_counter
...
At this point you only need a trigger, which can be another script, to compare the current number of lines of a file at a given time and start copying it elsewhere in case there are more than 20 additional lines than in the previous check. Consider this:
#!/bin/bash
LAST_VALUE=$(cat /tmp/${FILE}_counter)
CURRENT_VALUE=$(cat ${FILE} |wc -l)
if [ ${CURRENT_VALUE} -gt $(expr ${LAST_VALUE} + 20) ]; then
# Your cp stuff here
fi
Of course you can add a loop here too in case of handling multiple files:
#!/bin/bash
LAST_VALUE=$(cat /tmp/${FILE}_counter)
CURRENT_VALUE=$(cat ${FILE} |wc -l)
for FILE in file1 file2 file3; do
if [ ${CURRENT_VALUE} -gt $(expr ${LAST_VALUE} + 20) ]; then
# Your cp stuff here
fi
done
Then you only have to add this last script to a cron job too, and you should be done.
Hope you find this useful.
You can use diff to compare 2 files. With the -u0 option, it will show you the added/deleted/modified lines, prefixed with "+" or "-". You can then count lines starting with "+" or "-" with grep and it's -c option.
So for the number of lines added or modified, which begin with "+" :
diff -u0 $file_before $file_after | grep -c '^+'
and this will count the deleted or modified lines, which start with "-" :
diff -u0 $file_before $file_after | grep -c '^-'
Note that there are 2 header lines in this format, which also start with "+" and "-", so you may want to take that in account.
I'm trying to determine where to cut off a log in order to shrink its size.
The log was started in 2010 and has been appended to by scripts that run daily since then. I'm grepping each line of the log to pull out lines that have dates in them, and then I want to grab the last 4 characters of those lines as that represents the year. Then I can determine on what line the year 2018 first appears for example, and truncate the file above that.
I'm trying to use tail -c 4 to grab the last 4 characters of each line, but I keep getting "cannot open input" error from tail.
Code:
#!/bin/bash
date=$(grep ' EST ' input.log)
IFS=$'\n'
for line in $date
do
printf "%s\n" "$line" > output.tmp
chmod 777 output.tmp
echo $(tail -c 4 output.tmp)
done
When I run this code with just "tail output.tmp", with no options, it works as expected and outputs the full line that is currently being iterated.
But when I try to use tail -c 4, that's when I get the "tail: cannot open input" error.
I have checked the man page for tail and -c option is available, so what am I doing wrong? Or is there a better way to approach this besides using tail? (I do not have grep -o option available on my system).
You don't need a temp file:
#!/bin/bash
date=$(grep ' EST ' input.log)
IFS=$'\n'
for line in $date
do
echo ${line: -4}
done
I am looking for a bash snippet for limiting the amount of console output from a shell command that could potentially become too verbose.
The purpose of this is for usage in build/CI environments where you do want to limit the amount out console output in order to prevent overloading the CI server (or even client tailing the output).
Full requirements:
display only up to 100 lines from the top (head) of the command output
display only up to 100 lines from the bottom (tail) of the command output
archive both stdout and stderr in full into a command.log.gz file
console output must be displayed relatively in realtime, a solution that output the result at the end is not acceptable as we need to be able to see its execution progress.
Current findings
unbuffer could be used to force the stdout/stderr to be unbuffered
|& tee can be used to send output to both archiver and tail/head
|& gzip --stdout >command.log.gz could archive the console output
head -n100 and tail -n100 can be used to limit the console output they introduce at least some problems like undesired results if number of output lines is under 200.
From what I understand you need to do limit output online (while it's being generated).
Here is a function that I can think of that would be useful for you.
limit_output() {
FullLogFile="./output.log" # Log file to keep the input content
typeset -i MAX=15 # number or lines from head, from tail
typeset -i LINES=0 # number of lines displayed
# tee will save the copy of the input into a log file
tee "$FullLogFile" | {
# The pipe will cause this part to be executed in a subshell
# The command keeps LINES from losing it's value before if
while read -r Line; do
if [[ $LINES -lt $MAX ]]; then
LINES=LINES+1
echo "$Line" # Display first few lines on screen
elif [[ $LINES -lt $(($MAX*2)) ]]; then
LINES=LINES+1 # Count the lines for a little longer
echo -n "." # Reduce line output to single dot
else
echo -n "." # Reduce line output to single dot
fi
done
echo "" # Finish with the dots
# Tail last few lines, not found in head and not more then max
if [[ $LINES -gt $MAX ]]; then
tail -n $(($LINES-$MAX)) "$FullLogFile"
fi
}
}
Use it in a script, load it to current shell or put it in .bash_profile to be loaded on user session.
Usage examples: cat /var/log/messages | limit_output or ./configure | limit_output
The function will read the standard input, save it to a log file, display the first MAX lines, then reduce each line to a single dot (.) on screen, then finally display the last MAX lines (or less if output was shorter then MAX*2).
Here is my current incomplete solution which for convenience is demonstrating processing a 10 lines output and that will (hopefully) limit the output to first 2 lines and last two lines.
#!/bin/bash
seq 10 | tee >(gzip --stdout >output.log.gz) | tail -n2
One way I use to achieve this is:
./configure | tee output.log | head -n 5; tail -n 2 output.log
What this does is:
Write the complete output to a filed called output.log using tee
Only print the first 5 lines using head -n
In the end print the last two lines from the written output.log using tail -n
I have multiple files which have the same structure but not the same data. Say their names are values_#####.txt (values_00001.txt, values_00002.txt, etc.).
I want to extract a specific line from each file and copy it in another file. For example, I want to extract the 8th line from values_00001.txt, the 16th line from values_00002.txt, the 24th line from values_00003.txt and so on (increment = 8 each time), and copy them line by line in a new file (say values.dat).
I am new to shell scripting, I tried to use sed, but I didn't figure out how to do that.
Thank you in advance for your answers !
I believe ordering of files is also important to make sure you get output in desired sequence.
Consider this script:
n=8
while read f; do
sed $n'q;d' "$f" >> output.txt
((n+=8))
done < <(printf "%s\n" values_*.txt|sort -t_ -nk2,2)
This can make it:
for var in {1..NUMBER}
do
awk -v line=$var 'NR==8*line' values_${var}.txt >> values.dat
done
Explanation
The for loop is basic.
-v line=$var "gives" the $var value to awk, so it can be used with the variable line.
'NR==8*line' prints the line number 8*{value we are checking}.
values_${var}.txt gets the file values_1.txt, values_2.txt, and so on.
>> values.dat redirects to values.dat file.
Test
I created 3 equal files a1, a2, a3. They contain 30 lines, being each one the line number:
$ cat a1
1
2
3
4
...
Executing the one liner:
$ for var in {1..3}; do awk -v line=$var 'NR==8*line' a${var} >> values.dat; done
$ cat values.dat
8
16
24
I have to fetch one specific line out of a big file (1500000 lines), multiple times in a loop over multiple files, I was asking my self what would be the best option (in terms of performance).
There are many ways to do this, i manly use these 2
cat ${file} | head -1
or
cat ${file} | sed -n '1p'
I could not find an answer to this do they both only fetch the first line or one of the two (or both) first open the whole file and then fetch the row 1?
Drop the useless use of cat and do:
$ sed -n '1{p;q}' file
This will quit the sed script after the line has been printed.
Benchmarking script:
#!/bin/bash
TIMEFORMAT='%3R'
n=25
heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q} file" 'read line < file && echo $line')
# files upto a hundred million lines (if your on slow machine decrease!!)
for (( j=1; j<=100,000,000;j=j*10 ))
do
echo "Lines in file: $j"
# create file containing j lines
seq 1 $j > file
# initial read of file
cat file > /dev/null
for comm in {0..3}
do
avg=0
echo
echo ${heading[$comm]}
for (( i=1; i<=$n; i++ ))
do
case $comm in
0)
t=$( { time head -1 file > /dev/null; } 2>&1);;
1)
t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
2)
t=$( { time sed '1{p;q}' file > /dev/null; } 2>&1);;
3)
t=$( { time read line < file && echo $line > /dev/null; } 2>&1);;
esac
avg=$avg+$t
done
echo "scale=3;($avg)/$n" | bc
done
done
Just save as benchmark.sh and run bash benchmark.sh.
Results:
head -1 file
.001
sed -n 1p file
.048
sed -n '1{p;q} file
.002
read line < file && echo $line
0
**Results from file with 1,000,000 lines.*
So the times for sed -n 1p will grow linearly with the length of the file but the timing for the other variations will be constant (and negligible) as they all quit after reading the first line:
Note: timings are different from original post due to being on a faster Linux box.
If you are really just getting the very first line and reading hundreds of files, then consider shell builtins instead of external external commands, use read which is a shell builtin for bash and ksh. This eliminates the overhead of process creation with awk, sed, head, etc.
The other issue is doing timed performance analysis on I/O. The first time you open and then read a file, file data is probably not cached in memory. However, if you try a second command on the same file again, the data as well as the inode have been cached, so the timed results are may be faster, pretty much regardless of the command you use. Plus, inodes can stay cached practically forever. They do on Solaris for example. Or anyway, several days.
For example, linux caches everything and the kitchen sink, which is a good performance attribute. But it makes benchmarking problematic if you are not aware of the issue.
All of this caching effect "interference" is both OS and hardware dependent.
So - pick one file, read it with a command. Now it is cached. Run the same test command several dozen times, this is sampling the effect of the command and child process creation, not your I/O hardware.
this is sed vs read for 10 iterations of getting the first line of the same file, after read the file once:
sed: sed '1{p;q}' uopgenl20121216.lis
real 0m0.917s
user 0m0.258s
sys 0m0.492s
read: read foo < uopgenl20121216.lis ; export foo; echo "$foo"
real 0m0.017s
user 0m0.000s
sys 0m0.015s
This is clearly contrived, but does show the difference between builtin performance vs using a command.
If you want to print only 1 line (say the 20th one) from a large file you could also do:
head -20 filename | tail -1
I did a "basic" test with bash and it seems to perform better than the sed -n '1{p;q} solution above.
Test takes a large file and prints a line from somewhere in the middle (at line 10000000), repeats 100 times, each time selecting the next line. So it selects line 10000000,10000001,10000002, ... and so on till 10000099
$wc -l english
36374448 english
$time for i in {0..99}; do j=$((i+10000000)); sed -n $j'{p;q}' english >/dev/null; done;
real 1m27.207s
user 1m20.712s
sys 0m6.284s
vs.
$time for i in {0..99}; do j=$((i+10000000)); head -$j english | tail -1 >/dev/null; done;
real 1m3.796s
user 0m59.356s
sys 0m32.376s
For printing a line out of multiple files
$wc -l english*
36374448 english
17797377 english.1024MB
3461885 english.200MB
57633710 total
$time for i in english*; do sed -n '10000000{p;q}' $i >/dev/null; done;
real 0m2.059s
user 0m1.904s
sys 0m0.144s
$time for i in english*; do head -10000000 $i | tail -1 >/dev/null; done;
real 0m1.535s
user 0m1.420s
sys 0m0.788s
How about avoiding pipes?
Both sed and head support the filename as an argument. In this way you avoid passing by cat. I didn't measure it, but head should be faster on larger files as it stops the computation after N lines (whereas sed goes through all of them, even if it doesn't print them - unless you specify the quit option as suggested above).
Examples:
sed -n '1{p;q}' /path/to/file
head -n 1 /path/to/file
Again, I didn't test the efficiency.
I have done extensive testing, and found that, if you want every line of a file:
while IFS=$'\n' read LINE; do
echo "$LINE"
done < your_input.txt
Is much much faster then any other (Bash based) method out there. All other methods (like sed) read the file each time, at least up to the matching line. If the file is 4 lines long, you will get: 1 -> 1,2 -> 1,2,3 -> 1,2,3,4 = 10 reads whereas the while loop just maintains a position cursor (based on IFS) so would only do 4 reads in total.
On a file with ~15k lines, the difference is phenomenal: ~25-28 seconds (sed based, extracting a specific line from each time) versus ~0-1 seconds (while...read based, reading through the file once)
The above example also shows how to set IFS in a better way to newline (with thanks to Peter from comments below), and this will hopefully fix some of the other issue seen when using while... read ... in Bash at times.
For the sake of completeness you can also use the basic linux command cut:
cut -d $'\n' -f <linenumber> <filename>