Delete all consecutive lines with sed, but not an isolated one - bash

I have a log file which looks like the following text:
...
5 files analysed in 98 ms
7 files analysed in 654 ms
error1: ....
error2: ....
error3: ....
21 files analysed in 345 ms
3 files analysed in 78 ms
6 files analysed in 55 ms
...
I would like to use "sed" or "awk" to remove all the consecutive lines containing the pattern "files analysed in", but not the one directly above the useful information, so that the output looks like this:
7 files analysed in 654 ms
error1: ....
error2: ....
error3: ....
I tried some tricks from this post, but nothing works the way I would like; the number of error lines is not always the same.
How could I proceed?

grep -v "files analysed in" -B 1
This selects every line that doesn't contain the pattern, and -B 1 prints one line of context before each match.
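For example, something like the following should work on a log like the one in the question (the file name build.log is just a placeholder; GNU grep's --no-group-separator suppresses the "--" markers it prints between context groups):
grep -v "files analysed in" -B 1 --no-group-separator build.log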

With awk you can remember the last matching line and print it only when a non-matching line follows:
$ awk '/files analysed in/{p=$0; next} {if (p) print p; p=""; print}' file
7 files analysed in 654 ms
error1: ....
error2: ....
error3: ....
You can also exit after the first match if you only need the first block of errors.
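A sketch of that variant, assuming "exit after the first match" means stopping once the first block of error lines has been printed (the pattern and file name are taken from the question):
awk 'found && /files analysed in/ {exit}
     /files analysed in/          {p=$0; next}
     {if (p) print p; p=""; print; found=1}' file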

Related

diff -u -s, line count (+, -) not giving correct value

I am using diff -u -s file1 file2 and counting the + and - lines as added and deleted lines, for file-comparison automation. (A modified line counts as one + and one -.) These counts match the Araxis compare statistics (total added+deleted from the script = changed+deleted+new in Araxis) for most of the files, but the script total and the Araxis total do not match for a few files.
P.S. - I am using Cygwin to run the script on Windows. I tried dos2unix, tail -c 4, etc., in the hope of removing BOM characters, but some of these culprit files do not have a BOM and the counts still do not match. The following are a few sample culprit files:
(1) SIACPO_ActivacionDesactivacionBlacklist.aspx.vb - the script gives a total count of 57, while Araxis gives 55
(2) SIACPO_Suspension_Servicio.aspx - the script gives a total count of 2509, while Araxis gives 2473
(3) repCuadreProceso.aspx - the script gives a total count of 1165, while Araxis gives 1163
(4) detaPago.aspx.vb - this is a strange file: there is no change at all except a BOM character on the first line. The script gives counts of 0, 0, so why is it in the modified list of files at all?
Now, how can I attach these 4 culprit files (Dev as well as Prod versions) for your troubleshooting?
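For context, the counting described in the question presumably amounts to something like the sketch below (not the asker's actual script; the file arguments are placeholders). Note that a pattern such as ^+[^+] skips the +++/--- headers but also misses added or deleted lines that are completely blank, which by itself can make the totals drift from another tool's statistics:
#!/bin/bash
# Count added/deleted lines in a unified diff, skipping the +++/--- header lines.
added=$(diff -u -s "$1" "$2" | grep -c '^+[^+]')
deleted=$(diff -u -s "$1" "$2" | grep -c '^-[^-]')
echo "Added: $added  Deleted: $deleted  Total: $((added + deleted))"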

How to print lines extracted from a log file within a specified time range?

I'd like to fetch results, let's say from 2017-12-19 19:14 until the end of that day, from a log file that looks like this:
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:00.723 Info: Saving /var/opt/MarkLogic/Forests/Meters/00001829
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:01.134 Info: Saved 9 MB at 22 MB/sec to /var/opt/MarkLogic/Forests/Meters/00001829
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:01.376 Info: Merging 19 MB from /var/opt/MarkLogic/Forests/Meters/0000182a and /var/opt/MarkLogic/Forests/Meters/00001829 to /var/opt/MarkLogic/Forests/Meters/0000182c, timestamp=15137318408510140
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:02.585 Info: Merged 18 MB in 1 sec at 15 MB/sec to /var/opt/MarkLogic/Forests/Meters/0000182c
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:05.200 Info: Deleted 15 MB at 337 MB/sec /var/opt/MarkLogic/Forests/Meters/0000182a
/var/opt/MarkLogic/Logs/ErrorLog_1.txt:2017-12-19 19:14:05.202 Info: Deleted 9 MB at 4274 MB/sec /var/opt/MarkLogic/Forests/Meters/00001829
I am new to Unix and familiar with the grep command. I tried the command below:
date="2017-12-19 [19-23]:[14-59]"
echo "$date"
grep "$date" $root_path_values
but it throws an "invalid range end" error. Any solution? The date will come from a variable, so it will be unpredictable; please don't build a command around just this example. $root_path_values is a sequence of error files such as errorLog.txt, errorLog_1.txt, errorLog_2.txt and so on.
I'd like to fetch results, let's say from 2017-12-19 19:14 until the end of that day … The date will come from a variable …
The error comes from [19-23]: inside a bracket expression this is read as the character 1, the range 9-2, and the character 3, and a range whose end precedes its start is invalid. In any case, this is not a job for regular expressions. Since the timestamp is in a format that sorts lexicographically, we can simply compare lines as whole strings, e.g.:
start='2017-12-19 19:14'
end='2017-12-20'
awk -v start="$start" -v end="$end" 'start <= $0 && $0 < end' ErrorLog_1.txt
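Since $root_path_values expands to several log files, the same comparison can be run across all of them in one go; a sketch, assuming each line in the files themselves starts with the timestamp (FILENAME reproduces the file: prefix that grep would have printed):
start='2017-12-19 19:14'
end='2017-12-20'
awk -v start="$start" -v end="$end" 'start <= $0 && $0 < end {print FILENAME ":" $0}' $root_path_values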
Try this regexp:
egrep '2017-12-19 (19:(1[4-9]|[2-5][0-9])|2[0-3]:[0-5][0-9])' path/to/your/file
If you need the pattern in a variable:
#!/bin/bash
date="2017-12-19 (19:(1[4-9]|[2-5][0-9])|2[0-3]:[0-5][0-9])"
egrep "${date}" path/to/your/file

How to count the number of reviews in my files in a folder and then sort by highest to lowest?

Thanks to anyone who has helped me so far; here is my problem: I have a folder which contains 825 files. Within these files are reviews of a hotel. An example name of one of these files is hotel_72572.dat, and this file basically contains the following:
<Overall Rating>4
<Avg. Price>$173
<URL>http://www.tripadvisor.com/ShowUserReviews-g60878-d72572-r23327047-Best_Western_Pioneer_Square_Hotel-Seattle_Washington.html
<Author>everywhereman2
<Content>Old seattle...
<Date>Jan 6, 2009
<img src="http://cdn.tripadvisor.com/img2/new.gif" alt="New"/>
<No. Reader>-1
<No. Helpful>-1
<Overall>5
<Value>5
<Rooms>5
<Location>5
<Cleanliness>5
<Check in / front desk>5
<Service>5
<Business service>5
<Author> //repeats the fields again, each cluster of fields is a review
The fields (line 6 through <Business service>) are then repeated n times, where n is the number of reviews in the file. I thought that counting the number of times "Author" appears per file would achieve this, but perhaps there is a better solution?
I am trying to write a script called countreviews.sh that will count the number of reviews in each file in my folder (the folder name is reviews_folder) and then sort the counts from highest to lowest. An example output would be:
hotel_72572 45
hotel_72579 33
hotel_73727 17
where the prefix is the name of the file and the number is the number of reviews in that file. My script must take the folder name as an argument; for example, I would type ./countreviews.sh reviews_folder and get the output above.
I have received lots of help over the past few days with many different suggestions, but none of them achieved what I am trying to do (my fault, due to poor explanations). I hope this finally explains it clearly enough. Thanks again to anyone who has helped me over the past few days and for any help with this question.
grep -c Author hotel_*.dat | sort -t : -k2nr | sed 's/\.dat:/ /'
grep -c counts the Author lines in each file, sort -t : -k2nr sorts numerically (descending) on the count after the colon, and sed replaces ".dat:" with a space. Output (e.g.):
hotel_72572 45
hotel_72579 33
hotel_73727 17
Update, as a script that takes the folder name as an argument:
#!/bin/bash
# Change into the folder given as the first argument (e.g. reviews_folder).
cd "$1" || exit 1
grep -c Author hotel_*.dat | sort -t : -k2nr | sed 's/\.dat:/ /'

Shell script Email bad formatting?

My script works fine and produces a file. The file is plain text and is formatted like the expected results shown below. However, when I send the file to my email, the formatting is completely wrong.
The line of code I am using to send my email:
cat ReportEmail | mail -s 'Report' bob@aol.com
The result I am getting in my email:
30129 22.65 253
96187 72.32 294
109525 82.35 295
10235 7.7 105
5906 4.44 106
76096 57.22 251
My expected results should look like this:
30129 22.65 253
96187 72.32 294
109525 82.35 295
10235 7.7 105
5906 4.44 106
76096 57.22 251
Your source file achieves the column alignment by using a combination of tabs and spaces. The width assigned to a tab, however, can vary from program to program. Widths of 4, 5, or 8 spaces, for example, are common. If you want consistent formatting in plain text from one viewer to the next, use only spaces.
As a workaround, you can expand the tabs to spaces before passing the file to mail, using the expand utility:
expand -t 8 ReportEmail.txt | mail -s 'Report' bob@aol.com
The option -t 8 tells expand to treat tabs as 8 spaces wide. Change the 8 to whatever number consistently makes the format in ReportEmail.txt work properly.
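If you want to confirm where the tabs actually are before picking a width, GNU cat's -A option makes them visible (tabs show up as ^I, line ends as $):
cat -A ReportEmail | head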

Change Data Capture in delimited files

There are two tab-delimited files (file1, file2) with the same number and structure of records but with different values in some columns.
Each day we get another file (newfile), again with the same number and structure of records but with some changes in column values.
The task is to compare this file (newfile) with the two files (file1, file2) and update the records in them with the changed records, keeping unchanged records intact.
Before applying changes:
file1
11 aaaa
22 bbbb
33 cccc
file2
11 bbbb
22 aaaa
33 cccc
newfile
11 aaaa
22 eeee
33 ffff
After applying changes:
file1
11 aaaa
22 eeee
33 ffff
file2
11 aaaa
22 eeee
33 ffff
What would be the easiest and most efficient solution? Unix shell scripting? The files are huge, containing millions of records; can a shell script be an efficient solution in this case?
Daily we get another file (newfile) with same number and structure of records but with
some changes in column values.
This sounds to me like a perfect case for git. With git you can commit the current file as it is.
Then, as you get new "versions" of the file, you simply replace the old version with the new one and commit again. The best part is that each time you commit, git records the changes from version to version, giving you access to the entire history of the file.
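A minimal sketch of that workflow (the repository name, paths and commit messages are illustrative):
# one-time setup: put the current files under version control
git init cdc && cd cdc
cp /path/to/file1 /path/to/file2 .
git add file1 file2
git commit -m "baseline"
# daily: overwrite with the new version and commit; git stores only the differences
cp /path/to/newfile file1
cp /path/to/newfile file2
git commit -am "daily update"
# inspect what changed in any given run
git log -p -- file1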
