bash - remove multiple different lines of text from a text file - macos

I'm working with lots of log files, and most of them contain recurring strings that get logged many times.
To make the logs easily readable for others who don't have much to do with such things (and for myself too), I want to make a script that rips out some text lines that can cause a 'false alarm' for other people. ("Hey admin, I have these errors here multiple times"; > "Sigh, these errors don't mean anything" kind of way)
Is there some bash code with grep, cat or awk that can get rid of lots of different text lines, without having to go through the document over and over again for each line to be removed? (Basically, remove all the garbage lines in one swoop.)
Example: I'll mark the lines that I want removed in bold:
One thing I don't know why
It doesn’t even matter how hard you try
Keep that in mind, I designed this rhyme
To explain in due time
All I know
time is a valuable thing
Watch it fly by as the pendulum swings
Watch it count down to the end of the day
The clock ticks life away
It’s so unreal
Didn’t look out below
Watch the time go right out the window
Trying to hold on but didn’t even know
Wasted it all just to
Watch you go
Sorry about the Linkin Park lyrics; listening to the radio while trying to solve a problem gives some bad examples sometimes :P
Are all these lines removable in one command?
Many thanks if somebody knows how.

grep -v "<string1>\|<string2>\|<stringN>" /path/to/file

It removes the lines provided in the not_wanted array.
#!/bin/bash
exec < example.txt
not_wanted[0]="It doesn’t even matter how hard you try"
not_wanted[1]="time is a valuable thing"
not_wanted[2]="The clock ticks life away"
not_wanted[3]="It’s so unreal"
not_wanted[4]="Trying to hold on but didn’t even know"
while IFS= read -r line; do
  # Drop the line if it exactly matches any of the unwanted lines.
  for i in "${not_wanted[@]}"; do
    if [ "$line" == "$i" ]; then unset line; break; fi
  done
  if [ "$line" ]; then echo "$line"; fi
done
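A quick way to try it, assuming the script above is saved as filter.sh and the lyrics as example.txt (the filename the exec line expects; filter.sh and cleaned.txt are just placeholder names):
chmod +x filter.sh
./filter.sh > cleaned.txt   # cleaned.txt gets only the lines that are not in not_wanted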

Put the lines you don't want in a file, then
grep -v -f not.wanted filename > smaller.file
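For example, with the unwanted lines from the question saved one per line in not.wanted (example.txt is the filename used in the other answer above):
cat > not.wanted <<'EOF'
It doesn’t even matter how hard you try
time is a valuable thing
The clock ticks life away
EOF

grep -v -f not.wanted example.txt > smaller.file
If the unwanted lines contain regex metacharacters, or you only want exact whole-line matches, adding -F (fixed strings) and -x (match whole lines) makes grep stricter: grep -v -F -x -f not.wanted example.txt > smaller.file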

Related

Extracting lines with specific character count

I have a Python script that is pulling URLs from pastebin.com/archive, which has links to pastes (which have 8 random characters after pastebin.com in the URL). My current output is a .txt file with the data below in it. I only want the links to pastes (example: http://pastebin.com///Y5JhyKQT), not links to other pages such as pastebin.com/tools. This is so I can set wget to go pull each individual paste.
The only way I can think of doing this is writing a bash script to count the number of characters in each line and only keep lines with 30 characters exactly (this is the length of the URLs linking to pastes).
I have no idea how I'd go about implementing something like this using grep or awk, perhaps using a while do loop? Any help would be appreciated!
http://pastebin.com///tools
http://pastebin.com//top.location.href
http://pastebin.com///trends
http://pastebin.com///Y5JhyKQT <<< I want to keep this
http://pastebin.com//=
http://pastebin.com///>
From the sample you posted it looks like all you need is:
grep -E '/[[:alnum:]]{8}$' file
or maybe:
grep -E '^.{30}$' file
If that doesn't work for you, explain why and provide a better sample.
This is the algorithm:
Read one line at a time (i.e. all the characters between newline characters).
Count the characters, or store the line in a variable and take its length. That is the length of your line.
Only process the lines whose length is exactly the count you want.
Python has functions both for getting the character count of a string and for reading a line.
#!/usr/bin/env zsh
while read -r aline
do
  if [[ ${#aline} == 30 ]]; then
    echo "$aline"   # the line is exactly 30 characters; do something with it
  fi
done
This is documented in the bash man pages under the "Parameter Expansion" section.
EDIT: this solution is zsh-only.
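If a one-liner is enough, awk can do the same length test directly (just an alternative sketch, not part of the answers above); it prints only the lines that are exactly 30 characters long:
awk 'length($0) == 30' file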

use sed with for loop to delete lines from text file

I am essentially trying to use sed to remove a few lines from a text document, to clean it up. But I'm not getting it right at all. I'm missing something and I have no idea what...
#!/bin/bash
items[0]='X-Received:'
items[1]='Path:'
items[2]='NNTP-Posting-Date:'
items[3]='Organization:'
items[4]='MIME-Version:'
items[5]='References:'
items[6]='In-Reply-To:'
items[7]='Message-ID:'
items[8]='Lines:'
items[9]='X-Trace:'
items[10]='X-Complaints-To:'
items[11]='X-DMCA-Complaints-To:'
items[12]='X-Abuse-and-DMCA-Info:'
items[13]='X-Postfilter:'
items[14]='Bytes:'
items[15]='X-Original-Bytes:'
items[16]='Content-Type:'
items[17]='Content-Transfer-Encoding:'
items[18]='Xref:'
for f in "${items[@]}"; do
  sed '/${f}/d' "$1"
done
What I am thinking, incorrectly it seems, is that I can set up a for loop to check each item in the array that I want removed from the text file. But it's simply not working. Any ideas? I'm sure this is basic and simple, and yet I can't figure it out.
Thanks,
Marek
Much better to create a single sed script, rather than generate 19 small scripts in sequence.
Fortunately, generating a script by joining the array elements is moderately easy in Bash:
regex=$(printf '\|%s' "${items[@]}")
regex=${regex#'\|'}
sed "/^$regex/d" "$1"
(Notice also the addition of ^ to the final regex -- I assume you only want to match at beginning of line.)
Properly, you should not delete any lines from the message body, so the script should leave anything after the first empty line alone:
sed "1,/^\$/!b;/$regex/d" "$1"
Add -i if you want in-place editing of the target file.
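Put together, a minimal sketch of the whole script might look like this (the header list is shortened here for brevity; as in the answer above, it relies on sed accepting \| as alternation, which is a GNU extension):
#!/bin/bash
# Headers whose lines should be removed (shortened; keep the full list from the question).
items[0]='X-Received:'
items[1]='Path:'
items[2]='Organization:'
items[3]='Message-ID:'
items[4]='Xref:'

# Join the array into a single regex: X-Received:\|Path:\|Organization:\|...
regex=$(printf '\|%s' "${items[@]}")
regex=${regex#'\|'}

# Delete matching header lines; leave the body (everything after the first blank line) alone.
sed "1,/^\$/!b;/^$regex/d" "$1"
Invoked as, say, ./clean.sh message.txt (placeholder names), it prints the cleaned message to stdout; add -i to the sed command if you want the file edited in place, as noted above.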

Bash - replace & write into same line

I have a bash script which gets details from many servers. It works well, but I want the lines to get updated in place instead of being written out again.
while [ true ] ; do
for i in $(seq ${#details[@]}); do
.... more code (irrelevant)
echo ${server[$i]}
echo $stat1
echo $stat2
echo $stat3
echo $stat4
done
done
How can I make all the lines get constantly updated in place, on the same lines?
I tried echo -ne, but that just puts everything on one long line.
I want each line to keep its place and just get updated with the new value.
Would be great if somebody knows a trick.
Thank you!
UPDATE 1
@cbuckley:
Thanks for your answer, but it's not working correctly. I already tried it that way with -ne. The result is (it always creates new lines):
10.0.0.2
100310.0.0.1
72710.0.0.3
368310.0.0.2
100310.0.0.1
72710.0.0.3
Should be
10.0.0.1
17
1003
10.0.0.2
319
727
10.0.0.3
157
3683
The values under each IP should get updated constantly. (I think this would normally work with -ne, but in my case it doesn't.)
If you've already outputted across multiple lines, you can't remove those lines without clearing the screen. You have two options:
Using watch
You can write a script that outputs the stats once, and then use watch to run that script repeatedly:
watch -n 10 ./script.sh # calls script every 10 seconds.
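The script passed to watch is essentially just the inner loop from the question, run once; a sketch, assuming the omitted ".... more code" part fills the details, server and stat variables:
#!/bin/bash
# script.sh: print all stats once; watch(1) takes care of repeating and redrawing.
for i in $(seq ${#details[@]}); do
  # .... more code (gathers the values, as in the question)
  echo "${server[$i]}"
  echo "$stat1"
  echo "$stat2"
  echo "$stat3"
  echo "$stat4"
done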
Clearing the screen
If that is not suitable, you'll need to clear the screen yourself:
while [ true ] ; do
  clear # clear the screen
  for i in $(seq ${#details[@]}); do
    # ...
  done
  sleep 10 # don't update the screen too often
done
However, at this point, you've pretty much implemented a basic version of watch anyway.
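If clearing the whole screen causes too much flicker, another option is to move the cursor back to the top-left before each refresh and overwrite the old values in place. This is only a sketch using tput (not something from the question or the answer above); padding each field to a fixed width overwrites leftover characters when a new value is shorter than the previous one:
tput clear                 # clear once at the start
while true; do
  tput cup 0 0             # jump back to row 0, column 0 instead of clearing
  for i in $(seq ${#details[@]}); do
    # .... more code (irrelevant)
    printf '%-20s\n' "${server[$i]}"
    printf '%-20s\n' "$stat1"
    printf '%-20s\n' "$stat2"
    printf '%-20s\n' "$stat3"
    printf '%-20s\n' "$stat4"
  done
  sleep 10                 # don't redraw too often
done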
You might want to try using sed with the -i option to edit a file 'in place' (i.e. to change the existing file instead of writing a new file)

Shell scripting - save to a file, so the file will always have the last 10 values added

I found myself quite stumped. I am trying to output data from a script to a file.
Although I need to keep only the last 10 values, so a plain append won't work.
The main script returns one line, so I save it to a file. I use tail to get the last 10 lines and process them, but then I get to the point where the file is too big, because I keep appending lines to it (the script outputs a line every minute or so, which grows the log quite fast).
I would like to limit the number of writes that I do on that script, so I can always have only the last 10 lines, discarding the rest.
I have thought about different approaches, but they all involve a lot of activity, like creating temp files, deleting the original file and creating a new one with just the last 10 entries; it feels inelegant and amateurish.
Is there a quick and clean way to query a file, so I can add lines until I hit 10 lines, and then start to delete the lines in chronological order, and add the new ones on the bottom?
Maybe things are easier than what I think, and there is a simple solution that I cannot see.
Thanks!
In general, it is difficult to remove data from the start of a file. The only way to do it is to overwrite the file with the tail that you wish to keep. It isn't that ugly to write, though. One fairly reasonable hack is to do:
{ rm file; tail -9 > file; echo line 10 >> file; } < file
This will retain the last 9 lines and add a 10th line. There is a lot of redundancy, so you might like to do something like:
append() { test -f $1 && { rm $1; tail -9 > $1; } < $1; cat >> $1; }
And then invoke it as:
echo 'the new 10th line' | append file
Please note that this hack of using redirecting input to the same file as the later output is a bit fragile and obscure. It is entirely possible for the script to be interrupted and delete the file! It would be safer and more maintainable to explicitly use a temporary file.
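For example, a safer variant along those lines, using an explicit temporary file (just a sketch; the function and file names are the same placeholders as above):
append() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  # Keep the last 9 lines of the existing file (if any), then append the new line from stdin.
  { test -f "$file" && tail -n 9 "$file"; cat; } > "$tmp"
  mv "$tmp" "$file"
}

# Used the same way:
echo 'the new 10th line' | append file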

Is this (simple) for loop doing what I want it to?

I have pretty much no experience with Cygwin & UNIX but need to use it for extracting a large set of data from an even larger set of files...
I had some help yesterday to do this short script, but (after running for ~7-8 hours) the script simply wrote to the same output file 22 times. At least that's what I think happened.
I've now changed the code to this (see below) but it would be really awesome if someone who knows how this is done properly could tell me if it's likely to work before I waste another 8 hours...
for chr in {1..22}
do
  zcat /cygdrive/g/data/really_long_filename$chr | sed '/^#/d' | cut -f1-3 >> db_to_rs_$chr
done
I want it to read files 1..22, remove rows starting with #, and send columns 1 to 3 to a file ending with the same number 1..22.
Yesterday the last part was just ...-f1-3 >> db_to_rs, which I suspect just rewrote that file 22 times?
Help is much appreciated
~L
Yes, the code would work as expected.
When the command ended in ...-f1-3 >> db_to_rs, it essentially appended all the output to the file db_to_rs.
Saying ... >> db_to_rs_$chr would create filenames ending in {1..22}.
However, note that saying >> would append the output to a file. So if db_to_rs1 already exists, the output would be appended. If you want to create a new file instead, say > instead of >>.
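For example, applying that to the loop from the question, with > so each run starts from empty output files (this is just the question's own loop with that one change):
for chr in {1..22}
do
  # > truncates db_to_rs_$chr on every run instead of appending to it
  zcat /cygdrive/g/data/really_long_filename$chr | sed '/^#/d' | cut -f1-3 > db_to_rs_$chr
done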
