Removing text between two strings over multiple lines - bash

I have a log file that writes the timestamp and the command on separate lines. I'd like to strip out the timestamps and save just the "user: command" list. I've tried several permutations of sed to replace or delete the data between strings, but it always oversteps the bounds of the command. The current log output is similar to:
USER 001
6:32am
USER 001
random bash command or output 001
USER 002
7:41am
USER 002
random bash command or output 002
USER 001
7:43am
USER 001
random bash command or output 003
USER 002
7:43am
USER 002
random bash command or output 004
Desired output:
USER 001
random bash command or output 001
USER 002
random bash command or output 002
USER 001
random bash command or output 003
USER 002
random bash command or output 004

Looks like this will do:
sed -ri 'N; /^.*\n[0-9]/d' file
(Assumes GNU sed.)
It processes the file two lines at a time.
On each cycle:
sed automatically reads one line into the pattern space.
The N command appends to the pattern space a newline and the next line.
If the pattern space matches "any text, newline, digit", then delete
it (and therefore don't auto-print it).
Otherwise, auto-print it.
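As a quick sanity check, the two-line cycle can be exercised on a minimal four-line sample (GNU sed assumed for -r; the file path here is invented for the demo):

```shell
# Minimal sample: USER line, timestamp, USER line, command.
printf '%s\n' 'USER 001' '6:32am' 'USER 001' 'random bash command or output 001' > /tmp/sample.log

# Delete every two-line chunk whose second line starts with a digit;
# the surviving chunks (USER + command) are auto-printed.
sed -r 'N; /^.*\n[0-9]/d' /tmp/sample.log
```

Only the "USER 001" / command pair should survive.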

If the file is always in the same format, you can simply remove those lines like this:
awk 'NR%4!=1 && NR%4!=2' file
USER 001
random bash command or output 001
USER 002
random bash command or output 002
USER 001
random bash command or output 003
USER 002
random bash command or output 004
Or, equivalently:
awk '!(NR%4==1 || NR%4==2)' file
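Both filters keep only the lines where NR%4 is 3 or 0; a quick pipe confirms the arithmetic on one four-line group:

```shell
# Lines 1 and 2 of each group of four (USER + timestamp) are dropped;
# lines 3 and 4 (USER + command) are kept.
printf '%s\n' 'USER 001' '6:32am' 'USER 001' 'random bash command or output 001' |
  awk 'NR%4!=1 && NR%4!=2'
```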

Related

Shell script: Insert multiple lines into a file ONLY after a specified pattern appears for the FIRST time. (The pattern appears multiple times)

I want to insert multiple lines into a file using shell script. Let us consider my original file: original.txt:
aaa
bbb
ccc
aaa
bbb
ccc
aaa
bbb
ccc
.
.
.
and my insert file: toinsert.txt
111
222
333
Now I have to insert the three lines from the 'toinsert.txt' file ONLY after the line 'ccc' appears for the FIRST time in the 'original.txt' file. Note: the 'ccc' pattern appears more than one time in my 'original.txt' file. After inserting ONLY after the pattern appears for the FIRST time, my file should change like this:
aaa
bbb
ccc
111
222
333
aaa
bbb
ccc
aaa
bbb
ccc
.
.
.
I should do the above insertion using a shell script. Can someone help me?
Note2: I found a similar case, with a partial solution:
sed -i -e '/ccc/r toinsert.txt' original.txt
which actually does the insertion multiple times (for every time the ccc pattern shows up).
Use ed, not sed, to edit files:
printf "%s\n" "/ccc/r toinsert.txt" w | ed -s original.txt
It inserts the contents of the other file after the first line containing ccc, but unlike your sed version, only after the first.
This might work for you (GNU sed):
sed '0,/ccc/!b;/ccc/r insertFile' file
Use a range:
If the current line is in the range following the first occurrence of ccc, break from further processing and implicitly print as usual.
Otherwise, if the current line does contain ccc, insert lines from insertFile.
N.B. This uses the address 0 which allows the regexp to occur on line 1 and is specific to GNU sed.
or:
sed -e '/ccc/!b;r insertFile' -e ':a;n;ba' file
Use a loop:
If a line does not contain ccc, no further processing and print as usual.
Otherwise, insert lines from insertFile and then using a loop, fetch/print the remaining lines until the end of the file.
N.B. The r command insists on being delimited from other sed commands by a newline. The -e option simulates this effect and thus the sed commands are split across two -e options.
or:
sed 'x;/./{x;b};x;/ccc/!b;h;r insertFile' file
Use a flag:
If the hold space is not empty (the flag has already been set), no further processing and print as usual.
Otherwise, if the line does not contain ccc, no further processing and print as usual.
Otherwise, copy the current line to the hold space (set the flag) and insert lines from insertFile.
N.B. In all cases the r command inserts lines from insertFile after the current line is printed.
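A quick sketch of the range variant on throwaway files (GNU sed assumed for the 0 address; the /tmp paths are invented for the demo):

```shell
# Build the sample input and the lines to insert.
printf '%s\n' aaa bbb ccc aaa bbb ccc > /tmp/original.txt
printf '%s\n' 111 222 333 > /tmp/toinsert.txt

# Insert toinsert.txt only after the FIRST line matching ccc.
sed '0,/ccc/!b;/ccc/r /tmp/toinsert.txt' /tmp/original.txt
```

The second ccc passes through untouched because every line after the first match branches out before reaching the r command.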

Reverse the number in a filename

So it seems that I have made an error when trying to make a more readable file structure. I accidentally named the files in the wrong order, and now I need to correct it.
The files are named:
001 - file number 1.jpg
001 - file number 2.mp3
002 - file number 3.jpg
002 - file number 4.mp3
003 - file number 5.jpg
003 - file number 6.mp3
and so on, up to I think 800 files in one folder and 300 in another; it's kind of a mess.
The correct order should be:
003 - file number 1.jpg
003 - file number 2.mp3
002 - file number 3.jpg
002 - file number 4.mp3
001 - file number 5.jpg
001 - file number 6.mp3
How can I rename all files, and change the number so it goes the reversed order?
Not sure how well this will scale to a large number of files, but here it is.
#!/usr/bin/env bash
shopt -s extglob nullglob
file=(*.@(mp3|jpg))
mapfile -t -d '' files < <(printf '%s\0' "${file[@]}")
mapfile -t -d '' renamed < <(paste -zd ' ' <(printf '%s\0' "${files[@]%% *}" | sort -rz ) <(printf '%s\0' "${files[@]#* }"))
for i in "${!files[@]}"; do
echo mv -v "${files[$i]}" "${renamed[$i]}"
done
Output
mv -v 001 - file number 1.jpg 003 - file number 1.jpg
mv -v 001 - file number 2.mp3 003 - file number 2.mp3
mv -v 002 - file number 3.jpg 002 - file number 3.jpg
mv -v 002 - file number 4.mp3 002 - file number 4.mp3
mv -v 003 - file number 5.jpg 001 - file number 5.jpg
mv -v 003 - file number 6.mp3 001 - file number 6.mp3
It will emit an error message like the one @oguz posted.
Bash 4+ only, because of mapfile.
Also the -z on both paste and sort might be GNU only.
Another option, if you have the utility vidir, is to use your favorite text editor to rename your files. The caveat is that it does not support file/path names containing newlines.
vidir /path/to/files
Using your favorite text editor
EDITOR=kate vidir /path/to/files
If this is your first time using vidir, I suggest you try it on some test files first. The first column is just a line index for the files/directories; don't touch it.
If the reversal is to be performed in your locale's collation order:
name=(*.{jpg,mp3})
pfix=("${name[@]%% *}")
for ((i=0,j=${#name[@]}-1; j>=0; i++,j--)); do
echo mv "${name[i]}" "${pfix[j]} ${name[i]#* }"
done
Populates an array with filenames and another with prefixes; loops through both in opposite directions and re-pairs them in reverse order.
Drop echo if its output looks good. mv might complain once that source and target are the same, but that won't cause any harm.
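As a dry run of the reverse-prefix idea, the loop can be tried on a few throwaway files in a scratch directory (the file names below are invented for the demo; this is bash-specific because of the arrays):

```shell
#!/usr/bin/env bash
# Work in a fresh temporary directory so nothing real is touched.
cd "$(mktemp -d)" || exit 1
touch '001 - a.jpg' '002 - b.jpg' '003 - c.jpg'

name=(*.jpg)                  # filenames in collation order
pfix=("${name[@]%% *}")       # their numeric prefixes
# Walk names forward and prefixes backward, pairing them up.
for ((i=0, j=${#name[@]}-1; j>=0; i++, j--)); do
    echo mv "${name[i]}" "${pfix[j]} ${name[i]#* }"
done
```

With echo in place this only prints the planned renames, e.g. "mv 001 - a.jpg 003 - a.jpg".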
Use sort's -r option to reverse the order: sort -r file

Comparing two files with accented characters (Mac OS / Terminal)

Goal: create a file listing all lines not found in either file
OS: Mac OS X, using Terminal
Problem: lines contain accented characters (UTF-8) and comparison doesn't seem to work
I've used the following command for comparing both files:
comm -13 <(sort file1) <(sort file2) > file3
That command works fine except with lines in files containing accented characters. Would you have any solutions?
One non-optimal thing I've tried is to replace all accented characters with non-accented ones using sed -i, but that didn't seem to work on one of my two files, so I assume one file is weirdly encoded. In fact, ü is displayed as u¨ when opening the file in TextMate, but correctly as ü in TextEdit. I had generated that file using find Photos/ -type f > list_photos.txt to list all the filenames, which contain accented characters... maybe I should add another parameter to the find command in the first place? Any thoughts about this as well?
Many thanks.
Update:
I manually created text files with accented characters. The comm command worked without requiring LC_ALL. So the issue must be with the output of filenames into a text file (find command).
Test file A:
Istanbul 001 Mosquée Süleymaniye.JPG
Istanbul 002 Mosquée Süleymaniye.JPG
Test file B:
Istanbul 001 Mosquée Süleymaniye.JPG
Istanbul 002 Mosquée Süleymaniye - Angle.JPG
Istanbul 003 Ville.JPG
Comparison produces the expected results. But when I create those files automatically, I instead get Su¨leymaniye, for instance, in the text file. When I don't generate an output file, the terminal shows me the correct word Süleymaniye.
Many, many thanks for looking into it. Much appreciated.
You need to set the ENVIRONMENT for comm.
ENVIRONMENT
The LANG, LC_ALL, LC_COLLATE, and LC_CTYPE environment variables affect
the execution of comm as described in environ(7).
For example:
LC_COLLATE=C comm -13 <(sort file1) <(sort file2) > file3
or
LC_ALL=C comm -13 <(sort file1) <(sort file2) > file3
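A small sketch of the LC_ALL=C form on two throwaway files (the /tmp paths and contents are invented for the demo; bash is assumed for the process substitutions):

```shell
# comm -13 suppresses column 1 (lines only in file1) and column 3
# (lines in both), leaving only the lines unique to file2.
printf '%s\n' 'Istanbul 001' 'Istanbul 002' > /tmp/file1
printf '%s\n' 'Istanbul 001' 'Istanbul 003' > /tmp/file2
LC_ALL=C comm -13 <(sort /tmp/file1) <(sort /tmp/file2)
```

With LC_ALL=C both sort and comm compare raw bytes, so they agree on the ordering even when the locale would treat accented characters specially.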

Grep based on pattern

Sample Text:
This is a test
This is aaaa test
This is aaa test
This is test a
This aa is test
I have just started learning unix commands like grep, awk and sed and have a quick question. If my text file contains the above text, how can I print out only the lines that use the letter 'a' two or fewer times?
I tried using awk, but don't understand the syntax to add up all the instances of 'a' and only print the lines that have 'a' two or fewer times. I understand comparing numbers based on columns, like awk '$1 <= 2', but don't know how to use that with characters as well. Any help would be appreciated.
Essentially it should print out:
This is a test
This is test a
This aa is test
For Clarity: I don't want to remove the extra As, but rather only print the lines that contain two or fewer As.
Using awk
awk '!/aaa+/' file
This is a test
This is test a
This aa is test
Do not print lines with three or more a's in a row.
Same with sed
sed '/aaa\+/d' file
This is a test
This is test a
This aa is test
By default, sed prints every line. /aaa\+/d tells sed to delete lines containing three or more consecutive a's.
like this?
kent$ grep -v 'aaa\+' file
This is a test
This is test a
This aa is test
Update
I just saw the comment. If your requirement is to count a's anywhere on the line, whether consecutive or not, see this example (with awk):
kent$ cat f
1a a
2a
3
4a a a aa
5aaaaaaaaaa
kent$ awk 'gsub(/a/,"a")<3' f
1a a
2a
3
without gsub:
kent$ awk -F'a' 'NF<4' f
1a a
2a
3
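The two counting approaches can be compared side by side: gsub returns the number of substitutions it made, and with -F'a' a line containing N a's splits into N+1 fields, so NF<4 is the same test as "fewer than three a's":

```shell
# Keep lines containing fewer than three a's, counted anywhere on the line.
printf '%s\n' '1a a' '2a' '3' '4a a a aa' '5aaaaaaaaaa' | awk 'gsub(/a/,"a")<3'

# Same idea via field splitting: N occurrences of "a" give N+1 fields.
printf '%s\n' '1a a' '2a' '3' '4a a a aa' '5aaaaaaaaaa' | awk -F'a' 'NF<4'
```

Both pipelines print the same three lines: 1a a, 2a, and 3.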

how do I paste text to a line by line text filter like awk, without having stdin echo to the screen?

I have a text in an email on a windows box that looks something like this:
100 some random text
101 some more random text
102 lots of random text, all different
103 lots of random text, all the same
I want to extract the numbers, i.e. the first word on each line. I've got a terminal running bash open on my Linux box...
If these were in a text file, I would do this:
awk '{print $1}' mytextfile.txt
I would like to paste these in, and get my numbers out, without creating a temp file.
my naive first attempt looked like this:
$ awk '{print $1}'
100 some random text
100
101 some more random text
101
102 lots of random text, all different
103 lots of random text, all the same
102
103
The interleaving of stdin and stdout makes a hash of this. I wouldn't mind if all of stdin printed first, followed by all of stdout; that is what would happen if I were to paste into 'sort', for example, but awk and sed are a different story.
a little more thought gave me this:
open two terminals. Create a fifo file. Read from the fifo on one terminal, write to it on another.
This does in fact work, but I'm lazy. I don't want to open a second terminal. Is there a way in the shell that I can hide the text echoed to the screen when I'm passing it in to a pipe, so that I paste this:
100 some random text
101 some more random text
102 lots of random text, all different
103 lots of random text, all the same
but see this?
$ awk '{print $1}'
100
101
102
103
You could use a here document. I tried the following and it worked:
$ awk '{print $1}' << END
> 100 some random text
> 101 some more random text
> 102 lots of random text, all different
> 103 lots of random text, all the same
> END
100
101
102
103
I'll try to explain what I typed:
awk '{print $1}' << END (RETURN)
(PASTE) (RETURN)
END (RETURN)
If that makes sense.
The pasted text still shows up on stdout, but the results you care about all end up afterwards, so you can easily grab them. Make sure you pick a delimiter that doesn't appear in your text to replace the END in my example!
Perhaps the best way is to use your shell's here-docs:
awk '{print $1}' <<EOF
100 some random text
101 some more random text
102 lots of random text, all different
103 lots of random text, all the same
EOF
100
101
102
103
Alternatively, you can use the END block:
awk '{a[NR]=$1} END{for (i=1;i<=NR;i++) print a[i]}'
100 some random text
101 some more random text
102 lots of random text, all different
103 lots of random text, all the same
^d
100
101
102
103
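The END-block run works non-interactively too, e.g. fed from printf instead of an interactive paste:

```shell
# Buffer the first field of every line; print them all only after EOF,
# so the pasted input and the results never interleave.
printf '%s\n' '100 some random text' '101 some more random text' '102 lots of random text' |
  awk '{a[NR]=$1} END{for (i=1;i<=NR;i++) print a[i]}'
```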
If you are set up like I am, running Windows, Xming and PuTTY, you can do this:
$ xclip -i[enter]
[right-click-paste]
[ctrl-d]
$ xclip -o | awk '{print $1}'
It requires that you have X-forwarding set up and you've installed xclip (xsel -i and xsel -o can be used instead).
Something like
stty -echo ; awk '{print $1}' ; stty echo
may help you.