SED bash script Assistance - bash

I'm trying to follow everyone my friend is following (all 1,522 of them),
and I got a text file from his Twitter page. I want to see just the last word of a line, when that word begins with #.
Example:
Podcaster, broadcaster and tech pundit. The Tech Guy on the Premiere Radio
Networks. Live at live.twit.tv For my link feed follow #links_for_twit
(Line-wrapped to remove hateful horizontal scrollbar.)
I want that to turn into #links_for_twit.

Use awk instead:
awk '$NF ~ /^#/ {print $NF}'

You mean, like:
grep -o '#[a-zA-Z_0-9]*$' tweets.txt
?

If you're wanting to use sed, try this:
sed -n 's/.*\(#.*\)/\1/p'
-n: don't print anything unless asked
s/.*\(#.*\): capture everything from the last '#' to the end of the line
/\1/: replace the whole line with the captured bit
p: print if a substitution was made
Hope that helps
EDIT: I just saw the complaint below about email addresses. You can add \s just before the # to ensure there's whitespace in front of it (\s is a GNU sed extension): sed -n 's/.*\s\(#.*\)/\1/p'
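A quick way to check the sed command above is to run it on the sample line from the question. This is only a sketch; the variable names line, tag and none are made up for the illustration.

```shell
# Second half of the wrapped bio from the question.
line='Networks. Live at live.twit.tv For my link feed follow #links_for_twit'

# -n suppresses default output; the p flag prints only when the substitution matched.
tag=$(printf '%s\n' "$line" | sed -n 's/.*\(#.*\)/\1/p')

# A line without any # produces no output at all.
none=$(printf 'no hashtag here\n' | sed -n 's/.*\(#.*\)/\1/p')

printf '%s\n' "$tag"
```

Because .* is greedy, the capture starts at the last # on the line, so a line with several hashtags yields only the final one.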

If you have GNU grep, you could use a Perl-flavoured regex to ensure the # is at the start of a word:
grep -Po '(?<=^|\s)#\w+' filename

Related

How to insert a specific character at a specific line of a file using sed or awk?

I want to use a command to edit a specific line of a file instead of using vi. If the line starts with #, remove the # to uncomment it. Otherwise, add a # to comment it out. I'd like to use sed or awk, but my attempts don't work as expected.
This is the file.
what are you doing now?
what are you gonna do? stab me?
this is interesting.
This is a test.
go big
don't be rude.
For example, I want to add a # at the beginning of line 4 (This is a test.) if it doesn't start with #, and remove the # if it does.
I've already tried via sed & gawk (awk)
gawk -i inplace '$1!="#" {print "#",$0;next};{print substr($0,3,length-1)}' file
sed -i /test/s/^#// file # make it uncomment
sed -i /test/s/^/#/ file # make it comment
I don't know how to express if/else in sed. I can only handle one direction per command and need a second command with another regex for the opposite.
The gawk version works on the target line, but it messes up the rest of the file.
This might work for you (GNU sed):
sed '4{s/^/#/;s/^##//}' file
On line 4, prepend a # to the line, and if there are then two #'s at the start, remove them both.
Could also be written:
sed '4s/^/#/;4s/^##//' file
This will remove # from the start of line 4 or add it if it wasn't already there:
sed -i '4s/^#/\n/; 4s/^[^\n]/#&/; 4s/^\n//' File
The above assumes GNU sed. If you have BSD/MacOS sed, some minor changes will be required.
When sed reads a new line, the one thing that we know for sure about the new line is that it does not contain \n. (If it did, it would be two lines, not one.) Using this knowledge, the script works by:
4s/^#/\n/
If the fourth line starts with #, replace # with \n. (The \n serves as a notice that the line had originally been commented out.)
4s/^[^\n]/#&/
If the fourth line now starts with anything other than \n (meaning that it was not originally commented), put a # in front.
4s/^\n//
If the fourth line now starts with \n, remove it.
Alternative: Modifying lines that contain test
To comment/uncomment lines that contain test:
sed '/test/{s/^#/\n/; s/^[^\n]/#&/; s/^\n//}' File
Alternative: using awk
The exact same logic can be applied using awk. If we want to comment/uncomment line 4:
awk 'NR==4 {sub(/^#/, "\n"); sub(/^[^\n]/, "#&"); sub(/^\n/, "")} 1' File
If we want to comment/uncomment any line containing test:
awk '/test/ {sub(/^#/, "\n"); sub(/^[^\n]/, "#&"); sub(/^\n/, "")} 1' File
Alternative: using sed but without newlines
To comment/uncomment any line containing test:
sed '/test/{s/^#//; t; s/^/#/; }' File
How it works:
s/^#//; t
If the line begins with #, then remove it.
t tells sed that, if the substitution succeeded, then it should skip the rest of the commands.
s/^/#/
If we get to this command, that means that the substitution did not succeed (meaning the line was not originally commented out), so we insert #.
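The t-based toggle can be tried out on a two-line sample. The sketch below wraps it in a helper called toggle_comment, which is a made-up name, and splits the script into -e fragments as an assumption that this keeps it portable beyond GNU sed.

```shell
# toggle_comment is a hypothetical helper, not part of sed.
# $1 is the line number to toggle; the text is read from stdin.
toggle_comment() {
    sed -e "${1}{" -e 's/^#//' -e 't' -e 's/^/#/' -e '}'
}

# First pass comments line 2; second pass uncomments it again.
once=$(printf 'go big\nThis is a test.\n' | toggle_comment 2)
twice=$(printf '%s\n' "$once" | toggle_comment 2)

printf '%s\n---\n%s\n' "$once" "$twice"
```

Running the helper twice on the same line returns the original text, which is a handy sanity check for any toggle.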
If you end up on a system with a sed that doesn't support in-place editing, you can fall back to its uncle ed:
ed -s file 2>/dev/null <<EOF
4 s/^/#/
s/^##//
w
q
EOF
(Standard error is redirected to /dev/null because in ed, unlike sed, it's an error if s doesn't replace anything and a question mark is thus printed to standard error.)
$ awk 'NR==4{$0=(sub(/^#/,"") ? "" : "#") $0} 1' file
what are you doing now?
what are you gonna do? stab me?
this is interesting.
#This is a test.
go big
don't be rude.
$ awk 'NR==4{$0=(sub(/^#/,"") ? "" : "#") $0} 1' file |
awk 'NR==4{$0=(sub(/^#/,"") ? "" : "#") $0} 1'
what are you doing now?
what are you gonna do? stab me?
this is interesting.
This is a test.
go big
don't be rude.

Adding a new line to a text file after 5 occurrences of a comma in Bash

I have a text file that is basically one giant excel file on one line in a text file. An example would be like this:
Name,Age,Year,Michael,27,2018,Carl,19,2018
I need to change every third occurrence of a comma into a new line so that I get
Name,Age,Year
Michael,27,2018
Carl,19,2018
Please let me know if that is too ambiguous and as always thank you in advance for all the help!
With GNU sed:
sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'
To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:
sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'
In the regular expression, [^,]*, is "zero or more characters other than , followed by a ,"; in other words, it is a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.
Regardless of what Linux's man sed says, the -E flag is an extension to POSIX sed, which causes sed to use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works on BSD sed, used by default on Mac OS X. (Thanks to @EdMorton for the note.)
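The "one less than the number of fields" rule can be wrapped in a small function so the repeat count is computed for you. This is a sketch only: split_every is a hypothetical name, and the \n in the replacement assumes GNU sed.

```shell
# split_every is a hypothetical helper.
# $1 = number of fields wanted per output line; input is read from stdin.
split_every() {
    # The repeat count is fields-per-line minus one, as explained above.
    sed -E "s/(([^,]*,){$(( $1 - 1 ))}[^,]*),/\1\n/g"
}

rows=$(printf 'Name,Age,Year,Michael,27,2018,Carl,19,2018\n' | split_every 3)
printf '%s\n' "$rows"
```

The same caveat applies as for the original command: quoted fields containing commas are not handled.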
With GNU awk for multi-char RS:
$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
With any awk:
$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
Try this:
$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811
$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981
Or, ,[1-9][0-9]\{3\} if you don't want to put [0-9] 3 more times for the YYYY part.
PS: This solution gives you only YYYY for the year. Even if the data for YYYY is 19811 (a typo, for instance), you'll still get 1981.
You are looking for 3 fragments, each without a comma and separated by a comma.
The last fields can give problems (not ending with a comma and maybe only two fields).
The next command looks fine.
grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile
This might work for you (GNU sed):
sed 's/,/\n/3;P;D' file
Replace the third , with a newline, print the first line, delete it, and repeat on what is left.

How can extract word between special character and other words

I am trying to find a way to extract a word that sits between a special character and other words.
Example of the text:
description "CST 500M TEST/VPNGW/11040 X {} // test"
description "test2-VPNGW-110642 -VPNGW"
I am trying to get only the word that includes VPNGW:
TEST/VPNGW/11040
test2-VPNGW-110642
I tried with grep and awk, but my knowledge doesn't go far enough.
Printing with awk '{$1=""; $2=""; ... doesn't work because the word is not always in the same position.
Thanks for the help!
With grep you can output only the part of the string that matches the regex:
grep -o '[^ "]\+VPNGW[^ "]\+' file.name
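To see what that pattern picks out, here it is run against the two sample lines from the question. This sketch uses the -E spelling of the same regex (+ instead of \+) on the assumption that it is more portable; sample and words are made-up variable names.

```shell
# The two description lines from the question.
sample='description "CST 500M TEST/VPNGW/11040 X {} // test"
description "test2-VPNGW-110642 -VPNGW"'

# -o prints only the matched part; the match must have at least one
# non-space, non-quote character on each side of VPNGW.
words=$(printf '%s\n' "$sample" | grep -Eo '[^ "]+VPNGW[^ "]+')

printf '%s\n' "$words"
```

The trailing -VPNGW" on the second line is skipped because nothing but a quote follows VPNGW there.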
You could try something like:
grep -Eoi 'test.*[0-9]'
Of course this would be greedy and if there is another number after the ones in the required string it will grab up to there. Normally I would suggest an inverted test to stop at the thing you don't want:
grep -Eoi 'test[^ ]+'
The problem with this is like in your first example, there is more than one occurrence of the string 'test' and so the output for the first example is:
TEST/VPNGW/11040
test"
Of course knowing what your real data looks like you can make your own decision on what might best suit
You could go with the Perl regex engine in GNU grep (-P instead of -E) and use a look-ahead:
grep -Poi 'test[^ ]+(?= )'
Again though, if you have the string 'test' somewhere else on the line followed by a single space, this will still not work as desired.
Lastly, awk can do the job but you would need to cycle through each item or set RS to white space:
Option 1:
awk '{for(i=1;i<=NF;i++)if(tolower($i) ~ /test.*[0-9]/)print $i}'
Option 2:
awk 'tolower($0) ~ /test.*[0-9]/' RS="[[:space:]]+"
awk '/test2/{sub(/"/,"")}$0{print $4}/test2/{print $2}' file
TEST/VPNGW/11040
test2-VPNGW-110642

Sed is not replacing all occurrences of pattern

I've got the following variable LINES with the format date;album;song;duration;singer;author;genre.
August 2013;MDNA;Falling Free;00:31:40;Madonna;Madonna;Pop
August 2013;MDNA;I don't give a;00:45:40;Madonna;Madonna;Pop
August 2013;MDNA;I'm a sinner;01:00:29;Madonna;Madonna;Pop
August 2013;MDNA;Give Me All Your Luvin';01:15:02;Madonna;Madonna;Pop
I want to output author-song, so I made this script:
echo $LINES | sed s_"^[^;]*;[^;]*;\([^;]*\);[^;]*;[^;]*;\([^;]*\)"_"\2-\1"_g
The desired output is:
Madonna-Falling Free
Madonna-I don't give a
Madonna-I'm a sinner
Madonna-Give Me All Your Luvin'
However, I am getting this:
Madonna-Falling Free;Madonna;Pop August 2013;MDNA;I don't give a;00:45:40;Madonna;Madonna;Pop August 2013;MDNA;I'm a sinner;01:00:29;Madonna;Madonna;Pop August 2013;MDNA;Give Me All Your Luvin';01:15:02;Madonna;Madonna;Pop
Why?
EDIT: I need to use sed.
When I run your sed script on your input, I get this output:
Madonna-Falling Free;Pop
Madonna-I don't give a;Pop
Madonna-I'm a sinner;Pop
Madonna-Give Me All Your Luvin';Pop
which is fine except for the extra ;Pop - you just need to add .*$ to the end of your regex so that the entire line is replaced.
Based on your reported output, I'm guessing your input file is using a different newline convention from what sed expects.
In any case, this is a pretty silly thing to use sed for. Much better with awk, for instance:
awk 'BEGIN {FS=";";OFS="-"} {print $5,$3}'
or, slightly more tersely,
awk -F\; -vOFS=- '{print $5,$3}'
If you want sed to see more than one line of input, you must quote the variable to echo:
echo "$LINES" | sed ...
Note that I'm not even going to try to evaluate the correctness of your sed script; using sed here is a travesty, given that awk is so much better suited to the task.
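The effect of the quotes is easy to check in bash or a POSIX sh. In this sketch the variable is called data rather than LINES (a renaming assumed here because bash can overwrite LINES with the terminal height), and the sed pattern is the asker's with .*$ appended so the whole line is consumed.

```shell
# Two of the sample records, one per line.
data="August 2013;MDNA;Falling Free;00:31:40;Madonna;Madonna;Pop
August 2013;MDNA;I'm a sinner;01:00:29;Madonna;Madonna;Pop"

# Unquoted: the shell splits on whitespace and echo rejoins with spaces.
unquoted=$(echo $data | awk 'END { print NR }')
# Quoted: the embedded newline survives, so sed sees two lines.
quoted=$(echo "$data" | awk 'END { print NR }')

# author-song, using the sixth and third fields.
pairs=$(echo "$data" |
    sed 's/^[^;]*;[^;]*;\([^;]*\);[^;]*;[^;]*;\([^;]*\).*$/\2-\1/')

printf 'unquoted: %s line(s), quoted: %s line(s)\n%s\n' "$unquoted" "$quoted" "$pairs"
```

With the unquoted form sed receives a single long line, which matches the output reported in the question.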
It looks like sed is viewing your entire sample text as a single line. So it is performing the operation requested and then leaving the rest unchanged.
I would look into the newline issue first. How are you populating $LINES?
You should also add to the pattern that seventh field in your input (genre), so that the expression actually does consume all of the text that you want it to. And perhaps anchor the end of the pattern on $ or \b (word boundary) or \s (a spacey character) or \n (newline).
If your format is absolutely permanent, just try below:
echo $line | sed 's#.*;.*;\(.*\);.*;.*;\(.*\);.*#\2-\1#'

Remove nth character from middle of string using Shell

I've been searching google for ever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for SED, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but i can't seem to get that installed in CYGWIN. Please and thank you!
I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).
This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape parentheses, YMMV.
I also do not grasp the concept of how to construct a regular
expression for SED, so I was hoping someone could explain this to me.
The basic notation that people are telling you here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts that are in brackets. Those parts can then be referred to in the REPLACEMENT part of the command. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that the version of sed I'm using defaults to the "basic" RE dialect, where the brackets in expressions need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed (which you'd find in Linux), you can get the same results with the -r option instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)
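For completeness, the capture-group idea applied to the line from the question might look like the sketch below. The pattern assumes the line always ends in a two-digit, two-digit, four-digit group; fixed is a made-up variable name.

```shell
# Capture the three date parts at the end of the line and glue them together,
# keeping the comma that separates them from the preceding field.
fixed=$(printf '2222,H,73.82,04,07,2012\n' |
    sed -E 's/,([0-9]{2}),([0-9]{2}),([0-9]{4})$/,\1\2\3/')

printf '%s\n' "$fixed"
```

Anchoring on $ rather than counting characters means the command keeps working if the earlier columns change width.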
It should work :
sed -e 's~,~~4g' file.txt
This removes the 4th and all following commas (combining a number with the g flag is a GNU sed extension).
echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
sed -E 's/(..),(..),(....)$/\1\2\3/' myfile.txt

Resources