How to remove lines containing only numbers, special characters or blanks after a delimiter - bash

The following code:
#!/bin/bash
osascript -e \
'tell application "Google_Chrome" to tell tab 1 of window 1 \
set t to execute javascript "document.body.innerText" \
end tell' | grep ':'
Results in output:
line1:blah blah
line2:blah 123
line3:
line4:[456] blah
Line5:blah blah
line6:[789]
line 7:
The desired output:
line1:blah blah
line2:blah 123
line4:[456] blah
I can use cut -d : -f1 to get just the left side and cut -d : -f2 to get just right side. But I can't seem to figure out how to remove blank lines or lines with only numbers and/or special characters while still preserving the structure of data.
To the best of my knowledge, what I'm trying to achieve follows this specific set of rules:
Every valid line of output contains a : (but not all lines containing : are valid)
No spaces, special characters or capital letters permitted to the left of :
Only lowercase letters, numbers and underscores [a-z] [0-9] and _ permitted to the left of :
Any line not containing letters [a-z] to right of : should be discarded. (case is not important)
Any ideas how to accomplish this?

Replace your grep with this:
... | grep -E '^[a-z0-9_]+:[^a-zA-Z]*[a-zA-Z]'
line1:blah blah
line2:blah 123
line4:[456] blah
This meets your requirements: only [a-z0-9_] characters are allowed on the left of :, and at least one of [a-zA-Z] is required on the right of :.
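A minimal sketch for checking the pattern against the sample lines from the question. Here printf stands in for the osascript pipeline, the function name filter_valid is made up for illustration, and LC_ALL=C is an added assumption so that [a-z] cannot match uppercase letters in locales with unusual collation:

```shell
# filter_valid is a hypothetical helper name, not part of the original answer.
filter_valid() {
  # Left of ':'  : one or more of [a-z0-9_], anchored at the start of the line.
  # Right of ':' : any non-letters, then at least one letter.
  LC_ALL=C grep -E '^[a-z0-9_]+:[^a-zA-Z]*[a-zA-Z]'
}

# Sample lines copied from the question.
result=$(printf '%s\n' \
  'line1:blah blah' \
  'line2:blah 123' \
  'line3:' \
  'line4:[456] blah' \
  'Line5:blah blah' \
  'line6:[789]' \
  'line 7:' | filter_valid)

printf '%s\n' "$result"
```

Only line1, line2 and line4 should survive: Line5 fails on the capital letter, "line 7" on the space, and the others on the missing letters after the colon.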


In bash, why does tr output a right bracket when replacing?

So I was looking into tr and I was playing around with the following command:
echo "test 123 new LINE" | tr -c 'A-Za-z' '[\n*]' vs echo "test 123 new LINE" | tr -c 'A-Za-z' '[\n]'. These are the different outputs:
> echo "test 123 new LINE" | tr -c 'A-Za-z' '[\n*]'
test
new
LINE
> echo "test 123 new LINE" | tr -c 'A-Za-z' '[\n]'
test]]]]]new]LINE]
Without the addition of the wildcard, it appears to be replacing each non-letter character with a right bracket character. Taking a look at the man page (https://linuxcommand.org/lc3_man_pages/tr1.html), it says that the argument [CHAR*] means "in SET2, copies of CHAR until length of SET1". So clearly it still replaces characters in SET1 based on SET2, but where is it getting the right bracket from?
Nothing to do with my job so no need to tell me about sed or awk, just came across this and was curious.
In your second command, the replacement set is not in one of the special formats
[CHAR*]
[CHAR*REPEAT]
[:<keyword>:]
[=CHAR=]
So it doesn't get any special treatment and the square brackets are treated literally: SET2 is the three characters [, \n and ]. Because of -c, SET1 is the complement of A-Za-z in ascending byte order, so its first two characters (NUL and \001) are replaced with [ and \n, respectively, and every other non-alphabetic character is replaced with ] (because the replacement set is extended by repeating its last character).
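A small sketch that reproduces both behaviours side by side, assuming GNU tr (other implementations may pad SET2 differently). The variable names are only for illustration:

```shell
# Assumes GNU tr. Here SET2 is the three literal characters '[', newline and ']',
# and the last one (']') is repeated to cover the rest of SET1.
bracketed=$(printf 'test 123 new LINE\n' | tr -c 'A-Za-z' '[\n]')
printf '%s\n' "$bracketed"

# With [\n*], SET2 is padded with newlines up to the length of SET1,
# so every non-letter becomes a newline.
starred=$(printf 'test 123 new LINE\n' | tr -c 'A-Za-z' '[\n*]')
printf '%s\n' "$starred"
```

Neither the space nor the digits in the input are among the first two characters of the complemented set, which is why only ] shows up in the first result.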

Sed to add color to column for a specific pattern?

I figured out how to colorize column 3 in green like this:
green=$'\033[1;32m';off=$'\e[m';echo -e "num co1umn1 column2 column3\n=== === === ===\n1 this is me\n2 that is you"|column -t|sed "s/[^[:blank:]]\{1,\}/$green&$off/3";unset green off
[image: CLI result]
How do I need to alter my sed command to colorize the pattern 'is' only within column 3 so that the output becomes:
[image: wanted result]
If you want to color the whole word is, you can use (with GNU sed):
sed "s/\bis\b/$green&$off/"
sed "s/\<is\>/$green&$off/"
Here, \b is a word boundary, \< is a leading word boundary and \> is a trailing word boundary.
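To see what the word-boundary patterns do without terminal colors, here is a sketch assuming GNU sed, with << and >> as visible stand-ins for $green and $off. The sample lines are made up for the demonstration:

```shell
# Assumes GNU sed (\b, \< and \> are GNU extensions).
# '<<' and '>>' are visible stand-ins for the $green and $off escape sequences.
marked=$(printf '%s\n' '1 this is me' "2 that isn't you" | sed 's/\bis\b/<<&>>/')
printf '%s\n' "$marked"
```

The "is" inside "this" and "isn't" is left alone because it is not a whole word. Note that the pattern marks the first whole word "is" anywhere on the line, not specifically the one in column 3.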
Else, you can tell sed to start looking for matches from the third line:
sed "3,$ s/[^[:blank:]]\{1,\}/$green&$off/3"
One way to do this is to ignore the first two lines of the output in sed:
sed "1,2 ! s/[^[:blank:]]\{1,\}/$green&$off/3";
Using sed
$ ... | sed "/^[[:digit:]]/s/\(\([^ ]* \+\)\{2\}\)\([^ ]*\)/\1$green\3$off/"
Modifying the echo to cover a couple other instances of is ...
$ echo -e "num co1umn1 column2 column3\n=== === === ===\n1 is is me\n2 that isn't you" | column -t
num co1umn1 column2 column3
=== === === ===
1 is is me # only colorize the 2nd occurrence of "is"
2 that isn't you # don't colorize "isn't" in 3rd column
Extending OP's current sed solution:
sed -r "3,$ s/[^[:blank:]]{1,}/XXX&XXX/3; s/XXXisXXX/${green}is${off}/; s/XXX//g"
Where:
3,$ - apply the first substitution to line numbers 3-to-EOF (ie, skip 1st 2 lines); the two later substitutions run on every line but only change lines that contain XXX
first we offset the 3rd column values with XXX bookends (choose a set of characters that you know won't show up anywhere in the data)
then colorize XXXisXXX (removing the XXXs at the same time)
then remove any remaining XXX (from 3rd column in other rows)
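The three steps can be tried without colors. This sketch uses << and >> as visible stand-ins for ${green} and ${off}, skips column -t, and feeds the sample rows directly with printf; the variable name bookended is made up:

```shell
# -E is used in place of -r; both enable extended regular expressions in GNU sed.
# '<<' and '>>' stand in for the color escape sequences.
bookended=$(printf '%s\n' \
  'num co1umn1 column2 column3' \
  '=== === === ===' \
  '1 is is me' \
  "2 that isn't you" |
  sed -E '3,$ s/[^[:blank:]]{1,}/XXX&XXX/3; s/XXXisXXX/<<is>>/; s/XXX//g')

printf '%s\n' "$bookended"
```

Only the third field of the row "1 is is me" ends up wrapped; the "isn't" row gets its temporary XXX bookends stripped again by the last substitution.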

How to convert a line into camel case?

This picks all the text on a single line after a pattern match and converts it to camel case using non-alphanumeric characters as separators, removing the spaces at the beginning and at the end of the resulting string. Two questions: (1) it doesn't replace when there are 2 consecutive non-alphanumeric chars, e.g. "2, " in the example below; (2) is there a way to do everything using a single sed command instead of using grep, cut, sed and tr?
$ echo " hello
world
title: this is-the_test string with number 2, to-test CAMEL String
end! " | grep -o 'title:.*' | cut -f2 -d: | sed -r 's/([^[:alnum:]])([0-9a-zA-Z])/\U\2/g' | tr -d ' '
ThisIsTheTestStringWithNumber2,ToTestCAMELString
To answer your first question, change [^[:alnum:]] to [^[:alnum:]]+ to match one or more non-alnum chars.
You may combine all the commands into a GNU sed solution like
sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
Details
-En - POSIX ERE syntax is on (E) and default line output is suppressed with n
/.*title: *(.*[[:alnum:]]).*/ - matches a line having title: capturing all after it up to the last alnum char into Group 1 and matching the rest of the line
{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp} - if the line is matched,
s//\1/ - remove all but Group 1 pattern (received above)
s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/ - match and capture start of string or 1+ non-alnum chars into Group 1 (with ([^[:alnum:]]+|^)) and then capture an alnum char into Group 2 (with ([0-9a-zA-Z])) and replace with uppercased Group 2 contents (with \U\2).
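As a quick check, the combined command can be run on the sample input from the question. This sketch assumes GNU sed, since \U in the replacement is a GNU extension, and camel_title is a made-up helper name:

```shell
# Assumes GNU sed. camel_title is a hypothetical helper name for this sketch.
camel_title() {
  sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
}

# Same four input lines as in the question.
camel=$(printf '%s\n' ' hello' 'world' \
  'title: this is-the_test string with number 2, to-test CAMEL String' \
  'end! ' | camel_title)

printf '%s\n' "$camel"
```

Because runs of non-alphanumeric characters are consumed as a whole, the ", " after the 2 disappears along with the other separators.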

Reverse four-letter words with sed in Unix

How can I reverse every four-letter word with sed?
For example:
the year was 1815.
Reverse to:
the raey was 5181.
This is my attempt:
cat filename | sed 's/\([a-z]*\) *\([a-z]*\)/\2, \1/'
But it does not work as I intended.
Not sure it is possible to do it with GNU sed for all cases. If _ doesn't occur immediately before/after four-letter words, you can use
sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
\b is a word boundary, where a word character is any letter, digit or underscore. So \b ensures that only whole words are matched, not parts of words
$ echo 'the year was 1815.' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
the raey was 5181.
$ echo 'two time five three six good' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
two emit evif three six doog
$ # but won't work if there are underscores around the words
$ echo '_good food' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
_good doof
A tool with lookaround support would work for all cases
$ echo '_good food' | perl -pe 's/(?<![a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])(?![a-z0-9])/$4$3$2$1/gi'
_doog doof
(?<![a-z0-9]) and (?![a-z0-9]) are negative lookbehind and negative lookahead respectively
Can be shortened to
perl -pe 's/(?<![a-z0-9])[a-z0-9]{4}(?![a-z0-9])/reverse $&/gie'
which uses the e modifier to place Perl code in the substitution section. This form makes it easy to change the length of the words to be reversed
A shorter GNU sed solution, which also works when a four-character word contains _ (using \w rather than . so that a match cannot span a space, as in "to a"):
sed -r 's/\<(\w)(\w)(\w)(\w)\>/\4\3\2\1/g'
The following awk may also help. It was tested in GNU awk, only with the provided sample input, and it hard-codes which fields get reversed (the 2nd and the last).
echo "the year was 1815." |
awk '
function reverse(val){
  num=split(val, array,"");
  i=array[num]=="."?num-1:num;
  for(;i>q;i--){
    var=var?var array[i]:array[i]
  };
  printf (array[num]=="."?var".":var);
  var=""
}
{
  for(j=1;j<=NF;j++){
    printf("%s%s",j==NF||j==2?reverse($j):$j,j==NF?RS:FS)
  }
}'
This might work for you (GNU sed):
sed -r '/\<\w{4}\>/!b;s//\n&\n/g;s/^[^\n]/\n&/;:a;/\n\n/!s/(.*\n)([^\n])(.*\n)/\2\1\3/;ta;s/^([^\n]*)(.*)\n\n/\2\1/;ta;s/\n//' file
If there are no strings of the length required to reverse, bail out.
Prepend and append newlines to all required strings.
Insert a newline at the start of the pattern space (PS). The PS is divided into two parts, the first line will contain the current word being reversed. The remainder will contain the original line.
Each character of the word to be reversed is inserted at the front of the first line and removed from the original line. When all the characters in the word have been processed, the original word will have gone and only the bordering newlines will exist. These double newlines are then replaced by the word in the first line and the process is repeated until all words have been processed. Finally the newline introduced to separate the working line and the original is removed and the PS is printed.
N.B. This method may be used to reverse words of other lengths by changing the first regexp. Words whose length falls within a range may also be reversed, e.g. /\<\w{2,4}\>/ will reverse all words between 2 and 4 characters long.
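It's a recurrent problem, so there is a command-line utility called "rev" (a separate program, not a bash builtin).
echo "$(echo the | rev) $(echo year | rev) $(echo was | rev) $(echo 1815 | rev)"
OR
echo "the year was 1815." | rev | tr ' ' '\n' | tac | tr '\n' ' '
Note that both commands reverse every word, not only the four-letter ones.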

Copy text from one line and create a new line with that text under it

I have a text file in which I want to find every occurrence of ID:= "abc123". When it finds one, I want it to take the value abc123 and create a new line containing a set string, newId:= "abc123". How can I do this within the terminal?
I'd like to use bash. Below is an example: find the string "ID": ", copy the value (abc123) and make a new line with this data.
"ID": "abc123"
"newID": "abc123"
You can do this:
sed -e 's/^"ID": "\(.*\)"/&\
"newID": "\1"/' myfile.txt
First, I'll try to explain the regular expression that searches for matches:
^ Matches the start of the line
"ID": " Matches that exact string
\(.*\) Matches a sequence of zero or more (*) of any character (.). Placing this expression between backslashed parentheses creates a "capture", which allows us to store the resulting part of the match into an auxiliary variable \1.
" Matches the double-quote character
When it finds a match, it replaces it with:
& the match itself. This operator is an auxiliary variable that represents what was matched.
\<new-line> the backslash followed by an actual new line character escapes a new line, i.e. it allows us to print a new line character into the replacement
"newID": " prints that exact string
\1 prints the contents of our capture, so it prints the ID we found
" prints a double quote character.
Hope this helps =)
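A minimal way to try the substitution above without a file, feeding two made-up lines through a pipe. The helper name add_new_id is hypothetical, and the second input line is only there to show that non-matching lines pass through unchanged:

```shell
# add_new_id is a hypothetical helper name; the input lines are made up for the demo.
add_new_id() {
  # The backslash followed by a real newline inserts a line break in the replacement.
  sed -e 's/^"ID": "\(.*\)"/&\
"newID": "\1"/'
}

with_new_id=$(printf '%s\n' '"ID": "abc123"' 'other line' | add_new_id)
printf '%s\n' "$with_new_id"
```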
Try doing this (GNU sed):
sed -r 's#^"ID": "([a-zA-Z0-9]+)"#&\n"newID": "\1"#' file.txt
sed : the executable
-r : extended mode (no need to backslash parentheses)
s : we perform a substitution, skeleton is s#origin#replacement# (the separator can be anything)
^ : means start of line in regex
( ) : parentheses are a capture
& : the whole matched text, so the original ID line is kept
\n : a newline in the replacement (GNU sed)
"newID": is the start of the new string
\1 : is the end of the substituted string (the captured string)
Considering your question is very vague, I made some assumptions, which will become apparent in my implementation.
INPUT FILE -- call it t
ID="one"
dkkd
ID="two"
ffkjf
ID="three"
ldl
Command run on input file
for line in `cat t`; do newID=`echo $line | grep ID | cut -d= -f2`; if [[ "$newID" != "" ]]; then echo $line >> t2; echo newID=$newID >> t2; else echo $line >> t2; fi; done
OUTPUT FILE -- Name is t2 (apparent from the command)
ID="one"
newID="one"
dkkd
ID="two"
newID="two"
ffkjf
ID="three"
newID="three"
ldl
Basically this command goes line by line through the file (in this case called t) and looks for an ID line. If it finds one, it gets its value, prints the original line with the ID and then prints another one with a newID right after it. If the line in question does not have an ID, it just prints the line itself.
Things to note:
If you have any other line in the file that contains "ID" in it but is not the normal ID that you requested, this will not work.
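One possible way to soften that caveat, sketched under the assumption that real ID lines always start with ID= as in the sample file: read the input line by line and only react to lines with that prefix. The function name add_new_ids is made up, and it writes to standard output rather than to t2:

```shell
# add_new_ids is a hypothetical helper; it assumes ID lines start with 'ID='.
add_new_ids() {
  # IFS= and -r keep leading blanks and backslashes in each line intact.
  while IFS= read -r line; do
    printf '%s\n' "$line"
    case $line in
      ID=*) printf 'new%s\n' "$line" ;;   # ID="one" becomes newID="one"
    esac
  done
}

out=$(printf '%s\n' 'ID="one"' 'dkkd' 'ID="two"' | add_new_ids)
printf '%s\n' "$out"
```

Reading with while/read instead of for line in `cat t` also keeps lines containing spaces in one piece.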
