Replace All first 4 spaces with a tab - shell

I am doing some documentation work, and I have a tree structure like this:
A
    BB
    C C
        DD
How can I replace only the occurrences of 2 spaces at the start of each line with '-', like:
A
--BB
--C C
----DD
I have tried sed 's/  /-/g', but this replaces all occurrences of 2 spaces; I also tried sed 's/^  /-/g', but this replaces only the first occurrence of 2 spaces. How can I do this?

The regular expression for four spaces at the beginning of a line is /^    / where I put the slashes just to demarcate the expression (they are not part of the actual regular expression, but they are used as delimiters by sed).
sed 's/^    /\t/' file
In recent sed versions, you can add an -i option to modify file in-place (that is, sed will replace the file with the modified file); on *BSD (including OSX), you need -i '' with an empty option argument.
The \t escape code for tab is also not universally supported; if that is a problem, your shell probably allows you to type a literal tab by prefixing it with ctrl-V.
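One way to sidestep the \t portability issue without typing a literal tab is to let printf produce the character. This is a minimal sketch, assuming a POSIX shell; the function name four_spaces_to_tab is made up for illustration.

```shell
# Hypothetical helper: replace four leading spaces with one tab.
four_spaces_to_tab() {
    # printf understands \t even where sed does not. Command
    # substitution strips only trailing newlines, so the tab survives.
    tab=$(printf '\t')
    sed "s/^    /${tab}/" "$@"
}
```

It can be called as four_spaces_to_tab file, or used at the end of a pipeline.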
(Your question title says "tab" but your question asks about dashes. To replace with two dashes, replace \t in the replacement part of the script with --, obviously.)
If you are trying to generalize to "any groups of two spaces at beginning of line should be replaced by a dash", this is not impossible to do in sed, but I would recommend Perl instead:
perl -pe 's%^((?:  )+)% "-" x (length($1) / 2)%e' file
This captures the match into $1; the inner parenthesized expression matches two spaces and the + quantifier says to match that as many times as possible. The /e flag allows us to use Perl code in the replacement; this piece of code repeats the character "-" as many times as the captured expression was repeated, which is conveniently equal to half its length.
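For comparison, the sed route mentioned above can be sketched with a loop. This is not part of the original answer, just one possible approach under the same rule (each leading pair of spaces becomes one dash); the function name pairs_to_dashes is hypothetical.

```shell
# Hypothetical helper: turn each leading pair of spaces into one dash.
pairs_to_dashes() {
    # :a      defines a label
    # s/.../  keeps the dashes produced so far (\1) and converts the
    #         next pair of spaces into one more dash
    # ta      jumps back to :a if the substitution succeeded
    sed -e ':a' -e 's/^\(-*\)  /\1-/' -e 'ta' "$@"
}
```

One caveat of this sketch: a line that already starts with dashes followed by two spaces would be changed as well.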

Related

Replace a specific character at any word's begin and end in bash

I need to remove the hyphen '-' character only when it matches the pattern 'space-[A-Z]' or '[A-Z]-space'. (Assuming all letters are uppercase, and space could be a space, or newline)
sample.txt
I AM EMPTY-HANDED AND I- WA-
-ANT SOME COO- COOKIES
I want the output to be
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
I've looked around for answers using sed and awk and perl, but I could only find answers relating to removing all characters between two patterns or specific strings, but not a specific character between [A-Z] and space.
Thanks heaps!!
If Perl is an option, you could try the following:
perl -pe 's/(^|(?<=\s))-(?=[A-Z])//g; s/(?<=[A-Z])-((?=\s)|$)//g' sample.txt
(?<=\s) is a zero-width lookbehind assertion which matches leading
whitespace without including it in the matched substring.
(?=[A-Z]) is a zero-width lookahead assertion which matches trailing
character between A and Z without including it in the matched substring.
As a result, only the dash characters which match the pattern above are
removed from the original text.
The second statement s/..//g is the flipped version of the first one.
Could you please try the following:
awk '{for(i=1;i<=NF;i++){if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){sub(/-/,"",$i)}}} 1' Input_file
Adding a non-one liner form of solution:
awk '
{
  for(i=1;i<=NF;i++){
    if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){
      sub(/-/,"",$i)
    }
  }
}
1
' Input_file
Output will be as follows.
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
If you can provide Extended Regular Expressions to sed (generally with the -E or -r option), then you can shorten your sed expression to:
sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g' file
Where the basic form is sed -E 's/find1/replace1/g;s/find2/replace2/g' file which can also be written as separate expressions sed -E -e 's/find1/replace1/g' -e 's/find2/replace2/g' (your choice).
The details of s/find1/replace1/g are:
find1 is
(^|\s) locate and capture at the beginning or whitespace,
followed by the '-' hyphen,
then capture the next \w (word-character); and
replace1 is simply \1\2 reinsert both captures with the first two backreferences.
The next substitution expression is similar, except now you are looking for the hyphen followed by a whitespace or at the end. So you have:
find2 being
a capture of \w (word-character),
followed by the hyphen,
followed by a capture of either a following space or the end (\s|$), then
replace2 is the same as before, just reinsert the captured characters using backreferences.
In each case the g indicates a global replace of all occurrences.
(note: the \w word-character also includes the '_' (underscore), so while unlikely you would have a hyphen and underscore together, if you do, you need to use the [A-Za-z] list instead of \w)
Example Use/Output
In your case, then output is:
$ sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g' file
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
remove the hyphen '-' character only when it matches the pattern 'space-[A-Z]' or '[A-Z]-space'. Assuming all letters are uppercase, and space could be a space, or newline
It's:
sed 's/\( \|^\)-\([A-Z]\)/\1\2/g; s/\([A-Z]\)-\( \|$\)/\1\2/g'
s - substitute
/
\( \|^\) - space or beginning of the line
- - hyphen...
\([A-Z]\) - a single upper case character
/
\1\2 - The \1 is replaced by the first \(...\) thing. So it is replaced by a space or nothing. \2 is replaced by the single upper case character found. Effectively - is removed.
/
g apply the regex globally
; - separate two s commands
s
Same as above. The $ means end of the line.
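The \| alternation used in the command above is a GNU extension to basic regular expressions. A sketch that avoids it by splitting each alternation into two commands, assuming only uppercase ASCII letters and plain spaces as in the question (the function name strip_edge_hyphens is made up):

```shell
# Hypothetical helper: drop a hyphen that sits between a space (or a
# line boundary) and an uppercase letter.
# Command 1: hyphen at the start of the line, before a letter
# Command 2: hyphen after a space, before a letter
# Command 3: hyphen after a letter, at the end of the line
# Command 4: hyphen after a letter, before a space
strip_edge_hyphens() {
    sed -e 's/^-\([A-Z]\)/\1/' \
        -e 's/ -\([A-Z]\)/ \1/g' \
        -e 's/\([A-Z]\)-$/\1/' \
        -e 's/\([A-Z]\)- /\1 /g' "$@"
}
```

Running strip_edge_hyphens sample.txt should leave EMPTY-HANDED untouched, because that hyphen has a letter on both sides.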
awk '{gsub(/ -/," ");gsub(/^-|-$/,"");gsub(/- /," ")}1' file
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES

Extract text between two special characters

Trying to extract the text between the special characters "\ and \" through sed
Ex: "\hell##$\"},
expected output : hell##$
You can do it quite easily with using a capture-group and backreference with basic regular-expressions:
sed 's/^["][\]\([^\]*\).*$/\1/'
Explanation
Normal substitution sed 's/find/replace/', where
find is ^["][\] a double-quote and \ before beginning the capture \(...\) which contains [^\]* (zero or more characters not a \), the closing of the capture \) and then .*$ the remainder of the string;
replace is \1 (the first backreference) containing the text captured between \(...\).
(note: if your "\ doesn't begin the string, remove the first '^' anchor)
Example
$ echo '"\hell##$\"},' | sed 's/^["][\]\([^\]*\).*$/\1/'
hell##$
Look things over and let me know if you have questions.
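If the string is already in a shell variable, the same extraction can be sketched without sed, using parameter expansion. This is an alternative technique, not the one from the answer above, and it assumes both markers are present in the input; the function name extract_between is hypothetical.

```shell
# Hypothetical helper: print the text between the markers "\ and \"
extract_between() {
    # Remove the shortest prefix that ends with the opening marker "\
    rest=${1#*'"\'}
    # Remove the longest suffix that starts with the closing marker \"
    result=${rest%%'\"'*}
    printf '%s\n' "$result"
}
```

With the example from the question, extract_between '"\hell##$\"},' prints hell##$.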
This might work for you (GNU sed):
sed -nE '/"\\[^\\]*\\+([^\\"][^\\]*\\+)*"/{s/"\\/\n/;s/.*\n//;s/\\"/\n/;P;D}' file
The solution comes in two parts:
Firstly, a regexp to determine whether a pair of two characters exists. This can be tricky as a negated class is insufficient because edge cases can easily defeat a simplistic approach.
Secondly, once a pair of characters does exist, the text between them must be extracted piecemeal.

Bash: remove semicolons from a line in a CSV-file

I have a CSV file with a few hundred lines, and a lot (not all) of these lines contain data (Klas/Lesgroep:;;T2B1) which I want to extract.
i.e. ;;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;
I want to delete the semicolons which are in front of Klas/Lesgroep but the number of semicolons is variable. How can I delete these semicolons in Bash ?
I'm not a native speaking Englishman so I hope it's clear to you
To remove any nonempty run of ; chars. that come directly before literal Klas/Lesgroep:
With GNU or BSD/macOS sed:
$ sed -E 's|;+(Klas/Lesgroep)|\1|' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
The s function performs string substitution (replacement):
The 1st argument is a regex (regular expression) that specifies what part of the line to match,
and the 2nd argument specifies what to replace the matching part with.
Note how I've chosen | as the regex/argument delimiter instead of the customary /, because that allows unescaped use of / chars. inside the regex.
;+ matches one or more directly adjacent ; chars.
(Klas/Lesgroep) matches literal Klas/Lesgroep and by enclosing it in (...) - making it a capture group - the match is remembered and can be referenced as \1 - the 1st capture group in the regex - in the replacement argument to s.
The net effect is that all ; chars. directly preceding Klas/Lesgroep are removed.
POSIX-compliant form:
$ sed 's|;\{1,\}\(Klas/Lesgroep\)|\1|' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
POSIX requires the less powerful and antiquated BRE syntax, where duplication symbol + must be emulated as \{1,\}, and, generally, metacharacters (, ), {, } must be \-escaped.
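Since only some lines of the file contain Klas/Lesgroep, it may help to print just those lines. The following is a sketch rather than part of the answer: it uses ;* instead of ;\{1,\} so that lines without any preceding semicolons are printed too, and the function name klas_lines is made up.

```shell
# Hypothetical helper: print only the lines that contain Klas/Lesgroep,
# with the run of semicolons directly before it removed.
klas_lines() {
    # -n suppresses default output; the p flag prints a line only
    # when the substitution matched.
    sed -n 's|;*\(Klas/Lesgroep\)|\1|p' "$@"
}
```

It can be called as klas_lines file.csv.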
With sed you can select lines in which at least one semicolon directly precedes Klas/Lesgroep and, on those lines, replace the leading run of ; with nothing (this assumes the semicolons before Klas/Lesgroep are at the start of the line):
$ sed '/;;*Klas\/Lesgroep/s/^;*//g' <<< ";;;;;;Klas/Lesgroep:;;T2B1;;;;;;;;;;"
Klas/Lesgroep:;;T2B1;;;;;;;;;;
To remove every ";" from a file, you can use sed, a stream editor that applies editing commands to each line of its input.
$ sed 's/find/replace/g' file
The g flag (global replacement) tells sed to replace every occurrence of the string in the line, not just the first.
So to remove ";" just find and replace it with nothing.
sed 's/;//g' file.csv

Need to diff two text files in linux with some patterns in filelines

File A contains
Test-1.2-3
Test1-2.2-3
Test2-4.2-3
File B contains
Test1
Expected output should be
Test-1.2-3
Test2-4.2-3
diff A B doesn't work as expected.
Kindly let me know if any solutions here.
Using grep:
grep -vf B A
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
Edit:
Optionally, you may want to use the -w option if you want a more precise match on "words" only which seems to be your case from your example since your match is followed by '-'. As DevSolar points out, you may also want to use the -F option to prevent input patterns from your file B to be interpreted as regular expressions.
grep -vFwf B A
-w, --word-regexp
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (rather than regular
expressions), separated by newlines, any of which is to be matched.
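The difference between the two grep invocations can be checked on throwaway files. This is a sketch with sample data mirroring the question; the extra line Test10-1 is an assumption added here to show what -w changes.

```shell
# Sample data from the question, plus one hypothetical line
# ("Test10-1") that is not in the original post.
tmpdir=$(mktemp -d)
printf '%s\n' 'Test-1.2-3' 'Test1-2.2-3' 'Test2-4.2-3' 'Test10-1' > "$tmpdir/A"
printf '%s\n' 'Test1' > "$tmpdir/B"

# Pattern match: drops every line containing "Test1", including "Test10-1".
plain=$(grep -vf "$tmpdir/B" "$tmpdir/A")

# Fixed-string, whole-word match: "Test10-1" is kept because "Test1"
# is followed there by the word character "0".
words=$(grep -vFwf "$tmpdir/B" "$tmpdir/A")

printf 'plain:\n%s\n\nwords:\n%s\n' "$plain" "$words"
rm -r "$tmpdir"
```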
To complement Julien Lopez's helpful answer:
If you want to ensure that lines from File B only match at the beginning of lines from File A, you can prepend ^ to each line from file B, using sed:
grep -vf <(sed 's/^/^/' fileB) fileA
grep, which by default interprets its search strings as BREs (basic regular expressions), then interprets the ^ as the beginning-of-line anchor.
If the lines in File B may contain characters that are regex metacharacters (such as ^, *, ?, ...) but should be treated as literals, you must escape them first:
grep -vf <(sed 's/[^^]/[&]/g; s/\^/\\^/g; s/^/^/' fileB) fileA
An explanation of this grim-looking - but generically robust - sed command can be found in this answer of mine.
Note:
Assumes bash, ksh, or zsh due to use of <(...), a process substitution, which makes the output from sed act as if it were provided via a file.
sed command s/^/^/ looks like it won't do anything, but the first ^, in the regex part of the call, is the beginning-of-line anchor[1], whereas the second ^, in the substitution part of the call, is a literal to place at the beginning of the line (which will later itself be interpreted as the beginning-of-line anchor in the context of grep).
[1] Strictly speaking, to sed it is the beginning-of-pattern-space anchor, because it is possible to read multiple lines at once with sed, in which case ^ refers to the beginning of the pattern space (input buffer) as a whole, not to individual lines.

Ignoring lines with blank or space after character using sed

I am trying to use sed to extract some assignments being made in a text file. My text file looks like ...
color1=blue
color2=orange
name1.first=Ahmed
name2.first=Sam
name3.first=
name4.first=
name5.first=
name6.first=
Currently, I am using sed to print all the strings after the name#.first's ...
sed 's/name.*.first=//' file
But of course, this also prints all of the lines with no assignment ...
Ahmed
Sam
# I'm just putting this comment here to illustrate the extra carriage returns above; please ignore it
Is there any way I can get sed to ignore the lines with blank or whitespace only assignments and store this to an array? The number of assigned name#.first's is not known, nor are the number of assignments of each type in general.
This is a slight variation on sputnick's answer:
sed -n '/^name[0-9]\.first=\(.\+\)/ s//\1/p'
The first part (/^name[0-9]\.first=\(.\+\)/) selects the lines you want to pass to the s/// command. The empty pattern in the s command re-uses the previous regular expression and the replacement portion (\1) replaces the entire match with the contents of the first parenthesized part of the regex. Use the -n and p flags to control which lines are printed.
sed -n 's/^name[0-9]\.\w\+=\(\w\+\)/\1/p' file
Output
Ahmed
Sam
Explanations
the -n switch suppresses the default behavior of sed: printing all lines
s/// is the skeleton for a substitution
^ match the beginning of a line
name literal string
[0-9] a digit alone
\.\w\+ a literal dot (without the backslash, a dot means any character) followed by at least one (\+) word character [a-zA-Z0-9_]
\( \) is a capturing group and \1 refers to the captured text
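Both commands above rely on \+ (and the second on \w), which are GNU extensions, and [0-9] matches a single digit only. A sketch that sticks to POSIX basic regular expressions, accepts multi-digit numbers, and skips whitespace-only values; the function name first_names is made up for illustration.

```shell
# Hypothetical helper: print the non-blank values of nameN.first=...
first_names() {
    # [0-9][0-9]*       one or more digits
    # [[:space:]]*      optional leading whitespace, not captured
    # [^[:space:]].*    the value must contain a non-whitespace character
    sed -n 's/^name[0-9][0-9]*\.first=[[:space:]]*\([^[:space:]].*\)$/\1/p' "$@"
}
```

In bash 4 or later, the output can then be read into an array with mapfile -t names < <(first_names file), which addresses the array part of the question.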

Resources