Remove specific white spaces from txt file - shell

I have a large txt file with two columns, the two columns are separated by a lot of white space:
123467, And the second part here
To remove the white space between the columns I used
sed -e "s/ /,/g" < a.txt
However it also removed the spaces between words in the second column
And the second part here
How can I remove just the spaces between the columns without affecting the words in the second column?

You could do this:
sed -E -e 's#, +#,#'
This removes the spaces after a ,. Do not use the g flag either: without it, sed replaces only the first match on each line, which is all that is needed here.

$ sed 's/  *//' file
123467,And the second part here

Since you're thinking of the file as being columns separated by commas, it makes sense to use a tool that treats the input as columns separated by commas:
awk '{$1=$1} 1' OFS=, FS=', *' input
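A quick way to compare these answers is to run them on a couple of made-up lines. The sketch below assumes the columns are separated by a comma followed by a run of spaces; the sample data and the function name strip_after_comma are hypothetical.

```shell
# Hypothetical sample: a comma, then a run of spaces, then free text.
sample='123467,     And the second part here
98765,   Another line, with a second comma'

# Remove only the spaces that follow the first comma on each line.
strip_after_comma() {
  sed 's/,  */,/'
}

out=$(printf '%s\n' "$sample" | strip_after_comma)
printf '%s\n' "$out"
```

The awk variant treats every comma as a separator, so on the second sample line it would also drop the space after the second comma.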

Related

Put the first letter of each column in eol

I have a file like this:
A_City,QQQQ
B_State,QQQQ
C_Country,QQQQ
A_Cityt,YYYY
B_State,YYYY
C_Country,YYYY
I want to add one more column at end of the line on the same file with the first letter of each column.
A_City,QQQQ,AQ
B_State,QQQQ,BQ
C_Country,QQQQ,CQ
A_Cityt,YYYY,AY
B_State,YYYY,BY
C_Country,YYYY,CY
I would like to do this with sed, but an awk solution would also help.
awk to the rescue!
$ awk '{print $0 "," substr($0,1,1) substr($0,length($0))}' file
A_City,QQQQ,AQ
B_State,QQQQ,BQ
C_Country,QQQQ,CQ
A_Cityt,YYYY,AY
B_State,YYYY,BY
C_Country,YYYY,CY
or, perhaps
$ awk -F, '{print $0 FS substr($1,1,1) substr($2,1,1)}' file
When you have only one , you can use
sed -r 's/^(.).*,(.).*/&,\1\2/' file
This might work for you (GNU sed):
sed -r 's/^|,+/&\n/g;s/$/,\n/;:a;s/\n(.).*,\n.*/&\1/;s/\n//;/\n.*,\n/ba;s/\n//g' file
Insert a newline at the start of a line or following one or more ,'s. Append an additional , and a newline to the end of the line. Append a character following a newline followed by zero or more characters followed by a , and a final newline and any following characters to its match. Remove the first newline. If there are two or more newlines repeat. Finally remove all newlines.
N.B. If the line is initially empty, this will add a , to such lines. Empty fields are catered for and will be represented by no first character.
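For lines with more than two columns, the same idea can be written as a loop over the fields in awk. This is only a sketch; the function name add_initials and the three-column sample line are made up for illustration.

```shell
# Append the first character of every comma-separated field to the line.
add_initials() {
  awk -F, '{
    initials = ""
    for (i = 1; i <= NF; i++)
      initials = initials substr($i, 1, 1)   # empty fields contribute nothing
    print $0 FS initials
  }'
}

out=$(printf '%s\n' 'A_City,QQQQ' 'B_State,YYYY,Zed' | add_initials)
printf '%s\n' "$out"
```

Like the sed version, this appends a lone , to an empty input line.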

Shell Script Replace a Specified Column with sed

I have an example dataset separated by semicolons, as below:
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
I would like to replace values in a specified column. Let's say I want to change "ZMIR" to "IZMIR", but only in the third column; the ones in the second column must stay the same.
Desired output is;
123;IZMIR;IZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;IZMIR;bob
BBB;ANKR;RRRR;ABC
I tried;
sed 's/;ZMIR;/;IZMIR;/' file.txt
The problem is that it changes every matching value in the file, not just those in the 3rd column.
I also tried;
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
and here it specifies the column but, it somehow adds spaces;
123 I;IZMIR; ZMIR 123
abc;ANKAR;aaa;999
AAA ;IZMIR; ZMIR bob
BBB;ANKR;RRRR;ABC
sed doesn't know about columns, awk does (but in awk they're called "fields"):
awk 'BEGIN{FS=OFS=";"} $3=="ZMIR"{$3="IZMIR"} 1' file
Note that since the above is doing a literal string search and replace, you don't have to worry about regexp or backreference metacharacters in the search or replacement strings, unlike in a sed solution (see https://stackoverflow.com/a/29626460/1745001).
wrt what you tried previously with awk:
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
That says: find "ZMIR" in the 2nd semi-colon-separated field and replace it with ";IZMIR;" and also change every existing ";" on the line to a blank character.
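The effect of the output separator can be seen by modifying a field with and without OFS set. A small sketch, using one line from the question; the variable names are arbitrary:

```shell
line='AAA;ZMIR;ZMIR;bob'

# Only FS is set: modifying $3 rebuilds the record with the default OFS (a space).
no_ofs=$(printf '%s\n' "$line" | awk -F';' '{sub(/^ZMIR$/, "IZMIR", $3)} 1')

# FS and OFS both set: the rebuilt record keeps its semicolons.
with_ofs=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS=";"} {sub(/^ZMIR$/, "IZMIR", $3)} 1')

printf '%s\n%s\n' "$no_ofs" "$with_ofs"
```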
To learn awk, read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
If you know exactly where the word to replace is located and how many times it occurs on that line, you could use sed with something like:
sed '3 s/ZMIR/IZMIR/2'
The leading 3 selects the third line and the trailing 2 selects the second occurrence on that line. The awk solution is still the better one; this just shows how it works in sed ;)
This might work for you (GNU sed):
sed -r 's/[^;]+/\n&\n/3;s/\nZMIR\n/IZMIR/;s/\n//g' file
Surround the required field by unique markers then replace the required string (plus markers) by the replacement string. Finally remove the unique markers.
Perl on Command Line
Input
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
$. == 1 restricts the substitution to the first row; for the second row, use $. == 2.
$F[0] is the first column; the fourth column would be $F[3].
-a -F\; splits each line into @F on the ; delimiter.
What you want:
perl -a -F\; -pe 's/$F[0]/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
For row 2 and column 2:
perl -a -F\; -pe 's/$F[1]/***/ if $. == 2' your-file
123;IZMIR;ZMIR;123
abc;***;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
Also without -a -F
perl -pe 's/123/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
To edit the file itself, add the -i option, which means edit in place. It simply finds, replaces and saves the result in the same file:
perl -i -a -F\; and so on
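Note that the substitutions above replace the first text on the line that matches the field's content, which may sit in an earlier column holding the same value. A sketch of the same row and column idea using field assignment in awk follows; the function name set_cell and its parameter names are hypothetical.

```shell
# Replace the field at (row, col) of semicolon-separated input with new.
# row, col and new are hypothetical parameter names chosen for this sketch.
set_cell() {
  awk -v row="$1" -v col="$2" -v new="$3" \
    'BEGIN{FS=OFS=";"} NR==row{$col=new} 1'
}

out=$(printf '%s\n' '123;IZMIR;ZMIR;123' 'abc;ANKAR;aaa;999' | set_cell 2 2 '***')
printf '%s\n' "$out"
```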
You need to anchor the pattern to fixed reference points in the line:
^ for the beginning of the line
an unambiguous separator pattern
^.*ZMIR and [^;]*;ZMIR match differently: the first takes everything up to the last ZMIR, because sed takes the longest possible match
Specific
sed 's/^\([^;]*;[^;]*;\)ZMIR;/\1IZMIR;/' YourFile
Generic, where Old and New are shell variables (remember that Old is used as a regex, so regex rules apply, such as escaping some characters):
Old='ZMIR'
New='IZMIR'
sed 's/^\(\([^;]*;\)\{2\}\)'"${Old}"';/\1'"${New}"';/' YourFile
In this simple case sed is an alternative, but awk is better for a complex or long line.
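The generic sed form can also take the column number as a parameter by computing how many leading fields to skip. This is a sketch under that assumption; replace_in_col, Col and Skip are names invented here, and Old and New are still inserted into the sed script unescaped.

```shell
# Replace Old with New in column Col (1-based) of semicolon-separated input.
# Old is used as a regex and New as replacement text, without any escaping.
replace_in_col() {
  Col=$1 Old=$2 New=$3
  Skip=$((Col - 1))   # number of leading fields to keep untouched
  sed 's/^\(\([^;]*;\)\{'"$Skip"'\}\)'"$Old"';/\1'"$New"';/'
}

out=$(printf '%s\n' '123;IZMIR;ZMIR;123' 'AAA;ZMIR;ZMIR;bob' | replace_in_col 3 ZMIR IZMIR)
printf '%s\n' "$out"
```

Because the pattern requires a ; after the value, this form does not match the last column of a line.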

Bash: Filtering records in a file based on multi column delimiter

Need help in Bash to filter records based on a multicolumn delimiter.
Delimiter is |^^|
Sample record
xyz#ATT.NET|^^|xyz|^^|307
Awk runs fine when used with a single-character delimiter, but not with a multi-character one.
awk -F"|^^|" "NF !=3 {print}" file.txt
Any suggestions?
The issue is that every character in your delimiter is a regexp metacharacter so you need to escape them when appropriate so awk knows you want them treated literally. This might be overkill:
awk -F'\\|\\^\\^\\|' 'NF!=3' file.txt
but I can't test it since you only provided one line of input, not the selection of lines some of which do/don't match that'd be required to test the script.
awk -F "<regex>" ...
The argument to -F is not a multi-column delimiter; it is a regular expression. A simple regex that matches a single character is what you are used to, but that is not all there is.
One way is to escape all the regex characters, as @Ed Morton answered.
Alternatively,
you can replace every |^^| with a single character that never appears in your file content, here let's say a comma
sed 's/|^^|/,/g' file.txt
xyz#ATT.NET,xyz,307
The command would be
sed 's/|^^|/,/g' file.txt | awk -F, 'NF != 3'
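Since the question gives only one record, the escaped -F form can be tried on a few made-up lines. The records below are hypothetical; the second one is deliberately missing a field.

```shell
# Hypothetical records: the second one has only two fields.
records='xyz#ATT.NET|^^|xyz|^^|307
abc#ATT.NET|^^|307
def#ATT.NET|^^|def|^^|411'

# Print records that do not have exactly three |^^|-separated fields.
bad=$(printf '%s\n' "$records" | awk -F'\\|\\^\\^\\|' 'NF != 3')
printf '%s\n' "$bad"
```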

Remove all characters existing between first n occurences of a specific character in each line in shell

Say I have txt file with characters as follows:
abcd|123|kds|Name|Place|Phone
ldkdsd|323|jkds|Name1|Place1|Phone1
I want to remove everything in each line up to and including the third occurrence of the | character. I want my output as:
Name|Place|Phone
Name1|Place1|Phone1
Could anyone help me figure this out? How can I achieve this using sed?
This would be a typical task for cut
cut -d'|' -f4- file
output:
Name|Place|Phone
Name1|Place1|Phone1
The -f4- means you want everything from the fourth field to the end. Adjust the 4 if you have a different requirement.
You could try the below sed command,
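If the number of leading fields to drop varies, the field number passed to cut can be computed. A minimal sketch, where drop_fields and n are hypothetical names:

```shell
# Drop the first n |-separated fields from each line.
drop_fields() {
  n=$1
  cut -d'|' -f"$((n + 1))-"
}

out=$(printf '%s\n' 'abcd|123|kds|Name|Place|Phone' \
                    'ldkdsd|323|jkds|Name1|Place1|Phone1' | drop_fields 3)
printf '%s\n' "$out"
```

Lines that contain no | at all are printed unchanged by cut unless -s is added.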
$ sed -r 's/^(\s*)[^|]*\|[^|]*\|[^|]*\|/\1/g' file
Name|Place|Phone
Name1|Place1|Phone1
^(\s*) captures all the spaces which are at the start.
[^|]*\|[^|]*\|[^|]*\| matches up to and including the third |. So abcd|123|kds| will be matched.
All the matched characters are replaced by the chars which are present inside the first captured group.
This might work for you (GNU sed):
sed 's/^\([^|]*|\)\{3\}//' file
or more readably:
sed -r 's/^([^|]*\|){3}//' file
sed 's/\(\([^|]*|\)\{3\}\)//' YourFile
This is a POSIX version. In a POSIX basic regular expression an unescaped | is a literal character (GNU sed treats only \| as "OR"), so it needs no escaping here.
Explanation
Replace the first 3 occurrences (\{3\}) of [ any characters other than |, followed by | (\([^|]*|\)) ] with nothing (// is an empty replacement).
You can print the last 3 fields:
awk '{print $(NF-2),$(NF-1),$NF}' FS=\| OFS=\| file
Name|Place|Phone
Name1|Place1|Phone1
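The sed and cut answers count fields from the start of the line, while the awk answer counts from the end, so they agree only when every line has exactly six fields. A small sketch with a hypothetical seven-field line shows the difference:

```shell
line='a|b|c|Name|Place|Phone|Extra'

# Drop the first three fields, counting from the start of the line.
from_front=$(printf '%s\n' "$line" | sed 's/^\([^|]*|\)\{3\}//')

# Keep the last three fields, counting from the end of the line.
from_back=$(printf '%s\n' "$line" |
  awk -F'|' -v OFS='|' '{print $(NF-2), $(NF-1), $NF}')

printf '%s\n%s\n' "$from_front" "$from_back"
```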

Remove variable parts of an input list

I have an input list from which I want to remove occurrences of a variable string. Say my input list looks as follows:
(BLA-123) some text
BLA-123 some text
BLA-123: some text
some text (BLA-123)
some text BLA-123
I would like my input list to look like:
some text
some text
some text
some text
some text
Basically, I need to remove all occurrences of any BLA-[0-9]{1,4}, which may be enclosed in ( and ) or followed by a :, from both the beginning and the end of any line in the input list.
I thought of using cut, but it is hard to achieve what I need that way. Then I thought of sed, which I believe is the way to go, but I have little to no experience with it.
Perhaps:
sed 's/ *[(]*[A-Z][A-Z]*-[0-9]\{1,4\}[):]* *//'
I've replaced BLA with an arbitrary upper-case string [A-Z][A-Z]* because I don't know whether you meant it as a meta-variable in the problem description.
If you have GNU sed, this can be slightly improved by using \? and \+:
sed 's/ *[(]\?[A-Z]\+-[0-9]\{1,4\}[):]\? *//'
These, however, convert:
some text BLA-123 more text
to:
some textmore text
which may not be what you want. If you want such a line to remain unchanged, then you can double the substitution, modifying the first so that it matches only at the start, and the second so it matches at the end:
sed 's/^ *[(]\?[A-Z]\+-[0-9]\{1,4\}[):]\? *//;s/ *[(]\?[A-Z]\+-[0-9]\{1,4\}[):]\? *$//'
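The anchored pair can be checked against the sample lines plus one line with the identifier in the middle. The sketch below uses the POSIX intervals \{0,1\} and [A-Z][A-Z]* in place of the GNU-only \? and \+; the function name strip_ids is made up.

```shell
# Strip an identifier such as BLA-123 only at the start or the end of a line.
strip_ids() {
  sed 's/^ *[(]\{0,1\}[A-Z][A-Z]*-[0-9]\{1,4\}[):]\{0,1\} *//;s/ *[(]\{0,1\}[A-Z][A-Z]*-[0-9]\{1,4\}[):]\{0,1\} *$//'
}

out=$(printf '%s\n' '(BLA-123) some text' \
                    'BLA-123: some text' \
                    'some text (BLA-123)' \
                    'some text BLA-123 more text' | strip_ids)
printf '%s\n' "$out"
```

The last line comes through unchanged because neither anchored pattern can match an identifier in the middle of the line.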
This is not very optimal... but works:
$ sed -e 's/(BLA-[0-9]*)[ ]*//g' -e 's/BLA-[0-9]*:[ ]*//g' -e 's/BLA-[0-9]*[ ]*//g' a
some text
some text
some text
some text
some text
s/(BLA-[0-9]*)[ ]*//g deletes (BLA-XXXX) plus any trailing spaces.
s/BLA-[0-9]*:[ ]*//g deletes BLA-XXXX: plus any trailing spaces.
s/BLA-[0-9]*[ ]*//g deletes BLA-XXXX plus any trailing spaces.
Here's what I came up with:
sed -E 's/[[:punct:]]?BLA-[[:digit:]]{1,4}[[:punct:]]?[[:space:]]*//'
There's a trailing space at the end of some output lines that you can eliminate by putting [[:space:]]* at the beginning.
sed 's/ *(BLA-[0-9]\{1,4\}) *//
s/ *BLA-[0-9]\{1,4\}:\{0,1\} *//' YourFile
This avoids matching an opening ( without a closing ).
You can use awk one-liner:
$ cat toto
(BLA-123) some text
BLA-123 some text
BLA-123: some text
some text (BLA-123)
some text BLA-123
$ awk '{for (i=1;i<=NF;i++) if ($i!~/BLA/) printf "%s ", $i; printf "\n"}' toto
some text
some text
some text
some text
some text
Which can be read as:
for each line (awk processes its input line by line) and for each field (NF is the number of fields, i.e. columns), if field number i does not contain BLA, print it. After each line, print "\n".
Hope this helps.
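A variant of the field-filtering idea that matches the whole identifier and does not leave a trailing space could look like the sketch below. The function name drop_id_fields is hypothetical, and the pattern accepts any number of digits rather than exactly one to four.

```shell
# Print each line without the fields that look like BLA-<digits>,
# optionally wrapped in ( ) or followed by ":".
drop_id_fields() {
  awk '{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++)
      if ($i !~ /^[(]?BLA-[0-9]+[):]?$/) {
        out = out sep $i
        sep = " "          # separator only between kept fields
      }
    print out
  }'
}

out=$(printf '%s\n' '(BLA-123) some text' \
                    'BLA-123: some text' \
                    'some text BLA-123' | drop_id_fields)
printf '%s\n' "$out"
```

Because the line is rebuilt from its fields, runs of spaces between words are collapsed to a single space.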
