Remove comma and change word order - bash

I have a text file in which every line has this format (the number of words before and after the comma may vary, but it is at least one before and one after it):
some words,a few other words
And I want its content to be displayed on the terminal like this:
a few other words some words
How can I do this? The only thing I can think of is using the tr command to replace the comma with a space, but I have no clue as to how to change the order of the words.
Any help would be appreciated.

If you don't mind using awk, you could use the -F option:
$>cat f
some words,a few other words
foo,bar
$>awk -F',' '{print $2,$1}' f
a few other words some words
bar foo
Another option is using sed:
$>sed 's/\(.*\),\(.*\)/\2 \1/g' f
a few other words some words
bar foo
Finally, with a while loop:
$>while IFS=, read part1 part2; do echo "$part2" "$part1";done <f
a few other words some words
bar foo
You have to be sure that there is only 1 comma on each line though…
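To illustrate the caveat about multiple commas, here is a minimal sketch that assumes you want to split at the first comma only. It uses plain shell parameter expansion; the function name swap_at_first_comma is made up for this example and does not appear in the answers above.

```shell
#!/bin/bash
# Hypothetical helper: swap the text before and after the FIRST comma of a line.
swap_at_first_comma() {
  line="$1"
  head="${line%%,*}"   # everything before the first comma
  tail="${line#*,}"    # everything after the first comma
  printf '%s %s\n' "$tail" "$head"
}

swap_at_first_comma 'some words,a few other words'
swap_at_first_comma 'a,b,c'
```

For the input a,b,c this prints "b,c a", which matches the while loop above. The awk command would print "b a" (the third field is dropped), and the sed command would print "c a,b" because .* is greedy and splits at the last comma.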

cat yourfile.txt|sed -E 's/(.*),(.*)/\2 \1/'

This is what AWK is for (when defining field separator to be the comma):
$ echo "some words,a few other words" | awk -F, '{ print $2,$1 }'
a few other words some words
Edit:
Just noticed that @fredtantini had the same (accepted) solution. I'll leave this here because it shows the solution more concisely (it doesn't rely on an external file).

Related

Duplicate entries in file

I have a file with content as below,
123 ABC
12345 ABC-test
In the shell script, I need the exact entry rather than both results, but I am unable to get it.
For example:
grep "ABC"
returns both the entries, but I want a specific entry, i.e., if I search for "ABC", I should get only "123 ABC" and not the other entry.
Since you consider words to be whitespace-separated chunks, it is easier to use awk here since it reads lines (records) and splits them into fields (non-whitespace chunks) by default:
awk '$2=="ABC"' file > newfile
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' file > newfile
Here, the first awk will output all lines where the second word is ABC. The second awk outputs all lines with ABC followed/preceded with a whitespace or at start/end of the line.
See the online demo:
#!/bin/bash
s='123 ABC
12345 ABC-test'
awk '$2=="ABC"' <<< "$s"
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' <<< "$s"
Output of each command:
123 ABC
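You need a proper regex (regular expression). In this case you want only those lines where ABC is not surrounded by other word characters (that is, it sits on word boundaries):
grep -e '\bABC\b'
The -e switch only introduces the pattern; extended regular expressions are enabled with -E, and \b is a GNU extension that works without it. Note that \b also matches between C and -, so this pattern still matches "12345 ABC-test"; use it only when word boundaries are the distinction you need. See also a regex tutorial, e.g. https://www.regular-expressions.info/tutorial.html.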
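If the entries are delimited by whitespace rather than by word boundaries, the awk regex from the earlier answer can also be used with grep -E. This is a sketch on the sample data from the question; the variable names are only for illustration.

```shell
#!/bin/bash
# Sample data from the question.
s='123 ABC
12345 ABC-test'

# Match ABC only when it is delimited by whitespace or by the line edges.
exact=$(printf '%s\n' "$s" | grep -E '(^|[[:space:]])ABC([[:space:]]|$)')
printf '%s\n' "$exact"
```

Only "123 ABC" is printed, because in "ABC-test" the ABC is followed by a hyphen rather than by whitespace or the end of the line.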

Shell Script Replace a Specified Column with sed

I have an example dataset separated by semicolons, as below:
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
I would like to replace values in a specified column. Let's say I want to change "ZMIR" to "IZMIR", but only in the third column; the values in the second column must stay the same.
Desired output is;
123;IZMIR;IZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;IZMIR;bob
BBB;ANKR;RRRR;ABC
I tried;
sed 's/;ZMIR;/;IZMIR;/' file.txt
the problem is that it changes all the values on the file not just the 3rd one.
I also tried;
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
and here it specifies the column but, it somehow adds spaces;
123 I;IZMIR; ZMIR 123
abc;ANKAR;aaa;999
AAA ;IZMIR; ZMIR bob
BBB;ANKR;RRRR;ABC
sed doesn't know about columns, awk does (but in awk they're called "fields"):
awk 'BEGIN{FS=OFS=";"} $3=="ZMIR"{$3="IZMIR"} 1' file
Note that since the above is doing a literal string search and replace, you don't have to worry about regexp or backreference metacharacters in the search or replacement strings, unlike in a sed solution (see https://stackoverflow.com/a/29626460/1745001).
wrt what you tried previously with awk:
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
That says: find "ZMIR" in the 2nd semi-colon-separated field and replace it with ";IZMIR;" and also change every existing ";" on the line to a blank character.
To learn awk, read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
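The same awk idea can be parameterized. The following is a sketch, not part of the answer above: the function name replace_in_column and the variable names col, old and new are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical helper: in column `col`, replace fields equal to `old` with `new`.
# Reads semicolon-separated lines on stdin and writes the result to stdout.
replace_in_column() {
  awk -v col="$1" -v old="$2" -v new="$3" \
      'BEGIN { FS = OFS = ";" } $col == old { $col = new } 1'
}

printf '%s\n' '123;IZMIR;ZMIR;123' 'AAA;ZMIR;ZMIR;bob' | replace_in_column 3 ZMIR IZMIR
```

Only the third field changes, so the ZMIR in the second column of the second line is left alone. Be aware that awk interprets backslash escapes in values passed with -v.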
If you exactly know where the word to replace is located and how many of them are in that line you could use sed with something like:
sed '3 s/ZMIR/IZMIR/2'
The 3 at the beginning selects the third line, and the 2 at the end selects the second occurrence on that line. The awk solution is better, though; this is just so you know how it works in sed ;)
This might work for you (GNU sed):
sed -r 's/[^;]+/\n&\n/3;s/\nZMIR\n/IZMIR/;s/\n//g' file
Surround the required field by unique markers then replace the required string (plus markers) by the replacement string. Finally remove the unique markers.
Perl on Command Line
Input
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
$. == 1 restricts the substitution to the first row; for the second row use $. == 2.
$F[0] is the first column, so the pattern is that column's value and s/// replaces its first occurrence on the line; the fourth column is $F[3].
-a -F\; splits each line into @F using ; as the delimiter.
What you want:
perl -a -F\; -pe 's/$F[0]/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
for row == 2 and column == 2
perl -a -F\; -pe 's/$F[1]/***/ if $. == 2' your-file
123;IZMIR;ZMIR;123
abc;***;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
Also without -a -F
perl -pe 's/123/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
If you want to edit the file, add the -i option, which means edit in place: it finds, replaces, and saves to the same file.
perl -i -a -F\; and so on
You need to anchor the pattern with some absolute references:
^ for the beginning of the line
an unambiguous separator pattern
^.*ZMIR and [^;]*;ZMIR match different text: .* takes everything before ZMIR, and sed always takes the longest possible match.
Specific:
sed 's/^\([^;]*;[^;]*;\)ZMIR;/\1IZMIR;/' YourFile
Generic, where Old and New are shell variables (remember that they are used as regex text, so regex rules apply, such as escaping special characters):
Old='ZMIR'
New='IZMIR'
sed 's/^\(\([^;]*;\)\{2\}\)'"${Old}"';/\1'"${New}"';/' YourFile
In this simple case sed is an alternative, but awk is better for a complex or long line.

shift the rest of the line to a newline after a space

if I have the following:
>AB ABABABA
>AC ACACACA
how do I shift everything onto a newline after the space i.e.
>AB
ABABABABA
>AC
ACACACACA
I have tried:
cat file | sed 's/ /\n/g'
cat file | tr ' ' '\n'
however I get the exact same output.
** UPDATE **
Upon inspecting the file using less and nano, the output was different to using cat. The file contains some terminal escape characters that aren't displayed in cat, but are in less. (how does this even happen?)
This was a terrible bug to spot, and everyone has actually posted correct answers based on the output of cat. So thank you for your help. Could the mods close this one?
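For readers who run into the same symptom, here is a sketch of one way to make hidden characters visible without opening an editor. The update does not say which characters were in the file, so the escape sequence below is invented purely for the demonstration.

```shell
#!/bin/bash
# A line with an embedded escape sequence (ESC [ 0 m), invented for this demo.
line=$(printf '>AB\033[0m ABABABA')

# cat -v renders non-printing characters visibly (ESC is shown as ^[).
visible=$(printf '%s\n' "$line" | cat -v)
printf '%s\n' "$visible"
```

This prints >AB^[[0m ABABABA, whereas plain cat would let the terminal swallow the escape sequence. od -c is another option when you want to see every byte.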
It seems that you need to replace (any kind of) space with a newline
perl -pe 's/\s+/\n/' data.txt
This produces the required output in my tests. The -p sets up the loop over input (opening files or using STDIN) and sets $_ to the current line. It also prints $_ each time after processing.
If there are multiple spaces, each to be replaced by \n, add /g modifier.
If there may be more to do you can also capture patterns and replace them
perl -pe 's/\s+(.*)/\n$1/' data.txt
Following the observation in the answer by glenn jackman and looking "more closely" it appears that the first word on the line need be copied to the next line. Then the above is modified to
perl -pe 's/^>(\S+)\K\s+/\n$1/' data.txt
The \K is related to a positive lookbehind: everything matched before it must be present, but it is dropped from the final match, so the substitution leaves it in place and you don't have to capture and copy it. It is documented in perlre. Without it the >(\S+) would be consumed, so it would have to be copied back in the replacement part, as />$1\n$1/.
Are you trying to move the content from before the space onto the next line as well?
As in >A BC becomes:
>A
ABC
Then one can use sed like this:
$ sed 'h;s/^>\([^ ]*\) /\1/;x;s/ .*//;G' file
>AB
ABABABABA
>AC
ACACACACA
Breakdown:
h;                    # Copy pattern space to hold buffer
s/^>\([^ ]*\) /\1/;   # Convert >A BC to ABC
x;                    # eXchange hold buffer and pattern space
s/ .*//;              # Remove everything after, and including, the
                      # first space: >A BC -> >A
G                     # Append hold buffer to pattern space
Looking more closely it looks like you want to repeat the first word on the next line: to transform this
>foo bar
>baz qux
into this
>foo
foobar
>baz
bazqux
If that's true, you can do
sed -r 's/^>([^ ]+) />\1\n\1/' file # or
perl -pe 's/^>(\S+) />$1\n$1/' file
sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk, e.g.:
$ awk '{print $1 ORS substr($1,2) $2}' file
>AB
ABABABABA
>AC
ACACACACA

Remove variable parts of an input list

I have an input list from which I want to remove occurrences of a variable string. Say my input list looks as follows:
(BLA-123) some text
BLA-123 some text
BLA-123: some text
some text (BLA-123)
some text BLA-123
I would like my input list to look like:
some text
some text
some text
some text
some text
Basically, I need to remove all occurrences of any BLA-[0-9]{1,4}, which may be enclosed in ( and ) or followed by a :, from both the beginning and the end of any line in the input list.
I thought of using cut, but it is kind of hard to achieve what I need with it. Then I thought of sed, which I believe is the way to go, but I have little to no experience with it.
Perhaps:
sed 's/ *[(]*[A-Z][A-Z]*-[0-9]\{1,4\}[):]* *//'
I've replaced BLA with an arbitrary upper-case string [A-Z][A-Z]* because I don't know whether you meant it as a meta-variable in the problem description.
If you have the GNU sed, this can be slightly improved by using \? and \+:
sed 's/ *[(]\?[A-Z]\+-[0-9]\{1,4\}[):]\? *//'
These, however, convert:
some text BLA-123 more text
to:
some textmore text
which may not be what you want. If you want such a line to remain unchanged, then you can double the substitution, modifying the first so that it matches only at the start, and the second so it matches at the end:
sed 's/^ *[(]\?[A-Z]\+-[0-9]\{1,4\}[):]\? *//;s/ *[(]\?[A-Z]\+-[0-9]\{1,4\}[):]\? *$//'
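The anchored double substitution can be tried out as follows. This sketch uses sed -E (extended regular expressions) instead of the GNU-specific \? and \+ so that it also runs on BSD sed; the function name strip_ids is made up for the example.

```shell
#!/bin/bash
# Strip an optional "(", an upper-case key, "-", 1-4 digits and an optional
# ")" or ":", but only at the start or at the end of each line.
strip_ids() {
  sed -E 's/^ *[(]?[A-Z]+-[0-9]{1,4}[):]? *//; s/ *[(]?[A-Z]+-[0-9]{1,4}[):]? *$//'
}

printf '%s\n' '(BLA-123) some text' 'some text BLA-123' \
  'some text BLA-123 more text' | strip_ids
```

The first two lines are reduced to "some text", while the third line is left unchanged because its identifier sits in the middle of the line.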
This is not very optimal... but works:
$ sed -e 's/(BLA-[0-9]*)[ ]*//g' -e 's/BLA-[0-9]*:[ ]*//g' -e 's/BLA-[0-9]*[ ]*//g' a
some text
some text
some text
some text
some text
s/(BLA-[0-9]*)[ ]*//g deletes (BLA-XXXX) plus any trailing spaces.
s/BLA-[0-9]*:[ ]*//g deletes BLA-XXXX: plus any trailing spaces.
s/BLA-[0-9]*[ ]*//g deletes BLA-XXXX plus any trailing spaces.
Here's what I came up with:
sed -E 's/[[:punct:]]?BLA-[[:digit:]]{1,4}[[:punct:]]?[[:space:]]*//'
There's a trailing space at the end of some output lines that you can eliminate by putting [[:space:]]* at the beginning.
sed 's/ *(BLA-[0-9]\{1,4\}) *//
s/ *BLA-[0-9]\{1,4\}:\{0,1\} *//' YourFile
This avoids matching an opening ( without a closing ).
You can use awk one-liner:
$ cat toto
(BLA-123) some text
BLA-123 some text
BLA-123: some text
some text (BLA-123)
some text BLA-123
$ awk '{for (i=1;i<=NF;i++) if ($i !~ /BLA/) printf "%s ", $i; printf "\n"}' toto
some text
some text
some text
some text
some text
Which can be read as:
for each line (awk works by parsing line by line), for each field (NF is the Number of Fields, i.e. columns), if field number i does not contain BLA, print it. After each line, print "\n".
Hope this helps.

Removing spaces from columns of a CSV file in bash

I have a CSV file in which every column contains unnecessary spaces (or tabs) after the actual value. I want to create a new CSV file with all of those spaces removed, using bash.
For example
One line in input CSV file
abc def pqr ;valueXYZ ;value PQR ;value4
same line in output csv file should be
abc def pqr;valueXYZ;value PQR;value4
I tried using awk to trim each column but it didn't work. Can anyone please help me with this?
Thanks in advance :)
I edited my test case, since the values here can contain spaces.
$ cat cvs_file | awk 'BEGIN{ FS=" *;"; OFS=";" } {$1=$1; print $0}'
Set the input field separator (FS) to the regex of zero or more spaces followed by a semicolon.
Set the output field separator (OFS) to a simple semicolon.
$1=$1 is necessary to refresh $0.
Print $0.
$ cat cvs_file
abc def pqr ;valueXYZ ;value PQR ;value4
$ cat cvs_file | awk 'BEGIN{ FS=" *;"; OFS=";" } {$1=$1; print $0}'
abc def pqr;valueXYZ;value PQR;value4
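The question also mentions tabs and does not rule out trailing blanks after the last column. The following sketch extends the same awk idea to cover both; the function name trim_columns is an assumption for this example.

```shell
#!/bin/bash
# Hypothetical helper: drop spaces/tabs before each ";" and at the end of the line.
trim_columns() {
  awk 'BEGIN { FS = "[ \t]*;"; OFS = ";" }
       { sub(/[ \t]+$/, ""); $1 = $1; print }'
}

printf 'abc def pqr \t;valueXYZ  ;value PQR ;value4 \n' | trim_columns
```

Spaces inside a value, such as "value PQR", are preserved because only blanks directly before a semicolon or at the end of the line are removed.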
If the values themselves are always free of spaces, the canonical solution (in my view) would be to use tr:
$ tr -d '[:blank:]' < CSV_FILE > CSV_FILE_TRIMMED
This will replace multiple spaces with just one space:
sed -r 's/\s+/ /g'
If you know what characters your column data ends in, you can anchor on them:
sed 's|\([a-zA-Z0-9]\) *;|\1;|g'
The character class is where you put whatever your data ends in.
Otherwise, if you know more than one space is not going to come in your fields, then you could use what user1464130 gave you.
If this doesn't solve your problem, then get back to me.
I found an efficient way to do what I wanted, which is to remove blank lines (including trailing empty lines) from a file. I do this with:
grep -v -e '^[[:space:]]*$' foo.txt
from Remove blank lines with grep

Resources