I have a file like this:
A_City,QQQQ
B_State,QQQQ
C_Country,QQQQ
A_Cityt,YYYY
B_State,YYYY
C_Country,YYYY
I want to append one more column to each line, in the same file, made up of the first letter of each column:
A_City,QQQQ,AQ
B_State,QQQQ,BQ
C_Country,QQQQ,CQ
A_Cityt,YYYY,AY
B_State,YYYY,BY
C_Country,YYYY,CY
I would like to do this with sed, but an awk solution would also help.
awk to the rescue!
$ awk '{print $0 "," substr($0,1,1) substr($0,length($0))}' file
A_City,QQQQ,AQ
B_State,QQQQ,BQ
C_Country,QQQQ,CQ
A_Cityt,YYYY,AY
B_State,YYYY,BY
C_Country,YYYY,CY
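or, perhaps, taking the first letter of each field explicitly (the command above uses the last character of the line, which matches here only because the second field repeats a single letter):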
$ awk -F, '{print $0 FS substr($1,1,1) substr($2,1,1)}' file
When each line contains only one comma, you can use:
sed -r 's/^(.).*,(.).*/&,\1\2/' file
This might work for you (GNU sed):
sed -r 's/^|,+/&\n/g;s/$/,\n/;:a;s/\n(.).*,\n.*/&\1/;s/\n//;/\n.*,\n/ba;s/\n//g' file
Insert a newline at the start of a line or following one or more ,'s. Append an additional , and a newline to the end of the line. Append a character following a newline followed by zero or more characters followed by a , and a final newline and any following characters to its match. Remove the first newline. If there are two or more newlines repeat. Finally remove all newlines.
N.B. If the line is initially empty, this will add a , to such lines. Empty fields are catered for and will be represented by no first character.
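The answers above handle two columns. As a minimal sketch, assuming comma-separated input with any number of columns, the same idea can be written as a loop over the fields in awk. The function name first_letters is made up for this illustration.

```shell
# first_letters (hypothetical helper): append a column built from the
# first character of every comma-separated field on the line.
first_letters() {
  awk -F, '{
    extra = ""
    for (i = 1; i <= NF; i++) extra = extra substr($i, 1, 1)
    print $0 FS extra
  }' "$@"
}

printf 'A_City,QQQQ\nC_Country,YYYY,zz\n' | first_letters
# A_City,QQQQ,AQ
# C_Country,YYYY,zz,CYz
```

This writes to standard output; to update the same file, one option is to redirect into a temporary file and move it over the original.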
I have an example dataset separated by semicolons, as below:
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
I would like to replace values in a specified column. Let's say I want to change "ZMIR" to "IZMIR", but only in the third column; the ones in the second column must stay the same.
The desired output is:
123;IZMIR;IZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;IZMIR;bob
BBB;ANKR;RRRR;ABC
I tried;
sed 's/;ZMIR;/;IZMIR;/' file.txt
The problem is that it changes matching values anywhere in the file, not just in the 3rd column.
I also tried;
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
Here it does target a column, but it somehow adds spaces:
123 I;IZMIR; ZMIR 123
abc;ANKAR;aaa;999
AAA ;IZMIR; ZMIR bob
BBB;ANKR;RRRR;ABC
sed doesn't know about columns, awk does (but in awk they're called "fields"):
awk 'BEGIN{FS=OFS=";"} $3=="ZMIR"{$3="IZMIR"} 1' file
Note that since the above is doing a literal string search and replace, you don't have to worry about regexp or backreference metacharacters in the search or replacement strings, unlike in a sed solution (see https://stackoverflow.com/a/29626460/1745001).
Regarding what you tried previously with awk:
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
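That says: find "ZMIR" in the 2nd semicolon-separated field and replace it with ";IZMIR;". Because modifying a field makes awk rebuild the record using OFS, which defaults to a single blank, it also changes every existing ";" on that line to a blank character.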
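The effect of OFS on a rebuilt record can be seen in a small sketch. The one-line input a;b;c and the variable names are made up for this illustration.

```shell
# Without OFS set: assigning to $2 rebuilds the record with the default OFS (a space).
default_ofs=$(printf 'a;b;c\n' | awk -F';' '{$2="X"}1')

# With FS and OFS both set to ";", the separators survive the rebuild.
matching_ofs=$(printf 'a;b;c\n' | awk 'BEGIN{FS=OFS=";"} {$2="X"}1')

printf '%s\n%s\n' "$default_ofs" "$matching_ofs"
# a X c
# a;X;c
```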
To learn awk, read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
If you know exactly which line the word to replace is on, and which occurrence on that line it is, you could use sed with something like:
sed '3 s/ZMIR/IZMIR/2' file
The 3 at the beginning selects the third line, and the 2 at the end selects the second occurrence on that line. The awk solution is the better one, but now you know how it works in sed ;)
This might work for you (GNU sed):
sed -r 's/[^;]+/\n&\n/3;s/\nZMIR\n/IZMIR/;s/\n//g' file
Surround the required field by unique markers then replace the required string (plus markers) by the replacement string. Finally remove the unique markers.
Perl on Command Line
Input
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
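$. == 1 restricts the substitution to the first row; for the second row, use $. == 2.
$F[0] is the first column, so the search pattern is that column's text; the fourth column is $F[3]. Note that s/$F[0]/.../ replaces the first place that text occurs on the line, which need not be in that column.
-a -F\; splits each line into @F using ; as the delimiter.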
what you want
perl -a -F\; -pe 's/$F[0]/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
for row == 2 and column == 2
perl -a -F\; -pe 's/$F[1]/***/ if $. == 2' your-file
123;IZMIR;ZMIR;123
abc;***;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
Also without -a -F
perl -pe 's/123/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
To edit the file in place, add the -i option; perl then finds, replaces, and saves the result back to the same file:
perl -i -a -F\; and so on
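For comparison, here is a sketch of the same row-and-column targeting in awk, which assigns to the field directly instead of using the field's text as a search pattern. The function name replace_cell and its argument order are assumptions made for this illustration.

```shell
# replace_cell (hypothetical helper): set one cell of a semicolon-separated file.
# usage: replace_cell ROW COL VALUE [FILE...]
replace_cell() {
  row=$1 col=$2 value=$3
  shift 3
  # NR==row selects the line; $col=value overwrites that field only.
  awk -v row="$row" -v col="$col" -v value="$value" \
    'BEGIN{FS=OFS=";"} NR==row{$col=value} 1' "$@"
}

printf '123;IZMIR;ZMIR;123\nabc;ANKAR;aaa;999\n' | replace_cell 2 2 '***'
# 123;IZMIR;ZMIR;123
# abc;***;aaa;999
```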
You need to include some absolute references in the line:
^ for beginning of the line
unequivocal separation pattern
^.*ZMIR and [^;]*;ZMIR match differently: the first takes everything before ZMIR, and because sed always takes the longest possible match, .* can run across several fields.
Specific
sed 's/^\([^;]*;[^;]*;\)ZMIR;/\1IZMIR;/' YourFile
Generic, where Old and New are shell variables (remember that they are used as regex text, so regex rules apply, such as escaping special characters):
#Old='ZMIR'
#New='IZMIR'
sed 's/^\(\([^;]*;\)\{2\}\)'"${Old}"';/\1'"${New}"';/' YourFile
In this simple case sed is an alternative, but awk is better for a complex or long line.
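As a sketch of the awk alternative with the column, old value, and new value passed in as variables: the comparison below is a whole-field string match, so regex metacharacters in the old value need no escaping. The function name replace_in_column is made up for this illustration, and note that awk -v interprets backslash escapes in the values it is given.

```shell
# replace_in_column (hypothetical helper): replace a field only when the
# whole field equals OLD. usage: replace_in_column COL OLD NEW [FILE...]
replace_in_column() {
  col=$1 old=$2 new=$3
  shift 3
  awk -v col="$col" -v old="$old" -v new="$new" \
    'BEGIN{FS=OFS=";"} $col==old{$col=new} 1' "$@"
}

printf '123;IZMIR;ZMIR;123\nAAA;ZMIR;ZMIR;bob\n' | replace_in_column 3 ZMIR IZMIR
# 123;IZMIR;IZMIR;123
# AAA;ZMIR;IZMIR;bob
```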
I need to convert a series of text files that are formatted with line breaks to single lines separated by newlines (\n). For example:
This is an example text file
where the contents are separated
by line breaks
What I want this to look like is:
This is an example text file\nwhere the contents are separated\nby line breaks\n
I'm open to using awk, sed, or any builtin POSIX commands.
Please try this solution:
awk 'BEGIN{RS="\n";ORS="\\n"}1' file.txt
Here the record separator (RS) is set to a newline and the output record separator (ORS) to "\\n"; the doubled backslash makes awk print a literal \n instead of a real newline. The pattern 1 is always true, so the default action (print the whole record) runs for every line.
If you have any problem let me know, I don't have an awk available to try it.
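A quick way to check the ORS approach is to run it on a small made-up sample and capture the result. Since RS already defaults to a newline, this sketch sets only ORS; the sample text and the variable name joined are made up for this illustration.

```shell
# Each input line is printed followed by a literal backslash-n instead of a newline.
joined=$(printf 'This is an example\nwith line breaks\n' | awk 'BEGIN{ORS="\\n"}1')

printf '%s\n' "$joined"
# This is an example\nwith line breaks\n
```

Because ORS is written after every record, the last line also gets a literal \n, which matches the requested output.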
It's not clear whether by "line break" you mean carriage return, line feed, newline, or something else. Nor is it clear whether you want to replace newlines with the string \n, strip carriage returns from line endings, or do something else. If it's the latter, then all you need is:
dos2unix file
If you don't have dos2unix you can do it with any awk:
$ printf 'foo\r\nbar\r\n' | cat -v
foo^M
bar^M
$ printf 'foo\r\nbar\r\n' | awk '{sub(/\r$/,"")}1' | cat -v
foo
bar
You can't do it robustly with tr since it can't tell when a \r is at the end of a line or not, and you can't do it portably with sed.
This might work for you (GNU sed):
sed '1h;1!H;$!d;x;s/\n/\\n/g' file
Slurp the file into memory and replace each newline with a literal \n (the output still ends with a real newline rather than a literal \n).
I have several files that begin like this:
unit,s_adj,partner,stk_flow,indic,geo\time;aaaa;2222;
time,s_adj,partner,stk_flow,lolo,geo\time;bbb;2222;
I want to replace everything before the first semicolon with the new value YEAR.
The desired output would be:
YEAR;aaaa;2222;
YEAR;bbb;2222;
I tried with the following command line but it does not seem to do what I want
awk -F ";" 'NR==1 {$1=""; print "year"}' input_file
Your suggestions are welcomed.
Best.
try this:
sed 's/[^;]*/YEAR/' file
If you only want the substitution to happen on the 1st line:
sed '1s/[^;]*/YEAR/' file
You can also do:
awk '{$1="YEAR"}1' OFS=\; FS=\; input-file
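The difference between the two sed forms can be sketched on a shortened, made-up version of the sample header lines. The variable names are assumptions for this illustration.

```shell
# Two sample lines; single quotes keep the backslash in geo\time literal.
sample='unit,s_adj,geo\time;aaaa;2222;
time,s_adj,geo\time;bbb;2222;'

# No address: the text before the first semicolon is replaced on every line.
all_lines=$(printf '%s\n' "$sample" | sed 's/[^;]*/YEAR/')

# Address 1: only the first line is changed.
first_only=$(printf '%s\n' "$sample" | sed '1s/[^;]*/YEAR/')

printf '%s\n---\n%s\n' "$all_lines" "$first_only"
```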
How do you replace a blank line in a file with a certain character using sed?
I have used the following command but it still returns the original input:
sed 's/^$/>/' filename
Original input:
ACTCTATCATC

CTACTATCTATCC

CCATCATCTACTC

...
Desired output:
ACTCTATCATC
>
CTACTATCTATCC
>
CCATCATCTACTC
>
...
Thanks for any help
Here is a way with awk. This wouldn't care if you have spaces or blank lines:
awk '!NF{$0=">"}1' file
NF stands for number of fields. Since blank lines or lines with just spaces have no fields, !NF is true for them and the line is replaced with your text. The trailing 1 is an always-true condition whose default action prints every line.
Test:
$ cat -vet file
ACTCTATCATC$
$
CTACTATCTATCC$
$
CCATCATCTACTC$
$
$ are end of line markers
$ awk '!NF{$0=">"}1' file
ACTCTATCATC
>
CTACTATCTATCC
>
CCATCATCTACTC
>
You may have tabs or spaces in your file's empty lines; try the following (\s is a GNU sed extension):
sed 's/^\s*$/>/' filename
You may have whitespace in your input. First thing to try is:
sed 's/^[[:blank:]]*$/>/' filename
The following code should work:
sed -i 's/^[[:space:]]*$/string/' foo
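To see why the original command can appear to do nothing, here is a small sketch on made-up input with one truly empty line, one line holding a space, and one holding a tab. Only the first is matched by ^$, while the [[:space:]] version catches all three. The variable names are assumptions for this illustration.

```shell
# Sample: an empty line, a space-only line, and a tab-only line between sequences.
sample=$(printf 'AAA\n\nBBB\n \nCCC\n\t\nDDD')

strict=$(printf '%s\n' "$sample" | sed 's/^$/>/')
lenient=$(printf '%s\n' "$sample" | sed 's/^[[:space:]]*$/>/')

# Count how many lines were turned into a lone ">" by each version.
printf 'strict: %s, lenient: %s\n' \
  "$(printf '%s\n' "$strict" | grep -c '^>$')" \
  "$(printf '%s\n' "$lenient" | grep -c '^>$')"
# strict: 1, lenient: 3
```

The [[:space:]] class also includes the carriage return, so lines that look empty in a file with Windows line endings are matched as well.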
Escaping the > is another thing to try, although > is not special in a sed replacement, so on truly empty lines this behaves the same as the original command:
sed 's/^$/\>/g' filename
And if you need to delete the empty lines and print the others, try:
sed '/^$/d' filename
I have a CSV file in which every column contains unnecessary spaces (or tabs) after the actual value. I want to create a new CSV file with all of those spaces removed, using bash.
For example
One line in input CSV file
abc def pqr ;valueXYZ ;value PQR ;value4
same line in output csv file should be
abc def pqr;valueXYZ;value PQR;value4
I tried using awk to trim each column, but it didn't work. Can anyone please help me with this?
Thanks in advance :)
I edited my test case, since the values here can contain spaces.
$ cat cvs_file | awk 'BEGIN{ FS=" *;"; OFS=";" } {$1=$1; print $0}'
Set the input field separator (FS) to the regex of zero or more spaces followed by a semicolon.
Set the output field separator (OFS) to a simple semicolon.
$1=$1 is necessary to make awk rebuild $0 using OFS.
Print $0.
$ cat cvs_file
abc def pqr ;valueXYZ ;value PQR ;value4
$ cat cvs_file | awk 'BEGIN{ FS=" *;"; OFS=";" } {$1=$1; print $0}'
abc def pqr;valueXYZ;value PQR;value4
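The question also mentions tabs, and the last column can carry trailing blanks that no semicolon follows. A minimal sketch covering both, assuming semicolon-separated input, follows; the function name trim_trailing_blanks is made up for this illustration.

```shell
# trim_trailing_blanks (hypothetical helper): drop spaces and tabs that sit
# right before a semicolon or at the end of the line; inner spaces are kept.
trim_trailing_blanks() {
  awk 'BEGIN { FS = "[ \t]*;"; OFS = ";" }
       { sub(/[ \t]+$/, ""); $1 = $1; print }' "$@"
}

printf 'abc def pqr \t;valueXYZ ;value PQR ;value4  \n' | trim_trailing_blanks
# abc def pqr;valueXYZ;value PQR;value4
```

The sub() call handles the end of the line, and the $1 = $1 assignment rebuilds the record so that the trimmed separators are written out as plain semicolons.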
If the values themselves are always free of spaces, the canonical solution (in my view) would be to use tr:
$ tr -d '[:blank:]' < CSV_FILE > CSV_FILE_TRIMMED
This will replace multiple spaces with just one space:
sed -r 's/\s+/ /g'
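If you know what your column data will end in, you can anchor on that character and trim the spaces between it and the following semicolon or the end of the line:
sed -e 's|\([a-zA-Z0-9]\) *;|\1;|g' -e 's|\([a-zA-Z0-9]\) *$|\1|'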
The character class would be where you put whatever your data will end in.
Otherwise, if you know more than one space is not going to come in your fields, then you could use what user1464130 gave you.
If this doesn't solve your problem, then get back to me.
I found an efficient way to do what I wanted, which is to remove blank lines and the trailing newline of a file. I do this with:
grep -v -e '^[[:space:]]*$' foo.txt
(from the question "Remove blank lines with grep")