Remove spaces from a single column using bash

I was provided with a CSV file that, in a single column, uses spaces as a thousands separator (e.g. 11 000 instead of 11,000 or 11000). The other columns contain meaningful spaces, so I need to fix only this one column.
My data:
Date,Source,Amount
1/1/2013,Ben's Chili Bowl,11 000.90
I need to get:
Date,Source,Amount
1/1/2013,Ben's Chili Bowl,11000.90
I have been trying awk, sed, and cut, but I can't get it to work.

dirty and quick:
awk -F, -v OFS="," '{gsub(/ /,"",$NF)}1'
example:
kent$ echo "Date,Source,Amount
1/1/2013,Ben's Chili Bowl,11 000.90"|awk -F, -v OFS="," '{gsub(/ /,"",$NF)}1'
Date,Source,Amount
1/1/2013,Ben's Chili Bowl,11000.90

One possibility might be:
sed 's/\([0-9]\) \([0-9]\)/\1\2/'
This looks for two digits on either side of a blank and keeps just the two digits. For the data shown, it works fine. You can add a trailing g if you might have to deal with numbers like 11 234 567.89.
If other columns might also contain spaces between digits, or the target column isn't the last one, you can use a similar regex in awk with gsub() on the relevant field.
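As a sketch of that awk variant (assuming the amount is the third comma-separated field, as in the sample data), gsub() can strip spaces from just that field while the other fields keep theirs:

```shell
# Remove spaces only in the 3rd field; "Ben's Chili Bowl" keeps its space.
echo "1/1/2013,Ben's Chili Bowl,11 000.90" |
awk 'BEGIN{FS=OFS=","} {gsub(/ /, "", $3)} 1'
```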

just with bash
$ echo "Date,Source,Amount
1/1/2013,Ben's Chili Bowl,11 000.90" |
while IFS=, read -r date source amount; do
echo "$date,$source,${amount// /}"
done
Date,Source,Amount
1/1/2013,Ben's Chili Bowl,11000.90

Related

Is there a way to treat a single column of integers as an array in order to extract certain digits?

I am trying to treat a series of integers as an array in order to extract the "columns" of interest.
My data after extracting a column of integers looks something like:
01010101010
10101010101
00100111100
10111100000
01011000100
If I'm only interested in the 1st, 4th, and 11th integers, I'd like the output to look like this:
010
101
000
110
010
This problem is hard to describe in words, so I'm sorry for the lack of clarity. I've tried a number of suggestions, but many things such as awk's substr() are unable to skip positions (such as the 1st, 4th, and 11th positions here).
You can use the cut command:
cut -c 1,4,11 file
-c selects characters by position.
or using (gnu) awk:
awk '{print $1 $4 $11}' FS= file
FS is the field separator; it is set to the empty string so that every single character becomes its own field.
With GNU awk, which can use the empty string as the field separator, you could do:
awk -F '' '{print $1, $4, $11}' OFS='' infile
You could also try the following awk:
awk '{print substr($0,1,1) substr($0,4,1) substr($0,11,1)}' Input_file
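If the positions of interest vary, the character-splitting idea generalizes; a small sketch (assuming GNU awk's empty-FS behavior, with the wanted 1-based positions passed in a hypothetical pos variable):

```shell
# Print the characters at a configurable list of 1-based positions.
printf '01010101010\n10101010101\n' |
awk -v pos='1,4,11' 'BEGIN{FS=""; n=split(pos, p, ",")}
{ s = ""; for (i = 1; i <= n; i++) s = s $p[i]; print s }'
```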

Change three or more empty lines into two using bash, sed or awk

Let's say we have a string containing words and multiple empty lines. For instance:
"1\n2\n\n3\n\n\n4\n\n\n\n2\n\n3\n\n\n1\n"
I would like to "shrink" every run of three or more consecutive newlines into two, using bash, sed or awk, to obtain the string
"1\n2\n\n3\n\n4\n\n2\n\n3\n\n1\n"
Does anybody have an idea?
with awk
$ awk -v RS= -v ORS='\n\n' 1 file
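A quick check on the sample string (note that because ORS is appended after every record, this also leaves one extra blank line after the final record):

```shell
# RS= enables awk's paragraph mode: records are separated by runs of
# blank lines, so printing each record with ORS='\n\n' normalizes the gaps.
printf '1\n2\n\n3\n\n\n4\n\n\n\n2\n\n3\n\n\n1\n' |
awk -v RS= -v ORS='\n\n' 1
```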
If perl is acceptable,
perl -00 -lpe1
ought to do it. It reads and outputs whole paragraphs, which has the side effect of normalizing 2+ newlines to just \n\n.
If the data isn't too voluminous and you have GNU sed, use sed -z to make it operate on a single null-terminated record rather than on one \n-terminated record per line:
sed -z 's/\n\n\n\n*/\n\n/g'
Or with extended regexes:
sed -zr 's/\n{3,}/\n\n/g'
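A quick check of the GNU sed version on the question's sample string (printf is used to build the embedded newlines):

```shell
# -z makes the whole input one NUL-terminated record, so \n can be
# matched in the pattern; runs of 3+ newlines collapse to exactly 2.
printf '1\n2\n\n3\n\n\n4\n\n\n\n2\n\n3\n\n\n1\n' |
sed -zr 's/\n{3,}/\n\n/g'
```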

Shell Script Replace a Specified Column with sed

I have an example dataset separated by semicolons, as below:
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
I would like to replace values in a specified column. Let's say I want to change "ZMIR" to "IZMIR", but only in the third column; the ones in the second column must stay the same.
Desired output is:
123;IZMIR;IZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;IZMIR;bob
BBB;ANKR;RRRR;ABC
I tried:
sed 's/;ZMIR;/;IZMIR;/' file.txt
The problem is that it changes matching values anywhere in the file, not just in the 3rd column.
I also tried:
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
and here it specifies the column, but it somehow adds spaces:
123 I;IZMIR; ZMIR 123
abc;ANKAR;aaa;999
AAA ;IZMIR; ZMIR bob
BBB;ANKR;RRRR;ABC
sed doesn't know about columns, awk does (but in awk they're called "fields"):
awk 'BEGIN{FS=OFS=";"} $3=="ZMIR"{$3="IZMIR"} 1' file
Note that since the above is doing a literal string search and replace, you don't have to worry about regexp or backreference metacharacters in the search or replacement strings, unlike in a sed solution (see https://stackoverflow.com/a/29626460/1745001).
wrt what you tried previously with awk:
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
That says: find "ZMIR" in the 2nd semi-colon-separated field and replace it with ";IZMIR;" and also change every existing ";" on the line to a blank character.
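Fixing that attempt along those lines means targeting the third field, replacing with the bare word, and setting OFS so the semicolons survive the record rebuild; a sketch on the sample data:

```shell
# Anchored gsub() so only a field that is exactly "ZMIR" is changed.
printf '123;IZMIR;ZMIR;123\nAAA;ZMIR;ZMIR;bob\n' |
awk 'BEGIN{FS=OFS=";"} {gsub(/^ZMIR$/, "IZMIR", $3)} 1'
```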
To learn awk, read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
If you know exactly where the word to replace is located and which occurrence it is on that line, you could use sed with something like:
sed '3 s/ZMIR/IZMIR/2'
The 3 at the beginning selects the third line, and the 2 at the end selects the second occurrence on it. The awk solution is the better one, but this way you also know how it works in sed ;)
This might work for you (GNU sed):
sed -r 's/[^;]+/\n&\n/3;s/\nZMIR\n/IZMIR/;s/\n//g' file
Surround the required field by unique markers then replace the required string (plus markers) by the replacement string. Finally remove the unique markers.
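Run on the sample data, the marker trick touches only the third field (GNU sed, since \n is used in both the replacement and the pattern):

```shell
# 1) wrap the 3rd field in newline markers, 2) replace the marked ZMIR,
# 3) strip the markers again.
printf '123;IZMIR;ZMIR;123\nAAA;ZMIR;ZMIR;bob\n' |
sed -r 's/[^;]+/\n&\n/3;s/\nZMIR\n/IZMIR/;s/\n//g'
```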
Perl on Command Line
Input
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
$. == 1 means the first row: the substitution is applied only to that row. For the second row it would be $. == 2.
$F[0] means the first column, and the substitution is applied only to that column. For the fourth column it would be $F[3].
-a -F\; means that the field delimiter is ;.
What you want:
perl -a -F\; -pe 's/$F[0]/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
For row == 2 and column == 2:
perl -a -F\; -pe 's/$F[1]/***/ if $. == 2' your-file
123;IZMIR;ZMIR;123
abc;***;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
Also, without -a -F:
perl -pe 's/123/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
If you want to edit the file, you can add the -i option, which means edit in place: it simply finds, replaces, and saves into the same file.
perl -i -a -F\; and so on
You need to include some absolute references in the pattern:
^ for the beginning of the line
an unambiguous separator pattern
Note that ^.*ZMIR and [^;]*;ZMIR match different things: .* is greedy, so the first takes everything up to the last ZMIR (sed takes the longest possible match), while [^;]* cannot cross a semicolon.
Specific
sed 's/^\([^;]*;[^;]*;\)ZMIR;/\1IZMIR;/' YourFile
Generic, where Old and New are shell variables (remember these are used as regex values, so regex rules apply, e.g. special characters must be escaped):
#Old='ZMIR'
#New='IZMIR'
sed 's/^\(\([^;]*;\)\{2\}\)'${Old}';/\1'${New}';/' YourFile
In this simple case sed is an alternative, but awk is better for a complex or long line.
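A quick check of the generic form (the variables are expanded unquoted inside the sed program, so they must not contain spaces, the delimiter, or unescaped regex metacharacters):

```shell
Old='ZMIR'
New='IZMIR'
# Match exactly two leading ";"-terminated fields, then the old value.
printf '123;IZMIR;ZMIR;123\nAAA;ZMIR;ZMIR;bob\n' |
sed 's/^\(\([^;]*;\)\{2\}\)'${Old}';/\1'${New}';/'
```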

add character to particular field using awk

How can I insert a '0' before the 1st and 9th digits of the 2nd field using awk?
data
12345,20150303024955
output
12345,0201503030024955
I am new to shell scripting.
Assuming by "add to" you mean "prefix with":
$ echo '12345,20150303024955' |
awk 'BEGIN{FS=OFS=","} {sub(/.{8}/,"&0",$2); $2="0"$2}1'
12345,0201503030024955
You asked for awk but this is also easy to do in sed:
$ echo '12345,20150303024955' | sed -r 's/,(.{8})/,0\10/'
12345,0201503030024955
How it works
-r
Turn on extended regex so that we don't need backslash escapes.
s/,(.{8})/,0\10/
Look for a comma followed by eight characters. Replace that with a comma, a zero, those eight characters, and another zero.
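One subtlety worth noting: sed backreferences only go up to \9, so the \10 in the replacement is parsed as \1 followed by a literal 0, which happens to be exactly what is wanted here:

```shell
# ,0\10 = comma, literal 0, the captured 8 chars (\1), literal 0.
echo '12345,20150303024955' | sed -r 's/,(.{8})/,0\10/'
```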

Unix cut: Print same Field twice

Say I have file - a.csv
ram,33,professional,doc
shaym,23,salaried,eng
Now I need this output (please don't ask me why):
ram,doc,doc,
shaym,eng,eng,
I am using cut command
cut -d',' -f1,4,4 a.csv
But the output remains
ram,doc
shaym,eng
It seems cut can print a given field only once. I need to print the same field twice or n times.
Why do I need this? (Optional to read)
Ah. It's a long story. I have a file like this
#,#,-,-
#,#,#,#,#,#,#,-
#,#,#,-
I have to convert this to
#,#,-,-,-,-,-
#,#,#,#,#,#,#,-
#,#,#,-,-,-,-
Here each '#' and '-' refers to different numerical data. Thanks.
You can't print the same field twice. cut prints a selection of fields (or characters or bytes) in order. See Combining 2 different cut outputs in a single command? and Reorder fields/characters with cut command for some very similar requests.
The right tool to use here is awk, if your CSV doesn't have quotes around fields.
awk -F , -v OFS=, '{print $1, $4, $4}'
If you don't want to use awk (why? what strange system has cut and sed but no awk?), you can use sed (still assuming that your CSV doesn't have quotes around fields). Match the first four comma-separated fields and select the ones you want in the order you want.
sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/'
$ sed 's/,.*,/,/; s/\(,.*\)/\1\1,/' a.csv
ram,doc,doc,
shaym,eng,eng,
What this does:
Replace everything between the first and last comma with just a comma
Repeat the last ",something" part and tack on a comma. VoilĂ !
Assumptions made:
You want the first field, then twice the last field
No escaped commas within the first and last fields
Why do you need exactly this output? :-)
using perl:
perl -F, -ane 'chomp($F[3]);$a=$F[0].",".$F[3].",".$F[3];print $a."\n"' your_file
using sed:
sed 's/\([^,]*\),.*,\(.*\)/\1,\2,\2/g' your_file
As others have noted, cut doesn't support field repetition.
You can combine cut and sed, for example if the repeated element is at the end:
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/&&,/'
Output:
ram,doc,doc,
shaym,eng,eng,
Edit
To make the repetition variable, you could do something like this (assuming you have coreutils available):
n=10
rep=$(seq $n | sed 's:.*:\&:' | tr -d '\n')
< a.csv cut -d, -f1,4 | sed 's/,[^,]*$/'"$rep"',/'
Output:
ram,doc,doc,doc,doc,doc,doc,doc,doc,doc,doc,
shaym,eng,eng,eng,eng,eng,eng,eng,eng,eng,eng,
I had the same problem, but instead of listing all the columns in awk, I just used this (to duplicate the 2nd column):
awk -v OFS='\t' '$2=$2"\t"$2' # for tab-delimited files
For CSVs you can just use
awk -F , -v OFS=, '$2=$2","$2'
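A quick check of that on the question's sample file. Note the assignment $2=$2","$2 also serves as the pattern; since the concatenated value always contains a comma, it is never false, so every line prints:

```shell
# Duplicate the 2nd comma-separated field in place.
printf 'ram,33,professional,doc\nshaym,23,salaried,eng\n' |
awk -F , -v OFS=, '$2=$2","$2'
```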
