How to add double quote in csv file where field contains space? - bash

One feature of the legacy code doesn't work, and I have to work around it by redeveloping a quick-and-dirty version of the feature.
We generate a CSV file, and the legacy code produced something like this:
foo; bar;"foo bar";foobar
"bla ble"; bli;blo;"blu bly"
Each field in my CSV that contains a space must be surrounded by double quotes (").
Currently, my quick-and-dirty script produces only:
foo; bar;foo bar;foobar
bla ble; bli;blo;blu bly
This is a problem because it would be a breaking change for clients :D
I am developing the script in bash (/bin/bash). I've searched around for sed or awk solutions but couldn't find anything that helps.
Can you help? :)
Thanks!

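Here is a simple awk one-liner (\042 is the octal escape for a double quote). It splits each line on ";" and wraps every field containing a space in double quotes. Note that this also quotes fields that only have a leading space, so the first sample line comes out as foo;" bar";"foo bar";foobar rather than the legacy foo; bar;"foo bar";foobar: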
$ awk 'BEGIN{FS=OFS=";"}{for(i=1;i<=NF;++i) if ($i ~ / /) $i = "\042" $i "\042"}1' file.csv
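In the question's sample, the legacy output leaves " bar" unquoted and only quotes fields with a space between two words. If that is the real rule (an assumption inferred from the two sample lines only), a variant of the awk above can test for a space between two non-space characters instead of any space. quote_inner_spaces is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: quote only fields that have a space between words.
quote_inner_spaces() {
  # FS/OFS ";" split and rejoin on semicolons.
  # /[^ ] +[^ ]/ is true only when spaces sit between two non-space chars,
  # so " bar" (leading space only) stays unquoted.
  awk 'BEGIN { FS = OFS = ";" }
  {
    for (i = 1; i <= NF; ++i)
      if ($i ~ /[^ ] +[^ ]/)
        $i = "\"" $i "\""
  }
  1' "$@"
}

# Output of the quick-and-dirty script, taken from the question.
printf '%s\n' 'foo; bar;foo bar;foobar' 'bla ble; bli;blo;blu bly' > file.csv
quote_inner_spaces file.csv
# foo; bar;"foo bar";foobar
# "bla ble"; bli;blo;"blu bly"
```

On the sample this reproduces the legacy output line for line; fields with trailing spaces only (e.g. "foo ") are also left unquoted.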

To quote fields that contain spaces (for example foo;foo bar -> foo;"foo bar") you can use sed:
sed 's/ *\(\w\+ \)\+\w\+/"&"/g' input.csv > output.csv
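The pattern ' *\(\w\+ \)\+\w\+' matches zero or more spaces, then one or more repetitions of the group \(\w\+ \) (a word followed by a single space), then a final word \w\+. The replacement "&" wraps the whole match in double quotes. \w and \+ are GNU sed extensions, and since \w only matches letters, digits and underscores, a field such as foo-bar baz is only partly quoted (foo-"bar baz").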
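For a sed without the GNU extensions (BSD/macOS sed, for example), the same pattern can be sketched in POSIX BRE syntax, with [[:alnum:]_] standing in for \w and \{1,\} for \+. This is a sketch assuming ASCII input:

```shell
#!/usr/bin/env bash
# Same sample input as in the question.
printf '%s\n' 'foo; bar;foo bar;foobar' 'bla ble; bli;blo;blu bly' > input.csv

# POSIX BRE version of ' *\(\w\+ \)\+\w\+':
#   [[:alnum:]_]\{1,\}   one or more word characters (what \w\+ matches)
#   \( ... \)\{1,\}      the "word + space" group, repeated one or more times
sed 's/ *\([[:alnum:]_]\{1,\} \)\{1,\}[[:alnum:]_]\{1,\}/"&"/g' input.csv > output.csv
cat output.csv
# foo; bar;"foo bar";foobar
# "bla ble"; bli;blo;"blu bly"
```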

Using Miller (https://github.com/johnkerl/miller) and running
mlr --icsvlite --ocsv --quote-all --fs ";" cat input
you will have
"foo";"bar";"foo bar";"foobar"
"bla ble";"bli";"blo";"blu bly"
I assume it's not a problem for you to have double quotes around every field.

echo "foo; bar;foo bar;foobar" | sed s'#;#+#'g | tr '+' '\n' | \
sed s'#^#\"#'g | sed s'#$#\";#'g | tr -d '\n'
This code first replaces the semicolon delimiters with a placeholder (+), which can then be replaced with newlines.
From there it's simple: I replace the start of every line with a double quote, and the end with a closing double quote and a semicolon.
Finally, I use tr to remove the newlines again, which puts all of the semicolon-delimited fields back on the same line. Note that this quotes every field, leaves a trailing ";" with no final newline, handles only one line at a time (tr -d '\n' would join several input lines), and breaks if a field already contains "+".
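If the goal is simply to quote every field on each line (as the Miller answer does), awk can do it without placeholders or joining lines. This is a sketch of an alternative technique, not a fix of the pipeline above; quote_all_fields is a hypothetical name, and double quotes already inside a field are not escaped:

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap every ";"-separated field in double quotes,
# one output line per input line.
quote_all_fields() {
  awk 'BEGIN { FS = OFS = ";" }
  {
    for (i = 1; i <= NF; ++i)
      $i = "\"" $i "\""
  }
  1' "$@"
}

printf '%s\n' 'foo; bar;foo bar;foobar' 'bla ble; bli;blo;blu bly' > input.csv
quote_all_fields input.csv
# "foo";" bar";"foo bar";"foobar"
# "bla ble";" bli";"blo";"blu bly"
```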

Related

Append text to top of file using sed doesn't work for variable whose content has "/" [duplicate]

I have a Visual Studio project, which is developed locally. Code files have to be deployed to a remote server. The only problem is the URLs they contain, which are hard-coded.
The project contains URLs such as ?page=one. For the link to be valid on the server, it must be /page/one .
I've decided to replace all URLs in my code files with sed before deployment, but I'm stuck on slashes.
I know this is not a pretty solution, but it's simple and would save me a lot of time. There are fewer than 10 strings to replace, in about 30 files.
An example describing my situation is below:
The command I'm using:
sed -f replace.txt < a.txt > b.txt
replace.txt which contains all the strings:
s/?page=one&/pageone/g
s/?page=two&/pagetwo/g
s/?page=three&/pagethree/g
a.txt:
?page=one&
?page=two&
?page=three&
Content of b.txt after I run my sed command:
pageone
pagetwo
pagethree
What I want b.txt to contain:
/page/one
/page/two
/page/three
The easiest way would be to use a different delimiter in your search/replace lines, e.g.:
s:?page=one&:pageone:g
You can use any character as a delimiter that's not part of either string. Or you could escape the slash with a backslash:
s/\//foo/
This replaces / with foo. You'd want to escape slashes this way when you don't know which characters might occur in the replacement strings (if they are shell variables, for example).
The s command can use any character as a delimiter; whatever character comes after the s is used. I was brought up to use a #. Like so:
s#?page=one&#/page/one#g
A very useful but lesser-known fact about sed is that the familiar s/foo/bar/ command can use any punctuation, not only slashes. A common alternative is s#foo#bar#, from which it becomes obvious how to solve your problem.
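Applied to the question's files, replace.txt only needs a different delimiter. A sketch using "#" (the file names a.txt, replace.txt and b.txt are the ones from the question):

```shell
#!/usr/bin/env bash
# Input from the question.
printf '%s\n' '?page=one&' '?page=two&' '?page=three&' > a.txt

# replace.txt uses "#" as the s delimiter, so "/" needs no escaping.
cat > replace.txt <<'EOF'
s#?page=one&#/page/one#g
s#?page=two&#/page/two#g
s#?page=three&#/page/three#g
EOF

sed -f replace.txt < a.txt > b.txt
cat b.txt
# /page/one
# /page/two
# /page/three
```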
Add \ before special characters, here the slashes in the replacement:
s/?page=one&/\/page\/one/g
etc.
In a system I am developing, the string to be replaced by sed is input text from a user which is stored in a variable and passed to sed.
As noted earlier in this post, if the string inside the sed command contains the delimiter that sed is using, sed fails with a syntax error. Consider the following example:
This works:
$ VALUE=12345
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
MyVar=12345
This breaks:
$ VALUE=12345/6
$ echo "MyVar=%DEF_VALUE%" | sed -e s/%DEF_VALUE%/${VALUE}/g
sed: -e expression #1, char 21: unknown option to `s'
Replacing the default delimiter is not a robust solution in my case as I did not want to limit the user from entering specific characters used by sed as the delimiter (e.g. "/").
However, escaping any occurrences of the delimiter in the input string would solve the problem.
Consider the below solution of systematically escaping the delimiter character in the input string before having it parsed by sed.
Such escaping can itself be done with sed. This is safe even if the input string contains the delimiter, because the input string is not part of that sed command:
$ VALUE=$(printf '%s\n' "$VALUE" | sed -e 's#/#\\/#g')
$ echo "MyVar=%DEF_VALUE%" | sed -e "s/%DEF_VALUE%/${VALUE}/g"
MyVar=12345/6
I have converted this to a function to be used by various scripts:
escapeForwardSlashes() {
  # Validate parameters
  if [ -z "$1" ]
  then
    echo "Error - no parameter specified!" >&2
    return 1
  fi
  # Perform replacement (quote "$1" so whitespace and glob characters survive)
  printf '%s\n' "$1" | sed -e 's#/#\\/#g'
  return 0
}
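escapeForwardSlashes only escapes "/". In the replacement part of an s command, "&" (the whole match) and "\" are special as well, so a value such as 12345/6&x would still be mangled. A sketch that escapes all three; escape_sed_replacement is a hypothetical name, and it assumes a single-line value with "/" as the delimiter:

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a string for the REPLACEMENT part of s/.../.../
escape_sed_replacement() {
  # Order matters: escape backslashes first so the backslashes added by the
  # later expressions are not doubled.
  printf '%s\n' "$1" | sed -e 's|\\|\\\\|g' -e 's|/|\\/|g' -e 's|&|\\&|g'
}

VALUE='12345/6&x\y'
printf '%s\n' 'MyVar=%DEF_VALUE%' |
  sed -e "s/%DEF_VALUE%/$(escape_sed_replacement "$VALUE")/g"
# MyVar=12345/6&x\y
```

A value containing a newline would still break the sed command; that case is outside this sketch.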
This line should work for your three examples:
sed -r 's#\?(page)=([^&]*)&#/\1/\2#g' a.txt
I used -r (extended regular expressions; -E on BSD/macOS sed) to save some escaping.
The line is generic for your one, two, three cases, so you don't have to write the substitution three times.
Test with your example (a.txt):
kent$ echo "?page=one&
?page=two&
?page=three&"|sed -r 's#\?(page)=([^&]*)&#/\1/\2#g'
/page/one
/page/two
/page/three
replace.txt should be
s/?page=/\/page\//g
s/&//g
Please see this article:
http://netjunky.net/sed-replace-path-with-slash-separators/
It simply uses | instead of /.
Great answer from Anonymous. \ solved my problem when I tried to escape quotes in HTML strings.
So if you use sed to return some HTML templates (on a server), use double backslash instead of single:
var htmlTemplate = "<div style=\\"color:green;\\"></div>";
A simpler alternative is to use awk, as in this answer:
awk '$0="prefix"$0' file > new_file
You can also use an alternative delimiter for a regex address by preceding it with a backslash:
sed '\,{some_path},d'
For the s command:
sed 's,{some_path},{other_path},'
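A concrete sketch of both forms, with made-up paths standing in for {some_path} and {other_path}:

```shell
#!/usr/bin/env bash
# Hypothetical sample file of paths.
printf '%s\n' '/usr/local/bin' '/usr/bin' '/opt/tool/bin' > paths.txt

# "\," starts an address regex delimited by ",", so "/" needs no escaping:
# delete the line that is exactly /usr/local/bin.
sed '\,^/usr/local/bin$,d' paths.txt
# /usr/bin
# /opt/tool/bin

# Same idea for s with "," as the delimiter: rewrite the /usr prefix.
sed 's,^/usr/,/opt/,' paths.txt
# /opt/local/bin
# /opt/bin
# /opt/tool/bin
```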

Replace spaces between two strings with symbol using sed

I have a string like this:
20.07.2010|Berlin|id 100|bd-22.10.94|Marry Scott Robinson|msc#gmail.com
I need to replace the whitespace only within "Marry Scott Robinson" with "|", so that I get bd-22.10.94|Marry|Scott|Robinson|
There are many such rows, so the problem is replacing whitespace only between the "bd-" field and the vertical bar after the name.
I'll assume that the name is always in the fifth column:
awk 'BEGIN{FS=OFS="|"}{gsub(/ /,OFS,$5)}1' file
If that is not the case, you can do:
awk 'BEGIN{FS=OFS="|"}{for(i=1;i<=NF;i++){if($i ~ /bd-/){break}};gsub(/ /,OFS,$(i+1))}1' file
Returns:
20.07.2010|Berlin|id 100|bd-22.10.94|Marry|Scott|Robinson|msc#gmail.com
Perl to the rescue!
perl -lne '($before, $change, $after) = /(.*\|bd-.*?\|)(.*?)(\|.*)/;
print $before, $change =~ s/ /|/gr, $after' -- file
-n reads the input line by line, running the code for each line
-l removes newlines from input and adds them to output
The first line populates three variables with values captured from the line: $before contains everything up to and including the first | after bd-; $change contains what follows up to the next |; and $after contains the rest.
s/ /|/gr replaces spaces by pipes (/g for "all of them") and returns (/r) the result.
This might work for you (GNU sed):
sed 's/[^|]*/\n&\n/5;:a;s/\(\n[^\n ]*\) /\1\|/;ta;s/\n//g' file
Sometimes to fix a problem we must erect scaffolding, then fix the original problem and finally remove the scaffolding.
Here we need to isolate the field by surrounding it by newlines.
Replace the spaces between the newlines with | by looping until the substitution fails.
Finally, remove the scaffolding i.e. the introduced newlines.
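The three steps above can be laid out as a commented GNU sed script. One deliberate change in this sketch: the loop only replaces a space that sits between the two scaffolding newlines, so spaces in fields after the name are left alone (as far as I can tell, [^\n ]* in the one-liner can run on from the second newline and pick them up). name_to_pipes is a hypothetical name:

```shell
#!/usr/bin/env bash
# GNU sed only (\n inside brackets and in the replacement is a GNU extension).
name_to_pipes() {
  sed '
    # Scaffolding: wrap the 5th |-separated field in newlines.
    s/[^|]*/\n&\n/5
    :a
    # Turn one space between the two newlines into "|", then loop.
    s/\(\n[^\n]*\) \([^\n]*\n\)/\1|\2/
    ta
    # Remove the scaffolding.
    s/\n//g
  ' "$@"
}

printf '%s\n' '20.07.2010|Berlin|id 100|bd-22.10.94|Marry Scott Robinson|msc#gmail.com' |
  name_to_pipes
# 20.07.2010|Berlin|id 100|bd-22.10.94|Marry|Scott|Robinson|msc#gmail.com
```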
Another perl version:
$ perl -F'\|' -ane '$F[4] =~ tr/ /|/; print join("|", @F)' foo.txt
20.07.2010|Berlin|id 100|bd-22.10.94|Marry|Scott|Robinson|msc#gmail.com
Same basic idea as Corentin's first awk example. Split each line into columns based on |, replace spaces in the 5th one with |'s, print the re-joined lines.

ignore spaces within/around brackets to count occurrences

(to LaTeX users) I want to search for manually labeled items
(to whom it may concern) script file on GitHub
I tried to find a solution, but what I found suggested removing the spaces first. In my case, I think there should be a simpler solution. It could use grep, awk or some other tool.
Consider the following lines:
\item[a)] some text
\item [i) ] any text
\item[ i)] foo and faa
\item [ 1) ] foo again
I want to find (or count) if there are items with a single ) inside brackets. The format could have blank spaces inside the brackets and/or around it. Also, the char before the closing parentheses could be any letter or number.
Edit: I tried grep "\[a)\]" but it missed [ a) ].
Since there are many possible ways to write an item, I cannot settle on one exact pattern. I think something like this is enough for me:
\item<blank spaces>[<blank spaces><letter or number>)<blank spaces>]
Simply removing blank spaces may not work, because the pattern above usually has text around it (for example: \item[ a)] consider the function...)
The output should indicate whether there are such patterns or not. It could be zero or the number of occurrences.
To do it all in grep itself:
grep -c -E '\\item\s*\[\s*\w+\)\s*\]' file.txt
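Note all the \s* checks for optional whitespace, and -c to get the count. Be aware that -c counts matching lines, not individual occurrences, and that \s and \w are GNU grep extensions.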
Breaking it down:
\\ a backslash (needs escape in grep)
item "item"
\s* optional whitespaces
\[ "[" (needs escape in -E)
\s* optional whitespaces
\w+ at least one 'word' char
\) ")" (needs escape in -E)
\s* optional whitespaces
\] "]" (escaped for symmetry; a lone ] outside a bracket expression is already literal)
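Because grep -c counts matching lines, two items on one line count only once. To count individual occurrences, grep -o prints each match on its own line and wc -l counts them. A sketch using POSIX character classes in place of \s and \w; the fifth sample line is made up to show the difference:

```shell
#!/usr/bin/env bash
# Lines from the question, plus one hypothetical line holding two items.
printf '%s\n' '\item[a)] some text' '\item [i) ] any text' '\item[ i)] foo and faa' \
  '\item [ 1) ] foo again' '\item[b)] x \item[c)] y' > file.txt

# ERE with POSIX classes; the final "]" is literal outside a bracket expression.
pattern='\\item[[:space:]]*\[[[:space:]]*[[:alnum:]]+\)[[:space:]]*]'

lines=$(grep -cE "$pattern" file.txt)                    # matching lines
occurrences=$(( $(grep -oE "$pattern" file.txt | wc -l) )) # individual matches
echo "lines=$lines occurrences=$occurrences"
# lines=5 occurrences=6
```

The $(( )) around wc -l strips the leading blanks some wc implementations print.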
The following awk may also help here. It removes the spaces between [ and ] and then looks for a letter or digit followed by ) inside:
awk '
match($0,/\[.*\]/){
val=substr($0,RSTART+1,RLENGTH-1);
gsub(/[[:space:]]+/,"",val);
if(val ~ /[[:alnum:]]+\)/){ count++ }
}
END{
print count+0
}' Input_file
So I am thinking something like this:
tr -d " \t" < file.txt | grep -c '\\item\[[0-9A-Za-z])\]'
This will count the number of matches for you.
Edit: Added \t to tr call. Now removes all spaces and tabs.
Here is a grep only version. This could be useful for printing out all of the matches (by removing -c) as well since the above version modifies the input:
grep -c '\\item *\[ *[0-9A-Za-z]) *\]' file.txt
Here is a more versatile answer, if this is what you are looking for. Here we write the matches to a file and count the lines in that file to get the number of matches:
grep '\\item *\[ *[0-9A-Za-z]) *\]' file.txt > matches.txt
wc -l < matches.txt

Explained shell statement

The following statement will remove line numbers in a txt file:
cat withLineNumbers.txt | sed 's/^.......//' >> withoutLineNumbers.txt
The input file is created with the following statement (this one I understand):
nl -ba input.txt >> withLineNumbers.txt
I know what cat does, and I know the output is written to the withoutLineNumbers.txt file, but the | sed 's/^.......//' part is not really clear to me.
Thanks for your time.
That sed regular expression simply removes the first 7 characters from each line. The regular expression ^....... says "Any 7 characters at the beginning of the line." The sed argument s/^.......// substitutes the above regular expression with an empty string.
Refer to the sed(1) man page for more information.
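Why exactly seven characters? By default nl (as specified by POSIX) right-aligns the line number in a field 6 characters wide (-w 6) and follows it with a TAB (-s), so every line starts with 6 + 1 = 7 characters of numbering. A small round-trip check, assuming fewer than a million lines so the number fits in those 6 columns; it uses > instead of >> so reruns don't append to stale files:

```shell
#!/usr/bin/env bash
# Sample input, including an empty line (nl -ba numbers empty lines too).
printf '%s\n' 'alpha' '' 'beta gamma' > input.txt

nl -ba input.txt > withLineNumbers.txt          # e.g. "     1<TAB>alpha"
sed 's/^.......//' withLineNumbers.txt > withoutLineNumbers.txt

# The round trip should give back the original file byte for byte.
cmp input.txt withoutLineNumbers.txt && echo "identical"
```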
That sed statement says to delete the first 7 characters; a dot "." means any character. There is an even easier way to do this:
awk '{print $2}' withLineNumbers.txt
You just print the 2nd column using awk; no regex needed. This only works if each original line is a single word.
If your data has spaces (note that this collapses runs of spaces and tabs into single spaces):
awk '{$1="";print substr($0,2)}' withLineNumbers.txt
sed is doing a search and replace. The 's' means substitute, the next character ('/') is the separator, the search expression is '^.......', and the replacement expression is an empty string (i.e. everything between the last two slashes).
The search is a regular expression. The '^' means match start of line. Each '.' means match any character. So the search expression matches the first 7 characters of each line. This is then replaced with an empty string. So what sed is doing is removing the first 7 characters of each line.
A simpler way to achieve the same thing:
cut -b8- withLineNumbers.txt > withoutLineNumbers.txt

bash cat multiple files content in to single string without newlines

I have some files whose names start with eg_, and each contains only a single line:
eg_01.txt:
#china:129.00
eg_02.txt
#uk:219.98
eg_03.txt
#USA:341.90
......
I want to concatenate them into a single line to send in a URL, like:
#china:129.00#uk:219.98#USA:341.90
I use
echo "$(cat eg_*)"
It gives me output that looks like a string, but it actually contains newlines:
"#china:129.00
#uk:219.98
#USA:341.90"
Is there another way to construct the expected string and get rid of the newlines, and even the spaces? Is cat alone enough to do this?
Thanks in advance.
You could always pipe it to tr:
tr "\n" " "
That replaces every newline on stdin with a space.
EDIT: as suggested by Bart Sas, you could also remove newlines with tr -d
tr -d "\n"
(note: just specifying an empty string to tr for the second argument won't do)
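Put together for the question's files, a sketch with cat feeding tr -d (the file contents are the ones from the question; url is just a variable name):

```shell
#!/usr/bin/env bash
# Recreate the question's files, one line each.
printf '%s\n' '#china:129.00' > eg_01.txt
printf '%s\n' '#uk:219.98' > eg_02.txt
printf '%s\n' '#USA:341.90' > eg_03.txt

# Concatenate in glob order, then delete every newline.
# Use tr -d '\n ' instead to drop spaces as well.
url=$(cat eg_*.txt | tr -d '\n')
printf '%s\n' "$url"
# #china:129.00#uk:219.98#USA:341.90
```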
Using only one command
url=$(awk '{printf "%s",$0}' eg*)
In Perl, you'd do it like this:
perl -pe'chomp' eg*.txt
The -p says "loop through the input files, run the code given by the -e switch on each line, and print the line." The chomp in Perl says "remove any trailing newline."
