I have a scenario where we want to collapse multiple double quotes into a single double quote within the data. But since the input data is comma-delimited and every column value is enclosed in double quotes "", I ran into an issue, explained below:
The sample data looks like this:
"int","","123","abd"""sf123","top"
So, the output would be:
"int","","123","abd"sf123","top"
I tried the approach below to resolve it, but only the first occurrence is handled and I'm not sure what the issue is:
sed -ie 's/,"",/,"NULL",/g;s/""/"/g;s/,"NULL",/,"",/g' inputfile.txt
Step 1: replace all ---> ,"", with ,"NULL",
Step 2: collapse all multiple occurrences ---> """ or "" or """" down to a single "
Step 3: revert step 1 back to the original ---> ,"NULL", with ,"",
But only the first occurrence gets changed and the rest stays the same, as below:
If the input is:
"int","","","123","abd"""sf123","top"
the output comes out as:
"int","","NULL","123","abd"sf123","top"
But, the output should be:
"int","","","123","abd"sf123","top"
You may try this perl with a lookahead:
perl -pe 's/("")+(?=")//g' file
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
Where input is:
cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
Breakup:
("")+: Match 1+ pairs of double quotes
(?="): If those pairs are followed by a single "
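If you also want the change written back to the file instead of printed to stdout, the same one-liner can be combined with perl's -i switch; the .bak backup suffix here is just a suggestion:
perl -i.bak -pe 's/("")+(?=")//g' file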
Using sed
$ sed -E 's/(,"",)?"+(",)?/\1"\2/g' input_file
"int","","123","abd"sf123","top"
"int","","NULL","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
In awk, with your shown samples, please try the following code. Written and tested in GNU awk; it should work in any version of awk.
awk '
BEGIN{ FS=OFS="," }
{
  for(i=1;i<=NF;i++){
    if($i!~/^""$/){
      gsub(/"+/,"\"",$i)
    }
  }
}
1
' Input_file
Explanation: set the input and output field separators to , for all lines of Input_file. Then traverse each field of the line; if the field is NOT an empty "" value, globally replace every run of one or more " characters in it with a single ". Finally, print the line.
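If you eventually want the changes saved back into the file itself, one option (assuming your awk really is GNU awk 4.1 or newer, since -i inplace is a gawk extension) is:
awk -i inplace 'BEGIN{ FS=OFS="," } { for(i=1;i<=NF;i++){ if($i!~/^""$/){ gsub(/"+/,"\"",$i) } } } 1' Input_file
With any other awk you can redirect to a temporary file and move it back, e.g. awk '...' Input_file > tmp && mv tmp Input_file.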
With sed you could match one or more repetitions of "" using a group, followed by a single ".
Then use a single " in the replacement:
sed -E 's/("")+"/"/g' file
For this content
$ cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
The output is
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
sed s'#"""#"#' file
That works. I will demonstrate another method though, which you may also find useful in other situations.
#!/bin/sh -x
cat > ed1 <<EOF
3s/"""/"/
wq
EOF
cp file stack
cat stack | tr ',' '\n' > f2
ed -s f2 < ed1
cat f2 | tr '\n' ',' > stack
rm -v ./f2
rm -v ./ed1
The point of this is that if you have a big csv record all on one line and you want to edit a specific field, then, provided you know the field number, you can convert all the commas to newlines and use the field number as a line number to substitute, append after, or insert before that field with ed. Afterwards you re-convert back to csv.
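As a rough sketch of how that generalizes (the file argument, the fields.tmp and .edited names, and the fixed """ pattern are all my own placeholders), the same steps can be wrapped so the field number becomes a parameter:
#!/bin/sh
# usage: ./collapse_field.sh csvfile fieldnumber
file=$1
field=$2
tr ',' '\n' < "$file" > fields.tmp                       # one field per line
printf '%ss/"""/"/\nwq\n' "$field" | ed -s fields.tmp    # edit just that "line"
tr '\n' ',' < fields.tmp > "$file.edited"                # back to csv
rm -f fields.tmp
Like the script above, the final tr also turns the last newline into a comma, so you may want to trim a trailing comma afterwards.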
I need to delete or replace the third ":" (colon) with a space. I can't do it at a certain index because the entries differ in length.
u:Testuser:rw:/home/user1/temp
g:Testgroup:-:/home/user2/temp
Result should look like this:
u:Testuser:rw /home/user1/temp
g:Testgroup:- /home/user2/temp
Is there a way to 1) delete a specific character and 2) insert a character before/after a specific character?
I couldn't find a solution; I am a beginner, unfortunately.
Thanks for the answer, I did it myself:
echo 'g:Testgroup:-:/home/user2/temp' | sed 's/:/ /3'
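Applied to a whole file instead of a single line, the same substitution works on every line, since the trailing 3 tells sed to replace the third match on each line:
sed 's/:/ /3' file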
A dirty solution:
$ cat 54042857.txt
u:Testuser:rw:/home/user1/temp
g:Testgroup:-:/home/user2/temp
$ awk -F ':' ' { print $1":"$2":"$3" "$4 } ' 54042857.txt
u:Testuser:rw /home/user1/temp
g:Testgroup:- /home/user2/temp
Using parameter expansion:
$ foo='u:Testuser:rw:/home/user1/temp'
$ printf '%s\n' "${foo%":${foo#*:*:*:}"} ${foo#*:*:*:}"
u:Testuser:rw /home/user1/temp
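If the lines live in a file rather than a single variable, the same expansion can be applied per line with a read loop (the file name here is just the sample file shown earlier):
while IFS= read -r line; do
  rest=${line#*:*:*:}
  printf '%s\n' "${line%":$rest"} $rest"
done < 54042857.txt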
I have a small task.
I should write:
data="duke,rock,hulk,donovan,john"
And in the next variable, I should change the delimiter of the first variable:
data2="duke|rock|hulk|donovan|john"
What is the correct way to do this in bash?
This is a small part of the script I need to write.
For example, I use a while/getopts/case construct to take the usernames to exclude as a parameter:
ls /home/ | egrep -v $data2
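To give a bit more context, here is a minimal sketch of that while/getopts/case construct (the option letter -e and the variable names are just placeholders I made up):
while getopts "e:" opt; do
  case $opt in
    e) data=$OPTARG ;;   # e.g. -e duke,rock,hulk,donovan,john
    *) echo "usage: $0 -e user1,user2,..." >&2; exit 1 ;;
  esac
done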
You can easily replace a single character with an expansion:
data="duke,rock,hulk,donovan,john"
data2=${data//,/|}
echo "$data2"
Breaking down the syntax:
${data means "expand based on the value found in the variable data";
// means "search for all occurrences of" (here, the comma);
the lone / means "replace with what follows" (here, the vertical bar); a short / vs // comparison follows below.
Note that some characters may need to be escaped, but not the comma and vertical bar.
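A quick side-by-side comparison of / versus // (expected output shown in the comments):
data="duke,rock,hulk,donovan,john"
echo "${data/,/|}"    # duke|rock,hulk,donovan,john  (first comma only)
echo "${data//,/|}"   # duke|rock|hulk|donovan|john  (every comma)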
Then you may filter the results like this:
ls /home/ | egrep -v "$data2"
Another very similar way would be to use tr (translate or delete characters):
data="duke,rock,hulk,donovan,john"
data2=$(echo "$data" | tr ',' '|')
echo "$data2"
I'm using jq to read some data from a JSON file.
after=`cat somefile.json | jq '.after[]'`
returns something like this:
"some value" "another value" "something else"
Basically a list of quoted strings. I now need to convert these strings into one string formatted like
"some value; another value; something else;"
I've tried a lot of combinations of for loops to try and get this working and nothing quite works.
Anyone know how this can be done? Cheers!
Use sed:
sed -e 's/" /; /g; s/ "/ /g; s/"$/;"/' <<< '"some value" "another value" "something else"'
OUTPUT:
"some value; another value; something else;"
Use the sed s command to replace the desired value.
Thanks all! I actually decided to dig deeper into the jq docs to see if I could simply leverage it to do what I want.
after=`cat somefile.json | jq -c -r '.after[] + "; "' | tr -d '\n'`
This ended up working very well. Thanks for the sed version though! Always good to see another working solution.
Assuming .after[] returns the list of strings you describe, you can do this entirely with jq using join to format them as follows:
[ .after[] ] | join("; ") + ";"
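For example, reading straight from the file:
after=$(jq '[ .after[] ] | join("; ") + ";"' somefile.json)
echo "$after"
This prints the quoted form from your example; add -r to the jq call if you want the result without the surrounding quotes.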
I am trying to initialize an array from a string split using awk.
I am expecting the tokens to be delimited by ",", but somehow they aren't.
The input is a string returned by curl from the address http://www.omdbapi.com/?i=&t=the+campaign
I've tried to remove any extra carriage return or things that could cause confusion, but in all clients I have checked it looks to be a single line string.
{"Title":"The Campaign","Year":"2012","Rated":"R", ...
and this is the output:
-metadata {"Title":"The **-metadata** Campaign","Year":"2012","Rated":"R","....
It should have been
-metadata {"Title":"The Campaign"
Here's my piece of code:
__tokens=($(echo $omd_response | awk -F ',' '{print}'))
for i in "${__tokens[@]}"
do
  echo "-metadata" $i
done
Any help is welcome
I would take seriously the comment by @cbuckley: use a json-aware tool rather than trying to parse the line with simple string tools. Otherwise, your script will break if a quoted string has a comma inside, for example.
In any event, you don't need awk for this exercise, and it isn't helping you, because the way awk breaks the string up is only of interest to awk. Once the string is printed to stdout, it is still the same string as before. If you want the shell to use , as a field delimiter, you have to tell the shell to do so.
Here's one way to do it:
(
  OLDIFS=$IFS
  IFS=,
  tokens=($omd_response)
  IFS=$OLDIFS
  for token in "${tokens[@]}"; do
    # something with token
  done
)
The ( and ) are just to execute all that in a subshell, making the shell variables temporaries. You can do it without.
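As a side note, a common bash-specific alternative that avoids touching the global IFS at all is read -a with a here-string; the IFS=, prefix applies only to the read command:
IFS=, read -r -a tokens <<< "$omd_response"
for token in "${tokens[@]}"; do
  printf -- '-metadata "%s" ' "$token"
done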
First, please accept my apologies: I don't have a recent bash at hand so I can't try the code below (no arrays!)
But it should work, or if not, you should be able to tweak it to work (or ask underneath, providing a little context on what you see, and I'll help fix it).
nb_fields=$(echo "${omd_response}" | tr ',' '\n' | wc -l | awk '{ print $1 }')
# The nb_fields count will be correct UNLESS ${omd_response} contains a trailing ",",
# in which case it would be 1 too big, and below would create an empty
# __tokens[last_one], giving an extra `-metadata ""`. Easily corrected if it happens.
# The code below assumes there is at least 1 field... You should maybe check that.

# 1) we create the __tokens[] array
for field in $( seq 1 $nb_fields )
do
  # optional: if field is 1 or $nb_fields, add processing to get rid of the { or } ?
  __tokens[$field]=$(echo "${omd_response}" | cut -d ',' -f ${field})
done

# 2) we use it to output what we want
for i in $( seq 1 $nb_fields )
do
  printf '-metadata "%s" ' "${__tokens[$i]}"
  # This outputs everything on 1 line.
  # You could add a \n just before the last ' so each goes on a different line.
done
So I loop on field numbers instead of on what could be space- or tab-separated values.