Adding quotes to variating characters in bash - bash

I am trying to use the sed function in order to add double quotes for anything in between a matched pattern and a comma to break of the pattern. At the moment I am extracting the following data from cloudflare and I am trying to modify it to line protocol;
count=24043,clientIP=x.x.x.x,clientRequestPath=/abc/abs/abc.php
count=3935,clientIP=y.y.y.y,clientRequestPath=/abc/abc/abc/abc.html
count=3698,clientIP=z.z.z.z,clientRequestPath=/abc/abc/abc/abc.html
I have already converted to this format from JSON output with a bunch of sed functions to modify it, however, I am unable to get to the bottom of it to put the data for clientIP and clientRequestPath in inverted commas.
My expected output has to be;
count=24043,clientIP="x.x.x.x",clientRequestPath="/abc/abs/abc.php"
count=3935,clientIP="y.y.y.y",clientRequestPath="/abc/abc/abc/abc.html"
count=3698,clientIP="z.z.z.z",clientRequestPath="/abc/abc/abc/abc.html"
This data will be imported into InfluxDB, count will be a float whilst clientIP and clientRequestPath will be strings, hence why I need them to be in inverted commas as at the moment I am getting errors since they arent as they should be.
Is anyone available to provided to adequate 'sed' function to do is?

This might work for you (GNU sed):
sed -E 's/=([^0-9][^,]*)/="\1"/g' file
Enclose any string following a = does not begin with a integer upto a , in double quotes, globally.

here is a solution using a SED script to allow for multiple operations on a source file.
assuming your source data is in a file "from.dat"
create a sed script to run multiple commands
cat script.sed
s/clientIP=/clientIP=\"/
s/,clientRequestPath/\",clientRequestPath/
execute multiple-command sed script on data file redirecting the output file "to.dat"
sed -f script.sed from.dat > to.dat
cat to.dat (only showing one line)
count=24043,clientIP="x.x.x.x",clientRequestPath=/abc/abs/abc.php

Related

Linux sed command that generates a new file on every regex match

I have the following Linux command which I am using to extract data from one very large log file.
sed -n "/<trade>/,/<\/trade>/p" Large.log > output.xml
However, the output is generated in a single file output.xml. My intention is to create a new file every time the "/<trade>/,/<\/trade>/p" is matched. Every new file will be named after the <id> tag which is inside the <trade> </trade> tags.
Something likes this...
sed -n "/<trade>/,/<\/trade>/p" Large.log > "/<id>/,/<\/id>/p".xml
However, that, of course, does not work and I am not sure how to apply a regex as a naming rule.
P.S At this point, I am also not sure if I should use sed or maybe I should try achieving this with awk

Converting a TXT file with double quotes to a pipe-delimited format using sed

I'm trying to convert TXT files into pipe-delimited text files.
Let's say I have a file called sample.csv:
aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z
I'd like to convert this into an output that looks like this:
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
Now after tons of searching, I have come the closest using this sed command:
sed -r 's/""/\v/g;s/("([^"]+)")?,/\2\|/g;s/"([^"]+)"$/\1/;s/\v/"/g'
However, the output that I received was:
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|pppqqq|rrr" sss|ttt,"uuu|Z
Where the expected for the 9th column should have been ppp"qqq" but the result removed the double quotes and what I got was pppqqq.
I have been playing around with this for a while, but to no avail.
Any help regarding this would be highly appreciated.
As suggested in comments sed or any other Unix tool is not recommended for this kind of complex CSV string. It is much better to use a dedicated CSV parser like this in PHP:
$s = 'aaa",bbb"ccc,"ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","nnn"ooo,ppp"qqq",rrr" sss,"ttt,""uuu",Z';
echo implode('|', str_getcsv($s));
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|nnnooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
The problem with sample.csv is that it mixes non-quoted fields (containing quotes) with fully quoted fields (that should be treated as such).
You can't have both at the same time. Either all fields are (treated as) unquoted and quotes are preserved, or all fields containing a quote (or separator) are fully quoted and the quotes inside are escaped with another quote.
So, sample.csv should become:
"aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z
to give you the desired result (using a csv parser):
aaa"|bbb"ccc|ddd,eee|fff|ggg,hhh,iii|jjj kkk|lll" mmm|"nnn"ooo|ppp"qqq"|rrr" sss|ttt,"uuu|Z
Have the same problem.
I found right result with https://www.papaparse.com/demo
Here is a FOSS on github. So maybe you can check how it works.
With the source of [ "aaa""","bbb""ccc","ddd,eee",fff,"ggg,hhh,iii","jjj kkk","lll"" mmm","""nnn""ooo","ppp""qqq""","rrr"" sss","ttt,""uuu",Z ]
The result appears in the browser console:
[1]: https://i.stack.imgur.com/OB5OM.png

Delete a string in a file using bash script

We have a file which has unformatted xml content in a single line
<sample:text>Report</sample:text><sample:user name="11111111" guid="163g673"/><sample:user name="22222222" guid="aknen1763y82bjkj18"/><sample:user name="33333333" guid="q3k4nn5k2nk53n6"/><sample:user name="44444444" guid="34bkj3b5kjbkq"/><sample:user name="55555555" guid="k4n5k34nlk6n711kjnk5253"/><sample:user name="66666666" guid="1n4k14nknl1n4lb1"/>
If we find a particular string suppose "22222222", i want to remove the entire string that surrounds the matched string. In our case the entire portion around 22222222 i.e., <sample:user name="22222222" guid="aknen1763y82bjkj18"/> should be removed and the file has to be saved.
How can we do it? Please help
You can do it using sed utility by invoking it like this:
sed -i file -e 's/<[^<]*"22222222"[^>]*>//'

How can I replace a word at a specific line in a file in unix

I've researched other questions on here, but haven't really found one that works for me. I'm trying to select a specific line from a file and replace a string on that line with another string. So I have a file named my_course. I'm trying to modify a line in my_course that starts with "123". on that line I want to replace the string "0," with "1,". Help?
One possibility would be to use sed:
sed '/^123/ s/0/1/' my_course
In the first /../ part you just have to specify the pattern you are looking for ^123 for a line starting with 123.
In the s/from/to/ part you have specify the substitution to be performed.
Note that by default after substitution the file will be written to stdout. You might want to:
redirect the output using ... > my_new_course
perform the substitution "in place" using the -e switch to sed
If you are using the destructive in place variant you might want to use -iEXTENSION in addition to keep a copy with the given EXTENSION of the original version in case something goes wrong.
EDIT:
To match the desired lined with a prefix stored in a variable you have to enclose the sed script with double quotes " as using single qoutes ' will prevent variable expansion:
sed "/^$input/ s/0/1/" my_course
Have you tried this:
sed -e '[line]s/old_string/new_string/' my_course
PS: the [ ] shouldn't be used, is there just to make it clear that you should put the number right before the "s".
Cheers!
In fact, the -e in this case is not necessary, I can write just
sed '<line number>s/<old string>/<new string>/' my_course
This is what worked for me on Fedora 36, GNU bash, version 5.2.15(1)-release (x86_64-redhat-linux-gnu):
sed -i '1129s/additional/extra/' en-US/Design.xml
I know you said you couldn't use line numbers; I don't know how to address that part, but this replaced "additional" with "extra" on line 1129 of that file.

bash templating

i have a template, with a var LINK
and a data file, links.txt, with one url per line
how in bash i can substitute LINK with the content of links.txt?
if i do
#!/bin/bash
LINKS=$(cat links.txt)
sed "s/LINKS/$LINK/g" template.xml
two problem:
$LINKS has the content of links.txt without newline
sed: 1: "s/LINKS/http://test ...": bad flag in substitute command: '/'
sed is not escaping the // in the links.txt file
thanks
Use some better language instead. I'd write a solution for bash + awk... but that's simply too much effort to go into. (See http://www.gnu.org/manual/gawk/gawk.html#Getline_002fVariable_002fFile if you really want to do that)
Just use any language where you don't have to mix control and content text. For example in python:
#!/usr/bin/env python
links = open('links.txt').read()
template = open('template.xml').read()
print template.replace('LINKS', links)
Watch out if you're trying to force sed solution with some other separator - you'll get into the same problems unless you find something disallowed in urls (but are you verifying that?) If you don't, you already have another problem - links can contain < and > and break your xml.
You can do this using ed:
ed template.xml <<EOF
/LINKS/d
.r links.txt
w output.txt
EOF
The first command will go to the line
containing LINKS and delete it.
The second line will insert the
contents of links.txt on the current
line.
The third command will write the file
to output.txt (if you omit output.txt
the edits will be saved to
template.xml).
Try running sed twice. On the first run, replace / with \/. The second run will be the same as what you currently have.
The character following the 's' in the sed command ends up the separator, so you'll want to use a character that is not present in the value of $LINK. For example, you could try a comma:
sed "s,LINKS,${LINK}\n,g" template.xml
Note that I also added a \n to add an additional newline.
Another option is to escape the forward slashes in $LINK, possibly using sed. If you don't have guarantees about the characters in $LINK, this may be safer.

Resources