I have a text file that is basically one giant CSV sheet crammed onto a single line. An example would be like this:
Name,Age,Year,Michael,27,2018,Carl,19,2018
I need to change every third occurrence of a comma into a newline so that I get:
Name,Age,Year
Michael,27,2018
Carl,19,2018
Please let me know if that is too ambiguous and, as always, thank you in advance for all the help!
With GNU sed:
sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'
To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:
sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'
In the regular expression, [^,]*, means "zero or more characters other than a comma, followed by a comma"; in other words, it matches a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.
Regardless of what Linux's man sed says, the -E flag is an extension to POSIX sed which causes sed to use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works on BSD sed, the default on Mac OS X. (Thanks to @EdMorton for the note.)
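For comparison, here is a sketch of the same substitution in basic-regex syntax, where the grouping and interval operators must be backslash-escaped (note that \n in the replacement is itself a GNU extension):
sed 's/\(\([^,]*,\)\{2\}[^,]*\),/\1\n/g'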
With GNU awk for multi-char RS:
$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
With any awk:
$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
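To see why the sub(/\n$/,"") is needed: with RS=',' the input's trailing newline stays attached to the last record. A quick demonstration (the brackets just make the stray newline visible):
$ printf 'a,b,c\n' | awk -v RS=',' '{print NR, "[" $0 "]"}'
1 [a]
2 [b]
3 [c
]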
Try this:
$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811
$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981
Or use ,[1-9][0-9]\{3\} if you don't want to write [0-9] three more times for the YYYY part.
PS: This solution will give you only four digits for the year: even if the data for YYYY is 19811 (a typo, if any), you'll still get 1981.
You are looking for groups of three fields, each without a comma and separated by commas. The last group can give problems (it doesn't end with a comma, and it may contain fewer than three fields). The following command comes close, although every emitted line except the last keeps its trailing comma:
grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile
This might work for you (GNU sed):
sed 's/,/\n/3;P;D' file
This replaces the third comma with a newline, prints the pattern space up to the newline, deletes up to and including the newline, and repeats on what is left.
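Running it on the sample input shows the cycle at work; each pass peels one three-field line off the front until fewer than three commas remain:
$ echo 'Name,Age,Year,Michael,27,2018,Carl,19,2018' | sed 's/,/\n/3;P;D'
Name,Age,Year
Michael,27,2018
Carl,19,2018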
I need to replace all patterns of either AllowUsers, #AllowUsers or # AllowUsers in a file.
I've got that part covered with sed -e 's/^\s*#\?AllowUsers.*//'
But the thing I'm having trouble with is that it should leave one occurrence in the file and remove everything else.
Let me know, thank you in advance!
awk '/AllowUsers|#AllowUsers|# AllowUsers/ && c++ {next} 1' did the job! (Note the single | for alternation; || inside the regex would create empty alternatives that match every line.)
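The trick is that c++ is 0 (false) the first time the pattern matches, so the first AllowUsers line falls through to the 1 and is printed; every later match triggers next. A quick check on a made-up three-line fragment:
$ printf '#AllowUsers alice\nAllowUsers bob\n# AllowUsers carol\n' | awk '/AllowUsers/ && c++ {next} 1'
#AllowUsers alice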
I'm trying to do this with sed
sed -i.bak 's,[href="#">bebeco.fr],href="http://www.bebeco.fr,http://mm.eulerian.net/dynclick/bebeco/?eml-publisher=bebeco&eml-name=deme_faible&eemail={email}&eurl=https://www.bebeco.fr/?utm_source=ANIM&utm_medium=Targeting&utm_campaign=deme_faible&utm_term=CCCC_seg_men&utm_content=website">bebeco.fr,g' pour_test_demenagement-lien-faible.html
As you can imagine, due to the multiple metacharacters, the sed command fails and does not change the line.
What can I do to solve the problem and not put a \ before each metacharacter?
Do you think awk can help, without going into programming?
TIA
You are falling into the trap of thinking sed can operate on strings. It cannot. It operates on regexps with additional "special" characters. Just use awk as it does support string operations. Here's how to replace an old string with a new string:
awk -v old="original" -v new="replacement" 's=index($0,old){$0=substr($0,1,s-1) new substr($0,s+length(old))}1' file
It would be great if sed had a way to support strings like grep does with -F but it doesn't. It'd also be great if there was a briefer way to write it in awk but there isn't. So let's just suck it up and do string operations as provided by the tool that supports them instead of trying to find "special" characters that will sometimes work as delimiters and cobble together escape sequences to try to disable regexp metacharacters.
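For example, replacing a string full of regex metacharacters that plain sed would trip over (the strings here are invented for illustration; note that index() finds only the first occurrence on each line):
$ echo 'click href="#">here' | awk -v old='href="#">' -v new='href="http://example.com">' 's=index($0,old){$0=substr($0,1,s-1) new substr($0,s+length(old))}1'
click href="http://example.com">here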
I would simply use a different separator. What about good old ~? The data as shown does not contain a ~. (This only avoids the delimiter clash, of course; any regex metacharacters in the search pattern still need escaping.)
sed 's~SEARCH~REPLACE~'
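A minimal sketch with a fragment of the question's data, so the slashes in the replacement URL need no escaping:
$ echo 'href="#">bebeco.fr' | sed 's~href="#"~href="http://www.bebeco.fr"~'
href="http://www.bebeco.fr">bebeco.fr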
Do you think awk can help, without going into programming?
:) Since awk is a programming language, I don't think it is possible without going into programming. However, I think sed is ok for that job.
Your regex syntax is hosed. [href="#">bebeco.fr] matches a single character which is one of h, r, e, etc. On top of that, you have three fields where sed only permits two, so it's not clear what you are trying to accomplish; but perhaps something like
sed -i.bak 's,href="#">bebeco.fr,href="http://mm.eulerian.net/dynclick/bebeco/?eml-publisher=bebeco\&eml-name=deme_faible\&eemail={email}\&eurl=https://www.bebeco.fr/?utm_source=ANIM\&utm_medium=Targeting\&utm_campaign=deme_faible\&utm_term=CCCC_seg_men\&utm_content=website">bebeco.fr,g' pour_test_demenagement-lien-faible.html
That is, replace the hash in the double-quoted string in href="#">bebeco.fr with a long URL.
You can try to create an escaped pattern (the first sed escapes the BRE metacharacters plus the / delimiter in the search string; the second escapes \, & and / in the replacement):
MyStringSearch='href="#">bebeco.fr'
MyStringSearchEsc="$( printf '%s' "${MyStringSearch}" | sed 's#[][\.*^$/]#\\&#g' )"
MyStringReplace='href="http://mm.eulerian.net/dynclick/bebeco/?eml-publisher=bebeco&eml-name=deme_faible&eemail={email}&eurl=https://www.bebeco.fr/?utm_source=ANIM&utm_medium=Targeting&utm_campaign=deme_faible&utm_term=CCCC_seg_men&utm_content=website">bebeco.fr'
MyStringReplaceEsc="$( printf '%s' "${MyStringReplace}" | sed 's#[\&/]#\\&#g' )"
then use them in the final sed:
sed "s/${MyStringSearchEsc}/${MyStringReplaceEsc}/g" pour_test_demenagement-lien-faible.html
I've been searching Google forever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for sed, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but I can't seem to get that installed in Cygwin. Please and thank you!
I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).
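With the sample line from the question:
$ echo '2222,H,73.82,04,07,2012' | awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }'
2222,H,73.82,04072012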
This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape the parentheses; YMMV.
I also do not grasp the concept of how to construct a regular expression for sed, so I was hoping someone could explain this to me.
The basic notation that people are telling you here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts that are grouped in parentheses. Those parts can then be referred to in the REPLACEMENT part of the command. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that the version of sed I'm using defaults to the "basic" RE dialect, where the grouping parentheses need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed (which you'd find in Linux), you can get the same results with the -r option instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)
This should work:
sed -e 's~,~~4g' file.txt
It removes the fourth comma and every comma after it. (Combining a number with the g flag is a GNU sed extension.)
echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
sed -E -e 's/(..),(..),(....)$/\1\2\3/' myfile.txt
(The -E is needed here; without it, sed treats the unescaped parentheses literally and the pattern never matches.)
I'm looking for a way to remove lines within multiple CSV files, in bash using sed, awk or anything appropriate, where the line ends in 0.
So there are multiple csv files, their format is:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLElong,60,0
EXAMPLEcon,120,6
EXAMPLEdev,60,0
EXAMPLErandom,30,6
So the file will be amended to:
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
A problem which I can see arising is distinguishing between multi-digit numbers that merely end in zero and a field that is 0 itself.
So any ideas?
Using your file, something like this?
$ sed '/,0$/d' test.txt
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
For this particular problem, sed is perfect, as the others have pointed out. However, awk is more flexible, i.e. you can filter on an arbitrary column:
awk -F, '$3!=0' test.csv
This will print the entire line if column 3 is not 0.
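For example, to keep only the rows whose second column is greater than 30 (same sample data):
$ awk -F, '$2 > 30' test.csv
EXAMPLEfoo,60,6
EXAMPLElong,60,0
EXAMPLEcon,120,6
EXAMPLEdev,60,0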
use sed to only remove lines ending with ",0":
sed '/,0$/d'
you can also use awk,
$ awk -F"," '$NF!=0' file
EXAMPLEfoo,60,6
EXAMPLEbar,30,10
EXAMPLEcon,120,6
EXAMPLErandom,30,6
This just checks the last field for 0 and doesn't print the line if it's found.
sed '/,[ \t]*0$/d' file
I would tend toward sed, but there is an egrep (or grep -E) solution too:
egrep -v ",0$" example.csv