substitute characters but not the last one - ruby
I have a string for example like this:
str = 'TEST;NAME=1;TARGET_SOMETHING;PLATFORM_INTEL;'
Now I would like to substitute all ";" with "-D" and delete the last ";"
I'm doing it with:
str.gsub(/;/, ' -D').gsub(/^/, ' -D')
The second gsub is only there to add the -D to the beginning of the line as well.
Result:
-DTEST -DNAME=1 -DTARGET_SOMETHING -DPLATFORM_INTEL -D
How to tell Ruby not to output the last "-D" or to delete the last ";" in str?
Any suggestions to do it in the same line?
You can combine split and map for this.
irb(main):012:0> str.split(";").map {|i| "-D#{i}"}.join(" ")
=> "-DTEST -DNAME=1 -DTARGET_SOMETHING -DPLATFORM_INTEL"
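A minimal sketch of why the one-liner above needs no extra cleanup: String#split discards trailing empty fields when no limit is given, so the final ";" produces no extra element. The method name define_flags is hypothetical and only used for illustration.

```ruby
# Hypothetical helper (the name is an assumption, not from the answers above).
def define_flags(str)
  # split(';') drops trailing empty fields, so the final ';' adds nothing
  str.split(';').map { |name| "-D#{name}" }.join(' ')
end

str = 'TEST;NAME=1;TARGET_SOMETHING;PLATFORM_INTEL;'
puts define_flags(str)
# prints: -DTEST -DNAME=1 -DTARGET_SOMETHING -DPLATFORM_INTEL
```

Because join only puts the separator between elements, there is no leading or trailing space to trim either.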
elements= (str.gsub(/;/, ' -D').gsub(/^/, ' -D')).split(' ')
output will be:
["-DTEST", "-DNAME=1", "-DTARGET_SOMETHING", "-DPLATFORM_INTEL", "-D"]
Then delete the last element from the array:
elements.delete_at(elements.size-1)
The result is in the elements variable:
p elements
["-DTEST", "-DNAME=1", "-DTARGET_SOMETHING", "-DPLATFORM_INTEL"]
lowercase and remove punctuation from a csv
I have a giant CSV file (6 GB) whose rows look like this:
"87687","institute Polytechnic, Brazil"
"342424","university of India, India"
"24343","univefrsity columbia, Bogata, Colombia"
I would like to remove all punctuation from the second column and lowercase it, yielding:
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
What would be the most efficient way to do this in the terminal?
Tried:
cat TEXTFILE | tr -d '[:punct:]' > OUTFILE
Problem: the result is not lowercased, and tr acts on both columns, not just the second.
With a real CSV parser in Perl: the robust and reliable way, using just one process. Since the file is processed line by line, the 6 GB file size should not be an issue.
#!/usr/bin/perl
use strict; use warnings;       # harness
use Text::CSV;                  # load the needed module (install it)
use feature qw/say/;            # say = print("...\n")
# create an instance of a new CSV parser
my $csv = Text::CSV->new({ auto_diag => 1 });
# open a file handle or exit with an error
open my $fh, "<:encoding(utf8)", "file.csv" or die "file.csv: $!";
while (my $row = $csv->getline ($fh)) { # parse line by line
    $_ = $row->[1];                     # work on column 2 only
    s/[\s[:punct:]]//g;                 # remove both spaces and punctuation
    $_ = lc $_;                         # lowercase the current value $_
    $row->[1] = qq/"$_"/;               # store the change and (re)quote
    say join ",", @$row;                # display the whole current row
}
close $fh;                              # close the file handle
Output
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
Install the module with:
cpan Text::CSV
Here's an approach using xsv and process substitution:
paste -d, \
    <(xsv select 1 infile.csv) \
    <(xsv select 2 infile.csv | sed 's/[[:blank:][:punct:]]*//g;s/.*/\L&/')
The sed command first removes all blanks and punctuation, then lowercases the entire match. This also works when the first field contains blanks and commas, and retains quoting where required.
Using sed
$ sed -E ':a;s/([^,]*,)([^ ,]*)[ ,]([[:alpha:]]+)/\1\L\2\3/;ta' input_file
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
I suggest using this awk solution, which should work with any version of awk:
awk 'BEGIN{FS=OFS="\",\""} { gsub(/[^[:alnum:]"]+/, "", $2); $2 = tolower($2)} 1' file
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
Details:
We make "," the input and output field separator in the BEGIN block.
gsub(/[^[:alnum:]"]+/, "", $2): strip all non-alphanumeric characters except "
$2 = tolower($2): lowercase the second column
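The same field-separator idea can be sketched in Ruby. This is only an illustration under a stated assumption: every row has exactly the shape "first","second", as in the question. The helper name normalize_row is hypothetical.

```ruby
# Hypothetical helper; assumes each row looks like "first","second".
def normalize_row(line)
  # Split once on the "," separator, mirroring the awk FS above.
  first, second = line.chomp.split('","', 2)
  # Lowercase, then drop everything that is not a letter or digit
  # (this also removes the closing quote, which is re-added below).
  cleaned = second.downcase.gsub(/[^[:alnum:]]/, '')
  %(#{first}","#{cleaned}")
end

puts normalize_row('"87687","institute Polytechnic, Brazil"')
# prints: "87687","institutepolytechnicbrazil"
```

For a large file, iterating with File.foreach and calling the helper on each line keeps memory use constant, since only one line is held at a time.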
One GNU awk (for gensub()) idea:
awk '
BEGIN { FS=OFS="\"" }
      { $4=gensub(/[^[:alnum:]]/,"","g",tolower($4)) }
1'
This generates:
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
Another sed approach -
sed -E 's/ +//g; s/([^"]),/\1/g; s/"([^"]*)"/"\L\1"/g' file
I don't like how that leaves no flexibility, and makes you rewrite the logic if you find something else you want to remove, though.
Another in awk -
awk -F'[", ]+' ' { printf "\"%s\",\"", $2; for(c=3;c<=NF;c++) printf "%s", tolower($c); print "\""; }' file
This approach lets you define and add any additional offending characters into the field delimiters without editing your logic.
$: pat=$"[\"',_;:!@#\$%)(* -]+"
$: echo "$pat"
["',_;:!@#$%)(* -]+
$: cat file
"87687","institute 'Polytechnic, Brazil"
"342424","university; of-India, India"
"24343","univefrsity )columbia, Bogata, Colombia"
$: awk -F"$pat" '{printf "\"%s\",\"", $2; for(c=3;c<=NF;c++) printf "%s", tolower($c); print "\"" }' file
"87687","institutepolytechnicbrazil"
"342424","universityofindiaindia"
"24343","univefrsitycolumbiabogatacolombia"
(I hate the way that lone single quote throws the markup color/format parsing off, lol)
Another way using ruby. Edited the data to show that only the second field is modified.
% ruby -r 'csv' -e 'f = open("file"); CSV.parse(f) do |i| puts "\"" + i[0] + "\",\"" + i[1].downcase.gsub(/[ ,]/,"") + "\"" end'
"8768, 7","institutepolytechnicbrazil"
"342 424","universityofindiaindia"
"243 43","univefrsitycolumbiabogatacolombia"
Using FastCSV gives a huge speedup
gem install fastcsv
% ruby -r 'fastcsv' -e 'f = open("file"); FastCSV.raw_parse(f) do |i| puts "\"" + i[0] + "\",\"" + i[1].downcase.gsub(/[ ,]/,"") + "\"" end'
"8768, 7","institutepolytechnicbrazil"
"342 424","universityofindiaindia"
"243 43","univefrsitycolumbiabogatacolombia"
Data
% cat file
"8768, 7","institute Polytechnic, Brazil"
"342 424","university of India, India"
"243 43","univefrsity columbia, Bogata, Colombia"
With your shown samples and attempts, please try the following GNU awk code, which uses its match function. The regex (^"[^"]*",")([^"]*)(".*)$ creates 3 capturing groups whose values are stored in the array arr; the program then reads those values back to meet the OP's requirement.
awk '
match($0,/(^"[^"]*",")([^"]*)(".*)$/,arr){
  gsub(/[^[:alnum:]]+/,"",arr[2])
  print arr[1] tolower(arr[2]) arr[3]
}
' Input_file
This might work for you (GNU sed):
sed -E s'/("[^"]*",)/\1\n/;h;s/.*\n//;s/[[:punct:] ]//g;s/.*/"\L&"/;H;g;s/\n.*\n//' file
Divide and rule. Partition the line into two fields, make a copy, process the second field by removing punctuation and spaces, re-quote and lowercase it, and then reassemble the fields.
An alternative, perhaps?
sed -E ':a;s/^("[^"]*",".*)[^[:alpha:]"](.*)/\L\1\2/;ta' file
Here is a way to do so in PHP.
Note: PHP will not output double quotes unless the first column needs them. The second column will never need double quotes because it has no spaces or special characters.
$max_line_length = 100;
if (($fp = fopen("file.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($fp, $max_line_length, ",")) !== FALSE) {
        $data[1] = strtolower(preg_replace('/[\s[:punct:]]/', '', $data[1]));
        fputcsv(STDOUT, $data, ',', '"');
    }
    fclose($fp);
}
How should I use sed to delete specific lines while keeping longer lines that contain them?
I have generated a list of files, which has 17417 lines like:
./usr
./usr/share
./usr/share/mime-info
./usr/share/mime-info/libreoffice7.0.mime
./usr/share/mime-info/libreoffice7.0.keys
./usr/share/appdata
./usr/share/appdata/libreoffice7.0-writer.appdata.xml
./usr/share/appdata/org.libreoffice7.0.kde.metainfo.xml
./usr/share/appdata/libreoffice7.0-draw.appdata.xml
./usr/share/appdata/libreoffice7.0-impress.appdata.xml
./usr/share/appdata/libreoffice7.0-base.appdata.xml
./usr/share/appdata/libreoffice7.0-calc.appdata.xml
./usr/share/applications
./usr/share/applications/libreoffice7.0-xsltfilter.desktop
./usr/share/applications/libreoffice7.0-writer.desktop
./usr/share/applications/libreoffice7.0-base.desktop
./usr/share/applications/libreoffice7.0-math.desktop
./usr/share/applications/libreoffice7.0-startcenter.desktop
./usr/share/applications/libreoffice7.0-calc.desktop
./usr/share/applications/libreoffice7.0-draw.desktop
./usr/share/applications/libreoffice7.0-impress.desktop
./usr/share/icons
./usr/share/icons/gnome
./usr/share/icons/gnome/16x16
./usr/share/icons/gnome/16x16/mimetypes
./usr/share/icons/gnome/16x16/mimetypes/libreoffice7.0-oasis-formula.png
The thing is, I want to delete lines like:
./usr
./usr/share
./usr/share/mime-info
./usr/share/appdata
./usr/share/applications
./usr/share/icons
./usr/share/icons/gnome
./usr/share/icons/gnome/16x16
./usr/share/icons/gnome/16x16/mimetypes
and also the "." at the start, so the result must look like:
/usr/share/mime-info/libreoffice7.0.mime
/usr/share/mime-info/libreoffice7.0.keys
/usr/share/appdata/libreoffice7.0-writer.appdata.xml
/usr/share/appdata/org.libreoffice7.0.kde.metainfo.xml
/usr/share/appdata/libreoffice7.0-draw.appdata.xml
/usr/share/appdata/libreoffice7.0-impress.appdata.xml
/usr/share/appdata/libreoffice7.0-base.appdata.xml
/usr/share/appdata/libreoffice7.0-calc.appdata.xml
/usr/share/applications/libreoffice7.0-xsltfilter.desktop
/usr/share/applications/libreoffice7.0-writer.desktop
/usr/share/applications/libreoffice7.0-base.desktop
/usr/share/applications/libreoffice7.0-math.desktop
/usr/share/applications/libreoffice7.0-startcenter.desktop
/usr/share/applications/libreoffice7.0-calc.desktop
/usr/share/applications/libreoffice7.0-draw.desktop
/usr/share/applications/libreoffice7.0-impress.desktop
/usr/share/icons/gnome/16x16/mimetypes/libreoffice7.0-oasis-formula.png
Is this possible using sed, or is it more practical to use another tool?
With your list in the file named list, you could do:
sed -n 's/^[.]//;/\/.*[._].*$/p' list
Where:
sed -n suppresses automatic printing of the pattern space; then
s/^[.]// is the substitution that simply removes the leading '.' from each line; then
/\/.*[._].*$/p matches lines that contain a '.' or '_' somewhere after a '/', with p causing each such line to be printed.
Example Use/Output
$ sed -n 's/^[.]//;/\/.*[._].*$/p' list
/usr/share/mime-info/libreoffice7.0.mime
/usr/share/mime-info/libreoffice7.0.keys
/usr/share/appdata/libreoffice7.0-writer.appdata.xml
/usr/share/appdata/org.libreoffice7.0.kde.metainfo.xml
/usr/share/appdata/libreoffice7.0-draw.appdata.xml
/usr/share/appdata/libreoffice7.0-impress.appdata.xml
/usr/share/appdata/libreoffice7.0-base.appdata.xml
/usr/share/appdata/libreoffice7.0-calc.appdata.xml
/usr/share/applications/libreoffice7.0-xsltfilter.desktop
/usr/share/applications/libreoffice7.0-writer.desktop
/usr/share/applications/libreoffice7.0-base.desktop
/usr/share/applications/libreoffice7.0-math.desktop
/usr/share/applications/libreoffice7.0-startcenter.desktop
/usr/share/applications/libreoffice7.0-calc.desktop
/usr/share/applications/libreoffice7.0-draw.desktop
/usr/share/applications/libreoffice7.0-impress.desktop
/usr/share/icons/gnome/16x16/mimetypes/libreoffice7.0-oasis-formula.png
Note: without GNU sed, which allows chaining of expressions with ';', you would need:
sed -n -e 's/^[.]//' -e '/\/.*[._].*$/p' list
Assuming you want to delete the lines that are contained in other pathnames, would you please try:
sort -r list.txt | awk '                # sort the list in the reverse order
{
    sub("^\\.", "")                     # remove leading dot
    s = prev; sub("/[^/]+$", "", s)     # remove the rightmost slash and following characters
    if (s != $0) print                  # if s != $0, it means $0 is not a substring of the previous line
    prev = $0                           # keep $0 for the next line
}'
Result:
/usr/share/mime-info/libreoffice7.0.mime
/usr/share/mime-info/libreoffice7.0.keys
/usr/share/icons/gnome/16x16/mimetypes/libreoffice7.0-oasis-formula.png
/usr/share/applications/libreoffice7.0-xsltfilter.desktop
/usr/share/applications/libreoffice7.0-writer.desktop
/usr/share/applications/libreoffice7.0-startcenter.desktop
/usr/share/applications/libreoffice7.0-math.desktop
/usr/share/applications/libreoffice7.0-impress.desktop
/usr/share/applications/libreoffice7.0-draw.desktop
/usr/share/applications/libreoffice7.0-calc.desktop
/usr/share/applications/libreoffice7.0-base.desktop
/usr/share/appdata/org.libreoffice7.0.kde.metainfo.xml
/usr/share/appdata/libreoffice7.0-writer.appdata.xml
/usr/share/appdata/libreoffice7.0-impress.appdata.xml
/usr/share/appdata/libreoffice7.0-draw.appdata.xml
/usr/share/appdata/libreoffice7.0-calc.appdata.xml
/usr/share/appdata/libreoffice7.0-base.appdata.xml
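The "drop every line that is the parent of another line" idea can also be sketched without sorting. This is a hedged Ruby illustration, not the answer's method: it collects the parent directory of every entry and rejects entries that appear in that set. The helper name leaf_paths is hypothetical, and the short list is a subset of the question's data.

```ruby
require 'set'

# Hypothetical helper: strips the leading "." from each path and keeps only
# entries that are not the parent directory of another entry.
def leaf_paths(paths)
  cleaned = paths.map { |path| path.sub(/\A\./, '') }
  parents = cleaned.map { |path| File.dirname(path) }.to_set
  cleaned.reject { |path| parents.include?(path) }
end

list = %w[
  ./usr
  ./usr/share
  ./usr/share/mime-info
  ./usr/share/mime-info/libreoffice7.0.mime
  ./usr/share/mime-info/libreoffice7.0.keys
]
puts leaf_paths(list)
# prints:
# /usr/share/mime-info/libreoffice7.0.mime
# /usr/share/mime-info/libreoffice7.0.keys
```

Unlike the sort -r pipeline, this keeps the original order of the list; like it, an empty directory with no children in the list would be kept.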
remove only *some* fullstops from a csv file
If I have lines like the following:
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
how can I replace all instances of ,., with ,?, while preserving the actual decimal points in the numbers? I can't just do
sed 's/./?/g' file
However, when doing:
sed 's/,.,/,?,/g' file
this only appears to work in some cases, i.e. there are still instances of ,., hanging around. Does anyone have any pointers? Thanks
This should work:
sed ':a;s/,\.,/,?,/g;ta' file
With successive ,., strings, after a substitution succeeds the next character to be processed is the following ., which no longer has a leading comma available to match, so you need a second pass.
:a is a label for the upcoming loop.
,\., matches a dot between commas. Note that the dot must be escaped, because an unescaped . matches any character (,., would also match ,a,).
g is for global substitution.
ta tests the previous substitution and, if it succeeded, loops back to the :a label for the remaining substitutions.
Using sed it is possible by running a loop as shown in the answer above; however, the problem is easily solved on the perl command line with lookarounds:
perl -pe 's/(?<=,)\.(?=,)/?/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This command doesn't need a loop because instead of matching the surrounding commas we're just asserting their position using a lookbehind and a lookahead.
All that's necessary is a single substitution
$ perl -pe 's/,\.(?=,)/,?/g' dots.csv
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
You have an example using sed-style regular expressions. I'll offer an alternative: parse the CSV, and then treat each thing as a 'field':
#!/usr/bin/perl
use strict;
use warnings;
#iterate input row by row
while ( <DATA> ) {
   #remove linefeeds
   chomp;
   #split this row on ,
   my @row = split /,/;
   #iterate each field
   foreach my $field ( @row ) {
      #replace this field with "?" if it's "."
      $field = "?" if $field eq ".";
   }
   #stick this row together again (newline kept outside the join, so no trailing comma).
   print join( ",", @row ), "\n";
}
__DATA__
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This is more verbose than it needs to be, to illustrate the concept. This could be reduced down to:
perl -F, -lane 'print join ",", map { $_ eq "." ? "?" : $_ } @F'
If your CSV also has quoting, then you can break out the Text::CSV module, which handles that neatly.
You just need 2 passes, since the trailing , consumed by one ,., match isn't available to match the leading , of the next ,.,:
$ sed 's/,\.,/,?,/g; s/,\.,/,?,/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
The above will work in any sed on any OS.
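The overlap problem discussed in this thread is easy to reproduce in Ruby, whose gsub also scans for non-overlapping matches. The sketch below uses a shortened, made-up sample line; it shows the single pass missing every other dot, and both the second pass and the lookahead fixing it.

```ruby
line = '1,C,T,.,.,.,1.293'  # shortened sample, assumed for illustration

# One pass with a pattern that consumes both commas misses every other dot.
single_pass = line.gsub(/,\.,/, ',?,')
puts single_pass  # prints: 1,C,T,?,.,?,1.293

# A second identical pass picks up the dots that were skipped.
two_passes = single_pass.gsub(/,\.,/, ',?,')
puts two_passes   # prints: 1,C,T,?,?,?,1.293

# A lookahead leaves the trailing comma unconsumed, so one pass is enough.
lookahead = line.gsub(/,\.(?=,)/, ',?')
puts lookahead    # prints: 1,C,T,?,?,?,1.293
```

The decimal point in 1.293 is untouched in every variant because it is never preceded by a comma.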
Replacing escape quotes with just quotes in a string
So I'm having an issue replacing \" in a string.
My objective: given a string, if there's an escaped quote in the string, replace it with just a quote.
So for example:
"hello\"74" would be "hello"74"
simp"\"sons would be simp"sons
jump98" would be jump98"
I'm currently trying this:
str.replace "\\"", "\""
but obviously that doesn't work and messes everything up. Any assistance would be awesome.
I guess you are mistaken about how \ works. You can never define a string as
a = "hello"74"
Also, the escape character is used only while defining the variable; it's not part of the value. E.g.:
a = "hello\"74"
# => "hello\"74"
puts a
# hello"74
However, in case my above assumption is incorrect, the following example should help you:
a = 'hello\"74'
# => "hello\\\"74"
puts a
# hello\"74
a.gsub!("\\","")
# => "hello\"74"
puts a
# hello"74
EDIT
The above gsub will replace all instances of \; however, the OP only needs to replace \" with ". The following should do the trick:
a.gsub!("\\\"","\"")
# => "hello\"74"
puts a
# hello"74
You can use gsub:
word = 'simp"\"sons';
print word.gsub(/\\"/, '"');
#=> simp""sons
I'm currently trying str.replace "\\"", "\"" but obviously that doesn't work and messes everything up, any assistance would be awesome

str.replace "\\"", "\"" doesn't work for two reasons:
It's the wrong method. String#replace replaces the entire string; you are looking for String#gsub.
"\\"" is incorrect: " starts the string, \\ is a backslash (correctly escaped) and " ends the string. The last " starts a new string.
You have to either escape the double quote:
puts "\\\"" #=> \"
Or use single quotes:
puts '\\"' #=> \"
Example:
content = <<-EOF
"hello\"74"
simp"\"sons
jump98"
EOF
puts content.gsub('\\"', '"')
Output:
"hello"74"
simp""sons
jump98"
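To make the two points above concrete, here is a small sketch using the question's three samples written as single-quoted Ruby literals, where \" stays a backslash followed by a quote. The variable names are assumptions for illustration.

```ruby
# Single-quoted literals keep the backslash: each \" is two characters.
samples = ['"hello\"74"', 'simp"\"sons', 'jump98"']

# '\\"' is a backslash followed by a double quote; gsub replaces each
# occurrence with a bare double quote.
unescaped = samples.map { |s| s.gsub('\\"', '"') }
puts unescaped
# prints:
# "hello"74"
# simp""sons
# jump98"

# String#replace, by contrast, swaps the whole content of the receiver.
text = +'hello\"74'
text.replace('x')
puts text  # prints: x
```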
Bash array + sed + html
I need to change the prices in an HTML file. I search for them and store them in an array, but then I have to change them and save the result to /nuevo-focus.html
price=( `cat /home/delkav/info-sitioweb/html/productos/autos/nuevo-focus.html | grep -oiE '([$][0-9.]{1,7})'|tr '\n' ' '` )
price2=( $90.880 $0 $920 $925 $930 $910 $800 $712 $27.220 $962 )
sub (){
for item in "${price[@]}"; do
for x in ${price2[@]}; do
sed s/$item/$x/g > /home/delkav/info-sitioweb/html/productos/autos/nuevo-focus.html
done
done
}
sub
The output of "cat /home/.../nuevo-focus.html|grep -oiE '([$][0-9.]{1,7})'|tr '\n' ' '" is:
$86.880 $0 $912 $908 $902 $897 $882 $812 $25.725 $715
In bash, $0 is the name of the script being run and $1 through $9 refer to its command line arguments. In the line:
price2=( $90.880 $0 $920 $925 $930 $910 $800 $712 $27.220 $962 )
the unquoted $9, $0, $8 and so on are expanded before the array is built (for example, $90.880 becomes the value of $9 followed by 0.880), so they turn into either empty strings or the command line arguments that you gave the script. Try doing this instead:
price2=( '$90.880' '$0' '$920' '$925' '$930' '$910' '$800' '$712' '$27.220' '$962' )
EDIT for part two of the question
If what you are trying to do with the sed line is replace the prices in the file, overwriting the old ones, then you should do this:
sed -i s/$item/$x/g /home/delkav/info-sitioweb/html/productos/autos/nuevo-focus.html
This will perform the substitution in place (-i), modifying the input file.
EDIT for part three of the question
I just realized that your nested loop does not really make sense. I am assuming that what you want to do is replace each price from price with the corresponding price in price2.
If that is the case, then you should use a single loop, looping over the indices of the array:
for i in ${!price[*]}
do
sed -i "s/${price[$i]}/${price2[$i]}/g" /home/delkav/info-sitioweb/html/productos/autos/nuevo-focus.html
done
I'm not able to test that right now, but I think it should accomplish what you want.
To explain it a bit:
${!price[*]} gives you all of the indices of your array (e.g. 0 1 2 3 4 ...)
For each index we then replace the corresponding old price with the new one. There is no need for a nested loop as you have done. When you execute that, what you are basically doing is this:
replace every occurrence of "foo" with "bar"
# at this point, there are now no more occurrences of "foo" in your file
# so all of the other replacements do nothing
replace every occurrence of "foo" with "baz"
replace every occurrence of "foo" with "spam"
replace every occurrence of "foo" with "eggs"
replace every occurrence of "foo" with "qux"
replace every occurrence of "foo" with "whatever"
etc...
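The index-by-index pairing can also be sketched in Ruby, which sidesteps the shell's $ expansion entirely because single-quoted Ruby strings keep $ literal. This is only an illustration: the HTML snippet and the shortened price lists are assumptions, and the regex is the one from the question's grep. The lookup happens in a single pass, so a price that was already replaced is never matched again.

```ruby
# Shortened, assumed sample data; '$' is literal inside single quotes.
old_prices = ['$86.880', '$0', '$912']
new_prices = ['$90.880', '$0', '$920']

# Pair each old price with its replacement by index.
mapping = old_prices.zip(new_prices).to_h

html = '<li>$86.880</li><li>$0</li><li>$912</li>'  # hypothetical markup

# Single pass: each price-like token is looked up once; tokens without an
# entry in the mapping are left unchanged.
updated = html.gsub(/\$[0-9.]{1,7}/) { |price| mapping.fetch(price, price) }
puts updated
# prints: <li>$90.880</li><li>$0</li><li>$920</li>
```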