How to remove forced quotes for only one column in a CSV file - Ruby

I'm generating some CSV output using Ruby's built-in CSV library. Everything works fine, but the customer wants the price field in the output to be without double quotes.
So the output looks like this:
"10789852616","Studentska-trgovina","27.80","EUR",
The customer wants it to look like this:
"10789852616","Studentska-trgovina",27.80,"EUR",

Try .to_f: it returns the result of interpreting leading characters in str as a floating-point number. Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of str, 0.0 is returned.
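If .to_f alone doesn't drop the quotes (for example because force_quotes is on), a minimal fallback sketch is to assemble the line by hand; the price position (index 2) is an assumption, and the row values are taken from the sample output above:
row = ["10789852616", "Studentska-trgovina", "27.80", "EUR"]
line = row.each_with_index.map { |field, i|
  i == 2 ? field : %("#{field}")   # quote every column except the price
}.join(",") + ","
puts line
# "10789852616","Studentska-trgovina",27.80,"EUR",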

Related

Replace specific commas in a csv file

I have a file like this:
gene_id,transcript_id(s),length,effective_length,expected_count,TPM,FPKM,id
ENSG00000000003.14,ENST00000373020.8,ENST00000494424.1,ENST00000496771.5,ENST00000612152.4,ENST00000614008.4,2.23231E3,2.05961E3,2493,2.112E1,1.788E1,00065a62-5e18-4223-a884-12fca053a109
ENSG00000001084.10,ENST00000229416.10,ENST00000504353.1,ENST00000504525.1,ENST00000505197.1,ENST00000505294.5,ENST00000509541.5,ENST00000510837.5,ENST00000513939.5,ENST00000514004.5,ENST00000514373.2,ENST00000514933.1,ENST00000515580.1,ENST00000616923.4,3.09456E3,2.92186E3,3111,1.858E1,1.573E1,00065a62-5e18-4223-a884-12fca053a109
The problem is that the file should have been tab-delimited instead of comma-delimited, because the values starting with ENST (i.e. transcript_id(s)) are grouped in one column.
The number of ENST IDs is different in each line.
Each ENST ID has the same pattern: it starts with ENST, followed by 11 digits, a period, and then 1-3 digits: ^ENST[0-9]{11}[.][0-9]{1,3}.
I want to convert all the commas between ENST IDs to a : or any other character so that I can read this as a CSV file. Any help would be much appreciated. Thanks!
I imagine something as simple as
sed 's|,ENST|:ENST|g;s|:|,|' < /path/to/your/file
should work. No reason to over-complicate.
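A rough Ruby equivalent of that sed one-liner, if you'd rather stay in Ruby (the file name rsem.csv is just a placeholder):
File.foreach("rsem.csv") do |line|
  fixed = line.gsub(",ENST", ":ENST") # turn the comma before every ENST id into ':'
  fixed = fixed.sub(":", ",")         # restore the first separator, between gene_id and the ids
  puts fixed
end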

How do I filter file names out of a SQLite dump?

I'm trying to filter out all file names from an SQLite text dump using Ruby. I'm not very handy/familiar with regex and need a way to read, and write to a file, another dump of image files that are within the SQLite dump. I can filter out everything except stuff like this:
VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);
and this:
src="/images/9/94/folder%2FGraph.JPG"
I can't figure out the easiest way to filter through this. I've tried using split and other functions, but instead of splitting the string into an array by the character specified, it just removed the character.
You should be able to use .gsub('%2F', ' ') to replace the %2F with a space; as long as the value stays quoted, it should be fine.
Split does remove the character that is being split, though. So you may not want to do that, or if you do, you may want to use the Array#join method with the argument of the character you split with to put it back in.
I want to 'extract' the file name from the statements above. Say I have src="/images/9/94/folder%2FGraph.JPG", I want folder%2FGraph.JPG to be extracted out.
If you want to extract what is inside the src parameter:
foo = 'src="/images/9/94/folder%2FGraph.JPG"'
foo[/^src="(.+)"/, 1]
=> "/images/9/94/folder%2FGraph.JPG"
That returns the string without the surrounding quotes.
Here's how to do the first one:
bar = "VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);"
bar.split(',')[4][1..-2]
=> "/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG"
Not everything in programming is a regex problem. Some things, in my opinion most things, are not good candidates for a pattern. For instance, the first example could be written:
foo.split('=')[1][1..-2]
and the second:
bar[/'(.+?)'/, 1]
The idea is to use whichever is most clean and clear and understandable.
If all you want is the filename, then use a method designed to return only the filename.
Use one of the above and pass its output to File.basename. File.basename returns only the filename and extension.
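For example, reusing foo from above:
File.basename(foo[/^src="(.+)"/, 1])
=> "folder%2FGraph.JPG"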

Ruby (on Rails) Regex: removing thousands comma from numbers

This seems like a simple one, but I am missing something.
I have a number of inputs coming in from a variety of sources and in different formats.
Number inputs
123
123.45
123,45 (note the comma used here to denote decimals)
1,234
1,234.56
12,345.67
12,345,67 (note the comma used here to denote decimals)
Additional info on the inputs
Numbers will always be less than 1 million
EDIT: These are prices, so will either be whole integers or go to the hundredths place
I am trying to write a regex and use gsub to strip out the thousands comma. How do I do this?
I wrote a regex: myregex = /\d+(,)\d{3}/
When I test it in Rubular, it shows that it captures the comma only in the test cases that I want.
But when I run gsub, I get an empty string: inputstr.gsub(myregex,"")
It looks like gsub is capturing everything, not just the comma in (). Where am I going wrong?
result = inputstr.gsub(/,(?=\d{3}\b)/, '')
removes a comma only if exactly three digits (and then a word boundary) follow it.
(?=...) is a lookahead assertion: it must be matchable at the current position, but it does not become part of the text that is actually matched (and subsequently replaced).
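A quick illustration against the sample inputs (this example is mine, not from the original answer):
["1,234", "12,345.67", "12,345,67", "123,45"].map { |s| s.gsub(/,(?=\d{3}\b)/, '') }
=> ["1234", "12345.67", "12345,67", "123,45"]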
You are confusing "match" with "capture": to "capture" means to save something so you can refer to it later. You want to capture not the comma, but everything else, and then use the captured portions to build your substitution string.
Try
myregex = /(\d+),(\d{3})/
inputstr.gsub(myregex,'\1\2')
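For example:
"12,345.67".gsub(/(\d+),(\d{3})/, '\1\2')
=> "12345.67"
"12,345,67".gsub(/(\d+),(\d{3})/, '\1\2')
=> "12345,67"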
In your example, it is possible to tell from the number of digits after the last separator (either , or .) that it is a decimal point, since there are only two lone digits. In most cases, if the last group of digits does not have three digits, you can assume that the separator in front of it is a decimal point. Another clue is that a separator appearing multiple times within one large number must be a thousands separator rather than a decimal point.
However, I can give a string 123,456 or 123.456 without any sort of context. It is impossible to tell whether they are "123 thousand 456" or "123 point 456".
You need to scan the document to look for clue whether , is used for thousand separator or decimal point, and vice versa for .. With the context provided, then you can safely apply the same method to remove the thousand separators.
You may also want to check out the Wikipedia article on the less common ways to write separators and decimal points. Knowing about them and deciding not to support them is better than assuming things will work.

Replacing manually written date with a string containing it

I have these 2 things I am working with:
CSV.foreach('datafile.csv','r') {|row| D_Location << row[0]}
puts Date.new(2003,05,02).cwday
In the first line I would like to change 'datafile.csv' to something like a string, so I can change one string and it changes for all of these lines. I have many of them, each controlling one CSV column.
In the second one I would like to replace the manually written date with a string, so that it can be automatic, because the string will be generated based on other criteria.
I trust the mods will ban me if I'm being too much of a noob hehe. Then I'll toughen up and find these answers myself eventually. But so far I've solved a lot, but not this. Thanks in advance!
Make a function which takes in a string representing a weekday, and returns a number. Call this function later in your code:
Date.new(2003, 05, yourfun('Tuesday')).cwday
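A hypothetical sketch of such a function; the mapping below is invented purely to show the call, and your real logic would come from your own criteria:
require 'date'
def yourfun(weekday)
  %w[Monday Tuesday Wednesday Thursday Friday Saturday Sunday].index(weekday) + 1
end
Date.new(2003, 05, yourfun('Tuesday')).cwday
=> 5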
For the first part of your question, you're already working with a string. I think what you mean is that you want it to be in a variable:
csv_file = 'datafile.csv'
CSV.foreach(csv_file,'r') {|row| D_Location << row[0]}
For the second part of your question, Date.parse() works with strings, but they need to be in a format that it can recognize. If your date strings use commas, you can replace them with hyphens:
date_str = "2003,05,02"
Date.parse(date_str.gsub(",", "-")).cwday # => 5
It's not clear where your date strings will be coming from or what format they'll be in, but the general concepts you need to understand are that you can use variables, and that you can transform strings.
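Putting both pieces together, a minimal sketch (D_Location as a plain array and the example date string are my assumptions):
require 'csv'
require 'date'

D_Location = []
csv_file = 'datafile.csv'   # change this one variable to point every read at a different file
date_str = "2003,05,02"     # in practice, built from your other criteria

CSV.foreach(csv_file) { |row| D_Location << row[0] }
Date.parse(date_str.gsub(",", "-")).cwday
=> 5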

Parsing String to get required Data

I'm trying to parse a line within a file that contains "ID" and a numeric entry. However, the script below grabs the "ID" numeric value plus everything after it. How can I cut it down to just the "ID" numeric value and nothing else?
Thanks in advance.
tail -n 1 events.log | sed 's/.*id=\([^)]\+\).*/\1/' > event_id.dat
Your bracketed expression is [^)]\+, which means "one or more characters other than a closing parenthesis".
If numerical digits are what you want to capture, you need to change that to [0-9]\+.
It's tough to tell what you're looking for without an example of input and expected output, but this might work most generally:
sed -e 's/.*id=\([0-9]*\).*/\1/'
That amounts to:
Look for lines that include "id=" immediately followed by some digits ([0-9]*), with any amount of anything before or after
Replace those lines with just the digits (where \1 refers back to the part within parentheses in the match expression)
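For instance, piping a made-up log line through it (the line itself is invented):
echo "2014-03-01 10:22:01 job started (id=4421) by cron" | sed -e 's/.*id=\([0-9]*\).*/\1/'
4421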
Does that do what you want? If not, can you be more explicit with your input/output requirements?
