How do I filter file names out of a SQLite dump? - ruby

I'm trying to filter out all file names from an SQLite text dump using Ruby. I'm not very handy/familiar with regex and need a way to read, and write to a file, another dump of image files that are within the SQLite dump. I can filter out everything except stuff like this:
VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);
and this:
src="/images/9/94/folder%2FGraph.JPG"
I can't figure out the easiest way to filter through this. I've tried using split and other functions, but instead of splitting the string into an array by the character specified, it just removed the character.

You should be able to use .gsub('%2', ' ') the %2 with a space, while quoted, it should be fine.
Split does remove the character that is being split, though. So you may not want to do that, or if you do, you may want to use the Array#join method with the argument of the character you split with to put it back in.

I want to 'extract' the file name from the statements above. Say I have src="/images/9/94/folder%2FGraph.JPG", I want folder%2FGraph.JPG to be extracted out.
If you want to extract what is inside the src parameter:
foo = 'src="/images/9/94/folder%2FGraph.JPG"'
foo[/^src="(.+)"/, 1]
=> "/images/9/94/folder%2FGraph.JPG"
That returns a string without the surrounding parenthesis.
Here's how to do the first one:
bar = "VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);"
bar.split(',')[4][1..-2]
=> "/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG"
Not everything in programming is a regex problem. Somethings, actually, in my opinion, most things, are not candidates for a pattern. For instance, the first example could be written:
foo.split('=')[1][1..-2]
and the second:
bar[/'(.+?)'/, 1]
The idea is to use whichever is most clean and clear and understandable.
If all you want is the filename, then use a method designed to return only the filename.
Use one of the above and pass its output to File.basename. Filename.basename returns only the filename and extension.

Related

breakable slashes everywhere but URLs

I generate pdf (latex) from restructured text using python sphinx (1.4.6) .
I use narrow table column headers with texts like "stuff/misc/other". I need the slashes to be breakable, so the table headers don't overflow into the next column.
The LaTeX solution is to use \BreakableSlash or \slash where necessary. I can use python code to replace all slashes:
from sphinx.util.texescape import tex_replacements
# \BreakableSlash needs package hyphenat to be loaded
tex_replacements.append((u'/', ur'\BreakableSlash ') )
# tex_replacements.append((u'/', ur'\slash ') )
But that will break any URL like http://www.example.com/ into something like
http:\unhbox\voidb#x\penalty\#M\hskip\z#skip/\discretionary{-}{}{}\penalty\#M\hskip\z#skip\unhbox\voidb#x\penalty\#M\hskip\z#skip/\discretionary{-}{}{}\penalty\#M\hskip\z#skipwww.example.com
or
http:/\penalty\exhyphenpenalty/\penalty\exhyphenpenaltywww.example.com
I'd like to use a general solution that works in both cases, where the editor of the documentation can still use normal ReST and doesn't have to worry about latex.
Any idea how to get classic slashes in URLs and breakable slashes everywhere else?
You have not really given data and source code and only asked for an idea, so I take the liberty of only sketching a solution in pseudo code:
Split the document into a list of strings at each position of a space using .split()
For each string, check whether it is an URL by comparing its left side to http:// (and maybe also ftp://, https:// or similar tags)
Do replacements, but only in strings which are no URLs
Recombine all strings including the spaces again, using a command such as " ".join(my_list)
One way to do it, might be to write a Transform subclass. And then use add transform in setup(app) to use it in every read.
I could use DefaultSubstitutions from transforms.py as template for my own class.

Ruby execute shell command and get array

I'm getting a string of few lines from the shell. Is it possible to get an Array with each line being its element?
Sure, depending on the output you could just split it. For example:
lines = `ls`.split
This solution is independent of the method you're using to execute the program. As long as you get the complete string you can split it.
The original question was splitting on lines, and the split function, by default, splits on white space. While that may be sufficient, you may want to pass in a regular expression, as in:
`ls -l`.split(/$/)
Which returns each line in a separate element in the array. However, it doesn't get rid of the initial carriage return or line feed. For that, you will want to use the map function to iterate over the array and apply strip to each, as in:
`ls -l`.split(/$/).map(&:strip)

Using Ruby on a string, how can I slice between two parts of the string using RegEx?

I just want to save the text between two specific points in a string into a variable. The text would look like this:
..."content"=>"The text I want to save to a variable"}]...
I suppose I would have to use scan or slice, but not exactly sure how to pull out just the text without grabbing the RegEx identifiers before and after the text. I tried this, but it didn't work:
var = mystring.slice(/\"content\"\=\>\".\"/)
This should do the job
var = mystring[/"content"=>"(.*)"/, 1]
Note that:
.slice aliases []
none of the characters you escaped are special regexp characters where you're using them
you can "group" the bit you want to keep with ()
.slice / [] take a second parameter to pick a matched group
your_text = '"content"=>"The text I want to save to a variable"'
/"content"=>"(?<hooray>.*)"/ =~ your_text
Afterwards, hooray local variable will be magically set to contain your text. Can be used to set multiple variables.
This regex will match your string:
/\"content\"=>\"(.*)\"/
you can try rubular.com for testing
It looks like you're trying to truncate a sentence. You can split the sentence either on punctuation, or even on words.
mystring.split(".")
mystring.split("word")

Is there a gsub alternative for ruby that can run a method everytime a match is made?

I need to run a script that will rewrite the folder path of a html file, there will be many matches, and the replacement string needs to be computed, something like
"Html string".gsub( /images/([a-zA-Z0-9\-]+)/, "/images/#{replacement_method($1)}/" )
only problem is gsub, at least to my knowledge, will only run the replacement_method() once, and I need it to run every time since the desired replacement string changes occurring to the folder string.
Is there a way to make this work with gsub? something like the replace function in wordpress?
Any other realistic approaches?
You have to use a block:
"Html string".gsub( /images/(folder)/) { |match| "/images/#{replacement_method(match)}/" }
The block will be called for each match in the string.

Inserting characters before whatever is on a line, for many lines

I have been looking at regular expressions to try and do this, but the most I can do is find the start of a line with ^, but not replace it.
I can then find the first characters on a line to replace, but can not do it in such a way with keeping it intact.
Unfortunately I donĀ“t have access to a tool like cut since I am on a windows machine...so is there any way to do what I want with just regexp?
Use notepad++. It offers a way to record an sequence of actions which then can be repeated for all lines in the file.
Did you try replacing the regular expression ^ with the text you want to put at the start of each line? Also you should use the multiline option (also called m in some regex dialects) if you want ^ to match the start of every line in your input rather than just the first.
string s = "test test\ntest2 test2";
s = Regex.Replace(s, "^", "foo", RegexOptions.Multiline);
Console.WriteLine(s);
Result:
footest test
footest2 test2
I used to program on the mainframe and got used to SPF panels. I was thrilled to find a Windows version of the same editor at Command Technology. Makes problems like this drop-dead simple. You can use expressions to exclude or include lines, then apply transforms on just the excluded or included lines and do so inside of column boundaries. You can even take the contents of one set of lines and overlay the contents of another set of lines entirely or within column boundaries which makes it very easy to generate mass assignments of values to variables and similar tasks. I use Notepad++ for most stuff but keep a copy of SPFSE around for special-purpose editing like this. It's not cheap but once you figure out how to use it, it pays for itself in time saved.

Resources