I am trying to fetch data from an RSS feed, but I am having a hard time getting the image of a blog post. It seems the <img> tag is located inside a <content:encoded><![CDATA[...]]></content:encoded> element.
I'm not quite sure what to do with this. Any help is much appreciated.
It looks like this:
<content:encoded><![CDATA[
<p><img class="class1" title="hello world" src="http://www.mysite.com/images/myPhoto.jpg" alt="" width="550" height="227" /></p>
<p><p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
]]></content:encoded>
The content of an RSS feed can be anything. Many blogs simply put HTML in the content, so you would have to parse that HTML, and its structure can differ from post to post.
To parse the HTML you could turn it into XHTML and then use XPath to query it for the elements you want to find.
If you need more help here, you'll need to post the structure of the content (if it is known).
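As a rough illustration in Ruby, here is a minimal sketch that pulls the image URLs out of the CDATA section. Note that it swaps the XHTML + XPath route for two regular expressions, which is only reasonable when all you need is the src attribute. The feed fragment (modeled on the sample in the question), the constant SAMPLE_FEED and the method name image_urls are assumptions made for the example.

```ruby
# Hypothetical feed fragment modeled on the sample in the question.
SAMPLE_FEED = <<~XML
  <item>
    <title>Hello world</title>
    <content:encoded><![CDATA[
    <p><img class="class1" title="hello world" src="http://www.mysite.com/images/myPhoto.jpg" alt="" width="550" height="227" /></p>
    <p>Lorem ipsum dolor sit amet.</p>
    ]]></content:encoded>
  </item>
XML

# Returns the src attribute of every <img> tag found inside
# <content:encoded><![CDATA[...]]></content:encoded> sections.
def image_urls(feed_xml)
  # Step 1: grab the raw HTML stored in each CDATA section.
  cdata_pattern = %r{<content:encoded>\s*<!\[CDATA\[(.*?)\]\]>\s*</content:encoded>}m
  html_blocks = feed_xml.scan(cdata_pattern).flatten

  # Step 2: collect the src attribute of each <img> tag in that HTML.
  html_blocks.flat_map do |html|
    html.scan(/<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).flatten
  end
end

puts image_urls(SAMPLE_FEED)
```

If the HTML in your feed is messier than this (unquoted attributes, images you want to filter by class, and so on), a real HTML parser is the safer choice.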
As you know, _test.go files are ignored when a Go project is built, and my mock package is only imported by _test.go files. So if those test files are not included in the built project, why would the mock package be included?
I was wondering how to exclude the files inside it when building the project.
I tried adding the _test.go suffix to the files in the mock package, but then got the error "MockStruct not declared by package mock" wherever it was used.
I also tried a build constraint:
//go:build ignore
and got the same error: "MockStruct not declared by package mock".
Am I missing something here?
Is using build constraints the only way?
If your mock is used only in test files, it is not imported when building the project. The Go compiler does not include tests or their dependencies in a regular build.
Try this as an example:
Build the following code;
Check its binary size;
Remove the sample_test.go file;
Build again and check its binary size;
The size should be the same before and after removing the test file, which shows that nothing from the test is included in the build.
sample.go
package main

import "fmt"

type SampleInterface interface {
	DoSomething()
}

type Sample struct {
	Name string
}

func main() {
	s := Sample{}
	CallDoSomething(&s)
}

func (s *Sample) DoSomething() {
	fmt.Println("Do Something implementation ", s.Name)
}

func CallDoSomething(si SampleInterface) {
	si.DoSomething()
}
sample_test.go
package main

import (
	"fmt"
	"testing"
)

// sample_mock is a test-only implementation of SampleInterface.
type sample_mock struct {
	Name string
}

func (s *sample_mock) DoSomething() {
	fmt.Println("Do Something implementation", s.Name)
}

func TestCallDoSomething(t *testing.T) {
	// The long literal makes it easy to see whether test code affects the binary size.
	s := sample_mock{
		Name: "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.",
	}
	CallDoSomething(&s)
}
I have a text file of around 50 GB.
I used to process it with TextPipe, but at the moment I only have a Mac available and no access to TextPipe.
Is it possible to run a regex search on this file and save each matching line to another file?
I was thinking about the Vim editor, but I don't know enough about it to know where to look.
I would appreciate any suggestions.
As an example, let's assume that I have the text below in my initial.txt file and I want to save the lines containing "Lorem" to processed.txt.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
For fixed strings use fgrep:
fgrep Lorem initial.txt > processed.txt
For regular expressions use grep or egrep (they have slightly different regexp syntax; egrep is equivalent to grep -E, and fgrep to grep -F).
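If you ever need more per-line logic than grep gives you, the same filtering can be sketched as a small Ruby script that reads one line at a time, so the 50 GB file never has to fit in memory. This is only a sketch: the method name filter_lines is made up for the example, and it assumes the file is valid in Ruby's default encoding.

```ruby
# Copies every line of input_path that matches pattern into output_path.
# The input is streamed line by line, so memory use stays small
# regardless of the file size. Returns the number of matching lines.
def filter_lines(input_path, output_path, pattern)
  matched = 0
  File.open(output_path, "w") do |out|
    File.foreach(input_path) do |line|
      next unless line.match?(pattern)

      out.write(line)
      matched += 1
    end
  end
  matched
end
```

With the file names from the question, the call would be filter_lines("initial.txt", "processed.txt", /Lorem/). For a plain match like this one, grep will most likely be faster.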
Vim is "genetically" related to other text-processing tools, such as sed or grep. And it also has an embedded sophisticated scripting language, so it is perfectly capable of batch text processing.
But Vim is an interactive text editor, so it feels a little wrong to use it merely as a replacement of awk or grep. However, if you're going to learn and use it for both editing and scripting, it's elegant and powerful.
To get some taste of Vim, you can solve your problem as follows (typing ':' in normal mode will automatically switch into command mode):
:e initial.txt
:g/Lorem/.w! >>processed.txt
"I was thinking about vim editor but have no sufficient knowledge on where to search for. Would appreciate any suggestions."
The main problem with Vim is that you have to start from the very beginning, i.e. learn how to open, edit and save files, and even how to properly exit the application. So you should download and install it and run vimtutor. Next, you should get used to Vim's embedded help system (:h user-manual), which is by far Vim's best feature.
If you look for more books and tutorials, you can start from here. IMHO, Steve Oualline's "Vi IMproved" is still the best for beginners; and Drew Neil's "Practical Vim" is highly recommended for advanced vimmers.
Is it possible to split pages vertically using wkhtmltopdf? This would basically turn a single page into two pages.
I've drawn a little image, maybe that makes it a bit clearer.
I had to face a similar issue in my project.
You have 2 ways to solve the problem:
1) One way is to solve it using CSS. However, the version of WebKit included in wkhtmltopdf (at least in the stable version) seems to be a bit old and does not support the CSS multi-column properties very well; see here.
For future reference I copy&paste the example template included in the linked issue:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="UTF-8">
<style type="text/css">
body {
margin: 0;
padding: 0;
border: none;
width: 17cm;
}
.container {
/* You *must* define a fixed height which is
large enough to fit the whole content,
otherwise the layout is unpredictable. */
height: 28em;
/* Width and count aren't respected, but you
have to give at least some dummy value (??). */
-webkit-columns: 0 0;
/* This is the strange way to define the number of columns:
50% = 2 columns, 33% = 3 columns 25% = 4 columns */
width: 33%;
/* Gap and rule do work. */
-webkit-column-gap: 2em;
-webkit-column-rule: 1px solid black;
text-align: justify;
}
</style>
</head>
<body>
<div class="container">
<h1>An Article</h1>
<p>
1. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
<p>
2. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
<p>
3. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
<p>
4. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
</div>
</body>
</html>
2) Another option (the one I ended up using) is to set up the layout of the page using a table with 2..n columns. You can then use a template engine or even JavaScript to fill the page with content and split the pages.
This approach is simple and works well if the content of the page has a predictable size. You cannot rely on wkhtmltopdf to split the content into several pages; you must do it yourself in code.
Approach 1) is quite buggy but might be worth a shot if you're dealing with free-flowing text, which might not be very suitable for approach 2).
If your content is regular, it will be reasonably easy to go with approach 2), lay out the elements and split the pages in code.
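To give an idea of what "splitting the pages in code" can look like for approach 2), here is a minimal Ruby sketch (it is not the code from my project). It assumes every item takes roughly the same space, so a fixed number of items per column fits on a page. The names paginate, render_page and escape_html, the per_column value and the use of page-break-after to start a new PDF page are all assumptions made for the example.

```ruby
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escapes the characters that are special in HTML text and attributes.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Groups items into pages; each page is a list of columns and
# each column is a list of at most per_column items.
def paginate(items, columns: 2, per_column: 3)
  items.each_slice(per_column).each_slice(columns).to_a
end

# Renders one page as a single-row table with one cell per column.
# The wrapping div asks the renderer to start a new page afterwards.
def render_page(page_columns)
  cells = page_columns.map do |column|
    paragraphs = column.map { |text| "<p>#{escape_html(text)}</p>" }.join
    "<td style=\"vertical-align: top\">#{paragraphs}</td>"
  end
  "<div style=\"page-break-after: always\"><table><tr>#{cells.join}</tr></table></div>"
end

paragraphs = (1..8).map { |n| "Paragraph #{n}" }
html = paginate(paragraphs).map { |page| render_page(page) }.join("\n")
puts html
```

The resulting HTML would then be fed to wkhtmltopdf as usual. If your items vary a lot in height, a fixed per_column count will not be enough and you would need to estimate the height of each item instead.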
I have to display multiple long strings (of different lengths), but I can only display them in chunks of between 275 and 295 characters.
So if I have a 3000-character string, it'd be displayed in about 10 pieces.
I'm looking for a way to find the next blank.
For example:
if str[275] != " "
  # find next blank
  p str[0..next_blank]
else
  p str[0..275]
end
I thought of finding the index of the next blank within the 275th-295th character range, but I couldn't find how to do it in Ruby.
Any help will be much appreciated!
Rails has a method word_wrap which uses a simple regular expression:
str = 'Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'
puts str.gsub(/(.{1,80})(\s+|$)/, "\\1\n")
Output:
Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor
incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute
iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui
officia deserunt mollit anim id est laborum.
The regular expression matches (and captures) up to 80 characters (.{1,80}) that are followed by whitespace or end-of-line (\s+|$).
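Adapted to the 275-295 range from the question, a sketch could look like this. The method name chunk_text is made up for the example. Only the upper bound is enforced: greedy matching makes each chunk as long as possible, so chunks land close to 295 as long as the words are short, but a chunk can fall below 275 if a long word does not fit. It also assumes that no single word is longer than the limit.

```ruby
# Splits str into chunks of at most max characters, breaking only at
# whitespace. Each chunk starts at a non-blank character and is as long
# as possible, so it ends at the last blank that still fits.
def chunk_text(str, max = 295)
  str.scan(/\S.{0,#{max - 1}}(?=\s|\z)/m).map(&:rstrip)
end

text = (["lorem"] * 600).join(" ")
chunks = chunk_text(text)
puts chunks.length
puts chunks.map(&:length).inspect
```

The lookahead (?=\s|\z) plays the role of "find the blank": the match is only accepted where the next character is whitespace or the string ends.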
Without using regular expressions for the wrapping itself, tear the input apart and put it back together:
str = 'Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'
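def reformat_wrapped(s, width=78)
  lines = []
  line = ""
  s.split(/\s+/).each do |word|
    if line.empty?
      # The first word of a line always goes in, even if it is longer than width.
      line = word
    elsif line.size + word.size >= width
      # Adding a space plus this word would exceed width: start a new line.
      lines << line
      line = word
    else
      line << " " << word
    end
  end
  lines << line unless line.empty?
  return lines.join "\n"
end
puts reformat_wrapped(str, 78)
Output: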
Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor
incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat.
Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa
qui officia deserunt mollit anim id est laborum.
We are using the Nokogiri gem. The HTML we get from the test editor is passed through Nokogiri::HTML::fragment(html_text).to_html, which converts it into proper HTML tags, and the result is saved to the database. But we also have some Liquid tags which, when rendered, substitute a value in the place where the tag was added.
E.g. HTML code snippet:
<body>
<div>
<p>
Lorem ipsum dolor sit amet, consectetur adipisicing elit.Dolorem quam itaque, dolore esse labore dolorum inventore optio earum iure explicabo impedit eveniet perspiciatis nobis vero culpa aliquid, iusto saepe sunt.</p>
{{some_link}}
<div>
{{payment_link}}
</div>
</div>
</body>
Once we convert it into HTML tags using Nokogiri, the text in the URL gets encoded (href="{{payment_link}}"). Is there a way to avoid URL encoding for the Liquid tags?
This is how the HTML looks in the DB after being rendered and saved:
Output data
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod<br>\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,<br>\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo<br>\nconsequat. Duis aute irure dolor in reprehenderit in voluptate velit esse<br>\ncillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non<br>\nproident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>\n\n<p> </p>\n\n<p>{{payment_link}}</p>"
Expected data
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod<br>\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,<br>\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo<br>\nconsequat. Duis aute irure dolor in reprehenderit in voluptate velit esse<br>\ncillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non<br>\nproident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>\n\n<p> </p>\n\n<p>{{payment_link}}</p>