I have a need to hold a "Purchase Contract" type of report in my website. I am using Sinatra using erb files to deliver content. I would like to email the current report (the versions will change) out when people sign up for various items.
I'm thinking I can house it in the database, or an external file, in some kind of format, so I can do both:
import it into an erb file for presentation on the web
use it in an email so it's readable in text format
So basically I need it in a format that's as basic as possible, but it has to translate into both HTML (erb) and plain text.
What are my options for the format of this file, and how can I translate it into HTML? I've looked at markdown, but the output isn't very pretty with the gems I've found that translate it to text. Seeing that it needs plain text as well as HTML, I'm a bit lost as to how to get this done.
File Snippet
Privacy Policy
Updated Feb 20, 2019
Website.com (“Website”) is a private business. In this Privacy Statement the terms “we” and “our” refer to Website. This Privacy Statement explains Website’s practices regarding personal information of our users and visitors to this website (the “Website”), as well as those who have transactions with us through telephone, Internet, faxes and other means of communications.
Website’s Commitment to Privacy
At Website, we are committed to respecting the privacy of our members and our Website visitors. For that reason we have taken, and will continue to take, measures to help protect the privacy of personal information held by us.
This Privacy Statement provides you with details regarding: (1) how and why we collect personal information; (2) what we do with that information; (3) the steps that we take to help ensure that access to that information is secure; (4) how you can access personal information pertaining to you; and (5) who you should contact if you have questions and concerns about our policies or practices.
Solution: Save the file as HTML and use this gem for conversion into text:
https://github.com/soundasleep/html2text_ruby
Works fine if the HTML is simple enough.
Remaining: I still have the issue of using the HTML file as a partial.
Solved:
@text = markdown File.read('views/privacy.md')
So I park the source file as a markdown file, which translates to HTML. When I need the email version, I translate it to HTML and then to text using the html2text gem. https://rubygems.org/gems/html2text
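A minimal sketch of that flow, assuming the redcarpet gem backs Sinatra's markdown helper and the html2text gem does the email conversion; the route and template names are placeholders:

# Minimal sketch, assuming the redcarpet and html2text gems; the route and
# template names are placeholders, views/privacy.md is the source file above.
require 'sinatra'
require 'redcarpet'
require 'html2text'

get '/privacy' do
  @text = markdown File.read('views/privacy.md')  # markdown -> HTML for the web page
  erb :privacy                                    # the template prints <%= @text %>
end

# Plain-text version for the email body: markdown -> HTML -> text.
def privacy_as_plain_text
  html = Redcarpet::Markdown.new(Redcarpet::Render::HTML).render(File.read('views/privacy.md'))
  Html2Text.convert(html)
end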
As I understand it, you have a portion of text (stored in a database or a file, it doesn't really matter where) and you want to:
serve this up formatted as HTML via a webpage
send it plain via email
Assuming a standard Sinatra project layout where the views directory lives in the project dir, e.g.
project-root/
  app.rb
  views/
and a route to deliver the text in app.rb:
get "/sometext" do
end
If you put the erb template in the views directory and, as the last line of the route, make a call to the erb template renderer, you should get the output as HTML, e.g.
project-root/
  app.rb
  views/
    sometext.erb # this is the erb template
In the Sinatra app
# app.rb
# I'm assuming you've some way to differentiate
# bits of text, e.g.
get "/sometext/:id" do |id|
#text = DB.sometext.getid id # my fake database call
erb :sometext # <- this will render it, make it the final statement of the block
# make sure #text is in the template
# else use locals, e.g.
# erb :sometext, :locals => { text: #text }
end
Now when a user visits http://example.org/sometext/485995 they will receive HTML. Emailing the text to the user could be triggered via the website or some other method of your choice.
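For the email side, here is a rough sketch assuming the mail gem (with SMTP or sendmail configured elsewhere) and the html2text gem mentioned earlier; the route name and addresses are placeholders, and DB.sometext.getid is the same fake database call as above:

# Rough sketch only: the mail gem must be configured elsewhere, and the
# route, addresses and DB call are placeholders from the example above.
require 'mail'
require 'html2text'

post "/sometext/:id/email" do |id|
  @text = DB.sometext.getid id                          # fake database call
  plain = Html2Text.convert(erb(:sometext, layout: false))

  Mail.deliver do
    from    'noreply@example.org'
    to      'user@example.org'
    subject 'Your purchase contract'
    body    plain
  end
end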
Related
I need to download an envelope from DocuSign, with the tab data populated, in Ruby on Rails.
I have used get_combined_document_from_envelope, but it does not seem to get all the data.
def method_name
  output_pdf = docusign.get_combined_document_from_envelope(
    envelope_id: document.external_key,
    local_save_path: "docusign_docs/file_name.pdf",
    return_stream: false
  )
end
I need the output file to have all the tabs populated.
The PDF would have the data inside the document as part of the completed doc (assuming it's visible). If you want to download the data separately, you would need to use API calls to get the envelope and its tabs information. See https://developers.docusign.com/esign-rest-api/guides/features/tabs
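For completeness, a rough sketch of pulling tab data straight from that REST endpoint with net/http, assuming you already have an OAuth access token; the account ID, recipient ID, token and base URL are placeholders, and document.external_key is the envelope ID used above:

# Rough sketch only: account id, recipient id, token and base URL are placeholders.
require 'net/http'
require 'json'
require 'uri'

account_id   = 'YOUR_ACCOUNT_ID'       # placeholder
access_token = 'YOUR_ACCESS_TOKEN'     # placeholder, obtained via OAuth elsewhere
base         = 'https://demo.docusign.net/restapi/v2.1'

uri = URI("#{base}/accounts/#{account_id}/envelopes/#{document.external_key}/recipients/1/tabs")
req = Net::HTTP::Get.new(uri)
req['Authorization'] = "Bearer #{access_token}"

res  = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
tabs = JSON.parse(res.body)            # tab labels and the values signers entered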
I am creating an application where members as well as non-members can download books, depending upon the category of book available to each type of user.
I linked the books to BASE_URL/downloadPDF/11 for prompt download, passing the ID of the book, so that the file location is invisible to the user.
Now what I want is that no one can download the book directly by pasting the book URL (BASE_URL/downloadPDF/11) into the browser address bar.
I tried making the function private, but it did not work for me.
You can encode your book URL, so no one can download the book directly.
Let's say we want to encode the string "Encoding and Decoding Encrypted PHP Code" with base64. After conversion, your string looks like the one below:
RW5jb2RpbmcgYW5kIERlY29kaW5nIEVuY3J5cHRlZCBQSFAgQ29kZQ==
So it's difficult for the end user to interpret this URL.
You can also decode the URL using base64_decode.
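Since the rest of this thread is Ruby, here is the same idea as a sketch using Ruby's Base64 module; the downloadPDF route name is the one from the question and the encoded value is just an example:

# Sketch only: obfuscates the book id with Base64 so the raw id is not visible
# in the link; the downloadPDF route comes from the question.
require 'base64'

encoded = Base64.urlsafe_encode64('11')      # => "MTE=", used as BASE_URL/downloadPDF/MTE=
book_id = Base64.urlsafe_decode64(encoded)   # => "11", recovered on the server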
There are many HTTP request tools in Ruby (httparty, rest-client, etc.), but most of them get only the page itself. Is there a tool that gets the HTML, JavaScript, CSS and images of a page just like a browser does?
Anemone comes to mind, but it's not designed to do a single page. It's capable if you have the time to set it up though.
It's not hard to retrieve the content of a page using something like Nokogiri, which is an HTML parser. You can iterate over the tags that are of interest, grab their "SRC" or "HREF" parameters, and request those files, storing their content on disk.
A simple, untested and written-on-the-fly, example using Nokogiri and OpenURI would be:
require 'nokogiri'
require 'open-uri'

# Fetch the page and keep a local copy of the HTML.
html = open('http://www.example.com').read
File.write('www.example.com.html', html)

# Parse it and download every image referenced by an <img> tag.
page = Nokogiri::HTML(html)
page.search('img').each do |img|
  File.open(img['src'], 'wb') { |fo| fo.write open(img['src']).read }
end
Getting CSS and JavaScript is a bit more difficult because you have to determine whether they are embedded in the page or are resources and need to be retrieved from their sources.
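For the external ones, an equally untested sketch that reuses the page object and open-uri from the snippet above; it only handles CSS and JavaScript that live in separate files:

# Equally untested sketch: reuses `page` and open-uri from the snippet above,
# and only covers stylesheets and scripts referenced by URL.
page.search('link[rel=stylesheet]').each do |css|
  File.open(File.basename(css['href']), 'wb') { |fo| fo.write open(css['href']).read }
end

page.search('script[src]').each do |js|
  File.open(File.basename(js['src']), 'wb') { |fo| fo.write open(js['src']).read }
end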
Merely downloading the HTML and content is easy. Creating a version of the page that is stand-alone and reads the content from your local cache is much more difficult. You have to rewrite all the "SRC" and "HREF" parameters to point to the file on your disk.
If you want to be able to locally cache a site, it's even worse, because you have to re-jigger all the anchors and links in the pages to point to the local cache. In addition you have to write a full site spider which is smart enough to stay within a site, not follow redundant links, obey a site's ROBOTS file, and not consume all your, or their, bandwidth and get you banned, or sued.
As the task grows you also have to consider how you are going to organize all the files. Storing one page's resources in one folder is sloppy, but the easy way to do it. Storing resources for two pages in one folder becomes a problem because you can have filename collisions for different images or scripts or CSS. At that point you have to use multiple folders, or switch to using a database to track the locations of the resources, and rename them with unique identifiers, and rewrite those back to your saved HTML, or write an app that can resolve those requests and return the correct content.
I am having trouble with some Ruby CGI.
I have a home page (index.cgi) which is a mix of HTML and Ruby, and has a login form in it.
On clicking on the Submit button the POST's action is the same page (index.cgi), at which point I check to make sure the user has entered data into the correct fields.
I have a counter which increases by 1 each time a field is left empty. If this counter is 0, I want to change the currently loaded page to something like contents.html.
With this I have:
if ( errorCount > 0 )
  do nothing
else
  ....
end
What do I need to put where I have the ....?
Unfortunately I cannot use any frameworks as this is for university coursework, so I have to use base Ruby.
As for using the CGI#header method as you have suggested, I have tried using it; however, it is not working for me.
As mentioned my page is index.cgi. This is made of a mixture of Ruby and HTML using "here doc" statements.
At the top of my code page I have my shebang line, followed by an HTML header statement.
I then do the CGI form validation part, and within this I have tried doing something like: print cgi.header( { 'Status' => '302 Moved', 'location' => 'http://localhost:10000/contents.html' } )
All that happens is that this line is printed at the top of the browser window, above my index.cgi page.
I hope this makes sense.
To redirect the browser to another URL you must output a 30X HTTP response that contains the Location: /foo/bar header. You can do that using the CGI#header method.
Instead of dealing with these details that you do not yet master, I suggest you use a simple framework such as Sinatra or, at least, write your script as a Rack-compatible application.
If you really need to use the bare CGI class, have a look at this simple example: https://github.com/tdtds/amazon-auth-proxy/blob/master/amazon-auth-proxy.cgi.
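If you do stay with bare CGI, here is a minimal sketch of the redirect itself, assuming the validation has already happened; the form field names and the target URL (taken from the question) are placeholders:

#!/usr/bin/env ruby
# Minimal sketch with bare CGI: emit a 302 + Location header when validation
# passes, otherwise print the error page. The field names are hypothetical.
require 'cgi'

cgi = CGI.new
error_count = %w[username password].count { |f| cgi[f].empty? }

if error_count > 0
  print cgi.header('type' => 'text/html')
  print '<p>Please fill in every field.</p>'
else
  print cgi.header('status' => 'REDIRECT', 'location' => 'http://localhost:10000/contents.html')
end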
How could a client detect whether a server is using Search Engine Optimization techniques, such as using mod_rewrite to implement "SEO friendly URLs"?
For example:
Normal url:
http://somedomain.com/index.php?type=pic&id=1
SEO friendly URL:
http://somedomain.com/pic/1
Since mod_rewrite runs server side, there is no way a client can detect it for sure.
The only thing you can do client side is to look for some clues:
Is the generated HTML dynamic, changing between calls? Then /pic/1 would need to be handled by some script and is most likely not the real URL.
As said before: are there <link rel="canonical"> tags? Then the website is telling the search engine which of several URLs with the same content it should use.
Modify parts of the URL and see if you get a 404. In /pic/1 I would modify the "1".
If there is no mod_rewrite, the server will return a 404. If there is, the error is handled by the server-side scripting language, which can return a 404 but in most cases returns a 200 page printing an error (see the sketch below).
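A quick sketch of that last clue with net/http; the URLs are just the example from the question with one segment mangled:

# Quick sketch of the "mangle the URL" clue; URLs come from the question.
require 'net/http'
require 'uri'

original = Net::HTTP.get_response(URI('http://somedomain.com/pic/1'))
mangled  = Net::HTTP.get_response(URI('http://somedomain.com/pic/does-not-exist'))

# A plain 404 on the mangled URL suggests static files; a 200 "error page"
# suggests a rewrite rule handing the path to a script.
puts "#{original.code} vs #{mangled.code}"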
You can use a <link rel="canonical" href="..." /> tag.
The SEO aspect is usually in the words in the URL, so you can probably ignore any parts that are numeric. Usually SEO is applied over a group of like content, such that it has a common base URL, for example:
Base www.domain.ext/article, with fully URL examples being:
www.domain.ext/article/2011/06/15/man-bites-dog
www.domain.ext/article/2010/12/01/beauty-not-just-skin-deep
So the SEO aspect of the URL is the suffix. The algorithm to apply: typify each "folder" after the common base by assigning it a "datatype" (numeric, text, alphanumeric) and then score as follows:
HTTP Response Code is 200: this should be obvious, but you can get a 404 like www.domain.ext/errors/file-not-found that would pass the other checks listed.
Non-Numeric, with Separators, Spell Checked: separators are usually dashes, underscores or spaces. Take each word and perform a spell check; score it if the words are valid, including proper names.
Spell-Checked URL Text on Page: if the text passes a spell check, analyze the page content to see if it appears there.
Spell-Checked URL Text on Page Inside a Tag: if the prior is true, mark again if the text in its entirety is inside an HTML tag.
Tag is Important: if the prior is true and the tag is a <title> or <h#> tag.
Usually with this approach you'll have a max of 5 points, unless multiple folders in the URL meet the criteria, with higher values being better. Now you can probably improve this by using a Bayesian probability approach that uses the above to featurize URLs (i.e. detect the occurrence of some phenomenon), plus come up with some other clever featurizations. But then you've got to train the algorithm, which may not be worth it.
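A rough sketch of that scoring, assuming a plain word list stands in for a real spell checker, the HTTP 200 check has already happened, and the page HTML is already fetched; seo_score and DICTIONARY are hypothetical names:

# Rough sketch only: DICTIONARY stands in for a real spell checker and the
# last two checks are collapsed into one simplified "important tag" test.
require 'uri'

DICTIONARY = %w[man bites dog beauty not just skin deep].freeze

def seo_score(url, page_html)
  score = 0
  URI(url).path.split('/').reject(&:empty?).each do |folder|
    next if folder =~ /\A\d+\z/                   # purely numeric folders carry no weight
    words = folder.split(/[-_ ]/)
    next unless words.all? { |w| DICTIONARY.include?(w.downcase) }
    score += 1                                    # separators + spell-checked words
    phrase = words.join(' ').downcase
    score += 1 if page_html.downcase.include?(phrase)                               # text appears on the page
    score += 1 if page_html =~ /<(title|h\d)[^>]*>[^<]*#{Regexp.escape(phrase)}/i   # inside an important tag
  end
  score
end

# seo_score('http://www.domain.ext/article/2011/06/15/man-bites-dog', html)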
Now, based on your example, you also want to capture situations where the URL has been designed so that a crawler will index it because the query parameters are now part of the path instead. In that case you can still typify the suffix's folders to arrive at patterns of data types (in your example's case, that a common prefix is always trailed by an integer) and score those URLs as being SEO friendly as well.
I presume you would be using one of the curl variants.
You could try sending the same request but with different "user agent" values.
I.e. send the request once using user agent "Mozilla/5.0" and a second time using user agent "Googlebot"; if the server is doing something special for web crawlers, then there should be a different response.
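A small sketch of that comparison with net/http, using the URL from the question:

# Small sketch: fetch the same URL with two user agents and compare responses.
require 'net/http'
require 'uri'

uri = URI('http://somedomain.com/pic/1')

responses = ['Mozilla/5.0', 'Googlebot'].map do |agent|
  req = Net::HTTP::Get.new(uri)
  req['User-Agent'] = agent
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
end

# Differing status codes or bodies suggest the server treats crawlers specially.
puts responses.map(&:code).inspect
puts responses.first.body == responses.last.body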
With today's frameworks and the URL routing they provide, I don't even need to use mod_rewrite to create friendly URLs such as http://somedomain.com/pic/1, so I doubt you can detect anything. I would create such URLs for all visitors, crawlers or not. Maybe you can spoof some bot headers to pretend you're a known crawler and see if there's any change. I don't know how legal that is, to be honest.
For dynamic URL patterns, it's better to use a <link rel="canonical" href="..." /> tag on the other, duplicate URLs.