Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I'm writing a scraper program. I collect all the links on a page. They might be relative paths. For example:
foo.html
/foo.html
../foo.html
../../foo.html
I can concatenate them with the URL of the page they are on (the base path), but that isn't completely straightforward. For example:
http://www.example.com/foo + /bar.html = http://www.example.com/bar.html
http://www.example.com/bla/?foo=bar + ../foo.html = http://www.example.com/foo.html
Is there an Erlang library, a C library, or a CLI program that can figure out the right concatenation for me?
As far as CLI goes, wget has the --base switch:
-B URL
--base=URL
Resolves relative links using URL as the point of reference, when reading links from an HTML file specified via the -i/--input-file option (together with --force-html, or when the input file was fetched remotely from a server describing it as HTML). This is equivalent to the presence of a "BASE" tag in the HTML input file, with URL as the value for the "href" attribute.
For instance, if you specify http://foo/bar/a.html for URL, and Wget reads ../baz/b.html from the input file, it would be resolved to http://foo/baz/b.html.
So if you exec'd it to output the file to stdout and read that output with your Erlang script, that should work.
You can use ex_uri:resolve/2.
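For reference, the behaviour the question describes is the reference resolution defined in RFC 3986, section 5. The sketch below is in Python rather than Erlang, purely to illustrate the expected results with the standard library's urllib.parse.urljoin; the helper name resolve_links is hypothetical.

```python
from urllib.parse import urljoin


def resolve_links(base_url, links):
    """Resolve each (possibly relative) link against the URL of the page it was found on."""
    return [urljoin(base_url, link) for link in links]


# The two examples from the question:
print(urljoin("http://www.example.com/foo", "/bar.html"))
print(urljoin("http://www.example.com/bla/?foo=bar", "../foo.html"))
```

Both calls should print the results given in the question. Any library that follows the same RFC should agree on these cases.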
Closed 2 years ago.
When receiving ZPL raw labels (text files) from a third party, I would like to run a regular expression on them to validate them.
Rather than a 100% strict validation, I am mostly looking to avoid sending to the printer obviously wrong files, such as completely unrelated text files, or binary files.
I am not familiar enough with ZPL/ZPL-II and I would prefer to use an existing resource for that. Would you know if one exists?
I've never heard of one. But it wouldn't be too hard to validate. ZPL is pretty straightforward, especially if there's a very defined set that you send to your printer...
The ZPL command characters are ~ for immediate commands and ^ for formatting commands.
Label formats must begin with a ^XA and end with a ^XZ.
Download commands typically begin with a ~D<something>, like ~DY, ~DG, ~DT, ~DC etc.
There are a couple of status commands, like ~HI and ~HS.
There may be a couple other edge cases, but these are the most common commands.
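The rules above could be turned into a loose check along these lines. This is only a sketch in Python, not a ZPL parser: the function name looks_like_zpl, the NUL-byte test for "binary" and the exact patterns are all assumptions made for illustration.

```python
import re

# Hypothetical heuristics based only on the rules listed above; not a ZPL parser.
LABEL_FORMAT = re.compile(r"\^XA.*\^XZ", re.DOTALL)  # a label format: ^XA ... ^XZ
TILDE_COMMAND = re.compile(r"~(?:D[A-Z]|HI|HS)")     # ~DY, ~DG, ~DT, ~DC, ~HI, ~HS


def looks_like_zpl(data: bytes) -> bool:
    """Return True if the payload plausibly contains ZPL, False if it obviously does not."""
    # Assumption: a NUL byte means a binary file that should not go to the printer.
    if b"\x00" in data:
        return False
    text = data.decode("latin-1")  # latin-1 maps every byte, so decoding cannot fail
    return bool(LABEL_FORMAT.search(text) or TILDE_COMMAND.search(text))
```

This will still accept some files that are not valid ZPL (any text that happens to contain ~HI, for instance), which matches the goal of rejecting only obviously wrong input.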
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
I am migrating a shop over for a client.
I have to pull all the old image files off her 'shop' which has no FTP access.
It allowed me to export a list of filenames/URLs. My plan was to load them up in Firefox and use "Downloadthemall" to simply download all the files (around 2000). However, about 1/3 of them have [ and ] in the name.
i.e.
cdn.crapshop.com/images/image[1].jpg
Downloadthemall freaks out and only reads it as
cdn.crapshop.com/images/image
And won't download it because it isn't a file.
Anyone got any ideas of an alternative way to pull a list like this?
See this solution, which explains why the example URL you provided is invalid: Validation. As the answer by #good there points out, characters that the specification does not allow have to be percent-encoded so that the web server will understand them.
This calls for Python; see this post: Percent encoding in python
We can then put it all together in a script that reads from stdin and writes to stdout: python script.py < input > output.out.
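import sys
from urllib.parse import quote  # Python 3; in Python 2 this was urllib.quote

# Read one URL per line from stdin and print it percent-encoded.
for line in sys.stdin:
    url = line.strip()
    if not url:
        continue  # skip blank lines
    # Keep ':' and '/' literal so the scheme and path separators survive;
    # characters such as '[' and ']' become %5B and %5D.
    print(quote(url, safe=":/"))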
Then, hopefully, Downloadthemall will parse the corrected list. (The input to the script is expected to be a list of URLs, one per line.)
You may be interested in this post as well: Downloading files with python, which shows how to download files (web pages in particular) using Python.
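If you would rather skip the browser entirely, encoding and downloading can be combined. The following is a minimal sketch, assuming Python 3 and only the standard library; the function names, the images directory and the choice to prepend http:// when the exported list has no scheme are all assumptions, not something taken from the linked posts.

```python
import os
from urllib.parse import quote
from urllib.request import urlopen


def encode_url(url):
    """Percent-encode characters such as '[' and ']' but keep ':' and '/'."""
    return quote(url.strip(), safe=":/")


def local_name(url):
    """Use the last path segment of the original URL as the local file name."""
    return url.strip().rsplit("/", 1)[-1]


def download_all(urls, dest_dir="images"):
    """Download every URL in the list into dest_dir (a hypothetical default)."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        # Assumption: entries without a scheme are plain HTTP.
        full = url if "://" in url else "http://" + url
        with urlopen(encode_url(full)) as response:
            data = response.read()
        with open(os.path.join(dest_dir, local_name(url)), "wb") as out:
            out.write(data)
```

Calling download_all(open("input").read().splitlines()) would then fetch the whole list; there is no retry or error handling in this sketch.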
Good luck!
Closed 8 years ago.
Are there any tools out there that will index source code, client side, and provide blazing fast search results?
How can I index our internal source code? is related but covers server side tools.
Everything and Locate32 are nice indexing tools on the Windows platform. The one problem is that they only index file names.
DocFetcher is another option. It tries to index the content of files, but it has serious memory issues: it cannot index the content of larger files and simply skips them.
I'm also looking for something to index my data. I want a tool like Locate32, which integrates very nicely with the Windows shell, but one that also indexes the content of files. Plain word indexing is enough, with no special processing of the data, as long as I can do simple wildcard searches: words starting with, ending with, or containing a given string.
But the search is still on (for an app, that is).
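To make the "plain word indexing with wildcard searches" idea concrete, here is a minimal sketch in Python. It is not taken from any of the tools named here; build_index and search are hypothetical names, and the index lives in memory only.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase

WORD = re.compile(r"\w+")


def build_index(files):
    """Map each lower-cased word to the set of file names that contain it.

    `files` is a dict of {file name: text}; reading from disk is left out.
    """
    index = defaultdict(set)
    for name, text in files.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index


def search(index, pattern):
    """Wildcard search: 'foo*' starts with, '*foo' ends with, '*foo*' contains."""
    pattern = pattern.lower()
    hits = set()
    for word, names in index.items():
        if fnmatchcase(word, pattern):
            hits |= names
    return hits
```

Each search scans the list of distinct words rather than the files themselves, which is usually much smaller, but a real tool would need a persistent index and a faster lookup structure.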
Install ctags.
Then ctags -R in the root of your source tree. Many editors, including Vim, can use the resulting tags file to give near-instant search results.
I know this is an old question, but maybe this will help someone else.
Take a look at CodeIDX: http://sourceforge.net/projects/codeidx/.
Using CodeIDX you can index multiple directories using filetype filters and search the created index.
You can open multiple searches at the same time and the results can be viewed in a preview.
Using GNU Global you can get browsable, searchable source code. You can run this locally too or use all the tools that go with it (like less to go straight to a function definition).
See http://www.tamacom.com/tour/kernel/linux/ for an example of the Linux Kernel.
Closed 4 years ago.
I'm using this Songkick wrapper, and it works for grabbing events by artist, like so:
sk.events(:artist_name => "Balimurphy")
But I'm having trouble grabbing events by location. Songkick expects the query to look like this:
location=geo:lat,lng
I'm having trouble finding the right syntax to pass lng=-73.5833, lat=45.5. Here are some variations I've tried:
sk.events(:location => :geo=>{:lng=>"-73.5833", :lat=>"45.5"})
sk.events(:location => {:geo=>lng=-73.5833, lat=45.5})
sk.events(:location => "geo=-73.5833,45.5")
Any ideas?
Where can I find documentation that might cover this?
I've been looking through the following 3 sources:
https://github.com/jrmehle/songkickr
http://rubydoc.info/gems/songkickr/0.1.0/frames
http://www.songkick.com/developer/event-search
and I think you need to change your last attempt to
sk.events(:location => "geo:45.5,-73.5833") # geo: followed by lat,lng
One example on the Songkick page has location=ip:94.228.36.39. This makes me think that, for location, it wants location=type:data.
I assume that the hash you pass gets turned into key=value (just looking at the Songkick page and your working example).
Therefore, you would want your key to be "location" and your value to be "geo:45.5,-73.5833", with the latitude first, as in the geo:lat,lng format you quoted.
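To show the shape of the string independently of the Ruby wrapper, here is a small Python sketch that builds the location=geo:lat,lng parameter quoted in the question. The function names are hypothetical, and whether the wrapper escapes the colon and comma on its own is not something this checks.

```python
from urllib.parse import urlencode


def geo_location(lat, lng):
    """Format coordinates as the location value from the question: geo:lat,lng."""
    return f"geo:{lat},{lng}"


def location_query(lat, lng):
    """Build the location=... part of a query string, keeping ':' and ',' literal."""
    return urlencode({"location": geo_location(lat, lng)}, safe=":,")


print(location_query(45.5, -73.5833))
```

The value returned by geo_location is what would be passed as :location in the Ruby call.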
I hope this works for you!
Closed 4 years ago.
I was wondering if there is a tool (an Automator script or a third-party one) to generate code for simple scenarios like adding another property. I don't like going to two or three places and writing the same thing over and over again. Instead, I want to say "I want a new property of type int with name X" and have it generate the lines in the .h and .m files for me in one go.
I haven't actually used either, but xobjc is free (though it requires you to add some code annotations), and Accessorizer looks interesting, if somewhat complicated to set up.