I have a filename declared like this;
filename = Time.now.strftime("%H:%M:%S")+'.json'
and the error occurs when I do this
File.open(filename,'w') do |f|
f.write(rsp)
end
Error is in `initialize' : Invalid argument - 18:28:20.json which I assume is beacuse of a filename. When I do some 'normal' name everything works OK, so any tips?
Try:
filename = Time.now.strftime("%H_%M_%S")+'.json'
Windows uses the colon as a drive letter separator;
see this SO question for other special chars.
Use a different separator. You might be able to escape it, but IMO, not really worth it.
FWIW, for timestamped filenames I tends towards yyyymmdd-hhmmss or similar anyway.
For things like files it's always good to include more-complete info in the question--that naming conventions are different across OSes is well-known.
Related
My regexp behaves just like I want it to on http://regexr.com, but not like I want it in irb.
I'm trying to make a regular expression that will match the following:
A forward slash,
then 2 * any number of random characters (i.e. `.*`),
up to but not including another /
OR the end of the string (whichever comes first)
I'm sorry as that was probably unclear, but it's my best attempt at an English translation.
Here's my current attempt and hopefully that will give you a better idea of what I'm trying to do:
/(\/.*?(?=\/|$)){2}/
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar). I'm trying to do this using the command path.scan(-regex-).shift.
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar)
Ruby already has a class for handling paths, Pathname. You can use Pathname#relative_path_from to do what you want.
require 'pathname'
path = Pathname.new("/foo/bar/baz/bin/bash")
# Normally you'd use Pathname.getwd
cwd = Pathname.new("/foo/bar")
# baz/bin/bash
puts path.relative_path_from(cwd)
Regexes just invite problems, like assuming the path separator is /, not honoring escapes, and not dealing with extra /. For example, "//foo/bar//b\\/az/bin/bash". // is particularly common in code which joins together directories using paths.join("/") or "#{dir}/#{file}.
For completeness, the general way you match a single piece of a path is this.
%r{^(/[^/]+)}
That's the beginning of the string, a /, then 1 or more characters which are not /. Using [^/]+ means you don't have to try and match an optional / or end of string, a very useful technique. Using %r{} means less leaning toothpicks.
But this is only applicable to a canonicalized path. It will fail on //foo//b\\/ar/. You can try to fix up the regex to deal with that, or do your own canonicalization, but just use Pathname.
There are many sites (such as Stackoverflow) that has the title of the page in the URL.
I am looking for the algorithm in which they are using in order to avoid illegal URL characters. ( I dont want URL encoding, I want replace/remove algo)
like 'How is Illegal char's URL working?' will become 'How-is-Illegal-chars-URL-working'
Thanks!
The algorithm to do this is generally called 'slugify', because it turns a string into a 'slug' to be used in a URL. Searching for that should give you plenty of useful implementations.
No idea how SO does it, but I would just strip every non-alphanumeric character and replace spaces with underscores.
In Python:
def cleanTitle(title):
temp = ''
for character in title.lower():
if character in 'abcdefghijklmnopqrstuvwxyz1234567890_-+/<>,.=[]{}()\|!##$%^&':
temp += character
return temp
I see you are working in C#. I don't know C#, so you'll have to translate this code. I doubt it's hard to do, though.
I'm getting some string data from the web, and I suspect that it's not always what it says it is. I don't know where the problem is, and I just don't care any more. From day one on this project I've been fighting Ruby string encoding. I really want some way to say: "Here's a string. What is it?", and then use that data to get it to UTF-8 so that it doesn't explode gsub() 2,000 lines down in the depths of my app. I've checked out rchardet, but even though it supposedly works for 1.9 now, it just blows up given any input with multiple bytes... which is not helpful.
You can't really detect the encoding. You can only assume it.
For the most Western languages applications, the following construct
will work. The traditional encoding usually is "ISO-8859-1". The new and preferred encoding is UTF-8. Why not simply try to encode it with UTF-8 and fallback with the old encoding
def detect_encoding( str )
begin
str.encode("UTF-8")
"UTF-8"
rescue
"ISO-8859-1"
end
end
It is impossible to tell from a string what encoding it is in. You always need some additional metadata that tells you what the string's encoding is.
If you get the string from the web, that metadata is in the HTTP headers. If the HTTP headers are wrong, there is absolutely nothing that you or Ruby or anyone else can do. You need to file a bug with the webmaster of the site where you got the string from and wait till he fixes it. If you have a Service Level Agreement with the website, file a bug, wait a week, then sue them.
Old question, but chardet works on 1.9: http://rubygems.org/gems/chardet
why not try use https://github.com/brianmario/charlock_holmes to get the exact encoding. Then also use it to convert to UTF8
require 'charlock_holmes'
class EncodeParser
def initialize(text)
#text = text
end
def detected_encoding
CharlockHolmes::EncodingDetector.detect(#text)[:encoding]
end
def convert_to_utf8
CharlockHolmes::Converter.convert(#text, detected_encoding, "UTF-8")
end
end
then just use EncodeParser.new(text).detected_encoding or EncodeParser.new(text). convert_to_utf8
We had some fine experience with ensure_encoding. It actually does the job for us to convert resource files having unknown encoding to UTF-8.
The README will give you some hints which options would be a good fit for your situation.
I have never tried chardet since ensure_encoding did the job just fine for us.
I covered here how we use ensure_encoding.
Try setting these in your environment.
export LC_ALL=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
Try ruby -EBINARY or ruby -EASCII-8BIT to command line
Try adding -Ku or -Kn to your ruby command line.
Could you paste the error message ?
Also try this: http://github.com/candlerb/string19/blob/master/string19.rb
Might try reading this: http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
I know it's an old question, but in modern versions of Ruby it's as simple as str.encoding. You get a return value something like this: #Encoding:UTF-8
Is there an api in windows that retrieves the server name from a UNC path ? (\\server\share)
Or do i need to make my own ?
I found PathStripToRoot but it doesn't do the trick.
I don't know of a Win32 API for parsing a UNC path; however you should check for:
\\computername\share
\\?\UNC\computername\share (people use this to access long paths > 260 chars)
You can optionally also handle this case: smb://computername/share and this case hostname:/directorypath/resource
Read here for more information
This is untested, but maybe a combination of PathIsUNC() and PathFindNextComponent() would do the trick.
I don't know if there is a specific API for this, I would just implement the simple string handling on my own (skip past "\\" or return null, look for next \ or end of string and return that substring) possibly calling PathIsUNC() first
If you'll be receiving the data as plain text you should be able to parse it with a simple regex, not sure what language you use but I tend to use perk for quick searches like this. Supposing you have a large document containing multiple lines containing one path per line you can search on \\'s I.e
m/\\\\([0-9][0-9][0-9]\.(repeat 3 times, of course not recalling ip address requirements you might need to modify the first one for sure) then\\)? To make it optional and include the trailing slash, and finally (.*)\\/ig it's rough but should do the trick, and the path name should be in $2 for use!
I hope that was clear enough!
I'm looking for a character to use a filename delimiter (I'm storing multiple filenames in a plaintext string). Windows seems not to allow :, ?, *, <, >, ", |, / and \ in filenames. Obviously, \ and / can't be used, since they mean something within a path. Is there any reason why any of those others shouldn't be used? I'm just thinking that, similar to / or \, those other disallowed characters may have special meaning that I shouldn't assume won't be in path names. Of those other 7 characters, are any definitely safe or definitely unsafe to use for this purpose?
The characters : and " are also used in paths. Colon is the drive unit delimiter, and quotation marks are used when spaces are part of a folder or file name.
The charactes * and ? are used as wildcards when searching for files.
The characters < and > are used for redirecting an application's input and output to and from a file.
The character | is used for piping output from one application into input of another application.
I would choose the pipe character for separating file names. It's not used in paths, and its shape has a natural separation quality to it.
An alternative could be to use XML in the string. There is a bit of overhead and some characters need encoding, but the advantage is that it can handle any characters and the format is self explanatory and well defined.
Windows uses the semicolon as a filename delimiter: ;. look at the PATH environment variable, it is filled with ; between path elements.
(Also, in Python, the os.path.pathsep returns ";", while it expands to ":" on Unix)
I have used * in the past. The reason for portability to Linux/Unix. True, technically it can be used on those fileysystems too. In practice, all common OSes use it as a wildcard, thus it's quite uncommon in filenames. Also, people are not surprised if programs do break when you put a * in a filename.
Why dont you use any character with ALT key combination like ‡ (Alt + 0135) as delimiter ?
It is actually possible to create files programmatically with every possible character except \. (At least, this was true at one time and it's possible that Windows has changed its policy since.) Naturally, files containing certain characters will be harder to work with than others.
What were you using to determine which characters Windows allows?
Update: The set of characters allowed by Windows is also be determined by the underlying filesystem, and other factors. There is a blog entry on MSDN that explains this in more detail.
If all you need is the appearance of a colon, and will be creating it programatically, why not make use of a UTF-8 character that just looks like a colon?
My first choice would be the Modifier Letter (U+A789), as it is a typical RTL character and appears a lot like a colon. It is what I use when I need a full DateTime in the filename, such as file_2017-05-04_16꞉45꞉22_clientNo.jpg
I would stay away from characters like the Hebrew Punctuation Sof Pasuq (U+05C3), as it is a LTR character and may mess with how a system aligns the file name itself.