In some gcc code I came across the following construct.
fatal (_("%s: cannot find section %s"), file_name, section_name);
I have never seen "_" in this context.
It seems to be some sort of construct that creates an entity from the character array, very probably a compiler extension.
Can someone tell me what it is?
It is usually a macro associated with the GNU gettext project, used for internationalization. The idea is that the string you pass is a key into a lookup table. There is one such table for each supported language, and the current one is selected by a handful of environment settings (the locale).
The value found in the table should be a translation of the key, into the target language.
Since looking up such translated strings is a common activity in i18n code, _ is introduced as a convenient, short name for the lookup function.
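A rough way to picture what the macro does, sketched here in Ruby rather than C (the table contents and the locale handling are made up for illustration; real gettext reads compiled message catalogs selected by the locale):
# Illustrative only: a hand-rolled _() lookup, not the real gettext API.
# The original (English) string is the key; the value is its translation.
TRANSLATIONS = {
  "de" => { "%s: cannot find section %s" => "%s: Abschnitt %s nicht gefunden" }
}

def _(msgid)
  lang = ENV.fetch("LANG", "C")[0, 2]
  (TRANSLATIONS[lang] || {}).fetch(msgid, msgid)   # fall back to the key itself
end

puts format(_("%s: cannot find section %s"), "a.out", ".text")
If the lookup fails or no translation exists, the untranslated key is used as-is, which is why the keys are themselves readable English messages.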
The first time I tried learning Ruby was two years ago; now I have started again. The reason I stopped was that I could not understand the Symbol class. Now I am at the same point again, completely lost as to when and why you use symbols. I have read the other posts on Stack Overflow and Googled for several explanations, but I still do not understand it.
At first I thought symbols were just a way to create some sort of "named constant" without having to go through the same process as in, say, Java.
:all
instead of making a constant with an arbitrary value, like public static final String ALL = "all";
However, that does not make much sense when you see it used in, e.g., attr_accessor :first_name.
Are Symbols just a lightweight String class? I am having problems understanding how I should interpret symbols, and when and how to use them, both in my own classes and in frameworks.
In short, symbols are lightweight strings, but they are also immutable and (historically) not garbage-collected.
You should not use them as immutable strings in your data-processing tasks (remember, once a symbol is created, it can't be destroyed). You typically use symbols for naming things.
# typical use cases
# access hash value
user = User.find(params[:id])
# name something
attr_accessor :first_name
# set hash value in opts parameter
db.collection.update(query, update, multi: true, upsert: true)
Let's take the first example, params[:id]. In a moderately big Rails app there may be hundreds or thousands of those scattered around the codebase. If we accessed that value with a string, params["id"], that would mean a new string allocation each time (and that string needs to be collected afterwards). With a symbol, it's actually the same symbol everywhere: less work for the memory allocator, for the garbage collector, and even for you (: is faster to type than "").
If you have a simple one-word string that appears often in your code and you don't do something funky to it (interpolation, gsub, upcase, etc), then it's likely a good candidate to be a symbol.
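You can see the "same symbol everywhere" point directly in irb (a tiny sketch; the names are arbitrary):
:id.object_id == :id.object_id   # => true, every :id is the very same object
:id.equal?(:id)                  # => true, reference equality holds
"id".equal?("id")                # => false, two distinct String objects
                                 #    (unless frozen string literals are enabled)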
However, does this apply only to text that is used as part of the actual program logic, such as naming, and not to text that you get while the program is running, such as input from the user or the web?
I cannot think of a single case where I'd want to turn data from the user/web into a symbol (except maybe for parsing command-line options), mainly because of the consequences (once created, symbols live forever).
Also, many editors provide different coloring for symbols, to highlight them in the code.
The O'Reilly Ruby Cookbook (p. 15) quotes Jim Weirich as saying:
If the contents (the sequence of characters) of the object are important, use a string.
If the identity of the object is important, use a symbol.
Symbols are generally used as hash keys, because it's the identity of the key that's important. Symbols are also required when passing messages using certain methods like Object#send.
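For example (the class and method names here are made up for illustration):
class Greeter
  def hello(name)
    "Hello, #{name}!"
  end
end

# Symbols as hash keys: only the identity of the key matters.
config = { retries: 3, verbose: true }
config[:retries]                  # => 3

# Symbols as message names, passed to Object#send.
Greeter.new.send(:hello, "Ada")   # => "Hello, Ada!"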
A Ruby implementation typically has a table in which it stores the names of all classes, methods, and variables. It refers to, say, a method name by its position in the table, avoiding expensive string comparisons. But you can use this table too and add values to it yourself: those values are symbols.
If you write code that uses strings as identifiers rather than for their textual content, consider symbols. If you write a method that expects an argument to be either 'male' or 'female', consider using :male and :female instead. Comparing two symbols for equality is faster than comparing two strings (which is why symbols make good hash keys).
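As a sketch of the :male/:female suggestion (the method itself is hypothetical):
def salutation(gender)
  case gender
  when :male   then "Mr."
  when :female then "Ms."
  else raise ArgumentError, "unknown gender: #{gender.inspect}"
  end
end

salutation(:female)   # => "Ms."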
Symbols are used for naming things in the language: the names of classes, the names of methods etc.
These are very like strings, except they can never be garbage collected, and testing for equality is optimised to be very quick.
The Java implementation has a very similar thing, except that it is not available for runtime use. What I mean is, when you write Java code like obj.someMethod(4), the string 'someMethod' is converted by the compiler into a symbol that is embedded in a lookup table in the .class file. These symbols are like 'special' strings that are not garbage collected and are very fast to compare for equality. This is almost identical to Ruby, except that Ruby allows you to create new symbols at runtime, whereas Java only allows it at compile time.
This is just like creating new methods -- Java allows it at compile time; Ruby allows it at runtime.
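A small Ruby sketch of that runtime flexibility (the class and the constructed name are made up for illustration):
class Widget; end

name = ("dyn_" + "greet").to_sym             # a symbol minted at runtime
Widget.send(:define_method, name) { "hi" }   # a method defined at runtime

Widget.new.dyn_greet   # => "hi"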
After Ruby 2.2, symbol GC was introduced, so mortal symbols, i.e. symbols created dynamically at runtime (for example by converting a string with "mortal".to_sym), now get cleaned up from memory.
Check this out:
require 'objspace'

ObjectSpace.count_symbols
# => {
#      :mortal_dynamic_symbol   => 3,
#      :immortal_dynamic_symbol => 5,
#      :immortal_static_symbol  => 3663,
#      :immortal_symbol         => 3668
#    }
source: https://www.rubyguides.com/2018/02/ruby-symbols/
I am building a compiler, and in the lexical analyzer phase I came across the following:
Install the reserved words in the symbol table initially. A field of the symbol-table entry indicates that these strings are never ordinary identifiers, and tells which token they represent. We have supposed that this method is in use in Fig. 3.14. When we find an identifier, a call to installID places it in the symbol table if it is not already there and returns a pointer to the symbol-table entry for the lexeme found. Of course, any identifier not in the symbol table during lexical analysis cannot be a reserved word, so its token is id. The function getToken examines the symbol-table entry for the lexeme found, and returns whatever token name the symbol table says this lexeme represents, either id or one of the keyword tokens that was initially installed in the table.
But now every time I recognize a keyword or an identifier, I will have to go through the entire symbol table; that's like comparing against n entries for every keyword/identifier recognition.
Won't that be too inefficient? What else can I do?
Kindly help.
If you build a finite state automaton (FSA) to identify lexemes, then its terminal states should correspond to language lexemes.
You can leave keywords out of the FSA, and you'll end up with only a single terminal state for strings that look like identifiers. This is a common implementation when coding the FSA by hand, and it gives you exactly the problem you have now.
As a practical matter for the symbol table, no matter what you do with keywords, you will want an extremely fast identifier lookup, which pretty much suggests you need a hashing solution. If you have that, then you can do the lookup quickly and check your "it must be a keyword" bit. There are plenty of good hash schemes in existence; as usual, the Wikipedia article on hash functions is a pretty good place to start. This is a practical solution; I use it in my PARLANSE compiler (see my bio), which processes million-line files in a few tens of seconds.
This isn't really the fastest solution, though. It is better to include the keywords in the FSA (this tends to encourage the use of a lexer generator, because adding all the keywords to a manually coded FSA is inconvenient, but not hard). If you do that, and you have keywords that look like identifiers, e.g., goto, there will be terminal states that in effect indicate you have recognized an identifier that happens to be spelled as a specific keyword.
How you interpret that end state is up to you. One obvious choice is that such end states indicate you have found a keyword. No hash table lookup required.
You can use a hash table for the list of keywords. That makes the keyword lookup O(1).
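A minimal sketch of the scheme from the quoted passage, with a hash as the symbol table (the names installID and getToken follow the quote; everything else is illustrative, written in Ruby to keep it short):
# Pre-install the reserved words; everything else defaults to :id.
SYMBOL_TABLE = {}
%w[if then else while].each { |kw| SYMBOL_TABLE[kw] = kw.to_sym }

def installID(lexeme)
  SYMBOL_TABLE[lexeme] ||= :id   # install as a plain identifier if it's new
end

def getToken(lexeme)
  installID(lexeme)
  SYMBOL_TABLE[lexeme]           # hash lookup: average O(1), not O(n)
end

getToken("while")   # => :while (a keyword token)
getToken("count")   # => :id    (an ordinary identifier)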
You could use a perfect hash like one generated with gperf.
I'm new to programming with the Win32 API, and I'm still getting used to the prefix / suffix data type naming conventions. While Google and a little common sense will normally explain what the prefix is referring to, it would be nice if there was one (relatively) concise guide to explain them. Does anyone know of a resource like this?
And on a related note, what does the '_' (underscore) prefix mean with a variable? Does that underscore have a name, other than "underscore"?
The naming convention is called Hungarian notation, as mentioned by others. Since you're not familiar with it, and are probably going to start using it, it is worth mentioning that there are two main flavors of Hungarian:
prefix the variable with its type code
prefix the variable with its usage code
The difference is visible when, for instance, an int is used to describe the number of bytes in a certain string. With the former, nLen will be used, meaning the variable is an int. With the latter, cbLen will be used, meaning the variable counts bytes (as opposed to cchLen, which counts characters). Give this article a look; it should give you a better explanation.
As for the underscores in front of a variable or function - this is a naming convention reserved for the compiler and its standard library. Some people use it for other purposes, but they really shouldn't. The purpose of the convention is to provide the compiler a naming standard that will prevent collisions with names given by the user.
The Win32 API follows Hungarian notation.
It's called Hungarian notation. Wikipedia has some information about it, and there's something on MSDN as well.
The use of symbol literals is not immediately clear from what I've read up on Scala. Would anyone care to share some real world uses?
Is there a particular Java idiom being covered by symbol literals? What languages have similar constructs? I'm coming from a Python background and not sure there's anything analogous in that language.
What would motivate me to use 'HelloWorld vs "HelloWorld"?
Thanks
In Java terms, symbols are interned strings. This means, for example, that reference equality comparison (eq in Scala and == in Java) gives the same result as normal equality comparison (== in Scala and equals in Java): 'abcd eq 'abcd will return true, while "abcd" eq "abcd" might not, depending on the JVM's whims (well, it should for literals, but not for strings created dynamically in general).
Other languages which use symbols are Lisp (which uses 'abcd like Scala), Ruby (:abcd), Erlang and Prolog (abcd; they are called atoms instead of symbols).
I would use a symbol when I don't care about the structure of a string and use it purely as a name for something. For example, if I have a database table representing CDs, which includes a column named "price", I don't care that the second character in "price" is "r", or about concatenating column names; so a database library in Scala could reasonably use symbols for table and column names.
If you have plain strings representing, say, method names in code, which perhaps get passed around, you're not quite conveying things appropriately. This is sort of the data/code boundary issue; it's not always easy to draw the line, but if we were to say that in that example those method names are more code than they are data, then we want something to clearly identify that.
A symbol literal comes into play where it clearly differentiates plain old string data from a construct being used in the code. It's really there for when you want to indicate that this isn't just some string data, but is in some way part of the code. The idea is that things like your IDE would highlight it differently, and, given the tooling, you could refactor on those names rather than doing a text search/replace.
This link discusses it fairly well.
Note: Symbols will be deprecated and then removed in Scala 3 (dotty).
Reference: http://dotty.epfl.ch/docs/reference/dropped-features/symlits.html
Because of this, I personally recommend not using Symbols anymore (at least in new scala code). As the dotty documentation states:
Symbol literals are no longer supported
it is recommended to use a plain string literal [...] instead
Python maintains an internal global table of "interned strings" with the names of all variables, functions, modules, etc. With this table, the interpreter can perform faster lookups and optimizations. You can force this process with the intern function (sys.intern in Python 3).
Java and Scala also automatically use interned strings for faster lookups. With Scala, you can use the intern method to force interning of a string, but interning doesn't happen automatically for every string. Symbols benefit from being guaranteed to be interned, so a single reference-equality check is sufficient to prove equality or inequality.
The Wikipedia entry on symbol tables is a good reference:
http://en.wikipedia.org/wiki/Symbol_table
But as I try to understand symbols in Ruby and how they are represented in the Array of Symbols (returned by the Symbol.all_symbols method),
I'm wondering whether Ruby's approach to the symbol table has any important differences from other languages?
Ruby doesn't really have a "symbol table" in that sense. It has bindings, and it has symbols (what Lispers call atoms), but it isn't really doing things the way that article describes.
So, in answer to your question: it isn't so much that Ruby has the same thing done differently, but rather that it does two different things (the :xxx notation --> unique IDs, and bindings in scopes) and uses similar/overlapping terminology for them.
To clarify:
The article you link to gives the conventional definition of a symbol table, to wit
where each identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location
But this isn't what Ruby's symbol table does. It just provides a globally unique identity for a certain class of objects, which can be written as :something in the source code, including things like :+ and :"Hi bob!" which aren't identifiers. Also, merely using an identifier will not create a corresponding symbol. And finally, none of the information listed in the passage above is stored in Ruby's list of symbols.
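For example, both of these are perfectly good symbols even though neither is a valid identifier:
:+                               # an operator name as a symbol
:"Hi bob!"                       # arbitrary text as a symbol, via the quoted form
:"Hi bob!" == "Hi bob!".to_sym   # => true, the quoted form and to_sym agree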
It's a coincidence of naming, and reading that article will not help you understand ruby's symbols.
The biggest difference is that (like Lisp) Ruby actually has a syntax for symbols, and it's easy to add/remove things at runtime yourself. If you say :balloon (or "balloon".intern) it will intern that for you. Even though you're referring to it by name in your source, internally it's just a pointer in the symbol table. If you compare symbols, it's just a pointer-compare, not a string-compare.
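You can watch this happen from irb (a small sketch; :balloon is just an arbitrary name):
:balloon.equal?("balloon".intern)                   # => true, both routes yield the same object
:balloon.object_id == "balloon".to_sym.object_id    # => true, for the same reason

Symbol.all_symbols.include?(:balloon)   # => true, it now lives in the symbol table
Symbol.all_symbols.size                 # the total number of symbols the VM knows about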
Languages like C don't really have a way to say simply "create a new symbol for me" at runtime. You can do it implicitly at compile-time by defining a function, but that's really its only use. Since C has no syntax for symbols, if you want to be able to say Balloon in your program but be able to compare it with a single machine instruction, you use enums (or #defines).
In Ruby, it takes only one character to make a symbol, so you can use it for all kinds of things (like hash keys).
Symbols in Ruby are used where other languages tend to use enums, defines, constants and the like. They're also often used for associative keys. Their use has little to do with a symbol table as discussed in that article, except that they obviously exist in one.