Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Does anyone know of the practical reasons for the com.company.project package structure and why it has become the de-facto standard?
Does anyone actually store everything in a folder-structure that directly reflects this package structure? (this is coming from an Actionscript perspective, btw)
Preventing name-clashes is a rather practical reason for package structures like this. If you're using real domain names that you own and everybody else uses their package names by the same role, clashes are highly unlikely.
Esp. in the Java world this is "expected behaviour". It also kind of helps if you want to find documentation for a legacy library you're using that no one can remember anymore where it was coming from ;-)
Regarding storing files in such a package structure: In the Java world packages are effectively folders (or paths within a .jar file) so: Yes, quite a few people do store their files that way.
Another practical advantage of such a structure is, that you always know if some library was developed in-house or not.
I often skip the com. as even small orgs have several TLDs, but definitely useful to have the owner's name in the namespace, so when you start onboarding third-party libraries, you don't have namespace clashes.
Just think how many Utility or Logging namespaces there would be around, here at least we have Foo.Logging and Bar.Logging, and the dev can alias one namespace away :)
If you start with a domain name you own, expressed backwards, then it is only after that point that you can clash with anyone else following the same structure, as nobody else owns that domain name.
It's only used on some platforms.
Several reasons are:
Using domain names makes it easier to achieve uniqueness, without adding a new registry
As far as hierarchical structuring goes, going from major to minor is natural
For the second point, consider the example of storing dated records in a hierarchical file structure. It's much more sensible to arrange it hierarchically as YYYY/MM/DD than say DD/MM/YYYY: at the root level you see folders that organize records by year, then at the next level by month, and then finally by day. Doing it the other way (by days or months at the root level) would probably be rather awkward.
For domain names, it usually goes subsub.sub.domain.suffix, i.e. from minor to major. That's why when converting this to a hierarchical package name, you get suffix.domain.sub.subsub.
For the first point, here is an excerpt from Java Language Specification 3rd Edition that may shed some light into this package naming convention:
7.7 Unique Package Names
Developers should take steps to avoid the possibility of two published packages having the same name by choosing unique package names for packages that are widely distributed. This allows packages to be easily and automatically installed and catalogued. This section specifies a suggested convention for generating such unique package names. Implementations of the Java platform are encouraged to provide automatic support for converting a set of packages from local and casual package names to the unique name format described here.
If unique package names are not used, then package name conflicts may arise far from the point of creation of either of the conflicting packages. This may create a situation that is difficult or impossible for the user or programmer to resolve. The class ClassLoader can be used to isolate packages with the same name from each other in those cases where the packages will have constrained interactions, but not in a way that is transparent to a naïve program.
You form a unique package name by first having (or belonging to an organization that has) an Internet domain name, such as sun.com. You then reverse this name, component by component, to obtain, in this example, com.sun, and use this as a prefix for your package names, using a convention developed within your organization to further administer package names.
The name of a package is not meant to imply where the package is stored within the Internet; for example, a package named edu.cmu.cs.bovik.cheese is not necessarily obtainable from Internet address cmu.edu or from cs.cmu.edu or from bovik.cs.cmu.edu. The suggested convention for generating unique package names is merely a way to piggyback a package naming convention on top of an existing, widely known unique name registry instead of having to create a separate registry for package names.
Related
Several organizations distribute variants of the same project, and we regularly pull changes from one another. It would be great if we could eventually merge code repositories and maybe, maybe have a common source tree managed by a consortium. However, each member would probably want the option of distributing their own variant without too much pain for customers in case there is trouble upstreaming changes required to work with newer products.
The project consists of three packages:
A library
A compiler executable that outputs go code that needs to import the library
A utility executable that uses code generated by #2 and links with #1
A big annoyance, when pulling changes back and forth, is gratuitous differences in import paths. We basically have to edit every version of import "github.com/companyA/whatever" to import "companyB.com/whatever". Of course these problems would go away with (gasp) relative import paths. If we resorted to such heresy, our compiler can just hard-code the absolute import path in generated code to isolate end users from the library's import path. It would also require only one gratuitous difference in the source trees (the line in the compiler that outputs import statements) rather than a bunch.
But anyway, I know relative import paths are bad - this is a tricky situation. I know this is similar to questions such as this or this, because the answer of just asking end users to create a directory called companyB.com and cloning something from companyA in there is just not going to fly for practical and political reasons.
I already know that go is not really good at accommodating this scenario, so I'm also not asking for a magic bullet to make go handle something it can't. Another thing that unfortunately won't fly is asking customers to curl whatever | sh, as this is viewed as too much of a liability (deemed "training customers to do dangerous things"). Maybe we could forego go get and have everyone clone to some neutral non-DNS-name under $GOPATH/src, but we would need to do this without a "flag day" in which code suddenly breaks if it's in the wrong place.
My question is whether anyone has successfully merged SDK-type projects with existing end users, and if so, how did you do it, what worked, and what didn't? Did you in fact avoid relative import paths or gnarly GOPATH hacking, and if so was it worth it? What mechanisms did you employ (environment variables, configuration files, .project-config files in the current working directory, abusing the vendor directory, code-generation packages that figure out their absolute import path at compliation time) to make this work smoothly? Did you just muddle through with a huge amount of sed or maybe gofmt -r? Are there tricks involving clever use of .gitattributes or go generate to rewrite import paths on checkout/checkin?
Merging them is pretty easy - cross-merge so that they all match, pick one (or create a new one) as the canonical source of truth, then migrate all references to the canonical import and make all future updates there.
The probablem arises here:
each member would probably want the option of distributing their own variant without too much pain for customers in case there is trouble upstreaming changes required to work with newer products
There's no particularly good way to do that without using any of the known solutions you've already ruled out, and no way to do that depending on your threshold for "too much pain".
Without knowing more about the situation it's hard to suggest options, but if there's any way that each company can abstract their portion out into a separate library they could maintain and update at their pace while using a smaller shared library with shared responsibility, that would likely be the best option - something like the model used by Terraform for its providers? Basically you'd have shared maintenance of a shared "core" and then independent maintenance of "vendor-specific" packages.
I have a codebase where one file contains quite a lot of Structs, Interfaces and Variables in the same file as functions and I'm not sure if I need to seperate this into separate files with appending filename. So for example accounts.go will be accounts_struct.go and accounts_interface.go with struct and interface respectively.
What would be a good approach for the file organisation when you have growing codebase for Structs, Variables and Interfaces?
A good model to check out is the source code of Go itself: http://golang.org/src
(in addition of the official "Effective Go")
You will see that this approach (separating based on language items like struct, interface, ...) is never used.
All the files are based on features, and it is best to use a proximity principle approach, where you can find in the same file the definition of what you are using.
Generally, those features are grouped in one file per package, except for large ones, where one package is composed of many files (net, net/http)
If you want to separate anything, separate the source (xxx.go) from the tests/benchmarks (xxx_test.go)
As Thomas Jay Rush adds in the comments
Sometimes source code is automatically generated -- especially data structure definitions.
If the data structures are in the same file as the hand-wrought code, one must build capacity to preserve the hand-wrought portion in the code-generation phase.
If the data structures are separated in a different file, then inclusion allows one to simply write the data structure file without worry.
Dave Cheney offers an interesting perspective in "Absolute Unit (Test) # LondonGophers" (March 2019)
You should take a broader view of the "unit" under test.
The units are not each internal function you write, but a whole package. Specifically the public API of a package.
Organizing your files to facilitate testing their Public API is a good idea.
accounts_struct_test.go would not, in that regards, make much sense.
See also "How I organize packages in Go" by Bartłomiej Klimczak
Sometimes, a few handlers or repositories are needed.
For example, some information can be stored in a database and then sent via an event to a different part of your platform. Keeping only one repository with a method like saveToDb() isn’t that handy at all.
All of elements like that are split by the functionality: repository_order.go or service_user.go.
If there are more than 3 types of the object, there are moved to a separate subfolder.
Here is my mental model for designing a package.
a. A package should encompass one idea or concept. http is a concept, http client or http message is not.
b. A file in a package should encompass a set of related types, a good rule of thumb is if two files share the same set of imports, merge them. Using the previous example, http/client.go, http/server.go are a good level of granularity
c. Don't do one file per type, that's not idiomatic Go.
I am designing a new YAML file, and I want to use the most standard style of naming. Which is it?
Hyphenated?
- job-name:
...
lower_case_with_underscores?
- job_name:
...
CamelCase?
- jobName:
...
Use the standard dictated by the surrounding software.
For example, in my current project the YAML file contains default values for Python attributes. Since the names used in YAML appear in the associated Python API, it is clear that on this particular project, the YAML names should obey the Python lower_case_with_underscores naming convention per PEP-8.
My next project might have a different prevailing naming convention, in which case I will use that in the associated YAML files.
Kubernetes using camelCase: https://kubernetes.io/docs/user-guide/jobs/
apiVersion, restartPolicy
CircleCI using snake_case: https://circleci.com/docs/1.0/configuration/
working_directory restore_cache, store_artifacts
Jenkins with dash-case: https://github.com/jenkinsci/yaml-project-plugin/blob/master/samples/google-cloud-storage/.jenkins.yaml
stapler-class
So it looks like projects and teams use their own conventions and there is no one definite standard.
A less popular opinion derived from years of experience:
TL;DR
Obviously stick to the convention but IMHO follow the one that is established in your project's YML files and not the one that comes with the dependencies. I dare to say naming convention depends on too many factors to give a definitive answer or even try to describe a good practice other than "have some".
Full answer
Libraries might change over time which leads to multiple naming conventions in one config more often than any sane programmer would like - you can't do much about it unless you want to introduce (and later maintain) a whole new abstraction layer dedicated to just that: keeping the parameter naming convention pristine.
A one example of why you would want a different naming convention in your configs vs. configs that came with the dependencies is searchability, e.g. if all dependencies use a parameter named request_id, naming yours request-id or requestId will make it distinct and easily searchable while not hurting how descriptive the name is.
Also, it sometimes makes sense to have multiple parameters with the same name nested in different namespaces. In that case it might be justified to invent a whole new naming convention based on some existing ones, e.g.:
order.request-id.format and
notification.request-id.format
While it probably isn't necessary for your IDE to differentiate between the two (as it's able to index parameters within the namespace) you might consider doing so anyway as a courtesy for your peers - not only other developers who could use different IDEs but especially DevOps and admins who usually do use less specialized tools during maintenance, migrations and deployment.
Finally, another good point raised by one of my colleagues is that distinctive parameter names can be easily converted into a different convention with something as simple as one awk command. Doing so the other way around is obviously possible but by an order of magnitude more complicated which often spawns debates in the KISS advocates community about what it really means to "keep it simple stupid".
The conclusion is: do what's most sensible to you and your team.
We're doing a big project on OSGi and adding some commons modules. There's some discussion about naming the artifact.
So, one possibility when naming the module is for example:
cmns-definitions (for common definitions), another is cmns-definition, still another is cmns-def. This has some effect also on the package name. Now it's
xx.xxx.xxx.xxx.xxx.commons.definitions, if changing to cmns-def it would be xx.xxx.xxx.xxx.xxx.commons.def.
Inside this package will be classes like enums and other definitions to be used throughout the system.
I personally lean to cmns-definitions since there's not only 1 definition inside the package. Other people point out that java.util doesn't have only 1 utility there for example. Still, java.util is an abbreviation for me. It can mean java utility or java utilities. Same thing happens with commons-lang.
How would you name the package? Why would you choose this name?
cmns-definitions
cmns-definition
cmns-def
Bonus question: How to name something like cmns-exceptions? That's how I name it. Would you name it cmns-xcpt?
ËDIT:
I'm throwing in my own thoughts on this in the hope of being either confirmed or contradicted. If you can, please do.
According to what I think, the background reason why you name something is to make it easier to understand what's inside it. Or, according to Peter Kriens, to make it easy to remember and being able to automate processes via patterns. Both are valid arguments.
My reasoning is as follows in terms of pattern:
1) When a substantivation occurs and it's well known in the industry, follow it on your naming.
Eg:
"features" is a case on this. We have a module called cmns-features. Does this mean we have many features on this module? No. It means "the module that implements the "features" file from Apache karaf".
"commons" is a substantivation of "common" well-accepted on the industry. It doesn't mean "many common". It means "Common code".
If I see extr-commons as a module name, I know that it contains common code for extr (in this case extraction), for example.
2) When a quantity of classes inside the module are cooperating to give a distinct "one and one only" meaning to the whole, use singular form to name it.
The majority of modules are included here. If I name something cmns-persistence-jpa, I mean that whatever classes inside cooperate together to provide the jpa implementation of cmns-persistence-api. I don't expect 2 implementations inside it, but actually a myriad of classes that together make one implementation. Crystal clear to me. No?
3) When a grouping of classes is done with the sole purpose of gathering classes by affinity, but the classes don't cooperate together to no purpose, use plural.
Here is the case for example of cmns-definitions (enums used by the whole system).
Alternatively, using an abbreviation circumvents the problem, e.g. cmns-def which can be also "interpreted expanded" by a human reader to cmns-definitions. Many people use also "xxxx-util" meaning xxxx-utilities.
Still a third option can be used to pack things together, using a name that itself means a pluralization. The word "api" comes to mind, but any word that pluralizes something would do, like "pack".
Support to these cases (3) are well-known modules like commons-collections (using the plural) or commons-dbcp (using abbreviation) or commons-lang (again abbreviation) and anything that uses api to pack classes together by affinity.
From apache:
commons-collections -> many powerful data structures that accelerate development of most significant Java applications
commons-lang -> host of helper utilities for the java.lang API
commons-dbcp -> package of several database connection pools
'it is just a name ...'
I find in my long career that these just names can make a tremendous difference in productivity. I do not think it makes a difference if you use definitions, definition, or def as long as you're consistent and use patterns in the name that are easy to remember and can be used to automate processes. A build based on a consistent naming scheme is infinitely easier to work with than a build with "nice human display" names that are ad-hoc and have no discernible pattern.
If you use patterns, names tend to become shorter. Now people working with these names usually spent a lot of time with them. So their readability is not nearly as important as their mnemonic value. It turns out that abbreviations of 3 or 4 characters are surprisingly powerful. One of the reason is they work well is that there is only one possible abbreviation while if you go longer there are many candidates.
Anyway, most import part is the overall consistency. Good luck.
definitions (or def or definition) is a bad name because it doesn't have any semantic to a reader. You're in an object oriented world (I suppose) - try to follow its conventions and principles. Modules in Maven should be named after the biggest "abstraction" they contain. "Definition" is a form, not a meaning.
Your question is similar to: "Which class name is better FileUtilities or FileUtils". Answer: none.
Basically what you do with the Definitions and Exceptions is to provide kind of an API for your other modules. So I propose to combine definitions, exceptions and add interfaces to it. Then it makes sense to call it all cmns-api. I normally prefer the singular names as they are shorter but you are free to decide as it is just a name.
In Go, public names start with an upper case letter and private names start with a lower case letter.
I'm writing a program that is not library and is a single package. Is there any Go idiom that stipulates whether my identifiers should be all public or all private? I do not plan on using this package as a library or as something that should be imported from another Go program.
I can't think of any reason why I'd want a mixture. It "feels" like going all private is the proper choice.
I don't think I got any concrete answer, but Nate was closest with telling me to think of "exporting vs non-exporting" instead of "public and private".
This leads me to believe that not exporting anything is the best approach. In the worst case scenario, if I end up importing code from my application in another package, I will have to rethink what should be exported and what shouldn't be. Which is a good thing IMO.
If you are attempting to adjust your mindset to be more Go idiomatic, you should stop thinking of variables, functions, and methods as public or private. The more accurate term is exported or not exported. It definitely has a more C like feel to it.
As others have stated exporting really isn't needed for application program code. If for organizational reasons you decide to break your program up into packages, you could use sub-packages. At work we've decided to do just this. We have:
projectgopath/src/projectname
projectname/subcomponent1
projectname/subcomponent2
So far I am really liking this structure. It aids in separation of concerns, but does not go to the extent of making a package outside of the main project. The intent is clear. The sub-package's intended use is for this program only...
The new go build and go install commands seem to deal very well with it. We group components together in packages and expose only the necessary bits via exports.
In the described situation both approaches are equally valid, so it's more or less a matter of personal preferences. In my case I'm using camelCase identifiers for package main, mostly out of habit.
A lot of my go files started their life in isolated commands and were moved to packages as they could be reused by a few commands around the same topic.
I think you should make private all that couldn't possibly be called from elsewhere (supposing one day you make it an importable package) and make public the big functions that can be understood from elsewhere (if any) and structs fields when they are orthogonal (I mean when a change of the value of one field doesn't break the consistency of the struct value).