Parsing code-model from in-memory C#-code string

Parsing code-model from in-memory C#-code string - visual-studio

With EnvDTE.ProjectItem, is it possible to parse an in-memory C#-code string to get the FileCodeModel?
I don't want to alter the project file in this course by adding a temporary file to project, get its ProjectItem, do stuff and then delete the file. It will further alert the source control to observe the changes.

There's simply no good way to do this with CodeModel. This is why we're building Roslyn to make this sort of operation trivial -- it operates with an immutable model where you can take a solution, fork it to a separate copy and do analysis, without every modifying the original. There's previews out you might be able to use, depending upon your scenario.

Related

How do I not commit the development team lines in project.pbxproj without deselecting those lines manually?

I am collaborating with my friend on an iOS app. We use different Apple IDs in our Xcodes, so in "Signing and Capabilities" tab of project settings, we select different teams in the "Team" field:
From my observation, changing this affects the MyProject.xcodeproj/project.pbxproj file, which stores the file references that the Xcode project has, in addition to the "Team". Here's a snippet of what is changed:
buildSettings = {
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = <my team ID>; /* this is changed */
INFOPLIST_FILE = MyProject/Info.plist;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"#executable_path/Frameworks",
);
PRODUCT_BUNDLE_IDENTIFIER = io.github.sweeper777.MyApp;
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 5.0;
TARGETED_DEVICE_FAMILY = 1;
};
The problem arises, when one of us commits this file and the other person pulls. The "puller" will now have the "Team" set to something invalid. When this person then tries to run the app on a real device, there will be code signing errors for obvious reasons. To solve this, this person must tediously go through all the targets that we have, and set each "Team" to their own team.
How can we make it so that on each person's computer, the "Team" stays the same after pulling, but any other changes to MyProject.xcodeproj/project.pbxproj is applied?
Remarks:
Putting the entire MyProject.xcodeproj/project.pbxproj in .gitignore doesn't work, because that would ignore every other change to it. Adding a new file to the project, for example, also changes MyProject.xcodeproj/project.pbxproj, and we want to be able to pull that change.
Manually deselecting the lines that say "DEVELOPMENT_TEAM = ..." when committing is as tedious as reselecting the correct team every time, so that's not a solution.
I found this. Apparently, I can configure git to run sed before git checkout and git add. However, that answer seems ignore the line by deleting it completely. This means that my friend, when he pulls, would still have to reselect the correct team. What I want is the kind of "ignore" that simply stops tracking that line. That is, if there is a local version of that line, use that.
I am also aware that this all wouldn't be a problem if we are on the same team. But if I understand this correctly, I can't have multiple people on my team unless I have a Company account, and not only can I not afford that, I don't own a Company.

I don't use Xcode itself and do not know how to smuggle Git hooks and scripts past the Xcode interface, so you'll need more than just this answer. But you mention sed in comments, and given your proposed file format, that may well be the way to go:
buildSettings = {
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = <my team ID>; /* this is changed */
INFOPLIST_FILE = MyProject/Info.plist;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"#executable_path/Frameworks",
);
PRODUCT_BUNDLE_IDENTIFIER = io.github.sweeper777.MyApp;
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 5.0;
TARGETED_DEVICE_FAMILY = 1;
};
Git has the ability to run what it calls clean and smudge filters. These can be used to run any arbitrary program you like, including sed, the "stream editor", which is particularly good at making single-line changes based on regular expression matches.
There is another method that may also work, and may "play better" with Xcode, or may play worse. I'll go over that too, after covering clean and smudge filters.
Before we dive into writing clean and smudge filters, and using them from Git—you'll need to know all of these details as you will have to write your own custom filters—we should start with a simple fact about Git commits: No part of any commit can ever be changed. Once you make a commit, the stuff that's inside the commit—the stored data in all of its files—is the way it is, forever. So these filters have to work within that system. Remember that, as it will help with understanding what we're doing.
How Git makes and stores objects
The files inside a commit are not files, exactly: they're not the same thing as files in your file system, at least. Instead, they are what Git calls objects, specifically blob objects. A blob object holds the file's data; other objects hold the file's name; and commit objects collect everything together to be used all at once. There's one more internal object type for annotated tags but we'll stop here as we're really only interested in the blob-object part.
When Git extracts a commit, it reads the internal blob objects and runs them through internal code to decompress and format them into regular files. This can include doing end-of-line hacking (turning LFs into CRLFs) if desired. Normally, all this happens entirely inside Git, and the end result is that Git writes out an ordinary everyday file for you to use. This ordinary file is what you will work on / with, in Xcode or any other editor and compiler system and so on. These ordinary files are in your working tree.
After you've extracted some commit, you'll do some work on it, by changing some or all of the files in your working tree, to achieve whatever result you wanted. This can include changing the buildSettings, editing Swift code, editing Objectionable-C Objective-C code, and so on. You might add all-new files to the working tree, some of which you never commit at all (you can help make sure this never happens by listing such files in .gitignore).
Eventually, though, you'd like to commit the updated code. To do so, you must run git add, or maybe have your IDE run git add for you (perhaps Xcode has clicky buttons to do this). This invokes code in Git that converts the working tree file(s) back to internal blob objects if and as needed.
Again, normally this is all handled entirely inside of Git. Git will read the working tree file, maybe do CRLF-to-LF-only changes, compress the text, search for duplicate objects, and do all the other complicated things necessary to prepare the file, so that it is ready to be committed. The resulting data need not match what's in your working tree at all: it just has to be something that, when Git later goes to extract the file, produces what you will need in your working tree.
Clean and smudge filters
This is where these clean and smudge filters come in. I said, above, that normally, Git does the extraction and insertion all on its own. For binary files, the only thing Git does here is apply lossless compression.1 For text files, Git can do CRLF/LF substitutions as well. But what if you'd like to do your own operations?
You can: Git will let you do whatever you want during the extract process with a smudge filter, and will let you do whatever you want during the compress process with a clean filter. The clean filter replaces the in-file data, using a stream-edit type process,2 and then Git does its CRLF hacking if any and compressing on the "cleaned" data. The smudge filter replaces the decompressed, post-CRLF-hacking data coming out of Git with the data that should go into the working tree.
Hence you can write, as your clean filter, a sed script of the form:
s/DEVELOPMENT_TEAM = .*;/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/
With that as the entire sed script, what sed will do is edit the incoming data stream and replace any actual development team text with the word DEVTEAMTEMPLATE.
Your smudge filter has to work slightly harder: it must find the template line and adjust it so that it contains the correct team ID. Where will you get the correct team ID? That's up to you: perhaps you can store it in a file in your home directory, or in a file that you create in the working tree but never commit in Git. You'll have to write this one or two or however-many-liner sed and/or shell script yourself.
1There are multiple phases of compression; git add does just one, and git checkout undoes all—including reading from "pack" files—as needed. The deeper level of compression, using delta encoding techniques, is entirely invisible at the "object" level, so nobody ever really has to think about it.
2With the advent of Git-LFS, Git gained the ability to run long-lived filters. Before that, Git always used simple stream filtering. The stream filtering is easier to understand, but is less efficient for doing en-masse operations on many files. Here, we're only interested in one file per repository anyway, so there's no need to go into the fancier long-lived filter details.
Defining clean and smudge filters
The tricky part here, with Git, is that you must define the filters in one place—in $HOME/.gitconfig or .git/config, for instance—and then tell Git to invoke them from another place, using the .gitattributes file. This is described in the gitattributes documentation. This documentation is pretty thorough, so read it. You can ignore all the long-running filter discussion, as noted above. I will quote one bit from the documentation here for emphasis, though, and expound on it:
Note that "%f" is the name of the path that is being worked on. Depending on the version that is being filtered, the corresponding file on disk may not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input.
When Git is running the smudge filter, it:
has opened some internal object (which may or may not be packed);
has decompressed it, or is in the process of decompressing it, and pumped / is-pumping out the data; and
this data is being fed to your filter, but is not written out to any file anywhere.
Your filter can use %f to know the name of the target output file, but the data are not in that file yet. The data bytes are only in some OS-level pipes or sockets or whatever your OS uses for connecting the output of one program (Git's internal decompressors) to another (your filter). Your smudge filter must read its standard input to get the data, and write the smudged data to standard output so that Git can read it (if necessary) and/or redirect that output to the correct file. Do not attempt to open the file by name!
(The same holds for the clean filter, except that in many cases, the input to your filter is just the raw data already in the file, so that opening the file and reading it mostly works. So this can mislead you, if you do your tests using a clean filter.)
Note that you can implement this scheme without a clean filter at all: your smudge filter can replace whatever is in the committed file even if it's a real team ID, rather than just a template. If you choose to do this, however, you'll "see" the team ID changing every time a different team-ID commits the file. The nice thing about using the clean filter is that once the committed copies of the file use the template line, every future cleaned file also uses the template line, so that it never changes.
Alternative: a template file
In general, it's unwise to commit actual configurations. Clean and smudge tricks can work, but they can only go so far: this particular file format works well because the change you want made is on a single line, and Git itself shows you file changes on a line-by-line basis, and sed works well with line-oriented input, and so on.
A lot of configuration files, though, wind up storing at least slightly-sensitive data, or perhaps very-sensitive data such as cleartext passwords. Such files should not be stored in Git at all if at all possible. Instead, you would store a template file in Git.
In this case, for instance, instead of storing MyProject.xcodeproj/project.pbxproj, you might have Git store MyProject.xcodeproj/project.pbxproj.template. This file would have template-ized contents. When you clone and check out the repository, you'd subsequently copy the template file into place and do any required adjustments.
Should the MyProject.xcodeproj/project.pbxproj file itself need to change, e.g., to acquire a new SWIFT_VERSION setting, you'd instead edit the template file, add that to Git, and commit. You would then use the usual "convert template to mine" process, or manually update the MyProject.xcodeproj/project.pbxproj file. Since this file is never committed—and is listed in .gitignore—it never goes into any commit and you never have to worry about collisions within it. Only the template file goes into Git.

Testing File/Folder Navigation and Manipulation

I am working on a module that supplies methods for navigating directories and manipulating files. Basically it will be a combination of the Dir and File classes, with options specific to the needs of a project I'm working on.
Right now I have started writing tests for some of these methods and things are getting messy.
Example
One of the methods I have is a tree function that returns a hash of files and folders where you can pass options like tree(only: 'folders', limit: 3). In order to test that it only goes down 3 levels, I would have to have 4+ subfolders with dummy files in them.
The Problem
Right now I'm testing on folders outside the project since the subfolders are already there, but I want to move away from this, especially considering the implausibility of testing on system files once I start testing methods equivalent to rm -rf (as well as the lack of portability).
I'm starting to think that I need to create a "lab rat" type folder that I do all my "experiments" on, but I have no clue how to approach creating it.
Do I create a function that creates the files?
Do I pull files and folders from another location?
Do I use some sort of "lorem ipsum" generator for file structures?
Do I make all these files and folders manually(ugh)?
Do I just mock and stub the hell out of everything and not actually create/delete the files and folders?(I don't see this happening)
So...
How would someone normally approach testing excessive amounts of file and folder manipulation?

I don't think you want to use mocks/stubs. The file system of your OS should be well tested and fast, so the benefit of mocks/stubs is minimal. Creating a mock/stub system increases the complexity without much benefit.
Here's my answers:
Do I create a function that creates the files?
Yes. You can create tests for these functions to make sure that they are correct. Instead of calling Dir and File, write helper functions that make the code simple and readable. Maybe you can share the helper functions between the source/test code...
Do I pull files and folders from another location?
Not sure what this is for...
Do I use some sort of "lorem ipsum" generator for file structures?
Yes, if you mean create functions that generate file structures.
Do I make all these files and folders manually(ugh)?
No.
Do I just mock and stub the hell out of everything and not actually create/delete the files and folders?(I don't see this happening)
No. One benefit of creating files/directories is that you can manually check what is going on and not be 100% dependent on the tests. This is actually a good approach because without it there could be a bug where both the source code and test code is not doing what you expect, but you wouldn't know because everything seems to be working.

Ruby library for manipulating XML with minimal diffs?

I have an XML file (actually a Visual C# project file) that I want to manipulate using a Ruby script. I want to read the XML into memory, do some work on them that includes changing some attributes and some text (fixing up some path references), and then write the XML file back out. This isn't so hard.
The hard part is, I want the file I write to look the same as the file I read in, except where I made changes. If the input file used double quotes, I want the output to use double quotes. If the input had a space before />, I want the output to do the same. Basically, I want the output to be the same as the input, except where I explicitly made changes (which, in my case, will only be to attribute values, or to the text content of an element).
I want minimal diffs because this project file is checked into version control -- and because the next time I make a change in Visual Studio, it's going to rewrite it in its preferred format anyway. I want to avoid checking in a bunch of meaningless diffs that will then be changed back again in the near future. I also want to avoid having to open the project in Visual Studio, make a change, and save, before I can commit my Ruby script's changes. I want my Ruby script to just make its changes, nothing more.
I originally just parsed the file with regexes, but ran into cases where I really needed an XML library because I needed to know more about child elements. So I switched to REXML. But it makes the following undesirable changes to my formatting:
It changes all the attributes from double quotes to single quotes.
It escapes all the apostrophes inside attribute values (changing them to &apos;).
It removes the space before />.
It sorts each element's attributes alphabetically, rather than preserving the original order.
I'm working around this by doing a bunch of gsub calls on REXML's output, but is there a Ruby XML-manipulation library that's a better fit for "minimal diff" scenarios?

You can build your own SAX parser (using Nokogiri, for example, it's very easy and I recommend to use it) to parse your XML file, change some data in it, and flush the processed XML file with your own customized, built from scratch, XML generator. The bad news is, you have to build a tiny XML library and generator routine in this case, so it is not an ordinary task.
Another way: don't build the SAX parser, but write an XML generator. Parse XML with your favourite library, change what you need to change and generate anything you want. You just need to recursively walk through all nodes in your document and output them within your conventions.

XCode - Editing xcodeproj bundle (specifically project.pbxproj)

I'm working in XCode and I've also written an external editor tool that generates resources for use in the project. In the best case scenario, the tool would edit the project.pbxproj file so that it includes the generated resources in the project. I've read through the file in an attempt to understand it, and it's mostly discernible but there is still one major question I have.
If I wanted to generate a new Group from outside XCode (or a new anything, for that matter), how do I know what ID code to use? For example: 19C28FACFE9D520D11CA2CBB is one of them from my project. How am I supposed to know what to use if I make my own? Do they just need to be unique? Would it be legal to just make one up: 000000000000000000000001 and 000000000000000000000002 and 000000000000000000000003 etc. ?
Any help on this would be wonderful. Thanks.

Yes, you can make your own. The best way would be to use a hash function such as MD5 or SHA1 to generate it then you can truncate it at the desired length. I would hash the name of the file/group along with a time stamp appended this way you get a more unique result.

How do you manage the String Translation Process?

I am working on a Software Project that needs to be translated into 30 languages. This means that changing any string incurs into a relatively high cost. Additionally, translation does not happen overnight, because the translation package needs to be worked by different translators, so this might take a while.
Adding new features is cumbersome somehow. We can think up all the Strings that will be needed before we actually code the UI, but sometimes still we need to add new strings because of bug fixes or because of an oversight.
So the question is, how do you manage all this process? Any tips in how to ease the impact of translation in the software project? How to rule the strings, instead of having the strings rule you?
EDIT: We are using Java and all Strings are internationalized using Resource Bundles, so the problem is not the internationalization per-se, but the management of the strings.

I'm not sure the platform you're internationalizing in. I've written an answer before on the best way to il8n an application. See What do I need to know to globalize an asp.net application?
That said - managing the translations themselves is hard. The problem is that you'll be using the same piece of text across multiple pages. Your framework may not, however, support only having that piece of text in one file (resource files in asp.net, for instance, encourage you to have one resource file per language).
The way that we found to work with things was to have a central database repository of translations. We created a small .net application to import translations from resource files into that database and to export translations from that database to resource files. There is, thus, an additional step in the build process to build the resource files.
The other issue you're going to have is passing translations to your translation vendor and back. There are a couple ways for this - see if your translation vendor is willing to accept XML files and return properly formatted XML files. This is, really, one of the best ways, since it allows you to automate your import and export of translation files. Another alternative, if your vendor allows it, is to create a website to allow them to edit the translations.
In the end, your answer for translations will be the same for any other process that requires repetition and manual work. Automate, automate, automate. Automate every single thing that you can. Copy and paste is not your friend in this scenario.

Pootle is an webapp that allows to manage translation process over the web.

There are a number of major issues that need to be considered when internationalizing an application.
Not all strings are created equally. Depending upon the language, the length of a sentence can change significantly. In some languages, it can be half as long and in others it can be triple the length. Make sure to design your GUI widgets with enough space to handle strings that are larger than your English strings.
Translators are typically not programmers. Do not expect the translators to be able to read and maintain the correct file formats for resource files. You should setup a mechanism where you can transform the translated data round trip to your resource files from something like an spreadsheet. One possibility is to use XSL filters with Open Office, so that you can save to Resource files directly in a spreadsheet application. Also, translators or translation service companies may already have their own databases, so it is good to ask about what they use and write some tools to automate.
You will need to append data to strings - don't pretend that you will never have to or you will always be able to put the string at the end. Make sure that you have a string formatter setup for replacing placeholders in strings. Furthermore, make sure to document what are typical values that will be replaced for the translators. Remember, the order of the placeholders may change in different languages.
Name your i8n string variables something that reflects their meaning. Do you really want to be looking up numbers in a resource file to find out what is the contents of a given string. Developers depend on being able to read the string output in code for efficiency a lot more than they often realize.
Don't be afraid of code-generation. In my current project, I have written a small Java program that is called by ant that parses all of the keys of the default language (master) resource file and then maps the key to a constant defined in my localization class. See below. The lines in between the //---- comments is auto-generated. I run the generator every time I add a string.
public final class l7d {
...normal junk
/**
* Reference to the localized strings resource bundle.
*/
public static final ResourceBundle l7dBundle =
ResourceBundle.getBundle(BUNDLE_PATH);
//---- start l7d fields ----\
public static final String ERROR_AuthenticationException;
public static final String ERROR_cannot_find_algorithm;
public static final String ERROR_invalid_context;
...many more
//---- end l7d fields ----\
static {
//---- start setting l7d fields ----\
ERROR_AuthenticationException = l7dBundle.getString("ERROR_AuthenticationException");
ERROR_cannot_find_algorithm = l7dBundle.getString("ERROR_cannot_find_algorithm");
ERROR_invalid_context = l7dBundle.getString("ERROR_invalid_context");
...many more
//---- end setting l7d fields ----\
}
The approach above offers a few benefits.
Since your string key is now defined as a field, your IDE should support code completion for it. This will save you a lot of type. It get's really frustrating looking up every key name and fixing typos every time you want to print a string.
Someone please correct me if I am wrong. By loading all of the strings into memory at static instantiation (as in the example) will result in a quicker load time at the cost of additional memory usage. I have found the additional amount of memory used is negligible and worth the trade off.

The localised projects I've worked on had 'string freeze' dates. After this time, the only way strings were allowed to be changed was with permission from a very senior member of the project management team.
It isn't exactly a perfect solution, but it did enable us to put defects regarding strings on hold until the next release with a valid reason. Once the string freeze has occured you also have a valid reason to deny adding brand new features to the project on 'spur of the moment' decisions. And having the permission come from high up meant that middle managers would have no power to change specs on your :)

If available, use a database for this. Each string gets an id, and there is either a table for each language, or one table for all with the language in a column (depending on how the site is accessed the performance dictates which is better). This allows updates from translators without trying to manage code files and version control details. Further, it's almost trivial to run reports on what isn't translated, and keep track of what was an autotranslation (engine) vs a real human translation.
If no database, then I stick each language in a separate file so version control issues are reduced. But the structure is basically the same - each string has an id.
-Adam

Not only did we use a database instead of the vaunted resource files (I have never understood why people use something like that which is a pain to manage, when we have such good tools for dealing with databases), but we also avoided the need to tag things in the application (forgetting to tag controls with numbers in VB6 Forms was always a problem) by using reflection to identify the controls for translation. Then we use an XML file which translates the controls to the phrase IDs from the dictionary database.
Although the mapping file had to be managed, it could still be managed independent of the build process, and the translation of the application was actually possible by end-users who had rights in the database.

The solution we came up to so far is having a small application in Excel that reads all the property files, and then shows a matrix with all the translations (languages as headers, keys as rows). It is quite evident what is missing then. This is send to the translators. When it comes back, then the sheet can be processed to generate the same property bundles back again. So far it has eased the pain somewhat, but I wonder what else is around.

This google book - resource file management gives some good tips
You can use Resource File Management software to keep track of strings that have changed and control the workflow to get them translated - otherwise you end up in a mess of freezes and overbearing version control
Some tools that do this sort of thing - no connection an I haven't actually used them, just researching
http://www.sisulizer.com/
http://www.translationzone.com/en/products/

I put in a makefile target that finds all the .properties files and puts them in a zip file to send off to the translators. I offered to send them just diffs, but for some reason they want the whole bundle of files each time. I think they have their own system for tracking just differences, because they charge us based on how many strings have changed from one time to the next. When I get their delivery back, I manually diff all their files with the previous delivery to see if anything unexpected has changed - one time all the PT_BR (Brazillian Portuguese) strings changed, and it turns out they'd used a PT_PT (Portuguese Portuguese) translator for that batch in spite of the order for PT_BR.

In Java, internationalization is accomplished by moving the strings to resource bundles ... the translation process is still long and arduous, but at least it's separated from the process of producing the software, releasing service packs etc. One thing that helps is to have a CI system that repackages everything any time changes are made. We can have a new version tested and out in a matter of minutes whether it's a code change, new language pack or both.

For starters, I'd use default strings in case a translation is missing. For example, the English or Spanish value.
Secondly, you might want to consider a web app or something similar for your translators to use. This requires some resources upfront, but at least you won't need to send files around and it will be obvious for the translators which strings are new, etc.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio