cvs2svn changes '$Date' string in source code

I converted two of our group's CVS repositories and loaded them into SVN. But I found that some files were changed!
For example,
static char rcsid[] =
"$Revision: 1.1.1.1 $$Date: 2007/06/25 16:56:59 $";
was changed to
static char rcsid[] =
"$Revision: 1.1.1.1 $$Date: 2007-06-25 16:56:59 $";
These are actual strings, not comments. (Some other changed strings are in comments, which is OK.)
So why would cvs2svn do that, and how can I prevent it from doing so?
I added the --keywords-off option, but that didn't make a difference.
Thanks for any help!

CVS expands keywords (like $Date$) to the right value when you check the file out, not (as you might guess) when you check the file in. Moreover, different versions of CVS expand dates in different styles. Prior to CVS 1.12, dates were expanded using slashes, like 2007/06/25. Starting with CVS 1.12, dates have been expanded with dashes, like 2007-06-25.
The date format that you are seeing when you check the file out of Subversion is the result of keyword expansion by Subversion. AFAIK Subversion always expands the dates in the new style, with dashes. So the reason that the strings look different is that Subversion uses a different date expansion style, not because of anything that cvs2svn does.
When you specify --keywords-off, cvs2svn leaves the keywords expanded as they were in CVS, namely in the form they had when they were checked in. Usually that means they are in the format they had the last time the file was checked out, with the values reflecting the previous revision of the file. This is rarely useful.
The only way to get the date strings in the format that you expected would be to have cvs2svn expand the date strings itself and turn SVN keyword expansion off. You would also have to configure cvs2svn to use the "old date format" for the expansion, which can be set by calling _KeywordExpander.use_old_date_format() (or by editing the file cvs2svn_lib/keyword_expander.py). But then, presumably, you would want to turn keyword expansion back on post-conversion, so that subsequent Subversion revisions have their keywords expanded, too. So after the conversion, you would have to set the svn:keywords property on any file containing keywords, and you would also have to manually re-collapse the keywords (e.g., edit $Date: 2007/06/25 16:56:59 $ back to $Date$) in those files. All in all, this would be quite tricky to configure and is probably not worth the effort.
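For illustration only, here is a rough sketch of what that part of a cvs2svn options file might look like, assuming you drive the conversion with an options file (cvs2svn --options=my-options.py). The import path and class name come from the cvs2svn source mentioned above; the rest of the options file you would still have to fill in yourself.
# my-options.py -- fragment of a cvs2svn options file (sketch, not a complete file)
from cvs2svn_lib.keyword_expander import _KeywordExpander
# Expand $Date$ with slashes (2007/06/25) rather than dashes (2007-06-25).
_KeywordExpander.use_old_date_format()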

Related

WinMerge: How to compare files with the same content but different encodings?

Motivation: I am rewriting a doc -- text files to be processed later. The new sources now use UTF-8. Large portions of the sources are the same. I need to find differences.
Details: The old doc sources use the cp1250 encoding; the new sources use UTF-8. Both new and old sources use the same line endings (CR+LF). I am using the Unicode version of the WinMerge application (WinMergeU.exe), version 2.12.4.0.
It almost works, but... When the lines differ, they are initially marked as a block in dark yellow, and the differing portions are marked in a lighter colour. When moving the red block cursor there, the panes below show the differing parts.
However, a block of text is also marked in dark yellow in cases when (the Unicode representation of) the text is the same. The red block cursor also moves to those portions of the files. In that case, the two panes below (that show the differences) contain the same text and nothing is marked as different. See the picture below:
The very first line differs -- this is OK. But the second line has visually the same content. The only character outside the ASCII range there is Ú. It has a different representation in the encoded sources. This causes the line to be marked as different, but the panes below do not mark anything on the line as different.
See also the following paragraphs, which are exactly the same (only the encoding of the sources differs; the same line ending is used).
It looks as if the initial comparison were based on the binary representation of the lines. Is there any setting to tell WinMerge that the comparison (I mean the block marking) should be based on the Unicode content?
I tried hard, but no luck, yet.
Update: The above question was for the latest stable 2.12.4. The beta version 2.13.22 works just perfectly for me. See my answer below.
This doesn't really answer your question about WinMerge, but have you considered using another diff program? One of my favorites is KDiff3 - http://kdiff3.sourceforge.net/
When I do a compare in KDiff3 using one UTF-8 file and another Unicode file, I get the following:
Here is the compare screen - note that the encodings on the files are different, but the files are considered to be equal from a text standpoint:
I think it really should not be the task of a merge tool to allow the merging of files stored in different encodings.
An encoding is a function that maps bytes (stored on the disk or in memory) to characters (displayed on screen). Unfortunately, by default the encoding of a file is not stored together with the file. Therefore, any program that wants to open the file and display its contents needs to guess the encoding. While this sometimes works, it is also an error prone procedure.
Now, the character sets of different encodings do not overlap in general. So what is the merge tool supposed to do if you merge a character C from file A in encoding X into a file B in encoding Y, if character C is not part of the character set of encoding Y?
Thus, I think the task of a merge tool should be to merge the binary content. Anything else is a dirty hack and doomed to fail at some level. (A merge tool maker may decide to provide character-level merging, which might also work most of the time, but there is some guesswork involved.)
Therefore, I'd also recommend you first translate the old files to UTF-8 and then merge those with the new versions.
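If you go that route, the re-encoding itself is straightforward. Here is a minimal Python sketch, assuming the old sources live under old_doc_sources and use a .txt extension (both are placeholders); newline="" is used so the CR+LF line endings pass through untouched.
from pathlib import Path

for path in Path("old_doc_sources").rglob("*.txt"):
    # Read as cp1250, write back as UTF-8; newline="" keeps CR+LF exactly as-is.
    with open(path, "r", encoding="cp1250", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)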
Just for your information: the question above was about the latest stable version, 2.12.4. I have tried the beta version 2.13.22, and it works just perfectly for me. See the difference for exactly the same files -- only the first lines in the files were removed. (My big thanks to the authors.)
Edit -> Options
Select 'Compare' from categories pane on left.
Check box 'Ignore carriage return differences' (UNIX, Windows, Mac)
I would recommend converting the files to the same encoding before diffing.
If you are working with a version control system I'd recommend the following:
Create a fresh checkout of the files
Convert all files to UTF-8
Commit the files
Copy your new files over
Use WinMerge
That way you end up with two commits in the history - one for the encoding change and another for the content changes - and WinMerge will work as expected.
What about the option File -> File Encoding... in WinMerge? It allows you to set the encoding for each file independently.

Can I automatically update msgids in gettext's .po files for trivial text changes?

With gettext, the original (usually English) text of messages serves as
the message key ("msgid") for the translations. This means that every time the
original text changes, the msgid must be updated in all the .po files.
For real changes of the text, this is obviously unavoidable, as the
translator must update the translation.
However, if the change to the original does not change its meaning,
re-translation is superfluous (e.g. a change in punctuation, whitespace
changes, or correction of a spelling mistake).
Is there a way to update the .po files automatically in that case?
I tried to use xgettext & msgmerge (with fuzzy matching turned on), but
fuzzy matching sometimes fails, plus this produces lots of ugly
"#,fuzzy" flags.
Note: There is a similar question:
How to efficiently work with gettext PO files when making small edits to large text values
However, it's about large strings, thus about a more specific problem.
One way to avoid the problem is to leave the msgids alone, have a .po file for the original language and make the fix inside that.
It always strikes me as more of a workaround than a proper fix, though. For the next iteration (where there will definitely be more msgid changes), the msgid is changed anyway, and either the translators translate it in their usual update or each language is updated by hand when the msgid is changed.
I've had exactly this issue when doing minor changes to a django project. What I do is the following:
Change message in code.
Run find and replace on all translation files ("django.po"), replacing the old message (msgid) with the new one.
Run django-admin makemessages.
If I have done things right, the last step is superfluous (i.e., you have already made the change for gettext). Django uses the gettext utilities, so it shouldn't matter how you make your message files.
I find and replace like so:
find . -name "*.po" -print | xargs sed -i 's/oldmessageid/newmessageid/g'
Courtesy of http://rushi.vishavadia.com/blog/find-replace-across-multiple-files-in-linux
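If the old or new msgid contains characters that sed treats specially (slashes, dots, brackets), a short Python script can be less error-prone. This is only a sketch with placeholder strings: it handles single-line msgids only and rewrites the files in place, so commit first.
from pathlib import Path

OLD = 'msgid "Teh old text"'   # placeholder: the msgid as it currently appears
NEW = 'msgid "The old text"'   # placeholder: the corrected msgid

for po in Path(".").rglob("*.po"):
    content = po.read_text(encoding="utf-8")
    if OLD in content:
        po.write_text(content.replace(OLD, NEW), encoding="utf-8")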

Should environment variables that contain an executable path with spaces also contain the necessary quotes?

When defining an environment variable (on Windows for me, maybe there is a more general guideline)
set MY_TOOL=C:\DevTools\bin\mytool.exe
if the tool is located on a path with spaces
set MY_TOOL=C:\Program Files (x86)\Foobar\bin\mytool.exe
should the environment variable already contain the necessary quotes?
That is, should it read:
set MY_TOOL="C:\Program Files (x86)\Foobar\bin\mytool.exe"
instead of the above version without quotes?
Note: In light of Joey's answer, I really should narrow this question to the examples I gave. That is, environment variables that contain one single (executable / batch) tool to be invoked by a user or by another batch script.
Maybe the spaces should be escaped differently?
I'd say, do it without quotes and use them everywhere you use the variable:
set MY_TOOL=C:\Program Files (x86)\Foobar\bin\mytool.exe
"%MY_TOOL%" -someoption someargument somefile
Especially if you let the user set the value somewhere, I guess this is the safest option, since they usually tend not to surround it with quotes.
If there are plenty of places where you use the variable you can of course redefine:
set MY_TOOL="%MY_TOOL%"
which makes things more resilient for you. Optionally, you could detect whether quotes are already present and add them if not, to be totally sure.
When your variable represents only a path to a directory and you want to append file names to it, then the "no quotes" approach is even more important; otherwise you'd be building paths like
"C:\Program Files (x86)\Foobar\bin"\mytool.exe
or even:
""C:\Program Files (x86)\Foobar\bin"\my tool with spaces.exe"
which I doubt will parse correctly.
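If the consumer of the variable is a script rather than a human, it can also tolerate both conventions by stripping any surrounding quotes before using the value. Here is a small Python sketch of that idea (the option and argument names are just taken from the example above):
import os
import subprocess

# Accept MY_TOOL with or without surrounding quotes.
tool = os.environ.get("MY_TOOL", "").strip('"')

if tool:
    # Passing the path as a separate list element avoids manual quoting,
    # even when the path contains spaces.
    subprocess.run([tool, "-someoption", "someargument", "somefile"])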
The command shell can answer your question: type C:\Pro and hit the tab key.
Autocomplete will leave all spaces as-is and add quotes around the filename. So, this is what is "officially" expected.
(this assumes that autocomplete is turned on, I'm not sure whether the default is on or off, but most people have it on anyway, I guess)

Purpose of "$Id: ..." line in the header of source files

/* $Id: file.c,v 1.0 2010/09/15 01:12:10 username Exp $ */
I find this line in many source code files, in the comment at the top (header) of the file. Why? Is it aimed at the version control software? Thanks.
These sorts of comments are automatically modified by various source code control systems to include things like the author, date, history, and so forth.
See here for some common ones for RCS, which is the first source code control system I ever saw implement this sort of thing (that doesn't mean it was the first, just that RCS was the first I ever used and it had that capability).
One particular trick we used to use was to put the line:
static char *fileId = "$Id: $";
into the source file (and header files as well, although the names had to be unique) so that, when it was built, it would automatically have the IDs of the files in the executable itself.
Then we could use something like strings to find out which source files were used to build the executable. Ideal for debugging problems in the field.
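As a rough illustration of that trick (not the tool we actually used), a few lines of Python can pull the expanded $Id$ strings back out of a built binary, much like running strings and grepping for $Id:
import re
import sys

# Usage sketch: python find_ids.py ./myprogram
with open(sys.argv[1], "rb") as f:
    data = f.read()

# Printable ASCII runs of the form "$Id: ... $" embedded in the binary.
for match in re.finditer(rb"\$Id: [ -~]*?\$", data):
    print(match.group().decode("ascii"))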
It tells CVS (and other VCSs) to expand the value of the Id at check-out time, so anybody reading the source file in question will know exactly what version was checked out for it. It's not very popular any more (you can always ask your VCS for such info if you keep the source file in a client / repository / working directory -- or however else your VCS calls such things ;-).
I believe you are correct. It appears to be a keyword substitution string for CVS.
Take a look at this question $id: name of file, date/time creation Exp $

Are there any invalid linux filenames?

If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
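Put differently, a check for "could this string name a file on Linux" has only a few rules. Here is a quick Python sketch; the 255-byte limit is NAME_MAX on common filesystems, so treat it as an assumption rather than a POSIX guarantee.
def is_valid_linux_filename(name: str) -> bool:
    # A single path component, not a full path.
    if name in ("", ".", ".."):            # empty, or entries that always exist as directories
        return False
    if "/" in name or "\0" in name:        # the only forbidden characters
        return False
    return len(name.encode("utf-8")) <= 255    # NAME_MAX on most filesystems

print(is_valid_linux_filename("this-is-a-filename.png"))  # True
print(is_valid_linux_filename("bad/name.png"))            # False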
Technically it's not invalid, but files with a dash (-) at the beginning of their name will cause you a lot of trouble, because such names conflict with command-line arguments.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take Amarok, for example. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artist) were represented with a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions, which I could then rename with ASCII-only characters using MusicBrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
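For what it's worth, that kind of trickery can also be done in Python by listing the directory with the bytes API, so undecodable names don't raise errors. This sketch assumes the broken names are really latin-1 (0xFC is 'ü', as in the Einstürzende example above) and the directory path is purely illustrative.
import os

MUSIC = b"Music/Albums"                    # placeholder directory
for raw in os.listdir(MUSIC):              # bytes API: no decode errors
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        fixed = raw.decode("latin-1").encode("utf-8")   # reinterpret, then re-encode
        os.rename(os.path.join(MUSIC, raw), os.path.join(MUSIC, fixed))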
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux where certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.
