Compact representation of many ofstreams in C++11

I'm writing code that does a lot of file I/O.
I have to write to 10 files and read them back as well.
The code is as follows:
ofstream of1("file1");
ofstream of2("file2");
ofstream of3("file3");
...
ofstream of10("file10");
This seems like messy code.
Is there any way to represent those in a compact format?
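
One possible way to make this more compact, since std::ofstream is movable in C++11, is to keep the streams in a container and open them in a loop (a sketch; the file names match the question, the rest is illustrative):

#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::ofstream> files;
    for (int i = 1; i <= 10; ++i)
        files.emplace_back("file" + std::to_string(i)); // opens file1 .. file10

    files[0] << "written to file1\n"; // index into the vector instead of of1..of10
}

The same idea works with std::ifstream for reading the files back.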

Related

Standard sizeof macro for primitive types

Are there any standard macros that can be used to identify the size of a primitive type at compile time? Similar to the ones in GCC:
__SIZEOF_INT__
__SIZEOF_LONG__
__SIZEOF_LONG_LONG__
__SIZEOF_SHORT__
__SIZEOF_POINTER__
__SIZEOF_FLOAT__
__SIZEOF_DOUBLE__
__SIZEOF_LONG_DOUBLE__
__SIZEOF_SIZE_T__
I remember seeing something similar somewhere but for the life of me I can't find or remember their name anymore. The one I'm mostly interested in is the long type.
There are no standard macro definitions for the sizes of primitive types.
In boost/atomic there are macros giving you the sizes of primitive types; they use boost/cstdint.hpp among other sources. An example would look like this:
#include <iostream>
#include <boost/atomic.hpp>

int main()
{
    std::cout << BOOST_ATOMIC_DETAIL_SIZEOF_LONG; // prints the size of long in bytes
}
Reference:
http://www.boost.org/doc/libs/1_60_0/boost/atomic/detail/int_sizes.hpp
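
If the goal is just a compile-time check of a type's size, plain sizeof with static_assert is standard C++11 and needs no library (a sketch of that alternative, not part of the Boost answer above):

#include <climits>

// Fails to compile with the given message if the assumptions do not hold.
static_assert(sizeof(long) == 8, "this code assumes a 64-bit long");
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

int main() {}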

What does _setmode actually translate?

I have done some research on getting UTF-8/16 to work properly in cmd.exe. I've found these articles:
https://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
https://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/
http://www.siao2.com/2008/03/18/8306597.aspx
and also this SO question: Output unicode strings in Windows console app
The life-saving function is _setmode which causes cmd.exe to Just Work™. But what does it actually do? The first article states that
The Visual C++ runtime library can convert automatically between internal UTF-16 and external UTF-8, if you just ask it to do so by calling the _setmode function with the appropriate file descriptor number and mode flag. E.g., mode _O_U8TEXT causes conversion to/from UTF-8.
That's all nice, but the following (to me) sort of contradicts it.
Let's take this simple program:
#include <fcntl.h>
#include <io.h>
#include <cstdio>   // getchar
#include <iostream>

int main(void)
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"привет śążź Ειρήνη";
    // yes, wcout; I can use both wprintf and wcout, they both seem to have the same effect
    getchar();
    return 0;
}
This prints to console properly (provided we select the right font, of course); without the _setmode call I get garbage. But what is actually being translated here? What does the function really do? Does it convert FROM UTF-16 to whatever codepage the console is using? Windows uses UTF-16 internally, why is a conversion needed in the first place?
Furthermore, if I change the second parameter to _O_U8TEXT, the program works just as well as with _O_U16TEXT, which confuses me further; the UTF-16 representation of и is very different from the UTF-8 one, so how come this still works?
I should mention that I'm using Visual Studio 2015 (MSVC 14.0) and the source file is encoded as UTF-8 with BOM.

Purpose of using Windows Data Types in a program

I am trying to understand the purpose of using Windows data types when defining parameters of a function or structure fields in a particular language. I've read explanations detailing how this prevents code from "breaking" if "underlying types" are changed. Can someone present a concise explanation and example to clarify? Thanks.
Found answer in a similar post (Why are the standard datatypes not used in Win32 API?):
And the reason that these types are defined the way they are, rather than using int, char and so on is that it removes the "whatever the compiler thinks an int should be sized as" from the interface of the OS. Which is a very good thing, because if you use compiler A, or compiler B, or compiler C, they will all use the same types - only the library interface header file needs to do the right thing defining the types.
By defining types that are not standard types, it's easy to change int from 16 to 32 bit, for example. The first C/C++ compilers for Windows were using 16-bit integers. It was only in the mid to late 1990's that Windows got a 32-bit API, and up until that point, you were using int that was 16-bit. Imagine that you have a well-working program that uses several hundred int variables, and all of a sudden, you have to change ALL of those variables to something else... Wouldn't be very nice, right - especially as SOME of those variables DON'T need changing, because moving to a 32-bit int for some of your code won't make any difference, so no point in changing those bits.
It should be noted that WCHAR is NOT the same as const char - WCHAR is a "wide char" so wchar_t is the comparable type.
So, basically, the "define our own type" approach is a way to guarantee that it's possible to change the underlying compiler architecture without having to change (much of) the source code. All larger projects that do machine-dependent coding do this sort of thing.
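
To illustrate the idea, a simplified sketch (these are not the actual <windows.h> definitions, and the API function is made up):

// Simplified illustration only; the real Windows headers are more involved.
typedef long          LONG;    // pinned to 32 bits on Windows, whatever the compiler calls int
typedef unsigned long DWORD;   // 32-bit unsigned
typedef wchar_t       WCHAR;   // "wide char", the wchar_t counterpart mentioned above
typedef const WCHAR*  LPCWSTR; // pointer to a read-only wide string

// Hypothetical API function: its signature stays stable across compilers,
// because only the typedefs above would need to change.
void SetWindowTitleExample(LPCWSTR title, DWORD flags);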

How to print UTF-8 strings without using platform specific functions?

Is it possible to print UTF-8 strings without using platform specific functions?
#include <iostream>
#include <locale>
#include <string>

using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    wcout.imbue(locale("en_US.UTF-8")); // broken on Windows (?)

    wstring ws1 = L"Wide string.";
    wstring ws2 = L"Wide string with special chars \u20AC"; // Euro character

    wcout << ws1 << endl;
    wcout << ws2 << endl;
    wcout << ws1 << endl;
}
I get this runtime error:
terminate called after throwing an instance of 'std::runtime_error'
what(): locale::facet::_S_create_c_locale name not valid
If I remove the line wcout.imbue(locale("en_US.UTF-8"));, I get only ws1 printed, and just once.
In another question ("How can I cin and cout some unicode text?"), Philipp writes:
"wcin and wcout don't work on Windows, just like the equivalent C functions. Only the native API works." Is it true form MinGW, too?
Thank you for any hint!
Platform:
MinGW/GCC
Windows 7
I haven't used gcc in a mingw environment on Windows, but from what I gather it doesn't support C++ locales.
Since it doesn't support C++ locales this isn't really relevant, but FYI, Windows doesn't use the same locale naming scheme as most other platforms. They use a similar language_country.encoding, but the language and country are not codes, and the encoding is a Windows code page number. So the locale would be "English_United States.65001", however this is not a supported combination (code page 65001 (UTF-8) isn't supported as part of any locale).
The reason that only ws1 prints, and only once, is that when the character \u20AC is printed, the stream fails and the fail bit is set. You have to clear the error before anything further will be printed.
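
In other words, the write that fails has to be followed by a clear() before later output shows up again (a sketch of that recovery step):

wcout << ws2 << endl; // converting \u20AC fails here and sets the fail bit
wcout.clear();        // clear the error state
wcout << ws1 << endl; // now this prints again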
C++11 introduced some things that will portably deal with UTF-8, but not everything is supported yet, and the additions don't completely solve the problem. But here's the way things currently stand:
When char16_t and char32_t are supported in VS as native types rather than typedefs you will be able to use the standard codecvt facet specializations codecvt<char16_t,char,mbstate_t> and codecvt<char32_t,char,mbstate_t> which are required to convert between UTF-16 or UTF-32 respectively, and UTF-8 (rather than the execution charset or system encoding). This doesn't work yet because in the current VS (and in VS11DP) these types are only typedefs and template specializations don't work on typedefs, but the code is already in the headers in VS 2010, just protected behind an #ifdef.
The standard also defines some special purpose codecvt facet templates which are supported, codecvt_utf8, and codecvt_utf8_utf16. The former converts between UTF-8 and either UCS-2 or UCS-4 depending on the size of the wide char type you use, and the latter converts between UTF-8 and UTF-16 code units independent of the size of the wide char type.
std::wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
std::wcout << L"ØÀéîðüýþ\n";
This will output UTF-8 code units through whatever is attached to wcout. If output has been redirected to file then opening it will show a UTF-8 encoded file. However, because of the console model on Windows, and the way the standard streams are implemented, you will not get correct display of Unicode characters in the command prompt this way (even if you set the console output code page to UTF-8 with SetConsoleOutputCP(CP_UTF8)). The UTF-8 code units are output one at a time, and the console will look at each individual chunk passed to it expecting each chunk (i.e. single byte in this case) passed to be complete and valid encodings. Incomplete or invalid sequences in the chunk (every byte of all multibyte character representations in this case) will be replaced with U+FFFD when the string is displayed.
If instead of using iostreams you use the C function puts to write out an entire UTF-8 encoded string (and if the console output code page is correctly set) then you can print a UTF-8 string and have it displayed in the console. The same codecvt facets can be used with some other C++11 convenience classes to do this:
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
puts(convert.to_bytes(L"ØÀéîðüýþ\n").c_str());
The above is still not quite portable, because it assumes that wchar_t is UTF-16, which is the case on Windows but not on most other platforms, and it is not required by the standard. (In fact my understanding is that it's not technically conforming because UTF-16 needs multiple code units to represent some characters and the standard requires that all characters in the chosen encoding must be representable in a single wchar_t).
std::wstring_convert<std::codecvt_utf8<wchar_t>,wchar_t> convert;
The above will portably handle UCS-4 and UCS-2, but won't work outside the Basic Multilingual Plane on platforms using UTF-16.
You could use the conditional type trait to select between these two facets based on the size of wchar_t and get something that mostly works:
std::wstring_convert<
std::conditional<sizeof(wchar_t)==2,std::codecvt_utf8_utf16<wchar_t>,
std::codecvt_utf8<wchar_t>
>::type,
wchar_t
> convert;
Or just use preprocessor macros to define an appropriate typedef, if your coding standards allow macros.
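
For example, a preprocessor-based sketch (WCHAR_MAX comes from <cwchar>; the typedef name is made up):

#include <codecvt>
#include <cwchar>
#include <locale>

#if WCHAR_MAX > 0xFFFF
typedef std::codecvt_utf8<wchar_t> wide_utf8_codecvt;       // wchar_t is wider than 16 bits (UCS-4)
#else
typedef std::codecvt_utf8_utf16<wchar_t> wide_utf8_codecvt; // wchar_t holds UTF-16 code units
#endif

std::wstring_convert<wide_utf8_codecvt, wchar_t> convert;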
Windows support for UTF-8 is pretty poor, and whilst it's possible to do it using the Windows API, it's not at all fun. Also, your question specifies that you DON'T want to use platform-specific functions...
As for doing it in 'standard C++', I'm not sure if it's possible under Windows without platform-specific code. HOWEVER, there are numerous third-party libraries available which will abstract away these platform details and allow you to write portable code.
I have recently updated my applications to use UTF-8 internally with the help of the Boost.Locale library.
http://www.boost.org/doc/libs/1_48_0/libs/locale/doc/html/index.html
Its locale generation class will allow you to generate a UTF-8 based locale object which you can then imbue into all the standard streams etc.
I am using this right now under both MSVC and GCC via MinGW-w64 successfully! I highly suggest you check it out. Yes, unfortunately it's not technically 'standard C++', however Boost is available pretty much everywhere, and is practically a de-facto standard, so I don't think that's a huge concern.
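
For reference, a minimal sketch of that approach based on the Boost.Locale documentation (the locale name is an example, and the program must be linked against Boost.Locale):

#include <boost/locale.hpp>
#include <iostream>

int main()
{
    boost::locale::generator gen;         // creates locale objects
    std::locale loc = gen("en_US.UTF-8"); // UTF-8 based locale
    std::locale::global(loc);             // make it the global locale
    std::cout.imbue(loc);                 // imbue the standard streams with it
    std::cout << "UTF-8 text: Grüßen\n";
}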

Parsing CArchive (MFC classes) files in Ruby

I have a legacy app that seems to be exporting/saving files with CArchive (legacy MFC application).
We're currently refactoring the tool for the web. Is there a library I can look at in Ruby for parsing and loading these legacy files?
What possible libraries could I look into?
Problems with the file format, according to XML serialization for MFC, include:
- Non-robustness: your program will probably crash if you read an archive produced by another version of your program. This can be avoided by complex and unwieldy version management. By using XML, this can be largely avoided.
- Heavy dependencies between your program object model and the archived data. Change the program model and it is almost impossible to read data from a previous version.
- Archived data cannot be edited, understood, or changed, except with the associated application.
Also, 4 versions of the legacy software exist; how would I be able to overcome this object model / archived data problem for the different versions? Full backward (import) capability is required.
CArchive doesn't have a format that you can parse. It's just a binary file. You have to know what is in it to know how to read it. A library could make it easier to read some data types (CString, CArray, etc.) but I'm not sure you'll find anything like this.
CArchive works like this (storing part):
CFile file(_T("data.bin"), CFile::modeCreate | CFile::modeWrite);
CArchive ar(&file, CArchive::store); // CArchive always writes through a CFile

int i = 5;
float f = 5.42f;
CString str("string");
ar << i << f << str;
Then all this is dumped into binary file. You would have to read binary data and somehow interpret it. This is easy in C++ because MFC knows how to serialize types, including complex types like CString and CArray. But you'll have to do this on your own using Ruby.
For example you might read 4 bytes (because you know that int is that big) and interpret it as integer. Next four bytes for float. And then you have to see how to load CString, it stores the length first and then data, but you'll have to take a look at the exact format it uses. You could create utility functions for each type to make your life easier but don't expect this to be simple.
You could write an exporter in C++ using the old functionality that would read in the CArchive and then output an XML file or whatever of the contents. Reading CArchives directly from Ruby (or any other language than C++/MFC) is going to be a major project. Maybe you can get away with it if the data that is written is just a struct with a few ints or longs, but as soon as your CArchive contains UDTs you're in for a world of pain. For example, I don't even think CArchive makes promises on alignment.
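A sketch of such an exporter, staying in C++/MFC (the file names, field names, and XML layout are made up; the read order must match the order used when storing):

// Hypothetical exporter: reads values back in the order they were stored
// and dumps them as simple XML for the new tool to consume.
void ExportLegacyArchiveToXml()
{
    CFile in(_T("data.bin"), CFile::modeRead);
    CArchive ar(&in, CArchive::load);

    int i;
    float f;
    CString str;
    ar >> i >> f >> str; // same order as the << example above

    CStdioFile out(_T("data.xml"), CFile::modeCreate | CFile::modeWrite | CFile::typeText);
    CString line;
    line.Format(_T("<record><i>%d</i><f>%f</f><str>%s</str></record>\n"), i, f, (LPCTSTR)str);
    out.WriteString(line);
}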
