"Extra content at the end of the document" error using libxml2 to read from file handle created with shm_open - shared-memory

I'm trying to write a unit test that checks some xml parsing code. The unit test creates a file descriptor on an in-memory xml doc using shm_open and then passes that to xmlTextReaderForFd(). But I'm getting an "Extra content at the end of the document" error on the subsequent xmlTextReaderRead(). The parsing code works fine on a file descriptor created from an actual file (I've done a byte-for-byte comparison with the shm_open created one and it's the exact same set of bytes.) Why is libxml2 choking on a file descriptor created with shm_open?
Here's my code:
void unitTest() {
int fd = shm_open("/temporary", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
char *pText = "<?xml version=\"1.0\"?><foo></foo>";
write(fd, pText, strlen(pText) + 1);
lseek(fd, 0, SEEK_SET);
xmlTextReaderPtr pReader = xmlReaderForFd(
fd, // file descriptor
"/temporary", // base uri
NULL, // encoding
0); // options
int result = xmlTextReaderRead(pReader);
// result is -1
// Get this error at console:
// /temporary:1: parser error : Extra content at the end of the document
// <?xml version="1.0"?><foo></foo>
// ^
}

I figured out the problem. I was writing out the NULL terminator and that's what was causing libxml2 to choke (although I could have sworn I already tried it without the NULL terminator, d'oh!) The fixed code should simply be:
write(fd, pText, strlen(pText));

Also, make sure you are reading the file as binary, not text. 'Text' strips out CR/LF, reduces the size of the file and leaves detritus at the end of the buffer.
Example (VS 2010):
struct _stat32 stat;
char *buf;
FILE *f = fopen("123.XML", "rb"); // right
//f = fopen("123.XML", "rt"); // WRONG!
_fstat(fileno(f), &stat);
buf = (char *)malloc(stat.st_size);
int ret = fread(buf, stat.st_size, 1, f);
assert(ret == 1);
// etc.

Related

Linux kernel_write function returns EFBIG when appending data to big file

The task is to write simple character device that copies all the data written to the device to tmp a file.
I use kernel_write function to write data to file and its work fine most of the cases. But when the output file size is bigger than 2.1 GB, kernel_write function fails with return value -27.
To write to file I use this function:
void writeToFile(void* buf, size_t size, loff_t *offset) {
struct file* destFile;
char* filePath = "/tmp/output";
destFile = filp_open(filePath, O_CREAT|O_WRONLY|O_APPEND, 0666);
if (IS_ERR(destFile) || destFile == NULL) {
printk("Cannot open destination file");
return;
}
size_t res = kernel_write(destFile, buf, size, offset);
printk("%ld", res);
filp_close(destFile, NULL);
}
If the size of "/tmp/output" < 2.1 GB, this function works just fine.
If the size of "/tmp/output"> 2.1 GB, kernel_write starts to return -27.
How can I fix this?
Thanks
You need to enable Large File Support (LFS) with the O_LARGEFILE flag.
The below code worked for me. Sorry, I made some other changes for debugging, but I commented above the relevant line.
struct file* writeToFile(void* buf, size_t size, loff_t *offset)
{
struct file* destFile;
char* filePath = "/tmp/output";
size_t res;
// Add the O_LARGEFILE flag here
destFile = filp_open(filePath, O_CREAT | O_WRONLY | O_APPEND | O_LARGEFILE, 0666);
if (IS_ERR(destFile))
{
printk("Error in opening: <%ld>", (long)destFile);
return destFile;
}
if (destFile == NULL)
{
printk("Error in opening: null");
return destFile;
}
res = kernel_write(destFile, buf, size, offset);
printk("CODE: <%ld>", res);
filp_close(destFile, NULL);
return destFile;
}
To test it, I created a file with fallocate -l 3G /tmp/output, then removed the O_CREAT flag because it was giving the kernel permission errors.
I should add a disclaimer that a lot of folks says that File I/O from the kernel is a bad idea. Even while testing this out on my own, I accidentally crashed my computer twice due to dumb errors.
Maybe do this instead: Read/Write from /proc File

Same .txt files, different sizes?

I have a program that reads from a .txt file
I use the cmd prompt to execute the program with the name of the text file to read from.
ex: program.exe myfile.txt
The problem is that sometimes it works, sometimes it doesn't.
The original file is 130KB and doesn't work.
If I copy/paste the contents, the file is 65KB and works.
If I copy/paste the file and rename it, it's 130KB and doesn't work.
Any ideas?
After more testing it shows that this is what makes it not work:
int main(int argc, char *argv[])
{
char *infile1
char tmp[1024] = { 0x0 };
FILE *in;
for (i = 1; i < argc; i++) /* Skip argv[0] (program name). */
{
if (strcmp(argv[i], "-sec") == 0) /* Process optional arguments. */
{
opt = 1; /* This is used as a boolean value. */
/*
* The last argument is argv[argc-1]. Make sure there are
* enough arguments.
*/
if (i + 1 <= argc - 1) /* There are enough arguments in argv. */
{
/*
* Increment 'i' twice so that you don't check these
* arguments the next time through the loop.
*/
i++;
optarg1 = atoi(argv[i]); /* Convert string to int. */
}
}
else /* not -sec */
{
if (infile1 == NULL) {
infile1 = argv[i];
}
else {
if (outfile == NULL) {
outfile = argv[i];
}
}
}
}
in = fopen(infile1, "r");
if (in == NULL)
{
fprintf(stderr, "Unable to open file %s: %s\n", infile1, strerror(errno));
exit(1);
}
while (fgets(tmp, sizeof(tmp), in) != 0)
{
fprintf(stderr, "string is %s.", tmp);
//Rest of code
}
}
Whether it works or not, the code inside the while loop gets executed.
When it works tmp actually has a value.
When it doesn't work tmp has no value.
EDIT:
Thanks to sneftel, we know what the problem is,
For me to use fgetws() instead of fgets(), I need tmp to be a wchar_t* instead of a char*.
Type casting seems to not work.
I tried changing the declaration of tmp to
wchar_t tmp[1024] = { 0x0 };
but I realized that tmp is a parameter in strtok() used elsewhere in my code.
I here is what I tried in that function:
//tmp is passed as the first parameter in parse()
void parse(wchar_t *record, char *delim, char arr[][MAXFLDSIZE], int *fldcnt)
{
if (*record != NULL)
{
char*p = strtok((char*)record, delim);
int fld = 0;
while (p) {
strcpy(arr[fld], p);
fld++;
p = strtok('\0', delim);
}
*fldcnt = fld;
}
else
{
fprintf(stderr, "string is null");
}
}
But typecasting to char* in strtok doesn't work either.
Now I'm looking for a way to just convert the file from UTF-16 to UTF-8 so tmp can be of type char*
I found this which looks like it can be useful but in the example it uses input from the user as UTF-16, how can that input be taken from the file instead?
http://www.cplusplus.com/reference/locale/codecvt/out/
It sounds an awful lot like the original file is UTF-16 encoded. When you copy/paste it in your text editor, you then save the result out as a new (default encoding) (ASCII or UTF-8) text file. Since a single character takes 2 bytes in a UTF-16-encode file but only 1 byte in a UTF-8-encoded file, that results in the file size being roughly halved when you save it out.
UTF-16 is fine, but you'll need to use Unicode-aware functions (that is, not fgets) to work with it. If you don't want to deal with all that Unicode jazz right now, and you don't actually have any non-ASCII characters to deal with in the file, just do the manual conversion (either with your copy/paste or with a command-line utility) before running your program.

How Should I read a File line-by-line in C?

I'd like to read a file line-by-line. I have fgets() working okay, but am not sure what to do if a line is longer than the buffer sizes I've passed to fgets()? And furthermore, since fgets() doesn't seem to be Unicode-aware, and I want to allow UTF-8 files, it might miss line endings and read the whole file, no?
Then I thought I'd use getline(). However, I'm on Mac OS X, and while getline() is specified in /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/stdio.h, it's not in /usr/include/stdio, so gcc doesn't find it in the shell. And it's not particularly portable, obviously, and I'd like the library I'm developing to be generally useful.
So what's the best way to read a file line-by-line in C?
First of all, it's very unlikely that you need to worry about non-standard line terminators like U+2028. Normal text files are not expected to contain them, and the very overwhelming majority of all existing software that reads normal text files doesn't support them. You mention getline() which is available in glibc but not in MacOS's libc, and it would surprise me if getline() did support such fancy line terminators. It's almost a certainly that you can get away with just supporting LF (U+000A) and maybe also CR+LF (U+000D U+000A). To do that, you don't need to care about UTF-8. That's the beauty of UTF-8's ASCII compatibility and is by design.
As for supporting lines that are longer than the buffer you pass to fgets(), you can do this with a little extra logic around fgets. In pseudocode:
while true {
fgets(buffer, size, stream);
dynamically_allocated_string = strdup(buffer);
while the last char (before the terminating NUL) in the buffer is not '\n' {
concatenate the contents of buffer to the dynamically allocated string
/* the current line is not finished. read more of it */
fgets(buffer, size, stream);
}
process the whole line, as found in the dynamically allocated string
}
But again, I think you will find that there's really quite a lot of software out there that simply doesn't bother with that, from software that parses system config files like /etc/passwd to (some) scripting languages. Depending on your use case, it may very well be good enough to use a "big enough" buffer (e.g. 4096 bytes) and declare that you don't support lines longer than that. You can even call it a security feature (a line length limit is protection against resource exhaustion attacks from a crafted input file).
Based on this answer, here's what I've come up with:
#define LINE_BUF_SIZE 1024
char * getline_from(FILE *fp) {
char * line = malloc(LINE_BUF_SIZE), * linep = line;
size_t lenmax = LINE_BUF_SIZE, len = lenmax;
int c;
if(line == NULL)
return NULL;
for(;;) {
c = fgetc(fp);
if(c == EOF)
break;
if(--len == 0) {
len = lenmax;
char * linen = realloc(linep, lenmax *= 2);
if(linen == NULL) {
// Fail.
free(linep);
return NULL;
}
line = linen + (line - linep);
linep = linen;
}
if((*line++ = c) == '\n')
break;
}
*line = '\0';
return linep;
}
To read stdin:
char *line;
while ( line = getline_from(stdin) ) {
// do stuff
free(line);
}
To read some other file, I first open it with fopen():
FILE *fp;
fp = fopen ( filename, "rb" );
if (!fp) {
fprintf(stderr, "Cannot open %s: ", argv[1]);
perror(NULL);
exit(1);
}
char *line;
while ( line = getline_from(fp) ) {
// do stuff
free(line);
}
This works very nicely for me. I'd love to see an alternative that uses fgets() as suggested by #paul-tomblin, but I don't have the energy to figure it out tonight.

playing files after accepting them through open dialog box

I am a new member and joined this site after referring to it loads of times when i was stuck with some programming problems. I am trying to code a media player (Win32 SDK VC++ 6.0) for my college project and I am stuck. I have searched on various forums and msdn and finally landed on the function GetShortPathName which enables me to play through folders and files which have a whitespace in their names. I will paste the code here so it will be much more clearer as to what i am trying to do.
case IDM_FILE_OPEN :
ZeroMemory(&ofn, sizeof(ofn));
ofn.lStructSize = sizeof(ofn);
ofn.hwndOwner = hwnd;
ofn.lpstrFilter = "Media Files (All Supported Types)\0*.avi;*.mpg;*.mpeg;*.asf;*.wmv;*.mp2;*.mp3\0"
"Movie File (*.avi;*.mpg;*.mpeg)\0*.avi;*.mpg;*.mpeg\0"
"Windows Media File (*.asf;*.wmv)\0*.asf;*.wmv\0"
"Audio File (*.mp2;*.mp3)\0*.mp2;*.mp3\0"
"All Files(*.*)\0*.*\0";
ofn.lpstrFile = szFileName;
ofn.nMaxFile = MAX_PATH;
ofn.Flags = OFN_EXPLORER | OFN_FILEMUSTEXIST | OFN_HIDEREADONLY | OFN_ALLOWMULTISELECT | OFN_CREATEPROMPT;
ofn.lpstrDefExt = "mp3";
if(GetOpenFileName(&ofn))
{
length = GetShortPathName(szFileName, NULL, 0);
buffer = (TCHAR *) malloc (sizeof(length));
length = GetShortPathName(szFileName, buffer, length);
for(i = 0 ; i < MAX_PATH ; i++)
{
if(buffer[i] == '\\')
buffer[i] = '/';
}
SendMessage(hList,LB_ADDSTRING,0,(LPARAM)buffer);
mciSendString("open buffer alias myFile", NULL, 0, NULL);
mciSendString("play buffer", NULL, 0, NULL);
}
return 0;
using the GetShortPathName function i get the path as : D:/Mp3z/DEEPBL~1/03SLEE~1.mp3
Putting this path directly in Play button case
mciSendString("open D:/Mp3jh/DEEPBL~1/03SLEE~1.mp3 alias myFile", NULL, 0, NULL);
mciSendString("play myFile", NULL, 0, NULL);
the file opens and plays fine. But as soon as i try to open and play it through the open file dialog box, nothing happens. Any input appreciated.
It looks like the problem is that you're passing the name of the buffer variable to the mciSendString function as a string, rather than passing the contents of the buffer.
You need to concatenate the arguments you want to pass (open and alias myFile) with the contents of buffer.
The code can also be much simplified by replacing malloc with an automatic array. You don't need to malloc it because you don't need it outside of the block scope. (And you shouldn't be using malloc in C++ code anyway; use new[] instead.)
Here's a modified snippet of the code shown in your question:
(Warning: changes made using only my eyes as a compiler! Handle with care.)
if(GetOpenFileName(&ofn))
{
// Get the short path name, and place it in the buffer array.
// We know that a short path won't be any longer than MAX_PATH, so we can
// simply allocate a statically-sized array without futzing with new[].
//
// Note: In production code, you should probably check the return value
// of the GetShortPathName function to make sure it succeeded.
TCHAR buffer[MAX_PATH];
GetShortPathName(szFileName, buffer, MAX_PATH);
// Add the short path name to your ListBox control.
//
// Note: In C++ code, you should probably use C++-style casts like
// reinterpret_cast, rather than C-style casts!
SendMessage(hList, LB_ADDSTRING, 0, reinterpret_cast<LPARAM>(buffer));
// Build the argument string to pass to the mciSendString function.
//
// Note: In production code, you probably want to use the more secure
// alternatives to the string concatenation functions.
// See the documentation for more details.
// And, as before, you should probably check return values for error codes.
TCHAR arguments[MAX_PATH * 2]; // this will definitely be large enough
lstrcat(arguments, TEXT("open"));
lstrcat(arguments, buffer);
lstrcat(arguments, TEXT("alias myFile"));
// Or, better yet, use a string formatting function, like StringCbPrintf:
// StringCbPrintf(arguments, MAX_PATH * 2, TEXT("open %s alias myFile"),
// buffer);
// Call the mciSendString function with the argument string we just built.
mciSendString(arguments, NULL, 0, NULL);
mciSendString("play myFile", NULL, 0, NULL);
}
Do note that, as the above code shows, working with C-style strings (character arrays) is a real pain in the ass. C++ provides a better alternative, in the form of the std::string class. You should strongly consider using that instead. To call Windows API functions, you'll still need a C-style string, but you can get one of those by using the c_str method of the std::string class.

Alternative to fgets()?

Description:
Obtain output from an executable
Note:
Will not compile, due to fgets() declaration
Question:
What is the best alternative to fgets, as fgets requires char *?
Is there a better alternative?
Illustration:
void Q_analysis (const char *data)
{
string buffer;
size_t found;
found = buffer.find_first_of (*data);
FILE *condorData = _popen ("condor_q", "r");
while (fgets (buffer.c_str(), buffer.max_size(), condorData) != NULL)
{
if (found == string::npos)
{
Sleep(2000);
} else {
break;
}
}
return;
}
You should be using the string.getline function for strings
cppreference
however in your case, you should be using a char[] to read into.
eg
string s;
char buffer[ 4096 ];
fgets(buffer, sizeof( buffer ), condorData);
s.assign( buffer, strlen( buffer ));
or your code:
void Q_analysis( const char *data )
{
char buffer[ 4096 ];
FILE *condorData = _popen ("condor_q", "r");
while( fgets( buffer, sizeof( buffer ), condorData ) != NULL )
{
if( strstr( buffer, data ) == NULL )
{
Sleep(2000);
}
else
{
break;
}
}
}
Instead of declaring you buffer as a string declare it as something like:
char buffer[MY_MAX_SIZE]
call fgets with that, and then build the string from the buffer if you need in that form instead of going the other way.
The reason what you're doing doesn't work is that you're getting a copy of the buffer contents as a c-style string, not a pointer into the gut of the buffer. It is, by design, read only.
-- MarkusQ
You're right that you can't read directly into a std::string because its c_str and data methods both return const pointers. You could read into a std::vector<char> instead.
You could also use the getline function. But it requires an iostream object, not a C FILE pointer. You can get from one to the other, though, in a vendor-specific way. See "A Handy Guide To Handling Handles" for a diagram and some suggestions on how to get from one file type to another. Call fileno on your FILE* to get a numeric file descriptor, and then use fstream::attach to associate it with an fstream object. Then you can use getline.
Try the boost library - I believe it has a function to create an fstream from a FILE*
or you could use fileno() to get a standard C file handle from the FILE, then use fstream::attach to attach a stream to that file. From there you can use getline(), etc. Something like this:
FILE *condorData = _popen ("condor_q", "r");
std::ifstream &stream = new std::ifstream();
stream.attach(_fileno(condorData));
I haven't tested it all too well, but the below appears to do the job:
//! read a line of text from a FILE* to a std::string, returns false on 'no data'
bool stringfgets(FILE* fp, std::string& line)
{
char buffer[1024];
line.clear();
do {
if(!fgets(buffer, sizeof(buffer), fp))
return !line.empty();
line.append(buffer);
} while(!strchr(buffer, '\n'));
return true;
}
Be aware however that this will happily read a 100G line of text, so care must be taken that this is not a DoS-vector from untrusted source files or sockets.

Resources