Is there an efficient way to read the last line of a text file? Right now I'm simply reading each line with code like the one below, so that S ends up holding the last line read. Is there a good way to grab that last line without looping through the entire text file?
String S;                                   // declared outside the loop so it survives it
TStreamReader* Reader = new TStreamReader(myfile);
while (!Reader->EndOfStream)
{
    S = Reader->ReadLine();                 // after the loop, S holds the last line
}
delete Reader;
Exactly as Remy Lebeau commented:
1. Use the file access functions FileOpen, FileSeek, FileRead.
   For an example of their usage, look here:
   Convert the Linux open, read, write, close functions to work on Windows
2. Load your file into memory in chunks, starting from the end.
   Make a static buffer and load the file into it chunk by chunk, from the end ...
3. Stop at the EOL (end of line), usually CR,LF.
   Just scan for the ASCII codes 13,10 or their combinations from the end of the chunk. Beware that in some files the last line is also terminated, so you should skip that EOL the first time ...
   Known EOLs are:
   13
   10
   13,10
   10,13
4. Construct the line.
   If no EOL is found, add the whole chunk to the string; if one is found, add just the part after it ...
Here is a small example:
int hnd,siz,i,n;
const int bufsz=256;                    // buffer size
char buf[bufsz+1];                      // +1 for the string terminator
AnsiString lin="";                      // last line output
bool first=true;                        // true until the chunk at the end of file is handled
hnd=FileOpen("in.txt",fmOpenRead);      // open file
if (hnd>=0)
    {
    siz=FileSeek(hnd,0,2);              // obtain size and point to its end
    while (siz>0)
        {
        n=bufsz;                        // n = chunk size to load
        if (n>siz) n=siz;
        siz-=n;
        FileSeek(hnd,siz,0);            // point to chunk location (from start)
        n=FileRead(hnd,buf,n);          // load it to buf[]
        if (n<=0) break;                // read error
        if (first)                      // first pass: skip the EOL that terminates the last line
            {
            first=false;
            if ((buf[n-1]==10)||(buf[n-1]==13))
                {
                n--;
                if ((n>0)&&((buf[n-1]==10)||(buf[n-1]==13))&&(buf[n-1]!=buf[n])) n--;
                }
            }
        buf[n]=0;                       // terminate only the valid part of the chunk
        for (i=n-1;i>=0;i--)            // scan backwards for EOL (CR,LF)
            if ((buf[i]==10)||(buf[i]==13)) { siz=0; break; } // siz=0 so no more chunks are read
        i++;                            // i points to the start of the line inside this chunk
        lin=AnsiString(buf+i)+lin;      // prepend the chunk to the line
        }
    FileClose(hnd);                     // close file
    }
// here lin is your last line
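For readers without the VCL file functions, the same backward scan can be sketched in standard C. This is only an illustration under stated assumptions: `read_last_line` and `byte_at` are hypothetical names, the file is opened in binary mode, the platform is assumed to support `SEEK_END` on binary streams, and only LF, CR and CR,LF terminators are handled (not 10,13). Instead of concatenating chunks, it locates the start of the last line and then reads that range in one go.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns the byte at offset 'pos', or EOF on failure. */
static int byte_at(FILE *f, long pos)
{
    return fseek(f, pos, SEEK_SET) == 0 ? fgetc(f) : EOF;
}

/* Hypothetical helper: returns a malloc'ed copy of the last line of 'path'
   (without its EOL), or NULL on error. The caller frees the result.
   An empty file yields an empty string. */
char *read_last_line(const char *path)
{
    char buf[256];                      /* chunk buffer */
    long size, start = 0, end, pos;
    size_t len;
    char *line;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL) return NULL;
    if (fseek(f, 0L, SEEK_END) != 0 || (size = ftell(f)) < 0) {
        fclose(f);
        return NULL;
    }

    /* Skip one trailing EOL (LF, CR or CR,LF) if the last line is terminated. */
    end = size;
    if (end > 0 && byte_at(f, end - 1) == '\n') end--;
    if (end > 0 && byte_at(f, end - 1) == '\r') end--;

    /* Walk backwards chunk by chunk until an EOL byte is found. */
    for (pos = end; pos > 0; ) {
        long i, n = pos < (long)sizeof buf ? pos : (long)sizeof buf;
        pos -= n;
        if (fseek(f, pos, SEEK_SET) != 0 ||
            fread(buf, 1, (size_t)n, f) != (size_t)n) {
            fclose(f);
            return NULL;
        }
        for (i = n - 1; i >= 0 && buf[i] != '\n' && buf[i] != '\r'; i--) { }
        if (i >= 0) { start = pos + i + 1; break; }   /* line starts after the EOL */
    }

    /* Read the bytes in [start, end) as the last line. */
    len = (size_t)(end - start);
    line = malloc(len + 1);
    if (line != NULL) {
        if (fseek(f, start, SEEK_SET) != 0 || fread(line, 1, len, f) != len) {
            free(line);
            line = NULL;
        } else {
            line[len] = '\0';
        }
    }
    fclose(f);
    return line;
}
```

Only the tail of the file is touched, so the cost depends on the length of the last line rather than on the size of the file.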
I found the code below on the internet. It is supposed to count the sentences in a text on an 8051 MCU.
Can someone please explain what exactly is happening on the lines marked with question marks?
Any kind of help would be highly appreciated.
#include<string.h>
char code *text=" what is a program? that has, a a lot of errors! When " ;
char code *text1=" you compile. this file, uVision. reports a number of? ";
char code *text2=" problems that you! may interactively correct. " ; //Null characters are also included in array!!!
void count ( char pdata* , char pdata*);
void main (void){
char pdata Nw,Ns;
char data TextNw[2],TextNs[2];
count(&Nw, &Ns); // call subroutine
TextNw[0]=Nw/10; //?????????????????????????????????
TextNw[1]=Nw%10; //?????????????????????????????????
TextNs[0]=Ns/10; //?????????????????????????????????
TextNs[1]=Ns%10; //?????????????????????????????????
while(1);
}
void count ( char pdata *Nw, char pdata *Ns ){
unsigned char N, i, ch;
typedef enum {idle1, idle2} state; //?????????????????????????????????
state S; // beginning state
P2=0x00; // pdata bank definition it must be performed first!!
*Ns=*Nw=0; // without proper start-up there is no initialisation, initialise now!!
S=idle1; // beginning state
N=strlen(text)+strlen(text1)+strlen(text2)+3; //????????????? + 3 to account for 3 Null characters!
P2=0x00; // pdata bank definition
for(i=0;i!=N;i++){
ch=text[i]; // take a character from the text
switch (S)
{
case (idle1):{
if (ch==0) break; // skip NULL terminating character!
if (ch!=' '){
S=idle2;
(*Nw)++;
}
break;
}
case(idle2):{
if (ch==0) break; // skip NULL terminating character!
if((ch==' ')||(ch==','))S=idle1;
else if ((ch=='?')||(ch=='.')||(ch=='!')){
S=idle1;
(*Ns)++;
}
break;
}
}
}
}
This program does 2 things at once: it counts the number of sentences in the text and the number of words in the text. Once the counting is done, each result is split into its decimal digits (tens, then ones) and stored in a 2-element char array. For example, for 57 words in 3 sentences the results are stored as TextNw = {5, 7} and TextNs = {0, 3}. These are numeric digit values, not ASCII characters.
The variable N contains the combined length of the three strings plus 3 for the null terminating characters (one per string). The loop reads text[i] for all N positions, so it relies on text, text1 and text2 being placed back to back in memory.
The algorithm counts words and sentences simultaneously with a two-state machine. In the idle1 state the scanner is between words: the first character that is not a space starts a new word, increments Nw and switches to idle2. In the idle2 state the scanner is inside a word: a space or comma ends the word and switches back to idle1, while '?', '.' or '!' ends both the word and the sentence, so Ns is incremented as well.
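The same state machine can be sketched in portable C, without the Keil-specific `code`/`pdata` keywords. The names `count_text`, `split_digits`, `BETWEEN_WORDS` and `IN_WORD` are made up for this illustration (they stand in for `count`, the `/10` and `%10` lines, `idle1` and `idle2`), and the text is passed as a single string instead of three adjacent ones.

```c
#include <assert.h>
#include <stddef.h>

/* BETWEEN_WORDS corresponds to idle1, IN_WORD to idle2 in the original. */
typedef enum { BETWEEN_WORDS, IN_WORD } scan_state;

/* Counts words and sentences in one pass over a NUL-terminated string. */
void count_text(const char *text, unsigned *words, unsigned *sentences)
{
    scan_state s = BETWEEN_WORDS;

    assert(text != NULL && words != NULL && sentences != NULL);
    *words = *sentences = 0;

    for (; *text != '\0'; text++) {
        char ch = *text;
        switch (s) {
        case BETWEEN_WORDS:
            if (ch != ' ') {            /* first non-space character starts a word */
                s = IN_WORD;
                (*words)++;
            }
            break;
        case IN_WORD:
            if (ch == ' ' || ch == ',') {
                s = BETWEEN_WORDS;      /* word ended */
            } else if (ch == '?' || ch == '.' || ch == '!') {
                s = BETWEEN_WORDS;      /* word and sentence ended */
                (*sentences)++;
            }
            break;
        }
    }
}

/* Splits a value in the range 0..99 into its tens and ones digits,
   as main() does with Nw/10 and Nw%10. */
void split_digits(unsigned value, unsigned char digits[2])
{
    digits[0] = (unsigned char)(value / 10);
    digits[1] = (unsigned char)(value % 10);
}
```

Adding '0' to each digit would turn it into a printable ASCII character, which is presumably what an LCD or UART output routine would need.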
I am trying to write a custom reader that reads a record (spanning two lines) with a defined number of fields.
For example:
1,2,3,4("," can be there or not)
,5,6,7,8
My requirement is to read the record and push it to the mapper as a single record like {1,2,3,4,5,6,7,8}. Please give me some input.
UPDATE:
public boolean nextKeyValue() throws IOException, InterruptedException {
if(key == null) {
key = new LongWritable();
}
//Current offset is the key
key.set(pos);
if(value == null) {
value = new Text();
}
int newSize = 0;
int numFields = 0;
Text temp = new Text();
boolean firstRead = true;
while(numFields < reqFields) {
while(pos < end) {
//Read up to the '\n' character and store it in 'temp'
newSize = in.readLine( temp,
maxLineLength,
Math.max((int) Math.min(Integer.MAX_VALUE, end - pos),
maxLineLength));
//If 0 bytes were read, then we are at the end of the split
if(newSize == 0) {
break;
}
//Otherwise update 'pos' with the number of bytes read
pos += newSize;
//If the line is not too long, check number of fields
if(newSize < maxLineLength) {
break;
}
//Line too long, try again
LOG.info("Skipped line of size " + newSize + " at pos " +
(pos - newSize));
}
//Exit, since we're at the end of split
if(newSize == 0) {
break;
}
else {
String record = temp.toString();
StringTokenizer fields = new StringTokenizer(record,"|");
numFields += fields.countTokens();
//Reset 'value' if this is the first append
if(firstRead) {
value = new Text();
firstRead = false;
}
if(numFields != reqFields) {
value.append(temp.getBytes(), 0, temp.getLength());
}
else {
value.append(temp.getBytes(), 0, temp.getLength());
}
}
}
if(newSize == 0) {
key = null;
value = null;
return false;
}
else {
return true;
}
}
}
This is the nextKeyValue method that I am working on, but the mappers are still not getting the proper values.
reqFields is 4.
Look at how TextInputFormat is implemented. Look at its superclass, FileInputFormat, as well. You must subclass either TextInputFormat or FileInputFormat and implement your own record handling.
The thing to be aware of when implementing any kind of file input format is this:
The framework splits the file and gives you the start offset and byte length of the piece of the file you have to read. It may very well happen that it splits the file right in the middle of a record. That is why your reader must skip the bytes of the record at the beginning of the split if that record is not fully contained in the split, and must read past the last byte of the split to read the whole last record if that one is not fully contained in the split.
For example, TextInputFormat treats \n characters as record delimiters, so when it gets a split that does not start at the beginning of the file, it skips the bytes up to the first \n character, and it reads past the end of the split up to the next \n character.
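The boundary rule can be illustrated outside Hadoop with a small C sketch over an in-memory "file". This is not Hadoop code: `count_split_records` is a hypothetical helper, records are assumed to be single '\n'-terminated lines, and the rule used is "skip the first partial record unless the split starts at offset 0, then read every record that starts at or before the split end".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the records (lines) that the split
   [start, start + len) of an in-memory file is responsible for. */
size_t count_split_records(const char *data, size_t size,
                           size_t start, size_t len)
{
    size_t end = start + len;
    size_t pos = start;
    size_t count = 0;

    assert(data != NULL);
    if (end > size) end = size;

    /* A split that does not begin at offset 0 may begin in the middle of
       a record: skip up to and including the first '\n'. The reader of
       the previous split is the one that reads that record. */
    if (start != 0) {
        while (pos < size && data[pos] != '\n') pos++;
        if (pos < size) pos++;          /* step over the '\n' */
    }

    /* Read every record that starts at or before 'end', even when it
       finishes past the split boundary. */
    while (pos < size && pos <= end) {
        while (pos < size && data[pos] != '\n') pos++;
        if (pos < size) pos++;
        count++;
    }
    return count;
}
```

With this rule every record is read by exactly one split, wherever the boundaries fall. A record that spans two lines, as in the question, needs an additional way to recognise where a record starts, because '\n' alone no longer marks a record boundary.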
As for the code example:
You need to ask yourself the following question: say you open the file, seek to a random position and start reading forward. How do you detect the start of a record? I don't see anything in your code that deals with that, and without it you cannot write a good input format, because you don't know where the record boundaries are.
Now it is still possible to make the input format read the whole file end to end by making the isSplitable(JobContext, Path) method return false. The file is then read in full by a single map task, which reduces parallelism.
Your inner while loop seems problematic, since it checks for lines that are too long and skips them. Given that your records span multiple lines, skipping a line can make you merge part of one record with part of another.
The string had to be tokenized using StringTokenizer rather than split. The code has been updated with the new implementation.
Ok, I'm a rookie at this and here is what I've been sitting here for a while scratching my head doing.
My goal is to read in a file named by a command line argument and store its contents in an array of strings in which each element is a line from the file. I need the whole line, including white space. And I need to cycle through the whole text file without knowing how large or small it is.
I'm fairly sure that key.eof() here is not right, but I've tried so many things now that I need to ask for help because I feel like I'm getting further and further from the solution.
ifstream key(argv[2]);
if (!key) // if file doesn't exist, EXIT
{
cout << "Could not open the key file!\n";
return EXIT_FAILURE;
}
else
{
vector<string> lines;
for (unsigned i = 0; i != key.eof(); ++i)
{
getline(key, lines[i]);
}
for (auto x : lines)
cout << x;
If anyone could point me in the right direction, I'd appreciate it; this is just the beginning of what I have to do and I feel clueless. The goal is for me to be able to break down each line into a vector (or whatever I need) of chars INCLUDING white space.
I think you want something like this:
vector<string> lines;
string line;
while(getline(key, line)) // keep going until eof or error
lines.push_back(line); // add line to lines
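To continuously read lines from the file and keep them in an array, I would use a while loop instead of a for loop. Note that the vector starts out empty, so each line has to be appended with push_back rather than written to lines[counter]:
vector<string> lines;
string line;
int counter = 0;
while (getline(key, line)) {
    lines.push_back(line); // grow the vector by one line
    counter++;
}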
I'd like to read a file line-by-line. I have fgets() working okay, but am not sure what to do if a line is longer than the buffer sizes I've passed to fgets()? And furthermore, since fgets() doesn't seem to be Unicode-aware, and I want to allow UTF-8 files, it might miss line endings and read the whole file, no?
Then I thought I'd use getline(). However, I'm on Mac OS X, and while getline() is specified in /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/include/stdio.h, it's not in /usr/include/stdio, so gcc doesn't find it in the shell. And it's not particularly portable, obviously, and I'd like the library I'm developing to be generally useful.
So what's the best way to read a file line-by-line in C?
First of all, it's very unlikely that you need to worry about non-standard line terminators like U+2028. Normal text files are not expected to contain them, and the overwhelming majority of existing software that reads normal text files doesn't support them. You mention getline() which is available in glibc but not in MacOS's libc, and it would surprise me if getline() did support such fancy line terminators. It's almost a certainty that you can get away with just supporting LF (U+000A) and maybe also CR+LF (U+000D U+000A). To do that, you don't need to care about UTF-8: in UTF-8 the bytes 0x0A and 0x0D never occur inside a multi-byte sequence. That's the beauty of UTF-8's ASCII compatibility and is by design.
As for supporting lines that are longer than the buffer you pass to fgets(), you can do this with a little extra logic around fgets. In pseudocode:
while true {
    if fgets(buffer, size, stream) returns NULL: stop (EOF or error)
    dynamically_allocated_string = strdup(buffer);
    while the last char (before the terminating NUL) in the buffer is not '\n' {
        /* the current line is not finished. read more of it */
        if fgets(buffer, size, stream) returns NULL: break (EOF inside the last line)
        concatenate the contents of buffer to the dynamically allocated string
    }
    process the whole line, as found in the dynamically allocated string
    free the dynamically allocated string
}
But again, I think you will find that there's really quite a lot of software out there that simply doesn't bother with that, from software that parses system config files like /etc/passwd to (some) scripting languages. Depending on your use case, it may very well be good enough to use a "big enough" buffer (e.g. 4096 bytes) and declare that you don't support lines longer than that. You can even call it a security feature (a line length limit is protection against resource exhaustion attacks from a crafted input file).
Based on this answer, here's what I've come up with:
#include <stdio.h>
#include <stdlib.h>

#define LINE_BUF_SIZE 1024

char * getline_from(FILE *fp) {
    char * line = malloc(LINE_BUF_SIZE), * linep = line;
    size_t lenmax = LINE_BUF_SIZE, len = lenmax;
    int c;

    if(line == NULL)
        return NULL;

    for(;;) {
        c = fgetc(fp);
        if(c == EOF) {
            if(line == linep) {
                // Nothing was read: return NULL so the caller's loop ends at EOF.
                free(linep);
                return NULL;
            }
            break;
        }
        if(--len == 0) {
            // Buffer full: double it and keep writing at the same offset.
            len = lenmax;
            char * linen = realloc(linep, lenmax *= 2);
            if(linen == NULL) {
                // Fail.
                free(linep);
                return NULL;
            }
            line = linen + (line - linep);
            linep = linen;
        }
        if((*line++ = c) == '\n')
            break;
    }
    *line = '\0';
    return linep;
}
To read stdin:
char *line;
while ( (line = getline_from(stdin)) != NULL ) {
// do stuff
free(line);
}
To read some other file, I first open it with fopen():
FILE *fp;
fp = fopen ( filename, "rb" );
if (!fp) {
    fprintf(stderr, "Cannot open %s: ", filename);
    perror(NULL);
    exit(1);
}
char *line;
while ( (line = getline_from(fp)) != NULL ) {
    // do stuff
    free(line);
}
fclose(fp);
This works very nicely for me. I'd love to see an alternative that uses fgets() as suggested by @paul-tomblin, but I don't have the energy to figure it out tonight.
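For comparison, here is a minimal sketch of what an fgets()-based variant might look like. The name `read_line_fgets` and the 128-byte chunk size are arbitrary choices for illustration, and the sketch assumes the input contains no embedded NUL bytes (strlen() would cut such a line short). Like getline_from() above, it returns a malloc'ed line that includes the trailing '\n' if there was one, and NULL at end of input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length using fgets().
   Returns a malloc'ed string (caller frees), or NULL at EOF or on
   allocation failure. */
char *read_line_fgets(FILE *fp)
{
    char chunk[128];                    /* fixed-size piece read by fgets */
    char *line = NULL;                  /* grows as pieces are appended */
    size_t len = 0;                     /* length of 'line' so far */

    assert(fp != NULL);
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);
        char *grown = realloc(line, len + n + 1);
        if (grown == NULL) {
            free(line);
            return NULL;
        }
        line = grown;
        memcpy(line + len, chunk, n + 1);   /* also copies the terminating NUL */
        len += n;
        if (n > 0 && line[len - 1] == '\n')
            break;                          /* the line is complete */
    }
    return line;                            /* NULL if nothing could be read */
}
```

It can be dropped into the loops above in place of getline_from(). Compared with the fgetc() version it makes one library call per chunk instead of one per character, at the cost of a strlen() per chunk.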
I'm trying to write a unit test that checks some XML parsing code. The unit test creates a file descriptor on an in-memory XML document using shm_open and then passes that to xmlReaderForFd(). But I'm getting an "Extra content at the end of the document" error on the subsequent xmlTextReaderRead(). The parsing code works fine on a file descriptor created from an actual file (I've done a byte-for-byte comparison with the shm_open created one and it's the exact same set of bytes). Why is libxml2 choking on a file descriptor created with shm_open?
Here's my code:
void unitTest() {
int fd = shm_open("/temporary", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
char *pText = "<?xml version=\"1.0\"?><foo></foo>";
write(fd, pText, strlen(pText) + 1);
lseek(fd, 0, SEEK_SET);
xmlTextReaderPtr pReader = xmlReaderForFd(
fd, // file descriptor
"/temporary", // base uri
NULL, // encoding
0); // options
int result = xmlTextReaderRead(pReader);
// result is -1
// Get this error at console:
// /temporary:1: parser error : Extra content at the end of the document
// <?xml version="1.0"?><foo></foo>
// ^
}
I figured out the problem. I was writing out the NULL terminator and that's what was causing libxml2 to choke (although I could have sworn I already tried it without the NULL terminator, d'oh!) The fixed code should simply be:
write(fd, pText, strlen(pText));
Also, make sure you are reading the file as binary, not text. On Windows, text mode translates each CR/LF pair to a single LF, so fewer bytes come back than the file size reports and the unfilled end of the buffer is left holding detritus.
Example (VS 2010):
struct _stat st;                            // matches _fstat()
char *buf;
FILE *f = fopen("123.XML", "rb");           // right
//f = fopen("123.XML", "rt");               // WRONG!
assert(f != NULL);
_fstat(_fileno(f), &st);
buf = (char *)malloc(st.st_size);
assert(buf != NULL);
size_t ret = fread(buf, st.st_size, 1, f);  // one item of st_size bytes
assert(ret == 1);
// etc.
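Outside Visual Studio, the same idea can be sketched with standard C only. `read_whole_file` is a hypothetical helper name, and the sketch assumes the platform supports `SEEK_END` on binary streams (true on the common desktop systems). It takes the length from what fread() actually returned rather than from the reported size, and appends a NUL that is not counted in that length.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file in binary mode.
   Returns a malloc'ed, NUL-terminated buffer (caller frees) and stores
   the number of bytes read in *out_len, or returns NULL on failure. */
char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *f;
    long size;
    char *buf;

    assert(path != NULL && out_len != NULL);
    f = fopen(path, "rb");              /* binary: no CR/LF translation */
    if (f == NULL) return NULL;
    if (fseek(f, 0L, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0L, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    buf = malloc((size_t)size + 1);     /* +1 for a terminating NUL */
    if (buf != NULL) {
        *out_len = fread(buf, 1, (size_t)size, f);
        buf[*out_len] = '\0';           /* terminator is not part of the data */
    }
    fclose(f);
    return buf;
}
```

When the buffer is handed to a parser, pass *out_len as the size so that the extra NUL is not treated as document content.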