I'm trying to create a program that filters through speech text, removes any unwanted characters (",", "?", etc.), and then produces a new speech where the words are jumbled based on what words follow or precede them. So for example, if you had the Gettysburg Address:
Four score and seven years ago our fathers brought forth, on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
My program would take that text and put it into a set of strings, i.e. ["Four","score","and","seven",...."continent,"..."Liberty,"..."equal."]. Then it would remove any unwanted characters, like "," or ".", and capitals from each string using C++ .erase and remove. After that, you'd have a filtered set like ["four","score","and","seven",...."continent"..."liberty"..."equal"]
After that, the words would be rearranged into a new coherent, funnier speech, like:
"Seven years ago our fathers conceived on men...", etc.
That was just so you know the scope of this project. My trouble at the moment has to do with either using my iterator properly or null terminators.
#include <iostream>
#include <fstream>
#include <iomanip>
#include <string>
#include <set>
#include <iterator> //iterates through sets
#include <algorithm>
using namespace std;
int main() {
set <string> speechSet;
set <string> ::iterator itr; //forgot what :: means. Declares iterator as set
int sum = 0;
int x;
string data;
ofstream out;
string setString;
ifstream speechFile; //declare output file stream object. Unknown type name
speechFile.open("./MySpeech");
if (!speechFile) {
cerr << "Unable to open file " << endl;
exit(1);
}
char unwantedCharacters[] = ".";
while (!speechFile.eof()) {
speechFile >> data; //speechFile input into data
for (unsigned int i = 0; i < strlen(unwantedCharacters); ++i) {
data.erase((remove(data.begin(), data.end(),
unwantedCharacters[i]), data.end())); //remove doesn't delete.
data.end() - 1 = '\0'; //Reorganizes
cout << data << endl;
}
speechSet.insert(string(data));
}
//Go through each string (word) one at a time and remove "",?, etc.
/*for(itr = speechSet.begin(); itr != speechSet.end(); ++itr){
if(*itr == ".")//if value pointed to by *itr is equal to '.'
itr = speechSet.erase(itr);//erase the value in the set and leave blank
cout << " " << *itr;//print out the blank
else{
cout << " " << *itr;
}
}*/
speechFile.close();
return (0);
}
I keep getting an error that says error: no viable overloaded '='. At first I thought it might be due to .end() not being a member function of a C++ string, but I checked the documentation and it shouldn't be an issue of mismatched data types. Then I thought I might have to set the iterator itr equal to the end of the data:
iterator itr = data.end() - 1;
and then dereference that pointer and set it equal to the null terminator
itr* = '\0';
That removed the overload error, but I still had another error: use of class template 'iterator' requires template arguments. Let me know if any more clarification is needed.
In the for loop, use auto for the iterator so you don't have to spell out its type:
for(auto itr = speechSet.begin(); itr != speechSet.end(); ++itr){
/* I have to find character and remove them, based on the character it has to go inside the if/else if condition. I am facing difficulty in getting inside the else if condition */
#include <iostream>
#include <boost/algorithm/string.hpp>
#include <string>
using namespace std;
int main() {
int fut = 0, spd =0;
std::string symbol = "PGSh/d TWOGK h/d"; //it will contain either 'h/d' or '/'
std::string str = "h/d";
std::string str1 = "/";
if(symbol.find(str)) //if it finds "h/d" then it belongs to future
{
++fut; //even one count is enough
boost::erase_all(symbol, "h/d");
std::cout<<"Future Instrument "<<std::endl;
}
else if(symbol.find(str1)) //if it finds "/" then it belongs to spread
{
++spd; //even one count is enough
boost::erase_all(symbol, "//");
std::cout<<"Spread Instrument "<<std::endl;
}
boost::erase_all(symbol, " ");
boost::to_upper(symbol);
std::cout<<symbol<<std::endl;
return 0;
}
I coded in Borland C++ ages ago, and now I'm trying to understand the "new" (to me) C++11 (I know, we're in 2015 and there's a C++14, but I'm working on a C++11 project).
Now I have several ways to assign a value to a string.
#include <iostream>
#include <string>
int main ()
{
std::string test1;
std::string test2;
test1 = "Hello World";
test2.assign("Hello again");
std::cout << test1 << std::endl << test2;
return 0;
}
They both work. I learned from http://www.cplusplus.com/reference/string/string/assign/ that there are other ways to use assign. But for simple string assignment, which one is better? I have to fill 100+ structs with 8 std::string each, and I'm looking for the fastest mechanism (I don't care about memory, unless there's a big difference).
Both are equally fast, but = "..." is clearer.
If you really want fast though, use assign and specify the size:
test2.assign("Hello again", sizeof("Hello again") - 1); // don't copy the null terminator!
// or
test2.assign("Hello again", 11);
That way, only one allocation is needed. (You could also .reserve() enough memory beforehand to get the same effect.)
I tried benchmarking both ways.
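#include <benchmark/benchmark.h>
#include <string>

static void string_assign_method(benchmark::State& state) {
  std::string str;
  std::string base = "123456789";
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    // (string, pos, len): copy 9 characters starting at position 0.
    // assign(base, 9) would treat 9 as a start position and copy nothing.
    str.assign(base, 0, 9);
  }
}
// Register the function as a benchmark
BENCHMARK(string_assign_method);

static void string_assign_operator(benchmark::State& state) {
  std::string str;
  std::string base = "123456789";
  // Code before the loop is not measured
  for (auto _ : state) {
    str = base;
  }
}
BENCHMARK(string_assign_operator);

// Generates main(); omit if the benchmark harness already provides one
BENCHMARK_MAIN();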
In the graphical comparison, both methods are about equally fast, with the assignment operator coming out slightly ahead.
Use string::assign only when a specific portion of the base string (a start position and length) has to be assigned.
I have an MPI program in which multiple processes read from a file that contains a list of file names. For each file name read, a process reads the corresponding file and counts the frequency of words.
If one of the processes completes this and returns, it blocks in MPI_Barrier(), and the other processes hang as well. On debugging, I could see that the processes still in process_files() never enter the readFile() function. I am unable to figure out why this happens. Please find the code below:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <ctype.h>
#include <string.h>
#include "hash.h"
void process_files(char*, int* , int, hashtable_t* );
void initialize_word(char *c,int size)
{
int i;
for(i=0;i<size;i++)
c[i]=0;
return;
}
char* readFilesList(MPI_File fh, char* file,int rank, int nprocs, char* block, const int overlap, int* length)
{
char *text;
int blockstart,blockend;
MPI_Offset size;
MPI_Offset blocksize;
MPI_Offset begin;
MPI_Offset end;
MPI_Status status;
MPI_File_open(MPI_COMM_WORLD,file,MPI_MODE_RDONLY,MPI_INFO_NULL,&fh);
MPI_File_get_size(fh,&size);
/*Block size calculation*/
blocksize = size/nprocs;
begin = rank*blocksize;
end = begin+blocksize-1;
end+=overlap;
if(rank==nprocs-1)
end = size;
blocksize = end-begin+1;
text = (char*)malloc((blocksize+1)*sizeof(char));
MPI_File_read_at_all(fh,begin,text,blocksize,MPI_CHAR, &status);
text[blocksize]=0; /* last valid index of the (blocksize+1)-byte buffer */
blockstart = 0;
blockend = blocksize;
if(rank!=0)
{
while(text[blockstart]!='\n' && blockstart!=blockend) blockstart++;
blockstart++;
}
if(rank!=nprocs-1)
{
blockend-=overlap;
while(text[blockend]!='\n'&& blockend!=blocksize) blockend++;
}
blocksize = blockend-blockstart;
block = (char*)malloc((blocksize+1)*sizeof(char));
block = memcpy(block, text + blockstart, blocksize);
block[blocksize]=0;
*length = strlen(block);
MPI_File_close(&fh);
return block;
}
void calculate_term_frequencies(char* file, char* text, hashtable_t *hashtable,int rank)
{
printf("Start File %s, rank %d \n\n ",file,rank);
fflush(stdout);
if(strlen(text)!=0||strlen(file)!=0)
{
int i,j;
char w[100];
i=0,j=0;
while(text[i]!=0)
{
if((text[i]>=65&&text[i]<=90)||(text[i]>=97&&text[i]<=122))
{
w[j]=text[i];
j++; i++;
}
else
{
w[j] = 0;
if(j!=0)
{
//ht_set( hashtable, strcat(strcat(w,"#"),file),1);
}
j=0;
i++;
initialize_word(w,100);
}
}
}
return;
}
void readFile(char* filename, hashtable_t *hashtable,int rank)
{
MPI_Status stat;
MPI_Offset size;
MPI_File fx;
char* textFromFile=0;
printf("Start File %zu, rank %d \n\n ",strlen(filename),rank);
fflush(stdout);
if(strlen(filename)!=0)
{
MPI_File_open(MPI_COMM_WORLD,filename,MPI_MODE_RDONLY,MPI_INFO_NULL,&fx);
MPI_File_get_size(fx,&size);
printf("Start File %s, rank %d \n\n ",filename,rank);
fflush(stdout);
textFromFile = (char*)malloc((size+1)*sizeof(char));
MPI_File_read_at_all(fx,0,textFromFile,size,MPI_CHAR, &stat);
textFromFile[size]=0;
calculate_term_frequencies(filename, textFromFile, hashtable,rank);
MPI_File_close(&fx);
}
printf("Done File %s, rank %d \n\n ",filename,rank);
fflush(stdout);
return;
}
void process_files(char* block, int* length, int rank,hashtable_t *hashtable)
{
char s[2];
s[0] = '\n';
s[1] = 0;
char *file;
if(*length!=0)
{
/* get the first file */
file = strtok(block, s);
/* walk through other tokens */
while( file != NULL )
{
readFile(file,hashtable,rank);
file = strtok(NULL, s);
}
}
return;
}
void execute_process(MPI_File fh, char* file, int rank, int nprocs, char* block, const int overlap, int * length, hashtable_t *hashtable)
{
block = readFilesList(fh,file,rank,nprocs,block,overlap,length);
process_files(block,length,rank,hashtable);
}
int main(int argc, char *argv[]){
/*Initialization*/
MPI_Init(&argc, &argv);
MPI_File fh=0;
int rank,nprocs,namelen;
char *block=0;
const int overlap = 70;
char* file = "filepaths.txt";
int *length = (int*)malloc(sizeof(int));
hashtable_t *hashtable = ht_create( 65536 );
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
char processor_name[MPI_MAX_PROCESSOR_NAME];
MPI_Get_processor_name(processor_name, &namelen);
printf("Rank %d is on processor %s\n",rank,processor_name);
fflush(stdout);
execute_process(fh,file,rank,nprocs,block,overlap,length,hashtable);
printf("Rank %d returned after processing\n",rank);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}
The filepaths.txt is a file that contain the absolute file names of normal text files:
eg:
/home/mpiuser/mpi/MPI_Codes/code/test1.txt
/home/mpiuser/mpi/MPI_Codes/code/test2.txt
/home/mpiuser/mpi/MPI_Codes/code/test3.txt
Your readFilesList function is pretty confusing, and I believe it doesn't do what you want it to do, though maybe I just don't understand it correctly. I believe it is supposed to pick a different set of file names out of the list file for each process. It does not do that, but that is not the main problem: even if it did what you want, the subsequent MPI IO would not work.
When reading the files, you open them with MPI_File_open on MPI_COMM_WORLD and read them with MPI_File_read_at_all. Both calls are collective, so all processes have to participate in reading each file. If each process is supposed to read a different file, this obviously is not going to work.
So there are several issues with your implementation. Although I cannot fully explain the behavior you describe, I would first fix these issues before debugging in detail what else might go wrong.
I am under the impression that you want an algorithm along these lines:
1. Read a list of file names
2. Distribute that list of files equally to all processes
3. Have each process work on its own set of files
4. Do something with the data from this processing
And I would suggest trying this with the following approach:
1. Read the list on a single process (no MPI IO)
2. Scatter the list of files to all processes, such that all get around the same amount of work
3. Have each process work on its list of files independently and in serial (serial file access and processing)
4. Some data reduction with MPI, as needed
I believe this would be the best (easiest and fastest) strategy in your scenario. Note that no MPI IO is involved here at all. I don't think some complicated distributed reading of the file list in the first step would give any advantage here, and in the actual processing it would actually be harmful. The more independent your processes are, the better your scalability usually is.
I'm trying to solve the SPOJ problem STAVATAR: http://www.spoj.com/problems/STAVATAR/.
I have tried many test cases, including randomly generated ones, but I still get WA (wrong answer).
I am unable to find the flaw in my algorithm.
#include<cstring>
#include<iostream>
#include<cstdio>
using namespace std;
char a[1000010],b[1000010];
int d[1000010];
int main()
{
int n;
scanf("%d",&n);
scanf("%s",a);
scanf("%s",b);
int k;
scanf("%d",&k);
for(int i=0;i<k;i++)
{
int x,y;
scanf("%d %d",&x,&y);
++d[x],++d[y+1];
}
long long sum=0;
for(int i=0;i<n;i++)
{
sum+=d[i];
if(sum%2!=0)
{
char t;
t=a[i];
a[i]=b[i];
b[i]=t;
}
}
printf("%s\n",a);
printf("%s\n",b);
return 0;
}
If you pay attention to the Constraints, you might note the part at the end:
'....\t\r\x0b\x0c'
These are the printable whitespace characters.
Now, coming back to your solution. You read the strings using scanf with %s, which reads only up to the first whitespace character it encounters, and that could be any of '\t', '\x0b', '\x0c'. But in this specific question, a string should terminate only at the '\n' character.
For example :
If the string is :
ab\tcd
which will look like
ab cd
in a CLI. The question demands that the first string be ab cd, whereas you are taking ab as the first string and cd as the second.
I guess you now understand why this solution gets a WA.
Also, you might find this function helpful.
Edit:
One could also use scanf this way to perform the same task: scanf("%[^\n]", string); (the scanset %[^\n] is a complete conversion on its own, so no trailing s is needed).