Last element of a for loop

I'm splitting a document by a string delimiter in C++.
This is minimal Python code to demonstrate the problem. la is split on 'x' to get (a,b,b) and (c,d); only the elements between two x's, or between an x and the end of the file, are recorded.
la = ['a','x','a','b','b','x','c','d']
out = []
tmp = []
inside = False
for a in la:
    if a == "x":
        if inside:
            out.append(tmp)
            tmp = []
        inside = True
        continue
    if inside:
        tmp.append(a)
out.append(tmp)
for a in out:
    print a
There is code duplication here: out.append(tmp) is repeated after the loop to handle the last element. How do I move it inside the loop?
(out.append(tmp) actually stands for a large piece of code, and writing it in several places is error-prone.)
P.S. Since the actual code is in C++, the solution must not rely on any Python-specific function.
Minimal C++ code, reading from a stringstream:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
int main() {
    // your code goes here
    stringstream instream("a x b c d x c d");
    vector<string> result;
    string word, content;
    while(getline(instream, word, ' ')) {
        if (word == "x") {
            result.push_back(content);
            content = "";
            continue;
        }
        content += word;
    }
    return 0;
}

Not sure why you would not just append outside the loop, but you can compare the index against the length inside the loop to catch the last element:
out = []
tmp = []
for ind, ele in enumerate(la):
    if ele == "x":
        if tmp:
            out.append(tmp)
        tmp = []
    elif ind == len(la) - 1:
        tmp.append(ele)
        out.append(tmp)
    else:
        tmp.append(ele)
You can use range in place of enumerate.
If you want to use continue you can remove the else:
for ind, ele in enumerate(la):
    if ele == "x":
        if tmp:
            out.append(tmp)
        tmp = []
        continue
    elif ind == len(la) - 1:
        out.append(tmp)
    tmp.append(ele)
I have zero experience with C++, but using stringstream.eof to catch the end of the stream might do what you want:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
int main() {
    // your code goes here
    stringstream instream("x a x b c d x c d x");
    vector<string> result;
    string word, content;
    while(true) {
        getline(instream, word, ' ');
        if (instream.eof()){
            if (word != "x"){
                content += word;
            }
            cout << content << "\n";
            break;
        }
        if (word == "x") {
            result.push_back(content);
            cout << content << "\n";
            content = "";
            continue;
        }
        content += word;
    }
    return 0;
}
Output:
a
bcd
cd
You also need to handle the case where the first word is x, which would otherwise produce an empty string.
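Another way to keep the flush in one place is a sentinel: feed the loop one extra delimiter so the last run ends like every other run. The sketch below is only an illustration in the question's demo language; split_between is a made-up name, and it uses nothing beyond a plain loop, so it should carry over to C++ (for example by appending " x" to the text before building the stringstream, which is an assumption about how the input is held).

```python
def split_between(items, delim="x"):
    """Collect the runs that follow each delimiter.

    Items before the first delimiter are dropped, as in the question.
    split_between is a hypothetical helper name, not from the original post.
    """
    out = []
    tmp = []
    inside = False
    # One extra delimiter at the end closes the final run inside the loop.
    for a in list(items) + [delim]:
        if a == delim:
            if inside:
                out.append(tmp)  # the only place a finished run is emitted
            tmp = []
            inside = True
            continue
        if inside:
            tmp.append(a)
    return out


la = ['a', 'x', 'a', 'b', 'b', 'x', 'c', 'd']
for run in split_between(la):
    print(run)
```

One difference from the original: if the input contains no delimiter at all, this version returns an empty list, whereas the original appends one empty run after the loop.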

Related

Why is this not counting correctly?

I am reading text from a text file and need to know the number of characters in the file in total. I thought this should work but it always seems to be overcounting. For example I typed this into my text file:
thisisatestthisisa
thisisa
And the program returned a total of 32.
#include <iostream>
#include <fstream>
#include <string>
#include <ostream>
using namespace std;
int main() {
    fstream inFile;
    string inputString;
    inFile.open("text.txt", ios::in);
    unsigned int total = 0;
    if (inFile) {
        while (inFile)
        {
            getline(inFile, inputString);
            unsigned int tempStringLength = inputString.length();
            total += tempStringLength;
        }
        cout << "total is: " << total << endl;
    }
    else {
        cerr << "Unable to open file text.txt";
        exit(1);
    }
    return 0;
}
You are double-counting the last line in the file.
Because you test while (inFile) instead of while (getline(inFile, inputString)), the failed read is only detected after its stale result has already been counted.
Walking through the loop makes this obvious:
Iteration 1:
unsigned int total = 0;
//...
while (inFile) //True
{
    getline(inFile, inputString); //inFile: True, inputString: thisisatestthisisa
    unsigned int tempStringLength = inputString.length(); //18
    total += tempStringLength; //18
}
//...
Iteration 2:
//...
while (inFile) //True
{
    getline(inFile, inputString); //inFile: True, inputString: thisisa
    unsigned int tempStringLength = inputString.length(); //7
    total += tempStringLength; //25
}
//...
Iteration 3:
//...
while (inFile) //True
{
    getline(inFile, inputString); //inFile: EOF, inputString: thisisa (not modified)
    unsigned int tempStringLength = inputString.length(); //7
    total += tempStringLength; //32
}
//...
inFile now evaluates to false because the read at EOF failed, so your loop terminates and prints 32 as the length.
Long story short: don't use the stream state as a loop terminator. Use the actual read, either getline or operator>>, depending on the situation.

The same result when I run the program

#include <iostream>
using namespace std;
int main()
{
    bool result;
    char text[1000];
    cin>>text;
    int len=sizeof(text);
    for(int i = 0 ;i<len; ++i)
    {
        if(text[i]=='t' && text[i+1]=='r' && text[i+2]=='u' && text[i+3]=='e')
            result = true;
        else if(text[i]=='f' && text[i+1]=='a' && text[i+2]=='l' && text[i+3]=='s' && text[i+4]=='e')
            result = false;
    }
    for(int i = 0 ;i<len; ++i)
    {
        if(text[i]=='n' && text[i+1]=='o' && text[i+2]=='t')
            result = !result;// i think here is the problem
    }
    if(result == true)
        cout<<"true"<<endl;
    else if(result == false)
        cout<<"false"<<endl;
    return 0;
}
The exercise:
A boolean value can be either True or False. Given a string of fewer than 1000 characters consisting of a number of space-separated not directives terminated by a True or False value, evaluate the boolean expression.
But when I run the program, the result is always true.
Can you please tell me where the problem is?
Why don't you just use what is already there?
#include <iostream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
int main()
{
    bool result;
    // Read the line
    std::string line;
    std::getline(std::cin, line);
    // Split the line at spaces (https://stackoverflow.com/a/237280/1944004)
    std::istringstream iss(line);
    std::vector<std::string> tokens{std::istream_iterator<std::string>{iss}, std::istream_iterator<std::string>{}};
    // Convert last element to bool
    if (tokens.back() == "true") result = true;
    else if (tokens.back() == "false") result = false;
    else throw std::invalid_argument("The last argument is not a boolean!");
    // Remove the last element
    tokens.pop_back();
    // Loop over the nots
    for (auto const& t : tokens)
    {
        if (t == "not") result = !result;
        else throw std::invalid_argument("Negation has to be indicated by 'not'!");
    }
    // Output the result
    std::cout << std::boolalpha << result << '\n';
}
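For comparison, here is a rough sketch of the same token-based evaluation in Python. The name evaluate is made up for illustration, and accepting the final value case-insensitively (True or true) is an assumption, since the exercise writes True/False while the code above compares against lowercase.

```python
def evaluate(expression):
    """Evaluate space-separated 'not' directives ending in True or False.

    evaluate is a hypothetical name; case-insensitive matching of the
    final token is an assumption, not something the exercise specifies.
    """
    tokens = expression.split()
    if not tokens or tokens[-1].lower() not in ("true", "false"):
        raise ValueError("the last token is not a boolean")
    result = tokens[-1].lower() == "true"
    for token in tokens[:-1]:
        if token != "not":
            raise ValueError("negation has to be written as 'not'")
        result = not result  # each 'not' flips the value once
    return result


print(evaluate("not not True"))
```

Splitting the whole line into tokens avoids both problems in the question's code: reading only the first word, and scanning all 1000 bytes of the buffer instead of the text that was actually read.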

Algorithm to print all permutations with repetition of numbers

I have designed an algorithm that prints all the permutations with repetition of numbers, but it has a flaw: it works only if the characters of the string are unique.
Can someone help me extend the algorithm to the case where the characters of the string may not be unique?
My code so far:
#include<cstdio>
#include<cstdlib>
#include<cstring>
#include<climits>
#include<iostream>
using namespace std;
void _perm(char *arr, char*result, int index)
{
    static int count = 1;
    if (index == strlen(arr))
    {
        cout << count++ << ". " << result << endl;
        return;
    }
    for (int i = 0; i < strlen(arr); i++)
    {
        result[index] = arr[i];
        _perm(arr, result, index + 1);
    }
}
int compare(const void *a, const void *b)
{
    return (*(char*)a - *(char*)b);
}
void perm(char *arr)
{
    int n = strlen(arr);
    if (n == 0)
        return;
    qsort(arr, n, sizeof(char), compare);
    char *data = new char[n];
    _perm(arr, data, 0);
    free(data);
    return;
}
int main()
{
    char arr[] = "BACD";
    perm(arr);
    return 0;
}
I am printing the output strings in lexicographically sorted way.
I am referring to example 3 from this page:
http://www.vitutor.com/statistics/combinatorics/permutations_repetition.html
Thanks.
Your code doesn't print permutations, but four draws from the string pool with repetition. It will produce 4^4 == 256 combinations, one of which is "AAAA".
The code Karnuakar linked to will give you permutations of a string, but without distinguishing between the multiple occurrences of certain letters. You need some means to prevent recursing with the same letter in each recursion step. In C++, this can be done with a set.
The example code below uses a typical C string, but uses the terminating '\0' to detect the end. The C-string functions from <cstring> are not needed. The output will not be sorted unless the original string was sorted.
#include <iostream>
#include <algorithm>
#include <set>
using namespace std;
void perm(char *str, int index = 0)
{
    std::set<char> used;
    char *p = str + index;
    char *q = p;
    if (*p == '\0') {
        std::cout << str << std::endl;
        return;
    }
    while (*q) {
        if (used.find(*q) == used.end()) {
            std::swap(*p, *q);
            perm(str, index + 1);
            std::swap(*p, *q);
            used.insert(*q);
        }
        q++;
    }
}
int main()
{
    char arr[] = "AAABB";
    perm(arr);
    return 0;
}
This will produce 5! == 120 permutations for "ABCDE", but only 5! / (2! 3!) == 10 unique permutations for "AAABB". It will also create the 1260 permutations from the linked exercise.
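To make the counts easy to check, here is a sketch of the same swap-and-skip idea in Python. unique_permutations is a made-up name, and collecting the results in a list instead of printing them is a choice made only for this illustration.

```python
def unique_permutations(text):
    """Return every distinct ordering of the characters in text.

    Hypothetical helper mirroring the swap-based recursion above: at each
    position, a character value is tried only once.
    """
    chars = list(text)
    results = []

    def recurse(index):
        if index == len(chars):
            results.append("".join(chars))
            return
        used = set()  # character values already placed at this position
        for q in range(index, len(chars)):
            if chars[q] in used:
                continue
            used.add(chars[q])
            chars[index], chars[q] = chars[q], chars[index]
            recurse(index + 1)
            chars[index], chars[q] = chars[q], chars[index]  # undo the swap

    recurse(0)
    return results


print(len(unique_permutations("AAABB")))
print(len(unique_permutations("ABCDE")))
```

As with the C++ version, the results are not guaranteed to come out in sorted order; sort the returned list if lexicographic output is needed.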

Filter only digit sequences containing a given set of digits

I have a large list of digit strings like this one. The individual strings are relatively short (say less than 50 digits).
data = [
    '300303334',
    '53210234',
    '123456789',
    '5374576807063874'
]
I need to find an efficient data structure (speed first, memory second) and an algorithm that returns only those strings that are composed of a given set of digits.
Example results:
filter(data, [0,3,4]) = ['300303334']
filter(data, [0,1,2,3,4,5]) = ['300303334', '53210234']
The data list will usually fit into memory.
For each digit, precompute a postings list of the indices of the strings that don't contain that digit.
postings = [[] for _ in xrange(10)]
for i, d in enumerate(data):
    for j in xrange(10):
        digit = str(j)
        if digit not in d:
            postings[j].append(i)
Now, to find all strings that contain, for example, only the digits [1, 3, 5], you can intersect the postings lists for the other digits (i.e. 0, 2, 4, 6, 7, 8, 9).
def intersect_postings(p0, p1):
    i0, i1 = next(p0), next(p1)
    while True:
        if i0 == i1:
            yield i0
            i0, i1 = next(p0), next(p1)
        elif i0 < i1: i0 = next(p0)
        else: i1 = next(p1)
def find_all(digits):
    p = None
    for d in xrange(10):
        if d not in digits:
            if p is None: p = iter(postings[d])
            else: p = intersect_postings(p, iter(postings[d]))
    return (data[i] for i in p) if p else iter(data)
print list(find_all([0, 3, 4]))
print list(find_all([0, 1, 2, 3, 4, 5]))
A string can be encoded by a 10-bit number. There are 2^10, or 1,024 possible values.
So create a dictionary that uses an integer for a key and a list of strings for the value.
Calculate the value for each string and add that string to the list of strings for that value.
General idea:
Dictionary Lookup;
for each (string in list)
    value = 0;
    for each character in string
        set bit N in value, where N is the character (0-9)
    Lookup[value] += string // adds string to list for this value in dictionary
Then, to get a list of the strings that match your criteria, just compute the value and do a direct dictionary lookup.
So if the user asks for strings that contain only 3, 5, and 7:
value = (1 << 3) | (1 << 5) | (1 << 7);
list = Lookup[value];
Note that, as Matt pointed out in a comment below, this will only return strings that contain all three digits. So, for example, it wouldn't return 37. That seems like a fatal flaw to me.
Edit
If the number of symbols you have to deal with is very large, then the number of possible combinations becomes too large for this solution to be practical.
With a large number of symbols, I'd recommend an inverted index as suggested in the comments, combined with a secondary filter that removes the strings that contain extraneous digits.
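A rough Python sketch of the lookup table described above, with one change that is an assumption on top of the answer: instead of a single exact-key lookup, it scans the at most 1,024 keys and keeps every group whose mask has no bit outside the allowed digits, so a string such as 37 is also returned for the filter 3, 5, 7. All function names here are hypothetical.

```python
from collections import defaultdict


def digit_mask(digits):
    """Return an integer with bit d set for every digit d that occurs."""
    mask = 0
    for d in digits:
        mask |= 1 << int(d)  # works for '3' as well as 3
    return mask


def build_lookup(strings):
    """Group the strings by the bitmask of the digits they contain."""
    lookup = defaultdict(list)
    for s in strings:
        lookup[digit_mask(s)].append(s)
    return lookup


def composed_of(lookup, allowed):
    """Return the strings whose digits all belong to allowed."""
    allowed_mask = digit_mask(allowed)
    result = []
    for mask, group in lookup.items():
        if (mask & ~allowed_mask) == 0:  # no digit outside the allowed set
            result.extend(group)
    return result


lookup = build_lookup(['300303334', '53210234', '123456789', '5374576807063874'])
print(sorted(composed_of(lookup, [0, 1, 2, 3, 4, 5])))
```

The scan costs a fixed number of mask comparisons per query regardless of how many strings there are, which only stays practical while the alphabet is small, as the edit above points out.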
Consider a function f which constructs a bitmask for each string with bit i set if digit i is in the string.
For example,
f('0') = 0b0000000001
f('00') = 0b0000000001
f('1') = 0b0000000010
f('1100') = 0b0000000011
Then I suggest storing a list of strings for each bitmask.
For example,
Bitmask 0b0000000001 -> ['0','00']
Once you have prepared this data structure (which is the same size as your original list), you can then easily access all the strings for a particular filter by accessing all lists where the bitmask is a subset of the digits in your filter.
So for your example of filter [0,3,4] you would return the lists from:
Strings containing just 0
Strings containing just 3
Strings containing just 4
Strings containing 0 and 3
Strings containing 0 and 4
Strings containing 3 and 4
Strings containing 0 and 3 and 4
Example Python Code
from collections import defaultdict
import itertools
raw_data = [
    '300303334',
    '53210234',
    '123456789',
    '5374576807063874'
]
def preprocess(raw_data):
    data = defaultdict(list)
    for s in raw_data:
        bitmask = 0
        for digit in s:
            bitmask |= 1<<int(digit)
        data[bitmask].append(s)
    return data
def filter(data,mask):
    for r in range(len(mask)):
        for m in itertools.combinations(mask,r+1):
            bitmask = sum(1<<digit for digit in m)
            for s in data[bitmask]:
                yield s
data = preprocess(raw_data)
for a in filter(data, [0,1,2,3,4,5]):
    print a
Just for kicks, I have coded up Jim's lovely algorithm and the Perl is here if anyone wants to play with it. Please do not accept this as an answer or anything, pass all credit to Jim:
#!/usr/bin/perl
use strict;
use warnings;
my $Debug=1;
my $Nwords=1000;
my ($word,$N,$value,$i,$j,$k);
my (@dictionary,%Lookup);
################################################################################
# Generate "words" with random number of characters 5-30
################################################################################
print "DEBUG: Generating $Nwords word dictionary\n" if $Debug;
for($i=0;$i<$Nwords;$i++){
    $j = rand(25) + 5; # length of this word
    $word="";
    for($k=0;$k<$j;$k++){
        $word = $word . int(rand(10));
    }
    $dictionary[$i]=$word;
    print "$word\n" if $Debug;
}
# Add some obvious test cases
$dictionary[++$i]="0" x 50;
$dictionary[++$i]="1" x 50;
$dictionary[++$i]="2" x 50;
$dictionary[++$i]="3" x 50;
$dictionary[++$i]="4" x 50;
$dictionary[++$i]="5" x 50;
$dictionary[++$i]="6" x 50;
$dictionary[++$i]="7" x 50;
$dictionary[++$i]="8" x 50;
$dictionary[++$i]="9" x 50;
$dictionary[++$i]="0123456789";
################################################################################
# Encode words
################################################################################
for $word (@dictionary){
    $value=0;
    for($i=0;$i<length($word);$i++){
        $N=substr($word,$i,1);
        $value |= 1 << $N;
    }
    push(@{$Lookup{$value}},$word);
    print "DEBUG: $word encoded as $value\n" if $Debug;
}
################################################################################
# Do lookups
################################################################################
while(1){
    print "Enter permitted digits, separated with commas: ";
    my $line=<STDIN>;
    my @digits=split(",",$line);
    $value=0;
    for my $d (@digits){
        $value |= 1<<$d;
    }
    print "Value: $value\n";
    print join(", ",@{$Lookup{$value}}),"\n\n" if defined $Lookup{$value};
}
I like Jim Mischel's approach. It has pretty efficient look up and bounded memory usage. Code in C follows:
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <readline/readline.h>
#include <readline/history.h>
enum {
    zero = '0',
    nine = '9',
    numbers = nine - zero + 1,
    masks = 1 << numbers,
};
typedef uint16_t mask;
struct list {
    char *s;
    struct list *next;
};
typedef struct list list_cell;
typedef struct list *list;
static inline int is_digit(char c) { return c >= zero && c <= nine; }
static inline mask char2mask(char c) { return 1 << (c - zero); }
static inline mask add_char2mask(mask m, char c) {
    return m | (is_digit(c) ? char2mask(c) : 0);
}
static inline int is_set(mask m, mask n) { return (m & n) != 0; }
static inline int is_set_char(mask m, char c) { return is_set(m, char2mask(c)); }
static inline int is_submask(mask sub, mask m) { return (sub & m) == sub; }
static inline char *sprint_mask(char buf[11], mask m) {
    char *s = buf;
    char i;
    for(i = zero; i <= nine; i++)
        if(is_set_char(m, i)) *s++ = i;
    *s = 0;
    return buf;
}
static inline mask get_mask(char *s) {
    mask m=0;
    for(; *s; s++)
        m = add_char2mask(m, *s);
    return m;
}
static inline int is_empty(list l) { return !l; }
static inline list insert(list *l, char *s) {
    list cell = (list)malloc(sizeof(list_cell));
    cell->s = s;
    cell->next = *l;
    return *l = cell;
}
static void *foreach(void *f(char *, void *), list l, void *init) {
    for(; !is_empty(l); l = l->next)
        init = f(l->s, init);
    return init;
}
struct printer_state {
    int first;
    FILE *f;
};
static void *prin_list_member(char *s, void *data) {
    struct printer_state *st = (struct printer_state *)data;
    if(st->first) {
        fputs(", ", st->f);
    } else
        st->first = 1;
    fputs(s, st->f);
    return data;
}
static void print_list(list l) {
    struct printer_state st = {.first = 0, .f = stdout};
    foreach(prin_list_member, l, (void *)&st);
    putchar('\n');
}
static list *init_lu(void) { return (list *)calloc(sizeof(list), masks); }
static list *insert2lu(list lu[masks], char *s) {
    mask i, m = get_mask(s);
    if(m) // skip string without any number
        for(i = m; i < masks; i++)
            if(is_submask(m, i))
                insert(lu+i, s);
    return lu;
}
int usage(const char *name) {
    fprintf(stderr, "Usage: %s filename\n", name);
    return EXIT_FAILURE;
}
#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)
static inline void chomp(char *s) { if( (s = strchr(s, '\n')) ) *s = '\0'; }
list *load_file(FILE *f) {
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    list *lu = init_lu();
    for(; (read = getline(&line, &len, f)) != -1; line = NULL) {
        chomp(line);
        insert2lu(lu, line);
    }
    return lu;
}
void read_reqs(list *lu) {
    char *line;
    char buf[11];
    for(; (line = readline("> ")); free(line))
        if(*line) {
            add_history(line);
            mask m = get_mask(line);
            printf("mask: %s\nstrings: ", sprint_mask(buf, m));
            print_list(lu[m]);
        };
    putchar('\n');
}
int main(int argc, const char* argv[] ) {
    const char *name = argv[0];
    FILE *f;
    list *lu;
    if(argc != 2) return usage(name);
    f = fopen(argv[1], "r");
    if(!f) handle_error("open");
    lu = load_file(f);
    fclose(f);
    read_reqs(lu);
    return EXIT_SUCCESS;
}
To compile, use (the library is listed after the source file so that the linker can resolve its symbols):
gcc -o digitfilter digitfilter.c -lreadline
And test run:
$ cat data.txt
300303334
53210234
123456789
5374576807063874
$ ./digitfilter data.txt
> 034
mask: 034
strings: 300303334
> 0,1,2,3,4,5
mask: 012345
strings: 53210234, 300303334
> 0345678
mask: 0345678
strings: 5374576807063874, 300303334
Put the digits of each value into a set, e.g. '300303334' = {3, 0, 4}.
Since the length of your data items is bounded by a constant (50), you can build each set in O(1) time using a Java HashSet, so this phase takes O(n) overall.
For each filter set, use containsAll() of HashSet to check whether each data item's set is a subset of your filter. That takes O(n) per filter.
Overall this takes O(m*n), where n is the number of data items and m is the number of filters.
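A minimal sketch of the same idea, written in Python rather than Java to match the rest of the examples in this thread. The function names are made up for illustration, and the subset test digits <= allowed_set plays the role of containsAll().

```python
def build_digit_sets(strings):
    """Pair every string with the set of digit characters it contains."""
    return [(s, frozenset(s)) for s in strings]


def filter_by_digits(digit_sets, allowed):
    """Keep the strings whose digit set is a subset of allowed."""
    allowed_set = frozenset(str(d) for d in allowed)
    return [s for s, digits in digit_sets if digits <= allowed_set]


data = ['300303334', '53210234', '123456789', '5374576807063874']
digit_sets = build_digit_sets(data)  # done once, O(n)
print(filter_by_digits(digit_sets, [0, 3, 4]))
```

Each query still visits every item, so this is the simplest of the approaches here rather than the fastest.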

Reading a text file using VC++

I need to read a text file that looks, for example, like this:
8.563E+002 2.051E+004 4.180E-004 7.596E-001 5.260E-005 6.898E-002 1.710E-001 8.053E-011 2.686E-013 8.650E-012
Each line contains 10 values in scientific notation, like the ones above. There is one such line for every grid point in the file. The X index varies most rapidly, then Y, then Z. The first line in the file refers to element (0,0,0), the second line (the second group of 10 values) to element (1,0,0), and the last line to element (599,247,247).
I don't know how to write the code for this in Visual C++. What I know is that I have to read the file line by line, where each line is made up of 10 values, tokenize it, and then generate the x, y, z indices for each line until the end of the file. I understand the concept, but I don't know how to code it in Visual C++. I need to submit it as my homework, so I would really welcome any help. Thanks.
The core part can look like:
std::ifstream is("test.txt");
std::vector<double> numbers;
for(;;) {
    double number;
    is >> number;
    if (!is)
        break;
    numbers.push_back(number);
}
I do not have MSVC here, only GCC 4.3. I hope this code helps:
#include <algorithm> // std::copy
#include <iostream>
#include <fstream>
#include <list>
#include <string>
#include <iterator>
using namespace std;
class customdata
{
    friend istream& operator>>(istream& in, customdata& o);
    friend ostream& operator<<(ostream& out, const customdata& i);
public:
    customdata()
        : x(0), y(0), z(0)
    {}
    customdata(const customdata& o)
        : x(o.x), y(o.y), z(o.z)
    {}
    customdata& operator=(const customdata& o)
    {
        if (this != &o)
        {
            x = o.x;
            y = o.y;
            z = o.z;
        }
        return *this;
    }
private:
    long double x, y, z;
};
istream& operator>>(istream& in, customdata& o)
{
    in >> o.x >> o.y >> o.z;
    return in;
}
ostream& operator<<(ostream& out, const customdata& i)
{
    out << "x=" << i.x << " y=" << i.y << " z=" << i.z;
    return out;
}
// Usage: yourexec <infile>
int main(int argc, char** argv)
{
    int exitcode=0;
    if(argc > 1)
    {
        ifstream from(argv[1]);
        if (!from)
        {
            cerr << "cannot open input file " << argv[1] << endl;
            exitcode=1;
        }
        else
        {
            list<customdata> mydata;
            copy(istream_iterator<customdata>(from), istream_iterator<customdata>(), back_inserter(mydata));
            if(mydata.empty())
            {
                cerr << "corrupt input data" << endl;
                exitcode=3;
            }
            else
                copy(mydata.begin(), mydata.end(), ostream_iterator<customdata>(cout, "\n"));
        }
    }
    else
    {
        cerr << "insufficient calling parameters" << endl;
        exitcode=2;
    }
    return exitcode;
}
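Neither snippet above derives the grid indices, so here is a small sketch of that step in Python; the same integer arithmetic carries over to C++ directly. read_grid and its parameters are hypothetical names, and the grid size (600 by 248 in X and Y) is only inferred from the last element (599,247,247) mentioned in the question.

```python
def read_grid(lines, nx, ny, values_per_line=10):
    """Yield ((x, y, z), values) for each non-empty line, X varying fastest.

    read_grid is a made-up helper; lines can be an open file or any
    iterable of text lines.
    """
    index = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != values_per_line:
            raise ValueError("expected %d values, got %d"
                             % (values_per_line, len(fields)))
        values = [float(f) for f in fields]
        x = index % nx               # X changes on every line
        y = (index // nx) % ny       # Y changes after nx lines
        z = index // (nx * ny)       # Z changes after nx * ny lines
        yield (x, y, z), values
        index += 1


sample = ["8.563E+002 2.051E+004 4.180E-004 7.596E-001 5.260E-005 "
          "6.898E-002 1.710E-001 8.053E-011 2.686E-013 8.650E-012"]
for coords, values in read_grid(sample, nx=600, ny=248):
    print(coords, values[0])
```

For a real file, pass the open file object instead of the sample list, for example read_grid(open("test.txt"), 600, 248).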

Resources