how can we count or compare strings? [closed] - c++11

I want to read one string at a time and count how many times each token between the commas appears: if A appears, its count goes up by one, and if B appears, a separate count goes up by one. Then I want to do the same for a second string, and if a token appears more often there than in the first string (twice, in the example below), save it.
Ex:
string num1 = "A, B , C, AB, AC";
string num2 = "A, B, C , AB, A, C, AC, AB";
istringstream uc(num2);
string num3;
while(getline(uc,num3,',')) // get the part of the string up to the next ','
{
}
Result, since they appear twice:
C, AB, A

First, I'd create a function that counts the number of occurrences of each word in the string. I'd probably store that information in a std::map, since that's fairly convenient.
Then, I'd simply iterate through the counts for num2, and if it's greater than the counts for num1, I'd print the string.
It might look something like this:
#include <iostream>
#include <map>
#include <regex>
#include <string>
std::map<std::string, int> StringCounts(std::string input) {
static const std::regex re(" *, *");
std::map<std::string, int> counts;
for (std::sregex_token_iterator it(input.begin(), input.end(), re, -1);
it != std::sregex_token_iterator();
++it)
counts[*it]++;
return counts;
}
int main() {
const std::string num1 = "A, B , C, AB, AC";
const std::string num2 = "A, B, C , AB, A, C, AC, AB";
auto counts1 = StringCounts(num1);
auto counts2 = StringCounts(num2);
for (auto pair : counts2) {
const std::string &word = pair.first;
if (counts2[word] > counts1[word])
std::cout << word << ", ";
}
std::cout << "\n";
}
Which outputs:
A, AB, C,
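If we cared about performance, we might note that every counts1[word] and counts2[word] inside the loop is a separate O(log n) map lookup (and counts1[word] inserts an entry for each word it does not yet contain). We could rewrite that loop in an O(n) manner, but I'll leave that as an exercise for the reader.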
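For what it's worth, here is a minimal sketch of one way that exercise could go. It assumes both counts are kept in std::map (so both are sorted by key) and walks them side by side in a single pass. The name WordsThatIncreased is made up for this illustration; it is not part of the answer above.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns the words whose count in `newer` exceeds
// their count in `older`, walking both sorted maps in one pass.
std::vector<std::string> WordsThatIncreased(
    const std::map<std::string, int>& older,
    const std::map<std::string, int>& newer) {
  std::vector<std::string> result;
  auto old_it = older.begin();
  for (const auto& entry : newer) {
    // Advance the older-map cursor until it reaches or passes this word.
    while (old_it != older.end() && old_it->first < entry.first)
      ++old_it;
    // A word missing from `older` counts as zero occurrences.
    const int old_count =
        (old_it != older.end() && old_it->first == entry.first)
            ? old_it->second
            : 0;
    if (entry.second > old_count)
      result.push_back(entry.first);
  }
  return result;
}
```

Calling it as WordsThatIncreased(counts1, counts2) would replace the printing loop, and unlike operator[] it never inserts anything into either map.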


Why the move assignment of std::initializer_list was not blocked?

It is clear that std::initializer_list is not an actual container.
The standard defines clearly what you can and cannot do with std::initializer_list.
But why does the language keep the option of doing foolish things, like assigning a temporary std::initializer_list into another - when it could have been easily blocked with =delete on std::initializer_list's move assignment operator?
Here is a broken code example, that compiles:
#include <initializer_list>
#include <iostream>
void foo(std::initializer_list<int> v) {
    std::cout << *v.begin() << std::endl;
}
int main() {
    int a = 1, b = 2, c = 3;
    auto val = {a, b, c}; // ok, extending the lifetime of {a, b, c}
    foo(val); // prints ok
    int i = 7;
    val = {i}; // the temporary array behind {i} dies at the end of this statement
    foo(val); // prints garbage...
}
Output, courtesy of @nwp:
1
-365092848

Finding every possible word out of a bigger word [closed]

Hi, I'm looking for an algorithm to extract every possible word out of a single word in C++.
For example, from the word "overflow" I can get these: "love", "flow", "for", "row", "over"...
So how can I get only valid English words efficiently?
Note: I have a dictionary, a big word list.
I can't think of how to do this without brute-forcing it with all the permutations.
Something like this:
#include <string>
#include <algorithm>
int main()
{
    using size_type = std::string::size_type;
    std::string word = "overflow";
    // std::next_permutation only visits every permutation if it starts from sorted order
    std::sort(word.begin(), word.end());
    // examine every permutation of the letters contained in word
    do
    {
        // examine each non-empty prefix of this permutation
        for(size_type s = 1; s <= word.size(); ++s)
        {
            std::string sub = word.substr(0, s);
            // look up sub in a dictionary here...
        }
    }
    while(std::next_permutation(word.begin(), word.end()));
    return 0;
}
I can think of 2 ways to speed this up.
1) Keep a check on substrings of a given permutation already tried to avoid unnecessary dictionary lookups (std::set or std::unordered_set maybe).
2) Cache popular results, keeping the most frequently requested words (std::map or std::unordered_map perhaps).
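A rough sketch of the first idea, assuming the dictionary is already loaded into a std::unordered_set. The function name FindWordsByPermutation and the `tried` set are invented for this illustration.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <unordered_set>

// Hypothetical helper: brute-force search that remembers every prefix it has
// already tried, so each candidate is looked up in the dictionary only once.
std::set<std::string> FindWordsByPermutation(
    std::string letters, const std::unordered_set<std::string>& dictionary) {
  std::set<std::string> found;
  std::unordered_set<std::string> tried;
  std::sort(letters.begin(), letters.end());  // start from the first permutation
  do {
    for (std::string::size_type len = 1; len <= letters.size(); ++len) {
      std::string sub = letters.substr(0, len);
      if (!tried.insert(sub).second)
        continue;  // this prefix was already looked up
      if (dictionary.count(sub))
        found.insert(sub);
    }
  } while (std::next_permutation(letters.begin(), letters.end()));
  return found;
}
```

This only removes repeated lookups; the number of permutations visited still grows factorially with the word length.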
NOTE:
It turns out that even after adding caching at various levels, this is indeed a very slow algorithm for larger words.
However, the following uses a much faster algorithm:
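#include <set>
#include <string>
#include <cctype>
#include <cstring>
#include <fstream>
#include <iostream>
#include <algorithm>
#define con(m) std::cout << m << '\n'
std::string& lower(std::string& s)
{
    // std::tolower needs an unsigned char value to avoid undefined behaviour
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}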
std::string& trim(std::string& s)
{
static const char* t = " \t\n\r";
s.erase(s.find_last_not_of(t) + 1);
s.erase(0, s.find_first_not_of(t));
return s;
}
void usage()
{
con("usage: anagram [-p] -d <word-file> -w <word>");
con(" -p - (optional) find only perfect anagrams.");
con(" -d <word-file> - (required) A file containing a list of possible words.");
con(" -w <word> - (required) The word to find anagrams of in the <word-file>.");
}
int main(int argc, char* argv[])
{
std::string word;
std::string wordfile;
bool perfect_anagram = false;
for(int i = 1; i < argc; ++i)
{
if(!strcmp(argv[i], "-p"))
perfect_anagram = true;
else if(!strcmp(argv[i], "-d"))
{
if(!(++i < argc))
{
usage();
return 1;
}
wordfile = argv[i];
}
else if(!strcmp(argv[i], "-w"))
{
if(!(++i < argc))
{
usage();
return 1;
}
word = argv[i];
}
}
if(wordfile.empty() || word.empty())
{
usage();
return 1;
}
std::ifstream ifs(wordfile);
if(!ifs)
{
con("ERROR: opening dictionary: " << wordfile);
return 1;
}
// for analyzing the relevant characters and their
// relative abundance
std::string sorted_word = lower(word);
std::sort(sorted_word.begin(), sorted_word.end());
std::string unique_word = sorted_word;
unique_word.erase(std::unique(unique_word.begin(), unique_word.end()), unique_word.end());
// This is where the successful words will go
// using a set to ensure uniqueness
std::set<std::string> found;
// plow through the dictionary
// (storing it in memory would increase performance)
std::string line;
while(std::getline(ifs, line))
{
// quick rejects
if(trim(line).size() < 2)
continue;
if(perfect_anagram && line.size() != word.size())
continue;
if(line.size() > word.size())
continue;
// This may be needed if dictionary file contains
// upper-case words you want to match against
// such as acronyms and proper nouns
// lower(line);
// for analyzing the relevant characters and their
// relative abundance
std::string sorted_line = line;
std::sort(sorted_line.begin(), sorted_line.end());
std::string unique_line = sorted_line;
unique_line.erase(std::unique(unique_line.begin(), unique_line.end()), unique_line.end());
// closer rejects
if(unique_line.find_first_not_of(unique_word) != std::string::npos)
continue;
if(perfect_anagram && sorted_word != sorted_line)
continue;
// final check if candidate line from the dictionary
// contains only the letters (in the right quantity)
// needed to be an anagram
bool match = true;
for(auto c: unique_line)
{
auto n1 = std::count(sorted_word.begin(), sorted_word.end(), c);
auto n2 = std::count(sorted_line.begin(), sorted_line.end(), c);
if(n1 < n2)
{
match = false;
break;
}
}
if(!match)
continue;
// we found a good one
found.insert(std::move(line));
}
con("Found: " << found.size() << " word" << (found.size() == 1?"":"s"));
for(auto&& word: found)
con(word);
}
Explanation:
This algorithm works by concentrating on known good patterns (dictionary words) rather than the vast number of bad patterns generated by the permutation solution.
So it trundles through the dictionary looking for words to match the search term. It successively discounts the words based on tests that increase in accuracy as the more obvious words are discounted.
The crux of the logic is to check that each surviving dictionary word contains only letters that occur in the search term. This is done by building, for both the search term and the dictionary word, a string that contains exactly one of each of its letters (using std::sort followed by std::unique), and rejecting the word if its string contains a letter that is missing from the search term's. If the word survives this test, the code goes on to check that the search term contains at least as many of each letter as the dictionary word. This uses std::count().
A perfect_anagram is detected only if all the letters match in the dictionary word and the search term. Otherwise it is sufficient that the search term contains at least enough of the correct letters.
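The letter-quantity check can also be sketched on its own. Note that this version swaps the std::unique/std::count approach above for a plain counting table, and the name CanSpell is an assumption made for the example.

```cpp
#include <array>
#include <string>

// Hypothetical helper: true when `candidate` can be spelled using the letters
// of `letters`, each letter used at most as often as it occurs there.
bool CanSpell(const std::string& candidate, const std::string& letters) {
  std::array<int, 256> available{};  // zero-initialized count per byte value
  for (unsigned char c : letters)
    ++available[c];
  for (unsigned char c : candidate)
    if (--available[c] < 0)
      return false;  // candidate needs more of this letter than we have
  return true;
}
```

A perfect anagram would additionally require candidate.size() == letters.size().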

Sort words based on custom ordering [closed]

The user sets their own ordering, for example:
String s = "bawfedghijklmnopqrstuvcxyz"
Then they enter some words, like:
"aa", "bb", "cc", "dd"
Now I have to print the words in sorted order.
The output should be:
bb, aa, dd, cc
I don't have any clue how to proceed. Can anyone help me with an approach? Code is not required.
A simple answer:
1) First, recode the words into the regular sorting order. E.g. in your case, replace all "b" with "A", all "a" with "B", and so on.
2) Sort them.
3) Decode according to your mapping, e.g. replace all "A" with "b", etc.
Each letter x has some index k[x] in your string s e.g. b has index 0,
a has index 1, w has index 2 and so on (assuming the string s is 0-based).
So you need to sort your words based on the letter indexes as defined by s and
not based on their 'normal'/'natural' indexes (where a would be 0, and b would be 1,
c would be 2 and so on). So for example based on that ordering defined by s you
have that: b < a (as b is mapped to 0 and a is mapped to 1).
That's all this task asks you to do.
To start, take any sorting algorithm (for words) and implement it literally.
Sorting algorithms usually have a point where they compare two chars; there
you have to look up the char indexes (as defined by s) and compare based on them.
That's the only change you need to make to the original implementation.
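In C++ the "look up the index before comparing" step could be sketched like this, leaving the sorting itself to std::sort. SortByAlphabet and the `rank` table are names invented for the example, and it assumes every character in the words appears in the ordering string (anything else gets rank 0 here).

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// Hypothetical helper: sorts `words` using the letter order given by `alphabet`.
// Assumption: every character of every word occurs in `alphabet`; characters
// that do not are all treated as rank 0 in this sketch.
void SortByAlphabet(std::vector<std::string>& words,
                    const std::string& alphabet) {
  std::array<int, 256> rank{};
  for (std::string::size_type i = 0; i < alphabet.size(); ++i)
    rank[static_cast<unsigned char>(alphabet[i])] = static_cast<int>(i);

  std::sort(words.begin(), words.end(),
            [&rank](const std::string& a, const std::string& b) {
              // Ordinary lexicographic comparison, but on ranks, not char codes.
              return std::lexicographical_compare(
                  a.begin(), a.end(), b.begin(), b.end(),
                  [&rank](char x, char y) {
                    return rank[static_cast<unsigned char>(x)] <
                           rank[static_cast<unsigned char>(y)];
                  });
            });
}
```

With the ordering "bawfedghijklmnopqrstuvcxyz", the words "aa", "bb", "cc", "dd" come out as bb, aa, dd, cc.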
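In C:
#include <string.h>
#include <ctype.h>
/* Compare two strings using the letter order given by table. */
int strcmp_custom(const char *s1, const char *s2){
    static const char *table="bawfedghijklmnopqrstuvcxyz";
    /* skip the common prefix; identical strings compare equal */
    for ( ; *s1 == *s2; s1++, s2++)
        if (*s1 == '\0')
            return 0;
    /* both lower-case letters: compare their positions in table */
    if(islower((unsigned char)*s1) && islower((unsigned char)*s2))
        return strchr(table, *s1) < strchr(table, *s2) ? -1 : 1;
    else
        return *(const unsigned char *)s1 < *(const unsigned char *)s2 ? -1 : 1;
}
/* qsort callback for an array of char pointers */
int cmp(const void *a, const void *b){
    return strcmp_custom(*(const char *const *)a, *(const char *const *)b);
}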
I have tried the following approach in Java:
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Test {
static Map<String, Integer> map = new HashMap<String, Integer>();
public static int compare(String one, String two) {
int len1 = one.length();
int len2 = two.length();
int n = Math.min(len1, len2);
char v1[] = one.toCharArray();
char v2[] = two.toCharArray();
int i = 0;
int j = 0;
if (i == j) {
int k = i;
int lim = n + i;
while (k < lim) {
char c1 = v1[k];
char c2 = v2[k];
if (c1 != c2) {
return map.get(String.valueOf(c1)) - map.get(String.valueOf(c2));
}
k++;
}
}
return len1 - len2;
}
public static void main(String[] args) {
String FORMAT ="bawfedghijklmnopqrstuvcxyz";
char[] charString = FORMAT.toCharArray();
for(int i=0; i<charString.length; i++){
map.put(String.valueOf(charString[i]), i);
}
List<String> list = Arrays.asList("bw", "bb", "bd", "ba" ); // Input Strings
for(int j=0; j<list.size(); j++){
for(int k=j; k<list.size(); k++){
if(compare(list.get(j),list.get(k)) > 0){
String temp = list.get(j);
list.set(j, list.get(k));
list.set(k, temp);
}
}
}
System.out.println(list);
}
}
Note: if you want to allow capital letters, numbers, and special characters in your input strings, you have to add all of those characters to the FORMAT string. I hope this helps you get to the next level.

print all possible strings of length p that can be formed from the given set [closed]

Given a set of characters and a positive integer p, I have to print all possible strings of length p that can be formed from the given set.
For example, if the set is {a,b}
and the value of p is 2,
the output is: aa, ab, ba, bb
I know that for a given set of size n, there will be n^p possible strings of length p.
What is the best method to print all the possible strings? I just want an approach.
I'm using C.
A possible approach could be to start from an empty string and add characters one by one to it using a recursive function and printing it.
Here is my code:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void print_string(char str[],char new_str[],int current_len,int n,int len)
{
/*
str=original set,
new_str=empty char array,
current_len=0 (initially),
n=number of elements to be used,
len=the value of p given */
if(current_len==len)//print string when length is equal to p
{
printf("%s\n",new_str);
return;
}
else
{
int i;
for(i=0;i<n;i++)
{
new_str[current_len]=str[i];
print_string(str,new_str,current_len+1,n,len);
}
}
}
int main()
{
char set[]={'a','b'};
char arr[10]="";
print_string(set,arr,0,2,2);
return 0;
}
output:
aa
ab
ba
bb
You can use a vector of indices; let's call it index[p].
If p is, for example, 7, you start with:
index = [0, 0, 0, 0, 0, 0, 0]
Each element selects one character of the set: element 0 gives the first character of the output string, element 1 the second, and so on.
For the set "smthing" the character numbers are: 0 - s, 1 - m, 2 - t, 3 - h, 4 - i, 5 - n, 6 - g.
The initial vector (all zeros) therefore stands for "sssssss", the first string built.
On every iteration you add 1 to the last element, and whenever an element reaches n (the size of the set) you reset it to 0 and carry 1 into the element to its left, e.g. [0 0 9] -> [0 1 0] if n = 9.
Repeat until the carry runs off the left end of the vector; interpreting each vector as described gives you every possible combination.
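As a sketch only, the index-vector idea could look like this in C++ (the question asks for C, but the structure carries over). The function name AllStrings is made up, and it collects the strings into a vector instead of printing them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lists every string of length p over `symbols` by
// treating a vector of indices as an odometer in base symbols.size().
std::vector<std::string> AllStrings(const std::string& symbols, std::size_t p) {
  std::vector<std::string> out;
  if (symbols.empty() || p == 0)
    return out;
  std::vector<std::size_t> index(p, 0);  // all zeros: first symbol everywhere
  while (true) {
    std::string current;
    for (std::size_t i : index)
      current += symbols[i];
    out.push_back(current);
    // Add one to the last position, carrying leftwards on overflow.
    std::size_t pos = p;
    while (pos > 0) {
      --pos;
      if (++index[pos] < symbols.size())
        break;         // no carry needed
      index[pos] = 0;  // reset and carry into the position to the left
      if (pos == 0)
        return out;    // carried out of the leftmost position: done
    }
  }
}
```

For the set {a,b} and p = 2 this yields aa, ab, ba, bb in that order.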
You want to list your strings in lexicographical order. The fastest way (with minimal memory usage) is to implement a function that computes the next string from a given one. Here is some tentative code:
#include <stdio.h>
int main(void)
{
    char first_char='a';
    int n_chars = 2;
    int p=2;
    char result[100];
    int i,j;
    /* fill in first string */
    for(i=0;i<p;++i) result[i]=first_char;
    result[i]=0; /* string terminator */
    printf("%s\n",result); /* print first string */
    while(1) {
        /* find last character of result which can be incremented,
           skipping those already at the largest character */
        for (j=p-1;j>=0 && result[j]==first_char + n_chars -1;j--);
        if (j<0) break; /* this was the last string */
        result[j]++; /* increment j-th character */
        for(j++;j<p;++j) result[j]=first_char; /* reset following chars */
        /* print current string */
        printf("%s\n",result);
    }
    return 0;
}

Split string into words

I am looking for the most efficient algorithm to form all possible combinations of words from a string. For example:
Input String: forevercarrot
Output:
forever carrot
forever car rot
for ever carrot
for ever car rot
(All words should be from a dictionary).
I can think of a brute force approach. (find all possible substrings and match) but what would be better ways?
Use a prefix tree for your list of known words. Probably libs like myspell already do so. Try using a ready-made one.
Once you found a match (e.g. 'car'), split your computation: one branch starts to look for a new word ('rot'), another continues to explore variants of current beginning ('carrot').
Effectively you maintain a queue of pairs (start_position, current_position) of offsets into your string every time you split the computation. Several threads can pop from this queue in parallel and try to continue a word that starts from start_position and is already known up to current_position of the pair, but does not end there. When a word is found, it is reported and another pair is popped from the queue. When it's impossible, no result is generated. When a split occurs, a new pair is added to the end of the queue. Initially the queue contains a (0,0).
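Here is a single-threaded sketch of that idea, with some simplifications that are mine rather than the answer's: the prefix tree is a tiny hand-rolled one (TrieNode and Insert are made-up names, not the API of any spell-checking library), the queue holds only start positions, and the "continue the current word" branch is the inner loop instead of a separate queue entry. It reports each word found as a (start, length) pair.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal prefix tree; a real library's structure would differ.
struct TrieNode {
  bool is_word = false;
  std::map<char, std::unique_ptr<TrieNode>> children;
};

void Insert(TrieNode& root, const std::string& word) {
  TrieNode* node = &root;
  for (char c : word) {
    std::unique_ptr<TrieNode>& child = node->children[c];
    if (!child)
      child.reset(new TrieNode);
    node = child.get();
  }
  node->is_word = true;
}

// Returns (start, length) for every dictionary word reachable by splitting
// `text` from position 0, using a queue of pending start positions.
std::vector<std::pair<std::size_t, std::size_t>> ReachableWords(
    const TrieNode& root, const std::string& text) {
  std::vector<std::pair<std::size_t, std::size_t>> found;
  std::queue<std::size_t> pending;
  std::set<std::size_t> queued;  // start positions already scheduled
  pending.push(0);
  queued.insert(0);
  while (!pending.empty()) {
    const std::size_t start = pending.front();
    pending.pop();
    const TrieNode* node = &root;
    for (std::size_t pos = start; pos < text.size(); ++pos) {
      auto next = node->children.find(text[pos]);
      if (next == node->children.end())
        break;  // no dictionary word continues with this letter
      node = next->second.get();
      if (node->is_word) {
        found.emplace_back(start, pos + 1 - start);
        // Split: schedule a restart after this word; the loop keeps
        // extending the current word in case a longer one exists.
        if (queued.insert(pos + 1).second)
          pending.push(pos + 1);
      }
    }
  }
  return found;
}
```

Turning the reported words back into full sentences still needs the bookkeeping of which word follows which, and a multi-threaded version would need a synchronized queue; both are left out here.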
See this question which has even better answers. It's a standard dynamic programming problem:
How to split a string into words. Ex: "stringintowords" -> "String Into Words"?
A pseudocode implementation, exploiting the fact that every part of the string needs to be a word, so we can't skip anything. We work forward from the start of the string until the first bit is a word, and then generate all possible combinations of the rest of the string. Once we've done that, we keep going along until we find any other possibilities for the first word, and so on.
allPossibleWords(string s, int startPosition) {
    if startPosition == s'length
        return [ "" ]    // exactly one way to split the empty remainder
    list ret
    for i in startPosition+1..s'length
        if isWord(s[startPosition, i])    // s[a, b] is the substring from a up to, not including, b
            ret += s[startPosition, i] * allPossibleWords(s, i)
    return ret
}
The bugbear in this code is that you'll end up repeating calculations - in your example, you'll end up having to calculate allPossibleWords("carrot") twice - once in ["forever", allPossibleWords["carrot"]] and once in ["for", "ever", allPossibleWords["carrot"]]. So memoizing this is something to consider.
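A memoized version might look roughly like this in C++. It is only a sketch: the dictionary is assumed to be a std::set of words, and SplitFrom and AllSplits are names invented for the example.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns every way to split s[start..] into dictionary
// words, caching the result for each start position in `memo`.
const std::vector<std::string>& SplitFrom(
    const std::string& s, std::size_t start,
    const std::set<std::string>& dictionary,
    std::map<std::size_t, std::vector<std::string>>& memo) {
  auto cached = memo.find(start);
  if (cached != memo.end())
    return cached->second;  // this suffix was already solved

  std::vector<std::string> results;
  if (start == s.size()) {
    results.push_back("");  // one way to split the empty suffix
  } else {
    for (std::size_t end = start + 1; end <= s.size(); ++end) {
      const std::string word = s.substr(start, end - start);
      if (!dictionary.count(word))
        continue;
      // Prepend this word to every split of the remainder.
      for (const std::string& rest : SplitFrom(s, end, dictionary, memo))
        results.push_back(rest.empty() ? word : word + " " + rest);
    }
  }
  // std::map references stay valid across later insertions.
  return memo[start] = std::move(results);
}

std::vector<std::string> AllSplits(const std::string& s,
                                   const std::set<std::string>& dictionary) {
  std::map<std::size_t, std::vector<std::string>> memo;
  return SplitFrom(s, 0, dictionary, memo);
}
```

With a dictionary containing for, ever, forever, car, rot and carrot, AllSplits("forevercarrot", dictionary) returns the four splits from the question, and the splits of "carrot" are computed only once.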
Input String: forevercarrot
Output:
forever carrot
forever car rot
for ever carrot
for ever car rot
Program:
#include<iostream>
#include<string>
#include<vector>
#include<string.h>
void strsplit(std::string str)
{
int len=0,i,x,y,j,k;
len = str.size();
std::string s1,s2,s3,s4,s5,s6,s7;
char *c = new char[len+1]();
char *b = new char[len+1]();
char *d = new char[len+1]();
for(i =0 ;i< len-1;i++)
{
std::cout<<"\n";
for(j=0;j<=i;j++)
{
c[j] = str[j];
b[j] = str[j];
s3 += c[j];
y = j+1;
}
for( int h=i+1;h<len;h++){
s5 += str[h];
}
s6 = s3+" "+s5;
std::cout<<" "<<s6<<"\n";
s5 = "";
for(k = y;k<len-1;k++)
{
d[k] = str[k];
s1 += d[k];
s1 += " ";
for(int l = k+1;l<len;l++){
b[l] = str[l];
s2 += b[l];
}
s4 = s3+" "+s1+s2;
s7 = s4;
std::cout<<" "<<s4<<"\n";
s3 = "";s4 = "";
}
s1 = "";s3 = "";
}
}
int main(int argc, char* argv[])
{
std::string str;
if(argc < 2)
std::cout<<"Usage: "<<argv[0]<<" <InputString> "<<"\n";
else{
str = argv[1];
strsplit(str);
}
return 0;
}
