Ignoring commas and dots using fscanf() - arguments

I'm trying to read strings without commas and dots using fscanf().
Example input:
"Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much."
I want to read "Mr", "Mrs", "Dursley", and so on, one word at a time.
I tried several ways to do this using optional arguments, but I failed. How can I ignore the commas and dots using fscanf()?

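You can use the scanset feature of sscanf (a regex-like character class) to do this.
#include <stdio.h>
int main(void)
{
    const char *str = "Mr. Fiddle Tim went to the mall. Mr. Kurdt was there. Mrs. Love was there also. "
                      "They said hi and ate ice cream together.";
    char res[800] = { 0 };
    /* %799[^.,] reads at most 799 characters that are neither '.' nor ',',
       leaving room for the terminating null byte */
    if (sscanf(str, "%799[^.,]", res) == 1)
        puts(res); /* print the extracted field, not the whole input */
    return 0;
}
To keep going, check the return value of sscanf (or fscanf) to see whether the field was matched, and use the %n conversion to find out how many characters were consumed. I leave that to you.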


Need explanation of the short Kotlin solution for strings in Codewars

I got a task on code wars.
The task is
In this simple Kata your task is to create a function that turns a string into a Mexican Wave. You will be passed a string and you must return that string in an array where an uppercase letter is a person standing up.
Rules are
The input string will always be lower case but maybe empty.
If the character in the string is whitespace then pass over it as if it was an empty seat
Example
wave("hello") => []string{"Hello", "hEllo", "heLlo", "helLo", "hellO"}
So I have found the solution, but I want to understand the logic of it. It's so minimalistic and looks cool, but I don't understand what happens there. So the solution is
fun wave(str: String) = str.indices.map { str.take(it) + str.drop(it).capitalize() }.filter { it != str }
Could you please explain?
str.indices just returns the valid indices of the string. This means the numbers from 0 to and including str.length - 1 - a total of str.length numbers.
Then, these numbers are mapped (in other words, transformed) into strings. We will now refer to each of these numbers as "it", as that is what it refers to in the map lambda.
Here's how we do the transformation: we first take the first it characters of str, then combine that with the last str.length - it characters of str, but with the first of those characters capitalized. How do we get the last str.length - it characters? We drop the first it characters.
Here's an example for when str is "hello", illustrated in a table:
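| it | str.take(it) | str.drop(it) | str.drop(it).capitalize() | Combined |
|----|--------------|--------------|---------------------------|----------|
| 0  | (empty)      | hello        | Hello                     | Hello    |
| 1  | h            | ello         | Ello                      | hEllo    |
| 2  | he           | llo          | Llo                       | heLlo    |
| 3  | hel          | lo           | Lo                        | helLo    |
| 4  | hell         | o            | O                         | hellO    |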
Lastly, the solution also filters out transformed strings that are the same as str. This is to handle Rule #2. Transformed strings can only be the same as str if the capitalised character is a whitespace (because capitalising a whitespace character doesn't change it).
Side note: capitalize is deprecated. For other ways to capitalise the first character, see Is there a shorter replacement for Kotlin's deprecated String.capitalize() function?
Here's another way you could do it:
fun wave2(str: String) = str.mapIndexed { i, c -> str.replaceRange(i, i + 1, c.uppercase()) }
.filter { it.any(Char::isUpperCase) }
The filter in the original is way more elegant IMO; this is just an example of how else you might check for a condition. replaceRange makes a copy of a string with some of the characters changed; in this case we're just replacing the one at the current index with its uppercase version. Not as clever as the original, but good to know!
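For readers who do not write Kotlin, the take/drop idea explained above can be sketched in Python. This is only an illustration, not part of the original answers: the function name wave is reused for convenience, slicing stands in for take and drop, and only the first character of the tail is upper-cased because Python's own str.capitalize() would also lower-case the rest.

```python
def wave(s: str) -> list[str]:
    """Return every variant of s in which exactly one letter is upper-cased."""
    variants = []
    for i in range(len(s)):
        head, tail = s[:i], s[i:]            # like str.take(i) and str.drop(i)
        candidate = head + tail[:1].upper() + tail[1:]
        if candidate != s:                   # unchanged means position i held whitespace
            variants.append(candidate)
    return variants


if __name__ == "__main__":
    print(wave("hello"))
```

As in the Kotlin version, comparing against the input is what skips the "empty seats".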

How to make a syntax manipulator?

Firstly, sorry for the question, I know I've heard something that could help, but I just can't remember.
Basically I would like to create my own syntax for a programming language. For example this code:
WRITE OUT 'Hello World!'
NEW LINE
would turn into this Java code:
System.out.print("Hello World!");
System.out.println();
How could I achieve this? Is there a method?
Hello.
There are techniques and proper algorithms to do that.
Search for "compiler techniques" and "Interpreter pattern".
An initial approach could be a basic pattern interpreter.
Assuming simple sentences and only one sentence per line, you could read the input file line by line and search for defined patterns (regular expressions).
The patterns describe the structure of the commands in your invented language.
If you get a match then you do the translation.
In the example below, I use the regex.h library in C to perform the regular expression search.
Of course, regex is also available in Java.
Ex. NEW LINE matches the pattern " *NEW +LINE *"
The * means that the preceding character occurs 0 or more times.
The + means that the preceding character occurs 1 or more times.
Thus, this pattern can match the command " NEW LINE " with arbitrary spaces between the words.
Ex. WRITE OUT 'Hello World!' matches the pattern "WRITE OUT '([[:print:]]*)'"
or, if you want to allow spaces, " *WRITE +OUT +'([[:print:]]*)' *"
[[:print:]] means: match one printable character (ex. 'a' or 'Z' or '0' or '+')
Thus, [[:print:]]* matches a sequence of 0 or more printable characters.
If a line of your input file matches the pattern of some command, you can do the translation, but in most cases you will first need to retrieve some information,
ex. the arbitrary text after WRITE OUT. That's why you need to put parentheses around [[:print:]]*. They tell the function that performs the search that you want to retrieve that particular part of the pattern.
A nice coincidence is that I recently assisted a friend with a college project similar to the problem you want to solve: a translator from C to BASIC. I reused that code to make an example for you.
I tested the code and it works.
It can translate:
WRITE OUT 'some text'
WRITE OUT variable
NEW LINE
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#include <string.h>
#define STR_SHORT 100
#define MATCHES_SIZE 10
/**************************************************************
Returns the string of a match
**************************************************************/
char * GetExp(char *Source, char *Destination, regmatch_t Matches) {
    //Source       The string that was searched
    //Destination  Will contain the matched string
    //Matches      One element of the vector passed to regexec
    int Length = Matches.rm_eo - Matches.rm_so;
    strncpy(Destination, Source+Matches.rm_so, Length);
    Destination[Length]=0;
    return Destination;
}
/**************************************************************
MAIN
**************************************************************/
int main(int argc, char *argv[]) {
    //Usage
    if (argc==1) {
        printf("Usage:\n");
        printf("interpreter source_file\n");
        printf("\n");
        printf("Implements a very basic interpreter\n");
        return 0;
    }
    //Open the source file
    FILE *SourceFile;
    if ( (SourceFile=fopen(argv[1], "r"))==NULL )
        return 1;
    //This variable is used to get the strings that matched the pattern
    //Matches[0] -> the whole match
    //Matches[1] -> first parenthetical
    //Matches[2] -> second parenthetical
    regmatch_t Matches[MATCHES_SIZE];
    char MatchedStr[STR_SHORT];
    //Regular expression for NEW LINE
    regex_t Regex_NewLine;
    regcomp(&Regex_NewLine, " *NEW +LINE *", REG_EXTENDED);
    //Regular expression for WRITE OUT 'some text'
    regex_t Regex_WriteOutStr;
    regcomp(&Regex_WriteOutStr, " *WRITE +OUT +'([[:print:]]*)' *", REG_EXTENDED);
    //Regular expression for WRITE OUT variable
    regex_t Regex_WriteOutVar;
    regcomp(&Regex_WriteOutVar, " *WRITE +OUT +([_[:alpha:]][[:alnum:]]*) *", REG_EXTENDED);
    //Regular expression for an empty line
    regex_t Regex_EmptyLine;
    regcomp(&Regex_EmptyLine, "^([[:space:]]+)$", REG_EXTENDED);
    //Now we read the file line by line
    char Buffer[STR_SHORT];
    while( fgets(Buffer, STR_SHORT, SourceFile)!=NULL ) {
        //printf("%s", Buffer);
        //Shortcut for an empty line
        if ( regexec(&Regex_EmptyLine, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("\n");
            continue;
        }
        //NEW LINE
        if ( regexec(&Regex_NewLine, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.println();\n");
            continue;
        }
        //WRITE OUT 'some text'
        if ( regexec(&Regex_WriteOutStr, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.print(\"%s\");\n", GetExp(Buffer, MatchedStr, Matches[1]));
            continue;
        }
        //WRITE OUT variable
        //Assumes variable is a string variable
        //(System.out.print takes a single argument, so the variable is passed directly)
        if ( regexec(&Regex_WriteOutVar, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.print(%s);\n", GetExp(Buffer, MatchedStr, Matches[1]));
            continue;
        }
        //Unknown command
        printf("Unknown command: %s", Buffer);
    }
    //Release the compiled patterns and close the source file
    regfree(&Regex_NewLine);
    regfree(&Regex_WriteOutStr);
    regfree(&Regex_WriteOutVar);
    regfree(&Regex_EmptyLine);
    fclose(SourceFile);
    return 0;
}
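The same line-by-line pattern matching can be sketched in Python for comparison. This is a rough illustration under stated assumptions, not a port of the C program: the names RULES, translate_line and translate are made up for this example, Python's re module has no [[:print:]] class so the quoted text is matched with [^']* instead, and the text is copied into the Java string literal without any escaping.

```python
import re

# Each rule pairs a pattern with a function that builds the Java statement.
RULES = [
    (re.compile(r"\s*NEW\s+LINE\s*"),
     lambda m: "System.out.println();"),
    (re.compile(r"\s*WRITE\s+OUT\s+'([^']*)'\s*"),
     lambda m: 'System.out.print("{}");'.format(m.group(1))),
    (re.compile(r"\s*WRITE\s+OUT\s+([A-Za-z_]\w*)\s*"),
     lambda m: "System.out.print({});".format(m.group(1))),
]


def translate_line(line: str) -> str:
    """Translate one source line; blank lines stay blank."""
    if not line.strip():
        return ""
    for pattern, emit in RULES:
        match = pattern.fullmatch(line)  # the whole line must fit the pattern
        if match:
            return emit(match)
    return "Unknown command: " + line.strip()


def translate(source: str) -> str:
    """Translate a whole program, one statement per line."""
    return "\n".join(translate_line(line) for line in source.splitlines())


if __name__ == "__main__":
    print(translate("WRITE OUT 'Hello World!'\nNEW LINE"))
```

Keeping the rules in a list means a new command only needs one more (pattern, function) pair.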
A proper solution for this question requires the following steps:
1. Parse the source code and create a syntax tree. That is commonly done with tools like ANTLR.
2. Go through the syntax tree and either convert it to Java code, or to a Java syntax tree.
Both of those steps have their own complexity, so it would be better to ask separate questions about specific issues you encounter while implementing them.
Strictly speaking, you can skip step 2 and generate Java directly when parsing, but unless your language is a very simple renaming of Java concepts, you wouldn't be able to do that easily.
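To make the separation between parsing and code generation concrete, here is a deliberately tiny Python sketch. It is only an illustration under stated assumptions: the parser is hand-written instead of generated by ANTLR, the node classes WriteOut and NewLine and the functions parse and generate are hypothetical names, the "tree" is just a flat list of statements, and commands must be written with single spaces.

```python
from dataclasses import dataclass


# Hypothetical node types for the two statements of the toy language.
@dataclass
class WriteOut:
    text: str


@dataclass
class NewLine:
    pass


PREFIX = "WRITE OUT '"


def parse(source: str) -> list:
    """Step 1: turn source text into a list of nodes (a degenerate syntax tree)."""
    tree = []
    for number, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line == "NEW LINE":
            tree.append(NewLine())
        elif line.startswith(PREFIX) and line.endswith("'") and len(line) > len(PREFIX):
            tree.append(WriteOut(line[len(PREFIX):-1]))
        else:
            raise ValueError(f"line {number}: cannot parse {line!r}")
    return tree


def generate(tree: list) -> str:
    """Step 2: walk the nodes and emit one Java statement per node."""
    statements = []
    for node in tree:
        if isinstance(node, WriteOut):
            statements.append(f'System.out.print("{node.text}");')
        else:
            statements.append("System.out.println();")
    return "\n".join(statements)


if __name__ == "__main__":
    print(generate(parse("WRITE OUT 'Hello World!'\nNEW LINE")))
```

Because generate only sees nodes, the input syntax can change without touching the Java output code, which is the point of keeping the two steps apart.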

What is the best way to delimit a CSV file that contains commas and double quotes?

Let's say I have the following string and I want the output below, without requiring the csv library.
this, "what I need", to, do, "i, want, this", to, work
this
what I need
to
do
i, want, this
to
work
This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."
We can solve it with a beautifully-simple regex:
"([^"]+)"|[^, ]+
The left side of the alternation | matches complete "quotes" and captures the contents to Group 1. The right side matches characters that are neither commas nor spaces, and we know they are the right ones because they were not matched by the expression on the left.
Option 2: Allowing Multiple Words
In your input, all tokens are single words, but if you also want the regex to work for my cat scratches, "what I need", your dog barks, use this:
"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*
The only difference is the addition of (?:[ ]*[^, ]+)* which optionally adds spaces + characters, zero or more times.
This program shows how to use the regex:
subject = 'this, "what I need", to, do, "i, want, this", to, work'
regex = /"([^"]+)"|[^, ]+/
# put Group 1 captures in an array
mymatches = []
subject.scan(regex) {|m|
$1.nil? ? mymatches << $& : mymatches << $1
}
mymatches.each { |x| puts x }
Output
this
what I need
to
do
i, want, this
to
work
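The same regex also works unchanged in Python's re module, so here is an equivalent sketch for readers who do not use Ruby. The names TOKEN and split_fields are made up for this example, and it uses the multi-word variant of the pattern from Option 2.

```python
import re

# Left branch: a quoted field, contents captured in group 1.
# Right branch: one or more unquoted words separated by spaces.
TOKEN = re.compile(r'"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*')


def split_fields(line: str) -> list[str]:
    """Return the fields of line, with surrounding quotes removed."""
    fields = []
    for match in TOKEN.finditer(line):
        quoted = match.group(1)
        # Group 1 is None when the unquoted branch matched.
        fields.append(quoted if quoted is not None else match.group(0))
    return fields


if __name__ == "__main__":
    for field in split_fields('this, "what I need", to, do, "i, want, this", to, work'):
        print(field)
```

Like the Ruby version, it prefers Group 1 when it is set and falls back to the whole match otherwise.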
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...

How do you print a dollar sign $ in Dart

I need to actually print a Dollar sign in Dart, ahead of a variable. For example:
void main()
{
int dollars=42;
print("I have $dollars."); // I have 42.
}
I want the output to be: I have $42. How can I do this? Thanks.
Dart strings can be either raw or ... not raw (normal? cooked? interpreted? There isn't a formal name). I'll go with "interpreted" here, because it describes the problem you have.
In a raw string, "$" and "\" mean nothing special, they are just characters like any other.
In an interpreted string, "$" starts an interpolation and "\" starts an escape.
Since you want the interpolation for "$dollars", you can't use "$" literally, so you need to escape it:
int dollars = 42;
print("I have \$$dollars.");
If you don't want to use an escape, you can combine the string from raw and interpreted parts:
int dollars = 42;
print(r"I have $" "$dollars.");
Two adjacent string literals are combined into one string, even if they are different types of string.
You can use a backslash to escape:
int dollars=42;
print("I have \$$dollars."); // I have $42.
When you are using literals instead of variables you can also use raw strings:
print(r"I have $42."); // I have $42.

How to remove these kind of symbols (junk) from string?

Imagine I have String in C#: "I Don’t see ya.."
I want to remove (replace with nothing, etc.) these "’" symbols.
How do I do this?
That 'junk' looks a lot like someone interpreted UTF-8 data as ISO 8859-1 or Windows-1252, probably repeatedly.
In Windows-1252, ’ is the byte sequence C3 A2, E2 82 AC, E2 84 A2. Read as UTF-8, those bytes decode as follows:
UTF-8 C3 A2 = U+00E2 = â
UTF-8 E2 82 AC = U+20AC = €
UTF-8 E2 84 A2 = U+2122 = ™
We then do it again: in Windows-1252 the sequence ’ is E2 80 99, which is the UTF-8 encoding of U+2019, so the character should have been RIGHT SINGLE QUOTATION MARK (’)
You could make multiple passes with byte arrays, Encoding.UTF8 and Encoding.GetEncoding(1252) to correctly turn the junk back into what was originally entered. You will need to check your processing to find the two places that UTF-8 data was incorrectly interpreted as Windows-1252.
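The multi-pass idea can be sketched in a few lines of Python, where the codecs "cp1252" and "utf-8" play the role of Encoding.GetEncoding(1252) and Encoding.UTF8. This is a minimal sketch, not a general repair tool: the function name undo_mojibake and the default of two passes are assumptions taken from the analysis above, and the demo rebuilds the garbled string from the intended one so the example does not depend on how this page itself is encoded.

```python
def undo_mojibake(text: str, max_passes: int = 2) -> str:
    """Reverse up to max_passes rounds of 'UTF-8 bytes read as Windows-1252'."""
    for _ in range(max_passes):
        try:
            # Recover the bytes that were misread, then decode them correctly.
            repaired = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # the text is not (or is no longer) this kind of mojibake
        if repaired == text:
            break  # nothing changed, for example plain ASCII
        text = repaired
    return text


if __name__ == "__main__":
    original = "I Don\u2019t see ya.."
    garbled = original
    for _ in range(2):  # simulate the two wrong decoding steps
        garbled = garbled.encode("utf-8").decode("cp1252")
    print(undo_mojibake(garbled) == original)
```

Python's cp1252 codec leaves a handful of byte values undefined, so some damaged strings will not round-trip this way and are returned unchanged.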
"I Don’t see ya..".Replace( "’", string.Empty);
How did that junk get in there the first place? That's the real question.
By removing any non-latin character you'll be intentionally breaking some internationalization support.
Don't forget the poor guy whose name has a "â" in it.
This looks disturbingly similar to a character encoding issue dealing with the Windows character set being stored in a database using the standard character encoding. I see someone voted Will down, but he has a point. You may be solving the immediate issue, but the combinations of characters are limitless if this is the issue.
If you really have to do this, regular expressions are probably the best solution.
I would strongly recommend that you think about why you have to do this, though - at least some of the characters you're listing as undesirable are perfectly valid and useful in other languages, and just filtering them out will most likely annoy at least some of your international users. As a Swede, I can't emphasize enough how much I hate systems that can't handle our å, ä and ö characters correctly.
Consider Regex.Replace(your_string, regex, "") - that's what I use.
Test each character in turn to see if it is a valid alphabetic or numeric character and if not then remove it from the string. The character test is very simple, just use...
char.IsLetterOrDigit;
Note that there are various others, such as...
char.IsSymbol;
char.IsControl;
Regex.Replace("The string", "[^a-zA-Z ]","");
That's how you'd do it in C#, although that regular expression ([^a-zA-Z ]) should work in most languages.
[Edited: forgot the space in the regex]
The ASCII / Integer code for these characters would be out of the normal alphabetic Ranges. Seek and replace with empty characters. String has a Replace method I believe.
Either use a blacklist of stuff you do not want, or preferably a white list (set). With a white list you iterate over the string and only copy the letters that are in your white list to the result string. You said remove, and the way you do that is having two pointers one you read from (R) and one you write to (W):
I Donââ‚
W R
If comma is in your white list, then in this case you would read the comma, write it where à is, and then advance both pointers. UTF-8 is a multi-byte encoding, so advancing the pointer may not just be a matter of adding one to the address.
With C, an easy way to get a white list is to use one of the predefined functions (or macros): isalnum, isalpha, isascii, isblank, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit. In this case you end up with a white list function instead of a set, of course.
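The read/write pointer technique can be sketched in Python on a bytearray, which allows in-place edits much like a C buffer. Everything here is illustrative: the ALLOWED set (ASCII letters, digits, space, dot, comma and apostrophe) and the name keep_allowed are assumptions, and working on UTF-8 bytes means every non-ASCII character is dropped, with all the internationalization costs mentioned in the other answers.

```python
import string

# Hypothetical white list: the byte values that may stay in the buffer.
ALLOWED = frozenset((string.ascii_letters + string.digits + " .,'").encode("ascii"))


def keep_allowed(data: bytearray) -> bytearray:
    """Compact data in place, keeping only white-listed bytes."""
    write = 0                           # W: where the next kept byte goes
    for read in range(len(data)):       # R: the byte being examined
        if data[read] in ALLOWED:
            data[write] = data[read]
            write += 1
    del data[write:]                    # cut off the leftover tail
    return data


if __name__ == "__main__":
    buffer = bytearray("I Don\u00e2\u20ac\u2122t see ya..".encode("utf-8"))
    print(keep_allowed(buffer).decode("ascii"))
```

Since the write index never overtakes the read index, no second buffer is needed.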
Usually when I see data like you have I look for memory corruption, or evidence to suggest that the encoding I expect is different than the one the data was entered with.
/Allan
I had the same problem with extraneous junk thrown in by adobe in an EXIF dump. I spent an hour looking for a straight answer and trying numerous half-baked suggestions which did not work here.
This thread more than most I have read was replete with deep, probing questions like 'how did it get there?', 'what if somebody has this character in their name?', 'are you sure you want to break internationalization?'.
There were some impressive displays of erudition positing how this junk could have gotten here and explaining the evolution of the various character encoding schemes. The person wanted to know how to remove it, not how it came to be or what the standards orgs are up to, interesting as this trivia may be.
I wrote a tiny program which gave me the right answer. Instead of paraphrasing the main concept, here is the entire, self-contained, working (at least on my system) program and the output I used to nuke the junk:
#!/usr/local/bin/perl -w
# This runs in a dos window and shows the char, integer and hex values
# for the weird chars. Install the HEX values in the REGEXP below until
# the final test line looks normal.
$str = 's: “Brian'; # Nuke the 3 weird chars in front of Brian.
@str = split(//, $str);
printf("len str '$str' = %d, scalar \@str = %d\n",
    length $str, scalar @str);
$ii = -1;
foreach $c (@str) {
    $ii++;
    printf("$ii) char '$c', ord=%03d, hex='%s'\n",
        ord($c), unpack("H*", $c));
}
# Take the hex characters shown above, plug them into the below regexp
# until the junk disappears!
($s2 = $str) =~ s/[\xE2\x80\x9C]//g; # << Insert HEX values HERE
print("S2=>$s2<\n"); # Final test
Result:
M:\new\6s-2014.1031-nef.halloween>nuke_junk.pl
len str 's: GÇ£Brian' = 11, scalar @str = 11
0) char 's', ord=115, hex='73'
1) char ':', ord=058, hex='3a'
2) char ' ', ord=032, hex='20'
3) char 'G', ord=226, hex='e2'
4) char 'Ç', ord=128, hex='80'
5) char '£', ord=156, hex='9c'
6) char 'B', ord=066, hex='42'
7) char 'r', ord=114, hex='72'
8) char 'i', ord=105, hex='69'
9) char 'a', ord=097, hex='61'
10) char 'n', ord=110, hex='6e'
S2=>s: Brian<
It's NORMAL!!!
One other actionable, working suggestion I ran across:
iconv -c -t ASCII < 6s-2014.1031-238246.halloween.exf.dif > exf.ascii.dif
If a string contains junk data, this is a good way to remove it:
string InputString = "This is grate kingdom¢Ã‚¬â";
string replace = "’";
string OutputString= Regex.Replace(InputString, replace, "");
// OutputString holds the input with every occurrence of the junk sequence removed
It works well for me. Thanks for looking at this.
