Efficient Command Line Parsing - command-line-arguments

I have programmed a small Java application for fun, and it all works well. My problem is that when I was working on my method to parse the command-line parameters, I just felt that there should be a much more efficient way of doing it. My program accepts:
-l I (I is an integer for a new lower bound)
-u I (I is an integer for a new upper bound)
-? (Outputs available command-line options)
-v (Activates Verbose output for the main class)
I implemented my parseArgs method in the following way:
private static void parseArgs(String[] arguments) {
String t = "";
for (int c = 0; c < arguments.length; c++) t += arguments[c];
if (t.contains("-")) {
String[] params = t.split("-");
for (int c = 0; c < params.length; c++) params[c] = params[c].trim();
for (int c = 0; c < params.length; c++) {
if (params[c].startsWith("?") && !docOnly) {
docOnly = true;
printHelp();
}
if (params[c].startsWith("l") && startPoint == 1) {
try {
startPoint = Integer.parseInt(params[c].substring(1));
if (startPoint < 0) {
startPoint = 0;
}
} catch (NumberFormatException e) {
error = true;
System.out.println("\tParameter Error: " + params[c]);
}
}
if (params[c].startsWith("u") && endPoint == 1000) {
try {
endPoint = Integer.parseInt(params[c].substring(1));
if (endPoint < 0) {
endPoint = 1000;
}
} catch (NumberFormatException e) {
error = true;
System.out.println("\tParameter Error: " + params[c]);
}
}
if (params[c].startsWith("v") && !verbose) {
verbose = true;
}
}
} else {
error = true;
System.out.println("\tError in Parameters. Use -? for available options.");
}
}
As I said, my code all works fine and you can take a look at it if you'd like to verify that. I'm just curious how other programmers, professional or not, would tackle the situation of passing a variable number and order of parameters and having an efficient codepath that would parse them accurately. I know mine isn't the best, which is why I'm wondering. Any language will be okay for an answer, I'm just curious on procedure. I can adapt any language necessary to learn a bit more from it.
Thanks in advance.
~ David Collins

String t = "";
for (int c = 0; c < arguments.length; c++) t += arguments[c];
Use a StringBuilder instead. Or even easier, join() provided by any of a number of popular libraries.
if (t.contains("-")) {
String[] params = t.split("-");
Better:
String[] params = t.split("-");
if (params.length > 1) {
While that works for your particular case (arguments are non-negative integers), in general it will be problematic. You should look for - only at the beginning of parameters, so that someone can request a log file named my-log.txt.
startPoint = Integer.parseInt(params[c].substring(1));
if (startPoint < 0) {
startPoint = 0;
}
All - signs got eaten by split, so startPoint < 0 will never be true.
Why do you set an error flag for non-numeric data, silently ignore numbers out of range or repeated arguments?

Related

Fast String Searching Algorithm for large strings

I'm trying to implement a plagiarism detection software using pattern matching algorithms. I came across the KMP Algorithm Here and tried out the c# implementation. I can see that it's not as fast for actual documents (not strings, I uploaded two pdf documents using iText and got the implementation to check for plagiarism in these two papers. About 50 pages).
It's really slow and I have no idea how to go about this. I've looked at Boyer Moore and Rabin Karp as well.
What I am currently doing is taking each sentence in the document (split on '.') and scanning through the whole reference document (2nd document) for a match. Then taking the next sentence and so on...
I am fully aware that this could be very expensive. But I have no idea how else to implement string (pattern) matching without using this approach. It's for my final year project and I was given a topic so I HAVE to use string matching. (Not allowed to do Citation based plagiarism, Semantics or Vector Space.)
The larger the text and pattern gets, the slower the algorithm gets (extremely slow, not even reasonably slow). Is there another way to go about this that I don't know? Or are there faster algorithms for me to use with this my approach?
EDIT
My code below:`
public class MatchChecker
{
public void KMPSearch(string pattern, string text, int page)
{
if (pattern.Trim() != "")
{
int M = pattern.Length;
int N = text.Length;
// create lps[] that will hold the longest
// prefix suffix values for pattern
int[] lps = new int[M];
int j = 0; // index for pat[]
// Preprocess the pattern (calculate lps[]
// array)
computeLPSArray(pattern, M, lps);
int i = 0; //index for text[]
while (i < N)
{
if (pattern[j] == text[i])
{
j++;
i++;
}
if (j == M)
{
Console.WriteLine("Found pattern " + pattern + " at page " + page);
j = lps[j - 1];
}
//mismatch after j matches
else if (i < N && pattern[j] != text[i])
{
//Do not match lps[0..lps[j-1]] characters,
//they will match anyway
if (j != 0)
j = lps[j - 1];
else
i = i + 1;
}
}
}
}
private void computeLPSArray(string pattern, int M, int[] lps)
{
//length of the previous longest prefix suffix
int length = 0;
int i = 1;
lps[0] = 0; //lps[0]is always 0
//the loop calculates lps[i] for i = 1 to M - 1
while (i < M)
{
if (pattern[i] == pattern[length])
{
length++;
lps[i] = length;
i++;
}
else // (pat[i] != pat[len])
{
// This is tricky. Consider the example.
// AAACAAAA and i = 7. The idea is similar
// to search step.
if (length != 0)
{
length = lps[length - 1];
// Also, note that we do not increment
// i here
}
else // if (len == 0)
{
lps[i] = length;
i++;
}
}
}
}
public string ReadDocPDF()
{
List<string> pages = new List<string>();
PdfReader reader2 = new PdfReader(#"C:\Users\obn\Desktop\project\chapter1.pdf");
string strText = string.Empty;
for (int page = 1; page <= reader2.NumberOfPages; page++)
{
ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
PdfReader reader = new PdfReader(#"C:\Users\obn\Desktop\project\chapter1.pdf");
String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
s = Regex.Replace(Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s))).Replace(",", ""), "[0-9]", "").ToLower();
pages.Add(s);
strText = strText + s;
reader.Close();
}
return strText;
}
public void CheckForPlag()
{
string document = ReadDocPDF().Trim();
string[] sentences = document.Split(new string[] { "\n", "\t\n\r", ". ", "." }, StringSplitOptions.RemoveEmptyEntries);
foreach(string sentence in sentences) {
PdfReader reader2 = new PdfReader(#"C:\Users\obn\Documents\visual studio 2015\Projects\PlagDetector\PlagDetector\bin\Debug\test3.pdf");
for (int page = 1; page <= reader2.NumberOfPages; page++)
{
ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
PdfReader reader = new PdfReader(#"C:\Users\obn\Documents\visual studio 2015\Projects\PlagDetector\PlagDetector\bin\Debug\test3.pdf");
String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
s = Regex.Replace(Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s))).Trim().Replace(".","").Replace(",","").Replace("\n", ""), "[0-9]", "").ToLower();
KMPSearch(sentence, s, page);
reader.Close();
}
}
}
}`
Your algorithm is doing lot of repeated searches purely a brute force. Some of the problems features can be considered to optimize it.
How do you define 'plagiarism'? content-1 and content-2 are nearly similar. Let us say >80% are same. i.e content-1 is taken 20% is changed to produce content-2.
Now, Let us try to solve: what will be cost (no.of changes) to convert content-1 to content-2?
This is a well know problem in DP(dynamic programming world) as EDIT Distance problem. The standard problem talks about strings distance, but you can easily modify it for words instead of chars.
Now, the above problem will give you least no.of changes for conversion of content-1 to content-2.
With the total length of content-1, we can easily calculate the % of changes to go to content-2 from content-1. If it below a fixed threshold (say 20%) then declare the plagiarism.

/*undefined sequence*/ in sliced code from Frama-C

I am trying to slice code using Frama-C.
The source code is
static uint8_T ALARM_checkOverInfusionFlowRate(void)
{
uint8_T ov;
ov = 0U;
if (ALARM_Functional_B.In_Therapy) {
if (ALARM_Functional_B.Flow_Rate > ALARM_Functional_B.Flow_Rate_High) {
ov = 1U;
} else if (ALARM_Functional_B.Flow_Rate >
ALARM_Functional_B.Commanded_Flow_Rate * div_s32
(ALARM_Functional_B.Tolerance_Max, 100) +
ALARM_Functional_B.Commanded_Flow_Rate) {
ov = 1U;
} else {
if (ALARM_Functional_B.Flow_Rate > ALARM_Functional_B.Commanded_Flow_Rate * div_s32(ALARM_Functional_B.Tolerance_Min, 100) + ALARM_Functional_B.Commanded_Flow_Rate) {
ov = 2U;
}
}
}
return ov;
}
When I sliced the code usig Frama-C, I get the following. I don't know what this “undefined sequence” means.
static uint8_T ALARM_checkOverInfusionFlowRate(void)
{
uint8_T ov;
ov = 0U;
if (ALARM_Functional_B.In_Therapy)
if ((int)ALARM_Functional_B.Flow_Rate > (int)ALARM_Functional_B.Flow_Rate_High)
ov = 1U;
else {
int32_T tmp_0;
{
/*undefined sequence*/
tmp_0 = div_s32((int)ALARM_Functional_B.Tolerance_Max,100);
}
if ((int)ALARM_Functional_B.Flow_Rate > (int)ALARM_Functional_B.Commanded_Flow_Rate * tmp_0 + (int)ALARM_Functional_B.Commanded_Flow_Rate)
ov = 1U;
else {
int32_T tmp;
{
/*undefined sequence*/
tmp = div_s32((int)ALARM_Functional_B.Tolerance_Min,100);
}
if ((int)ALARM_Functional_B.Flow_Rate > (int)ALARM_Functional_B.Commanded_Flow_Rate * tmp + (int)ALARM_Functional_B.Commanded_Flow_Rate)
ov = 2U;
}
}
return ov;
}
Appreciate any help in explaining why this happens.
/* undefined sequence */ in a block simply means that the block has been generated during the code normalization at parsing time but that with respect to C semantics there is no sequence point between the statements composing it. For instance x++ + x++ will be normalized as
{
/*undefined sequence*/
tmp = x;
x ++;
tmp_0 = x;
x ++;
;
}
Internally, each statement in such a sequence is decorated with lists of locations that are accessed for writing or reading (use -kernel-debug 1 with -print to see them in the output). Option -unspecified-access used together with -val will check that such accesses are correct, i.e. that there is at most one statement inside the sequence that write to a given location and if this is the case, that there is no read access to it (except for building the value it is assigned to). In addition, this option does not take care of side-effects occurring in a function call inside the sequence. There is a special plug-in for that, but it has not been released yet.
Finally note that since Frama-C Neon, the comment reads only /*sequence*/, which seems to be less daunting for the user. Indeed, the original code may be correct or may show undefined behavior, but syntactic analysis is too weak to decide in the general case. For instance, (*p)++ + (*q)++ is correct as long as p and q do not overlap. This is why the normalization phase only points out the sequences and leaves it up to more powerful analysis plug-ins to check whether there might be an issue.

search tree in scala

I'm trying to put my first steps into Scala, and to practice I took a look at the google code jam storecredit excersize. I tried it in java first, which went well enough, and now I'm trying to port it to Scala. Now with the java collections framework, I could try to do a straight syntax conversion, but I'd end up writing java in scala, and that kind of defeats the purpose. In my Java implementation, I have a PriorityQueue that I empty into a Deque, and pop the ends off untill we have bingo. This all uses mutable collections, which give me the feeling is very 'un-scala'. What I think would be a more functional approach is to construct a datastructure that can be traversed both from highest to lowest, and from lowest to highest. Am I on the right path? Are there any suitable datastructures supplied in the Scala libraries, or should I roll my own here?
EDIT: full code of the much simpler version in Java. It should run in O(max(credit,inputchars)) and has become:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
public class StoreCredit {
private static BufferedReader in;
public static void main(String[] args) {
in = new BufferedReader(new InputStreamReader(System.in));
try {
int numCases = Integer.parseInt(in.readLine());
for (int i = 0; i < numCases; i++) {
solveCase(i);
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static void solveCase(int casenum) throws NumberFormatException,
IOException {
int credit = Integer.parseInt(in.readLine());
int numItems = Integer.parseInt(in.readLine());
int itemnumber = 0;
int[] item_numbers_by_price = new int[credit];
Arrays.fill(item_numbers_by_price, -1); // makes this O(max(credit,
// items)) instead of O(items)
int[] read_prices = readItems();
while (itemnumber < numItems) {
int next_price = read_prices[itemnumber];
if (next_price <= credit) {
if (item_numbers_by_price[credit - next_price] >= 0) {
// Bingo! DinoDNA!
printResult(new int[] {
item_numbers_by_price[credit - next_price],
itemnumber }, casenum);
break;
}
item_numbers_by_price[next_price] = itemnumber;
}
itemnumber++;
}
}
private static int[] readItems() throws IOException {
String line = in.readLine();
String[] items = line.split(" "); // uh-oh, now it's O(max(credit,
// inputchars))
int[] result = new int[items.length];
for (int i = 0; i < items.length; i++) {
result[i] = Integer.parseInt(items[i]);
}
return result;
}
private static void printResult(int[] result, int casenum) {
int one;
int two;
if (result[0] > result[1]) {
one = result[1];
two = result[0];
} else {
one = result[0];
two = result[1];
}
one++;
two++;
System.out.println(String.format("Case #%d: %d %d", casenum + 1, one,
two));
}
}
I'm wondering what you are trying to accomplish using sophisticated data structures such as PriorityQueue and Deque for a problem such as this. It can be solved with a pair of nested loops:
for {
i <- 2 to I
j <- 1 until i
if i != j && P(i-1) + P(j - 1) == C
} println("Case #%d: %d %d" format (n, j, i))
Worse than linear, better than quadratic. Since the items are not sorted, and sorting them would require O(nlogn), you can't do much better than this -- as far as I can see.
Actually, having said all that, I now have figured a way to do it in linear time. The trick is that, for every number p you find, you know what its complement is: C - p. I expect there are a few ways to explore that -- I have so far thought of two.
One way is to build a map with O(n) characteristics, such as a bitmap or a hash map. For each element, make it point to its index. One then only has to find an element for which its complement also has an entry in the map. Trivially, this could be as easily as this:
val PM = P.zipWithIndex.toMap
val (p, i) = PM find { case (p, i) => PM isDefinedAt C - p }
val j = PM(C - p)
However, that won't work if the number is equal to its complement. In other words, if there are two p such that p + p == C. There are quite a few such cases in the examples. One could then test for that condition, and then just use indexOf and lastIndexOf -- except that it is possible that there is only one p such that p + p == C, in which case that wouldn't be the answer either.
So I ended with something more complex, that tests the existence of the complement at the same time the map is being built. Here's the full solution:
import scala.io.Source
object StoreCredit3 extends App {
val source = if (args.size > 0) Source fromFile args(0) else Source.stdin
val input = source getLines ()
val N = input.next.toInt
1 to N foreach { n =>
val C = input.next.toInt
val I = input.next.toInt
val Ps = input.next split ' ' map (_.toInt)
val (_, Some((p1, p2))) = Ps.zipWithIndex.foldLeft((Map[Int, Int](), None: Option[(Int, Int)])) {
case ((map, None), (p, i)) =>
if (map isDefinedAt C - p) map -> Some(map(C - p) -> (i + 1))
else (map updated (p, i + 1), None)
case (answer, _) => answer
}
println("Case #%d: %d %d" format (n, p1, p2))
}
}

Finding shortest repeating cycle in word?

I'm about to write a function which, would return me a shortest period of group of letters which would eventually create the given word.
For example word abkebabkebabkeb is created by repeated abkeb word. I would like to know, how efficiently analyze input word, to get the shortest period of characters creating input word.
Here is a correct O(n) algorithm. The first for loop is the table building portion of KMP. There are various proofs that it always runs in linear time.
Since this question has 4 previous answers, none of which are O(n) and correct, I heavily tested this solution for both correctness and runtime.
def pattern(inputv):
if not inputv:
return inputv
nxt = [0]*len(inputv)
for i in range(1, len(nxt)):
k = nxt[i - 1]
while True:
if inputv[i] == inputv[k]:
nxt[i] = k + 1
break
elif k == 0:
nxt[i] = 0
break
else:
k = nxt[k - 1]
smallPieceLen = len(inputv) - nxt[-1]
if len(inputv) % smallPieceLen != 0:
return inputv
return inputv[0:smallPieceLen]
O(n) solution. Assumes that the entire string must be covered. The key observation is that we generate the pattern and test it, but if we find something along the way that doesn't match, we must include the entire string that we already tested, so we don't have to reobserve those characters.
def pattern(inputv):
pattern_end =0
for j in range(pattern_end+1,len(inputv)):
pattern_dex = j%(pattern_end+1)
if(inputv[pattern_dex] != inputv[j]):
pattern_end = j;
continue
if(j == len(inputv)-1):
print pattern_end
return inputv[0:pattern_end+1];
return inputv;
This is an example for PHP:
<?php
function getrepeatedstring($string) {
if (strlen($string)<2) return $string;
for($i = 1; $i<strlen($string); $i++) {
if (substr(str_repeat(substr($string, 0, $i),strlen($string)/$i+1), 0, strlen($string))==$string)
return substr($string, 0, $i);
}
return $string;
}
?>
Most easiest one in python:
def pattern(self, s):
ans=(s+s).find(s,1,-1)
return len(pat) if ans == -1 else ans
I believe there is a very elegant recursive solution. Many of the proposed solutions solve the extra complexity where the string ends with part of the pattern, like abcabca. But I do not think that is asked for.
My solution for the simple version of the problem in clojure:
(defn find-shortest-repeating [pattern string]
(if (empty? (str/replace string pattern ""))
pattern
(find-shortest-repeating (str pattern (nth string (count pattern))) string)))
(find-shortest-repeating "" "abcabcabc") ;; "abc"
But be aware that this will not find patterns that are uncomplete at the end.
I found a solution based on your post, that could take an incomplete pattern:
(defn find-shortest-repeating [pattern string]
(if (or (empty? (clojure.string/split string (re-pattern pattern)))
(empty? (second (clojure.string/split string (re-pattern pattern)))))
pattern
(find-shortest-repeating (str pattern (nth string (count pattern))) string)))
My Solution:
The idea is to find a substring from the position zero such that it becomes equal to the adjacent substring of same length, when such a substring is found return the substring. Please note if no repeating substring is found I am printing the entire input String.
public static void repeatingSubstring(String input){
for(int i=0;i<input.length();i++){
if(i==input.length()-1){
System.out.println("There is no repetition "+input);
}
else if(input.length()%(i+1)==0){
int size = i+1;
if(input.substring(0, i+1).equals(input.substring(i+1, i+1+size))){
System.out.println("The subString which repeats itself is "+input.substring(0, i+1));
break;
}
}
}
}
This is a solution I came up with using the queue, it passed all the test cases of a similar problem in codeforces. Problem No is 745A.
#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
string s, s1, s2; cin >> s; queue<char> qu; qu.push(s[0]); bool flag = true; int ind = -1;
s1 = s.substr(0, s.size() / 2);
s2 = s.substr(s.size() / 2);
if(s1 == s2)
{
for(int i=0; i<s1.size(); i++)
{
s += s1[i];
}
}
//cout << s1 << " " << s2 << " " << s << "\n";
for(int i=1; i<s.size(); i++)
{
if(qu.front() == s[i]) {qu.pop();}
qu.push(s[i]);
}
int cycle = qu.size();
/*queue<char> qu2 = qu; string str = "";
while(!qu2.empty())
{
cout << qu2.front() << " ";
str += qu2.front();
qu2.pop();
}*/
while(!qu.empty())
{
if(s[++ind] != qu.front()) {flag = false; break;}
qu.pop();
}
flag == true ? cout << cycle : cout << s.size();
return 0;
}
Simpler answer which I can come up in an interview is just a O(n^2) solution, which tries out all combinations of substring starting from 0.
int findSmallestUnit(string str){
for(int i=1;i<str.length();i++){
int j=0;
for(;j<str.length();j++){
if(str[j%i] != str[j]){
break;
}
}
if(j==str.length()) return str.substr(0,i);
}
return str;
}
Now if someone is interested in O(n) solution to this problem in c++:
int findSmallestUnit(string str){
vector<int> lps(str.length(),0);
int i=1;
int len=0;
while(i<str.length()){
if(str[i] == str[len]){
len++;
lps[i] = len;
i++;
}
else{
if(len == 0) i++;
else{
len = lps[len-1];
}
}
}
int n=str.length();
int x = lps[n-1];
if(n%(n-x) == 0){
return str.substr(0,n-x);
}
return str;
}
The above is just #Buge's answer in c++, since someone asked in comments.
Regex solution:
Use the following regex replacement to find the shortest repeating substring, and only keeping that substring:
^(.+?)\1*$
$1
Explanation:
^(.+?)\1*$
^ $ # Start and end, to match the entire input-string
( ) # Capture group 1:
.+ # One or more characters,
? # with a reluctant instead of greedy match†
\1* # Followed by the first capture group repeated zero or more times
$1 # Replace the entire input-string with the first capture group match,
# removing all other duplicated substrings
† Greedy vs reluctant would in this case mean: greedy = consumes as many characters as it can; reluctant = consumes as few characters as it can. Since we want the shortest repeating substring, we would want a reluctant match in our regex.
Example input: "abkebabkebabkeb"
Example output: "abkeb"
Try it online in Retina.
Here an example implementation in Java.
Super delayed answer, but I got the question in an interview, here was my answer (probably not the most optimal but it works for strange test cases as well).
private void run(String[] args) throws IOException {
File file = new File(args[0]);
BufferedReader buffer = new BufferedReader(new FileReader(file));
String line;
while ((line = buffer.readLine()) != null) {
ArrayList<String> subs = new ArrayList<>();
String t = line.trim();
String out = null;
for (int i = 0; i < t.length(); i++) {
if (t.substring(0, t.length() - (i + 1)).equals(t.substring(i + 1, t.length()))) {
subs.add(t.substring(0, t.length() - (i + 1)));
}
}
subs.add(0, t);
for (int j = subs.size() - 2; j >= 0; j--) {
String match = subs.get(j);
int mLength = match.length();
if (j != 0 && mLength <= t.length() / 2) {
if (t.substring(mLength, mLength * 2).equals(match)) {
out = match;
break;
}
} else {
out = match;
}
}
System.out.println(out);
}
}
Testcases:
abcabcabcabc
bcbcbcbcbcbcbcbcbcbcbcbcbcbc
dddddddddddddddddddd
adcdefg
bcbdbcbcbdbc
hellohell
Code returns:
abc
bc
d
adcdefg
bcbdbc
hellohell
Works in cases such as bcbdbcbcbdbc.
function smallestRepeatingString(sequence){
var currentRepeat = '';
var currentRepeatPos = 0;
for(var i=0, ii=sequence.length; i<ii; i++){
if(currentRepeat[currentRepeatPos] !== sequence[i]){
currentRepeatPos = 0;
// Add next character available to the repeat and reset i so we don't miss any matches inbetween
currentRepeat = currentRepeat + sequence.slice(currentRepeat.length, currentRepeat.length+1);
i = currentRepeat.length-1;
}else{
currentRepeatPos++;
}
if(currentRepeatPos === currentRepeat.length){
currentRepeatPos = 0;
}
}
// If repeat wasn't reset then we didn't find a full repeat at the end.
if(currentRepeatPos !== 0){ return sequence; }
return currentRepeat;
}
I came up with a simple solution that works flawlessly even with very large strings.
PHP Implementation:
function get_srs($s){
$hash = md5( $s );
$i = 0; $p = '';
do {
$p .= $s[$i++];
preg_match_all( "/{$p}/", $s, $m );
} while ( ! hash_equals( $hash, md5( implode( '', $m[0] ) ) ) );
return $p;
}

How to find validity of a string of parentheses, curly brackets and square brackets?

I recently came in contact with this interesting problem. You are given a string containing just the characters '(', ')', '{', '}', '[' and ']', for example, "[{()}]", you need to write a function which will check validity of such an input string, function may be like this:
bool isValid(char* s);
these brackets have to close in the correct order, for example "()" and "()[]{}" are all valid but "(]", "([)]" and "{{{{" are not!
I came out with following O(n) time and O(n) space complexity solution, which works fine:
Maintain a stack of characters.
Whenever you find opening braces '(', '{' OR '[' push it on the stack.
Whenever you find closing braces ')', '}' OR ']' , check if top of stack is corresponding opening bracket, if yes, then pop the stack, else break the loop and return false.
Repeat steps 2 - 3 until end of the string.
This works, but can we optimize it for space, may be constant extra space, I understand that time complexity cannot be less than O(n) as we have to look at every character.
So my question is can we solve this problem in O(1) space?
With reference to the excellent answer from Matthieu M., here is an implementation in C# that seems to work beautifully.
/// <summary>
/// Checks to see if brackets are well formed.
/// Passes "Valid parentheses" challenge on www.codeeval.com,
/// which is a programming challenge site much like www.projecteuler.net.
/// </summary>
/// <param name="input">Input string, consisting of nothing but various types of brackets.</param>
/// <returns>True if brackets are well formed, false if not.</returns>
static bool IsWellFormedBrackets(string input)
{
string previous = "";
while (input.Length != previous.Length)
{
previous = input;
input = input
.Replace("()", String.Empty)
.Replace("[]", String.Empty)
.Replace("{}", String.Empty);
}
return (input.Length == 0);
}
Essentially, all it does is remove pairs of brackets until there are none left to remove; if there is anything left the brackets are not well formed.
Examples of well formed brackets:
()[]
{()[]}
Example of malformed brackets:
([)]
{()[}]
Actually, there's a deterministic log-space algorithm due to Ritchie and Springsteel: http://dx.doi.org/10.1016/S0019-9958(72)90205-7 (paywalled, sorry not online). Since we need log bits to index the string, this is space-optimal.
If you're willing to accept one-sided error, then there's an algorithm that uses n polylog(n) time and polylog(n) space: http://www.eccc.uni-trier.de/report/2009/119/
If the input is read-only, I don't think we can do O(1) space. It is a well known fact that any O(1) space decidable language is regular (i.e writeable as a regular expression). The set of strings you have is not a regular language.
Of course, this is about a Turing Machine. I would expect it to be true for fixed word RAM machines too.
Edit: Although simple, this algorithm is actually O(n^2) in terms of character comparisons. To demonstrate it, one can simply generate a string as '(' * n + ')' * n.
I have a simple, though perhaps erroneous idea, that I will submit to your criticisms.
It's a destructive algorithm, which means that if you ever need the string it would not help (since you would need to copy it down).
Otherwise, the algorithm work with a simple index within the current string.
The idea is to remove pairs one after the others:
([{}()])
([()])
([])
()
empty -> OK
It is based on the simple fact that if we have matching pairs, then at least one is of the form () without any pair character in between.
Algorithm:
i := 0
Find a matching pair from i. If none is found, then the string is not valid. If one is found, let i be the index of the first character.
Remove [i:i+1] from the string
If i is at the end of the string, and the string is not empty, it's a failure.
If [i-1:i] is a matching pair, i := i-1 and back to 3.
Else, back to 1.
The algorithm is O(n) in complexity because:
each iteration of the loop removes 2 characters from the string
the step 2., which is linear, is naturally bound (i cannot grow indefinitely)
And it's O(1) in space because only the index is required.
Of course, if you can't afford to destroy the string, then you'll have to copy it, and that's O(n) in space so no real benefit there!
Unless, of course, I am deeply mistaken somewhere... and perhaps someone could use the original idea (there is a pair somewhere) to better effect.
I doubt you'll find a better solution, since even if you use internal functions to regexp or count occurrences, they still have a O(...) cost. I'd say your solution is the best :)
To optimize for space you could do some run-length encoding on your stack, but I doubt it would gain you very much, except in cases like {{{{{{{{{{}}}}}}}}}}.
http://www.sureinterview.com/shwqst/112007
It is natural to solve this problem with a stack.
If only '(' and ')' are used, the stack is not necessary. We just need to maintain a counter for the unmatched left '('. The expression is valid if the counter is always non-negative during the match and is zero at the end.
In general case, although the stack is still necessary, the depth of the stack can be reduced by using a counter for unmatched braces.
This is an working java code where I filter out the brackets from the string expression and then check the well formedness by replacing wellformed braces by nulls
Sample input = (a+{b+c}-[d-e])+[f]-[g] FilterBrackets will output = ({}[])[][] Then I check for wellformedness.
Comments welcome.
public class ParanString {
public static void main(String[] args) {
String s = FilterBrackets("(a+{b+c}-[d-e])[][]");
while ((s.length()!=0) && (s.contains("[]")||s.contains("()")||s.contains("{}")))
{
//System.out.println(s.length());
//System.out.println(s);
s = s.replace("[]", "");
s = s.replace("()", "");
s = s.replace("{}", "");
}
if(s.length()==0)
{
System.out.println("Well Formed");
}
else
{
System.out.println("Not Well Formed");
}
}
public static String FilterBrackets(String str)
{
int len=str.length();
char arr[] = str.toCharArray();
String filter = "";
for (int i = 0; i < len; i++)
{
if ((arr[i]=='(') || (arr[i]==')') || (arr[i]=='[') || (arr[i]==']') || (arr[i]=='{') || (arr[i]=='}'))
{
filter=filter+arr[i];
}
}
return filter;
}
}
The following modification of Sbusidan's answer is O(n2) time complex but O(log n) space simple.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
char opposite(char bracket) {
switch(bracket) {
case '[':
return ']';
case '(':
return ')';
}
}
bool is_balanced(int length, char *s) {
int depth, target_depth, index;
char target_bracket;
if(length % 2 != 0) {
return false;
}
for(target_depth = length/2; target_depth > 0; target_depth--) {
depth=0;
for(index = 0; index < length; index++) {
switch(s[index]) {
case '(':
case '[':
depth++;
if(depth == target_depth) target_bracket = opposite(s[index]);
break;
case ')':
case ']':
if(depth == 0) return false;
if(depth == target_depth && s[index] != target_bracket) return false;
depth--;
break;
}
}
}
}
void main(char* argv[]) {
char input[] = "([)[(])]";
char *balanced = is_balanced(strlen(input), input) ? "balanced" : "imbalanced";
printf("%s is %s.\n", input, balanced);
}
If you can overwrite the input string (not reasonable in the use cases I envision, but what the heck...) you can do it in constant space, though I believe the time requirement goes up to O(n2).
Like this:
string s = input
char c = null
int i=0
do
if s[i] isAOpenChar()
c = s[i]
else if
c = isACloseChar()
if closeMatchesOpen(s[i],c)
erase s[i]
while s[--i] != c ;
erase s[i]
c == null
i = 0; // Not optimal! It would be better to back up until you find an opening character
else
return fail
end if
while (s[++i] != EOS)
if c==null
return pass
else
return fail
The essence of this is to use the early part of the input as the stack.
I know I'm a little late to this party; it's also my very first post on StackOverflow.
But when I looked through the answers, I thought I might be able to come up with a better solution.
So my solution is to use a few pointers.
It doesn't even have to use any RAM storage, as registers can be used for this.
I have not tested the code; it's written it on the fly.
You'll need to fix my typos, and debug it, but I believe you'll get the idea.
Memory usage: Only the CPU registers in most cases.
CPU usage: It depends, but approximately twice the time it takes to read the string.
Modifies memory: No.
b: string beginning, e: string end.
l: left position, r: right position.
c: char, m: match char
if r reaches the end of the string, we have a success.
l goes backwards from r towards b.
Whenever r meets a new start kind, set l = r.
when l reaches b, we're done with the block; jump to beginning of next block.
const char *chk(const char *b, int len) /* option 2: remove int len */
{
char c, m;
const char *l, *r;
e = &b[len]; /* option 2: remove. */
l = b;
r = b;
while(r < e) /* option 2: change to while(1) */
{
c = *r++;
/* option 2: if(0 == c) break; */
if('(' == c || '{' == c || '[' == c)
{
l = r;
}
else if(')' == c || ']' == c || '}' == c)
{
/* find 'previous' starting brace */
m = 0;
while(l > b && '(' != m && '[' != m && '{' != m)
{
m = *--l;
}
/* now check if we have the correct one: */
if(((m & 1) + 1 + m) != c) /* cryptic: convert starting kind to ending kind and match with c */
{
return(r - 1); /* point to error */
}
if(l <= b) /* did we reach the beginning of this block ? */
{
b = r; /* set new beginning to 'head' */
l = b; /* obsolete: make left is in range. */
}
}
}
m = 0;
while(l > b && '(' != m && '[' != m && '{' != m)
{
m = *--l;
}
return(m ? l : NULL); /* NULL-pointer for OK */
}
After thinking about this approach for a while, I realized that it will not work as it is right now.
The problem will be that if you have "[()()]", it'll fail when reaching the ']'.
But instead of deleting the proposed solution, I'll leave it here, as it's actually not impossible to make it work, it does require some modification, though.
/**
*
* #author madhusudan
*/
public class Main {
/**
* #param args the command line arguments
*/
public static void main(String[] args) {
new Main().validateBraces("()()()()(((((())))))()()()()()()()()");
// TODO code application logic here
}
/**
* #Use this method to validate braces
*/
public void validateBraces(String teststr)
{
StringBuffer teststr1=new StringBuffer(teststr);
int ind=-1;
for(int i=0;i<teststr1.length();)
{
if(teststr1.length()<1)
break;
char ch=teststr1.charAt(0);
if(isClose(ch))
break;
else if(isOpen(ch))
{
ind=teststr1.indexOf(")", i);
if(ind==-1)
break;
teststr1=teststr1.deleteCharAt(ind).deleteCharAt(i);
}
else if(isClose(ch))
{
teststr1=deleteOpenBraces(teststr1,0,i);
}
}
if(teststr1.length()>0)
{
System.out.println("Invalid");
}else
{
System.out.println("Valid");
}
}
public boolean isOpen(char ch)
{
if("(".equals(Character.toString(ch)))
{
return true;
}else
return false;
}
public boolean isClose(char ch)
{
if(")".equals(Character.toString(ch)))
{
return true;
}else
return false;
}
public StringBuffer deleteOpenBraces(StringBuffer str,int start,int end)
{
char ar[]=str.toString().toCharArray();
for(int i=start;i<end;i++)
{
if("(".equals(ar[i]))
str=str.deleteCharAt(i).deleteCharAt(end);
break;
}
return str;
}
}
Instead of putting braces into the stack, you could use two pointers to check the characters of the string. one start from the beginning of the string and the other start from end of the string. something like
bool isValid(char* s) {
start = find_first_brace(s);
end = find_last_brace(s);
while (start <= end) {
if (!IsPair(start,end)) return false;
// move the pointer forward until reach a brace
start = find_next_brace(start);
// move the pointer backward until reach a brace
end = find_prev_brace(end);
}
return true;
}
Note that there are some corner case not handled.
I think that you can implement an O(n) algorithm. Simply you have to initialise an counter variable for each type: curly, square and normal brackets. After than you should iterate the string and should increase the coresponding counter if the bracket is opened, otherwise to decrease it. If the counter is negative return false. AfterI think that you can implement an O(n) algorithm. Simply you have to initialise an counter variable for each type: curly, square and normal brackets. After than you should iterate the string and should increase the coresponding counter if the bracket is opened, otherwise to decrease it. If the counter is negative return false. After you count all brackets, you should check if all counters are zero. In that case, the string is valid and you should return true.
You could provide the value and check if its a valid one, it would print YES otherwise it would print NO
static void Main(string[] args)
{
string value = "(((([{[(}]}]))))";
List<string> jj = new List<string>();
if (!(value.Length % 2 == 0))
{
Console.WriteLine("NO");
}
else
{
bool isValid = true;
List<string> items = new List<string>();
for (int i = 0; i < value.Length; i++)
{
string item = value.Substring(i, 1);
if (item == "(" || item == "{" || item == "[")
{
items.Add(item);
}
else
{
string openItem = items[items.Count - 1];
if (((item == ")" && openItem == "(")) || (item == "}" && openItem == "{") || (item == "]" && openItem == "["))
{
items.RemoveAt(items.Count - 1);
}
else
{
isValid = false;
break;
}
}
}
if (isValid)
{
Console.WriteLine("Yes");
}
else
{
Console.WriteLine("NO");
}
}
Console.ReadKey();
}
var verify = function(text)
{
var symbolsArray = ['[]', '()', '<>'];
var symbolReg = function(n)
{
var reg = [];
for (var i = 0; i < symbolsArray.length; i++) {
reg.push('\\' + symbolsArray[i][n]);
}
return new RegExp('(' + reg.join('|') + ')','g');
};
// openReg matches '(', '[' and '<' and return true or false
var openReg = symbolReg(0);
// closeReg matches ')', ']' and '>' and return true or false
var closeReg = symbolReg(1);
// nestTest matches openSymbol+anyChar+closeSymbol
// and returns an obj with the match str and it's start index
var nestTest = function(symbols, text)
{
var open = symbols[0]
, close = symbols[1]
, reg = new RegExp('(\\' + open + ')([\\s\\S])*(\\' + close + ')','g')
, test = reg.exec(text);
if (test) return {
start: test.index,
str: test[0]
};
else return false;
};
var recursiveCheck = function(text)
{
var i, nestTests = [], test, symbols;
// nestTest with each symbol
for (i = 0; i < symbolsArray.length; i++)
{
symbols = symbolsArray[i];
test = nestTest(symbols, text);
if (test) nestTests.push(test);
}
// sort tests by start index
nestTests.sort(function(a, b)
{
return a.start - b.start;
});
if (nestTests.length)
{
// build nest data: calculate match end index
for (i = 0; i < nestTests.length; i++)
{
test = nestTests[i];
var end = test.start + ( (test.str) ? test.str.length : 0 );
nestTests[i].end = end;
var last = (nestTests[i + 1]) ? nestTests[i + 1].index : text.length;
nestTests[i].pos = text.substring(end, last);
}
for (i = 0; i < nestTests.length; i++)
{
test = nestTests[i];
// recursive checks what's after the nest
if (test.pos.length && !recursiveCheck(test.pos)) return false;
// recursive checks what's in the nest
if (test.str.length) {
test.str = test.str.substring(1, test.str.length - 1);
return recursiveCheck(test.str);
} else return true;
}
} else {
// if no nests then check for orphan symbols
var closeTest = closeReg.test(text);
var openTest = openReg.test(text);
return !(closeTest || openTest);
}
};
return recursiveCheck(text);
};
Using c# OOPS programming... Small and simple solution
Console.WriteLine("Enter the string");
string str = Console.ReadLine();
int length = str.Length;
if (length % 2 == 0)
{
while (length > 0 && str.Length > 0)
{
for (int i = 0; i < str.Length; i++)
{
if (i + 1 < str.Length)
{
switch (str[i])
{
case '{':
if (str[i + 1] == '}')
str = str.Remove(i, 2);
break;
case '(':
if (str[i + 1] == ')')
str = str.Remove(i, 2);
break;
case '[':
if (str[i + 1] == ']')
str = str.Remove(i, 2);
break;
}
}
}
length--;
}
if(str.Length > 0)
Console.WriteLine("Invalid input");
else
Console.WriteLine("Valid input");
}
else
Console.WriteLine("Invalid input");
Console.ReadKey();
This is my solution to the problem.
O(n) is the complexity of time without complexity of space.
Code in C.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
bool checkBraket(char *s)
{
int curly = 0, rounded = 0, squre = 0;
int i = 0;
char ch = s[0];
while (ch != '\0')
{
if (ch == '{') curly++;
if (ch == '}') {
if (curly == 0) {
return false;
} else {
curly--; }
}
if (ch == '[') squre++;
if (ch == ']') {
if (squre == 0) {
return false;
} else {
squre--;
}
}
if (ch == '(') rounded++;
if (ch == ')') {
if (rounded == 0) {
return false;
} else {
rounded--;
}
}
i++;
ch = s[i];
}
if (curly == 0 && rounded == 0 && squre == 0){
return true;
}
else {
return false;
}
}
void main()
{
char mystring[] = "{{{{{[(())}}]}}}";
int answer = checkBraket(mystring);
printf("my answer is %d\n", answer);
return;
}

Resources