New Parser Generator

Question

frencheneesz 0 Newbie Poster

15 Years Ago

I have recently created a parser generator named the Lithium Parser Generator (LiPG), and I wrote what I hope is fairly readable documentation for it. I was wondering if anyone could give me some feedback, especially if you have experience with other parser generators like Bison/Flex, etc.

The Doc: http://www.uweb.ucsb.edu/~frencheneesz/LiPG/LiPG%20Documentation.html

Thanks


Rashakil Fol 978 Super Senior Demiposter

15 Years Ago

void getline(char* line)  
    {   int n=0;
        while((line[n]=getchar()) != '\n')
        { n++; }
        line[n]=0;
    }

You shouldn't use broken code examples.

parse ws
[   anychar[] Wo
    anyindex an
->  Wo   Wo[ Wo[an]==' ' || Wo[an]=='\t' || Wo[an]=='\n' ]
    [ return true; ]
]

This is _extremely_ verbose for something that just parses whitespace. I'm pretty sure that's simpler to do in other parser generators, and more readable. For example, with ANTLR (and I might be buggy here, I haven't seen ANTLR myself except on IRC channels), you would just see something like

WS : (' '|'\t'|'\f'|'\n'|'\r')+ { $channel=HIDDEN; };

For a parser combinator library like Parsec, it would just be

ws = skipMany1 (satisfy isSpace)
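For comparison, a hand-written whitespace skipper in plain C++ is not much longer than either of those. This is only a rough sketch: the name `skip_ws` and its signature are made up for illustration and are not part of LiPG, ANTLR, or Parsec.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: consume one or more whitespace characters starting at
// `pos`. Returns true and advances `pos` if at least one was consumed.
bool skip_ws(const std::string& input, std::size_t& pos) {
    std::size_t start = pos;
    while (pos < input.size() &&
           std::isspace(static_cast<unsigned char>(input[pos]))) {
        ++pos;
    }
    return pos > start;
}
```

The cast to `unsigned char` is there because passing a negative `char` value to `std::isspace` is undefined behavior.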

Rashakil Fol 978 Super Senior Demiposter

15 Years Ago

Also, I don't see the reason to have two mechanisms for concatenating parsers: successive choice blocks and wordforms. They do the same thing, so they add needless redundancy.

Your 'anychar[]' and 'anyindex' features seem unnecessary, almost silly. To say you want a string of characters that satisfy a given predicate, you already have that feature.

Rashakil Fol 978 Super Senior Demiposter

15 Years Ago

Note that I'm not saying any good things about your project not because they aren't there, but because it just doesn't suit my disposition.

Now, if you're going to continue with C++, you should spend some time learning C++. You write C++ like a C programmer. For starters, why are you using malloc? You're writing C++. Using "malloc" or "realloc" in C++ is a huge mistake. You can't just take any old array of type T and call realloc on it. Why not: Because realloc does a raw memory copy, instead of calling the objects' copy constructors. Your 'T' might be a datatype that looks like this:

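class foo {
  int nums[10];   // fixed storage for up to ten values
  int* current;   // points into nums, so it is tied to this object's address
 public:
  foo() { current = nums; }
  foo(const foo& other) {
    for (int i = 0; i < 10; ++i)
      nums[i] = other.nums[i];
    // Re-point at our own array, at the same offset as in `other`.
    current = nums + (other.current - other.nums);
  }
  void add(int n) {
    if (current < nums + 10)
      *current++ = n;
  }
  int get(int i) { return nums[i]; }
};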

This class's implementation depends on its location -- if you just copy its raw memory, your class will behave erratically.

The same goes for malloc: a = (T*)malloc(n*sizeof(T)); is just horrible -- your T's will never get constructed!
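As a rough sketch of the alternative, here is a self-referencing type in the spirit of `foo` stored in a `std::vector`. The names `self_ref` and `all_consistent_after_growth` are invented for this example. The point is that `std::vector` constructs its elements and, when it grows, copies them through the copy constructor, so the internal pointer is re-pointed every time instead of being byte-copied.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type whose state includes a pointer into itself.
class self_ref {
    int nums[10];
    int* current;
public:
    self_ref() : nums(), current(nums) {}
    self_ref(const self_ref& other)
        : current(nums + (other.current - other.nums)) {
        for (int i = 0; i < 10; ++i) nums[i] = other.nums[i];
    }
    self_ref& operator=(const self_ref& other) {
        for (int i = 0; i < 10; ++i) nums[i] = other.nums[i];
        current = nums + (other.current - other.nums);
        return *this;
    }
    void add(int n) { if (current < nums + 10) *current++ = n; }
    // True when `current` points inside this object's own array.
    bool consistent() const { return current >= nums && current <= nums + 10; }
    std::size_t count() const { return static_cast<std::size_t>(current - nums); }
};

// Grow a vector far enough to force reallocations, then check every element.
bool all_consistent_after_growth(std::size_t n) {
    std::vector<self_ref> items;
    for (std::size_t i = 0; i < n; ++i) {
        self_ref s;
        s.add(static_cast<int>(i));
        items.push_back(s);  // copy constructor runs here and on each regrowth
    }
    for (const self_ref& s : items) {
        if (!s.consistent() || s.count() != 1) return false;
    }
    return true;
}
```

With a `malloc`/`realloc`-backed array the same loop would leave each `current` pointing at wherever the element used to live.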

#define ITER1(x)		for(int n1=0; n1<x; n1++)
#define ITER2(x,y)		for(int n1=0; n1<x; n1++) \
							for(int n2=0; n2<y; n2++)

These macros are unsafe and insane.

[[Edit:
They are unsafe because ITER1(f()) will call f() each time through -- and the user has no way of realizing this. They're insane because they're insane.

A slight, but incomplete, improvement, would be to use the following: #define ITER1(x) for(int n1 = 0, e1 = (x); n1 < e1; n1++) . There's still the problem that they introduce variables. If the user has a variable named n1 in play, or in my example, e1, they would get unexpected behavior.

It would be better to let the user supply his own iterator variable name, and the ending variable name should be something completely unpredictable. But using ITER1 is still insane.
]]
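To make the repeated-evaluation problem concrete, here is a small sketch that counts how often the bound expression runs. `ITER1` is copied from the macro above; `repeat`, `bound_counter`, and the two `bound_calls_*` functions are hypothetical names introduced only for this illustration. The function template is one possible replacement, not the only one.

```cpp
// The macro as posted: `x` is pasted into the loop condition.
#define ITER1(x) for(int n1=0; n1<x; n1++)

// Hypothetical replacement: the bound is an ordinary function argument, so it
// is evaluated exactly once, and the index is a parameter of the callback
// rather than a hidden variable named n1.
template <typename Body>
void repeat(int count, Body body) {
    for (int i = 0; i < count; ++i) {
        body(i);
    }
}

// Counts how often the bound expression gets evaluated.
struct bound_counter {
    int calls;
    bound_counter() : calls(0) {}
    int next_bound() { ++calls; return 3; }
};

// Number of next_bound() calls when looping with the macro.
int bound_calls_with_macro() {
    bound_counter c;
    int sum = 0;
    ITER1(c.next_bound()) { sum += n1; }
    return c.calls;
}

// Number of next_bound() calls when looping with repeat().
int bound_calls_with_repeat() {
    bound_counter c;
    int sum = 0;
    repeat(c.next_bound(), [&sum](int i) { sum += i; });
    return c.calls;
}
```

The macro version re-checks the condition before every iteration and once more to exit, so the bound runs four times for three iterations; `repeat` evaluates it once.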

Why do you have your own BDArr and BDFifo implementations anyway? There is std::vector and std::queue for a reason...

Rashakil Fol 978 Super Senior Demiposter

15 Years Ago

How is it broken? It looks perfectly valid to me, and it works. What do you suggest I do if gets and fgets crash my program?

Your code has no way of checking if it is overflowing the buffer which has been passed into it. You shouldn't use gets for the same reason. Paste an example where fgets is behaving buggily and I can take a look.


frencheneesz 0 Newbie Poster · Answer 1 · 2008-12-15T06:03:19+00:00

You shouldn't use broken code examples.

How is it broken? It looks perfectly valid to me, and it works. What do you suggest I do if gets and fgets crash my program?

This is _extremely_ verbose for something that just parses whitespace.

You're right. The format is much more useful for complicated wordforms - ws takes the hit so that more complicated wordforms are easier to describe.

I was thinking it might be useful to have syntax like this:

parse ws
[-> " " || "\t" || "\n"
[ return true; ]
]

Do you think that would be worthwhile? In other words, do you think that is acceptable verbosity?

frencheneesz 0 Newbie Poster · Answer 2 · 2008-12-15T06:09:38+00:00

successive choice blocks and wordforms ... add needless redundancy.

That's a very good point. The problem is that wordforms currently don't have a good way to describe having multiple choices (which is why the whitespace example is so verbose). Choice blocks are verbose, however. Do you have an idea of what better syntax would look like?

you already have that feature

Yes and no. The 'anychar[]' and 'anyindex' feature is extremely powerful and can be used to put arbitrary constraints on a variable. What are you asserting that they are redundant with?

Btw, thanks for delving so deep into this - I appreciate the feedback.

Rashakil Fol 978 Super Senior Demiposter Team Colleague · Answer 3 · 2008-12-15T06:27:10+00:00

That's a very good point. The problem is that wordforms currently don't have a good way to describe having multiple choices (which is why the whitespace example is so verbose). Choice blocks are verbose, however. Do you have an idea of what better syntax would look like?

I'm not sure if you misinterpreted me. I was saying that having multiple choice blocks in a row is unnecessary -- instead of having consecutive blocks for /[0-9]{3}/, / [0-9]{3}/, and /-[0-9]{4}/, the way you have with your first telephone example, you could just have separate named parse blocks for those three things.
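To illustrate the idea outside of LiPG syntax, here is a rough C++ sketch with one named rule per piece and a top-level rule that sequences them. The rule names (`area_code`, `exchange`, `line_number`) and the helpers are labels invented for this example; only the three patterns come from the telephone example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: consume exactly `count` decimal digits.
bool digits(const std::string& s, std::size_t& pos, std::size_t count) {
    if (pos > s.size() || s.size() - pos < count) return false;
    for (std::size_t i = 0; i < count; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(s[pos + i]))) return false;
    }
    pos += count;
    return true;
}

// Hypothetical helper: consume one specific character.
bool literal(const std::string& s, std::size_t& pos, char c) {
    if (pos >= s.size() || s[pos] != c) return false;
    ++pos;
    return true;
}

// One named rule per piece: /[0-9]{3}/, / [0-9]{3}/, /-[0-9]{4}/
bool area_code(const std::string& s, std::size_t& pos) {
    return digits(s, pos, 3);
}
bool exchange(const std::string& s, std::size_t& pos) {
    return literal(s, pos, ' ') && digits(s, pos, 3);
}
bool line_number(const std::string& s, std::size_t& pos) {
    return literal(s, pos, '-') && digits(s, pos, 4);
}

// The top-level rule is just the three named rules in sequence.
bool phone_number(const std::string& s) {
    std::size_t pos = 0;
    return area_code(s, pos) && exchange(s, pos) && line_number(s, pos) &&
           pos == s.size();
}
```

Sequencing then needs only one mechanism: naming a rule and calling it from another rule.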

Rashakil Fol 978 Super Senior Demiposter Team Colleague · Answer 4 · 2008-12-15T06:31:10+00:00

Also, it seems to me like 'every' and 'tween' are rather kludgy features.

I'll post an example of what I would consider to be a more comfortable C++ parser generator at this level of scope later tonight.

frencheneesz 0 Newbie Poster · Answer 5 · 2008-12-15T06:40:19+00:00

why are you using malloc?...Why do you have your own BDArr and BDFifo implementations anyway?

I wrote the libraries I'm using as just a test. I did them in C style because I eventually want to write a programming language that is translated to C, then compiled. I'm aware of the limitations of my implementation. I did want to switch to using the STL, but I failed to port it correctly and didn't really feel like debugging that. The dot-h libraries it uses are not meant to be pristine; in fact, I do realize they contain some pretty bad coding practice.

You shouldn't use gets for the same reason.

What should I use?

Paste an example where fgets is behaving buggily and I can take a look.

I messed around with my calculator example and found that when I added a printf statement after getting the line, my function crashes as well. I've assumed these types of crashes are the result of some kind of corruption in my compiler, so I should compile the program on another system. If this program runs fine for you, then my compiler probably has some issue: http://www.uweb.ucsb.edu/~frencheneesz/junk/SICalculator.cpp
(make sure if you compile it, don't use optimizations)

Rashakil Fol 978 Super Senior Demiposter Team Colleague · Answer 6 · 2008-12-15T06:52:32+00:00

It's not your compiler, it's you. Also, that file you linked doesn't use fgets.

Instead of using gets, you should use fgets.

But really you should use std::string functions.

frencheneesz 0 Newbie Poster · Answer 7 · 2008-12-15T09:04:40+00:00

Paste an example where fgets is behaving buggily and I can take a look.

So I actually tested it on another computer - and gets does indeed crash it. This is one of those errors that make 0 sense. I just write them off as a bug in GCC.

Rashakil Fol 978 Super Senior Demiposter Team Colleague · Answer 8 · 2008-12-15T09:59:24+00:00

That's extremely dumb. Compiler errors are extremely rare, and you're not going to set one off just by using gets.

frencheneesz 0 Newbie Poster · Answer 9 · 2008-12-15T10:53:38+00:00

Come on. Please don't insult me. Look, I fixed the file up at the link I had given you - sorry I was careless about leaving gets out.

http://www.uweb.ucsb.edu/~frencheneesz/junk/SICalculator.cpp

That does use gets, and it doesn't work properly. I shouldn't have said "crash" - that was misleading. The string is not recognized by my parser for some reason, and it fails. Commenting out fgets, and using my code makes it work. I don't believe there is any logical error. This is why I assumed something was wrong with the compiler.

Instead of using gets, you should use fgets.

Why? I would be very surprised if 'gets' didn't use 'fgets' under the covers.

frencheneesz 0 Newbie Poster · Answer 10 · 2008-12-15T10:56:12+00:00

Just to clarify, no offense, but I'm not actually looking for critique of my programming style. Much of this code was slapped together - I am not attempting to push it as the best example of C++ code.

Rashakil Fol 978 Super Senior Demiposter Team Colleague · Answer 11 · 2008-12-15T10:59:30+00:00

I'll post an example of what I would consider to be a more comfortable C++ parser generator at this level of scope later tonight.

Er, tomorrow night.