Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mini-LISP - unification of errors in C and JS implementations #426

Open
OfirMarkowitz1 opened this issue Jul 28, 2021 · 22 comments
Open

Mini-LISP - unification of errors in C and JS implementations #426

OfirMarkowitz1 opened this issue Jul 28, 2021 · 22 comments
Assignees

Comments

@OfirMarkowitz1
Copy link
Collaborator

No description provided.

@yossigil
Copy link
Collaborator

This means in particular, very brief, and not so informational messages; no worry, eventually someone will do the bells and whistles. Not for now please.

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Jul 28, 2021 via email

yossigil added a commit that referenced this issue Jul 28, 2021
yossigil added a commit that referenced this issue Jul 28, 2021
@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Aug 2, 2021

Mini-LISP C project, file S.h, line 182:

inline bool die(S s) { throw BUG.cons(s); }

Shouldn't it be "s.cons(BUG)", since "BUG" is the kind? This is the convention for all other error kinds.

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Aug 2, 2021

Mini-LISP C project, file eval.cc, lines 50-57:

S only(S s) {
    s.pair() || die(s);
    s.cdr().cdr().t() && s.error(REDUNDANT).t();
    S a = evaluate_list(s.cdr());
    a.null() && s.error(MISSING).t();
    a.atom() && s.error(INVALID).t();
    return a.car();
}

This function seems wrong to me in a few levels:

  1. s.error(...).t() - calling t() will do nothing since error throws an exception.
  2. The line s.cdr().cdr().t() && s.error(REDUNDANT).t(); makes sure that s is a list with exactly 2 elements (otherwise REDUNDANT or CDR exception is thrown). It means that a is always a list with 1 element, thus the lines:
    a.null() && s.error(MISSING).t();
    a.atom() && s.error(INVALID).t();

are redundant - a is neither an atom nor nil

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Aug 2, 2021

Mini-LISP C project, file eval.cc, lines 59-74:

void checkNumberOfArgs(S s) {
    S f = s.car();
    if (f.eq(QUOTE) || f.eq(CAR) || f.eq(CDR) || // 1 Args:
        f.eq(ATOM) || f.eq(NULL) || f.eq(EVAL)) {
        s.more2() && s.error(REDUNDANT).t();
        s.less2() && s.error(MISSING).t();
    }
    if (f.eq(CONS) || f.eq(SET) || f.eq(EQ) || f.eq(LAMBDA) || f.eq(NLAMBDA)) { // 2 Args:
        s.more3() && s.error(REDUNDANT).t();
        s.less3() && s.error(MISSING).t();
    }
    if (f.eq(NDEFUN) || f.eq(DEFUN)) { // 3 Args:
        s.more4() && s.error(REDUNDANT).t();
        s.less4() && s.error(MISSING).t();
    }
}

This function doesn't make sure that s is an actual list, i.e. "(a b c)" and "(a . (b . (c . d)))" is exactly the same for this function. That's because the "more" and "less" methods only check after how many "cdr" calls we arrive at an atom, but fails to make sure this atom is nil.

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Aug 2, 2021

This combined behaviors of the only(S s) and checkNumberOfArgs(S s) functions means that there's an error handling discrepancy between single arg atomics and the rest.
Mini-LISP C project, file eval.cc, lines 76-102:

S evaluate_atomic_function(S s) { M(s);
  checkNumberOfArgs(s);
  const S f = s.car();
  // Atomic functions:
  if (f.eq(CAR))     return only(s).car();
  if (f.eq(CONS))    return s.$2$().eval().cons(s.$3$().eval());
  if (f.eq(SET))     return set(s.$2$().eval(),          s.$3$().eval());
  if (f.eq(EQ))      return s.$2$().eval().eq(s.$3$().eval()) ? T : NIL;
  if (f.eq(COND))    return evaluate_cond(s.cdr());
  if (f.eq(CDR))     return only(s).cdr();
  if (f.eq(ATOM))    return only(s).atom() ? T : NIL;
  if (f.eq(EVAL))    return only(s).eval();
  if (f.eq(ERROR))   return s.error(s.cdr().eval());
  if (f.eq(NULL))    return only(s).null() ? T : NIL;
  if (f.eq(QUOTE))   return s.cdr().car();
  if (f.eq(NLAMBDA)) return list(NLAMBDA, s.$2$(),  s.$3$());
  if (f.eq(LAMBDA))  return list(LAMBDA, s.$2$(), s.$3$());
  if (f.eq(NDEFUN)) { 
    ndefun(s.$2$(), s.$3$(), s.$4$());
    return  s.$2$();
  }
  if (f.eq(DEFUN)) { 
    defun(s.$2$(), s.$3$(), s.$4$());
    return  s.$2$();
  }
  return bug(s);
}

As you can see, the single arg atomics are called with only(S s) that makes sure that s is a list, while the rest don't - see my comment about checkNumberOfArgs(S s). I think the solution is that checkNumberOfArgs(S s) will make sure that s is a list (otherwise it will throw INVALID), and it will also render only(S s) redundant - see my comment about only(S s).

I'm referring you to this problems because I don't want to blindly copy what seems to me as an odd behavior, without making sure this is what you intended.

@yossigil
Copy link
Collaborator

yossigil commented Aug 2, 2021

@DorYeheskel

What do you say to this?

@DorYeheskel
Copy link
Collaborator

@OfirMarkowitz1
Calling evaluate_atomic_function(S s) means that s is a list (e.g: (car 'a)), so you don't need to check it.
If s is not a list, in the eval function, it will go to the lookup function (see the conditions here)

You can see also here:

(defun evaluate−atomic (S−expression a−list)
     (apply−atomic (car S−expression) (cdr S−expression) a−list))

That cdr is called (which mean s is a list, otherwise, error will occur)

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Aug 2, 2021

@DorYeheskel @yossigil
It seems to me the confusion stems from the fact that in the C implementation there's no notion of a pair that is not a list.
If that's true, this is in inconsistent with the book.
Am I correct about this?

And in any case, the only(S s) function is redundant. You should just use eval, i.e.:

if (f.eq(CAR))     return s.$2$().eval();

or

if (f.eq(CAR))     return evaluate_list(s.cdr()).car();

@yossigil
Copy link
Collaborator

yossigil commented Aug 3, 2021

I will let you guys sort it out a little...

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Aug 3, 2021

@OfirMarkowitz1
Calling evaluate_atomic_function(S s) means that s is a list (e.g: (car 'a)), so you don't need to check it.
If s is not a list, in the eval function, it will go to the lookup function (see the conditions here)

You can see also here:

(defun evaluate−atomic (S−expression a−list)
     (apply−atomic (car S−expression) (cdr S−expression) a−list))

That cdr is called (which mean s is a list, otherwise, error will occur)

@DorYeheskel
The discussion about pairs in #439 is relevant here. It seems that the current C implementation does not allow pairs that are not lists. If pairs that are not lists will be allowed, my remarks about the implementation of void checkNumberOfArgs(S s) are valid.

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Aug 3, 2021

In the function S evaluate_atomic_function(S s), file eval.cc, line 88:

if (f.eq(ERROR))   return s.error(s.cdr().eval());

This isn't working properly since s.cdr().eval() will throw an "INVALID" exception, before the call to error.
There's also tests that rely on this incorrect behavior, see t-alist-global-local.cc, in test SetWithError, specifically line 101:

EXPECT_EXCEPTION(s.eval(), list(b_2, x.q()), INVALID);

I think the correct behavior is throwing an S expression containing the actuals passed to error, after their evaluation:

if (f.eq(ERROR))   return s.error(evaluate_list(s.cdr()));

I guess that's what was meant to be written but eval was called instead of evaluate_list.

@OfirMarkowitz1
Copy link
Collaborator Author

I've resolved issue #433, e.g., finished the error handling in JS. Perhaps I will need to make very few changes in order to close the current issue. I'm waiting for @DorYeheskel and @yossigil final take on this, to corroborate my work. Same with #427

@yossigil
Copy link
Collaborator

yossigil commented Aug 8, 2021

@DorYeheskel are you sure that we do not have pairs in the implementation, I remember vividly supporting these in my parser.

@DorYeheskel
Copy link
Collaborator

@OfirMarkowitz1
I fixed the error message issues, thanks.

About pairs, see the following examples, using the mini-lisp interpreter:
Lists:
(car (car '(a b))) -> ERROR (car error)
(cdr (car '(a b))) -> ERROR (cdr error)

Pairs:
(car (car '(a.b))) -> A
(cdr (car '(a.b))) -> B

As you can see, (a.b) is a pair in a list (as written here too) and thus car on it will return the pair A.B
and car/cdr on it will return A/B.

If pair was not supported, then (car (car '(a.b)) should raise a car error, which is not the case.

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Aug 9, 2021

@DorYeheskel
Firstly, from page 34 in the book:

(car '(b.a)) ⇒ B
(car '(b a)) ⇒ B
(car '(a)) ⇒ A
(car 'a) ⇒ #
(car ()) ⇒ #

A.B is not a pair, but rather (A.B). The first line makes it clear.
Also, see @yossigil remark from #439:

The C implementation recognizes all S expressions. This includes both Pairs and Lists.
By definition, the car of a pair, gives the CAR, i.e., the left child of the pair.
(car '(a.b)) is a. If it is not, it is a bug.

Also, A.B shouldn't be an atom, but rather rejected by the parser, see @yossigil remark from #439:

Well, we encountered a subtle difference between the parser of our implementation and that of common lisp. This suggests even more to avoid working on a common lisp implementation. This also may be related to the problem of unique unambiguous output.
What is (a.b)? Could be a list containing one element, or a pair. I think it is a pair...

Why?

Mini Lisp should be minimal. Identifiers are really simple. Only few characters are excluded; precisely those used for tokens, I think there are total of 6 reserved characters, out of which only 4 are in actual use; there are also spaces. That's it.

So, by this minimalist approach, (a.b) could only be a pair. it cannot be a list, since list elements are separated by spaces. It could not be a list containing one atom, because a.b is not an atom.

I hope this makes sense.

In both cases, the fact that the tests don't reflect this behavior means the tests are wrong, not the other way around.
By the way:

I think there are total of 6 reserved characters, out of which only 4 are in actual use; there are also spaces. That's it.

There should be an additional "token" character, ';', which should be used only for line comments, so 7 reserved characters in total. This character should be added in Tokenizer.h, in line 6: const char tokens[] = "()[].\'";

@yossigil
Copy link
Collaborator

Seems like an error in the book. Should be fixed.

@yossigil
Copy link
Collaborator

Comments are not tokens; never...

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Sep 12, 2021

I still need to unify these errors.
It would be much more comfortable for me if we gathered all the cases where the user did something wrong, and decide whether it warrants and error, and if it is, which kind.
Discussing it with @yossigil, I understood that our rule of thumb is simplicity. We want the minimal number of exception scenarios in the code, and to let the stack dump do the talking.

Necessary error example - fake lambda tag:

('(fake_lambda_tag (x) (car x)) '[a.b])

This should be an error (and is, both in C and JS implementations), because it doesn't make sense for a wrong tag to act like lambda/nlambda, i.e., have a default behavior.

An existing error example, which I'm not sure is necessary:

((lambda (x) (car x)) '[a.b] 'c)

Currently, this raises a "REDUNDANT" error, but we can decide just to ignore the second variable. I think it's simpler.

Another existing error example:

((lambda (x y) (cons x y)) 'a)

Currently, this raises a "MISSING" error, but I think it also doesn't have to be an error because we can implement currying instead, i.e., this expression evaluation can return (LAMBDA (Y) (CONS (QUOTE A) Y)).

I think a discussion like this would be much more fruitful than for me to blindly copy the C implementation errors. Also, I think it would make a nice addition to the book, which currently contains only some of the present C implementation errors.

@yossigil
Copy link
Collaborator

I think there are the following errors:

  1. Undefined variable
  2. Missing argument (it boils down to CDR failing on some list)
  3. Redundant argument (it boils down to CDR failing on some list)
  4. Bad function, i.e., an anonymous function that does follow the strict format
  5. Memory overflow (occurs in C, not in JS)
  6. Bug, when something should not happen

This is all I could think of.

@OfirMarkowitz1
Copy link
Collaborator Author

OfirMarkowitz1 commented Sep 14, 2021

  1. MISSING and REDUNDANT:

Missing argument (it boils down to CDR failing on some list)
Redundant argument (it boils down to CDR failing on some list)

If it boils down to a different error, this means there shouldn't be a REDUNDANT and MISSING error when applying lambda or primitive, and there are:
When binding upon applying lambda, in a-list.cc:

S bind(S formals, S actuals, S list) { 
  if (!formals.null() && actuals.null()) return formals.error(MISSING);
  if (formals.null() && !actuals.null()) return actuals.error(REDUNDANT);
  if (formals.null()) return list;
  return formals.car().cons(actuals.car()).cons(bind(formals.cdr(), actuals.cdr(), list));
}

Before applying primitive, in eval.cc:

void checkNumberOfArgs(S s) {
    S f = s.car();
    push(ARGUMENT, s.cdr());
    if (f.eq(QUOTE) || f.eq(CAR) || f.eq(CDR) || // 1 Args:
        f.eq(ATOM) || f.eq(NULL) || f.eq(EVAL)) {
        s.more2() && s.error(REDUNDANT).t();
        s.less2() && s.error(MISSING).t();
    }
    if (f.eq(CONS) || f.eq(SET) || f.eq(EQ) || f.eq(LAMBDA) || f.eq(NLAMBDA)) { // 2 Args:
        s.more3() && s.error(REDUNDANT).t();
        s.less3() && s.error(MISSING).t();
    }
    if (f.eq(NDEFUN) || f.eq(DEFUN)) { // 3 Args:
        s.more4() && s.error(REDUNDANT).t();
        s.less4() && s.error(MISSING).t();
    }
    remove_element(ARGUMENT);
}
  1. Applying lambda:

Bad function, i.e., an anonymous function that does follow the strict format

S apply(S f, S args) {
  D(f,args);
  f.n3() || f.cons(args).error(INVALID).t();
  D(f.$1$(),f.$2$(),f.$3$(),args,f.$1$().eq(NLAMBDA));
  push(ARGUMENT, args);
  const auto actuals = f.$1$().eq(NLAMBDA)? args : f.$1$().eq(LAMBDA) ? evaluate_list(args) : f.cons(args).error(INVALID);
  remove_element(ARGUMENT);
  alist() = bind(f.$2$(), actuals, alist());
  push(ARGUMENT, actuals);
  const auto result = f.$3$().eval();
  remove_element(ARGUMENT);
  remove_local_binding(f.$2$());
  return result;
}

There's a check that the S expression is a list of length 3, and that the tag is valid. There's no check that the formals are a valid list.
Observing the bind function C implementation above, in case the formals are invalid, it boils down to a CAR error, when the formals are a non-nil atom, MISSING error when its a non-list nested pair (e.g. [a . [b . c]]) with insufficient degree of nesting, and no error will be thrown if it's a non-list nested pair with sufficient degree (the number of actuals or above).
I think it's confusing and non-coherent, and there should be a proper check that the formals are a list, and INVALID error should be thrown if it doesn't.

  1. CAR and CDR on an atom should be included in this list.

  2. Evaluating COND:
    A COND error is thrown when one of the arguments passed to COND is an atom - eval.cc:

FUN(S, evaluate_cond, S)  IS( 
    _.null() ?  NIL:
    _.car().atom() ? _.car().error(COND):
      _.car().car().eval().t() ? _.car().cdr().car().eval():
        $$(_.cdr()) ;
)

It can easily be boiled down to CAR error.

  1. The ERROR primitive:
    We need to define exactly what happens when the user calls the ERROR primitive. In the book, you could pass the error any number of arguments, but in the C implementation, you can pass 1 argument at most, and it ignores the rest:
  if (f.eq(ERROR)) {
      if (! s.less2()) {
          args_1 = eval_argument(s.$2$());
      }
      push(ARGUMENT, args_1);
      return s.error(args_1); // throw
  }
  1. The evaluation function:
    Similar to what I described about the formals, there's no check that the S expression given to the eval function is a proper list (if it's not an atom):
S eval(S s) {
    D(s);
    inc_eval_counter();
    S res = NIL;
    D(s);
    if (s.atom()) {
       res = lookup(s);
       D(s, res);
    } else {
      D(s, s.car(), s.cdr());
      if (atomic(s.car())) {
       D(s.car());
       push(RESCUE, s.car());
       res = evaluate_atomic_function(s);
       remove_element(RESCUE);
       D(s, res);
      } else {
       S f_name = s.car();
       S f_body = s.car().eval();
       push(RESCUE, f_name);
       res = apply_defined_function(f_body, s.cdr());
       remove_element(RESCUE);
       D(s, res);
      }
    }
    D(s,res);
    dec_eval_counter();
    if (get_eval_counter() == 0) reset_set_counter();
    return res;
}

This means that [set . ['a . 'b]] will be equivalent to (set 'a 'b), and it seems weird to me. I think we should check that the evaluation function argument is a proper list, and throw INVALID error, if it's not.

@yossigil
Copy link
Collaborator

yossigil commented Oct 7, 2021

This boils down to the evaluation order: with gnu common lisp, we obtain:

>(car t t a)
Error: UNBOUND-VARIABLE :NAME A

which means that the evaluation algorithm in CL first evaluates all arguments; it then binds them and determines whether their number is correct. Check out also (cons x) which fails on x. I saw no point in using our own evaluation order. For this reason, I re-implemented eval function in the code. I keep updating, so please check this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants