ibm-netrexx

Keywords (was Re: AST, BNF, ANTLR)

Mike Cowlishaw

Keywords (was Re: AST, BNF, ANTLR)

Bill, sorry about the delay in reply. A lot of questions/comments here; mine in blue....

I don't believe that I've misunderstood you, but I do believe that we have a disagreement about how best to handle the language (keyword) extensibility problem.

Or perhaps at cross purposes on some things.

You have described the problem quite clearly - in many programming languages breakage can occur when new keywords are added. In other words, old programs do not work as they originally did.

It's more than that .. it's also new programs that run on a newer interpreter. Scripts interpreted from the source of the program are much more vulnerable to language changes than executables that are compiled. That's my primary concern.

My point is that the NetRexx approach to distinguishing keywords from variable names has significant downsides.

Of course NetRexx 3 is not about to change - as the saying goes "It is what it is". I'm not advocating any change, although if a new Rexx dialect is developed I am advocating that the breakage problem be handled differently. This is a philosophical discussion, not a practical one.

I think it is important to point out that with careful design, a programming language can totally avoid the breakage problem. The best example is PL/I. In its 50 year history, PL/I has added many dozens of keywords to the language, but as far as I know, there has never been an instance of breakage. Why? Because keywords are never identified by examining them! In other words, tokens are identified as keywords by syntax context rather than content, and that is why keywords are never confused with variables with the same name. The downside is that the language has lots of parens and commas, and sometimes an unnatural feel. I know that you are well aware of this, Mike, but some other readers may not be.

Yes, and it is possible for PL/I due to one big difference from Rexx (and NetRexx) -- blanks are separators in PL/I, and can never be operators.   But the blank operator in Rexx is one of the features that keep it relatively notation-free.   I have some programs (e.g., wiki2html translation) where the blank operator is by far the most heavily used operator -- more often used than +, in particular.

PL/I was famously ridiculed for its acceptance of the perfectly valid statement:
     "if if = then then then = else; else else = if;".
The mind sees "if", "then" and "else" as keywords and not variable names. The fact that the situation was a byproduct of avoiding the breakage problem was generally not acknowledged.

Yes, and of course something like this:

if file=bin then file=input; else file=list;

might well be part of a more real program (all the variable names there can be keywords in other contexts).

The crucial point is that in any language that avoids the breakage problem, the separation of keywords and variable names comes first. Then the keyword token in question is compared with the list of known keywords. If a token which is known to be a keyword is not within the list of known keywords (for that version of the language), it is an "invalid keyword" situation. It is not presumed to be a variable name.

Most other languages use the "reserved keyword" approach. Keywords are identified by comparing tokens with a list of words, and anything that matches is a keyword, anything that doesn't match is a variable name, and "never the twain shall meet". In that case, breakage will always occur in places where keywords and variables can occur in the same location.

As you know, NetRexx takes the opposite approach. It compares tokens with a dynamically computed list of variable and method names (i.e. everything that is not a keyword).

I would disagree with the word 'dynamic', here. Variables are statically computed (they have to be, because the NetRexx processor can emit Java -- whose variables are static). Method names only affect instruction keywords (i.e., first token in a clause and, in practice, when a sole symbol in that clause) and since (except for static methods in USES classes -- which I think was a mistake) they must be in the current class they are not dynamic in any sense.   Of course, method calls don't come into it at all if strictargs is in effect (more on that below).

If the token is not within that list, the token is judged to be a keyword. Then if the keyword is not within the list of known keywords, it is an "invalid keyword" situation. Thus NetRexx, like PL/I, avoids the breakage problem.

Agreed.
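
To make the two classification rules discussed above concrete, here is a minimal Python sketch. It is not taken from any NetRexx implementation; the keyword sets, the invented new keyword `report`, and the function names are all assumptions made purely for illustration.

```python
# Hypothetical keyword sets for two language levels (illustration only).
KEYWORDS_OLD = {"say", "if", "loop"}
KEYWORDS_NEW = KEYWORDS_OLD | {"report"}  # "report" is an invented new keyword


def classify_reserved(word, keywords):
    """Reserved-word rule: the keyword list alone decides."""
    return "keyword" if word in keywords else "variable"


def classify_variables_first(word, variables, keywords):
    """Variables-override-keywords rule: names the program uses as
    variables are checked first; anything else must be a known keyword."""
    if word in variables:
        return "variable"
    if word in keywords:
        return "keyword"
    return "invalid keyword"


# An old program that happens to assign to a variable called "report".
program_variables = {"report"}

for keywords in (KEYWORDS_OLD, KEYWORDS_NEW):
    print(classify_reserved("report", keywords),
          classify_variables_first("report", program_variables, keywords))
```

Under the reserved-word rule the old program's `report` changes from a variable to a keyword once the keyword list grows, which is the breakage case; under the variables-first rule its classification stays the same at both levels.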

In my opinion, here are some downsides of the NetRexx approach.

First, as I tried to point out, by giving variable names priority over keywords, keywords may be overloaded. In my view, that is a bad idea for a language which strives for simplicity and low "astonishment" levels.

It is a natural tendency for programmers (particularly those with experience in other languages) to recognize keywords by content. In other words, when reading "options args", options is assumed to be a keyword. Allowing any other interpretation is simply confusing.

I find the reserved keywords approach to be more astonishing (we may have to agree to differ on that). And the PL/I approach -- which would mean more notations and losing the blank operator -- would make Rexx more a variant of PL/I than an advance on PL/I, perhaps.

Of course we could use different syntax to differentiate variables from keywords; many languages take that approach (e.g., EXEC 2, Tkl, etc.). I consider that approach too ugly and too wasteful.

Second, using a dynamic list of available variable names locks the program into its execution environment.

With the exception of static methods in USES classes when strictargs is off, I believe NetRexx syntax is independent of its execution environment. That loophole was unfortunate -- but of course it only applies in classes with USES classes specified which are rare.

The example on page 79 of TRL contains two occurrences of "say 'hello' " in the same short program. The first is valid, and the second is an error. In my opinion, that is confusing and a bad idea. It is, of course, a byproduct of the way that NetRexx avoids the breakage problem.

But that example is deliberately chosen to look confusing and silly, to make the point -- just like your PL/I example above. 'say' is so well known in NetRexx that few, if any, programmers would use 'say' as a variable name. However, there are more obscure keywords in NetRexx; someone might well want to call a variable 'label' or 'digits' for example, as in:

   label='fred'

   digits=3

and if they never use (and maybe never even learned about) those features, all is well and good.   This is particularly important for new programmers who can get very frustrated when trying to pick variable names in a language with many reserved words.

Consider the following program:
/* NetRexx 4 */
import some.package.
please
explain
this
program
What does a person familiar with only NetRexx 3 make of this? Each of the words might be a method or a new keyword added in version 4 of the language.

Indeed, and that would be true of a future PL/I perhaps, too.   However, it would be rather unlikely that new instructions would be single words, so I would assume that all of those (except 'this' which is an error even now) must be method calls?

There's a good point here, however: the NetRexx ability to reduce notation by allowing a method call without specifying parentheses was probably a mistake, or at least the strictargs option should probably have been the default.   But that generally does not expose a program to later breakage because any method calls have to be in the same class. The exception to that, already mentioned, is static methods in USES classes which could be added at a later date -- that was definitely a mistake (but again it would be OK so long as parentheses were required).

Third, using a dynamic list of available variable names not only locks the program into its execution environment, it also locks any other program which attempts to correctly process a NetRexx source file into the execution environment.

(Same comments as above.)

That means that any formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include the same logic that the translator uses. It must dynamically determine everything which is not a keyword to identify keywords. I think that is unfortunate since it makes the development of peripheral NetRexx processors more difficult or impossible.

This would be true if keywords depended on the environment, but I don't think they do except for the USES case.

Finally, it makes the language difficult, if not impossible, to define in a formal manner with BNF or another formal definition method. While some may feel this is actually an advantage (!), the truth is that it makes standardization difficult. Essentially all compilable programming languages have formal definitions.

Almost all languages (I would say all 'practical' languages) have to supplement BNF with prose; NetRexx is actually rather clean in that respect -- there is really just one exception rule to add to the pure BNF.

The overall problem of language versions is a complex one, since every change to a language in effect defines a new language. I believe the assumption that any future NetRexx processor should be able to correctly process every NetRexx program, without knowing which version of the NetRexx language it is programmed in, is (while laudable) not worth it if it requires the current NetRexx method of identifying keywords.

As I have suggested, I believe in adopting the approach that NetRexx programs should identify themselves. In HTML web pages, the very first thing is a DOCTYPE declaration of exactly what language the page is written in. I think the same approach could be adopted for NetRexx so that if at some later point the method of identifying keywords is changed, it could be accommodated. I suggest that the language level should be included in the initial comment or in an option which must be specified before the remainder of the program. Existing NetRexx programs would, of course, default to the current language version.

But this has exactly the same problem you just expressed concern about: every future compiler, interpreter, formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include logic to handle previous levels of the language that remain supported.

As I said, you and I just disagree on this, Mike. I don't expect you to change anything, but I do hope you will give it some (more) serious thought.

With hindsight there is very little I would change in NetRexx (the strictargs matter is one, and maybe I'd drop USES entirely). The variables-override-keywords rule is extremely simple to understand and gets enormous 'bang for the buck' by making the language safely extendable in a way that Rexx (and almost all other languages) never achieved. PL/I is the nearest to NetRexx -- but at a cost that would make achieving the other goals of Rexx/NetRexx impossible.

All these things are tradeoffs, of course, and it took many iterations of language design before I got to NetRexx.



Mike

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/

billfen

Re: Keywords (was Re: AST, BNF, ANTLR)

On 3/27/2013 6:58 AM, Mike Cowlishaw wrote:

Bill, sorry about the delay in reply. A lot of questions/comments here; mine in blue....

Thanks for giving it your full consideration - I realize you are busy.

I don't believe that I've misunderstood you, but I do believe that we have a disagreement about how best to handle the language (keyword) extensibility problem.

Or perhaps at cross purposes on some things.

Perhaps at the tactical level, but I think not at the strategic. Wouldn't you agree that the ability to quickly and effectively develop NetRexx source processing programs would benefit the usability and spread of the language?

You have described the problem quite clearly - in many programming languages breakage can occur when new keywords are added. In other words, old programs do not work as they originally did.

It's more than that .. it's also new programs that run on a newer interpreter. Scripts interpreted from the source of the program are much more vulnerable to language changes than executables that are compiled. That's my primary concern.

My point is that the NetRexx approach to distinguishing keywords from variable names has significant downsides.

Of course NetRexx 3 is not about to change - as the saying goes "It is what it is". I'm not advocating any change, although if a new Rexx dialect is developed I am advocating that the breakage problem be handled differently. This is a philosophical discussion, not a practical one.

I think it is important to point out that with careful design, a programming language can totally avoid the breakage problem. The best example is PL/I. In its 50 year history, PL/I has added many dozens of keywords to the language, but as far as I know, there has never been an instance of breakage. Why? Because keywords are never identified by examining them! In other words, tokens are identified as keywords by syntax context rather than content, and that is why keywords are never confused with variables with the same name. The downside is that the language has lots of parens and commas, and sometimes an unnatural feel. I know that you are well aware of this, Mike, but some other readers may not be.

Yes, and it is possible for PL/I due to one big difference from Rexx (and NetRexx) -- blanks are separators in PL/I, and can never be operators. But the blank operator in Rexx is one of the features that keep it relatively notation-free. I have some programs (e.g., wiki2html translation) where the blank operator is by far the most heavily used operator -- more often used than +, in particular.

The unrestricted use of the blank concatenate and direct abutment concatenate operators causes most of the "breakage possibilities" in NetRexx. I define a "breakage possibility" as every place in the language definition where both a keyword and a variable name can occur. (Think railroad track diagrams, etc.) Breakage possibilities are important because it is at those points that breakage may occur if a new keyword is added to the language. Every NetRexx situation in which an expression is followed by a keyword is subject to breakage, since the keyword may be a variable name after an implied concatenate operator.

In order to get an idea of how much of a problem this is in the real world, can you estimate how often in wiki2html an expression using an implied concatenate operator is followed by a keyword? Or if the source is available, could you point me to it so I can check it?

To put things in perspective, the implied concatenate operators are novel, convenient and in some situations very convenient. But they are nothing more than substitutions for "||' '||" and "||". Providing full keyword independence in NetRexx is much more important in my view. If the implied concatenate operators must be restricted in some situations to accomplish that goal, then I think they should be.

If NetRexx required that if an expression contains an implied concatenate operator and is followed by a keyword, the expression must be enclosed in parentheses at some level, there would be no need for the special cases regarding the non-statement level keywords. The keyword independent model could be used (assuming other NetRexx breakage possibilities were fixed).

For example, page 94 of the Language Definition says: "The expressions exprw or expru will be ended by either of the keywords while or until (unless the word is the name of a variable)." What it does not say is "For example, if 'while' was used as a variable name at any earlier point, the 'while' condition may not be specified." A NetRexx error is generated for "x=0; while = 3; loop while x = 1; end;" With full keyword independence, no explanation is necessary at all, since variable names can be the same as keywords. In addition, the above code would not be in error.
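
The rule quoted from the Language Definition can be sketched in a few lines of Python. This is only an illustration of the quoted sentence, not the translator's actual logic; the function name, the whitespace-split word list, and the sample expression are assumptions.

```python
TERMINATORS = {"while", "until"}


def expression_end(words, variables):
    """Return the index of the first word that ends the expression.

    A terminator word ends the expression unless the program also uses
    that word as a variable name, in which case it is treated as one
    more term (joined by the blank operator).
    """
    for i, word in enumerate(words):
        w = word.lower()
        if w in TERMINATORS and w not in variables:
            return i
    return len(words)


words = "i < 10 until done".split()
print(expression_end(words, variables=set()))       # until acts as a keyword
print(expression_end(words, variables={"until"}))   # until is just a variable
```

With no variable named `until`, the expression stops at index 3; once `until` has been used as a variable, the whole word list is one expression and the keyword is no longer available in that position.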

In Rexx, the attempt was apparently to allow at least partial keyword independence since "say = 42; say say;" was valid. (Robert, thanks for pointing this out in your earlier append.) In NetRexx it is not valid, and I think NetRexx would be a better language if it were. Of course that is not possible since NetRexx is, in essence, a reserved keyword language (because of how breakage is prevented). If true keyword independence were adopted by NetRexx, I think it would be a better and less confusing situation.

PL/I was famously ridiculed for its acceptance of the perfectly valid statement:
"if if = then then then = else; else else = if;".
The mind sees "if", "then" and "else" as keywords and not variable names. The fact that the situation was a byproduct of avoiding the breakage problem was generally not acknowledged.

Yes, and of course something like this:

if file=bin then file=input; else file=list;

might well be part of a more real program (all the variable names there can be keywords in other contexts).

PL/I has full keyword independence but NetRexx does not. PL/I allows "put (put);" where the second "put" is a variable while NetRexx does not allow "say say;". Note that NetRexx with keyword independence would not require "say (say);" but would allow it. The parens in "put (put);" are required because of the complex nature of the PL/I "put" statement.

The crucial point is that in any language that avoids the breakage problem, the separation of keywords and variable names comes first. Then the keyword token in question is compared with the list of known keywords. If a token which is known to be a keyword is not within the list of known keywords (for that version of the language), it is an "invalid keyword" situation. It is not presumed to be a variable name.

Most other languages use the "reserved keyword" approach. Keywords are identified by comparing tokens with a list of words, and anything that matches is a keyword, anything that doesn't match is a variable name, and "never the twain shall meet". In that case, breakage will always occur in places where keywords and variables can occur in the same location.

As you know, NetRexx takes the opposite approach. It compares tokens with a dynamically computed list of variable and method names (i.e. everything that is not a keyword).

I would disagree with the word 'dynamic', here. Variables are statically computed (they have to be, because the NetRexx processor can emit Java -- whose variables are static). Method names only affect instruction keywords (i.e., first token in a clause and, in practice, when a sole symbol in that clause) and since (except for static methods in USES classes -- which I think was a mistake) they must be in the current class they are not dynamic in any sense.   Of course, method calls don't come into it at all if strictargs is in effect (more on that below).

Yes, "dynamic" is the wrong word (I misunderstood the detailed nature of the related translator processing).

I most certainly agree that allowing single word method names without "()" was ill-advised. It is the primary reason that the parsing of the NetRexx source requires access to the external execution environment. In addition, it is a significant breakage point since it allows the first word of a statement to be a variable name as well as a keyword.

If the search for variable names is limited to names contained in the source file, some of my objections go away. Unfortunately I think it is too late to "close the barn door" without a new NetRexx dialect or a version control mechanism.

To emphasize why this is important, consider the task of writing a NetRexx source processor that does not run in the Java environment, perhaps on a machine that does not even include Java. To be reasonably successful, it must only consider the input source, not anything external to it. Currently I don't see how that is possible (with the USES problem). Of course, some things can not be accomplished without the Java environment, such as checking the validity of external method calls, but assume the processor does not need to do that.

In addition, is there any inherent reason why NetRexx could not be used with languages other than Java? I'm not proficient enough in C++ to understand the pitfalls, but why couldn't a C++ version of NetRexx be considered? Perhaps a C++ programmer can comment?

If the token is not within that list, the token is judged to be a keyword. Then if the keyword is not within the list of known keywords, it is an "invalid keyword" situation. Thus NetRexx, like PL/I, avoids the breakage problem.

Agreed.

In my opinion, here are some downsides of the NetRexx approach.

First, as I tried to point out, by giving variable names priority over keywords, keywords may be overloaded. In my view, that is a bad idea for a language which strives for simplicity and low "astonishment" levels.

It is a natural tendency for programmers (particularly those with experience in other languages) to recognize keywords by content. In other words, when reading "options args", options is assumed to be a keyword. Allowing any other interpretation is simply confusing.

I find the reserved keywords approach to be more astonishing (we may have to agree to differ on that). And the PL/I approach -- which would mean more notations and losing the blank operator -- would make Rexx more a variant of PL/I than an advance on PL/I, perhaps.

Of course we could use different syntax to differentiate variables from keywords; many languages take that approach (e.g., EXEC 2, Tkl, etc.). I consider that approach too ugly and too wasteful.

I "agree to disagree" with the first point, and agree with the second. If I understand you correctly, I think you may be in the minority on the first point. I submit that most programmers see the statement "options args;" as the keyword "options" followed by an argument "args", and the statement "say greeting;" as the keyword "say" followed by the argument "greeting". If all of the source code in the world were analyzed, undoubtedly almost all of the non-assignment statements consist of a fixed keyword optionally followed by arguments or modifiers. That model is so pervasive that it is second nature to the majority of programmers. Is that not the case for you? (Of course, there is always Lisp, but even that follows the "FunctionName(argument list)" model - perhaps you see "options" as a fixed function? But the distinction between a user defined function and a keyword defined function isn't clear :)

Perhaps a clarification of my understanding of "reserved keyword", "keyword independent" and other languages would help. A "reserved keyword" language (like C or Java) is one in which keywords are always reserved to be used as keywords and can never be used as variable names. In these languages breakage may occur when a new keyword is added. A "keyword independent" language (like PL/I) is one in which keywords may be used as either a keyword or a variable name at any point. In general, breakage will not occur with these languages. Other languages (like NetRexx) are ones in which keywords are reserved some of the time. In NetRexx, keywords are reserved but if used as a variable name, the word can no longer be used as a keyword.

I don't know of any other language which uses the model that NetRexx uses. I think it may be confusing to some programmers that a variable name invalidates a keyword.

I don't agree concerning the "loss of the blank operator" or that keyword independence makes Rexx or NetRexx anything near a variant of PL/I.

What I am advocating is that implied concatenate operator use be restricted when the expression may be followed by a keyword. In those cases, recognition of implied concatenate operators is enabled only within a parenthesis level of one or greater. Thus the following are all valid:
"x = a + b c; x = a b + c; if x = a + b then nop; if (x = a + b c) then nop; if (x = a b + c) then nop; if (x = a'b' + c) then nop; if x = (a b) + c then nop;"

Of course the following are also valid:
"if x = (a + b c) then nop; if x = (a b + c) then nop; if (x = a + b) then nop; if (x = ((a b) + c)) then nop;"

but "if x = a + b c then nop;" would be invalid and "if x = a b + c then nop;" would be invalid.
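
The proposed restriction could be checked mechanically. The following Python sketch is one possible reading of it, under stated assumptions: the tokenizer is deliberately crude (names, integers, single-quoted strings, parentheses, and operator runs), the function names are invented, and method-call syntax such as `f(x)` is ignored, so a name abutting `(` would be reported as a concatenation.

```python
import re

# Crude tokenizer: names, integers, 'strings', parentheses, operator runs.
_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|'[^']*'|[()]|[^\s\w'()]+")


def _is_operand(tok):
    return tok[0].isalnum() or tok[0] in "_'"


def has_bare_implied_concat(expr):
    """True if two terms are adjacent (blank or abuttal concatenation)
    outside any parentheses, i.e. at parenthesis level zero."""
    depth = 0
    prev = None
    for tok in _TOKEN.findall(expr):
        starts_term = tok == "(" or _is_operand(tok)
        ends_term = prev is not None and (prev == ")" or _is_operand(prev))
        if starts_term and ends_term and depth == 0:
            return True
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        prev = tok
    return False


# Conditions taken from the examples above (the text between "if" and "then").
for cond in ["x = a + b", "(x = a + b c)", "x = (a b) + c",
             "x = a + b c", "x = a b + c"]:
    verdict = "invalid" if has_bare_implied_concat(cond) else "valid"
    print(f"if {cond} then nop;  ->  {verdict}")
```

Only the last two conditions are reported as invalid, matching the examples given; an assignment such as "x = a + b c" would simply not be passed through this check, since no keyword can follow its expression.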

Second, using a dynamic list of available variable names locks the program into its execution environment.

With the exception of static methods in USES classes when strictargs is off, I believe NetRexx syntax is independent of its execution environment. That loophole was unfortunate -- but of course it only applies in classes with USES classes specified which are rare.

As discussed above. Still a loophole which should be closed.

The example on page 79 of TRL contains two occurrences of "say 'hello' " in the same short program. The first is valid, and the second is an error. In my opinion, that is confusing and a bad idea. It is, of course, a byproduct of the way that NetRexx avoids the breakage problem.

But that example is deliberately chosen to look confusing and silly, to make the point -- just like your PL/I example above. 'say' is so well known in NetRexx that few, if any, programmers would use 'say' as a variable name. However, there are more obscure keywords in NetRexx; someone might well want to call a variable 'label' or 'digits' for example, as in:

   label='fred'

   digits=3

and if they never use (and maybe never even learned about) those features, all is well and good.   This is particularly important for new programmers who can get very frustrated when trying to pick variable names in a language with many reserved words.

This is exactly why the PL/I model of keyword independence is preferable! While beginning PL/I programmers have to deal with the complexity of the language, they are never frustrated by the problem of variable names being keywords.

If NetRexx adopted the full keyword independence model, there would be no need for any discussion of the sub-keywords in the IF, LOOP and other statements. The issue could be described:

"Keywords in NetRexx are not reserved, and any variable may have the same name as a keyword. For clarity, the use of variable names which are the same as keywords is not recommended."

I believe that relatively few changes are needed to the NetRexx language to move it to the keyword independent model. Once an exhaustive determination of exactly what those changes are is made, then it can be determined if the changes damage the language more than keyword independence helps it.

Consider the following program:
/* NetRexx 4 */
import some.package.
please
explain
this
program
What does a person familiar with only NetRexx 3 make of this? Each of the words might be a method or a new keyword added in version 4 of the language.

Indeed, and that would be true of a future PL/I perhaps, too.   However, it would be rather unlikely that new instructions would be single words, so I would assume that all of those (except 'this' which is an error even now) must be method calls?

There's a good point here, however: the NetRexx ability to reduce notation by allowing a method call without specifying parentheses was probably a mistake, or at least the strictargs option should probably have been the default.   But that generally does not expose a program to later breakage because any method calls have to be in the same class. The exception to that, already mentioned, is static methods in USES classes which could be added at a later date -- that was definitely a mistake (but again it would be OK so long as parentheses were required).

In a future PL/I, because of keyword independence, the words in all of the above statements (assuming ";" added to each) must be keywords. All PL/I statements begin with a keyword unless (like the assignment statement) it can be differentiated by lookahead.

We agree that in NetRexx, these should not be method calls - the "()" is to be required for a single word method call. If NetRexx had keyword independence, all of these statements (including "this") would be new NetRexx keyword statements.

Note that the use of a variable name and a keyword as the first word of a statement is a breakage possibility which must be examined. As with the assignment statement, simple lookahead can determine if the word is a variable name. So requiring "()" provides the necessary lookahead for a method call, just as "=" identifies an assignment.
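
The lookahead described here can be sketched as follows. This is a hypothetical illustration in Python of the keyword-independent model being proposed, not of how the NetRexx translator works today; the keyword set, the function name, and the rule that a method call needs "(" directly abutting the name are assumptions.

```python
import re

KNOWN_KEYWORDS = {"say", "if", "loop", "options"}  # illustrative subset only

_FIRST = re.compile(r"\s*([A-Za-z_]\w*)(\s*)(.?)")


def classify_clause(clause, keywords=KNOWN_KEYWORDS):
    """Classify a clause by its first word plus one character of lookahead."""
    m = _FIRST.match(clause)
    if not m:
        return "other"
    word, gap, nxt = m.group(1).lower(), m.group(2), m.group(3)
    if nxt == "=":
        return "assignment"        # "=" identifies an assignment
    if nxt == "(" and gap == "":
        return "method call"       # "(" abutting the name identifies a call
    if word in keywords:
        return "keyword"
    return "invalid keyword"       # never presumed to be a variable


for clause in ["say = 42", "say say", "say (say)", "explain()", "please"]:
    print(f"{clause!r:12} -> {classify_clause(clause)}")
```

Because the decision never consults a list of variable names, "say = 42" followed by "say say" would both be accepted, and an unknown first word is reported as an invalid keyword instead of being guessed at.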

Third, using a dynamic list of available variable names not only locks the program into its execution environment, it also locks any other program which attempts to correctly process a NetRexx source file into the execution environment.

(Same comments as above.)

That means that any formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include the same logic that the translator uses. It must dynamically determine everything which is not a keyword to identify keywords. I think that is unfortunate since it makes the development of peripheral NetRexx processors more difficult or impossible.

This would be true if keywords depended on the environment, but I don't think they do except for the USES case.

Discussed above

Finally, it makes the language difficult, if not impossible, to define in a formal manner with BNF or another formal definition method. While some may feel this is actually an advantage (!), the truth is that it makes standardization difficult. Essentially all compilable programming languages have formal definitions.

Almost all languages (I would say all 'practical' languages) have to supplement BNF with prose; NetRexx is actually rather clean in that respect -- there is really just one exception rule to add to the pure BNF.

Does NetRexx actually have a "pure BNF" definition? I wasn't aware that had been done, and if it has I'm certainly interested in seeing it. What is the exception rule?

As a descriptive mechanism, BNF is extremely powerful, and a fully robust definition requires almost no prose to accompany it. The problem is that a fully robust definition can be huge and incredibly boring to generate. For example, it is easier to add "one or more of the following, without duplicates..." than it is to enumerate all the possibilities. In the case of the blank and abutment operators, the exact details of white space and significant blanks must be included rather than adding prose to describe it. The use of an extended BNF can help but not eliminate these description problems.

In addition, a BNF specification can describe ambiguous languages. It can describe unambiguous languages which can not be effectively parsed. There is a great difference between a "BNF specification" and a "practical and usable BNF specification".

The trick is to develop a grammar specification which fits one of the commonly used types such as LR(1), LL(1), LALR, etc. Not only can a specification like that be used to generate scanners, parsers and the like, it can also serve as a detailed spec to be used to check a hand written parser.

Note that in no way am I advocating that an automatically generated parser be used in the NetRexx translator!

Clearly hand written scanners and parsers can be just as effective as automatically generated ones, although experience has shown that they may be somewhat more error prone. What I am suggesting is that for those who are comfortable with generators, having an adaptable grammar which can be used to accurately parse NetRexx could shorten the time necessary to implement a NetRexx source processor. As it stands, having the variable names take priority over keywords (and the USES issue) substantially complicates the problem.

The overall problem of language versions is a complex one, since every change to a language in effect defines a new language. I believe the assumption that any future NetRexx processor should be able to correctly process every NetRexx program, without knowing which version of the NetRexx language it is written in, is (while laudable) not worth it if it requires the current NetRexx method of identifying keywords.

As I have suggested, I believe in adopting the approach that NetRexx programs should identify themselves. In HTML web pages, the very first thing is a DOCTYPE declaration of exactly what language the page is written in. I think the same approach could be adopted for NetRexx so that if at some later point the method of identifying keywords is changed, it could be accommodated. I suggest that the language level should be included in the initial comment or in an option which must be specified before the remainder of the program. Existing NetRexx programs would, of course, default to the current language version.
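As a rough illustration of the self-identification idea, here is a minimal Python sketch. The marker format (an initial comment such as "/* NetRexx 4 */"), the function name and the default level of 3 are all assumptions made for the example; nothing like this is defined by NetRexx today.

```python
import re

# Assumption: programs without a marker are treated as the current level.
DEFAULT_LEVEL = 3

# Hypothetical marker: an initial comment of the form "/* NetRexx <n> */".
_LEVEL_RE = re.compile(r"\s*/\*\s*NetRexx\s+(\d+)\s*\*/", re.IGNORECASE)

def language_level(source: str) -> int:
    """Return the declared language level, or DEFAULT_LEVEL if none is declared."""
    match = _LEVEL_RE.match(source)
    return int(match.group(1)) if match else DEFAULT_LEVEL
```

A source processor could call language_level() once, before scanning, and select its keyword rules from the result; an ordinary leading comment that does not match the marker simply falls back to the default.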

But this has exactly the same problem you just expressed concern about: every future compiler, interpreter, formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include logic to handle previous levels of the language that remain supported.

I don't believe that is true if the keyword independence model is used. I was suggesting this as an alternate approach to the breakage problem.

As I said, you and I just disagree on this, Mike. I don't expect you to change anything, but I do hope you will give it some (more) serious thought.

With hindsight there is very little I would change in NetRexx (the strictargs matter is one, and maybe I'd drop USES entirely). The variables-override-keywords rule is extremely simple to understand and gets enormous 'bang for the buck' by making the language safely extendable in a way that Rexx (and almost all other languages) never achieved. PL/I is the nearest to NetRexx -- but at a cost that would make achieving the other goals of Rexx/NetRexx impossible.

All these things are tradeoffs, of course, and it took many iterations of language design before I got to NetRexx.

I still think the keyword independent model is preferable for breakage control, and that the variables-override-keywords approach costs more than it is worth. I realize you currently believe that making NetRexx fully keyword independent conflicts with other NetRexx goals, but I (humbly) suggest that issue might deserve reevaluation.

So we obviously should "agree to disagree" on these points. But if you do yet another iteration of the Rexx family, I hope you will consider this discussion.

Mike

Bill
PS Sorry about the redundancy of my comments - yes, I really do favor keyword independence :)

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/

ThSITC

Re: Keywords (was Re: AST, BNF, ANTLR)

Just to throw in my 0,0001 cents again:

To resolve these and similar problems, I introduced the following token attributes in my own Scanner & Parser:

    token_type ::= giving the Type of a Token (Verb, ID, Var, Stem, Keyword, Method, etc)
    token_class ::= int, float, ...
    token_ID      ::= the actual token
    token_level   ::= the Level of the token (as determined by parenthesis, brackets, etc)
    token_spelling ::= the actual spelling used

    etc, etc
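A token record along these lines might be sketched in Python as follows. This is only a guess at the shape of such a scheme, not the actual implementation: the field meanings are taken from the list above, while the "Unknown" placeholder type, the upper-casing of token_id and the helper name annotate_levels are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Token:
    token_type: str      # e.g. "Verb", "ID", "Var", "Stem", "Keyword", "Method"
    token_class: str     # e.g. "int", "float", or "" when not applicable
    token_id: str        # the token itself (assumption: normalized to upper case)
    token_level: int     # nesting depth as determined by parentheses and brackets
    token_spelling: str  # the spelling actually used in the source

def annotate_levels(spellings):
    """Attach a nesting level to each token; type and class are placeholders."""
    tokens, level = [], 0
    for s in spellings:
        if s in (")", "]"):
            level -= 1              # a closer belongs to the enclosing level
        tokens.append(Token("Unknown", "", s.upper(), level, s))
        if s in ("(", "["):
            level += 1              # everything after an opener is one level deeper
    return tokens
```

A later pass could then fill in token_type, using token_level to tell a top-level word from one nested inside an expression.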

With this (enhanced) tokenizing scheme I am currently able to scan & parse NetRexx, ooRexx, classic Rexx,
PL/I, and COBOL, etc. correctly, I think!

In addition, my parser is driven by *external definition files*, which allow SYNONYMS to be defined, for instance.

OK, having said that, I should add that I am currently in the process of actually releasing all this stuff,
Open source, to KENAI (under a FAIR SHARE LICENCE, which is exactly *no open source* license.)

Maybe later this year I can describe the approach I have taken in more detail.

Happy Easter, anyway, Thomas.

PS: I have been very ill over the past year (a heart attack, then a very severe depression for months, and
I am now undergoing cataract operations on both eyes), and I'm really *very sorry* that I too frequently
announced release dates for some potential products of my shop ... Sorry, last time!

Please *do* accept my apology! OK?

============================================================================

On 28.03.2013 21:34, Bill Fenlason wrote:

On 3/27/2013 6:58 AM, Mike Cowlishaw wrote:

Bill, sorry about the delay in reply. A lot of questions/comments here; mine in blue....

Thanks for giving it your full consideration - I realize you are busy.

I don't believe that I've misunderstood you, but I do believe that we have a disagreement about how best to handle the language (keyword) extensibility problem.

Or perhaps at cross purposes on some things.

Perhaps at the tactical level, but I think not at the strategic. Wouldn't you agree that the ability to quickly and effectively develop NetRexx source processing programs would benefit the usability and spread of the language?

You have described the problem quite clearly - in many programming languages breakage can occur when new keywords are added. In other words, old programs do not work as they originally did.

It's more than that .. it's also new programs that run on a newer interpreter. Scripts interpreted from the source of the program are much more vulnerable to language changes than executables that are compiled. That's my primary concern.

My point is that the NetRexx approach to distinguishing keywords from variable names has significant downsides.

Of course NetRexx 3 is not about to change - as the saying goes "It is what it is". I'm not advocating any change, although if a new Rexx dialect is developed I am advocating that the breakage problem be handled differently. This is a philosophical discussion, not a practical one.

I think it is important to point out that with careful design, a programming language can totally avoid the breakage problem. The best example is PL/I. In its 50 year history, PL/I has added many dozens of keywords to the language, but as far as I know, there has never been an instance of breakage. Why? Because keywords are never identified by examining them! In other words, tokens are identified as keywords by syntax context rather than content, and that is why keywords are never confused with variables with the same name. The down side is that the language has lots of parens and commas, and sometimes an unnatural feel. I know that you are well aware of this, Mike, but some other readers may not be.

Yes, and it is possible for PL/I due to one big difference from Rexx (and NetRexx) -- blanks are separators in PL/I, and can never be operators.   But the blank operator in Rexx is one of the features that keep it relatively notation-free.   I have some programs (e.g., wiki2html translation) where the blank operator is by far the most heavily used operator -- more often used than +, in particular.

The unrestricted use of the blank concatenate and direct abutment concatenate operators causes most of the "breakage possibilities" in NetRexx. I define a "breakage possibility" as every place in the language definition where both a keyword and a variable name can occur. (Think railroad track diagrams, etc.) Breakage possibilities are important because it is at those points that breakage may occur if a new keyword is added to the language. Every NetRexx situation in which an expression is followed by a keyword is subject to breakage, since the keyword may be a variable name after an implied concatenate operator.

In order to get an idea of how much of a problem this is in the real world, can you estimate how often in wiki2html an expression using an implied concatenate operator is followed by a keyword? Or if the source is available, could you point me to it so I can check it?

To put things in perspective, the implied concatenate operators are novel, convenient and in some situations very convenient. But they are nothing more than substitutions for "||' '||" and "||". Providing full keyword independence in NetRexx is much more important in my view. If the implied concatenate operators must be restricted in some situations to accomplish that goal, then I think they should be.

If NetRexx required that if an expression contains an implied concatenate operator and is followed by a keyword, the expression must be enclosed in parentheses at some level, there would be no need for the special cases regarding the non-statement level keywords. The keyword independent model could be used (assuming other NetRexx breakage possibilities were fixed).

For example, page 94 of the Language Definition says: "The expressions exprw or expru will be ended by either of the keywords while or until (unless the word is the name of a variable)." What it does not say is "For example, if 'while' was used as a variable name at any earlier point, the 'while' condition may not be specified." A NetRexx error is generated for "x=0; while = 3; loop while x = 1; end;" With full keyword independence, no explanation is necessary at all, since variable names can be the same as keywords. In addition, the above code would not be in error.

In Rexx, the attempt was apparently to allow at least partial keyword independence since "say = 42; say say;" was valid. (Robert, thanks for pointing this out in your earlier append.) In NetRexx it is not valid, and I think NetRexx would be a better language if it were. Of course that is not possible since NetRexx is, in essence, a reserved keyword language (because of how breakage is prevented). If true keyword independence were adopted by NetRexx, I think it would be a better and less confusing situation.

PL/I was famously ridiculed for its acceptance of the perfectly valid statement:
     "if if = then then then = else; else else = if;".
The mind sees "if", "then" and "else" as keywords and not variable names. The fact that the situation was a byproduct of avoiding the breakage problem was generally not acknowledged.

Yes, and of course something like this:

if file=bin then file=input; else file=list;

might well be part of a more real program (all the variable names there can be keywords in other contexts).

PL/I has full keyword independence but NetRexx does not. PL/I allows "put (put);" where the second "put" is a variable while NetRexx does not allow "say say;". Note that NetRexx with keyword independence would not require "say (say);" but would allow it. The parens in "put (put);" are required because of the complex nature of the PL/I "put" statement.

The crucial point is that in any language that avoids the breakage problem, the separation of keywords and variable names comes first. Then the keyword token in question is compared with the list of known keywords. If a token which is known to be a keyword is not within the list of known keywords (for that version of the language), it is an "invalid keyword" situation. It is not presumed to be a variable name.

Most other languages use the "reserved keyword" approach. Keywords are identified by comparing tokens with a list of words, and anything that matches is a keyword, anything that doesn't match is a variable name, and "never the twain shall meet". In that case, breakage will always occur in places where keywords and variables can occur in the same location.

As you know, NetRexx takes the opposite approach. It compares tokens with a dynamically computed list of variable and method names (i.e. everything that is not a keyword).

I would disagree with the word 'dynamic', here. Variables are statically computed (they have to be, because the NetRexx processor can emit Java -- whose variables are static). Method names only affect instruction keywords (i.e., first token in a clause and, in practice, when a sole symbol in that clause) and since (except for static methods in USES classes -- which I think was a mistake) they must be in the current class they are not dynamic in any sense.   Of course, method calls don't come into it at all if strictargs is in effect (more on that below).

Yes, "dynamic" is the wrong word (I misunderstood the detailed nature of the related translator processing).

I most certainly agree that allowing single word method names without "()" was ill-advised. It is the primary reason that the parsing of the NetRexx source requires access to the external execution environment. In addition, it is a significant breakage point since it allows the first word of a statement to be a variable name as well as a keyword.

If the search for variable names is limited to names contained in the source file, some of my objections go away. Unfortunately I think it is too late to "close the barn door" without a new NetRexx dialect or a version control mechanism.

To emphasize why this is important, consider the task of writing a NetRexx source processor that does not run in the Java environment, perhaps on a machine that does not even include Java. To be reasonably successful, it must only consider the input source, not anything external to it. Currently I don't see how that is possible (with the USES problem). Of course, some things can not be accomplished without the Java environment, such as checking the validity of external method calls, but assume the processor does not need to do that.

In addition, is there any inherent reason why NetRexx could not be used with languages other than Java? I'm not proficient enough in C++ to understand the pitfalls, but why couldn't a C++ version of NetRexx be considered? Perhaps a C++ programmer can comment?

If the token is not within that list, the token is judged to be a keyword. Then if the keyword is not within the list of known keywords, it is an "invalid keyword" situation. Thus NetRexx, like PL/I, avoids the breakage problem.

Agreed.
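To make the contrast between the two lookup orders concrete, here is a small Python sketch. It is a deliberate simplification: it ignores where in a clause a word appears, and the keyword sets (including the later-added keyword "retry") are hypothetical values chosen only for the example.

```python
def classify_reserved(word, keywords):
    """Reserved-keyword model: any word on the keyword list is a keyword."""
    return "keyword" if word in keywords else "variable"

def classify_variables_first(word, variables, keywords):
    """Variables-override-keywords model as described above: a known variable
    name wins; anything else must be a known keyword or it is an error."""
    if word in variables:
        return "variable"
    return "keyword" if word in keywords else "invalid keyword"

# Hypothetical language levels: "retry" is added as a keyword in the newer one.
OLD_KEYWORDS = {"say", "loop", "while"}
NEW_KEYWORDS = OLD_KEYWORDS | {"retry"}

# An old program that happens to use "retry" as a variable name.
program_variables = {"retry"}
```

With these definitions, classify_reserved("retry", ...) changes its answer between the old and new keyword sets, which is the breakage case, while classify_variables_first("retry", program_variables, ...) gives the same answer for both.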

In my opinion, here are some downsides of the NetRexx approach.

First, as I tried to point out, by giving variable names priority over keywords, keywords may be overloaded. In my view, that is a bad idea for a language which strives for simplicity and low "astonishment" levels.

It is a natural tendency for programmers (particularly those with experience in other languages) to recognize keywords by content. In other words, when reading "options args", options is assumed to be a keyword. Allowing any other interpretation is simply confusing.

I find the reserved keywords approach to be more astonishing (we may have to agree to differ on that). And the PL/I approach -- which would mean more notations and losing the blank operator -- would make Rexx more a variant of PL/I than an advance on PL/I, perhaps.

Of course we could use different syntax to differentiate variables from keywords; many languages take that approach (e.g., EXEC 2, Tcl, etc.). I consider that approach too ugly and too wasteful.

I "agree to disagree" with the first point, and agree with the second. If I understand you correctly, I think you may be in the minority on the first point. I submit that most programmers see the statement "options args;" as the keyword "options" followed by an argument "args", and the statement "say greeting;" as the keyword "say" followed by the argument "greeting". If all of the source code in the world were analyzed, undoubtedly almost all of the non-assignment statements consist of a fixed keyword optionally followed by arguments or modifiers. That model is so pervasive that it is second nature to the majority of programmers. Is that not the case for you? (Of course, there is always Lisp, but even that follows the "FunctionName(argument list)" model - perhaps you see "options" as a fixed function? But the distinction between a user defined function and a keyword defined function isn't clear :)

Perhaps a clarification of my understanding of "reserved keyword", "keyword independent" and other languages would help. A "reserved keyword" language (like C or Java) is one in which keywords are always reserved to be used as keywords and can never be used as variable names. In these languages breakage may occur when a new keyword is added. A "keyword independent" language (like PL/I) is one in which keywords may be used as either a keyword or a variable name at any point. In general, breakage will not occur with these languages. Other languages (like NetRexx) are ones in which keywords are reserved some of the time. In NetRexx, keywords are reserved but if used as a variable name, the word can no longer be used as a keyword.

I don't know of any other language which uses the model that NetRexx uses. I think it may be confusing to some programmers that a variable name invalidates a keyword.

I don't agree concerning the "loss of the blank operator" or that keyword independence makes Rexx or NetRexx anything near a variant of PL/I.

What I am advocating is that implied concatenate operator use be restricted when the expression may be followed by a keyword. In those cases, recognition of implied concatenate operators is enabled only within a parenthesis level of one or greater. Thus the following are all valid:
"x = a + b c; x = a b + c; if x = a + b then nop; if (x = a + b c) then nop; if (x = a b + c) then nop; if (x = a'b' + c) then nop; if x = (a b) + c then nop;"

Of course the following are also valid:
"if x = (a + b c) then nop; if x = (a b + c) then nop; if (x = a + b) then nop; if (x = ((a b) + c)) then nop;"

but "if x = a + b c then nop;" would be invalid and "if x = a b + c then nop;" would be invalid.
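The proposed restriction can be checked mechanically. The following Python sketch takes the text between "if" and "then" and reports whether every implied concatenation (blank or abuttal) sits inside at least one level of parentheses. It is only an illustration under simplifying assumptions: the tokenizer knows just names, numbers, single-quoted strings, parentheses and a few operators, function and method calls are not handled, and the names tokenize and condition_is_valid are invented for the example.

```python
import re

# Simplified token pattern: quoted string, name, number, or one-character operator.
_TOKEN_RE = re.compile(r"\s*('[^']*'|[A-Za-z_]\w*|\d+|[()=+\-*/])")

def tokenize(expr):
    """Split an expression into tokens, discarding blanks."""
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = _TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def _is_operand(tok):
    """True for names, numbers and quoted strings."""
    return tok[0] == "'" or tok[0] == "_" or tok[0].isalnum()

def condition_is_valid(expr):
    """True if every implied concatenation in expr is inside parentheses."""
    depth, prev = 0, None
    for tok in tokenize(expr):
        if tok == ")":
            depth -= 1
        ends = prev is not None and (prev == ")" or _is_operand(prev))
        starts = tok == "(" or _is_operand(tok)
        if ends and starts and depth == 0:
            return False            # implied concatenation at the top level
        if tok == "(":
            depth += 1
        prev = tok
    return True
```

Applied to the conditions listed above, the sketch accepts "(x = a + b c)" and "x = (a b) + c" and rejects "x = a + b c" and "x = a b + c". Assignments are outside its scope, since no keyword can follow the expression there.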

Second, using a dynamic list of available variable names locks the program into its execution environment.

With the exception of static methods in USES classes when strictargs is off, I believe NetRexx syntax is independent of its execution environment. That loophole was unfortunate -- but of course it only applies in classes with USES classes specified which are rare.

As discussed above. Still a loophole which should be closed.

The example on page 79 of TRL contains two occurrences of "say 'hello' " in the same short program. The first is valid, and the second is an error. In my opinion, that is confusing and a bad idea. It is, of course, a byproduct of the way that NetRexx avoids the breakage problem.

But that example is deliberately chosen to look confusing and silly, to make the point -- just like your PL/I example above.   'say' is so well known in NetRexx that few, if any, programmers would use 'say' as a variable name. However, there are more obscure keywords in NetRexx; someone might well want to call a variable 'label' or 'digits' for example, as in:

   label='fred'

   digits=3

and if they never use (and maybe never even learned about) those features, all is well and good.   This is particularly important for new programmers who can get very frustrated when trying to pick variable names in a language with many reserved words.

This is exactly why the PL/I model of keyword independence is preferable! While beginning PL/I programmers have to deal with the complexity of the language, they are never frustrated by the problem of variable names being keywords.

If NetRexx adopted the full keyword independence model, there would be no need for any discussion of the sub-keywords in the IF, LOOP and other statements. The issue could be described:

"Keywords in NetRexx are not reserved, and any variable may have the same name as a keyword. For clarity, the use of variable names which are the same as keywords is not recommended."

I believe that relatively few changes are needed to the NetRexx language to move it to the keyword independent model. Once an exhaustive determination of exactly what those changes are is made, then it can be determined if the changes damage the language more than keyword independence helps it.

Consider the following program:
/* NetRexx 4 */
import some.package.
please
explain
this
program
What does a person familiar with only NetRexx 3 make of this? Each of the words might be a method or a new keyword added in version 4 of the language.

Indeed, and that would be true of a future PL/I perhaps, too.   However, it would be rather unlikely that new instructions would be single words, so I would assume that all of those (except 'this' which is an error even now) must be method calls?

There's a good point here, however: the NetRexx ability to reduce notation by allowing a method call without specifying parentheses was probably a mistake, or at least the strictargs option should probably have been the default.   But that generally does not expose a program to later breakage because any method calls have to be in the same class. The exception to that, already mentioned, is static methods in USES classes which could be added at a later date -- that was definitely a mistake (but again it would be OK so long as parentheses were required).

In a future PL/I, because of keyword independence, the words in all of the above statements (assuming ";" added to each) must be keywords. All PL/I statements begin with a keyword unless (like the assignment statement) it can be differentiated by lookahead.

We agree that in NetRexx, these should not be method calls - the "()" is to be required for a single word method call. If NetRexx had keyword independence, all of these statements (including "this") would be new NetRexx keyword statements.

Note that the use of a variable name and a keyword as the first word of a statement is a breakage possibility which must be examined. As with the assignment statement, simple lookahead can determine if the word is a variable name. So requiring "()" provides the necessary lookahead for a method call, just as "=" identifies an assignment.
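The lookahead described here could be sketched in Python roughly as follows. This is a simplification under stated assumptions, not NetRexx's actual rules: it looks only at the character following the first word, it assumes (as in Rexx function calls) that a call is recognized only when "(" directly abuts the name, and it does not distinguish "=" from "==". The function name classify_clause is invented for the example.

```python
import re

# First word of the clause, any blanks after it, then the next character (if any).
_FIRST_WORD = re.compile(r"\s*([A-Za-z_]\w*)(\s*)(.?)")

def classify_clause(clause):
    """Classify a clause by looking one character past its first word."""
    m = _FIRST_WORD.match(clause)
    if m is None:
        return "other"
    word, gap, nxt = m.groups()
    if nxt == "=":
        return "assignment"             # e.g. "say = 42"
    if nxt == "(" and gap == "":
        return "method call"            # e.g. "please()", "(" abuts the name
    return "keyword instruction"        # e.g. "say say" or "say (say)"
```

Under this scheme "say = 42" is an assignment and "say say" is a keyword instruction whose operand happens to be a variable named say, which is the behaviour argued for above.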

Third, using a dynamic list of available variable names not only locks the program into its execution environment, it also locks any other program which attempts to correctly process a NetRexx source file into the execution environment.

(Same comments as above.)

That means that any formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include the same logic that the translator uses. It must dynamically determine everything which is not a keyword to identify keywords. I think that is unfortunate since it makes the development of peripheral NetRexx processors more difficult or impossible.

This would be true if keywords depended on the environment, but I don't think they do except for the USES case.

Discussed above

Finally, it makes the language difficult, if not impossible, to define in a formal manner with BNF or another formal definition method. While some may feel this is actually an advantage (!), the truth is that it makes standardization difficult. Essentially all compilable programming languages have formal definitions.


--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria, Europe


Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com

George Hovey-2

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by billfen

Bill,

Re C++ version of NetRexx, two problems occur to me.

Without the ability to interoperate with Java, NetRexx is just a reworking of Rexx, perhaps even inferior to the original.

You lose the predictability and safety of the JVM environment where, for example, all array references are checked for validity at run time. Thus the security of a program could depend on where it is run.

Without Java a vast amount of NetRexx's utility disappears.

On Thu, Mar 28, 2013 at 4:34 PM, Bill Fenlason <[hidden email]> wrote:

On 3/27/2013 6:58 AM, Mike Cowlishaw wrote:

Bill, sorry about the delay in reply. A lot of questions/comments here; mine in blue....

Thanks for giving it your full consideration - I realize you are busy.

I don't believe that I've misunderstood you, but I do believe that we have a disagreement about how best to handle the language (keyword) extensibility problem.

Or perhaps at cross purposes on some things.

Perhaps at the tactical level, but I think not at the strategic. Wouldn't you agree that the ability to quickly and effectively develop NetRexx source processing programs would benefit the usability and spread of the language?

You have described the problem quite clearly - in many programming languages breakage can occur when new keywords are added. In other words, old programs do not work as they originally did.

It's more than that .. it's also new programs that run on a newer interpreter. Scripts interpreted from the source of the program are much more vulnerable to language changes than executables that are compiled. That's my primary concern.

My point is that the NetRexx approach to distinguishing keywords from variable names has significant downsides.

Of course NetRexx 3 is not about to change - as the saying goes "It is what it is". I'm not advocating any change, although if a new Rexx dialect is developed I am advocating that the breakage problem be handled differently. This is a philosophical discussion, not a practical one.

I think it is important to point out that with careful design, a programming language can totally avoid the breakage problem. The best example is PL/I. In its 50 year history, PL/I has added many dozens of keywords to the language, but as far as I know, there has never been an instance of breakage. Why? Because keywords are never identified by examining them! In other words, tokens are identified as keywords by syntax context rather than content, and that is why keywords are never confused with variables with the same name. The down side is that the language has lots of parens and commas, and sometimes an unnatural feel. I know that you are well aware of this, Mike, but some other readers may not be.

Yes, and it is possible for PL/I due to one big difference from Rexx (and NetRexx) -- blanks are separators in PL/I, and can never be operators.   But the blank operator in Rexx is one of the features that keep it relatively notation-free.   I have some programs (e.g., wiki2html translation) where the blank operator is by far the most heavily used operator -- more often used than +, in particular.

The unrestricted use of the blank concatenate and direct abutment concatenate operators causes most of the "breakage possibilities" in NetRexx. I define a "breakage possibility" as every place in the language definition where both a keyword and a variable name can occur. (Think railroad track diagrams, etc.) Breakage possibilities are important because it is at those points that breakage may occur if a new keyword is added to the language. Every NetRexx situation in which an expression is followed by a keyword is subject to breakage, since the keyword may be a variable name after an implied concatenate operator.

In order to get an idea of how much of a problem this is in the real world, can you estimate how often in wiki2html an expression using an implied concatenate operator is followed by a keyword? Or if the source is available, could you point me to it so I can check it?

To put things in perspective, the implied concatenate operators are novel, convenient and in some situations very convenient. But they are nothing more than substitutions for "||' '||" and "||". Providing full keyword independence in NetRexx is much more important in my view. If the implied concatenate operators must be restricted in some situations to accomplish that goal, then I think they should be.

If NetRexx required that any expression which contains an implied concatenate operator and is followed by a keyword be enclosed in parentheses at some level, there would be no need for the special cases regarding the non-statement level keywords. The keyword independent model could be used (assuming other NetRexx breakage possibilities were fixed).

For example, page 94 of the Language Definition says: "The expressions exprw or expru will be ended by either of the keywords while or until (unless the word is the name of a variable)." What it does not say is "For example, if 'while' was used as a variable name at any earlier point, the 'while' condition may not be specified." A NetRexx error is generated for "x=0; while = 3; loop while x = 1; end;" With full keyword independence, no explanation is necessary at all, since variable names can be the same as keywords. In addition, the above code would not be in error.

In Rexx, the attempt was apparently to allow at least partial keyword independence since "say = 42; say say;" was valid. (Robert, thanks for pointing this out in your earlier append.) In NetRexx it is not valid, and I think NetRexx would be a better language if it were. Of course that is not possible since NetRexx is, in essence, a reserved keyword language (because of how breakage is prevented). If true keyword independence were adopted by NetRexx, I think it would be a better and less confusing situation.

PL/I was famously ridiculed for its acceptance of the perfectly valid statement:
     "if if = then then then = else; else else = if;".
The mind sees "if", "then" and "else" as keywords and not variable names. The fact that the situation was a byproduct of avoiding the breakage problem was generally not acknowledged.

Yes, and of course something like this:

if file=bin then file=input; else file=list;

might well be part of a more real program (all the variable names there can be keywords in other contexts).

PL/I has full keyword independence but NetRexx does not. PL/I allows "put (put);" where the second "put" is a variable while NetRexx does not allow "say say;". Note that NetRexx with keyword independence would not require "say (say);" but would allow it. The parens in "put (put);" are required because of the complex nature of the PL/I "put" statement.

The crucial point is that in any language that avoids the breakage problem, the separation of keywords and variable names comes first. Then the keyword token in question is compared with the list of known keywords. If a token which is known to be a keyword is not within the list of known keywords (for that version of the language), it is an "invalid keyword" situation. It is not presumed to be a variable name.

Most other languages use the "reserved keyword" approach. Keywords are identified by comparing tokens with a list of words, and anything that matches is a keyword, anything that doesn't match is a variable name, and "never the twain shall meet". In that case, breakage will always occur in places where keywords and variables can occur in the same location.
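The breakage described here can be shown with a minimal Python sketch. It is an illustration only: the keyword sets, the two "versions" and the function name are hypothetical and do not describe any real language processor.

```python
# Hypothetical keyword sets for two versions of an imaginary
# reserved-keyword language (not real NetRexx, C or Java lists).
KEYWORDS_V1 = {"if", "then", "else", "say"}
KEYWORDS_V2 = KEYWORDS_V1 | {"label"}  # version 2 adds one keyword


def classify_reserved(token, keywords):
    """Reserved-keyword rule: a token is a keyword iff it is in the list."""
    return "keyword" if token.lower() in keywords else "variable"


# An old program that happens to use 'label' as a variable name.
old_program_tokens = ["label", "=", "'fred'"]

before = classify_reserved(old_program_tokens[0], KEYWORDS_V1)
after = classify_reserved(old_program_tokens[0], KEYWORDS_V2)
print(before, after)  # variable keyword
```

The same source token changes meaning purely because the keyword list grew, which is the breakage case for a reserved-keyword language.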

As you know, NetRexx takes the opposite approach. It compares tokens with a dynamically computed list of variable and method names (i.e. everything that is not a keyword).

I would disagree with the word 'dynamic', here. Variables are statically computed (they have to be, because the NetRexx processor can emit Java -- whose variables are static). Method names only affect instruction keywords (i.e., first token in a clause and, in practice, when a sole symbol in that clause) and since (except for static methods in USES classes -- which I think was a mistake) they must be in the current class they are not dynamic in any sense.   Of course, method calls don't come into it at all if strictargs is in effect (more on that below).

Yes, "dynamic" is the wrong word (I misunderstood the detailed nature of the related translator processing).

I most certainly agree that allowing single word method names without "()" was ill-advised. It is the primary reason that the parsing of the NetRexx source requires access to the external execution environment. In addition, it is a significant breakage point since it allows the first word of a statement to be a variable name as well as a keyword.

If the search for variable names is limited to names contained in the source file, some of my objections go away. Unfortunately I think it is too late to "close the barn door" without a new NetRexx dialect or a version control mechanism.

To emphasize why this is important, consider the task of writing a NetRexx source processor that does not run in the Java environment, perhaps on a machine that does not even include Java. To be reasonably successful, it must only consider the input source, not anything external to it. Currently I don't see how that is possible (with the USES problem). Of course, some things can not be accomplished without the Java environment, such as checking the validity of external method calls, but assume the processor does not need to do that.

In addition, is there any inherent reason why NetRexx could not be used with languages other than Java? I'm not proficient enough in C++ to understand the pitfalls, but why couldn't a C++ version of NetRexx be considered? Perhaps a C++ programmer can comment?

If the token is not within that list, the token is judged to be a keyword. Then if the keyword is not within the list of known keywords, it is an "invalid keyword" situation. Thus NetRexx, like PL/I, avoids the breakage problem.

Agreed.
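The rule just agreed on (known names first, then the keyword list, otherwise an "invalid keyword") can be sketched in Python as follows. This is a simplified reading of the description above, not the actual NetRexx translator logic; the keyword set and the function name are assumptions made for the example.

```python
# Hypothetical, deliberately incomplete keyword list.
KNOWN_KEYWORDS = {"say", "loop", "while", "until", "if", "then", "else"}


def classify(token, known_variables, known_keywords=KNOWN_KEYWORDS):
    """Classify a token found where a keyword could appear.

    Variable names take priority; anything else is judged to be a
    keyword and must then be a keyword this language level knows.
    """
    name = token.lower()
    if name in known_variables:
        return "variable"
    if name in known_keywords:
        return "keyword"
    return "invalid keyword"


variables = {"while"}  # e.g. after an earlier assignment: while = 3
print(classify("while", variables))   # variable
print(classify("until", variables))   # keyword
print(classify("unless", variables))  # invalid keyword
```

Once `while` is in the variable list it can no longer act as a keyword, which is the effect discussed for "loop while x = 1".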

In my opinion, here are some downsides of the NetRexx approach.

First, as I tried to point out, by giving variable names priority over keywords, keywords may be overloaded. In my view, that is a bad idea for a language which strives for simplicity and low "astonishment" levels.

It is a natural tendency for programmers (particularly those with experience in other languages) to recognize keywords by content. In other words, when reading "options args", options is assumed to be a keyword. Allowing any other interpretation is simply confusing.

I find the reserved keywords approach to be more astonishing (we may have to agree to differ on that). And the PL/I approach -- which would mean more notations and losing the blank operator -- would make Rexx more a variant of PL/I than an advance on PL/I, perhaps.

Of course we could use different syntax to differentiate variables from keywords; many languages take that approach (e.g., EXEC 2, Tcl, etc.). I consider that approach too ugly and too wasteful.

I "agree to disagree" with the first point, and agree with the second. If I understand you correctly, I think you may be in the minority on the first point. I submit that most programmers see the statement "options args;" as the keyword "options" followed by an argument "args", and the statement "say greeting;" as the keyword "say" followed by the argument "greeting". If all of the source code in the world were analyzed, undoubtedly almost all of the non-assignment statements consist of a fixed keyword optionally followed by arguments or modifiers. That model is so pervasive that it is second nature to the majority of programmers. Is that not the case for you? (Of course, there is always Lisp, but even that follows the "FunctionName(argument list)" model - perhaps you see "options" as a fixed function? But the distinction between a user defined function and a keyword defined function isn't clear :)

Perhaps a clarification of my understanding of "reserved keyword", "keyword independent" and other languages would help. A "reserved keyword" language (like C or Java) is one in which keywords are always reserved to be used as keywords and can never be used as variable names. In these languages breakage may occur when a new keyword is added. A "keyword independent" language (like PL/I) is one in which keywords may be used as either a keyword or a variable name at any point. In general, breakage will not occur with these languages. Other languages (like NetRexx) are ones in which keywords are reserved some of the time. In NetRexx, keywords are reserved but if used as a variable name, the word can no longer be used as a keyword.

I don't know of any other language which uses the model that NetRexx uses. I think it may be confusing to some programmers that a variable name invalidates a keyword.

I don't agree concerning the "loss of the blank operator" or that keyword independence makes Rexx or NetRexx anything near a variant of PL/I.

What I am advocating is that implied concatenate operator use be restricted when the expression may be followed by a keyword. In those cases, recognition of implied concatenate operators is enabled only within a parenthesis level of one or greater. Thus the following are all valid:
"x = a + b c; x = a b + c; if x = a + b then nop; if (x = a + b c) then nop; if (x = a b + c) then nop; if (x = a'b' + c); if x = (a b) + c then nop;"

Of course the following are also valid:
"if x = (a + b c) then nop; if x = (a b + c) then nop; if (x = a + b) then nop; if (x = ((a b) + c)) then nop;"

but "if x = a + b c then nop;" would be invalid and "if x = a b + c then nop;" would be invalid.
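The proposed restriction can be checked mechanically. The Python sketch below is only an illustration of the idea, under several assumptions: the tokenizer is a toy one (simple symbols, numbers, quoted strings, parentheses and operator characters), a symbol directly abutted to "(" is treated as a call rather than a concatenation, and the function names are hypothetical. It takes the expression between "if" and "then" and reports the parenthesis depth of each implied (blank or abuttal) concatenation.

```python
import re

# Toy tokenizer; real NetRexx lexing has more cases than this.
TOKEN = re.compile(r"""
    (?P<string>'[^']*'|"[^"]*")   # quoted literal
  | (?P<name>[A-Za-z_]\w*)        # simple symbol
  | (?P<number>\d+(?:\.\d+)?)     # numeric literal
  | (?P<open>\()
  | (?P<close>\))
  | (?P<op>[-+*/=<>\\&|%]+)       # explicit operator characters
""", re.VERBOSE)


def implied_concat_depths(expr):
    """Return the parenthesis depth of every implied concatenation."""
    depths = []
    depth = 0
    prev_kind = None  # kind of the previous token
    prev_end = 0      # index just past the previous token
    for m in TOKEN.finditer(expr):
        kind = m.lastgroup
        starts_operand = kind in ("string", "name", "number", "open")
        ends_operand = prev_kind in ("string", "name", "number", "close")
        if starts_operand and ends_operand:
            abutted = m.start() == prev_end
            # name( with no blank is assumed to be a call, not a concat
            is_call = prev_kind == "name" and kind == "open" and abutted
            if not is_call:
                depths.append(depth)
        if kind == "open":
            depth += 1
        elif kind == "close":
            depth -= 1
        prev_kind, prev_end = kind, m.end()
    return depths


def allowed_before_keyword(expr):
    """True if every implied concatenation is inside parentheses."""
    return all(d >= 1 for d in implied_concat_depths(expr))


print(allowed_before_keyword("x = (a b) + c"))  # True
print(allowed_before_keyword("x = a + b c"))    # False
```

With a check like this, a word following a depth-zero operand could always be read as a keyword, because a blank at depth zero would never be an operator in that position.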

Second, using a dynamic list of available variable names locks the program into its execution environment.

With the exception of static methods in USES classes when strictargs is off, I believe NetRexx syntax is independent of its execution environment. That loophole was unfortunate -- but of course it only applies in classes with USES classes specified which are rare.

As discussed above. Still a loophole which should be closed.

The example on page 79 of TRL contains two occurrences of "say 'hello' " in the same short program. The first is valid, and the second is an error. In my opinion, that is confusing and a bad idea. It is, of course, a byproduct of the way that NetRexx avoids the breakage problem.

But that example is deliberately chosen to look confusing and silly, to make the point -- just like your PL/I example above. 'say' is so well known in NetRexx that few, if any, programmers would use 'say' as a variable name. However, there are more obscure keywords in NetRexx; someone might well want to call a variable 'label' or 'digits' for example, as in:

   label='fred'

   digits=3

and if they never use (and maybe never even learned about) those features, all is well and good.   This is particularly important for new programmers who can get very frustrated when trying to pick variable names in a language with many reserved words.

This is exactly why the PL/I model of keyword independence is preferable! While beginning PL/I programmers have to deal with the complexity of the language, they are never frustrated by the problem of variable names being keywords.

If NetRexx adopted the full keyword independence model, there would be no need for any discussion of the sub-keywords in the IF, LOOP and other statements. The issue could be described:

"Keywords in NetRexx are not reserved, and any variable may have the same name as a keyword. For clarity, the use of variable names which are the same as keywords is not recommended."

I believe that relatively few changes are needed to the NetRexx language to move it to the keyword independent model. Once an exhaustive determination of exactly what those changes are is made, then it can be determined if the changes damage the language more than keyword independence helps it.

Consider the following program:
/* NetRexx 4 */
import some.package.
please
explain
this
program
What does a person familiar with only NetRexx 3 make of this? Each of the words might be a method or a new keyword added in version 4 of the language.

Indeed, and that would be true of a future PL/I perhaps, too.   However, it would be rather unlikely that new instructions would be single words, so I would assume that all of those (except 'this' which is an error even now) must be method calls?

There's a good point here, however: the NetRexx ability to reduce notation by allowing a method call without specifying parentheses was probably a mistake, or at least the strictargs option should probably have been the default.   But that generally does not expose a program to later breakage because any method calls have to be in the same class. The exception to that, already mentioned, is static methods in USES classes which could be added at a later date -- that was definitely a mistake (but again it would be OK so long as parentheses were required).

In a future PL/I, because of keyword independence, the words in all of the above statements (assuming ";" added to each) must be keywords. All PL/I statements begin with a keyword unless (like the assignment statement) it can be differentiated by lookahead.

We agree that in NetRexx, these should not be method calls - the "()" is to be required for a single word method call. If NetRexx had keyword independence, all of these statements (including "this") would be new NetRexx keyword statements.

Note that the use of a variable name and a keyword as the first word of a statement is a breakage possibility which must be examined. As with the assignment statement, simple lookahead can determine if the word is a variable name. So requiring "()" provides the necessary lookahead for a method call, just as "=" identifies an assignment.
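The lookahead described above can be sketched in a few lines of Python. This illustrates the hypothetical keyword-independent rule only, not current NetRexx behaviour; it looks at a simple first word and ignores compound terms, labels and continuations, and the function name is an assumption.

```python
import re

FIRST_WORD = re.compile(r"\s*([A-Za-z_]\w*)")


def classify_clause(clause):
    """Decide what a clause is from its first word plus one lookahead."""
    m = FIRST_WORD.match(clause)
    if not m:
        return "other"
    rest = clause[m.end():]
    if rest.startswith("("):
        return "method call"  # name abutted to "(", e.g. please()
    stripped = rest.lstrip()
    if stripped.startswith("=") and not stripped.startswith("=="):
        return "assignment"   # name followed by "=", e.g. say = 42
    return "keyword instruction"


print(classify_clause("say = 42"))   # assignment
print(classify_clause("say say"))    # keyword instruction
print(classify_clause("say (say)"))  # keyword instruction
print(classify_clause("please()"))   # method call
```

Nothing here consults a list of variable or method names, so the first word of a clause can be classified from the source text alone.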

Third, using a dynamic list of available variable names not only locks the program into its execution environment, it also locks any other program which attempts to correctly process a NetRexx source file into the execution environment.

(Same comments as above.)

That means that any formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include the same logic that the translator uses. It must dynamically determine everything which is not a keyword to identify keywords. I think that is unfortunate since it makes the development of peripheral NetRexx processors more difficult or impossible.

This would be true if keywords depended on the environment, but I don't think they do except for the USES case.

Discussed above

Finally, it makes the language difficult, if not impossible, to define in a formal manner with BNF or another formal definition method. While some may feel this is actually an advantage (!), the truth is that it makes standardization difficult. Essentially all compilable programming languages have formal definitions.

Almost all languages (I would say all 'practical' languages) have to supplement BNF with prose; NetRexx is actually rather clean in that respect -- there is really just one exception rule to add to the pure BNF.

Does NetRexx actually have a "pure BNF" definition? I wasn't aware that had been done, and if it has I'm certainly interested in seeing it. What is the exception rule?

As a descriptive mechanism, BNF is extremely powerful, and a fully robust definition requires almost no prose to accompany it. The problem is that a fully robust definition can be huge and incredibly boring to generate. For example, it is easier to add "one or more of the following, without duplicates..." than it is to enumerate all the possibilities. In the case of the blank and abutment operators, the exact details of white space and significant blanks must be included rather than adding prose to describe it. The use of an extended BNF can help but not eliminate these description problems.

In addition, a BNF specification can describe ambiguous languages. It can describe unambiguous languages which can not be effectively parsed. There is a great difference between a "BNF specification" and a "practical and usable BNF specification".

The trick is to develop a grammar specification which fits one of the commonly used types such as LR(1), LL(1), LALR, etc. Not only can a specification like that be used to generate scanners, parsers and the like, it can also serve as a detailed spec to be used to check a hand written parser.

Note that in no way am I advocating that an automatically generated parser be used in the NetRexx translator!

Clearly hand written scanners and parsers can be just as effective as automatically generated ones, although experience has shown that they may be somewhat more error prone. What I am suggesting is that for those who are comfortable with generators, having an adaptable grammar which can be used to accurately parse NetRexx could shorten the time necessary to implement a NetRexx source processor. As it stands, having the variable names take priority over keywords (and the USES issue) substantially complicates the problem.

The overall problem of language versions is a complex one, since every change to a language in effect defines a new language. I believe the assumption that any future NetRexx processor should be able to correctly process every NetRexx program without knowing which version of the NetRexx language it is programmed in is, while laudable, not worth it if it requires the current NetRexx method of identifying keywords.

As I have suggested, I believe in adopting the approach that NetRexx programs should identify themselves. In HTML web pages, the very first thing is a DOCTYPE declaration of exactly what language the page is written in. I think the same approach could be adopted for NetRexx so that if at some later point the method of identifying keywords is changed, it could be accommodated. I suggest that the language level should be included in the initial comment or in an option which must be specified before the remainder of the program. Existing NetRexx programs would, of course, default to the current language version.
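A source processor could read such a self-identification before doing anything else. The Python sketch below assumes a purely hypothetical convention, an initial comment of the form "/* NetRexx 4 */", and assumes that level 3 is the default for programs without one; neither the comment format nor the function name is part of NetRexx.

```python
import re

DEFAULT_LEVEL = 3  # assumed default for programs with no declaration

# Hypothetical convention: the program starts with /* NetRexx <n> */
LEVEL_COMMENT = re.compile(r"\s*/\*\s*NetRexx\s+(\d+)\s*\*/", re.IGNORECASE)


def language_level(source):
    """Return the declared language level, or the default if absent."""
    m = LEVEL_COMMENT.match(source)
    return int(m.group(1)) if m else DEFAULT_LEVEL


print(language_level("/* NetRexx 4 */\nsay 'hello'"))  # 4
print(language_level("say 'hello'"))                   # 3
```

A processor would then choose its keyword list, or its whole keyword-identification method, from the returned level.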

But this has exactly the same problem you just expressed concern about: every future compiler, interpreter, formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include logic to handle previous levels of the language that remain supported.

I don't believe that is true if the keyword independence model is used. I was suggesting this as an alternate approach to the breakage problem.

As I said, you and I just disagree on this, Mike. I don't expect you to change anything, but I do hope you will give it some (more) serious thought.

With hindsight there is very little I would change in NetRexx (the strictargs matter is one, and maybe I'd drop USES entirely). The variables-override-keywords rule is extremely simple to understand and gets enormous 'bang for the buck' by making the language safely extendable in a way that Rexx (and almost all other languages) never achieved. PL/I is the nearest to NetRexx -- but at a cost that would make achieving the other goals of Rexx/NetRexx impossible.

All these things are tradeoffs, of course, and it took many iterations of language design before I got to NetRexx.



I still think the keyword independent model is preferable for breakage control, and that the variables-override-keywords approach costs more than it is worth. I realize you currently believe that making NetRexx fully keyword independent conflicts with other NetRexx goals, but I (humbly) suggest that issue might deserve reevaluation.

So we obviously should "agree to disagree" on these points. But if you do yet another iteration of the Rexx family, I hope you will consider this discussion.

Mike

Bill
PS Sorry about the redundancy of my comments - yes, I really do favor keyword independence :)

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/

--
"One can live magnificently in this world if one knows how to work and how to love." -- Leo Tolstoy

billfen

Re: Keywords (was Re: AST, BNF, ANTLR)

On 3/29/2013 11:14 AM, George Hovey wrote:

Bill,

Re C++ version of NetRexx, two problems occur to me.

Without the ability to interoperate with Java, NetRexx is just a reworking of Rexx, perhaps even inferior to the original.

You lose the predictability and safety of the JVM environment where, for example, all array references are checked for validity at run time. Thus the security of a program could depend on where it is run.

Without Java a vast amount of NetRexx's utility disappears.

George,

Thanks for the insights.

I wasn't sure if the Class and Method structure of NetRexx was compatible with those of C++ and that C++ methods were callable from some form of NetRexx output. Are you telling me that they are not?

Bill


ThSITC

Re: Keywords (was Re: AST, BNF, ANTLR)

Sorry that I do have to *interrupt* you all, again:

1.) For sure, C# from Microsoft did steal a lot of *ideas* from (originally SUN's) Java 1.0.
2.) For sure, Java *has been influenced* --- too much --- by existing *C* and *C++* Syntax and
Semantics.
3.) The original *author* and *creator* of Java has been forced by *SUN commercial management*
to *not invent a totally new language*, which is what he & his team did originally *plan to do* ... :-(
4.) Mike F. Cowlishaw (shortly called MFC, as you all do know, for sure) got notice about
what's happening in SUN Laboratories, those times, due to his JOB at IBM, and simply
did invent *NetRexx*, as a New Language.

Rest is history, or her-story, or whatever!

For sure, anybody interested shall be able to *generate* C++, or plain C, or even C# code,
when somebody wants to do so .... C# has been stolen from Java, and Java has been *much too much*
influenced by C++, and C, syntax and semantics!

Full stop from my side!
Thomas.
*********************************************************************************************************
PS: The *real problem*, as I think, is, that the *potential powers* of NetRexx are not recognized
at all, by the community world of programmers.

Look at the success of PHP, etc, etc, ...
We shall learn a bit, all together, how to make an open source project a success!

But, for sure, Kermit and Rene are doing their best!
Thanks, and happy Easter, again ... :-)
================================================================
Thomas Schneider, Author of www.Rexx2Nrx.com, back in the Year 2000-2001, or so!
****************************************************************************************************
Have FUN running *classic Rexx* programs on the JVM (when you want to do that!)
========================================================================


--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria, Europe


Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com

ThSITC

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by billfen

Hi Bill, and all,

*for sure*, when all the *builtin* coupling of *scanning*, *Parsing*, *Analysis*, and *Code Generation*, and/or
*Interpretation*, which is currently *always coupled* in one and only one *Class* by *Statement/Instruction Type*
shall be *uncoupled*, step, by step ....

Then:

Adding a new C# Generator (*or* even a PL/I Generator) to the NetRexx Language shall be *peanuts*!

As I do see it, at least currently, MFC did simply put in too many details of too many steps of parsing, interpretation, and/or compiling into the *same CLASSES*.

Might have been nice, for initial bootstrapping of his approach, but is +a bit uncomfortable+ for adding
new language features, as well as Generators for new Host Languages ....

(Only my personal viewpoint again, *please*, my friends, here!)

Ok?
Thomas.
==========================================================================

--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria, Europe



ThSITC

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by George Hovey-2

Sorry, George, I think you are again *wrong*!

Of course, a *C++*, or even *C#* implementation of *NetRexx* shall have to do any and all
checks currently done by either the NetRexxC compiler&interpreter, itself, *or* the major
run-time classes, namely Rexx.nrx, and RexxUtil.nrx, as far as I do know, at this minute.

But, before telling too much criticism, I shall for sure *go ahead*, and release, what I did!

Thus: Happy Easter, all!
Bye, 4 now!
=============================================================
On 29.03.2013 16:14, George Hovey wrote:

Bill,

Re C++ version of NetRexx, two problems occur to me.

Without the ability to interoperate with Java, NetRexx is just a reworking of Rexx, perhaps even inferior to the original.

You lose the predictability and safety of the JVM environment where, for example, all array references are checked for validity at run time. Thus the security of a program could depend on where it is run.

Without Java a vast amount of NetRexx's utility disappears.

On Thu, Mar 28, 2013 at 4:34 PM, Bill Fenlason <[hidden email]> wrote:

On 3/27/2013 6:58 AM, Mike Cowlishaw wrote:

Bill, sorry about the delay in reply. A lot of questions/comments here; mine in blue....

Thanks for giving it your full consideration - I realize you are busy.

I don't believe that I've misunderstood you, but I do believe that we have a disagreement about how best to handle the language (keyword) extensibility problem.

Or perhaps at cross purposes on some things.

Perhaps at the tactical level, but I think not at the strategic. Wouldn't you agree that the ability to quickly and effectively develop NetRexx source processing programs would benefit the usability and spread of the language?

You have described the problem quite clearly - in many programming languages breakage can occur when new keywords are added. In other words, old programs do not work as they originally did.

It's more than that .. it's also new programs that run on a newer interpreter. Scripts interpreted from the source of the program are much more vulnerable to language changes than executables that are compiled. That's my primary concern.

My point is that the NetRexx approach to distinguishing keywords from variable names has significant downsides.

Of course NetRexx 3 is not about to change - as the saying goes "It is what it is". I'm not advocating any change, although if a new Rexx dialect is developed I am advocating that the breakage problem be handled differently. This is a philosophical discussion, not a practical one.

I think it is important to point out that with careful design, a programming language can totally avoid the breakage problem. The best example is PL/I. In its 50-year history, PL/I has added many dozens of keywords to the language, but as far as I know, there has never been an instance of breakage. Why? Because keywords are never identified by examining them! In other words, tokens are identified as keywords by syntax context rather than content, and that is why keywords are never confused with variables with the same name. The downside is that the language has lots of parens and commas, and sometimes an unnatural feel. I know that you are well aware of this, Mike, but some other readers may not be.

Yes, and it is possible for PL/I due to one big difference from Rexx (and NetRexx) -- blanks are separators in PL/I, and can never be operators.   But the blank operator in Rexx is one of the features that keep it relatively notation-free.   I have some programs (e.g., wiki2html translation) where the blank operator is by far the most heavily used operator -- more often used than +, in particular.

The unrestricted use of the blank concatenate and direct abutment concatenate operators causes most of the "breakage possibilities" in NetRexx. I define a "breakage possibility" as every place in the language definition where both a keyword and a variable name can occur. (Think railroad track diagrams, etc.) Breakage possibilities are important because it is at those points that breakage may occur if a new keyword is added to the language. Every NetRexx situation in which an expression is followed by a keyword is subject to breakage, since the keyword may be a variable name after an implied concatenate operator.

In order to get an idea of how much of a problem this is in the real world, can you estimate how often in wiki2html an expression using an implied concatenate operator is followed by a keyword? Or if the source is available, could you point me to it so I can check it?

To put things in perspective, the implied concatenate operators are novel, convenient and in some situations very convenient. But they are nothing more than substitutions for "||' '||" and "||". Providing full keyword independence in NetRexx is much more important in my view. If the implied concatenate operators must be restricted in some situations to accomplish that goal, then I think they should be.

If NetRexx required that any expression which contains an implied concatenate operator and is followed by a keyword be enclosed in parentheses at some level, there would be no need for the special cases regarding the non-statement level keywords. The keyword independent model could be used (assuming other NetRexx breakage possibilities were fixed).

For example, page 94 of the Language Definition says: "The expressions exprw or expru will be ended by either of the keywords while or until (unless the word is the name of a variable)." What it does not say is "For example, if 'while' was used as a variable name at any earlier point, the 'while' condition may not be specified." A NetRexx error is generated for "x=0; while = 3; loop while x = 1; end;" With full keyword independence, no explanation is necessary at all, since variable names can be the same as keywords. In addition, the above code would not be in error.

In Rexx, the attempt was apparently to allow at least partial keyword independence since "say = 42; say say;" was valid. (Robert, thanks for pointing this out in your earlier append.) In NetRexx it is not valid, and I think NetRexx would be a better language if it were. Of course that is not possible since NetRexx is, in essence, a reserved keyword language (because of how breakage is prevented). If true keyword independence were adopted by NetRexx, I think it would be a better and less confusing situation.

PL/I was famously ridiculed for its acceptance of the perfectly valid statement:
     "if if = then then then = else; else else = if;".
The mind sees "if", "then" and "else" as keywords and not variable names. The fact that the situation was a byproduct of avoiding the breakage problem was generally not acknowledged.

Yes, and of course something like this:

if file=bin then file=input; else file=list;

might well be part of a more real program (all the variable names there can be keywords in other contexts).

PL/I has full keyword independence but NetRexx does not. PL/I allows "put (put);" where the second "put" is a variable while NetRexx does not allow "say say;". Note that NetRexx with keyword independence would not require "say (say);" but would allow it. The parens in "put (put);" are required because of the complex nature of the PL/I "put" statement.

The crucial point is that in any language that avoids the breakage problem, the separation of keywords and variable names comes first. Then the keyword token in question is compared with the list of known keywords. If a token which is known to be a keyword is not within the list of known keywords (for that version of the language), it is an "invalid keyword" situation. It is not presumed to be a variable name.

Most other languages use the "reserved keyword" approach. Keywords are identified by comparing tokens with a list of words, and anything that matches is a keyword, anything that doesn't match is a variable name, and "never the twain shall meet". In that case, breakage will always occur in places where keywords and variables can occur in the same location.

As you know, NetRexx takes the opposite approach. It compares tokens with a dynamically computed list of variable and method names (i.e. everything that is not a keyword).

I would disagree with the word 'dynamic', here. Variables are statically computed (they have to be, because the NetRexx processor can emit Java -- whose variables are static). Method names only affect instruction keywords (i.e., first token in a clause and, in practice, when a sole symbol in that clause) and since (except for static methods in USES classes -- which I think was a mistake) they must be in the current class they are not dynamic in any sense.   Of course, method calls don't come into it at all if strictargs is in effect (more on that below).

Yes, "dynamic" is the wrong word (I misunderstood the detailed nature of the related translator processing).

I most certainly agree that allowing single word method names without "()" was ill-advised. It is the primary reason that the parsing of the NetRexx source requires access to the external execution environment. In addition, it is a significant breakage point since it allows the first word of a statement to be a variable name as well as a keyword.

If the search for variable names is limited to names contained in the source file, some of my objections go away. Unfortunately I think it is too late to "close the barn door" without a new NetRexx dialect or a version control mechanism.

To emphasize why this is important, consider the task of writing a NetRexx source processor that does not run in the Java environment, perhaps on a machine that does not even include Java. To be reasonably successful, it must only consider the input source, not anything external to it. Currently I don't see how that is possible (with the USES problem). Of course, some things can not be accomplished without the Java environment, such as checking the validity of external method calls, but assume the processor does not need to do that.

In addition, is there any inherent reason why NetRexx could not be used with languages other than Java? I'm not proficient enough in C++ to understand the pitfalls, but why couldn't a C++ version of NetRexx be considered? Perhaps a C++ programmer can comment?

If the token is not within that list, the token is judged to be a keyword. Then if the keyword is not within the list of known keywords, it is an "invalid keyword" situation. Thus NetRexx, like PL/I, avoids the breakage problem.

Agreed.

In my opinion, here are some downsides of the NetRexx approach.

First, as I tried to point out, by giving variable names priority over keywords, keywords may be overloaded. In my view, that is a bad idea for a language which strives for simplicity and low "astonishment" levels.

It is a natural tendency for programmers (particularly those with experience in other languages) to recognize keywords by content. In other words, when reading "options args", options is assumed to be a keyword. Allowing any other interpretation is simply confusing.

I find the reserved keywords approach to be more astonishing (we may have to agree to differ on that). And the PL/I approach -- which would mean more notations and losing the blank operator -- would make Rexx more a variant of PL/I than an advance on PL/I, perhaps.

Of course we could use different syntax to differentiate variables from keywords; many languages take that approach (e.g., EXEC 2, Tcl, etc.). I consider that approach too ugly and too wasteful.

I "agree to disagree" with the first point, and agree with the second. If I understand you correctly, I think you may be in the minority on the first point. I submit that most programmers see the statement "options args;" as the keyword "options" followed by an argument "args", and the statement "say greeting;" as the keyword "say" followed by the argument "greeting". If all of the source code in the world were analyzed, undoubtedly almost all of the non-assignment statements consist of a fixed keyword optionally followed by arguments or modifiers. That model is so pervasive that it is second nature to the majority of programmers. Is that not the case for you? (Of course, there is always Lisp, but even that follows the "FunctionName(argument list)" model - perhaps you see "options" as a fixed function? But the distinction between a user defined function and a keyword defined function isn't clear :)

Perhaps a clarification of my understanding of "reserved keyword", "keyword independent" and other languages would help. A "reserved keyword" language (like C or Java) is one in which keywords are always reserved to be used as keywords and can never be used as variable names. In these languages breakage may occur when a new keyword is added. A "keyword independent" language (like PL/I) is one in which keywords may be used as either a keyword or a variable name at any point. In general, breakage will not occur with these languages. Other languages (like NetRexx) are ones in which keywords are reserved some of the time. In NetRexx, keywords are reserved, but once a word is used as a variable name it can no longer be used as a keyword.

I don't know of any other language which uses the model that NetRexx uses. I think it may be confusing to some programmers that a variable name invalidates a keyword.
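
The three models described above can be compared with a minimal Python sketch. It is an illustration only, not how any real NetRexx or PL/I processor is written: the keyword sets, the function names and the hypothetical new keyword `select` are all assumptions made for the example.

```python
# Hypothetical keyword lists for two language levels (illustration only).
KEYWORDS_V3 = {"say", "if", "then", "else", "loop", "while", "until", "end"}
KEYWORDS_V4 = KEYWORDS_V3 | {"select"}   # assume "select" is added later


def classify_reserved(token, keywords):
    """Reserved-keyword model: the content of the token alone decides."""
    return "keyword" if token in keywords else "variable"


def classify_variables_first(token, keywords, known_names):
    """Variables-override-keywords model as described in this thread:
    a known variable name wins; anything else must be a known keyword."""
    if token in known_names:
        return "variable"
    return "keyword" if token in keywords else "invalid keyword"


def classify_by_position(token, keywords, position_expects_keyword):
    """Keyword-independent model: the syntactic position decides first,
    and only then is the token checked against the keyword list."""
    if not position_expects_keyword:
        return "variable"
    return "keyword" if token in keywords else "invalid keyword"
```

With these definitions, an old program that uses `select` as a variable changes meaning under the reserved model when the keyword list grows from `KEYWORDS_V3` to `KEYWORDS_V4`, while the other two models return the same answer for both lists.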

I don't agree concerning the "loss of the blank operator" or that keyword independence makes Rexx or NetRexx anything near a variant of PL/I.

What I am advocating is that implied concatenate operator use be restricted when the expression may be followed by a keyword. In those cases, recognition of implied concatenate operators is enabled only within a parenthesis level of one or greater. Thus the following are all valid:
"x = a + b c; x = a b + c; if x = a + b then nop; if (x = a + b c) then nop; if (x = a b + c) then nop; if (x = a'b' + c); if x = (a b) + c then nop;"

Of course the following are also valid:
"if x = (a + b c) then nop; if x = (a b + c) then nop; if (x = a + b) then nop; if (x = ((a b) + c)) then nop;"

but "if x = a + b c then nop;" would be invalid and "if x = a b + c then nop;" would be invalid.
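
The proposed restriction can be sketched as a small checker. This is a rough Python illustration under stated assumptions: the tokenizer is a simplification and not the NetRexx scanner, the function names are hypothetical, and a name directly abutted by "(" is assumed to be a method call rather than a concatenation.

```python
import re

# Simplified token classes; real NetRexx tokenization is richer than this.
TOKEN = re.compile(r"""
    (?P<ws>\s+)
  | (?P<term>[A-Za-z_]\w*|\d+|'[^']*'|"[^"]*")
  | (?P<open>\()
  | (?P<close>\))
  | (?P<op>[-+*/=<>|&\\]+)
""", re.VERBOSE)


def has_top_level_implied_concat(expr):
    """Return True if two operands are adjacent (blank or abuttal
    concatenation) outside of any parentheses."""
    depth = 0
    prev = None           # kind of the previous non-blank token
    prev_is_name = False  # was the previous token an identifier?
    gap = False           # was there whitespace since the previous token?
    pos = 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        pos = m.end()
        kind = m.lastgroup
        if kind == "ws":
            gap = True
            continue
        ends_operand = prev in ("term", "close")
        starts_operand = kind in ("term", "open")
        # Assumption: name immediately followed by "(" is a call.
        is_call = kind == "open" and prev_is_name and not gap
        if ends_operand and starts_operand and not is_call and depth == 0:
            return True
        if kind == "open":
            depth += 1
        elif kind == "close":
            depth -= 1
        text = m.group()
        prev_is_name = kind == "term" and (text[0].isalpha() or text[0] == "_")
        prev = kind
        gap = False
    return False


def valid_before_keyword(expr):
    """Under the proposed rule, an expression that is followed by a
    keyword may use implied concatenation only inside parentheses."""
    return not has_top_level_implied_concat(expr)
```

Passing the condition of each `if` example above to `valid_before_keyword` accepts `x = (a b) + c` and `(x = a + b c)` and rejects `x = a + b c` and `x = a b + c`.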

Second, using a dynamic list of available variable names locks the program into its execution environment.

With the exception of static methods in USES classes when strictargs is off, I believe NetRexx syntax is independent of its execution environment. That loophole was unfortunate -- but of course it only applies in classes with USES classes specified which are rare.

As discussed above. Still a loophole which should be closed.

The example on page 79 of TRL contains two occurrences of "say 'hello' " in the same short program. The first is valid, and the second is an error. In my opinion, that is confusing and a bad idea. It is, of course, a byproduct of the way that NetRexx avoids the breakage problem.

But that example is deliberately chosen to look confusing and silly, to make the point -- just like your PL/I example above. 'say' is so well known in NetRexx that few, if any, programmers would use 'say' as a variable name. However, there are more obscure keywords in NetRexx; someone might well want to call a variable 'label' or 'digits' for example, as in:

   label='fred'

   digits=3

and if they never use (and maybe never even learned about) those features, all is well and good.   This is particularly important for new programmers who can get very frustrated when trying to pick variable names in a language with many reserved words.

This is exactly why the PL/I model of keyword independence is preferable! While beginning PL/I programmers have to deal with the complexity of the language, they are never frustrated by the problem of variable names being keywords.

If NetRexx adopted the full keyword independence model, there would be no need for any discussion of the sub-keywords in the IF, LOOP and other statements. The issue could be described:

"Keywords in NetRexx are not reserved, and any variable may have the same name as a keyword. For clarity, the use of variable names which are the same as keywords is not recommended."

I believe that relatively few changes are needed to the NetRexx language to move it to the keyword independent model. Once an exhaustive determination of exactly what those changes are is made, then it can be determined if the changes damage the language more than keyword independence helps it.

Consider the following program:
/* NetRexx 4 */
import some.package
please
explain
this
program
What does a person familiar with only NetRexx 3 make of this? Each of the words might be a method or a new keyword added in version 4 of the language.

Indeed, and that would be true of a future PL/I perhaps, too.   However, it would be rather unlikely that new instructions would be single words, so I would assume that all of those (except 'this' which is an error even now) must be method calls?

There's a good point here, however: the NetRexx ability to reduce notation by allowing a method call without specifying parentheses was probably a mistake, or at least the strictargs option should probably have been the default.   But that generally does not expose a program to later breakage because any method calls have to be in the same class. The exception to that, already mentioned, is static methods in USES classes which could be added at a later date -- that was definitely a mistake (but again it would be OK so long as parentheses were required).

In a future PL/I, because of keyword independence, the words in all of the above statements (assuming ";" added to each) must be keywords. All PL/I statements begin with a keyword unless (like the assignment statement) it can be differentiated by lookahead.

We agree that in NetRexx, these should not be method calls - the "()" is to be required for a single word method call. If NetRexx had keyword independence, all of these statements (including "this") would be new NetRexx keyword statements.

Note that the use of a variable name and a keyword as the first word of a statement is a breakage possibility which must be examined. As with the assignment statement, simple lookahead can determine if the word is a variable name. So requiring "()" provides the necessary lookahead for a method call, just as "=" identifies an assignment.
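
The lookahead described above can be sketched in a few lines of Python. It is only an illustration of the idea: the function name and the keyword set are hypothetical, and it assumes that a "(" must directly abut the first word to count as a method call, so that "say (say)" is still read as a keyword statement.

```python
import re


def statement_kind(clause, keywords):
    """Classify a clause by looking one token past its first word."""
    m = re.match(r"\s*([A-Za-z_]\w*)(\s*)", clause)
    if m is None:
        return "other"
    word, gap = m.group(1), m.group(2)
    rest = clause[m.end():]
    # "=" (but not "==") right after the first word marks an assignment.
    if rest.startswith("=") and not rest.startswith("=="):
        return "assignment"
    # Assumption: "(" abutting the word marks a method call.
    if rest.startswith("(") and gap == "":
        return "method call"
    # Otherwise the first word is in keyword position.
    return "keyword" if word.lower() in keywords else "invalid keyword"
```

Under this rule "say = 42" is an assignment and "say say" is a keyword statement, and a bare word that is not a known keyword is reported as an invalid keyword instead of being taken for a method name.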

Third, using a dynamic list of available variable names not only locks the program into its execution environment, it also locks any other program which attempts to correctly process a NetRexx source file into the execution environment.

(Same comments as above.)

That means that any formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include the same logic that the translator uses. It must dynamically determine everything which is not a keyword to identify keywords. I think that is unfortunate since it makes the development of peripheral NetRexx processors more difficult or impossible.

This would be true if keywords depended on the environment, but I don't think they do except for the USES case.

Discussed above

Finally, it makes the language difficult, if not impossible, to define in a formal manner with BNF or another formal definition method. While some may feel this is actually an advantage (!), the truth is that it makes standardization difficult. Essentially all compilable programming languages have formal definitions.

Almost all languages (I would say all 'practical' languages) have to supplement BNF with prose; NetRexx is actually rather clean in that respect -- there is really just one exception rule to add to the pure BNF.

Does NetRexx actually have a "pure BNF" definition? I wasn't aware that had been done, and if it has I'm certainly interested in seeing it. What is the exception rule?

As a descriptive mechanism, BNF is extremely powerful, and a fully robust definition requires almost no prose to accompany it. The problem is that a fully robust definition can be huge and incredibly boring to generate. For example, it is easier to add "one or more of the following, without duplicates..." than it is to enumerate all the possibilities. In the case of the blank and abutment operators, the exact details of white space and significant blanks must be included rather than adding prose to describe it. The use of an extended BNF can help but not eliminate these description problems.

In addition, a BNF specification can describe ambiguous languages. It can describe unambiguous languages which can not be effectively parsed. There is a great difference between a "BNF specification" and a "practical and usable BNF specification".

The trick is to develop a grammar specification which fits one of the commonly used types such as LR(1), LL(1), LALR, etc. Not only can a specification like that be used to generate scanners, parsers and the like, it can also serve as a detailed spec to be used to check a hand written parser.

Note that in no way am I advocating that an automatically generated parser be used in the NetRexx translator!

Clearly hand written scanners and parsers can be just as effective as automatically generated ones, although experience has shown that they may be somewhat more error prone. What I am suggesting is that for those who are comfortable with generators, having an adaptable grammar which can be used to accurately parse NetRexx could shorten the time necessary to implement a NetRexx source processor. As it stands, having the variable names take priority over keywords (and the USES issue) substantially complicates the problem.

The overall problem of language versions is a complex one, since every change to a language in effect defines a new language. I believe the assumption that any future NetRexx processor should be able to correctly process every NetRexx program, without knowing which version of the NetRexx language it is programmed in, is laudable but not worth it if it requires the current NetRexx method of identifying keywords.

As I have suggested, I believe in adopting the approach that NetRexx programs should identify themselves. In HTML web pages, the very first thing is a DOCTYPE declaration of exactly what language the page is written in. I think the same approach could be adopted for NetRexx so that if at some later point the method of identifying keywords is changed, it could be accommodated. I suggest that the language level should be included in the initial comment or in an option which must be specified before the remainder of the program. Existing NetRexx programs would, of course, default to the current language version.
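
As a rough Python sketch of that suggestion: a source processor could read the language level from an initial comment and fall back to the current level when none is present. The comment format "/* NetRexx 4 */" is taken from the example program earlier in this message; the function name and the default level of 3 are assumptions for illustration, not an existing NetRexx feature.

```python
import re

DEFAULT_LEVEL = 3  # assumed level for existing, unmarked programs

# Hypothetical marker: an initial comment of the form /* NetRexx <n> */
LEVEL_COMMENT = re.compile(r"\s*/\*\s*NetRexx\s+(\d+)\s*\*/", re.IGNORECASE)


def language_level(source):
    """Return the language level declared at the very start of the
    source text, or DEFAULT_LEVEL if there is no such declaration."""
    m = LEVEL_COMMENT.match(source)
    return int(m.group(1)) if m else DEFAULT_LEVEL
```

A processor could then choose its keyword list by level before scanning the rest of the program.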

But this has exactly the same problem you just expressed concern about: every future compiler, interpreter, formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include logic to handle previous levels of the language that remain supported.

I don't believe that is true if the keyword independence model is used. I was suggesting this as an alternate approach to the breakage problem.

As I said, you and I just disagree on this, Mike. I don't expect you to change anything, but I do hope you will give it some (more) serious thought.

With hindsight there is very little I would change in NetRexx (the strictargs matter is one, and maybe I'd drop USES entirely). The variables-override-keywords rule is extremely simple to understand and gets enormous 'bang for the buck' by making the language safely extendable in a way that Rexx (and almost all other languages) never achieved. PL/I is the nearest to NetRexx -- but at a cost that would make achieving the other goals of Rexx/NetRexx impossible.

All these things are tradeoffs, of course, and it took many iterations of language design before I got to NetRexx.



I still think the keyword independent model is preferable for breakage control, and that the variables-override-keywords approach costs more than it is worth. I realize you currently believe that making NetRexx fully keyword independent conflicts with other NetRexx goals, but I (humbly) suggest that issue might deserve reevaluation.

So we obviously should "agree to disagree" on these points. But if you do yet another iteration of the Rexx family, I hope you will consider this discussion.

Mike

Bill
PS Sorry about the redundancy of my comments - yes, I really do favor keyword independence :)

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/

Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com

George Hovey-2

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by billfen

Bill,

Definitely not knowledgeable about C++. However, it's hard to see how it could interface Java classes without some facility that compiled Java byte code.

In the early days of Java it was perceived as an interpreted language (i.e. inefficient) with time-wasting behaviors like array bounds checking. It was widely assumed that these objections would be removed by compiling to native code on the target platform, and by making bounds checking an option ("after all, I only need it until the program is debugged"). Neither of these 'fixes' transpired.

If Java ever was interpreted it certainly isn't now. The Sun/Oracle 'hotspot' JVM compiles frequently executed code into native code which has the advantage of keeping all JVM policies in effect (can't be circumvented by renegade compilers). You can find internet sources that assert Java surpasses C++ in efficiency.

And Sun firmly nixed optional bounds checking, I think because it would be a fatal security flaw.

Java was designed with complete awareness of C++ and in the view of many, cured bad design decisions of that language. Too bad they didn't ditch the C syntax, but we have NetRexx to deal with that. ;-)

What do you feel would be the attraction of a C++ NetRexx?

On Fri, Mar 29, 2013 at 11:19 AM, Bill Fenlason <[hidden email]> wrote:

On 3/29/2013 11:14 AM, George Hovey wrote:

Bill,

Re C++ version of NetRexx, two problems occur to me.

Without the ability to interoperate with Java, NetRexx is just a reworking of Rexx, perhaps even inferior to the original.

You lose the predictability and safety of the JVM environment where, for example, all array references are checked for validity at run time. Thus the security of a program could depend on where it is run.

Without Java a vast amount of NetRexx's utility disappears.

George,

Thanks for the insights.

I wasn't sure if the Class and Method structure of NetRexx was compatible with those of C++ and that C++ methods were callable from some form of NetRexx output. Are you telling me that they are not?

Bill


billfen

Re: Keywords (was Re: AST, BNF, ANTLR)

George,

What I said was:

"In addition, is there any inherent reason why NetRexx could not be used with languages other than Java? I'm not proficient enough in C++ to understand the pitfalls, but why couldn't a C++ version of NetRexx be considered? Perhaps a C++ programmer can comment?"

I was trying to make the point that being able to parse NetRexx outside of a Java environment is important, and perhaps more important than currently understood. I don't know if NetRexx could have application as a general purpose Object Oriented language or not. But if so, being able to process it in unexpected places is important.

I also feel that NetRexx is more a compiled language than an interpreted one. (Probably Mike and I will disagree on that one too :)

Bill


ThSITC

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by George Hovey-2

Hi George, Bill, again ...

*the question* and/or *issue* in the case of enhancing *NetRexx* to generate C# classes shall and will
*not be* to *interface Java classes*, at all!

The *issue* would be to make a MATRIX (a table, for non-mathematicians such as I am) which does *DEFINE*
which *Java class, method, and property* ...

... does have ...

*which name* (and/or new concept) in C#, etc ...

It shall *not* be an easy task, of course, but it *is definitely feasible*, as C# has been stolen (as an idea)
from Java (as far as I do know)...

If I'm wrong, do correct me, please!

Happy Easter, all, again.

Massa Thomas (a nickname I did get more than 20 years ago here in Vienna from my African & Arabic friends ... )

Shall I now do an excuse, *or what* ????

Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com

ThSITC

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by billfen

Bill, and all, and MFC, please do object, when you want

Bill Fenlason did say (*quoted from below*)

I also feel that NetRexx is more a compiled language than an interpreted one. (Probably Mike and I will disagree on that one too :)

Thomas Schneider does say.:

As Mike Cowlishaw, obviously, did *first* write and release the compiler (and Java Program Generator), and ..
... did later *add* the INTERPRETER hooks necessary, and ...

...... as a review of this existing NetRexxC.nrx source code does show:

1.) Mike F. Cowlishaw has been and always will be, a GENIUS! Congrats!
2.) He (MFC) has *not been able* to sell IBM the potentials of a GENERALIZED *Parser* and *Code* generator for *multiple* target and/or source languages.
3.) As I did work for GEISCO (General Electric Information Services, marketing the so called MARK III service,
those days, and did *author* a lot of programs there, getting my royalties, for international usage, for decades)

(a) Mike did a *very good JOB*, in the time-frame, he did have available!
(b) Nearly nobody, except MFC, and Kermit, and maybe Rene, is *able to read and understand* his, frankly speaking, *very concise* type of programming!

It obviously does all work! Great!

Does anybody *know*, except MFC, and Kermit, maybe, what effect it shall have to change a line of the original source ???

4.) Having worked so many Years, for GEISCO, I did learn a lot, those days ...

For instance, how to do and maintain QUALITY ASSURANCE in an international,
distributed FrameWork!

Back in the 1970's, Friends !
Internationally, around the Earth, Friends!

Ok, anyway, when I shall help, pls. simply send an e-mail !
Full Stop, again
Thomas.

m 29.03.2013 19:20, schrieb Bill Fenlason:

George,

What I said was:

"In addition, is there any inherent reason why NetRexx could not be used with languages other than Java? I'm not proficient enough in C++ to understand the pitfalls, but why couldn't a C++ version of NetRexx be considered? Perhaps a C++ programmer can comment?"

I was trying to make the point that being able to parse NetRexx outside of a Java environment is important, and perhaps more important that currently understood. I don't know if NetRexx could have application as a general purpose Object Oriented language or not. But if so, being able to processes it in unexpected places is important.

I also feel that NetRexx is more a compiled language than an interpreted one. (Probably Mike and I will disagree on that one too :)

Bill

On 3/29/2013 2:08 PM, George Hovey wrote:
Bill,

Definitely not knowledgeable about C++. However, it's hard to see how it could interface Java classes without some facility that compiled Java byte code.

In the early days of Java it was perceived as an interpreted language (i.e. inefficient) with time-wasting behaviors like array bounds checking. It was widely assumed that these objections would be removed by compiling to native code on the target platform, and by making bounds checking an option ("after all, I only need it until the program is debugged"). Neither of these 'fixes' transpired.

If Java ever was interpreted it certainly isn't now. The Sun/Oracle 'hotspot' JVM compiles frequently executed code into native code which has the advantage of keeping all JVM policies in effect (can't be circumvented by renegade compilers). You can find internet sources that assert Java surpasses C++ in efficiency.

And Sun firmly nixed optional bounds checking, I think because it would be a fatal security flaw.

Java was designed with complete awareness of C++ and in the view of many, cured bad design decisions of that language. Too bad they didn't ditch the C syntax, but we have NetRexx to deal with that. ;-)

What do you feel would be the attraction of a C++ NetRexx?

On Fri, Mar 29, 2013 at 11:19 AM, Bill Fenlason <[hidden email]> wrote:

On 3/29/2013 11:14 AM, George Hovey wrote:

Bill,

Re C++ version of NetRexx, two problems occur to me.

Without the ability to interoperate with Java, NetRexx is just a reworking of Rexx, perhaps even inferior to the original.

You lose the predictability and safety of the JVM environment where, for example, all array references are checked for validity at run time. Thus the security of a program could depend on where it is run.

Without Java a vast amount of NetRexx's utility disappears.

George,

Thanks for the insights.

I wasn't sure if the Class and Method structure of NetRexx was compatible with those of C++ and that C++ methods were callable from some form of NetRexx output. Are you telling me that they are not?

Bill

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/

--
"One can live magnificently in this world if one knows how to work and how to love." -- Leo Tolstoy

--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria, Europe


Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com

Mike Cowlishaw

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by billfen

Some brief comments below, in red. I don't think we are going to get much further in this discussion.

Mike

From: [hidden email] [mailto:[hidden email]] On Behalf Of Bill Fenlason
Sent: 28 March 2013 20:35
To: IBM Netrexx
Subject: Re: [Ibm-netrexx] Keywords (was Re: AST, BNF, ANTLR)

On 3/27/2013 6:58 AM, Mike Cowlishaw wrote:

Bill, sorry about the delay in reply. A lot of questions/comments here; mine in blue....

Thanks for giving it your full consideration - I realize you are busy.

I don't believe that I've misunderstood you, but I do believe that we have a disagreement about how best to handle the language (keyword) extensibility problem.

Or perhaps at cross purposes on some things.

Perhaps at the tactical level, but I think not at the strategic. Wouldn't you agree that the ability to quickly and effectively develop NetRexx source processing programs would benefit the usability and spread of the language?

I think NetRexx (any language) needs more than just source-processing tools.   Mostly it needs users.

You have described the problem quite clearly - in many programming languages breakage can occur when new keywords are added. In other words, old programs do not work as they originally did.

It's more than that .. it's also new programs that run on a newer interpreter. Scripts interpreted from the source of the program are much more vulnerable to language changes than executables that are compiled. That's my primary concern.

My point is that the NetRexx approach to distinguishing keywords from variable names has significant downsides.

Of course NetRexx 3 is not about to change - as the saying goes "It is what it is". I'm not advocating any change, although if a new Rexx dialect is developed I am advocating that the breakage problem be handled differently. This is a philosophical discussion, not a practical one.

I think it is important to point out that with careful design, a programming language can totally avoid the breakage problem. The best example is PL/I. In its 50 year history, PL/I has added many dozens of keywords to the language, but as far as I know, there has never been an instance of breakage. Why? Because keywords are never identified by examining them! In other words, tokens are identified as keywords by syntax context rather than content, and that is why keywords are never confused with variables with the same name. The down side is that the language has lots of parens and commas, and sometimes an unnatural feel. I know that you are well aware of this Mike, but some other readers may not be.

Yes, and it is possible for PL/I due to one big difference from Rexx (and NetRexx) -- blanks are separators in PL/I, and can never be operators.   But the blank operator in Rexx is one of the features that keep it relatively notation-free.   I have some programs (e.g., wiki2html translation) where the blank operator is by far the most heavily used operator -- more often used than +, in particular.

The unrestricted use of the blank concatenate and direct abutment concatenate operators causes most of the "breakage possibilities" in NetRexx. I define a "breakage possibility" as every place in the language definition where both a keyword and a variable name can occur. (Think railroad track diagrams, etc.) Breakage possibilities are important because it is at those points that breakage may occur if a new keyword is added to the language. Every NetRexx situation in which an expression is followed by a keyword is subject to breakage, since the keyword may be a variable name after an implied concatenate operator.

In order to get an idea of how much of a problem this is in the real world, can you estimate how often in wiki2html an expression using an implied concatenate operator is followed by a keyword? Or if the source is available, could you point me to it so I can check it?

Almost all such cases are in IF clauses.

To put things in perspective, the implied concatenate operators are novel, convenient and in some situations very convenient. But they are nothing more than substitutions for "||' '||" and "||". Providing full keyword independence in NetRexx is much more important in my view  (but not in my view, as you might expect). If the implied concatenate operators must be restricted in some situations to accomplish that goal, then I think they should be.

If NetRexx required that if an expression contains an implied concatenate operator and is followed by a keyword, the expression must be enclosed in parentheses at some level, there would be no need for the special cases regarding the non-statement level keywords. The keyword independent model could be used (assuming other NetRexx breakage possibilities were fixed).

For example, page 94 of the Language Definition says: "The expressions exprw or expru will be ended by either of the keywords while or until (unless the word is the name of a variable)." What it does not say is "For example, if 'while' was used as a variable name at any earlier point, the 'while' condition may not be specified." A NetRexx error is generated for "x=0; while = 3; loop while x = 1; end;" With full keyword independence, no explanation is necessary at all, since variable names can be the same as keywords. In addition, the above code would not be in error.

In Rexx, the attempt was apparently to allow at least partial keyword independence since "say = 42; say say;" was valid. (Robert, thanks for pointing this out in your earlier append.) In NetRexx it is not valid, and I think NetRexx would be a better language if it were. Of course that is not possible since NetRexx is, in essence, a reserved keyword language (because of how breakage is prevented). If true keyword independence were adopted by NetRexx, I think it would be a better and less confusing situation.

Yes, Rexx has the 'keyword in context' rule .. but that made it impossible for the ANSI committee to extend the ADDRESS instruction without breakage.

PL/I was famously ridiculed for its acceptance of the perfectly valid statement:
     "if if = then then then = else; else else = if;".
The mind sees "if", "then" and "else" as keywords and not variable names. The fact that the situation was a byproduct of avoiding the breakage problem was generally not acknowledged.

Yes, and of course something like this:

if file=bin then file=input; else file=list;

might well be part of a more real program (all the variable names there can be keywords in other contexts).

PL/I has full keyword independence but NetRexx does not. PL/I allows "put (put);" where the second "put" is a variable while NetRexx does not allow "say say;". Note that NetRexx with keyword independence would not require "say (say);" but would allow it. The parens in "put (put);" are required because of the complex nature of the PL/I "put" statement.

The crucial point is that in any language that avoids the breakage problem, the separation of keywords and variable names comes first. Then the keyword token in question is compared with the list of known keywords. If a token which is known to be a keyword is not within the list of known keywords (for that version of the language), it is an "invalid keyword" situation. It is not presumed to be a variable name.

Most other languages use the "reserved keyword" approach. Keywords are identified by comparing tokens with a list of words, and anything that matches is a keyword, anything that doesn't match is a variable name, and "never the twain shall meet". In that case, breakage will always occur in places where keywords and variables can occur in the same location.
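A minimal sketch of the reserved-keyword approach just described, in Python. It is not from the thread: the keyword sets, the version split and the function name are hypothetical, chosen only to show how adding one keyword turns a working variable name into a keyword.

```python
# Hypothetical keyword lists for two versions of an imaginary
# reserved-keyword language (not NetRexx, not PL/I).
KEYWORDS_V1 = {"if", "then", "else", "loop", "end"}
KEYWORDS_V2 = KEYWORDS_V1 | {"label"}  # version 2 adds one new keyword


def classify(token, keywords):
    """Reserved-keyword rule: a token is a keyword exactly when it is listed."""
    return "keyword" if token.lower() in keywords else "variable"


# An old program that happens to use 'label' as a variable name.
old_program = ["label", "=", "fred"]

print(classify(old_program[0], KEYWORDS_V1))  # treated as a variable
print(classify(old_program[0], KEYWORDS_V2))  # now a keyword: the program breaks
```

The point of the sketch is only that the classification depends on the list alone, so every position where both a keyword and a variable may appear is a breakage possibility.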

As you know, NetRexx takes the opposite approach. It compares tokens with a dynamically computed list of variable and method names (i.e. everything that is not a keyword).

I would disagree with the word 'dynamic', here. Variables are statically computed (they have to be, because the NetRexx processor can emit Java -- whose variables are static). Method names only affect instruction keywords (i.e., first token in a clause and, in practice, when a sole symbol in that clause) and since (except for static methods in USES classes -- which I think was a mistake) they must be in the current class they are not dynamic in any sense.   Of course, method calls don't come into it at all if strictargs is in effect (more on that below).

Yes, "dynamic" is the wrong word (I misunderstood the detailed nature of the related translator processing).

I most certainly agree that allowing single word method names without "()" was ill-advised. It is the primary reason that the parsing of the NetRexx source requires access to the external execution environment. In addition, it is a significant breakage point since it allows the first word of a statement to be a variable name as well as a keyword.

No, it allows the first word of a statement (clause) to be a method name.   And that can only be external if exposed through USES. The error is just the case (under 'Terms' on page 48 of netrexx2):

(or to a static method in a class used by the current class)

Delete that and the problem goes away.  The ability to omit the baffling (to non-programmers) empty parentheses is probably a Good Thing.

If the search for variable names is limited to names contained in the source file, some of my objections go away. Unfortunately I think it is too late to "close the barn door" without a new NetRexx dialect or a version control mechanism.

To emphasize why this is important, consider the task of writing a NetRexx source processor that does not run in the Java environment, perhaps on a machine that does not even include Java. To be reasonably successful, it must only consider the input source, not anything external to it. Currently I don't see how that is possible (with the USES problem). Of course, some things can not be accomplished without the Java environment, such as checking the validity of external method calls, but assume the processor does not need to do that.

I think you could delete that one phrase and no-one would notice (except maybe a few testcases). One could add an option to allow it to work just in case someone is relying on it.

In addition, is there any inherent reason why NetRexx could not be used with languages other than Java? I'm not proficient enough in C++ to understand the pitfalls, but why couldn't a C++ version of NetRexx be considered? Perhaps a C++ programmer can comment?

Indeed .. I always hoped for a non-Java based version.

If the token is not within that list, the token is judged to be a keyword. Then if the keyword is not within the list of known keywords, it is an "invalid keyword" situation. Thus NetRexx, like PL/I, avoids the breakage problem.

Agreed.
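The check order described above can be sketched in Python. This is an illustration under stated assumptions, not the NetRexx translator's logic: the keyword subset and the function name are hypothetical, and the check is meant only for positions where a keyword could legally appear.

```python
# Hypothetical subset of sub-keywords; the real list lives in the language definition.
KNOWN_KEYWORDS = {"while", "until", "then", "else"}


def classify(token, variables_in_use):
    """Variables take priority; only then is the token tested as a keyword."""
    name = token.lower()  # symbols are treated as case-insensitive
    if name in variables_in_use:
        return "variable"
    if name in KNOWN_KEYWORDS:
        return "keyword"
    return "invalid keyword"


print(classify("while", {"x"}))           # keyword
print(classify("while", {"x", "while"}))  # variable: 'while' was assigned earlier
print(classify("whilst", {"x"}))          # invalid keyword, not a silent variable
```

Because an unknown word is reported rather than assumed to be a variable, adding a keyword in a later version cannot silently change the meaning of an existing program.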

In my opinion, here are some downsides of the NetRexx approach.

First, as I tried to point out, by giving variable names priority over keywords, keywords may be overloaded. In my view, that is a bad idea for a language which strives for simplicity and low "astonishment" levels.

It is a natural tendency for programmers (particularly those with experience in other languages) to recognize keywords by content. In other words, when reading "options args", options is assumed to be a keyword. Allowing any other interpretation is simply confusing.

I find the reserved keywords approach to be more astonishing (we may have to agree to differ on that). And the PL/I approach -- which would mean more notations and losing the blank operator -- would make Rexx more a variant of PL/I than an advance on PL/I, perhaps.

Of course we could use different syntax to differentiate variables from keywords; many languages take that approach (e.g., EXEC 2, Tcl, etc.). I consider that approach too ugly and too wasteful.

I "agree to disagree" with the first point, and agree with the second. If I understand you correctly, I think you may be in the minority on the first point. I submit that most programmers see the statement "options args;" as the keyword "options" followed by an argument "args", and the statement "say greeting;" as the keyword "say" followed by the argument "greeting". If all of the source code in the world were analyzed, undoubtedly almost all of the non-assignment statements consist of a fixed keyword optionally followed by arguments or modifiers. That model is so pervasive that it is second nature to the majority of programmers. Is that not the case for you? (Of course, there is always Lisp, but even that follows the "FunctionName(argument list)" model - perhaps you see "options" as a fixed function? But the distinction between a user defined function and a keyword defined function isn't clear :)

You really are missing the point that Rexx was not designed for existing programmers but for new (non-programmers). I researched what those people found difficult (such as notations in general and also the * for multiply and empty parentheses which I could not work around for Rexx) and avoided them where I could.

If you want to design a language for the (tiny minority of) people who are already programmers then you have to use syntax that they are familiar with. Nowadays that means curly braces, etc.

Perhaps a clarification of my understanding of "reserved keyword", "keyword independent" and other languages would help. A "reserved keyword" language (like C or Java) is one in which keywords are always reserved to be used as keywords and can never be used as variable names. In these languages breakage may occur when a new keyword is added. A "keyword independent" language (like PL/I) is one in which keywords may be used as either a keyword or a variable name at any point. In general, breakage will not occur with these languages. Other languages (like NetRexx) are ones in which keywords are reserved some of the time. In NetRexx, keywords are reserved but if used as a variable name, the word can no longer be used as a keyword.

I don't know of any other language which uses the model that NetRexx uses. I think it may be confusing to some programmers that a variable name invalidates a keyword.

I don't agree concerning the "loss of the blank operator" or that keyword independence makes Rexx or NetRexx anything near a variant of PL/I.

What I am advocating is that implied concatenate operator use be restricted when the expression may be followed by a keyword. In those cases, recognition of implied concatenate operators is enabled only within a parenthesis level of one or greater. Thus the following are all valid:
"x = a + b c; x = a b + c; if x = a + b then nop; if (x = a + b c) then nop; if (x = a b + c) then nop; if (x = a'b' + c); if x = (a b) + c then nop;"

Of course the following are also valid:
"if x = (a + b c) then nop; if x = (a b + c) then nop; if (x = a + b) then nop; if (x = ((a b) + c)) then nop;"

but "if x = a + b c then nop;" would be invalid and "if x = a b + c then nop;" would be invalid.

Describing when those parentheses have to be used sounds hard, even if the audience is programmers.
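The parenthesis rule proposed above can be sketched as a small checker in Python. This is a rough illustration, not part of any NetRexx processor: the tokenizer is simplified, the function names are hypothetical, and an operand followed by an opening parenthesis is not flagged because the sketch cannot tell a function call from a blank concatenation.

```python
import re

# Symbols/numbers, quoted strings, parentheses, or any other single character.
TOKEN = re.compile(r"[A-Za-z0-9_.]+|'[^']*'|\"[^\"]*\"|[()]|\S")


def is_operand(tok):
    """True for symbols, numbers and string literals."""
    return tok[0].isalnum() or tok[0] in "_.'\""


def has_top_level_implied_concat(expr):
    """Report an implied (blank or abuttal) concatenation at parenthesis depth 0."""
    depth = 0
    prev = None
    for tok in TOKEN.findall(expr):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif (is_operand(tok) and prev is not None
              and (is_operand(prev) or prev == ")") and depth == 0):
            return True  # two operands side by side outside any parentheses
        prev = tok
    return False


# Conditions as they would appear between 'if' and 'then'.
print(has_top_level_implied_concat("x = a + b c"))    # True: would be rejected
print(has_top_level_implied_concat("(x = a + b c)"))  # False: allowed
print(has_top_level_implied_concat("x = (a b) + c"))  # False: allowed
```

Under the proposal, a condition for which the function returns True would have to be wrapped in parentheses before a keyword such as "then" could follow it.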

Second, using a dynamic list of available variable names locks the program into its execution environment.

With the exception of static methods in USES classes when strictargs is off, I believe NetRexx syntax is independent of its execution environment. That loophole was unfortunate -- but of course it only applies in classes with USES classes specified which are rare.

As discussed above. Still a loophole which should be closed.

Agreed; no reason why that shouldn't be done, especially since the USES rule arguably contradicts the later sentence "Method invocations that take no arguments may omit the (empty) parentheses in circumstances where this would not be ambiguous".

The example on page 79 of TRL contains two occurrences of "say 'hello' " in the same short program. The first is valid, and the second is an error. In my opinion, that is confusing and a bad idea. It is, of course, a byproduct of the way that NetRexx avoids the breakage problem.

But that example is deliberately chosen to look confusing and silly, to make the point -- just like your PL/I example above.   'say' is so well known in NetRexx that few, if any, programmers would use 'say' as a variable name. However, there are more obscure keywords in NetRexx; someone might well want to call a variable 'label' or 'digits' for example, as in:

   label='fred'

   digits=3

and if they never use (and maybe never even learned about) those features, all is well and good.   This is particularly important for new programmers who can get very frustrated when trying to pick variable names in a language with many reserved words.

This is exactly why the PL/I model of keyword independence is preferable! While beginning PL/I programmers have to deal with the complexity of the language, they are never frustrated by the problem of variable names being keywords.

If NetRexx adopted the full keyword independence model, there would be no need for any discussion of the sub-keywords in the IF, LOOP and other statements. The issue could be described:

"Keywords in NetRexx are not reserved, and any variable may have the same name as a keyword. For clarity, the use of variable names which are the same as keywords is not recommended."

I believe that relatively few changes are needed to the NetRexx language to move it to the keyword independent model. Once an exhaustive determination of exactly what those changes are is made, then it can be determined if the changes damage the language more than keyword independence helps it.

Consider the following program:
/* NetRexx 4 */
import some.package.
please
explain
this
program
What does a person familiar with only NetRexx 3 make of this? Each of the words might be a method or a new keyword added in version 4 of the language.

Indeed, and that would be true of a future PL/I perhaps, too.   However, it would be rather unlikely that new instructions would be single words, so I would assume that all of those (except 'this' which is an error even now) must be method calls?

There's a good point here, however: the NetRexx ability to reduce notation by allowing a method call without specifying parentheses was probably a mistake, or at least the strictargs option should probably have been the default.   But that generally does not expose a program to later breakage because any method calls have to be in the same class. The exception to that, already mentioned, is static methods in USES classes which could be added at a later date -- that was definitely a mistake (but again it would be OK so long as parentheses were required).

In a future PL/I, because of keyword independence, the words in all of the above statements (assuming ";" added to each) must be keywords. All PL/I statements begin with a keyword unless (like the assignment statement) it can be differentiated by lookahead.

We agree that in NetRexx, these should not be method calls - the "()" is to be required for a single word method call. If NetRexx had keyword independence, all of these statements (including "this") would be new NetRexx keyword statements.

Note that the use of a variable name and a keyword as the first word of a statement is a breakage possibility which must be examined. As with the assignment statement, simple lookahead can determine if the word is a variable name. So requiring "()" provides the necessary lookahead for a method call, just as "=" identifies an assignment.
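A minimal sketch of that one-token lookahead, in Python. It illustrates the keyword-independent model being argued for, not how NetRexx 3 actually classifies clauses; the function name and the pre-split token lists are assumptions made for the example.

```python
def classify_clause(tokens):
    """Classify a clause from its first word plus one token of lookahead."""
    if not tokens:
        return "null clause"
    if len(tokens) > 1 and tokens[1] == "=":
        return "assignment"           # "=" marks the first word as a variable
    if len(tokens) > 1 and tokens[1] == "(":
        return "method call"          # "(" marks the first word as a method name
    return "keyword instruction"      # anything else must start with a keyword


print(classify_clause(["say", "=", "42"]))      # assignment
print(classify_clause(["say", "say"]))          # keyword instruction
print(classify_clause(["explain", "(", ")"]))   # method call
print(classify_clause(["explain"]))             # keyword instruction
```

With this rule the first word never needs to be compared against a list of variables, which is why "say = 42; say say;" would be acceptable under the model.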

Except for the USES case, the lookahead is all within the same class; methods (and their names) have to be identified before individual clauses within methods have to be parsed. Requiring () would indeed make it a little simpler for a parser .. but worse for the programmer/reader. It's also nice to be able to replace a variable by a more 'intelligent' method without changing the code that refers to it.

Third, using a dynamic list of available variable names not only locks the program into its execution environment, it also locks any other program which attempts to correctly process a NetRexx source file into the execution environment.

(Same comments as above.)

That means that any formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include the same logic that the translator uses. It must dynamically determine everything which is not a keyword to identify keywords. I think that is unfortunate since it makes the development of peripheral NetRexx processors more difficult or impossible.

This would be true if keywords depended on the environment, but I don't think they do except for the USES case.

Discussed above

Finally, it makes the language difficult, if not impossible, to define in a formal manner with BNF or another formal definition method. While some may feel this is actually an advantage (!), the truth is that it makes standardization difficult. Essentially all compilable programming languages have formal definitions.

Almost all languages (I would say all 'practical' languages) have to supplement BNF with prose; NetRexx is actually rather clean in that respect -- there is really just one exception rule to add to the pure BNF.

Does NetRexx actually have a "pure BNF" definition? I wasn't aware that had been done, and if it has I'm certainly interested in seeing it. What is the exception rule?

Essentially it's there in the syntax diagrams: collect them all together and you have what you need. The exception rule is that a symbol cannot be a keyword if it is already in use as a variable name.

As a descriptive mechanism, BNF is extremely powerful, and a fully robust definition requires almost no prose to accompany it. The problem is that a fully robust definition can be huge and incredibly boring to generate. For example, it is easier to add "one or more of the following, without duplicates..." than it is to enumerate all the possibilities. In the case of the blank and abutment operators, the exact details of white space and significant blanks must be included rather than adding prose to describe it. The use of an extended BNF can help but not eliminate these description problems.

In addition, a BNF specification can describe ambiguous languages. It can describe unambiguous languages which can not be effectively parsed. There is a great difference between a "BNF specification" and a "practical and usable BNF specification".

The trick is to develop a grammar specification which fits one of the commonly used types such as LR(1), LL(1), LALR, etc. Not only can a specification like that be used to generate scanners, parsers and the like, it can also serve as a detailed spec to be used to check a hand written parser.

Note that in no way am I advocating that an automatically generated parser be used in the NetRexx translator!

Clearly hand written scanners and parsers can be just as effective as automatically generated ones, although experience has shown that they may be somewhat more error prone. What I am suggesting is that for those who are comfortable with generators, having an adaptable grammar which can be used to accurately parse NetRexx could shorten the time necessary to implement a NetRexx source processor. As it stands, having the variable names take priority over keywords (and the USES issue) substantially complicates the problem.

I see this the other way around: LALR parsers, etc., make it easy to write compilers but guarantee a user-unfriendly language. To me that's backwards.

The overall problem of language versions is a complex one, since every change to a language in effect defines a new language. I believe the assumption that any future NetRexx processor should be able to correctly process every NetRexx program without knowing which version of the NetRexx language it is programmed in, is, (while laudable), not worth it if it requires the current NetRexx method of identifying keywords.

As I have suggested, I believe in adopting the approach that NetRexx programs should identify themselves. In HTML web pages, the very first thing is a DOCTYPE declaration of exactly what language the page is written in. I think the same approach could be adopted for NetRexx so that if at some later point the method of identifying keywords is changed, it could be accommodated. I suggest that the language level should be included in the initial comment or in an option which must be specified before the remainder of the program. Existing NetRexx programs would, of course, default to the current language version.
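A minimal sketch of such a self-identification check, in Python. The comment format "/* NetRexx n */" and the default level of 3 are assumptions made for illustration; nothing of the kind is defined by the language today, and the names are hypothetical.

```python
import re

DEFAULT_LEVEL = 3  # assumed: existing programs default to the current version

# Hypothetical convention: the program starts with a comment such as /* NetRexx 4 */
LEVEL_COMMENT = re.compile(r"\s*/\*\s*NetRexx\s+(\d+)\s*\*/", re.IGNORECASE)


def language_level(source):
    """Return the declared language level, or the default if none is declared."""
    match = LEVEL_COMMENT.match(source)
    return int(match.group(1)) if match else DEFAULT_LEVEL


print(language_level("/* NetRexx 4 */\nsay 'hello'"))  # declared level
print(language_level("say 'hello'"))                   # falls back to the default
```

A processor could use the returned level to pick the keyword list and keyword-identification rule for that version before reading the rest of the program.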

But this has exactly the same problem you just expressed concern about: every future compiler, interpreter, formatter, pretty printer, statistics gatherer, intelligent editor etc. for NetRexx must also include logic to handle previous levels of the language that remain supported.

I don't believe that is true if the keyword independence model is used. I was suggesting this as an alternate approach to the breakage problem.

As I said, you and I just disagree on this, Mike. I don't expect you to change anything, but I do hope you will give it some (more) serious thought.

With hindsight there is very little I would change in NetRexx (the strictargs matter is one, and maybe I'd drop USES entirely). The variables-override-keywords rule is extremely simple to understand and gets enormous 'bang for the buck' by making the language safely extendable in a way that Rexx (and almost all other languages) never achieved. PL/I is the nearest to NetRexx -- but at a cost that would make achieving the other goals of Rexx/NetRexx impossible.

All these things are tradeoffs, of course, and it took many iterations of language design before I got to NetRexx.



I still think the keyword independent model is preferable for breakage control, and that the variables-override-keywords approach costs more than it is worth. I realize you currently believe that making NetRexx fully keyword independent conflicts with other NetRexx goals, but I (humbly) suggest that issue might deserve reevaluation.

So we obviously should "agree to disagree" on these points. But if you do yet another iteration of the Rexx family, I hope you will consider this discussion.

A fully-keyword-independent language such as you describe would be extremely interesting. Why not put together a specification document for one?   It wouldn't need to define a full language -- just the basic syntax & semantics and a couple of instructions/constructs. Whether it would be Rexx-like would be up to you.

[By the way from the middle of this week I am traveling for 3 weeks and won't be checking this e-mail, so no hurry to reply to this. :-)]

Mike


billfen

Re: Keywords (was Re: AST, BNF, ANTLR)

On 3/30/2013 12:19 PM, Mike Cowlishaw wrote:

Some brief comments below, in red. I don't think we are going to get much further in this discussion.

Mike

Noted.

I'll write up some ideas and specs as you suggest, and send them to you privately in a few weeks.

Bill


Jerry McBride

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by Mike Cowlishaw

On 03/30/13 12:19, Mike Cowlishaw wrote:
>
> [By the way from the middle of this week I am traveling for 3 weeks and won't be
> checking this e-mail, so no hurry to reply to this. :-)]
>
> Mike
>

Have a safe trip, Mike.


ThSITC

Re: Keywords (was Re: AST, BNF, ANTLR)

In reply to this post by billfen

Hi Bill, *and* Mike,

I shall be also *very interested* to read your specs, as well (when Mike does allow, only, of course ---).

As you all do know, I am *very interested* in this topic, and how the possible resolutions might look like :-)

Happy Easter, again, and do enjoy the free days, in the way you do find appropriate for
yourself, your family and your friends ;-)

Thomas.

--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria, Europe


Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com