tl;dr warning: May be off-topic for some
George,

The Eclipse Java Development Environment (JDE) is an excellent example of full semantic coloring. That effort is huge and obviously contains many thousands of lines of code and hours of work. Since I didn't use it as a model, I'm not able to describe it simply, but all the Java source is available for study. To my knowledge, all the scanning and parsing is hand written.

I'll try to give an overview of how I'm implementing semantic coloring. (This may be more than most readers will want to know about it.) My approach is similar to some other Eclipse efforts (e.g. IMP, Xtext, DLTK, the ANTLR stuff, etc.), although I'm a bit more pragmatic and more interested in performance than OO elegance. I believe that some aspects of my approach are unique. No doubt I'm reinventing the wheel here and there, but I'm doing this for fun :)

The Eclipse framework provides the basic structure of the editor and reconciling process. My plugin uses a JFlex-generated (incremental) token scanner which assigns preliminary color and font characteristics to each token for instantaneous feedback (in the GUI thread). As soon as keyboard input pauses, the token string is incrementally parsed using a JavaCC-generated parser in a background (reconciler) thread. It performs error checking, provides error and warning messages, and overrides the colorization as needed. I provide somewhat more flexibility with the coloring than most, in that each keyword can be colored differently, etc.

The first (currently available) version of the plugin just used the parser to perform statement (NetRexx clause) level parsing and error checking. The next version (current work) uses JJTree to generate a full AST (abstract syntax tree) which is used to generate the outline view and folding structure, group (do-end) balancing and error checking, etc.
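The two-phase scheme above (fast provisional styling in the GUI thread, corrected later by a background reconcile) can be sketched roughly as follows. This is a hypothetical simplification, not the plugin's actual code; the function and style names are invented for illustration:

```python
# Toy model of two-phase semantic coloring (invented names, not the
# real JFlex/JavaCC plugin code): a quick lexical pass styles tokens by
# spelling alone, and a background "reconcile" pass later demotes any
# provisional keyword that turns out to be a declared variable.

PROVISIONAL = {"say": "keyword", "do": "keyword", "end": "keyword"}

def quick_scan(tokens):
    """Instant feedback: style each token by its text alone."""
    return {t: PROVISIONAL.get(t, "identifier") for t in tokens}

def reconcile(styles, declared_vars):
    """Background pass: demote provisional keywords that are really variables."""
    fixed = dict(styles)
    for name in declared_vars:
        if name in fixed:
            fixed[name] = "identifier"
    return fixed

styles = quick_scan(["say", "total", "do"])
styles = reconcile(styles, declared_vars={"do"})
print(styles["say"], styles["total"], styles["do"])  # keyword identifier identifier
```

The point of the split is latency: the cheap pass runs on every keystroke, while the expensive parse-backed pass only runs once typing pauses.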
Once an AST is available, full semantic coloring is primarily a matter of deciding which display characteristics are to be assigned to which language elements, and providing a way to specify that linkage via display preferences. The parser (or a following reconciling process which walks the tree) assigns the display characteristics to the appropriate tokens.

The approach in the first version was to provide warning messages when words (like "do" in the example) are used in an unusual manner - in the next version there may be a way to color them differently as well - it's a good suggestion. Of course, the dynamic interpretation of words that NetRexx uses is problematic. NetRexx assumes that "do" is a method call unless no method named "do" is found, and then it is interpreted as a keyword (as described in "NetRexx 2", page 77). Given that, how should "do" be colored? Editors (and probably most people) tend to interpret it as a keyword first.

While I understand Mike's motivation, I think NetRexx should have an option that allows some set or subset of NetRexx keywords to be reserved. I also believe that if the option were available, many people would use it and accept the minor risk of program breakage if the NetRexx language changes. Without the NetRexx source to work with, this is indeed a problem for me, particularly in the construction of an accurate AST. In my opinion, the root causes of the situation are the blank concatenate operator and the unproven assumption that allowing a program to override keywords in the language is a good idea, but that is another (probably heated) discussion.

I'm currently working on adding incremental parsing support to JavaCC / JJTree, and if anyone is interested in that sort of thing, check out billfen.wordpress.com. I started to write a simple description but got carried away, just like I did with this response :) I'm happy to discuss the internals of semantic coloring further, but perhaps off-line would be more appropriate.
Bill

On 11/12/2010 9:33 PM, George Hovey wrote: Just a slight quibble about the behavior of the Eclipse NetRexx editor. You mention

_______________________________________________ Ibm-netrexx mailing list [hidden email]
I'm all for public (on-list) further discussion of these matters. IMO, debate on existing or upcoming NetRexx tools squarely fits this list's 'topic'.
Maybe a proper tag could be prepended to the subject line so non-interested recipients can easily identify/delete messages belonging to this thread.

- Saludos / Kind regards, David Requena

-----Original Message----- From: Bill Fenlason <[hidden email]> Sender: [hidden email] Date: Sat, 13 Nov 2010 08:57:34 To: IBM Netrexx <[hidden email]> Reply-To: IBM Netrexx <[hidden email]> Subject: Re: [Ibm-netrexx] NetRexx system environment variables
In reply to this post by billfen
In my opinion, the root causes of the situation are
the ... unproven assumption that allowing
a program to override keywords in the language is a good idea, but that is
another (probably heated) discussion.
Chuckle .. the concatenate operator is an interesting one, and
yes, could easily have been designed differently; perhaps a matter of
taste.
But I have to comment on that 'unproven
assumption'. The intent is not to allow programs to override
keywords -- people rarely want to do that -- but the reverse. The design
stops the language keywords overriding programs. And the latter
has most surely been proven to be a bad idea. I have sat in
standards meetings of language designers: C, C++, JavaScript (ECMAScript), and
ANSI Rexx itself, for a start -- where perfectly reasonable and necessary
extensions to those languages were deemed impossible because they would break
existing programs because new keywords were necessary, those keywords would have
to be reserved in some context, and users might have used those names for
variables.
Adding new reserved keywords to a language (perhaps C) that is
designed to be compiled to some 'binary' form before being executed is not too
bad -- it only breaks programs when they are being compiled, not when they are
being run. It's expensive (huge Makes suddenly start to fail) -- but
it won't halt a statically-compiled system that is already running (unless it
dynamically builds itself from source files).
In contrast, if a program is being run from its source (think
Rexx programs, scripts embedded in web pages, etc.) -- then the language the
program is written in cannot add new reserved keywords, because running programs
would suddenly start to fail as new versions of the language processors were
deployed.
If a language is successful, it will need to be
expanded. The NetRexx design allows considerable expansion of the
language without breaking existing programs. It allows new keywords to be added
when and where necessary without breakage (either of programs or of user's
understanding of the language).
I wish I knew why this isn't obvious to other language
designers -- new languages appear almost daily, but basic language design
principles such as this seem to be unknown (or forgotten).
Mike
Mike;
A very interesting essay. I hope it is one of many about your experiences with programming. BobH

On Sat, Nov 13, 2010 at 9:28 AM, Mike Cowlishaw <[hidden email]> wrote:
In reply to this post by Mike Cowlishaw
Hi Mike,
On 11/13/2010 10:28 AM, Mike Cowlishaw wrote:
We can all agree that breaking old programs by adding keywords that can be confused with variable names is bad, and has been proven to be bad. I'm also sure that the intent in NetRexx is to allow the language to be extensible. But please note, in my post I did not say the intent of NetRexx was to allow programs to override keywords. I said "the unproven assumption that allowing a program to override keywords in the language is a good idea", which is not the same thing. Certainly the process of recognizing keywords by omission has the effect of allowing a program to override keywords, and I'm suggesting that effect is a bad idea. Obviously it is a consequence, not an intent. No doubt I should have written my thought more clearly and completely - I wish I had said: "... is not a bad idea.".

From your meetings with other language designers, I can certainly agree that there is a problem adding keywords to any language which uses reserved words. I submit that the problem is that any language which either reserves keywords or must examine words to determine if they are keywords can never be extended by adding a new keyword (unless it is added as a non-reserved word recognizable by context rather than content). Or, as in the case of NetRexx, if keywords are allowed to be overloaded.

In many cases, languages are implemented with scanners which examine token contents to determine if they are keywords, and that is the start of the problems. Certainly the Lex / Yacc model used by college computer science classes is not helpful in that regard, nor is thoughtlessly implementing a language that way without regarding a token marked as a keyword as only a potential keyword rather than a reserved keyword.

Clearly it is possible to design languages which will never have a problem adding new keywords. PL/I figured out how to do it over 40 years ago.
With its huge number of keywords, PL/I has been extended many times over the years, but as far as I know there has never been a problem confusing keywords with variable names. Since that is the case, there has been no problem in adding new keyword statements with complex structures.

All that is necessary in language design and processing to achieve keyword independence is to ensure that under no circumstance is a word examined to determine if it is a keyword. But once it is known by context as a keyword, it can be examined to determine which keyword it is. If the syntax and punctuation (only) unambiguously distinguish keywords from non-keywords, breaking old programs by adding new keywords is just not possible. The guideline is: "Don't look at a word to see if it is a keyword!!"

As long as that simple rule is followed (in the base language and all extensions), even the oldest program can not be broken. Of course, a new compiler or other language processor might not honor an old keyword, but that is a totally different situation. The old program is not broken, just not supported. Keywords need not be reserved, but it does no damage to reserve them in the base language. Breakage can only occur if a new keyword is added and reserved.

Unfortunately, it appears that the NetRexx interpreter examines the content of the first clause word to determine if it is a keyword by searching the loaded methods.
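The "keyword by context, not content" rule can be illustrated with a toy grammar. This is a hypothetical sketch loosely modeled on the classic PL/I observation that IF and THEN may serve as variable names; the grammar and function are invented, not actual PL/I processing:

```python
# Toy demonstration of keyword-by-context: the parser decides each
# word's role from its syntactic position alone, so the same spelling
# can be a keyword in one slot and a variable in another.
# Invented mini-grammar:  IF <var> = <var> THEN <var> = <var> ;
import re

def classify(stmt):
    """Return the role of every distinct operand word, by position only."""
    m = re.fullmatch(r"IF (\w+) = (\w+) THEN (\w+) = (\w+) ;", stmt)
    if not m:
        raise ValueError("not an IF statement in this toy grammar")
    roles = {}
    # The words in the four operand slots are variables purely because
    # of where they sit; IF and THEN are keywords purely because of
    # where *they* sit. No word's spelling is ever consulted.
    for word in m.groups():
        roles.setdefault(word, "variable")
    return roles

# IF and THEN happily used as variable names:
print(classify("IF IF = THEN THEN THEN = IF ;"))
# {'IF': 'variable', 'THEN': 'variable'}
```

Because no operand slot is ever tested against a keyword list, adding a new keyword to this toy grammar could never break a program that used that word as a variable.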
Agreed for reserved keywords, but not for keywords in general. In the case of PL/I, there are no reserved keywords. A PL/I source interpreter might have to look ahead because PL/I does not obey "define before use", but in theory new keywords in the language could never prevent an old program from being interpreted. And any other language in which keywords and variables can be determined by syntax alone (and is implemented that way) would not fail either.
That is true, of course. However, I might point out that while the user's understanding of the language isn't broken, the ability to understand what a program is doing when keywords are overloaded can be a problem. Imagine trying to understand a program if "do", "end" or other structural keywords are overloaded - possibly unintentionally. In my opinion, there is quite a difference between overloading a keyword and extending or overriding a method or overloading an operator. (Languages without keywords in which all statements are method calls can be discussed another day :)

No doubt we agree that ideally, NetRexx should have no reserved words and unrestricted keyword extensibility. I think allowing keywords to be overloaded is much too high a price to achieve that. The problem with keyword overloading is that it is subject to dynamic breakage as well, but more importantly, programs other than the NetRexx interpreter are unable to process NetRexx source accurately without performing a non-trivial part of the interpreter processing, or else by assuming that the keywords are not overloaded. I think that is a significant drawback to the language, and I can not think of any other major (or even minor) language with that characteristic (other than source input to a preprocessor).

Forcing every NetRexx source processing program to examine all the classes and methods which would be loaded during execution or interpretation (to search for keywords by omission) is a significant burden. There are many kinds of programs which process source code other than compilers or interpreters. I'm concerned with intelligent editors, but there are also formatting programs, statistic counters, flow analyzers, obfuscators, optimizers, auto documenters, language converters, and so on. No doubt a long list of programs which process Rexx source code could be made, and if NetRexx is successful, the list will be matched by NetRexx source processors.
If that happens, it is likely that most or all of them will operate under the assumption that NetRexx keywords are never overloaded, which rather defeats the purpose. Certainly the NetRexx interpreter will not break old programs if the language changes, but many of the other source processing programs may be broken by old code or require modification and "Language Level" options. Is that really a desirable situation? Wouldn't it be better if keywords were not overloadable?

I believe NetRexx is close to the PL/I model of context identified keywords, with the exception of the blank concatenation operator (which prevents recognizing the end of an expression) and the fact that the first word of a method statement is not syntactically identified as a non-keyword (and perhaps another minor glitch or two). I would argue that the goal of extensibility might be better served by fixing those problems and making NetRexx keyword independent rather than by using "keyword by content", which allows keyword overloading.

Certainly designing a language which identifies keywords by context is not without its costs. PL/I is rather "clunky" with its apparently gratuitous parentheses and commas. And how can one argue against the simplicity, clarity and ease of use of the Rexx family of languages? There is clearly a trade-off. For example, those of us old enough to remember the original Basic may remember the seemingly unnecessary "Let" keyword required before assignment statements. It guaranteed that the first statement word was a keyword, thus avoiding the keyword breakage problem without source look ahead. Am I suggesting that there should be a keyword or punctuation in front of method statements? Perhaps, although it detracts from the simplicity of the language, and there may be other solutions.

Bottom line, I'm suggesting that changing NetRexx so that it does not allow keyword overloading would be a significant language improvement.
Of course, it may be too late - NetRexx is in its second version and there is already a substantial code base. But if there is a "NetRexx 3", I would definitely lobby for it to have true keyword independence rather than keyword by content.
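The "Let" point above can be sketched in a few lines: if every assignment must begin with LET, the first word of a statement is always a keyword by construction, so a processor can dispatch with no lookahead and no symbol table. This is a toy dispatcher over an invented mini-Basic, not real Basic:

```python
# Toy illustration (hypothetical mini-Basic): mandatory leading
# keywords make statement classification trivial and future-proof,
# since no user name can ever occupy the first slot.

def statement_kind(stmt):
    head = stmt.split(None, 1)[0].upper()
    # The first word is always a keyword by construction,
    # so a simple table lookup is a complete classifier.
    kinds = {"LET": "assignment", "PRINT": "output", "GOTO": "jump"}
    return kinds.get(head, "unknown")

print(statement_kind("LET X = 1"), statement_kind("PRINT X"))
# assignment output
```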
I would certainly agree that languages in which keywords can not be differentiated from non-keywords without examining the word content will always have extensibility problems of one kind or another. We should certainly teach "If you want your language to be extensible, find keywords by context, not content."
Bill
Bill,
thanks for the long reply. It would seem we agree on the objective,
but disagree on the means to achieve that end. Maybe it's time for an
entirely new language :-). Actually it might be nice to have a
Rexx-like language that covers what C does now -- low level access to registers,
etc.
But I
think I'll leave that to someone else -- I've designed more than enough
languages for one lifetime, and am weary of the same old arguments and going
back to scratch on every discussion.
Cheers
-- Mike
In reply to this post by billfen
I can't say that I understand this discussion very well but
unfortunately I may need to understand it better because one of my
projects will eventually need a parser for NetRexx code (as well as
Java and perhaps others). If I understand this discussion even a
little bit, it seems that the standard parser generators like ANTLR
or JavaCC won't work with NetRexx source code. Is that a true
conclusion?
I understand the problem with full support for editor highlighting but that is not what concerns me. What about other needed tools for NetRexx such as a graphical IDE or even plugins for existing IDEs like Eclipse or NetBeans? Are we on our own for providing the needed parsing tools? Do we just need to wait and hope to pick up Mike's parse code from the open source NetRexx compiler? Can anyone clarify this situation for me? TIA, -- Kermit

On 11/14/2010 1:46 PM, Bill Fenlason wrote: Hi Mike,
In reply to this post by Mike Cowlishaw
thank Mike, for your contribution :-) and Mike, for your frank answer :-) both for you are *true*. I did deliberately *NOT* reply, but I did file all the pro's and con's. Maybe we can discuss a *possible solution* (template pattern recognition) with a beer in the Netherlands, Mike? I hope to meet you all at the REXXLA 2010 meeting! Thomas Schneider. ======================================================= On 15.11.2010 08:00, Mike Cowlishaw wrote:
Tom. (ths@db-123.com)
In reply to this post by Mike Cowlishaw
Mike,
On 11/15/2010 2:00 AM, Mike Cowlishaw wrote:
Right on about the objectives and our minor disagreement. I've been thinking about a better C-level language for some time - no details finalized yet. Of course Rexx sets the standard for natural, easy to use programming languages. I'm more an implementer than a language designer (just a novice in that arena), but if I ever get specs on paper you would be the first I ask to review it :)
If you haven't already done so, I hope you write an essay or blog entry on "How to Design a Programming Language". With your experience and accomplishments, it should be a classic and required reading for all computer science students.
Bill
Hi Bill,
first of all, my apologies that I twice said 'Mike' and not 'Bill'. :-( Simply a mistyping. Second, I'm thinking a lot about the meaning / denotation of languages. I would really like to discuss this (and also my trial implementation in ReyC and PP) with you - off-line, if ibm-netrexx wouldn't be interested in the details. :-) Any chance to meet you at the REXXLA 2010 Symposium? Thomas Schneider.

============================================================ On 15.11.2010 15:21, Bill Fenlason wrote: Mike,
Tom. (ths@db-123.com)
In reply to this post by billfen
On 15 Nov 2010, at 14:21, Bill Fenlason wrote:
I second that! Connor.
I've often wondered if the 'idea' of NetREXX predated JAVA itself.
?? BobH Richardson, TEXAS

On Mon, Nov 15, 2010 at 2:58 PM, Connor Birch <[hidden email]> wrote:
In reply to this post by Kermit Kiser
Kermit,
On 11/15/2010 3:44 AM, Kermit Kiser wrote: I can't say that I understand this discussion very well but unfortunately I may need to understand it better because one of my projects will eventually need a parser for NetRexx code (as well as Java and perhaps others). If I understand this discussion even a little bit, it seems that the standard parser generators like ANTLR or JavaCC won't work with NetRexx source code. Is that a true conclusion?

Well, sort of. From a practical standpoint, processing NetRexx source with a generated scanner / parser will work reasonably well. I use a JavaCC parser in my Eclipse plugin. The primary source of problems is if NetRexx keywords are used as variable names in the source, and generally that is more the exception than the rule.

The NetRexx 2 document (page 13) describes the situation: "Similarly, the rules for keyword recognition allow instructions to be added whenever required without compromising the integrity of existing programs. There are no reserved keywords in NetRexx; variable names chosen by a programmer always take precedence over recognition of keywords. This ensures that NetRexx programs may safely be executed, from source, at a time or place remote from their original writing – even if in the meantime new keywords have been added to the language."

More details are on page 77: "Further, if a current local variable, method argument, or property has the same name as a keyword then the keyword will not be recognized. This important rule allows NetRexx to be extended with new keywords in the future without invalidating existing programs."

Overall, the biggest problem with using a generated parser is that the NetRexx language is designed to be interpreted, not compiled. The example on page 77 illustrates the kind of problem that may occur if a grammar generated parser is used.
"Thus, for example, this sequence in a program with no say variable:

say 'Hello'
say('1')
say=3
say 'Hello'

would be a say instruction, a call to some say method, an assignment to a say variable, and an error."

Note that the first and fourth statements are exactly the same, but require different processing during and after parsing. The first statement "say" is a keyword because there is no "say" variable at that point. The third statement creates a variable named "say". The fourth statement is an error because a variable named "say" followed by 'Hello' ... is an error.

A simple generated parser will most likely parse the first and fourth statements the same way, and thus will be unable to mirror the NetRexx interpreter processing. A more sophisticated parser might attempt to keep a symbol table (or AST) and be able to alter the token type of "say" from "keyword" to "variable" when the third statement is processed so the fourth statement can be parsed correctly.

In my opinion, attempting to use a generated parser for NetRexx is reasonably OK if the usage of keywords as variable names can be ruled out, but is certainly a major headache if not. The easy way out for a NetRexx source processing program which uses a generated parser will be to mandate that all NetRexx keywords will be treated as reserved words and let the chips fall where they may. Unfortunately this may be exactly opposite of what Mike intended.

Another significant problem is that if and when a new keyword is added to NetRexx, the traditional parser will probably have to be changed, and old programs which used the new keyword as a name may not be processed correctly by the revised parser. An old program which used the new keyword as a name will be correctly processed by the NetRexx interpreter. That was Mike's primary objective: don't break old programs when the NetRexx language is extended. I believe he succeeded with that.
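The page 77 example can be mimicked with a toy clause classifier that keeps exactly the kind of symbol table a "more sophisticated parser" would need. This is a minimal sketch with invented names, under the assumption that only this one precedence rule matters; real NetRexx clause parsing is far more involved:

```python
# Toy classifier for the page-77 "say" example (invented helper, not
# real NetRexx parsing): "say" is a keyword only until a variable named
# "say" comes into existence, so classification is inherently dynamic.
import re

def classify_clause(clause, known_vars):
    word = re.match(r"\w+", clause).group()
    rest = clause[len(word):].lstrip()
    if rest.startswith("="):          # assignment: creates the variable
        known_vars.add(word)
        return "assignment"
    if word in known_vars:            # the name takes precedence over the keyword
        return "error" if rest else "variable term"
    if rest.startswith("("):
        return "method call"
    return "keyword instruction"

known = set()
program = ["say 'Hello'", "say('1')", "say = 3", "say 'Hello'"]
print([classify_clause(c, known) for c in program])
# ['keyword instruction', 'method call', 'assignment', 'error']
```

Note that the first and fourth clauses are textually identical yet classify differently, which is precisely why a stateless generated parser cannot mirror the interpreter.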
This is more a problem for the traditional parser methodology than with NetRexx, in that traditional parsers in the generated scanner / parser model do not handle languages without reserved words particularly easily. What is required in the parser is the ability to change the type of a keyword token (identified by the scanner) to an identifier token when that is appropriate. (At least that is the approach I used in my NetRexx and PL/I grammars for JavaCC.) The example above (which demonstrates that it is a dynamic problem) just makes the parser all the more complicated.

I understand the problem with full support for editor highlighting but that is not what concerns me. What about other needed tools for NetRexx such as a graphical IDE or even plugins for existing IDEs like Eclipse or NetBeans? Are we on our own for providing the needed parsing tools? Do we just need to wait and hope to pick up Mike's parse code from the open source NetRexx compiler?

Perhaps readers can now understand why I have ranted on and on about the NetRexx source release. (Being a PIA wasn't my intent, just a consequence :)

Can anyone clarify this situation for me? TIA,

The problem I was addressing earlier is one of the finer points related to the need to search all the included classes rather than the fundamental problem of keyword and variable name clashes. If a keyword name is a method name in an included class, it is theoretically possible for the keyword to be overloaded. That means if a source processing program is to precisely mirror the operation of the NetRexx translator, it must also process dynamically (i.e. examine data not within the NetRexx source) in some situations. Unfortunately the example shown on page 77 did not include "say ;". Our interpretation is that it could either be a "say" keyword statement or an included method call in some cases.
Certainly I may have created a tempest in a teapot - I believe that the use of the strictargs option may eliminate this particular problem. That means "say ;" would be a keyword (if not already a variable), and "say() ;" would be a method call. Unfortunately the default is nostrictargs, which makes the parens optional when there are no arguments.

I honestly don't know all the situations in which names external to the source can be confused with keywords. What I do know is that the NetRexx interpreter gives priority to the name, and traditional parsers will (most likely) give priority to the keyword. (I wish I had the NetRexx source to research this in depth.)

As I said earlier, I expect that many programs which process NetRexx source will use ANTLR, JavaCC or some other traditional parser approach, and require that the NetRexx keyword set be reserved. If that is done (without also handling the dynamic problem and the keyword-name clashes), the source processing program may not correctly anticipate the operation of the NetRexx interpreter. It is not known how often that will occur. Perhaps so infrequently that it really won't matter, or perhaps it will be a significant problem.

My final suggestion was to change the NetRexx language syntax so that names can never be confused with keywords. I think that would eliminate several problems. I am still quite troubled by the implications of the example on page 77, in which the first "say 'Hello' " is valid but the second "say 'Hello' " is an error. I believe the language would be better (and less surprising - see "astonishment factor", page 13) if, when the first token of a statement is a word followed by a space (and then not "="), it were always a keyword, but that is another discussion. Again, that would be the case if all keywords are identified by context (a la PL/I) and thus never confused with names.

Bill

PS - Sorry for the long append. Hopefully it answered more questions than it created.
Bill,
I've been following your remarks on keywords, etc with great interest. Here are a couple of things I've wondered about.
On Mon, Nov 15, 2010 at 10:48 PM, Bill Fenlason <[hidden email]> wrote:
Hello Bill, George,
thank you both for your comments. One comment from me (in addition): Please do NOT change the operation of the ABUT and BLANK operators. They are among the *most useful*, *genius* and *elegant* features of the REXX family of languages :-) I would, however, go a step further by allowing method-name BLANK parameters (in addition to Mike's: method-name(parameters)). Supposing that we have a syntax driven parser, the parameters might even have a syntax of their own, and we can do funny things in NetRexx, i.e. implement ADDRESS TSO EXECIO parameters (in NetRexx). I do call these ACTIONS in my trial Rey Compiler implementation... Note, however, that I'm holding those definitions EXTERNAL to the program, in a plain text file, read in when the source language changes ... Thomas.

======================================================== On 16.11.2010 17:20, George Hovey wrote: Bill,
Tom. (ths@db-123.com)
In reply to this post by George Hovey-2
Correct me if I'm wrong (which in this group goes without saying) but
did not this whole issue arise from a litany of the problems parsing NetRexx source code? It seems to me that these are not issues of _programming_ in NetRexx or of maintaining a NetRexx application, regardless of the size. They instead reflect the difficulty of trying to externally parse a NetRexx program's source statements, certainly a second-order problem. Statically parsing the source of any interpreted language is a non-trivial task in the first place.

Mike has unambiguously stated that he sides with the language user over the language processor developer. Making Rexx/NetRexx behavior easier for the LPD is a rare benefit to an individual at the perpetual expense of all users. The fact that NetRexx confounds antlr, yacc, javacc, et ilk is a plus in my book.

-Chip-

On 11/16/10 16:20 George Hovey said:
> Bill,
> I've been following your remarks on keywords, etc with great interest. Here are a couple of things I've wondered about.
>
> * I have a nagging doubt about the claim that NetRexx's current approach to new keywords won't break old programs. Perhaps I misunderstand but it seems to me that it could hamper the user's ability to maintain a program, and this might have serious consequences in, say, managing a big project subject to continual fixes and updates. Since we can't predict how a user might run afoul of a keyword conflict (eg, how extensively it might affect his code), he could conceivably be placed in a considerable bind if a NetRexx change came at a delicate stage of some program modification. Or is this argument defective? Even if so, it still sounds desirable (in principle) to definitively remove keywords as an issue, as you describe. Specifically, what would have to change? Could old and new ways coexist?
> * You mention that the "blank concatenate operator" is a sticking point in language processing. Does this also apply to concatenation by abutment? How would you propose to handle concatenation?
> > > On Mon, Nov 15, 2010 at 10:48 PM, Bill Fenlason <[hidden email] > <mailto:[hidden email]>> wrote: > > Kermit, > > On 11/15/2010 3:44 AM, Kermit Kiser wrote: >> I can't say that I understand this discussion very well but >> unfortunately I may need to understand it better because one of my >> projects will eventually need a parser for NetRexx code (as well >> as Java and perhaps others). If I understand this discussion even >> a little bit, it seems that the standard parser generators like >> ANTLR or JavaCC won't work with NetRexx source code. Is that a >> true conclusion? > > Well, sort of. From a practical standpoint, processing NetRexx > source with a generated scanner / parser /will /work reasonably > well. I use a JavaCC parser in my Eclipse plugin. The primary > source of problems is if NetRexx keywords are used as variable names > in the source, and generally that is more the exception than the rule. > > The NetRexx 2 document (page 13) describes the situation: > "Similarly, the rules for keyword recognition allow instructions to > be added whenever required without compromising the integrity of > existing programs. There are *no *reserved keywords in NetRexx; > variable names chosen by a programmer always take precedence over > recognition of keywords. This ensures that NetRexx programs may > safely be executed, from source, at a time or place remote from > their original writing – even if in the meantime new keywords have > been added to the language." > > More details are on page 77:: "Further, if a current local > variable, method argument, or property has the same name as a > keyword then the keyword will not be recognized. This important rule > allows NetRexx to be extended with new keywords in the future > without invalidating existing programs." > > Overall, the biggest problem with using a generated parser is that > the NetRexx language is designed to be interpreted, not compiled. 
> The example on page 77 illustrates the kind of problem that may
> occur if a grammar-generated parser is used.
>
> "Thus, for example, this sequence in a program with no say variable:
>     say 'Hello'
>     say('1')
>     say=3
>     say 'Hello'
> would be a say instruction, a call to some say method, an assignment
> to a say variable, and an error."
>
> Note that the first and fourth statements are exactly the same, but
> require different processing during and after parsing. In the first
> statement "say" is a keyword because there is no "say" variable at
> that point. The third statement creates a variable named "say".
> The fourth statement is an error because a variable named "say"
> followed by 'Hello' is not a valid clause.
>
> A simple generated parser will most likely parse the first and
> fourth statements the same way, and thus will be unable to mirror
> the NetRexx interpreter's processing. A more sophisticated parser
> might attempt to keep a symbol table (or AST) and be able to alter
> the token type of "say" from "keyword" to "variable" when the third
> statement is processed, so the fourth statement can be parsed correctly.
>
> In my opinion, attempting to use a generated parser for NetRexx is
> reasonably OK if the use of keywords as variable names can be
> ruled out, but is certainly a major headache if not. The easy way
> out for a NetRexx source processing program which uses a generated
> parser will be to mandate that all NetRexx keywords be treated
> as reserved words and let the chips fall where they may.
> Unfortunately this may be exactly the opposite of what Mike intended.
>
> Another significant problem is that if and when a new keyword is
> added to NetRexx, the traditional parser will probably have to be
> changed, and old programs which used the new keyword as a name may
> not be processed correctly by the revised parser.
>
> An old program which used the new keyword as a name will be
> *correctly* processed by the NetRexx interpreter.
> That was Mike's
> primary objective: don't break old programs when the NetRexx
> language is extended. I believe he succeeded with that.
>
> This is more a problem with the traditional parser methodology than
> with NetRexx, in that traditional parsers in the generated
> scanner/parser model do not handle languages without reserved words
> particularly easily. What is required in the parser is the ability
> to change the type of a keyword token (identified by the scanner)
> to an identifier token when that is appropriate. (At least that is
> the approach I used in my NetRexx and PL/I grammars for JavaCC.)
> The example above (which demonstrates that it is a dynamic problem)
> just makes the parser all the more complicated.
>
>> I understand the problem with full support for editor highlighting
>> but that is not what concerns me. What about other needed tools
>> for NetRexx such as a graphical IDE or even plugins for existing
>> IDEs like Eclipse or NetBeans? Are we on our own for providing the
>> needed parsing tools? Do we just need to wait and hope to pick up
>> Mike's parse code from the open source NetRexx compiler?
>
> Perhaps readers can now understand why I have ranted on and on about
> the NetRexx source release. (Being a PIA wasn't my intent, just a
> consequence :)
>
>> Can anyone clarify this situation for me? TIA,
>
> The problem I was addressing earlier is one of the finer points
> related to the need to search all the included classes, rather than
> the fundamental problem of keyword and variable name clashes.
>
> If a keyword name is a method name in an included class, it is
> theoretically possible for the keyword to be overloaded. That means
> that if a source processing program is to *precisely* mirror the
> operation of the NetRexx translator, it must also process
> dynamically (i.e. examine data not within the NetRexx source) in
> some situations. Unfortunately the example shown on page 77 did not
> include "say ;".
> Our interpretation is that it could either be a
> "say" keyword statement or an included method call in some cases.
>
> Certainly I may have created a tempest in a teapot - I believe that
> the use of the *strictargs* option may eliminate this particular
> problem. That means "say ;" would be a keyword (if not already a
> variable), and "say() ;" would be a method call. Unfortunately the
> default is *nostrictargs*, which makes the parens optional when
> there are no arguments. I honestly don't know all the situations in
> which names external to the source can be confused with keywords.
> What I do know is that the NetRexx interpreter gives priority to the
> name, and traditional parsers will (most likely) give priority to
> the keyword. (I wish I had the NetRexx source to research this in
> depth.)
>
> As I said earlier, I expect that many programs which process NetRexx
> source will use ANTLR, JavaCC or some other traditional parser
> approach, and require that the NetRexx keyword set be reserved.
>
> If that is done (without also handling the dynamic problem and the
> keyword-name clashes), the source processing program may not
> correctly anticipate the operation of the NetRexx interpreter. It
> is not known how often that will occur. Perhaps so infrequently
> that it really won't matter, or perhaps it will be a significant
> problem.
>
> My final suggestion was to change the NetRexx language syntax so
> that names can never be confused with keywords. I think that would
> eliminate several problems.
>
> I am still quite troubled by the implications of the example on page
> 77, in which the first "say 'Hello'" is valid but the second "say
> 'Hello'" is an error. I believe the language would be better (and
> less surprising - see "*astonishment factor*", page 13) if, when the
> first token of a statement is a word followed by a space (and then
> not "="), it were always treated as a keyword, but that is another
> discussion.
> Again, that would be the case if all keywords were
> identified by context (à la PL/I) and thus never confused with names.
>
>> -- Kermit
>
> Bill
>
> PS - Sorry for the long append. Hopefully it answered more
> questions than it created.
>
> _______________________________________________
> Ibm-netrexx mailing list
> [hidden email]
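[Editorial note: the dynamic "say" disambiguation described above can be sketched in a few lines. This is a hypothetical simulation of the page-77 behavior as Bill describes it, not the NetRexx translator's actual logic; the function name `classify` and the result labels are invented for illustration.]

```python
import re

def classify(clauses):
    """Classify each clause the way page 77 describes, tracking variables
    created by earlier assignments (a hypothetical sketch, not NetRexx)."""
    variables = set()
    results = []
    for clause in clauses:
        word, rest = re.match(r"(\w+)\s*(.*)", clause.strip()).groups()
        if rest.startswith("="):
            variables.add(word)                # assignment creates the variable
            results.append("assignment")
        elif word in variables:
            # the variable takes precedence; a bare trailing string is invalid
            results.append("error" if rest else "variable")
        elif rest.startswith("("):
            results.append("method call")      # no such variable, parens follow
        else:
            results.append("keyword instruction")
    return results
```

Run on the four statements of the page-77 example, this yields the four different classifications the book lists, with the first and fourth (textually identical) clauses classified differently — which is exactly what a static grammar-generated parser cannot do without a symbol table.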
Chip,
If it is true that NetRexx's policy on handling new keywords is deficient (and I've indicated my uncertainty on this point), it is surely of burning interest to the user, and not just an arcane point.

On Tue, Nov 16, 2010 at 1:13 PM, Chip Davis <[hidden email]> wrote:
> Correct me if I'm wrong (which in this group goes without saying) but
> did not this whole issue arise from a litany of the problems parsing
> NetRexx source code?
On 11/16/2010 11:20 AM, George Hovey wrote:
> Bill,

Suppose NetRexx adds an "xyz" statement, and an old program has a variable or method named "xyz". When the NetRexx interpreter processes the old program, "xyz" will not be recognized as a keyword, even though the new interpreter can process "xyz" statements. The trade off is that the old program cannot use the new "xyz" statement until it is changed so that "xyz" is no longer a variable or method name. That is how NetRexx successfully remains keyword independent. If programs are being modified after the "xyz" statement is added, users will have the opportunity to make the changes at that point (if necessary). But old unmodified programs will always work.

I've always felt that the invention of the blank (and abutment) operators was brilliant. The problem I referred to is that (in essence) an expression consists of a variable followed by any number of operator/variable pairs. It gets more complicated than that, but when all is said and done, the end of an expression (in other languages) can be recognized because two consecutive variables are not allowed. A following word can then be known to be a keyword (by context). Because of the blank concatenate operator, a word following a variable in an expression can be either a keyword or another variable.

In Rexx any keyword following an expression is reserved, but in NetRexx variables are given priority over keywords. For example, the sequence:

  to = 'abc'; loop i = 1 to 3;

will likely fail because it is equivalent to

  loop i = 1 || " " || 'abc' || " " || 3;

As Mike mentioned in an earlier append, there are various ways to design the concatenate operator.

Bill
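[Editorial note: Bill's loop example can be illustrated with a tiny hypothetical sketch (Python, not NetRexx). Once `to` is bound as a variable, blank concatenation turns the phrase `1 to 3` into a single string value rather than a loop bound; the function name `blank_concat` is invented for illustration.]

```python
def blank_concat(expression, variables):
    """Substitute any words bound as variables, then join adjacent terms
    with a single blank - a minimal model of the blank concatenate operator."""
    parts = [str(variables.get(tok, tok)) for tok in expression.split()]
    return " ".join(parts)
```

With `to = 'abc'` in scope, `blank_concat("1 to 3", {"to": "abc"})` produces the string `1 abc 3` instead of leaving `to` free to be recognized as the loop keyword, which is the failure Bill describes.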
>The trade off is that the old program cannot use the
new "xyz" statement until it is changed so that "xyz" is no longer a
variable or method name.
Exactly the case I had in mind. This could conceivably cause serious pain, and there is no justification for calling it improbable, since the probability is unknowable. Can we say that it is unimportant to head off this possibility, especially if it can be totally prevented? [I have no idea how, or whether the cost of doing so would be acceptable.] As I've noted before, at least one language (FORTRAN) has no reserved words and can recognize a keyword followed by a variable with or without an intervening space. Has the secret of this technology died with Backus?

I take it you see no need to adjust NetRexx's ideas on concatenation, which I'm glad to hear, as it would break ALL of my programs :) !

On Tue, Nov 16, 2010 at 2:34 PM, Bill Fenlason <[hidden email]> wrote: