My NetRexx source files are in character encoding ISO-8859-1 (Latin 1). I have recently conceived a need for including Greek letters (complete upper and lower case set), as well as ASCII letters, in symbols (variable names). My understanding is that this is permitted in both NetRexx and Java, assuming Unicode is used. Since this is the only foreign-language support I need, I guess the most economical Unicode encoding would be UTF-8.
I'm working under Windows 7, and I use the high-legibility ClearType font "Consolas" and hope to continue using it, if possible. The font directory shows it as covering Latin, Greek and Cyrillic; it doesn't mention Unicode. Can anyone offer advice on how to get myself to the point where I can use Greek letters in symbols?

Thanks,
George

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/
George,
I'm no expert, but since nobody has responded, let me chime in.
In general this shouldn't be too much of a problem. NetRexx will happily process Unicode text. You need to set jEdit to save your files in UTF-8, either globally (Utilities\Global Options) or on a per-buffer basis (Utilities\Buffer Options).
In fact text encoding is only relevant for String or Rexx literals in your program. As you should be internationalizing these with ResourceBundle or similar, you should only need to worry about encoding in the corresponding *.properties files containing them in different translations.
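As an illustration of keeping translated literals out of the source, here is a minimal Java sketch. The class name, the key "greeting" and the in-memory properties text are hypothetical stand-ins for a real translation file (e.g. a Messages_el.properties on the classpath, which ResourceBundle.getBundle would normally locate). Note that since Java 9, .properties bundles loaded that way are read as UTF-8 by default; older releases expected ISO-8859-1 with \uXXXX escapes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Minimal sketch: translated text lives in properties data, not in the source.
// The key "greeting" and the in-memory text are hypothetical stand-ins
// for a real translation file.
final class GreekBundleDemo {

    // Builds a bundle from properties text. A Reader supplies characters,
    // so byte encoding has already been dealt with at this point.
    static ResourceBundle fromText(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience lookup of a single key.
    static String lookup(String propertiesText, String key) {
        return fromText(propertiesText).getString(key);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII; the value is the
        // Greek word spelled Gamma, epsilon, iota, alpha.
        String text = "greeting=\u0393\u03b5\u03b9\u03b1\n";
        String greeting = lookup(text, "greeting");
        System.out.println(greeting.length());                  // 4 characters
        System.out.println(greeting.codePointAt(0) == 0x0393);  // starts with capital Gamma
    }
}
```

The program prints 4 and then true: the Greek value arrives intact because the properties text is handed over as characters.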
All UTF variants (UTF-8/16/32) can encode every Unicode code point. UTF-8 uses fewer bytes than UTF-16 or UTF-32 when the text consists mainly of ASCII; Greek letters take two bytes each in UTF-8, the same as in UTF-16. I don't use Win 7, so I don't really know about Greek support in Consolas. I don't know how you would enter Greek characters on a US keyboard either.

HTH.

-- Saludos / Regards, David Requena
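To put numbers on the size comparison, here is a small Java sketch (class and method names are illustrative) that counts the bytes a string needs in UTF-8 and in UTF-16BE, and asks whether ISO-8859-1 can represent it at all. UTF-16BE is used rather than plain UTF-16 because Java's UTF-16 encoder prepends a two-byte byte-order mark, which would skew the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: bytes needed for the same text in UTF-8 vs UTF-16BE, and whether
// ISO-8859-1 (Latin 1) can represent the text at all.
final class EncodingSizes {

    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String ascii = "alpha";
        String greek = "\u03b1\u03b2\u03b3"; // Greek small alpha, beta, gamma
        for (String s : new String[] {ascii, greek}) {
            System.out.println(s.length() + " chars: UTF-8=" + byteCount(s, StandardCharsets.UTF_8)
                    + " bytes, UTF-16BE=" + byteCount(s, StandardCharsets.UTF_16BE)
                    + " bytes, Latin-1 ok=" + fitsLatin1(s));
        }
    }
}
```

This prints "5 chars: UTF-8=5 bytes, UTF-16BE=10 bytes, Latin-1 ok=true" for the ASCII word and "3 chars: UTF-8=6 bytes, UTF-16BE=6 bytes, Latin-1 ok=false" for the Greek one. For source code that is mostly ASCII with some Greek identifiers, UTF-8 is therefore the smaller choice, and Latin 1 is ruled out entirely.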
David,
Thanks for your thoughts. I didn't know about ResourceBundle, so I'll look into that. I think it's fairly established that String and Rexx literals can contain any Unicode character. But my need is to use Greek letters in identifiers (variable names), e.g. If this is not allowed I'm on a wild goose chase. So I think I'll put that
David,
Sorry, I inadvertently hit a SEND key. To continue...

Thanks for your thoughts. I didn't know about ResourceBundle, so I'll look into that. I think it's fairly well established that String and Rexx literals and comments can contain Unicode characters. But my need is to use Greek letters in identifiers (variable names), e.g.

    AnIdentifierContainingGreekLetters = Rexx (or BufferedReader or any type)

If this is not allowed, then I'm on a wild goose chase. So I think I'll put that narrower question on a separate thread before continuing. I've studiously avoided Unicode issues in the past, but I guess I have to face up to them. I'm working on the Java Tutorial.

Re the keyboard entry issue, I use the jEdit "Character Map" plugin. This works for entering characters from the upper half of ISO-8859-1, and the documentation says it can be made to work with Unicode. It was written by Slava, so presumably it is not a half-baked effort.

Thanks again,
George
George,
    [venetia-2:~/test] rvjansen% nrc greek
    NetRexx portable processor, version 3.00
    Copyright (c) RexxLA, 2011. All rights reserved.
    Parts Copyright (c) IBM Corporation, 1995,2008.
    Program greek.nrx
      2 +++ βγδ = 'does it work? the identifier is called βγδ'
        +++ ^
    +++ Error: Unexpected character found in source: '?' (hexadecimal encoding: 2264)
    Compilation of 'greek.nrx' failed [one error]
    [venetia-2:~/test] rvjansen%

TNRL, p. 31, mentions two character sets: "the first is used to express the NetRexx program itself, and is the relatively small set of characters [...]". So that really says it all.

best regards,
René.
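The reported code 2264 is a useful clue. In UTF-8, β (U+03B2) is stored as the two bytes CE B2; read as Mac OS Roman, 0xCE becomes Œ (a letter) and 0xB2 becomes ≤ (U+2264), which is not a letter. That is consistent with the UTF-8 file having been decoded with the platform's default charset rather than as UTF-8. The Java sketch below reproduces the effect, using ISO-8859-1 instead of Mac OS Roman only because ISO-8859-1 is guaranteed to exist on every Java runtime; the helper names are hypothetical, and Java's identifier rules stand in for NetRexx's.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: what a tokenizer "sees" when UTF-8 source bytes are decoded with
// the wrong charset. ISO-8859-1 stands in for the platform default here.
final class Mojibake {

    // Encode 'text' as UTF-8, then decode those bytes as if they were 'assumed'.
    static String misread(String text, Charset assumed) {
        return new String(text.getBytes(StandardCharsets.UTF_8), assumed);
    }

    // Index of the first char rejected by Java's identifier rules, or -1.
    static int firstBadIdentifierChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (i == 0) ? Character.isJavaIdentifierStart(c)
                                  : Character.isJavaIdentifierPart(c);
            if (!ok) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String beta = "\u03b2"; // Greek small letter beta
        // UTF-8 encodes beta as bytes CE B2; Latin 1 turns them into two
        // characters, U+00CE (a letter) and U+00B2 (superscript two).
        String seen = misread(beta, StandardCharsets.ISO_8859_1);
        int bad = firstBadIdentifierChar(seen);
        System.out.printf("%d chars, first rejected index %d%n", seen.length(), bad);
        if (bad >= 0) {
            System.out.printf("rejected char: U+%04X%n", (int) seen.charAt(bad));
        }
    }
}
```

It prints "2 chars, first rejected index 1" and "rejected char: U+00B2": one Greek letter turns into two characters, and the second is not acceptable in a name, which is the same shape of failure as in the compiler output.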
And, if I am not mistaken, this is the table in RxClauser that RxToken uses to return the type of a token:
    /* Here is the setup table for 'standard' NetRexx characters. */
    /* Note: white space is handled before translation, for speed. */
    /* This table handles the 'core' characters. Unicode letters and */
    /* digits (>'\x7f') are handled in-line, when encountered. */
    /* White space characters are also in the table. */
    /* Note: the euro character (\u20ac) is not in the table as it */
    /* would cause the table to become 8K long; it's special-cased. */
    intrans=char[]-
      ('abcdefghijklmnopqrstuvwxyz'-
      ||'ABCDEFGHIJKLMNOPQRSTUVWXYZ'-
      ||'_$'-                         -- special symbol characters
      ||'1234567890'-
      ||'+-/*\\%&|=<>'-               -- '\\' here is '\'
      ||'.()[]"'';,'-
      ||'\t\f')
    outrans=char[]-
      ('ssssssssssssssssssssssssss'-  -- 's' is non-digit symbol
      ||'ssssssssssssssssssssssssss'-
      ||'ss'-
      ||'nnnnnnnnnn'-                 -- 'n' is digit
      ||'ooooooooooo'-                -- 'o' is operator
      ||'.()[]"'';,'-                 -- specials are themselves
      ||' ')                          -- whitespace
    /* # used for break */

A comment, /* 1998.03.08 Allow $ in names */, tells us that the dollar sign was added to the set of characters allowed in variable names later.

best regards,
René.
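The comments describe a two-level scheme: a lookup table for ASCII characters and in-line checks for anything above '\x7f'. A hypothetical Java sketch of that idea follows; it is not the actual NetRexx implementation. The class name and the '?' code for unrecognised characters are assumptions, and treating the euro sign as a symbol character follows René's remark that dollar and euro were added later, not the excerpt itself.

```java
// Hypothetical sketch of a two-level character classifier: ASCII via a
// lookup table, everything else via java.lang.Character checks.
// Codes: 's' symbol char, 'n' digit, 'o' operator, ' ' whitespace,
// specials map to themselves, '?' means not recognised.
final class CharClassifier {

    private static final char[] TABLE = new char[128];

    static {
        java.util.Arrays.fill(TABLE, '?');
        for (char c = 'a'; c <= 'z'; c++) TABLE[c] = 's';
        for (char c = 'A'; c <= 'Z'; c++) TABLE[c] = 's';
        TABLE['_'] = 's';
        TABLE['$'] = 's';
        for (char c = '0'; c <= '9'; c++) TABLE[c] = 'n';
        for (char c : "+-/*\\%&|=<>".toCharArray()) TABLE[c] = 'o';
        for (char c : ".()[]\"';,".toCharArray()) TABLE[c] = c;
        TABLE['\t'] = ' ';
        TABLE['\f'] = ' ';
        TABLE[' '] = ' ';
    }

    static char classify(char c) {
        if (c < 128) return TABLE[c];                // fast path: table lookup
        if (c == '\u20ac') return 's';              // euro: special-cased (assumption)
        if (Character.isDigit(c)) return 'n';       // e.g. Devanagari digits
        if (Character.isLetter(c)) return 's';      // e.g. Greek letters
        return '?';
    }

    public static void main(String[] args) {
        String sample = "x$1+\u03b2\u2264";            // ..., beta, less-than-or-equal
        StringBuilder codes = new StringBuilder();
        for (char c : sample.toCharArray()) codes.append(classify(c));
        System.out.println(codes);                  // one class code per input char
    }
}
```

This prints "ssnos?": the Greek beta is classified like an ASCII letter, while U+2264 falls through every check, much like the character the compiler rejected above.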
George,
Have a look at the NetRexx spec. I don't remember the exact wording, but in short, yes, you can use Unicode characters in identifiers. Mind you, the same rules apply as for ASCII characters; e.g., you cannot use a variable name that starts with a Hindi character which is in fact a Hindi digit.
Regards,
-- Saludos / Regards, David Requena
I'll have to look up TNRL to see what I was actually talking about...
-- Saludos / Regards, David Requena
Hi All,
On page 35 of TNRL, "Élan" is shown as a legal symbol. The "E" has an acute accent; this is not an ASCII character. The surrounding text mentions the usual symbol characters (A-Z, a-z, 0-9, _) but also mentions extra letters and extra digits, whose purpose is "to improve the readability of programs in languages other than English."

TNRL seems to have been written when the Unicode situation was somewhat fluid, and MFC mentions, in footnote 17, that implementations might be based on Unicode or "smaller character sets." AFAIK, there are no other implementations, and I would expect that a revision of the NetRexx standard will declare Unicode to be the only acceptable character set. [To fail to do so could be construed as Anglo-centric or worse.]

The Javadoc for Java SE 6 says a character can start an identifier if Character.isJavaIdentifierStart returns true, and can appear elsewhere in an identifier if Character.isJavaIdentifierPart returns true. From the method descriptions, this plainly covers a range much greater than ASCII. However, the question remains: "What does NetRexx currently do?" I hope MFC will see this!

George
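A short Java sketch of those two rules, applied per code point (class and method names are illustrative). It also illustrates David's point about digits: DEVANAGARI DIGIT ONE (U+0967) is a digit, so it may appear inside an identifier but cannot start one.

```java
// Sketch: apply Java's identifier rules code point by code point.
final class IdentifierCheck {

    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) return false;
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u03b1\u03b2\u03b3Total",   // Greek letters followed by ASCII
            "\u00c9lan",                 // E with acute accent, as in TNRL's example
            "\u0967abc",                 // starts with DEVANAGARI DIGIT ONE
            "x\u2264y"                   // contains LESS-THAN OR EQUAL TO
        };
        for (String s : samples) {
            System.out.println(isJavaIdentifier(s));
        }
    }
}
```

This prints true, true, false, false. It checks characters only and does not reject reserved words; javax.lang.model.SourceVersion.isIdentifier and SourceVersion.isKeyword cover both aspects when that module is available. NetRexx's own rules (which use isLetter, per TNRL) may differ in detail.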
My copy of TNRL2, p. 40, says:
So this differs from Java. By the way, the list of examples has a $ in front of Virtual3D, which is in line with the fact that the dollar and euro signs were added later. I will do some experimenting later; I am not sure whether I am feeding it real UTF-8, and the euro sign also fails right now.

René.
Rene,
That makes sense; isLetter was used before Unicode changed its maximum code point from FFFF to 10FFFF. However, Greek letters are not in the extended region, so I hope they may be legal with current NetRexx. But I'm not in a position to verify it by experiment.

George
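The distinction is visible in Java's own API: Character.isLetter(char) examines one UTF-16 unit, while Character.isLetter(int) examines a whole code point. Basic Greek letters are single characters in the Basic Multilingual Plane, so both agree on them; a supplementary character such as U+1D6C2 (MATHEMATICAL BOLD SMALL ALPHA) is stored as a surrogate pair, and only the code-point version recognises it as a letter. A minimal sketch (names are illustrative):

```java
// Sketch: char-based vs code-point-based letter checks in Java.
final class CodePointDemo {

    // Checks every UTF-16 unit separately, as a char-based scanner would.
    static boolean allLettersByChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) return false;
        }
        return true;
    }

    // Checks whole code points, so a surrogate pair counts as one character.
    static boolean allLettersByCodePoint(String s) {
        return s.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03c9";                               // alpha, omega (BMP)
        String mathAlpha = new String(Character.toChars(0x1D6C2));   // outside the BMP
        System.out.println(greek.length() + " " + allLettersByChar(greek)
                + " " + allLettersByCodePoint(greek));
        System.out.println(mathAlpha.length() + " " + allLettersByChar(mathAlpha)
                + " " + allLettersByCodePoint(mathAlpha));
    }
}
```

This prints "2 true true" for the Greek pair and "2 false true" for the mathematical alpha, which supports the hope that ordinary Greek letters are unaffected by the older char-based checks.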
With apologies to David,
after checking that my files were in fact UTF-8 and hex-dumping them, and setting my editors and shells to accept only UTF-8, it suddenly occurred to me that you have to compile with the -utf8 option to make the translator recognize the encoding. So this works:

    [venetia-2:~/test] rvjansen% nrc -utf8 elan.nrx
    NetRexx portable processor, version 3.00
    Copyright (c) RexxLA, 2011. All rights reserved.
    Parts Copyright (c) IBM Corporation, 1995,2008.
    Program elan.nrx
    Compilation of 'elan.nrx' successful

best regards,
René.
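The "is this file really UTF-8?" check can also be done programmatically. This Java sketch (helper names are hypothetical) prints a byte sequence in hex and reports whether it is well-formed UTF-8, by decoding it with a CharsetDecoder set to report errors instead of silently replacing bad input. The same word, Élan, passes when saved as UTF-8 but fails when saved as ISO-8859-1, because the byte C9 announces a two-byte UTF-8 sequence and is not followed by a continuation byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: hex-dump a byte sequence and check that it is well-formed UTF-8.
final class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String elan = "\u00c9lan";  // "Elan" with E-acute
        byte[] asUtf8 = elan.getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = elan.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hex(asUtf8) + " valid=" + isValidUtf8(asUtf8));
        System.out.println(hex(asLatin1) + " valid=" + isValidUtf8(asLatin1));
    }
}
```

This prints "C3 89 6C 61 6E valid=true" and "C9 6C 61 6E valid=false". To check a real file, pass the result of java.nio.file.Files.readAllBytes(path). Having valid UTF-8 on disk is only half of it, as found above; the compiler must also be told the encoding (nrc -utf8 here, or javac -encoding UTF-8 for plain Java sources).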
George, I would expect that any NetRexx implementation does
what the documents say!
On Unicode: remember NetRexx is a language, and that
might not be tied to Java (that's why the binary options are not part of the
core language). It is still an option for people to provide a simpler
NetRexx with no binary types (for example). I would agree that Unicode (or
at least Unicode covered by UTF-8) would be an obvious choice
nowadays.
Or to put this another way: the language specification defines
an 'interchange format'. User guides and compiler documentation define the
details where the language specification allows options. A language
specification that did not allow options and extensions would only define a dead
language.
Mike
René,
No apologies needed whatsoever; I wrote that off the top of my head. I had a vague memory of having tested it some time long ago. Then when I saw your reply I thought: "OK, an identifier is a symbol, but a symbol's not an identifier. Maybe there were further restrictions on identifiers..." It's never a bad moment to go back to TNRL. It's always a pleasure, in fact :-)

- Saludos / Kind regards,
David Requena

NOTE: The opinions expressed here represent the opinions of the authors and do not necessarily represent the opinions of those who hold other opinions.

-----Original Message-----
From: René Jansen <[hidden email]>
Sender: [hidden email]
Date: Tue, 14 Jun 2011 22:04:16
To: IBM Netrexx<[hidden email]>
Reply-To: IBM Netrexx <[hidden email]>
Subject: Re: [Ibm-netrexx] jEdit Users: Need help with unicode in NetRexx / jEdit / Windows 7