I need some help with Translate in NetRexx. From an input record, I need to change any non-numeric characters to blanks, leaving 0123456789- wherever they occur. Some years back I had this working in OBJREXX but cannot find the code.
_______________________________________________ Ibm-netrexx mailing list [hidden email] |
Robert Hamilton wrote on 28.03.2010 01:00:
> I need some help with Translate in NetRexx. From an input record, I need
> to change any non-numeric characters to blank , leaving 0123456789-
> where ever they occur. Some years back I had this working in OBJREXX but
> cannot find the code.
>
> thnx
>
> bobh

I can tell you the "Java API" way to do this in NetRexx:

    'bla123-bla-123'.replaceAll('[^0-9\\-]',' ')

--
Regards
Patric Bechtel
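For readers coming from plain Java, the same regular-expression call can be sketched as below. The class and method names (`KeepDigits`, `blankNonNumeric`) are made up for this illustration; only `String.replaceAll` is a real API. A hyphen placed last inside a character class needs no escape, so `[^0-9-]` is assumed here to match the same characters as the escaped pattern in the post.

```java
// Illustrative sketch only: KeepDigits and blankNonNumeric are hypothetical names.
class KeepDigits {
    // Replace every character that is not an ASCII digit or '-' with one blank.
    static String blankNonNumeric(String record) {
        return record.replaceAll("[^0-9-]", " ");
    }

    public static void main(String[] args) {
        // The letters become blanks; digits and hyphens stay where they are.
        System.out.println(blankNonNumeric("bla123-bla-123"));
    }
}
```

Because the pattern is a negated class, it covers any character outside the kept set, including non-Latin ones, without building a table.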
In reply to this post by Robert L Hamilton
This may be somewhat dependent on character sets, but if you have an
input translate table and output translate table defined as follows:

    it='\0'.sequence('\xff')
    ot=' '.copies(256).overlay('0123456789','0'.c2d+1)

Then the following record:

    rec="a0987654321sadf09253490-3t4l'dkfgdi09531=5798"

can be translated like this:

    say rec.translate(ot,it)

giving output like this:

     0987654321    09253490 3 4        09531 5798

HTH

-- Kermit
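The table idea above can be sketched in Java roughly as follows. This is an assumption-laden illustration, not the NetRexx implementation: `TableTranslate`, `OUT`, and `translate` are hypothetical names, and the table only covers code points 0 to 255, like the `it`/`ot` pair in the post.

```java
// Illustrative sketch only: TableTranslate and its members are hypothetical names.
class TableTranslate {
    // Output table for code points 0..255: all blanks, digits mapped to themselves.
    static final char[] OUT = buildTable();

    static char[] buildTable() {
        char[] table = new char[256];
        for (int i = 0; i < table.length; i++) table[i] = ' ';
        for (char d = '0'; d <= '9'; d++) table[d] = d;
        return table;
    }

    // Characters outside the table (code > 255) pass through unchanged, which is
    // how translate treats characters that are missing from its input table.
    static String translate(String record) {
        StringBuilder sb = new StringBuilder(record.length());
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            sb.append(c < OUT.length ? OUT[c] : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("a0987654321sadf09253490-3t4l'dkfgdi09531=5798"));
    }
}
```

As in the post's output, the hyphen is blanked too; keeping it would take one more table entry for '-'. Anything above code point 255 is left alone, which is the character-set dependence mentioned at the top of the post.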
In reply to this post by Robert L Hamilton
Deferring to Walter Pachl's more elegant solution than the one I proffered, I
was going to suggest:

    s = "a0987654321sadf09253490-3t4l'dkfgdi09531=5798"
    Say s.translate('1234567890','1234567890'||xrange)

only to find that xrange() is conspicuous by its absence. I'm not sure why.

Mike, can you elaborate?

-Chip-

On 3/28/10 00:00 Robert Hamilton said:
> I need some help with Translate in NetRexx. From an input record, I need
> to change any non-numeric characters to blank , leaving 0123456789-
> where ever they occur. Some years back I had this working in OBJREXX but
> cannot find the code.
In reply to this post by Kermit Kiser
Kermit Kiser wrote on 28.03.2010 03:00:
> This may be somewhat dependent on character sets, but if you have an
> input translate table and output translate table defined as follows:
>
>     it='\0'.sequence('\xff')
>     ot=' '.copies(256).overlay('0123456789','0'.c2d+1)

May I remark that the reason for my suggestion is that the translate command of Rexx doesn't work all that well in Java because of its lack of Unicode awareness. Never treat characters in Java as bytes.

--
Regards
Patric Bechtel
In reply to this post by Aviatrexx
Offhand, I would guess that lack of Unicode support is the reason. The
newer, more general "sequence" method would work with your example, however:

    s = "a0987654321sadf09253490-3t4l'dkfgdi09531=5798"
    Say s.translate('1234567890','1234567890'||'!'.sequence('~'))

-- Kermit

Chip Davis wrote:
> Deferring to Walter Pachl's more elegant solution than the one I
> proffered, I was going to suggest:
>
>     s = "a0987654321sadf09253490-3t4l'dkfgdi09531=5798"
>     Say s.translate('1234567890','1234567890'||xrange)
>
> only to find that xrange() is conspicuous by its absence. I'm not
> sure why.
>
> Mike, can you elaborate?
>
> -Chip-
In reply to this post by Patric Bechtel
Mike's TNRL book indicates that NetRexx supports Unicode so I don't see
why the translate method would not work with Unicode data, but I don't
know how to prove it one way or the other. Do you know a way?
-- Kermit

Patric Bechtel wrote:
> May I remark that the reason for my suggestion is that the translate
> command of Rexx doesn't work all too well in Java because of the lack of
> Unicode awareness. Never treat characters in Java as bytes.
In reply to this post by Aviatrexx
Chip wrote:
> I was going to suggest:
>
>     s = "a0987654321sadf09253490-3t4l'dkfgdi09531=5798"
>     Say s.translate('1234567890','1234567890'||xrange)
>
> only to find that xrange() is conspicuous by its absence.
> I'm not sure why.
>
> Mike, can you elaborate?

Character strings in NetRexx/Java are Unicode (the subset that can be UTF-8-encoded), whereas xrange in 'classic' Rexx produces a series of bytes, which could really only be represented as a byte array. However, all the "BIFs" work on character strings (of type Rexx). Indeed, the NetRexx language as such does not require that binary data (such as bytes) exist, although it admits that implementations might support such antique concepts :-).

For the same reason x2c (for example) can only produce a single character.

Translate should be fine.

Mike
In reply to this post by Aviatrexx
> only to find that xrange() is conspicuous by its absence.
> I'm not sure why.
>
> Mike, can you elaborate?

Forgot to add: also, the default call to xrange in Rexx returns 'all possible character encodings'; that would be a very long string in NetRexx.

Mike
In reply to this post by Mike Cowlishaw
Mike Cowlishaw wrote on 28.03.2010 11:23:
> Character strings in NetRexx/Java are Unicode (the subset that can be
> UTF-8-encoded), whereas xrange in 'classic' Rexx produces a series of
> bytes, which could really only be represented as a byte array. However,
> all the "BIFs" work on character strings (of type Rexx). Indeed, the
> NetRexx language as such does not require that binary data (such as bytes)
> exist, although it admits that implementations might support such antique
> concepts :-).
>
> For the same reason x2c (for example) can only produce a single character.
>
> Translate should be fine.

Hi Mike,

sorry, it wasn't my intention to say that translate does not work. It's just the wrong tool for the given use case. It's good if you want to translate something into something else, but it is bound to fail when it has to replace "everything but" a given set of characters in the Unicode case.

--
Regards
Patric Bechtel
In reply to this post by Patric Bechtel
Many thanks, I will try to figure out how these commands work.

Enjoy the day,
Bob Hamilton
In reply to this post by Patric Bechtel
While I understand the problem with the size of a Unicode translate
table, I think one of the strengths of NetRexx/Rexx is in NOT
requiring you to be a regexp wizard!
I propose that we add a Rexx variable method to open source NetRexx with logic similar to the method "replaceifnot" in the following example:

    s = "a0987654321sadf09253490-3t4l'dkfgdi09531=5798x"
    say replaceifnot(s,'0123456789')
    say replaceifnot(s,'0123456789','-')
    say replaceifnot(s,'0123456789','')

    method replaceifnot(r,v,n=' ') static
      tr=''
      loop forever
        i=r.verify(v)
        if i=0 then return tr||r
        tr=tr||r.left(i-1)||n.left('1'.min(n.length))
        r=r.substr(i+1)
      end

which gives the following output (best viewed with fixed width chars):

     0987654321    09253490 3 4        09531 5798 
    -0987654321----09253490-3-4--------09531-5798-
    09876543210925349034095315798

Patric Bechtel wrote:
> Hi Mike, sorry, wasn't my intention telling that translate does not work.
> It's just the wrong tool for the given usecase. It's good if you want to
> translate something into something else, but it's just plain condemned
> to fail in a case, where it should replace "everything but" in a unicode
> case.
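For comparison, the behaviour of "replaceifnot" can be sketched in Java as below. This is a single pass over the characters rather than the repeated verify/substr loop in the post, and `ReplaceIfNot` and `replaceIfNot` are hypothetical names chosen for the illustration. As in the post, only the first character of the replacement is used, and an empty replacement drops the unwanted characters.

```java
// Illustrative sketch only: ReplaceIfNot and replaceIfNot are hypothetical names.
class ReplaceIfNot {
    // Keep characters found in 'allowed'; replace each other character with the
    // first character of 'pad', or drop it when 'pad' is empty.
    static String replaceIfNot(String record, String allowed, String pad) {
        StringBuilder sb = new StringBuilder(record.length());
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (allowed.indexOf(c) >= 0) {
                sb.append(c);
            } else if (!pad.isEmpty()) {
                sb.append(pad.charAt(0));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "a0987654321sadf09253490-3t4l'dkfgdi09531=5798x";
        System.out.println(replaceIfNot(s, "0123456789", " "));
        System.out.println(replaceIfNot(s, "0123456789", "-"));
        System.out.println(replaceIfNot(s, "0123456789", ""));
    }
}
```

Passing "0123456789-" as the allowed set would keep the hyphens as well, which is what the original question asked for.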
In reply to this post by Mike Cowlishaw
Ah, I see. It appears my mainframe SBCS slip is showing... :-)
I guess it's time to acquire a whole new set of character string manipulation idioms.

-Chip-

On 3/28/10 09:25 Mike Cowlishaw said:
> Forgot to add: also, the default call to xrange in Rexx returns 'all
> possible character encodings'; that would be a very long string in NetRexx.
In reply to this post by Kermit Kiser
'verify' always was a badly named function -- direct from PL/I, as was 'index' .. dropped early on from Rexx.

But a pair of methods might make sense..

    foo.posnotin('abcd')          -- returns pos of first char not listed
    bar.changenotin('abcd', 'x')  -- changes unexpected chars to 'x'
                                  -- (like changestr, 'x' can be any length)

The first of these is pretty much Verify. The second follows 'changestr' precedent. Not great names ...

Mike
From: [hidden email] [mailto:[hidden email]] On Behalf Of Kermit Kiser
Sent: 28 March 2010 17:59
To: IBM Netrexx
Subject: Re: [Ibm-netrexx] Translate question [was: Welcome to the "Ibm-netrexx" mailing list]

> While I understand the problem with the size of a unicode translate
> table, I think one of the strengths of NetRexx /Rexx is in NOT
> requiring you to be a regexp wizard!
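To make the two proposed methods concrete, here is a rough Java sketch of the semantics described in the comments above. The names `NotIn`, `posNotIn`, and `changeNotIn` are hypothetical stand-ins for the proposal, and the 1-based position with 0 for "none found" is an assumption borrowed from how Verify reports its result.

```java
// Illustrative sketch only: NotIn, posNotIn and changeNotIn are hypothetical
// names modelling the methods proposed in the post.
class NotIn {
    // 1-based position of the first character not listed in 'allowed', 0 if none.
    static int posNotIn(String s, String allowed) {
        for (int i = 0; i < s.length(); i++) {
            if (allowed.indexOf(s.charAt(i)) < 0) return i + 1;
        }
        return 0;
    }

    // Replace each character not listed in 'allowed' with 'replacement',
    // which may be of any length, including empty.
    static String changeNotIn(String s, String allowed, String replacement) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (allowed.indexOf(c) >= 0) {
                sb.append(c);
            } else {
                sb.append(replacement);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(posNotIn("abxd", "abcd"));
        System.out.println(changeNotIn("a1-b2", "0123456789-", " "));
    }
}
```

With a set of digits plus '-' and a blank as the replacement, the second method answers the original question directly.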
In reply to this post by Aviatrexx
> I guess it's time to acquire a whole new set of character
> string manipulation idioms.

Not necessary .. almost all the Rexx BIFs were/are directly preserved in NetRexx. The only ones that needed changes are the '2c' and 'c2' ones (and xrange) -- that assumed one character = one byte. Maybe five in total .. everything else just works :-)

Mike
I know (and am very grateful!) but some of my more clever Rexx tricks started
out as Assembler byte-dependent idioms. Good thing the character-rearrangement Translate() trick still works. :-)

-Chip-

On 3/28/10 19:09 Mike Cowlishaw said:
>> I guess it's time to acquire a whole new set of character
>> string manipulation idioms.
>
> Not necessary .. almost all the Rexx BIFs were/are directly preserved in
> NetRexx. The only ones that needed changes are the '2c' and 'c2' ones (and
> xrange) -- that assumed one character = one byte. Maybe five in total ..
> everything else just works :-)
In reply to this post by Mike Cowlishaw
Hi there, may I add that the www.Rexx2Nrx.com
run-time package contains the functions c2x and x2c with STRINGS as the argument as well....

Thomas.

Mike Cowlishaw wrote:
>> I guess it's time to acquire a whole new set of character
>> string manipulation idioms.
>
> Not necessary .. almost all the Rexx BIFs were/are directly preserved in
> NetRexx. The only ones that needed changes are the '2c' and 'c2' ones (and
> xrange) -- that assumed one character = one byte. Maybe five in total ..
> everything else just works :-)
>
> Mike

Tom. (ths@db-123.com)
> Hi there, may I add that the www.Rexx2Nrx.com run-time
> package contains the functions c2x and x2c with STRINGS as
> the argument as well....

So what does it do when (for example) c2x has characters in the input string such as \u2014 (em dash, decimal 8212)? If all characters go to 16 bits, then it's not compatible with classic Rexx. If they are decapitated to 8 bits, then c2x and x2c are not reversible. And then there are 32-bit Unicodes (which Java did not support when I did NetRexx, but maybe it does now).

Mike
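The reversibility problem can be seen in a small Java sketch. It is only an illustration of the "decapitated to 8 bits" case under stated assumptions: `HexTruncation`, `c2xLow8`, and `x2c` are hypothetical names and are not taken from Rexx2Nrx or NetRexx.

```java
// Illustrative sketch only: HexTruncation and its methods are hypothetical
// names, not the Rexx2Nrx implementation.
class HexTruncation {
    // Encode each char as 2 hex digits, discarding the high 8 bits.
    static String c2xLow8(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("%02X", s.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    // Decode pairs of hex digits back into characters.
    static String x2c(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 2 <= hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "A\u2014";            // 'A' followed by an em dash
        String hex = c2xLow8(original);
        System.out.println(hex);                 // prints 4114
        System.out.println(x2c(hex).equals(original)); // prints false
    }
}
```

The em dash comes back as the control character 0x14, so the round trip only works for characters below 256.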
Indeed. Trying to maintain compatible behavior with this was a major
impediment to any plans for making ooRexx change to Unicode-based strings. I was constantly seeing presentations showing how to accomplish different things that would just fail once characters were no longer 8 bits long. Bob's translate() example happens to be one of those.

Rick

On Mon, Mar 29, 2010 at 9:26 AM, Mike Cowlishaw <[hidden email]> wrote:
> If all characters go to 16 bits, then it's not compatible with classic
> Rexx. If they are decapitated to 8 bits, then c2x and x2c are not
> reversible. And then there are 32-bit Unicodes (which Java did not
> support when I did NetRexx, but maybe it does now).
>
> Mike
In reply to this post by Mike Cowlishaw
Hello Mike,

I am NOT sure whether I understand your question at all. I simply call your c2x repeatedly and concatenate the results. And vice versa. But I do fill in leading Hex zeroes to pad the result of your 'c2x' to 32 bits (4 Hex chars) per character, as far as I can remember ...

Thomas.

Mike Cowlishaw wrote:
> So what does it do when (for example) c2x has characters in the input
> string such as \u2014 (em dash)? If all characters go to 16 bits, then
> it's not compatible with classic Rexx. If they are decapitated to 8 bits,
> then c2x and x2c are not reversible. And then there are 32-bit Unicodes
> (which Java did not support when I did NetRexx, but maybe it does now).
>
> Mike

Tom. (ths@db-123.com)
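The per-character padding described above might look roughly like this in Java. This is a guess at the approach, not the Rexx2Nrx code: `PaddedHex`, `c2x`, and `x2c` are hypothetical names, and the sketch assumes 4 hex digits per character, which hold 16 bits (one Java char) rather than the 32 bits mentioned in the post.

```java
// Illustrative sketch only: PaddedHex and its methods are hypothetical names
// and an assumption about the approach described, not the Rexx2Nrx code.
class PaddedHex {
    // Encode every char as exactly 4 hex digits, zero-padded on the left.
    static String c2x(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 4);
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Decode groups of 4 hex digits back into chars.
    static String x2c(String hex) {
        StringBuilder sb = new StringBuilder(hex.length() / 4);
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "A\u2014";            // 'A' followed by an em dash
        String hex = c2x(original);
        System.out.println(hex);                 // prints 00412014
        System.out.println(x2c(hex).equals(original)); // prints true
    }
}
```

The fixed width makes the round trip reversible, at the cost Mike points out: 'A' encodes as 0041 rather than the 41 that classic Rexx would give.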