ibm-netrexx

parse behaviour with binary data


rvjansen

parse behaviour with binary data

I tried to find some info about this in the documentation but I am not really clear if the following behaviour is guaranteed:

    parse line . 79 volser 84 . 89 allocatedkb 93 usedkb 97 .
    a = ByteUtil.bytesAsInt(allocatedkb.toString().getBytes())
    b = ByteUtil.bytesAsInt(usedkb.toString().getBytes())

I am parsing z/OS IDCAMS DCOLLECT output. allocatedkb and usedkb are signed 31-bit big-endian integers according to the documentation, and the jzos ByteUtil class handles them nicely (i.e. the right numbers come out). Both NetRexx and Java throw number exceptions when trying the fields as-is. When checking the documentation for parse, I only see promises about the parsing of text (string) data, with examples of numbers that are represented in text; I cannot find a statement on binary data. I understand that the underlying byte array implementation of the reference translator makes this possible, but the parse documentation in nrl3 (p110) states:

parse term template;

"The value of the term is expected to be a string; if it is not a string, it will be converted to a string.

Any variables used in the template are named by non-numeric symbols (that is, they cannot be an array reference or other term); they refer to a variable or property in the current class. Any values that are used in patterns during the parse are converted to strings before use.

Any variables set by the parse instruction must have a known string type, or are given the NetRexx string type, Rexx, if they are new.

The term itself is not changed unless it is a variable which also appears in the template and whose value is changed by being in the template."

Now: "The term itself is not changed unless [...]" and "The value of the term [...] will be converted to a string" seem to be at odds, and I can only observe that the bits of DCOLLECT's signed ints are left intact. Is this something I can count on (are people parsing binary data with parse on other platforms and other Rexx implementations?) or is there a better way?

best regards,

René Jansen.

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/

Mike Cowlishaw

RE: parse behaviour with binary data


A question .. what is the type of the variable 'line' ?

If this works, I suspect that you somehow created 'line' with a method that assigns 8 bits of binary data to one Unicode character (so 0xA5 => the character '\u00a5').
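
The one-byte-to-one-character mapping suspected here is what Java's ISO-8859-1 charset provides. The following is a minimal Java sketch and not code from the thread: the class and method names are hypothetical, and `java.nio.ByteBuffer` stands in for the jzos `ByteUtil.bytesAsInt` call, which is assumed here to read a big-endian 32-bit integer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; none of these names come from the thread.
final class Latin1Mapping {

    // Decode so that each byte 0x00..0xFF becomes the char U+0000..U+00FF.
    static String bytesToChars(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Reverse the mapping: each char U+0000..U+00FF becomes one byte again.
    static byte[] charsToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Interpret the first four bytes as a signed big-endian 32-bit integer
    // (ByteBuffer uses big-endian byte order by default).
    static int bigEndianInt(byte[] four) {
        return ByteBuffer.wrap(four).getInt();
    }

    public static void main(String[] args) {
        byte[] field = {0x00, 0x00, (byte) 0xA5, 0x10};
        String asText = bytesToChars(field);
        System.out.println((int) asText.charAt(2));             // 165, i.e. U+00A5
        System.out.println(bigEndianInt(charsToBytes(asText))); // 42256
    }
}
```

With this charset the string is only a carrier for the bytes, so splitting it by column cannot disturb the integer fields. Whether the encoding actually used on a given system behaves the same way is a separate question.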


> Now: "The term itself is not changed unless [...]" and "The value of the term [...] will be converted to a string" seem to be at odds,

Why 'at odds'? The first quoted bit refers to a variable that might be set, and the second refers to the value of the term.

> and I can only observe that the bits of DCOLLECT's signed ints are left intact. Is this something I can count on (are people parsing binary data with parse on other platforms and other Rexx implementations?) or is there a better way?

(Other Rexx implementations don't do Unicode yet, so not really relevant.)

The action of PARSE is well-defined and you can count on it. The tricky bit is how binary data is converted to string data in the first place.

Mike
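
Whether a given conversion from binary data to string data keeps the bytes intact can be tested directly. The sketch below is a hedged Java illustration with hypothetical names: it decodes all 256 byte values as one array with a charset and encodes them back. On z/OS the charset of interest would be the one named by `ZUtil.getDefaultPlatformEncoding()`, obtained with `Charset.forName(...)`; how that encoding fares is not established here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of NetRexx or jzos.
final class RoundTripCheck {

    // True if the array of byte values 0x00..0xFF is unchanged after being
    // decoded to a String and encoded back with the same charset.
    static boolean isByteTransparent(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] back = new String(all, cs).getBytes(cs);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        System.out.println(isByteTransparent(StandardCharsets.ISO_8859_1)); // true
        System.out.println(isByteTransparent(StandardCharsets.US_ASCII));   // false
        System.out.println(isByteTransparent(StandardCharsets.UTF_8));      // false
    }
}
```

A charset that fails this check replaces some byte values during decoding, and no amount of care in the parse template can recover them afterwards.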


rvjansen

Re: parse behaviour with binary data

This is the whole thing, short enough to copy it in:

"line" is obtained from a BufferedReader wrapping an InputStreamReader on the stream delivered by the jzos ZFile class, which is a JNI wrapper around the file API of IBM C++. It is cast to a Rexx.

    import com.ibm.jzos.

    iFile_ = ZFile("//DD:INPUT", "rt")
    oFile_ = ZFile("//DD:OUTPUT", "w")

    -- open a reader on the input DD, decoding with the platform encoding
    do
      enc = ZUtil.getDefaultPlatformEncoding();
      is = iFile_.getInputStream();
      rdr = BufferedReader(InputStreamReader(is, enc))
    catch Exception
      say "file could not be opened:" iFile_
      exit
    end

    -- open a writer on the output DD, encoding with the platform encoding
    do
      enc = ZUtil.getDefaultPlatformEncoding();
      os = oFile_.getOutputStream();
      btr = BufferedWriter(OutputStreamWriter(os, enc))
      wtr = PrintWriter(btr)
    catch Exception
      say "file could not be opened:" oFile_
      exit
    end

    loop forever
      textLine = Rexx rdr.readLine()
      if textLine = null then leave
      -- record type is in column 5, data set name starts in column 25
      parse textLine . 5 rectype 6 . 25 dsn 68 .
      select
        when rectype = 'D' then
          do
            parse textLine . 79 volser 84 . 89 allocatedkb 93 usedkb 97 .
            -- turn the 4-character binary fields back into bytes, then into ints
            a = ByteUtil.bytesAsInt(allocatedkb.toString().getBytes())
            b = ByteUtil.bytesAsInt(usedkb.toString().getBytes())
            wtr.print('<type>'rectype'</type><volser>'volser'</volser><dsn>')
            wtr.println(dsn.strip()'</dsn><alloc>'a'</alloc><used>'b'</used>')
          end
        when rectype = 'A' then
          do
            parse textLine . 117 hurba 121 harba 125 .
            a = ByteUtil.bytesAsInt(hurba.toString().getBytes())
            b = ByteUtil.bytesAsInt(harba.toString().getBytes())
            wtr.print('<type>'rectype'</type><volser>unk</volser><dsn>')
            wtr.println(dsn.strip()'</dsn><alloc>'a'</alloc><used>'b'</used>')
          end
        otherwise iterate
      end
    end

    iFile_.close()
    wtr.close()
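
One hazard of reading binary records through `readLine()`, as the program above does, is that a binary field may contain the byte that decodes to a line terminator. The following Java sketch illustrates this with made-up data; the class name is hypothetical, and ISO-8859-1 is used only as a stand-in for the platform encoding, where the problematic byte value would differ.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; the record content is invented.
final class ReadLineHazard {

    // Return the first "line" of the data, read through the same
    // InputStreamReader / BufferedReader.readLine() chain as the program above.
    static String firstLine(byte[] data) {
        try (BufferedReader rdr = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1))) {
            return rdr.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "AB", the big-endian 32-bit integer 10 (00 00 00 0A), "CD", newline.
        byte[] record = {'A', 'B', 0x00, 0x00, 0x00, 0x0A, 'C', 'D', '\n'};
        System.out.println(firstLine(record).length()); // 5, not 8
    }
}
```

The record is cut short at the low byte of the integer, so a template applied to such a line would see shifted or missing fields.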

On 6 aug 2011, at 17:28, Mike Cowlishaw wrote:


> A question .. what is the type of the variable 'line' ?


Mike Cowlishaw

RE: parse behaviour with binary data

OK, so really none of that is anything to do with the NetRexx language/compiler/interpreter. It's String or Rexx long before it gets to PARSE.
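
On the earlier question of whether there is a better way: one option is to avoid the String conversion for the binary fields by keeping each record as a byte array and reading fields by offset. The Java sketch below is only an illustration under assumptions. The class and method names are hypothetical, the record is synthetic rather than real DCOLLECT data, the columns mirror the thread's template, and how the record bytes are obtained on z/OS is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for fixed-layout records held as raw bytes.
final class RecordFields {

    // Text field: 'length' bytes starting at 1-based column 'col', decoded with 'cs'.
    static String text(byte[] record, int col, int length, Charset cs) {
        return new String(record, col - 1, length, cs);
    }

    // Binary field: signed big-endian 32-bit integer starting at 1-based column 'col'.
    static int int32(byte[] record, int col) {
        return ByteBuffer.wrap(record).getInt(col - 1);
    }

    public static void main(String[] args) {
        // Build a synthetic 96-byte record with a name at column 79
        // and two integers at columns 89 and 93.
        byte[] record = new byte[96];
        byte[] name = "VOL01".getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(name, 0, record, 78, name.length);
        ByteBuffer.wrap(record).putInt(88, 720).putInt(92, 48);

        System.out.println(text(record, 79, 5, StandardCharsets.ISO_8859_1)); // VOL01
        System.out.println(int32(record, 89)); // 720
        System.out.println(int32(record, 93)); // 48
    }
}
```

Only the text fields are decoded with a charset here; the integer fields never pass through a character conversion.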

On the parse instruction itself, it was a bit hard to read/check with purely absolute column numbers:

parse textLine . 79 volser 84 . 89 allocatedkb 93 usedkb 97 .

I think I'd write that as:

parse textLine 79 volser +5 89 allocatedkb +4 93 usedkb +4

(easier to maintain and see actual field lengths, and no redundant "." placeholders -- the latter would cause extra code to be generated, especially if tracing might be on).

Mike
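
The two template styles above select the same characters, which can be checked with ordinary substring arithmetic. This is a hedged Java sketch with hypothetical method names. Unlike PARSE, `substring` throws if the line is too short, so the sketch assumes a line that covers every column used.

```java
// Hypothetical helper mirroring positional parse templates with substring.
final class Columns {

    // Like the template fragment "start name end": the characters from
    // 1-based column 'start' up to, but not including, column 'end'.
    static String between(String line, int start, int end) {
        return line.substring(start - 1, end - 1);
    }

    // Like "start name +length": 'length' characters beginning at column 'start'.
    static String from(String line, int start, int length) {
        return line.substring(start - 1, start - 1 + length);
    }

    public static void main(String[] args) {
        // Test line of 100 characters; each character is the last digit of its column.
        StringBuilder sb = new StringBuilder();
        for (int col = 1; col <= 100; col++) {
            sb.append((char) ('0' + col % 10));
        }
        String line = sb.toString();

        System.out.println(between(line, 79, 84)); // 90123
        System.out.println(from(line, 79, 5));     // 90123
        System.out.println(between(line, 89, 93).equals(from(line, 89, 4))); // true
        System.out.println(between(line, 93, 97).equals(from(line, 93, 4))); // true
    }
}
```

The end column in the absolute form is exclusive, which is why `79 ... 84` yields five characters and `89 ... 93` yields four.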

From: [hidden email] [mailto:[hidden email]] On Behalf Of René Jansen
Sent: 06 August 2011 16:44
To: IBM Netrexx
Subject: Re: [Ibm-netrexx] parse behaviour with binary data

> This is the whole thing, short enough to copy it in:
> "line" is obtained from a BufferedReader wrapping an InputStreamReader on the stream delivered by the jzos ZFile class, which is a JNI wrapper around the file API of IBM C++. It is cast to a Rexx.
