parse behaviour with binary data

rvjansen
I tried to find some info about this in the documentation but I am not really clear if the following behaviour is guaranteed:

        parse line  . 79 volser 84 . 89 allocatedkb 93 usedkb 97 .
        a = ByteUtil.bytesAsInt(allocatedkb.toString().getBytes())
        b = ByteUtil.bytesAsInt(usedkb.toString().getBytes())

I am parsing z/OS IDCAMS DCOLLECT output. Allocatedkb and usedkb are signed 31-bit big-endian integers according to the documentation, and the JZOS ByteUtil class handles them nicely (i.e. the right numbers come out). Both NetRexx and Java throw number exceptions when the fields are used as-is. When checking the documentation for parse, I only see promises about the parsing of text (string) data, with examples of numbers that are represented in text; I cannot find a statement on binary data. I understand that the underlying byte array implementation of the reference translator makes this possible, but the parse documentation in nrl3 (p110) states:

parse term template;

"The value of the term is expected to be a string; if it is not a string, it will be converted to a string.
Any variables used in the template are named by non-numeric symbols (that is, they cannot be an array
reference or other term); they refer to a variable or property in the current class. Any values that are
used in patterns during the parse are converted to strings before use.
Any variables set by the parse instruction must have a known string type, or are given the NetRexx
string type, Rexx, if they are new.
The term itself is not changed unless it is a variable which also appears in the template and whose
value is changed by being in the template."

Now: "The term itself is not changed unless [...]" and "The value of the term [...] will be converted to a string" seem to be at odds, and I can only observe that the bits of DCOLLECT's signed ints are left intact. Is this something I can count on (are people parsing binary data with parse on other platforms and other Rexx implementations?) or is there a better way?

best regards,

René Jansen.

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/

RE: parse behaviour with binary data

Mike Cowlishaw
 
> I tried to find some info about this in the documentation but I am not really clear if the following behaviour is guaranteed:
>
>         parse line  . 79 volser 84 . 89 allocatedkb 93 usedkb 97 .
>         a = ByteUtil.bytesAsInt(allocatedkb.toString().getBytes())
>         b = ByteUtil.bytesAsInt(usedkb.toString().getBytes())
>
> I am parsing z/OS IDCAMS DCOLLECT output. Allocatedkb and usedkb are signed 31 bit big-endian integers according to the documentation, and the jzos ByteUtil class handles them nicely (i.e. the right numbers come out).
 
A question: what is the type of the variable 'line'?
If this works, I suspect that you somehow created 'line' with a method that assigns 8 bits of binary data to one Unicode character (so 0xA5 => the character '\u00a5').
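
The mapping described above (one byte to one Unicode character with the same code) can be sketched in plain Java. This is an illustrative sketch only: the class and method names are invented for the example, ISO-8859-1 stands in for whatever encoding actually produced 'line', and the sample record bytes are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LatinRoundTrip {
    // Decode each byte to the Unicode character with the same code (0xA5 -> U+00A5).
    static String bytesToChars(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Reverse mapping: each character below U+0100 becomes one byte again.
    static byte[] charsToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Read four bytes as a signed big-endian integer
    // (ByteBuffer uses big-endian byte order by default).
    static int bigEndianInt(byte[] four) {
        return ByteBuffer.wrap(four).getInt();
    }

    public static void main(String[] args) {
        // Hypothetical record: one marker byte followed by a 4-byte binary field.
        byte[] record = {(byte) 0xA5, 0x00, 0x01, (byte) 0xE2, 0x40};
        String line = bytesToChars(record);

        // The first character carries the same numeric code as the first byte.
        System.out.println(Integer.toHexString(line.charAt(0)));

        // Cut the field out as characters, then turn it back into bytes.
        String field = line.substring(1, 5);
        System.out.println(bigEndianInt(charsToBytes(field)));
    }
}
```

With such a mapping, cutting the string at character columns is the same as cutting the record at byte offsets, which is why the binary fields can come out of a string operation intact.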
 
> Both NetRexx and Java throw number exceptions when trying the fields as-is. When checking the documentation for parse, I only see promises about the parsing of text (string) data, with examples of numbers that are represented in text - I seem not to be able to find a statement on binary data. I understand that the underlying byte array implementation of the reference translator makes this possible, but the parse documentation in nrl3 (p110) states:
>
> parse term template;
>
> "The value of the term is expected to be a string; if it is not a string, it will be converted to a string.
> Any variables used in the template are named by non-numeric symbols (that is, they cannot be an array
> reference or other term); they refer to a variable or property in the current class. Any values that are
> used in patterns during the parse are converted to strings before use.
> Any variables set by the parse instruction must have a known string type, or are given the NetRexx
> string type, Rexx, if they are new.
> The term itself is not changed unless it is a variable which also appears in the template and whose
> value is changed by being in the template."
>
> Now: "The term itself is not changed unless [...]" and "The value of the term [...] will be converted to a string" seem to be at odds,
 
Why 'at odds'? The first quoted bit refers to a variable that might be set, and the second refers to the value of the term.
 
> and I can only observe that the bits of DCollect's signed ints are left intact. Is this something I can count on (are people parsing binary data with parse on other platforms and other Rexx implementations?) or is there a better way?
 
(Other Rexx implementations don't do Unicode yet, so not really relevant.)
The action of PARSE is well-defined and you can count on it.   The tricky bit is how binary data is converted to string data in the first place. 
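
To make the "tricky bit" concrete, here is a small Java sketch that checks whether a binary field survives being decoded to a string and encoded back with the same charset. The class name, the helper and the sample bytes are assumptions made for illustration; the charsets are standard Java ones and are not necessarily the platform encoding used on z/OS.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // True when decoding and re-encoding with the same charset
    // gives back exactly the original bytes.
    static boolean survives(byte[] raw, Charset cs) {
        byte[] back = new String(raw, cs).getBytes(cs);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        // Hypothetical 4-byte binary field containing a byte above 0x7F.
        byte[] field = {0x00, 0x01, (byte) 0xE2, 0x40};

        // ISO-8859-1 maps all 256 byte values to distinct characters.
        System.out.println("ISO-8859-1: " + survives(field, StandardCharsets.ISO_8859_1));
        // In UTF-8 and US-ASCII the byte 0xE2 is not valid on its own here,
        // so it is replaced during decoding and the original value is lost.
        System.out.println("UTF-8:      " + survives(field, StandardCharsets.UTF_8));
        System.out.println("US-ASCII:   " + survives(field, StandardCharsets.US_ASCII));
    }
}
```

The same check could be run with the encoding name the program actually uses, to see whether every byte value that can occur in the binary fields round-trips.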
 
Mike

 

Re: parse behaviour with binary data

rvjansen
This is the whole thing, short enough to copy in.
"line" (textLine below) is obtained from a BufferedReader on an InputStreamReader over the stream delivered by jzos ZFile, which is a JNI wrapper around the file API of IBM C++. It is cast to a Rexx:

import com.ibm.jzos.

-- input and output data sets are addressed by DD name
iFile_ = ZFile("//DD:INPUT", "rt")
oFile_ = ZFile("//DD:OUTPUT", "w")
do
  -- decode the input with the default platform encoding
  enc = ZUtil.getDefaultPlatformEncoding()
  is = iFile_.getInputStream()
  rdr = BufferedReader(InputStreamReader(is, enc))
catch Exception
  say "file could not be opened:" iFile_
  exit
end
do
  enc = ZUtil.getDefaultPlatformEncoding()
  os = oFile_.getOutputStream()
  btr = BufferedWriter(OutputStreamWriter(os, enc))
  wtr = PrintWriter(btr)
catch Exception
  say "file could not be opened:" oFile_
  exit
end
loop forever
  textLine = Rexx rdr.readLine()
  if textLine = null then leave
  -- record type is in column 5, data set name in columns 25-67
  parse textLine . 5 rectype 6 . 25 dsn 68 .
  select
    when rectype = 'D' then
      do
        -- the 4-character fields hold binary integers; getBytes() turns
        -- them back into bytes using the default charset
        parse textLine  . 79 volser 84 . 89 allocatedkb 93 usedkb 97 .
        a = ByteUtil.bytesAsInt(allocatedkb.toString().getBytes())
        b = ByteUtil.bytesAsInt(usedkb.toString().getBytes())
        wtr.print('<type>'rectype'</type><volser>'volser'</volser><dsn>')
        wtr.println(dsn.strip()'</dsn><alloc>'a'</alloc><used>'b'</used>')
      end
    when rectype = 'A' then
      do
        parse textLine . 117 hurba 121 harba 125 .
        a = ByteUtil.bytesAsInt(hurba.toString().getBytes())
        b = ByteUtil.bytesAsInt(harba.toString().getBytes())
        -- closing tag was written as <volser> in the original
        wtr.print('<type>'rectype'</type><volser>unk</volser><dsn>')
        wtr.println(dsn.strip()'</dsn><alloc>'a'</alloc><used>'b'</used>')
      end
    otherwise iterate
  end
end
iFile_.close()
wtr.close()

On 6 aug 2011, at 17:28, Mike Cowlishaw wrote:

 

RE: parse behaviour with binary data

Mike Cowlishaw
OK, so really none of that is anything to do with the NetRexx language/compiler/interpreter.  It's String or Rexx long before it gets to PARSE. 
 
On the parse instruction itself, it was a bit hard to read/check with purely absolute column numbers:
 
  parse textLine . 79 volser 84 . 89 allocatedkb 93 usedkb 97 .
 
I think I'd write that as:
 
  parse textLine 79 volser +5   89 allocatedkb +4   93 usedkb +4
 
(easier to maintain and see actual field lengths, and no redundant "." placeholders -- the latter would cause extra code to be generated, especially if tracing might be on).
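
For readers more at home in Java, the column-plus-length form of the template above corresponds roughly to substring extraction with 1-based columns. The sketch below is an approximation for illustration only: the class, the `field` helper and the sample line are hypothetical, and short lines are handled by truncating the result, so a field beyond the end of the line comes back empty.

```java
public class FixedColumns {
    // Return `length` characters starting at 1-based column `col`,
    // truncated (possibly to the empty string) when the line is too short.
    static String field(String line, int col, int length) {
        int start = Math.min(col - 1, line.length());
        int end = Math.min(start + length, line.length());
        return line.substring(start, end);
    }

    // Build a string of n filler characters.
    static String filler(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical record: volser in columns 79-83,
        // two 4-character fields in columns 89-92 and 93-96.
        String line = filler(78) + "VOL01" + filler(5) + "AAAA" + "UUUU";

        System.out.println(field(line, 79, 5)); // like: 79 volser +5
        System.out.println(field(line, 89, 4)); // like: 89 allocatedkb +4
        System.out.println(field(line, 93, 4)); // like: 93 usedkb +4
    }
}
```

Writing each field as a start column plus a length keeps the field widths visible in one place, which is the maintenance advantage of the relative form.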
 
Mike
 
 


From: [hidden email] [mailto:[hidden email]] On Behalf Of René Jansen
Sent: 06 August 2011 16:44
To: IBM Netrexx
Subject: Re: [Ibm-netrexx] parse behaviour with binary data
