Nested List Support?


Re: JProgressBar

Kermit Kiser

Thanks for the pastebin advice. However:

  • pastebin.com does not list NetRexx as a supported language, and other pastebins are no more likely to. No particular advantage there.
  • pastebin does not tell you how long pastes will last, from which I deduce a least-access garbage collection algorithm. For a small group like ours, stuff will probably vanish quickly.
  • Hence, for small code examples (I would arbitrarily limit them to ~100 LOC) that new NetRexxers might be looking for in the next few years, the list is a better archive.
  • If a discussion centers around a small code segment or is illuminated by it, I would prefer the code to remain with the discussion myself.
  • You won't see me using the list as a code repository for changeable products like the jEdit plugin or the JSR223 script engine as some persons would like to do with attachments.
You are welcome,
-- KK

On 12/7/2012 11:34 AM, Fernando Cassia wrote:


On Wed, Dec 5, 2012 at 6:36 PM, Kermit Kiser <[hidden email]> wrote:
------------------------------------------------------------------------------------------------------------------------------------------
import javax.swing.
import java.text.

Pastebin is your friend. :)

And thanks for the code snippet, btw :)
FC

--
During times of Universal Deceit, telling the truth becomes a revolutionary act
- George Orwell



_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/




Re: JProgressBar

ThSITC
Hi Kermit,

*a STRANGER* is talking now (Massa Thomas):

I do *not* understand your answer, *nor* the questions posed!

What are we talking about here, in all of the above ???

Frankly speaking, *I do not have any idea* !!!

Thomas.

PS: So, hence, *what is the issue, in Question, Please* ??????????????

--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria, Europe

Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com

Re: JProgressBar

Kermit Kiser
Thomas --

Yes, we know you don't understand. ;-)

Fernando is suggesting that source code should not be posted on the list but rather on external web sites called "pastebins" which have been created for that purpose and a simple link to the external site should then be included in the list post. This has the benefit of reducing the memory needed to store the list messages and reducing the amount of text mildly interested readers need to scan.

I am responding that I don't see much advantage to that as pastebins are temporary storage and large code items are permanent but changeable items that should be stored in a genuine code tracking repository whereas small code segments are usually better to paste inline in the list post so that future interested parties can find them by searching the list archives.

-- Kermit


Re: JProgressBar

ThSITC
Kermit,
 
   *I* do understand, as well!

I am, however, preparing just now, to deliver *any and all* source code I've ever written to:

www.kenai.com

Just in case I'm dying ...

OK?

Thomas.

PS:: Only *question* I do have, at the minute:

Shall I do so for making sources OPEN SOURCE there at my *ancient Projects*

PP, DB-123, etc, etc, ...

*or*

Simply at:

Project NetRexx    (in Contributors path, of course)
??????????????????????????????????????????????????????????????
Thomas.

--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria, Europe


Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com

Re: JProgressBar

Fernando Cassia-2

On Fri, Dec 7, 2012 at 9:06 PM, Kermit Kiser <[hidden email]> wrote:
pastebin does not tell you how long pastes will last, from which I deduce a least-access garbage collection algorithm. For a small group like ours, stuff will probably vanish quickly.

Hi Kermit.

I was just speaking from a readability point of view, as mailing lists tend to break long lines and make reading code awful.

With regard to 'expiry' of pastebins, that question is not covered in their FAQ (I've just emailed them asking to clarify), but pasting any new 'pastebin' (pardon the redundancy) anonymously (as guest) gives the indication that the pastebin does not expire, ever, unless someone reports it as unlawful content, i.e. links / URLs pointing towards pirate software, stolen data etc.

For example, an anonymous pastebin shows:
By: a guest on Dec 8th, 2012  |  syntax: None  |  size: 0.26 KB  |  hits: 0  |  expires: Never
^^^^^^

As usual, just a friendly suggestion and a useful web tool to keep in mind among your toolbox.

Best regards
FC

--
During times of Universal Deceit, telling the truth becomes a revolutionary act
- George Orwell



Re: JProgressBar

Fernando Cassia-2

On Fri, Dec 7, 2012 at 10:22 PM, Kermit Kiser <[hidden email]> wrote:
Thomas --

Yes, we know you don't understand. ;-)

You crack me up, Kermit.
:))

FC


Re: Nested List Support?

Kermit Kiser
Bill --

Your request has not been thrown on the garbage heap! I have been working on it quite a bit actually.

However, I think we must agree to disagree about some things. For example you ask me how I got my impression that you feel that the NetRexx data type could not possibly maintain the structure of a list, and then you repeat that exact view (these are your own words from below): "it can not inherently hold a list containing  strings and other lists of strings." I must disagree with statements like that as I feel that I have indeed proven that the Rexx object can do the job. I will even go as far as to say that I suspect that the NetRexx data structure is the only general purpose data type (as vs a custom class) in existence with the power to handle this problem/opportunity and it can do so with ease! Are you sure you fully understand how the Rexx object works and what it can do?

Likewise, I think I understand your point about the "difference between "parsing a string" and dividing up a list which has been encoded as a string", but my point was that you cannot have one without the other - you have to first be able to parse a string before you can understand and work with its list structure. (BTW: I did indeed encounter Lisp in my college days and vowed to stay as far away from it as possible! But you will find that I do understand car and cdr.) I just do not agree with your contention that "decomposition of the lists comes before the parsing of strings". It cannot be done that way in practice.

I am not going to address your questions about the best approach to adding the support you requested to NetRexx except to agree that messing with PARSE is a BAD idea! And while your BIF idea was a brilliant bypass to the objections to adding a constructor or factory to NetRexx for the List==>Map support, I am not convinced yet that it is the correct solution for this need.

That said, I am going to implement a solution to this problem. It may well require changes to the Rexx class for the same reasons that the List==>Map support did: You cannot extend the Rexx class and have full NetRexx functionality with it, so extended Rexx object capabilities may need to be added internally.

One of the things that irritates me the most about NetRexx is that it cannot parse an arbitrary length word list and convert it to an indexed data item without my carefully writing some kind of loop code each time. Often I can get by using the .words/.word BIFs, but what if I need to pass the word strings to a Java interface as an array? Then I wind up having to pass the word list to the Java String.split method to get what I want and feeling second class because NetRexx has no built in support for such a common and simple need. The feature you have requested not only meets my need and completes our NetRexx collections support, it gives NetRexx a capability which Java lacks as far as I know and puts it out front in language features again in my opinion.
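For readers who have not met the String.split workaround mentioned above, it looks roughly like this on the Java side. This is only a minimal sketch: the class and method names (WordListDemo, words) are made up for illustration, and the only library calls used are String.trim, String.isEmpty and String.split.

```java
import java.util.Arrays;

class WordListDemo {
    // Turn a blank-delimited word list into an array, ignoring
    // leading, trailing, and repeated whitespace.
    static String[] words(String list) {
        String trimmed = list.trim();
        if (trimmed.isEmpty()) {
            return new String[0];   // split("") would give one empty element
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // prints [alpha, beta, gamma]
        System.out.println(Arrays.toString(words("  alpha   beta gamma ")));
    }
}
```

The trim and the empty check are there because split alone would otherwise return an empty first element for leading blanks, or a one-element array for an empty list.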

I am not totally sure that you want to do something about this issue rather than just talking about it but I want to at least try it on for size and see how useful it might be. At one point, you asked me about an API although I don't think you have suggested anything beyond the need to provide three delimiters.

In theory an API requires a common agreement by a committee and thorough documentation before it can be implemented. But I am going to say something here that I hope René does not have me assassinated for later! We have an ARB (architecture) committee and a complex and detailed formal procedure for discussing, approving, and documenting such things. It is a good plan in theory. But I think that the procedure is about the only thing that has ever made it through the committee. I seriously doubt that any changes to NetRexx will ever happen that way. We just don't have the manpower.

As far as I can see, there has never been any significant progress made from whatever discussions have occurred about APIs towards implementing support for the major new features of Java like enums, generics, JSR223, JSR199, collections, annotations, closures etc. The only language change to be implemented so far was extending Loop Over to cover collections iterators and that was a major fight even though it did not require any language syntax changes. Is it because the committee members lack the time to pursue this project? Is it because they lack interest? Is it because the true goal of the structure is to make sure that nothing changes? I don't know. Reasonable caution is wise. Too much and NetRexx dies.

A few like René and I are making small changes and improvements but not much makes it into the official NetRexx. As you may have figured out by now, my own approach is more akin to the Agile software development movement - I code something, then try it out to see how it works and improve it iteratively or throw it out and start over. Coding things in NetRexx is so easy that it is a much faster way to gain understanding of a problem than any formal processes or discussions can possibly provide. It does not matter if you have to throw the code out and start over, because you have learned something and the effort was a helpful part of the whole process.

So I have been researching this problem (is that the right term?) extensively and hands-on, and I think I have made some good progress. (If you have guessed from the above that I already have a draft implementation, you are correct.) Here is what I have learned in the process:

The problem domain is something I call AttributeStringLists - that is, lists with elements that can be strings or AttributeStringLists or "named" AttributeStringLists (lists which have a name string associated). The closest Java data structure is something called an AttributeList but it cannot really handle the structure retention requirements of this concept. (An Attribute is just a data item with a name string and a value.)

The API actually requires four delimiters - start, end, separator, and escape.  The escape is needed because list elements are strings and therefore may contain the delimiter characters. Although the default delimiters still need to be decided for the API, I recommend that the default escape be the normal "\" NetRexx escape character.
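To show why the escape delimiter matters, here is a small Java sketch of a flat split that honours an escape character. It is an illustration under assumptions, not the prototype's code: the names EscapedSplit and split are hypothetical, only the separator and escape are handled (no start/end delimiters), and escape sequences are left untranslated in the result.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedSplit {
    // Split on `sep`, treating `esc` followed by any character as literal text.
    // The escape character itself is kept in the element strings.
    static List<String> split(String s, char sep, char esc) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == esc && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i)); // keep the escaped pair intact
            } else if (c == sep) {
                out.add(cur.toString());             // separator ends the element
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());                     // last element
        return out;
    }

    public static void main(String[] args) {
        // The input text is a\,b,c and the result has two elements: a\,b and c
        System.out.println(split("a\\,b,c", ',', '\\'));
    }
}
```

Without the escape, the first comma would split "a,b" into two elements even though it is part of the element text.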

In addition to the delimiter characters, the API requires at least four binary flags to handle variations in list syntax. The minimum four flags are these:

flag 1:
0 = separators follow sublists (data following a sublist but preceding a delimiter is discarded)
1 = separators do not follow sublists (sublists act as separator characters - any data following indicates a new element)
flag 2:
0 = adjacent separators produce empty elements
1 = adjacent separators reduce to one (used for word lists which may have multiple spaces separating elements)
flag 3:
0 = escape sequences are not translated (escape characters remain in element strings)
1 = escape sequences are translated (a second pass of list parse may fail)
flag 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR)
1 = whitespace is treated as text
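One possible way to model the four flags on the Java side is an enum plus an EnumSet. The sketch below is only an assumption about how that might look: the flag names are invented here (each constant corresponds to the "1" setting of a flag above), and the flat split demonstrates flags 2 and 4 only.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class ListFlagsDemo {
    // Hypothetical names; a flag present in the set means value 1.
    enum Flag { SUBLIST_ACTS_AS_SEPARATOR, COLLAPSE_SEPARATORS, TRANSLATE_ESCAPES, WHITESPACE_IS_TEXT }

    // Flat split that illustrates flags 2 and 4 only.
    static List<String> split(String s, char sep, Set<Flag> flags) {
        if (!flags.contains(Flag.WHITESPACE_IS_TEXT)) {
            s = s.replaceAll("[\\t\\f\\n\\r]", " ");  // flag 4 = 0: TAB, FF, LF, CR become blanks
        }
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= s.length(); i++) {
            if (i == s.length() || s.charAt(i) == sep) {
                String element = s.substring(start, i);
                // flag 2 = 1: drop the empty elements that adjacent separators produce
                if (!(element.isEmpty() && flags.contains(Flag.COLLAPSE_SEPARATORS))) {
                    out.add(element);
                }
                start = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String in = "a  b\tc";
        System.out.println(split(in, ' ', EnumSet.noneOf(Flag.class)).size());          // prints 4
        System.out.println(split(in, ' ', EnumSet.of(Flag.COLLAPSE_SEPARATORS)));       // prints [a, b, c]
    }
}
```

Note that in this sketch collapsing also drops empty elements caused by leading or trailing separators, which is what one usually wants for word lists.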

In addition to the parse list method itself, the API needs "car" and "cdr" methods at minimum: car returns the first or only element in a list. cdr returns: a single remaining element, or a single remaining list, or a sublist of remaining elements. Also needed is a method to signal if an element is itself a list.

Let me know if you think of anything I missed. The above information is good enough for a prototype implementation. The prototype demonstration program can be viewed at the link below although it is largely spaghetti code at this point and lacks safety and sanity checks (The code/API needs some refining and testing before it can be implemented as part of NetRexx itself.):

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

(I didn't paste the code inline this time as some seem not to like that. ;-)

But I will paste a snip from the program output showing some of the prototype API calls and what they produce.
Note that car and cdr do not do any parsing - it is all done in the parselist call.
Also "parselist" can accept delimiter and flag parameters.
-----------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------
 parsing this list:
in=fun( arg1(arg1a, arg1b), arg2(((nested)), z) )

 parseout=parselist(in)    --    parse list structure

say car(parseout)        --    first element of a list
fun( arg1(arg1a, arg1b), arg2(((nested)), z) )

say car(car(parseout))    --    first element of first element
 arg1(arg1a, arg1b)

say cdr(car(parseout))     --    remainder of first element
 arg2(((nested)), z)

say car(cdr(car(parseout)))     --    first element of remainder of first element
((nested))

say cdr(cdr(car(parseout)))     --    remainder of remainder of first element
 z

say car(car(cdr(car(parseout))))     --    first element of first element of remainder of first element
(nested)

say car(car(car(cdr(car(parseout)))))     --    first element of first element of first element of remainder of first element
nested
-----------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------
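The calls in the output above can be mimicked with a small tree structure. The following Java sketch is not the prototype linked earlier; it is a simplified illustration under assumptions: the delimiters are fixed to "(", ")", "," and "\", escapes are translated, text between a sublist and the next separator is discarded (flag 1 = 0), surrounding blanks are trimmed, and the names NestedListSketch, Node, parseList, car, cdr and isList are chosen here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NestedListSketch {
    // A node is a plain string (items == null) or a list with an optional name.
    static final class Node {
        final String text;       // element text, or the list name ("" if unnamed)
        final List<Node> items;  // null for a plain string element
        Node(String text, List<Node> items) { this.text = text; this.items = items; }
        boolean isList() { return items != null; }
        @Override public String toString() {
            if (!isList()) return text;
            List<String> parts = new ArrayList<>();
            for (Node n : items) parts.add(n.toString());
            return text + "(" + String.join(", ", parts) + ")";
        }
    }

    private final String src;
    private int pos = 0;
    private NestedListSketch(String src) { this.src = src; }

    // All parsing happens here; car and cdr only walk the finished tree.
    static Node parseList(String src) {
        return new Node("", new NestedListSketch(src).elements());
    }

    // Read elements up to the closing ')' of the current list or end of input.
    private List<Node> elements() {
        List<Node> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean afterList = false; // true between a sublist and the next separator
        while (pos < src.length()) {
            char c = src.charAt(pos++);
            if (c == '\\' && pos < src.length()) {
                char escaped = src.charAt(pos++);   // escaped character is literal text
                if (!afterList) cur.append(escaped);
            } else if (c == '(') {
                out.add(new Node(cur.toString().trim(), elements())); // cur holds the list name
                cur.setLength(0);
                afterList = true;
            } else if (c == ',' || c == ')') {
                String text = cur.toString().trim();
                boolean emptyList = c == ')' && out.isEmpty() && text.isEmpty();
                if (!afterList && !emptyList) out.add(new Node(text, null));
                cur.setLength(0);
                afterList = false;
                if (c == ')') return out;
            } else if (!afterList) {
                cur.append(c);                      // text after a sublist is discarded
            }
        }
        String tail = cur.toString().trim();
        if (!afterList && !tail.isEmpty()) out.add(new Node(tail, null));
        return out;
    }

    // First element of a list; a plain string (or empty list) is returned as is.
    static Node car(Node n) {
        return (n.isList() && !n.items.isEmpty()) ? n.items.get(0) : n;
    }

    // Remainder of a list: the single remaining element, or a list of the rest.
    static Node cdr(Node n) {
        if (!n.isList() || n.items.size() < 2) return new Node("", new ArrayList<>());
        if (n.items.size() == 2) return n.items.get(1);
        return new Node("", new ArrayList<>(n.items.subList(1, n.items.size())));
    }

    public static void main(String[] args) {
        Node root = parseList("fun( arg1(arg1a, arg1b), arg2(((nested)), z) )");
        System.out.println(car(car(root)));                // arg1(arg1a, arg1b)
        System.out.println(cdr(car(root)));                // arg2(((nested)), z)
        System.out.println(car(cdr(car(root))));           // ((nested))
        System.out.println(car(car(car(cdr(car(root)))))); // nested
    }
}
```

As in the output above, car and cdr do no parsing of their own; the difference is that this sketch prints a normalized form of each element rather than the original substring, so leading blanks are not preserved.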
There's still time to get those API requests in folks!

-- Kermit


On 12/5/2012 5:16 PM, Bill Fenlason wrote:
Kermit,

On 12/5/2012 7:02 PM, Kermit Kiser wrote:
Bill --

Sometimes I wonder if we live on the same planet. ;-)

You are not alone :)

I don't understand how you think you can parse a string but not parse it in order. Nor do I understand why you would want a halfway solution that does not fully parse a string.

There is a difference between "parsing a string" and dividing up a list which has been encoded as a string.  With a list, it is natural to request the items in the list one at a time, and the items in a list may be either a string or another list.  Possibly you are not familiar with list processing (ala Lisp) and find this confusing, but I assume that is not the case. 

Historical note: Original Lisp (for the IBM 704 in assembler) used a tree structure and used the contents of car (address register) and cdr (decrement register) to extract the first (next) item in a list and the remainder of the list.  In René's example, he shows how to strip the first or next list item using the parse instruction but used the ancient names which are still in common use after all these years.

The essential point here is that one of the ways that would make sense in NetRexx would be to encode a list as a string.  In other words, a "super" string which is a "list of strings" or a list of: "strings" and "lists of strings".

In no way did I mean to imply that the parsing was "half way", but just that the decomposition of the lists comes before the parsing of strings.  First a list is processed, and then sublists are processed as necessary.

I am sorry if you did not understand my code example, but I am not sure it can be simplified further. As I said in my post,  "Handling this type of syntax is way beyond what a parse instruction can do and I think this example shows that the general case is not trivial."

I didn't say that I didn't understand your code - it was well written and clear.

Possibly you skipped that last phrase?

The phrase I meant was "while still retaining the list structure."  Parsing a single string is not the same as breaking up a list (which happens to be encoded as a single string).

I try things out with code because I don't think in abstract logic like you and Mike seem to do. So I provide working code examples to show my thoughts here. But then you say that my code has to scan strings one character at a time which is no more true or false than saying the PARSE instruction has to scan strings one character at a time. (Or do you really think that PARSE does not look at all of the characters?)

My point there was that I was asking if an approach that could be used would be to extend the parse command.  I am looking for the best general purpose approach to handle the nested list problem.  I agree that my question is more abstract than specific.

You also seem to feel that the lowly NetRexx data type could not possibly maintain the structure of a list but I think that the Rexx object is the most powerful data structure ever invented. It can not only hold strings and numbers, it can hold lists and maps and do amazing things with them and each one is a complete associative database! (And even more features are in the advanced after3.01 NetRexx version!)

I don't know how you came to that conclusion - what did I say that gave you that idea?  All I was asking was how to make the Rexx string object hold a list of strings and other (nested) lists of strings.  Certainly the Rexx object can hold a simple list of strings.  But it can not inherently hold a list containing  strings and other lists of strings.  External conventions for list delimiters must be provided.  Possibly as an extension they could be added as fields in the Rexx object.

Since I think that way, I will try again to explain what I mean with a code example. I modified my original sample program and added a method to reconstruct a parsed list, showing at each stage of reconstruction what list structure data can be extracted from the parsed string object. I even showed how you can transform one list syntax to another with the example parsed list Rexx object. (Your new example is basically the same structure with different delimiters, so the same code handles both examples fine.) Just ignore it if you still don't believe it can be done.

I certainly understand that it can be done, Kermit, and your code obviously demonstrates it. 

But the code itself does not provide an answer to the original question I asked, which was "If NetRexx or Rexx were to be extended to allow convenient parsing of nested lists, how should it be approached?"

In retrospect, perhaps I should have replaced the word "parsing" with "deconstruction". 

I provided 6 possibilities, and perhaps your code could be the basis of possibility 3 (built in functions), although there doesn't seem to be a clear API.  It certainly demonstrates an example, but obviously I'm trying to avoid that level of user coding for the general case. 

I was asking "what is the best approach?", not "can it be done?" or "is there a code snippet that  can be used?".

I thought my original post asked a single question and was reasonably clear, but apparently I was wrong about that.

BTW: PARSE is intended for very simple parsing problems. That is why RexxLA started the RegRexx project to provide a more sophisticated pattern matching and parsing facility with a simpler syntax and more flexibility than regex has. (It remains to be seen if that can be done.) I think that is also why Mike included the verify and translate, etc, mechanisms to handle more complex parsing needs.

Yes, that is what Mike said as well, and I agree in general.  I suggested the possibility of extending the parse statement by adding a functional notation in the template, but Mike said he considered and rejected it some time ago.

-- Kermit

Bill




Re: Nested List Support?

billfen
Kermit,

> It is clear that you have spent considerable time on this.  I'm hopeful that with some careful explanations we may get to the point that we clearly understand each other, although there may still be some disagreements.

> I'll imbed my comments, but I will color and mark them so that those with limited email clients can distinguish my words from yours.

On 12/9/2012 5:15 PM, Kermit Kiser wrote:
Bill --

Your request has not been thrown on the garbage heap! I have been working on it quite a bit actually.

However, I think we must agree to disagree about some things. For example you ask me how I got my impression that you feel that the NetRexx data type could not possibly maintain the structure of a list, and then you repeat that exact view (these are your own words from below): "it can not inherently hold a list containing  strings and other lists of strings." I must disagree with statements like that as I feel that I have indeed proven that the Rexx object can do the job. I will even go as far as to say that I suspect that the NetRexx data structure is the only general purpose data type (as vs a custom class) in existence with the power to handle this problem/opportunity and it can do so with ease! Are you sure you fully understand how the Rexx object works and what it can do?
> This needs a careful clarification on my part.  I believe that I've stated that NetRexx CAN easily maintain the structure of a SIMPLE list.  In other words, it handles something like a list of blank or other delimited tokens or token sequences.  Thus "This is a word list" or "comma separated , multi word , list elements , etc. " are simple lists (in the form of strings) which can easily be handled by the parse statement. 

> In my statement that you quoted ("it can not inherently hold a list containing  strings and other lists of strings."), the key word is INHERENTLY.

> Here are a few definitions for the word "inherent" via google:
>    "existing in something as a permanent, essential, or characteristic attribute"

>    "existing as an inseparable part; intrinsic"
>    "involved in the constitution or essential character of something : belonging by nature or habit"
>    "existing in someone or something as a permanent and inseparable element, quality, or attribute"
(apologies to those who think this is pedantic)

> In other words, I was NOT saying that the Rexx object in NetRexx is incapable of containing an encoded "list containing strings and other lists of strings".  It is pretty obvious that it can be done by adopting encoding conventions such as the use of '{' and '}' (or other characters or character sequences) to surround an embedded list of strings.  Certainly your code snippets demonstrated that.

> What I WAS saying is that the Rexx object and NetRexx do not have that capability INHERENTLY.  There is currently nothing "built in" to the Rexx object and NetRexx which defines or processes any method of encoding a list of strings so that it can be contained in another string.  The parse statement can not handle it, and there is no other statement which INHERENTLY handles it. 

> In order to do it in your snippets, you had to define an encoded list as being surrounded by specific characters.  That does not rule out other methods of encoding lists into strings.  I think the use of "begin", "end" and "separator" characters may be the easiest way to do it, but there may be other ways that it can be done.  NetRexx does not INHERENTLY provide a mechanism for encoding and decoding them.

> It would be presumptuous of me to quantify my understanding of the Rexx object, but I have probably spent more time examining its internal structure and the very numerous methods (in the NetRexxR Rexx.nrx module) than most.

Likewise, I think I understand your point about the "difference between "parsing a string" and dividing up a list which has been encoded as a string", but my point was that you cannot have one without the other - you have to first be able to parse a string before you can understand and work with its list structure. (BTW: I did indeed encounter Lisp in my college days and vowed to stay as far away from it as possible! But you will find that I do understand car and cdr.) I just do not agree with your contention that "decomposition of the lists comes before the parsing of strings". It cannot be done that way in practice.

> Kermit, what you are saying is simply not true.  For example, consider a method which is passed a string which contains an encoded list.  The purpose of the method is to print out the elements in the list, while ignoring any nested lists of elements.  Obviously the method first decomposes the list into elements and nested lists, and prints only the elements.  The nested lists are not decomposed.  The parse statement is not used at all.  The decoding process (as described in your snippets, for example) does not use the parse statement.

> Perhaps you are assuming that the nested lists contain the same kinds of elements as the main list.  That is not necessarily true.  For example, suppose the primary list contains the names of states, and each state name is followed by a nested list of the major highway names and numbers running through the state.  It would certainly be reasonable to want to print only the state names without decoding the lists of highway identifiers.
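
The scenario described above (print the top-level elements and leave the nested lists undecoded) can be sketched as follows. This is only an illustration in Python rather than NetRexx: the function names, the '{', '}' and ',' delimiters, the single-character delimiter restriction, and the state/highway sample data are all assumptions made for the example, not part of any proposed API.

```python
def top_level_items(encoded, start="{", end="}", sep=","):
    """Split an encoded list into its top-level items.

    Nested lists are returned as still-encoded strings; they are never
    decoded. Assumes single-character delimiters (illustrative only).
    """
    if not (encoded.startswith(start) and encoded.endswith(end)):
        raise ValueError("not an encoded list")
    items, current, depth = [], [], 0
    for ch in encoded[1:-1]:
        if ch == start:
            depth += 1
        elif ch == end:
            depth -= 1
        if ch == sep and depth == 0:
            # A separator outside any nested list ends the current item.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return items


def plain_elements(encoded, start="{", end="}", sep=","):
    """Return only the string elements, skipping nested lists untouched."""
    return [item for item in top_level_items(encoded, start, end, sep)
            if not item.startswith(start)]


# Hypothetical sample data: state names, each followed by a nested list.
states = "{Ohio, {I-70, I-71, US-23}, Utah, {I-15, I-80}}"
print(plain_elements(states))
```

The scan still looks at every character in order to track nesting depth, but the nested lists themselves are handed back as opaque strings.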

> I'm not a Lisp programmer, and unfortunately there were no computer courses when I went to college.  But I did run across Lisp when I was playing with emacs many years ago, and I feel that many programmers will benefit from reading http://deptinfo.unice.fr/~roy/sicp.pdf .  Not understanding the fundamental ideas of Lisp leaves a hole in a programmer's education, in my opinion.


I am not going to address your questions about the best approach to adding the support you requested to NetRexx except to agree that messing with PARSE is a BAD idea! And while your BIF idea was a brilliant bypass to the objections to adding a constructor or factory to NetRexx for the List==>Map support, I am not convinced yet that it is the correct solution for this need.

> Yes, as Mike recommended, extending the parse statement is not the way to go.

That said, I am going to implement a solution to this problem. It may well require changes to the Rexx class for the same reasons that the List==>Map support did: You cannot extend the Rexx class and have full NetRexx functionality with it, so extended Rexx object capabilities may need to be added internally.

> As you know, I started this whole thing by asking the general question about what general approach would be best.  I'm not sure there is any agreement on that yet.

One of the things that irritates me the most about NetRexx is that it cannot parse an arbitrary length word list and convert it to an indexed data item without my carefully writing some kind of loop code each time. Often I can get by using the .words/.word BIFs, but what if I need to pass the word strings to a Java interface as an array? Then I wind up having to pass the word list to the Java String.split method to get what I want and feeling second class because NetRexx has no built in support for such a common and simple need. The feature you have requested not only meets my need and completes our NetRexx collections support, it gives NetRexx a capability which Java lacks as far as I know and puts it out front in language features again in my opinion.

> Mike addressed this in his comment.

> Can't you write a general purpose subroutine (method) that is passed a Rexx string and a target Rexx (to be indexed) string or String array which does this?  Perhaps a Rexx object method?

I am not totally sure that you want to do something about this issue rather than just talking about it but I want to at least try it on for size and see how useful it might be. At one point, you asked me about an API although I don't think you have suggested anything beyond the need to provide three delimiters. In theory an API requires a common agreement by a committee and thorough documentation before it can be implemented. But I am going to say something here that I hope René does not have me assassinated for later! We have an ARB (architecture) committee and a complex and detailed formal procedure for discussing, approving, and documenting such things. It is a good plan in theory. But I think that the procedure is about the only thing that has ever made it through the committee. I seriously doubt that any changes to NetRexx will ever happen that way. We just don't have the manpower. As far as I can see, there has never been any significant progress made from whatever discussions have occurred about APIs towards implementing support for the major new features of Java like enums, generics, JSR223, JSR199, collections, annotations, closures etc. The only language change to be implemented so far was extending Loop Over to cover collections iterators and that was a major fight even though it did not require any language syntax changes. Is it because the committee members lack the time to pursue this project? Is it because they lack interest? Is it because the true goal of the structure is to make sure that nothing changes? I don't know. Reasonable caution is wise. Too much and NetRexx dies. A few like René and I are making small changes and improvements but not much makes it into the official NetRexx. As you may have figured out by now, my own approach is more akin to the Agile software development movement - I code something, then try it out to see how it works and improve it iteratively or throw it out and start over. 
Coding things in NetRexx is so easy that it is a much faster way to gain understanding of a problem than any formal processes or discussions can possibly provide. It does not matter if you have to throw the code out and start over, because you have learned something and the effort was a helpful part of the whole process.

> I think it is wise to be cautious with regard to language changes, and the bureaucracy is kind of like the US Congress - designed to slow things down.  But there is nothing to stop the implementation of experimental things in your branch.  I've thrown away too much code because of thoughtless design to agree with your approach in general, but I also think that over-design ends up with unnecessary coding.

So I have been researching this problem (is that the right term?) extensively with hands on, and I think I have made some good progress. (If you have guessed from the above that I already have a draft implementation, you are correct.) Here is what I have learned in the process:

The problem domain is something I call AttributeStringLists - that is, lists with elements that can be strings or AttributeStringLists or "named" AttributeStringLists (lists which have a name string associated). The closest Java data structure is something called an AttributeList but it cannot really handle the structure retention requirements of this concept. (An Attribute is just a data item with a name string and a value.)

> I'm not sure that it is necessary or desirable to attach names to StringLists.  Getting to sound like XML :)

The API actually requires four delimiters - start, end, separator, and escape.  The escape is needed because list elements are strings and therefore may contain the delimiter characters. Although the default delimiters still need to be decided for the API, I recommend that the default escape be the normal "\" NetRexx escape character.

In addition to the delimiter characters, the API requires at least four binary flags to handle variations in list syntax. The minimum four flags are these:

flag 1:
0 = separators follow sublists (data following a sublist but preceding a delimiter is discarded)
1 = separators do not follow sublists (sublists act as separator characters - any data following indicates a new element)
flag 2:
0 = adjacent separators produce empty elements
1 = adjacent separators reduce to one (used for word lists which may have multiple spaces separating elements)
flag 3:
0 = escape sequences are not translated (escape characters remain in element strings)
1 = escape sequences are translated (a second pass of list parse may fail)
flag 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR)
1 = whitespace is treated as text
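
As a rough illustration of the four delimiters and two of the flags listed above, here is a Python sketch (not the prototype, and not NetRexx). Everything in it is an assumption made for the example: the name parse_list, the keyword arguments, single-character delimiters, and the decision to return nested Python lists. Only flag 2 (collapse_separators) and flag 3 (translate_escapes) are modeled; text that follows a sublist is kept as a new element, roughly the flag 1 = 1 behavior, and flag 4 is not modeled.

```python
def parse_list(text, start="(", end=")", sep=",", esc="\\",
               collapse_separators=False, translate_escapes=False):
    """Decode text into nested Python lists of strings (illustrative only)."""
    pos = 0

    def parse_items(nested):
        nonlocal pos
        result, buf, after_sublist = [], [], False

        def close_element():
            nonlocal buf, after_sublist
            element = "".join(buf).strip()
            # Keep real text always; keep an empty element only when it does
            # not simply trail a sublist and separators are not collapsed.
            if element or not (after_sublist or collapse_separators):
                result.append(element)
            buf, after_sublist = [], False

        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == esc and pos < len(text):
                # The escaped character is always taken literally.
                if not translate_escapes:
                    buf.append(ch)
                buf.append(text[pos])
                pos += 1
            elif ch == start:
                # Text directly before a sublist becomes its own element.
                before = "".join(buf).strip()
                if before:
                    result.append(before)
                buf = []
                result.append(parse_items(True))
                after_sublist = True
            elif ch == end and nested:
                close_element()
                return result
            elif ch == sep:
                close_element()
            else:
                buf.append(ch)
        if nested:
            raise ValueError("missing end delimiter")
        close_element()
        return result

    return parse_items(False)


sample = "{ 1 , 2 3 , { 9 8 , { 7 , 6 } } , , 4 5  }"
print(parse_list(sample, start="{", end="}"))
print(parse_list(sample, start="{", end="}", collapse_separators=True))
```

With flag 2 off, the adjacent separators in the sample produce an empty element; with it on, they reduce to one separator and the empty element disappears.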

> I'm not sure all of these are needed.  Some or all could be defaulted.  I think the general approach used by the parse statement could be considered.  Simplicity should not be underestimated.

In addition to the parse list method itself, the API needs "car" and "cdr" methods at minimum: car returns the first or only element in a list. cdr returns: a single remaining element, or a single remaining list, or a sublist of remaining elements. Also needed is a method to signal if an element is itself a list.

> car should return the first item in a list, which is either an element or a nested list.  cdr should return the rest of the list (i.e., the second (next) item, or the equivalent of null).  I don't think the names car and cdr should be used.
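
> A small Python sketch of the accessors as described in the comment above, assuming the list has already been decoded into nested lists. The names first_item, rest_of_list and is_list are placeholders chosen for this example (no names have been agreed on), and None stands in for "the equivalent of null". The sample data is invented.

```python
def first_item(lst):
    """Return the first item (an element or a nested list), or None if empty."""
    return lst[0] if lst else None


def rest_of_list(lst):
    """Return everything after the first item, or None when nothing remains."""
    rest = lst[1:]
    return rest if rest else None


def is_list(item):
    """Report whether an item is itself a (nested) list."""
    return isinstance(item, list)


decoded = ["Ohio", ["I-70", "I-71"], "Utah"]
print(first_item(decoded))                         # a plain element
print(rest_of_list(decoded))                       # the remaining items
print(is_list(first_item(rest_of_list(decoded))))  # second item is a list
```

> Note that neither accessor does any parsing; both only walk a structure that was decoded earlier.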


Let me know if you think of anything I missed. The above information is good enough for a prototype implementation. The prototype demonstration program can be viewed at the link below although it is largely spaghetti code at this point and lacks safety and sanity checks (The code/API needs some refining and testing before it can be implemented as part of NetRexx itself.):

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

> I will take a look at this, but I'm more the "let's define things before jumping into code" type - I guess that's why you think we are from different planets, and I wouldn't argue it :)

Bill

(I didn't paste the code inline this time as some seem not to like that. ;-)

But I will paste a snip from the program output showing some of the prototype API calls and what they produce.
Note that car and cdr do not do any parsing - it is all done in the parselist call.
Also "parselist" can accept delimiter and flag parameters.
-----------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------
 parsing this list:
in=fun( arg1(arg1a, arg1b), arg2(((nested)), z) )

 parseout=parselist(in)    --    parse list structure

say car(parseout)        --    first element of a list
fun( arg1(arg1a, arg1b), arg2(((nested)), z) )

say car(car(parseout))    --    first element of first element
 arg1(arg1a, arg1b)

say cdr(car(parseout))     --    remainder of first element
 arg2(((nested)), z)

say car(cdr(car(parseout)))     --    first element of remainder of first element
((nested))

say cdr(cdr(car(parseout)))     --    remainder of remainder of first element
 z

say car(car(cdr(car(parseout))))     --    first element of first element of remainder of first element
(nested)

say car(car(car(cdr(car(parseout)))))     --    first element of first element of first element of remainder of first element
nested
-----------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------
There's still time to get those API requests in folks!

-- Kermit


On 12/5/2012 5:16 PM, Bill Fenlason wrote:
Kermit,

On 12/5/2012 7:02 PM, Kermit Kiser wrote:
Bill --

Sometimes I wonder if we live on the same planet. ;-)

You are not alone :)

I don't understand how you think you can parse a string but not parse it in order. Nor do I understand why you would want a halfway solution that does not fully parse a string.

There is a difference between "parsing a string" and dividing up a list which has been encoded as a string.  With a list, it is natural to request the items in the list one at a time, and the items in a list may be either a string or another list.  Possibly you are not familiar with list processing (ala Lisp) and find this confusing, but I assume that is not the case. 

Historical note: Original Lisp (for the IBM 704 in assembler) used a tree structure and used car (contents of the address part of the register) and cdr (contents of the decrement part of the register) to extract the first (next) item in a list and the remainder of the list.  In Rene's example, he shows how to strip the first or next list item using the parse instruction but used the ancient names which are still in common use after all these years.

The essential point here is that one of the ways that would make sense in NetRexx would be to encode a list as a string.  In other words, a "super" string which is a "list of strings" or a list of: "strings" and "lists of strings".

In no way did I mean to imply that the parsing was "half way", but just that the decomposition of the lists comes before the parsing of strings.  First a list is processed, and then sublists are processed as necessary.

I am sorry if you did not understand my code example, but I am not sure it can be simplified further. As I said in my post,  "Handling this type of syntax is way beyond what a parse instruction can do and I think this example shows that the general case is not trivial."

I didn't say that I didn't understand your code - it was well written and clear.

Possibly you skipped that last phrase?

The phrase I meant was "while still retaining the list structure."  Parsing a single string is not the same as breaking up a list (which happens to be encoded as a single string).

I try things out with code because I don't think in abstract logic like you and Mike seem to do. So I provide working code examples to show my thoughts here. But then you say that my code has to scan strings one character at a time which is no more true or false than saying the PARSE instruction has to scan strings one character at a time. (Or do you really think that PARSE does not look at all of the characters?)

My point there was that I was asking if an approach that could be used would be to extend the parse command.  I am looking for the best general purpose approach to handle the nested list problem.  I agree that my question is more abstract than specific.

You also seem to feel that the lowly NetRexx data type could not possibly maintain the structure of a list but I think that the Rexx object is the most powerful data structure ever invented. It can not only hold strings and numbers, it can hold lists and maps and do amazing things with them and each one is a complete associative database! (And even more features are in the advanced after3.01 NetRexx version!)

I don't know how you came to that conclusion - what did I say that gave you that idea?  All I was asking was how to make the Rexx string object hold a list of strings and other (nested) lists of strings.  Certainly the Rexx object can hold a simple list of strings.  But it can not inherently hold a list containing  strings and other lists of strings.  External conventions for list delimiters must be provided.  Possibly as an extension they could be added as fields in the Rexx object.

Since I think that way, I will try again to explain what I mean with a code example. I modified my original sample program and added a method to reconstruct a parsed list, showing at each stage of reconstruction what list structure data can be extracted from the parsed string object. I even showed how you can transform one list syntax to another with the example parsed list Rexx object. (Your new example is basically the same structure with different delimiters, so the same code handles both examples fine.) Just ignore it if you still don't believe it can be done.

I certainly understand that it can be done, Kermit, and your code obviously demonstrates it. 

But the code itself does not provide an answer to the original question I asked, which was "If NetRexx or Rexx were to be extended to allow convenient parsing of nested lists, how should it be approached?"

In retrospect, perhaps I should have replaced the word "parsing" with "deconstruction". 

I provided 6 possibilities, and perhaps your code could be the basis of possibility 3 (built in functions), although there doesn't seem to be a clear API.  It certainly demonstrates an example, but obviously I'm trying to avoid that level of user coding for the general case. 

I was asking "what is the best approach?", not "can it be done?" or "is there a code snippet that  can be used?".

I thought my original post asked a single question and was reasonably clear, but apparently I was wrong about that.

BTW: PARSE is intended for very simple parsing problems. That is why RexxLA started the RegRexx project to provide a more sophisticated pattern matching and parsing facility with a simpler syntax and more flexibility than regex has. (It remains to be seen if that can be done.) I think that is also why Mike included the verify and translate, etc, mechanisms to handle more complex parsing needs.

Yes, that is what Mike said as well, and I agree in general.  I suggested the possibility of extending the parse statement by adding a functional notation in the template, but Mike said he considered and rejected it some time ago.

-- Kermit

Bill




_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/


Re: Nested List Support?

Jeff Hennick
In reply to this post by Kermit Kiser
Kermit and all,

In the spirit of MFC's dictum of getting the definition right before any consideration of implementation, I offer this:

I understand your candidate BIF for Nested List Support would be:

parselist([sep][,start,end[,esc[,flags]]])
returns an indexed string of the elements of string broken at, and not including, sep. The default character for sep is comma.  Sublists may be considered as a single element beginning with start and ending with end.  If start and end are not specified or are null, no sublisting is performed. If any of sep, start, or end are in an element, it is escaped by preceding it with esc.  If no esc is specified or it is null, no escaping is performed.  sep, start, end, and esc are strings.  flags are ???????

[A good description of the output and examples goes here!]
[I have no idea how the concept of "flags" fits into a NetRexx BIF.  It would seem that more optional individual parameters are "standard", but that is not pretty: NetRexx is to be readable.]
[I have made esc default to null rather than "\" as a way to "turn off" escaping, which seems difficult otherwise.]

I also have internal comments below.

On 12/9/2012 5:15 PM, Kermit Kiser wrote:

The API actually requires four delimiters - start, end, separator, and escape.  The escape is needed because list elements are strings and therefore may contain the delimiter characters. Although the default delimiters still need to be decided for the API, I recommend that the default escape be the normal "\" NetRexx escape character.

In addition to the delimiter characters, the API requires at least four binary flags to handle variations in list syntax. The minimum four flags are these:

flag 1:
0 = separators follow sublists (data following a sublist but preceding a delimiter is discarded)
Could you give a real life example of when this would be used?  My gut feeling is this is just a malformed string.
1 = separators do not follow sublists (sublists act as separator characters - any data following indicates a new element)
flag 2:
0 = adjacent separators produce empty elements
1 = adjacent separators reduce to one (used for word lists which may have multiple spaces separating elements)
I think flag 3 is covered above, along with start and end.
flag 3:
0 = escape sequences are not translated (escape characters remain in element strings)
1 = escape sequences are translated (a second pass of list parse may fail)
With translate and trim BIFs, is flag 4 necessary?
flag 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR)
1 = whitespace is treated as text

In addition to the parse list method itself, the API needs "car" and "cdr" methods at minimum: car returns the first or only element in a list. cdr returns: a single remaining element, or a single remaining list, or a sublist of remaining elements. Also needed is a method to signal if an element is itself a list.

Let me know if you think of anything I missed.
The original "presenting problem" was in terms of Python tuples and lists.  I see this as a super-set / sub-set of parsing CSV strings.  As such I'd like the candidate BIF to accept CSV strings, at least as defined in RFC4180 ( http://tools.ietf.org/html/rfc4180 ), which adds the concept of quoted strings to be taken as a single element.  This would seem to require another flag for stripping/leaving outer single/double quote marks. (Or simply define them as being stripped.)  CSV as generated by Excel spreadsheets would be a plus (or may already be covered.)  Or, this may be too much overload for one BIF.
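
For reference, the quoted-string behavior of RFC 4180 can be seen with Python's standard csv module, shown here only to illustrate what "a quoted string taken as a single element" means; it says nothing about how a NetRexx BIF would implement it. The sample record is invented for the example.

```python
import csv
import io

# One CSV record: the second field contains a comma, the third contains
# embedded double quotes (written as "" inside a quoted field).
record = 'Ohio,"Columbus, OH","He said ""hi"""\r\n'

rows = list(csv.reader(io.StringIO(record, newline="")))
fields = rows[0]
print(len(fields))
for field in fields:
    print(field)
```

The reader returns three fields with the outer quote marks stripped and the doubled quotes reduced to single ones, which corresponds to the "simply define them as being stripped" option.
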
The above information is good enough for a prototype implementation. The prototype demonstration program can be viewed at the link below although it is largely spaghetti code at this point and lacks safety and sanity checks (The code/API needs some refining and testing before it can be implemented as part of NetRexx itself.):

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

(I didn't paste the code inline this time as some seem not to like that. ;-)
There's still time to get those API requests in folks!

-- Kermit
Jeff



Re: Nested List Support?

Kermit Kiser
Hi Jeff --

Thanks for the suggestions on CSV and the API. I will look into those.

Some comments added below.

-- Kermit

On 12/9/2012 7:24 PM, Jeff Hennick wrote:
Kermit and all,

In the spirit of MFC's dictum of getting the definition right before any consideration of implementation, I offer this:

I understand your candidate BIF for Nested List Support would be:

parselist([sep][,start,end[,esc[,flags]]])
returns an indexed string of the elements of string broken at, and not including, sep. The default character for sep is comma.  Sublists may be considered as a single element beginning with start and ending with end.  If start and end are not specified or are null, no sublisting is performed. If any of sep, start, or end are in an element, it is escaped by preceding it with esc.  If no esc is specified or it is null, no escaping is performed.  sep, start, end, and esc are strings.  flags are ???????

[A good description of the output and examples goes here!]
[I have no idea how the concept of "flags" fits into a NetRexx BIF.  It would seem that more optional individual parameters are "standard", but that is not pretty: NetRexx is to be readable.]
[I have made esc default to null rather than "\" as a way to "turn off" escaping, which seems difficult otherwise.]

You make some good suggestions. What I termed "flags" are generally called options in the NetRexx BIF documentation (with the exception of the .format BIF where the Scientific or Engineering option is called "exform" presumably to clarify what it affects). For example the .strip BIF has an option that can be L, T, B, or null. I chose binary flags in my prototype in the hope of keeping the usage decisions simple. I probably should have used the term "options" rather than flags. In the current prototype code those options can be defaulted by omission (from the right) or passing a blank but that API can be easily changed.
I also have internal comments below.

On 12/9/2012 5:15 PM, Kermit Kiser wrote:

The API actually requires four delimiters - start, end, separator, and escape.  The escape is needed because list elements are strings and therefore may contain the delimiter characters. Although the default delimiters still need to be decided for the API, I recommend that the default escape be the normal "\" NetRexx escape character.

In addition to the delimiter characters, the API requires at least four binary flags to handle variations in list syntax. The minimum four flags are these:

flag 1:
0 = separators follow sublists (data following a sublist but preceding a delimiter is discarded)
Could you give a real life example of when this would be used?  My gut feeling is this is just a malformed string.
LOL. You are both right and wrong there! If you look at the examples provided by Bill, you can see that he sometimes inserts spaces after end list delimiters. This is a malformed list as there is no separator, but it seems to be done deliberately to improve readability for the human eye. Those extra characters are probably not part of any element (but they could be, hence the option):

fun( arg1(arg1a, arg1b), arg2(((nested)), z) )
 { 1 , 2 3 , { 9 8 , { 7 , 6 } } , , 4 5  }

1 = separators do not follow sublists (sublists act as separator characters - any data following indicates a new element)
flag 2:
0 = adjacent separators produce empty elements
1 = adjacent separators reduce to one (used for word lists which may have multiple spaces separating elements)
I think flag 3 is covered above, along with start and end.
flag 3:
0 = escape sequences are not translated (escape characters remain in element strings)
1 = escape sequences are translated (a second pass of list parse may fail)
With translate and trim BIFs, is flag 4 necessary?
It is in the case of using blanks for separators because they may look the same to the eye when displayed. You probably don't want them creating elements! (But again, you might.)
flag 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR)
1 = whitespace is treated as text

In addition to the parse list method itself, the API needs "car" and "cdr" methods at minimum: car returns the first or only element in a list. cdr returns: a single remaining element, or a single remaining list, or a sublist of remaining elements. Also needed is a method to signal if an element is itself a list.

Let me know if you think of anything I missed.
The original "presenting problem" was in terms of Python tuples and lists.  I see this as a super-set / sub-set of parsing CSV strings.  As such I'd like the candidate BIF to accept CSV strings, at least as defined in RFC4180 ( http://tools.ietf.org/html/rfc4180 ), which adds the concept of quoted strings to be taken as a single element.  This would seem to require another flag for stripping/leaving outer single/double quote marks. (Or simply define them as being stripped.)  CSV as generated by Excel spreadsheets would be a plus (or may already be covered.)  Or, this may be too much overload for one BIF.
This is a great idea. I actually considered the issue of quoted strings but have not had time to add logic and API for them. I have not used Python, but a quick check shows me that the Python dictionary is close to the NetRexx data type though not as flexible. A dictionary is a map (keys to values) which every Rexx object implicitly contains along with its atomic (string) value. A list or tuple is simply a special case of a map where the keys are integers in natural order. Those Python data structures should translate to Rexx objects quite easily.
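
A small Python sketch of the "list is a special case of a map" observation. The 1-based keys are an assumption chosen to resemble Rexx word numbering; nothing here is NetRexx API.

```python
# A list is a map whose keys are consecutive integers.
# Keys start at 1 here to resemble Rexx word numbering (an assumption).
items = ["alpha", "beta", "gamma"]
as_map = {index: value for index, value in enumerate(items, start=1)}

# Going back: read the values in key order.
as_list = [as_map[key] for key in sorted(as_map)]

# A map value can itself be a map, which is how a sublist would nest.
as_map[4] = {1: "delta", 2: "epsilon"}

print(as_map)
print(as_list)
```
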
The above information is good enough for a prototype implementation. The prototype demonstration program can be viewed at the link below although it is largely spaghetti code at this point and lacks safety and sanity checks (The code/API needs some refining and testing before it can be implemented as part of NetRexx itself.):

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

(I didn't paste the code inline this time as some seem not to like that. ;-)
There's still time to get those API requests in folks!

-- Kermit
Jeff





Re: Nested List Support?

Kermit Kiser
In reply to this post by billfen
Bill --

My apology for taking so long to respond again. I have too many projects in progress. And this topic is not a trivial one. My research approach is a combination of much Googling along with trying various things in code as I mentioned before.

Also I have no problem agreeing to disagree as it appears that is likely at least in some areas. I will add some responses to some of your comments below hopefully with color that passes through.


On 12/9/2012 6:43 PM, Bill Fenlason wrote:
Kermit,

> It is clear that you have spent considerable time on this.  I'm hopeful that with some careful explanations we may get to the point that we clearly understand each other, although there may still be some disagreements.

> I'll imbed my comments, but I will color and mark them so that those with limited email clients can distinguish my words from yours.

On 12/9/2012 5:15 PM, Kermit Kiser wrote:
Bill --

Your request has not been thrown on the garbage heap! I have been working on it quite a bit actually.

However, I think we must agree to disagree about some things. For example you ask me how I got my impression that you feel that the NetRexx data type could not possibly maintain the structure of a list, and then you repeat that exact view (these are your own words from below): "it can not inherently hold a list containing  strings and other lists of strings." I must disagree with statements like that as I feel that I have indeed proven that the Rexx object can do the job. I will even go as far as to say that I suspect that the NetRexx data structure is the only general purpose data type (as vs a custom class) in existence with the power to handle this problem/opportunity and it can do so with ease! Are you sure you fully understand how the Rexx object works and what it can do?
> This needs a careful clarification on my part.  I believe that I've stated that NetRexx CAN easily maintain the structure of a SIMPLE list.  In other words, it handles something like a list of blank or other delimited tokens or token sequences.  Thus "This is a word list" or "comma separated , multi word , list elements , etc. " are simple lists (in the form of strings) which can easily be handled by the parse statement. 

> In my statement that you quoted ("it can not inherently hold a list containing  strings and other lists of strings."), the key word is INHERENTLY.

> Here are a few definitions for the word "inherent" via Google:
>    "existing in something as a permanent, essential, or characteristic attribute"
>    "existing as an inseparable part; intrinsic"
>    "involved in the constitution or essential character of something : belonging by nature or habit"
>    "existing in someone or something as a permanent and inseparable element, quality, or attribute"
(apologies to those who think this is pedantic)

> In other words, I was NOT saying that the Rexx object in NetRexx is incapable of containing an encoded "list containing strings and other lists of strings".  It is pretty obvious that by adopting encoding conventions such as the use of '{' and '}' (or other character or character sequences) to surround an embedded list of strings that it can be done.  Certainly your code snippets demonstrated that.

> What I WAS saying is that the Rexx object and NetRexx do not have that capability INHERENTLY.  There is currently nothing "built in" to the Rexx object and NetRexx which defines or processes any method of encoding a list of strings so that it can be contained in another string.  The parse statement can not handle it, and there is no other statement which INHERENTLY handles it. 

> In order to do it in your snippets, you had to define an encoded list as being surrounded by specific characters.  That does not rule out other methods of encoding lists into strings.  I think the use of "begin", "end" and "separator" characters may be the easiest way to do it, but there may be other ways that it can be done.  NetRexx does not INHERENTLY provide a mechanism for encoding and decoding them.

> It would be presumptuous of me to quantify my understanding of the Rexx object, but I have probably spent more time examining its internal structure and the very numerous methods (in the NetRexxR Rexx.nrx module) than most.

Now you have thrown me for a loop. Disregarding for a moment your understanding of INHERENTLY, you have switched from discussing whether a Rexx object can "hold" (which I interpreted to mean contain) a decomposed list to now talking about "handling" which you further clarify as "defines or processes any method of encoding a list...". It seems to me there is a bit of difference between containing a data structure and containing the logic to process a data structure. While it is true that objects contain both data and processing logic , I hope it is clear from the fact that I have been testing code to process lists that I never suggested that the processing logic to handle complex lists is "inherent" in Rexx objects. My contention has been that the Rexx object as a general purpose native data type was sophisticated enough to contain list data whether decomposed or serialized (string encoded) and that adding appropriate logic whether internal (inherent?) or external would create an object superior to any existing "List" objects. If you meant all along that the Rexx object has no built-in methodology to process sophisticated lists, then you are correct, of course. If however, you meant that the Rexx object is not sophisticated enough to contain lists with full structural data associated, then we must still disagree. The Rexx object by definition contains both an "atomic" string value and a map of keys to values - a list is simply a map of ordered natural number keys to values. There is no difficulty in containing both a marshaled (serialized) list and a decomposed list in a single Rexx object. And since Rexx map values are themselves Rexx objects, they can easily contain sublists as well. The full blown map capability of a Rexx object means that it can contain far more information than any specialized List object, even to containing decomposed attribute lists with addressable associations. 
Your continued use of the term "an encoded list" still implies to me that you did not understand what my program does nor what a Rexx object contains. While my test program only includes code to decompose lists encoded in strings according to certain rules, that was all it needed because I added the logic to handle other types of lists to NetRexx a few weeks ago with your help!
Likewise, I think I understand your point about the "difference between "parsing a string" and dividing up a list which has been encoded as a string", but my point was that you cannot have one without the other - you have to first be able to parse a string before you can understand and work with its list structure. (BTW: I did indeed encounter Lisp in my college days and vowed to stay as far away from it as possible! But you will find that I do understand car and cdr.) I just do not agree with your contention that "decomposition of the lists comes before the parsing of strings". It cannot be done that way in practice.

> Kermit, what you are saying is simply not true.  For example, consider a method which is passed a string which contains an encoded list.  The purpose of the method is to print out the elements in the list, while ignoring any nested lists of elements.  Obviously the method first decomposes the list into elements and nested lists, and prints only the elements.  The nested lists are not decomposed.  The parse statement is not used at all.  The decoding process (as described in your snippets, for example) does not use the parse statement.

> Perhaps you are assuming that the nested lists contain the same kinds of elements as the main list.  That is not necessarily true.  For example, suppose the primary list contains the names of states, and each state name is followed by a nested list of the major highway names and numbers running through the state.  It would certainly be reasonable to want to print only the state names without decoding the lists of highway identifiers.
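
The states-and-highways case can be sketched as follows. This is an illustrative Python sketch, not code from the thread: the function names, the "{" "}" "," delimiters and the sample data are all assumptions, and there are no safety checks for unbalanced delimiters. The scan still visits every character once, but it never interprets what is inside a nested list; nested lists come back as undecoded strings.

```python
def _flush(items, buf):
    """Emit the buffered text as a top-level element, if it is not blank."""
    word = "".join(buf).strip()
    if word:
        items.append(("element", word))
    buf.clear()


def top_level(text, start="{", end="}", sep=","):
    """Split text into top-level items; nested lists are returned undecoded."""
    items, buf, depth = [], [], 0
    for ch in text:
        if ch == start:
            if depth == 0:
                _flush(items, buf)      # text before the sublist is an element
            depth += 1
            buf.append(ch)
        elif ch == end:
            depth -= 1
            buf.append(ch)
            if depth == 0:              # the outermost sublist just closed
                items.append(("list", "".join(buf)))
                buf.clear()
        elif ch == sep and depth == 0:
            _flush(items, buf)
        else:
            buf.append(ch)              # includes separators inside sublists
    _flush(items, buf)
    return items


# Made-up sample data for illustration.
data = "Ohio {I-70, I-71}, Texas {I-10, I-35}"
states = [value for kind, value in top_level(data) if kind == "element"]
print(states)
```
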

> I'm not a Lisp programmer, and unfortunately there were no computer courses when I went to college.  But I did run across Lisp when I was playing with emacs many years ago, and I feel that many programmers will benefit from reading http://deptinfo.unice.fr/~roy/sicp.pdf .  Not understanding the fundamental ideas of Lisp leaves a hole in a programmer's education, in my opinion.


Here I definitely must disagree. I have coded parsing algorithms hundreds if not thousands of times and what you are saying is simply not true. The only way to partially decompose a list is to sequentially extract one element at a time (which is exactly what is done during list parsing). That leaves you with a decomposed segment and an un-decomposed segment meaning that you have not yet looked at the part of the string past the extracted elements so you don't yet know what is there or how many total elements are available. It is true that an element might also be an encoded list with a completely different ruleset containing no overlapping delimiters but the original list is still completely decomposed according to the initial structural ruleset provided once all the characters have been examined. And in the more general case, you could have a list in a string that looks like the following: "< data (something> and yet more)". If your initial ruleset has start/end delimiters "<>", you will obtain a completely different decomposed list than if your start/end delimiters are "()". More on the use of multiple start/end delimiters later.
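
The effect of the chosen start/end pair on that example string can be shown with a deliberately naive Python sketch. The helper name is hypothetical and it ignores nesting and escapes; it only demonstrates that the same characters decompose differently under different rulesets.

```python
def first_sublist(text, start, end):
    """Return the text between the first start delimiter and the next end delimiter."""
    i = text.find(start)
    j = text.find(end, i + 1) if i >= 0 else -1
    return text[i + 1:j] if j >= 0 else None


s = "< data (something> and yet more)"

angle = first_sublist(s, "<", ">")   # ruleset with "<>" as start/end
paren = first_sublist(s, "(", ")")   # ruleset with "()" as start/end

print(repr(angle))
print(repr(paren))
```

With "<>" the "(" is ordinary data inside the sublist; with "()" the ">" is.
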
I am not going to address your questions about the best approach to adding the support you requested to NetRexx except to agree that messing with PARSE is a BAD idea! And while your BIF idea was a brilliant bypass to the objections to adding a constructor or factory to NetRexx for the List==>Map support, I am not convinced yet that it is the correct solution for this need.

> Yes, as Mike recommended, extending the parse statement is not the way to go.

That said, I am going to implement a solution to this problem. It may well require changes to the Rexx class for the same reasons that the List==>Map support did: You cannot extend the Rexx class and have full NetRexx functionality with it, so extended Rexx object capabilities may need to be added internally.

> As you know, I started this whole thing by asking the general question about what general approach would be best.  I'm not sure there is any agreement on that yet.

While I don't have a complete answer to your question yet, I do have a lot more information related to the needed API now, thanks to my research and testing. The required ruleset information has now grown to include at least 5 delimiter types and 7 flags/options and more are possible. For example, my prototype code can now decompose and recompose XML data such as the following string which you might recognize as a line from an Ant build.xml file:

'<taskdef name="nrc" classname="org.apache.tools.ant.taskdefs.optional.NetRexx" classpath="${build.classpath}"/>'

But notice that this format includes what I have called for lack of better terminology a "meta" delimiter. The character "/" is not needed to terminate the list. It is therefore considered as part of the final element of the list but in fact it has nothing to do with that element - it is a terminator for a higher level logical entity, the "taskdef" and is often placed in a separate terminator string. It really needs to be recorded as a separate list element even though it has no separator under XML placement rules. Likewise consider the following string from a JSON example in the Wikipedia article:

{
    "id": 1,
    "name": "Foo",
    "price": 123,
    "tags": [ "Bar", "Eek" ],
    "stock": {
        "warehouse": 300,
        "retail": 20
    }
}
Notice that there are two start/end delimiter pairs used to signify different data lists. Yet it is still basically a nested list format.
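
For reference, this is how dedicated parsers read the two example strings above. The sketch uses Python's standard xml.etree.ElementTree and json modules, not the prototype, so it only shows what structure each format is conventionally taken to carry.

```python
import json
import xml.etree.ElementTree as ET

xml_line = ('<taskdef name="nrc" '
            'classname="org.apache.tools.ant.taskdefs.optional.NetRexx" '
            'classpath="${build.classpath}"/>')

# The XML parser treats the trailing "/" as tag syntax, not as element data.
taskdef = ET.fromstring(xml_line)
print(taskdef.tag)      # element name
print(taskdef.attrib)   # attribute list as a name-to-value map

json_text = """
{
    "id": 1,
    "name": "Foo",
    "price": 123,
    "tags": [ "Bar", "Eek" ],
    "stock": {
        "warehouse": 300,
        "retail": 20
    }
}
"""

# "{...}" becomes a map (dict) and "[...]" becomes an ordered list.
product = json.loads(json_text)
print(type(product).__name__, type(product["tags"]).__name__)
print(product["stock"]["warehouse"])
```

In both cases the result is a nested structure of maps and lists, which is the shape a decomposed list would need to retain.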

These things have brought me to the realization that an API which directly passes rule items like delimiters and options is too unwieldy for general use and does not match the format simplicity of normal NetRexx BIFs at all closely. Hence I have a new proposal for passing list rulesets which will greatly simplify things. List rulesets can be contained in a Rexx object in three separate ways:

(1) A string such as "CSV", which is a well known list format name, can select a built-in ruleset.
(2) A string such as 'delimiters(startend(start("<") end(">") ) separator(",") meta("/") escape("\\") ) options(separators-must-follow-sublists("yes") adjacent-separators-reduce-to-one("false") ) ' could provide a human readable custom set of list rules that is itself a decomposable list according to a default ruleset.
(3) A ruleset string that is a decomposable list string according to a non-default list ruleset can simply be pre-parsed (decomposed) before passing to the API something like in this example:

liststring.decomposewithruleset(rulestring.decomposewithruleset("CSV"))

In this way you eventually arrive at a ruleset string that can be handled with a built in ruleset to then create a new ruleset for decomposing the target list.

Notice that this approach allows the options and built-in rulesets to be expanded without affecting existing programs!
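
One way to picture the ruleset idea is a lookup that accepts either a well-known name or a partial set of custom rules and fills in the rest from defaults. The Python sketch below is hypothetical throughout: the names, the "WORDS" ruleset and the default delimiters are assumptions (the thread only recommends "\" as the default escape). Because unknown or missing options fall back to defaults, new options can be added later without touching existing callers.

```python
# Hypothetical defaults; the thread only settles on "\" as the escape.
DEFAULT_RULESET = {
    "start": "(", "end": ")", "separator": ",", "escape": "\\",
    "adjacent_separators_reduce_to_one": False,
}

# Hypothetical built-in rulesets selected by a well-known name.
BUILT_IN_RULESETS = {
    "CSV": {"start": None, "end": None, "separator": ","},
    "WORDS": {"start": None, "end": None, "separator": " ",
              "adjacent_separators_reduce_to_one": True},
}


def resolve_ruleset(spec):
    """Accept a built-in name or a partial mapping of rules; return a full ruleset."""
    overrides = BUILT_IN_RULESETS[spec.upper()] if isinstance(spec, str) else spec
    ruleset = dict(DEFAULT_RULESET)   # copy so the defaults are never mutated
    ruleset.update(overrides)
    return ruleset


print(resolve_ruleset("csv"))
print(resolve_ruleset({"start": "<", "end": ">"}))
```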

I also found that in addition to "first" and "rest" methods, basic list processing should include a "join" method to merge lists or add an element to a list.
One of the things that irritates me the most about NetRexx is that it cannot parse an arbitrary length word list and convert it to an indexed data item without my carefully writing some kind of loop code each time. Often I can get by using the .words/.word BIFs, but what if I need to pass the word strings to a Java interface as an array? Then I wind up having to pass the word list to the Java String.split method to get what I want and feeling second class because NetRexx has no built in support for such a common and simple need. The feature you have requested not only meets my need and completes our NetRexx collections support, it gives NetRexx a capability which Java lacks as far as I know and puts it out front in language features again in my opinion.

> Mike addressed this in his comment.

> Can't you write a general purpose subroutine (method) that is passed a Rexx string and a target Rexx (to be indexed) string or String array which does this?  Perhaps a Rexx object method?

I am not totally sure that you want to do something about this issue rather than just talking about it but I want to at least try it on for size and see how useful it might be. At one point, you asked me about an API although I don't think you have suggested anything beyond the need to provide three delimiters. In theory an API requires a common agreement by a committee and thorough documentation before it can be implemented. But I am going to say something here that I hope René does not have me assassinated for later! We have an ARB (architecture) committee and a complex and detailed formal procedure for discussing, approving, and documenting such things. It is a good plan in theory. But I think that the procedure is about the only thing that has ever made it through the committee. I seriously doubt that any changes to NetRexx will ever happen that way. We just don't have the manpower. As far as I can see, there has never been any significant progress made from whatever discussions have occurred about APIs towards implementing support for the major new features of Java like enums, generics, JSR223, JSR199, collections, annotations, closures etc. The only language change to be implemented so far was extending Loop Over to cover collections iterators and that was a major fight even though it did not require any language syntax changes. Is it because the committee members lack the time to pursue this project? Is it because they lack interest? Is it because the true goal of the structure is to make sure that nothing changes? I don't know. Reasonable caution is wise. Too much and NetRexx dies. A few like René and I are making small changes and improvements but not much makes it into the official NetRexx. As you may have figured out by now, my own approach is more akin to the Agile software development movement - I code something, then try it out to see how it works and improve it iteratively or throw it out and start over. 
Coding things in NetRexx is so easy that it is a much faster way to gain understanding of a problem than any formal processes or discussions can possibly provide. It does not matter if you have to throw the code out and start over, because you have learned something and the effort was a helpful part of the whole process.

> I think it is wise to be cautious with regard to language changes, and the bureaucracy is kind of like the US Congress - designed to slow down things.  But there is nothing to stop the implementation of experimental things in your branch.  I've thrown away too much code because of thoughtless design to agree with your approach in general, but I also think that over design ends up with unnecessary coding.

I still don't think any progress will occur without some sort of compromise. Even Mike has agreed with the wisdom of trial implementations of proposals for testing different approaches. At this point it is becoming clear that my prototype will have to be discarded except maybe as a reference for starting a full implementation but it definitely helped me with understanding the API needs and what could actually be accomplished with suitable algorithms. For those who want to look at or run the current spaghetti prototype, the code is here:

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

This project may take a while!
So I have been researching this problem (is that the right term?) extensively with hands on, and I think I have made some good progress. (If you have guessed from the above that I already have a draft implementation, you are correct.) Here is what I have learned in the process:

The problem domain is something I call AttributeStringLists - that is, lists with elements that can be strings or AttributeStringLists or "named" AttributeStringLists (lists which have a name string associated). The closest Java data structure is something called an AttributeList but it cannot really handle the structure retention requirements of this concept. (An Attribute is just a data item with a name string and a value.)

> I'm not sure that it is necessary or desirable to attach names to StringLists.  Getting to sound like XML :)

Interesting since your first example included implicit name tagging of sublists:

fun( arg1(arg1a, arg1b), arg2(((nested)), z) )

If we omit attribute (property) lists, we have excluded half of the list formats currently in use. We have also vastly underused the power of the native NetRexx datatype and skipped a feature set that would put NetRexx far ahead of other languages in my opinion. I had not looked at Python before this issue came up, but it is obvious to me now why it is so popular - it is way ahead of other languages (including NetRexx) in easy list handling and perhaps even map handling features and has a much better syntax than Java type languages!
The API actually requires four delimiters - start, end, separator, and escape.  The escape is needed because list elements are strings and therefore may contain the delimiter characters. Although the default delimiters still need to be decided for the API, I recommend that the default escape be the normal "\" NetRexx escape character.
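
Why the escape delimiter matters can be sketched in a few lines of Python. The function name and the "translate" switch are illustrative assumptions (the switch mirrors the idea of optionally leaving escape characters in the element strings); start/end delimiters are left out to keep the sketch short.

```python
def split_escaped(text, sep=",", esc="\\", translate=True):
    """Split on sep, treating the character after esc as literal text."""
    items, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == esc and i + 1 < len(text):
            if not translate:
                buf.append(ch)          # keep the escape character itself
            buf.append(text[i + 1])     # the escaped character is plain data
            i += 2
            continue
        if ch == sep:
            items.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    items.append("".join(buf))
    return items


print(split_escaped("a\\,b,c"))                    # escaped comma stays in element 1
print(split_escaped("a\\,b,c", translate=False))   # escape kept for a later pass
```

If the escape is translated away, a second parse of the same element would see a bare separator, which is why leaving it in place can be the safer choice.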

In addition to the delimiter characters, the API requires at least four binary flags to handle variations in list syntax. The minimum four flags are these:

flag 1:
0 = separators follow sublists (data following a sublist but preceding a delimiter is discarded)
1 = separators do not follow sublists (sublists act as separator characters - any data following indicates a new element)
flag 2:
0 = adjacent separators produce empty elements
1 = adjacent separators reduce to one (used for word lists which may have multiple spaces separating elements)
flag 3:
0 = escape sequences are not translated (escape characters remain in element strings)
1 = escape sequences are translated (a second pass of list parse may fail)
flag 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR)
1 = whitespace is treated as text
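
The difference flag 2 makes can be sketched in Python. The function and parameter names are assumptions made for illustration, not part of any proposed API.

```python
def split_list(text, separator, reduce_adjacent):
    """Split text on separator; optionally collapse runs of separators."""
    parts = text.split(separator)
    if reduce_adjacent:
        # Dropping empty parts collapses adjacent separators into one
        # (and also ignores leading and trailing separators).
        parts = [part for part in parts if part != ""]
    return parts


print(split_list("a,,b", ",", False))             # flag 2 = 0: empty element kept
print(split_list("one  two   three", " ", True))  # flag 2 = 1: a word list
print(split_list("one  two", " ", False))         # flag 2 = 0 on blanks
```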

> I'm not sure all of these are needed.  Some or all could be defaulted.  I think the general approach used by the parse statement could be considered.  Simplicity should not be underestimated.

In addition to the parse list method itself, the API needs "car" and "cdr" methods at minimum: car returns the first or only element in a list. cdr returns: a single remaining element, or a single remaining list, or a sublist of remaining elements. Also needed is a method to signal if an element is itself a list.

> car should return the first item in a list, which is either an element or a nested list.  cdr should return the rest of the list (i.e. the second (next) item, or the equivalent of null).  I don't think the names car and cdr should be used.
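
A minimal Python sketch of the conventional semantics, over an already decomposed list. The names first, rest and is_list are assumptions standing in for whatever the API ends up calling them, and an empty list plays the role of "the equivalent of null".

```python
def first(items):
    """Return the first item: either an element or a nested list."""
    return items[0]


def rest(items):
    """Return everything after the first item; empty when nothing remains."""
    return items[1:]


def is_list(item):
    """Signal whether an item is itself a list."""
    return isinstance(item, list)


nested = ["a", ["b", "c"], "d"]

print(first(nested))                  # an element
print(first(rest(nested)))            # a nested list
print(is_list(first(rest(nested))))
print(rest(rest(rest(nested))))       # nothing left
```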


Let me know if you think of anything I missed. The above information is good enough for a prototype implementation. The prototype demonstration program can be viewed at the link below although it is largely spaghetti code at this point and lacks safety and sanity checks (The code/API needs some refining and testing before it can be implemented as part of NetRexx itself.):

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

> I will take a look at this, but I'm more the "let's define things before jumping into code" type - I guess that's why you think we are from different planets, and I wouldn't argue it :)

Bill

(I didn't paste the code inline this time as some seem not to like that. ;-)

But I will paste a snip from the program output showing some of the prototype API calls and what they produce.
Note that car and cdr do not do any parsing - it is all done in the parselist call.
Also "parselist" can accept delimiter and flag parameters.
-----------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------
 parsing this list:
in=fun( arg1(arg1a, arg1b), arg2(((nested)), z) )

 parseout=parselist(in)    --    parse list structure

say car(parseout)        --    first element of a list
fun( arg1(arg1a, arg1b), arg2(((nested)), z) )

say car(car(parseout))    --    first element of first element
 arg1(arg1a, arg1b)

say cdr(car(parseout))     --    remainder of first element
 arg2(((nested)), z)

say car(cdr(car(parseout)))     --    first element of remainder of first element
((nested))

say cdr(cdr(car(parseout)))     --    remainder of remainder of first element
 z

say car(car(cdr(car(parseout))))     --    first element of first element of remainder of first element
(nested)

say car(car(car(cdr(car(parseout)))))     --    first element of first element of first element of remainder of first element
nested
-----------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------
There's still time to get those API requests in folks!

-- Kermit


On 12/5/2012 5:16 PM, Bill Fenlason wrote:
Kermit,

On 12/5/2012 7:02 PM, Kermit Kiser wrote:
Bill --

Sometimes I wonder if we live on the same planet. ;-)

You are not alone :)

I don't understand how you think you can parse a string but not parse it in order. Nor do I understand why you would want a halfway solution that does not fully parse a string.

There is a difference between "parsing a string" and dividing up a list which has been encoded as a string.  With a list, it is natural to request the items in the list one at a time, and the items in a list may be either a string or another list.  Possibly you are not familiar with list processing (ala Lisp) and find this confusing, but I assume that is not the case. 

Historical note: Original Lisp (for the IBM 704 in assembler) used a tree structure and used car (contents of the address part of the register) and cdr (contents of the decrement part of the register) to extract the first (next) item in a list and the remainder of the list.  In Rene's example, he shows how to strip the first or next list item using the parse instruction but used the ancient names which are still in common use after all these years.

The essential point here is that one of the ways that would make sense in NetRexx would be to encode a list as a string.  In other words, a "super" string which is a "list of strings" or a list of: "strings" and "lists of strings".

In no way did I mean to imply that the parsing was "half way", but just that the decomposition of the lists comes before the parsing of strings.  First a list is processed, and then sublists are processed as necessary.

I am sorry if you did not understand my code example, but I am not sure it can be simplified further. As I said in my post,  "Handling this type of syntax is way beyond what a parse instruction can do and I think this example shows that the general case is not trivial."

I didn't say that I didn't understand your code - it was well written and clear.

Possibly you skipped that last phrase?

The phrase I meant was "while still retaining the list structure."  Parsing a single string is not the same as breaking up a list (which happens to be encoded as a single string).

I try things out with code because I don't think in abstract logic like you and Mike seem to do. So I provide working code examples to show my thoughts here. But then you say that my code has to scan strings one character at a time which is no more true or false than saying the PARSE instruction has to scan strings one character at a time. (Or do you really think that PARSE does not look at all of the characters?)

My point there was that I was asking if an approach that could be used would be to extend the parse command.  I am looking for the best general purpose approach to handle the nested list problem.  I agree that my question is more abstract than specific.

You also seem to feel that the lowly NetRexx data type could not possibly maintain the structure of a list but I think that the Rexx object is the most powerful data structure ever invented. It can not only hold strings and numbers, it can hold lists and maps and do amazing things with them and each one is a complete associative database! (And even more features are in the advanced after3.01 NetRexx version!)

I don't know how you came to that conclusion - what did I say that gave you that idea?  All I was asking was how to make the Rexx string object hold a list of strings and other (nested) lists of strings.  Certainly the Rexx object can hold a simple list of strings.  But it can not inherently hold a list containing  strings and other lists of strings.  External conventions for list delimiters must be provided.  Possibly as an extension they could be added as fields in the Rexx object.

Since I think that way, I will try again to explain what I mean with a code example. I modified my original sample program and added a method to reconstruct a parsed list, showing at each stage of reconstruction what list structure data can be extracted from the parsed string object. I even showed how you can transform one list syntax to another with the example parsed list Rexx object. (Your new example is basically the same structure with different delimiters, so the same code handles both examples fine.) Just ignore it if you still don't believe it can be done.

I certainly understand that it can be done, Kermit, and your code obviously demonstrates it. 

But the code itself does not provide an answer to the original question I asked, which was "If NetRexx or Rexx were to be extended to allow convenient parsing of nested lists, how should it be approached?"

In retrospect, perhaps I should have replaced the word "parsing" with "deconstruction". 

I provided 6 possibilities, and perhaps your code could be the basis of possibility 3 (built in functions), although there doesn't seem to be a clear API.  It certainly demonstrates an example, but obviously I'm trying to avoid that level of user coding for the general case. 

I was asking "what is the best approach?", not "can it be done?" or "is there a code snippet that  can be used?".

I thought my original post asked a single question and was reasonably clear, but apparently I was wrong about that.

BTW: PARSE is intended for very simple parsing problems. That is why RexxLA started the RegRexx project to provide a more sophisticated pattern matching and parsing facility with a simpler syntax and more flexibility than regex has. (It remains to be seen if that can be done.) I think that is also why Mike included the verify and translate, etc, mechanisms to handle more complex parsing needs.

Yes, that is what Mike said as well, and I agree in general.  I suggested the possibility of extending the parse statement by adding a functional notation in the template, but Mike said he considered and rejected it some time ago.

-- Kermit

Bill





_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/




Re: Nested List Support?

billfen
Kermit,

I see no point in more picky discussion - let's just say that we have different perspectives and leave it at that.

Bill


On 12/13/2012 5:53 PM, Kermit Kiser wrote:
Bill --

My apologies for taking so long to respond again. I have too many projects in progress, and this topic is not a trivial one. My research approach is a combination of much Googling along with trying various things in code, as I mentioned before.

Also, I have no problem agreeing to disagree, as that appears likely in at least some areas.




Fwd: Re: Nested List Support?

Jeff Hennick
Kermit, Bill, and all,

After Kermit's input on options, re-reading NRL3, and some further thinking, let's open discussion on this Proposed BIF:

parselist([sep][,start,end[,esc[,options]]])
returns a one-based indexed string of the elements of string broken at, and not including, sep. The default character for sep is comma.  Sublists may be considered as a single element beginning with start and ending with end.  If start and end are not specified or are null, no sublisting is performed. If any of sep, start, or end occurs in an element, it is escaped by preceding it with esc.  If no esc is specified or it is null, no escaping is performed.  sep, start, end, esc, and options are strings.

options
is a string of zero or more space-delimited words.  Only the first character of each word is significant and it may be in uppercase or in lowercase.  They are processed left to right, so if there is any conflict the rightmost is used.  (For example, the order of C and E changes the handling of escaped quotation marks.) Any of these words may be prefixed with NO, in any case, to reverse its action, so here the first 3 characters are significant.  The default for any option is the NO-value.

The following option characters are recognized:

D (Discard); data following a sublist but preceding a delimiter is discarded.
NOD (No Discard); sublists act as separator characters - any data following indicates a new element.

R (Reduce); adjacent separators reduce to one.
NOR (No Reduce); adjacent separators produce empty elements.

E (Escape strings are kept); escape characters remain in element strings.
NOE (No Escape strings); escape characters are removed from element strings.

W (White space); whitespace is treated as text.
NOW (No White space); whitespace is translated to blank (TAB, FF, LF, CR).  Adjacent white space, including multiple blanks, is collapsed to a single blank.

S (Sublists); the output includes sublists broken into elements.
NOS (No Sublists); the output includes sublists as single elements.

C (Comma Separated Values); elements may be surrounded by quotation marks, either single (') or double ("), in pairs.  Contained quotation marks (of the same type) are escaped by doubling the mark.  The outer quotation marks are removed, and inner ones unescaped in the output.  This is processed as in RFC4180 ( http://tools.ietf.org/html/rfc4180 ).  If sep is null, it defaults to comma (,).  Other values for sep may be used, such as tab (d2c(9)), or semicolon (;).
NOC (Not Comma Separated Values); quotation marks are treated as text.

[Are there any better mnemonic words for these options?  Should any have the default inverted?  If so, what word would be appropriate so the NO-version can be the new default?]

[I have added the Sublists and NOSublists options as I understand each is valuable in different situations.]
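
[To make the proposed behaviour concrete, here is a rough sketch in Python rather than NetRexx, since no NetRexx implementation of this BIF exists yet. It is only one reading of the description above, under these assumptions: all delimiters are single characters, the default NO-options apply (NOD, NOR, NOE), sublists come back as nested Python lists (roughly the S option), and indexing is Python's zero-based one, not the one-based indexed string of the proposal. The function name mirrors the proposal; everything else is illustrative.]

```python
def parselist(text, sep=",", start="", end="", esc=""):
    """Split text at sep, nesting start/end sublists and honouring esc.

    Illustrative sketch only: single-character delimiters, default options.
    """
    sep = sep or ","            # a null sep falls back to comma
    root = []
    stack = [root]              # stack[-1] is the list currently being filled
    buf = []                    # characters of the element being collected
    just_closed = False         # True between a sublist's end and the next sep

    def flush():
        nonlocal just_closed
        # NOD: text after a sublist becomes a new element, while an
        # empty buffer right after a sublist adds nothing.
        if buf or not just_closed:
            stack[-1].append("".join(buf))
        buf.clear()
        just_closed = False

    i = 0
    while i < len(text):
        ch = text[i]
        if esc and ch == esc and i + 1 < len(text):
            buf.append(text[i + 1])     # NOE: drop esc, keep the next char
            i += 2
            continue
        if ch == sep:
            flush()                     # NOR: adjacent seps give empty elements
        elif start and ch == start:
            if buf:
                flush()                 # text before a sublist is its own element
            just_closed = False
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif end and ch == end and len(stack) > 1:
            if buf or stack[-1] or just_closed:
                flush()                 # skip only for an empty sublist
            stack.pop()
            just_closed = True
        else:
            buf.append(ch)
        i += 1
    if buf or root or just_closed:
        flush()
    return root


print(parselist("a,(b,c),d", start="(", end=")"))
```

[With these assumptions the call above yields the element "a", a two-element sublist, and the element "d"; without start and end the parentheses would simply be treated as text.]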

-------- Original Message --------
Subject: Re: [Ibm-netrexx] Nested List Support?
Date: Mon, 10 Dec 2012 00:24:17 -0500
From: Jeff Hennick [hidden email]
To: IBM Netrexx [hidden email]


Kermit and all,

In the spirit of MFC's dictum of getting the definition right before any consideration of implementation, I offer this:

I understand your candidate BIF for Nested List Support would be:

parselist([sep][,start,end[,esc[,flags]]])
returns an indexed string of the elements of string broken at, and not including, sep. The default character for sep is comma.  Sublists may be considered as a single element beginning with start and ending with end.  If start and end are not specified or are null, no sublisting is performed. If any of sep, start, or end occurs in an element, it is escaped by preceding it with esc.  If no esc is specified or it is null, no escaping is performed.  sep, start, end, and esc are strings.  flags are ???????

[A good description of the output and examples goes here!]
[I have no idea how the concept of "flags" fits into a NetRexx BIF.  More optional individual parameters would seem to be the "standard" approach, but that is not pretty: NetRexx is meant to be readable.]
[I have made esc default to null rather than "\" as a way to "turn off" escaping, which seems difficult otherwise.]

I also have internal comments below.

On 12/9/2012 5:15 PM, Kermit Kiser wrote:

The API actually requires four delimiters - start, end, separator, and escape.  The escape is needed because list elements are strings and therefore may contain the delimiter characters. Although the default delimiters still need to be decided for the API, I recommend that the default escape be the normal "\" NetRexx escape character.

In addition to the delimiter characters, the API requires at least four binary flags to handle variations in list syntax. The minimum four flags are these:

flag 1:
0 = separators follow sublists (data following a sublist but preceding a delimiter is discarded)
[Could you give a real-life example of when this would be used?  My gut feeling is that this is just a malformed string.]
1 = separators do not follow sublists (sublists act as separator characters - any data following indicates a new element)
flag 2:
0 = adjacent separators produce empty elements
1 = adjacent separators reduce to one (used for word lists which may have multiple spaces separating elements)
[I think flag 3 is covered above, along with start and end.]
flag 3:
0 = escape sequences are not translated (escape characters remain in element strings)
1 = escape sequences are translated (a second pass of list parse may fail)
[With translate and trim BIFs, is flag 4 necessary?]
flag 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR)
1 = whitespace is treated as text

In addition to the parse list method itself, the API needs "car" and "cdr" methods at minimum: car returns the first or only element in a list. cdr returns: a single remaining element, or a single remaining list, or a sublist of remaining elements. Also needed is a method to signal if an element is itself a list.
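
[As an illustration of the "car"/"cdr" behaviour described above, here is a small Python sketch that uses plain nested Python lists as a stand-in for the decoded list. The function names follow the text; the handling of an empty list (raising an error) is an assumption, since the description does not cover that case.]

```python
def is_list(item):
    """Signal whether an element is itself a list."""
    return isinstance(item, list)


def car(items):
    """Return the first element of a list, or the item itself if it is the only one."""
    if not is_list(items):
        return items
    if not items:
        raise ValueError("car of an empty list")   # assumed behaviour
    return items[0]


def cdr(items):
    """Return what remains after the first element.

    One remaining item (string or list) is returned as itself;
    several remaining items are returned as a list of those items.
    """
    if not is_list(items) or not items:
        raise ValueError("cdr needs a non-empty list")   # assumed behaviour
    rest = items[1:]
    return rest[0] if len(rest) == 1 else rest


nested = ["a", ["b", "c"], "d"]
print(car(nested), cdr(nested), is_list(car(cdr(nested))))
```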

Let me know if you think of anything I missed.
The original "presenting problem" was in terms of Python tuples and lists.  I see this as a super-set / sub-set of parsing CSV strings.  As such I'd like the candidate BIF to accept CSV strings, at least as defined in RFC4180 ( http://tools.ietf.org/html/rfc4180 ), which adds the concept of quoted strings to be taken as a single element.  This would seem to require another flag for stripping/leaving outer single/double quote marks. (Or simply define them as being stripped.)  CSV as generated by Excel spreadsheets would be a plus (or may already be covered).  Or, this may be too much overload for one BIF.
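
[For comparison, this is how the RFC4180 quoting rules behave in Python's standard csv module: a quoted field may contain the separator, and a doubled quotation mark inside it stands for one literal mark. The helper name parse_csv_line is made up for this sketch. Note that RFC4180 itself only defines double quotes, so single-quote support would be an extension beyond it.]

```python
import csv


def parse_csv_line(line, sep=","):
    """Split one CSV record into fields, stripping outer quotes and
    un-doubling inner ones (the csv module's default 'excel' dialect)."""
    reader = csv.reader([line], delimiter=sep)
    return next(reader)


fields = parse_csv_line('a,"b,c","say ""hi"""')
print(fields)
```

[Here the second field keeps its embedded comma and the third field ends up containing the word hi in ordinary double quotes.]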
The above information is good enough for a prototype implementation. The prototype demonstration program can be viewed at the link below although it is largely spaghetti code at this point and lacks safety and sanity checks (The code/API needs some refining and testing before it can be implemented as part of NetRexx itself.):

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

(I didn't paste the code inline this time as some seem not to like that. ;-)
There's still time to get those API requests in folks!

-- Kermit
Jeff





Re: Fwd: Re: Nested List Support?

Kermit Kiser
Hi Jeff --

Just to let you know that I am also working on this issue. I am still planning to make something available for public testing, but it may take a while as this does not seem to be a trivial problem to resolve. Apart from you and me, no one else seems interested in providing (or perhaps they lack time to provide) feedback or suggestions on a feature set or API or implementation approach for this type of support.

Your API suggestion is good and I considered using something like that after my early testing with the delimiter string block and options flag block API. But it seemed a bit complex and lacked the flexibility that I think we will need in this feature if we ever release it. I discarded my first test prototype and created a new program with what seems like a more flexible API:

A list ruleset specifies a set of delimiters and options and can be provided in four separate ways:

(1) A null ruleset signifies a default ruleset. For example: start/end delimiters "()", a separator " " (a blank), an escape character " (double quote), a name separator ":" and the option "escape is quoted string mode".

(2) A string such as "CSV", which is a well known list format name, selects a built-in ruleset. For example, a list in CSV format could be decoded like this: inputlist=inputstring.getlist("CSV")

(3) A string which provides a human readable custom set of list rules that is itself a decomposable list according to the default ruleset. For example: 'delimiters(startend(start("<") end(">") ) separator(",") meta("/") escape("\\") namesep("=") ) options(separators-must-follow-sublists adjacent-separators-reduce-to-one)'

(4) A ruleset string that is an encoded list according to a known ruleset can simply be preparsed before use like in this example: inputlist=inputstring.getlist(rulesetstring.getlist("CSV"))

Although it is not completely defined or tested and debugged, you can view this second prototype or download the code to play with it from here:

http://kermitkiser.com/NetRexx/AttributeStringListTest.html

Current limitations: only a single set of delimiters is accepted, only single character delimiters can be used, meta delimiters are not processed yet, the option for delimiter recording is not yet implemented, abbreviations are not yet supported, delimiters cannot be omitted yet.

Here is my current planned option set for list rules: (0 indicates a default setting)
option 1:
0 = separators follow sublists
1 = separators do not follow sublists
option 2:
0 = adjacent separators reduce to one
1 = adjacent separators produce empty elements
option 3:
0 = escape sequences are translated
1 = escape sequences are not translated
option 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR,VT)
1 = whitespace is treated as text
option 5:
0 = escape is quoted string mode (ie "text, delimiters or double escape like this: "" more text")
1 = escape is single character (ie \x)
option 6:
0 = attribute names are implicit (ie fun(x,y) )
1 = attribute names are explicit (ie with delimiter as in fun=(x,y) or fun:(x,y) )
option 7:
0 = delimiters are implicit (do not record structural delimiters)
1 = delimiters are tokens (save delimiters as separate elements)

Once the second prototype was bootstrapped (meaning the default rule could decode additional rulesets which could then decode any list strings matching them), I left that program and moved to a third prototype which uses a Rexx subclass to hold the lists and new methods. Even using a custom version of NetRexx which allows the Rexx class to be extended and treats extensions as Rexx equivalents for automatic conversions, I quickly ran into serious problems with that approach which convinced me to discard it. My fourth try will involve modifying the Rexx class itself to include the list handling code as I am now convinced that it may be the only reasonable approach to implement this feature set.

-- Kermit

PS: Your Sublists/NoSublists option is not really necessary as my current implementation automatically includes both formats. To clarify - after calling parselist or getlist (or probably buildlist in the final version) to store the decoded list in variable "inputlist", if the third item in the list is a sublist, then inputlist[3] would return that entire sublist, while inputlist[3,1] would return the first element of the sublist. Likewise CSV is a builtin ruleset so no special option is needed for it.
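
To show what I mean by both formats being available at once, here is a tiny Python sketch (not the prototype code, and not NetRexx). The class name NestedList is invented for the illustration, and the one-based indexing follows the inputlist[3] and inputlist[3,1] examples above.

```python
class NestedList:
    """Wrap a nested Python list and allow one-based, multi-level indexing."""

    def __init__(self, items):
        self._items = items

    def __getitem__(self, index):
        # inputlist[3] arrives as the int 3, inputlist[3, 1] as the tuple (3, 1)
        path = index if isinstance(index, tuple) else (index,)
        node = self._items
        for position in path:
            if position < 1:
                raise IndexError("indexes are one-based")
            node = node[position - 1]
        return node


inputlist = NestedList(["a", "b", ["c", "d"]])
print(inputlist[3])      # the whole sublist
print(inputlist[3, 1])   # the first element of that sublist
```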


On 12/16/2012 5:06 PM, Jeff Hennick wrote:
Kermit, Bill, and all,

After Kermit's input on options, re-reading NRL3, and some further thinking, let's open discussion on this Proposed BIF:

parselist([sep][,start,end[,esc[,options]]])






Re: Fwd: Re: Nested List Support?

Dave Woodman

Hi Jeff/Kermit

Reading this I am struck with a vague disquiet at the name of the BIF:- looking back through the archive I see when it originated, but for some reason I did not receive that (and several other) posts.

My low-level disquiet comes from the use of “parse” in the name when the purpose, syntax and behaviour do not resemble the current “parse.” Perhaps “splitlist” may reflect the function’s purpose better?

 

I also noted the “NO” flags – are there any instances where setting a flag implies a set of another flag that may, or may not, need to be overridden? I have no objections, per se, to syntactic sugar where appropriate, but if all flags default to “NO” and do not have any interactions then this seems unnecessary.
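
To check my own reading of the option rules, here is a short Python sketch (not NetRexx) of how an options string would resolve if the flags are independent, which is the case I am asking about. The function name resolve_options and the error on an unknown word are my assumptions; the letters and the "rightmost wins" rule come from the proposal.

```python
OPTION_LETTERS = "DREWSC"   # Discard, Reduce, Escape, White space, Sublists, CSV


def resolve_options(options=""):
    """Return a dict of option letter -> bool; every option defaults to its NO-value."""
    settings = {letter: False for letter in OPTION_LETTERS}
    for word in options.split():            # processed left to right
        upper = word.upper()
        if upper.startswith("NO") and len(upper) > 2 and upper[2] in settings:
            settings[upper[2]] = False      # NO-prefix: first three characters matter
        elif upper[0] in settings:
            settings[upper[0]] = True       # otherwise only the first character matters
        else:
            raise ValueError("unknown option word: " + word)   # assumed behaviour
    return settings


print(resolve_options("Reduce CSV noReduce"))
```

In that call Reduce is switched on and then off again by the later word, so only C ends up set. If every option really defaults to its NO-value and nothing implies anything else, the NO-words only ever matter for undoing an earlier word in the same string.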

 

Just my tuppence ha’penneth

Dave.

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Jeff Hennick
Sent: 17 December 2012 03:07
To: IBM Netrexx
Subject: [Ibm-netrexx] Fwd: Re: Nested List Support?

 

Kermit, Bill, and all,

After Kermit's input on options, re-reading NRL3, and some further thinking, let's open discussion on this Proposed BIF:

parselist([sep][,start,end[,esc[,options]]])


 



Re: Fwd: Re: Nested List Support?

George Hovey-2
Dave and Kermit,
Just 2 cents.  These sound like issues that might have received considerable attention from the computer science establishment.  Is it possible a wheel is being reinvented?

On Mon, Dec 17, 2012 at 7:51 AM, Dave Woodman <[hidden email]> wrote:

Hi Jeff/Kermit

 

Reading this I am struck with a vague disquiet at the name of the BIF:- looking back through the archive I see when it originated, but for some reason I did not receive that (and several other) posts.

 

My low-level disquiet comes from the use of “parse” in the name when the purpose, syntax and behaviour do not resemble the current “parse.” Perhaps “splitlist” may reflect the function’s purpose better?

 

I also noted the “NO” flags – are there any instances where setting a flag implies a set of another flag that may, or may not, need to be overridden? I have no objections, per se, to syntactic sugar where appropriate, but if all flags default to “NO” and do not have any interactions then this seems unnecessary.

 

Just my tuppence ha’penneth

 

                Dave.

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Jeff Hennick
Sent: 17 December 2012 03:07
To: IBM Netrexx
Subject: [Ibm-netrexx] Fwd: Re: Nested List Support?

 

Kermit, Bill, and all,

After Kermit's input on options, re-reading NRL3, and some further thinking, let's open discussion on this Proposed BIF:

parselist([sep][,start,end[,esc[,options]]])

returns a one-based indexed string of the elements of string, broken at, and not including, sep. The default for sep is comma.  Sublists may be treated as single elements, each beginning with start and ending with end.  If start and end are not specified or are null, no sublisting is performed. If any of sep, start, or end occurs within an element, it is escaped by preceding it with esc.  If no esc is specified or it is null, no escaping is performed.  sep, start, end, esc, and options are strings.
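
To make the description concrete, here is a rough Python sketch of the behaviour with no options set. It is only an illustration under stated assumptions, not the prototype and not NetRexx: the result is modelled as a dict keyed from 1, start and end are assumed to differ, a sublist is returned with its delimiters attached, and the escape string is dropped from the output (the NOE default).

```python
def parselist(text, sep=",", start="", end="", esc=""):
    """Split text at sep; a start...end run (nesting allowed) stays one element."""
    sep = sep or ","                      # a null sep defaults to comma
    sublists = bool(start and end)        # no sublisting unless both are given
    elements, current, depth, i = [], [], 0, 0
    while i < len(text):
        if esc and text.startswith(esc, i):
            # Drop the escape and take the escaped delimiter (or one character) as text.
            i += len(esc)
            token = next((t for t in (sep, start, end)
                          if t and text.startswith(t, i)), text[i:i + 1])
            current.append(token)
            i += len(token)
        elif sublists and text.startswith(start, i):
            depth += 1
            current.append(start)
            i += len(start)
        elif sublists and depth > 0 and text.startswith(end, i):
            depth -= 1
            current.append(end)
            i += len(end)
        elif depth == 0 and text.startswith(sep, i):
            elements.append("".join(current))
            current = []
            i += len(sep)
        else:
            current.append(text[i])
            i += 1
    elements.append("".join(current))
    return {n: e for n, e in enumerate(elements, 1)}   # one-based index

print(parselist("a,(b,c),d", start="(", end=")"))
print(parselist("a\\,b,c", esc="\\"))
```

Because every delimiter is matched with startswith, multi-character strings such as "::" work the same way as single characters.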

options
is a string of zero or more space delimited words.  Only the first character of each word is significant and it may be in uppercase or in lowercase.  They are processed left to right, so if there is any conflict the rightmost is used.  (For example, the order of C and E changes the handling of escaped quotation marks.) Any of these words may be prefixed with NO, in any case, to reverse its action, so here the first 3 characters are important.  The default for any option is the NO-value.

The following option characters are recognized:

D

(Discard); data following a sublist but preceding a delimiter is discarded.

NOD

(No Discard); sublists act as separator characters - any data following indicates a new element.

R

(Reduce); adjacent separators reduce to one.

NOR

(No Reduce); adjacent separators produce empty elements.

E

(Escape strings are kept); escape characters remain in element strings.

NOE

(No Escape strings); escape characters are removed from element strings.

W

(White space); whitespace is treated as text.

NOW

(No White space); whitespace is translated to blank (TAB, FF, LF, CR).  Adjacent white space, including multiple blanks, is collapsed to a single blank.

S

(Sublists); the output includes sublists broken into elements.

NOS

(No Sublists); the output includes sublists as single elements.

C

(Comma Separated Values); elements may be surrounded by quotation marks, either single (') or double ("), in pairs.  Contained quotation marks (of the same type) are escaped by doubling the mark.  The outer quotation marks are removed, and inner ones unescaped, in the output.  This is processed as in RFC4180 ( http://tools.ietf.org/html/rfc4180 ).

If sep is null, it defaults to comma (,).  Other values for sep may be used, such as tab (d2c(9)), or semicolon (;).

NOC

(Not Comma Separated Values); quotation marks are treated as text.
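
The rules for reading the options string (first character significant, optional NO prefix, rightmost word wins) can be sketched as follows. This is Python for illustration only; the name parse_options, the dict of booleans, and the use of ValueError as a stand-in for a bad-argument error are all assumptions.

```python
def parse_options(options=""):
    """Turn an options string into on/off settings; later words override earlier ones."""
    settings = {letter: False for letter in "DREWSC"}   # every option defaults to its NO-value
    for word in options.upper().split():
        # With a NO prefix the third character names the option, otherwise the first.
        negated = word.startswith("NO") and len(word) > 2
        letter = word[2] if negated else word[0]
        if letter not in settings:
            raise ValueError(f"unrecognized option word: {word!r}")
        settings[letter] = not negated
    return settings

print(parse_options("Reduce csv noreduce"))
```

Here "Reduce" is switched on and then off again by the later "noreduce", so only C ends up set.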

[Are there any better mnemonic words for these options?  Should any have the default inverted?  If so, what word would be appropriate so the NO-version can be the new default?]

[I have added the Sublists and NOSublists options as I understand each is valuable in different situations.]

-------- Original Message --------
Subject: Re: [Ibm-netrexx] Nested List Support?
Date: Mon, 10 Dec 2012 00:24:17 -0500
From: Jeff Hennick [hidden email]
To: IBM Netrexx [hidden email]

 

Kermit and all,

In the spirit of MFC's dictum of getting the definition right before any consideration of implementation, I offer this:

I understand your candidate BIF for Nested List Support would be:

parselist([sep][,start,end[,esc[,flags]]])

returns an indexed string of the elements of string, broken at, and not including, sep. The default for sep is comma.  Sublists may be treated as single elements, each beginning with start and ending with end.  If start and end are not specified or are null, no sublisting is performed. If any of sep, start, or end occurs within an element, it is escaped by preceding it with esc.  If no esc is specified or it is null, no escaping is performed.  sep, start, end, and esc are strings.  flags are ???????

[A good description of the output and examples goes here!]

[I have no idea how the concept of "flags" fits into a NetRexx BIF.  It would seem that additional optional individual parameters are the "standard" approach, but that is not pretty: NetRexx is meant to be readable.]
[I have made esc default to null rather than "\" as a way to "turn off" escaping, which seems difficult otherwise.]

I also have internal comments below.

On 12/9/2012 5:15 PM, Kermit Kiser wrote:


The API actually requires four delimiters - start, end, separator, and escape.  The escape is needed because list elements are strings and therefore may contain the delimiter characters. Although the default delimiters still need to be decided for the API, I recommend that the default escape be the normal "\" NetRexx escape character.

In addition to the delimiter characters, the API requires at least four binary flags to handle variations in list syntax. The minimum four flags are these:

flag 1:
0 = separators follow sublists (data following a sublist but preceding a delimiter is discarded)

Could you give a real-life example of when this would be used?  My gut feeling is this is just a malformed string.

1 = separators do not follow sublists (sublists act as separator characters - any data following indicates a new element)
flag 2:
0 = adjacent separators produce empty elements
1 = adjacent separators reduce to one (used for word lists which may have multiple spaces separating elements)

I think flag 3 is covered above, along with start and end.

flag 3:
0 = escape sequences are not translated (escape characters remain in element strings)
1 = escape sequences are translated (a second pass of list parse may fail)

With translate and trim BIFs, is flag 4 necessary?

flag 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR)
1 = whitespace is treated as text
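
On the question of whether flag 4 could be left to existing translate-style functions: the "translated to blank" setting amounts to something like the Python sketch below. The helper name normalize_whitespace is hypothetical, and collapsing adjacent blanks follows the later NOW wording rather than the flag description here.

```python
import re

# TAB, FF, LF and CR each map to a single blank.
_TO_BLANK = str.maketrans("\t\f\n\r", "    ")

def normalize_whitespace(element):
    """Translate whitespace characters to blanks, then collapse runs of blanks."""
    blanked = element.translate(_TO_BLANK)
    return re.sub(" {2,}", " ", blanked)

print(repr(normalize_whitespace("a\t\tb \r\n c")))
```

Leading and trailing blanks are left alone in this sketch; trimming would be a separate step.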

In addition to the parse list method itself, the API needs "car" and "cdr" methods at minimum: car returns the first or only element in a list. cdr returns: a single remaining element, or a single remaining list, or a sublist of remaining elements. Also needed is a method to signal if an element is itself a list.
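
A small Python sketch of what car, cdr and the "is it a list" test might look like on list strings. Everything here is an assumption made for illustration: single-character delimiters, "(" and ")" as start and end, an empty list written as "()", and an is_list that only inspects the outer delimiters.

```python
def split_top_level(body, sep=",", start="(", end=")"):
    """Split body at separators that are not inside a nested start...end pair."""
    parts, depth, begin = [], 0, 0
    for i, ch in enumerate(body):
        if ch == start:
            depth += 1
        elif ch == end:
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(body[begin:i])
            begin = i + 1
    parts.append(body[begin:])
    return parts

def is_list(element, start="(", end=")"):
    """Rough check: the element is wrapped in the list delimiters."""
    return len(element) >= 2 and element.startswith(start) and element.endswith(end)

def car(lst):
    """First (or only) element of a list string; a non-list is returned unchanged."""
    if not is_list(lst):
        return lst
    return split_top_level(lst[1:-1])[0]

def cdr(lst):
    """Everything after the first element."""
    rest = split_top_level(lst[1:-1])[1:] if is_list(lst) else []
    if len(rest) == 1:
        return rest[0]                    # a single remaining element or list
    return "(" + ",".join(rest) + ")"     # sublist of the remaining elements

print(car("(a,(b,c),d)"), cdr("(a,(b,c),d)"))
```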

Let me know if you think of anything I missed.

The original "presenting problem" was in terms of Python tuples and lists.  I see this as a super-set / sub-set of parsing CSV strings.  As such I'd like the candidate BIF to accept CSV strings, at least as defined in RFC4180 ( http://tools.ietf.org/html/rfc4180 ), which adds the concept of quoted strings to be taken as a single element.  This would seem to require another flag for stripping/leaving outer single/double quote marks. (Or simply define them as being stripped.)  CSV as generated by Excel spreadsheets would be a plus (or may already be covered).  Or, this may be too much overload for one BIF.

The above information is good enough for a prototype implementation. The prototype demonstration program can be viewed at the link below although it is largely spaghetti code at this point and lacks safety and sanity checks (The code/API needs some refining and testing before it can be implemented as part of NetRexx itself.):

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

(I didn't paste the code inline this time as some seem not to like that. ;-)

There's still time to get those API requests in folks!

-- Kermit

Jeff

 







--
"One can live magnificently in this world if one knows how to work and how to love."  --  Leo Tolstoy

Re: Fwd: Re: Nested List Support?

Mike Cowlishaw
Dave's comments are more aimed at the issues of naming, syntax, and usability rather than semantics.  Those are areas that the 'computer science establishment' prefer to ignore.   So I think he raises good questions .. refinements, perhaps, but a refined design often lasts longer (and proves more extensible in the long term).
 
Mike


From: [hidden email] [mailto:[hidden email]] On Behalf Of George Hovey
Sent: 17 December 2012 13:25
To: IBM Netrexx
Subject: Re: [Ibm-netrexx] Fwd: Re: Nested List Support?

Dave and Kermit,
Just 2 cents.  These sound like issues that might have received considerable attention from the computer science establishment.  Is it possible a wheel is being reinvented?


Re: Fwd: Re: Nested List Support?

Jeff Hennick
In reply to this post by Dave Woodman
Dave,

Thanks for your comments.  All ha'pennies and 2-cents are welcomed here.  My penny's worth in this is just as a "customer advocate."  I've had CSV problems in the past, and have also used Python's lists and tuples.  My guiding principles are taken from MFC and his philosophy in the design of Rexx and NetRexx, including (from NRL3):

The Rexx programming language was designed with just one objective: to make programming easier
than it was before. The design achieved this by emphasizing readability and usability, with a
minimum of special notations and restrictions. It was consciously designed to make life easier for its
users, rather than for its implementers.

and

It is difficult to define exactly how to meet user expectations, but it helps to ask the question “Could
there be a high astonishment factor associated with this feature?”. If a feature, accidentally misused,
gives apparently unpredictable results, then it has a high astonishment factor and is therefore
undesirable.
Another important attribute of a reliable software tool is consistency. A consistent language is by
definition predictable and is often elegant. The danger here is to assume that because a rule is
consistent and easily described, it is therefore simple to understand. Unfortunately, some of the most
elegant rules can lead to effects that are completely alien to the intuition and expectations of a user
who, after all, is human.

[Consistency is difficult to maintain in an open source environment.  As new features are added by different people, much discussion should be indulged in to ensure consistency and not astonishment.  For example, other languages have some functions with arguments in haystack,needle order and other functions with needle,haystack order: high astonishment factor!  Yes, I'm thinking of you, PHP.]

When the discussion of a possible extension began to get into implementation first, I felt a discussion of the interface/documentation was in order.
 -------

I'll pass on the name issue, but tend to agree with you.

On the "NO" flags: I allow them primarily for "readability documentation," even though they are usually redundant and optional, so they don't trigger a BadArgumentException.  The one place I foresee an overlap in the proposed options is with CSV's use of "" to escape an embedded ", and the more general escape character/string.  Although I don't really see them being combined in a real-world situation -- but I'm half blind.

I'll take this place to emphasize I believe the separator, start, end, and escape should all be specified as strings, not limited to single characters.  In practice, I would expect them to usually be single characters.  By permitting full strings, it makes this function available for some "unanticipated" applications, for example getting the contents of an HTML table by using <td> for start and </td> for end.
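
As a rough illustration of the multi-character case, this Python sketch pulls out whatever sits between a start string and an end string. The helper name between is made up, nesting is ignored, and it is in no way a real HTML parser (a <td> carrying attributes would not match).

```python
def between(text, start, end):
    """Return every run of text enclosed by start ... end, without nesting."""
    found, pos = [], 0
    while True:
        begin = text.find(start, pos)
        if begin < 0:
            break
        begin += len(start)               # skip past the start string itself
        stop = text.find(end, begin)
        if stop < 0:
            break                         # unterminated run: ignore it
        found.append(text[begin:stop])
        pos = stop + len(end)
    return found

row = "<tr><td>1</td><td>two</td></tr>"
print(between(row, "<td>", "</td>"))
```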

Jeff


Re: Fwd: Re: Nested List Support?

Dave Woodman

Thanks Jeff,

 

I have no problem with the “NO” flags themselves, but feel that they could add complexity if care is not taken. For instance, how would instances of both a flag and a no-flag be handled? Flag takes precedence, no takes precedence, first takes, last takes? Elsewhere in NetRexx we have last-takes, so maybe that would be best…

 

As for CSV – who hasn’t had problems? :-)

 

                Dave.

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Jeff Hennick
Sent: 17 December 2012 16:06
To: IBM Netrexx
Subject: Re: [Ibm-netrexx] Fwd: Re: Nested List Support?

 


Re: Fwd: Re: Nested List Support?

ThSITC
In reply to this post by Dave Woodman
Dave, *and all* (if I'm still allowed to throw my 0.000000001 Euros or dollars in here):

Most of you, if not all, *for sure* know Chomsky's grammars.

As I see it:

a) Expressions of any language may be parsed by so-called *precedence grammars*, which define the
various operators and their GLYPHs (also called tokens) ... ok?
b) Programs consist not only of expressions, of course, but also of so-called instructions.
c) A *set of instructions*, or more accurately a *sequence of instructions*, where a *sequence* is defined
as an *ordered set*, may form what I call a *block* of instructions.
d) The *most critical issue*, in both computer and natural languages, is now the so-called HOMONYM problem,
i.e. a GLYPH used with different *semantics* in different *contexts* :-(

There is *a lot of literature* available on so-called LR grammars and all of their successors.

There is *less literature* available on other types of grammars.

Of course, natural language has *no SYNTAX* at all.

It does have *semantics*, however!

I hope you see this as a contribution, and *not* as any argument
that I personally do not like NetRexx, etc.

I do *love* NetRexx, as all of my software is now in NetRexx! :-)

Thus, you all:

A Merry X-Mas, and a Happy new year 2013 :-)

Thomas Schneider.
===================================================================================

On 17.12.2012 13:51, Dave Woodman wrote:

Hi Jeff/Kermit

 

Reading this I am struck with a vague disquiet at the name of the BIF:- looking back through the archive I see when it originated, but for some reason I did not receive that (and several other) posts.

 

My low-level disquiet comes from the use of “parse” in the name when the purpose, syntax and behaviour do not resemble the current “parse.” Perhaps “splitlist” may reflect the function’s purpose better?

 

I also noted the “NO” flags – are there any instances where setting a flag implies a set of another flag that may, or may not, need to be overridden? I have no objections, per se, to syntactic sugar where appropriate, but if all flags default to “NO” and do not have any interactions then this seems unnecessary.

 

Just my tuppence ha’penneth

 

                Dave.

 

From: [hidden email] [[hidden email]] On Behalf Of Jeff Hennick
Sent: 17 December 2012 03:07
To: IBM Netrexx
Subject: [Ibm-netrexx] Fwd: Re: Nested List Support?

 

Kermit, Bill, and all,

After Kermit's input on options, re-reading NRL3, and some further thinking, let's open discussion on this Proposed BIF:

parselist([sep][,start,end[,esc[,options]]])

returns a one-based indexed string of the elements of string, broken at, and not including, sep. The default for sep is comma.  Sublists may be treated as single elements, each beginning with start and ending with end.  If start and end are not specified or are null, no sublisting is performed. If any of sep, start, or end occurs within an element, it is escaped by preceding it with esc.  If no esc is specified or it is null, no escaping is performed.  sep, start, end, esc, and options are strings.

options
is a string of zero or more space delimited words.  Only the first character of each word is significant and it may be in uppercase or in lowercase.  They are processed left to right, so if there is any conflict the rightmost is used.  (For example, the order of C and E changes the handling of escaped quotation marks.) Any of these words may be prefixed with NO, in any case, to reverse its action, so here the first 3 characters are important.  The default for any option is the NO-value.

The following option characters are recognized:

D

(Discard); data following a sublist but preceding a delimiter is discarded.

NOD

(No Discard); sublists act as separator characters - any data following indicates a new element.



R

(Reduce); adjacent separators reduce to one.

NOR

(No Reduce); adjacent separators produce empty elements.



E

(Escape strings are kept); escape characters remain in element strings.

NOE

(No Escape strings); escape characters are removed from element strings.



W

(White space); whitespace is treated as text.

NOW

(No White space); whitespace is translated to blank (TAB, FF, LF, CR).  Adjacent white space, including multiple blanks, is collapsed to a single blank.



S

(Sublists) The output includes sublists broken into elements

NOS

(No Sublists) The output includes sublists as single elements.



C

(Comma Separated Values); elements may be surrounded by quotation marks, either single (') or double ("), in pairs.  Contained quotation marks (of the same type) are escaped by doubling the mark.  The outer quotation marks are removed, and inner ones unescaped, in the output.  This is processed as in RFC4180 ( http://tools.ietf.org/html/rfc4180 ).

If sep is null, it defaults to comma (,).  Other values for sep may be used, such as tab (d2c(9)), or semicolon (;).

NOC

(Not Comma Separated Values); quotation marks are treated as text.

[Are there any better mnemonic words for these options?  Should any have the default inverted?  If so, what word would be appropriate so the NO-version can be the new default?]

[I have added the Sublists and NOSublists options as I understand each is valuable in different situations.]

-------- Original Message --------
Subject: Re: [Ibm-netrexx] Nested List Support?
Date: Mon, 10 Dec 2012 00:24:17 -0500
From: Jeff Hennick [hidden email]
To: IBM Netrexx [hidden email]

 

Kermit and all,

In the spirit of MFC's dictum of getting the definition right before any consideration of implementation, I offer this:

I understand your candidate BIF for Nested List Support would be:

parselist([sep][,start,end[,esc[,flags]]])

returns an indexed string of the elements of string, broken at, and not including, sep. The default for sep is comma.  Sublists may be treated as single elements, each beginning with start and ending with end.  If start and end are not specified or are null, no sublisting is performed. If any of sep, start, or end occurs within an element, it is escaped by preceding it with esc.  If no esc is specified or it is null, no escaping is performed.  sep, start, end, and esc are strings.  flags are ???????

[A good description of the output and examples goes here!]

[I have no idea how the concept of "flags" fits into a NetRexx BIF.  It would seem that more optional individual parameters are the "standard" approach, but that is not pretty: NetRexx is to be readable.]
[I have made esc default to null rather than "\" as a way to "turn off" escaping, which seems difficult otherwise.]
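
A rough Python sketch of the candidate definition above, offered only as an illustration and not as the NetRexx implementation. Assumptions: the result is a plain Python list instead of an indexed string, start and end differ from each other, a sublist element keeps its start and end delimiters, escape characters remain in the output, and esc protects only the single character that follows it:

```python
def parselist(string, sep=",", start="", end="", esc=""):
    """Split string at sep; text between start and end stays one element."""
    sep = sep or ","            # a null sep defaults to comma
    sublisting = bool(start and end)
    elements, current = [], []
    depth = 0                   # nesting level inside start/end
    i = 0
    while i < len(string):
        if esc and string.startswith(esc, i) and i + len(esc) < len(string):
            # Escaped character: copy esc plus the next character as text.
            current.append(string[i:i + len(esc) + 1])
            i += len(esc) + 1
        elif sublisting and string.startswith(start, i):
            depth += 1
            current.append(start)
            i += len(start)
        elif sublisting and depth > 0 and string.startswith(end, i):
            depth -= 1
            current.append(end)
            i += len(end)
        elif depth == 0 and string.startswith(sep, i):
            # Top-level separator: close the current element.
            elements.append("".join(current))
            current = []
            i += len(sep)
        else:
            current.append(string[i])
            i += 1
    elements.append("".join(current))
    return elements

print(parselist("a,b,c"))                           # ['a', 'b', 'c']
print(parselist("a,(b,c),d", start="(", end=")"))   # ['a', '(b,c)', 'd']
print(parselist("a\\,b,c", esc="\\"))               # ['a\\,b', 'c']
```

Because esc defaults to null here, a backslash in the input is ordinary text unless the caller asks for escaping.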

I also have internal comments below.

On 12/9/2012 5:15 PM, Kermit Kiser wrote:


The API actually requires four delimiters - start, end, separator, and escape.  The escape is needed because list elements are strings and therefore may contain the delimiter characters. Although the default delimiters still need to be decided for the API, I recommend that the default escape be the normal "\" NetRexx escape character.

In addition to the delimiter characters, the API requires at least four binary flags to handle variations in list syntax. The minimum four flags are these:

flag 1:
0 = separators follow sublists (data following a sublist but preceding a delimiter is discarded)

Could you give a real-life example of when this would be used?  My gut feeling is this is just a malformed string.

1 = separators do not follow sublists (sublists act as separator characters - any data following indicates a new element)
flag 2:
0 = adjacent separators produce empty elements
1 = adjacent separators reduce to one (used for word lists which may have multiple spaces separating elements)

I think flag 3 is covered above, along with start and end.

flag 3:
0 = escape sequences are not translated (escape characters remain in element strings)
1 = escape sequences are translated (a second pass of list parse may fail)

With translate and trim BIFs, is flag 4 necessary?

flag 4:
0 = whitespace is translated to blank (TAB,FF,LF,CR)
1 = whitespace is treated as text
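
To make flags 2 and 4 concrete, here is a small Python sketch (not the prototype's code). The function name split_with_flags and its two boolean parameters are hypothetical stand-ins for the binary flags, and sublists and escapes are left out to keep it short:

```python
import re

def split_with_flags(text, sep=",", merge_adjacent=False,
                     whitespace_is_text=False):
    """Split text at sep, honouring stand-ins for flag 2 and flag 4."""
    if not whitespace_is_text:
        # Flag 4 = 0: TAB, FF, LF and CR are translated to blank.
        text = text.translate(str.maketrans("\t\f\n\r", "    "))
    if merge_adjacent:
        # Flag 2 = 1: a run of adjacent separators reduces to one.
        text = re.sub("(?:" + re.escape(sep) + ")+", sep, text)
    # Flag 2 = 0: adjacent separators produce empty elements.
    return text.split(sep)

print(split_with_flags("a,,b"))                       # ['a', '', 'b']
print(split_with_flags("a,,b", merge_adjacent=True))  # ['a', 'b']
print(split_with_flags("one  two\tthree", sep=" ", merge_adjacent=True))
# ['one', 'two', 'three']
```

The last call is the word-list case: with blank as the separator, translating whitespace first and then merging adjacent separators gives one element per word.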

In addition to the parse list method itself, the API needs "car" and "cdr" methods at minimum: car returns the first or only element in a list. cdr returns: a single remaining element, or a single remaining list, or a sublist of remaining elements. Also needed is a method to signal if an element is itself a list.
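
One possible reading of the car, cdr, and "is it a list" methods, sketched in Python over already-parsed nested lists. This is an interpretation of the paragraph above, not the prototype's API; returning None when nothing remains is an assumption the description does not cover:

```python
def is_list(element):
    """Signal whether an element is itself a list."""
    return isinstance(element, list)

def car(element):
    """Return the first element of a list, or the element if it is not a list."""
    if is_list(element):
        return element[0] if element else None  # assumption: empty list -> None
    return element

def cdr(element):
    """Return what remains after the first element.

    One element left  -> that element (a plain value or a list).
    Several left      -> a sublist of the remaining elements.
    Nothing left      -> None (assumption).
    """
    if not is_list(element) or len(element) < 2:
        return None
    rest = element[1:]
    return rest[0] if len(rest) == 1 else rest

nested = ["a", ["b", "c"], "d"]
print(car(nested))           # a
print(cdr(nested))           # [['b', 'c'], 'd']
print(cdr(["a", "b"]))       # b
print(is_list(car(cdr(nested))))  # True
```

Under this reading, cdr of ["a", ["b", "c"]] and cdr of ["a", "b", "c"] both yield ["b", "c"], which may be worth settling in the definition.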

Let me know if you think of anything I missed.

The original "presenting problem" was in terms of Python tuples and lists.  I see this as a superset / subset of parsing CSV strings.  As such I'd like the candidate BIF to accept CSV strings, at least as defined in RFC 4180 ( http://tools.ietf.org/html/rfc4180 ).  That adds the concept of quoted strings to be taken as a single element.  This would seem to require another flag for stripping/leaving outer single/double quote marks. (Or simply define them as being stripped.)  CSV as generated by Excel spreadsheets would be a plus (or may already be covered).  Or, this may be too much overload for one BIF.

The above information is good enough for a prototype implementation. The prototype demonstration program can be viewed at the link below, although it is largely spaghetti code at this point and lacks safety and sanity checks. (The code/API needs some refining and testing before it can be implemented as part of NetRexx itself.)

http://www.kermitkiser.com/NetRexx/attributestringlistparse.html

(I didn't paste the code inline this time as some seem not to like that. ;-)

There's still time to get those API requests in folks!

-- Kermit

Jeff

 






--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria, Europe


Thomas Schneider, Vienna, Austria (Europe) :-)

www.thsitc.com
www.db-123.com