Nested List Support?


Re: Fwd: Re: Nested List Support?

billfen
This seems overly complicated to me, and I think some additional design work is needed. 

The essence of the problem is the ability to find matching end delimiters, and for that the list delimiter is not needed. 

Perhaps a function which identifies the length of the matching delimiter set could be used, even in a parse statement as a variable reference.

Of course, the problems involved with quoted strings containing start or end delimiters are not easily solved, and I suspect the best approach would simply be to ban them.

The parse statement has the advantage that it provides for multiple assignments, as in the parsing of a simple (CSV) string.


On 12/17/2012 11:06 AM, Jeff Hennick wrote:
Dave,

Thanks for your comments.  All ha'pennys and 2-cents are welcomed here.  My penny's worth in this is just as a "customer advocate."  I've had CSV problems in the past, and have also used Python's lists and tuples.  My guiding principles are taken from MFC and his philosophy in the design of Rexx and NetRexx, including (from NRL3):
The Rexx programming language was designed with just one objective: to make programming easier
than it was before. The design achieved this by emphasizing readability and usability, with a
minimum of special notations and restrictions. It was consciously designed to make life easier for its
users, rather than for its implementers.
and
It is difficult to define exactly how to meet user expectations, but it helps to ask the question “Could
there be a high astonishment factor associated with this feature?”. If a feature, accidentally misused,
gives apparently unpredictable results, then it has a high astonishment factor and is therefore
undesirable.
Another important attribute of a reliable software tool is consistency. A consistent language is by
definition predictable and is often elegant. The danger here is to assume that because a rule is
consistent and easily described, it is therefore simple to understand. Unfortunately, some of the most
elegant rules can lead to effects that are completely alien to the intuition and expectations of a user
who, after all, is human. [Consistency is difficult to maintain in an open source environment.  As new features are added by different people, much discussion should be indulged in to ensure consistency and not astonishment.  For example, other languages have some functions with arguments in haystack,needle order and other functions with needle,haystack order: high astonishment factor!  Yes, I'm thinking of you, PHP.]

When the discussion of a possible extension began to get into implementation first, I felt a discussion of the interface/documentation was in order.
 -------

I'll pass on the name issue, but tend to agree with you.

On the "NO" flags: I allow them primarily for "readability documentation," even though they are usually redundant and optional, so they don't trigger a BadArgumentException.  The one place I foresee an overlap in the proposed options is with CSV's use of "" to escape an embedded ", and the more general escape character/string.  Although I don't really see them being combined in a real-world situation -- but I'm half blind.

I'll take this place to emphasize I believe the separator, start, end, and escape should all be specified as strings, not limited to single characters.  In practice, I would expect them to usually be single characters.  By permitting full strings, it makes this function available for some "unanticipated" applications, for example getting the contents of an HTML table by using <td> for start and </td> for end.
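
For the flat, non-nested case, the existing PARSE instruction already accepts multi-character literal patterns, so the <td> idea can be tried today. What follows is only a minimal sketch, assuming well-formed input with no nested tables and no attributes on the tags; the sample row is invented for illustration and this is not the proposed function.

```netrexx
-- Sample data invented for illustration; real HTML would need a proper parser.
row = '<tr><td>alpha</td><td>beta</td><td>gamma</td></tr>'

-- Peel off one cell per pass, using string (not single-character) delimiters.
loop while row.pos('<td>') > 0
  parse row '<td>' cell '</td>' row
  say cell
end
```

This handles neither nesting nor escape strings, which is where a dedicated function would earn its keep.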

Jeff

On 12/17/2012 7:51 AM, Dave Woodman wrote:

Hi Jeff/Kermit

 

Reading this I am struck with a vague disquiet at the name of the BIF:- looking back through the archive I see when it originated, but for some reason I did not receive that (and several other) posts.

 

My low-level disquiet comes from the use of “parse” in the name when the purpose, syntax and behaviour do not resemble the current “parse.” Perhaps “splitlist” may reflect the function’s purpose better?

 

I also noted the “NO” flags – are there any instances where setting a flag implies a set of another flag that may, or may not, need to be overridden? I have no objections, per se, to syntactic sugar where appropriate, but if all flags default to “NO” and do not have any interactions then this seems unnecessary.

 

Just my tuppence ha’penneth

 

                Dave.





_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/




Re: Fwd: Re: Nested List Support?

Jeff Hennick
In reply to this post by Dave Woodman
On 12/17/2012 11:23 AM, Dave Woodman wrote:

Thanks Jeff,

 

I have no problem with the “NO” flags themselves, but feel that they could add complexity if care is not taken. For instance, how would instances of both a flag and a no-flag be handled? Flag takes precedence, no takes precedence, first takes, last takes? Elsewhere in NetRexx we have last-takes, so maybe that would be best…

Last is my choice.  As you say it is elsewhere, so no astonishment.
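
A last-takes rule is easy to state in code. This is a rough sketch only: flagIsSet and the TRIM/NOTRIM names are placeholders made up for illustration, not the option names of the proposed function.

```netrexx
-- Hypothetical helper; the flag names are placeholders, not real options.
say flagIsSet('TRIM NOTRIM', 'TRIM')   -- NOTRIM comes last
say flagIsSet('NOTRIM TRIM', 'TRIM')   -- TRIM comes last
say flagIsSet('', 'TRIM')              -- never mentioned

-- Scans the option words left to right; the last mention of the flag
-- (plain or NO-prefixed) decides the result.  Unmentioned flags stay off.
method flagIsSet(optionWords = Rexx, flag = Rexx) static returns Rexx
  state = 0
  wanted = flag.upper()
  remaining = optionWords.upper()
  loop while remaining \== ''
    parse remaining oneWord remaining
    if oneWord == wanted then state = 1
    else if oneWord == 'NO' || wanted then state = 0
  end
  return state
```

Because each mention simply overwrites the state, a flag and its NO form never conflict and no exception is needed.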

 

As for CSV – who hasn’t had problems? :-)

 

                Dave.

 





Re: Fwd: Re: Nested List Support?

Jeff Hennick
In reply to this post by billfen
On 12/17/2012 11:34 AM, Bill Fenlason wrote:
This seems overly complicated to me, and I think some additional design work is needed. 
Agreed, and that is what I am trying to prompt.  Possibly we have two separate problems & solutions.  Maybe my introducing CSV was in error and overcomplicates the sub-list problem.

The essence of the problem is the ability to find matching end delimiters, and for that the list delimiter is not needed. 
I'm sorry, I don't follow your reasoning here.  I understood that recognizing sub-lists, from Python, was important to you.

Perhaps a function which identifies the length of the matching delimiter set could be used, even in a parse statement as a variable reference.
Are you introducing multiple pairs of sub-list delimiters? And how would knowing the size of this set (when other than 1) be useful?  And a function within PARSE would be a new concept.  Or -- as likely -- am I completely misreading your comment here?

Of course, the problems involved with quoted strings containing start or end delimiters are not easily solved, and I suspect the best approach would simply be to ban them.
In general, I don't have control over the input contents.  Are you talking of banning quoted strings, or delimiters within them?  It may come back to CSV and Python lists needing separate functions, no matter how much they look alike at first consideration.  For the simplest cases with no quotes and no sub-lists, the results would be the same.

The parse statement has the advantage that it provides for multiple assignments, as in the parsing of a simple (CSV) string.
PARSE is wonderfully powerful.  But for CSV it works only in the very simplest of situations, with no quotes and no embedded separators.  And, unless it is used in a loop (which was what you originally were trying to hide), it works only if the number of fields is known.  PARSE also works great for the simple-case Python list problem when the format of the list is known ahead of time.
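
To show where a plain PARSE template runs out of steam, here is a rough sketch of a quote-aware splitter. It assumes the common CSV convention of "" for an embedded quote and a comma as separator; splitCsv is a made-up name, and the fields come back as an indexed string with the count in element 0.

```netrexx
-- Hypothetical helper, not an existing BIF.
fields = splitCsv('plain,"has, comma","say ""hi"""')
loop n = 1 to fields[0]
  say n':' fields[n]
end

-- Splits one CSV line on commas that are outside double quotes.
-- Inside quotes, a doubled quote ("") stands for one literal quote.
method splitCsv(line = Rexx) static returns Rexx
  fields = ''
  count = 0
  field = ''
  inQuotes = 0
  i = 1
  loop while i <= line.length()
    ch = line.substr(i, 1)
    select
      when inQuotes = 1 & ch == '"' & line.substr(i + 1, 1) == '"' then do
        field = field || '"'        -- escaped quote: keep one, skip the other
        i = i + 1
      end
      when ch == '"' then inQuotes = 1 - inQuotes
      when ch == ',' & inQuotes = 0 then do
        count = count + 1
        fields[count] = field
        field = ''
      end
      otherwise field = field || ch
    end
    i = i + 1
  end
  count = count + 1                 -- the last field has no trailing comma
  fields[count] = field
  fields[0] = count
  return fields
```

The character-by-character loop is exactly what a single PARSE template cannot express, since the meaning of a comma depends on state carried from earlier characters.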
-------
In my work I could see using both functions.  Either as new BIFs (eventually) or from a NetRexx library of "best practice" functions.

Jeff


Re: Fwd: Re: Nested List Support?

billfen
Jeff,

On 12/17/2012 12:39 PM, Jeff Hennick wrote:
On 12/17/2012 11:34 AM, Bill Fenlason wrote:
This seems overly complicated to me, and I think some additional design work is needed. 
Agreed, and that is what I am trying to prompt.  Possibly we have two separate problems & solutions.  Maybe my introducing CSV was in error and overcomplicates the sub-list problem.

> Yes, I think there are multiple issues to deal with, and I think the Comma Separated Value list comparison is useful. 

> Consider a situation in which a CSV list contains additional but distinct CSV lists.  An example might be a CSV list of family names, each of which is optionally followed by a nested CSV list of children names.  And in some cases a child name is followed by a CSV list of pet names, etc.

The essence of the problem is the ability to find matching end delimiters, and for that the list delimiter is not needed. 
I'm sorry, I don't follow your reasoning here.  I understood that recognizing sub-lists, from Python, was important to you.

> The idea is that nested sub-lists have to be identified some how, and it would seem that the simplest way would be to surround them with begin and end delimiters. 

> Remember, the original question was how to provide a list of elements within a string such that an element can either be a string or a nested list of elements, while retaining the list structure.  I asked what the best design approach would be. 

Perhaps a function which identifies the length of the matching delimiter set could be used, even in a parse statement as a variable reference.
Are you introducing multiple pairs of sub-list delimiters? And how would knowing the size of this set (when other than 1) be useful?  And a function within PARSE would be a new concept.  Or -- as likely -- am I completely misreading your comment here?

> Perhaps so.  What I'm suggesting is that the length (in characters) of the substring which begins and ends with the matching delimiter pair could be used to decompose the input string, either with the substr method or by using that value in a variable pattern in a parse statement (see page 140 in the latest NRL). 

> For example, assume that the '{' and '}' delimiter characters do not occur in any element, and that a substring in a list consists of "{a,b,{c,{d,e}}},x,y".  The length of the first element in that list (i.e. the outermost nested list) is 15, which can be used to strip the nested list from the primary string. 

> If the length is zero, the first element is an actual element and not a nested list.  The element separator (i.e. ',') is not needed to determine the length. 

> The processing loop initially provides three elements: a complex list, x and y. 
> If necessary, the complex list is decomposed into 3 elements: a, b, and another nested complex list. 
> The processing continues in a similar fashion as necessary. 
> Alternatively, a complex list may be decomposed and processed when it is recognized, but that is not a requirement.
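
The length function described above could be sketched roughly as follows. This is only an illustration under the stated assumption that the delimiters never occur as data; matchedLength is a made-up name, and the delimiters are passed as strings so they need not be single characters.

```netrexx
-- Hypothetical helper, not an existing BIF.
text = '{a,b,{c,{d,e}}},x,y'
len = matchedLength(text, '{', '}')
say len                       -- 15 for the example in this post
say text.left(len)            -- the outermost nested list
say matchedLength('x,y', '{', '}')   -- 0: first element is not a nested list

-- Returns the length of the leading substring that starts with startDelim
-- and ends with its matching endDelim.  Returns 0 when the text does not
-- start with startDelim or the delimiters are unbalanced.
method matchedLength(text = Rexx, startDelim = Rexx, endDelim = Rexx) static returns Rexx
  if text.left(startDelim.length()) \== startDelim then return 0
  depth = 0
  i = 1
  loop while i <= text.length()
    if text.substr(i, startDelim.length()) == startDelim then do
      depth = depth + 1
      i = i + startDelim.length()
    end
    else if text.substr(i, endDelim.length()) == endDelim then do
      depth = depth - 1
      i = i + endDelim.length()
      if depth = 0 then return i - 1
    end
    else i = i + 1
  end
  return 0
```

Note that the element separator never appears in the function, which is the point being made: only the begin and end delimiters are needed to find the extent of a nested list.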

Of course, the problems involved with quoted strings containing start or end delimiters are not easily solved, and I suspect the best approach would simply be to ban them.
In general, I don't have control over the input contents.  Are you talking of banning quoted strings, or delimiters within them?  It may come back to CSV and Python lists needing separate functions, no matter how much they look alike at first consideration.  For the simplest cases with no quotes and no sub-lists, the results would be the same.

> If nested string lists are identified by being surrounded by beginning and ending delimiters, it is essential that no element contain either of those delimiters as data.  If necessary, non-printable characters could be used, or as Kermit suggested some kind of C or Java like escape sequence convention.

The parse statement has the advantage that it provides for multiple assignments, as in the parsing of a simple (CSV) string.
PARSE is wonderfully powerful.  But for CSV it works only in the very simplest of situations, with no quotes and no embedded separators.  And, unless it is used in a loop (which was what you originally were trying to hide), it works only if the number of fields is known.  PARSE also works great for the simple-case Python list problem when the format of the list is known ahead of time.

> I was not trying to hide any loop - I'm not sure what you mean or what I might have said to give you that idea?

> Another approach is to provide list encode (and decode) functions which insert (delete) the begin and end delimiters and converts (reverts) the separator character to another non-used character.  That way the complex list could be parsed like a CSV.  (e.g.  "parse  cdr  car  ','  cdr  ;" )  But as I say, that is a somewhat different approach with its own strengths and weaknesses, but I think it has promise.

-------
In my work I could see using both functions.  Either as new BIFs (eventually) or from a NetRexx library of "best practice" functions.

Jeff

> Bill


Re: Fwd: Re: Nested List Support?

Aviatrexx
In reply to this post by Kermit Kiser
I doubt that it is lack of interest that has suppressed feedback,
Kermit.  More likely it was the lack of time to analyze the problem to
the degree that you, Jeff, and Mike have.  It will take some of us a
little bit to catch up to you. :-)

One way that Mike made programming easier was by the elegant
decomposition of programming problems into atomic tools with simple
interfaces.  For example, the PARSE instruction does not attempt to
provide a mechanism to extract all of a variable number of blank
delimited words in a string in one fell swoop; we loop on the idiom:

   PARSE wordlist word wordlist

I would rather have the flexibility of a simple language tool than one
that tries to do everything for me.  If I want to convert it to an
array, I'll do it myself.  I don't want a tool with multiple options
for zero- or one-based indices, upper- or lower-casing, etc.

To me, decomposing nested lists seems to be a parsing problem and I
would not be unhappy to see it implemented as a Parse enhancement.

The current Parse operation is resolutely left-to-right so it is
poorly suited for extracting strings bounded by balanced delimiters.

I would be happy with the enhancement (in Classic format):

   PARSE LIST list dlim1 sublist dlim2 list

that would return the sublist delimited by a balanced set of 'dlim1'
and 'dlim2' strings.  Further extraction of lists or values from
'sublist' and 'list' (possibly with different delimiters) would be at
the programmer's discretion.

Needless to say, that syntax won't work for NetRexx, but perhaps a
signifier on the delimiter string would suffice:

   PARSE list [<UL] sublist [</UL] list

or possibly (if less preferably):

   PARSE list ('<UL') sublist ('</UL') list


To me, this approach (if feasible) is more Rexx-ish.
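
Until PARSE itself can do this, the behaviour asked for can be approximated with a library routine. A minimal sketch, assuming complete tags ('<UL>' and '</UL>') as delimiters; splitBalanced and the 'before'/'sublist'/'after' keys are names made up for illustration.

```netrexx
-- Hypothetical stand-in for "PARSE LIST list dlim1 sublist dlim2 list".
parts = splitBalanced('intro<UL>one<UL>two</UL></UL>tail', '<UL>', '</UL>')
say parts['before']     -- text ahead of the first dlim1
say parts['sublist']    -- contents of the balanced pair, inner pair intact
say parts['after']      -- text following the matching dlim2

-- Finds the first dlim1 and its balancing dlim2.  If there is no dlim1, or
-- the pair never balances, 'before' holds the whole text and the rest is ''.
method splitBalanced(text = Rexx, dlim1 = Rexx, dlim2 = Rexx) static returns Rexx
  parts = ''
  parts['before'] = text
  parts['sublist'] = ''
  parts['after'] = ''
  start = text.pos(dlim1)
  if start = 0 then return parts
  depth = 0
  i = start
  loop while i <= text.length()
    if text.substr(i, dlim1.length()) == dlim1 then do
      depth = depth + 1
      i = i + dlim1.length()
    end
    else if text.substr(i, dlim2.length()) == dlim2 then do
      depth = depth - 1
      if depth = 0 then do
        inner = start + dlim1.length()
        parts['before'] = text.left(start - 1)
        parts['sublist'] = text.substr(inner, i - inner)
        parts['after'] = text.substr(i + dlim2.length())
        return parts
      end
      i = i + dlim2.length()
    end
    else i = i + 1
  end
  return parts
```

Further extraction from 'sublist' and 'after' stays at the programmer's discretion, simply by calling the routine again on whichever piece is of interest.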

-Chip-

On 12/17/2012 05:20 Kermit Kiser said:
> Just to let you know that I am also working on this issue. I am still
> planning to make something available for public testing but it may
> take a while as this does not seem to be a trivial problem to resolve.
> Besides you and I, no one else seems interested in providing (or
> perhaps they lack time to provide) feedback or suggestions on a
> feature set or API or implementation approach for this type of support.


Re: Fwd: Re: Nested List Support?

rvjansen
Chip,

true; I myself have been stricken down by the most violent flu from hell (or at least purgatory); afraid to call it dengue because then it will give me even more pain in the joints than I already have. This, in short, always happens when you try to do something useful.

I am, at the moment, not sure how one can solve this problem in a generic way. Also, the approach of isolating the list and then having another parse at it has always worked for me. For all XML related parsing, I can call xalan or saxon. For other parsing tasks there is Antlr or javacc.  Parse is OK for all line-oriented parsing and more, depending on how much energy one wants to put into it.


As for the proposal, I think some examples would go a long way in showing its merit. It can start life in a library; we also need some voting mechanism for including new constructs in the language proper.

best regards,

René.




Re: Fwd: Re: Nested List Support?

ThSITC
In reply to this post by Aviatrexx
Hi Chip, and all:

What about using a parsing *template*, similar to the notation of
Backus-Naur Form?

If anybody *is* interested, I can publish a *specification* of that
quite quickly, as that is the approach I have been following ...

I shall, however, only do so if the majority of the group does not say:

Whoops, shut up, you dreamer!

What I did, for instance, was to implement an EXEC CICS parser for
parsing embedded CICS
commands in COBOL and/or PL/I programs.

If there is interest, I shall publish this technology.
Thomas.

PS: It does need, of course, some *conventions* for how the results of a
*complicated PARSE* are stored
internally ;-)



--
Thomas Schneider, IT Consulting; http://www.thsitc.com; Vienna, Austria,
Europe

Re: Fwd: Re: Nested List Support?

billfen
In reply to this post by Aviatrexx
Chip,

I largely agree with your comments, particularly the one about options.

In my reply to Jeff (I assume you may have missed it, it is included below), at the end I mentioned that by using specific encode and decode methods the list processing can be done using the PARSE statement.

For example, assume the element separator character is the comma (i.e. a CSV list) and the nested list beginning and ending delimiters are '{' and '}'.  Also assume that no element actually contains any of those three characters as well as an additional character, say '~', used for comma substitution.

A list of elements which may include either string elements or nested lists could be processed something like this:

baseList = aBaseList || " " 
-- i.e. a CSV list containing either elements and/or nested lists
-- The forced blank at the end converts any ending comma to a null element

LOOP UNTIL baseList == ""

    PARSE baseList elementOrList "," baseList

    IF isNestedList(elementOrList) THEN DO
        listToProcess = decodeList(elementOrList)
        --  Process the list possibly with further PARSE statements
    END

    ELSE  DO
        elementToProcess = elementOrList.strip()
        -- Process the element
    END
END

The isNestedList method checks the parsed and stripped string for leading and trailing "{" and "}".

The decodeList method strips the leading and trailing "{" and "}", and then converts any "~" character not contained within a "{" and "}" pair back to a ","

A downside of this is that an encodeList method likely would be used to encode a list containing elements or nested lists.  It adds the leading and trailing "{" and "}" characters, and converts all commas to the comma substitute character "~".

So for my example (below) of "{a,b,{c,{d,e}}},x,y" , something like the following might be used:

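-- || (abuttal) is used so that no blanks are inserted between the pieces
nested1 = encodeList('d' || ',' || 'e')
nested2 = encodeList('c' || ',' || nested1)
nested3 = encodeList('a' || ',' || 'b' || ',' || nested2)
aBaseList = nested3 || ',' || 'x' || ',' || 'y'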

Another downside is that looking at the raw string containing nested lists might be confusing because of the substitute comma characters.  That could be avoided by a translate.  In the above example,
aBaseList would be "{a~b~{c~{d~e}}},x,y" .
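
The three helper methods named above might look roughly like this. It is a sketch only, under the same assumptions as the example: '{', '}' and '~' never occur as data, and decodeList is called only on strings that isNestedList accepts.

```netrexx
-- Hypothetical implementations of the helpers named in this post.
nested1 = encodeList('d,e')
nested2 = encodeList('c,' || nested1)
nested3 = encodeList('a,b,' || nested2)
say nested3                                       -- {a~b~{c~{d~e}}}
if isNestedList(nested3) then say decodeList(nested3)   -- a,b,{c~{d~e}}

-- Wraps the list in braces and hides its commas from an outer PARSE.
method encodeList(list = Rexx) static returns Rexx
  return '{' || list.changestr(',', '~') || '}'

-- True when the stripped string begins with "{" and ends with "}".
method isNestedList(item = Rexx) static returns boolean
  s = item.strip()
  return s.left(1) == '{' & s.right(1) == '}'

-- Removes the outer braces and restores commas at the top level only;
-- "~" inside a deeper brace pair is left for a later decodeList call.
method decodeList(item = Rexx) static returns Rexx
  s = item.strip()
  inner = s.substr(2, s.length() - 2)
  out = ''
  depth = 0
  loop i = 1 to inner.length()
    ch = inner.substr(i, 1)
    if ch == '{' then depth = depth + 1
    if ch == '}' then depth = depth - 1
    if ch == '~' & depth = 0 then ch = ','
    out = out || ch
  end
  return out
```

Each decodeList call peels exactly one level, so the result can go straight back into the same comma-driven PARSE loop.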

I agree with Mike that making the PARSE statement any more complex would probably not be a good idea.

Bill



On 12/17/2012 4:33 PM, Bill Fenlason wrote:
Jeff,

On 12/17/2012 12:39 PM, Jeff Hennick wrote:
On 12/17/2012 11:34 AM, Bill Fenlason wrote:
This seems overly complicated to me, and I think some additional design work is needed. 
Agreed, and that is what I am trying to prompt.  Possibly we have two separate problems & solutions.  Maybe my introducing CSV was in error and overcomplicates the sub-list problem.

> Yes, I think there are multiple issues to deal with, and I think the Comma Separated Value list comparison is useful. 

> Consider a situation in which a CSV list contains additional but distinct CSV lists.  An example might be a CSV list of family names, each of which is optionally followed by a nested CSV list of children names.  And in some cases a child name is followed by a CSV list of pet names, etc.

The essence of the problem is the ability to find matching end delimiters, and for that the list delimiter is not needed. 
I'm sorry, I don't follow your reasoning here.  I understood that recognizing sub-lists, from Python, was important to you.

> The idea is that nested sub-lists have to be identified some how, and it would seem that the simplest way would be to surround them with begin and end delimiters. 

> Remember, the original question was how to provide a list of elements within a string such that an element can either be a string or a nested list of elements, while retaining the list structure.  I asked what the best design approach would be. 

Perhaps a function which identifies the length of the matching delimiter set could be used, even in a parse statement as a variable reference.
Are you introducing multiple pairs of sub-list delimiters? And how would knowing the size of this set (when other than 1) be useful?  And a function within PARSE would be a new concept.  Or -- as likely -- am I completely misreading your comment here?

> Perhaps so.  What I'm suggesting is that the length (in characters) of the substring which begins and ends with the matching delimiter pair could be used to decompose the input string, either with the substr method or by using that value in a variable pattern in a parse statement (see page 140 in the latest NRL). 

> For example, assume that the '{' and '}' delimiter characters do not occur in any element, and that a substring in a list consists of "{a,b,{c,{d,e}}},x,y".  The length of the first element in that list (i.e. the outermost nested list) is 15, which can be used to strip the nested list from the primary string. 

> If the length is zero, the first element is an actual element and not a nested list.  The element separator (i.e. ',') is not needed to determine the length. 

> The processing loop initially provides three elements: a complex list, x and y. 
> If necessary, the complex list is decomposed into 3 elements: a, b, and another nested complex list. 
> The processing continues in a similar fashion as necessary. 
> Alternatively, a complex list may be decomposed and processed when it is recognized, but that is not a requirement.

Of course, the problems involved with quoted strings containing start or end delimiters are not easily solved, and I suspect the best approach would simply be to ban them.
In general, I don't have control over the input contents.  Are you talking of banning quoted strings, or delimiters within them?  It may come back to CSV and Python lists needing separate functions, no matter how much they look alike at first consideration.  For the simplest cases with no quotes and no sub-lists, the results would be the same.

> If nested string lists are identified by being surrounded by beginning and ending delimiters, it is essential that no element contain either of those delimiters as data.  If necessary, non-printable characters could be used, or, as Kermit suggested, some kind of C- or Java-like escape sequence convention.

The parse statement has the advantage that it provides for multiple assignments, as in the parsing of a simple (CSV) string.
PARSE is wonderfully powerful.  But for CSV it works only in the very simplest of situations, with no quotes and no embedded separators, and, if it is not to be in a loop (which was what you originally were trying to hide), only if the number of fields is known.  PARSE also works great for the simple-case Python list problem when the format of the list is known ahead of time.
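To illustrate the point about quotes and embedded separators, here is a small Python comparison (not NetRexx) between a plain split on the comma, which is roughly what a simple PARSE loop does, and a quote-aware CSV reader. The sample line is invented for the example.

```python
import csv
import io

# Made-up sample: the second field is quoted and contains a comma.
line = 'Smith,"Portland, OR",42'

# Plain split on the separator, ignoring quotes.
naive = line.split(",")

# Quote-aware parsing with the standard csv module.
proper = next(csv.reader(io.StringIO(line)))

print(naive)
print(proper)
```

The plain split yields four pieces because it cuts the quoted field in two, while the quote-aware reader yields the three intended fields.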

> I was not trying to hide any loop - I'm not sure what you mean or what I might have said to give you that idea?

> Another approach is to provide list encode (and decode) functions which insert (delete) the begin and end delimiters and convert (revert) the separator character to another unused character.  That way the complex list could be parsed like a CSV.  (e.g.  "parse  cdr  car  ','  cdr  ;" )  As I say, that is a somewhat different approach with its own strengths and weaknesses, but I think it has promise.

-------
In my work I could see using both functions.  Either as new BIFs (eventually) or from a NetRexx library of "best practice" functions.

Jeff

> Bill

_______________________________________________
Ibm-netrexx mailing list
[hidden email]
Online Archive : http://ibm-netrexx.215625.n3.nabble.com/

Re: Fwd: Re: Nested List Support?

George Hovey-2
The ideas expressed in this thread contain a number of novelties and complexities, which always admit the possibility of unanticipated behavior.  Would it make sense to have a 'development class library' where new ideas could be floated?  After being shaken down, some of these might make their way into the language.  For a given area, like lists of lists, there might be multiple attempts at a solution.  Once in the library, a class would have to remain forever, though it might be labeled 'deprecated'.

On Tue, Dec 18, 2012 at 7:12 PM, Bill Fenlason <[hidden email]> wrote:
Chip,

I largely agree with your comments, particularly the one about options.

In my reply to Jeff (I assume you may have missed it, it is included below), at the end I mentioned that by using specific encode and decode methods the list processing can be done using the PARSE statement.

For example, assume the element separator character is the comma (i.e. a CSV list) and the nested list beginning and ending delimiters are '{' and '}'.  Also assume that no element actually contains any of those three characters as well as an additional character, say '~', used for comma substitution.

A list of elements which may include either string elements or nested lists could be processed something like this:

baseList = aBaseList || " " 
-- i.e. a CSV list containing either elements and/or nested lists
-- The forced blank at the end converts any ending comma to a null element

LOOP UNTIL baseList == ""

    PARSE baseList elementOrList "," baseList

    IF isNestedList(elementOrList) THEN DO
        listToProcess = decodeList(elementOrList)
        --  Process the list possibly with further PARSE statements
    END

    ELSE  DO
        elementToProcess = elementOrList.strip()
        -- Process the element
    END
END

The isNestedList method checks the parsed and stripped string for leading and trailing "{" and "}".

The decodeList method strips the leading and trailing "{" and "}", and then converts any "~" character not contained within a "{" and "}" pair back to a ","

A downside of this is that an encodeList method would likely have to be used to encode a list containing elements or nested lists.  It adds the leading and trailing "{" and "}" characters, and converts all commas to the comma substitute character "~".

So for my example (below) of "{a,b,{c,{d,e}}},x,y" , something like the following might be used:

nested1 = encodeList('d' || ',' || 'e')
nested2 = encodeList('c' || ',' || nested1)
nested3 = encodeList('a' || ',' || 'b' || ',' || nested2)
aBaseList = nested3 || ',' || 'x' || ',' || 'y'

Another downside is that looking at the raw string containing nested lists might be confusing because of the substitute comma characters.  That could be avoided by a translate.  In the above example,
aBaseList would be "{a~b~{c~{d~e}}},x,y" .
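For anyone who wants to experiment with the encode/decode idea, here is a rough sketch in Python rather than NetRexx. The names encode_list, decode_list, is_nested_list and to_nested are hypothetical helpers modeled on the descriptions above, not existing functions, and the sketch relies on the same assumption that no element contains ',', '~', '{' or '}' as data.

```python
SEP, SUB, OPEN, CLOSE = ",", "~", "{", "}"


def encode_list(csv_text):
    """Wrap a CSV string in braces and replace every comma with the substitute."""
    return OPEN + csv_text.replace(SEP, SUB) + CLOSE


def is_nested_list(element):
    """True when the stripped element is wrapped in the list delimiters."""
    element = element.strip()
    return element.startswith(OPEN) and element.endswith(CLOSE)


def decode_list(encoded):
    """Drop the outer braces and restore only the top-level separators."""
    inner = encoded.strip()[1:-1]
    depth, out = 0, []
    for ch in inner:
        if ch == OPEN:
            depth += 1
        elif ch == CLOSE:
            depth -= 1
        # A substitute inside a deeper brace pair stays encoded.
        out.append(SEP if ch == SUB and depth == 0 else ch)
    return "".join(out)


def to_nested(csv_text):
    """Recursively turn an encoded CSV string into nested Python lists."""
    result = []
    for element in csv_text.split(SEP):
        if is_nested_list(element):
            result.append(to_nested(decode_list(element)))
        else:
            result.append(element.strip())
    return result


nested1 = encode_list("d,e")
nested2 = encode_list("c," + nested1)
nested3 = encode_list("a,b," + nested2)
base_list = nested3 + ",x,y"
print(base_list)
print(to_nested(base_list))
```

Because every comma inside an encoded list has been replaced, a plain split on ',' at each level sees only that level's separators, which is what lets the simple loop handle nesting.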

I agree with Mike that making the PARSE statement any more complex would probably not be a good idea.

Bill


On 12/18/2012 3:25 PM, Chip Davis wrote:
I doubt that it is lack of interest that has suppressed feedback, Kermit.  More likely it was the lack of time to analyze the problem to the degree that you, Jeff, and Mike have.  It will take some of us a little bit to catch up to you. :-)

One way that Mike made programming easier was by the elegant decomposition of programming problems into atomic tools with simple interfaces.  For example, the PARSE instruction does not attempt to provide a mechanism to extract all of a variable number of blank delimited words in a string in one fell swoop; we loop on the idiom:

  PARSE wordlist word wordlist

I would rather have the flexibility of a simple language tool than one that tries to do everything for me.  If I want to convert it to an array, I'll do it myself.  I don't want a tool with multiple options for zero- or one-based indices, upper- or lower-casing, etc.

To me, decomposing nested lists seems to be a parsing problem and I would not be unhappy to see it implemented as a Parse enhancement.

The current Parse operation is resolutely left-to-right so it is poorly suited for extracting strings bounded by balanced delimiters.

I would be happy with the enhancement (in Classic format):

  PARSE LIST list dlim1 sublist dlim2 list

that would return the sublist delimited by a balanced set of 'dlim1' and 'dlim2' strings.  Further extraction of lists or values from 'sublist' and 'list' (possibly with different delimiters) would be at the programmer's discretion.

Needless to say, that syntax won't work for NetRexx, but perhaps a signifier on the delimiter string would suffice:

  PARSE list [<UL] sublist [</UL] list

or possibly (if less preferably):

  PARSE list ('<UL') sublist ('</UL') list

would suffice.

To me, this approach (if feasible) is more Rexx-ish.
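As a rough picture of what such a balanced-delimiter parse would have to do, here is a sketch in Python (not Rexx or NetRexx). The function parse_balanced is a hypothetical stand-in for the proposed enhancement; it assumes the two delimiter strings are different and that neither is a prefix of the other, and it uses complete tags as delimiters for readability.

```python
def parse_balanced(text, open_delim, close_delim):
    """Split text into (before, sublist, after) around the first balanced pair.

    Returns (text, None, "") when no opening delimiter is present.
    """
    start = text.find(open_delim)
    if start < 0:
        return text, None, ""
    depth = 0
    pos = start
    while pos < len(text):
        if text.startswith(close_delim, pos):
            depth -= 1
            if depth == 0:
                inner = text[start + len(open_delim):pos]
                return text[:start], inner, text[pos + len(close_delim):]
            pos += len(close_delim)
        elif text.startswith(open_delim, pos):
            depth += 1
            pos += len(open_delim)
        else:
            pos += 1
    raise ValueError("unbalanced delimiters")


before, sublist, after = parse_balanced("x<UL>a<UL>b</UL>c</UL>y", "<UL>", "</UL>")
print(before, sublist, after)
```

Further extraction from the returned sublist and remainder, possibly with different delimiters, is left to the caller, in keeping with the suggestion above.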

-Chip-

On 12/17/2012 05:20 Kermit Kiser said:
Just to let you know that I am also working on this issue. I am still
planning to make something available for public testing but it may
take a while as this does not seem to be a trivial problem to resolve.
Besides you and me, no one else seems interested in providing (or
perhaps they lack time to provide) feedback or suggestions on a
feature set or API or implementation approach for this type of support.

---------------------------------------------------------------------------------------------------------


On 12/17/2012 4:33 PM, Bill Fenlason wrote:
Jeff,

On 12/17/2012 12:39 PM, Jeff Hennick wrote:
On 12/17/2012 11:34 AM, Bill Fenlason wrote:
This seems overly complicated to me, and I think some additional design work is needed. 
Agreed, and that is what I am trying to prompt.  Possibly we have two separate problems & solutions.  Maybe my introducing CSV was in error and overcomplicates the sub-list problem.

> Yes, I think there are multiple issues to deal with, and I think the Comma Separated Value list comparison is useful. 

> Consider a situation in which a CSV list contains additional but distinct CSV lists.  An example might be a CSV list of family names, each of which is optionally followed by a nested CSV list of children names.  And in some cases a child name is followed by a CSV list of pet names, etc.

The essence of the problem is the ability to find matching end delimiters, and for that the list delimiter is not needed. 
I'm sorry, I don't follow your reasoning here.  I understood that recognizing sub-lists, from Python, was important to you.

> The idea is that nested sub-lists have to be identified somehow, and it would seem that the simplest way would be to surround them with begin and end delimiters.

> Remember, the original question was how to provide a list of elements within a string such that an element can either be a string or a nested list of elements, while retaining the list structure.  I asked what the best design approach would be. 

Perhaps a function which identifies the length of the matching delimiter set could be used, even in a parse statement as a variable reference.
Are you introducing multiple pairs of sub-list delimiters? And how would knowing the size of this set (when other than 1) be useful?  And a function within PARSE would be a new concept.  Or -- as likely -- am I completely misreading your comment here?

> Perhaps so.  What I'm suggesting is that the length (in characters) of the substring which begins and ends with the matching delimiter pair could be used to decompose the input string, either with the substr method or by using that value in a variable pattern in a parse statement (see page 140 in the latest NRL).

> For example, assume that the '{' and '}' delimiter characters do not occur in any element, and that a substring in a list consists of "{a,b,{c,{d,e}}},x,y".  The length of the first element in that list (i.e. the outermost nested list) is 15, which can be used to strip the nested list from the primary string. 

> If the length is zero, the first element is an actual element and not a nested list.  The element separator (i.e. ',') is not needed to determine the length. 

> The processing loop initially provides three elements: a complex list, x and y. 
> If necessary, the complex list is decomposed into 3 elements: a, b, and another nested complex list. 
> The processing continues in a similar fashion as necessary. 
> Alternatively, a complex list may be decomposed and processed when it is recognized, but that is not a requirement.

Of course, the problems involved with quoted strings containing start or end delimiters are not easily solved, and I suspect the best approach would simply be to ban them.
In general, I don't have control over the input contents.  Are you talking of banning quoted strings, or delimiters within them?  It may come back to CSV and Python lists needing separate functions, no matter how much they look alike at first consideration.  For the simplest cases with no quotes and no sub-lists, the results would be the same.

> If nested string lists are identified by being surrounded by beginning and ending delimiters, it is essential that no element contain either of those delimiters as data.  If necessary, non-printable characters could be used, or, as Kermit suggested, some kind of C- or Java-like escape sequence convention.

The parse statement has the advantage that it provides for multiple assignments, as in the parsing of a simple (CSV) string.
PARSE is wonderfully powerful.  But for CSV it works only in the very simplest of situations, with no quotes and no embedded separators, and, if it is not to be in a loop (which was what you originally were trying to hide), only if the number of fields is known.  PARSE also works great for the simple-case Python list problem when the format of the list is known ahead of time.

> I was not trying to hide any loop - I'm not sure what you mean or what I might have said to give you that idea?

> Another approach is to provide list encode (and decode) functions which insert (delete) the begin and end delimiters and convert (revert) the separator character to another unused character.  That way the complex list could be parsed like a CSV.  (e.g.  "parse  cdr  car  ','  cdr  ;" )  As I say, that is a somewhat different approach with its own strengths and weaknesses, but I think it has promise.

-------
In my work I could see using both functions.  Either as new BIFs (eventually) or from a NetRexx library of "best practice" functions.

Jeff

> Bill






--
"One can live magnificently in this world if one knows how to work and how to love."  --  Leo Tolstoy

Re: Fwd: Re: Nested List Support?

Kermit Kiser
In reply to this post by Aviatrexx
Looking at the recent posts on this topic, it seems that you are quite
correct about the responses or lack thereof, Chip.

I agree with the idea that Mike's approach to programming is elegant,
even inspired, simplicity that allows flexible solutions to most
problems. In that context, I am firmly opposed to changing the PARSE
instruction in any way as it is already complex enough to be a
significant learning hump. In fact, I never understood why the failed
RegRexx project seemed to revolve around changing PARSE. Perhaps that is
one reason it never went anywhere. It was a very complex project on its
own even without trying to merge it with PARSE! Ditto for nested list
processing.

But it also seems counterproductive to me to implement any of the
partial solutions that have been proposed. That is because if you do so,
you have introduced a new function that has to be studied and understood
and then you wind up still having to code the complex loop logic to
handle the nested sublists. And in the process, you have lost the
flexibility that you have if you just code the entire logic yourself!

-- Kermit

On 12/18/2012 10:25 AM, Chip Davis wrote:

> I doubt that it is lack of interest that has suppressed feedback,
> Kermit.  More likely it was the lack of time to analyze the problem to
> the degree that you, Jeff, and Mike have.  It will take some of us a
> little bit to catch up to you. :-)
>
> One way that Mike made programming easier was by the elegant
> decomposition of programming problems into atomic tools with simple
> interfaces.  For example, the PARSE instruction does not attempt to
> provide a mechanism to extract all of a variable number of blank
> delimited words in a string in one fell swoop; we loop on the idiom:
>
>   PARSE wordlist word wordlist
>
> I would rather have the flexibility of a simple language tool than one
> that tries to do everything for me.  If I want to convert it to an
> array, I'll do it myself.  I don't want a tool with multiple options
> for zero- or one-based indices, upper- or lower-casing, etc.
>
> To me, decomposing nested lists seems to be a parsing problem and I
> would not be unhappy to see it implemented as a Parse enhancement.
>
> The current Parse operation is resolutely left-to-right so it is
> poorly suited for extracting strings bounded by balanced delimiters.
>
> I would be happy with the enhancement (in Classic format):
>
>   PARSE LIST list dlim1 sublist dlim2 list
>
> that would return the sublist delimited by a balanced set of 'dlim1'
> and 'dlim2' strings.  Further extraction of lists or values from
> 'sublist' and 'list' (possibly with different delimiters) would be at
> the programmer's discretion.
>
> Needless to say, that syntax won't work for NetRexx, but perhaps a
> signifier on the delimiter string would suffice:
>
>   PARSE list [<UL] sublist [</UL] list
>
> or possibly (if less preferably):
>
>   PARSE list ('<UL') sublist ('</UL') list
>
> would suffice.
>
> To me, this approach (if feasible) is more Rexx-ish.
>
> -Chip-
>
> On 12/17/2012 05:20 Kermit Kiser said:
>> Just to let you know that I am also working on this issue. I am still
>> planning to make something available for public testing but it may
>> take a while as this does not seem to be a trivial problem to resolve.
>> Besides you and me, no one else seems interested in providing (or
>> perhaps they lack time to provide) feedback or suggestions on a
>> feature set or API or implementation approach for this type of support.
>

