Design dilemma in HTML::Seamstress (solved!)

Discussion:

Terrence Brannon

2004-12-31 19:24:49 UTC

Recently I began to doubt that the compiler hints were a worthwhile
thing to use when creating DOM-like accessor classes to HTML documents.

This post shows the current usage of Seamstress with compiler
hints, and then gives a plethora of reasons why such a design was, on
the whole, a waste of time and should be dropped immediately :).

Overview

HTML::Seamstress is a Perl module which supports HTML templating via
tree-manipulations. It is based on HTML::Tree. The latest version of
this module was inspired by the amazingly concise code that one can
write with Class::DBI after setting up an object-oriented hierarchy.

Here are two samples of tree-based templating as the module currently
stands:

Text substitution
In our first example, we want to perform simple text substitution on the
HTML template document. The HTML file html/hello_world.htm has klass
attributes which serve as compiler (kompiler?) hints to Seamstress:

<html>
<head>
<title>Hello World</title>
</head>
<body>
<h1>Hello World</h1>
Hello, my name is <span klass="content" id="name">dummy_name</span>.
Today's date is <span klass="content" id="date">dummy_date</span>.
</body>
</html>

Seamstress compiles HTML to "html::hello_world"

shell> seamc html/hello_world.htm
Seamstress v2.91 generating html::hello_world from html/hello_world.htm

Now you simply use the compiled version of HTML with object-oriented
accessors.

use html::hello_world;

my $tree = html::hello_world->new;
$tree->name('terrence brannon')->date('5/11/1969')->as_HTML;

If-then-else with the highlander kompiler hint
The "highlander" kompiler hint is used to mark a subtree of HTML in
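which only one child should survive:

<span klass="highlander" id="age_dialog">
  <span id="under10">
    Hello, does your mother know you're
    using her AOL account?
  </span>
  <span id="under18">
    Sorry, you're not old enough to enter
    (and too dumb to lie about your age)
  </span>
  <span id="welcome">
    Welcome
  </span>
</span>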

Compile and use the module:

use html::highlander;

my $tree = html::highlander->new;

$tree->age_dialog
  (
   [
    under10 => sub { $_[0] < 10 },
    under18 => sub { $_[0] < 18 },
    welcome => sub { 1 }
   ],
   $age
  )->as_HTML;

# will only output one of the 3 dialogues based on which closure
# fires first
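
A minimal sketch of how the "only one child survives" behaviour could be written as an ordinary function over an HTML::TreeBuilder tree. The function name highlander() and the sample markup here are assumptions made for illustration, not the code Seamstress generates; only look_down() and detach() are taken from HTML::Element.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: keep the first child whose test passes, detach the rest.
sub highlander {
    my ($tree, $container_id, $tests, @args) = @_;
    my $container = $tree->look_down(id => $container_id);
    my @pairs = @$tests;               # copy, so the caller's list is untouched
    my $survivor;
    while (my ($id, $test) = splice @pairs, 0, 2) {
        if (!defined $survivor && $test->(@args)) {
            $survivor = $id;           # first closure to fire wins
            next;
        }
        $container->look_down(id => $id)->detach;
    }
    return $tree;
}

# Illustrative markup only; the ids match the ones used in the post.
my $tree = HTML::TreeBuilder->new_from_content(<<'HTML');
<div id="age_dialog">
  <span id="under10">too young</span>
  <span id="under18">not old enough</span>
  <span id="welcome">Welcome</span>
</div>
HTML

highlander($tree, 'age_dialog', [
    under10 => sub { $_[0] < 10 },
    under18 => sub { $_[0] < 18 },
    welcome => sub { 1 },
], 15);

print $tree->look_down(id => 'age_dialog')->as_text, "\n";
```

With an age of 15 the under10 test fails, the under18 test fires, and the welcome branch is detached because a survivor already exists.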

The dilemmas
Use of the klass attribute as a kompiler hint
The biggest dilemma I have is whether to alter the HTML by use of the
"klass" attribute as a kompiler hint. The original reason for writing
Seamstress was to provide HTML templating via pure Perl and pure HTML.
The original connection between the two was the "id" attribute, a
standard HTML attribute whose value must be unique within an HTML 4.01
document. The Java framework which inspired the development of
Seamstress, XMLC, uses only standard "class" and "id" attributes to
generate Java accessors to HTML documents.

If I wanted to eliminate the "klass" attribute, then I would have to
provide command line arguments to the Seamstress compiler to generate
certain types of methods:

$> ./seamc -klass="name content" -klass="date content" hello_world.html

But that would get tedious when dealing with a ton of files.

So, even though the HTML would be slightly modified with the use of the
"klass" attribute, I think I would prefer that over having to supply
kompiler hints at the shell.

Use of any magic whatsoever
Currently, Seamstress supplies accessors which are "intelligent" in two
ways. One, they act as getters or setters based on whether or not they
are supplied arguments:

$tree->name ; # returns $tree->look_down(id => 'name');
$tree->name(12); # will call a setter method based on compiler hint

Two, they act as specialized setters based on compiler hints. Since
there are numerous ways to "set" a node in a tree (you can set its
contents, you can set its children's contents, you can set an attribute,
you can delete all but one of the children, etc.), the tree operation
that is called is based on the compiler hint.
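
The two kinds of "intelligence" can be sketched as one closure per id. This is an illustration under stated assumptions, not Seamstress's generated code: the %hint table and make_magic_accessor() are hypothetical names, and only the "content" hint is handled.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical hint table: element id => kind of setter to run.
my %hint = (name => 'content', date => 'content');

# Hypothetical helper, not part of Seamstress: builds one "magic" accessor.
sub make_magic_accessor {
    my ($id) = @_;
    return sub {
        my ($tree, @args) = @_;
        my $node = $tree->look_down(id => $id);
        return $node unless @args;          # no arguments: act as a getter
        if ($hint{$id} eq 'content') {      # the hint selects the tree operation
            $node->delete_content;
            $node->push_content(@args);
        }
        return $tree;                       # return the root so calls can chain
    };
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<p>Hello, <span id="name">dummy_name</span></p>');

my $name = make_magic_accessor('name');
$name->($tree, 'terrence brannon');         # setter form
print $name->($tree)->as_text, "\n";        # getter form
```

Every new hint adds another branch inside the accessor, which is the growth into a mini-language that the post worries about.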

However, it is not clear that all the extra work to make the inline code
succinct is worthwhile. The "magical" version of the hello_world.pm
program is:

$tree->name('terrence brannon')->date('5/11/1969')->as_HTML;

The "plain Jane" version is:

$tree->get_name->replace_content('terrence brannon')
     ->get_date->replace_content('5/11/69');

I like the explicitness of the plain version.

The plain version would simply have the Seamstress compiler create
get_$id accessors for any HTML element in the document with an id
attribute. This plain version is very attractive for a number of reasons:

* the HTML file has zero shock value to the HTML designer
* no kompiler hints need to be written
* no worry about needing to expand the kompiler hints into a
  mini-language

  Over time, mini-languages tend to need more and more. If all
  shortcuts and idioms are handled by library methods, then the full
  power of Perl can be brought to bear on any situation.

* Similar to the way XMLC works for Java

  XMLC does not use any extra tags when creating DOM accessor classes
  to HTML files. The widespread success and usage of XMLC implies that
  none are necessary.

  If I wanted 100% compatibility with XMLC I would be using Terrence
  Mather's XML::DOM. However, HTML::ElementTable is an excellent
  module for imperative tree-building in Perl and I want to be
  able to integrate its results into my HTML templates.
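
A rough sketch of what "create get_$id accessors for any element with an id" could look like at runtime. The class name My::Template and the method install_accessors() are hypothetical; the real compiler writes a .pm file instead, so treat this only as an illustration of the idea.

```perl
package My::Template;          # hypothetical name for the generated class
use strict;
use warnings;
use parent 'HTML::TreeBuilder';

# Install one get_$id method per id attribute found in the parsed document.
sub install_accessors {
    my ($class, $tree) = @_;
    for my $node ($tree->look_down(sub { defined $_[0]->attr('id') })) {
        my $id = $node->attr('id');
        next unless $id =~ /\A\w+\z/;   # skip ids that cannot form a method name
        no strict 'refs';
        *{"${class}::get_$id"} = sub { $_[0]->look_down(id => $id) };
    }
}

package main;

my $tree = My::Template->new_from_content(
    '<p><span id="name">dummy_name</span> <span id="date">dummy_date</span></p>');
My::Template->install_accessors($tree);

# The accessor only locates the node; plain HTML::Element methods do the work.
my $node = $tree->get_name;
$node->delete_content;
$node->push_content('terrence brannon');
print $tree->get_name->as_text, "\n";
```

No hints are consulted anywhere: the HTML carries only standard id attributes and every operation after the lookup is an ordinary library call.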

Conclusion
It is very depressing to have to rip out my compiler's guts. I spent a
good amount of time building the compiler and code-generator and
creating tests for it. Now, the compiler is going to be much simpler and
all of the idiomatic processing will exist in standalone tree-processing
libraries such as HTML::Element::Library (to be uploaded) or
HTML::ElementTable.

However, I think this is a change for the better. In fact, it is nice to
know that all tree processing actions will be handled like this:

$tree->$id_name->$library_method(@method_args);

instead of

$tree->$id_name(@args_to_magic_method)

--
Carter's Compass: I know I'm on the right track when,
by deleting something, I'm adding functionality.

Rob Kinyon

2005-01-03 15:12:07 UTC

Terrence -

First off, excellent article on the dangers of wanting to "make
it cool". I mean, what's cooler than writing your own compiler?? :-)

Question about HTML::Seamstress: What happens if I have two elements with id="date" in my document? What does get_date() do?

Also, wouldn't it be possible to provide a convenience function of
foo() that was, essentially, {
(+shift)->get_foo()->replace_content(@_) } ?

Rob


Terrence Brannon

2005-01-03 16:51:50 UTC

Post by Rob Kinyon
Terrence -
First off, excellent article on the dangers of wanting to "make
it cool". I mean, what's cooler than writing your own compiler?? :-)

It is certainly addictive. I can't wait to sit down for another 2-3
hours of solid hacking.

Post by Rob Kinyon
Question about HTML::Seamstress: What happens if I have two elements with id="date" in my document? What does get_date() do?

Answer 1: get_date will perform this call:

$tree->look_down(id => 'date');

Per the HTML::Element docs, look_down() returns the first element
found when called in scalar context. So it would return the first
subtree whose root node had "date" as an id attribute.

Answer 2: Per the HTML specifications, it is illegal to have two
different elements with the same value for the id attribute. So, in a
validating situation, such HTML would never make the cut.
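
The scalar-context behaviour described in Answer 1 can be checked directly with HTML::TreeBuilder. The markup below is deliberately invalid (two elements share an id) and is made up for this illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Deliberately invalid markup: two elements share the id "date".
my $tree = HTML::TreeBuilder->new_from_content(
    '<p><span id="date">first</span> <span id="date">second</span></p>');

my $first = $tree->look_down(id => 'date');   # scalar context: first match only
my @all   = $tree->look_down(id => 'date');   # list context: every match

print $first->as_text, "\n";
print scalar(@all), "\n";
```

A get_date() built on the scalar form would silently ignore the second element, so a check based on the list form is one way to catch duplicate ids in a test suite.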

However, you have got me thinking. I have gotten pretty used to
writing test cases for all of the various methods. I use
HTML::PrettyPrinter to dump a file of expected output for given
input. I may create a hello_world.t test directory with a pre-made test suite
for each template file. Though I doubt most people would take
advantage of such a suite.

Post by Rob Kinyon
Also, wouldn't it be possible to provide a convenience function of
foo() that was, essentially, {
(+shift)->get_foo()->replace_content(@_) } ?

Yes, the most common templating act is substitution. So,

$tree->date($value)

will be a shortcut for

$tree->get_date->replace_content($value)

All other templating acts will be called "longhand", e.g.

$tree->get_$id->$library_method(@args)

where $library_method is a method such as highlander(), iter(),
dual_iter(), and any other method that exists in HTML::Element or
HTML::Element::Library (to be uploaded).

--
Carter's Compass: I know I'm on the right track when,
by deleting something, I'm adding functionality.

Matthew Simon Cavalletto

2005-01-03 18:19:09 UTC

Post by Terrence Brannon
However, it is not clear that all the extra work to make the inline code
succinct is worthwhile. The "magical" version of the hello_world.pm
$tree->name('terrence brannon')->date('5/11/1969')->as_HTML;
$tree->get_name->replace_content('terrence brannon')
->get_date->replace_content('5/11/69');

It's not clear to me that either of these named-accessor,
mutator-chaining interfaces is the optimum idiom; wouldn't the
following be clearer?

$tree->replace_content_elements(
    'name' => 'terrence brannon',
    'date' => '5/11/69',
);
print $tree->as_HTML;

Post by Terrence Brannon
In fact, it is nice to know that all tree processing

It seems more natural to me to treat the ID as data, along these lines:

$tree->$library_method($id_name => @method_args);

Why select an interface that forces you to generate methods for every
ID'd element? (And what happens when my HTML designer gives me a
template with IDs named 'new' or 'delete'?)
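
A minimal sketch of the "ID as data" interface suggested here, written as a plain function over an HTML::TreeBuilder tree. The name replace_content_elements() comes from the example above, but its body is an assumption, since no implementation was posted; the markup is made up.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical function standing in for the proposed method; ids are plain data.
sub replace_content_elements {
    my ($tree, %content_for) = @_;
    for my $id (sort keys %content_for) {
        my $node = $tree->look_down(id => $id)
            or die "no element with id '$id'\n";
        $node->delete_content;
        $node->push_content($content_for{$id});
    }
    return $tree;
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<p><span id="name">dummy_name</span> <span id="new">dummy</span></p>');

replace_content_elements($tree,
    name => 'terrence brannon',
    new  => 'an id that would clash with a constructor name',
);

print $tree->look_down(id => 'name')->as_text, "\n";
print $tree->look_down(id => 'new')->as_text, "\n";
```

Because the id is passed as a string, an element with id="new" is handled like any other and cannot shadow a method of the tree class.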

More generally, I would side with Aristotle's position against mutator
chaining, set forth at <http://perlmonks.org/index.pl/417872>,
particularly in cases which seem to mix mutators with sub-object
accessors -- are you really calling get_date() on the result of
replace_content(), as shown in the "plain" example above? Yikes!

-Simon

Ovid

2005-01-03 21:08:31 UTC

Post by Matthew Simon Cavalletto
More generally, I would side with Aristotle's position against
mutator
chaining, set forth at <http://perlmonks.org/index.pl/417872>,
particularly in cases which seem to mix mutators with sub-object
accessors -- are you really calling get_date() on the result of
replace_content(), as shown in the "plain" example above? Yikes!

Chaining mutators across classes is truly horrible style that leads
to code that is difficult to maintain. However, that's just a
violation of the Law of Demeter (http://c2.com/cgi/wiki?LawOfDemeter)
and has nothing specific to do with chained mutators. Attacking
chained mutators because some people choose to violate this Law seems
silly. If they stop violating a well-known software design principle,
one of the biggest (and false) objections to chained mutators goes away.

Aristotle's argument, to my mind, has a serious flaw: if you don't
like chained mutators, don't use them. If my methods return $self,
there is nothing stopping you from *not* chaining.

Further, providing a bulk 'setter' has a similar flaw to the Law of
Demeter argument:

$o->set(
foo => 'bar',
baz => 'quux',
);

In short, whether or not this is a good interface has nothing to do
with whether or not chained mutators are a good idea. Of course, we
can argue that chained mutators are suboptimal because there's a better
option (the bulk setter), but the bulk setter has issues of its own
(http://www.perlmonks.org/?node_id=418166) so until I hear something
more persuasive (and I'm all ears) I merely view this as a matter of
style.

The primary problem I see with chained mutators is the (occasional)
need to return a Null object and overload its boolean value. This is
slow in Perl, but if the Null object is not a common case then this is
probably not that much of an issue.
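
For readers unfamiliar with the pattern, here is a small sketch of a Null object whose boolean value is overloaded. The package name Null and the method names in the chain are made up for the example; only the core overload pragma and AUTOLOAD are used.

```perl
package Null;                    # hypothetical Null object class
use strict;
use warnings;
use overload 'bool' => sub { 0 }, fallback => 1;   # always false in boolean tests

sub new      { bless {}, shift }
sub AUTOLOAD { return $_[0] }    # any unknown method returns the Null object itself
sub DESTROY  { }                 # keep AUTOLOAD from handling destruction

package main;

my $null   = Null->new;
my $result = $null->foo->bar(42)->baz;     # the chain never dies on a missing link

print $result ? "real\n" : "null\n";
```

Every boolean test on the object goes through the overloaded handler, which is where the cost mentioned above comes from.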

Cheers,
Ovid

=====
Silence is Evil http://users.easystreet.com/ovid/philosophy/decency.html
Ovid http://www.perlmonks.org/index.pl?node_id=17000
Web Programming with Perl http://users.easystreet.com/ovid/cgi_course/

Terrence Brannon

2005-01-03 23:48:31 UTC

Post by Matthew Simon Cavalletto

Post by Terrence Brannon
succinct is worthwhile. The "magical" version of the hello_world.pm
$tree->name('terrence brannon')->date('5/11/1969')->as_HTML;
$tree->get_name->replace_content('terrence brannon')
->get_date->replace_content('5/11/69');

It's not clear to me that either of these named-accessor,
mutator-chaining interfaces is the optimum idiom; wouldn't the
following be clearer?
$tree->replace_content_elements(
'name' => 'terrence brannon',
'date' => '5/11/69',
);
print $tree->as_HTML;

Post by Terrence Brannon
In fact, it is nice to know that all tree processing

Why select an interface that forces you to generate methods for every
ID'd element?

You have a better design but I am subclassing from HTML::Tree. When you work
with HTML::Tree, each method typically receives the tree as its first
argument. So the first thing my methodmaker did was go find the right
tree so it could be passed to other methods. In addition,
HTML::ElementTable (by Matthew Sisk) and HTML::Element::Library (soon
to be uploaded) both revolve around the notion of "give me a tree and I
act as advertised".

An id is just one of the criteria that HTML::Tree's look_down() accepts.
It can also look down on tags, text, and closures. For the
purposes of my limited domain, all I need are id attributes. But I would
need a front-end to every HTML::Tree method which did

$tree->look_down(id => $id_name)      # locate the subtree
     ->$library_method(@method_args); # operate on the tree

before calling the method. For example HTML::ElementExtended contains
the shortcut method replace_content($tree, $new_content). So if I
wrote a manual boilerplate in Seamstress which converted

$tree->replace_content($id_name => $new_content)
to
$tree->look_down(id => $id_name)->replace_content($new_content)

then I could have the interface most suited for Seamstress convert to
the interface most common outside of Seamstress.

But to me that is less natural than

$tree->get_id_name->replace_content($new_content)

Post by Matthew Simon Cavalletto
(And what happens when my HTML designer gives me a template with IDs
named 'new' or 'delete'?)

Razor-sharp assessment, Simon. Thanks a lot for the input. I never
thought of that pitfall. But maybe that's why there should be no
methods which are not prefaced with get_$id. You have settled that
issue for me once and for all: to hell with method names not prefixed
with get_ or set_.

Post by Matthew Simon Cavalletto
More generally, I would side with Aristotle's position against mutator
chaining, set forth at <http://perlmonks.org/index.pl/417872>,
particularly in cases which seem to mix mutators with sub-object
accessors -- are you really calling get_date() on the result of
replace_content(), as shown in the "plain" example above? Yikes!

It does work as long as it is passed the root tree. But I see your
point. If replace_content() is called with a subtree and returns a
subtree, then get_date() will fail.

--
Carter's Compass: I know I'm on the right track when,
by deleting something, I'm adding functionality.

Matthew Simon Cavalletto

2005-01-04 15:48:23 UTC

Post by Terrence Brannon

Post by Matthew Simon Cavalletto

Post by Terrence Brannon
In fact, it is nice to know that all tree processing

You have a better design but I am subclassing from HTML::Tree.

Ah, that makes sense. (I wonder how things would have turned out if
you'd chosen to use a delegating wrapper instead of a subclass?)

Post by Terrence Brannon

Post by Matthew Simon Cavalletto
(And what happens when my HTML designer gives me a template with IDs
named 'new' or 'delete'?)

Razor-sharp assessment, Simon. Thanks a lot for the input. I never
thought of that pitfall. But maybe that's why there should be no
methods which are not prefaced with get_$id.

I guess it's still not clear to me why you're generating get_foo()
methods instead of calling a pre-written method with an argument, like
get('foo').

-Simon

b***@metaperl.com

2005-01-05 21:38:08 UTC

Post by Matthew Simon Cavalletto

Post by Terrence Brannon

Post by Terrence Brannon
In fact, it is nice to know that all tree processing

You have a better design but I am subclassing from HTML::Tree.

Ah, that makes sense. (I wonder how things would have turned out if
you'd chosen to use a delegating wrapper instead of a subclass?)

Good question. I wish I had more experience with such technology.

Post by Matthew Simon Cavalletto
I guess it's still not clear to me why you're generating get_foo()
methods instead of calling a pre-written method with an argument, like
get('foo').

It _was_ a matter of efficiency. Initially, when creating a .pm for an
HTML template, I would store the serialized parse tree in the module
along with package-scoped scalars with pre-computed look-downs into
the tree. i.e.,

package html::hello_world;

my $tree = tree();
my $name = $tree->look_down(id => 'name');
my $date = $tree->look_down(id => 'date');

sub get_date { $date }    # precomputed, so no runtime lookup
sub set_date { my ($self, $value) = @_; $date->replace_content($value) }

sub tree {

    bless {
        # huge nest of HTML::TreeBuilder elements
        # courtesy of Data::Dumper with Purity == 1
    }, 'HTML::TreeBuilder';

}

But then I realized that under mod_perl and destructive tree
operations, the serialized parse tree could only be used one
time, i.e.,

# file xyz.pl
$object = html::hello_world->new;
$object->get_name->detach; # no more name subtree

# file abc.pl
$object2 = html::hello_world->new;
$object2->look_down(id => 'name'); # hey where'd it go?!

Two HTTP requests under CGI would be fine with the implementation
above. But under mod_perl, assuming you did a PerlModule
html::hello_world at server startup, abc.pl run after xyz.pl would fail.

So, each new use of a tree for templating has to reparse the HTML template so
that the tree can be manipulated destructively.
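
The difference between one shared tree and a reparse per use can be sketched in a few lines. The helper name fresh_tree() and the template string are assumptions for illustration; the real module would parse the template file instead.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<p>Hello, <span id="name">dummy_name</span></p>';

# Problem case: one tree built at load time and shared by every caller.
my $shared = HTML::TreeBuilder->new_from_content($html);
$shared->look_down(id => 'name')->detach;        # first use is destructive
my $gone = $shared->look_down(id => 'name');     # second use finds nothing

# Hypothetical helper: reparse so that every caller gets an untouched tree.
sub fresh_tree { HTML::TreeBuilder->new_from_content($html) }

my $first_request = fresh_tree();
$first_request->look_down(id => 'name')->detach;

my $second_request = fresh_tree();
my $still_there    = $second_request->look_down(id => 'name');

print defined $gone        ? "found\n" : "missing\n";
print defined $still_there ? "found\n" : "missing\n";
```

The reparse costs a full parse per object, which is the efficiency that the precomputed look-downs were originally meant to buy.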

I had wanted to create a templating solution and not think about
CGI/mod_perl until later. In fact, it wasn't CGI/mod_perl that led to
the above realization. I wrote all the tests for the highlander idiom
in the same .t file. So, after the first test worked, I created
another object for the 2nd test. But the first test had detached all
the subtrees other than the one that met the condition, so the 2nd
test failed because the tree module was returning the same tree for the
new object.

All of the below is more succinct now, but just notice the detach
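calls to see what I mean about needing a new tree for each test:

<span id="age_handler">
  <span id="under10">
    Hello, does your mother know you're
    using her AOL account?
  </span>
  <span id="under18">
    Sorry, you're not old enough to enter
    (and too dumb to lie about your age)
  </span>
  <span id="welcome">
    Welcome
  </span>
</span>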

sub age_handler {
    my ($tree, $age) = @_;
    my $SPAN = $tree->look_down('id', 'age_handler');
    if ($age < 10) {
        $SPAN->look_down('id', $_)->detach for qw(under18 welcome);
    } elsif ($age < 18) {
        $SPAN->look_down('id', $_)->detach for qw(under10 welcome);
    } else {
        $SPAN->look_down('id', $_)->detach for qw(under10 under18);
    }

}

my $o1 = tree_test->new;
$o1->age_handler(10);
# then test tree

my $o2 = tree_test->new;
$o2->age_handler(100000);
# then test tree... oh no!

--
Carter's Compass: I know I'm on the right track when,
by deleting something, I'm adding functionality.