Discussion:
Code and block size as complexity metrics
Rob Kinyon
2004-12-08 13:27:36 UTC
Permalink
http://www.perlmonks.org/?node_id=413087 has a discussion of
Cylcomatic Complexity, which seems to be exactly what we're trying to
do here ...
I've got the basic tokens/block thing going in an exploration script.
One thing I'm noticing is that find() flattens out your parse-tree.
Having the ability to access the parent-node may end up being nice,
but I'm still feeling my way through the problem-space, so don't code
anything based on what I say just yet.
It doesn't flatten it out as such, it just returned a list of pointers
to the various elements you asked for in the tree.
http://search.cpan.org/~adamk/PPI-0.831/lib/PPI/Element.pm
The tree itself still exists and is not modified, and you can still call
any of the "going up the tree" methods.
$Element->parent # Immediate parent
$Element->top # Top of the PDOM tree (should be a ::Document)
$Element->statement # Parent statement
Some documentation on what "significant" means would be nice. Also, if
significant() and prune() could play together, that would be cool. I'm
my $doc = PPI::Document->read( $filename );
$doc->prune( "PPI::Token::$_" ) for qw( Comment Whitespace );
See the find documentation at
http://search.cpan.org/~adamk/PPI-0.831/lib/PPI/Node.pm#find_$class_|_\&condition
The entire set of search methods (find, find_any and prune) accept
anonymous sub references as well, the class name thing is just a shortcut.
The sub is passed the top element of the search, and the "current"
element, so you can do something like the following, which will strip
out everything but the significant elements.
my $Doc = PPI::Document->read( $filename );
$Doc->prune( sub { ! $_[1]->significant } );
The "significant" property is defined at its method
http://search.cpan.org/~adamk/PPI-0.831/lib/PPI/Element.pm#significant
In short, perl contains a whole bunch of things you normally don't want
to care about. If I want to find the "previous" or "next" element, I
_probably_ don't care what whitespace or comments or pod is in the way.
And so the "s" serious of methods (snext_sibling, sprevious_sibling,
schildren etc) will do what you mean, ignoring any miscellaneous things
that don't matter that happen to be in the way.
This is important because when writing the perl document back to a file,
it is VERY important that the document ends up exactly as it was when we
read it in, in the same way something like Dreamweaver will read and
write a HTML file while leaving it exactly as is (layout, broken code
and all).
And whitespace does actually matter in some instances during the
parsing. Yes, perl is REALLY crufty :)
It isn't line-based, but it is line-based for some things (Heredocs).
Whitespace isn't important, but it does need to be there or things break.
I suppose I could be a good opensource user and provide a patch+tests,
huh... :-)
I'm happy to receive patches but no guarantees that it will be merged.
I'm still shuffling things around to make sure I'm 100% comfortable with
the API myself.
Adam
Stevan Little
2004-12-08 19:49:19 UTC
Permalink
(Note this is cross posted on perlmonks as well)

So I was reading the Cylcomatic Complexity thread on perl monks, and it
occured to me that B::Concise actually gives up some of the necessary
information (at least I think it does).

Given this code:

print "Hello World";

I got this output:

stevan% perl -MO=Concise test.pl
6 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 test.pl:3) v ->3
5 <@> print vK ->6
3 <0> pushmark s ->4
4 <$> const[PV "Hello World"] s ->5

Now I freely admit that both my understanding of Cylcomatic Complexity
and B::Concise's output are most likely radically flawed, but I am
going to ramble a bit here anyway.

Using this definition for CC (Cylcomatic Complexity)

Cyclomatic complexity may be considered a broad measure of soundness
and
confidence for a program. Introduced by Thomas McCabe in 1976, it
measures
the number of linearly-independent paths through a program module.

And with this (flimsy) interpretation of the B::Concise output:

(opcode-sequence-number) <0> (opcode_name) ->
(next-opcode-sequence-number)

It seems to me that our code could be viewed as the following graph:

1 -> 2
2 -> 3
3 -> 4
4 -> 5
5 -> 6
6 -> (end)

Simple right? Since there is only one "linearly-independent path" the
CC-metric for this might be 1.

Now let's take a look at a slightly more complex bit of code:

if (shift) {
print "Hello World";
}
else {
print "Goodbye World";
}

And here is the B::Concise output:

a <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 4 test.pl:3) v ->3
- <1> null vK/1 ->a
6 <|> cond_expr(other->7) vK/1 ->b
5 <1> shift sK/1 ->6
4 <1> rv2av[t2] sKRM/1 ->5
3 <#> gv[*ARGV] s ->4
- <@> scope vK ->-
- <0> ex-nextstate v ->7
9 <@> print vK ->a
7 <0> pushmark s ->8
8 <$> const[PV "Hello World"] s ->9
g <@> leave vKP ->a
b <0> enter v ->c
c <;> nextstate(main 2 test.pl:7) v ->d
f <@> print vK ->g
d <0> pushmark s ->e
e <$> const[PV "Goodbye World"] s ->f

Now here is the 2 graphs for this:

1 -> 2
2 -> 3
3 -> 4
4 -> 5
5 -> 6
6 -> 7 (<<< conditional here)
7 -> 8
8 -> 9
9 -> (end)

and:

1 -> 2
2 -> 3
3 -> 4
4 -> 5
5 -> 6
6 -> b (<<< conditional here)
b -> c
c -> d
d -> e
e -> f
f -> g
g -> (end)

We now have two "linearly-independent path" the CC-metric for this
might be 2.

It seems to get a little tricky when we have subroutines, here is some
code:

sub start {
print test();
return "Goodbye World";
}

sub test {
return "Hello world"
}

print start();

And here is the B::Concise output (notice we need to pass the sub-names
to get them to output):

perl -MO=Concise,-main,start,test test.pl
main::start:
b <1> leavesub[1 ref] K/REFC,1 ->(end)
- <@> lineseq KP ->b
1 <;> nextstate(main 1 test.pl:5) v ->2
6 <@> print vK ->7
2 <0> pushmark s ->3
5 <1> entersub[t2] lKS/TARG,1 ->6
- <1> ex-list lK ->5
3 <0> pushmark s ->4
- <1> ex-rv2cv sK/1 ->-
4 <#> gv[*test] s/EARLYCV ->5
7 <;> nextstate(main 1 test.pl:6) v ->8
a <@> return K ->b
8 <0> pushmark s ->9
9 <$> const[PV "Goodbye World"] s ->a
main::test:
g <1> leavesub[1 ref] K/REFC,1 ->(end)
- <@> lineseq KP ->g
c <;> nextstate(main 2 test.pl:10) v ->d
f <@> return K ->g
d <0> pushmark s ->e
e <$> const[PV "Hello world"] s ->f
main program:
o <@> leave[1 ref] vKP/REFC ->(end)
h <0> enter ->i
i <;> nextstate(main 3 test.pl:14) v ->j
n <@> print vK ->o
j <0> pushmark s ->k
m <1> entersub[t2] lKS/TARG,1 ->n
- <1> ex-list lK ->m
k <0> pushmark s ->l
- <1> ex-rv2cv sK/1 ->-
l <#> gv[*start] s ->m

If we look at the "main program:" label as a starting place:

h -> i
i -> j
j -> k
k -> l
l -> *start (
1 -> 2
2 -> 3
3 -> 4
4 -> *test (
c -> d
d -> e
e -> f
f -> g (return)
) -> 5
6 -> 7
7 -> 8
8 -> 9
9 -> a
a -> b (return)
) -> m
m -> n
n -> o
0 -> (end)

We now only have one "linearly-independent path" so the CC-metric for
this would be 1.

Now of course there are some issues with this. To start with
B::Concise's sequence numbers are in base 36 and values greater than 62
are not supported according to the docs. However, if you use the
B::Concise functional interface, it seems that is might be possible to
get around this restriction by using the perl opcode hex addresses
instead of the sequence numbers. However now I am getting into needing
to really write some code, and I am currently out of time and need to
get back to "real" work. But anyway I am interested in comments from
the group.

Steve
Post by Rob Kinyon
http://www.perlmonks.org/?node_id=413087 has a discussion of
Cylcomatic Complexity, which seems to be exactly what we're trying to
do here ...
Loading...