Code and block size as complexity metrics

Stevan Little

2004-12-08 19:49:19 UTC

(Note this is cross posted on perlmonks as well)

So I was reading the Cylcomatic Complexity thread on perl monks, and it
occured to me that B::Concise actually gives up some of the necessary
information (at least I think it does).

Given this code:

print "Hello World";

I got this output:

stevan% perl -MO=Concise test.pl
6 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 test.pl:3) v ->3
5 <@> print vK ->6
3 <0> pushmark s ->4
4 <$> const[PV "Hello World"] s ->5

Now I freely admit that both my understanding of Cylcomatic Complexity
and B::Concise's output are most likely radically flawed, but I am
going to ramble a bit here anyway.

Using this definition for CC (Cylcomatic Complexity)

Cyclomatic complexity may be considered a broad measure of soundness
and
confidence for a program. Introduced by Thomas McCabe in 1976, it
measures
the number of linearly-independent paths through a program module.

And with this (flimsy) interpretation of the B::Concise output:

(opcode-sequence-number) <0> (opcode_name) ->
(next-opcode-sequence-number)

It seems to me that our code could be viewed as the following graph:

1 -> 2
2 -> 3
3 -> 4
4 -> 5
5 -> 6
6 -> (end)

Simple right? Since there is only one "linearly-independent path" the
CC-metric for this might be 1.

Now let's take a look at a slightly more complex bit of code:

if (shift) {
print "Hello World";
}
else {
print "Goodbye World";
}

And here is the B::Concise output:

a <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 4 test.pl:3) v ->3
- <1> null vK/1 ->a
6 <|> cond_expr(other->7) vK/1 ->b
5 <1> shift sK/1 ->6
4 <1> rv2av[t2] sKRM/1 ->5
3 <#> gv[*ARGV] s ->4
- <@> scope vK ->-
- <0> ex-nextstate v ->7
9 <@> print vK ->a
7 <0> pushmark s ->8
8 <$> const[PV "Hello World"] s ->9
g <@> leave vKP ->a
b <0> enter v ->c
c <;> nextstate(main 2 test.pl:7) v ->d
f <@> print vK ->g
d <0> pushmark s ->e
e <$> const[PV "Goodbye World"] s ->f

Now here is the 2 graphs for this:

1 -> 2
2 -> 3
3 -> 4
4 -> 5
5 -> 6
6 -> 7 (<<< conditional here)
7 -> 8
8 -> 9
9 -> (end)

and:

1 -> 2
2 -> 3
3 -> 4
4 -> 5
5 -> 6
6 -> b (<<< conditional here)
b -> c
c -> d
d -> e
e -> f
f -> g
g -> (end)

We now have two "linearly-independent path" the CC-metric for this
might be 2.

It seems to get a little tricky when we have subroutines, here is some
code:

sub start {
print test();
return "Goodbye World";
}

sub test {
return "Hello world"
}

print start();

And here is the B::Concise output (notice we need to pass the sub-names
to get them to output):

perl -MO=Concise,-main,start,test test.pl
main::start:
b <1> leavesub[1 ref] K/REFC,1 ->(end)
- <@> lineseq KP ->b
1 <;> nextstate(main 1 test.pl:5) v ->2
6 <@> print vK ->7
2 <0> pushmark s ->3
5 <1> entersub[t2] lKS/TARG,1 ->6
- <1> ex-list lK ->5
3 <0> pushmark s ->4
- <1> ex-rv2cv sK/1 ->-
4 <#> gv[*test] s/EARLYCV ->5
7 <;> nextstate(main 1 test.pl:6) v ->8
a <@> return K ->b
8 <0> pushmark s ->9
9 <$> const[PV "Goodbye World"] s ->a
main::test:
g <1> leavesub[1 ref] K/REFC,1 ->(end)
- <@> lineseq KP ->g
c <;> nextstate(main 2 test.pl:10) v ->d
f <@> return K ->g
d <0> pushmark s ->e
e <$> const[PV "Hello world"] s ->f
main program:
o <@> leave[1 ref] vKP/REFC ->(end)
h <0> enter ->i
i <;> nextstate(main 3 test.pl:14) v ->j
n <@> print vK ->o
j <0> pushmark s ->k
m <1> entersub[t2] lKS/TARG,1 ->n
- <1> ex-list lK ->m
k <0> pushmark s ->l
- <1> ex-rv2cv sK/1 ->-
l <#> gv[*start] s ->m

If we look at the "main program:" label as a starting place:

h -> i
i -> j
j -> k
k -> l
l -> *start (
1 -> 2
2 -> 3
3 -> 4
4 -> *test (
c -> d
d -> e
e -> f
f -> g (return)
) -> 5
6 -> 7
7 -> 8
8 -> 9
9 -> a
a -> b (return)
) -> m
m -> n
n -> o
0 -> (end)

We now only have one "linearly-independent path" so the CC-metric for
this would be 1.

Now of course there are some issues with this. To start with
B::Concise's sequence numbers are in base 36 and values greater than 62
are not supported according to the docs. However, if you use the
B::Concise functional interface, it seems that is might be possible to
get around this restriction by using the perl opcode hex addresses
instead of the sequence numbers. However now I am getting into needing
to really write some code, and I am currently out of time and need to
get back to "real" work. But anyway I am interested in comments from
the group.

Steve

Post by Rob Kinyon
http://www.perlmonks.org/?node_id=413087 has a discussion of
Cylcomatic Complexity, which seems to be exactly what we're trying to
do here ...