PHP Depend - Software Metrics for PHP

Cyclomatic Complexity

The Cyclomatic Complexity number [#cabeccn]_ or short CCN is one of the oldest complexity metrics. The first time this software metric was mentioned was 1976 by Thomas J. McCabe. This metric counts the available decision paths in a software fragment to determine its complexity. Each decision path starts with one of the conditional statements from the following list, so that it is fairly easy to detect them in existing source code.

A look at this list of statements may result in the questions: Is this list wrong, it doesn't list else and default? But it is correct. The assumption is that both statements will contain the default execution path of a program which
also exists when there are no special cases to capture.

Each decision path gets the value 1 and the sum of all these values represents the Cyclomatic Complexity of the analyzed software fragment. Note that each function and method also
counts with a value of 1 With this knowledge we can now
calculate the complexity of the following example code: :

<?php                                 
//                                                         | CCN
// -------------------------------------------------------------
class CyclomaticComplexityNumber                        // |  0
{                                                       // |  0
    public function example( $x, $y )                   // |  1
    {                                                   // |  0
        if ( $x > 23 || $y < 42 )                       // |  1
        {                                               // |  0
            for ( $i = $x; $i >= $x && $i <= $y; ++$i ) // |  1
            {                                           // |  0
            }                                           // |  0
        }                                               // |  0
        else                                            // |  0
        {                                               // |  0
            switch ( $x + $y )                          // |  0
            {                                           // |  0
                case 1:                                 // |  1
                    break;                              // |  0
                case 2:                                 // |  1
                    break;                              // |  0
                default:                                // |  0
                    break;                              // |  0
            }                                           // |  0
        }                                               // |  0
    }                                                   // |  0
    file_exists('/tmp/log') or touch('/tmp/log');       // |  0
}                                                       // |  0
// -------------------------------------------------------------
//                                                         |  5

Based on the previous definition the Cyclomatic Complexity
Number of the example code example is 5. But you may have noticed that this approach does not capture all decision paths that exist. We haven't catched those paths that came from the by the boolean expression || line 8 and && line 10, and
the logical or expression in line 27. A variation of the Cyclomatic Complexity Number that also captures those paths is the so called CCN2. The CCN2 is the most widely used variation of this software metrics. Tools like PHPUnit, PMD and Checkstyle report it as Cyclomatic Complexity of an analyzed software fragment.

Now we get a complexity value of 8 when we apply the CCN2
to the previous example, what is a growth of the software's
complexity of 60%.

Due to the fact that Cyclomatic Complexity Number was
originally invented for procedural programming languages,
this definition for the Cyclomatic Complexity Number still
misses one element to measure the complexity of an object
oriented software system. With the concept of exceptions a
software gets additional decision paths for each catch statement used in the source code. While try contains
the code for the regular execution code without special
cases, similar to else and default statements.

Now that we know what the Cyclomatic Complexity Number is, what can we do with the measured information? We can find
the complexity hotspots in a system, for example the top
ten artifacts with the highest complexity, but this is only important during an initial analyses phase to get the big
picture of an application. For a continuous inspection this information is not so important. A continuous analyses requires thresholds that help to categories calculated
values. During the time four values have emerged as good
thresholds for the Cyclomatic Complexity Number of a
software system.

You may ask, why should I care about the complexity of a software system, where is the value of benefit in this metric?

Mostly the complex parts of an application contain business critical logic. But this complexity has negative impacts on
the readability and understandability of source code. Those parts will normally become a maintenance and bug fixing nightmare, because no one knows all the constraints, side effects and what's exactly going on in that part of the
software. This situation results in the well known saying "Never touch a running system" which in turn mostly ends
in copy&paste programming. The situation can even become more critical when the original author leaves the
development team or the company.

Example: Finally a small example how to apply the new knowledge
about the Cyclomatic Complexity Number, thresholds and the negative impacts of complex software to an existing
development process. The following source listing shows
a complex method taken from PHP_Depend's source. This method has a Cyclomatic Complexity Number of 16 and I must admit that the original author needed some time to understand what
was going on in this method. :

<?php
// ...
private function _countCalls(PHP_Depend_Code_AbstractCallable $callable)
{
    $callT  = array(
        \PDepend\Source\Tokenizer\Tokens::T_STRING,
        \PDepend\Source\Tokenizer\Tokens::T_VARIABLE
    );
    $chainT = array(
        \PDepend\Source\Tokenizer\Tokens::T_DOUBLE_COLON,
        \PDepend\Source\Tokenizer\Tokens::T_OBJECT_OPERATOR,
    );

    $called = array();

    $tokens = $callable->getTokens();
    $count  = count($tokens);
    for ($i = 0; $i < $count; ++$i) {
        // break on function body open
        if ($tokens[$i]->type === \PDepend\Source\Tokenizer\Tokens::T_CURLY_BRACE_OPEN) {
            break;
        }
    }

    for (; $i < $count; ++$i) {
        // Skip non parenthesis tokens
        if ($tokens[$i]->type !== \PDepend\Source\Tokenizer\Tokens::T_PARENTHESIS_OPEN) {
            continue;
        }
        // Skip first token
        if (!isset($tokens[$i - 1]) || !in_array($tokens[$i - 1]->type, $callT)) {
            continue;
        }
        // Count if no other token exists
        if (!isset($tokens[$i - 2]) && !isset($called[$tokens[$i - 1]->image])) {
            $called[$tokens[$i - 1]->image] = true;
            ++$this->_calls;
            continue;
        } else if (in_array($tokens[$i - 2]->type, $chainT)) {
            $identifier = $tokens[$i - 2]->image . $tokens[$i - 1]->image;
            for ($j = $i - 3; $j >= 0; --$j) {
                if (!in_array($tokens[$j]->type, $callT)
                    && !in_array($tokens[$j]->type, $chainT)
                ) {
                    break;
                }
                $identifier = $tokens[$j]->image . $identifier;
            }

            if (!isset($called[$identifier])) {
                $called[$identifier] = true;
                ++$this->_calls;
            }
        } else if ($tokens[$i - 2]->type !== \PDepend\Source\Tokenizer\Tokens::T_NEW
            && !isset($called[$tokens[$i - 1]->image])
        ) {
            $called[$tokens[$i - 1]->image] = true;
            ++$this->_calls;
        }
    }
}

The first thing to do is to make sure that the test suite
is good enough to ensure that the required refactorings
will not change the public behavior of the component or class. When this is done and we are sure our that api breaks will be detected by the test suite we can start to extract logic into separate methods.

The following example shows the result of the refactoring: :

<?php
// ...
private function _countCalls(PHP_Depend_Code_AbstractCallable $callable)
{
    $called = array();

    $tokens = $callable->getTokens();
    $count  = count($tokens);
    for ($i = $this->_findOpenCurlyBrace($tokens); $i < $count; ++$i) {

        if ($this->_isCallableOpenParenthesis($tokens, $i) === false) {
            continue;
        }

        if ($this->_isMethodInvocation($tokens, $i) === true) {
            $image = $this->_getInvocationChainImage($tokens, $i);
        } else if ($this->_isFunctionInvocation($tokens, $i) === true) {
            $image = $tokens[$i - 1]->image;
        } else {
            $image = null;
        }

        if ($image !== null) {
            $called[$image] = $image;
        }
    }

    $this->_calls += count($called);
}

The subjective feeling of readability heavily depends on the
complexity of control structures, as we can see by a
comparison of the original and the refactored version of the
method example. The new version with its Cyclomatic Complexity Number of 5 is much easier to read and understand.

IEEE Transactions on Software Engineering; A Complexity Measure; Thomas J. McCabe; 1976