/usr/portage

The state of meta programming in PHP

Quoting Wikipedia

Metaprogramming is the writing of computer programs that write or manipulate other programs (or themselves) as their data, or that do part of the work at compile time that would otherwise be done at runtime

Metaprogramming is quite an interesting sub-discipline, and knowing about certain techniques and tools allows you to cut corners quite dramatically for certain tasks. As always, don’t overdo it. But to find out when you are overdoing it, you first have to start: get excited, overdo it, then find the right dose. Let’s have a look at the tools PHP offers to solve typical meta programming problems.

What kind of questions can meta programmatic APIs answer?

I would group metaprogramming into three sub-areas: type introspection, lower-level syntax inspection and metadata management. Typical type introspection questions are which methods a class declares or how many parameters a method requires.

On a lower level you typically interact with some kind of syntax tree or token stream, for example to transform source code or to analyze it statically.

A third category is adding metadata to the declared types: Java, C# and a few others have first-class annotation support for this kind of thing, but PHP only has userland solutions so far. Typical uses for metadata are persistence mapping, routing and security configuration.

The toolkit

Reflection APIs

PHP core delivers 2.5 key APIs for meta programming. The first one is ext/reflection. You can create reflection objects from a lot of things (functions, classes, extensions) and use them to reason programmatically about the APIs you are introspecting.

A simple example to find out the number of required parameters for each method in the class DirectoryIterator:

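<?php
// Inspect every method of the built-in DirectoryIterator class
$class = new ReflectionClass('DirectoryIterator');
foreach ($class->getMethods() as $method) {
    $numberOfRequiredParameters = $method->getNumberOfRequiredParameters();
    printf("%s(): %d required\n", $method->getName(), $numberOfRequiredParameters);
}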

Reflection is all nice and shiny, except when you don’t want to load (include) the code you want to inspect. This matters if you inspect several source trees at once that declare duplicate symbols. For that there is PHP-Token-Reflection by Ondřej Nešpor. It’s a pretty nifty replacement for ext/reflection, built completely in userland on top of ext/tokenizer, that even copes with invalid declarations. Additionally, it fixes some oddities of the internal reflection API while staying as close to it as possible. I’ve played around with it a bit and I quite like it.

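<?php
// Assumes the TokenReflection library is autoloadable (e.g. via Composer).
// The sources are tokenized, never included, so duplicate symbols are fine.
use TokenReflection\Broker;

$broker = new Broker(new Broker\Backend\Memory());
$broker->processDirectory('path/to/src');
$class = $broker->getClass('MyClass');
foreach ($class->getMethods() as $method) {
    // inspect $method here; it mirrors ext/reflection's ReflectionMethod
    echo $method->getName(), "\n";
}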

Tokenizer

Another core API, this time much more low level, is ext/tokenizer. If enabled at compile time, it allows you to split PHP source code into a list of tokens. Because the API is so low level, it is quite hard to use without a proper abstraction layer on top of it, and most of the successful projects built upon ext/tokenizer have built one. One of them is phpcs by Greg Sherwood, which adds a token stream abstraction on top of ext/tokenizer that makes navigating the tokens much more convenient. Another project shipping its own token stream abstraction is pdepend by Manuel Pichler. A noteworthy standalone abstraction is php-manipulator.
For an example of how the raw API can be used, see this little script I once wrote: it applies a few transformations to source trees to ease converting them to PHP 5.4.
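To give an idea of what working with the raw API looks like, here is a minimal sketch (not the script mentioned above) that uses token_get_all() to collect the names of declared functions and methods from a source string. The helper name declaredFunctionNames() is hypothetical. Note how much manual bookkeeping, such as skipping whitespace and telling closures apart from named functions, is already needed for such a simple question.

```php
<?php
/**
 * Hypothetical helper: returns the names of all functions and methods
 * declared in $source, using only the raw ext/tokenizer API.
 */
function declaredFunctionNames($source)
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $names = array();

    for ($i = 0; $i < $count; $i++) {
        // Complex tokens are arrays of [id, text, line]; single characters are plain strings
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // Look at the first token after "function" that is not whitespace
        for ($j = $i + 1; $j < $count; $j++) {
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
                continue;
            }
            // A T_STRING here is the declared name; closures ("function (") are skipped
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $names[] = $tokens[$j][1];
            }
            break;
        }
    }

    return $names;
}

$names = declaredFunctionNames('<?php function foo() {} $f = function () {}; class A { public function bar() {} }');
// $names is array('foo', 'bar')
```

The sketch deliberately ignores edge cases such as functions returning by reference (function &foo()), which is exactly the kind of detail a token stream abstraction like the ones in phpcs or pdepend takes care of.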

PHP Parser: a fully fledged AST parser for PHP

Between a high-level API like Reflection and a low-level API like ext/tokenizer there is a gap: what if you want to work on an abstract syntax tree (AST)? This is where the beautiful PHP-Parser project by Nikita Popov comes in. It is quite interesting for more complex tasks like userland AOP, all kinds of static code analysis and so on. If ext/tokenizer feels way underpowered, have a look at this project.

Aspect oriented programming

While we are talking about AOP: a relative newcomer is PECL AOP, which provides a quite simple API for aspect-oriented programming in PHP. For Zend Framework 2 there is also an AOP module available. Let’s stick to AOP for a moment: for Symfony 2 there is JMSAopBundle by Johannes Schmitt, which provides basic AOP functionality. JMSSecurityExtraBundle and JMSDiExtraBundle use it to provide annotation support for the Symfony security bundle and the Symfony dependency injection component.
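PECL AOP, being an extension, intercepts calls inside the engine. To illustrate the underlying idea of "before" advice without any extension, here is a hypothetical userland sketch: a proxy intercepts method calls through __call() and runs registered callbacks before delegating to the wrapped object. BeforeAdviceProxy and Greeter are made-up names; this is not the API of PECL AOP or of any of the bundles mentioned above.

```php
<?php
/**
 * Hypothetical sketch of "before" advice: runs registered callbacks
 * before forwarding a method call to the wrapped target object.
 */
class BeforeAdviceProxy
{
    private $target;
    private $advice = array();

    public function __construct($target)
    {
        $this->target = $target;
    }

    // Register a callback to run before every call to $method
    public function before($method, $callback)
    {
        $this->advice[strtolower($method)][] = $callback;
        return $this;
    }

    // Intercepts calls to methods the proxy itself does not define
    public function __call($method, array $arguments)
    {
        $key = strtolower($method);
        if (isset($this->advice[$key])) {
            foreach ($this->advice[$key] as $callback) {
                call_user_func($callback, $method, $arguments);
            }
        }
        // Delegate to the real object
        return call_user_func_array(array($this->target, $method), $arguments);
    }
}

// Hypothetical example class
class Greeter
{
    public function greet($name)
    {
        return "Hello, $name";
    }
}

$log = array();
$greeter = new BeforeAdviceProxy(new Greeter());
$greeter->before('greet', function ($method, array $arguments) use (&$log) {
    $log[] = $method . '(' . implode(', ', $arguments) . ')';
});
$result = $greeter->greet('Lars');
```

Unlike engine-level AOP, this only works for calls made through the proxy: calls from inside the wrapped object, or from code holding a direct reference to it, bypass the advice.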

Metadata management

Traditionally, every docblock documentation parser rolled its own annotation system. This changed a little with the rise of Symfony and Doctrine 2: Doctrine 2 allows you to use annotations for persistence definitions, and Symfony allows you to use annotations for a lot of things (routes, security, etc.). While Doctrine still ships its own metadata handling component in doctrine-common, there is another library by Johannes Schmitt, Metadata, that aims to consolidate metadata handling for PHP. The APIs of the Metadata library and of doctrine-common are quite simple: some sort of annotation reader maps metadata information to classes. Consider this annotation:

<?php
use My\Annotation\Some;
/**
  * @Some(foo="bar")
  */
class MyClass
{}

This kind of annotation will map to an instance of My\Annotation\Some with the property $foo set to “bar”.
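To make that mapping concrete, here is a deliberately simplified, hypothetical reader; it is not the doctrine-common or Metadata API. It reads the class docblock via ReflectionClass::getDocComment(), looks for @Name(key="value") tags whose name matches an existing class, and copies the key/value pairs onto public properties of a new instance. For brevity it uses a global Some class instead of My\Annotation\Some and ignores use statements, nested annotations and non-string values.

```php
<?php
// Hypothetical annotation class (global namespace for brevity)
class Some
{
    public $foo;
}

/**
 * @Some(foo="bar")
 */
class MyClass
{
}

/**
 * Hypothetical, minimal annotation reader: maps @Name(key="value") tags in a
 * class docblock to instances of the class "Name".
 */
function readClassAnnotations($className)
{
    $class = new ReflectionClass($className);
    $docComment = $class->getDocComment();
    $annotations = array();
    if ($docComment === false) {
        return $annotations;
    }

    preg_match_all('/@(\w+)\(([^)]*)\)/', $docComment, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        // Ignore tags without a matching class, e.g. @param or @return
        if (!class_exists($tag[1])) {
            continue;
        }
        $annotation = new $tag[1]();
        // Only double-quoted string values are supported
        preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $tag[2], $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            if (property_exists($annotation, $pair[1])) {
                $annotation->{$pair[1]} = $pair[2];
            }
        }
        $annotations[] = $annotation;
    }

    return $annotations;
}

$annotations = readClassAnnotations('MyClass');
```

Real readers add caching, namespace resolution and validation on top of this basic idea, which is a good reason to reuse one instead of rolling your own.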

Radioactive, specialized or obscure

Ever dreamed of renaming functions, redeclaring classes and so on? Let us not discuss whether this is a good idea or not, but if you would like, look no further: there is runkit for that (I think this is the most current fork).

If you want to access the opcodes of your code, Stefan Esser wrote bytekit for you (bytekit.org is no longer available; I only found Tyrael/bytekit and Mayflower/Bytekit). To make working with bytekit data a little more convenient, Sebastian Bergmann wrote bytekit-cli.

To register callbacks at every function call, there is funcall by Chen Ze and intercept by Gabriel Ricard.

One should not forget about xdebug by Derick Rethans, which covers a quite specialized sub-sub-sub-discipline: code coverage analysis.

The future

PHP core itself could really use native support for annotations. This would smooth out the small differences in how major projects use annotations today. Another very interesting development is definitely PHP AOP; I would consider it a candidate for core inclusion at some point.
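For comparison: PHP 8.0 later added native attributes, which cover much of what the userland annotation readers do. A minimal sketch of the @Some(foo="bar") example expressed as an attribute, again using a global Some class for brevity; it requires PHP 8.0 or newer.

```php
<?php
// Declares Some as an attribute that may only be applied to classes (PHP 8.0+)
#[Attribute(Attribute::TARGET_CLASS)]
class Some
{
    public function __construct(public string $foo)
    {
    }
}

#[Some(foo: 'bar')]
class MyClass
{
}

// Reflection returns ReflectionAttribute objects; newInstance() builds the Some object
$attributes = (new ReflectionClass(MyClass::class))->getAttributes(Some::class);
$some = $attributes[0]->newInstance();
```

Unlike docblock annotations, attributes are parsed by the engine, so typos in the attribute syntax are reported as parse errors instead of being silently ignored.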

The userland libraries could see some consolidation, and now that we have Composer, dependency management isn’t much of a problem any more. Especially in the Symfony 2 world, reusing the same metadata framework would make total sense. A first step is that Zend Framework 2 uses doctrine-common for annotation support.

Filed on 02-12-2012, 17:05; eleven comments, no trackbacks

Proof of Concept: Binary packed UUIDs as primary keys with Doctrine2 and MySQL

The Problem

For a project I need non-guessable synthetic primary keys. I will use them to construct URIs, and these URIs need to be non-guessable. If I went the traditional way of integer primary keys with auto increment, or used a sequence table, an attacker could simply increment or decrement the integer to find related items. The next idea was to use UUIDs or GUIDs. These identifiers are globally unique, so they work as primary keys too. Reading some documentation on the topic brought up the interesting issue of space usage: storing the UUIDs as text in a CHAR column (36 characters including hyphens) would be a huge waste of space compared to an integer primary key. As primary keys are referenced in related tables, this would be a serious issue. Finally I found the trick of storing their 16-byte binary representation in a BINARY(16) column. Doing that in MySQL is fairly easy:

INSERT INTO items SET id = UNHEX(REPLACE(UUID(), '-', ''));

Selecting a human-readable result is easy too:

SELECT HEX(id) FROM items;

Achieving the same thing in PHP is pretty straightforward too. You need the PECL extension UUID (pecl install uuid) and pack()/unpack():

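<?php
// Requires the PECL uuid extension
$uuid = uuid_create(UUID_TYPE_TIME);   // e.g. "d2f26850-9db2-11df-9010-000c29abf06d"
$uuid = str_replace('-', '', $uuid);   // 32 hex characters
var_dump(pack('H*', $uuid));
// string(16) "..." (16 raw bytes, mostly unprintable)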

Converting them back into their hex representation is similar:

<?php
// unpack() returns an array; with an unnamed 'H*' format the value is at key 1
$unpacked = unpack('H*', $binaryUuid);
var_dump($unpacked[1]);
// string(32) "d2f268509db211df9010000c29abf06d"
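If the PECL extension is not an option, a minimal sketch assuming PHP 7 or newer (for random_bytes()) can build the 16 raw bytes directly. It generates a random (version 4) UUID instead of the time-based one used above; since time-based UUIDs embed a timestamp and usually the host's MAC address, random UUIDs arguably fit the non-guessability requirement better. bin2hex() and hex2bin() do the same job as unpack('H*', $binary) and pack('H*', $hex). The function names are hypothetical.

```php
<?php
// Hypothetical helper: a random (version 4) UUID as 16 raw bytes, ready for BINARY(16)
function uuidV4Binary()
{
    $bytes = random_bytes(16);
    // Set the version nibble to 4 (random UUID)
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant bits to 10xx (RFC 4122 variant)
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    return $bytes;
}

// Hypothetical helper: raw bytes to 32 lowercase hex characters
function uuidBinaryToHex($binary)
{
    return bin2hex($binary);
}

// Hypothetical helper: accepts hyphenated or plain hex, returns 16 raw bytes
function uuidHexToBinary($hex)
{
    return hex2bin(str_replace('-', '', $hex));
}
```

Usage: uuidHexToBinary('d2f26850-9db2-11df-9010-000c29abf06d') yields the same 16 bytes as pack('H*', 'd2f268509db211df9010000c29abf06d').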

Doctrine2 integration

The next step is integration with Doctrine2. To do so, we need to create a custom mapping type. I’m not using Doctrine2 for database abstraction but for its object-relational mapping capabilities, so I ignore portability and concentrate on MySQL.

<?php
namespace Lars\Doctrine2\Types\Mysql;

use Doctrine\DBAL\Types\Type;
use Doctrine\DBAL\Platforms\AbstractPlatform;

/**
 * Maps a hex string in PHP to a fixed-length BINARY column in MySQL
 */
class BinaryType extends Type
{
    const BINARY = 'binary';

    public function getSQLDeclaration(array $fieldDeclaration, AbstractPlatform $platform)
    {
        // e.g. BINARY(16) for a UUID
        return sprintf('BINARY(%d)', $fieldDeclaration['length']);
    }

    public function getName()
    {
        return self::BINARY;
    }

    // Database -> PHP: raw bytes to lowercase hex string
    public function convertToPHPValue($value, AbstractPlatform $platform)
    {
        if ($value === null) {
            return null;
        }
        $value = unpack('H*', $value);
        return array_shift($value);
    }

    // PHP -> database: hex string to raw bytes
    public function convertToDatabaseValue($value, AbstractPlatform $platform)
    {
        if ($value === null) {
            return null;
        }
        return pack('H*', $value);
    }
}

Now we register the new type with Doctrine2 somewhere in our setup logic:

<?php
use Doctrine\DBAL\Types\Type;
Type::addType('binary', 'Lars\Doctrine2\Types\Mysql\BinaryType');

One issue I stumbled upon was Doctrine2’s default mapping: with MySQL, it maps binary columns to the intermediate blob type (in the Doctrine2 type system). This default behavior is not configurable, so we need to patch Doctrine\DBAL\Schema\MySqlSchemaManager. I’m sure there is a more elegant way, and I would love to receive some remarks here:

// Excerpt from the column type switch in Doctrine\DBAL\Schema\MySqlSchemaManager
case 'tinyblob':
case 'mediumblob':
case 'longblob':
case 'blob':
// Commented out so that BINARY columns are no longer mapped to blob:
// case 'binary':
case 'varbinary':
    $type = 'blob';
    $length = null;
    break;

The last part is our entity:

<?php
namespace Lars\User\Domain;
 
/**
 * @Entity
 * @Table(name="user",indexes={@Index(name="user_email_idx",columns={"user_email"})})
 * @HasLifecycleCallbacks
 */
class User
{
    /**
     * @Id
     * @Column(type="binary",length=16,name="user_id")
     * @GeneratedValue(strategy="NONE")
     */
    protected $_id;
 
    /**
     * @Column(type="string",length=32,name="user_email")
     */
    protected $_email;
 
    public function changeEmail($email)
    {
        $this->_email = $email;
        return $this;
    }
 
    public function getId()
    {
        return $this->_id;
    }
 
    /**
     * @PrePersist
     */
    public function generateUuid()
    {
        $this->_id = str_replace('-', '', uuid_create(UUID_TYPE_TIME));
    }
}

The important part here is the generateUuid() method, which generates the UUID once before the domain object is persisted. With @GeneratedValue(strategy="NONE") we tell Doctrine not to generate the ID by itself, and with @HasLifecycleCallbacks we configure Doctrine to scan for lifecycle callback methods, so that generateUuid() is called before the entity is persisted.

Fetching an object by ID is as easy as ever, but don’t forget to convert the ID:

$user = $em->find(
    'Lars\User\Domain\User',
    pack('H*', '16aec29e9db011df8013000c29abf06d')
);

Further ideas

The UUID handling should be refactored into a UUID value object that encapsulates UUID creation and binary conversion.
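A minimal sketch of such a value object, assuming random (version 4) UUIDs via PHP 7's random_bytes() rather than the PECL uuid extension; the class name Uuid and its methods are hypothetical. It keeps the 16 raw bytes internally and converts on demand, so that, in principle, the mapping type and the entity could pass Uuid instances around instead of hex strings.

```php
<?php
/**
 * Hypothetical immutable UUID value object: stores 16 raw bytes and
 * converts to/from the hex and canonical hyphenated forms.
 */
final class Uuid
{
    private $bytes;

    private function __construct($bytes)
    {
        if (strlen($bytes) !== 16) {
            throw new InvalidArgumentException('A UUID must be exactly 16 bytes long');
        }
        $this->bytes = $bytes;
    }

    // Random (version 4) UUID; requires PHP 7+ for random_bytes()
    public static function generate()
    {
        $bytes = random_bytes(16);
        $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
        $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant
        return new self($bytes);
    }

    // Accepts both the hyphenated form and plain 32-character hex
    public static function fromHex($hex)
    {
        $hex = str_replace('-', '', $hex);
        if (!preg_match('/^[0-9a-f]{32}$/i', $hex)) {
            throw new InvalidArgumentException('Invalid UUID: ' . $hex);
        }
        return new self(hex2bin($hex));
    }

    // E.g. the raw value read from a BINARY(16) column
    public static function fromBinary($bytes)
    {
        return new self($bytes);
    }

    // Value for a BINARY(16) column
    public function toBinary()
    {
        return $this->bytes;
    }

    // 32 lowercase hex characters, like the IDs used in the entity above
    public function toHex()
    {
        return bin2hex($this->bytes);
    }

    // Canonical 8-4-4-4-12 representation
    public function __toString()
    {
        $hex = $this->toHex();
        return sprintf('%s-%s-%s-%s-%s',
            substr($hex, 0, 8), substr($hex, 8, 4), substr($hex, 12, 4),
            substr($hex, 16, 4), substr($hex, 20, 12));
    }

    public function equals(Uuid $other)
    {
        return $this->bytes === $other->bytes;
    }
}
```

Usage: Uuid::fromHex('16aec29e9db011df8013000c29abf06d')->toBinary() returns the same bytes as the manual pack('H*', ...) call in the find() example.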

Filed on 01-08-2010, 23:11; eight comments, no trackbacks