Leaflet: PHP development in 2009

What do you think is the state of the art in 2009?

Filed on 02-05-2009, 19:07 & six comments & no trackbacks

Speaking at IPC Spring ’09

I’m giving a talk at the International PHP Conference Spring Edition in May in Berlin. I will talk about Beautiful (PHP) code: code your mother (or rather, your domain experts) can read, code that talks, code that reveals intention, and what that means on the architecture side. So, a somewhat impractical and detached topic. I’m looking forward to seeing you in Berlin.

Filed on 10-03-2009, 00:12 & two comments & no trackbacks

Not having global state doesn't mean you're doomed

Clear Dependencies

A pretty popular myth about avoiding global state (singletons, multitons, registries, global variables, static variables/methods) is that it results in creating widely used objects more often than needed. The most common example is a database connection. We try to avoid global state to let objects express their dependencies clearly: the object constructor should be as readable as “give me this, give me that and I will work”. Let’s talk about a situation where we instantiate a relatively complex set of domain objects including a service layer. For the example, we assume that we read an existing customer. We use the CustomerServiceLayer to retrieve the Customer. The service layer uses the CustomerRepository to create the Customer object. The repository needs a DatabaseConnection, and it passes two collaborators to the Customer object: a strategy (NameFormattingStrategy) to format the name of the customer and a CustomerDataMapper to allow the Customer object to save itself. Here are the constructor signatures of the involved components:

class Customer ...
    public function __construct(
        NameFormattingStrategy $nameFormattingStrategy
    )
class CustomerRepository ...
    public function __construct(
        DatabaseConnection $connection,
        CustomerDataMapper $dataMapper,
        NameFormattingStrategy $nameFormattingStrategy
    )
class CustomerServiceLayer ...
    public function __construct(CustomerRepository $repository)
class CustomerDataMapper ...
    public function __construct(DatabaseConnection $connection)
class DatabaseConnection ...
    public function __construct(
        string $host,
        int $port,
        string $username,
        string $password,
        string $database
    )

So all we do in the page controller, be it a (page) controller class or a plain PHP file, is instantiate the service layer and its dependencies:

$databaseConnection = new DatabaseConnection(...);
$customerDataMapper = new CustomerDataMapper($databaseConnection);
$nameFormattingStrategy = new NameFormattingStrategy();
$customerRepository = new CustomerRepository(
    $databaseConnection,
    $customerDataMapper,
    $nameFormattingStrategy
);
$serviceLayer = new CustomerServiceLayer($customerRepository);
$customer = $serviceLayer->getCustomerById((int)$_GET['customer_id']);
... pass it to the view, do nifty things ...

If other components, like the OrderRepository, need a database connection, just pass it to them. There is no need to let the order repository know how to get it. It is just there. In your unit tests you can pass a mocked repository, a mocked database connection and a mocked data mapper, depending on which part of the chain you are going to test. By the way: the heavy construction work could easily be handed to a number of factories that are only responsible for creating your objects. These factories are easily testable too, as the only assertion needed is whether the returned object is correctly configured.
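To make the sharing of one connection concrete, here is a minimal sketch. It uses deliberately simplified stand-ins for the classes above: the Connection interface, the InMemoryConnection test double and the fetchRow(), findNameById() and findTotalById() methods are assumptions made up for this illustration, not part of the original code.

```php
<?php
// Hypothetical, simplified stand-ins; names and methods are illustrative assumptions.
interface Connection
{
    public function fetchRow(string $table, int $id): ?array;
}

// In-memory replacement for a real database connection, handy in unit tests.
final class InMemoryConnection implements Connection
{
    private $tables;

    public function __construct(array $tables)
    {
        $this->tables = $tables;
    }

    public function fetchRow(string $table, int $id): ?array
    {
        return $this->tables[$table][$id] ?? null;
    }
}

final class CustomerRepository
{
    private $connection;

    // The dependency is stated in the constructor; nothing is pulled from global state.
    public function __construct(Connection $connection)
    {
        $this->connection = $connection;
    }

    public function findNameById(int $id): ?string
    {
        $row = $this->connection->fetchRow('customers', $id);
        return $row === null ? null : $row['name'];
    }
}

final class OrderRepository
{
    private $connection;

    public function __construct(Connection $connection)
    {
        $this->connection = $connection;
    }

    public function findTotalById(int $id): ?int
    {
        $row = $this->connection->fetchRow('orders', $id);
        return $row === null ? null : $row['total'];
    }
}

// One connection object is created once and handed to every component that needs it.
$connection = new InMemoryConnection([
    'customers' => [7 => ['name' => 'Jane Doe']],
    'orders'    => [1 => ['total' => 4200]],
]);
$customers = new CustomerRepository($connection);
$orders    = new OrderRepository($connection);
```

Both repositories share the same instance, yet neither knows where it came from. Swapping the in-memory double for a real connection only changes the wiring code.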

Filed on 15-02-2009, 03:03 & eleven comments & no trackbacks

A Tech Book a Day

When it comes to reading I’m coming from a different corner: I read a lot of philosophical books by authors like Adorno, Marcuse and Marx before I really started reading tech books. These books are hard to read; the works of the Frankfurt School in particular are notorious for their specific language, which is sometimes hard to decipher. Tech books are exactly the opposite: while there are entertaining technical writers with a good style, a lot of them use a pretty common and dry vocabulary – which is a good thing. The thing is, you don’t really need to read tech books.

Novels, philosophical – and more generally humanistic – works are much harder. They often transport meaning in metaphors you don’t get when just reading. You have to read a sentence more than once to get it. But when you read a book about Design Patterns, your favourite book on PHP or something similarly non-algorithmic, you can just scan the book for news, read and understand the code samples and go on, page by page. Scan through the page and take notes, but only note what’s new to you. If it is a reference, mark the important parts with stickers. Ignore the rest. Remember: don’t read, just scan.

Additionally, technical books tend to have a foreword, a foreword for the second edition, a foreword for the third edition and a lot of testimonials attesting how good the book is (hey, I already purchased it, don’t sell it to me again). So the real content starts at page 40. Excluding blank pages, a book that was 400 pages long might shrink to 300 pages. If you need 30 seconds per page, that means you can read the book in two and a half hours (300 pages × 30 seconds = 150 minutes). And 30 seconds per page is a pessimistic estimate. With this technique it is possible to read a technical book in a day without stress, and totally relaxed in a week. That means you could read 52 tech books a year. I’m lame, I just read (well, scanned) around 20 last year.

Filed on 02-02-2009, 18:06 & three comments & no trackbacks

Using IMAP semantics to control web publishing

Every time a new comment or trackback arrives here I get a new mail. Of course I read mail over IMAP, as I use a number of different clients. It always felt a bit clumsy to click a link to decide what happens with the new comment: should it be approved or deleted? In the end I decide twice what happens with the comment. First I click the link for approval or deletion, second I decide what to do with the notification mail. Why not couple these decisions? Every unread comment mail represents a comment awaiting moderation: if I delete the mail, the comment is deleted; if I just mark the mail as read, the comment is approved. The good thing is, we have IMAP, so the blog comment moderation daemon would be just another IMAP client that watches a single mail folder. Wouldn’t that be cool? Maybe Garvin would like that for Serendipity.
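The decision rule of such a daemon fits into one small function. The sketch below is only an illustration: the function name and the returned strings are made up, and a real daemon would first have to read the \Seen and \Deleted flags of each notification mail over IMAP.

```php
<?php
// Hypothetical helper: maps the state of a notification mail to a moderation decision.
function moderationAction(bool $seen, bool $deleted): string
{
    if ($deleted) {
        return 'delete';  // mail deleted => comment is deleted
    }
    if ($seen) {
        return 'approve'; // mail marked as read => comment is approved
    }
    return 'pending';     // unread mail => comment stays in moderation
}
```

Deletion is checked first on purpose: a mail is usually read before it is deleted, so a deleted mail typically carries both flags.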

Filed on 10-01-2009, 00:12 & five comments & no trackbacks

Seven Things

I got tagged by Brian DeShong and Manuel Pichler to write down seven facts about me. Quick and easy:

So here are my nominated victims:

Filed on 03-01-2009, 13:01 & one comment & two trackbacks

Antipattern: chaining stateless protocol requests

As we all know, HTTP is a stateless protocol. We do all sorts of hacks to add state, like ext/session in PHP. While such hacks work great for a lot of use cases, we should remind ourselves that they are hacks. There is a phenomenon of state creep: coupling unrelated HTTP requests. Think of a page that references a thumbnail in an <img/>-tag where the picture is generated as needed: it is possible to generate that image in the context of the request that embeds it. So the template calls a helper to generate the thumbnail, and the thumbnail is written to the file system.

While this works well for a single host, like your personal weblog about cooking and cats, it won’t work for something serious. When you start load balancing between two webserver nodes you get burned, as you can’t guarantee that the image is present on the node that receives the image request (besides, you are generating the image n times, where n is the number of nodes). The solution is not that hard: pregenerate all the images with a queuing system and display “This image is currently not available” placeholders as long as they are not ready, or – if only a few images are uploaded – generate them when the image is uploaded. The other option is to generate them on the fly when they are requested. If you do the latter, do it in the context of the request that tries to retrieve the image, not in the embedding context (the page that embeds the image). Generating on the fly means that you deliver your files through PHP or something similar: this is fine as long as you have an HTTP accelerator in place.

One of the systems that does it in the way described above is Drupal. I’ve implemented MogileFS for image storage and retrieval for Drupal and, let me say, it was not a pleasure.

On a side note: HTTP 1.1 allows resources to be fetched in parallel, which makes generating images in the wrong context even worse from a user experience point of view, as the page will not show up until each thumbnail is generated.
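Generating a thumbnail in the context of the image request itself could look roughly like the sketch below. The function name, the cache directory layout and the $render callback are assumptions for illustration; the callback stands in for real image scaling, and locking against concurrent requests is left out.

```php
<?php
// Hypothetical sketch: create a thumbnail lazily, inside the request for the image itself.
function thumbnailFor(string $name, string $cacheDir, callable $render): string
{
    // basename() keeps the lookup inside the cache directory.
    $path = $cacheDir . '/' . basename($name);
    if (!is_file($path)) {
        // First request for this image on this node: render once and store the result.
        file_put_contents($path, $render($name));
    }
    return file_get_contents($path);
}

// Stand-in renderer that counts how often it is invoked.
$calls = 0;
$render = function (string $name) use (&$calls): string {
    $calls++;
    return 'thumbnail-bytes-for-' . $name;
};

$cacheDir = sys_get_temp_dir() . '/thumbs-' . uniqid();
mkdir($cacheDir);

$first  = thumbnailFor('cat.jpg', $cacheDir, $render);
$second = thumbnailFor('cat.jpg', $cacheDir, $render);
```

Because the work happens in the request that asks for the image, it does not matter which node rendered the embedding page: whichever node receives the image request either finds the file or creates it.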

Filed on 24-09-2008, 20:08 & three comments & no trackbacks

8 Hints out of Testing-Turmoil

  1. Have a continuous integration solution in place. Really. If you don’t, you just burn money by writing tests. I would go so far as to say: if you don’t have continuous integration, you should stop writing unit tests and do click testing. Let your CI system generate API docs, high level docs, code coverage reports, testdox and every piece of static analysis info you generate.
  2. The definition of “tests pass” is “tests pass on the continuous integration system”. “Works for me” has no place in the bugtracker, nor anywhere else.
  3. If you can’t test it, the architecture is most likely wrong (exceptions are session and caching related code, which is generally hard to test). Testability should be your main concern when writing code. What’s the use of fast or wonderful looking code if you can’t repeatably prove it is working?
  4. Prefer method calls over annotations. A typo in setExpectedException will trigger an obvious error, while a typo in expectedException will lead to an Obscure Test, and most likely a Mystery Guest.
  5. Run the whole test harness twice. This will help to identify setup/teardown bugs. Create a randomly ordered test suite to identify the hard to track mistakes.
  6. Run your test suite really often. We run it with a 15-second delay every minute and I’m pretty happy with it.
  7. Use good test names that describe the behavior of the unit. The behavior is not the unit you test itself, that’s what I see in the code. It is something like “calling register changes the status of the user to foobar”, so a good test name would be “testRegisterChangesTheStatus …”.
  8. Aim for 100% code coverage. 95% is nothing to be proud of; I can guarantee that the missing 5% will be the hardest part.
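Hint 4 can be demonstrated without a test framework. In the sketch below, TinyTestCase and expectedExceptionFromDocblock() are hypothetical stand-ins written for this example; they only imitate how a method-based and an annotation-based expectation would react to a typo, and they are not PHPUnit code.

```php
<?php
// Hypothetical stand-in for a test case class; this is not PHPUnit itself.
class TinyTestCase
{
    public $expected = null;

    public function setExpectedException(string $class): void
    {
        $this->expected = $class;
    }
}

// Reads an @expectedException annotation from a docblock,
// roughly the way an annotation-driven runner might.
function expectedExceptionFromDocblock(string $docblock): ?string
{
    return preg_match('/@expectedException\s+(\S+)/', $docblock, $m) === 1 ? $m[1] : null;
}

// A typo in the annotation is just text in a comment: nothing complains,
// the expectation is silently missing.
$annotation = expectedExceptionFromDocblock(
    "/**\n * @expectedExeption InvalidArgumentException\n */"
);

// A typo in the method name fails loudly as soon as the line runs.
$message = null;
try {
    (new TinyTestCase())->setExpectedExeption('InvalidArgumentException');
} catch (Error $e) {
    $message = $e->getMessage();
}
```

The misspelled annotation yields null without any warning, while the misspelled method call raises an Error that names the offending method.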

Filed on 19-09-2008, 17:05 & six comments & no trackbacks

Testing PHP 5.3 alpha1

Finally, Johannes Schlüter has baked a first alpha tarball for PHP 5.3. The new version contains a huge number of new features, like closures, namespaces and late static binding. Such a huge number requires thorough testing: if you are using a PHP application you would like to see fully working with our brand new version, or if you are developing a PHP application, this is your chance to make sure everything will go smoothly. If you are a web hosting provider, do your performance benchmarks now!
If you happen to be using the Gentoo Linux distribution, I have something for you: in my personal PHP overlay you can find an ebuild for PHP 5.3.0_alpha1. A few warnings: currently, ext/fileinfo does not compile because of #45636, and of course I did not test all possible USE flag combinations. If you experience problems with it, just leave a comment here.

Filed on 03-08-2008, 19:07 & three comments & no trackbacks

Specific env vars for Gentoo packages

Since Gentoo Portage introduced package specific configuration in /etc/portage, there was one thing I always missed: specifying environment variables per package. Some environment variables you might want to specify per package are CFLAGS, CXXFLAGS and FEATURES. Especially when you do debugging, some packages should not be stripped, which is the perfect use case for the FEATURES environment variable. While specifying USE flags and keywords per package is easy, the rest is not. Christian Hoffmann dropped me a link to this mail: the tip there works fine. I’ve played around with it and implemented it slightly differently: first, I would like to be informed which environment files are read, and second, I changed the resolution order so that the more specific configuration inherits from the more generic one. So this is what my /etc/portage/bashrc looks like now:
[geshi lang=bash]
for conf in ${PN} ${PN}-${PV} ${PN}-${PV}-${PR}; do
    env=/etc/portage/env/${CATEGORY}/${conf}.env
    if [[ -f ${env} ]]; then
        einfo "Reading specific environment from ${env}"
        . ${env}
    fi
done
[/geshi]
For dev-lang/php-5.2.6-r3 I can use three different files to customize the build environment: /etc/portage/env/dev-lang/php.env applies to all PHP ebuilds, /etc/portage/env/dev-lang/php-5.2.6.env to all revisions of the ebuild for 5.2.6, and /etc/portage/env/dev-lang/php-5.2.6-r3.env to the exact ebuild. My /etc/portage/env/dev-lang/php.env file now looks like this, to disable stripping the binaries after emerging them and to keep the working directory for better backtraces:
[geshi lang=bash] FEATURES="${FEATURES} nostrip keepwork" [/geshi]

Filed on 02-08-2008, 15:03 & no comments & no trackbacks

Antipattern: the verbose constructor

Constructors are often used to shortcut dependency injection and parameter passing on instantiation. This is a valid practice and often leads to shorter code. Consider the following example (a simple value object, often used to not mess around with floats and to keep currency and amount together):

class Money {
    protected $_amount;
    protected $_currency;
    protected $_divisor;
    public function __construct(
        $amount = null, $currency = null, $divisor = null
    ) {
        if ($amount !== null)
            $this->setAmount($amount);
        if ($currency !== null)
            $this->setCurrency($currency);
        if ($divisor !== null)
            $this->setDivisor($divisor);
    }
    ... setter and getter ...
}

Now consider instantiating this object. Instead of creating a new instance of “Money” and calling three setters, everything can be done compactly in the constructor.

$money = new Money(13200, 'EUR', 100);

So for the money object this works pretty well. The code is easy to read. But wait: the first argument can be grasped easily, the second too, but the third? It is not obvious that a divisor is passed. An alternative would be changing the constructor to accept an array. This is a replacement for true named arguments, as supported by Python, for example. Solar uses that a lot, as does the Zend Framework.

$money = new Money(array(
        'amount' => 13200,
        'currency' => 'EUR',
        'divisor' => 100
));

Much more readable, but does your IDE’s code completion work? And what happens if you pass “amoµnt”, because your fingers are as clumsy as mine? Exactly, the parameter will be silently ignored.
But look at this:

$money = new Money();
$money->setAmount(13200);
$money->setCurrency('EUR');
$money->setDivisor(100);

It is at least equally short and readable, your IDE works, and if you have problems with the dimensions of the keys on your keyboard (they are too small, it has nothing to do with your fingers) you will be warned. But we can have an even shorter example while maintaining the readability. With fluent interfaces we would get the following:

$money = new Money();
$money->setAmount(13200)->setCurrency('EUR')->setDivisor(100);

Wonderful! If you want, you can add a newline between each object operator and you would have the same amount of lines but less dense code (sad that we don’t have fluent constructors, isn’t it?). Sometimes setters are so elegant.
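For completeness, a fluent variant of the value object could be sketched as follows. The setter names and the format() method are assumptions chosen for this illustration; the only trick is that every setter returns $this.

```php
<?php
// Sketch of a Money value object with fluent setters; method names are assumptions.
final class Money
{
    private $amount = 0;
    private $currency = '';
    private $divisor = 1;

    public function setAmount(int $amount): self
    {
        $this->amount = $amount;
        return $this; // returning $this is what makes the chaining possible
    }

    public function setCurrency(string $currency): self
    {
        $this->currency = $currency;
        return $this;
    }

    public function setDivisor(int $divisor): self
    {
        if ($divisor === 0) {
            throw new InvalidArgumentException('The divisor must not be zero.');
        }
        $this->divisor = $divisor;
        return $this;
    }

    // %F formats the number independently of the current locale.
    public function format(): string
    {
        return sprintf('%.2F %s', $this->amount / $this->divisor, $this->currency);
    }
}

$money = (new Money())
    ->setAmount(13200)
    ->setCurrency('EUR')
    ->setDivisor(100);

echo $money->format(), "\n"; // prints "132.00 EUR"
```

A misspelled setter name fails immediately with an error, which is exactly the warning the array variant cannot give. On PHP 5.4 and later the instance can be created and configured in one expression, as shown, by wrapping the instantiation in parentheses.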

So by now one thing should be clear: it is not just about easily writing the code, but about the next guy understanding it too. Because you never write code for yourself. Never. But let’s investigate a real-life example. I work with a framework that allows me to define really nifty business logic by just sticking together a bunch of fields, with every field having a bunch of validators and filters attached.

class User extends Model {
    protected function _define(Definition $definition) {
        $definition->addField(new StringField('username', true, null, true));
    }
    protected function _getStorageClass() {
        return 'UserStorage';
    }
}

Every time I write such a definition, I need to look into the code to check the order of the parameters. I can remember the first parameter, but the rest are too similar. To explain it: the second parameter specifies whether the field is required, the third expects a default value and the fourth indicates whether the value can be changed after it has been set once. I’ve talked about filters and validators, right?

class User extends Model {
    protected function _define(Definition $definition) {
        $definition->addField(new StringField('username', true, null, true))
            ->addValidator(new UniqueUserValidator())
            ->addFilter(new LowercaseFilter())
            ->addValidator(new RegexValidator('/^[a-z]+$/'));
    }
}

Definition::addField() returns the passed field object to allow adding validators and filters. What works for validators and filters, should work for the rest too, shouldn’t it?

class User extends Model {
    protected function _define(Definition $definition) {
        $definition->addField(new StringField('username'))
            ... fluent setter calls for required, default value and changeability ...
    }
}

I admit, it is a bit more code to write, but a huge improvement in readability and therefore in maintainability. Another variant, for cases where setters are not a good solution, is to create an expressive factory. For example, we have a Criteria object that creates and orders Criterion objects internally. Because we don’t have a fluent constructor, we have a static create method for the Criteria object.

$criteria = Criteria::create('User')->field('id')->equal(1);

The alternative of using just the constructor would be horrible to read and would have limitations regarding the parameter parsing capabilities (unless func_get_args() is used, which is the total opposite of the paradigm of strict APIs). But back to the constructor-only example:

$criteria = new Criteria('User', array('id' => 1));

And how would you express “id not equal 1” with it? So that’s where expressive factories are an alternative.
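A minimal sketch of such an expressive factory follows. It is not the Criteria class of the framework mentioned above: notEqual(), describe() and the internal array representation are assumptions made up for this illustration.

```php
<?php
// Hypothetical sketch of an expressive factory; not the framework's actual Criteria class.
final class Criteria
{
    private $entity;
    private $field = null;
    private $conditions = [];

    // The constructor is private: instances are created through create() only.
    private function __construct(string $entity)
    {
        $this->entity = $entity;
    }

    public static function create(string $entity): self
    {
        return new self($entity);
    }

    // Selects the field the next comparison refers to.
    public function field(string $name): self
    {
        $this->field = $name;
        return $this;
    }

    public function equal($value): self
    {
        return $this->add('=', $value);
    }

    public function notEqual($value): self
    {
        return $this->add('<>', $value);
    }

    private function add(string $operator, $value): self
    {
        if ($this->field === null) {
            throw new LogicException('Call field() before a comparison.');
        }
        $this->conditions[] = [$this->field, $operator, $value];
        return $this;
    }

    // Renders the collected conditions as readable text, for demonstration only.
    public function describe(): string
    {
        $parts = [];
        foreach ($this->conditions as [$field, $operator, $value]) {
            $parts[] = sprintf('%s %s %s', $field, $operator, var_export($value, true));
        }
        return $this->entity . ' where ' . implode(' and ', $parts);
    }
}

$criteria = Criteria::create('User')->field('id')->notEqual(1);
echo $criteria->describe(), "\n"; // prints "User where id <> 1"
```

“id not equal 1” becomes one more method on the builder, where the array-based constructor would need an extra convention for operators.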

Constructors, like any other method, should have as few parameters as possible but as many as needed. Obvious. The constructor should only allow setting information that is vital for the object (if the object has a name, there is a good chance that the name is a parameter of the class’ constructor because it is considered vital). And the ease of use depends heavily on whether the parameters passed can be intuitively distinguished by looking at their values. This holds both when the code is written for the first time and when it is maintained for the rest of your life.

(There are a bunch of other tricks to make parameters more readable, like using class constants as parameters, but this is out of scope of this article).

Filed on 31-07-2008, 01:01 & 48 comments & no trackbacks

Over abbreviated


Matthew Weier O’Phinney announced Zend’s naming scheme for the Zend Framework from the point where PHP 5.3 namespaces are used. The issue is that the PHP parser allows neither class Abstract nor interface Interface, as both “abstract” and “interface” are reserved keywords. So Zend suggests prefixing interfaces with “I” and abstract classes with “A”. Hungarian notation for classes and interfaces.

One of the bullet points in the list of “what makes a name a good name?” is and will forever be “as short as possible, as verbose as needed”; another is “you must understand the name without studying specific rules beforehand”. The latter is why Hungarian notation sucks so tremendously. IFoo/ABar violates both of those criteria. First, it is not as verbose as it could be with just a few more keystrokes: AbstractBar would work fine and is much clearer. Second, it introduces a special notation you have to grasp beforehand. While AbstractBar would be as descriptive as possible, ABar is cryptic for those who are not lucky enough to practice Python programming.

While we are at it, the scheme makes it impossible to have grammatically correct names: IFoo would be read as InterfaceFoo, which really should be FooInterface. And no, the fix is not FooI.
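Spelled out, the verbose alternative costs only a few keystrokes. The sketch below uses the placeholder names from the text; ConcreteBar and the describe() method are additions made up for this illustration.

```php
<?php
// Placeholder names from the text above; ConcreteBar and describe() are illustrative additions.
interface FooInterface
{
    public function describe(): string;
}

// "AbstractBar" says what it is without any notation to learn first.
abstract class AbstractBar implements FooInterface
{
    public function describe(): string
    {
        return static::class; // the name of the concrete class in use
    }
}

final class ConcreteBar extends AbstractBar
{
}

$bar = new ConcreteBar();
```

Neither name collides with the reserved keywords, and both read naturally in a type hint such as function run(FooInterface $foo).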

Filed on 30-06-2008, 17:05 & ten comments & one trackback

Join us

You were a bit bored lately: you wanted to have time and infrastructure for unit tests and continuous integration but it was “too expensive”, you wanted a more grown up, professional structure for development, a coding style, a build system, lots of books for training, augmentative thinking about architecture and object orientation or – more general – work you can both take pride in and sleep well with. This is how we want to develop software (and we are close to it and continuously improving). We are offering two positions: senior developer and another one more suited for career starters.
You will work on various projects, including a not yet released open source framework based on the Zend Framework (and yes, we are using PHP 5.3 for development). You should be fluent in PHP 5, at least know what unit tests are and you have a good understanding of object oriented programming. And no, you don’t need to know who’s invented the pepper mill or the handcar or PHP.
Additional benefits include table football, a Wii, free water and coffee, and quiet workplaces.
So the ball’s in your court: if you are interested, drop me a message.

Filed on 04-06-2008, 16:04 & no comments & no trackbacks

Security "to go"?

I’m a huge fan of PHP-IDS. Mario Heiderich and Christian Matthies did an incredible job polishing this tool, adding new features and trying to catch every esoteric attack signature. However, I have the feeling there is some confusion (German) about what intrusion detection is for. On a server, intrusion detection is used to diagnose a break-in. First of all you do everything to keep your server from going down. You have a firewall, you try not to expose services to the outside, you do SSH with port knocking, you put a risky service into a jail or chroot, you use the Suhosin patchset and so on. There are various strategies for hardening a server. The hardening is the barrier against break-in attempts.
If hell freezes over, the intrusion detection mechanism comes into play to make sure the attempt is not overlooked and the machine does not become yet another zombie in a botnet. PHP-IDS is an intrusion detection tool on the application level. Application firewalls know about a certain protocol and its structure (e.g. HTTP) and inspect the protocol to detect attack patterns. Some of them are even capable of learning from usual request signatures and enforcing rules based on the learned data. There are various commercial products for application firewalling. PHP-IDS does the same for free and sits directly on the webserver in the scope of the application. For personal usage or projects with a lower budget that can’t afford expensive products, it might be a good supplement. Besides being a supplement, application firewalls are a valid choice when security becomes an urgent problem: a lot of heavily flawed software is designed (often it is not even designed) and developed without the developers ever having heard about security: “Yes you can inject HTML, that’s a feature!”, “' OR true/* lists you every item, isn’t that cool?”. If such projects become popular, application firewalls might be an option to hotfix the disaster. But nevertheless the application needs to be fixed.
The inherent issue with application firewalls is that the only place that knows exactly what proper incoming data for the application looks like is the application itself. That’s why application firewalls can never be perfect. IDS is needed for the 2% the developer forgot. So it is not like coffee to go. It is like having the coffee and adding milk or sugar. Having milk without coffee seems pointless to me anyway.

Filed on 20-05-2008, 21:09 & five comments & one trackback

PHP Unconference Hamburg - Day 1

The first day at the PHP Unconference in Hamburg was quite nice. The day started with a slightly confused registration, followed by the notorious voting for sessions. Our planned talk was magically lost but I was too tired to object.
I attended two sessions. The first was “Security Development Lifecycle”, about a process model developed by Microsoft to strengthen the focus on security during development. While the entire process is pretty complex, there are a few ideas and basic rules that are worth adopting. Treating security problems as show-stoppers should be obvious, classifying attack surfaces, scenarios and privacy impacts is a thankless job, and regular security training for the development team is a good idea, but do you really do it? The second session was “Ask the core developer” by Johannes Schlüter. It ended up with people pitying one another and whining a bit about missing innovation in core, an impression I don’t share.
The interesting parts were not the sessions but the corridor conversations. It’s always interesting to hear how others do PHP.

Filed on 27-04-2008, 02:02 & one comment & one trackback
