Test data generator for PHP - Design, entities and the core

Welcome to the first lesson about creating random database data generators. This tool or script, if you will, generates various (random) data combinations and adds them into the database. This course assumes that you're familiar with the PDO database driver and OOP. Imagine that you're creating an application which communicates with a database. In order to make the application work properly, you need to test it on some data. A specific example would be a user list once you reach the beta testing phase. At that moment, our script will come in handy and fill the ("user") database table automatically with random data.

Usage

Let's go into a bit more detail before we move on. Picture the rows of a "user" database table. The application generated these rows automatically based on its settings (only a few users are shown here; there are actually 500 of them).

Generated users - Libraries for PHP

To generate test users, the following bit of code will suffice:

$generator->generate("user", 500, array(
    "user_id" => "Id",
    "first_name" => "FirstName",
    "last_name" => "LastName",
    "email" => "Email:first_name,last_name",
    "telephone" => "Telephone",
    "date_of_birth" => "Date:1970-01-01,2013-01-01",
    "credit_card_expiration" => "Date:2015-01-01, 2030-01-01")
);

The design

We'll take things seriously and build this tool using an object-oriented approach, so let's think about the design before we move on. Our script should be robust and easily extendable. In case the terms "robust" and "extendable" don't ring any bells, take a look at the database table again. What will we generate? First names, last names, dates, numbers, email addresses... To make it a bit more challenging, we'll also be able to declare relations between tables. Now you know what we mean by "robust" and "extendable". Also consider this: why shouldn't the programmers who use the script be able to implement their own data entities? What if our script can't generate ZIP codes and a programmer needs some?

In this article, I'm going to use the following terms:

  1. Entity - A unit for generating values (e.g. first name, last name, telephone number, ...)
  2. Core - Our script's core, which will work with the database, all the tables, and our entities.

Entity

Entities have to share information with the core. Perhaps you thought that random values were all we needed. Not so fast. Ids are entities as well, and when a table has an auto-generated primary key, you don't specify that column when inserting data. An entity therefore has to tell the core whether its value should be inserted at all. Likewise, why should our code have to determine which SQL type to use if our entities can do that for us?

IRandomizableItem

We'll prepare an IRandomizableItem interface for our entities. The interface will require two parameterless methods:

  • randomValue() - returns a random value.
  • columnSqlType() - returns a string with the SQL type (e.g. "varchar(55)").

However, that's not all.

Let's say we want an entity that combines two columns. Why not? An entity can work with columns and values as well. Therefore, we'll pass several arguments to every entity through its load() method, which it can use to work with the data. Going by the load() call shown later in this lesson, the first argument is the current row being inserted into the database, followed by the name of the column being generated. Next comes the parameter string given when the data type was defined (we'll get to it later), then an array of all of the tables and their values, and finally the column definitions.

Every entity has to have a load() method. However, since its parameters will differ (not every entity needs all of them), we can't add the method to our interface. Finally, every entity will have two public properties: $insertToDefinition and $insertToTable. $insertToDefinition specifies whether the column is part of the table definition (used to create the table), and $insertToTable specifies whether the value should be inserted into the table when the time comes. These properties are the reason we pass the parameters to the load() method rather than the constructor. The interface will be as follows:

interface IRandomizableItem
{
    function randomValue();
    function columnSqlType();
}
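To make the load() arguments a bit more concrete, here's a rough sketch of how an entity might use them. Everything in it is an assumption made for illustration: the Email class body, the RowStub stand-in for the row object, the comma separator, and the example.com domain are not taken from the finished project.

```php
<?php
// Same two-method interface as in the article.
interface IRandomizableItem
{
    function randomValue();
    function columnSqlType();
}

// Hypothetical stand-in for the row object; only its $values array matters here.
class RowStub
{
    public $values = array();
}

// Hypothetical entity that builds an address from other columns of the same row.
class Email implements IRandomizableItem
{
    public $insertToDefinition = true;
    public $insertToTable = true;
    private $row;
    private $sourceColumns = array();

    // Argument order follows the load() call used later in the lesson.
    function load($row = null, $columnName = "", $params = "", $tables = array(), $columns = array())
    {
        $this->row = $row;
        // This entity picks the comma as its own parameter separator.
        $this->sourceColumns = array_filter(array_map('trim', explode(",", $params)), 'strlen');
    }

    function randomValue()
    {
        $parts = array();
        foreach ($this->sourceColumns as $column) {
            // Only columns that were already generated for this row can be used.
            if (isset($this->row->values[$column])) {
                $parts[] = strtolower($this->row->values[$column]);
            }
        }
        // Placeholder domain, assumed for the sketch.
        return implode(".", $parts) . "@example.com";
    }

    function columnSqlType()
    {
        return 'varchar(128)';
    }
}

$row = new RowStub();
$row->values = array("first_name" => "John", "last_name" => "Adam");

$email = new Email();
$email->load($row, "email", "first_name,last_name", array(), array());
echo $email->randomValue(), PHP_EOL; // john.adam@example.com
```

Note that such an entity only works if the columns it depends on are listed before it, since the row fills its values in column order.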

FirstName

Now let's implement the first entity. We'll start with the first names. Create a FirstName class which implements the IRandomizableItem interface.

Note: I recommend using an autoloader.

Set the $insertToDefinition and $insertToTable class properties to true since the first name will be present everywhere. Our class will have an array of names to pick a random name from. Go ahead and add the static $dataNames array along with some values:

class FirstName implements IRandomizableItem
{
    static $dataNames = array("Michael", "David", "Andrew", "Matthew", "Lewis", "John", "Adam", "Thomas");
    public $insertToDefinition = true;
    public $insertToTable = true;

    function randomValue()
    {
    }

    function columnSqlType()
    {
    }
}

The randomValue() method will pick one random value and the columnSqlType() method will return the SQL type string:

function randomValue()
{
    return self::$dataNames[rand(0,count(self::$dataNames) - 1)];
}

function columnSqlType()
{
    return 'varchar(64) COLLATE utf8_general_ci';
}

Don't forget to implement the load() method (leave it empty):

function load() {}

The following entities will be similar, but let's focus on the core for now.
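To show how little a new entity needs, here's a minimal sketch of the ZIP code case mentioned in the design section. The ZipCode class, its five-digit format, and the char(5) column type are assumptions for illustration only; adjust them to whatever postal format you need.

```php
<?php
// Same two-method interface as in the article.
interface IRandomizableItem
{
    function randomValue();
    function columnSqlType();
}

// Hypothetical user-defined entity producing five-digit postal codes.
class ZipCode implements IRandomizableItem
{
    public $insertToDefinition = true;
    public $insertToTable = true;

    // This entity needs none of the load() arguments.
    function load() {}

    function randomValue()
    {
        // Zero-padded so that low numbers still have five digits.
        return str_pad((string) mt_rand(0, 99999), 5, "0", STR_PAD_LEFT);
    }

    function columnSqlType()
    {
        return 'char(5)';
    }
}

$zip = new ZipCode();
$zip->load();
echo $zip->randomValue(), PHP_EOL; // five random digits
```

A column could then be declared as "zip" => "ZipCode" in the same way as the built-in types.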

The core

Add a file to the project, name it SampleDataGenerator.php, and create a class of the same name in it. Also, add a DatabaseManager class which will represent a simple database wrapper, and a DataRow class which is a container structure for creating new database rows. It'll be the object which is passed to entities as their first parameter.

SampleDataGenerator

Our SampleDataGenerator will have two public properties: $tables and $tableColumns. The $tables property will be an associative array which maps table names to arrays of DataRow instances. The $tableColumns property will be an associative array which maps table names to associative arrays in the [column name] => [column data type] format. Simply put, $tableColumns will contain nested associative arrays.
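The shape of these two properties might look roughly like this once a table has been generated. The sample values are made up, and plain arrays stand in for the DataRow objects to keep the sketch short.

```php
<?php
// $tableColumns: table name => (column name => type string with optional parameters).
$tableColumns = array(
    "user" => array(
        "user_id" => "Id",
        "first_name" => "FirstName",
        "email" => "Email:first_name,last_name",
    ),
);

// $tables: table name => list of rows.
// Plain arrays stand in for DataRow instances here (an assumption for brevity).
$tables = array(
    "user" => array(
        array("first_name" => "John", "email" => "john@example.com"),
        array("first_name" => "Adam", "email" => "adam@example.com"),
    ),
);

echo $tableColumns["user"]["email"], PHP_EOL; // Email:first_name,last_name
echo count($tables["user"]), PHP_EOL;         // 2
```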

class SampleDataGenerator
{
    public $tables = array();
    public $tableColumns = array();
}

The SampleDataGenerator will provide the generate() and save() methods. The save() method doesn't accept any parameters and only sends the data to the database. The generate() method accepts 3 parameters: the table name, the number of rows, and the columns. The table name and the number of rows are self-explanatory. The $columns parameter is an associative array in the [column name] => [column type with parameters] format, where the type with parameters is a string declared as type:parameters. The parameters are always passed as a single string, and the core doesn't define any separator for them. However, every entity can "define" and implement its own. If you're lost due to all of these technical terms, don't worry, the project will be available for download later on. Take this code for instance:

$generator = new SampleDataGenerator();
$generator->generate("user", 500, array(
    "user_id" => "Id",
    "first_name" => "FirstName",
    "last_name" => "LastName",
    "email" => "Email:first_name,last_name",
    "telephone" => "Telephone",
    "date_of_birth" => "Date:1970-01-01,2013-01-01",
    "credit_card_expiration" => "Date:2015-01-01, 2030-01-01")
);

$generator->save();

On the first line, we create a generator. Then, we tell it to create a "user" table and fill it with 500 people. It should fill in the user_id, first_name, last_name, email, telephone, and two dates, one in the past and one in the future. Notice the email, where we pass column names as parameters. The email will use them to generate its value later. For the dates, we specify a range, and the resulting random date will be within these boundaries. Last of all, we send everything to the database. First and foremost, let's create a class representing a row - DataRow.
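Before we get to it, here's a rough sketch of how an entity could turn a range such as "1970-01-01,2013-01-01" into a random date within those boundaries. The Date class body below is an assumption, not the project's actual implementation; it picks the comma as its separator and trims the pieces, so the space in "2015-01-01, 2030-01-01" is tolerated.

```php
<?php
// Fixed timezone so that the sketch behaves the same everywhere.
date_default_timezone_set("UTC");

// Same two-method interface as in the article.
interface IRandomizableItem
{
    function randomValue();
    function columnSqlType();
}

// Hypothetical entity for the "Date:from,to" column type.
class Date implements IRandomizableItem
{
    public $insertToDefinition = true;
    public $insertToTable = true;
    private $from;
    private $to;

    // Only the third argument (the parameter string) is needed here.
    function load($row = null, $columnName = "", $params = "")
    {
        $range = array_map('trim', explode(",", $params));
        if (count($range) !== 2) {
            throw new InvalidArgumentException("Expected 'from,to', got: " . $params);
        }
        $this->from = strtotime($range[0]);
        $this->to = strtotime($range[1]);
        if ($this->from === false || $this->to === false || $this->from > $this->to) {
            throw new InvalidArgumentException("Invalid date range: " . $params);
        }
    }

    function randomValue()
    {
        // Pick a random timestamp inside the range and format it as a date.
        return date("Y-m-d", mt_rand($this->from, $this->to));
    }

    function columnSqlType()
    {
        return 'date';
    }
}

$date = new Date();
$date->load(null, "date_of_birth", "1970-01-01, 2013-01-01");
echo $date->randomValue(), PHP_EOL; // some date between the two boundaries
```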

DataRow

A DataRow will need the columns of the table it belongs to and all of the tables. We'll pass these values to it through its constructor. Last of all, we'll need the $values property, which will be an associative array in the [column name] => [column value] format.

public $tables;
public $columns;
public $values = array();

function __construct($columns, $tables)
{
    $this->columns = $columns;
    $this->tables = $tables;
}

The DataRow will also have a generate() method which won't accept any parameters and will generate row values. The method will iterate over the $columns using a foreach loop and will retrieve the name and the type from them.

function generate()
{
    foreach ($this->columns as $columnName => $columnType)
    {
    }
}

The column type, as we've already mentioned, contains the name and the parameters. The name is the name of our entity and the parameters are optional. These two parts are separated by a colon ":". We'll assign the values from the type to the local variables $generatorName and $paramsForGenerator. The default value of $paramsForGenerator will be an empty string:

$generatorParts = explode(":", $columnType);
$generatorName = $generatorParts[0];

$paramsForGenerator = "";
if (isset($generatorParts[1])) {
    $paramsForGenerator = $generatorParts[1];
}
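To see what this parsing yields for a few type strings, the same steps can be wrapped in a small helper. The parseColumnType() function is a hypothetical name used only for this sketch, and the limit argument of 2 passed to explode() is an addition here: it keeps any further colons inside the parameter string instead of cutting them off.

```php
<?php
// Hypothetical helper: splits "Type:parameters" into array(type, parameters).
function parseColumnType($columnType)
{
    // At most two parts, so colons inside the parameters are preserved.
    $generatorParts = explode(":", $columnType, 2);
    $generatorName = $generatorParts[0];

    $paramsForGenerator = "";
    if (isset($generatorParts[1])) {
        $paramsForGenerator = $generatorParts[1];
    }
    return array($generatorName, $paramsForGenerator);
}

list($name, $params) = parseColumnType("FirstName");
echo $name, " | ", $params, PHP_EOL; // FirstName |

list($name, $params) = parseColumnType("Email:first_name,last_name");
echo $name, " | ", $params, PHP_EOL; // Email | first_name,last_name

list($name, $params) = parseColumnType("Date:1970-01-01,2013-01-01");
echo $name, " | ", $params, PHP_EOL; // Date | 1970-01-01,2013-01-01
```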

I'm sure you're wondering how we'll instantiate an entity from a string. PHP has a beautiful syntax for that. We simply write new $variableName(parameters) and it creates a new instance of the class whose name is stored in $variableName. As you can see, you can pass parameters as well. However, we pass the parameters for entities through the load() method and not the constructor.

$generator = new $generatorName();
$generator->load($this, $columnName, $paramsForGenerator, $this->tables, $this->columns);

We should also verify that the user isn't trying to perform a script injection and instantiate something that's not an entity. Doing so is very simple. Every entity has to implement the IRandomizableItem interface, and we can check that using the instanceof operator, which tells us whether the object on the left is an instance of the class or interface on the right. If the condition doesn't hold (the instance is not an entity), we print an error message and terminate the script using the die() function.

if (!$generator instanceof IRandomizableItem)
    die("Error: PHP injection attempt");
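One thing to keep in mind is that instanceof can only be applied to an object that already exists, so by then the constructor of the named class has already run. A stricter variant, sketched below under our own assumptions, checks the class name before instantiating anything. The createEntity() helper and the use of an exception instead of die() are not part of the original project; class_exists() and is_subclass_of() are standard PHP functions.

```php
<?php
// Same two-method interface as in the article.
interface IRandomizableItem
{
    function randomValue();
    function columnSqlType();
}

// Trimmed-down entity, just enough for the demonstration.
class FirstName implements IRandomizableItem
{
    function load() {}
    function randomValue() { return "John"; }
    function columnSqlType() { return 'varchar(64)'; }
}

// Hypothetical helper: validates the class name first, then instantiates it.
function createEntity($generatorName)
{
    if (!class_exists($generatorName) || !is_subclass_of($generatorName, 'IRandomizableItem')) {
        throw new InvalidArgumentException("Unknown entity type: " . $generatorName);
    }
    return new $generatorName();
}

$entity = createEntity("FirstName");
echo get_class($entity), PHP_EOL; // FirstName

try {
    createEntity("ArrayObject"); // a real class, but not an entity
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // Unknown entity type: ArrayObject
}
```

Throwing an exception leaves it up to the caller to decide whether to stop the script or just report the faulty column type.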

Finally, if the column wants its value to be inserted into the database, we go ahead and store it.

if ($generator->insertToTable)
    $this->values[$columnName] = $generator->randomValue();

Our row is now complete. Let's get back to the generator and implement generating rows. Create a generate() method in the SampleDataGenerator with the $table, $count, and $columns parameters. First, we'll store the columns for the generated table.

$this->tableColumns[$table] = $columns;

Then, we use a loop to create individual rows. In every iteration, we create a new row, pass the necessary parameters to it, and call the generate() method on it. Last of all, we add the row into the table.

for ($i = 0; $i < $count; $i++)
{
    $random = new DataRow($columns, $this->tables);
    $random->generate();
    $this->tables[$table][] = $random;
}
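Put together, the pieces from this lesson could look roughly like the following single-file sketch. It only assembles the snippets shown above, with a shortened list of names; the save() method and the DatabaseManager are left out, and the interface check is placed before the load() call, which is our own ordering choice.

```php
<?php
interface IRandomizableItem
{
    function randomValue();
    function columnSqlType();
}

class FirstName implements IRandomizableItem
{
    static $dataNames = array("Michael", "David", "Andrew", "Matthew");
    public $insertToDefinition = true;
    public $insertToTable = true;

    function load() {}

    function randomValue()
    {
        return self::$dataNames[rand(0, count(self::$dataNames) - 1)];
    }

    function columnSqlType()
    {
        return 'varchar(64) COLLATE utf8_general_ci';
    }
}

class DataRow
{
    public $tables;
    public $columns;
    public $values = array();

    function __construct($columns, $tables)
    {
        $this->columns = $columns;
        $this->tables = $tables;
    }

    function generate()
    {
        foreach ($this->columns as $columnName => $columnType) {
            // Split "Type:parameters" into the entity name and its parameters.
            $generatorParts = explode(":", $columnType);
            $generatorName = $generatorParts[0];
            $paramsForGenerator = isset($generatorParts[1]) ? $generatorParts[1] : "";

            $generator = new $generatorName();
            if (!$generator instanceof IRandomizableItem) {
                die("Error: PHP injection attempt");
            }
            // Entities are free to ignore arguments they don't need.
            $generator->load($this, $columnName, $paramsForGenerator, $this->tables, $this->columns);

            if ($generator->insertToTable) {
                $this->values[$columnName] = $generator->randomValue();
            }
        }
    }
}

class SampleDataGenerator
{
    public $tables = array();
    public $tableColumns = array();

    function generate($table, $count, $columns)
    {
        $this->tableColumns[$table] = $columns;
        for ($i = 0; $i < $count; $i++) {
            $random = new DataRow($columns, $this->tables);
            $random->generate();
            $this->tables[$table][] = $random;
        }
    }
}

$generator = new SampleDataGenerator();
$generator->generate("user", 3, array("first_name" => "FirstName"));
foreach ($generator->tables["user"] as $row) {
    echo $row->values["first_name"], PHP_EOL; // one random name per row
}
```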

We are now able to generate data. That's all for today, but there is still more work to be done. In the next lesson, we'll implement a specialized database wrapper and generate a table full of names :)


 

Article has been written for you by David Capka Hartinger