
Sorting 2D-arrays in PHP – anecdotes and reflections

One of the first things you might run into as a PHP developer is having to sort a two dimensional array (table) by an arbitrary field. Consider the following array for instance:

$customers = array(
    array("name" => "David", "age" => 32),
    array("name" => "Bernard", "age" => 45)
);

What if you want to sort it so that Bernard ends up on top instead? I ended up writing this function:

static function sort2d_asc(&$arr, $key){
    //selection sort: for each position we look for the smallest remaining row
    for($i = 0; $i < count($arr); $i++){
        $tmp = $arr[$i];
        $pos = false;
        //we check whether the value we want to sort by is numeric or a string
        $isNumeric = is_numeric($tmp[$key]);
        if(!$isNumeric){
            //we loop the array again to compare against the temp value we have
            for($j = $i; $j < count($arr); $j++){
                if(StringManip::is_date($tmp[$key])){
                    if(StringManip::compareDates($arr[$j][$key], $tmp[$key], 'asc')){
                        $tmp = $arr[$j];
                        $pos = $j;
                    }
                //we string compare, if the string is "smaller" it will be assigned to the temp value
                }else if(strcasecmp($arr[$j][$key], $tmp[$key]) < 0){
                    $tmp = $arr[$j];
                    $pos = $j;
                }
            }
        }else{
            //numeric values are compared directly
            for($j = $i; $j < count($arr); $j++){
                if($arr[$j][$key] < $tmp[$key]){
                    $tmp = $arr[$j];
                    $pos = $j;
                }
            }
        }
        //swap the smallest remaining row into position $i
        if($pos !== false){
            $arr[$pos] = $arr[$i];
            $arr[$i] = $tmp;
        }
    }
}

You pass in the array you want to sort as &$arr and the key you want to sort by as $key, and the array gets sorted in ascending order. This works OK as long as you have a two-dimensional array like the one above. But it won’t work if your array looks like this:

$customers = array(
    "cus1" => array("name" => "David", "age" => 32),
    "cus2" => array("name" => "Bernard", "age" => 45)
);

This array does not have numerical keys, so the results of the above mega-function will not be reliable. Usually this is not a problem, since 2D arrays retrieved from the database will be indexed numerically. The above function actually worked fine until I wanted to sort an array like the one just above. It wouldn’t work, of course, so I reviewed the manual and the array_multisort function, and realized that it can be used to sort arrays just like I wanted. This is the result:

static function multi2dSortAsc(&$arr, $key){
  $sort_col = array();
  foreach ($arr as $sub) $sort_col[] = $sub[$key];
  array_multisort($sort_col, $arr);
}

The crux is that we have to create the 1D array we want to sort by on the fly. Once we have this array, we pass it to array_multisort as the first argument and the parent 2D array as the second, so the parent is reordered to match the sorted column. This function will work on both of the customer arrays.
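To check the claim about string keys, here is a minimal sketch of the same idea as a standalone function. The name sortRowsBy is an assumption made for this example and is not part of the original class.

```php
<?php
// Hypothetical standalone variant of multi2dSortAsc(); the name is an assumption.
function sortRowsBy(array &$rows, $key) {
    $sortCol = array();
    foreach ($rows as $row) {
        $sortCol[] = $row[$key];
    }
    // The 1D column drives the ordering; $rows is reordered to match it.
    // String keys of $rows are kept, numeric keys are re-indexed.
    array_multisort($sortCol, $rows);
}

$customers = array(
    "cus1" => array("name" => "David", "age" => 32),
    "cus2" => array("name" => "Bernard", "age" => 45),
);

sortRowsBy($customers, "name");
$firstByName = reset($customers);   // the Bernard row

sortRowsBy($customers, "age");
$firstByAge = reset($customers);    // the David row
```

Because array_multisort re-indexes numeric keys but keeps string keys, the "cus1" and "cus2" labels stay attached to their rows.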

 

Posted on June 8, 2011 in Mixed, PHP

 

How to Debug in PHP

This article breaks down the fundamentals of debugging in PHP, helps you understand PHP’s error messages and introduces you to some useful tools to help make the process a little less painful.

Doing your Ground Work

It is important that you configure PHP correctly and write your code in such a way that it produces meaningful errors at the right time. For example, it is generally good practice to turn on a verbose level of error reporting on your development platform. This probably isn’t such a great idea, however, on your production server(s). In a live environment you want neither to confuse a genuine user nor to give malicious users too much information about the inner workings of your site.

So, with that in mind, let’s talk about the all too common “I’m getting no error message” issue. This is normally caused by a syntax error on a platform where the developer has not done their ground work properly. First, you should turn display_errors on. This can be done either in your php.ini file or at the head of your code like this:

<?php
ini_set('display_errors', 'On');

Next, you will need to set an error reporting level. By default PHP 4 and 5 do not show PHP notices, which can be important in debugging your code (more on that shortly). Notices are generated by PHP whether they are displayed or not, so deploying code that generates twenty notices adds to the overhead of your site. So, to ensure notices are displayed, set your error reporting level either in your php.ini or amend your runtime code to look like this:

<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);

Tip: E_ALL is a constant so don’t make the mistake of enclosing it in quotation marks.

With PHP 5 it’s also a good idea to turn on the E_STRICT level of error reporting. E_STRICT is useful for ensuring you’re coding using the best possible standards. For example E_STRICT helps by warning you that you’re using a deprecated function. (From PHP 5.4 onwards E_ALL already includes E_STRICT, so the extra flag is only needed on older versions.) Here’s how to enable it at runtime:

<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL | E_STRICT);

It is also worth mentioning that on your development platform it is often a good idea to make these changes in your php.ini file rather than at runtime. This is because if you experience a syntax error with these options set in your code and not in the php.ini you may, depending on your set up, be presented with a blank page. Likewise, it is worth noting that if you’re setting these values in your code, a conditional statement might be a good idea to avoid these settings accidentally being deployed to a live environment.
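As a rough illustration of such a conditional, the sketch below only displays errors when the code is told it is running in development. The APP_ENV environment variable and the configureErrorDisplay() name are assumptions made for this example; substitute whatever your own deployment uses to tell environments apart.

```php
<?php
// configureErrorDisplay() and APP_ENV are hypothetical names for this sketch.
function configureErrorDisplay($environment) {
    $isDevelopment = ($environment === 'development');
    // Report everything in both environments, but only show it in development.
    error_reporting(E_ALL);
    ini_set('display_errors', $isDevelopment ? '1' : '0');
    return $isDevelopment;
}

// Default to the safe setting when APP_ENV is not defined at all.
$environment = getenv('APP_ENV');
configureErrorDisplay($environment !== false ? $environment : 'production');
```

Defaulting to "production" means a server that was never configured errs on the side of hiding errors from visitors.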

What Type of Error am I Looking at?

As with most languages, PHP’s errors may appear somewhat esoteric, but there are in fact only four key types of error that you need to remember:

Syntax Errors

Syntactical errors or parse errors are generally caused by a typo in your code, for example a missing semicolon, quotation mark, brace or parenthesis. When you encounter a syntax error you will receive an error similar to this:

Parse error: syntax error, unexpected T_ECHO in Document/Root/example.php on line 6

In this instance it is important that you check the line above the line quoted in the error (in this case line 5) because while PHP has encountered something unexpected on line 6, it is common that it is a typo on the line above causing the error. Here’s an example:

<?php

ini_set('display_errors', 'On');
error_reporting(E_ALL);
$sSiteName = "Think Vitamin"
echo $sSiteName;

In this example I have omitted the semi-colon from line 5, however, PHP has reported an error occurred on line 6. Looking one line above you can spot and rectify the problem.

Warnings

Warnings aren’t deal breakers like syntax errors. PHP can cope with a warning, but it knows that you probably made a mistake somewhere and is notifying you about it. Warnings often appear for the following reasons:

1. Headers already sent. Try checking for white space at the head of your code or in files you’re including.
2. You’re passing an incorrect number of parameters to a function.
3. Incorrect path names when including files.

Notices

Notices aren’t going to halt the execution of your code either, but they can be very important in tracking down a pesky bug. Often you’ll find that code that’s working perfectly happily in a production environment starts throwing out notices when you set error_reporting to E_ALL.

A common notice you’ll encounter during development is:

Notice: Undefined index: FirstName in /Document/Root/views/userdetails.phtml on line 55

This information can be extremely useful in debugging your application. Say you’ve done a simple database query and pulled a row of user data from a table. For presentation in your view you’ve assigned the details to an array called $aUserDetails. However, when you echo $aUserDetails['FirstName'] on line 55 there’s no output and PHP throws the notice above. In this instance the notice you receive can really help.

PHP has helpfully told us that the FirstName key is undefined so we know that this isn’t a case of the database record being NULL. However, perhaps we should check our SQL statement to ensure we’ve actually retrieved the user’s first name from the database. In this case the notice has helped us rule out a potential issue which has in turn steered us towards the likely source of our problem. Without the notice our likely first stop would have been the database record, followed by tracing back through our logic to eventually find our omission in the SQL.
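The distinction between a key that was never set and a key that holds NULL can also be checked in code. The following is a small sketch, assuming a row that was fetched without the FirstName column; the describeKey() helper and the sample data are made up for illustration.

```php
<?php
// Hypothetical row, as it might come back from a query that forgot FirstName.
$aUserDetails = array('LastName' => 'Smith', 'MiddleName' => null);

// describeKey() is an illustrative helper, not part of any library.
function describeKey(array $row, $key) {
    if (!array_key_exists($key, $row)) {
        return 'undefined';   // the column was never selected
    }
    return $row[$key] === null ? 'null' : 'set';
}

$firstName  = describeKey($aUserDetails, 'FirstName');   // 'undefined'
$middleName = describeKey($aUserDetails, 'MiddleName');  // 'null'
$lastName   = describeKey($aUserDetails, 'LastName');    // 'set'
```

Note that isset() alone cannot tell the first two cases apart, because it returns false both for a missing key and for a NULL value.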

Fatal Errors

Fatal Errors sound the most painful of the four but are in fact often the easiest to resolve. What it means, in short, is that PHP understands what you’ve asked it to do but can’t carry out the request. Your syntax is correct, you’re speaking its language but PHP doesn’t have what it needs to comply. The most common fatal error is an undefined class or function and the error generated normally points straight to the root of the problem:

Fatal error: Call to undefined function create() in /Document/Root/example.php on line 23

Using var_dump() to Aid Your Debugging

var_dump() is a native PHP function which displays structured, human-readable information about one (or more) expressions. This is particularly useful when dealing with arrays and objects, as var_dump() displays their structure recursively, giving you the best possible picture of what’s going on. Here’s an example of how to use var_dump() in context:

Below I have created an array of scores achieved by users but one value in my array is subtly distinct from the others, var_dump() can help us discover that distinction.

<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);
$aUserScores = array('Ben' => 7,'Linda' => 4,'Tony' => 5,'Alice' => '9');
var_dump($aUserScores);

Tip: Wrap var_dump() in <pre> tags to aid readability.

The output from var_dump() will look like this:

array(4) {
  ["Ben"]=>
  int(7)
  ["Linda"]=>
  int(4)
  ["Tony"]=>
  int(5)
  ["Alice"]=>
  string(1) "9"
}

As you can see var_dump tells us that $aUserScores is an array with four key/value pairs. Ben, Linda, and Tony all have their values (or scores) stored as integers. However, Alice is showing up as a string of one character in length.

If we return to my code, we can see that I have mistakenly wrapped Alice’s score of 9 in quotation marks causing PHP to interpret it as a string. Now, this mistake won’t have a massively adverse effect, however, it does demonstrate the power of var_dump() in helping us get better visibility of our arrays and objects.

While this is a very basic example of how var_dump() functions it can similarly be used to inspect large multi-dimensional arrays or objects. It is particularly useful in discovering if you have the correct data returned from a database query or when exploring a JSON response from say, Twitter:

<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);
$sJsonUrl = 'http://search.twitter.com/trends.json';
// Fetch the raw JSON; file_get_contents() returns false on failure
$sJson = file_get_contents($sJsonUrl);
$oTrends = ($sJson !== false) ? json_decode($sJson) : null;
var_dump($oTrends);

Useful Tools to Consider when Debugging

Finally, I want to point out a couple of useful tools that I’ve used to help me in the debugging process. I won’t go into detail about installing and configuring these extensions and add-ons, but I wanted to mention them because they can really make our lives easier.

Xdebug

Xdebug is a PHP extension that aims to lend a helping hand in the process of debugging your applications. Xdebug offers features like:

  • Automatic stack trace upon error
  • Function call logging
  • Display features such as enhanced var_dump() output and code coverage information.

Xdebug is highly configurable, and adaptable to a variety of situations. For example, stack traces (which are extremely useful for monitoring what your application is doing and when) can be configured to four different levels of detail. This means that you can adjust the sensitivity of Xdebug’s output helping you to get granular information about your app’s activity.

Stack traces show you where errors occur, allow you to trace function calls and detail the originating line numbers of these events. All of which is fantastic information for debugging your code.

Tip: By default Xdebug limits var_dump() output to three levels of recursion. You may want to change this in your xdebug.ini file by setting xdebug.var_display_max_depth to a number that makes sense for your needs.

Check out Xdebug’s installation guide to get started.

FirePHP

For all you Firebug fans out there, FirePHP is a useful little PHP library and Firefox add-on that can really help with AJAX development.

Essentially FirePHP enables you to log debug information to the Firebug console using a simple method call like so:

<?php
$sSql = 'SELECT * FROM tbl';
FB::log('SQL query: ' . $sSql);

In an instance where I’m making an AJAX search request, for example, it might be useful to pass back the SQL string my code is constructing so that I can ensure my code is behaving correctly. All data logged to the Firebug console is sent via response headers and therefore doesn’t affect the page being rendered by the browser.

Warning: As with all debug information, this kind of data shouldn’t be for public consumption. The downside of having to add the FirePHP method calls into your PHP is that before you go live you will either have to strip all these calls out or set up an environment based conditional statement which establishes whether or not to include the debug code.
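One way to set up such a conditional is to route every debug call through a single wrapper, so there is only one place to switch off. This is a sketch under assumptions: debugLog(), its parameters and the capturing closure are invented for the example, and in a real FirePHP setup the logger could be a closure that calls FB::log().

```php
<?php
// debugLog() and its $logger parameter are hypothetical names for this sketch.
function debugLog($message, $debugEnabled, $logger) {
    if (!$debugEnabled) {
        return false;               // live environment: do nothing
    }
    call_user_func($logger, $message);
    return true;
}

// A stand-in logger that collects messages in an array instead of sending them.
$captured = array();
$logger = function ($message) use (&$captured) {
    $captured[] = $message;
};

debugLog('SQL query: SELECT * FROM tbl', true, $logger);   // logged
debugLog('this message is dropped', false, $logger);       // ignored
```

With this shape the debug calls can stay in the code base, and only the flag needs to differ between environments.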

You can install the Firefox add-on at FirePHP’s website and grab the PHP libs there too. Oh, and don’t forget: if you haven’t already installed Firebug, you’ll need that as well.

 

Posted on June 8, 2011 in Mixed, PHP

 

Array Pagination in PHP

Pagination in PHP is a topic covered by a lot of tutorials and is therefore quite saturated. Although I’m not going to introduce any wild new concepts into this tutorial I will explain how you can use pagination for data held within an array.

Normally you’d only need to paginate your data if you’ve got quite a lot of it, in which case you’re most likely using some sort of database. With database systems, of course, pagination can be achieved relatively easily by specifying the offset parameters in an SQL query. But what if your data didn’t come from a database table and instead came from a flat file?

Take a look at the following code which shows exactly how it’s done.

  1. // Data, normally from a flat file or some other source
  2. $data = "Item 1|Item 2|Item 3|Item 4|Item 5|Item 6|Item 7|Item 8|Item 9|Item 10";
  3. // Put our data into an array
  4. $dataArray = explode('|', $data);
  5. // Get the current page (zero-based); default to 0 if none is provided
  6. $currentPage = isset($_REQUEST['page']) ? (int) trim($_REQUEST['page']) : 0;
  7. // Pagination settings
  8. $perPage = 3;
  9. $numPages = ceil(count($dataArray) / $perPage);
  10. if($currentPage < 0 || $currentPage >= $numPages)
  11.     $currentPage = 0;
  12. $start = $currentPage * $perPage;
  13. $end = ($currentPage * $perPage) + $perPage;
  14. $pagedData = array(); // Extract the ones we need
  15. foreach($dataArray AS $key => $val)
  16. {
  17.     if($key >= $start && $key < $end)
  18.         $pagedData[] = $dataArray[$key];
  19. }

Now for the explanation. To begin with I have created the $data string, which contains a long line of items separated by the pipe ‘|’ character. Realistically this would be real data, but for this tutorial I’ll keep it simple. Then, using the explode() function, I’ve cut up the $data variable into an array using ‘|’ as the delimiter.

Line 6 simply gets the current page number if one is provided.

Lines 7 to 13 are all to do with the simple math calculations which make this array pagination work. First we set how many items we’d like to display per page in the variable $perPage. In the example above I’ve set this to 3.

On line 9 we’re working out how many pages there are going to be. This can be done by dividing the total number of items (obtained with the count() function) by the items per page value. Notice on this line that I’m also using the ceil() function. This basically rounds the number up (e.g. 5.134 becomes 6).

We then have a simple if statement on lines 10 and 11. It’s basically saying that if the page number is missing or falls outside the available pages, it is set to 0. This stops people from trying to access pages which have no items.

On lines 12 and 13 we’re setting the $start and $end variables, which you might recognize if you’ve done pagination using SQL queries before. The $start variable holds the lowest index of an item which can be displayed on this page. The $end variable is the cap on the index of items to be displayed (actually, it’s one above the highest index, but this depends on how you write your if statement on line 17).

Great, we’re nearly there. Now, on line 15 we start a foreach statement which loops through each of our data items. Inside this loop is a simple if statement to see if the index of the current data item is above (or equal to) the $start value and below the $end value. If it is then we place a copy of it into our $pagedData array.

Once the foreach loop has finished the $pagedData array now contains all of the data items which we should be displaying on the current page. All we have to do now is to loop through and display them. This has been shown in the following code snippet.

foreach($pagedData AS $item)
    echo $item . "<br>";

As far as displaying the data goes that’s it. All we need to do now is to display the pagination links to let you navigate your way through the pages. Here’s the code for that.

if($currentPage > 0 && $currentPage < $numPages)
    echo '<a href="?page=' . ($currentPage - 1) . '">« Previous page</a><br>';
if($numPages > $currentPage && ($currentPage + 1) < $numPages)
    echo '<a href="?page=' . ($currentPage + 1) . '" class="right">Next page »</a><br>';

The above snippet consists of two quite simple if statements. The first is for displaying the “previous page” link and the second is for displaying the “next page” link.

Starting with the first, the if statement checks that the current page is more than 0 (you wouldn’t have a previous link on the very first page) and that the current page is less than the total number of pages (to avoid displaying it on pages with no data).

The second if statement checks that the total number of pages is more than the current page number and that there are more pages after the current page (so you’re not viewing the last page).
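For comparison, the same windowing can be done with the built-in array_slice() function instead of a foreach loop. This is only a sketch: the paginate() helper and the keys of the array it returns are assumptions made for this example, and pages are zero-based as in the tutorial.

```php
<?php
// paginate() is a hypothetical helper; pages are zero-based.
function paginate(array $items, $currentPage, $perPage) {
    $numPages = (int) ceil(count($items) / $perPage);
    if ($currentPage < 0 || $currentPage >= $numPages) {
        $currentPage = 0;
    }
    return array(
        // array_slice() takes an offset and a length, like an SQL LIMIT clause.
        'items'       => array_slice($items, $currentPage * $perPage, $perPage),
        'page'        => $currentPage,
        'numPages'    => $numPages,
        'hasPrevious' => $currentPage > 0,
        'hasNext'     => $currentPage + 1 < $numPages,
    );
}

$data = "Item 1|Item 2|Item 3|Item 4|Item 5|Item 6|Item 7|Item 8|Item 9|Item 10";
$dataArray = explode('|', $data);

$firstPage = paginate($dataArray, 0, 3);   // Item 1 to Item 3
$lastPage  = paginate($dataArray, 3, 3);   // only Item 10
```

The hasPrevious and hasNext flags carry the same conditions as the two link checks, so the display code only has to test a boolean.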

 

Posted on June 8, 2011 in Mixed, PHP

 

9 PHP Debugging Techniques You Should Be Using

Enable Notices

The wonderful developers of PHP (no, not PHP developers) bestowed a great gift when they created the language we all know and love, and that was the gift of notices.

What are notices? A notice is a type of error message which is less severe than parse, fatal and warning error messages. A notice may tell you something along the lines of “Hey, you have used that variable without defining it! What gives?”

Notices are a very useful tool for avoiding bugs during development, as they can catch problems caused by mistyped variable names or non-existent array indexes. This will save you a lot of headaches.

If you have been developing with notices turned off then please go and turn them on. Please! You can do this in one of three ways. Firstly, you can do this in your php.ini file:

error_reporting = E_ALL

Or you can insert the following line at the start of your PHP code:

ini_set('error_reporting', E_ALL);

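Lastly, you can set this on a per-virtual-host basis in Apache HTTPD using the following (PHP constants are not available here, so the numeric value is used; the value of E_ALL differs between PHP versions, so check it for yours):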

php_value error_reporting 8191

If you want to test that notices are turned on you can simply create a file with the following:

<?php
echo "You should see a notice below this line\n";
echo $thisVariableDoesNotExist;
?>

At this point I should stress that you should always ensure that the display_errors ini setting is set to ‘Off’ for any live/production sites. It is a very bad idea to allow the public to view your error messages because a) it looks bad, and b) it can give away a lot of information to potential crackers.

Now that you have turned notices on you may find that your code creates a lot of them. I would start correcting these at the first chance you get, as you will never see new notices amid such clutter. You may find empty() and isset() useful.
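As a quick sketch of how isset() and empty() avoid "undefined index" notices, consider the following. The $aForm array and its keys are invented for this example.

```php
<?php
// Hypothetical form data; note that 'email' is missing entirely.
$aForm = array('name' => 'Alice', 'newsletter' => '0');

// isset() is true when the key exists and its value is not null.
$hasName  = isset($aForm['name']);     // true
$hasEmail = isset($aForm['email']);    // false, and no notice is raised

// empty() is true for missing keys and for "falsy" values such as '0' or ''.
$newsletterEmpty = empty($aForm['newsletter']);   // true, because '0' is falsy

// A typical guard: fall back to a default instead of reading a missing key.
$email = isset($aForm['email']) ? $aForm['email'] : 'not provided';
```

The '0' case is worth remembering: empty() treats it as empty, so isset() is the safer check when zero is a legitimate value.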


Use a Logging System

A logging system can be very useful in tracking down bugs, especially when they happen in a production environment. Such a system can also be useful in debugging during development but I find it much easier to use an IDE to debug my development environment (more on this shortly).

I feel that it is important to log any significant actions performed (e.g. user created, group deleted, registration email sent) as well as any errors that occur. Some people advocate logging a message every time you enter or leave a function. I find this method to be a bit too verbose, especially with more complex applications. I prefer to ditch the reams of log messages for a good IDE and debugger.

When you actually come to log your messages I would recommend having a fallback mechanism (you never know, the logger itself could have just broken!) For example, you could try logging to a database, and failing that you can append to a log file, and failing that you can send an email. You may find exceptions useful for this.
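A fallback chain like this could be sketched as follows. Everything here is an assumption made for illustration: the logWithFallback() helper, the convention that a logger throws an exception on failure, the always-failing database stand-in and the temporary log file path.

```php
<?php
// logWithFallback() is a hypothetical helper: each logger is a callable that
// throws an exception on failure, and the first one that succeeds wins.
function logWithFallback($message, array $loggers) {
    foreach ($loggers as $name => $logger) {
        try {
            call_user_func($logger, $message);
            return $name;            // report which mechanism worked
        } catch (Exception $e) {
            // this logger is broken; fall through to the next one
        }
    }
    return false;                    // every mechanism failed
}

$logFile = sys_get_temp_dir() . '/app-fallback-example.log';
$loggers = array(
    'database' => function ($message) {
        // Stand-in for a real database insert that is currently failing.
        throw new Exception('database unavailable');
    },
    'file' => function ($message) use ($logFile) {
        if (file_put_contents($logFile, $message . "\n", FILE_APPEND) === false) {
            throw new Exception('could not write log file');
        }
    },
);

$usedLogger = logWithFallback('user created', $loggers);   // 'file'
```

An email logger could be appended to the array as a last resort; the order of the array is the order of preference.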


Log Errors

We have to accept that, despite our best efforts, errors can (and do) occur in production environments. When these hiccups do arise we have to ensure that they are dealt with quickly, otherwise users (or even, gasp, clients) get angry.

Some types of errors we cannot do anything about (think parse errors) and we just have to ensure that we have a close eye on our error logs so we notice when they occur. Fortunately, it is these types of bugs that are normally caught very quickly during development and testing.

As for all the other errors, we need to make sure that they are caught and dealt with properly. Make sure the user is shown a nice error page (with a suitably cute ‘oops-back-soon’ picture) and then log, log everything in sight! I recommend storing the following:

  • The current stack trace (see debug_backtrace() and debug_print_backtrace()).
  • The output of get_defined_vars(). However, this is only useful if you call it at the point the error occurs, not at the point the error is logged. This includes global variables.
  • Any and all information about the remote user (IP address, user agent, session data)
  • All global variable data (which includes the contents of $_COOKIE, $_SERVER etc.)
  • Any other status data which is specific to your application

This information will be invaluable when you come to tracking down errors in production code.

How you choose to actually capture your errors is your own choice. You can use trigger_error and a custom error handler, but I prefer to use a custom exception class which takes care of logging for me.
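The two approaches can also be combined. The sketch below uses a custom exception class that records itself when it is created, plus an error handler that turns ordinary PHP errors into that exception. The LoggedException name and its static $log array (a stand-in for a real log destination) are assumptions made for this example.

```php
<?php
// LoggedException is a hypothetical class; $log stands in for a real log.
class LoggedException extends Exception {
    public static $log = array();

    public function __construct($message, $code = 0) {
        parent::__construct($message, $code);
        // Record the message and stack trace at the point the error occurs.
        self::$log[] = array(
            'message' => $message,
            'trace'   => $this->getTraceAsString(),
        );
    }
}

// Convert ordinary PHP errors (warnings, notices) into logged exceptions.
set_error_handler(function ($severity, $message, $file, $line) {
    throw new LoggedException("$message in $file on line $line", $severity);
});

$caughtCode = null;
try {
    trigger_error('something went wrong', E_USER_WARNING);
} catch (LoggedException $e) {
    $caughtCode = $e->getCode();   // the original severity level
}
restore_error_handler();
```

Logging inside the constructor means the trace is captured where the problem happened, not where it was eventually handled.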


Check Function Parameters

Checking function parameters can help you catch a lot of bugs before the erroneous data passes too far through your application. I like to test that the input parameters are of the expected type and are reasonably sane.

To avoid too much clutter I generally only do this on utility methods (as these are used often and in many places) and methods which permanently manipulate data such as files and databases (as errors here have the potential to destroy a lot of data).
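A parameter check of this kind might look like the following sketch. The withAge() function, the customer array and the accepted age range are all invented for illustration.

```php
<?php
// withAge() is a hypothetical utility function used only for this sketch.
function withAge(array $user, $age) {
    // Type check: reject strings such as '32' or floats such as 32.5.
    if (!is_int($age)) {
        throw new InvalidArgumentException('age must be an integer');
    }
    // Sanity check: the range is an arbitrary assumption for this example.
    if ($age < 0 || $age > 150) {
        throw new InvalidArgumentException('age is out of range: ' . $age);
    }
    $user['age'] = $age;
    return $user;
}

$customer = withAge(array('name' => 'David'), 32);

$rejected = false;
try {
    withAge(array('name' => 'Bernard'), 'forty-five');
} catch (InvalidArgumentException $e) {
    $rejected = true;   // bad data is stopped at the function boundary
}
```

Failing early with a specific exception keeps the bad value from travelling further into the application before anyone notices.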


Use an Integrated Development Environment and Debugger

I created my first ever website on a Geocities account. To do this I used a textarea field in the Geocities admin area. It was OK for the simple HTML I was writing, but I would not like to create an entire PHP application in this way!

I have now moved on to using an Integrated Development Environment (IDE) for the majority of my development, and I highly recommend that you do the same. I use Zend Studio and you only have to look at the feature list to see why I find it completely indispensable, especially for debugging. If you have not used an IDE before I recommend you try one.

I also use a remote debugger (ZendDebugger), which ties into the IDE. The remote debugger is a PHP module that allows you to debug code on your server using the IDE on your local machine. You can set breakpoints, inspect variables, examine stack traces, profile code and all the other benefits you would expect from a debugger. And no, Zend does not sponsor me.


Unit Testing

Unit testing may not be everyone’s idea of fun, but it can be very effective for developing larger projects. It can give you confidence when you have to make significant changes to the code base, as well as point out problems before your code goes into production.

There are two catches with unit tests, the biggest of which is that you have to actually write the unit tests themselves. Although this should save time in the long run (or at least lead to a more robust product), it is hard to avoid thinking that you could be spending the time developing functionality.

The second catch is that it often forces you to refactor your code into more test-friendly chunks. This is probably a good thing but it will take more time. The best approach would be to write unit tests from the very start of the project or, for an existing project, you can write a unit test for every bug that is fixed.

If you are using unit tests you should also be aware of the concept of code coverage. This is a metric which shows what percentage of your code is run during the testing process. A higher value indicates a more thorough set of unit tests. You can calculate your code coverage using a debugger, as was discussed in the previous section.


No Magic! (Or, Avoid Side Effects)

A side effect can be described as a non-obvious effect that was caused by performing an action (see Wikipedia for a more technical description). For example:
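The original listing is missing from this copy of the post. The sketch below is a reconstruction based only on the description that follows it; the function body, the values and the use of a by-reference parameter are all guesses at what the author showed.

```php
<?php
// Reconstructed example: every detail here is an assumption.
function getSquareArea(&$radius) {
    $radius = $radius * 2;       // side effect: the caller's $radius changes
    return $radius * $radius;    // area of the square enclosing the circle
}

$radius = 2;
$area = pi() * $radius * $radius;
echo "Circle area: $area\n";

$squareArea = getSquareArea($radius);
echo "Square area: $squareArea\n";

// $radius is now 4, not 2, although nothing at the call site suggests it.
echo "Radius: $radius\n";
```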

So what is going on here? You can see that we start with a $radius and an $area variable, which we use to show the area of the circle. We then want to display the area of a square with the same dimensions, so we call getSquareArea. Although this function does what its name implies, it also alters the $radius variable (intentionally or not). This is definitely non-obvious in the rest of our code and can cause severe headaches when it comes to debugging, especially in more complex applications. Of course, you should also avoid global variables for similar reasons.

This also applies to modifying function parameters (which were passed by reference). If you find yourself doing this then you should probably refactor your code. To do this you can either return the parameter rather than modifying it, or you can split the function into several smaller functions. Also, don’t forget that objects in PHP 5 are effectively passed by reference: the function receives a handle to the same object, so any changes it makes are visible to the caller.
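Both points can be sketched briefly. The first function returns its result instead of modifying its argument; the second shows that an object passed to a function is still the caller's object. The names getSquareAreaPure(), Circle and doubleRadius() are made up for this example.

```php
<?php
// Side-effect-free version: the parameter is taken by value and the
// result is returned, so the caller's variable is never touched.
function getSquareAreaPure($radius) {
    $side = $radius * 2;
    return $side * $side;
}

// Objects behave differently: the function receives a handle to the
// same object, so property changes are visible to the caller.
class Circle {
    public $radius;

    public function __construct($radius) {
        $this->radius = $radius;
    }
}

function doubleRadius(Circle $circle) {
    $circle->radius *= 2;
}

$radius = 2;
$squareArea = getSquareAreaPure($radius);   // $radius is still 2

$circle = new Circle(2);
doubleRadius($circle);                      // $circle->radius is now 4
```

If the caller needs the original object left alone, passing a clone ($copy = clone $circle) is one option.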

Use Manual Redirects When Debugging

Many developers (including myself) will make use of redirects when developing web applications. To refresh your memory, a redirect in PHP is done by sending a Location header with header() and then ending the script.

This technique can be very useful for sending the user to the correct page, but it can also be very problematic for debugging. For example, do you keep getting sent off to a bizarre area of your application? Do you know if it is just one redirect sending you there or many? What do you do when you get trapped in an infinite redirect loop?

My answer to these problems was to introduce the concept of manual redirects which would only be used for debugging. Rather than sending a header to the client, I would send a link to the target page as well as a stack trace. This would allow me to monitor the redirects that were happening in my application and clearly see what was happening if the application went wrong.

The code I use looks something like this:

<?php
function redirect($url, $debug = false) {
	// If manual page redirects have been enabled then print some HTML
	if ($debug) {
		$safeUrl = htmlspecialchars($url);
		echo "Redirect: You are being redirected to: <a href=\"$safeUrl\">$safeUrl</a>\n";
		echo "<pre>Backtrace:\n";
		debug_print_backtrace();
		echo "</pre>";
	} else {
		header("Location: $url");
	}

	exit(0);
}
?>

You may find it useful to pull the $debug value from your configuration system of choice rather than having to pass it for each function call, but it works in this example.

Keep Things Simple

I think this rule probably exists for every profession out there, so it should be no surprise that it applies to software development.

It is good practice to write software using a clear structure and standard design patterns, but this is only a high-level approach to keeping things simple. We also need to keep our individual algorithms as simple as possible, as this will make the code easier to understand in six months when it needs fixing, and will also make it easier to fix.

Here are some ways you can achieve this:

  1. Keep an eye on functions that are growing. You may find that you can split the code into several smaller functions.
  2. Functions that are only called in one place may be too specific. You can either bring the code inline, or generalise using several smaller functions. You can always keep the specific function and just use that to call and aggregate the new, smaller, functions.
  3. Watch out for functions with very long names or lots of arguments. This can be a sign that the function could be split into several smaller functions, or it could even be replaced with a class.
  4. Use built in functions where possible. This will help avoid spurious amounts of PHP code and there is a good chance the internal function will be faster as it is written in C (and by the pros!) Some of the most underappreciated internal functions are the array functions.
  5. If you really must have long and complex sections of code, then make sure you add some documentation. You and your fellow developers will be thankful of this when it comes to debugging.

Most of these points are about splitting large functions into smaller ones. It is also important to ensure you do not end up with lots of tiny functions, but I feel this is a much more unusual problem.
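To illustrate the point about built-in functions, here is a small sketch that replaces hand-written loops with array functions. The score data is borrowed from the var_dump() example in the earlier post and the threshold of 5 is arbitrary.

```php
<?php
$aUserScores = array('Ben' => 7, 'Linda' => 4, 'Tony' => 5, 'Alice' => 9);

// A hand-written loop to total the scores...
$total = 0;
foreach ($aUserScores as $score) {
    $total += $score;
}

// ...and the built-in equivalent.
$totalBuiltin = array_sum($aUserScores);

// Keep only the users who scored 5 or more (keys are preserved).
$highScorers = array_filter($aUserScores, function ($score) {
    return $score >= 5;
});

// Sort by score, highest first, keeping the names attached.
arsort($aUserScores);
$names = array_keys($aUserScores);
$topName = $names[0];
```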


 

Posted on June 7, 2011 in Mixed, PHP

 

The Truth About PHP Variables

I wanted to write this post to clear up what seems to be a common misunderstanding in PHP: that using references when passing around large variables is a good way to save memory. To fully explain this I will need to explain how PHP handles variables internally. I hope that you will find this interesting and useful and that it helps dispel some myths around references and memory management in PHP.

Basic References in PHP

(Note: If you are already familiar with references in PHP then feel free to skip this section)

In PHP it is possible to assign variables by value or by reference. The former method is the most common, and should look very familiar to you:

<?php
//Example 1: Assigning variables by value (the 'standard' way)
$var1 = 'hello!';
$var2 = $var1;
$var2 = 'goodbye!';
echo $var1; // Produces: hello!
echo "\n";
echo $var2; // Produces: goodbye!
?>

This should be no surprise to you, just simple assigning of variables in PHP. The next example is very similar, but we assign $var2 by reference rather than by value.

<?php
//Example 2: Assigning variables by reference
$var1 = 'hello!';
$var2 =& $var1; // Notice the ampersand. This means $var2
                // is a reference to $var1
$var2 = 'goodbye!'; // because $var2 is a reference to $var1,
                    // both variables now have the value 'goodbye!'
echo $var1; // Produces: goodbye!
echo "\n";
echo $var2; // Produces: goodbye!
?>

This may be more surprising to some of you, so I will explain what is happening. The first step is no different, we simply initialise $var1 with a value of ‘hello!’. However, in the next step we assign $var1 to $var2 using the ‘=&’ operator, which causes a reference to $var1 to be passed, rather than the actual contents of $var1. This means that both variables point to the same data in memory, so any changes to either variable will affect the other.

How PHP Handles Variables Internally (using zvals!)

While the above explanation of references is sufficient for a general understanding, it is often useful to understand how PHP handles variable assignment internally. This is where we introduce the concept of the zval.

zvals are an internal PHP structure which are used for storing variables. Each zval contains various pieces of information, and the ones we will be focusing on here are as follows:

  • The actual data stored within the zval – in our example this would be either ‘hello!’ or ‘goodbye!’.
  • is_ref – a flag indicating whether variables address this zval by reference or by value.
  • ref_count – the number of variables that currently address this zval.

The zval also knows the type of data it contains, but this is not especially relevant here so it has been omitted from the above list.

The first item in our list, the actual data, does not require much explanation. The second item (is_ref) indicates whether variables should address this zval by value or by reference, the implications of which are addressed shortly. The third item (ref_count) stores the number of variables that currently address this zval. If ref_count ever reaches zero (for example, if you call unset()) then PHP assumes that it can remove the zval and free up the memory it was using.

Now this bit is important: You may think that the ref_count value is only used when dealing with a reference (i.e. when is_ref=true), but this is not the case. The ref_count variable is used regardless of the value of is_ref. So what does this mean?

Being A Little Bit Clever

This is where, as the headline suggests, PHP is a little bit clever. When you assign a variable by value (such as in example 1) it does not create a new zval, it simply points both variables at the same zval and increases that zval’s ref_count by one. “Wait!” I hear you cry, “Isn’t that passing by reference?” Well, although it sounds the same, all PHP is doing is postponing any copying until it really has to – and it knows this because is_ref is still false. “Hum, so how does it work?” Ok, here is an example:

<?php
//Example 3a: Assigning variables by value (but with more detail)

//Here our zval is created for $var1.
$var1 = 'hello!';
//Our zval now has ref_count=1, is_ref=false

//We now assign $var1 to $var2
$var2 = $var1;
//Our zval now has ref_count=2, is_ref=false

debug_zval_dump($var2); //Produces: string(6) "hello!" refcount(3)
//(Why refcount(3)? See "An important note on debug_zval_dump()")

//We now assign a new value to $var2. So what happens to our zval?
$var2 = 'goodbye!';
//Read on to find out...

?>

An important note on debug_zval_dump(): php.net says this function “dumps a string representation of an internal Zend value to output.” This is true, but calling this function inherently causes another reference to the variable to be created, so you can (in these examples) subtract one from the ref_count value given in the output.

In the above example we see how both $var1 and $var2 refer to the same zval (as can be seen by the call to debug_zval_dump()). So what happens on the last line when we assign a new value to $var2? Does $var1 change too? Of course the answer is no, but why?

When we assign ‘goodbye!’ to $var2 in the example above, PHP examines the is_ref value of the underlying zval. If is_ref is false (as it is in this example) PHP knows that it can only change the value of the zval if the ref_count is 1 (as the change will then not affect any other variables). However, in our example the ref_count is 2, so PHP realises that it is not allowed to change the zval’s value and instead creates another zval, with which $var2 is then associated. This is illustrated by the finished example below:

<?php
//Example 3b: Assigning variables by value (the complete example)

//Here our zval is created for $var1.
$var1 = 'hello!';
//Our zval now has value='hello!', ref_count=1, is_ref=false

//We now assign $var1 to $var2
$var2 = $var1;
//Our zval now has value='hello!', ref_count=2, is_ref=false

debug_zval_dump($var2); //Produces: string(6) "hello!" refcount(3)
//(Why refcount(3)? See "An important note on debug_zval_dump()")

//We now assign a new value to $var2
$var2 = 'goodbye!';
//We now have two zvals:
//   The first: value='hello!', ref_count=1, is_ref=false
//   The second: value='goodbye!', ref_count=1, is_ref=false

?>

So we can see that, in the case of passing-by-value, PHP only copies data if a value is changed.
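
If you would like to watch this happen, the following is a minimal sketch of my own rather than part of the examples above. measure_copy_on_write() is a hypothetical helper name, and the exact byte counts depend on your PHP version, so I have deliberately not quoted any figures. The growth after the plain assignment should stay close to zero, while the growth after the first write should be roughly the size of the string.

```php
<?php
// Sketch: measure memory around a by-value assignment and a later write.
// measure_copy_on_write() is a hypothetical helper, not a PHP built-in.
function measure_copy_on_write($size)
{
    $original = str_repeat('x', $size);   // one large string

    $before = memory_get_usage();
    $copy = $original;                    // by-value assignment: the data is shared
    $afterAssign = memory_get_usage();

    $copy[0] = 'y';                       // first write: PHP now has to copy the data
    $afterWrite = memory_get_usage();

    return array(
        'assign'   => $afterAssign - $before,
        'write'    => $afterWrite - $afterAssign,
        'original' => $original[0],       // still 'x'
        'copy'     => $copy[0],           // now 'y'
    );
}

$result = measure_copy_on_write(1000000);
echo "Growth after assignment: " . $result['assign'] . " bytes\n";
echo "Growth after write:      " . $result['write'] . " bytes\n";
```

The work is wrapped in a function so that the measurement is not disturbed by anything else happening in the global scope.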

For the sake of completeness, here is an example where we pass-by-reference:

<?php
//Example 4: Assigning variables by reference (the complete example)

//Here our zval is created for $var1.
$var1 = 'hello!';
//Our zval now has value='hello!', ref_count=1, is_ref=false

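//We now assign $var1 to $var2 by reference
$var2 =& $var1;
//Our zval now has value='hello!', ref_count=2, is_ref=true

//Note: the call-time pass-by-reference syntax (&$var2) used below is only
//accepted up to PHP 5.3; it was removed in PHP 5.4.
debug_zval_dump(&$var2); //Produces: &string(6) "hello!" refcount(3)
//(Why refcount(3)? See "An important note on debug_zval_dump()")

//We now assign a new value to $var2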
$var2 = 'goodbye!';
//We still have one zval, but with a
//new value: value='goodbye!', ref_count=2, is_ref=true

debug_zval_dump(&$var1); //Produces: &string(8) "goodbye!" refcount(3)
debug_zval_dump(&$var2); //Produces: &string(8) "goodbye!" refcount(3)
//(Why refcount(3)? See "An important note on debug_zval_dump()")

?>

As expected, we can see that the zval for both $var1 and $var2 has changed to a value of ‘goodbye!’ and has a ref_count of 2.

A Little More Complex

So now we know how PHP handles values and references, and isn’t it all wonderfully exciting? “Oh yes! Please tell me more!” I hear you say? Ok then…

There is one last thing to mention in this area, which I think is especially relevant to those of you who love to (ahem) save memory by passing around references – what happens when values and references meet.

You may have noticed that the zval’s is_ref flag does not permit a zval to be both a reference and a value at the same time (as it is either true or false). On the face of it this is probably for the best, as I suspect it could lead to all kinds of strangeness from an internal perspective. However, a result of this is that if you are using a variable by value in several places (i.e. the variable’s underlying zval has a ref_count greater than 1) and then pass it by reference (for example, to a function), PHP will have to copy the value into an entirely new zval in order to set the is_ref flag to true. The following example illustrates how this can result in substantially increased memory usage:

<?php
//Example 5: Showing how mixing references and values can lead
//           to increased memory consumption

memory_show_usage(); //Zero bytes

$v1 = str_repeat('0', 100000);//Generate 100kb of dummy data
memory_show_usage(); //100kb

$v2 = $v1;
//We now have two variables pointing to a zval in the form:
//   is_ref=false, ref_count=2
memory_show_usage(); //100kb

$r1 =& $v2; //We now assign our value by reference
memory_show_usage(); //200kb
//PHP has now had to create a second zval in the form:
//   is_ref=true, ref_count=1

$v3 = $r1; //We now assign second zval by value
memory_show_usage(); //300kb
//PHP has now had to create a third zval in the form:
//   is_ref=false, ref_count=1

$v4 = $v3; //Now assign by value
memory_show_usage(); //300kb (no increase)
//Our third zval now has a ref_count of 2

//Both $v3 and $v4 now have the same zval, which may only be
//passed by value as it has a ref_count greater than one

$r2 =& $v3; //So now we assign $v3 by reference
memory_show_usage(); //400kb
//Here PHP has been forced to create a fourth zval with yet
//another copy of the data. The new zval is in the form:
//    is_ref=true, ref_count = 1

//Simple function to show memory use from a baseline
function memory_show_usage(){
    static $baseline = null;
    if(is_null($baseline)){
        //Initialise to get an accurate memory use value
        $baseline = 1;
        $baseline = memory_get_usage();
    }

    echo (memory_get_usage() - $baseline) . " bytes\n";
}

?>

Although this example only assigns variables directly, the same principles apply when performing function calls where parameters are passed by reference. You can see that, unless the developer is completely consistent, passing variables by reference can easily lead to increased memory usage.

Conclusion

If your concern is to conserve memory then it is best to simply pass data by value, as the PHP language is smart enough to conserve memory automatically. If you really must pass a value by reference then make sure that it is done consistently, as this will avoid consuming many times more memory (and CPU cycles) than is necessary. Alternatively you could wrap your data in an object, as PHP5 (but not PHP4) will pass this by reference as the default behaviour.
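
To make the object suggestion concrete, here is a small sketch. The DataHolder class and the append_marker() function are names I made up for illustration. Strictly speaking the function receives a copy of the object handle rather than a true reference, but the visible effect is the same: no ampersand is needed and the payload is not duplicated just by making the call.

```php
<?php
// Sketch: a hypothetical wrapper class holding a large payload.
class DataHolder
{
    public $data;

    public function __construct($data)
    {
        $this->data = $data;
    }
}

// No ampersand in the signature: the function receives the object handle,
// so it works on the same object as the caller.
function append_marker(DataHolder $holder)
{
    $holder->data .= '!';
}

$holder = new DataHolder(str_repeat('0', 100000));
append_marker($holder);

echo strlen($holder->data) . "\n"; // 100001: the caller's object was changed
```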

As a side note I would like to point out that modifying function parameters as a side effect (which may be your intention if you are passing by reference) is generally discouraged, as it can make some bugs very hard to track down (a similar argument to that against global variables).
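
As a rough illustration of the difference, the sketch below contrasts a function that changes its argument in place with one that returns a modified copy. Both function names and the $order array are invented for this example.

```php
<?php
// Hypothetical helpers for illustration only.

// Side-effecting version: silently changes the caller's array.
function add_total_in_place(array &$order)
{
    $order['total'] = $order['price'] * $order['quantity'];
}

// Side-effect-free version: returns a modified copy instead.
function with_total(array $order)
{
    $order['total'] = $order['price'] * $order['quantity'];
    return $order;
}

$order = array('price' => 5, 'quantity' => 3);

$priced = with_total($order);
var_dump(isset($order['total']));  // bool(false): $order is untouched
var_dump($priced['total']);        // int(15)

add_total_in_place($order);
var_dump($order['total']);         // int(15): $order was changed by the call
```

With the second style, a reader can see from the call site alone that $order is not touched, which is what makes bugs easier to trace.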

 

Posted by on June 7, 2011 in Mixed, PHP

 

Complex Regular Expression Examples

User name check with regular expression

Let’s start with a user name check. On a registration form you may want to restrict which user names are allowed. Let’s suppose you want to allow only letters, numbers and the special characters “_.-”, and you also want the length of the user name to be between 4 and 20 characters.

First we need to define the available characters. This can be realised with the following code:

[a-zA-Z0-9_.-]

After that we need to limit the number of characters with the following code:

{4,20}

Finally we need to put it together:

^[a-zA-Z0-9_.-]{4,20}$

For a Perl compatible regular expression, surround the pattern with ‘/’ delimiters. In the end the PHP code looks like this:

$pattern = '/^[a-zA-Z0-9_.-]{4,20}$/';
$username = "this.is.a-demo_-";

if (preg_match($pattern,$username)) echo "Match";
else echo "Not match";
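
If you need the check in more than one place, it can be wrapped in a small function. The sketch below is only a suggestion: is_valid_username() is a name I picked, and the D modifier is my own addition. Without it, “$” also matches just before a trailing newline, so a name such as "user\n" would be accepted.

```php
<?php
// is_valid_username() is a hypothetical helper name.
function is_valid_username($username)
{
    // Same pattern as above, plus the D modifier so that "$" only matches
    // at the very end of the string and not before a trailing newline.
    return preg_match('/^[a-zA-Z0-9_.-]{4,20}$/D', $username) === 1;
}

$candidates = array("this.is.a-demo_-", "abc", "bad name!");
foreach ($candidates as $candidate) {
    echo $candidate . ' => '
        . (is_valid_username($candidate) ? "Match" : "Not match") . "\n";
}
```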

Check hexadecimal color codes with regular expression

A hexadecimal color code looks like this: #5A332C, or you can use a short form like #C5F. In both cases it starts with a # followed by exactly 3 or 6 characters, each a digit or a letter from a to f.

So the pattern starts with:

^#

the following character range is:

[a-fA-F0-9]

and the length can be 3 or 6. The complete pattern is the following:

^#(([a-fA-F0-9]{3}$)|([a-fA-F0-9]{6}$))

Here we use an “or” (|) to check first for the #123 form and then for the #123456 form. In the end the PHP code looks like this:

$pattern = '/^#(([a-fA-F0-9]{3}$)|([a-fA-F0-9]{6}$))/';
$color = "#1AA";

if (preg_match($pattern,$color)) echo "Match";
else echo "Not match";
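
As an aside, the same rule can be written a little more compactly by repeating a group of three hex digits once or twice. This is just an alternative sketch, not a correction of the pattern above. is_hex_color() is a made-up helper name, and the loop compares both patterns on a few sample values.

```php
<?php
// is_hex_color() is a hypothetical helper name.
function is_hex_color($color)
{
    // One group of three hex digits, repeated once or twice: 3 or 6 digits.
    return preg_match('/^#([a-fA-F0-9]{3}){1,2}$/D', $color) === 1;
}

$original = '/^#(([a-fA-F0-9]{3}$)|([a-fA-F0-9]{6}$))/';
foreach (array("#1AA", "#5A332C", "#12345", "#GGG", "1AA") as $color) {
    $a = preg_match($original, $color) === 1;
    $b = is_hex_color($color);
    echo $color . ': ' . ($a ? 'Match' : 'Not match')
        . ' / ' . ($b ? 'Match' : 'Not match') . "\n";
}
```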

Email check with regular expression

Finally, let’s see how we can check an email address with regular expressions. First take a careful look at the following example emails:

john.demo@demo.com
john@demo.us
john_123.demo_.name@demo.info

What we can see is that the @ is a mandatory element in an email. Besides this there must be some character before and some after it. More precisely there must be a valid domain name after the @.

So the first part must be a string of letters, numbers and a few special characters such as “_”, “.” and “-”. As a pattern we can write it as follows:

^[a-zA-Z0-9_.-]+

The domain name always has two parts: a name and a TLD. The TLD is the .com, .us or .info part, and the name can be any string of valid characters. This means that the domain pattern looks like this:

[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$

Now we only need to put together the 2 parts with the @ and get the complete pattern:

^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$

The PHP code looks like this:

$pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/';
$email = "john123.demo_.name@demo.info";

if (preg_match($pattern,$email)) echo "Match";
else echo "Not match";
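
The pattern is deliberately simple, so it will reject some real addresses (for example those with a TLD longer than five characters). As a point of comparison, the sketch below runs a few sample addresses through both the pattern and PHP’s built-in filter_var() with FILTER_VALIDATE_EMAIL. The last two sample addresses are my own additions to show failing cases.

```php
<?php
$pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/';

$samples = array(
    "john.demo@demo.com",
    "john@demo.us",
    "john_123.demo_.name@demo.info",
    "john@demo",        // no TLD
    "@demo.com",        // nothing before the @
);

foreach ($samples as $email) {
    $byPattern = preg_match($pattern, $email) === 1;
    // filter_var() returns the address on success and false on failure.
    $byFilter = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    echo $email . ': pattern=' . ($byPattern ? 'Match' : 'Not match')
        . ', filter_var=' . ($byFilter ? 'Match' : 'Not match') . "\n";
}
```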
 

Posted by on June 7, 2011 in PHP, Regular Expressions

 


Regular expressions in PHP (Basics)

PHP regular expressions can seem quite complicated, especially if you are not an experienced Unix user. Historically, regular expressions were designed to help with string handling on Unix systems.

Using regular expressions you can easily find a pattern in a string and/or replace it if you want. This is a very powerful tool, but be careful, as it is slower than the standard string manipulation functions.
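
As a rule of thumb, a fixed piece of text can be handled by the plain string functions, while anything that has to be described as a pattern needs a regular expression. The sample text and variable names in the following sketch are made up for illustration.

```php
<?php
$text = "Order   42 shipped,  order 7 pending";

// A fixed substring: the plain string functions are enough.
$hasShipped = strpos($text, 'shipped') !== false;
$renamed    = str_replace('pending', 'waiting', $text);

// A pattern ("one or more whitespace characters", "one or more digits"):
// this is where regular expressions are needed.
$normalised = preg_replace('/\s+/', ' ', $text);
preg_match_all('/\d+/', $text, $matches);

var_dump($hasShipped);                  // bool(true)
echo $renamed . "\n";
echo $normalised . "\n";                // Order 42 shipped, order 7 pending
echo implode(',', $matches[0]) . "\n";  // 42,7
```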

Regular expression types

There are 2 types of regular expressions:

  • POSIX Extended
  • Perl Compatible

The ereg, eregi, … functions are the POSIX versions and preg_match, preg_replace, … are the Perl compatible versions. Note that the POSIX functions were deprecated in PHP 5.3 and removed in PHP 7.0. When using Perl compatible regular expressions, the expression must be enclosed in delimiters, for example forward slashes (/). The Perl compatible version is also more powerful and faster than the POSIX one.

The regular expressions basic syntax

To use regular expressions first you need to learn the syntax of the patterns. We can group the characters inside a pattern like this:

  • Normal characters which match themselves like hello
  • Start and end indicators as ^ and $
  • Count indicators like +,*,?
  • Logical operator like |
  • Grouping with {},(),[]

An example pattern to check valid emails looks like this:

^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$

The code to check the email using Perl compatible regular expression looks like this:

$pattern = "/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/";
$email = "jim@demo.com";

if (preg_match($pattern,$email)) echo "Match";
else echo "Not match";

And very similar in case of POSIX extended regular expressions:

$pattern = "^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$";
$email = "jim@demo.com";

if (eregi($pattern,$email)) echo "Match";
else echo "Not match";
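
On PHP versions where the ereg functions are not available, the case-insensitive behaviour of eregi() can be reproduced with preg_match() and the i modifier. The following is a sketch of that idea. The mixed-case address is my own test value.

```php
<?php
// With the i modifier the character classes no longer need both cases.
$pattern = '/^[a-z0-9._-]+@[a-z0-9-]+\.[a-z.]{2,5}$/i';
$email = "Jim@Demo.COM";

if (preg_match($pattern, $email)) echo "Match";
else echo "Not match";
```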

Now let’s see a detailed pattern syntax reference:

Regular expression (pattern)  Match (subject)         Not match (subject)  Comment
world                         Hello world             Hello Jim            Matches if the pattern is present anywhere in the subject
^world                        world class             Hello world          Matches if the pattern is present at the beginning of the subject
world$                        Hello world             world class          Matches if the pattern is present at the end of the subject
/world/i                      This WoRLd              Hello Jim            Searches in case-insensitive mode (the i modifier follows the closing delimiter)
^world$                       world                   Hello world          The string contains only “world”
world*                        worl, world, worlddd    wor                  There are 0 or more “d” after “worl”
world+                        world, worlddd          worl                 There is at least 1 “d” after “worl”
world?                        worl, world, worly      wor, wory            There is 0 or 1 “d” after “worl”
world{1}                      world                   worly                There is 1 “d” after “worl”
world{1,}                     world, worlddd          worly                There are 1 or more “d” after “worl”
world{2,3}                    worldd, worlddd         world                There are 2 or 3 “d” after “worl”
wo(rld)*                      wo, world, worldold     wa                   There are 0 or more “rld” after “wo”
earth|world                   earth, world            sun                  The string contains “earth” or “world”
w.rld                         world, wwrld            wrld                 Any character in place of the dot
^.{5}$                        world, earth            sun                  A string with exactly 5 characters
[abc]                         abc, bbaccc             sun                  There is an “a”, “b” or “c” in the string
[a-z]                         world                   WORLD                There is at least one lowercase letter in the string
[a-zA-Z]                      world, WORLD, Worl12    123                  There is at least one lower- or uppercase letter in the string
[^wW]                         earth                   w, W                 The matched character cannot be “w” or “W”
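
A quick way to get a feel for these rules is to run a few of them through preg_match(). The sketch below picks some rows from the reference above, adds the ‘/’ delimiters, and flags any result that differs from what the table says. The $rows array layout is just something I chose for the example.

```php
<?php
// Each entry: pattern, subject, whether the table says it should match.
$rows = array(
    array('/^world/',      'world class', true),
    array('/^world/',      'Hello world', false),
    array('/world$/',      'Hello world', true),
    array('/world/i',      'This WoRLd',  true),
    array('/world{2,3}/',  'world',       false),
    array('/wo(rld)*/',    'worldold',    true),
    array('/earth|world/', 'sun',         false),
    array('/^.{5}$/',      'earth',       true),
    array('/[^wW]/',       'W',           false),
);

foreach ($rows as $row) {
    list($pattern, $subject, $expected) = $row;
    $matched = preg_match($pattern, $subject) === 1;
    echo str_pad($pattern, 16) . str_pad($subject, 14)
        . ($matched ? 'Match' : 'Not match')
        . ($matched === $expected ? '' : '  <- differs from the table') . "\n";
}
```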
 

Posted by on June 7, 2011 in PHP, Regular Expressions