Writing efficient algorithms

Normally when writing websites with PHP, you don't really have worry much about writing efficient algorithms. PHP isn't even really meant for writing highly efficient code, as it is interpreted language. But sometimes, you still come across situations, where you need to implement some sort of highly recursive algorithms or ones that iterate over same piece of code for hundreds of times.

I've been lately rewriting my sudoku solver to have a better code structure and also with new kinds of logic to solve the sudoku puzzle. I wont argue with you, if you claim it's not the best idea to program it using PHP. However, it's still quite fun, and requires use of different kind of practices compared to what I would normally use with PHP.

When writing algorithms in PHP, or in any language, you often want to squeeze every possible millisecond from the execution time. in PHP it's often even more important when using them on webpages, because you don't want pages to have long load times. Users don't have too much patience, after all. To get the most out of your code, there are several key practices that you should follow to make your code faster. Even though these will be more useful when writing algorithms, these practices can also be applied to normal coding as well.

1. Take advantage of language constructs

The fastest things in PHP are the language constructs. They are the language itself and are highly optimized in the interpreter, since they don't require calling external libraries, and because they are used in all parts of the code, so their optimization is a key to creating a faster interpreted language.

Because these language constructs are so finely tuned, it's also best to take the most out of them. Whenever you can do something with a simple language construct, don't waste a funtion call for it. For example, using a casting operator like (int) $foo is much more efficient than using the function intval($foo). Calling functions always generate considerably amount of overhead, but when using a language construct, you avoid all that.

An interesting thing to note is that there are several function like language constructs as well. For example, isset() and unset() are both language constructs, even though they mostly act like functions. The key difference, however, is that calling them does not generate the function overhead.

while it may not be immediately apparent how you may effectively take advantage of the language constructs, you can see later how you can achieve great things with them, particularly when dealing with arrays.

2. Let PHP do the hard work

Second very important thing to remember is that PHP is slow because it's interpreted language. It's not particularly slow because it had a slow engine, but mostly because it's a higher level language. This same does not apply to the built in functions in PHP, however. They are mostly written in lower level languages and are much faster than what you could ever do with PHP.

The idea is to use the functions inside PHP as much as possible to do the hard work. This becomes particularly useful when you are dealing with big arrays. It's better to try design your code to to effectively use the functions inside PHP than trying to come up with your own way to do it. However, you will need a good knowledge of the available functions and what PHP can do for you.

For example, it may not often occur to you, but array_keys() is a very handy function for searching values in an array. The function takes a second parameter, which all the values are compared against, and only the keys with matching values are returned. With that single function, you can let the PHP internal function to handle a very expensive operation on a big array.

3. Design your data structures with the 2 rules in mind

Now that you know two basic ways to considerably improve the performance of your code, it's time to actually put them into use by designing your data structures to take advantage of the two previously mentioned practices.

It's hard to tell exactly what you need to do, because it depends so much on what you are coding, but I can give very real examples from my Sudoku solver. A very good way to use the language constructs, is with arrays that have unique string or integer values. Instead of storing the unique values in the array, store them as keys to the array. The values wont really matter, and personally I just mostly use true. When unique values are stored like this, you can easily check if the array contains a unique value using a code like isset($array[$value]);. Removing values from the array is also easy, since all you have to do is unset($array[$value]);. If you had stored those values in the array as values, you would have had to use functions instead of languages constructs.

Another neat advantage of that I have in my sudoku solver, is finding the number of unique values in few different arrays. For example, in my sudoku solver, for each row (and column and region), I store the ID number for each cell that has specific candidate in it. Because they are stored as keys in the array, I can simply use $cellsfor[$id1] + $cellsfor[$id2] to find all the cells that have either candidate in it. Summing up two arrays means a union, where keys of the resulted array are the keys of both of these arrays. I've used a simple language construct instead of using array_unique(array_merge($cellsfor[$id1], $cellsfor[$id2]));, which I would have had to use, if the ID numbers had been stored as values.

To use the functions available in PHP, I store data about the cells in the sudoku puzzle in single dimensional array instead of two dimensional array, which would seem like logical choice. The advantage of this is that I can apply the PHP array functions on the entire sudoku board, instead of having to iterate through the cells myself. For example, I store the number of candidates available in for each cell in array. To find all cells which have only one candidate, I can simply call array_keys($candidate_count, 1);. That way the entire array will be iterated by the internal PHP language, instead of my own code.

Arrays in particular are a very powerful thing in PHP. As you can see, the biggest optimizations in my Sudoku Solver doesn't come from particularly brilliant logic, as lot of it is quite brutish, it comes from effective use of PHP language.

4. Memory vs. CPU

Last but not least, the same idea that applies to all programming also applies to coding PHP algorithms, of course. You can often reduce CPU load, and thus make the script faster by wasting memory. Calculating things that are used often before the actual algorithm is usually very benefical compared to only calculating when they are needed.

In my sudoku solver, all I would really need is to keep a list of solved cells with their solution and list of available candidates for each cell. For solving a normal 9x9 puzzle, I wouldn't really need anything else. But, because I maintain lots of different kind of information about the sudoku puzzle in arrays, I can make all the solving logic much faster, because it only needs to compare the values, instead of determining them.

Unlike the useless optimizations I highlighted a few posts back, these are actual good practices, that will gain you real improvement in execution time.

2 comments:

Anonymous said...

Thanks for the comments on array_keys. Made a huge difference to my processing time.

Anonymous said...

Ditto above - a very useful tip. Thank you.