foreach
supports iteration over three different kinds of values:
In the following, I will try to explain precisely how iteration works in different cases. By far the simplest case is Traversable
objects, as for these foreach
is essentially only syntax sugar for code along these lines:
foreach ($it as $k => $v) { /* ... */ }
/* translates to: */
if ($it instanceof IteratorAggregate) {
$it = $it->getIterator();
}
for ($it->rewind(); $it->valid(); $it->next()) {
$v = $it->current();
$k = $it->key();
/* ... */
}
For internal classes, actual method calls are avoided by using an internal API that essentially just mirrors the Iterator
interface on the C level.
Iteration of arrays and plain objects is significantly more complicated. First of all, it should be noted that in PHP "arrays" are really ordered dictionaries and they will be traversed according to this order (which matches the insertion order as long as you didn't use something like sort
). This is opposed to iterating by the natural order of the keys (how lists in other languages often work) or having no defined order at all (how dictionaries in other languages often work).
The same also applies to objects, as the object properties can be seen as another (ordered) dictionary mapping property names to their values, plus some visibility handling. In the majority of cases, the object properties are not actually stored in this rather inefficient way. However, if you start iterating over an object, the packed representation that is normally used will be converted to a real dictionary. At that point, iteration of plain objects becomes very similar to iteration of arrays (which is why I'm not discussing plain-object iteration much in here).
So far, so good. Iterating over a dictionary can't be too hard, right? The problems begin when you realize that an array/object can change during iteration. There are multiple ways this can happen:
- If you iterate by reference using
foreach ($arr as &$v)
then $arr
is turned into a reference and you can change it during iteration.
- In PHP 5 the same applies even if you iterate by value, but the array was a reference beforehand:
$ref =& $arr; foreach ($ref as $v)
- Objects have by-handle passing semantics, which for most practical purposes means that they behave like references. So objects can always be changed during iteration.
The problem with allowing modifications during iteration is the case where the element you are currently on is removed. Say you use a pointer to keep track of which array element you are currently at. If this element is now freed, you are left with a dangling pointer (usually resulting in a segfault).
There are different ways of solving this issue. PHP 5 and PHP 7 differ significantly in this regard and I'll describe both behaviors in the following. The summary is that PHP 5's approach was rather dumb and lead to all kinds of weird edge-case issues, while PHP 7's more involved approach results in more predictable and consistent behavior.
As a last preliminary, it should be noted that PHP uses reference counting and copy-on-write to manage memory. This means that if you "copy" a value, you actually just reuse the old value and increment its reference count (refcount). Only once you perform some kind of modification a real copy (called a "duplication") will be done. See You're being lied to for a more extensive introduction on this topic.
PHP 5
Internal array pointer and HashPointer
Arrays in PHP 5 have one dedicated "internal array pointer" (IAP), which properly supports modifications: Whenever an element is removed, there will be a check whether the IAP points to this element. If it does, it is advanced to the next element instead.
While foreach
does make use of the IAP, there is an additional complication: There is only one IAP, but one array can be part of multiple foreach
loops:
// Using by-ref iteration here to make sure that it's really
// the same array in both loops and not a copy
foreach ($arr as &$v1) {
foreach ($arr as &$v) {
// ...
}
}
To support two simultaneous loops with only one internal array pointer, foreach
performs the following shenanigans: Before the loop body is executed, foreach
will back up a pointer to the current element and its hash into a per-foreach HashPointer
. After the loop body runs, the IAP will be set back to this element if it still exists. If however the element has been removed, we'll just use wherever the IAP is currently at. This scheme mostly-kinda-sort of works, but there's a lot of weird behavior you can get out of it, some of which I'll demonstrate below.
Array duplication
The IAP is a visible feature of an array (exposed through the current
family of functions), as such changes to the IAP count as modifications under copy-on-write semantics. This, unfortunately, means that foreach
is in many cases forced to duplicate the array it is iterating over. The precise conditions are:
- The array is not a reference (is_ref=0). If it's a reference, then changes to it are supposed to propagate, so it should not be duplicated.
- The array has refcount>1. If
refcount
is 1, then the array is not shared and we're free to modify it directly.
If the array is not duplicated (is_ref=0, refcount=1), then only its refcount
will be incremented (*). Additionally, if foreach
by reference is used, then the (potentially duplicated) array will be turned into a reference.
Consider this code as an example where duplication occurs:
function iterate($arr) {
foreach ($arr as $v) {}
}
$outerArr = [0, 1, 2, 3, 4];
iterate($outerArr);
Here, $arr
will be duplicated to prevent IAP changes on $arr
from leaking to $outerArr
. In terms of the conditions above, the array is not a reference (is_ref=0) and is used in two places (refcount=2). This requirement is unfortunate and an artifact of the suboptimal implementation (there is no concern of modification during iteration here, so we don't really need to use the IAP in the first place).
(*) Incrementing the refcount
here sounds innocuous, but violates copy-on-write (COW) semantics: This means that we are going to modify the IAP of a refcount=2 array, while COW dictates that modifications can only be performed on refcount=1 values. This violation results in user-visible behavior change (while a COW is normally transparent) because the IAP change on the iterated array will be observable -- but only until the first non-IAP modification on the array. Instead, the three "valid" options would have been a) to always duplicate, b) do not increment the refcount
and thus allowing the iterated array to be arbitrarily modified in the loop or c) don't use the IAP at all (the PHP 7 solution).
Position advancement order
There is one last implementation detail that you have to be aware of to properly understand the code samples below. The "normal" way of looping through some data structure would look something like this in pseudocode:
reset(arr);
while (get_current_data(arr, &data) == SUCCESS) {
code();
move_forward(arr);
}
However foreach
, being a rather special snowflake, chooses to do things slightly differently:
reset(arr);
while (get_current_data(arr, &data) == SUCCESS) {
move_forward(arr);
code();
}
Namely, the array pointer is already moved forward before the loop body runs. This means that while the loop body is working on element $i
, the IAP is already at element $i+1
. This is the reason why code samples showing modification during iteration will always unset
the next element, rather than the current one.
Examples: Your test cases
The three aspects described above should provide you with a mostly complete impression of the idiosyncrasies of the foreach
implementation and we can move on to discuss some examples.
The behavior of your test cases is simple to explain at this point:
In test cases 1 and 2 $array
starts off with refcount=1, so it will not be duplicated by foreach
: Only the refcount
is incremented. When the loop body subsequently modifies the array (which has refcount=2 at that point), the duplication will occur at that point. Foreach will continue working on an unmodified copy of $array
.
In test case 3, once again the array is not duplicated, thus foreach
will be modifying the IAP of the $array
variable. At the end of the iteration, the IAP is NULL (meaning iteration has done), which each
indicates by returning false
.
In test cases 4 and 5 both each
and reset
are by-reference functions. The $array
has a refcount=2
when it is passed to them, so it has to be duplicated. As such foreach
will be working on a separate array again.
Examples: Effects of current
in foreach
A good way to show the various duplication behaviors is to observe the behavior of the current()
function inside a foreach
loop. Consider this example:
foreach ($array as $val) {
var_dump(current($array));
}
/* Output: 2 2 2 2 2 */
Here you should know that current()
is a by-ref function (actually: prefer-ref), even though it does not modify the array. It has to be in order to play nice with all the other functions like next
which are all by-ref. By-reference passing implies that the array has to be separated and thus $array
and the foreach-array
will be different. The reason you get 2
instead of 1
is also mentioned above: foreach
advances the array pointer before running the user code, not after. So even though the code is at the first element, foreach
already advanced the pointer to the second.
Now lets try a small modification:
$ref = &$array;
foreach ($array as $val) {
var_dump(current($array));
}
/* Output: 2 3 4 5 false */
Here we have the is_ref=1 case, so the array is not copied (just like above). But now that it is a reference, the array no longer has to be duplicated when passing to the by-ref current()
function. Thus current()
and foreach
work on the same array. You still see the off-by-one behavior though, due to the way foreach
advances the pointer.
You get the same behavior when doing by-ref iteration:
foreach ($array as &$val) {
var_dump(current($array));
}
/* Output: 2 3 4 5 false */
Here the important part is that foreach will make $array
an is_ref=1 when it is iterate