The Long and Winding Way of the Dynamic Array

The Long and Winding Way of the Dynamic Array

1. Introduction

One of the cool features of Delphi-4 is a new array type – dynamic. It is not the first attempt of the Pascal-Delphi team to further develop the concept of the static array as it appeared in Wirth’s standard Pascal. To clarify the terminology, we have to note that the terms “static” and “dynamic” by themselves are applied nowadays at least in four different meanings or situations.

First – for arrays whose boundaries may vary (dynamic) vs. arrays with constant boundaries (static).

Second – for the methods of assigning memory to the variables: either their relative addresses are known at compile time (static), or the addresses are assigned by the system at run time (dynamic). Correspondingly, there are two methods of memory allocation: on the stack (static) or on the heap (dynamic).

Third, there are two methods referring to variables: either directly by their names (static), with one-to-one correspondence between a variable and its instance in memory, or indirectly via pointers – dynamic (a variable and its instance are not the same).

Forth, methods in the class declaration may be either static, or virtual/dynamic.

Our goal here is to follow the evolution of the concept dynamic with regard to arrays only. We will analyze its particularities for all array types appearing in Borland-Inprise Pascal-Delphi. In addition to the standard arrays (static), there are three more types in the family of arrays now: open arrays, variant arrays and dynamic arrays. Why so many? (We should add to this list also all different types of strings in Delphi-4, which are a family of special character arrays whose diversity is even larger – a topic for another discussion).

In fact, the idea of dynamic arrays has a long history, beginning as early as in ALGOL-60.

2. The origin

In ALGOL-60, the syntax of array type looks similar to that of Pascal. For example, a declaration real array A[M1:N1, M2:N2, M3:N3] defines a 3-dimensional array of real numbers. Just as in Pascal, in the outermost block only constants (integer numbers) are allowed as the array boundaries, and in that case the array is called static. But in the inner blocks of ALOGOL-60, unlike Pascal, the boundaries may be variables:

begin

real array A[1:100, 0..10]; {static array in ALGOL-60}

integer M1, N1, M2, N2; {variables}

{ M1, N1, M2, N2 have to be defined}

. . . .

begin

real array B[M1:N1, M2:N2]; {dynamic array in ALGOL-60}

real array C[1:3, 1:4]; {“static” array in ALGOL-60}

. . . .

end;

. . . .

end

In the inner blocks, the array boundaries could be any arithmetic expression with the only requirement that numeric values were assigned to all variables in the boundaries before entering the block. Therefore, beginning with ALGOL-60 the concept of a dynamic array means that its boundaries may vary during run time, while for static arrays only explicit numbers are allowed in the boundary expressions.

For arrays that are local variables (like arrays B, C in the example above) the memory was allocated when entering the block and released when leaving it. Because declarations of the variables local in a block could appear only in the beginning of that block, no complications with redefining dynamic arrays occurred, and obviously such arrays could not be kept till the next entrance in this block. Nevertheless, in ALGOL-60, one could use the specifier own for any local variable including dynamic arrays, which meant that although the visibility of the concerned variables obeyed the rule of scope, their values were kept available after reentering the block. According to the Revised Report on ALGOL-60, this persistency in case of dynamic arrays ought to be followed also for the subset of indexes that are valid for the current and previous versions of the own array, although specific compilers could limit or simplify that behavior.

Note: Regarding compile-time vs. run-time assignment of memory to the variables, it was typically run-time assignment in ALGOL-60 vs. compile time in Pascal, although both use the stack.

3. The long and winding road

In Wirth’s Pascal, programmers could derive an unlimited number of new types from the basic types, but for some reasons the basic array type strictly required only constant boundaries and therefore allowed to deal only with static arrays. Wirth’s motivation probably was to have very fast and efficient one-pass compilers, for which all but indirect variables (i.e. except instances of pointer types) are compile-time variables providing the most efficient access. As a drawback, there was no way to overcome the static nature of the arrays, for example, to develop procedures which deal with vectors, matrixes and other structures of arbitrary sizes: at least once we had to hard-code all array sizes as the maximal possible numeric values in the const section.

True, at least for one-dimensional arrays with a constant low boundary Low1 and variable boundary High1 we could resort to a trick involving indirect variables (pointers)

type TMaxArray = array[Low1..MaxInteger] of AnyType;

PArray = ^TMaxArray;

and later allocate memory to the instance of PArray according to the actual value of High1:

GetMem(PArray, (High1 - Low1 + 1)*SizOf(AnyType));

Then, we can use index expressions like PArray^[k] (even omitting ^ because of the undocumented syntax feature of Delphi). Unfortunately, for multi-dimensional arrays this approach with indirect variables does not work so simply with just one pointer. Also, a drawback is that we have to deal with pointers instead of direct variables (being ourselves responsible for allocating and de-allocating memory and other possible confusions connected with indirect access).

3.1. Open Arrays

The open arrays introduced in Delphi-1 were the first extension of Pascal’s concept of static arrays, but it was not really a type like the others that could be used to declare variables. Instead, it was an intrinsic type applicable only to formal parameters in procedures and functions. If a formal parameter looks like

FormalArr : array of TSomeType;

the actual parameter may be either a one-dimensional array of type TSomeType, or just one variable, or the so-called open array constructor [Expr1, Expr2, …, ExprN] – all of type TSomeType. The latter is a nice feature not available in ALGOL-60, otherwise the behavior of the open array as a formal parameter suffers two serious drawbacks. First, only one-dimensional arrays as actual parameters are allowed. Second, whatever the low and high boundaries of the actual array are, it is always mapped to the zero-based formal open array. The same is true for the open array constructor. The expressions are numbered starting with 0, which looks a little confusing. This zero-base indexing of the open arrays is not consistent with the more convenient Pascal’s array boundaries of low..high type, but for some reason Borland-Inprise still adheres to this principle (see the Dynamic arrays below).

The open arrays enabled us to overcome a very serious limitation of static arrays and allowed to deal comfortably with one-dimensional arrays. Thus, procedures for simple vectors of any size like scalar product, average, maximum, minimum et cetera were no more a problem.

The variant open array formal parameter looking like

FormalVarArr : array of const;

is intended only to transfer the open array constructors containing expressions of different types as an actual parameter – an extension of the similar feature for open arrays and the predecessor of the more general idea of the type variant.

3.2. Variant Arrays

The type Variant introduced by Delphi-2, was a very powerful extension of Pascal, intended for different purposes. We are going to discuss it here only with regard to the concept of dynamic arrays. A variable declared as Variant may represent a multi-dimensional dynamic array, but you need a special non-Pascal statement to specify the dimensions and the type of elements:

var vArr : variant;

. . . .

begin

. . . .

vArr := VarArrayCreate([Low1, High1, . . . ,LowN, HighN], ElementType);

where the boundaries may be variables and ElementType belongs to the fixed list of basic Pascal types denoted by the identifiers of the format varXXXX, for example varInteger, varDouble (see Table of the Variant type constants). After that we can consider vArr as an N-dimensional index variable vArr[i1,i2,…,iN]. This implementation looks the closest to the notion of dynamic array as it appeared in ALGOL-60: it really made possible the multi-dimensional rectangular arrays with the variable boundaries of low..high type. Unfortunately, access to elements of variant arrays is at least 10 times slower than to static arrays: a simple benchmark that transposes a big matrix (A[i,j]:=A[j,i], i=1,…,N; j = 1,…,i-1) demonstrates it pretty well. Also, in terms of memory consumption, any variant variable requires 16 byte overhead. Although it does not seem too much if one variable represents a big array, but in case of many non-array variants, it is something to be kept in mind.

The re-dimensioning of the variant arrays is possible within the same block, but only for the last (rightmost) dimension: the special function VarArrayRedim does the job.

As to the efficiency of access to the elements, it may be improved to the level as fast as that of static arrays via the special procedure varArrayLock. It returns a pointer that is assignment-compatible with pointers to any static array, but is meaningful only if that static type corresponds exactly to the dimensions specified in the varArrayCreate. For example, for variant array vArr created above, the corresponding static array type must be

TvArrStat = array[LoN..HiN, . . . , Lo1..Hi1] of ElementType;

with the dimensions specified in the order inverse to that in the varArrayCreate (why?!) and all LoN, HiN, . . . , Lo1, Hi1 being constants equal to the current values of the corresponding variable boundaries! Then, providing the declaration

var vArrLock : ^ TvArrStat;

the statement vArrLock := varArrayLock(vArr) allows to use the index variable vArrLock^[iN,…,i1] (or simply vArrLock[iN,…,i1]) with the access speed as fast as for static arrays. We gained in speed, but in order to deal with variable boundaries, we have to explicitly declare as many different static array types as we are going to have in run time! For example, we may need to prepare in advance several type declarations like

type T200x200 = array[1..200; 1..200] of real;

T150x150 = array[1..150; 1..150] of real;

. . . .

and then to specify the variable dimensions in the VarArrayCreate according to one of these types – not a very convenient technique.

The interesting feature of variant arrays is that the ElementType may be variant too, for example vArr := VarArrayCreate([Low1, High1, varVariant). In particular, it means that individual elements vArr[k] may be defined as a variant array again with any number of dimensions of any size, for example vArr[k] := VarArrayCreate([kLow, kHigh, varDouble). This creates an illusion as though we can treat vArr as a two-dimensional non-rectangular array. (The examples of non-rectangular arrays of two dimensions are triangle matrixes, or matrixes with just a few stripes. For 3 dimensions, it may be an integer grid of points inside a pyramid – more of that below in the section Dynamic Arrays). Unfortunately, the variant array of variant arrays does not work like the similar construction of the standard Pascal arrays. Providing the declaration of vArr given above, for the index variable like vArr[i,j] the code compiles, but stops with a run time error (because vArr is created as one-dimensional). Surprisingly, vArr[i][j] – that should be the synonym in Pascal – shows different behavior: it produces even a syntax error if it appears in the left side of the assignment statement; a := vArr[i][j] compiles and runs correctly, while vArr[i][j] := b does not resulting in a syntax error.

So we see that although the variant array type allows some functionality of the ALGOL’s-60 dynamic arrays, the variant arrays are far more complex, slow and not consistent with the Pascal’s concept of arrays both in syntax and semantics.

3.3. Dynamic arrays of Delphi-4

Finally, here is the latest attempt to implement the dynamic arrays (covered only on 2 pages of the Object Pascal Language Guide!). The declaration of one-dimensional dynamic array looks like

type TDynArray1 = array of baseType;

(boundaries […] must be omitted), where the baseType may be also a static array type or a dynamic array type again. This allows to declare the multi-dimensional “mixture” as well as “purely” dynamic arrays. For example, for 3 dimensions the declaration takes the form

type TDynArray3 = array of array of array of baseType;

This syntax allows to declare a certain number of dimensions, but not their sizes (which require special consideration). Thus, if the baseType above is of the non-array type, a variable var A, B : TDynArray3 may be used with up to 3 indexes, for example A[i,j,k]. Otherwise, if say the baseType = array[1..100;1..200] of double, this variable may appear with up to 5 indexes A[k1,k2,k3,k4,k5].

After a dynamic array variable is declared, it still cannot be used unless the special statement SetLength (discussed later) specifies the sizes of all dimensions – and allocates the required memory. This shows the important difference between the static and dynamic arrays. The latter are (but only partially behave like) hidden pointers, i.e. a dynamic array variable is not strictly associated with its memory image, the instance, but rather separates from it. Thus, the above-mentioned variable A (without indexes), or A[i], or A[i,j] (with number of indexes less than the declared number) are all hidden pointers. As such, at certain moments they may point nowhere, or more than one hidden pointer may point to the same instance. For example, after the assignment A := B, both A and B point at the same instance of B, so that any change to the elements of A therefore affects also B – in contradiction to the usual meaning of the assignment statement. While the instance of A (if it exists) seems to be lost because it is pointed at by nothing, it does not cause a memory leak, which is prevented by the so-called reference count technique implemented for the dynamic arrays. For that reason, two consecutive statements SetLength(A, …); SetLength(A, …) do not cause the loss of the piece of memory allocated in the first statement (leak) – unlike the similar situation, say, for classes: the sequence X := TAnyClass.Create; X := TAnyClass.Create is a mistake. Also, the assignment A := nil actually signals to the system that the memory (instance) must be freed – which is never the case for classes or pointers!

One more non-obvious fact: even if A[i1, i2, i3] = B[i1, i2, i3] for all indexes, it never means that the conditions A = B or A[i1] = B[i1] or A[i1,i2] = B[i1,i2] are true, because these not fully indexed variables as pointers point to different locations.

In terms of persistency, while leaving and reentering a block, the dynamic array variables behave like all other local (static) variables: leaving the block, the variables and their instances are freed automatically (for local variables of the types class and pointer it is wrong to leave the block without freeing all the instances of all such variables – the reason why such variables must be declared, for example, global). The user should nil a dynamic array only if it is important to free the memory before leaving the block. Hence, dynamic arrays are much safer than classes and pointers.

Thus, both the syntax and semantics of dynamic arrays differ from those of static arrays. Two system procedures, SetLength and Copy, previously intended to deal with strings, are applied now also for dynamic arrays. To define the sizes of dimensions – and the allowed index space – we have to use the system procedure SetLength(A, Length1,…) with a non-fixed number of the integer parameters Length1,…, LengthN. At least one of them is always required to specify the size of the leftmost dimension. If the number of dimensions is more than 1, the remaining sizes may be specified either in the same SetLength statement, or later in other such statements individually for each sub-array element. The former method defines rectangular arrays, similar to those known in ALGOL-60 or standard Pascal (but with mandatory zero low indexes), while the latter enables the so-called non-rectangular arrays.

For example, providing var A : TDynArray3, the single statement SetLength(A, N1,N2,N3) defines the rectangular array with the index field [0..N1-1; 0..N2-1; 0..N3-1], while the statement SetLength(A, N1) defines the size for the first dimension as N1 and correspondingly the index field for the first index as [0..N1-1]. This postpones the definition of the 2 other dimensions for each A[k] individually. The following sample defines two types of triangle matrixes:

var A, B : array of array of double;

N, i : integer;

begin

{defining N}

SetLength(A, N);

SetLength(B, N);

for i := 0 to N-1 do

begin

SetLength(A[i], i+1); {lower-left triangle matrix; index field 0£ i£ N-1, 0£ j£ i }

SetLength(B[i], N-i); {upper-left triangle matrix; index field 0£ i£ N-1, 0£ j£ N-i-1 }

end;

. . . .

end;

Unfortunately, due to the limitation imposed by zero based indexing, dynamic arrays do not allow to define the lower- and upper-right triangle matrixes, matrixes with several diagonal stripes et cetera this way.

Finally, here are some examples of 3-dimensional dynamic arrays of a 3- and 4-lateral pyramid-type:

var C, D : array of array of array of double;

N, i, j : integer;

begin

{defining N}

SetLength(C, N);

SetLength(D, N);

for i := 0 to N-1 do

begin

SetLength(C[i], i+1, i+1); {4-lataral pyramid; index field 0£ i£ N-1, 0£ j,k£ i}

SetLength(D[i], i+1) ; {3-lataral pyramid}

for j := 0 to i do SetLength(D[i,j], j+1) {index field 0£ i£ N-1, 0£ j£ i, 0£ k £ j }

end;

. . . .

end;

As to the speed of access to elements of dynamic arrays, it is almost as high as for static arrays, at least one- and two-dimensional ones, as the benchmark with matrix transposing proves. For static arrays, the memory location of each element in a multi-dimensional array is known as soon as the index expression computed. Thus, for a static element say A[k1,k2,k3] the relative location may look like N2N3*k1 + N3*k2 + k3. Instead, for dynamic arrays to access an element of K-dimensional array, the code has to sequentially de-reference K pointers to the respective 1-dimensional sub-arrays. Both approaches seem compatible.

4. Conclusions

The evolution of dynamic arrays in Borland-Inprise Pascal-Delphi was not simple and straightforward. With regard to the functionality, the multi-dimensional rectangular variant arrays are the closest to the concept of dynamic arrays as they first appeared in ALGOL-60, but variant arrays are 10 times slower than static ones, and they differ in syntax. In addition, for the concept of variant array of variant arrays, both syntax and semantics remain not quite clear.

The dynamic arrays in Delphi-4 exceed the arrays of ALGOL-60 at least in that they can be non-rectangular and still be as fast, as static arrays in Pascal. Unfortunately, this notion reveals several language inconsistencies.

1. Why the mandatory zero based indexing, while the static and variant arrays do not require it? The low..high indexing in standard Pascal is an important feature, that can be very helpful in many applications. Also, the most natural and safe method of numbering in structures like vectors and matrixes is to number the elements in a vector corresponding to the index field 1..N, not 0..N-1 .

2. Why the special syntax in the declaration, which omits the boundaries [ ]? As a result, the separate statement SetLength is later always required. True, SetLength allows to re-define the dimensions several times within a block, if it is really needed, but for the more typical case when the dimensions and the sizes are declared once at the beginning of the block, the standard form array[low..high] is better, because it is consistent with the syntax. The compiler knows by itself if the boundary expression is constant or variable, therefore it could implement the static or dynamic models according to the situation.

3. The assignment statements with incomplete indexes for dynamic array variables such as A := B have a quite unusual meaning: any change in an element A[k] affects also B[k] because A and B refer to the same instance. This behavior contradicts the standard meaning of the assignment statement and is dangerous, so that incomplete indexes in assignment statements for dynamic arrays should not be allowed.

4. The behavior of dynamic arrays as hidden pointers is better and safer than the behavior of classes (also hidden pointers) or pointers. Why not to improve the behavior of classes and pointers to the level of dynamic arrays so that all indirect reference mechanisms in the language follow uniform rules?

5. Logically, the Delphi type class must be just an indirect reference version of the type object introduced in Turbo-Pascal 5.5. The only difference should be in the method of memory allocation: for the class on the heap, while for the object – on the stack, which is safer and does not involve the dangerous separation of variables from their instances. Delphi-4 and all previous versions do support the type object for backward compatibility, but there are still certain differences between syntax and semantics of both types (discussed earlier in my article “Fewer pointers – less double thinking”, DI ….)

The fact that now there are as many as four different array types with quite different and inconsistent syntax and semantics in Borlan-Inprise Pascal-Delphi seems to be bad by itself. Too many, not always good, new features have been added to standard Pascal, which makes it cluttered, hectic and less safe. It looks like the language now evolves not according to a well-developed fundamental plan, but is instead trying to cope somehow with all different and inconsistent features of the very complex operating environment. Delphi is still an unparalleled software development tool on the market, but it is getting more and more complex, while its documentation and help system lag behind. Even the Object Pascal Language Guide, the fundamental document of the language, is neither complete nor clear, strict and formal enough, as one could expect from a document of this type. This bears no resemblance to that high standard of documentation that Borland was proud of in the era of Borland Turbo Pascal. So: back to the future?

Acknowledgments: I am very thankful to Dr. Manfred Mackeben for the patience in reading and improving this text and for many valuable notes.

Alexander Gofen