Arrays in C++

deceptikon

Why Use Arrays?

Let's begin by considering why arrays might be needed. What is the problem that this feature solves? How does it make your life as a programmer easier? When studying new features, it's important to recognize what that feature does for you and where the incentive is in using it. Otherwise you might learn a feature just because it's there, but never actually use it, or never learn it at all.

Fortunately, arrays are very easy to justify. Consider a small program that collects 5 names and displays them to the user:

#include <iostream>
#include <string>

using namespace std;

int main()
{
    string name1, name2, name3, name4, name5;
    int n = 0;

    cout << "Enter 5 names (EOF to stop): ";

    if (getline(cin, name1)) {
        ++n;
    }

    if (getline(cin, name2)) {
        ++n;
    }

    if (getline(cin, name3)) {
        ++n;
    }

    if (getline(cin, name4)) {
        ++n;
    }

    if (getline(cin, name5)) {
        ++n;
    }

    if (n > 0) {
        cout << "Name 1: " << name1 << '\n';
    }

    if (n > 1) {
        cout << "Name 2: " << name2 << '\n';
    }

    if (n > 2) {
        cout << "Name 3: " << name3 << '\n';
    }

    if (n > 3) {
        cout << "Name 4: " << name4 << '\n';
    }

    if (n > 4) {
        cout << "Name 5: " << name5 << '\n';
    }
}

We can immediately see that this is tedious and repetitive. It encourages copy/paste, and mistakes are easy to make in such a program. Further, while 5 names are not that bad, what if you had 100? Or 1000? This approach does not scale well at all, and it quickly becomes unmanageable.

Note that the variables for our names end with 1, 2, 3, etc... This means these variables are related. What we would like to do is roll them up into a collection so that this relation is more clear and also easier to work with. Right now all of the variables are completely independent. The relationship is in name only, as it were. ;)

What Are Arrays?

So what are arrays? Arrays are a fundamental feature of C++ that represent a collection of related items. This collection can be treated as a single entity or queried to retrieve individual items stored therein. When I say "related", I mean both syntactically and logically. Arrays are a homogeneous collection, meaning that a single array can only store items of the same data type. You can have an array of int or an array of string, but not an array of both int and string.

It's really as simple as that, though confusion can arise because an array itself is a data type. As such, you can have an array of arrays, or an array of arrays of arrays. The idea is that arrays can be nested within each other to greatly improve flexibility.

An array of a non-array type (such as int or string) is colloquially referred to as a 1 dimensional array, or 1D array. An array of such arrays is a 2 dimensional array, or 2D array. 2D arrays can also be called tables or matrices because a common visual representation is a table of rows and columns:

  • 1D Array of int:

    {1, 2, 3, 4, 5}

  • 2D Array of int:

    {01, 02, 03, 04, 05}
    {06, 07, 08, 09, 10}
    {11, 12, 13, 14, 15}

This concept can be scaled up into 3D arrays, 4D arrays, 5D arrays, and so on, but it's rare to see arrays with more dimensions than 2, and almost unheard of for arrays with dimensions greater than 3. This article will focus on 1D and 2D arrays because they are the most common, and the rules don't change as more dimensions are added.

How Are Arrays Used?

So how do you create and use an array? Let's begin by using an array to simplify the names program above:

#include <iostream>
#include <string>

using namespace std;

int main()
{
    string names[5];
    int n = 0;

    cout << "Enter 5 names (EOF to stop): ";

    for (n = 0; n < 5; n++) {
        if (!getline(cin, names[n])) {
            break;
        }
    }

    for (int i = 0; i < n; i++) {
        cout << "Name " << i + 1 << ": " << names[i] << '\n';
    }
}

This code is much shorter. Further, you can have more names without making the code any longer; only the size of the array needs to change wherever it's used. Change instances of 5 to 100 and the program can suddenly accept 100 names instead of 5. That's a huge win in code maintenance. The program is also easier to reason about and verify; each iteration of the loops works with a single item in the array, and there's no repetition of code to potentially make mistakes on.

The syntax of an array declaration can be quirky, but at its simplest the form is

T identifier[number of items];

for a 1D array and

T identifier[number of 1D arrays][number of items];

for a 2D array. For a 3D array this concept is merely extended to another dimension:

T identifier[number of 2D arrays][number of 1D arrays][number of items];

T represents a data type such as int or string, and this tells you what data type the array will ultimately be capable of storing. Here are a few examples to hammer down how it looks:

int    a[10];        // An array that can store 10 integers
int*   b[100];       // An array that can store 100 pointers to int
double c[2][5];      // An array that can store 10 doubles in 2 rows of 5
int    (*pa)[10];    // A pointer to an array that can store 10 integers
void   (*d[5])(int); // An array of 5 pointers to functions that take an integer parameter and return nothing

The last two examples (pointer to an array and array of pointers to function) are where the quirkiness of C++'s declaration syntax really starts to show. It's a good idea to learn the rules of the declaration syntax, but that is beyond the scope of this article. For now, I'll show you a trick to make it easier. The typedef feature allows you to build a complex type such that it can then be used with the same syntax as a simple type like int:

typedef int* intp_t;
typedef int array10_t[10];
typedef void (*func_t)(int);

intp_t     b[100]; // An array of 100 pointers to int
array10_t* pa;     // A pointer to an array of 10 int
func_t     d[5];   // An array of 5 pointers to function
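If a C++11 compiler is available, alias declarations with using are an alternative spelling of the same idea. This is a sketch, not something from the original article. The alias names simply mirror the typedefs above, and the static_assert lines are only there to confirm that the aliases name the same types as the raw declarations.

```cpp
#include <type_traits>

using intp_t    = int*;           // pointer to int
using array10_t = int[10];        // array of 10 int
using func_t    = void (*)(int);  // pointer to function taking int, returning nothing

intp_t     b[100]; // An array of 100 pointers to int
array10_t* pa;     // A pointer to an array of 10 int
func_t     d[5];   // An array of 5 pointers to function

// Each alias-based declaration has exactly the type of its raw counterpart.
static_assert(std::is_same<decltype(b),  int*[100]>::value,        "b is int*[100]");
static_assert(std::is_same<decltype(pa), int (*)[10]>::value,      "pa is int (*)[10]");
static_assert(std::is_same<decltype(d),  void (*[5])(int)>::value, "d is void (*[5])(int)");
```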

Once you have an array, you can use it as a single entity in one of two ways:

  • Object Context: The array is treated as its own unique entity. In object context you can acquire information about the array sans its items.
  • Value Context: The array is converted to a pointer to the first element stored in the array. For a 1D array this is the 0th item. For a 2D array it is the first row (itself a 1D array), which starts at the same address as the 0th item in that row.

The two object contexts you'll meet most often are when an array is an operand of the sizeof operator, and when an array is an operand of the & (address-of) operator. These two operators will give you the size of the array in bytes, and the address of the array, respectively. The size of an array is the collective size of all of its items:

sizeof(a) == sizeof(int) * 10
sizeof(b) == sizeof(int*) * 100
sizeof(c) == sizeof(double) * 10

Note that for sizeof(c), only the stored doubles contribute to the size of the array even though c is a 2D array. This is due to how arrays are stored in memory. Even though the visual representation of a 2D array is a table of rows and columns, arrays are stored contiguously in memory. So our first example of a 2D array:

{01, 02, 03, 04, 05}
{06, 07, 08, 09, 10}
{11, 12, 13, 14, 15}

would in practice be stored in memory as a 1D array:

{01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15}
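The contiguous layout can be checked with a small helper. This is a sketch under stated assumptions: flat_index is a hypothetical function template, and it only compares addresses as raw bytes to count how many items sit between the start of the table and a given item. It never reads through the computed pointers.

```cpp
#include <cstddef>

// Counts how many items lie between the start of a 2D array and
// table[row][col] by comparing byte addresses.
template <typename T, std::size_t Rows, std::size_t Cols>
std::size_t flat_index(T (&table)[Rows][Cols], std::size_t row, std::size_t col)
{
    const char* base = reinterpret_cast<const char*>(&table);
    const char* item = reinterpret_cast<const char*>(&table[row][col]);

    return static_cast<std::size_t>(item - base) / sizeof(T);
}
```

For the 3 by 5 table above, the item at row 1, column 0 (the value 06) comes out at flat position 5, which matches row * 5 + column.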

It follows that for a 2D array, logically you can use a pointer to traverse the entire array's stored items with a single loop:

int a[2][5] = {
    {1, 2, 3, 4, 5},
    {6, 7, 8, 9, 0}
};

// a[1] + 5 is one past the last item of the last row
for (int *p = &a[0][0]; p != a[1] + 5; p++) {
    cout << *p << '\n';
}

However, note that due to one little niggling rule in the C++ standard, you cannot safely assume that a multidimensional array can be treated as a 1D array. The rule is that you cannot walk off the end of an array, and each row is an array in its own right. So while in practice it often works, technically according to the C++ standard the above code is unsafe because p walks off the end of the first row. Use this trick at your own risk.

To be strictly correct and portable, nested loops are required when traversing a 2D array:

int a[2][5] = {
    {1, 2, 3, 4, 5},
    {6, 7, 8, 9, 0}
};

for (int i = 0; i < 2; i++) {
    for (int j = 0; j < 5; j++) {
        cout << a[i][j] << '\n';
    }
}
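The hard-coded 2 and 5 in the loops can be avoided by letting the compiler supply the dimensions. A minimal sketch, assuming a hypothetical function template sum_2d that receives the array by reference (so it keeps its array type) and visits each item with the same nested loops:

```cpp
#include <cstddef>

// Adds up every item of a 2D int array. Rows and Cols are deduced
// from the argument's type because the array is passed by reference.
template <std::size_t Rows, std::size_t Cols>
long sum_2d(int (&a)[Rows][Cols])
{
    long total = 0;

    for (std::size_t i = 0; i < Rows; i++) {
        for (std::size_t j = 0; j < Cols; j++) {
            total += a[i][j];
        }
    }

    return total;
}
```

Because each inner loop stays within one row, this form never steps past the end of any array.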

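The address of an array has the same value as the address of its 0th item. The two pointers have different types, though, so they must be converted to a common type such as void* before they can be compared:

(void*)&a == (void*)&a[0]
(void*)&b == (void*)&b[0]
(void*)&c == (void*)&c[0][0]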

Once again, due to the storage of arrays in memory and the fact that there is no significant metadata for an array object, the 0th item has the same address as the array object itself. The only difference is how this address is interpreted (as an array object or as an item stored by the array). You'll notice that I took advantage of this fact in the traversal loop with a pointer above.
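The difference in interpretation shows up in pointer arithmetic. The sketch below uses three hypothetical helpers: same_address compares the two addresses as untyped pointers, while array_step and item_step report how many bytes adding 1 moves each kind of pointer.

```cpp
#include <cstddef>

// True when the array object and its 0th item start at the same address.
bool same_address(int (&a)[10])
{
    const void* whole = &a;     // the type of &a is int (*)[10]
    const void* first = &a[0];  // the type of &a[0] is int*

    return whole == first;
}

// Bytes skipped by +1 on a pointer to the whole array.
std::size_t array_step(int (&a)[10])
{
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(&a + 1) - reinterpret_cast<char*>(&a));
}

// Bytes skipped by +1 on a pointer to the 0th item.
std::size_t item_step(int (&a)[10])
{
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(&a[0] + 1) - reinterpret_cast<char*>(&a[0]));
}
```

The addresses match, but stepping the array pointer skips all 10 items while stepping the item pointer skips just one.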

Anything that is not an object context is a value context. What this means in practice is that most of the time you use an array, it will be converted to a pointer to the first item. This does not mean that arrays and pointers are the same; that's a common misconception. But it does mean that pointers and arrays are closely related, as you'll see in a moment.

Arrays would be far less useful if you couldn't get quick access to any item, but you can. Arrays are a random access collection wherein you can use the [] (subscript) operator to reach any valid index in a dimension. I've already used the subscript operator many times in the examples, so hopefully it is clear how it is used. Each dimension is indexed in the same order that the array was declared. Given the array definition int arr[2][5] = {0}; the statement

arr[0][2] = 123; // Assign 123 to the 1st row's 3rd item

would result in the following array representation:

{0, 0, 123, 0, 0}
{0, 0,   0, 0, 0}

If we then did this:

arr[1][0] = 456; // Assign 456 to the 2nd row's 1st item

The resulting array would be:

{  0, 0, 123, 0, 0}
{456, 0,   0, 0, 0}

Of immediate and critical note is that arrays use 0-based indexes. That means the count starts at 0 instead of 1. It also means that the last valid index of a dimension will be one less than in the declaration. For int arr[2][5], the last valid index of the first dimension is 1 and the last valid index of the second dimension is 4.

In terms of verifying code correctness, this 0-based indexing is actually a good thing, but the reason it works this way is due to the relationship of arrays and pointers. Recall that in value context an array is converted to a pointer to the first item in the array. The subscript operator works in value context, which means you're not actually subscripting an array but a pointer.

Under the hood, the subscript operator is nothing more than syntactic sugar for using an offset from a pointer and then dereferencing it:

a[0]    == *(a + 0)
c[1][2] == *(*(c + 1) + 2)

Because the address of an array and the address of the first item are one and the same, no offset is needed, thus the index is 0:

a[0] == *(a + 0) == *a
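These equivalences can be restated in compilable form. The function names below are hypothetical, and the sketch does nothing more than spell out the subscript operator as pointer arithmetic.

```cpp
// Reads a[i] using explicit pointer arithmetic instead of [].
int item_1d(const int* a, int i)
{
    return *(a + i);
}

// Reads c[i][j] from a table with 5 columns the same way.
// c + i selects a row; *(c + i) is that row, which is converted
// to a pointer to its 0th item before j is added.
int item_2d(const int (*c)[5], int i, int j)
{
    return *(*(c + i) + j);
}
```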

As a final note on indexing, take great care when indexing an array, because C++ will not stop you from using an index that is out of bounds. Doing so is undefined behavior: it can be exploited in non-portable code, but it is also a significant source of bugs.
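Since the language will not check indexes for you, a small helper can. A minimal sketch, assuming a hypothetical function template checked_at that throws std::out_of_range (the same exception std::vector::at uses) instead of touching memory outside the array:

```cpp
#include <cstddef>
#include <stdexcept>

// Returns a reference to a[index], or throws if index is not
// in the range [0, N). N is deduced from the array's type.
template <typename T, std::size_t N>
T& checked_at(T (&a)[N], std::size_t index)
{
    if (index >= N) {
        throw std::out_of_range("array index out of bounds");
    }

    return a[index];
}
```

The check costs a comparison on every access, which is the trade the built-in subscript operator declines to make.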

One common error with arrays is attempting to use sizeof after an array was passed to a function as a parameter:

void foo(int a[])
{
    cout << sizeof a << '\n';
}

int main()
{
    int a[] = {1, 2, 3, 4, 5};

    foo(a);
}

This will not work as intended because the array is passed in value context: foo prints the size of a pointer, not the size of the five-item array. In other words, foo's array parameter is not really an array, it's a pointer. This can be intuited by the fact that the first dimension size of an array parameter is not required (though subsequent dimension sizes are required), and if present will be ignored. These three function signatures are functionally equivalent:

void foo(int a[123]);
void foo(int a[]);
void foo(int *a);
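Two common workarounds are to pass the item count as a separate parameter, or to pass the array by reference so that it keeps its array type. The sketch below shows both. The function names are hypothetical, and the static_assert merely confirms that an array parameter and a pointer parameter produce the same function type.

```cpp
#include <cstddef>
#include <type_traits>

// An array parameter is adjusted to a pointer, so these are one type.
static_assert(std::is_same<void(int[123]), void(int*)>::value,
              "array parameters are really pointers");

// The parameter is really a const int*, so the count travels separately.
long sum_with_count(const int a[], std::size_t count)
{
    long total = 0;

    for (std::size_t i = 0; i < count; i++) {
        total += a[i];
    }

    return total;
}

// Passing by reference keeps the array type, so N is deduced and
// sizeof reports the size of the whole array.
template <std::size_t N>
std::size_t size_in_bytes(const int (&a)[N])
{
    return sizeof a;
}
```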

This article only scratches the surface of how you might use arrays in your programs, but all of the basics are here and can be built upon. You're strongly encouraged to play around with arrays and gain experience with them in practice.

When Should Arrays Be Used?

The last and probably most important topic is when you should use arrays. The answer in C++ is, surprisingly, "almost never". The standard C++ library provides several alternatives to arrays that are safer and contain fewer gotchas, such as std::vector and std::array. Further, C++11's initializer list syntax means that objects of these classes can be initialized in the same manner as arrays.

However, knowing how to use arrays to their fullest will help you to also use these alternatives to their fullest and also understand the exposed features, since std::vector and std::array are designed to be array-like in look and feel. Coverage of those classes is beyond the scope of this article, but I encourage you to look them up.

colins.maruf2 0 Newbie Poster

An array is a series of elements of the same type placed in a contiguous memory location that can be individually referenced by adding an index to a unique identifier. They also make work easier and faster by saving the programmer time.

Assembly Guy 72 Posting Whiz

Nice tutorial deceptikon, I very much like the style. I used to have teachers who would try to teach the class something new without first setting a scene of what happens without the new tool. They'd be talking to a class full of 'Hello, World!'-ers about what arrays are, but not creating that contrast, so many people would come away from the lesson thinking "Why not just use multiple variables instead?". This tutorial is the opposite, which is awesome.

TL;DR: Nice work on both content and style :)

Learner010 99 Posting Whiz in Training

Thanks for such a nice tutorial.
Really very interesting.

looking forward to your pointers tutorial.

thanks.

kal_crazy 13 Junior Poster

Thanks Deceptikon for this amazing tutorial.

There were some things about arrays that I still did not know about and your post opened my half-opened eyes :)

Will you be posting more tutorials on other topics? Very keen on learning more.

Learner010 99 Posting Whiz in Training

int (*pa)[10]; // A pointer to an array that can store 10 integers

Is that equivalent to int *pa = new int[10];?

deceptikon 1,790 Code Sniper Team Colleague Featured Poster

Is that equivalent to int *pa = new int[10];?

No. int* and int(*)[10] are two completely different and incompatible types.

Learner010 99 Posting Whiz in Training

What I learned about pointers to arrays is that

A pointer to array is a pointer pointing to an array

example:

int arr[10];
int* ptr;

ptr=arr; // pointer to array

and your statement is

int (*ptr)[10]; //pointer to array

Would you please clear up my doubts about pointers to arrays (with a little easy code)?

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster

Let's start with a couple of definitions:
"Array": A sequence of elements packed next to each other in memory.
"Pointer": An address to a location in memory.

So, when you have this:

int arr[10];

you have a variable called arr whose type is an "array of 10 integers".

When you have this:

int* ptr;

you have a variable called ptr whose type is a "pointer to an integer", which means that it contains an address to a location in memory at which it is expected that there will be an integer value.

When you have this:

int (*aptr)[10];

you have a variable called aptr whose type is a "pointer to an array of 10 integers", which means that it contains an address to a location in memory at which it is expected that there will be 10 integer values next to each other in memory (i.e., an array of integers).

When you do this:

ptr = arr; // pointer to array

the type of ptr is still a "pointer to an integer"; however, after this assignment it just so happens that ptr will be pointing to the start of the array arr (which is implicitly converted to a pointer there). And so, we say that ptr points to an array, but its type is still int* (or "pointer to an integer"). The thing is this: the int* type only requires (by way of the type rules) that there is an integer at the location that ptr points to, which allows ptr to be pointing to the start of an array of integers (there is at least one integer there).

In that sense, the aptr is more restrictive, because it can only point to an array of 10 integers, while the ptr can point to any integer, including integers at the start, middle or end of an array. That's the difference.
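A short sketch may make the distinction concrete. The function names are hypothetical, and the static_assert only restates that the two pointer types are different.

```cpp
#include <type_traits>

static_assert(!std::is_same<int*, int (*)[10]>::value,
              "int* and int (*)[10] are different types");

// Takes a pointer to an int. It may point at any int, including
// one at the start, middle or end of an array.
int read_through_ptr(const int* ptr)
{
    return *ptr;
}

// Takes a pointer to an array of exactly 10 ints. Dereferencing it
// yields the array itself, which can then be subscripted.
int read_through_aptr(const int (*aptr)[10], int index)
{
    return (*aptr)[index];
}
```

Given int arr[10], both ptr = arr and ptr = &arr[4] are fine, and aptr = &arr is fine, but aptr = arr does not compile because arr converts to an int*, not to an int (*)[10].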

Learner010 99 Posting Whiz in Training

I was wondering if somebody could explain this concept with a suitable example (i.e. a code snippet).

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster

I could have sworn that I've seen this posted in DaniWeb C++ forum

void foo(int n)
{
    int ay[n];
    ...
    ...
}

I'm reading "The C++ Programming Language, 4th edition", and on paragraph 4.3 he specifically states that "The number of elements in the array, the array bounds, must be a constant expression." Why the difference?

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster

That example uses a C99 feature called variable-length arrays (VLAs). The feature was introduced in C99 and then made optional in C11. Compiler vendors that support it in their C compiler generally also support it as an extension in their C++ compiler. For example, it works on GCC. The Microsoft compiler stopped its C support at C90, and therefore it doesn't support much of anything in C that is younger than 24 years old.

Anyways, strictly speaking, this is not standard C++, just a very common extension (GCC, Clang, ICC, etc.). In standard C++, the array bound must be a constant expression, as your quote says.
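When the size is only known at run time, the usual standard C++ substitute is std::vector. A minimal sketch, with make_ay as a hypothetical stand-in for the int ay[n]; line in the example above:

```cpp
#include <cstddef>
#include <vector>

// Standard C++ replacement for `int ay[n];` with a run-time n.
std::vector<int> make_ay(std::size_t n)
{
    return std::vector<int>(n);  // n ints, each initialized to 0
}
```

Unlike a variable-length array, the storage is not placed on the stack, and the items start out as 0 rather than uninitialized.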
