Sunday, July 6, 2008

C++: What's the difference between class and struct?

What's the difference between class and struct?

A beginner would say:

"There are a lot of differences, class is a C++ element and struct is a C one. We can have functions, virtual functions, inheritance, different access modifiers private, protected, public in a class but probably not in a struct. A struct generally holds member variables only."

(Just to clarify, the above statement is completely wrong.)

Intermediate and most advanced C++ programmers would say:

"class and struct do not have any difference, except the fact that class members are by default private and struct members are by default public. The keywords class and struct are interchangeable."

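The above statement is mostly correct and would satisfy most questioners. (The same default applies to base classes: inheritance is private by default for a class and public by default for a struct.) However, there are some special cases where class and struct are not interchangeable.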

Consider the following code, which compiles fine in a C++ compiler:

template <class T>
void fn()
{
}

Now replace the keyword 'class' with 'struct' and compile it.

template <struct T>
void fn()
{
}

It gives a compiler error! According to MSDN's definition of the template keyword:

"The template-parameter-list is a comma-separated list of template parameters, which may be types (in the form class identifier, typename identifier, or template < template-parameter-list > class identifier) or non-type parameters to be used in the template body."

So the template parameter list doesn't take the keyword 'struct'!

The template mechanism itself was introduced in a later phase of the C++ evolution, some years after the first version of C++. The 'struct' keyword was probably left out when templates were defined, as templates were considered an advanced feature of C++.

"Another reasonable use of the C struct in C++, then, is when you want to pass all or part of a complex class object to a C function. This struct declaration serves to encapsulate that data and guarantees a compatible C storage layout. This guarantee, however, is maintained only under composition." said Stanley B. Lippman, author of several C++ books.

C++: Accessing the virtual table directly

This post is not intended for beginners. To understand it, you need a basic understanding of what virtual functions are.

We know that run-time binding, the virtual function mechanism, is typically implemented with a virtual table. If a class has at least one virtual function, a virtual table will be created for that class. To be specific, 'only one' virtual table will be created for all of the instances/objects of that class. Each object will have a pointer to the virtual table.

The same thing is true for a class hierarchy. That is, if class Z derives from class Y and class Y derives from class X, each of X, Y and Z gets one virtual table that is shared by all objects of that class, and each object of X, Y and Z has a pointer to the virtual table of its own class.

===============
Added on July 14, 2008:
The virtual tables for each of class X, Y and Z share common information but they are not necessarily the same table for each of these classes. The scenario is complex for multiple and virtual inheritance. I would like to discuss them in future posts.
===============

A pointer is 32 bits/4 bytes on a 32-bit architecture and 64 bits/8 bytes on a 64-bit architecture. So every object of a class or class hierarchy that has a virtual table will carry at least 4 additional bytes, or 8 on a 64-bit architecture (alignment padding can make the difference larger).

This pointer is called the virtual table pointer, sometimes 'vptr'. With the VC++ compiler, objects have a pointer named '__vfptr' in them, and with some other compilers it's '__vptr_X', where X is the class name.

Now __vfptr is not directly accessible from your code. For example, if you write the following code you'll get a compiler error as the __vfptr is not available for your use.

X a;
cout << a.__vfptr;

However, if you debug the code in VC++, you can see 'a.__vfptr' in the Watch window. Interesting, huh?

Okay, now we'd like to see how we can access the virtual table even if the compiler doesn't want us to. Let's have class X with a virtual function fn() which simply prints a member variable and we want to access the virtual table of class X to call the function fn() using it. The following code does that.

  1  #include <iostream>
  2
  3  using namespace std;
  4
  5  //a simple class
  6  class X
  7  {
  8  public:
  9      //fn is a simple virtual function
 10      virtual void fn()
 11      {
 12          cout << "n = " << n << endl;
 13      }
 14
 15      //a member variable
 16      int n;
 17  };
 18
 19  int main()
 20  {
 21      //create an object (obj) of class X
 22      X *obj = new X();
 23      obj->n = 10;
 24
 25      //get the virtual table pointer of object obj
 26      int* vptr = *(int**)obj;
 27
 28      // we shall call the function fn, but first the following assembly code
 29      // is required to make obj as 'this' pointer as we shall call
 30      // function fn() directly from the virtual table
 31      __asm
 32      {
 33          mov ecx, obj
 34      }
 35
 36      //function fn is the first entry of the virtual table, so it's vptr[0]
 37      ( (void (*)()) vptr[0] )();
 38
 39      //the above is the same as the following
 40      //obj->fn();
 41
 42      return 0;
 43  }

Please note that this code is compiler dependent: it may only work with VC++ compilers targeting 32-bit x86 (the inline __asm block and the 4-byte table entries assume that), and it works correctly when you run it in 'Release' mode. Here is an explanation of the code.

In line 26, we have:
 26  int* vptr =  *(int**)obj;
The virtual table pointer __vfptr is available in the first 4 bytes of the object. In this line, we get the value of the pointer __vfptr or the address of the virtual table as an integer pointer (say as a pointer to an integer array).

The first entry of the virtual table is the function pointer of the virtual function 'fn'. We can access the first entry using vptr[0] (as this is just an array). So, in line 37, we just call the function using the function pointer. But wait, you might be asking why the following assembly line is there before that function call.
 33   mov ecx, obj
If you take another look at the implementation of function fn(), you can see that it prints out the member variable 'n', which is only available through the object 'obj'. Inside the function fn(), 'obj' needs to be set as the 'this' pointer, to give the function fn() access to all its members.

When we call the function fn() in this way: obj->fn(), the compiler does the job for us and sets 'obj' as 'this' before calling the function. But in line 37, we couldn't tell the function fn() that it is being called for the object 'obj', so the function wouldn't know where to get the value of 'n' from. This is why we explicitly need to set 'obj' as 'this' before we call the function fn() in line 37. We did that in line 33, in the assembly code. This line is again VC++ specific. In VC++, the 'this' pointer is passed in the register 'ECX'. Other compilers may handle that differently.

If we had more virtual functions, we could have accessed them using the next indexes of vptr: vptr[1], vptr[2], etc.

We have learned some interesting facts about virtual functions and the virtual table. We are unlikely to need this kind of direct virtual table access in general applications, but it helps when you want to know more about C++ internals.

Enjoy!

July 12, 2008:
We assumed here that the vptr is placed at the beginning of the class object. Here's a note on that:

Traditionally, the vptr has been placed after all the explicitly declared members of the class. More recently, it has been placed at the beginning of the class object. The C++ Standard allows the compiler the freedom to insert these internally generated members anywhere, even between those explicitly declared by the programmer.