Sunday, July 6, 2008

C++: Accessing the virtual table directly

This post is not intended for beginners. To follow it, you need a basic understanding of what virtual functions are.

We know that run-time binding, the virtual function mechanism, is implemented with a virtual table. If a class has at least one virtual function, a virtual table is created for that class. To be specific, only one virtual table is created for all the objects of that class, and each object holds a pointer to it.

The same thing is true for a class hierarchy. If class Z derives from class Y and class Y derives from class X, every object of X, Y and Z holds a pointer to a virtual table, and all objects of the same class share one table.

===============
Added on July 14, 2008:
The virtual tables for each of class X, Y and Z share common information but they are not necessarily the same table for each of these classes. The scenario is complex for multiple and virtual inheritance. I would like to discuss them in future posts.
===============

A pointer is 32 bits (4 bytes) on a 32-bit architecture and 64 bits (8 bytes) on a 64-bit architecture. So every object of a class or class hierarchy that has a virtual table carries an additional 4 bytes, or 8 bytes on a 64-bit architecture, plus any alignment padding the compiler adds.

This pointer is called the virtual table pointer, sometimes 'vptr'. In the VC++ compiler, objects have a pointer named '__vfptr' in them, and in some other compilers it's '__vptr_X', where X is the class name.

Now __vfptr is not directly accessible from your code. For example, if you write the following code you'll get a compiler error as the __vfptr is not available for your use.

  X a;
  cout << a.__vfptr;

However, if you debug the code in VC++, you can see 'a.__vfptr' in the variable watch window. Interesting, huh?

Okay, now we'd like to see how we can access the virtual table even if the compiler doesn't want us to. Let's have class X with a virtual function fn() which simply prints a member variable and we want to access the virtual table of class X to call the function fn() using it. The following code does that.

  1  #include <iostream>
  2
  3  using namespace std;
  4
  5  //a simple class
  6  class X
  7  {
  8  public:
  9      //fn is a simple virtual function
 10      virtual void fn()
 11      {
 12          cout << "n = " << n << endl;
 13      }
 14
 15      //a member variable
 16      int n;
 17  };
 18
 19  int main()
 20  {
 21      //create an object (obj) of class X
 22      X *obj = new X();
 23      obj->n = 10;
 24
 25      //get the virtual table pointer of object obj
 26      int* vptr = *(int**)obj;
 27
 28      // we shall call the function fn, but first the following assembly code
 29      // is required to make obj as 'this' pointer as we shall call
 30      // function fn() directly from the virtual table
 31      __asm
 32      {
 33          mov ecx, obj
 34      }
 35
 36      //function fn is the first entry of the virtual table, so it's vptr[0]
 37      ( (void (*)()) vptr[0] )();
 38
 39      //the above is the same as the following
 40      //obj->fn();
 41
 42      return 0;
 43  }

Please note that this code is compiler dependent: it may only work with VC++ compilers in a 32-bit (x86) build, and it works correctly only when you run it in 'Release' mode. Here is some explanation of the code.

In line 26, we have:
 26  int* vptr =  *(int**)obj;
The virtual table pointer __vfptr is stored in the first 4 bytes of the object (on a 32-bit build). In this line, we read the value of __vfptr, that is, the address of the virtual table, as an integer pointer (think of it as a pointer to an integer array).

The first entry of the virtual table is the function pointer of the virtual function 'fn'. We can access the first entry using vptr[0] (as this is just an array). So, in line 37, we simply call the function through that function pointer. But wait, you might be asking why the following assembly line is there before the function call.
 33   mov ecx, obj
If you take another look at the implementation of function fn(), you can see that it prints the member variable 'n', which belongs to the object 'obj'. Inside the function fn(), 'obj' needs to be set as the 'this' pointer to give fn() access to all its members.

When we call the function fn() in this way: obj->fn(), the compiler does the job for us and sets 'obj' as 'this' before calling the function. But in line 37, we have no way to tell the function fn() that it is being called for the object 'obj', so the function cannot know where to get the value of 'n' from. This is why we explicitly need to set 'obj' as 'this' before we call the function fn() in line 37. We did that in line 33, in the assembly code. This line is again VC++ specific. In VC++, the 'this' pointer is passed in the register 'ECX'. Other compilers may handle that differently.

If we had more virtual functions, we could access them using the next indexes of vptr: vptr[1], vptr[2], etc.

We have learned some interesting facts about virtual functions and the virtual table. We may not have any use for code that accesses the virtual table directly in our general applications, but it helps when you want to know more about C++ internals.

Enjoy!

July 12, 2008:
We assumed here that the vptr is placed at the beginning of the class object. Here's a note on that:

Traditionally, the vptr has been placed after all the explicitly declared members of the class. More recently, it has been placed at the beginning of the class object. The C++ Standard allows the compiler the freedom to insert these internally generated members anywhere, even between those explicitly declared by the programmer.

31 comments:

rezahok said...

Very interesting post. I would say it added to my knowledge base !

/Reza

Z said...

Very interesting indeed!

Z said...

However unimportant this may be to the world, in place of "int *vptr" I would use "size_t *vptr" :P. The reason is that in 64-bit systems each address is 8 bytes long, and when we index vptr[0] and so on, we are indexing for an address, i.e. the value of vptr[0] is again an address (of the function). Now, for 64-bit systems, "int *vptr" will only read 4 bytes and will miss the function pointer. The safest way to deal with this kind of situation is to use the size_t definition. I know I'm overreacting to this way too much as this code is system specific anyway. I just wanted to point out that in 64-bit systems, the size of int or long may not be consistent with the size of an address (64-bit). But size_t will always be as big as the address for a given system. Using size_t is very important when we want to write code for both 32-bit and 64-bit systems and also for embedded systems. [just a head-fake]

luc said...

Your article was beneficial to me in a great way since now I am certain about how dangerous static casting might prove to be if a class contains virtual functions.

I had no knowledge that the virtual table occupies first four bytes. This would mean, typecasting a class object to a simple int or an object of some other type can be fatal..

Thanx King..
well done (y)

M. Kaisar Ul Haque said...

@luc:

C style type casting is a dangerous practice. C++'s static_cast won't allow you to convert a class to int/long; you need to use reinterpret_cast or a C style cast.

I would never use C style casting for converting a class to int/long. I would rather use 'conversion operator', it's safe and very beautiful. Following is an example:

class X
{
public:
X(long _n): n(_n){ }
operator long() const { return n; }

long n;
};

//usage:
X a(10);

long b = a;

mahdi said...

Hi Kaisar,
I found your article very interesting and I think you are the kind of man who goes beyond the borders. This article has reminded me of a problem I have encountered. The following is a C++ question; if you can answer it, let me know about the solution (if there is a solution).
What do you think about a class A, friend of classes B and C, each having its own attributes? Class A wants to know all the attributes each class encapsulates by direct access to this information (without B and C providing a method to do this). Is this possible (with some assembly code) or not?
thank you

Mahdi

Vivek said...

Hi Kaisar,

That was an amazing article to go through and a great insight. Thanks a lot for sharing this.
Can you share more details:

1) The following is puzzling me:
( (void (*)()) vptr[0] )();

I am sure we can split this into multiple instructions to make it more readable?
Can you help by splitting it so that it is understandable to me?

2) I checked this program in "Microsoft Visual C++ 2005", but i could not get the value of "n", even if the __asm instructions are present. It seems "Microsoft Visual C++ 2005" does not use ECX register for this pointer. Is it so?

Regards,
Vivek

Mohammad Kaisar Ul Haque said...

Hi Mahdi,

Sorry for this long late reply (I was not following this thread).

'Friend' is actually for the programmer and not the machine itself. You can access private/protected members of an instance of any class if you know the structure. For example:

//top secret class X
class X
{
private:
int n;
};

//main:
X a;
void *p = (void *)&a;
*((int *)p) = 10;

//a.n is now 10.. which is an un-friend-ly access.

What I understand from your question is that you want to get list of member variables of an object of a class X where you do not know about class X. This is also known as object introspection.

Unfortunately in C/C++, after the code gets compiled, there are no structs/classes or objects. All members are converted into memory offsets. The compiler doesn't keep special information about members that you can access later at run time. However, you can create this behavior by keeping meta information about the members in some way. An example can be found here:

http://www.codeproject.com/KB/cpp/exposeattributes.aspx

You can google for more examples. Hope this helps.

Thanks.

Mohammad Kaisar Ul Haque said...

Hi Vivek,

Sorry for this late reply.

Answer to 1:

( (void (*)()) vptr[0] )();

In the above, we are casting the vptr[0] as a function pointer and calling the function at the same time. Here is a break down of it:

typedef void (*func_ptr)( void );

((func_ptr) vptr[0] )();

You can also break it down in other ways. If it is still confusing, you can refer to some articles/tutorials on C function pointers.

Answer to 2:

I actually used Visual C++ 2005 when I wrote the article and this is actually a VC++ specific code (which will not work in some other compiler). Please make sure you have used 'release mode' build to test the code.

If you are still facing difficulties, I can mail you the project files.

Thanks.

Vivek said...

Hi Kaisar,

Thank you for your reply.

I understand the answer to Point 1 now.

About Point 2, can you please mail the project files?

Thanks.

Regards,
Vivek

Mohammad Kaisar Ul Haque said...

Hi Vivek,

Please download the VS 2005 project file from this link: http://www.mediafire.com/?sharekey=f9d992424a44e7a467cd7f7bd65f7eefe04e75f6e8ebb871

Please build in 'Release' mode and run with Ctrl + F5 or from 'Debug' Menu > 'Start without Debug'. You can also run the exe file from command prompt.

Thanks.

Anonymous said...

@Z: Pointer size is 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. int* is the same size as long* or double*. They are addresses, thus they are 4 bytes on 32-bit and 8 bytes on 64-bit systems :)

James said...

Hi! Much Thanks for Your article!
I have just tested some code from here on g++ compiler. vptr can be simply accessed on g++ as
X *x = new Y();
x->_vptr = _some_value;
also operations like:
int * vptr = *(int**)pca;
works perfectly too.

Nguyễn Dương Tuấn said...

When reading Inside the C++ Object Model, I saw that the author said there's a type_info object at the first entry in the virtual table. But I can't find it when debugging. Could you please explain this to me?

Thanks,

Anonymous said...

I'm currently using VS2010 and am having trouble getting an output for the class variable.

I put a 'watch' on the ECX register and can see that it changes to point to the object correctly, but upon entering the member function via the function pointer I notice that the value inside the ECX register changes. When this happens it obviously doesn't contain the 'this' pointer any more, and the resultant 'cout' outputs nonsense for the member variable output.

Where this change occurs to the ECX register upon executing the function pointer statement, I decided to manually change ECX back to that of the 'this' pointer of the object in question. The output displays what it should now.

Might you have any idea on why the ECX register changes value again after the embedded assembler operation?

Thanks.

Anonymous said...

Just to further add to my query above, I've managed to get a correct output appearing, although I'm unsure as to how your version works fine.

I had VS produce an assembly file to (attempt to) figure out why the ECX register was changing after the __asm embedded assembler.

((void(*)())vptr[0])();

mov eax, DWORD PTR _vptr$[ebp]
mov esi, esp
mov ecx, DWORD PTR [eax]
call ecx
cmp esi, esp
call __RTC_CheckEsp

As you can see, the function pointer changes the contents of ecx after the embedded assembly.

It works, however, if I declare this:-

typedef void (*Func)(void);

... and do the following instead:-

Func pFunc = (Func)(*(int**)Obj)[0];
pFunc();

This should be the same as doing:-

Func pFunc = (Func)vptr[0];
pFunc();

... which in turn should be exactly the same as doing:-

((void(*)())vptr[0])();

I don't understand why the two implementations provide different results. Considering this is the first time i've dabbled with any __asm, I'm going to guess it's relating to the way the compiler is working rather than anything else. Would this be an accurate assumption?

Thanks again.

Alan said...

With GCC (and I assume MSVC) you can avoid the need to manually set the `this` pointer using inline asm by using "pointer to member function" syntax instead - http://www.parashift.com/c++-faq-lite/pointers-to-members.html#faq-33.1

Anonymous said...

In at least one compiler you can avoid the inline asm by including the this pointer as the first argument in the function call.

xcyanx said...

Hello,
Your article is very interesting and I would like to ask a question. If we would like to expand the code to work with multiple inheritance, how could we do that?
Thanks in advance.

Neh said...

Hey Kaisar,
Thanks a lot for the wonderful info. I tried this and it is very compiler dependent. In case of Ppc, reg r9 is used for the vtable function pointer and regs r3 onwards are used for Parameter passing. Hence, your approach of loading this in r3 before the func call did not work, because the compiler (following the Ppc EABI), would overwrite r3 with another parameter.
What worked for me was this:
~~~~~~~~~~~~~~~~~~~~~~~~~
#include "Bar.h"
#include

int main() {
Bar *b = new Bar();
int* vptr = *(int**)b;
printf("Address of ptr is 0x%x\n",(int)vptr);
bool val;
val = ( (bool (*)(Bar*,int)) vptr[1] )(b,5);
delete b;
return 0;
}

//Bar.h
include

class Bar
{
public:
Bar();
~Bar();
virtual void Func2();
virtual bool Func1(int);
private:
int i;

};

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Viet said...

Thank you for the post. Could you please explain/reason why virtual pointer is per instance and not static? Since all instances of the class point to the same static virtual pointer table, why wouldn't each class have static virtual pointer instead? Thanks

RoboSmith said...

@Viet: That's how polymorphism works.

A base class pointer can point to a base class object or any of its derived class objects. The vptr in the object tells the machine what type of object it is and which virtual table to use. If the vptr wasn't in the object, we'd need an external table or other mechanism to resolve the object address to virtual table mapping. That's just a very short answer.

RoboSmith said...

@Nguyễn: Sorry for this long late reply. I've read the book long time ago and don't remember it mentioning RTTI to be the first entry in vtable. Also, as I mentioned in the article, this trick is very specific to VC++, could be specific to versions and only in Release mode.

RoboSmith said...

@Anonymous about debugging ECX: Sorry for the late reply. Before debugging this sort of hack in a debugger, I'd read the details of how my particular version of the debugger (vc++/gdb or something else) works with these registers. I did some reading many years ago and the details are not off the top of my head, so I'm unable to help further here.

RoboSmith said...

@Anonymous about first argument: I guess if the function calling convention is right, this will work on some compilers. It's probably one of the most common theories behind how C++ member functions work.

However, if we look into how a function is called, we'll see the parameters and return address pushed to the stack. Then inside the function, after popping them back, registers can be used for optimization. I'm sure most compilers will use some sort of register for 'this' to avoid memory operations.

RoboSmith said...

@xcyanx: The "Inside the C++ Object Model" book has some (but still limited) examples of how it works with multiple inheritance. I'm sure more detailed articles are also out there today.

Andrew said...

Wonderful post!!! And so much fun! Thank you very much.

This one is going to keep me busy and entertained for a while...

Kaisar said...

thanks Andrew!

ottointhesky said...

With a few changes to the original code, you can avoid the assembler line and it works in 32 and 64bit (tested in VS2010)
cheers,
johannes

template <typename T1, typename T2>
T1 union_cast(T2 v)
{
static_assert(sizeof(T1) >= sizeof(T2), "Bad union_cast!");
union UT {T1 t1; T2 t2;} u;
u.t2 = v;
return u.t1;
}


int main()
{
//create an object (obj) of class X
X *obj = new X();
obj->n = 10;

//get the virtual table pointer of object obj
size_t* vptr = *(size_t**)obj;

//function fn is the first entry of the virtual table, so it's vptr[0]
void (X::*memFunc)(void) = union_cast< void (X::*)(void) >(vptr[0]);
(obj->*memFunc)();

//the above is the same as the following
//obj->fn();

return 0;
}

Anonymous said...

Thank you very much, it's really helpful in understanding the compiler internals.