In the second article we will confront performance issues caused by passing objects as parameters.
A simple workhorse class
Before we get into the actual examples we need to create a class we can use for our measurements and evaluations. For this purpose we create a “DynamicString” class which implements a Python-like variable-length string that can be assigned and manipulated.
We will not implement it to the point of being actually usable (it will not even come close), but it will have all the nasty bits that can make C++ perform much slower than it could if you don’t use it properly. Also, to keep the code shorter and more readable, we are leaving out safe programming practices such as testing pointers for null before using them.
Let’s jump straight to the code.
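#include <stdio.h>   // printf, used by the examples
#include <stdlib.h>  // free
#include <string.h>  // strdup

class DynamicString
{
public:
    // Default constructor: allocate an empty string
    DynamicString()
    {
        str = strdup("");
    }

    // Construct from a C string by duplicating it
    DynamicString(const char *str)
    {
        this->str = strdup(str);
    }

    // Read-only access to the contents
    operator const char *() const
    {
        return str;
    }

protected:
    char *str;
    char useless[16];
};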
Our class consists of:
- A protected member “str” containing a pointer to the dynamic string’s contents
- A protected member “useless” to eat up some space and keep the optimizer from doing too good a job (with tiny classes like this one, the optimizer can pass an entire instance in registers rather than on the stack, making the difference between passing by value and by reference less clear)
- A constructor with no parameters, which will allocate and assign a new empty string
- A constructor with a “const char *” parameter which will make a copy of the input string and assign it to the “str” member
- A cast operator to “const char *” to be able to access contents within the class in read only mode
This class has several flaws, the easiest to spot of which is the lack of a destructor which will clean up dynamic memory allocated in the constructors, but we will keep it like this to demonstrate what’s going on one step at a time.
Here is a simple main function using our DynamicString class in the most simplistic way: declare it and print it (mostly because in its current state, you can do nothing with it).
int main()
{
    DynamicString str1("hello");
    DynamicString str2;
    printf("str1=<%s>, str2=<%s>\n", (const char *)str1, (const char *)str2);
    return 0;
}
Unsurprisingly, this little program produces the following output:
Which shows we can successfully write a program doing nothing.
The obvious: passing by value and by reference
Let’s now take our DynamicString class, pass instances of it to functions, and check the performance impact of passing the object by value or by reference. We will write two simple functions in another cpp file to keep the optimizer from inlining them:
volatile char c;

void ByValue(DynamicString s)
{
    c = ((const char *)s)[0];
}

void ByRef(const DynamicString &s)
{
    c = ((const char *)s)[0];
}
A couple of quick notes:
- c is declared volatile to avoid smart optimizer tricks (unlikely here because we are in a separate compilation unit, but anyway…)
- we are using the DynamicString’s cast operator to “const char *” to access its contents
Now let’s build a main function to call the two functions and measure performance:
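int main()
{
    DynamicString str1("hello");
    struct timeval before, after; // declared in <sys/time.h>

    // DeltaTime() is a helper not shown here; it is assumed to return
    // the elapsed time between the two samples in seconds.
    gettimeofday(&before, NULL);
    for (int i = 0; i < 100000000; i++)
    {
        ByValue(str1);
    }
    gettimeofday(&after, NULL);
    printf("Time passing by value: %0.6f\n", DeltaTime(before, after));

    gettimeofday(&before, NULL);
    for (int i = 0; i < 100000000; i++)
    {
        ByRef(str1);
    }
    gettimeofday(&after, NULL);
    printf("Time passing by reference: %0.6f\n", DeltaTime(before, after));
    return 0;
}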
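The listing above calls a DeltaTime helper that the article does not show. A minimal sketch, assuming it takes the two timeval samples and returns the elapsed time in seconds as a double (the signature is an assumption, not the author’s code):

```cpp
#include <sys/time.h>

// Hypothetical helper: the article uses DeltaTime() without showing it.
// Assumed here to return the time elapsed between two samples, in seconds.
double DeltaTime(const timeval &before, const timeval &after)
{
    // Whole seconds and microseconds are subtracted separately. The
    // microsecond difference may be negative; the sum is still correct.
    return static_cast<double>(after.tv_sec - before.tv_sec)
         + static_cast<double>(after.tv_usec - before.tv_usec) / 1e6;
}
```

Subtracting the two fields separately covers the case where the microsecond field of the later sample is smaller than that of the earlier one. With a helper along these lines the measurement program can be built as shown.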
This program produces the following output:
So passing by value, as can be expected, is slower than passing by reference. But why? Let’s have a look at how the two functions have been implemented:
_Z7ByValue13DynamicString:
movq 8(%rsp), %rax
movzbl (%rax), %edx
movq c@GOTPCREL(%rip), %rax
movb %dl, (%rax)
ret
_Z5ByRefRK13DynamicString:
movq (%rdi), %rax
movzbl (%rax), %edx
movq c@GOTPCREL(%rip), %rax
movb %dl, (%rax)
ret
Apart from the first instruction, the two functions are identical. That instruction loads the “str” pointer: ByValue reads it from the copy of the object on the stack, while ByRef reads it through the object’s address passed in %rdi.
The real difference is how the two functions get called in the main function.
Here is the code generated to pass str1 by value, effectively making a full copy of str1’s contents onto the stack:
subq $8, %rsp
pushq 56(%rsp)
pushq 56(%rsp)
pushq 56(%rsp)
call _Z7ByValue13DynamicString@PLT
And here is what happens when passing str1 by reference, just loading str1’s address into a register:
movq %rbp, %rdi
call _Z5ByRefRK13DynamicString@PLT
The larger the object’s size, the larger the difference becomes. Let’s expand “useless”:
char useless[1024];
And the code generated by the caller becomes a loop which makes very clear what’s happening under the hood:
subq $1040, %rsp ; Make space in the stack
; for temp variable
movl $129, %ecx ; Load copy counter
movq %rbp, %rsi ; source address: str1 location
movq %rsp, %rdi ; destination address:
; temp variable in the stack
rep movsq ; copy
call _Z7ByValue13DynamicString@PLT ; call
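“rep movsq” is an x86-64 instruction which looks like a single operation, but in fact it copies %rcx quadwords (8 bytes each) from the address in %rsi (source register) to the address in %rdi (destination register). The count is loaded above through %ecx, the lower half of %rcx. The value 129 corresponds to 129 × 8 = 1032 bytes: the 8-byte “str” pointer plus the 1024 bytes of “useless”.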
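To make the copy count concrete, here is a rough C++ rendering of what the instruction does. It is only a sketch: BigLayout, QuadwordCount and CopyQuadwords are hypothetical names introduced for illustration, and the size check assumes a platform with 8-byte pointers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in with the same data members as DynamicString once "useless"
// has been expanded to 1024 bytes (hypothetical name).
struct BigLayout
{
    char *str;
    char useless[1024];
};

// With 8-byte pointers the object is 8 + 1024 = 1032 bytes.
static_assert(sizeof(void *) != 8 || sizeof(BigLayout) == 1032,
              "unexpected layout on a 64-bit target");

// Number of 8-byte quadwords that make up one object of type T.
template <typename T>
constexpr std::size_t QuadwordCount()
{
    return sizeof(T) / sizeof(std::uint64_t);
}

// Rough equivalent of "rep movsq": copy `count` quadwords from src to dst.
void CopyQuadwords(void *dst, const void *src, std::size_t count)
{
    unsigned char *d = static_cast<unsigned char *>(dst);
    const unsigned char *s = static_cast<const unsigned char *>(src);
    for (std::size_t i = 0; i < count; ++i)
    {
        std::memcpy(d, s, sizeof(std::uint64_t)); // one "movsq" step
        d += sizeof(std::uint64_t);
        s += sizeof(std::uint64_t);
    }
}
```

On a 64-bit target, QuadwordCount<BigLayout>() evaluates to 129, which matches the value loaded into the count register in the listing.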
So far we haven’t seen anything really new: any C programmer knows that passing a struct by value is slower than passing it by pointer.
But we will soon find out that C++ has plenty of surprises for us.
Add a destructor and die
As we have discussed before, DynamicString suffers from several issues, among which is the lack of a destructor. As a result, creating and destroying a DynamicString causes a memory leak, because nobody frees the memory allocated by the constructor.
So here’s our destructor:
~DynamicString()
{
    free(str);
}
Yes, yes… there’s no check for str != nullptr, to keep the code simple, especially when viewing the assembly (free() accepts a null pointer anyway).
Running the same program as before gives this very disappointing result:
The problem here is that the compiler will:
- create a copy of str1 into the stack as a temporary variable
- call ByValue
- call the destructor for the temporary variable
- repeat
The temporary variable contains a “dumb” copy of str1, hence it shares the “str” pointer with str1.
Step 3 in the sequence above therefore frees the buffer that is still shared with str1, and the next iteration tries to use and free memory which has already been released.
We then get an error from the runtime library (if we are lucky; in many cases we just get a crash).
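The sharing can be observed directly without triggering the crash. The sketch below uses a hypothetical stand-in class, ShallowString, which relies on the compiler-generated copy and deliberately has no destructor, so the shared buffer is never freed twice:

```cpp
#include <stdlib.h>
#include <string.h>

// Stand-in for DynamicString with no user-defined copy constructor
// (hypothetical name). It has no destructor on purpose, so this sketch
// never frees the shared buffer twice.
struct ShallowString
{
    explicit ShallowString(const char *s) : str(strdup(s)) {}
    char *str;
};

// True when two instances point at the very same heap buffer.
bool SharesBuffer(const ShallowString &a, const ShallowString &b)
{
    return a.str == b.str;
}
```

Copying an instance, for example with `ShallowString copy = original;`, leaves both objects pointing at the same buffer, so SharesBuffer(original, copy) returns true. Add a destructor that frees “str” and both objects will try to release that one buffer.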
Let’s see how calling code has been implemented:
; Load loop counter in r12
; (as "i" is used just for counting
; it's more efficient to start from
; top and count down to zero as
; comparing with zero is faster
; than comparing with constant,
; the optimizer has sorted that out for us)
movl $100000000, %r12d
.L4:
; Destination register for copy:
; address of temp variable in stack
movq %rbx, %rdi
; data size counter
movl $129, %ecx
; source register for copy:
; address of str1
movq %rbp, %rsi
; copy
rep movsq
; pass address of temp variable to ByValue
movq %r13, %rdi
movq %r13, %rbx
; call ByValue
call _Z7ByValue13DynamicString@PLT
; inlined destructor: free "str"
; inside the temp variable which is however
; shared with str1
movq 1072(%rsp), %rdi
call free@PLT
; decrement r12 (which is "i")
subl $1, %r12d
; keep looping until done
jne .L4
The “rep movsq” instruction, which we have already seen, copies str1’s memory image into the temporary variable before the temporary is passed to ByValue. This is the work of the default (implicitly generated) copy constructor: when an instance of a class is initialized from another instance of the same class (as here, where the temporary is initialized from str1), the compiler copies it member by member, which for a class like ours amounts to copying its memory image.
The default copy constructor is a bad idea when the class uses dynamically allocated memory or contains resources which must be claimed and then freed.
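As a hedged aside, the standard std::is_trivially_copyable trait reports whether the language treats a type as copyable as a raw memory image. The sketch below applies it to two hypothetical stand-ins, one without and one with a destructor. The trait does not detect ownership by itself (a class holding a raw owning pointer but no destructor still counts as trivially copyable), so treat it as a hint rather than a safety check.

```cpp
#include <stdlib.h>
#include <type_traits>

// Hypothetical stand-in: same data members as DynamicString, no destructor.
struct PlainString
{
    char *str;
    char useless[16];
};

// Hypothetical stand-in: same data members, plus a destructor that
// releases the buffer (str defaults to null so free() is always safe).
struct OwningString
{
    char *str = nullptr;
    char useless[16];
    ~OwningString() { free(str); }
};

// Without a destructor, the type may be copied as raw bytes.
static_assert(std::is_trivially_copyable<PlainString>::value,
              "PlainString can be copied as a memory image");

// With a user-provided destructor it no longer qualifies, which is a
// reminder that the implicit copy constructor deserves a second look.
static_assert(!std::is_trivially_copyable<OwningString>::value,
              "OwningString is not trivially copyable");
```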
Defining the copy constructor
The solution to this problem is to define our own copy constructor. A copy constructor is just a constructor which accepts a single parameter, a const reference to an object of the same type:
DynamicString(const DynamicString &that)
{
    this->str = strdup(that.str);
    memcpy(this->useless, that.useless, sizeof(useless));
}
In our new copy constructor we duplicate the string of the instance we are copying from, rather than copying the pointer.
Notice that the new copy constructor completely replaces the default copy constructor, so we must ensure that we copy all we need (including the “useless” array in case it’s useful for something) because nobody will do that for us anymore.
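Put together, a class with such a copy constructor might look like the sketch below. DeepString and c_str() are hypothetical names used only here. Disabling copy assignment is an assumption of this sketch and not something the article does: the implicit assignment operator would still copy the raw pointer.

```cpp
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for DynamicString with a deep-copying
// copy constructor and a destructor.
class DeepString
{
public:
    explicit DeepString(const char *s) : str(strdup(s))
    {
        memset(useless, 0, sizeof(useless));
    }

    // Copy constructor: duplicate the buffer and copy every other member.
    DeepString(const DeepString &that) : str(strdup(that.str))
    {
        memcpy(useless, that.useless, sizeof(useless));
    }

    // Assumption of this sketch: assignment is disabled, because the
    // implicit assignment operator would share the pointer again.
    DeepString &operator=(const DeepString &) = delete;

    ~DeepString()
    {
        free(str);
    }

    const char *c_str() const
    {
        return str;
    }

private:
    char *str;
    char useless[16];
};
```

After `DeepString b(a);` the two objects hold equal contents in separate buffers, so each destructor frees its own memory.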
Performance measurements are now as follows:
These are even worse than before, as we now have the additional overhead of strdup / free at each iteration.
Let’s have a quick look at the code generated in the caller (omitting the init / loop parts and focusing on each individual call):
; -------- beginning of inlined copy constructor
; this->str = strdup(that.str);
movq 32(%rsp), %rdi
call strdup@PLT
; memcpy(this->useless, that.useless, sizeof(useless));
movl $128, %ecx
movq %r12, %rdi
movq %rbp, %rsi
rep movsq
; -------- end of inlined copy constructor
movq %r13, %rdi
movq %rax, 1072(%rsp)
call _Z7ByValue13DynamicString@PLT
; -------- beginning of inlined destructor
; free(str);
movq 1072(%rsp), %rdi
call free@PLT
; -------- end of inlined destructor
We can now see that everything is as expected. On each iteration we:
- create a temporary variable in the stack initializing it with the copy constructor which duplicates the string
- call ByValue
- call the destructor for the temporary variable which frees the string
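The sequence above can also be confirmed without reading assembly by counting constructor and destructor calls. The sketch below uses a hypothetical instrumented class, CountedString, and two hypothetical functions that mirror ByValue and ByRef:

```cpp
#include <stdlib.h>
#include <string.h>

// Hypothetical instrumented variant of DynamicString that counts how
// often it is copied and destroyed.
class CountedString
{
public:
    static long copies;
    static long destructions;

    explicit CountedString(const char *s) : str(strdup(s)) {}

    CountedString(const CountedString &that) : str(strdup(that.str))
    {
        ++copies;
    }

    // Assignment is not needed for this sketch, so it is disabled.
    CountedString &operator=(const CountedString &) = delete;

    ~CountedString()
    {
        free(str);
        ++destructions;
    }

    const char *get() const
    {
        return str;
    }

private:
    char *str;
};

long CountedString::copies = 0;
long CountedString::destructions = 0;

// Each call copy-constructs the parameter and later destroys it.
char FirstByValue(CountedString s)
{
    return s.get()[0];
}

// No temporary is created: only the address of the argument is passed.
char FirstByRef(const CountedString &s)
{
    return s.get()[0];
}
```

Calling FirstByValue a thousand times on the same object raises both counters by a thousand, while the same number of FirstByRef calls leaves them unchanged.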
Conclusions (and next articles)
As we have seen, passing objects to functions by value (and, as we will see later, even worse, returning objects from functions by value) is bad for performance. It should be avoided as much as possible by passing objects as const references instead.
Passing objects by value can, in some cases, make a C++ program perform (almost) as slowly as the same program written in C# or Java, where objects are implicitly handled through references, thus feeding the myth that C++ is slow.
There are rare cases, however, where it may be needed: for instance, when passed parameters need to be modified locally without affecting the parameter passed by the caller.
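A small illustration of such a case, using std::string instead of the article’s DynamicString for brevity. Shout is a hypothetical function: it needs a private copy that it can modify freely, so taking the parameter by value is the natural choice.

```cpp
#include <cctype>
#include <string>

// Hypothetical example: the function changes its parameter locally,
// so it takes a copy and leaves the caller's string untouched.
std::string Shout(std::string text)
{
    for (char &ch : text)
    {
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return text;
}
```

Calling Shout on a string holding "hello" returns "HELLO", while the caller’s string still holds "hello".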
In the next articles we will go into more detail about passing and returning objects by value, looking at assignment operators, move assignment operators, and move constructors as techniques to mitigate the impact of temporary object construction and destruction.