C++: lvalue/rvalue for Complete Dummies

I find the concepts of lvalue and rvalue probably the hardest to understand in C++, especially after a break from the language, even one of just a few months. So this is an attempt to keep my memory fresh whenever I need to come back to it. This topic is also essential for understanding move semantics.

Essentials

lvalue

There are plenty of resources, such as the value categories page on cppreference, but they are lengthy and slow to digest. In general, an lvalue:

  • Usually appears on the left-hand side of an assignment, and that’s where the name comes from - “left-value”.
  • Refers to a specific memory location, whether on the heap or the stack, so it is addressable.
  • Is what a variable name denotes, and variables usually appear on the left of an assignment. For example, in int x = 1; x is an lvalue. x refers to the memory location where the value 1 is stored.
    • Another example is int* y = &x. In this case y is an lvalue as well. x is also an lvalue, but &x is not!
  • Was originally defined as a “value that is suitable for the left-hand side of an assignment”, but that changed in later versions of the language. For instance, const int a = 1; declares an lvalue a that obviously cannot be assigned to, so the definition had to be adjusted. Later you’ll see this causes other confusion!

Some people say “lvalue” comes from “locator value” i.e. an object that occupies some identifiable location in memory (i.e. has an address).

More lvalue examples

  • Name of a variable, as above.
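
A few more expressions that are lvalues, checked at compile time. This is only a sketch: it relies on the fact that decltype((expr)) yields a T& exactly when expr is an lvalue, it needs C++17, and the names counter, values, first_element and assign_through_lvalues are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

int counter = 0;
int values[3] = {10, 20, 30};

// Returning a reference makes the call expression itself an lvalue.
int& first_element() { return values[0]; }

// decltype((expr)) is an lvalue reference type exactly when expr is an lvalue.
static_assert(std::is_lvalue_reference_v<decltype((counter))>, "variable name");
static_assert(std::is_lvalue_reference_v<decltype((values[1]))>, "array element");
static_assert(std::is_lvalue_reference_v<decltype((first_element()))>, "call returning int&");
static_assert(std::is_lvalue_reference_v<decltype((*&counter))>, "dereferenced pointer");
static_assert(std::is_lvalue_reference_v<decltype(("hello"))>, "string literal");

// Every non-const lvalue above can be assigned to and can have its address taken.
int assign_through_lvalues() {
    counter = 1;
    values[1] = 2;
    first_element() = 3;   // writes to values[0]
    int* p = &counter;     // taking the address of an lvalue
    *p += 10;              // counter is now 11
    return counter + values[1] + values[0];
}
```

Note that a string literal is an lvalue too (it is an array living in static storage), which is a common surprise.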

rvalue

  • An rvalue is something that doesn’t refer to an addressable location. The name comes from “right-value” because it usually appears on the right-hand side of an assignment.
  • It is generally short-lived. rvalues are sometimes called “disposable objects”: no one needs to care about them.
  • An rvalue is like a “thing” which is contained in an lvalue.
  • Sometimes rvalue is defined by exclusion - everything that is not an lvalue is an rvalue.
  • An lvalue can be implicitly converted to an rvalue (its value is read), but never the other way around.
  • An rvalue can be moved around cheaply.

Coming back to the expression int x = 1;:

  • x is an lvalue (as we know). It’s long-lived, and it refers to the memory location where 1 is stored. The value of x is 1.
  • 1 is an rvalue. It doesn’t refer to any location, and it’s contained within the lvalue x.

Here is some silly code that doesn’t compile:

int x;
1 = x; // error: expression must be a modifiable lvalue

which starts making a bit more sense - the compiler tells us that 1 is not a “modifiable lvalue” - yes, it’s an “rvalue”. In C++, the left operand of an assignment must be an “lvalue”. And now I understand what that means.

A definition like “a + operator takes two rvalues and returns an rvalue” should also start making sense.
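
To make that concrete, here is a small sketch (C++17) that checks the claim with the compiler. It assumes the usual trick that decltype((expr)) is a non-reference type exactly when expr is a prvalue; is_prvalue and add_twice are names invented for this example.

```cpp
#include <cassert>
#include <type_traits>

// decltype((expr)) is a plain (non-reference) type exactly when expr is a prvalue.
template <class T>
constexpr bool is_prvalue = !std::is_reference_v<T>;

int add_twice() {
    int a = 2;
    int b = 3;

    static_assert(!is_prvalue<decltype((a))>, "a is an lvalue");
    static_assert(is_prvalue<decltype((a + b))>, "a + b is an rvalue");
    static_assert(is_prvalue<decltype((42))>, "a literal is an rvalue");

    // a and b are lvalues; '+' reads their values (lvalue-to-rvalue
    // conversion) and produces a brand new temporary value.
    int sum = a + b;

    // (a + b) = 1;        // error: the result of '+' is not assignable
    // int* p = &(a + b);  // error: cannot take the address of an rvalue

    return sum + (a + b);
}
```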

“A useful heuristic to determine whether an expression is an lvalue is to ask if you can take its address. If you can, it typically is. If you can’t, it’s usually an rvalue.”

  • Effective Modern C++

References

Are references lvalues or rvalues? The general rule is:

non-const lvalue references can only be bound to lvalues, not rvalues

and a reference that has a name is itself an lvalue expression, whatever kind of reference it is.

In general, there are three kinds of references (collectively they are all just called references):

  1. non-const lvalue references - refer to objects that we want to change.
  2. const references - refer to objects we do not want to change.
  3. rvalue references - refer to objects we do not want to preserve after we have used them, like temporary objects. This kind of reference is the least obvious to grasp from the name alone.

The first two are both lvalue references, and the last one is an rvalue reference.
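
One way to see the three kinds side by side is to overload a function on each of them and watch which overload the compiler picks. This is just an illustrative sketch; kind and the kind_of_* helpers are hypothetical names.

```cpp
#include <cassert>
#include <string>

// One overload per kind of reference.
std::string kind(int&)       { return "lvalue reference"; }
std::string kind(const int&) { return "const lvalue reference"; }
std::string kind(int&&)      { return "rvalue reference"; }

// A modifiable variable prefers int&.
std::string kind_of_variable() { int x = 1; return kind(x); }

// A const variable can only bind to const int&.
std::string kind_of_constant() { const int c = 1; return kind(c); }

// A literal is an rvalue: int&& is preferred over const int&.
std::string kind_of_literal() { return kind(1); }
```

If the int&& overload were removed, kind(1) would still compile and fall back to the const int& overload.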

lvalue references are really obvious and everyone has used them - X& means reference to X. It’s like a pointer that cannot be screwed up and needs no special dereferencing syntax. Not much to add.

One odd thing is taking address of a reference:

int i = 1;
int& ii = i;     // reference to i
int* ip = &i;    // pointer to i
int* iip = &ii;  // also a pointer to i, equivalent to the previous line

Basically, we cannot take the address of a reference itself; attempting to do so yields the address of the object the reference refers to.
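
A tiny check of this behaviour, as a sketch (the function names are made up): the address obtained through the reference compares equal to the address of the original object, and a write through the reference lands in that object.

```cpp
#include <cassert>

// &ii and &i are the same address: the reference is just another name for i.
bool reference_shares_address() {
    int i = 1;
    int& ii = i;
    return &ii == &i;
}

// Writing through the reference changes the original object.
int write_through_reference() {
    int i = 1;
    int& ii = i;
    ii = 42;
    return i;
}
```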

Another weird thing about references. To initialise a reference to type T (T&) we need an lvalue of type T, but to initialise a const T& there is no need for an lvalue, or even for type T! When the initialiser of a const reference is not an lvalue of type T, the following process takes place:

  1. Implicit type conversion to T if necessary.
  2. Resulting value is placed in a temporary variable of type T.
  3. Temporary variable is used as a value for an initialiser.

To demonstrate:

int& a = 1;        // does not compile, lvalue required
const int& b = 1;  // absolutely fine
const int& c {1};  // same as the line above; brace syntax is preferred in modern C++
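
The hidden temporary can actually be observed. In the sketch below (function names are hypothetical), a const double& initialised from an int binds to a temporary double, so later changes to the int are not visible through it, while a const int& bound to the same int is a plain alias.

```cpp
#include <cassert>

double read_after_source_changes() {
    int i = 1;
    const double& d = i;  // int -> double conversion: d binds to a hidden temporary
    i = 2;                // changes i, not the temporary
    return d;             // still the converted value of the old i
}

int read_alias_after_source_changes() {
    int i = 1;
    const int& r = i;     // same type and an lvalue: r binds directly to i
    i = 2;
    return r;             // sees the new value
}
```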

Now it’s time for a more interesting case - rvalue references. Going by the definitions above, an rvalue usually means a temporary, the result of an expression, something on the right-hand side, and so on. Rvalue references are designed to refer to a temporary object that the user can, and most probably will, modify, and that will never be used again.

A classic thing for an rvalue reference to bind to is a function’s return value: the returned object, typically built from a local variable of the function, is a temporary that nobody else will use after the call. The binding rule is the exact opposite of the one for non-const lvalue references:

an rvalue reference can bind to an rvalue, but never to an lvalue

An rvalue reference uses the && (double ampersand) syntax. Some examples:

string get_some_string();
string ls {"Temporary"};

string&& s1 = get_some_string();  // fine - binds the returned temporary (an rvalue) to an rvalue reference
string&& s2 { ls };               // fails - trying to bind an lvalue (ls) to an rvalue reference
string&& s3 { "Temporary" };      // fine - binds to a temporary string created from the literal
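
One subtlety worth a sketch: once an rvalue reference has a name, the name itself is an lvalue, so it cannot be bound to another rvalue reference without std::move. The helper make_greeting and the function extend_temporary are invented for this example; C++17 is assumed.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

std::string make_greeting() { return "hello"; }  // hypothetical helper

std::string extend_temporary() {
    // The temporary returned by make_greeting() lives as long as s does.
    std::string&& s = make_greeting();
    s += ", world";  // the temporary can be modified through s

    // s has a name, so the expression 's' is an lvalue.
    static_assert(std::is_lvalue_reference_v<decltype((s))>,
                  "a named rvalue reference is an lvalue");

    // std::string&& t = s;          // error: cannot bind an lvalue
    std::string&& t = std::move(s);  // fine: std::move(s) is an rvalue; nothing is moved yet

    return t;  // t and s refer to the same string
}
```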

You could also think of rvalue references as a destructive read - the object that is read from is dead afterwards. This is great for optimisations that would otherwise require a copy constructor.
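
To illustrate the “destructive read”, here is a minimal sketch of a class with both a copy and a move constructor. Buffer is a hypothetical class written for this post, not something from the standard library: its move constructor steals the storage and leaves the source empty, while its copy constructor has to allocate and duplicate.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }

    // Copy: allocate new storage and duplicate every element.
    Buffer(const Buffer& other) : data_(new int[other.size_]), size_(other.size_) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }

    // Move: steal the pointer and leave the source empty (the destructive read).
    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    // Assignment is left out to keep the sketch short.
    Buffer& operator=(const Buffer&) = delete;
    Buffer& operator=(Buffer&&) = delete;

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    int* data_;
    std::size_t size_;
};

bool move_steals_storage() {
    Buffer a(1000);
    const int* original = a.data();
    Buffer b(std::move(a));  // std::move(a) is an rvalue, so Buffer(Buffer&&) is chosen
    return b.data() == original && b.size() == 1000 &&
           a.data() == nullptr && a.size() == 0;
}

bool copy_duplicates_storage() {
    Buffer a(4);
    Buffer b(a);             // a is an lvalue, so Buffer(const Buffer&) is chosen
    return b.data() != a.data() && b.size() == 4 && a.size() == 4;
}
```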

Going Deeper

Newer versions of C++ are much more advanced, and therefore matters are more complicated. Generally you won’t need to know more than lvalue/rvalue, but if you want to go deeper, here you are.

So, there are two properties that matter for an object when it comes to addressing, copying, and moving:

  1. Has Identity (I). The program has the name of, pointer to, or reference to the object so that it is possible to determine if two objects are the same, whether the value of the object has changed, etc.
  2. Moveable (M). The object may be moved from (i.e., we are allowed to move its value to another location and leave the object in a valid but unspecified state, rather than copying).

Cool thing is, three out of the four combinations of these properties are needed to precisely describe the C++ language rules! The fourth combination - no identity and not movable - is useless. Now we can put it in a nice diagram:

[Diagram: value categories arranged by identity (I) and movability (M)]

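So, a classical lvalue is something that has an identity and cannot be moved from, and a classical rvalue is anything that we are allowed to move from. The others are advanced edge cases:

  • prvalue is a pure rvalue: it can be moved from but has no identity.
  • glvalue is a generalised lvalue: anything that has an identity.
  • xvalue is an “expiring” value (the x is also glossed as “extraordinary” or “expert only”, so the naming is quite imaginative): it has an identity and can be moved from. It’s rare.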
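
The three primary categories can be told apart with decltype. This is a sketch of a commonly used trick, not an official facility: decltype((expr)) gives T& for an lvalue, T&& for an xvalue and plain T for a prvalue. The variable template category, the macro VALUE_CATEGORY and the helper functions are all names invented here; C++17 is assumed.

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Map the type produced by decltype((expr)) to the name of a value category.
template <class T> constexpr std::string_view category      = "prvalue";
template <class T> constexpr std::string_view category<T&>  = "lvalue";
template <class T> constexpr std::string_view category<T&&> = "xvalue";

// The double parentheses matter: decltype((x)) inspects the expression x,
// whereas decltype(x) would give the declared type of the variable.
#define VALUE_CATEGORY(expr) category<decltype((expr))>

int global_value = 0;

std::string_view category_of_name() { return VALUE_CATEGORY(global_value); }
std::string_view category_of_sum()  { return VALUE_CATEGORY(global_value + 1); }
std::string_view category_of_move() { return VALUE_CATEGORY(std::move(global_value)); }
```

In the terms of the list above, “glvalue” covers the results "lvalue" and "xvalue", and “rvalue” covers "xvalue" and "prvalue".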

The expression std::move(x) is an xvalue, like in the following example:

void do_something(vector<string>& v1)
{
    vector<string> v2 = std::move(v1);  // move-constructs v2 from v1
}

The expression std::move(v1) both has an identity, as it refers to the same object as v1, and may be moved from (that is what std::move permits). It’s still really unclear in my opinion, a real head-scratcher I might investigate later.
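
What helped me a bit: std::move itself does not move anything, it is only a cast to an rvalue reference. The move happens when a move constructor (or move assignment) receives that xvalue. A small sketch, with made-up function names and C++17 assumed:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Binding the result of std::move to a reference moves nothing.
bool cast_alone_moves_nothing() {
    std::vector<std::string> v1{"a", "b", "c"};
    static_assert(std::is_rvalue_reference_v<decltype(std::move(v1))>,
                  "std::move(v1) has type vector&&");
    std::vector<std::string>&& ref = std::move(v1);  // just another name for v1
    return &ref == &v1 && v1.size() == 3;
}

// Initialising a new vector from the xvalue runs the move constructor.
bool move_constructor_takes_contents() {
    std::vector<std::string> v1{"a", "b", "c"};
    std::vector<std::string> v2 = std::move(v1);
    // v1 is still a valid object, but its contents should no longer be relied on.
    return v2.size() == 3 && v2[0] == "a";
}
```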

But the statement below is very important and very true:

For practical programming, thinking in terms of rvalue and lvalue is usually sufficient. Note that every expression is either an lvalue or an rvalue, but not both.

  • The C++ Programming Language

Reference Geek-Out

If you take a reference to a reference to a type, do you get a reference to that type or a reference to a reference to a type? And what kind of reference, lvalue or rvalue? And what about a reference to a reference to a reference to a type?

using rr_i = int&&;      // rvalue reference
using lr_i = int&;       // lvalue reference

using rr_rr_i = rr_i&&;  // "int && &&" is an int&&
using lr_rr_i = rr_i&;   // "int && &" is an int&
using rr_lr_i = lr_i&&;  // "int & &&" is an int&
using lr_lr_i = lr_i&;   // "int & &" is an int&

Meaning the rule is simple - lvalue always wins! This is also known as reference collapsing.
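
The collapsing rules can be verified with static_assert. The sketch below (C++17) also shows the place where collapsing is most visible in practice, a function template taking T&&; binds_as_lvalue is a hypothetical name chosen for this example.

```cpp
#include <cassert>
#include <type_traits>

using lr_i = int&;
using rr_i = int&&;

static_assert(std::is_same_v<rr_i&&, int&&>, "&& on && stays &&");
static_assert(std::is_same_v<rr_i&,  int&>,  "& on && collapses to &");
static_assert(std::is_same_v<lr_i&&, int&>,  "&& on & collapses to &");
static_assert(std::is_same_v<lr_i&,  int&>,  "& on & stays &");

// For an lvalue argument T is deduced as int&, so T&& collapses to int&.
// For an rvalue argument T is deduced as int, so T&& is int&&.
template <class T>
constexpr bool binds_as_lvalue(T&&) {
    return std::is_lvalue_reference_v<T&&>;
}
```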

What is int*&? It’s a reference to a pointer.
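
A reference to a pointer is handy when a function needs to change where the caller’s pointer points. A minimal sketch with invented names:

```cpp
#include <cassert>

int first = 1;
int second = 2;

// p is a reference to the caller's pointer, so assigning to p re-points it.
void point_to_second(int*& p) { p = &second; }

int value_after_repoint() {
    int* p = &first;
    point_to_second(p);  // p now points to second
    return *p;
}
```

With a plain int* parameter the function would only change its own copy of the pointer.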


2021-12-04