Built-in data type

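According to ANSI C, sizeof(short) <= sizeof(int) <= sizeof(long), and the range and precision of float <= double <= long double. Their limits are defined in <limits.h> and <float.h>: INT_MIN, INT_MAX, UINT_MAX, LONG_MIN, LONG_MAX, FLT_MIN, FLT_MAX, DBL_MIN, DBL_MAX, LDBL_MIN, LDBL_MAX. There is no UINT_MIN because the minimum of every unsigned type is 0. FLT_MIN, DBL_MIN and LDBL_MIN are the smallest positive normalized values, not the most negative ones.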
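A minimal sketch of how these limits can be used and checked from C++. The names fits_in_int and check_size_ordering are made up for illustration; they are not part of any library.

```cpp
#include <cassert>
#include <cfloat>
#include <climits>

// Hypothetical helper: true if a long value fits in an int on this platform.
bool fits_in_int(long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}

// Run-time check of the ordering described above.
void check_size_ordering()
{
    assert(sizeof(short) <= sizeof(int));
    assert(sizeof(int) <= sizeof(long));
    assert(FLT_MAX <= DBL_MAX);   // double covers at least the range of float
    assert(DBL_MAX <= LDBL_MAX);  // long double covers at least the range of double
    assert(FLT_MIN > 0);          // smallest positive normalized float
}
```

A range check such as fits_in_int() is one way to guard a narrowing assignment from long to int.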


In C, the 'int' keyword can often be omitted from type declarations. For instance, unsigned var; is shorthand for unsigned int var;. The same applies to casts: if var is a long, (unsigned) var converts it to unsigned int, not to unsigned long, so it is better to state the data type explicitly.
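The difference can be observed directly. This is a small sketch; the function name check_unsigned_shorthand is hypothetical.

```cpp
#include <cassert>
#include <climits>

void check_unsigned_shorthand()
{
    long big = -1L;

    // (unsigned) means (unsigned int), whatever the operand type is.
    assert(sizeof((unsigned)big) == sizeof(unsigned int));
    assert((unsigned)big == UINT_MAX);

    // Spelling the type out gives the conversion that was probably intended.
    assert((unsigned long)big == ULONG_MAX);
}
```

On platforms where long is wider than int, the first cast silently discards the high-order bits.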



C++ has a Boolean data type, bool, with the constants true and false. bool values are implicitly promoted in arithmetic: false becomes 0 and true becomes 1. Arithmetic and pointer values are implicitly converted to bool: 0 or a null pointer becomes false and all other values become true. For clarity and portability, convert integral and pointer values to bool by explicit comparison against 0 and NULL. In C++, NULL is a null pointer constant (traditionally defined as 0), so there is only an aesthetic difference.
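A short sketch of both directions of conversion. The helper names is_nonzero and is_valid are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

// Explicit comparisons instead of relying on implicit conversion to bool.
bool is_nonzero(int n) { return n != 0; }
bool is_valid(const char *p) { return p != NULL; }

void check_bool_conversions()
{
    // bool promotes to int in arithmetic: false => 0, true => 1.
    assert(true + true == 2);
    assert(false + 0 == 0);

    assert(is_nonzero(-5));
    assert(!is_nonzero(0));
    assert(is_valid("text"));
    assert(!is_valid(NULL));
}
```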


A generalized escape sequence takes the form \ooo, where ooo represents a sequence of up to three octal digits. The value of the octal digits is the numerical value of the character in the machine's character set. Examples using the ASCII character set: \7 (bell), \0 (NUL) and \062 ('2').
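The examples can be verified with a few assertions. This sketch assumes an ASCII execution character set; check_octal_escapes is a made-up name.

```cpp
#include <cassert>

void check_octal_escapes()
{
    assert('\7' == 7);      // bell
    assert('\0' == 0);      // NUL
    assert('\062' == '2');  // octal 62 is decimal 50, which is '2' in ASCII
    assert('\062' == 50);

    // At most three octal digits are consumed, so "\0621" is '2' followed by '1'.
    const char s[] = "\0621";
    assert(sizeof(s) == 3); // '2', '1' and the terminating NUL
    assert(s[0] == '2' && s[1] == '1');
}
```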


sizeof('x') == sizeof(int) in C but not in C++, where the type of 'x' is char and sizeof('x') == 1.


Get a char's ASCII value by simply assigning it to an integer variable. The reverse works the same way, but an explicit cast may be needed to avoid warnings from some compilers (ch = (char) i;).
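A small C++ sketch of the last two points, assuming an ASCII character set. The function name check_char_and_int is hypothetical.

```cpp
#include <cassert>

void check_char_and_int()
{
    // In C++ a character literal has type char.
    assert(sizeof('x') == sizeof(char));

    // char => int needs no cast.
    int code = 'A';
    assert(code == 65);

    // int => char: the cast documents the narrowing and silences warnings.
    char ch = static_cast<char>(code + 1);
    assert(ch == 'B');
}
```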


In C, unqualified char may be either signed or unsigned; it is the implementation's choice. This does not matter when processing 7-bit ASCII, but it becomes a problem when a program must handle arbitrary binary data or full 8-bit character sets. The most obvious issue is a look-up table indexed by characters.

For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A WITH ACUTE ACCENT. In the proper locale, isalpha('\341') will be true. But if you read '\341' from a file and store it in a plain char, isalpha(c) may look up character 225, or it may look up character -31. If you are lucky, the ctype table has no entry at offset -31 and the program just crashes; otherwise it quietly returns a wrong answer.

It is wise to use unsigned char everywhere; this avoids all these problems. Unfortunately, the routines in <string.h> take plain char arguments, so you have to remember to cast back and forth, or avoid the strxxx() functions, which is probably a good idea anyway. Also, don't rely on plain char being either sign-extended or zero-extended when it is widened.
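When plain char cannot be avoided, one common workaround is to convert to unsigned char at the point of the call. The sketch below assumes 8-bit chars; is_alpha_safe is a hypothetical wrapper, not a standard function.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical wrapper: the <cctype> functions require a value representable
// as unsigned char (or EOF), so convert before calling.
bool is_alpha_safe(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

void check_char_signedness()
{
    char c = '\341';
    // Whether c is stored as 225 or -31, the converted value is always 225.
    int as_unsigned = static_cast<unsigned char>(c);
    assert(as_unsigned == 225);

    assert(is_alpha_safe('a'));
    assert(!is_alpha_safe('1'));
}
```

Whether is_alpha_safe('\341') returns true still depends on the active locale, so the sketch does not assert it.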


Convert a string to an integer (short/signed/unsigned), double or long type using the atoi(), atof() and atol() functions, which are declared in <stdlib.h>.

Convert an integer (short/signed/unsigned), double or long type to a string using sprintf().
#include <stdio.h>
#include <stdlib.h> /* atoi(), atol(), atof() */
int main(void)
{
  int i;
  double d;
  long l;
  float f;
  char buf[50];
  const char *buf1 = "42";
  char buf2[] = "69.00";

  /* string => number */
  i = atoi(buf1);
  l = atol(buf1);
  d = atof(buf2);

  /* number => string */
  i = 42;
  f = 69.0f;
  d = 105.24;
  l = 23;
  sprintf(buf, "%d", i);
  sprintf(buf, "%f", f);
  sprintf(buf, "%f", d);
  sprintf(buf, "%ld", l);
  sprintf(buf, "%d %f %ld", i, f, l);
  return 0;
}
For C++ use the stringstream or ostringstream class (the strstream class is deprecated).
#include <cstdio>
#include <sstream>
#include <string>
using namespace std;

int main(void)
{
  stringstream ss;
  int i = 42;
  double d = 105.24;

  ss << i << " " << d;

  // Convert to string or char array
  string s = ss.str();
  char buf[50];
  // never pass the converted text as the format string itself
  sprintf(buf, "%s", s.c_str());
  return 0;
}

Convert a string holding a hex, octal or any-other-base value to a number using strtol() or sscanf().
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
  long l; int i;
  unsigned int ui;
  const char *hexstr = "12FC3";
  const char *octstr = "1245";
  const char *binarystr = "1101";

  l = strtol(hexstr, NULL, 16);
  l = strtol(octstr, NULL, 8);
  l = strtol(binarystr, NULL, 2);
  sscanf("12", "%d", &i);
  sscanf("14", "%ld", &l);
  sscanf(hexstr, "%x", &ui);
  sscanf(octstr, "%o", &ui); /* %o expects octal digits, so parse octstr */

  return 0;
}
For C++ use stringstream instead (strstream was deprecated).
#include <sstream>
#include <string>
using namespace std;

int main(void)
{
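  long l; int i;
  const char *hexstr = "12FC3";
  const char *octstr = "1245";
  stringstream ss;

  ss << hex << hexstr; // hex also sets the base used by the extraction below
  ss >> l;
  ss.clear(); // reset eofbit before reusing the stream
  ss << oct << octstr;
  ss >> i;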

  return 0;
}
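The strtol() calls shown earlier pass NULL as the end pointer, so malformed input goes unnoticed. A minimal sketch of a checked conversion follows; parse_long is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses text in the given base.
// Returns false if text is empty, has trailing garbage, or is out of range.
bool parse_long(const char *text, int base, long *out)
{
    assert(text != NULL && out != NULL);

    char *end = NULL;
    errno = 0;
    long value = std::strtol(text, &end, base);

    if (end == text || *end != '\0' || errno == ERANGE)
        return false;

    *out = value;
    return true;
}
```

For example, parse_long("12FC3", 16, &l) stores 77763 in l, while parse_long("12xyz", 10, &l) returns false and leaves l untouched.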
To convert an integer to hexadecimal (hex) or octal (oct), use sprintf().
#include <stdio.h>
int main(void)
{
  int i = 42;
  char buf[50];
  sprintf(buf, "%x", i); /* convert to hex */
  sprintf(buf, "%o", i); /* convert to octal */
  return 0;
}
For C++ use stringstream instead (strstream was deprecated).
#include <sstream>
#include <string>
using namespace std;

int main(void)
{
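  int i = 42;
  string buf; // a string avoids overflowing a fixed-size char array
  stringstream ss;
  ss << hex << i;
  ss >> buf; // "2a"

  ss.clear(); // reset eofbit before reusing the stream
  ss << oct << i;
  ss >> buf; // "52"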
  return 0;
}

A bitvector is an object used as a discrete collection of single bits. It is a compact way of keeping flag information about a set of items or conditions. The iostream library uses bitvectors to represent its format state, such as whether integral output should be displayed in decimal, hexadecimal or octal. There are two approaches to supporting bitvectors:
  1. C way: use an unsigned built-in integral type as the bit container and manipulate it with the bitwise operators. For the bitwise XOR (exclusive or) operator (^), the result bit is 1 if either but not both operand bits are 1; otherwise it is 0.
    bitvec |= 1u << bitpos; // turn on bitpos
    bitvec &= ~(1u << bitpos); // turn off bitpos
    bitvec ^= (1u << bitpos); // flip bitpos
    bool isOn = (bitvec & (1u << bitpos)) != 0; // test bitpos
    
  2. C++ way: use the bitset class, which supports the bitvector abstraction (in <bitset>).
    bitset<32> bitvec1; // 32-bit bitvector (bits 0 to 31), all initialized to zero
    bitset<32> bitvec2(0xffff); // low-order bits initialized from the unsigned argument
    bitset<32> bitvec3(string("1010")); // from a string of zeros and ones
    string sbit("1111110101100011010101");
    bitset<32> bitvec4(sbit, 6); // start at string position 6: 0101100011010101
    bitset<32> bitvec5(sbit, 6, 4); // length of 4: 0101
    string bitval(bitvec1.to_string());
    unsigned long bitvec = bitvec1.to_ulong(); // throws overflow_error if the value does not fit
    
    bitvec1.set(bitpos); // turn-on bitpos
    // or
    bitvec1[bitpos] = 1;
    bitvec1.reset(bitpos); // turn-off bitpos
    // or
    bitvec1[bitpos] = 0;
    bitvec1.set(); // turn-on all bits
    bitvec1.reset(); // turn-off all bits
    bitvec1.flip(bitpos); // flip bitpos
    // or
    bitvec1[bitpos].flip();
    bitvec1.flip(); // flip all bits
    bool isOn = bitvec1.test(bitpos); // test bitpos
    bitset<32> bitvec7 = bitvec1 & bitvec2; // support bitwise operators
    
    // extra features
    bool isAnyBitOn = bitvec1.any();
    bool isAllBitOff = bitvec1.none();
    size_t onBitNum = bitvec1.count(); // count() and size() return size_t
    size_t size = bitvec1.size();
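The two approaches can be compared side by side. This is a small self-checking sketch; check_bitvectors is a made-up name, and the expected values follow from the string shown above.

```cpp
#include <bitset>
#include <cassert>
#include <string>

void check_bitvectors()
{
    // Substring construction: positions are counted from the left of the string.
    std::string sbit("1111110101100011010101");
    std::bitset<32> from6(sbit, 6);    // "0101100011010101"
    std::bitset<32> four(sbit, 6, 4);  // "0101"
    assert(from6.to_ulong() == 0x58D5UL);
    assert(four.to_ulong() == 5UL);

    // C++ way.
    std::bitset<32> bits;
    bits.set(3);
    bits.flip(0);
    assert(bits.to_ulong() == 9UL);
    assert(bits.count() == 2);
    assert(bits.test(3) && !bits.test(1));

    // C way, same operations on an unsigned container.
    unsigned int vec = 0;
    vec |= 1u << 3;
    vec ^= 1u << 0;
    assert(vec == bits.to_ulong());

    bits.reset(3);
    vec &= ~(1u << 3);
    assert(bits.to_ulong() == 1UL && vec == 1u);
}
```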
    

Arithmetic conversions ensure that the two operands of a binary operator, such as addition or multiplication, are converted to a common type, which is also the type of the result. The two general guidelines are the following:
  1. Types are always promoted, if necessary, to a wider type in order to prevent any loss of precision.
  2. All arithmetic expressions involving integral types smaller than int are promoted to int before evaluation.
The rules form a hierarchy of type conversions that begins with the widest type, long double.
  1. If one operand is of type long double, then the other is converted to type long double regardless of what the second type is.
    3.14159L + 'a'; // 'a' => long double
    
  2. Otherwise, if one operand is of type double, the other is converted to type double.
    int i;
    float f;
    double d;
    d + f + i; // f and i => double prior to addition
    
  3. If neither is of type double and if one operand is of type float, then the other is converted to type float.
    char c;
    int i;
    float f;
    c + f + i; // i and c => float prior to addition
    
  4. Otherwise, because neither operand is of any of the three floating point types, both must be of some integral type. Before the common target type is determined, a process called integral promotion is applied to all integral types smaller than int.
    1. Types char, signed char, unsigned char and short int are promoted to type int.
    2. unsigned short int is converted to type int if the int type on the machine is large enough to represent all the values of the unsigned short (usually this happens if the short is represented as a half word and the int as a word); otherwise, it is promoted to type unsigned int.
    3. wchar_t and enumeration types are promoted to the first of int, unsigned int, long and unsigned long that can represent all the values of their underlying type.
      enum status { bad, ok };
      // the values fit in a char, so char may be the underlying type of this enumeration => promoted to int
      
    char c;
    bool b;
    enum E { e1, e2, e3 } e;
    unsigned long ul;
    c + ul; ul + b; e + ul; // c, b and e => int by integral promotion, then => unsigned long
    
  5. If one operand is of type long and the other is of type unsigned int, then the unsigned int is converted to type long only if type long on the machine is large enough to represent all the values of the unsigned int (usually this is not true on a 32-bit system in which long and int are both one word wide); otherwise, both are converted to type unsigned long.
  6. Otherwise, if neither is of type long and if one operand is of type unsigned int, the other is converted to type unsigned int. Otherwise, both operands must be of type int.
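Some of these rules can be observed from a program. In the sketch below, sizeof is used as a rough proxy for the result type, which is an assumption that only distinguishes types of different sizes; check_arithmetic_conversions is a made-up name.

```cpp
#include <cassert>
#include <climits>

void check_arithmetic_conversions()
{
    // Integral promotion: char + char is evaluated as int, so 200 is not truncated.
    char a = 100, b = 100;
    assert(sizeof(a + b) == sizeof(int));
    assert(a + b == 200);

    // int and unsigned int: the int operand is converted to unsigned int.
    int neg = -1;
    unsigned int one = 1;
    assert(!(neg < one)); // -1 becomes UINT_MAX before the comparison
    assert(static_cast<unsigned int>(neg) == UINT_MAX);

    // Floating point operands pull the other operand up to their type.
    float f = 1.5f;
    int i = 2;
    double d = 0.25;
    assert(sizeof(f + i) == sizeof(float));
    assert(sizeof(d + f + i) == sizeof(double));
    assert(sizeof(3.14159L + 'a') == sizeof(long double));
    assert(f + i == 3.5f);
}
```

The comparison of neg and one is the classic pitfall: most compilers can warn about it when sign-comparison warnings are enabled.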

What does this mean?
  x = (T)y;
It depends on the type T and the types of x and y. T could be the name of a class, a typedef, or maybe a template parameter. Maybe x and y are scalar variables and (T) represents a value conversion. Maybe x is of a class derived from y's class and (T) is a downcast. Maybe x and y are unrelated pointer types. Because the C-style cast (T) can be used to express many logically different operations, the compiler has only the barest chance to catch misuses. For the same reason, a programmer may not know exactly what a cast does. This is sometimes considered an advantage by novice programmers and is a source of subtle errors when the novice guessed wrong.
int a = 7;
double* p1 = (double*) &a; // ok (but &a doesn't point to a double)
double* p2 = static_cast<double*>(&a); // error
double* p3 = reinterpret_cast<double*>(&a); // ok: really mean it

const int c = 7;
int* q1 = &c; // error
int* q2 = (int*)&c; // ok (but *q2=2; is still invalid code and may fail)
int* q3 = static_cast<int*>(&c); // error: static_cast doesn't cast away const
int* q4 = const_cast<int*>(&c);	// ok: really mean it
Reasons for introducing the new-style casts:
  1. To give programmers a chance to state their intentions more clearly and to let the compiler catch more errors.
  2. Conversions allowed by static_cast are somewhat less likely to lead to errors than those that require reinterpret_cast. In principle, it is possible to use the result of a static_cast without casting it back to its original type, whereas you should always cast the result of a reinterpret_cast back to its original type before using it, to ensure portability.
  3. New-style casts match the template notation, so programmers can write their own casts, especially run-time checked casts.
  4. C-style casts are very hard to spot in a program. This near-invisibility of C-style casts is especially unfortunate because they are so potentially damaging. An ugly operation should have an ugly syntactic form. That observation was part of the reason for choosing the syntax for the new-style casts.
  5. Because a new-style cast is so ugly and relatively hard to type, programmers are more likely to think twice before using one. Casts are best avoided.
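The round-trip advice in item 2 can be sketched as follows. This is only an illustration under the assumption that the pointers refer to a live int object; check_cast_round_trip is a hypothetical name.

```cpp
#include <cassert>

void check_cast_round_trip()
{
    int value = 7;

    // static_cast: void* back to the pointer type the object really has.
    void *pv = &value;
    int *pi = static_cast<int*>(pv);
    assert(pi == &value && *pi == 7);

    // reinterpret_cast: use the result only after casting back to the original type.
    char *raw = reinterpret_cast<char*>(&value);
    int *back = reinterpret_cast<int*>(raw);
    assert(back == &value && *back == 7);
}
```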
The general form of the explicit cast is
cast-name<type>(expression);
where type is the target type of the conversion and expression is the value to be cast. cast-name is one of
  1. static_cast - normal cast.
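    double d = 65.7;
    int i = 0;
    void *pv = &i;
    char *pc = static_cast<char*>(pv); // void* => char*
    i += static_cast<int>(d); // double => int, fraction discarded
    char ch = static_cast<char>(d); // double => char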
    

    An assignment of a larger arithmetic type to a smaller type almost always results in a compiler-generated warning about a potential loss of precision. When the explicit cast is provided, the warning is turned off. The cast informs both the compiler and the reader of the program that we are aware of and are not concerned with the potential loss of precision.

    Potentially dangerous static casts:

    1. a void* pointer to some explicit pointer type
    2. an arithmetic value to an enumeration type
    3. a base class to a derived class (or a pointer or reference to such classes).
    Their correctness depends on the value that happens to be contained within the object at the point at which the conversion takes place.

  2. const_cast - casts away the const-ness of its expression, and also the volatility of a volatile object. Attempting to cast away const-ness with any of the other three forms results in a compile-time error. Similarly, it is a compile-time error to use const_cast to perform a general type conversion.
    char *string_copy(char*);
    const char *pstr;
    char *pc = string_copy(const_cast<char*>(pstr));
    
  3. reinterpret_cast - generally performs a low-level reinterpretation of the bit pattern of its operand; its correctness depends in large part on the active management of the programmer.
    complex<double> *pcom;
    char *pc = reinterpret_cast<char*>(pcom); // ok, but don't forget pc still addresses a complex object
    string str(pc); // no error or warning, but using a complex object as a C string is undefined
    
    C++ introduced the named cast operators to highlight this paradox, given the practical infeasibility of disallowing explicit casts themselves.
    char *pc = (char*) pcom; // C-style equivalent of the reinterpret_cast, more difficult to track
    
  4. dynamic_cast - tries to cast a pointer or reference to a class into a pointer or reference to another class in the same class hierarchy at run-time, normally for safe downcasting from a base class to a derived class. It performs two operations at once:
    1. Verify that the requested cast is valid at run-time.
    2. Perform the cast only if it is valid. If it fails,
      • For a pointer type, the result is a null pointer (0).
      • For a reference type, a bad_cast exception is thrown.
    class A {
    public:
      virtual void Run(void);
    };
    class B : public A {
    public:
      void Run(void);
    };
    class C : public A {
    public:
      void Run(void);
      bool Walk(void);
    };
    
    // D is assumed to be some other class that declares Exercise()
    void D::Exercise(A *pa)
    {
      // a declaration is allowed in the condition; pc is null if the cast fails
      if (C *pc = dynamic_cast<C*>(pa))
      {
        // can call pc->Walk()
      }
      else
      {
        // can only call Run() virtual function
        // B::Run() if pa == &b; C::Run() if pa == &c; otherwise A::Run()
      }
    }
    // or
    #include <typeinfo> // for std::bad_cast
    void D::Exercise(A& a)
    {
      try {
        C& c = dynamic_cast<C&>(a);
        // can call c.Walk()
      }
      catch (const std::bad_cast&) {
        // can only call Run() virtual function
        // B::Run() if a refers to b; C::Run() if a refers to c; otherwise A::Run()
      }
    }
    

    Manipulating objects of a derived class type through pointers to the base class type is usually handled automatically by virtual functions. In some situations, however, the use of virtual functions is not possible. It is then necessary to obtain a pointer or reference to the derived class in order to use some detail of the derived class that is otherwise not available. This mechanism is more error prone than virtual member functions and should be used with caution. The result of a dynamic_cast on a pointer must always be tested to verify that the cast was successful before the resulting pointer is used.

    dynamic_cast of pointer vs reference:
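    A minimal sketch contrasting the two forms. The Shape, Circle and Square classes and the helper functions are hypothetical names chosen for this example; they are not part of the hierarchy used earlier.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical hierarchy; a virtual function makes the classes polymorphic,
// which dynamic_cast requires.
struct Shape { virtual ~Shape() {} };
struct Circle : Shape { double radius; Circle() : radius(1.0) {} };
struct Square : Shape {};

// Pointer form: a failed cast yields a null pointer, so test the result.
bool is_circle_ptr(Shape *s)
{
    return dynamic_cast<Circle*>(s) != 0;
}

// Reference form: there is no null reference, so a failed cast throws std::bad_cast.
bool is_circle_ref(Shape &s)
{
    try {
        Circle &c = dynamic_cast<Circle&>(s);
        return c.radius > 0.0;
    }
    catch (const std::bad_cast&) {
        return false;
    }
}

void check_dynamic_cast()
{
    Circle circle;
    Square square;
    assert(is_circle_ptr(&circle) && !is_circle_ptr(&square));
    assert(is_circle_ref(circle) && !is_circle_ref(square));
}
```

    The pointer form suits cases where failure is an expected outcome; the reference form suits cases where failure indicates a programming error.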