|
Data in Excel worksheets |
||
Note: This document is under construction. Aug 19, 2005.
This document describes how data is stored in the cells of an Excel worksheet, and how it can be displayed. Problems with numeric precision, how large a number can be, and how much text can be stored and displayed are discussed. Number formatting which controls how numbers are displayed, is discussed.
Introduction
There are four data types that Excel uses to store information in the cells of a worksheet. There is also cell formatting, which controls the appearance of the data in the cell.
Data Types
Excel internally stores cell data using one of four data types. The term type is a programming term that refers to how data is stored. It defines how the bytes of memory are used to hold the data, and what kind of data can be stored. Excel has four data types to store anything you might put in a cell, the number data type, the text data type. the logical (also called boolean) data type, and the error data type. Most of the stuff you'll put cells will use the number or text data type.
Generally, Excel automatically chooses the data type, depending on what you've typed. If you type a number (including dates and times, dollar amounts, percentages and others), it will use the number type. If you type anything else (except a formula), anything Excel doesn't recognize as a number, it will use the text type. You can override that automatic typing for some situations.
Cell formatting
Cell formatting is a cell property that is used to determine the display characteristics of the data in the cell. It applies primarily to number types. The table below shows some examples of formatting that's automatically applied to a cell when you've typed various things. For example, a date format causes a date-time serial number to appear as a date, such as Jan 1, 2005 for 38353. See "Dates and times" below. One could argue that this makes the cell a date type. Since dates use the same internal number storage type as does any other number, I am keeping cell formatting completely separate from data types in this document.
Here are some examples of data entered into a cell, what will be stored, the data type, format code automatically applied to the cell, and the resulting display:
| Typed in: | Stored in cell: | Type | Format code (1) | Displayed in cell |
| 23462-4779 | 23462-4779 | Text | General | 23462-4779 |
| Name | Name | Text | General | Name |
| John Smith | John Smith | Text | General | John Smith |
| 123 Dutch Elm St. | 123 Dutch Elm St. | Text | General | 123 Dutch Elm St. |
| Dept 1 | Dept 1 | Text | General | Dept 1 |
| 23462 | 23462 | Number | General | 23462 |
| 23,462 | 23462 | Number | #,##0 | 23,462 |
| 2.3462E+04 | 23462 | Number | 0.00E+00 | 2.35E+04 |
| 1.95 | 1.95 | Number | General | 1.95 |
| $1.95 | 1.95 | Number | $#,##0.00_) | $1.95 |
| 15% | .015 | Number | 0% | 15% |
| 1/1/2005 | 38353 | Number | m/d/yyyy | 1/1/2005 |
| 0.5 | .5 | Number | # ?/? | 1/2 |
(1) Format - Cells - Number tab - Custom. See Help for "Number format codes."
If you've formatted the cells for text (Format - Cells - Number tab - Text) prior to typing the data, the text data type will be used, even if it looks like a number to Excel. More later. However, if you've formatted the cell for, e.g.: Number with two decimal places (Format - Cells - Number tab - Number), that number formatting will be used only when a number is in the cell. If text is entered, the text type is used to store the data, and the number formatting remains in the cell, but does nothing. Type a number into the cell later, and the number formatting would be used for the display of the number.
Number data type
The number data type us used to store numbers in cells. Internally, the number is stored in eight bytes, in binary, with some of those 8 bytes for the mantissa, and some for the characteristic for any of these numbers. It uses eight bytes of memory, regardless of the number, large or small. Excel uses the IEEE-754 standard for double-precision (64 bits) which defines how the 8 bytes are used to store the number. You can read more about it in any of several web pages. Search for IEEE-754.
Number format.
1234, 1,234 1.95, $1.95, 1.95%, 1.95E+00, dates, fractions, are all stored in cells using the number type. The difference in these is the number formatting (Format - Cells - Number tab) of the cell. The number formatting simply determines how the number is displayed in the cell. If you remove the number formatting (Format - Cells - Number tab - General), you'll see the underlying number in its raw form, displayed in decimal.
Value range.
The largest number that can be stored is >= 1.79768E+308. Put enough zeroes after 1.79768 to move the decimal point 308 places to the right, and you'll realize you have one huge number. It's way up in the gadzillions. The smallest number is <= 2.2250E-308. This is equivalent to a decimal point, followed by a string of 307 zeroes, then 22250. This is one tiny tiny number. The number of bits of the 8 bytes used for the characteristic determines this value range.1
Precision.
Regardless of the magnitude of the number, you can only use 15 decimal digits (other than zeroes before or after the number) -- 15 significant digits. This is because of the precision limit. This limit has to do with the number of bits of the 8 bytes used to store the mantissa part of the number. Note that both the characteristic and mantissa are stored in binary -- these are approximate decimal equivalents.
So you might store these two numbers:
123456789012345 0.123456789012345
Each number has 15 significant digits, though one is a lot larger in value than the other. You can't store the following as numbers, because of the 16th significant digit (the 6 at the right):
1234567890123456 0.1234567890123456
If you type either into a cell, the rightmost digit is lopped off. You will have lost some precision.
Putting too-large numbers (> 15 digits) into cells:
Some numbers, like credit card numbers, have too many decimal digits for the number type, but will still be stored as numbers if simply typed into a cell. So they will be truncated -- digits beyond the 15th will be truncated. Lopped off. Since you don't need to perform math on such numbers, you don't really need to attempt to store it as a number -- you can format the cell for text before you type the number. You're forcing Excel to use the text type to store the "number." Now you can have as many digits as you need. Well, almost, since even the text type has limits. Note that changing the formatting to text after having entered the number doesn't work -- the digits have already been lost.
Conversion and accuracy issues
Since we type our numbers in decimal, but Excel stores them in binary, we can lose accuracy when some numbers are typed into a cell. This happens when they're converted to binary, to be stored using the number type. Some numbers would require more binary digits in the mantissa than exist in the number type. And some numbers (like 1/3) simply have no binary equivalent. Heck, 1/3 doesn't have a decimal equivalent either, so we can't type it in decimal. But we can put it in as a fraction, 1/3. Excel will use all the binary digits in the mantissa, but still won't have exactly 1/3 (just like when you write .33333333333 -- you can go on forever, but you can never exactly specify exactly 1/3 in decimal).
Also, a formula result must be converted to decimal for display, and it may not be possible to specify the result precisely, for the same reason.
Example:
A2 contains 0.5
A3 contains 0.4
A4 contains 0.1
A5 contains = (A1 + A2 + A3) * 1
This ought to be zero, since .5 - .4 - .1 is zero. But instead, evaluates to .0000000000000000277556. Close, but not zero. So the following formula would incorrectly return "not equal.":
=IF( A5 = 0, "Equal", "Not equal")
To fix this, you can use the ROUND function to round the value to some particular precision:
=IF(ROUND(A5, 15) = 0, "Equal", "Not equal")
The rounding can introduce errors, but we already know we're dealing with inaccuracies; we're just trying to get them under control. They're very very small.
Another way to handle conversion issues is to use Tools - Options - Calculation - Precision as displayed (checked). This causes Excel to use the numbers as they appear in the cells, rather than the numbers stored in the cells. For example, if a cell is formatted for two decimal places, and contains 1.268, it will appear in the cell as 1.27. Formatting uses rounding to give the closest representation of the number with the digits that the formatting specifies. Normally, a formula with a reference to this cell would get the actual number, 1.268, but with the Precision as Displayed option, it will get the displayed number, 1.27.
Dates and times
Dates and times are stored in cells using the number type. The date Jan 1, 1900 is stored as 1. Any other (subsequent) date is stored as the number of days since that date, inclusive. Jan 2, 1900 is stored as 2. Jan 1, 2005 is stored as 38353. They're simply day numbers since Jan 1, 1900, inclusive. Setting a date number format in the cell causes the number to be displayed to look like a date, in any of several date formats (Jan 1, 1900, 01/01/1900, etc.). The underlying number is the same regardless of the date format used.
Since dates are stored in units of days, it follows that a fraction would be the time of the day. 0.25 is one quarter of a day, or 6:00 A.M. So 38353.25 is Jan 1, 2005 at 6:00 A.M. The time number format causes the time to be displayed in any of several ways.
This method of representing dates and times allows for simple math to be performed on them. If you add 1 to a date (= A2 + 1), you have the next day. If you add 90, you have 90 days hence. You don't have to worry about "thirty days hath September" and all that. The date formatting takes care of properly showing the calendar date for the result. If you subtract two dates, you have a simple difference in days for the result. In this case, Excel may choose to automatically format the result as a date, which is meaningless, since you want a plain number. So change the number formatting to something other than date.
All this is true of times too.
Chip Pearson's site has more on date and time formats. www.cpearson.com
Formulas and data
Excel formulas calculate based on the stored value, not the way it looks as formatted. For example, if a cell contains 2.678, but is formatted for two decimals, it will display 2.68. Any formula referring to the cell will get the true value in the cell, 2.68, unless you use Tools - Options - Precision as displayed, in which case the formula will get the displayed value, 2.68.
Other numbers
Currency, percentages, fractions, are stored with the number type. Again, the formatting determines how the number is displayed. See the table above for some examples of various number types.
Left Zeroes (postal zip codes).
Left zeroes have caused considerable consternation with Excel users. Left zeroes have no numeric significance, and are not stored. For example, you enter 01234, but Excel simply stores the raw number, 1234. Formatting can be used to force the display of left zeroes, but it remains that the number data type simply isn't concerned with left zeroes, since they don't have any effect on the value.
Postal Zip codes don't work well with the number type, unless you have only 5-digit zip codes. In that case left zeroes will not display, even if they've been keyed in. The cell can be formatted to display them (Format - Cells - Number tab - Custom - 00000). . It's best to format the Zip Code column for text before you've entered or imported the zip codes. That way, the left zeroes will be included (though they must be typed in). Another benefit is that sorting will work correctly when you have a mix of 5-digit and 9-digit zip codes.
Left zeroes and text files.
Similarly, when text files, which have no formatting information -- they're just text characters, contain such numbers, Excel uses the number type (unless otherwise set up in advance), and the left zeroes are not shown.
See "Text files and Excel" in this site for more information.
A common problem occurs when a sheet is saved as a text file, then reopened in Excel. If the left zeroes were displayed in the sheet, they will be written to the text file (you can open it with a text editor like Notepad to confirm this). But when the text file is opened in Excel, the number type is used (unless it's been set up in advance for text), and the left zeroes are lost. This leads a user to suspect the left zeroes were not saved in the text file.
Text data type
The text data type, also called strings in many programming languages, is used for, well, just that -- strings of characters. All the alphabetic characters a-z, A-Z, the decimal digits 0-9, and the special characters !@#$%, etc. can be stored in this data type. When you type something not recognized as a number or a formula (starts with the = character), it will be stored using the text data type. Dates, percentages, currency amounts, though they look like text because of the ordinary alphabetic characters and special characters ($, %, etc.), are stored as numbers. Excel often automatically applies the appropriate cell formatting to such numbers. You can force the text type by preceding the entry with an apostrophe, or by formatting the cell for text (Format - Cells - Number tab - Text) before you enter the data.
A text cell can contain 32,768 (2^15) characters, though it will display only 1024 of them in a cell. All 32,767 will display in the formula bar (though it'll likely be an unmanageable mess with so much data). You can get around the 1024 character cell display limitation by inserting carriage return characters (entered by using Alt-Enter) in the text stream.
Unless specifically formatted otherwise, text cells will be left-aligned, and number cells will use right-aligned. The alignment formatting must be General - not Left, Center or Right, since those formats force the alignment regardless of cell contents. You can determine the alignment formatting by looking at the alignment (Left, Center, Right, Justify) buttons on the Formatting Toolbar with the cell selected. If none is active, then it's General alignment (or look at Format - Cells - Alignment), and the above applies. Note that if a particular alignment format has been applied, such as left alignment, then the contents of the cell will be left-aligned regardless of the data type.
Forcing the text type (credit card numbers, zip codes, etc)
You can force the text data type by formatting the cell for text before you enter the data. Format - Cells - Number - Text. Or you can precede each entry with an apostrophe ('). Using text allows for more than 15 numeric digits, as explained above in "Putting too-large numbers" above. It also makes for better sorting of postal zip codes where some are 5 digits, and some are 9 digits. And it allows for typing a leading zero (which will be ignored if it's allowed to default to the number type).
Boolean (logical) data type
A logical or boolean expression is one that evaluates to TRUE or FALSE. Consider the following expression:
"pigs" = "fly"
Since "pigs" does not equal "fly," this expression always evaluates to FALSE. Now this one:
"pigs" = "pigs"
This evaluates to TRUE. Put the expression in a formula, and you'll see the resulting TRUE or FALSE. Note that the first = character indicates to Excel that this is a formula. The second indicates a boolean expression.
= "pigs" = "fly" You'll see the resulting FALSE boolean value displayed in the cell.
= "pigs" = "pigs" You'll see the resulting TRUE boolean value displayed in the cell.
In the following formula, if the contents of A1 is 1, then the formula above yields TRUE, otherwise FALSE.
= A1 = 1
Here are some other logical expressions:
|
A1<>"" |
(A1 isn't blank) |
| A1 > 0 | (A1 isn't 0 or negative) |
| AND(A1 >=1, A1 <=10) | (A1 is between 1 and 10, inclusive) |
IF functions
The IF() function uses logical expressions, and returns one value if the expression yields TRUE, and another if FALSE.
=IF( A1 = 1, "A1 contains 1", "A1 doesn't contain 1" )
The first argument, A1 = 1, can be any expression that yields TRUE or FALSE. The IF function yields its second argument (as separated by commas), "A1 contains 1" if TRUE, or the third, "A1 doesn't contain 1" if FALSE.
If you test a number as a logical expression, 0 will be equivalent to FALSE, and any other value will be equivalent to TRUE:
=IF( A1, "yes", "no")
IF A1 contains 0, the IF will yield "yes" -- any other number will yield "no." An empty cell will yield "no," and a text cell will cause an error.
Be careful with mathematical results. Since we put decimal numbers into Excel, and get decimal values back, but they're converted to/from binary, you don't always get exactly what you expect. A mathematical expression might presumably yield 0, but not quite, causing TRUE to be returned when FALSE is expected.
Error data type
The error data type results when an error occurs evaluating the contents of a cell
|
= 1 / 0 |
DIV/0! |
| A1 > 0 | (A1 isn't 0 or negative) |
| AND(A1 >1, A1 <=10) | (A1 is between 1 and 10, inclusive) |
| Cell contents: | Display | Code returned by ERROR.TYPE() |
| #NULL! | 1 | |
| =1/0 | #DIV/0! | 2 |
| #VALUE! | 3 | |
| damaged formula that refers to nonexistent cell (off the sheet, or deleted) | #REF! | 4 |
| =Boo (no name "Boo" defined) | #NAME? | 5 |
| #NUM! | 6 | |
| #N/A | 7 | |
| Anything else | #N/A |
Error types vs strings:
An error type is not the same as a string. Consider this situation.
A1: = 1/0 displayed: #DIV/0!
This error type is not testable as text. The following formula will not correctly indicate that the divide-by-0 error has occured:
=IF( A1 = "#DIV/0!", "Division error" , "OK")
But this one will indicate that an error exists:
=ISERROR(A1) Result: TRUE
Also, you can use the ERROR.TYPE() function to determine the type of error that exists.
References:
1. http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html