Home

Excel Truths

 

This is some basic information to consider before you undertake certain tasks

Formatting as data (sorting on color, etc.).

There's been a lot of the use of formatting, such as color, to store information lately, then attempting to do things like sorting, summarizing, etc.,  based on color. For example red for store 1, green for store 2, etc.  Note that by formatting, I mean cell formatting, such as color, font, borders, etc., not sheet layout, as the term formatting is sometimes also used. 

This isn't really the way Excel was designed. Formatting is really "output only" -- not intended to be inputted by the user, then operated on as data. It is possible to test the color (or other formatting), then act on it (sort, summarize, etc.)  but it requires macro programming.  It's better to put the information (a code of some kind, or something like "Store 1," "Store 2"...) into a column in some way.  Then use that information to cause a formatting change of color, such as with Conditional Formatting, or a Worksheet_Change sub, if more than three colors are needed. Now the cell can be easily tested, instead of trying to test the formatting. Formatting should not be manually applied to a cell to store any kind of information that will have to be acted on in any way.

Using color to indicate information is a very effective way to render information.  The point here is that it should not be only the color to indicate information.

Data across multiple sheets

Much of Excel's functionality isn't available for data that's spread across multiple sheets, as well as across workbooks. Questions abound where users already have data in separate sheets, and now want to find certain data, summarize the data, etc. There are no direct means to do that.

If the layout of the data in the sheets will be the same (same column headings), it is generally best to put all the data in a single sheet (a single table, in database parlance), with an additional column for what originally was the various sheets. For example, if you have a sheet for each month, instead put all the data in a single sheet, with an additional column for month. An Autofilter can easily reduce this consolidated sheet to the equivalent of one of the original (month) sheets, as well as other useful querying.  For totals (SUM, COUNT, etc.) the SUBTOTAL function will give the totals for the filtered (visible) rows.  The filtered sheet will look pretty much like the original separate sheet did.  Now with this consolidated data, you can do many useful things across all the data, such as sort in various useful ways, use Data - Subtotals, easily make a pivot table to summarize the data, use database functions (DSUM, COUNTIF, etc.). Strongly recommended.

If the separate sheets already exist, it's a straightforward one-time project to combine them. Just make a sheet with the extra (month) column. Now paste the records from the first sheet, and enter Jan into the new month column and copy down with the fill handle or copy/paste. Repeat for the other sheets. Then get rid of them.

TIP:  When using AutoFilter, it's useful to put your sheet headings, totals (use the SUBTOTAL function to show results for only the visible (filtered) rows), column headings, etc., in a few rows at the top, then freeze those rows with Window - Freeze Panes.  Now the user can select the set of data he wants, using the AutoFilter.  The SUBTOTAL functions will give results for only the rows being shown, and will remain on the screen as he scrolls down through the rows.

Take a look at Consolidated Filtered Table with Sort.xls for an example.

Database

Excel is a excellent data analysis tool, but simply isn't a data base program.  It can do OK for a single table data base.  You can sort, summarize, subtotal (pivot tables, data base functions, or Subtotals), and filter (Excel's way of saying "query").

But when you need more than one table, it simply falls apart. Here's what you need to consider when planning a data base. Rather than getting into normal forms for databases, and all that stuff you might read about that may not be very helpful, consider the following rules: A table should be about some kind of thing (vehicles in a fleet, employees in an organization, etc.).  Each record should be about one of those things. One record per thing. Every field in the table should be clearly about that thing. That's a pretty unbreakable set of simple rules you should keep clearly in mind.

For example, you might have a Customer table, with one record per customer. There might be a date field for when that customer became a customer (information about that customer). But if you were keeping track of customer payments (where you don't know how many payments you might ever have for a given customer), you wouldn't try to put payment fields in the Customer table, but instead would have a Payment Table. Every payment from any customer would go into that one table, as a separate record, and a field in that table would indicate the customer for that payment (information about the payment, not the customer). So for this database application, you need two tables.

This is where a database program like Access will handle your data much more easily than in Excel. It would be well worth getting a good book on Access.  Access takes care of the relationships between the tables effortlessly (Relationships), making the Customer record, and each of the Payment record(s) for that customer look like single big records, if you've set up the tables properly. This is done with a query in Access, joining the tables, and it's done.  With Excel, it can be done with VLOOKUP formulas, but isn't straightforward.  It must be done from the "many" table (the Payment table in this example), not the "one" table (Customers).  Keep this in mind when you undertake to store data with Excel.

Dates and times

See Chip Pearson's site for the skinny on working with dates and times in XL. The same questions get asked again and again. Required reading before you post date or time questions. Good stuff.
http://www.cpearson.com/excel/datetime.htm#AddingTimes

In a nutshell, a date-time number is a plain number that starts at 1 with for Jan 1, 1900, 2 for Jan 2, 1900, etc., and keeps going. At Jan 1, 1901 it was 367. At Jan 1, 2000 it was at 36,526. It's like the Energizer bunny. It just keeps going.  The date-time number format makes it look like 1/1/2000, Jan 1, 2000, January 1, 2000 or pretty much whatever you want, but it's still the same number underneath the formatting. You change the date formatting with Format - Cells - Number - Date or Time. 

The time is the fractional part. 6:00 AM is 0.25 (a quarter of the day). So Jan 1, 2000 at six in the morning is 36526.25 Again the date-time format makes it look like 01/01/2000 6:25 A.M. etc.

This is why you can do simple math on dates and/or times. If you add 1 to a date, it's the next day's date.  If you add 0.5, you've added 12 hours.  Always check the number formatting (Format, Cells, Number tab) if the answer doesn't look right.

Numbers in a formula

You may be tempted to put numbers in a formula, like =A2 * 21%, where 21% is a tax rate or something like that.  This isn't good programming design.  Good computer programmers call these "magic numbers" and avoid them.   Should you have to change that value, you'll have to change all of your formulas.  If you miss one, you'll likely have errors in your results.  If you do a wholesale replacement with Edit - Replace, you might inadvertently replace a value that shouldn't have been replaced.  And anyone looking at the formula later has to guess what the value means, since it isn't labeled. 

It's far better to put that number in a cell, and label it (in an adjacent cell), and/or give it a name (a range name, via Insert - Name - Define).  Now use that cell or defined name in your formula.  Now that value lives in only one place, and you have only one place to change it when you do maintenance.  And it makes for clearer formulas, as anyone looking at the formula doesn't have to guess what the number is.

=A2 * D20                 where D20 is a labeled cell containing the value

or

=A2 * TaxRate           where TaxRate is a defined name.

Running total in a cell using a circular reference

It's possible to have a cell keep a running total of numbers entered into another cell, but not a good idea

You do it like this:
A1: a number
A2: =A1+A2

You must use Tools - Options - Calculation - Iterations checked, and Iterations: 1.  Any number entered into A1 gets immediately summed with the current total in A2. Enter a negative number equal to the current total to reset the accumulator to zero.

The reasons this is not a good solution are many.  You have no audit trail of the numbers that have been summed. The formula cell just eats them up. How do you know if a particular entry was ever made or not? Also, if Undo is used, the total doesn't get backed down, it goes up.  Any recalculation causes the number to be summed again. It's much better to list the numbers, and total with with a SUM() formula.

Sorting in Excel

There are occasional horror stories of a table getting hopelessly mixed up because of faulty sorting with Excel. There are improved warnings in the sort commands of Excel's newer releases.  But some awareness of how sorting is done is a good thing. 

You can (1) use Data - Sort, which gives you more control over the sorting, or (2) sort with the Sort buttons in the Standard Toolbar.

First, the case of Data - Sort.  You need to:  (1) ensure that the entire range is selected first, (2) ensure that Excel knows which column on which it's to be sorted, and (3) ensure that it knows whether that your first row is a header (which should stay at the top), or a regular record which should be sorted. For the selection part, you can either select one cell, letting Excel expand the selection, or manually select the entire range.  If you're clicking just one cell then there can be no blank rows or columns within your table (they will stop the selection expansion), and there should be no cells immediately adjoining the table that aren't part of it.   Having column headers will prevent columns with no data from limiting the selection column-wise, even if a data column was otherwise empty.. You still must not have a completely blank row, or the selection will end there. It's always safer to select the entire range, rather than let excel expand from one cell. If you select more than one cell, that's the selection -- it won't expand it -- that's all that will be sorted.  Select one cell and let it expand, or select the entire range -- nothing else.  And whatever you do, don't select just a column (unless you want to sort only that column). 

You can select to the bottom of the worksheet which will accommodate future records. This is particularly useful if you're using a range name for quick selection. Not so good if there are totals at the end of your rows though.

If you're using the toolbar sort buttons, the range expansion from the single cell you've selected will tell it the range to be sorted, and will work OK, as long as the conditions above are met. The column in which you've selected the cell tells Excel to sort on that column. For Excel to know that your header row is in fact a header row, have it formatted differently (bold, different font, etc).

For the safest sorting:

1. Use column headers.
1. Select the entire range manually (or via a range name).
3. Use Data, Sort, rather than the toolbar sort buttons.

Never select a column, then click a Sort toolbar button unless you meant to sort only that column.  Excel will sort only that column, leaving the remaining columns where they were, and your table is ruined. This is the "Sort of Disaster," and has caused grief when later discovered. Sorts are undoable, by the way.

Sorting dates in Excel

True dates in Excel are numbers, not text. They will sort in date order (year, then month, then date). But if they're text, they'll sort aphanumerically, not by date. Imported data is often text. To determine whether they're dates or text, do Format, Cells, Number. If they're set as a Date format, change it to another, like from mm/dd/yy to mmm dd, yyyy if they still appear in the original date format, they're not dates, and will have to be converted to true dates.

Comments about this document?  Please feel free to write me.  My email address is on the Home page.