Easing the transition from spreadsheet to database
Most small businesses use an Excel spreadsheet at some point in order to handle the information they find most useful for their particular business. It may be as simple as a list of part numbers and prices or a list of contacts with names, phone numbers and addresses. It could also be as complicated as several thousand records to track some kind of statistics. Whichever way the information is organized, people start looking for a better way to store and analyze their data when the collection gets big enough. Transferring all this data to a database seems like a simple enough evolution; just copy the table from Excel into the database and hey, presto! It’s done, right?
Not so fast. What about all the metadata the database engine needs to operate properly? What about unique identifiers and foreign key relationships? For example, there are data types, character length limits and columns/table names. These are some (but by no means all) of the components that must be considered when developing the database.
If the data in the spreadsheets are not organized properly and there aren’t rules established, the cost and time of development could increase drastically. With a list of names, for example, you would design the table to have a first, last, middle and maybe even a suffix. But if the spreadsheet simply has a list of names all in one column, it can take hours to either fix the spreadsheet to make it easier to load or to develop the select statement necessary to parse out the specific pieces of the name. And if the spreadsheet has one record that says “Smith, John” and another record that says “Jane Doe” and another that says “Jones, David S. JR” then there will necessarily be much more manual work to straighten out the spreadsheet before it can ever be loaded into the database with any certainty of quality. A human can easily look at those names and tell which is which, but a computer is not capable of disseminating that kind of information without a great deal of help from a developer. The simple act of establishing rules for the data at the outset can reduce the cost and time of development dramatically.
Sean Caswell, Associate Consultant