In the world of software development, a code smell is a readily identified surface indication that often corresponds to a deeper problem in a system. Code smells are not bugs but can reveal a codebase that is fragile or may be difficult to maintain in the long term. With a little bit of practice code smells, which have entertaining names like “feature envy” and “shotgun surgery”, are generally easy to spot but understanding how to fix smelly code can take a lot of knowledge and experience.
Spreadsheets can be an incredibly powerful tool for quickly testing concepts, analysing data and visualising outputs, but their ubiquity and apparent ease of use often means they get taken beyond their point of efficiency. The following “spreadsheet smells”, which you may even find combining in the same spreadsheet, are intended to help you quickly identify when things might be getting out of control and present a few possible solutions.
Because spreadsheet programs are kind of a “Swiss army knife” of tools, including everything from data operations and advanced mathematical functions to charting, it is dangerously easy to muddle input data, processing, analysis and outputs within the same spreadsheet. Although this flexibility makes spreadsheets so powerful, it can become nearly impossible to update data or unpick problems in a complex spreadsheet. Examples of this smell can include business processes or calculation “tools” which are implemented in spreadsheets.
Having a very clear workflow from input data through calculations and intermediate data to outputs makes any process a lot more legible. Not only is it quicker to find and fix errors, but changes or data updates are a lot simpler to implement. There are a wide range of potential options available for better separation of concerns, which may be suitable alone or in combination depending upon the specific use case:
Although Excel will let you create a spreadsheet with a million rows, sixteen thousand columns and as many sheets as your computer’s memory can handle, you probably shouldn’t… There is no hard and fast rule for when a spreadsheet is too bloated, but there does come a point where Excel will take forever to open, slow to a crawl on any change and the file will become increasingly difficult to share.
Migrate your data into a relational database. These are specifically designed to handle huge amounts of data and perform highly optimised queries using a Structured Query Language (SQL). Getting started with databases does require a bit of learning and set up but in the end promotes a much stronger separation of concerns. Databases broadly fall into two categories, and picking the appropriate one depends on the use case:
Databases focus on data storage, and SQL enables queries and simple calculations on the data, but typically the presentation of this data will require something like Power BI or a web application.
Because spreadsheets are so ubiquitous and portable it is almost inevitable that version management becomes an issue: the file gets emailed and changed by the recipient or someone already has it open when you need it and suddenly there are a raft of versions. In larger projects identifying the appropriate version and reconciling differences can become a major headache and trying to find the correct version a few months down the track can be almost impossible.
Defining strict file naming conventions can help in some instances, although in practice these are typically difficult to adhere to and don’t solve issues around multi-user access and reconciliation. Cloud document management tools like Google Drive, Dropbox or OneDrive enable shared editing and all provide some form of revision control, although I’ve rarely seen this used well. Additionally, they are still prone to people creating new versions, presenting the same reconciliation problems.
If concerns are well separated and some form of code is used for the analysis/calculation, distributed software version control tools like Git can make version management easier. Git uses a “branching” model that simplifies workflows when multiple people are working on the same project, clarifying the merging process, it also stores a permanent record of every single change committed to the codebase.
Short formulae in Excel can be useful – once you match up the colours and cells you can usually tell what is going on without too much head scratching. However, once a formula gets beyond a certain size it can become extremely difficult to decipher, especially in confounded spreadsheets without a good process flow. You will see this smell exhibiting itself through length of formulae, number of formulae, formulae pointing all over the place and deeply nested logic (such as chained IF, AND, OR statements).
Spreadsheet formulae are easy to create, but can quickly get out of hand, becoming difficult to debug and test.
Sometimes a spreadsheet with fiendish formulae can be simplified by breaking parts of the formula into smaller sequential calculation steps, each with a clearly labelled column, although this can introduce new problems if it results in an inappropriate number of columns or sheets. The best solution is typically to migrate the calculations to something like Python scripts, using spreadsheets as data input/output only (or, if applicable, a database). Python scripts can make long or complex calculations a lot more legible, as it is much easier to visually step through the calculation process, with the added advantage that scripts can readily be version controlled. The Python ecosystem also provides great tools for testing your code, so you can ensure that each of your calculation steps is behaving as it should and have confidence that you aren’t introducing new bugs as you modify it.
Tests in Python help give you confidence your code is doing what you expect.
Although mapping technology has become increasingly accessible, spatial data and the tools and techniques used to solve various spatial problems are still a relatively specialised domain. Recent versions of Excel have incorporated the ability to show maps and plot spatial data, but caution should be exercised in doing much more than simple visualisations. You may see this smell exhibited in a few ways:
For the examples above:
Whatever your spreadsheet woes it is always worth re-evaluating what you are doing. Is a spreadsheet still working for you or can you detect some of these smells? There are an array of tools and software that can help make sense of data. Do you need a web application to help guide a data flow or enforce data standards? Contact Stacy Rendall to discuss options with our software team. Would visualising your data help? Contact Fiona Mohan for spatial analyses. Do you need to move large quantities of data into a single source and manipulate it from there? Contact Gavin Jeter for specialised FME support. Or just ring up for a chat – your spreadsheet might just be fine as is with a little bit of tweaking from our experts.