10  All about spreadsheets

A spreadsheet is a digital tool for organising, analysing, and storing data in tables, originally developed as an electronic version of paper accounting worksheets. It allows users to enter numerical or textual data and formulas that reference other cells, enabling dynamic calculations.

Spreadsheets are interactive with users able to modify values and observe immediate changes in calculated results, facilitating “what-if” analysis. Beyond basic arithmetic, spreadsheets offer financial, statistical, and conditional functions, enhancing their analytical capabilities. Composed of rows (numbered) and columns (labelled with letters), cells are referenced by their alphanumeric coordinates (e.g., A1). Modern spreadsheet applications include multiple worksheets within a workbook, allowing for complex data management. It can also display data graphically, aiding in understanding trends and patterns.

Spreadsheets have revolutionised business processes by replacing manual systems, offering versatility across various applications where tabular data is essential. Their dynamic cell referencing system allows for efficient and flexible data manipulation, making them indispensable in both professional and personal contexts.

10.1 History of spreadsheets

10.1.1 Paper spreadsheet

The concept of organising data into tabular formats dates back to ancient times, with examples such as Babylonian clay tablets from 1800 BCE as shown in Figure 10.1. In accounting, the term “spread sheet” was used by at least 1906 to describe a grid system in ledgers1.

Figure 10.1: A Babylonian clay tablet believed to have been written around 1800 BC containing mathematical table written in cuneiform script

Before digital spreadsheets, “spread” referred to large, two-page layouts in publications. The evolution of the term “spread-sheet” reflects its transition from physical, oversized ledger pages with examples shown in Figure 10.2, Figure 10.3 to the digital tool we use today, emphasising their role in accounting and data organisation.

Figure 10.2: Accounting ledger from 1911
Figure 10.3: Manifest of passengers of the Titanic

10.1.2 Electronic spreadsheet

Figure 10.4 shows key development milestones of the electronic spreadsheet.

Figure 10.4: Timeline of development of the electronic spreadsheet

10.2 Spreadsheets as databases

Spreadsheets and databases share similarities but are fundamentally different. A spreadsheet is essentially a single table, while a database consists of multiple tables with machine-readable relational structures. Although a spreadsheet workbook containing multiple sheets has interacting tables, it lacks the relational complexity of a database. Spreadsheets and databases are interoperable, however, with spreadsheets being able to be converted into database tables, and database queries being able to be exported to spreadsheets for analysis.

10.3 Spreadsheets as multi-function tools

Spreadsheets are widely used software tools for every step of the data pathway - data entry, storage, analysis, and visualisation. Spreadsheets are able to implement such functionality through an end-user development approach.

10.3.1 End-user development

Spreadsheets are designed as end-user development (EUD) tools. EUD refers to techniques in which non-professional developers are able to create automated tasks and complex data objects without in-depth knowledge of a programming language. Many find using spreadsheets for calculations and analysis much easier. This most likely stems from the following key features:

Ease of use

Spreadsheets leverage spatial relationships, making it intuitive to establish program connections, unlike sequential programming which requires extensive text.

Forgiving nature

Partial results and functions can operate even if other parts are incomplete or contain errors, simplifying the development process.

Visual enhancements

Modern spreadsheets use colours, fonts, and lines to provide visual cues, aiding comprehension and organisation.

Advanced functionality

Extensions enable users to create complex functions and integrate machine learning models, expanding their capabilities beyond basic calculations.

Versatility

Beyond numerical data, spreadsheets support Boolean logic, graphical design, and even SQL queries through relational data storage and formula-based expressions.

In essence, spreadsheets offer a flexible, powerful platform that caters to diverse tasks, making them an invaluable tool for many users despite their limitations compared to traditional programming environments.

10.3.2 Limitations and shortcomings of spreadsheets

Unfortunately, the same multi-functionality and features that make spreadsheets user-friendly and easily accessible to most also make them fragile, non-robust, and prone to causing/producing errors.

In order to be able to function as a tool for the various steps in the data pathway while still being user-friendly meant combining the data interface functionality (data storage and data access) with the programming/scripting capabilities (for data cleaning/processing, analysis, and visualisation) into a single graphical user interface with no clear distinction between them and no clear mechanism for programming/script testing. This brings about the following limitations and shortcomings:

Lack of standard mechanisms for management and quality assurance of spreadsheets produced by organisations

Given that data storage and access along with data processing/cleaning, analysis and visualisation capabilities sit side-by-side within the spreadsheet tool/software, developing routine and automated audit mechanisms for both data validity/quality and accuracy/correctness of data processing, analysis, and visualisation is nearly impossible. These audits will need to be done manually and line-by-line making them highly onerous. This is most likely the reason why a survey conducted in 2011 of nearly 1,500 people in the UK saw 72% reporting that no internal department checks their spreadsheets for accuracy, that only 13% said that internal audit reviews their spreadsheets, while a mere 1% receive checks from their risk department (“Spreadsheet Risk Management Within UK Organisations,” n.d.).

Reliability issues

An estimated 1% of all formulas in operational spreadsheets are in error (S. Powell et al., 2009).

Practical expressiveness is limited

Whilst the graphical user interface of a spreadsheet using its cell-at-a-time approach is accessible and user-friendly for most users and for simple data operations, applying the same to a complex data model requiring more complicated calculations require tedious attention to detail. Users will tend to have difficulty remembering the meanings of hundreds or thousands of alphanumeric cell addresses that appear in per cell formulas.

Formulas expressed in terms of cell addresses are hard to keep straight and hard to audit

A research paper critically reviewing spreadsheet errors has shown that auditors who check both numerical results and the cell formulas find no more errors than auditors who only check numerical results (S. G. Powell et al., 2008). By the nature of the cell-at-a-time approach, spreadsheets typically contain many copies of the same formula. Thus, when a formula needs to be edited, these edits will need to be applied to all cells containing that formula. This is in sharp contrast to a well-known principle in programming - do not repeat yourself or DRY - which emphasises the best practice of not repeating code to implement/achieve the same process or output. The DRY approach makes code implementation much more efficient and code auditing much more feasible and robust.

Maintenance of volumes of spreadsheets is challenging

Creating and managing a system to maintain vast amounts of spreadsheets for an individual or for an organisation is a challenging endeavour. Without built-in functionalities for proper security, version control and audit trails, and prevention of unintentional introduction of errors, it is more likely that management of spreadsheets end up being ad hoc, non-systematic, and tedious to implement.




  1. “We maintain, in our general ledger, a so-called Spread Sheet which is a long sheet with the name of each individual plant in a particular column.” (from Middleton, 1933, p. 763)↩︎