Concerns about the longevity of web-based publications, and digital editions specifically, are widespread and not exaggerated. For example, a recent survey completed by The Endings Project documented the situation in the field of digital humanities with a focus on digital editions (Arneil). Varied approaches to preservation have been tried at numerous institutions, and a common finding is that sustainability is more likely if designed from the beginning (Neuefeind). Most reasonable researchers would probably admit that “I want it all, I want it now, and I want it forever” is an unrealistic expectation. Nevertheless, research (including that referenced above and others referenced in those publications) suggests researchers do not have a sense of the degree to which “all,” “now,” and “forever” are likely in tension with each other, let alone how to balance them, especially in the case where long-term resources for maintenance are virtually zero. The focus in this paper is on digital editions (what Jennifer Edmond and Francesca Morselli call a “narrow” vision of scholarship) rather than inherently dynamic products (e.g., interactive platforms or sites processing real-time data feeds), as digital editions are common and allow me to make my points within the space constraints of this article.
Infrastructure can be thought of as any combination of resources (people, money, technology, policies) affecting the development and deployment of a project. The complexity of the infrastructure is not necessarily correlated with the conceptual or academic complexity of the project. For example, in many digital edition projects, the essential features can be provided by static HTML pages made in advance rather than pages created dynamically by an environment such as WordPress. Infrastructure that optimizes maintainability, especially over time and with minimal resources available, tends to be simpler and thus less vulnerable to failure.
In this paper I will discuss how infrastructure relates to productivity and maintainability for various aspects of a project. I will then discuss how the resources (money, labour, technology) available to support a project will inevitably vary (sometimes unpredictably) and typically decline over time and consider the implications of that fact on the technological aspects of a project. This will lead me to argue for producing a technologically simple output for the critical components of a project which will likely be independent of the platform used to create it, and to work through the implications of that approach on the planning, implementation, and long-term maintenance of a digital project.
To maintain a project, it is essential that the project team decide which features of the project are critical to the academic objectives and which can be modified, gracefully degraded, or eliminated if necessary (data or media of various kinds, forms of presentation, kinds of interaction, specifics of interface). For example, how critical are specialized search capabilities beyond what the anticipated user can do with generic tools? Can visualizations or pan-and-zoom maps be replaced with simpler images? It is also essential that all technical dependencies are identified (e.g., a MySQL database, PHP software library version X, Google Maps, WordPress, etc.) to estimate the resources required for maintenance.
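One way to make this inventory concrete is to record each feature, whether it is critical, and the technologies it depends on, and then list the dependencies on which critical features rely. The following is only a minimal sketch in Python: the `Feature` class, the `critical_dependencies` function, and the sample inventory are hypothetical illustrations, not part of any project described here.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One feature of the edition and the technologies it relies on."""
    name: str
    critical: bool  # essential to the academic objectives?
    dependencies: set[str] = field(default_factory=set)


def critical_dependencies(features: list[Feature]) -> dict[str, list[str]]:
    """Map each dependency to the critical features that would break without it."""
    result: dict[str, list[str]] = {}
    for feature in features:
        if not feature.critical:
            continue  # non-critical features may be degraded or dropped
        for dep in sorted(feature.dependencies):
            result.setdefault(dep, []).append(feature.name)
    return result


# Hypothetical inventory, for illustration only.
inventory = [
    Feature("transcribed texts", critical=True, dependencies={"WordPress", "MySQL"}),
    Feature("faceted search", critical=False, dependencies={"PHP library", "MySQL"}),
    Feature("place-name map", critical=True, dependencies={"Google Maps"}),
]

for dep, names in sorted(critical_dependencies(inventory).items()):
    print(f"{dep}: needed by {', '.join(names)}")
```

A listing of this kind shows at a glance which dependencies must either be maintained or replaced by a low-maintenance alternative, and which support only features that could be allowed to degrade.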
The hard truth is that complexity of infrastructure is positively correlated with productivity and negatively correlated with low-cost maintainability (Figure 1). For example, consider again the static versus WordPress-generated HTML pages, or research assistants entering data using a simple text editor to edit shared files on a server versus using an XML editor which validates their work on the fly and submits to a version control system. In each pair, the second option is more productive but requires more infrastructure to maintain. Therefore, careful decisions must be made at the planning stage to optimize both productivity and maintainability within anticipated affordances and constraints. For example, a platform such as WordPress maintained by an institution rather than the project increases the chances of long-term viability.
Figure 1: Increasing infrastructure increases productivity and decreases viability
This hard truth applies to various aspects of the project. I will focus on the technological aspect but first will quickly cover some others. For each I will provide both high-maintenance and low-maintenance characteristics summarized in Table 1 below, and a brief discussion.
Table 1: High and Low Maintenance Characteristics of Digital Edition Projects
Aspect | High Maintenance | Low Maintenance |
---|---|---|
Intellectual property | • proprietary • unclear or contested claims • unclear or contested credits | • open source • proper licensing and terms • clear credits and claims |
Labour | • students and contractors are untrained or ad-hoc • managers are uncommitted or ad-hoc | • workers are trained, committed, disciplined • managers are skilled, committed, stable |
Institutional support | • lack of academic support • lack of or unhelpful technical support • absent or volatile policies | • committed academic support • committed, helpful technical support • stable policies |
Technology | • external dependencies • third-party libraries/utilities/databases • frameworks, server scripts • proprietary | • lack of external dependencies • code likely to remain secure • simple servers (HTML, JavaScript, CSS) • compliant with open standards |
In addition to the obvious pitfalls of failure to obtain durable permissions, if the intellectual property terms of a project allow for multiple copies or derived works, that increases the chances of long-term survival of that project or at least components of it. Careful, detailed treatment of credits makes the publication far more valuable for documenting individual contributions to a collaborative work, especially for students and early career scholars.
The project is likely to involve both students unskilled at what the researcher is skilled at, and contractors who are skilled at what the researcher is not skilled at. Judicious ongoing senior researcher involvement for the former and the engagement of consultants (formal or informal) to advise on communications with the latter are necessary costs to optimize development and long-term maintainability of the outputs.
Some institutional infrastructure is bound to surround the project, for better and worse, so it is important to know the nature of that infrastructure. For example, institutional policies may affect or preclude long-term support of the project depending on how the project is implemented, regardless of the quality of the technical support available. Institutional policies also change over time, which is an argument for maintaining multiple installations or variants of the output. Issues of institutional infrastructure are further complicated if participants from more than one institution are involved, or if the institutional home for the project moves to another university.
A high-maintenance technological infrastructure can arise for at least two reasons: (1) the nature of the academic treatment requires that infrastructure, or (2) to increase productivity by simplifying the labour or allowing for lower cost, quicker start-up, and modification. In the first case, the project is committed to the infrastructure and must accept the costs associated with maintaining it; in the second, it might be possible to reduce dependence on that infrastructure over time at modest increases in the other resources required. For example, if the project is using a content management system (CMS) like WordPress to create a website, is that because the academic objectives of the site inherently require a CMS (or alternative approaches are prohibitive) or is it because the CMS facilitates start-up and creation of content at low cost?
So far, the reality check for long-term maintainability has claimed that increasing complexity of infrastructure is positively correlated with productivity and negatively correlated with low-cost maintainability, and that a project should (1) determine what is essential; (2) consider intellectual property, labour, and institutional contexts; and (3) distinguish which high-maintenance technologies are required and which are conveniences. The reality check arrives at the possibly counter-intuitive notion that the project should aim for a low-maintenance solution for essential output elements as far as possible, even at some expense to other project priorities. In other words, the preferred analogy for a digital edition is not a book but rather a stage production, a road-worthy car, or even a pet animal. The next section will explore this further.
It is almost certain that the resources (money, labour, technology) available to a project will vary over time, though the specific pattern varies. Compare a five-year grant-funded project to an ongoing research interest of one researcher. The first project can afford a high-maintenance environment and reap whatever benefits that environment provides for as long as it is adequately resourced, possibly indefinitely. Over time, though, the costs of addressing issues that arise can be expected to increase dramatically due to reduced availability of knowledgeable, specialized labour and increased obsolescence of elaborate technologies.
A second hard truth is that, most likely, the outputs of the project will have to exist in a resource-poor environment (and specifically not the resource environment in which they were created) for a much longer time than in a resource-rich one. Table 2 lists typical features of a web-based digital publication and, for each, a high-maintenance (and thus high-vulnerability) treatment and a low-maintenance (and thus low-vulnerability) treatment. This table is derived from the principles of The Endings Project, of which I am a member. The first three rows are fairly straightforward. For the last two rows (interfaces and interactions), careful planning and execution of low-vulnerability, low-maintenance solutions need not compromise the value or utility of the site for many users for a long time. In cases where no low-maintenance solution is a close approximation of the high-maintenance solution, the project is committed to maintaining the high-maintenance infrastructure or the output fails. Even then, good planning can make partial failure a graceful rather than catastrophic degradation of certain features. In case of failure, proper archiving of the original files and documentation of the user experience will help people in the future decide what those capabilities were and what would be involved in recreating them.
Table 2: High and Low Vulnerability Versions of Output Features
Feature | High Vulnerability | Low Vulnerability |
---|---|---|
Media | • non-standard • technically inconsistent • requires special software | • standards-compliant • consistent |
Text, metadata, markup | • unstructured • inconsistent, idiosyncratic • non-standard | • technically validated • consistent • standards-compliant |
Network, live feeds, user content | • dynamic, inherently vulnerable data | • static (saved) examples to demonstrate treatment or support argument |
Specific user interfaces | • virtual reality • requires special hardware • complex, faceted searches | • video or screen capture to document experience • multiple simpler searches, data exports |
Kinds of interactions | • database queries requiring code running on server • sophisticated visualizations • pan and zoom maps | • sophisticated indexing used by code running in browser • simplified interface for essential visual content • static (saved) examples to demonstrate treatment or support argument |
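The low-vulnerability alternative to server-side database queries in Table 2 (indexing used by code running in the browser) can be illustrated with a search index that is built once, in advance, and saved as a static JSON file. The following is a minimal sketch in Python under stated assumptions: the page names and contents are invented, `build_index` and `page_text` are hypothetical helpers, and the tokenization is deliberately naive (it splits on word characters and would also index any script or style text in a page).

```python
import json
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML page, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def page_text(html: str) -> str:
    """Return the text of an HTML string with markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each lower-cased word to the sorted list of pages containing it."""
    index: dict[str, set[str]] = {}
    for filename, html in pages.items():
        for word in set(re.findall(r"\w+", page_text(html).lower())):
            index.setdefault(word, set()).add(filename)
    return {word: sorted(files) for word, files in sorted(index.items())}


# Hypothetical pages, for illustration only.
pages = {
    "letter-01.html": "<h1>Letter 1</h1><p>Arrived in Victoria by steamer.</p>",
    "letter-02.html": "<h1>Letter 2</h1><p>The steamer left Victoria late.</p>",
}

# The JSON text would be saved next to the static pages and read by a
# small script running in the browser; no server-side code is needed.
index_json = json.dumps(build_index(pages), ensure_ascii=False, indent=2)
```

Because the index is an ordinary file, it can be archived, copied, and served by any simple web server along with the pages it describes.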
These two hard truths (productivity versus maintainability, and loss of resources over time) together suggest that it is wise to distinguish two phases of your project: development and deployment. They also suggest it is wise to exploit the productivity advantages of high-maintenance infrastructure in the development phase, when you have lots of resources and are highly productive, and to deploy in a way that relies on low-maintenance infrastructure, when you have fewer resources and are producing less. These two phases may or may not overlap in time. Using the WordPress example, the project might use WordPress as an editing tool to assemble content and submissions and then use a program and manual procedure to export that content into static web pages (no longer dependent on WordPress) for long-term maintainability. The program and manual export procedure could be run a number of times, as content is added in WordPress.
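The export step might look something like the following minimal sketch in Python. It assumes the content has already been retrieved from the editing platform as a list of records with `slug`, `title`, and `body_html` keys; those key names, the `export_static` function, and the page template are hypothetical, and the sketch does not show how the content is extracted from WordPress itself.

```python
import html
from pathlib import Path
from string import Template

# A deliberately plain page template: no scripts, no external dependencies.
PAGE = Template("""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
$body
</body>
</html>
""")


def export_static(records: list[dict[str, str]], out_dir: Path) -> list[Path]:
    """Write one static HTML file per record and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        page = PAGE.substitute(
            title=html.escape(record["title"]),  # titles are plain text
            body=record["body_html"],            # bodies are assumed to be valid HTML
        )
        path = out_dir / f"{record['slug']}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

Since the function simply overwrites the files it generates, it can be rerun whenever content changes, and the resulting directory can be copied to any web server or archive without the editing platform.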
Projects that manifest researchers’ academic priorities in products that are maintainable result from input at the planning stage from librarians, archivists, systems experts, experienced colleagues, and possibly creators of model projects, as well as the thoughtful decisions researchers make based on that range of input. The most common significant error is to assume that what is good for development is good for deployment or, in other words, to overestimate the gain of short-term benefits and underestimate the pain of long-term costs implicit in high-maintenance infrastructure (e.g., technology that allows the project to “hit the ground running”). The project can be designed to produce both high-maintenance and low-maintenance outputs for as long as feasible and then gracefully fall back to only the low-maintenance output, though there are more development costs to produce multiple outputs. Given those costs, the default should be to create low-maintenance outputs, and to create high-maintenance outputs only after careful justification.
It is probably not possible to “have it all, have it now, and have it forever.” Nobody likes hearing “not so fast,” let alone “no,” but sometimes the risk is not hearing that until too late. The programming and archiving members of a project team often raise what may be unwelcome reality checks so careful thought can be given to the best compromise between long-term viability and short-term productivity for the project. That thought should reflect that the project has established which features are academic priorities, has identified resource constraints (now and into the future), and has aimed for low-maintenance outputs for essential elements, even if the project also generates high-maintenance outputs for those elements. This approach is likely to cost short-term pain in the interests of long-term gain and to justify the decisions researchers made in the interest of the overall success of their project.
Arneil, Stewart, et al. “Project Endings: Early Impressions from Our Recent Survey on Project Longevity In DH.” DH2019, 11 July 2019, Utrecht University, staticweb.hum.uu.nl/dh2019/dh2019.adho.org/papers/.
Edmond, Jennifer, and Francesca Morselli. “Sustainability of Digital Humanities Projects as a Publication and Documentation Challenge.” Journal of Documentation, vol. 76, no. 5, Feb. 2020, pp. 1019–1031, doi.org/10.1108/JD-12-2019-0232.
The Endings Project. “Endings Principles for Digital Longevity.” 16 Sept. 2021, endings.uvic.ca/principles.html.
Neuefeind, Claes, et al. “Sustainability Strategies for Digital Humanities Systems.” DH2020, dh2020.adho.org/wp-content/uploads/2020/07/565_SustainabilityStrategiesforDigitalHumanitiesSystems.html.